1.3 Trends in Data Due now: p. 20–24 #1, 4, 9, 11, 14 Learning goal: Describe the trend and correlation in a scatter plot and construct a median-median line MSIP / Home Learning: p. 37 #2, 3, 6, 8
Variables • Variable (Mathematics) • a symbol denoting a quantity or symbolic representation • an unknown quantity • Variable (Statistics) • A measurable attribute; these typically vary over time or between individuals • E.g. Height, Weight, Age, Favourite Hockey Team • Can be Discrete, Continuous or neither • Continuous: Weight (digital scale) • Discrete: Number of siblings • Neither: Hair colour
The Two Types of Variables • Independent Variable • horizontal axis • Time is independent (why?) • Timing is dependent • e.g., time to run a race vs. length of race • Dependent Variable • values depend on the independent variable • vertical axis • Format: “dependent vs. independent” • e.g., a graph of arm span vs. height means arm span is the dependent variable and height is the independent
Scatter Plots • a graph that shows two numeric variables • each axis represents a variable • each point indicates a pair of values (x, y) • may show a trend
What is a trend? • the ‘direction’ of the data • a pattern of average behavior that occurs over time • e.g., costs tend to increase over time (inflation) • need two variables to exhibit a trend
An Example of a trend • U.S. population from 1780 to 1960 • Describe the trend
Correlations • Strength is… • None – no clear pattern in the data • Weak – data loosely follows a pattern • Strong – data follows a clear pattern • For strong/weak, direction is… • Positive – data rises from left to right (overall) • As x increases, y increases • Negative – data drops from left to right (overall) • As x increases, y decreases • http://www.seeingstatistics.com/seeing1999/gallery/CorrelationPicture.html • Figure: strong, positive linear correlation
Line of Best Fit • A straight line that represents the trend in the data • Can be used to make predictions (graph or equation) • Can be drawn or calculated • Fathom has 3: movable, median-median, least squares • Gives no measurement of the strength of the trend (that’s tomorrow!)
An example of the line of best fit • this is temperature recycling data with a median-median line added • what type of trend are we looking at?
Creating a Median-Median Line • Sort the points by x-value and divide them into 3 groups of equal size (left, middle, right) • If there is 1 extra point, include it in the middle group • If there are 2 extra points, include one in each end group • Calculate the median x- and y-coordinates for each group and plot the 3 median points (x, y) • If the median points are in a straight line, connect them • Otherwise, draw a line through the two outer median points, then slide it 1/3 of the way toward the middle median point, keeping the same slope; this is the line of best fit
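The construction steps can be sketched in Python. This is a minimal illustration, not Fathom's implementation: the function name `median_median_line` is an assumption, and the sketch assumes the points are sorted by x before grouping and that the two outer groups have different median x-values.

```python
from statistics import median

def median_median_line(points):
    """Return (slope, intercept) of the median-median line for (x, y) pairs.

    Hypothetical helper for illustration; follows the slide's steps.
    """
    pts = sorted(points)  # order by x (ties broken by y)
    n = len(pts)
    if n < 3:
        raise ValueError("need at least 3 points")
    base, extra = divmod(n, 3)
    # 1 extra point goes to the middle group; 2 extra points go one to each end
    end_size = base + (1 if extra == 2 else 0)
    groups = [pts[:end_size], pts[end_size:n - end_size], pts[n - end_size:]]
    # Median point of each group: (median of x-values, median of y-values)
    (xl, yl), (xm, ym), (xr, yr) = [
        (median(x for x, _ in g), median(y for _, y in g)) for g in groups
    ]
    # Slope comes from the two outer median points
    slope = (yr - yl) / (xr - xl)
    # Intercept of the line through the outer points, and of the parallel
    # line through the middle median point
    b_outer = yl - slope * xl
    b_middle = ym - slope * xm
    # Slide the outer line 1/3 of the way toward the middle median point
    intercept = b_outer + (b_middle - b_outer) / 3
    return slope, intercept

# Made-up points whose median points are (2, 5), (5, 14), (8, 17)
sample = [(1, 3), (2, 5), (3, 7), (4, 12), (5, 14), (6, 16), (7, 15), (8, 17), (9, 19)]
print(median_median_line(sample))
```

When the three median points already lie on one line, `b_outer` and `b_middle` are equal and the shift is zero, which matches the "connect them" case.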
Lines of Best Fit – why 3? • Drawing a line of best fit is arbitrary • Hit as many points as possible • Have the same number of points above and below the line • Outliers tend to be ignored • The median-median line is easy to construct and takes the spread of the data into consideration • The least-squares line takes every point into consideration but is based on a complicated formula
AGENDA for Wed-Thu • 1.3 Median-Median Line • Using a regression equation • Fathom Activity - Predict your weight as an NHL player • 1.4 Trends With Technology • Correlation coefficient • Coefficient of Determination • Residuals • Least-Squares Line • Fathom Investigation: finding the Least Squares Line
Scatter Plots - Summary • A graph that compares two numeric variables • One is dependent on the other • May show a trend / correlation • positive/negative and strong/weak • A line may be a good model • Median-Median and Least-Squares • If not, non-linear (can be quadratic, exponential, logarithmic, etc.)
Using a regression equation • For a line of best fit, the equation will be in the form y = mx + b • e.g., W = 7.25 H – 332 • Mr. Lieff is 71.5 in tall. His weight would be: • W = 7.25(71.5) – 332 ≈ 186
Fathom Activity – Predict your weight as an NHL player! • Click http://www.nhl.com/ice/playerstats.htm • Under TEAM: Pick your favourite • You can also change Position, Country, Status • Under REPORT: BIOS • Click GO> • Copy the URL • Run Fathom → File → Import → Import From URL → Paste • Create a scatter plot of Weight vs. Height • Add a median-median line • Use the equation to: • predict your weight based on your height • Is it accurate? Discuss with a neighbour. • MSIP / Home Learning: p. 51 #1-6, 7 bcd, 8
1.4 Trends in Data Using Technology Learning goal: Describe and measure the strength of trends Due now: p. 37 #2, 3, (6-7 or 8) MSIP / Home Learning: p. 51 #1-6, 7 bcd, 8 use Fathom and Excel
Regression • The process of fitting a line or curve to a set of data • A line is linear regression (Excel or Fathom) • A curve can be quadratic, cubic, exponential, logarithmic, etc. (Excel) • We do this to generate a mathematical model (equation) • We can use the equation to make predictions • Interpolation – within the span of the data • Extrapolation – outside of the span of the data
Example • armspan = 0.87 height + 22 • y = 0.87 x + 22 • What is the arm span of a student who is 175 cm tall? • y = 0.87(175) + 22 • = 174.25 cm • How tall is a student with a 160 cm arm span? • y = 0.87x + 22 • 160 = 0.87x + 22 • 160 – 22 = 0.87x • 138 = 0.87x • x = 138 ÷ 0.87 • = 158.6 cm
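The two calculations in this example (predicting y from x, then solving backwards for x) can be checked with a short Python sketch. The function names are hypothetical; the slope 0.87 and intercept 22 come from the slide.

```python
def predict_arm_span(height_cm, slope=0.87, intercept=22):
    """Arm span (cm) predicted by the model: y = 0.87x + 22."""
    return slope * height_cm + intercept

def height_for_arm_span(arm_span_cm, slope=0.87, intercept=22):
    """Solve y = 0.87x + 22 for x: the height (cm) that gives this arm span."""
    return (arm_span_cm - intercept) / slope

print(round(predict_arm_span(175), 2))     # arm span for a 175 cm student
print(round(height_for_arm_span(160), 1))  # height for a 160 cm arm span
```

Whether either prediction counts as interpolation or extrapolation depends on the span of the original data, which the slide does not give.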
Correlation Coefficient • The correlation coefficient, r, is an indicator of the strength and direction of a linear relationship • r = 0 no linear relationship • r = 1 perfect positive correlation • r = -1 perfect negative correlation • r² is the coefficient of determination • Takes on values from 0 to 1 • r² is the fraction of the variation in the y-variable that is accounted for by the linear relationship with x • if r² = 0.85, that means that 85% of the variation in y is accounted for by x
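As a rough illustration of how r and r² are computed, here is a sketch using the standard Pearson formula in plain Python. The function name `correlation` and the five data points are made up for the example; Fathom and Excel report the same quantities directly.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 3), round(r ** 2, 3))  # r and the coefficient of determination
```

For this made-up data, r is about 0.775 and r² is 0.6, so about 60% of the variation in y is accounted for by the linear relationship with x.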
Residuals • a residual is the vertical distance between a point and the line of best fit • if the model you are considering is a good fit, the residuals should be small and have no noticeable pattern • The least-squares line minimizes the sum of the squares of the residuals http://www.math.csusb.edu/faculty/stanton/m262/regress/
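A small Python sketch of the least-squares idea, under stated assumptions: the function names and the data are invented for illustration, and residuals are kept signed (positive above the line, negative below), whereas the slide describes them as distances.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(xs, ys, slope, intercept):
    """Signed vertical gaps: actual y minus the y predicted by the line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

def sum_of_squares(values):
    return sum(v ** 2 for v in values)

# Made-up data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = least_squares_line(xs, ys)
res = residuals(xs, ys, m, b)
print(m, b, sum_of_squares(res))
```

Nudging the slope or intercept away from the fitted values and recomputing `sum_of_squares` gives a larger total, which is what "minimizes the sum of the squares of the residuals" means.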
Least Squares Line: Weight vs. Height (NHL) • w = 7.23h – 325
Using the equation • How much does a player who is 71 in tall weigh? • w = 7.23(71) – 325 • = 188.33 lbs • How tall is a player who weighs 180 lbs? • w = 7.23h – 325 • h = (w + 325) ÷ 7.23 • So h = (180 + 325) ÷ 7.23 • = 69.85 in or 177.4 cm
1.5 Comparing Apples to Oranges • http://www.smarter.org/research/apples-to-oranges/
The Power of Data Chapter 1.5 – The Media Mathematics of Data Management (Nelson) MDM 4U There are 3 kinds of lies: lies, damn lies and statistics.
Example 1 – Changing the scale on the axis • Why is the following graph misleading?
Example 1 – Scale from 0 • Consider that this is a bar graph – could it still be misleading?
Example 2 – Using a Small Sample • For the following surveys, consider: • The sample size • If there is any (mis)leading language
Example 2 – Using a Small Sample • “4 out of 5 dentists recommend Trident sugarless gum to their patients who chew gum.” • “In the past, we found errors in 4 out of 5 of the returns people brought in for a Second Look review.” (H&R Block) • “Did you know that 1 in 4 women can misread a traditional pregnancy test result?” (Clearblue Easy Digital Pregnancy Test) • “Using Pedigree® DentaStix® daily can reduce the build up of tartar by up to 80%.” • “Did you know that the average Canadian wastes $500 of food in a year?” (Zip-Lock Freezer bags)
Details on the Trident Survey • How many dentists did they ask? • Actual number: 1200 • 4 out of 5 is convincing but reasonable • 5 out of 5 is preposterous • 3 out of 5 is good but not great • Actual statistic 85% • Recommend Trident over what? • There were 2 other options: • Chewing sugared gum • Not chewing gum
Misleading Statements(?) • How could these statements be misleading? • “More people stay with Bell Mobility than any other provider.” • “Every minute of every hour of every business day, someone comes back to Bell.”
“More people stay with Bell Mobility than any other provider.” • Does not specify how many more customers stay with Bell. • e.g. Percentage of customers renewing their plan: Bell: 30% Rogers: 29% Telus: 25% Fido: 28% • Did they compare percentages or totals? • What does it mean to “stay with Bell”? Honour entire contract? Renew contract at the end of a term? • Are early terminations factored in? If so, does Bell have a higher cost for early terminations? • Competitors’ renewal rates may have decreased due to family plans / bundling • Does the data include Private / Corporate plans?
“Every minute of every hour of every business day, someone comes back to Bell.” • 60 mins x 7 hours x 5 days = 2 100/wk • What does it mean to “Come back to Bell”? • How many hours in a business day?
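The weekly figure depends entirely on how long a "business day" is taken to be. A quick Python sketch makes that visible; the function name is hypothetical, the 7-hour day is the slide's assumption, and the 8-hour and 24-hour alternatives are added here only for comparison.

```python
def comebacks_per_week(hours_per_day, days_per_week=5):
    """Customers per week if exactly one returns every business minute."""
    return 60 * hours_per_day * days_per_week

# 7 hours is the slide's assumption; 8 and 24 are hypothetical alternatives
for hours in (7, 8, 24):
    print(hours, "hours/day:", comebacks_per_week(hours), "per week")
```

With a 7-hour day this reproduces the slide's 2 100 per week. The claim itself does not say which definition applies, so the count is not pinned down.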
How does the media use (misuse) data? • To inform the public about world events in an objective manner • It sometimes gives misleading or false impressions to sway the public or to increase ratings • It is important to: • Study statistics to understand how information is represented or misrepresented • Correctly interpret tables/charts presented by the media
MSIP / Homework • Read pp. 57 – 60 Ex. 1-2 • Complete p. 60 #1-6 • Final Project Example – Manipulating Data (on wiki) • Examples • http://junkcharts.typepad.com/ • http://www.coolschool.ca/lor/AMA11/unit1/U01L02.htm • http://mediamatters.org/research/200503220005