Scatterplots: Show the relationship between two quantitative variables. Response Variable: The variable on the y-axis. It responds to the explanatory variable. Explanatory Variable: The variable on the x-axis. It influences the response.
Looking at Scatterplots: (DFS!) Describe For Scatter! • Direction: Positive: as x increases, y increases. Negative: as x increases, y decreases. • Form: Is there a linear relationship between the two variables? • Strength: Do the points follow a single stream that is tight to the line, or is there considerable spread (or variability) around the line?
The example in the text shows a negative association between central pressure and maximum wind speed • As the central pressure increases, the maximum wind speed decreases.
Calculator Tip: Diagnostics On! Catalog – Alpha “D” – Diagnostics On - Enter
Calculator Tip: Scatterplots L1: Explanatory Variable L2: Response Variable Use statplot to graph
Scientists are interested in seeing if global temperature has been increasing. They measured the average global temperature per year (in Celsius). What graph should they make?
Example #1: Suppose you were to collect data for each pair of variables below. Which variable is the explanatory and which is the response? Determine the likely direction and strength of the relationship. 1. T-shirts at a store: Price of each, Number Sold. Explanatory (x): price of shirt. Response (y): number sold. D: negative. S: strong. [Sketch: number sold (y, from 1 to 100) against price of shirt (x, from $5 to $50).]
Example #1 (continued): 2. Drivers: Reaction Time, Blood Alcohol Level. Explanatory (x): blood alcohol content (BAC). Response (y): reaction time. D: positive. S: strong. [Sketch: reaction time (y, from 1 to 10) against BAC (x, from .01 to .5).]
Example #1 (continued): 3. Cars: Age of Owner, Weight of the Car. Makes no sense! There is no reason to expect a relationship between these two variables.
Example #2: “I have never found a quantifiable predictor in 25 years of grading that was anywhere as strong as this one. If you just graded them based on length without ever reading them, you’d be right over 90 percent of the time.” The table below shows the data set that Dr. Perlman used to draw his conclusions. Carry out your own analysis of the data. Then write a few sentences in response to each of Dr. Perlman’s conclusions. Essay score and length for a sample of SAT essays
D: positive F: linear, one unusual point S: strong
Example #3: Regraph #2 with score as the dependent variable now. Do you see any differences in the graph? **You may want to store these lists for tomorrow…
Correlation: “r” Measures the direction and strength of the linear relationship (D and S only, not form). Both variables must be quantitative.
Attributes of the Correlation 1. The correlation coefficient is a unitless measurement, denoted by the letter r, with values between -1 and 1 (inclusive). 2. When r = 1, all the data points form a perfect straight-line relationship with a positive slope. 3. When r = -1, all the data points form a perfect straight-line relationship with a negative slope.
Attributes of the Correlation 4. Correlation treats x and y symmetrically: • The correlation of x with y is the same as the correlation of y with x. 5. Correlation is not affected by changes in the center or scale of either variable. • Correlation depends only on the z-scores, and they are unaffected by changes in center or scale.
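Attributes 4 and 5 can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the `correlation` helper and the height and weight values are hypothetical, chosen only for illustration, and the helper assumes sample standard deviations (dividing by n - 1).

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r computed from paired z-scores (sample standard deviations)."""
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical heights (inches) and weights (pounds), for illustration only
heights_in = [62, 65, 68, 70, 73]
weights_lb = [120, 140, 155, 165, 190]

r_xy = correlation(heights_in, weights_lb)
r_yx = correlation(weights_lb, heights_in)   # attribute 4: swap x and y

# Attribute 5, change of scale: inches -> centimeters, pounds -> kilograms
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.4536 for w in weights_lb]
r_metric = correlation(heights_cm, weights_kg)

# Attribute 5, change of center: add a constant to every height
r_shifted = correlation([h + 10 for h in heights_in], weights_lb)

print(round(r_xy, 4), round(r_yx, 4), round(r_metric, 4), round(r_shifted, 4))
```

All four printed values should agree, because the z-scores are the same in every case.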
Attributes of the Correlation 6. Values of r close to 0 mean that the linear relationship is weak: there may be a general linear trend, but there is a lot of variability around that trend. 7. When r = 0 there is no linear relationship between the two variables (a curved relationship is still possible). In other words, the best-fitting line has a slope of zero.
Attributes of the Correlation 8. Outliers have a large influence on the correlation coefficient. The correlation is NOT resistant to outliers. 9. Correlation does not describe curved relationships! (ONLY LINEAR)
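Both attributes can be seen with very small data sets. This is a rough Python sketch using made-up numbers (none of them come from the text); the `correlation` helper is hypothetical and assumes sample standard deviations.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r computed from paired z-scores (sample standard deviations)."""
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (len(xs) - 1)

# Attribute 9: a perfect curved (parabolic) pattern still gives r = 0
x_curve = [-2, -1, 0, 1, 2]
y_curve = [x ** 2 for x in x_curve]
r_curve = correlation(x_curve, y_curve)

# Attribute 8: perfectly linear data, then the same data plus one outlier
x_line = [1, 2, 3, 4, 5]
y_line = [2, 4, 6, 8, 10]
r_line = correlation(x_line, y_line)
r_outlier = correlation(x_line + [6], y_line + [0])   # add the point (6, 0)

print(round(r_curve, 3), round(r_line, 3), round(r_outlier, 3))
```

In this made-up example a single added point drops r from 1 to about 0.14, while the parabola, where y is completely determined by x, has r = 0.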
Guidelines: How strong is the linear relationship? Weak positive: 0 < r < 0.3; weak negative: -0.3 < r < 0. Moderate positive: 0.4 < r < 0.7; moderate negative: -0.7 < r < -0.4. Strong positive: 0.8 < r < 1; strong negative: -1 < r < -0.8.
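The guideline ranges can be turned into a small lookup. This is only a sketch: the function name `describe_strength` is hypothetical, and because the guidelines leave gaps (0.3 to 0.4 and 0.7 to 0.8), the code assumes a value in a gap belongs to the weaker category. That choice is an assumption, not something the guidelines state.

```python
def describe_strength(r):
    """Label r using the guideline cutoffs.

    Assumption: values that fall in the gaps between the guideline ranges
    (0.3 to 0.4 and 0.7 to 0.8) are assigned to the weaker category.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= 0.8:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"

print(describe_strength(0.888))   # strong positive
print(describe_strength(-0.5))    # moderate negative
print(describe_strength(0.01))    # weak positive
```

Calling `describe_strength(1.09)` raises a `ValueError`, since r can never be larger than 1.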
Data collected from students in Statistics classes included their heights (in inches) and weights (in pounds):
If we had to put a number on the strength, we would not want it to depend on the units we used. • A scatterplot of heights (in centimeters) and weights (in kilograms) doesn’t change the shape of the pattern:
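Example #4: Types of Correlation. Match each scatterplot to its correlation. Choices: r = 0, r = -0.7, r = 0.5, r = -0.99, r = -0.3, r = 0.9. Answers (in plot order): r = 0, r = -0.3, r = 0.5, r = -0.7, r = 0.9, r = -0.99. [Six scatterplots omitted.]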
Don’t assume the relationship is linear just because the correlation coefficient is high. • Here the correlation is 0.979, but the relationship is actually bent.
Example #5: What is wrong with the following statements?
a. There is a strong correlation between the gender of American workers and their income. Answer: Gender is categorical; correlation requires two quantitative variables.
b. We found a high correlation (r = 1.09) between students’ rating of faculty teaching and ratings made by other faculty members. Answer: r can’t be bigger than 1.
c. We found a very weak correlation (r = -0.95) which suggests little relationship between income and hours spent at casinos. Answer: r = -0.95 is a strong negative relationship.
d. We found a very weak correlation (r = 0.01) which suggests little relationship between age and death rate. Answer: Age and death rate should have a very strong relationship, so r = 0.01 is implausible.
HOW TO CALCULATE THE CORRELATION COEFFICIENT Remember how to calculate the z-score? We used this calculation to determine how many standard deviations an observation was from the mean. RECALL: $z = \frac{x - \bar{x}}{s_x}$
In this case, we were only concerned with one variable. Now, we are considering two variables and each must be standardized.
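Standardizing both variables leads to the usual formula for the correlation: multiply each pair of z-scores, add the products, and divide by n - 1. A minimal Python sketch of that calculation follows. The function names and the data are hypothetical (they are not from the text), and sample standard deviations are assumed.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize each value: (value - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def correlation(xs, ys):
    """r = sum of the products of paired z-scores, divided by n - 1."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical illustrative data (not from the text)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(round(r, 3))   # about 0.775
```

Because only z-scores enter the calculation, the result has no units.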
Example #6: Step #1: Find the following summary statistics: n = 3. SPEED: $\bar{x}$ = 30, Sx = 10. MPG: $\bar{y}$ = 35, Sy = 10.
Calculator Tip: Correlation L1: Explanatory Variable L2: Response Variable Stat-calc-LinReg(a+bx), L1, L2 (make sure your diagnostic is on!!!)
Example #7: Use your calculator to find the correlation for #2. Comment on what it means. r = 0.888. D: positive. S: strong.
Regression line: A straight line that describes the linear relationship between an explanatory variable and a response variable.
LEAST SQUARES REGRESSION LINE: • This is the best-fitting line to the data. • The goal is to minimize the vertical distances (residuals) of your observations (data) from your line. • As in the calculation of the variance, we must square the distances: some data points lie above the line (positive) and some lie below it (negative), so without squaring they would cancel each other out. • The least squares line is the one that makes the sum of these squared distances as small as possible.
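The cancelling and squaring described above can be seen in a short calculation. This Python sketch uses made-up data and hypothetical function names; the slope and intercept formulas are the standard least squares ones, which are assumed here rather than taken from the notes.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (intercept a, slope b) of the least squares line y-hat = a + b*x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances from the points to the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical illustrative data (not from the text)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

best = sum_squared_residuals(x, y, a, b)           # least squares line
other = sum_squared_residuals(x, y, a + 0.5, b)    # same slope, shifted up by 0.5

print(round(sum(residuals), 6))   # unsquared residuals cancel out
print(round(best, 2), round(other, 2))
```

The unsquared residuals sum to zero, which is why they cannot measure fit, and shifting the line away from the least squares fit increases the sum of squared residuals.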
We can use this line to predict a response, y, from a given explanatory variable, x.
Remember graphing? Slope-intercept formula for a line: y = mx + b, where m = slope and b = y-intercept. In statistics, we write it $\hat{y} = a + bx$, where a is the y-intercept and b is the slope. Do you remember the SLOPE?
Facts about Least Squares Regression: 1. The distinction between explanatory and response variables is essential (which variable is used to predict which?). 2. The line always passes through the point $(\bar{x}, \bar{y})$. 3. Correlation ‘r’ describes the direction and strength of the straight-line relationship, but it tells us nothing more about the slope than whether it is positive, negative, or zero.
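Facts 1 and 2 can be checked with a few lines of Python. This is a sketch under assumptions: it uses the standard relationships slope = r * Sy / Sx and intercept = mean of y minus slope times mean of x, which are not stated in the notes, along with made-up data and a hypothetical `correlation` helper.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r computed from paired z-scores (sample standard deviations)."""
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical illustrative data (not from the text)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
b = r * stdev(y) / stdev(x)    # slope when predicting y from x
a = mean(y) - b * mean(x)      # intercept

# Fact 2: plugging in the mean of x returns the mean of y
predicted_at_mean = a + b * mean(x)

# Fact 1: swapping the roles of x and y gives a different line
b_swapped = r * stdev(x) / stdev(y)   # slope when predicting x from y

print(round(b, 3), round(a, 3), round(predicted_at_mean, 3), round(b_swapped, 3))
```

Here the slope for predicting y from x is 0.6, while the slope for predicting x from y is 1.0, not 1/0.6, so the two regression lines are not the same line.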
Extrapolation: Predicting outside the range of the x values
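One simple way to guard against extrapolation is to check whether a new x value lies inside the range of the observed x values before trusting the prediction. The sketch below is illustrative only: the `predict` function, the fitted line (a = 2.2, b = 0.6), and the data are all hypothetical.

```python
def predict(a, b, x_new, x_observed):
    """Return (prediction, is_extrapolation) for the line y-hat = a + b*x."""
    is_extrapolation = not (min(x_observed) <= x_new <= max(x_observed))
    return a + b * x_new, is_extrapolation

# Hypothetical observed x values and fitted line (not from the text)
x_observed = [1, 2, 3, 4, 5]
a, b = 2.2, 0.6

inside = predict(a, b, 3, x_observed)     # x = 3 is within 1..5
outside = predict(a, b, 20, x_observed)   # x = 20 is far outside 1..5

print(inside)
print(outside)
```

The line still produces a number for x = 20, but the flag marks it as an extrapolation, since the data give no evidence that the linear pattern continues that far.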