STA291

STA291 Statistical Methods Lecture 11

LINEar Association
• r measures the “closeness” of the data to the “best” line. What line is that? And best in terms of what?
• In terms of least squared error:

“Best” line: least-squares, or regression, line
• Observed point: $(x_i, y_i)$
• Predicted value for a given $x_i$: $\hat{y}_i = b_0 + b_1 x_i$ (interpretation in a minute)
• The “best” line minimizes $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, the sum of the squared errors.
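To see what "minimizes the sum of the squared errors" means numerically, here is a minimal Python sketch. It uses the study-time data from this lecture's example and a hypothetical helper `sse`. It compares the least-squares line for those data (slope $295/62$, passing through the means $(6, 75)$) with an alternative line, $y = 40 + 5x$, chosen only for illustration.

```python
def sse(xs, ys, b0, b1):
    """Hypothetical helper: sum of squared errors for the line yhat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Study-time data (hours studied, exam score) from this lecture's example.
xs = [1, 5, 12]
ys = [45, 80, 100]

# Least-squares coefficients for these data.
b1_ls = 295 / 62
b0_ls = 75 - b1_ls * 6
ls_sse = sse(xs, ys, b0_ls, b1_ls)

# Illustrative alternative: y = 40 + 5x passes exactly through (1, 45) and (12, 100).
other_sse = sse(xs, ys, 40.0, 5.0)

print(round(ls_sse, 2), other_sse)  # 146.37 225.0
```

The alternative line hits two of the three points exactly, yet its sum of squared errors (225) is larger than the least-squares line's (about 146.37). "Best" here means the smallest total squared error over all points, not the most points hit.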

Interpretation of $b_0$ and $b_1$
• $b_0$, the intercept: the predicted value of y when x = 0.
• $b_1$, the slope: the predicted change in y when x increases by 1.
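Both interpretations follow directly from the form $\hat{y} = b_0 + b_1 x$. The following minimal Python sketch checks them using hypothetical coefficients ($b_0 = 10$, $b_1 = 2.5$) that are not from any data in this lecture.

```python
# Hypothetical coefficients, chosen only for illustration.
b0, b1 = 10.0, 2.5

def predict(x):
    """Hypothetical helper: predicted y on the line yhat = b0 + b1 * x."""
    return b0 + b1 * x

print(predict(0))               # 10.0, the intercept b0: predicted y at x = 0
print(predict(4) - predict(3))  # 2.5, the slope b1: change in prediction when x rises by 1
```

The slope check gives the same answer no matter which starting x is used, because the line has constant slope.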

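Calculation of $b_0$ and $b_1$
• $b_1 = r\,\dfrac{s_y}{s_x}$ and $b_0 = \bar{y} - b_1 \bar{x}$,
• where $s_x$ and $s_y$ are the sample standard deviations of the x’s and the y’s, and $\bar{x}$ and $\bar{y}$ are their sample means.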
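Here is a minimal Python sketch of these formulas, assuming $b_1 = r\,s_y/s_x$ and $b_0 = \bar{y} - b_1\bar{x}$, with $s_x$ and $s_y$ the sample standard deviations (divisor $n - 1$). The function name `least_squares` is a hypothetical helper. It uses only the standard-library `statistics` module.

```python
import statistics

def least_squares(xs, ys):
    """Hypothetical helper: return (b0, b1) via b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample SDs (n - 1 divisor)
    if s_x == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    if s_y == 0:
        return y_bar, 0.0  # all y values equal: the best line is horizontal
    # Pearson correlation r from cross-products of deviations about the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = s_xy / ((len(xs) - 1) * s_x * s_y)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

print(least_squares([0, 1, 2], [1, 3, 5]))  # (1.0, 2.0): the points lie exactly on y = 1 + 2x
```

Algebraically, $r\,s_y/s_x$ equals $\sum(x_i-\bar{x})(y_i-\bar{y}) \big/ \sum(x_i-\bar{x})^2$, so either form gives the same slope.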

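Least Squares, or Regression Line, Example
• STA291 study time example: (Hours studied, Score on First Exam)
• Data: (1, 45), (5, 80), (12, 100)
• $\bar{x} = 6$, $\bar{y} = 75$, $r \approx 0.9516$, $s_y / s_x = 5$
• $b_1 = r\,\dfrac{s_y}{s_x} \approx 0.9516 \times 5 \approx 4.758$
• $b_0 = \bar{y} - b_1 \bar{x} \approx 75 - 4.758 \times 6 = 46.452$
• In summary: $\hat{y} = 46.452 + 4.758x$
• Interpretation?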
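To reproduce the example's coefficients, here is a minimal Python sketch. It uses the equivalent form $b_1 = S_{xy}/S_{xx}$ (sums of cross-products and squared deviations about the means), and `least_squares` is a hypothetical helper name.

```python
def least_squares(xs, ys):
    """Hypothetical helper: (b0, b1) via b1 = S_xy / S_xx and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

hours = [1, 5, 12]       # hours studied
scores = [45, 80, 100]   # score on first exam

b0, b1 = least_squares(hours, scores)
print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")  # b1 = 4.758, b0 = 46.452
```

Read with the interpretation slide in mind, each additional hour studied goes with a predicted increase of about 4.76 points. The intercept, about 46.45, is the predicted score at 0 hours. That x value lies outside the observed range of 1 to 12 hours, so this interpretation should be treated cautiously.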

Properties of the Least Squares Line
• $b_1$, the slope, always has the same sign as r, the correlation coefficient, but they measure different things!
• The sum of the errors (or residuals), $\sum_{i=1}^{n} (y_i - \hat{y}_i)$, is always 0 (zero).
• The line always passes through the point $(\bar{x}, \bar{y})$.
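These three properties can be checked numerically on the study-time data. The following is a minimal Python sketch using only the standard library. "Zero" is tested up to floating-point rounding, since computed residuals will not sum to exactly 0.

```python
import math

hours = [1, 5, 12]
scores = [45, 80, 100]

n = len(hours)
x_bar, y_bar = sum(hours) / n, sum(scores) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
s_xx = sum((x - x_bar) ** 2 for x in hours)
s_yy = sum((y - y_bar) ** 2 for y in scores)

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
r = s_xy / math.sqrt(s_xx * s_yy)  # correlation coefficient

residuals = [y - (b0 + b1 * x) for x, y in zip(hours, scores)]

# Property 1: slope and correlation share a sign (both have S_xy in the numerator).
same_sign = (b1 > 0) == (r > 0)
# Property 2: residuals sum to zero (up to floating-point rounding).
residual_sum = sum(residuals)
# Property 3: the line passes through (x_bar, y_bar).
y_hat_at_mean = b0 + b1 * x_bar

print(same_sign, abs(residual_sum) < 1e-9, round(y_hat_at_mean, 6), y_bar)
```

The first property holds because $s_x$ and $s_y$ are positive. The slope $b_1$ is in units of y per unit of x, while r is unit-free, which is why the two share a sign but measure different things.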

About those residuals
• When we use our prediction equation to “check” values we actually observed in our data set, we can find their residuals: the difference between the observed value and the predicted value, $y_i - \hat{y}_i$
• For our STA291 study data earlier, one observation was (5, 80). Our prediction equation was $\hat{y} = 46.452 + 4.758x$
• When we plug in x = 5, we get a predicted y of 70.24, so our residual is $80 - 70.24 = 9.76$
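The residual at (5, 80) can be computed directly. Here is a minimal Python sketch that refits the study-time line (with $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$) and then takes observed minus predicted.

```python
hours = [1, 5, 12]
scores = [45, 80, 100]

# Least-squares coefficients for the study-time data.
n = len(hours)
x_bar, y_bar = sum(hours) / n, sum(scores) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
      / sum((x - x_bar) ** 2 for x in hours))
b0 = y_bar - b1 * x_bar

x_obs, y_obs = 5, 80
y_pred = b0 + b1 * x_obs    # predicted score for 5 hours of study
residual = y_obs - y_pred   # observed minus predicted
print(f"predicted = {y_pred:.2f}, residual = {residual:.2f}")  # predicted = 70.24, residual = 9.76
```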

Residuals
• Earlier, we pointed out that the sum of the residuals is always 0 (zero)
• Residuals are positive when the observed y is above the regression line and negative when it is below
• The smaller (in absolute value) the individual residual, the closer the predicted y was to the actual y
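To connect the sign of each residual with the point's position, here is a minimal Python sketch that lists every residual for the study-time data. The dictionaries `residuals` and `positions` are hypothetical names used only for this illustration.

```python
hours = [1, 5, 12]
scores = [45, 80, 100]

# Least-squares coefficients: b1 = S_xy / S_xx, b0 = y_bar - b1 * x_bar.
n = len(hours)
x_bar, y_bar = sum(hours) / n, sum(scores) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
      / sum((x - x_bar) ** 2 for x in hours))
b0 = y_bar - b1 * x_bar

residuals = {}  # hypothetical name: x value -> residual
positions = {}  # hypothetical name: x value -> point's position relative to the line
for x, y in zip(hours, scores):
    e = y - (b0 + b1 * x)  # residual = observed - predicted
    residuals[x] = e
    positions[x] = "above" if e > 0 else ("below" if e < 0 else "on")
    print(f"({x}, {y}): residual {e:+.2f}, point is {positions[x]} the line")
```

The residuals come out to about -6.21, +9.76, and -3.55, and they sum to zero. The point (12, 100) has the smallest absolute residual, so the line predicts it most closely of the three.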

R-squared
• $R^2$ gives the proportion of the variation in the y’s accounted for by the linear relationship with the x’s
• So, what does this mean?
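One common way to compute this proportion is $R^2 = 1 - \text{SSE}/\text{SST}$. Here SSE is the sum of squared residuals and SST is $\sum (y_i - \bar{y})^2$, the total variation in y. The following minimal Python sketch applies it to the study-time data and compares the result with $r^2$, the squared correlation.

```python
import math

hours = [1, 5, 12]
scores = [45, 80, 100]

n = len(hours)
x_bar, y_bar = sum(hours) / n, sum(scores) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
s_xx = sum((x - x_bar) ** 2 for x in hours)
s_yy = sum((y - y_bar) ** 2 for y in scores)  # total variation in y (SST)

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, scores))  # unexplained variation
r_squared = 1 - sse / s_yy  # proportion of variation in y accounted for by the line
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")  # R^2 = 0.9056, r^2 = 0.9056
```

For these three points, about 90.6% of the variation in exam scores is accounted for by the linear relationship with hours studied. With a single x, as here, $R^2$ equals $r^2$.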

Why “regression”?
• Sir Francis Galton (1880s): the correlation between x = father’s height and y = son’s height is about 0.5
• Interpretation: if a father’s height is one standard deviation below average, the predicted height of his son is 0.5 standard deviations below average
• More interpretation: if a father’s height is two standard deviations above average, the predicted height of his son is 0.5 × 2 = 1 standard deviation above average
• Tall parents tend to have tall children, but not quite as tall
• This is called “regression toward the mean”, and it is the origin of the statistical term “regression”
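Galton's statement is the least-squares line written in standard-deviation units. The line passes through $(\bar{x}, \bar{y})$ with slope $r\,s_y/s_x$, so $\hat{y} - \bar{y} = r\,\frac{s_y}{s_x}(x - \bar{x})$. Dividing by $s_y$ gives $\hat{z}_y = r\,z_x$. The following minimal Python sketch applies this with the slide's $r = 0.5$. The function name `predicted_z` is a hypothetical helper.

```python
def predicted_z(r, z_x):
    """Hypothetical helper: predicted standardized y for standardized x, z_y_hat = r * z_x."""
    return r * z_x

r = 0.5  # father/son height correlation quoted on the slide
print(predicted_z(r, -1))  # -0.5: father 1 SD below average
print(predicted_z(r, 2))   # 1.0: father 2 SD above average
```

Whenever $|r| < 1$, the predicted $z$ is closer to 0 than the $z$ you started from. That pull toward the average is the regression toward the mean described above.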

Looking back
• Best-fit, or least-squares, or regression line
• Interpretation of the slope and intercept
• Residuals
• R-squared
• “Regression toward the mean”

STA291 Fall 2009
