Residuals

Presentation Transcript


  1. Residuals: A continuation of regression analysis

  2. Lesson Objectives • Continue to build on regression analysis. • Learn how residual plots help identify problems with the analysis.

  3. Example 1: Sample of n = 5 students, Y = Weight in pounds, X = Height in inches.

     Case   X    Y
     1      73   175
     2      68   158
     3      67   140
     4      72   207
     5      62   115

     Prediction equation (to be found later): $\widehat{Wt} = -332.73 + 7.189\,Ht$. r-square = ? Std. error = ? continued …
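
The prediction equation on this slide can be reproduced from the five cases. The sketch below assumes the usual least squares formulas (slope = Sxy / Sxx, intercept = mean of Y minus slope times mean of X); the function and variable names are illustrative, not from the presentation. The result should agree with the slide's coefficients to within rounding in the last digit.

```python
# Data from the slide: X = height (inches), Y = weight (pounds).
heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]


def least_squares(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares of the X deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = least_squares(heights, weights)
print(f"Wt-hat = {intercept:.2f} + {slope:.3f} Ht")
```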

  4. Example 1, continued. [Scatterplot of WEIGHT (100 to 220) versus HEIGHT (60 to 76) with the fitted line $\hat{Y} = -332.7 + 7.189X$.] Residuals = distance from point to line, measured parallel to the Y-axis.

  5. Calculation: For each case, residual = observed value − estimated mean. For the ith case, $e_i = y_i - \hat{y}_i$.

  6. Example 1, continued. Compute the fitted value and residual for the 4th person in the sample; i.e., X = 72 inches, Y = 207 lbs. Fitted value: $\hat{y}_4$ = −332.73 + 7.189(   ) = _________. Residual: $e_4 = y_4 - \hat{y}_4$ = _________ = __________.
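
The same calculation can be checked for every case at once. This is a minimal sketch that plugs each height into the slide's prediction equation; the helper name `fitted` is an assumption made for illustration.

```python
# Data from Example 1: X = height (inches), Y = weight (pounds).
heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]


def fitted(height):
    """Fitted value from the slide's prediction equation."""
    return -332.73 + 7.189 * height


# Residual = observed value minus fitted value, for each case.
residuals = [y - fitted(x) for x, y in zip(heights, weights)]

for case, (x, y, e) in enumerate(zip(heights, weights, residuals), start=1):
    print(f"case {case}: X={x}, Y={y}, fitted={fitted(x):.2f}, residual={e:+.2f}")
```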

  7. Residual Plots: Scatterplot of residuals vs. the predicted means of Y, $\hat{Y}$; or an X-variable.

  8. Example 1, continued. [Scatterplot of WEIGHT (100 to 220) versus HEIGHT (60 to 76) with the fitted line $\hat{Y} = -332.7 + 7.189X$.] Residuals = distance from point to line, measured parallel to the Y-axis. $e_4 = +22.12$.

  9. Example 1, continued. [Residual plot: Residuals (−24 to 24) versus HEIGHT (60 to 76).] $e_4$ is the residual for the 4th case, = +22.12. The regression line from the previous plot is rotated to horizontal.

  10. Residual Plot: Scatterplot of residuals versus the predicted means of Y, $\hat{Y}$; or an X-variable, or Time. Expect random dispersion around a horizontal line at zero. Problems occur if there are: • Unusual patterns • Unusual cases

  11. Residuals versus X. [Plot of residuals versus X, or time, with a reference line at 0.] Good random pattern.

  12. Residuals versus X. [Plot of residuals versus X, or time, with a reference line at 0.] Outliers? Next step: ________ to determine if a recording error has occurred.

  13. Residuals versus X. [Plot of residuals versus X, or time, with a reference line at 0.] Nonlinear relationship. Next step: Add a “quadratic term,” or use “______.”

  14. Residuals versus X. [Plot of residuals versus X, or time, with a reference line at 0.] Variance is increasing. Next step: Stabilize variance by using “________.”

  15. Residual plots help identify: Unusual patterns: • Possible curvature in the data. • Variances that are not constant as X changes. Unusual cases: • Outliers • High leverage cases • Influential cases

  16. Three properties of Residuals illustrated with some computations.

  17. Property 1. Y = Weight, X = Height, $\hat{Y} = -332.73 + 7.189X$. Find the sum of the residuals.

      X    Y     Ŷ        e = Y − Ŷ
      73   175   192.07   −17.07
      68   158   156.12     1.88
      67   140   . . .     . . .
      72   207   . . .     . . .
      62   115   . . .     . . .
                    Sum:     .01   (round-off error)

  18. Properties of Least Squares Line: 1. Residuals always sum to zero: $\sum e_i = 0$.
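
A quick numerical check of this property, and of where the slide's round-off error comes from. This is a sketch assuming the usual least squares formulas; the variable names are illustrative. With unrounded coefficients the residuals sum to zero up to floating-point error, while the coefficients rounded as on the slide leave a small nonzero sum.

```python
# Data from Example 1: X = height (inches), Y = weight (pounds).
heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n

# Unrounded least squares coefficients.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
sxx = sum((x - x_bar) ** 2 for x in heights)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residuals from the unrounded line and from the slide's rounded equation.
exact_residuals = [y - (intercept + slope * x) for x, y in zip(heights, weights)]
rounded_residuals = [y - (-332.73 + 7.189 * x) for x, y in zip(heights, weights)]

print("sum with unrounded coefficients:", sum(exact_residuals))
print("sum with rounded coefficients:  ", sum(rounded_residuals))
```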

  19. Property 2. Y = Weight, X = Height, $\hat{Y} = -332.73 + 7.189X$. Find the sum of squares of the residuals.

      X    Y     Ŷ        e = Y − Ŷ    e²
      73   175   192.07   −17.07       291.38
      68   158   156.12     1.88         3.53
      67   140   148.93    −8.93        79.74
      72   207   184.88    22.12       489.29
      62   115   112.99     2.01         4.04
                    Sum:     .01       867.98

  20. Properties of Least Squares Line: 1. Residuals always sum to zero. 2. This “least squares” line produces a smaller “Sum of squared residuals” than any other straight line can: $\sum e_i^2 = SSE = 867.98 <$ “SSE for any other line”.
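
Property 2 can be illustrated by comparing the SSE of the least squares line with the SSE of a few other lines. The competing lines below are hypothetical, chosen only for illustration; the function name `sse` is also an assumption. The first value should come out close to the slide's 867.98, which was computed from residuals already rounded to two decimals.

```python
# Data from Example 1: X = height (inches), Y = weight (pounds).
heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]


def sse(intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2
               for x, y in zip(heights, weights))


# Least squares line from the slides.
sse_ls = sse(-332.73, 7.189)

# Hypothetical competing lines (not from the presentation).
other_lines = {
    "intercept raised by 5": (-327.73, 7.189),
    "slope 8 through the means": (159 - 8 * 68.4, 8.0),
    "flat line at mean weight": (159.0, 0.0),
}
sse_others = {name: sse(a, b) for name, (a, b) in other_lines.items()}

print(f"least squares line: SSE = {sse_ls:.2f}")
for name, value in sse_others.items():
    print(f"{name}: SSE = {value:.2f}")
```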

  21. Property 3. [Scatterplot of WEIGHT (100 to 220) versus HEIGHT (60 to 76) with the fitted line, marking the means $\bar{X} = 68.4$ and $\bar{Y} = 159$.]

  22. Properties of Least Squares Line: 1. Residuals always sum to zero. 2. This “least squares” line produces a smaller “Sum of squared residuals” than any other straight line can. 3. Line always passes through the point $(\bar{x}, \bar{y})$.
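
Properties 1 and 3 both follow from the formula for the least squares intercept. The short derivation below is a sketch that assumes the notation $a$ for the intercept and $b$ for the slope of the fitted line $\hat{y} = a + bx$:

```latex
\begin{align*}
a &= \bar{y} - b\,\bar{x}
  && \text{(least squares intercept)} \\
\hat{y}(\bar{x}) &= a + b\,\bar{x} = \bar{y}
  && \text{(Property 3: the line passes through } (\bar{x}, \bar{y})\text{)} \\
\sum_{i=1}^{n} e_i &= \sum_{i=1}^{n} (y_i - a - b\,x_i)
  = n\,(\bar{y} - a - b\,\bar{x}) = 0
  && \text{(Property 1)}
\end{align*}
```

For Example 1, evaluating the rounded equation at the mean height gives −332.73 + 7.189(68.4) = 158.9976, which matches the mean weight of 159 up to rounding.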

  23. Illustration of unusual cases: • Outliers • Leverage • Influential

  24. [Scatterplot of Y versus X with one point labeled “outlier.”] “Unusual point” does not follow the pattern. It’s near the X-mean; the entire line is pulled toward it.

  25. [Scatterplot of Y versus X with one point labeled “outlier.”] “Unusual point” does not follow the pattern. The line is pulled down and twisted slightly.

  26. [Scatterplot of Y versus X with one point labeled “High leverage.”] “Unusual point” is far from the X-mean, but still follows the pattern.

  27. [Scatterplot of Y versus X with one point labeled “leverage & outlier, influential.”] “Unusual point” is far from the X-mean, but does not follow the pattern. Line really twists!

  28. Definitions: Outlier: An unusual y-value relative to the pattern of the other cases. Usually has a large residual. High Leverage Case: An extreme X value relative to the other X values.

  29. Definitions, continued: Influential Case: has an unusually large effect on the slope of the least squares line.

  30. Definitions, continued. Conclusion: A high leverage case is potentially influential. A case that is both high leverage and an outlier is influential!!

  31. Why do we care about identifying unusual cases? The least squares regression line is not resistant to unusual cases.
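
To see how little resistance the line has, the sketch below adds one hypothetical case to the Example 1 data: a point far from the X-mean that does not follow the pattern. The added point (85, 150) is invented for illustration and is not in the presentation. The leverage formula used, h_i = 1/n + (x_i − x̄)² / Sxx, is the standard one for simple regression and is likewise not stated on the slides.

```python
# Data from Example 1: X = height (inches), Y = weight (pounds).
base_x = [73, 68, 67, 72, 62]
base_y = [175, 158, 140, 207, 115]


def slope_of(x, y):
    """Least squares slope for the line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def leverages(x):
    """Leverage of each case: 1/n + (x_i - x_bar)^2 / Sxx."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Hypothetical extra case: extreme X value, off the pattern of the others.
x_new, y_new = 85, 150

slope_before = slope_of(base_x, base_y)
slope_after = slope_of(base_x + [x_new], base_y + [y_new])
leverage = leverages(base_x + [x_new])

print(f"slope without the extra case: {slope_before:.3f}")
print(f"slope with the extra case:    {slope_after:.3f}")
print("leverages:", [round(h, 3) for h in leverage])
```

The last entry in the leverage list belongs to the added case; a single such point is enough to change the slope substantially.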

  32. Regression Analysis in Minitab

  33. Lesson Objectives • Learn two ways to use Minitab to run a regression analysis. • Learn how to read output from Minitab.

  34. Example 3, continued … Can height be predicted using shoe size? Step 1? DTDP

  35. Example 3, continued … Can height be predicted using shoe size? Scatterplot: Graph > Plot … [Scatterplot with Female and Male cases marked.] “Jitter” added in X-direction. The scatter for each subpopulation is about the same; i.e., there is “constant variance.”

  36. Example 3, continued … Method 1: Stat > Regression > Regression … Model: Y = a + bX

  37. Example 3, continued … Can height be predicted using shoe size? Copied from “Session Window”:

      Regression Analysis: Height versus Shoe Size

      The regression equation is
      Height = 50.5 + 1.87 Shoe Size

      Predictor      Coef   SE Coef      T      P
      Constant    50.5230    0.5912  85.45  0.000
      Shoe Siz    1.87241   0.06033  31.04  0.000

      S = 1.947   R-Sq = 79.1%   R-Sq(adj) = 79.0%

      Analysis of Variance
      Source       DF      SS      MS       F      P
      Regression    1  3650.0  3650.0  963.26  0.000
      Error       255   966.3     3.8
      Total       256  4616.3

  38. Example 3, continued … Can height be predicted using shoe size? (Same Minitab output as slide 37.) The “Coef” column holds the least squares estimated coefficients. Total “Degrees of Freedom” = Number of cases − 1.

  39. Example 3, continued … Can height be predicted using shoe size? (Same Minitab output as slide 37.) $R\text{-Sq} = \dfrac{SSR}{TSS} = \dfrac{3650.0}{4616.3}$
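
The ratio can be evaluated directly from the Analysis of Variance sums of squares. A minimal sketch, with variable names chosen for illustration; it also shows the equivalent form 1 − SSE/TSS.

```python
# Sums of squares copied from the Minitab Analysis of Variance table.
ssr = 3650.0   # Regression SS
sse = 966.3    # Error SS
tss = 4616.3   # Total SS

r_sq = ssr / tss
r_sq_alt = 1 - sse / tss  # same quantity, since TSS = SSR + SSE

print(f"R-Sq = {100 * r_sq:.1f}%")
```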

  40. Example 3, continued … Can height be predicted using shoe size? (Same Minitab output as slide 37.) $S = \sqrt{MSE} = \sqrt{3.8}$ is the Standard Error of Regression, a measure of variation around the regression line. In the Analysis of Variance table, the Error SS is the sum of squared residuals, and the Error MS is the Mean Squared Error (MSE).
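
The table prints MSE rounded to 3.8, so recomputing it from the Error row gives a value closer to the reported S. A minimal sketch, with illustrative variable names:

```python
import math

# Error row of the Minitab Analysis of Variance table.
sse = 966.3      # sum of squared residuals
df_error = 255   # error degrees of freedom

mse = sse / df_error   # Mean Squared Error
s = math.sqrt(mse)     # Standard Error of Regression

print(f"MSE = {mse:.4f}, S = {s:.3f}")
```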

  41. Example 3, continued … Can height be predicted using shoe size? Are there any problems visible in this plot? ___________ No “Jitter” added.

  42. Example 3, continued … Can height be predicted using shoe size? Least squares regression equation: Height = 50.52 + 1.872 Shoe. Std. error = 1.947 inches and r-square = 79.1%: the two summary measures that should always be given with the equation.

  43. Example 3, continued … Can height be predicted using shoe size? Method 2: Stat > Regression > Fitted Line Plot … This program gives a scatterplot with the regression superimposed on it. Model: Y = a + bX

  44. Example 3, continued … Can height be predicted using shoe size? The fit looks

  45. Example 3, continued … Can height be predicted using shoe size? (Same Minitab output as slide 37.) What information do these values provide?

  46. (1) How do you determine if the X-variable is a useful predictor? Use the “t-statistic” or the F-stat. “t” measures how many standard errors the estimated coefficient is from “zero.” $F = t^2$ for simple regression.
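
Both statements can be checked against the Shoe Size row of the Minitab output. A minimal sketch with illustrative variable names; small differences from the printed values are expected because the output is rounded.

```python
# Shoe Size row of the Minitab output.
coef = 1.87241       # estimated coefficient
se_coef = 0.06033    # standard error of the coefficient
f_reported = 963.26  # F from the Analysis of Variance table

# t = how many standard errors the estimate is from zero.
t_stat = coef / se_coef

print(f"t = {t_stat:.2f}")
print(f"t squared = {t_stat ** 2:.2f}, reported F = {f_reported}")
```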

  47. (2) How do you determine if the X-variable is a useful predictor? A “P-value” is associated with “t” and “F”. The further “t” and “F” are from zero, in either direction, the smaller the corresponding P-value will be. P-value: the probability of getting a “t” or “F” at least this far from zero if the true coefficient were zero.

  48. (3) If the P-value IS SMALL (typically “< 0.10”), then conclude: 1. It is unlikely that the true coefficient is really zero, and therefore, 2. The X variable IS a useful predictor for the Y variable. Keep the variable! If the P-value is NOT SMALL (i.e., “> 0.10”), then conclude: 1. For all practical purposes the true coefficient MAY BE ZERO; therefore 2. The X variable IS NOT a useful predictor of the Y variable. Don’t use it.
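
The decision rule can be sketched in a few lines. Minitab computes the P-value from the t distribution; the code below instead uses a standard normal approximation, which is an assumption made here to stay within the Python standard library and is reasonable only when the error degrees of freedom are large (255 in Example 3). The function names and the `cutoff` parameter are illustrative.

```python
import math


def two_sided_p_normal(t_stat):
    """Approximate two-sided P-value, treating t as standard normal."""
    return math.erfc(abs(t_stat) / math.sqrt(2))


def is_useful_predictor(p_value, cutoff=0.10):
    """Decision rule from the slide: keep the variable if the P-value is small."""
    return p_value < cutoff


# t-statistic for Shoe Size from the Minitab output.
p_shoe = two_sided_p_normal(31.04)
print("approximate P-value:", p_shoe)
print("useful predictor?", is_useful_predictor(p_shoe))
```

A P-value this small prints as 0.000 in Minitab, because it rounds to zero at three decimal places.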

  49. Example 3, continued … Can height be predicted using shoe size? Could “shoe size” have a true coefficient that is actually “zero”? (Same Minitab output as slide 37.) “t” measures how many standard errors the estimated coefficient is from “zero.” P-value: the probability of getting a “t” at least this far from zero if the true coefficient were “zero.” The P-value for Shoe Size IS SMALL (< 0.10). Conclusion: The “shoe size” coefficient is NOT zero! “Shoe size” IS a useful predictor of the mean of “height”.

  50. The logic just explained is statistical inference. This will be covered in more detail during the last three weeks of the course.
