
Lecture 21: Review

Lecture 21: Review. Review a few points about regression that I went over quickly concerning coefficient of determination, regression diagnostics and transformation. Review ANOVA problem. Review regression problem. Administrative Info for Midterm II.

clare

Presentation Transcript


  1. Lecture 21: Review • Review a few points about regression that I went over quickly concerning coefficient of determination, regression diagnostics and transformation. • Review ANOVA problem. • Review regression problem.

  2. Administrative Info for Midterm II • Time and Location: Wednesday, April 2, 6-8 p.m., Steinberg Hall-Dietrich Hall 351. • Closed book; one 8.5 x 11 double-sided note sheet is allowed. • Bring a calculator. • All necessary tables will be provided, but nothing additional (e.g., Tukey’s bulging rule will not be provided). • Office hours: Today after class (12:10-2:30), Wednesday 9-11:30.

  3. Material Covered • Focus is on Chapter 15 and Chapter 18 (we covered everything except 15.6 and 18.8). • Sections 13.5-13.6 are not covered. • Be prepared: questions could draw on your knowledge of material from the first midterm in the context of Chapter 15 and Chapter 18.

  4. Coefficient of Determination (R²) • R² measures the strength of the linear relationship between Y and X. • Formulas for R²: • Square of the correlation between X and Y (thus if Cor(X,Y) = -0.5, then R² = 0.25). • R² = 1 - (SSE/SSTOT) = SSR/SSTOT. SSR is called the sum of squares due to the model in JMP output. SSE, SSR and SSTOT can be read from the Analysis of Variance section of the regression output in JMP.
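The two formulas for R² can be checked against each other numerically. The following is a minimal sketch in plain Python; the data values and the function name `r_squared_summary` are hypothetical and chosen only for illustration (the slides themselves use JMP output). The data slope downward on purpose, so the correlation is negative while R² is still positive.

```python
import math

def r_squared_summary(x, y):
    """Return r, r^2, 1 - SSE/SSTOT and SSR/SSTOT for a least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sstot = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares

    # Least-squares slope and intercept
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # sum of squares due to model
    r = sxy / math.sqrt(sxx * sstot)                        # sample correlation

    return {
        "r": r,
        "r_squared": r * r,
        "one_minus_sse_over_sstot": 1 - sse / sstot,
        "ssr_over_sstot": ssr / sstot,
    }

# Hypothetical data with a decreasing trend
x = [1, 2, 3, 4, 5, 6]
y = [9.8, 8.1, 6.3, 3.9, 2.2, 0.1]
summary = r_squared_summary(x, y)
print(summary)
```

The three R² values agree up to rounding error, and `summary["r"]` is negative, which is why R² alone does not tell you the direction of the relationship.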

  5. JMP output for Example 18.2

  6. Impact of Large Sample Sizes • R² will on average be the same, no matter what the sample size. • However, if there is a linear relationship between X and Y, the p-value for the test of whether the slope is zero will tend to become smaller as the sample size increases. Even if the linear relationship between Y and X is weak (but the slope is not zero), the test will have a small p-value for a large sample size.
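One way to see this point is through the identity that, in simple linear regression, the t statistic for testing slope = 0 equals r·sqrt(n - 2)/sqrt(1 - r²). The sketch below holds the sample correlation fixed at a weak value and varies n. The value r = 0.1, the sample sizes, and the function names are assumptions made for illustration, and the p-value uses a normal approximation to the t distribution, which is only reasonable for large n.

```python
import math
from statistics import NormalDist

def slope_t_statistic(r, n):
    """t statistic for H0: slope = 0, computed from the sample correlation r and n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def approx_two_sided_p(t):
    """Two-sided p-value using a normal approximation (assumes large n)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

r = 0.1  # weak linear relationship: R^2 = 0.01 at every sample size below
results = {n: slope_t_statistic(r, n) for n in (50, 500, 5000, 50000)}

for n, t in results.items():
    print(f"n={n:>6}  R^2={r * r:.2f}  t={t:.2f}  approx p={approx_two_sided_p(t):.4g}")
```

With R² held at 0.01, the t statistic grows roughly like the square root of n, so the same weak relationship moves from not significant to highly significant as the sample gets larger.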

  7. Prediction Intervals vs. Confidence Intervals • Prediction interval: Used when we want to predict one particular value of y given a specific value of x, e.g., a used car dealer wants to predict the price of a particular Ford Taurus given that it has 40,000 miles. • Confidence interval for the expected value of y: Used when we want to estimate the mean of y given x, e.g., a used car dealer wants to bid on a lot of 200 Ford Tauruses with 40,000 miles and wants to know the mean price of a Ford Taurus given that it has 40,000 miles.
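The difference between the two intervals shows up in their standard errors. The sketch below uses the standard textbook formulas, with made-up odometer and price figures (in thousands) that are not taken from the lecture; the function name `interval_standard_errors` is likewise hypothetical. The critical value 2.306 is the tabled t value for a 95% interval with n - 2 = 8 degrees of freedom.

```python
import math

def interval_standard_errors(x, y, x_g):
    """Return (y_hat, se_mean, se_pred, s_e) at the given value x_g."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))  # standard error of estimate

    core = 1 / n + (x_g - x_bar) ** 2 / sxx
    se_mean = s_e * math.sqrt(core)      # for the mean of y at x_g
    se_pred = s_e * math.sqrt(1 + core)  # for one new y at x_g
    return b0 + b1 * x_g, se_mean, se_pred, s_e

# Hypothetical data: odometer (thousands of miles), price (thousands of dollars)
odometer = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
price = [16.2, 15.9, 15.1, 14.8, 14.6, 14.0, 13.5, 13.4, 12.7, 12.4]

y_hat, se_mean, se_pred, s_e = interval_standard_errors(odometer, price, 40)
t_crit = 2.306  # t table, 95% two-sided, 8 degrees of freedom

print("confidence interval:", (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean))
print("prediction interval:", (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred))
```

The prediction interval is always the wider of the two because it carries the extra "1" under the square root, which accounts for the scatter of an individual car around the mean price.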

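  8. Prediction Intervals vs. Confidence Intervals Cont. • The prediction interval: [formula not preserved in this transcript] • The confidence interval: [formula not preserved in this transcript] • As the sample size becomes large, the width of the confidence interval tends to zero, but the width of the prediction interval tends to a nonzero limit [expression not preserved in this transcript].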
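The formulas on this slide were not captured in the transcript. For reference, the usual textbook forms are given below; they are an assumption about what the slide showed, not a copy of it. Here x_g is the given value of x, s_ε is the standard error of estimate, and σ_ε is the true standard deviation of the error term.

```latex
\begin{align*}
\text{Prediction interval for one } y:\quad
  & \hat{y} \pm t_{\alpha/2,\,n-2}\; s_\varepsilon
    \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}} \\[6pt]
\text{Confidence interval for } E(y \mid x_g):\quad
  & \hat{y} \pm t_{\alpha/2,\,n-2}\; s_\varepsilon
    \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}
\end{align*}
```

As n grows (and the sum of squared deviations of x grows with it), both fractions under the square roots shrink toward zero. The confidence interval's width therefore goes to zero, while the prediction interval's width approaches roughly 2 z_{α/2} σ_ε, because the leading 1 under the square root does not depend on n.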

  9. Regression Assumptions and Diagnostics

  10. Influential Points and Outliers • In addition to doing the previous diagnostics, you should check residual plots for influential points and outliers (in y, x and direction of scatterplot). • Influential point: Outlier in direction of x (has high leverage) and does not fall into exactly the same pattern of relationship between y and x as the other points. • Investigate whether outliers and influential points are properly recorded and are representative of the population we are interested in.
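To make "high leverage" concrete: in simple linear regression the leverage of point i is h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)². The sketch below uses invented data in which the last point is far out in x and does not follow the y = 2x pattern of the others, then compares the fitted slope with and without it. The data and helper names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def leverages(x):
    """Leverage h_i of each observation in simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Hypothetical data: first six points lie on y = 2x; the last point is an
# outlier in x and falls well below that line.
x = [1, 2, 3, 4, 5, 6, 20]
y = [2, 4, 6, 8, 10, 12, 15]

h = leverages(x)
_, slope_with = fit_line(x, y)
_, slope_without = fit_line(x[:-1], y[:-1])

print("leverages:", [round(hi, 3) for hi in h])
print("slope with the point:", round(slope_with, 3))
print("slope without the point:", round(slope_without, 3))
```

The last point has by far the largest leverage, and dropping it changes the slope substantially, which is the behavior that marks a point as influential rather than merely unusual.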

  11. Diagnosing Nonlinearity • Check residual plot vs. x to see if there is a pattern.
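As an illustration of the kind of pattern to look for, the sketch below fits a straight line to data that are exactly quadratic and prints the sign of each residual in order of x. The data are artificial and chosen so the arithmetic is exact; real residual plots will be noisier.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Artificial curved data: y = x^2 for x = 0, 1, ..., 10
x = list(range(11))
y = [xi ** 2 for xi in x]

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Sign of each residual, ordered by x
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(signs)  # prints ++-------++
```

Residuals that are positive at both ends and negative in the middle (or the reverse) form the systematic U shape that signals a straight line is the wrong functional form. Residuals from a well-specified model should show no such run of signs.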

  12. Transformations • If there is nonlinearity, one possible way to correct for it is to apply a transformation to y or x. • Tukey’s bulging rule (see handout): Match the curvature in the data to the shape of one of the curves drawn in the four quadrants, then apply one of the transformations listed for that quadrant.

  13. Tukey’s Bulging Rule • Curvature appears to match top left quadrant. Try transformation to log X.
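A quick numerical check of whether a log X transformation helps is to compare the fit before and after transforming. The sketch below is an assumption-laden illustration: the data are generated from y = 3 + 2 ln(x) plus small made-up perturbations, which gives the rising-then-flattening curvature associated with the top left quadrant. It is not the data set shown on the slide.

```python
import math

def r_squared(x, y):
    """R^2 of the least-squares line of y on x (squared sample correlation)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

# Hypothetical data: y rises quickly at first, then flattens out
x = [1, 2, 4, 8, 16, 32, 64]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.0]  # made-up perturbations
y = [3 + 2 * math.log(xi) + e for xi, e in zip(x, noise)]

r2_raw = r_squared(x, y)                            # regress y on x
r2_log = r_squared([math.log(xi) for xi in x], y)   # regress y on log x

print("R^2 using x:     ", round(r2_raw, 3))
print("R^2 using log x: ", round(r2_log, 3))
```

A higher R² after the transformation is encouraging, but the residual plot against log X should still be checked to confirm that the curved pattern is gone.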
