
Stat 324 – Day 15


lbush

Presentation Transcript


  1. Stat 324 – Day 15 Review

  2. Last Time – Assessing Independence • Detection • Residuals vs. order (for data in order) • Autocorrelation graph (different lags) • Durbin-Watson statistic (roughly 2(1 − corr(r_i, r_{i-1}))) • Values near 2 when there is no lag-one correlation in the residuals • Can approximate a two-sided p-value for H0: ρ = 0 • To fix • Time series methods (Stat 416) • Transformed variables (e.g., Cochrane-Orcutt) • Including new variables…
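As a rough illustration of the Durbin-Watson idea on this slide, here is a minimal sketch in plain Python. The residual lists are made-up numbers, not course data, and the function name `durbin_watson` is only an illustrative choice. It uses the usual definition: the sum of squared successive differences divided by the sum of squared residuals.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(r ** 2 for r in residuals)
    return num / den


# Made-up residuals, listed in time order (illustrative only).
trending = [-2, -1, 0, 1, 2]   # neighbours are similar: positive lag-one correlation
alternating = [1, -1, 1, -1]   # neighbours flip sign: negative lag-one correlation

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)
print(dw_trending, dw_alternating)
```

Values well below 2 point toward positive lag-one correlation and values well above 2 toward negative lag-one correlation. With so few residuals the statistic is only a crude guide.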

  3. Announcements • HW 3 graded • (problem 1 24 vs. 29 points) • Access to annotated handouts • Future discussion board posts • Project • Data file • Appendix of output in report? • HW 4 posted this weekend

  4. Big Picture • Have a response variable, with variability • Believe it is associated with another variable • But not a deterministic relationship • DATA = MODEL + ERROR • Is the variability explained by the model large compared to the random chance variability? • Special model = Means follow a line (constant rate of change)
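The DATA = MODEL + ERROR idea can be checked numerically. The sketch below assumes a simple least squares line and made-up data; the helper names (`fit_line`, `decompose`) are hypothetical. It splits the total sum of squares into a model part and an error part and forms the F ratio that compares them.

```python
def fit_line(x, y):
    """Least squares intercept and slope for one predictor."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def decompose(x, y):
    """Return (SS total, SS model, SS error, F ratio) for the fitted line."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    ybar = sum(y) / n
    fits = [b0 + b1 * xi for xi in x]
    ss_total = sum((yi - ybar) ** 2 for yi in y)
    ss_model = sum((fi - ybar) ** 2 for fi in fits)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fits))
    # Model has 1 df (the slope); error has n - 2 df.
    f_ratio = (ss_model / 1) / (ss_error / (n - 2))
    return ss_total, ss_model, ss_error, f_ratio


# Made-up data (illustrative only).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
ss_total, ss_model, ss_error, f_ratio = decompose(x, y)
print(ss_total, ss_model, ss_error, f_ratio)
```

A large F ratio says the variability explained by the line is large compared with the leftover chance variability.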

  5. Big Picture • Describing the relationship between x and y • Scatterplot: Direction, form, strength, outliers • Finding a model/Making predictions • Smoothers (nonparametric models) • Least Squares regression • Median-Median line • Transformations • Quadratic • Weighted least squares

  6. Least Squares Regression Model • Validating the model • LINE conditions and how to check (graph, p-value) • Interpreting the model • Slope interpretation • Log transformations • R², s – model performance • Making inferences about slope, intercept, predictions • p-values, confidence intervals for slope, intercept • prediction intervals for future (individual, mean) values • Back transformations • Consequences, Properties • Resistant procedures, No intercept models, (xbar, ybar)
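To make the difference between estimating a mean response and predicting an individual response concrete, here is a minimal sketch using the standard simple-regression formulas. The data and the value `x_star` are made up, and `line_summary` is a hypothetical helper name. Only the standard errors are computed; a full interval would multiply each by a t critical value with n − 2 degrees of freedom.

```python
import math


def line_summary(x, y):
    """Return intercept, slope, residual std. dev. s, R^2, xbar and Sxx."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    s = math.sqrt(sse / (n - 2))
    return intercept, slope, s, 1 - sse / sst, xbar, sxx


# Made-up data (illustrative only).
x = [1, 2, 3, 4, 5, 6]
y = [3.2, 4.8, 7.1, 9.3, 10.6, 13.1]
n = len(x)
intercept, slope, s, r_squared, xbar, sxx = line_summary(x, y)

x_star = 4.5  # hypothetical x value at which to estimate y
se_slope = s / math.sqrt(sxx)
se_mean = s * math.sqrt(1 / n + (x_star - xbar) ** 2 / sxx)      # mean response
se_pred = s * math.sqrt(1 + 1 / n + (x_star - xbar) ** 2 / sxx)  # individual value
print(se_slope, se_mean, se_pred)
```

The prediction standard error is always the larger of the two because it adds the variability of a single new observation (the extra 1 under the square root).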

  7. p-values

  8. The Translations • Describe the relationship • Scatterplot, corr coefficient: form, direction, strength, unusual observations, context • Fit a model, Prediction Eq • Run a regression (consider transformations) • Increase in y with each increase in x • Slope • Response when x = 0 • Intercept • Is the model valid? Is the model appropriate? • Residual analysis • Is the model accurate? • R², s • Is the relationship statistically significant? • p-value (watch for one-sided), hypotheses, df • Is the model useful? • R², significance • Estimate y from x* • Point estimate vs interval estimate • Individual vs. mean • Generalizability, cause and effect • Look at study design

  9. Unusual Observations • Outlier in x, outlier in y, outlier in regression • Standardized residuals • Studentized residual – p-value • Leverage/hat values • Influence: Cook’s Distance
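The diagnostics listed on this slide can be sketched for a one-predictor regression with the textbook formulas: leverage h_i = 1/n + (x_i − xbar)²/Sxx, standardized residual e_i / (s·sqrt(1 − h_i)), and Cook's distance r_i²·h_i / (p(1 − h_i)) with p = 2 coefficients. The data below are made up so that the last point is an outlier in x; `influence_measures` is a hypothetical helper name.

```python
import math


def influence_measures(x, y):
    """Leverages, standardized residuals and Cook's distances for a fitted line."""
    n = len(x)
    p = 2  # number of coefficients: intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - p))
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    standardized = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]
    cooks = [r ** 2 * h / (p * (1 - h)) for r, h in zip(standardized, leverage)]
    return leverage, standardized, cooks


# Made-up data (illustrative only); the last point sits far from the other x values.
x = [1, 2, 3, 4, 5, 15]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 18.0]
leverage, standardized, cooks = influence_measures(x, y)
for xi, h, r, d in zip(x, leverage, standardized, cooks):
    print(xi, round(h, 3), round(r, 3), round(d, 3))
```

Leverage depends only on the x values, while Cook's distance combines leverage with the size of the residual, so a point needs both to be highly influential.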

  10. Weighted Regression • The “model” is nice because all x values inform the estimate for y at x*. • But is that always a good thing? • Could downweight some observations because we don’t think they are as reliable • Smaller sample size • More variability in responses • Not as recent

  11. Weighted Regression • When we included all counties but weighted the regression by the number of people living in each county, the statistical significance of the opposite effects in Wisconsin and Texas both evaporated. • And I weighted the regression by FiveThirtyEight’s aggregate polling weight in each state, so that Ohio and Florida (for example) are much more influential than West Virginia or Hawaii. 
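A minimal sketch of weighted least squares for a single predictor, assuming each observation comes with a nonnegative weight (for example a sample size or a reliability score). The data, weights, and the name `weighted_line` are made up for illustration. The fit minimizes the weighted sum of squared residuals, which for a line reduces to using weighted means and weighted sums of squares.

```python
def weighted_line(x, y, w):
    """Intercept and slope minimizing sum of w_i * (y_i - b0 - b1 * x_i)^2."""
    total = sum(w)
    xbar_w = sum(wi * xi for wi, xi in zip(w, x)) / total
    ybar_w = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx_w = sum(wi * (xi - xbar_w) ** 2 for wi, xi in zip(w, x))
    sxy_w = sum(wi * (xi - xbar_w) * (yi - ybar_w)
                for wi, xi, yi in zip(w, x, y))
    slope = sxy_w / sxx_w
    return ybar_w - slope * xbar_w, slope


# Made-up data (illustrative only): the last observation is unusual
# and is assumed to be based on very little information.
x = [1, 2, 3, 4]
y = [2, 4, 6, 20]
equal_weights = [1, 1, 1, 1]
size_weights = [30, 25, 40, 2]

print(weighted_line(x, y, equal_weights))  # ordinary least squares fit
print(weighted_line(x, y, size_weights))   # unusual point counts for much less
```

With equal weights this reproduces the ordinary least squares line; downweighting the last point pulls the slope back toward the pattern in the other three.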

  12. Distance vs. time Decrease y power (.5, log, -1) or increase x power (2)

  13. Distance vs. time

  14. LINE • Linearity between E(Y) and x • Want no pattern in residuals vs fits • Lack of fit F test (if replication) • How to fix: • If monotonic, power transformation • Box-Cox to minimize SSE using power • If turns, quadratic • Other • Independence of residuals • Residuals vs. order, ACF • Durbin-Watson
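The Box-Cox bullet above can be sketched as a grid search: transform y with each candidate power, fit a line, and keep the power with the smallest SSE. This is a simplified illustration, not necessarily how any particular statistics package does it. It assumes all y values are positive and uses the geometric-mean-scaled form of the transformation so that SSE values are comparable across powers. The data and helper names are made up.

```python
import math


def sse_of_line(x, y):
    """Sum of squared residuals from the least squares line of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def box_cox(y, lam):
    """Scaled Box-Cox transform; lam = 0 corresponds to the natural log."""
    gm = math.exp(sum(math.log(v) for v in y) / len(y))  # geometric mean of y
    if lam == 0:
        return [gm * math.log(v) for v in y]
    return [(v ** lam - 1) / (lam * gm ** (lam - 1)) for v in y]


def best_power(x, y, powers):
    """Candidate power whose transformed response gives the smallest SSE."""
    return min(powers, key=lambda lam: sse_of_line(x, box_cox(y, lam)))


# Made-up data (illustrative only): y grows exponentially in x,
# so the log transformation (power 0) should straighten it.
x = [1, 2, 3, 4, 5, 6]
y = [math.exp(0.5 * xi) for xi in x]
powers = [-1, -0.5, 0, 0.5, 1, 2]
chosen = best_power(x, y, powers)
print(chosen)
```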

  15. LINE • Normality of residuals • Histogram, normal probability plot (linear) • Anderson-Darling and/or Shapiro-Wilk p-value • How to fix: Transformation of Y • Equal variability in residuals • Want no fanning in residuals vs. fits • How to fix: Transformation of Y • If variability increases with Y, lower power on Y • If variability increases with X, weighted regression

  16. Interpreting log transformations • Log(x): multiplying x by the base is associated with a change of slope in the (predicted) mean of y • Log(y): increasing x by 1 is associated with multiplying the (predicted) median of y by base^slope • Log(x) and Log(y): multiplying x by C is associated with multiplying the (predicted) median of y by C^slope • See extra details handout online
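The three interpretations on this slide can be verified numerically. The sketch below assumes base-10 logs and made-up coefficient values (`intercept`, `slope`) that do not come from any fitted model in the course. Each function is the prediction equation for one of the three cases.

```python
import math

# Hypothetical fitted coefficients (illustrative only).
intercept = 1.2
slope = 0.3


def predict_log_x(x):
    """Model y = a + b * log10(x): predicts the mean of y."""
    return intercept + slope * math.log10(x)


def predict_log_y(x):
    """Model log10(y) = a + b * x, back-transformed: predicts the median of y."""
    return 10 ** (intercept + slope * x)


def predict_log_both(x):
    """Model log10(y) = a + b * log10(x), back-transformed: median of y."""
    return 10 ** (intercept + slope * math.log10(x))


x0 = 4.0  # arbitrary starting x
c = 2.0   # arbitrary multiplier for x

# Log(x): multiplying x by the base (10) adds `slope` to the prediction.
change_log_x = predict_log_x(10 * x0) - predict_log_x(x0)

# Log(y): increasing x by 1 multiplies the prediction by 10 ** slope.
ratio_log_y = predict_log_y(x0 + 1) / predict_log_y(x0)

# Log(x) and Log(y): multiplying x by c multiplies the prediction by c ** slope.
ratio_log_both = predict_log_both(c * x0) / predict_log_both(x0)

print(change_log_x, ratio_log_y, ratio_log_both)
```

None of the three results depends on the starting value `x0`, which is what makes these interpretations usable as general statements about the model.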

  17. Study Advice • Review homework problems, solutions • Be ready to explain your reasoning • Be ready to apply your knowledge • Work problems • Ask review questions • Study as if a closed book exam • Know all the different confidence intervals • Know all the different standard deviations

  18. Test Taking Advice • Point allocation • Mix of easy and challenge questions • Partial Credit • Get something written down • Parts of a problem usually do not have to be answered in order • Or give suitable symbol and move on • Quickly read through all questions first? Any expected topics not there?
