
# Lecture 9: Diagnostics & Review


##### Presentation Transcript

1. Lecture 9: Diagnostics & Review. February 10, 2014

2. Question: A least squares regression line is determined from a sample of values for variables x and y, where x = size of a listed home (in sq feet) and y = selling price of the home (in \$). Which of the following is true about the model $b_0 + b_1 x$? • If there is positive correlation r between x and y, then $b_1$ must be positive. • The units of the intercept and slope will be the same as the response variable, y. • If $r^2 = 0.85$, then it is appropriate to conclude that a change in x will cause a change in y. • None of the above, more than one of the above, or not enough information to tell.

3. Question (answer): A least squares regression line is determined from a sample of values for variables x and y, where x = size of a listed home (in sq feet) and y = selling price of the home (in \$). Which of the following is true about the model $b_0 + b_1 x$? • If there is positive correlation r between x and y, then $b_1$ must be positive. True: $b_1 = r \cdot s_y / s_x$, so if $r > 0$ then $b_1$ is positive, because $s_y$ and $s_x$ are both positive.
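
The identity on this slide can be checked numerically. The sketch below uses a small set of made-up home sizes and prices (an assumption for illustration, not data from the lecture) and compares the least squares slope with r times the ratio of the sample standard deviations.

```python
import math
import statistics

# Hypothetical data for illustration only (not from the lecture):
# home size in square feet and selling price in dollars.
sqft = [1100, 1400, 1600, 1900, 2300, 2800]
price = [199000, 245000, 262000, 310000, 356000, 440000]

def sums_of_squares(x, y):
    """Return (Sxx, Syy, Sxy), the centered sums of squares and cross products."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxx, syy, sxy

sxx, syy, sxy = sums_of_squares(sqft, price)

b1 = sxy / sxx                    # least squares slope
r = sxy / math.sqrt(sxx * syy)    # sample correlation
s_x = statistics.stdev(sqft)      # sample standard deviations
s_y = statistics.stdev(price)

print(f"b1            = {b1:.4f}")
print(f"r * s_y / s_x = {r * s_y / s_x:.4f}")
```

The two printed values agree up to floating-point rounding, and since standard deviations are never negative, the sign of the slope always matches the sign of r.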

4. Administrative • Problem set 4 due (9am) • How was it? • Next week: Multiple Regression • Exam Wednesday • Sample question • Taken from Exam 1 - #37 last year

5. Last time • What did we talk about? • Outliers • Sensitivity analysis • Heteroscedasticity

6. Common problems and fixes: Say we’re estimating the price of a lease from the size of the house: Price = β0 + β1 * SqFt + ε. Interpretation of the estimates? • β0 would be fixed costs and • β1 would be marginal costs

7. Common Problems: Heteroscedasticity. Heteroscedasticity (the errors do not all have the same variance): what does that mean for your analysis? • Point estimates for the β’s? • Still OK. No bias. • Prediction and confidence intervals? • Not reliable; too narrow or too wide. • Hypothesis tests regarding β0 and β1 are not reliable.

8. Common Problems: Heteroscedasticity. Fixing the problem: • Revise the model: how will depend on the substance. • Try revising the model to estimate Price/SqFt by dividing the original equation by SqFt: Price/SqFt = β1 + β0 * (1/SqFt) + ε/SqFt • Notice the change in the intercept and slope: the intercept is now β1 (marginal cost) and the slope on 1/SqFt is β0 (fixed cost). • Don’t be locked into thinking the intercept is fixed cost. • How to interpret them depends on the model. • Think about the data!

9. Common Problems: Heteroscedasticity. Fixing the problem: Price/SqFt = M + F * (1/SqFt) + ε, where M is the marginal cost and F is the fixed cost. • Revise by thinking about the substance • Here it was to predict price per sq ft directly. • Don’t revise by doing weird things • Use theory! • After revising, check whether the residuals have similar variances. • Sometimes they won’t. • In this case they do.
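
To see why dividing by SqFt can help, here is a minimal simulation sketch. The fixed cost, marginal cost, and error structure are assumptions chosen for illustration (errors whose standard deviation is proportional to size), not values from the lecture. It fits both the original and the revised model and compares how much the residuals spread out for large homes versus small ones.

```python
import random
import statistics

random.seed(0)

# Assumed "true" values for the simulation (hypothetical, not from the lecture).
FIXED = 50_000.0     # fixed cost, in $
MARGINAL = 100.0     # marginal cost, in $ per sq ft

sqft = [random.uniform(800, 4000) for _ in range(400)]
# The error's standard deviation is proportional to size, so big homes vary more.
price = [FIXED + MARGINAL * s + s * random.gauss(0, 20) for s in sqft]

def fit(x, y):
    """Least squares intercept and slope for y = b0 + b1 * x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

def spread_ratio(sizes, x, y):
    """SD of residuals for the larger half of homes divided by SD for the smaller half."""
    b0, b1 = fit(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ordered = [e for _, e in sorted(zip(sizes, resid))]
    half = len(ordered) // 2
    return statistics.stdev(ordered[half:]) / statistics.stdev(ordered[:half])

# Original model: Price = b0 + b1 * SqFt
ratio_original = spread_ratio(sqft, sqft, price)

# Revised model: Price/SqFt = M + F * (1/SqFt)
inv_sqft = [1 / s for s in sqft]
price_per_sqft = [p / s for p, s in zip(price, sqft)]
ratio_revised = spread_ratio(sqft, inv_sqft, price_per_sqft)
m_hat, f_hat = fit(inv_sqft, price_per_sqft)   # intercept ~ marginal, slope ~ fixed

print(f"residual spread ratio, original model: {ratio_original:.2f}")
print(f"residual spread ratio, revised model:  {ratio_revised:.2f}")
print(f"estimated marginal cost: {m_hat:.1f}, estimated fixed cost: {f_hat:.0f}")
```

A ratio near 1 means the residuals have similar variance across home sizes. The fix works here only because the simulated errors were built to scale with size; with real data the residuals of the revised model still have to be checked.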

10. Common Problems: Heteroscedasticity. Comparing the revised and original model: • The revised model may have a different (and smaller) R². • Again, so what? R² is great, but it’s only one notion of fit. • In the example, the revised model provides a narrower (hence better) confidence interval for fixed and variable costs. [Figure: regression output, original model vs. revised model]

11. Common Problems: Heteroscedasticity. Comparing the revised and original model: • It also provides a more sensible prediction interval • The data originally indicated that large homes varied in price more.

12. Common Problems: Heteroscedasticity. How do you know how to remodel the problem? • Practice • Creativity; try different things. • There is no magic bullet; sometimes you can’t.

13. Common Problems: Correlated Errors. Problem: Dependence between residuals (autocorrelation) • The amount of error (detected by the size of the residual) you make at observation x + 1 is related to the amount of error you make at observation x. • Why is this a problem? • SRM assumes that the errors, ε, are independent. • Common problem for time series data, but not just a time series problem. • Recall the U-shaped pattern in one of the residual plots before

14. Common Problems: Correlated Errors. Detecting the problem: • Easier with time series data: • Plot the residuals versus time and look for a pattern (is the residual at t+1 related to the residual at t?). Not guaranteed to find it but often helpful. • Use the Durbin-Watson statistic to test for correlation between adjacent residuals (a.k.a. serial correlation or autocorrelation) • With time series data, adjacency is temporal. • In non-time-series data, we’re still talking about errors next to one another being related. • For things like spatial autocorrelation, there are more advanced tools, such as mapping the residuals and dedicated tests.

15. Durbin-Watson Statistic • Tests whether the correlation between adjacent residuals is 0 • Null hypothesis: H0: ρε = 0 • It’s calculated as: $D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$, where $e_t$ is the residual at observation t • From the Durbin-Watson statistic, D, and the sample size you can calculate the p-value for the hypothesis test • You’ll see this more in multiple regression and forecasting
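
The Durbin-Watson statistic is conventionally defined as the sum of squared differences between adjacent residuals divided by the sum of squared residuals. The sketch below implements that definition directly; the two residual sequences are toy values made up for illustration. It does not compute the p-value, which needs tables or software.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

smooth = [1.0, 2.0, 3.0, 4.0]          # adjacent residuals are alike
alternating = [1.0, -1.0, 1.0, -1.0]   # adjacent residuals flip sign

print(durbin_watson(smooth))        # 3 / 30, well below 2
print(durbin_watson(alternating))   # 12 / 4, well above 2
```

D always lies between 0 and 4 and is roughly 2(1 - r), where r is the correlation between adjacent residuals. Values near 2 are consistent with no autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation.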

16. Common Problems: Correlated Errors. Consequences of dependence: • With autocorrelation in the errors, the estimated standard errors are too small • The estimated slope and intercept are less precise than the output indicates

17. Common Problems: Correlated Errors. How do you fix it? • Try to model it directly or transform the data. • Example: number of mobile phone users: • Growth isn’t linear; try different transformations. [Figure: original data vs. transformed data]

18. Common Problems: Correlated Errors. Does this fix the problem? • Linear pattern looks better • You still need to check the other SRM conditions!! • Omitted variables? • Analysis of residuals. Might still be a problem. [Figure: original data vs. transformed data]
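
As a rough illustration of how a transformation can straighten a growth curve, the sketch below uses made-up, noise-free counts that grow 30% per period and compares the fit of a line before and after taking logs. The log transformation is an assumption here; the slides do not say which transformation was used for the mobile phone data.

```python
import math
import statistics

# Hypothetical subscriber counts growing 30% per period (noise-free, for illustration).
periods = list(range(20))
users = [1.3 ** t for t in periods]

def r_squared(x, y):
    """Squared correlation between x and y, i.e. R-squared of the simple regression."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

r2_raw = r_squared(periods, users)
r2_log = r_squared(periods, [math.log(u) for u in users])

print(f"R-squared, users vs. time:     {r2_raw:.3f}")
print(f"R-squared, ln(users) vs. time: {r2_log:.3f}")
```

A higher R² after transforming only says the pattern is more linear. The residuals of the transformed model still need the same checks for independence, equal variance, and normality.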

19. Exam Review • Download diamonds.xlsx • Regress price on weight • Are the residuals normally distributed? • Yes • No • Maybe? • I have no idea how to verify that

20. Exam Review • Using your regression model from the last slide, predict the price of a diamond that weighs 0.44 carats • What is the approximate 95% confidence interval? • [\$877.75, \$1558.61] • [\$2324.80, \$3014.69] • [\$-97.97, \$184.95] • [\$2330.41, \$3009.09] • I have no idea

21. Exam Review • Using your regression model from the last slide, predict the price of a diamond that weighs 0.28 carats • What is the prediction interval? • [\$877.75, \$1558.61] • [\$452.57, \$1129.46] • [\$764.38, \$1058.25] • [\$345.61, \$678.34] • I have no idea
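
The two questions above turn on the difference between a confidence interval for the mean price at a given weight and a prediction interval for a single diamond. The sketch below shows the usual simple-regression formulas on made-up data that merely stands in for diamonds.xlsx, so its numbers will not match the answer choices. The multiplier 2 is an approximation to the t quantile for a 95% interval.

```python
import math
import random
import statistics

random.seed(1)

# Made-up stand-in for diamonds.xlsx: weight in carats, price in $.
weight = [random.uniform(0.2, 0.5) for _ in range(50)]
price = [50 + 2700 * w + random.gauss(0, 170) for w in weight]

n = len(weight)
x_bar, y_bar = statistics.mean(weight), statistics.mean(price)
sxx = sum((x - x_bar) ** 2 for x in weight)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, price)) / sxx
b0 = y_bar - b1 * x_bar

# s_e: standard deviation of the residuals, with n - 2 degrees of freedom.
s_e = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(weight, price)) / (n - 2))

def intervals(x_new, multiplier=2.0):
    """Approximate 95% intervals at x_new: (fitted, CI for the mean, PI for one new y)."""
    fitted = b0 + b1 * x_new
    leverage = 1 / n + (x_new - x_bar) ** 2 / sxx
    se_mean = s_e * math.sqrt(leverage)       # uncertainty in the line itself
    se_pred = s_e * math.sqrt(1 + leverage)   # adds the scatter of a single observation
    ci = (fitted - multiplier * se_mean, fitted + multiplier * se_mean)
    pi = (fitted - multiplier * se_pred, fitted + multiplier * se_pred)
    return fitted, ci, pi

fitted, ci, pi = intervals(0.44)
print(f"fitted price at 0.44 carats: {fitted:.2f}")
print(f"approx. 95% CI for the mean: [{ci[0]:.2f}, {ci[1]:.2f}]")
print(f"approx. 95% prediction int.: [{pi[0]:.2f}, {pi[1]:.2f}]")
```

The prediction interval is always the wider of the two, and its half-width is never less than the multiplier times s_e, which is why the quick rule "fitted value plus or minus 2 s_e" is a common approximation.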

22. Exam Review • Question about transformations: • Again, no magic bullet. Try different ones. • How do you decide whether to transform X or Y? • Often depends on the substance.

23. Exam Review • Transformations • A common mistake is to forget to convert back to the appropriate units. • Say your data, and your interest, are in km/l and you transform the response to liters/100 km. Don’t forget to transform back to the correct units. Similarly for ln(x) [in Excel, the inverse of ln, e^x, is =EXP()]
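
A minimal sketch of converting predictions back to the original units. The function names and the predicted values are hypothetical, chosen only to illustrate the two back-transformations mentioned on the slide.

```python
import math

def km_per_l_to_l_per_100km(km_per_l):
    """Fuel economy (km/l) to fuel consumption (liters per 100 km)."""
    return 100.0 / km_per_l

def l_per_100km_to_km_per_l(l_per_100km):
    """Fuel consumption (liters per 100 km) back to fuel economy (km/l)."""
    return 100.0 / l_per_100km

# Suppose a model fitted on the l/100 km scale predicts 8.0 (a made-up value).
predicted_l_per_100km = 8.0
print(l_per_100km_to_km_per_l(predicted_l_per_100km))   # 12.5 km/l

# Suppose a model fitted to ln(y) predicts 5.0 on the log scale (also made up).
predicted_log_y = 5.0
print(math.exp(predicted_log_y))   # back on the original scale; same role as Excel's =EXP()
```

The same conversion applies to the endpoints of an interval. With the reciprocal transformation the endpoints swap order, because a larger l/100 km value corresponds to a smaller km/l value.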

24. Exam Review • Conditions for the SRM • Know them. • Don’t be hesitant to try to fit a model if they are violated; just be cautious. • Some of you might think a regression model is inappropriate if you don’t see a pattern in the data. • It is totally fine to try to fit a model. • The slope will probably be close to 0.

25. Exam Review Check list: • Is the association between y and x linear? • Maybe one could exist but you don’t obviously see it (much more common in multiple regression) • Have omitted/lurking variables been ruled out? • In the exam, I’ll try to give you the necessary info. • Are the errors evidently independent? • How do you verify this? • Are the variances of the residuals similar? • How do you verify this? • Are the residuals nearly normal? • How do you verify this?
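
For the last item on the check list, one common way to judge whether residuals are nearly normal is a normal quantile plot: sort the residuals and compare them with the quantiles a normal distribution would give. The sketch below summarizes that comparison with a correlation instead of drawing the plot. The function name and both samples are made up for illustration, and the correlation is an informal summary, not a formal test.

```python
import math
import random
import statistics
from statistics import NormalDist

def normal_quantile_correlation(residuals):
    """Correlation between sorted residuals and matching normal quantiles.
    Values very close to 1 are consistent with nearly normal residuals."""
    n = len(residuals)
    ordered = sorted(residuals)
    reference = NormalDist(statistics.mean(residuals), statistics.stdev(residuals))
    expected = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
    e_bar, o_bar = statistics.mean(expected), statistics.mean(ordered)
    sxy = sum((e - e_bar) * (o - o_bar) for e, o in zip(expected, ordered))
    sxx = sum((e - e_bar) ** 2 for e in expected)
    syy = sum((o - o_bar) ** 2 for o in ordered)
    return sxy / math.sqrt(sxx * syy)

random.seed(2)
bell_shaped = [random.gauss(0, 1) for _ in range(200)]          # simulated normal residuals
skewed = [random.expovariate(1.0) - 1.0 for _ in range(200)]    # simulated right-skewed residuals

r_bell = normal_quantile_correlation(bell_shaped)
r_skewed = normal_quantile_correlation(skewed)
print(f"bell-shaped sample: {r_bell:.3f}")
print(f"skewed sample:      {r_skewed:.3f}")
```

For the other two residual checks, the usual tools are a plot of residuals against time (or the Durbin-Watson statistic) for independence and a plot of residuals against x for similar variances.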

26. Exam Review • What do you need to know? • Everything from chapters 19 through 22… • No CAPM; we’ll come back to it. • What do you need to know from last semester? • Statistics builds on itself. I’ll assume you’re comfortable with some basic concepts (confidence intervals, hypothesis tests, z-scores, means, etc., etc.) • Will there be decision problems like those on Quiz 1? Maybe, but probably not. I want this to be more applied data analysis.

27. Exam Review • Types of questions? • Possibly homework-like. • Some business-related decision making • Some non-business-related analysis • Best way to study? • Do the problems. Then do more.