
Regression Models

Regression Models: Residuals and Diagnosing the Quality of a Model. Visualizing Regression Models. Criteria of quality: residuals (or what we don’t explain) should be “noise”; independent variables measure different phenomena; we haven’t left out something important.

kjudge

Presentation Transcript


  1. Regression Models: Residuals and Diagnosing the Quality of a Model

  2. Visualizing Regression Models

  3. Criteria of quality • Residuals (or what we don’t explain) should be “noise” • Independent variables measure different phenomena • We haven’t left out something important.

  4. Diagnosing the Quality of a Regression Model Using the Residuals • Regression models assume that the errors of prediction are: • homoscedastic, • not autocorrelated, • normally distributed, and • not correlated with the independent variables.
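
A rough idea of how these assumptions can be checked is sketched below in Python (standard library only). The helper names (`fit_line`, `durbin_watson`, `variance_ratio`) and the data are invented for illustration and are not from the presentation. The variance ratio is only a crude stand-in for a formal test of heteroscedasticity.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def durbin_watson(resid):
    """Durbin-Watson statistic. Values near 2 suggest no first-order
    autocorrelation; values toward 0 or 4 suggest positive or negative
    autocorrelation."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    return num / sum(e * e for e in resid)

def variance_ratio(x, resid):
    """Mean squared residual in the upper half of x divided by that in the
    lower half. A ratio far from 1 hints at heteroscedasticity."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = len(order) // 2
    low = [resid[i] ** 2 for i in order[:half]]
    high = [resid[i] ** 2 for i in order[-half:]]
    return (sum(high) / half) / (sum(low) / half)

# Made-up data, for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.5, 11.4, 14.9, 15.2]

a, b = fit_line(x, y)
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print("Durbin-Watson:", durbin_watson(resid))
print("Variance ratio (upper/lower half of x):", variance_ratio(x, resid))
```

Normality can be examined in the same spirit through the skewness and kurtosis of the residuals.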

  5. Regression Models assume… • The independent variables measure different phenomena, that is, the independent variables are not themselves correlated. • If they are, we have a problem of “collinearity” or “multicollinearity.”
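
One common numeric check for collinearity is tolerance: 1 minus the R² obtained by regressing one independent variable on the others (the regression output later in this deck reports it in a "Tolerance" column). With exactly two independent variables this reduces to 1 − r². The sketch below assumes that two-variable case. The function names and the data are hypothetical.

```python
import math

def pearson_r(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def tolerance_two_predictors(x1, x2):
    """Tolerance when the model has exactly two independent variables:
    1 - r^2. Values near 0 signal collinearity; values near 1 signal
    that the two variables measure different things."""
    r = pearson_r(x1, x2)
    return 1.0 - r * r

# Hypothetical predictors: the second is an exact multiple of the first
size = [1.0, 2.0, 3.0, 4.0]
rooms = [2.0, 4.0, 6.0, 8.0]
print(tolerance_two_predictors(size, rooms))   # perfectly collinear pair
```

The variance inflation factor (VIF) is the reciprocal of tolerance, so a low tolerance and a high VIF describe the same problem.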

  6. Collinearity

  7. An Omitted Variable?

  8. Models • A Model: A statement of the relationship between a phenomenon to be explained and the factors, or variables, which explain it. • Steps in the Process of Quantitative Analysis: • Specification of the model • Estimation of the model • Evaluation of the model

  9. Thus far… • We’ve discussed… • The specification of a model, • The estimation of a model and how to read and interpret the statistics we’ve produced: coefficients, t tests, F tests, R Square • Now we need to evaluate the model for problems and further elaboration.

  10. We need to evaluate • The variation in the predicted values and the difference between the Yi and the predicted Y. That difference is called a “residual.” • We can analyze the residuals to see how good the equation is, and whether there are problems with the model that need correction or improvement.
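
In symbols, the residual for case i is the observed value minus the predicted value. The hat notation is an assumption here, not taken from the slides. The numbers come from the deck's hypothetical example, where a predicted value of 48.8 and a residual of 6.2 imply an observed value of 55.

```latex
e_i = Y_i - \hat{Y}_i, \qquad \text{e.g.}\quad 55 - 48.8 = 6.2
```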

  11. More statistics… • Standard Error of the Estimate: The square root of the average squared error of prediction is used as a measure of the accuracy of prediction. • For the population: • For the sample:
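
The formulas themselves are not in the transcript. The standard textbook forms are given below as an assumption about what the slide showed. Here k is the number of independent variables, so the sample denominator is n − 2 in the bivariate case.

```latex
\sigma_{est} = \sqrt{\frac{\sum_{i=1}^{N} \left(Y_i - \hat{Y}_i\right)^2}{N}}
\qquad\qquad
s_{est} = \sqrt{\frac{\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2}{n - k - 1}}
```

The sample form agrees with the bivariate output later in the deck: a residual sum of squares of 193878.246 on 465 degrees of freedom gives a mean square of 416.942, whose square root is the reported standard error of estimate, 20.419.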

  12. Standard Error of the Estimate • Used to calculate a confidence interval around the predicted Y. • As a rule of thumb, multiply the SEE by 2, then add it to and subtract it from the predicted Y to get an approximate 95% confidence interval for the prediction. • At the mean of the independent variable, the standard error of the prediction = SEE/(square root of n).
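
A minimal sketch of this rule of thumb, using figures reported later in the deck: the bivariate housing model (SEE = 20.419, N = 467) and the sample mean of NEWVAL (30.970), which is also the predicted value at the mean of the independent variable. The function name is made up for illustration, and the multiplier 2 is the slide's approximation rather than an exact t value.

```python
import math

SEE = 20.419      # standard error of the estimate, bivariate model
N = 467           # number of cases
Y_HAT = 30.970    # predicted NEWVAL at the mean of NEWSIZE (= mean of NEWVAL)

def rough_interval(center, standard_error, multiplier=2.0):
    """Approximate 95% interval: center +/- multiplier * standard_error."""
    half_width = multiplier * standard_error
    return center - half_width, center + half_width

# Individual prediction: the slide's rule of thumb applies the SEE directly
individual = rough_interval(Y_HAT, SEE)

# Mean prediction at the mean of X: standard error is SEE / sqrt(n)
se_mean = SEE / math.sqrt(N)
mean_prediction = rough_interval(Y_HAT, se_mean)

print("individual:", individual)
print("SE at mean of X:", se_mean)
print("mean prediction:", mean_prediction)
```

The interval for an individual building is far wider than the interval for the mean prediction, which is the contrast drawn on the confidence-interval slide.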

  13. Hypothetical Example [Figure: scatterplot of Y against X with a fitted regression line. One point (an observed Y of 55) is annotated: predicted value is 48.8, residual is 6.2.]

  14. Example from last week…. Newval = a + b1(Newsize) + b2(Families) + b3(Eastside) + b4(South)

      Dep Var: NEWVAL   N: 467   Multiple R: 0.75   Squared multiple R: 0.56
      Adjusted squared multiple R: 0.55   Standard error of estimate: 19.61

      Effect     Coefficient  Std Error  Std Coef  Tolerance       t  P(2 Tail)
      CONSTANT         -3.32       2.95      0.00          .   -1.13       0.26
      NEWSIZE          23.60       1.32      0.67       0.68   17.88       0.00
      FAMILIES         -5.27       2.15     -0.08       0.87   -2.46       0.01
      EASTSIDE         14.06       2.53      0.20       0.78    5.56       0.00
      SOUTH             6.08       2.75      0.08       0.81    2.21       0.03
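
To make the equation concrete, the sketch below plugs the reported coefficients into it. The input values are hypothetical, the transcript does not state the units of NEWSIZE or NEWVAL, and treating EASTSIDE and SOUTH as 0/1 indicators is an assumption.

```python
# Coefficients as reported in the output above
COEF = {
    "CONSTANT": -3.32,
    "NEWSIZE": 23.60,
    "FAMILIES": -5.27,
    "EASTSIDE": 14.06,
    "SOUTH": 6.08,
}

def predict_newval(newsize, families, eastside, south):
    """Predicted NEWVAL = a + b1*Newsize + b2*Families + b3*Eastside + b4*South."""
    return (COEF["CONSTANT"]
            + COEF["NEWSIZE"] * newsize
            + COEF["FAMILIES"] * families
            + COEF["EASTSIDE"] * eastside
            + COEF["SOUTH"] * south)

# Hypothetical case: NEWSIZE = 2, FAMILIES = 1, on the east side, not in the south
print(round(predict_newval(2.0, 1, 1, 0), 2))
```

Holding the other variables fixed, each coefficient is the change in predicted NEWVAL for a one-unit change in that variable.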

  15. To understand the principles, let’s simplify…. • We return to the bivariate case: • House value is a function of the size of the building. • Regression models assume that the errors of prediction are homoscedastic, not autocorrelated, normally distributed, and not correlated with the independent variables. • That is, the error term should be noise. • Now we ask: • 1. how accurate is our prediction, and • 2. what are the characteristics of the residuals, or the error term?

  16. Model of Housing Values and Building Size

      Dep Var: NEWVAL   N: 467   Multiple R: 0.719   Squared multiple R: 0.517
      Adjusted squared multiple R: 0.516   Standard error of estimate: 20.419

      Effect     Coefficient  Std Error  Std Coef  Tolerance       t  P(2 Tail)
      CONSTANT        -8.667      2.012     0.000          .  -4.307      0.000
      NEWSIZE         25.381      1.138     0.719      1.000  22.312      0.000

      Analysis of Variance
      Source      Sum-of-Squares   df  Mean-Square  F-ratio      P
      Regression      207571.306    1   207571.306  497.842  0.000
      Residual        193878.246  465      416.942
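
The summary statistics in this output can be recovered from the analysis-of-variance table alone. The sketch below is a consistency check rather than part of the original presentation. The variable names are illustrative, and the formulas are the standard ones for ordinary least squares.

```python
import math

# Figures from the Analysis of Variance table above
ss_regression = 207571.306
ss_residual = 193878.246
df_regression, df_residual = 1, 465
n = 467

ss_total = ss_regression + ss_residual

# Share of the variation in NEWVAL explained by NEWSIZE
r_squared = ss_regression / ss_total

# Adjustment for the degrees of freedom used by the model
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual

# Mean squared residual and its square root (standard error of the estimate)
ms_residual = ss_residual / df_residual
see = math.sqrt(ms_residual)

# F-ratio; with one independent variable, its square root is |t| for the slope
f_ratio = (ss_regression / df_regression) / ms_residual
t_from_f = math.sqrt(f_ratio)

print(round(r_squared, 3), round(adj_r_squared, 3))
print(round(see, 3), round(f_ratio, 3), round(t_from_f, 3))
```

Dividing the reported coefficient by its standard error (25.381 / 1.138) gives about 22.30 rather than 22.312. The small gap comes from the rounding of the printed figures.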

  17. Scatterplot of Newsize and Newval

  18. Scatterplot, cont.

  19. 95% Confidence Intervals for Mean Predictions of Y (left) and Individual Predictions of Y (right)

  20. Hypothetical Example [Figure: scatterplot of Y against X with a fitted regression line. One point (an observed Y of 55) is annotated: predicted value is 48.8, residual is 6.2.]

  21. Analysis of Residuals

                      ESTIMATE      NEWVAL     RESIDUAL
      N of cases           467         467          467
      Minimum           -2.647       6.400      -56.140
      Maximum          157.129     399.600      242.471
      Range            159.777     393.200      298.611
      Sum            14463.200   14463.200        0.000
      Median            25.391      24.000       -0.092
      Mean              30.970      30.970        0.000
      95% CI Upper      32.963      33.639        1.775
      95% CI Lower      28.977      28.301       -1.775
      Std. Error         1.014       1.358        0.903
      Standard Dev      21.917      29.351       19.522
      Variance         480.353     861.480      381.127
      C.V.               0.708       0.948  9.54775E+14
      Skewness(G1)       1.337       6.756        7.030
      SE Skewness        0.113       0.113        0.113
      Kurtosis(G2)       2.875      67.925       79.001
      SE Kurtosis        0.225       0.225        0.225
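
One way to read the skewness and kurtosis rows is to divide each statistic by its standard error. The sketch below does that for the RESIDUAL column. It relies on the usual large-sample formulas for the two standard errors (an assumption about how the software computed them, though they reproduce the reported 0.113 and 0.225) and on the common rule of thumb that a ratio beyond roughly ±2 signals a departure from normality. Function names are illustrative.

```python
import math

def se_skewness(n):
    """Standard error of sample skewness (G1) for n cases."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of sample excess kurtosis (G2) for n cases."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

n = 467
residual_skewness = 7.030    # RESIDUAL column, Skewness(G1)
residual_kurtosis = 79.001   # RESIDUAL column, Kurtosis(G2)

z_skew = residual_skewness / se_skewness(n)
z_kurt = residual_kurtosis / se_kurtosis(n)

print("SE skewness:", round(se_skewness(n), 3))
print("SE kurtosis:", round(se_kurtosis(n), 3))
print("skewness / SE:", round(z_skew, 1))
print("kurtosis / SE:", round(z_kurt, 1))
```

Both ratios are far beyond 2, so these residuals are strongly right-skewed and heavy-tailed rather than normally distributed. That fits the maximum residual of 242.471 against a standard deviation of 19.522.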

  22. Visualizing Regression Models

  23. Collinearity

  24. An Omitted Variable?
