Diagnostics: Checking Assumptions and Bad Data
Questions • What is the linearity assumption? How can you tell if it seems met? • What is homoscedasticity (heteroscedasticity)? How can you tell if it’s a problem? • What is an outlier? • What is leverage? • What is a residual? • How can you use residuals to check that the regression model is a good representation of the data? • Why consider a standardized residual? • What is a studentized residual?
Linear Model Assumptions • Linear relation between X and Y • Normally distributed errors of prediction • Homoscedasticity (equal variance of the errors in Y across levels of X)
Good-Looking Graph • No apparent departures from the line.
Same Data, Different Graph • Residuals plotted against X: no systematic relation between X and the residuals.
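The residual plot can be backed up by a quick numeric check. This is a minimal sketch, assuming simple regression with one predictor. It correlates the residuals with the squared deviations of X from its mean, so a curved (U-shaped or inverted-U) residual pattern shows up as a correlation far from 0. The helpers `pearson` and `curvature_check` and the example data are hypothetical, and this is an informal heuristic rather than a standard test. Correlating the residuals with X itself would not work, because least squares forces that correlation to be exactly 0.

```python
from math import sqrt
from statistics import mean

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

def curvature_check(x, y):
    """Correlation between the OLS residuals and (X - mean of X) squared.

    Near 0: no curved pattern in the residuals. Near +1 or -1: a U-shaped
    or inverted-U pattern, suggesting the relation is not linear.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return pearson([(xi - mx) ** 2 for xi in x], resid)

x = [1, 2, 3, 4, 5, 6, 7, 8]
linear = [2, 1, 4, 3, 6, 5, 8, 7]   # hypothetical: straight line plus alternating noise
curved = [xi ** 2 for xi in x]      # hypothetical: clearly curved relation
print(curvature_check(x, linear), curvature_check(x, curved))
```

A value near 0 does not prove linearity; it only says this particular curved pattern is absent, so the graph is still worth looking at.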
Problem with Heteroscedasticity • Common problem when Y is measured in dollars (e.g., income)
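One informal way to check for the fan shape typical of heteroscedasticity is sketched below, assuming a single predictor. It fits the line, sorts the cases by X, and compares the average squared residual in the upper half of X with that in the lower half. This is loosely in the spirit of the Goldfeld-Quandt idea but is not that formal test. `spread_ratio` is a hypothetical helper, and the data are made up for illustration.

```python
from statistics import mean

def spread_ratio(x, y):
    """Mean squared residual for the upper half of X divided by that for the lower half.

    Values near 1 suggest even spread; values well above 1 suggest the
    errors fan out as X grows (a rough heuristic, not a significance test).
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    # Residuals ordered by X, so the halves correspond to low and high X
    resid = [yi - (a + b * xi) for xi, yi in sorted(zip(x, y))]
    half = len(resid) // 2
    low, high = resid[:half], resid[-half:]
    return mean([e ** 2 for e in high]) / mean([e ** 2 for e in low])

x = [1, 2, 3, 4, 5, 6, 7, 8]
even = [2, 1, 4, 3, 6, 5, 8, 7]                   # hypothetical: same error size at every X
fan = [1.5, 1.0, 4.5, 2.0, 7.5, 3.0, 10.5, 4.0]   # hypothetical: errors grow with X
print(spread_ratio(x, even), spread_ratio(x, fan))
```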
Outliers • Outlier = pathological point: an observation that lies far from the pattern of the rest of the data
Review • What is the linearity assumption? How can you tell if it seems met? • What is homoscedasticity (heteroscedasticity)? How can you tell if it’s a problem? • What is an outlier?
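Residuals • Standardized residual (ZRESID): the residual divided by the standard error of estimate • Look for large values (some say |z| > 2) • Studentized residual (Student Residual): also takes into account the distance of the point's X from the mean of X. The farther X is from the mean, the smaller the standard error of that residual, so the larger the studentized residual for a given raw residual. Look for large values. • Also, the studentized deleted residual (RStudent), which estimates the error variance with the case itself deleted.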
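The residual types on this slide can be computed by hand for a simple regression. This is a minimal sketch, assuming one predictor, so the model has two parameters (a and b), and using the leverage formula for that case. It also uses the identity SSE with case i deleted = SSE − e_i²/(1 − h_i), which avoids refitting. `residual_diagnostics` and the data are hypothetical, and software packages may label these quantities differently.

```python
from math import sqrt
from statistics import mean

def residual_diagnostics(x, y):
    """Return (residual, standardized, studentized, studentized deleted) per case
    for a simple regression of y on a single predictor x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in resid)
    s = sqrt(sse / (n - 2))  # standard error of estimate (2 parameters: a and b)
    rows = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - mx) ** 2 / sxx   # leverage of case i
        z = e / s                          # standardized residual (ZRESID)
        r = e / (s * sqrt(1 - h))          # studentized residual
        # Error SD with case i deleted, via SSE_(i) = SSE - e_i^2 / (1 - h_i)
        s_del = sqrt((sse - e ** 2 / (1 - h)) / (n - 3))
        t = e / (s_del * sqrt(1 - h))      # studentized deleted residual (RStudent)
        rows.append((e, z, r, t))
    return rows

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 30.0]  # hypothetical data
for xi, (e, z, r, t) in zip(x, residual_diagnostics(x, y)):
    flag = "  <- |z| > 2" if abs(z) > 2 else ""
    print(f"x={xi:>2} e={e:6.2f} z={z:5.2f} r={r:5.2f} t={t:6.2f}{flag}")
```

Because 1 − h is below 1, the studentized residual is always at least as large in absolute value as the standardized one, and the gap grows with leverage.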
Influence Analysis • Leverage: an index of the importance of an observation to a regression analysis • Function of X only • Cases with X far from the mean have high leverage (potentially influential) • Maximum is 1; minimum is 1/N • Average value is (k+1)/N, where k is the number of IVs
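For a single IV, leverage has a closed form, h_i = 1/N + (X_i − M)² / Σ(X − M)², which makes the slide's points easy to verify. Leverage depends on X only, its minimum of 1/N occurs at the mean of X, and the values sum to k + 1 = 2, so their average is (k+1)/N. This is a minimal sketch assuming one predictor. With several IVs, leverage comes from the diagonal of the hat matrix instead. `leverage` and the data are hypothetical.

```python
from statistics import mean

def leverage(x):
    """Leverage (hat value) of each case in a simple regression with one IV.

    h_i = 1/N + (X_i - M)^2 / sum((X - M)^2); it depends on X only.
    """
    n = len(x)
    mx = mean(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    return [1 / n + (xi - mx) ** 2 / sxx for xi in x]

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 25]   # hypothetical data; the last X is far from the mean
h = leverage(x)
k = 1                                 # number of IVs
print("average leverage:", sum(h) / len(h), "expected (k+1)/N:", (k + 1) / len(x))
print("largest leverage:", max(h), "for X =", x[h.index(max(h))])
```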
Influence Analysis (2) • DFBETA and standardized DFBETA • Change in the slope or intercept that results when you delete the ith case • Reflect the influence of both X and Y
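DFBETA can be computed directly from its definition by refitting the regression once per deleted case. This is a minimal sketch for one predictor. The sign convention (full-sample coefficient minus deleted-sample coefficient) and the standardization both follow one common definition, and packages may differ. The standardization divides by the slope's standard error, computed with the error SD from the deleted fit. `fit_line`, `dfbeta`, and the data are hypothetical.

```python
from math import sqrt
from statistics import mean

def fit_line(x, y):
    """OLS intercept, slope, and residual sum of squares for a simple regression."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, sse

def dfbeta(x, y):
    """For each case i: (DFBETA intercept, DFBETA slope, standardized DFBETA slope).

    DFBETA = full-sample coefficient minus the coefficient with case i deleted.
    The standardized slope version divides by s_(i) / sqrt(Sxx), where s_(i)
    is the error SD from the fit without case i and Sxx uses the full sample.
    """
    n = len(x)
    a, b, _ = fit_line(x, y)
    mx = mean(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    out = []
    for i in range(n):
        a_i, b_i, sse_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        s_i = sqrt(sse_i / (n - 3))   # n - 1 remaining cases, 2 parameters
        out.append((a - a_i, b - b_i, (b - b_i) / (s_i / sqrt(sxx))))
    return out

x = [1, 2, 3, 4, 5, 6]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]   # hypothetical data; the last case pulls the slope up
for xi, (da, db, db_std) in zip(x, dfbeta(x, y)):
    print(f"x={xi}  dIntercept={da:6.3f}  dSlope={db:6.3f}  std dSlope={db_std:6.2f}")
```

Refitting N times is the most transparent approach. For large data sets, closed-form update formulas give the same numbers without refitting.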
Example • [Table: X and Y scores, with means M] • r = .82; r² = .67; p < .05 • SX = 1.95, SY = 2.41 • b = 1.01, a = -1.34
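The reported slope and r² agree with the standard relations b = r(SY/SX) and r² = r · r, and this can be checked in a few lines. The intercept a = MY - b·MX cannot be checked here because the means are not available.

```python
r, sx, sy = 0.82, 1.95, 2.41   # values reported in the example

b = r * sy / sx       # slope in raw-score units: b = r * (SY / SX)
r_squared = r ** 2    # proportion of variance in Y accounted for by X

print(round(b, 2), round(r_squared, 2))   # prints: 1.01 0.67
```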
Remedies • Fit curves if the relation is nonlinear. • Note heteroscedasticity in applied problems. • Investigate all outliers. You may or may not delete them, depending on what you find. Report your actions.
Review • What is leverage? • What is a residual? • How can you use residuals to check that the regression model is a good representation of the data? • Why consider a standardized residual? • What is a studentized residual?