
Topic 19: Remedies





  1. Topic 19: Remedies

  2. Outline
     • Review regression diagnostics
     • Remedial measures
       • Weighted regression
       • Ridge regression
       • Robust regression
       • Bootstrapping

  3. Regression Diagnostics Summary
     • Check normality of the residuals with a normal quantile plot
     • Plot the residuals versus predicted values, versus each of the X’s, and (when appropriate) versus time
     • Examine the partial regression plots
     • Use the graphics smoother to see whether there appears to be a curvilinear pattern

  4. Regression Diagnostics Summary
     • Examine
       • the studentized deleted residuals (RSTUDENT in the output)
       • the hat matrix diagonals
       • DFFITS, Cook’s D, and the DFBETAS
     • Check observations that are extreme on these measures relative to the other observations

  5. Regression Diagnostics Summary
     • Examine the tolerance for each X
     • If there are variables with low tolerance, you need to do some model building
       • Recode variables
       • Variable selection

  6. Remedial measures
     • Weighted least squares
     • Ridge regression
     • Robust regression
     • Nonparametric regression
     • Bootstrapping

  7. Maximum Likelihood

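  8. Weighted regression
     • Maximization of L with respect to the β’s is equivalent to minimization of the weighted sum of squared residuals, $\sum_i (Y_i - \beta_0 - \beta_1 X_{i1} - \cdots - \beta_{p-1} X_{i,p-1})^2 / \sigma_i^2$
     • Weight of each observation: $w_i = 1/\sigma_i^2$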
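As a sketch of why the two problems coincide, assume independent normal errors with known, unequal variances $\sigma_i^2$. This follows the usual textbook setup and is an assumption here, since the slide's own likelihood formula did not survive in the transcript. The likelihood and its logarithm would then be:

```latex
L(\beta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_i^2}}
  \exp\left[-\frac{(Y_i - \beta_0 - \beta_1 X_{i1} - \cdots - \beta_{p-1} X_{i,p-1})^2}{2\sigma_i^2}\right]

\log L(\beta) = -\frac{1}{2}\sum_{i=1}^{n} \log(2\pi\sigma_i^2)
  - \frac{1}{2}\sum_{i=1}^{n} w_i \,(Y_i - \beta_0 - \beta_1 X_{i1} - \cdots - \beta_{p-1} X_{i,p-1})^2,
  \qquad w_i = \frac{1}{\sigma_i^2}
```

The first sum does not involve the β's, so maximizing $\log L$ over β amounts to minimizing the second (weighted) sum.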

  9. Weighted least squares
     • The least squares problem is to minimize the sum of $w_i$ times the squared residual for case i
     • Computations are easy: use the weight statement in proc reg
     • $b_w = (X'WX)^{-1}(X'WY)$, where W is a diagonal matrix of the weights
     • The problem now becomes determining the weights
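A minimal sketch of the matrix formula, assuming SAS/IML is available. The data set `toy` and all of its values are made up purely for illustration, and the names `xcol`, `ycol`, `wcol`, and `Xmat` are introduced here. The proc reg fit with a weight statement and the IML computation should report the same two coefficients.

```sas
/* Hypothetical toy data: x, y, and known weights w (made-up values) */
data toy;
  input x y w;
  datalines;
1 2.1 4
2 3.9 4
3 6.2 1
4 7.8 1
5 10.5 0.25
;
run;

/* Weighted least squares via the weight statement */
proc reg data=toy;
  model y=x;
  weight w;
run;
quit;

/* Same estimate from the matrix formula bw = inv(X`WX) * (X`WY) */
proc iml;
  use toy;
  read all var {x} into xcol;
  read all var {y} into ycol;
  read all var {w} into wcol;
  close toy;
  Xmat = j(nrow(xcol), 1, 1) || xcol;   /* add the intercept column */
  W = diag(wcol);                        /* diagonal matrix of the weights */
  bw = inv(Xmat` * W * Xmat) * (Xmat` * W * ycol);
  print bw;
quit;
```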

  10. Determination of weights
      • Find a relationship between the absolute residual and another variable and use this as a model for the standard deviation
      • Similarly, find a relationship between the squared residual and another variable and use this as a model for the variance
      • Use grouped data or approximately grouped data to estimate the variance

  11. Determination of weights
      • With a model for the standard deviation or the variance, we can approximate the optimal weights
      • Optimal weights are proportional to the inverse of the variance
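In symbols, the two modeling routes could be sketched as follows for a single predictor. The notation ($e_i$ for the ordinary least squares residuals, $\hat{s}_i$, $\hat{v}_i$, and the $\gamma$ coefficients) is introduced here for illustration and does not appear on the slides, and a straight-line model is only one possible choice:

```latex
\hat{s}_i = \hat\gamma_0 + \hat\gamma_1 X_i
  \quad \text{(fit of } |e_i| \text{ on } X_i\text{)},
  \qquad w_i = \frac{1}{\hat{s}_i^{\,2}}

\hat{v}_i = \hat\gamma_0^{*} + \hat\gamma_1^{*} X_i
  \quad \text{(fit of } e_i^2 \text{ on } X_i\text{)},
  \qquad w_i = \frac{1}{\hat{v}_i}
```

Because only proportionality matters, multiplying every $w_i$ by the same positive constant leaves the weighted least squares coefficients unchanged.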

  12. KNNL Example
      • KNNL p 427
      • Y is diastolic blood pressure
      • X is age
      • n = 54 healthy adult women aged 20 to 60 years old

  13. Get the data and check it
      data a1;
        infile '../data/ch11ta01.txt';
        input age diast;
      proc print data=a1;
      run;

  14. Plot the relationship
      symbol1 v=circle i=sm70;
      proc gplot data=a1;
        plot diast*age / frame;
      run;

  15. Diastolic bp vs age
      Strong linear relationship but non-constant variance

  16. Run the regression
      proc reg data=a1;
        model diast=age;
        output out=a2 r=resid;
      run;

  17. Regression output

  18. Regression output
      • With non-constant variance, the estimators are still unbiased but no longer have minimum variance
      • Prediction interval coverage is often lower or higher than the nominal 95%

  19. Use the output data set to get the absolute and squared residuals
      data a2;
        set a2;
        absr=abs(resid);
        sqrr=resid*resid;
      run;

  20. Do the plots with a smooth
      proc gplot data=a2;
        plot (resid absr sqrr)*age;
      run;

  21. Absolute value of the residuals vs age

  22. Squared residuals vs age

  23. Model the std dev vs age (absolute value of the residual)
      proc reg data=a2;
        model absr=age;
        output out=a3 p=shat;
      run;
      Note that a3 has the predicted standard deviations (shat)

  24. Compute the weights
      data a3;
        set a3;
        wt=1/(shat*shat);
      run;

  25. Regression with weights
      proc reg data=a3;
        model diast=age / clb;
        weight wt;
      run;

  26. Output

  27. Output
      Reduction in the standard error of the age coefficient
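The example above models the standard deviation as a function of age. Slide 10 also mentions estimating the variance from grouped data, and a rough sketch of that route might look like the following. It assumes the data set a2 (with age, diast, and resid) from slides 16 and 19; the 10-year age bands and the names grp, gvar, wls, agegrp, and s2 are hypothetical choices made here, not part of the lecture.

```sas
/* Assumes a2 with age, diast, resid from the unweighted fit */
data grp;
  set a2;
  agegrp = 10*floor(age/10);   /* hypothetical 10-year age bands */
run;

/* Sample variance of the residuals within each age band */
proc means data=grp noprint nway;
  class agegrp;
  var resid;
  output out=gvar var=s2;
run;

proc sort data=grp;
  by agegrp;
run;

/* Attach each band's variance and form weights 1/variance */
data wls;
  merge grp gvar(keep=agegrp s2);
  by agegrp;
  /* bands with fewer than 2 cases have missing s2 and get no weight */
  if s2 > 0 then wt = 1/s2;
run;

proc reg data=wls;
  model diast=age / clb;
  weight wt;
run;
quit;
```

Cases with a missing weight are dropped by proc reg, so very sparse bands may need to be merged with a neighboring band before relying on this.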

  28. Ridge regression
      • Similar to a very old idea in numerical analysis
      • If $X'X$ is difficult to invert (near singular), then approximate by inverting $(X'X + kI)$
      • Estimators of the coefficients are biased but more stable
      • For some value of k, the ridge regression estimator has a smaller mean square error than the ordinary least squares estimator
      • Can be used to reduce the number of predictors
      • ridge=k is an option for the model statement
      • Cross-validation is used to determine k
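A minimal sketch of the ridge= option in proc reg. The data set sashelp.class (shipped with SAS) is used only as a stand-in and is not from the lecture; the grid of k values and the output data set name rr are likewise arbitrary choices made here.

```sas
/* sashelp.class is a stand-in data set, not the lecture data */
proc reg data=sashelp.class outest=rr outvif;
  /* fit the ridge estimator for k = 0, 0.01, ..., 0.10 */
  model weight = height age / ridge=0 to 0.1 by 0.01;
run;
quit;

/* rr holds, for each k (_RIDGE_), the ridge coefficients  */
/* (_TYPE_='RIDGE') and the matching VIFs ('RIDGEVIF')     */
proc print data=rr;
  where _TYPE_ in ('RIDGE', 'RIDGEVIF');
  var _TYPE_ _RIDGE_ Intercept height age;
run;
```

Scanning how the coefficients and variance inflation factors change as k grows is one common way to judge where the estimates stabilize.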

  29. Robust regression
      • Basic idea is to have a procedure that is not sensitive to outliers
      • Alternatives to least squares, minimize
        • sum of absolute values of residuals
        • median of the squares of residuals
      • Do weighted regression with weights based on residuals, and iterate
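These ideas could be tried on the blood pressure data roughly as follows. This is a sketch that assumes the data set a1 (age, diast) from slide 13 and that proc quantreg and proc robustreg are available in the installed SAS/STAT; the slides themselves do not name these procedures.

```sas
/* Assumes a1 with age and diast from the KNNL example */

/* Least absolute deviations: median regression minimizes */
/* the sum of absolute values of the residuals            */
proc quantreg data=a1;
  model diast=age / quantile=0.5;
run;

/* M estimation: iteratively reweighted least squares with */
/* weights based on the residuals                          */
proc robustreg data=a1 method=m;
  model diast=age;
run;

/* High-breakdown fit by least trimmed squares (LTS); this is */
/* related to, but not the same as, least median of squares   */
proc robustreg data=a1 method=lts;
  model diast=age;
run;
```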

  30. Nonparametric regression
      • Several versions
      • We have used i=sm70
      • Interesting theory
      • All versions have some smoothing or penalty parameter similar to the 70 in i=sm70
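As one illustration of a smoothing parameter at work, a loess fit could be run with several values. This is a sketch assuming the data set a1 from slide 13; proc loess is a different smoother from the i=sm70 smoothing spline used in the plots, and the three smooth= values are arbitrary choices made here.

```sas
/* Assumes a1 with age and diast from the KNNL example */
proc loess data=a1;
  /* smaller smooth= values follow the data more closely, */
  /* larger values give a smoother curve                  */
  model diast=age / smooth=0.3 0.5 0.7;
run;
```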

  31. Bootstrap
      • Very important theoretical development that has had a major impact on applied statistics
      • Based on simulation
      • Sample with replacement from the data or residuals and repeatedly refit the model to get the distribution of the quantity of interest
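A minimal sketch of resampling cases with replacement, assuming the data set a1 (age, diast) from slide 13. The number of replicates, the seed, and the data set names boot, bootest, and ci are arbitrary choices made here; the quantity of interest is taken to be the slope for age, and a simple percentile interval is only one of several ways to summarize the bootstrap distribution.

```sas
/* Draw 1000 bootstrap samples of the same size as a1, with replacement */
proc surveyselect data=a1 out=boot seed=12345
                  method=urs samprate=1 outhits reps=1000;
run;

/* Refit the model in each bootstrap sample and save the coefficients */
proc reg data=boot outest=bootest noprint;
  by replicate;
  model diast=age;
run;
quit;

/* In bootest the variable age holds the slope from each replicate; */
/* take the 2.5th and 97.5th percentiles as a percentile interval   */
proc univariate data=bootest noprint;
  var age;
  output out=ci pctlpts=2.5 97.5 pctlpre=age_p;
run;

proc print data=ci;
run;
```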

  32. Background Reading
      • We used the program topic19.sas
      • This completes Chapter 11
      • This completes the material for the midterm
