
Clinical Research Training Program 2021






  1. Clinical Research Training Program 2021 REGRESSION DIAGNOSTICS II Fall 2004 www.edc.gsph.pitt.edu/faculty/dodge/clres2021.html

  2. OUTLINE • Purpose of Regression Diagnostics • Residuals • Ordinary residuals, standardized residuals, studentized residuals, Jackknife residuals • Leverage points • Diagonal elements of the hat matrix • Influential observations • Cook’s distance • Collinearity • Alternate Strategies of Analysis

  3. COOK’S DISTANCE • Cook’s distance is a measure of the influence of an observation. • It measures the total change in all regression coefficients when the particular observation is deleted from the data. • Cook’s distance for observation i is Di = [ri² / (k + 1)] × hii / (1 − hii), where ri is the studentized residual, hii is the i-th diagonal element of the hat matrix, and k is the number of predictors. • An observation with a Cook’s distance value > 4/n is an influential observation.

  4. SBP=220, Age=47

. regress SBP Age
. predict cooksd, cooksd
. list SBP Age cooksd if cooksd > (4/32)
. graph cooksd Age

       SBP   Age    cooksd
 70.   220    47  .1838258
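The same diagnostic can be sketched in plain Python for the simple-regression case, using the closed-form leverage and residual formulas from the earlier slides. The data here are hypothetical (not the SBP/Age data), chosen so that one observation is clearly influential.

```python
# Sketch of Cook's distance for simple linear regression (one predictor),
# mirroring Stata's `predict cooksd, cooksd`. Hypothetical data.

def cooks_distance(x, y):
    n = len(x)
    p = 2                                   # parameters: intercept + slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx                           # OLS slope
    a = ybar - b * xbar                     # OLS intercept
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)    # mean squared error
    # leverage (hat-matrix diagonal) for simple regression
    h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    # equivalent raw-residual form: D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2
    return [e * e / (p * s2) * hi / (1 - hi) ** 2
            for e, hi in zip(resid, h)]

x = [1, 2, 3, 4, 5, 10]
y = [1.1, 2.0, 2.9, 4.1, 5.0, 30.0]         # last point lies far off the line
d = cooks_distance(x, y)
cutoff = 4 / len(x)
flagged = [i for i, di in enumerate(d) if di > cutoff]
print(flagged)   # only the extreme observation exceeds the 4/n rule of thumb
```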

  5. COLLINEARITY • Exact collinearity (overparametrization) means that one of the predictors is an exact linear combination of some of the other predictors.

  6. COLLINEARITY • Collinearity means that, within the set of covariates, some covariates are completely predicted by the other covariates. • Collinearity occurs when independent variables are so highly correlated that it becomes difficult or impossible to distinguish their individual influences on the response variable. • Near collinearity arises when the multiple R² of one predictor regressed on the remaining predictors is nearly 1.
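The "R² nearly 1" check can be sketched directly. In this hypothetical two-predictor example the multiple R² reduces to a simple regression of one predictor on the other:

```python
# Sketch of detecting near collinearity: regress one predictor on another
# and check whether R-squared is close to 1. Hypothetical data.

def r_squared(x, y):
    """R-squared from a simple linear regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# x2 is almost exactly 2 * x1 -- a near-linear dependence
x2 = [2 * v + e for v, e in zip(x1, [0.05, -0.04, 0.03, -0.02, 0.01,
                                     0.02, -0.03, 0.04, -0.05, 0.01])]
r2 = r_squared(x1, x2)
print(r2)   # nearly 1 -> x2 is almost redundant given x1
```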

  7. COLLINEARITY

. regress X3 X1 X2

      Source |       SS       df       MS           Number of obs =      46
-------------+------------------------------        F(  2,    43) =  110.11
       Model |    16554.94     2    8277.47         Prob > F      =  0.0000
    Residual |     3232.56    43      75.18         R-squared     =  0.9366
-------------+------------------------------        Adj R-squared =  0.9290
       Total |    19787.50    45     439.72         Root MSE      =  8.6704

------------------------------------------------------------------------------
          X3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |     1.0849    .247256     4.39   0.000       .5863       1.5835
          X2 |     3.5074    .256353    13.68   0.000      2.9904       4.0244
       _cons |    -204.29    27.7126    -7.37   0.000   -260.1786   -148.4031
------------------------------------------------------------------------------

  8. COLLINEARITY Collinearity among predictors will have the following effects: • Regression coefficients will change dramatically according to whether other variables are included in or excluded from the model. • The standard errors of the regression coefficients will be large. • In the worst cases, regression coefficients for collinear variables will be large in magnitude, with signs that seem to be assigned at random. • Predictors with known, strong relationships to the response will not have their regression coefficients achieve statistical significance. • The usual interpretation of the relationship between Y and the Xs (the effect of a change in one variable on Y while the other predictor variables are held constant) may not be possible in reality.
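The first and third effects can be demonstrated with a contrived, hypothetical example: x2 is a near-duplicate of x1, and the response is built so that it actually depends on their tiny difference. Fitting x1 alone gives a tame slope near 2, but the joint fit swings to large coefficients of opposite sign:

```python
# Sketch of how collinearity destabilizes coefficients (hypothetical data).
# OLS is computed from the normal equations with a small Gaussian-elimination
# solver, so the block is self-contained.

def solve(aug):
    """Gaussian elimination with partial pivoting on an augmented matrix."""
    n = len(aug)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c]
                                for c in range(r + 1, n))) / aug[r][r]
    return x

def ols(cols, y):
    """OLS coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve([row + [v] for row, v in zip(xtx, xty)])

x1 = [float(v) for v in range(1, 11)]
eps = [0.05, -0.04, 0.03, -0.02, 0.01, 0.02, -0.03, 0.04, -0.05, 0.01]
x2 = [a + e for a, e in zip(x1, eps)]                # x2 is almost x1
y = [a + b + 5 * (b - a) for a, b in zip(x1, x2)]    # = -4*x1 + 6*x2 exactly

b1 = ols([x1], y)[1]          # slope when x1 is the only predictor: about 2
c = ols([x1, x2], y)          # joint fit: coefficients swing to -4 and +6
print(round(b1, 2), round(c[1], 2), round(c[2], 2))
```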

  9. COLLINEARITY • The variance inflation factor (VIF) is used to measure collinearity in a linear regression analysis. It is defined as VIFj = 1 / (1 − Rj²), where Rj² is the multiple R² from regressing the j-th predictor on all of the other predictors. • A rule of thumb for evaluating VIFs is to be concerned with any value larger than 10.
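The definition can be sketched directly (hypothetical data; with only two predictors, Rj² is just the squared correlation between them):

```python
# Sketch of VIF = 1 / (1 - R^2) computed by hand. Hypothetical data.

def r_squared(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

def vif(x, y):
    return 1.0 / (1.0 - r_squared(x, y))

x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [2 * v + e for v, e in zip(x1, [0.05, -0.04, 0.03, -0.02, 0.01,
                                     0.02, -0.03, 0.04, -0.05, 0.01])]
x3 = [5, 1, 4, 2, 8, 3, 9, 6, 10, 7]     # roughly unrelated to x1

print(vif(x1, x2) > 10)   # near-duplicate pair: VIF far above 10
print(vif(x1, x3) > 10)   # unrelated pair: VIF stays small
```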

  10.

. regress SBP Age Gender Weight EduHS EduGtHS X1Z1 X1Z2 X2Z1 X2Z2
. vif

    Variable |    VIF     1/VIF
-------------+----------------------
        X2Z2 |   73.35  0.013633
        X2Z1 |   70.10  0.014264
       EduHS |   54.80  0.018249
        X1Z2 |   50.54  0.019785
     EduGtHS |   43.48  0.022999
        X1Z1 |   17.07  0.058569
      Weight |    7.29  0.137182
         Age |    3.96  0.252689
      Gender |    2.66  0.375642
-------------+----------------------
    Mean VIF |   35.92

  11. COLLINEARITY • A subtler form of near collinearity occurs with the following set of predictors: head-of-household income, education, number of years in work force, and age. Since these variables tend to be highly positively correlated with one another, one of the four is likely to be nearly perfectly predicted from the remainder.

  12. COLLINEARITY • Problems with regression calculations due to collinearity problems may not be easy to detect. • Collinearity problems can arise if particularly extreme data values are incorrectly included in the data set via errors in data collection. • Overused interaction terms might create collinearity problems.

  13. How to Deal with Collinearity? • Select or combine variables. • Use another type of regression (ridge regression, robust regression, or regression on principal components). • Use another type of analysis (path analysis, structural equation modeling, etc.). • Orthogonal polynomials are a solution for avoiding collinearity when powers of a continuous variable X (X², X³, etc.) are used in a regression model.
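The orthogonal-polynomial idea can be sketched with a small Gram–Schmidt step (hypothetical data): the raw columns x and x² are strongly correlated, but the orthogonalized polynomial columns have zero dot product, so they introduce no collinearity.

```python
# Sketch of orthogonal polynomials via Gram-Schmidt. Hypothetical data.

def corr(u, v):
    n = len(u)
    ub, vb = sum(u) / n, sum(v) / n
    suu = sum((a - ub) ** 2 for a in u)
    svv = sum((b - vb) ** 2 for b in v)
    suv = sum((a - ub) * (b - vb) for a, b in zip(u, v))
    return suv / (suu * svv) ** 0.5

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x = [float(v) for v in range(1, 11)]
x2 = [v * v for v in x]
raw_r = corr(x, x2)          # raw powers: strongly correlated

# Gram-Schmidt: p1 = x centered; p2 = x^2 minus its projections on 1 and p1
n = len(x)
ones = [1.0] * n
p1 = [v - sum(x) / n for v in x]
p2 = [v - dot(x2, ones) / n - dot(x2, p1) / dot(p1, p1) * w
      for v, w in zip(x2, p1)]

print(raw_r > 0.9)               # raw x and x^2: highly correlated
print(abs(dot(p1, p2)) < 1e-8)   # orthogonalized columns: uncorrelated
```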

  14. Summary • Normality • Linearity • Homoscedasticity • Collinearity • Outlier • Leverage • Influence

  15. Alternate Strategies of Analysis • The weighted least-squares method of analysis is a modification of standard regression analysis procedures that is used when a regression model is to be fit to a set of data for which the assumptions of homoscedasticity and/or independence do not hold.
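For the one-predictor case, weighted least squares has a simple closed form; a minimal sketch (hypothetical data and weights, where each weight would typically be the reciprocal of that observation's error variance):

```python
# Sketch of weighted least squares for one predictor. With equal weights
# it reduces to ordinary least squares; with a perfectly linear response,
# any positive weights recover the same exact line.

def wls(x, y, w):
    """Weighted least-squares intercept and slope for y = a + b*x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted means
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar)
              for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b            # (intercept, slope)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3 + 2 * xi for xi in x]             # exactly linear: y = 3 + 2x

a1, b1 = wls(x, y, [1, 1, 1, 1, 1, 1])         # equal weights (plain OLS)
a2, b2 = wls(x, y, [4, 1, 1, 1, 1, 0.25])      # unequal, e.g. 1/variance
print(b1, b2)   # both slopes equal 2
```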

  16. Alternate Strategies of Analysis • A transformation of the dependent variable Y is used • to stabilize the variance of the dependent variable, if the homoscedasticity assumption is violated; • to normalize the dependent variable, if the normality assumption is noticeably violated; and • to linearize the regression model, if the original data suggest a model that is nonlinear in either the regression coefficients or the original variables (dependent or independent). (Sometimes a single transformation accomplishes all three goals at once.)
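The linearizing use of a transformation can be sketched with hypothetical exponential-growth data: if y = c·exp(b·x), then log(y) = log(c) + b·x, so an ordinary simple regression on the log scale recovers the parameters.

```python
# Sketch of linearizing a nonlinear model via a log transformation of Y.
# Hypothetical noiseless data so the recovery is exact.
import math

def slope_intercept(x, y):
    """Ordinary simple-regression fit y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]       # y = 2 * e^(0.5 x)

a, b = slope_intercept(x, [math.log(v) for v in y])
print(round(b, 6), round(math.exp(a), 6))        # recovers 0.5 and 2.0
```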

  17. Alternate Strategies of Analysis • Regression on principal components replaces the original predictor variables with uncorrelated linear combinations of them. (One might be the sum of the first three predictors, another might be the difference between the second and fourth, etc.) The new combinations are constructed so that they are not collinear. • Robust regression involves weighting or transforming the data so as to minimize the effects of extreme observations. The goal is to make the analysis more robust (that is, less sensitive) to any particular observation, and also less sensitive to the basic assumptions of regression analysis.
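One common form of robust regression is iteratively reweighted least squares with Huber-type weights; a minimal sketch, assuming hypothetical data, a one-predictor model, and the conventional tuning constant 1.345 (this is an illustration of the down-weighting idea, not any particular package's implementation):

```python
# Sketch of robust regression via iteratively reweighted least squares
# with Huber-type weights. Observations with large residuals are
# down-weighted so a single outlier cannot dominate the fit.

def wls(x, y, w):
    """Weighted least-squares (intercept, slope) for y = a + b*x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar)
              for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def huber_fit(x, y, c=1.345, iters=20):
    w = [1.0] * len(x)                   # start from ordinary least squares
    for _ in range(iters):
        a, b = wls(x, y, w)
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # robust scale estimate from the median absolute residual
        s = sorted(abs(ri) for ri in r)[len(r) // 2] / 0.6745 or 1.0
        w = [1.0 if abs(ri) <= c * s else c * s / abs(ri) for ri in r]
    return wls(x, y, w)

x = [1.0, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.0, 2, 3, 4, 5, 6, 7, 8, 9, 30]    # last point is a gross outlier

_, b_ols = wls(x, y, [1.0] * 10)
_, b_rob = huber_fit(x, y)
print(round(b_ols, 3), round(b_rob, 3))
# the robust slope stays near 1; plain least squares is pulled upward
```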
