Exploring testing conditions, normality, homoscedasticity, and more in RM ANOVA and multilevel modeling. Learn about residuals, normality tests, and dealing with outliers and influential observations.
Key concepts • Testing conditions of application in complex study designs • Residuals • Tests of normality • Residual plots • Residuals vs. fitted • QQ plots • Cook’s distance
Conditions of application • RM ANOVA and multilevel modeling share the following conditions of application: • Normality of the DV by cell of the IV • Few outliers • Homoscedasticity (equality of variance) • (Linearity: trivial in ANOVA since we only estimate mean differences)
Problems with checking normality by cell • The number of cells grows with the number of IVs • What about continuous IVs? • How to deal with the number of tests?
Problems with checking homoscedasticity by pair of cells • The number of cells grows with the number of IVs • What about continuous IVs? • How to deal with the number of tests?
Residuals: definition • $Y_i = b_0 + b_1 X_i + e_i$ • Thus, $e_i = Y_i - (b_0 + b_1 X_i) = Y_i - \hat{Y}_i$ • where the $e_i$ are the residuals, which correspond to the distance between the observed value and the best predicted value
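As a rough illustration of the definition above, the sketch below fits a one-predictor line by ordinary least squares and subtracts the fitted values from the observed ones. The helper names (`fit_line`, `residuals`) and the data are made up for the example; the slides themselves obtain residuals from a mixed model in a statistics package, which this simple fit does not reproduce.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares estimates (b0, b1) for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(x, y):
    """Observed value minus best predicted value, for each observation."""
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    return [yi - fi for yi, fi in zip(y, fitted)]

# Hypothetical data, for illustration only (not the autism data).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
e = residuals(x, y)
```

With an intercept in the model, least-squares residuals sum to zero (up to rounding), which is a quick sanity check on the computation.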
Residuals: what to look for • Residuals should have a normal distribution across (or irrespective of) groups, since differences due to the IV have been subtracted. • Residuals should have equal variances, similarly to the observed DV by cell • There should be no remaining structure in the residuals (this allows checking for linearity)
Normality tests • Many normality tests exist. In order of their type I and type II error performance: • Shapiro-Wilk: $W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ • where the $a_i$ depend on the parameters of a normal distribution and the $x_{(i)}$ are the values of $x$ ordered from smallest to largest • Anderson-Darling: same idea of ordering the data • Kolmogorov-Smirnov • …
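To make the "ordering the data" idea concrete, here is a minimal sketch of the Kolmogorov-Smirnov statistic against a normal distribution, using only the Python standard library. It returns the statistic D only, not a p-value. Estimating the mean and standard deviation from the same data, as done here, is an assumption of this sketch (it corresponds to the Lilliefors variant, whose critical values differ from the classical test). The function name and the two samples are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(values)                      # order data from smallest to largest
    n = len(xs)
    dist = NormalDist(mean(xs), stdev(xs))   # assumption: parameters estimated from the data
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = dist.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

# Hypothetical samples: evenly spaced normal quantiles, and a skewed transform of them.
n = 50
near_normal = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
skewed = [math.exp(z) for z in near_normal]

d_normal = ks_statistic_normal(near_normal)
d_skewed = ks_statistic_normal(skewed)
```

The skewed sample gives a visibly larger D than the near-normal one; turning D into a decision still requires the appropriate reference distribution.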
But… • All of these tests are known to be unreliable. • When data are in fact from a normal distribution, they reject the null too often or too rarely • When data are in fact not from a normal distribution, they do not reject the null often enough (low power)
Residual plots: residuals vs. fitted or vs. each IV • Scatterplot of the residuals against the predicted values ($\hat{Y}_i$) or against each IV. • There are different versions of this type of plot (e.g., residuals may or may not be divided by their estimated standard deviation) • They make it possible to examine • Homoscedasticity, • Linearity of the relationship between IV and DV, • Normality of residuals (the cloud of points should have an ellipsoid shape), • Outliers
Residual plot: Quantile-Quantile plot • Graphical method for comparing two probability distributions • Compare the quantiles of the normal distribution with mean 0 and variance $s^2$ to the ordered values of the residuals • All the points should align on the diagonal from bottom left to top right
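The coordinates of a QQ plot can be sketched as follows: the ordered residuals are paired with the quantiles of a normal distribution with mean 0 and the residuals' own standard deviation. The plotting positions $(i - 0.5)/n$ are one common convention and an assumption here, since the slides do not specify one; the function name and residual values are also made up.

```python
from statistics import NormalDist, stdev

def qq_points(resid):
    """Return (theoretical quantile, ordered residual) pairs for a normal QQ plot."""
    n = len(resid)
    reference = NormalDist(0, stdev(resid))  # normal with mean 0 and variance s^2
    ordered = sorted(resid)
    # Assumption: plotting positions (i - 0.5) / n; other conventions exist.
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical residuals, for illustration only.
resid = [-1.2, 0.4, 0.1, -0.3, 0.9, -0.5, 0.6, 0.0]
points = qq_points(resid)
```

Plotting these pairs as a scatterplot, with the line y = x for reference, gives the QQ plot described above; points far from the line in the tails suggest departures from normality.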
Outliers • Outliers are extreme values on the IV, on the DV, or on both. • Leverage observations are extreme on the X-axis (IV), but may not have much influence on the estimation of the parameters. • Influential observations are extreme on both the X and Y axes, and greatly influence the estimation of the parameters
Cook’s distance • $D_i = \frac{\sum_{j=1}^{n} \left(\hat{Y}_j - \hat{Y}_{j(i)}\right)^2}{p \cdot MSE}$ • where the $\hat{Y}_j$ are the predicted values of Y, and the $\hat{Y}_{j(i)}$ are the predicted values of Y if observation i is removed and the model is estimated again. p is the number of parameters of the model and MSE is the mean square error. • Cutoff: 1, or $4/n$, or $F_{p,\,n-p}$
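The definition above can be followed literally for a simple one-predictor regression: refit the model without observation i, compare the two sets of predictions, and scale by p times the MSE. This is a brute-force sketch under that assumption (ordinary least squares, p = 2), not the mixed-model computation a statistics package would perform; the helper names and the data are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares estimates (b0, b1) for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def cooks_distances(x, y):
    """Cook's distance for each observation, by refitting without it."""
    n, p = len(x), 2                         # p = 2 parameters: intercept and slope
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)
    distances = []
    for i in range(n):
        # Re-estimate the model with observation i removed.
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        fitted_without_i = [c0 + c1 * xj for xj in x]
        shift = sum((f - g) ** 2 for f, g in zip(fitted, fitted_without_i))
        distances.append(shift / (p * mse))
    return distances

# Hypothetical data: the last point is extreme on X and far from the trend.
x = [1, 2, 3, 4, 5, 6, 7, 15]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.1, 5.0]
d = cooks_distances(x, y)
flagged = [i for i, di in enumerate(d) if di > 4 / len(x)]  # 4/n cutoff
```

Here the last observation has by far the largest distance and exceeds the 4/n cutoff, which matches the description of an influential observation: extreme on both axes and pulling the fitted line toward itself.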
An example of a residual analysis • Back to the autism data again. • Step 1: obtain the residuals (use the Save option in the linear mixed model) • Step 2: check normality (analysis → explore) • Step 3: look at residual plots • Residuals vs. fitted • Residuals vs. time • (Standardized residuals vs. fitted) • QQ plots • (Cook’s distance)