
Presentation Transcript


  1. Multilevel Models

  2. Other names for the same basic thing • hierarchical linear models • multilevel models • mixed-effects models • mixed models • variance-components models • random-effects regression models • random-coefficients regression models

  3. Multilevel models • Common situations: • Individuals nested within groups • Random assignment done at the group level rather than at the individual level • Timepoints nested within subjects

  4. The traditional (wrong) way to analyze this • Linear or logistic regression analysis • IV → DV • Ignore the clustering • People did this for years because there weren’t good computer programs to do it any other way.

  5. What’s wrong with this? • It violates one of the main assumptions of the regression model! • Observations are supposed to be independent • Each person’s residual is independent of everyone else’s residual

  6. Why the observations are not independent • Kids in the same school are similar. • My residual is likely to be similar to the residuals of the other kids in my school. • (residual=difference between predicted value and actual value)

  7. Students in the same school are similar • Similar backgrounds (SES, urban/rural, ethnic mix of neighborhoods, community resources) • Similar experiences (same teachers, same school climate, shared events in school)

  8. Hierarchical Data Structures in a Group Randomized Trial • In many research studies, we start by drawing a sample of individuals… and randomly assign them to either treatment or control. • Here, each individual represents an independent observation… and traditional data analytic techniques are appropriate.

  9. Hierarchical Data Structures in a Group Randomized Trial • However, we are not always able to separate people from their contexts. • Students learn in schools. Children grow up in neighborhoods. Patients are treated in hospitals. • When the cluster is a necessary part of a research design, the resultant data will be nested, or hierarchically structured.

  10. Characteristics of Hierarchical Data Structures in a Group Randomized Trial • The unit of assignment is an identifiable group (e.g., a cluster). • Different groups are allocated to each condition. • The units of observation are members of the groups. • The number of groups allocated to each condition is usually limited.

  11. What do the data look like?

  12. Scatterplot of number of nutrition lessons completed vs. number of days student brought fruit for lunch Source: I made this up.

  13. Draw an overall regression line Are the points evenly scattered around the line???

  14. Draw a separate regression line for each school

  15. How these lines differ • Each has its own Y-intercept. • Each has its own slope. • (Each could have its own amount of scatter, but in this example they’re all the same.) • (Each could be a different polynomial curve, but in this example they’re all lines.)

  16. If each school has its own regression line, is it appropriate to draw just one overall line to represent all the schools?

  17. You can improve the predictive power of the model if you add in information about the school. The points are closer to the regression line if you have a separate regression line for each school.

  18. School-level information helps you predict the individual-level scores better

  19. Two different equations that help predict an individual’s score: • Level 1: The individual-level equation Yij = β0j + β1j Xij + εij • Level 2: The school-level equations • Within each school…. β0j = ψ00 + μ0j and β1j = ψ10 + μ1j • (j indexes the school, i indexes the individual within school j)
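A note on how the two levels fit together (a step not written out on the slide, but it follows directly from the equations above): substituting the level-2 equations into the level-1 equation gives the single combined equation that mixed-model software actually estimates:

Yij = ψ00 + ψ10 Xij + (μ0j + μ1j Xij + εij)

The first two terms are the fixed effects; the terms in parentheses are the school-level random effects plus the individual-level residual.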

  20. Multilevel models let you use all these equations at the same time!

  21. How to do it • Programs specifically designed for multilevel modeling • HLM • Mplus • MLwiN • Other programs • SAS PROC MIXED

  22. What if you just ignore the multilevel structure and use PROC REG or GLM? • OLS regression model assumes that you have N unique pieces of information to estimate the regression line. • If my responses are partially explained by the responses of everyone else in my school, then there aren’t really N unique pieces of information. • (We’re cheating)

  23. The accuracy of the betas (and our confidence in them) is shown by their standard errors. • OLS model assumes there are N independent pieces of information when it computes the standard errors (minus the d.f.) • If observations are correlated, there really aren’t N independent pieces of information. • Estimated standard errors will be too small.

  24. Each individual’s score consists of two components: • Variance due to the group • E.g., overall SES of the school • Variance due to the individual within the group • E.g., each kid’s personality

  25. If you ignore the grouping, you’re attributing all the variance to the between-individuals component. • You’re saying that all the causes of variation exist across individuals, and you’re ignoring the effect of the group.

  26. What happens if the standard errors of the betas are too small? Type I errors (Conclude that an effect is significant when it’s really not)

  27. Intraclass Correlation (ICC) • Proportion of the total variance that is due to the group membership

  28. Equation for ICC • ICC = σ²g / (σ²g + σ²m) • σ²g = variance due to the group; σ²m = variance due to members within groups, so the denominator is the total variance • In English: the proportion of the total variance that’s due to the grouping variable

  29. ICC in school-based studies is usually small • Typically around .02 for substance use, kids within schools • (David Murray is the expert on this) • Varies across DVs, samples, etc.

  30. How to calculate ICC • There is a macro on the SAS website • http://ftp.sas.com/techsup/download/stat/intracc.html • Paste the macro into your SAS program, and replace their variable names with your variable names.

  31. Another way to calculate ICC • Use PROC MIXED to run the unconditional means model:

PROC MIXED METHOD=ML COVTEST;
  CLASS school;
  MODEL dv = / SOLUTION;
  RANDOM INT / TYPE=UN SUBJECT=school;
RUN;

  32. Covariance Parameter Estimates (MLE)

Cov Parm    Subject    Estimate    Std Error      Z     Pr > |Z|
UN(1,1)     School       129.19        25.48    5.07      0.0001
Residual                 321.56        10.63   30.25      0.0001

The variance component for school is 129.19. The variance component left over after the variance due to school has been explained is 321.56.
ICC = variance due to clustering variable / (variance due to clustering variable + variance remaining) = 129 / (129 + 321) = .29
Source: http://www.utexas.edu/its/rc/answers/sas/sas97.html
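The same arithmetic as a minimal SAS DATA step (a sketch, not part of the original output; the data set and variable names are made up for illustration):

data icc_calc;
   var_school   = 129.19;   /* between-school variance component, UN(1,1) above */
   var_residual = 321.56;   /* residual (within-school) variance */
   icc = var_school / (var_school + var_residual);   /* approximately .29 */
   put icc=;
run;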

  33. Small ICCs can have big effects! • Variance inflation factor (VIF) • Also known as design effect • 1 + (m-1) ICC • m=number of members per group • So with an ICC of .02 and 100 kids per group, VIF=2.98

  34. DEFT • Square root of VIF • In the example, DEFT = √2.98 = 1.73 • The standard error of beta is really 1.73 times as large as what you get if you run a traditional OLS regression model!
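As a quick check, a minimal SAS DATA step (a sketch, not from the slides) that reproduces the VIF and DEFT figures for the ICC of .02 and 100 members per group used in the example:

data design_effect;
   icc = 0.02;                    /* intraclass correlation */
   m   = 100;                     /* members per group */
   vif  = 1 + (m - 1) * icc;      /* variance inflation factor (design effect) = 2.98 */
   deft = sqrt(vif);              /* DEFT = square root of VIF, about 1.73 */
   put vif= deft=;
run;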

  35. If you don’t account for the ICC • Beta will be about the same • But its standard error will be too small • So it will look more significant than it really is • So you conclude that there’s a significant effect, when maybe there really isn’t!

  36. Underestimating your ICC can undermine statistical power • [Figure: real power of cluster randomized trials by the discrepancy between the a priori postulated and the a posteriori estimated intraclass correlation coefficients; effect size = .25, nominal power = 80%; g = number of clusters, M = average cluster size, N = total number of subjects] • Source: Guittet, L., Giraudeau, B., & Ravaud, P. (2005). A priori postulated and real power in cluster randomized trials: mind the gap. BMC Medical Research Methodology, 5:25.

  37. A study design issue • For the same total N, you’ll have more power if you have a large number of clusters (schools) with few individuals (students) per cluster. • Example: It’s better to have 100 schools with 10 students per school than 10 schools with 100 students per school. • But that’s more difficult logistically!
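To make the trade-off concrete, here is a small sketch (assuming a total N of 1,000 and the ICC of .02 used earlier; neither figure comes from the slides) comparing the design effect and effective sample size for the two designs in the example:

data cluster_tradeoff;
   icc = 0.02;
   total_n = 1000;
   do m = 10, 100;                   /* students per school */
      g = total_n / m;               /* number of schools */
      vif = 1 + (m - 1) * icc;       /* design effect */
      eff_n = total_n / vif;         /* effective number of independent observations */
      put g= m= vif= eff_n=;
   end;
run;

With 100 schools of 10 students, the effective N is about 847; with 10 schools of 100 students, it drops to about 336, even though the total N is the same.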

  38. Increasing Power: More g or More m?

  39. So how do we fix it? • Need to account for the clustering in the regression model • Can reduce the ICC somewhat by including covariates that explain part of the group effect (e.g., proxy measures of SES) • But that doesn’t completely eliminate the problem

  40. PROC MIXED • Lets you include fixed effects (your regular IVs) and random effects (group effects) in the model:

proc mixed;
  class school;
  model dv = var1 var2;
  random intercept / subject=school;
run;
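If you also want each school to have its own slope (the picture from slide 15), the RANDOM statement can be extended. This is a sketch using the slide’s placeholder names (dv, var1, school), not an example from the original deck:

proc mixed;
  class school;
  model dv = var1 / solution;
  random intercept var1 / type=un subject=school;
run;

TYPE=UN lets the random intercepts and slopes covary rather than forcing them to be independent.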

  41. /Solution option • Gives you the stats you usually want to report: • Parameter estimate • Standard error • Degrees of freedom • T-value • P-value • Model dv = fixedIV1 fixedIV2 / solution;

  42. Example • Association between SES and smoking • Hypothesis: smoking is inversely associated with SES. • In this example…. • SES is the median income in the adolescent’s zip code • Smoking is a standardized average of ever tried smoking, ever smoked a whole cigarette, days in past month smoked, cigarettes per day • IRP Year 3 data

  43. PROC REG ignores the clustering

proc reg;
  model smkscale3 = income1000 / stb;
run;

  44. REG output

Analysis of Variance
                            Sum of        Mean
Source             DF      Squares      Square    F Value    Pr > F
Model               1     12.53389    12.53389      30.12    <.0001
Error            1902    791.60096     0.41619
Corrected Total  1903    804.13485

Root MSE           0.64513    R-Square    0.0156
Dependent Mean     0.20623    Adj R-Sq    0.0151
Coeff Var        312.82168

  45. REG output

Parameter Estimates
                     Parameter    Standard               Standardized
Variable      DF      Estimate       Error    t Value    Pr > |t|    Estimate
Intercept      1       0.21036     0.01480      14.21      <.0001           0
income1000     1      -0.08135     0.01482      -5.49      <.0001    -0.12485

  46. Use PROC MIXED to take into account students clustered within schools.

proc mixed;
  class sch3;
  model smkscale3 = income1000 / solution;
  random intercept / sub=sch3 solution;
run;

  47. Covariance Parameter Estimates

Cov Parm     Subject    Estimate
Intercept    sch3       0.003753
Residual                0.4131

(If you had run the unconditional means model, this is where you would get the numbers to calculate the ICC.)

  48. The unconditional means model

proc mixed method=ml covtest;
  class sch3;
  model smkscale3 = / solution;
  random intercept / type=un sub=sch3;
run;

  49. Output from the unconditional means model

Covariance Parameter Estimates
                                Standard       Z
Cov Parm    Subject  Estimate      Error   Value     Pr Z
UN(1,1)     sch3       0.0078     0.0037    2.11   0.0174
Residual               0.4276     0.0132   32.52   <.0001

ICC = .0078 / (.0078 + .4276) = .0179
David Murray was right: it is around .02!

  50. Back to our model….
