This statistical analysis overview covers the concepts of nesting, intraclass correlation, and hierarchical linear models. Topics include the violation of linear model assumptions due to nesting, the impact of non-independence in educational data, and the use of ICC in measuring relatedness or dependence of clustered data. Additionally, the overview explains the application of hierarchical linear models in analyzing nested data and provides examples of 2-level HLM with random intercepts.
Statistical Analysis Overview I, Session 2 • Peg Burchinal • Frank Porter Graham Child Development Institute, University of North Carolina-Chapel Hill
Overview: Statistical analysis overview I-b • Nesting and intraclass correlation • Hierarchical Linear Models • 2 level models • 3 level models
Nesting • Nesting implies violation of the linear model assumptions of independence of observations • Ignoring this dependency in the data results in inflated test statistics when observations are positively correlated • CAN DRAW INCORRECT CONCLUSIONS
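The inflation of test statistics can be seen in a small simulation. The sketch below is illustrative only: it assumes two arms with no true treatment effect, equal-sized clusters, total variance fixed at 1, and a naive large-sample z-test that treats every child as independent. The function name and all parameter values are hypothetical, not taken from the slides.

```python
import random
from statistics import mean, variance


def naive_rejection_rate(icc, k_per_arm=10, m=20, reps=500, seed=0):
    """Share of null simulations in which a naive z-test rejects at |z| > 1.96.

    Data have no treatment effect; each arm has k_per_arm clusters of m
    individuals. Between-cluster variance is icc, within-cluster is 1 - icc.
    """
    rng = random.Random(seed)
    sd_between = icc ** 0.5
    sd_within = (1.0 - icc) ** 0.5
    rejections = 0
    for _ in range(reps):
        arms = []
        for _arm in range(2):
            ys = []
            for _cluster in range(k_per_arm):
                u = rng.gauss(0.0, sd_between)  # shared cluster effect
                ys.extend(u + rng.gauss(0.0, sd_within) for _ in range(m))
            arms.append(ys)
        a, b = arms
        # Naive standard error: ignores clustering entirely
        se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
        z = (mean(a) - mean(b)) / se
        if abs(z) > 1.96:
            rejections += 1
    return rejections / reps


print("ICC = 0.0:", naive_rejection_rate(0.0))
print("ICC = 0.2:", naive_rejection_rate(0.2))
```

With independent observations the rejection rate should sit near the nominal 5%; with positively correlated observations within clusters it should be well above that, which is the sense in which ignoring nesting can lead to incorrect conclusions.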
Nesting and Design • Educational data often collected in schools, classrooms, or special treatment groups • Lack of independence among individuals -> reduction in variability • Pre-existing similarities (i.e., students within a cluster are more similar than students selected at random would be) • Shared instructional environment (i.e., variability in instruction is greater across classrooms than within a classroom) • Educational treatments often assigned to schools or classrooms • Advantage: avoids contamination and makes the study more acceptable (simple random assignment of individuals is often not possible) • Disadvantage: Analysis must take dependencies or relatedness of responses within clusters into account
Intraclass Correlation (ICC) • For models with clustering of individuals • “cluster effect”: proportion of variance in the outcomes that is between clusters (compares within-cluster variance to between-cluster variance) • Example – clustering of children in classroom. ICC describes proportion of variance associated with differences between classrooms
Intraclass Correlation • Intraclass correlation (ICC): measure of relatedness or dependence of clustered data • Proportion of variance that is between clusters • ICC or r = s2b / (s2b + s2w), where s2b = between-cluster variance and s2w = within-cluster variance • ICC = 0: no correlation among individuals within a cluster • ICC = 1: all responses within a cluster are identical
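As a minimal sketch of the ICC formula above, the function below takes the two variance components as inputs. The function name and the example values are hypothetical.

```python
def icc(var_between: float, var_within: float) -> float:
    """Intraclass correlation: share of total variance that lies between clusters."""
    total = var_between + var_within
    if var_between < 0 or var_within < 0 or total == 0:
        raise ValueError("variances must be non-negative and not both zero")
    return var_between / total


print(icc(0.0, 2.0))  # 0.0: no between-cluster variance, individuals uncorrelated
print(icc(2.0, 0.0))  # 1.0: no within-cluster variance, responses identical
print(icc(1.0, 3.0))  # 0.25
```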
Nesting, Design, and ICC • Taking ICC into account results in less power for a given sample size • Less independent information • Design effect = 1 + r (m-1) • Effective sample size = mk / (1 + r (m-1)) • m = number of individuals per cluster • k = number of clusters • r = ICC • Effective sample size is the number of clusters (k) when ICC=1 and the number of individuals (mk) when ICC=0
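The quantity mk / (1 + r (m-1)) on this slide is the effective sample size; its denominator, 1 + r (m-1), is what is commonly called the design effect. A minimal sketch, assuming equal cluster sizes and using hypothetical function names and values:

```python
def design_effect(m: int, icc: float) -> float:
    """Variance inflation from clustering: 1 + icc * (m - 1)."""
    return 1.0 + icc * (m - 1)


def effective_sample_size(m: int, k: int, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    return m * k / design_effect(m, icc)


# Hypothetical design: 10 clusters of 20 individuals each
print(effective_sample_size(20, 10, 0.0))  # 200.0: every individual counts
print(effective_sample_size(20, 10, 1.0))  # 10.0: only the clusters count
print(round(effective_sample_size(20, 10, 0.1), 1))  # 69.0
```

Even a modest ICC of 0.1 reduces 200 children to roughly 69 independent observations in this hypothetical design.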
ICC and Hierarchical Linear Models • Hierarchical linear models (HLM) take nesting into account • Clustering of data is explicitly specified by the model • ICC is accounted for when estimating standard errors, test statistics, and p-values
2 level HLM • One level of nesting • Longitudinal: Repeated measures of individual over time • Typically - Random intercepts and slopes to describe individual patterns of change over time • Clusters: Nesting of individuals within classes, families, therapy groups, etc. • Typically - Random intercept to describe cluster effect
2 level HLM Random-intercepts models • Corresponds to one-way ANOVA with random effects (mixed model ANOVA) • Example: Classrooms randomly assigned to treatment or control conditions • All study children within a classroom are in the same condition • Post-treatment outcome per child (can use pre-treatment score as covariate to increase power) • Level 1 = children in classroom • Level 2 = classroom • ICC reflects the degree of similarity among students within the classroom
2 Level HLM Random Intercept Model • Level 1 – individual students within the classroom • Unconditional Model: Yij = B0j + rij • Conditional Model: Yij = B0j + B1 Xij + rij • Yij = outcome for ith student in jth class • B0j = intercept (e.g., mean) for jth class • B1 = coefficient for individual-level covariate, Xij • rij = random error term for ith student in jth class, E (rij) = 0, var (rij) = s2
2 Level HLM Random Intercept Model • Level 2 – Classrooms • Unconditional model: B0j = g00 + u0j • Conditional model: B0j = g00 + g01 Wj1 + g02 Wj2 + u0j • B0j = intercept (e.g., mean) for jth class • g00 = grand mean in population • g01 = treatment effect for Wj1, dummy variable indicating treatment status: -.5 if control; .5 if treatment • g02 = coefficient for Wj2, class-level covariate • u0j = random effect associated with jth classroom, E (u0j) = 0, var (u0j) = t00
2 Level HLM Random Intercept Model • Combined (unconditional) • Yij = g00 + u0j + rij • Yij = B0j + rij • B0j = g00 + u0j • Combined (conditional) • Yij = g00 + g01 Wj1 + g02 Wj2 + B1 Xij + u0j + rij • Yij = B0j + B1 Xij + rij • B0j = g00 + g01 Wj1 + g02 Wj2 + u0j • Var (Yij) = Var (u0j + rij) = (t00 + s2) • ICC = r = t00 / (t00 + s2)
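The unconditional combined model, Yij = g00 + u0j + rij, can be simulated to check that t00 / (t00 + s2) is recovered from data. The sketch below is an illustration under stated assumptions: balanced clusters, normal random effects, and a one-way ANOVA (method-of-moments) estimator rather than the likelihood-based estimation that HLM software typically uses. Function names and parameter values are hypothetical.

```python
import random
from statistics import mean


def simulate(k, m, g00, t00, s2, seed=0):
    """Simulate Yij = g00 + u0j + rij for k clusters of m individuals."""
    rng = random.Random(seed)
    data = []
    for _ in range(k):
        u0j = rng.gauss(0.0, t00 ** 0.5)  # cluster random effect, var = t00
        data.append([g00 + u0j + rng.gauss(0.0, s2 ** 0.5) for _ in range(m)])
    return data


def anova_icc(data):
    """One-way ANOVA estimate of the ICC for balanced clusters."""
    k, m = len(data), len(data[0])
    grand = mean(y for cluster in data for y in cluster)
    cluster_means = [mean(cluster) for cluster in data]
    # Mean squares between and within clusters
    msb = m * sum((cm - grand) ** 2 for cm in cluster_means) / (k - 1)
    msw = sum(
        (y - cm) ** 2 for cluster, cm in zip(data, cluster_means) for y in cluster
    ) / (k * (m - 1))
    # E[MSB] = s2 + m * t00 and E[MSW] = s2, so solve for t00
    t00_hat = max((msb - msw) / m, 0.0)
    return t00_hat / (t00_hat + msw)


data = simulate(k=500, m=10, g00=50.0, t00=1.0, s2=3.0, seed=1)
print("true ICC:", 1.0 / (1.0 + 3.0))
print("estimated ICC:", round(anova_icc(data), 3))
```

The estimate should land close to the true value of 0.25, with sampling error that shrinks as the number of clusters grows.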
Example: 2 level HLM Random Intercepts • Purdue Curriculum Study (Powell & Diamond) • Onsite or Remote coaching • 27 Head Start classes randomly assigned to onsite coaching and 25 to remote coaching • Post-test scores on writing • Onsite: n=196, M=6.70, SD=1.54 • Remote: n=171, M=7.05, SD=1.64
Example: 2 level HLM Random Intercepts • Level 1: Writingij = B0j + B1 Writing-preij + rij • B1 = .56, se=.05, p<.001 • E (rij) = 0, var (rij) = 1.67 • Level 2: B0j = g00 + g01 Onsitej + u0j • g00 (intercept: remote group adjusted mean) = 3.74, se=.31 • g01 (Onsite-Remote difference) = -.37, se=.17, p=.03 • E (u0j) = 0, var (u0j) = .137 • ICC = t00 / (t00 + s2) = .137 / (.137 + 1.66) = .076
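The ICC arithmetic on this slide can be checked directly. The slide gives the level-1 variance as 1.67 in one place and 1.66 in the ICC line; the sketch below, which only re-does the slide's own arithmetic, shows that either value rounds to the same ICC.

```python
t00 = 0.137  # between-classroom variance reported on the slide

for s2 in (1.66, 1.67):  # the two level-1 variance values that appear on the slide
    icc = t00 / (t00 + s2)
    print(f"s2 = {s2}: ICC = {icc:.3f}")
# Both lines print ICC = 0.076
```

About 8% of the residual variance in post-test writing scores lies between classrooms once the pretest and coaching condition are in the model.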
2 Level HLM - Longitudinal (random-slopes and -intercepts models) • Does NOT correspond to one-way ANOVA with random effects • Example: Longitudinal assessment of children’s literacy skills during Pre-K years • Level 1 = individual growth curve • Level 2 = group growth curve
Level 1 – Longitudinal HLM • Level 1 – individual growth curve • Unconditional Model: Yij = B0j + B1j Ageij + rij • Conditional Model: Yij = B0j + B1j Ageij + B2 Xij + rij • Yij = outcome for jth student on the ith occasion • Ageij = age at assessment for jth student on the ith occasion • B0j = intercept for jth student • B1j = slope for Age for jth student • B2 = coefficient for time-varying covariate, Xij • rij = random error term for jth student on the ith occasion, E (rij) = 0, var (rij) = s2
Level 2 – Longitudinal HLM • Level 2 – predicting individual trajectories • Unconditional model: B0j = g00 + u0j; B1j = g10 + u1j • Conditional model: B0j = g00 + g01 Wj1 + g02 Wj2 + u0j; B1j = g10 + g11 Wj1 + g12 Wj2 + u1j • B0j = intercept for jth student; B1j = slope for Age for jth student • g00 = intercept in population; g10 = slope in population • g01 = treatment effect on intercept for Wj1, student-level covariate; g11 = treatment effect on slope for Wj1, student-level covariate
Level 2 – Longitudinal HLM • Level 2 – predicting individual trajectories • Unconditional model: B0j = g00 + u0j; B1j = g10 + u1j • Conditional model: B0j = g00 + g01 Wj1 + u0j; B1j = g10 + g11 Wj1 + u1j • u0j = random effect for individual intercept; u1j = random effect for individual slope • E (u0j) = 0, var (u0j) = t00; E (u1j) = 0, var (u1j) = t11 • cov (u0j, u1j) = t10 • Covariance matrix of (u0j, u1j): T = [t00 t01; t10 t11], with t01 = t10 • Level 1 and level 2 error terms are independent: cov (rij, u0j) = cov (rij, u1j) = 0
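To make the intercept-slope covariance structure concrete, the sketch below draws pairs (u0j, u1j) with variances t00 and t11 and covariance t10, using a hand-written 2x2 Cholesky factor so that only the standard library is needed. The function names and the variance values are hypothetical, and normality of the random effects is assumed.

```python
import random


def draw_random_effects(t00, t11, t10, n, seed=0):
    """Draw n pairs (u0j, u1j) with var t00, var t11, and covariance t10."""
    if t00 <= 0 or t00 * t11 - t10 ** 2 < 0:
        raise ValueError("need t00 > 0 and a positive semi-definite matrix")
    rng = random.Random(seed)
    # Cholesky factor of [[t00, t10], [t10, t11]] is [[a, 0], [b, c]]
    a = t00 ** 0.5
    b = t10 / a
    c = (t11 - b ** 2) ** 0.5
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((a * z1, b * z1 + c * z2))
    return pairs


def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


pairs = draw_random_effects(t00=10.0, t11=4.0, t10=-1.5, n=20000, seed=1)
u0 = [p[0] for p in pairs]
u1 = [p[1] for p in pairs]
print("var(u0j):", round(sample_cov(u0, u0), 2))
print("var(u1j):", round(sample_cov(u1, u1), 2))
print("cov(u0j, u1j):", round(sample_cov(u0, u1), 2))
```

A negative t10, as in this hypothetical setting, means that students who start higher tend to grow more slowly.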
Example – Longitudinal HLM • Purdue Curriculum Study (Powell & Diamond) Level 1 – estimating individual growth curves for children in one treatment condition (Remote) • Level 2 – estimating population growth curves for Remote condition
Example • Level 1: blendingij = B0j + B1j Ageij + rij; estimated s2 = 10.34 • Level 2: B0j = g00 + g01 Wj1 + u0j; B1j = g10 + u1j • Estimated results • Intercept: g00 = 11.86 (se=.48), t00 = 10.03** • Season: g01 = 2.43* (se=.70) • Slope: g10 = 1.51* (se=.60), t11 = 4.24** • Intercept-slope covariance: t10 = -1.45**
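The covariance t10 is easier to interpret as a correlation between individual intercepts and slopes. The sketch below only rescales the estimates reported on the slide; the correlation itself is not reported in the source.

```python
import math

# Variance components reported on the slide
t00, t11, t10 = 10.03, 4.24, -1.45

# Covariance matrix of the random intercept and slope
T = [[t00, t10],
     [t10, t11]]

# Correlation = covariance / (sd of intercepts * sd of slopes)
corr = t10 / math.sqrt(t00 * t11)
print(round(corr, 2))  # -0.22
```

On this reading, children with higher blending intercepts tend to have somewhat flatter growth, although the association is modest.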
3 level HLM • 2 levels of nesting • Examples • Longitudinal assessments of children in randomly assigned classrooms • Level 1 – child level data • Level 2 – child’s growth curve • Level 3 – classroom level data • Two levels of nesting such as children nested in classrooms that are nested in schools • Level 1 – child level data • Level 2 – classroom level data • Level 3 – school level data
3 level Model-Random Intercepts • Children nested in classrooms, classrooms nested in schools • Level 1 child-level model: Yijk = p0jk + eijk • Yijk is achievement of child i in class j in school k • p0jk is mean score of class j in school k • eijk is random “child effect” • Classroom level model: p0jk = B00k + r0jk • B00k is mean score for school k • r0jk is random “class effect” • School level model: B00k = g000 + u00k • g000 is grand mean score • u00k is random “school effect”
3 level Model-Random Intercepts • Children nested in classrooms, classrooms nested in schools • Level 1 child-level model: Yijk = p0jk + eijk • eijk is random “child effect”, E (eijk) = 0, var (eijk) = s2 • Within classroom level model: p0jk = B00k + r0jk • r0jk is random “class effect”, E (r0jk) = 0, var (r0jk) = tp • Assume variance among classes within school is the same for every school • Between classroom (school) model: B00k = g000 + g001 trt + u00k • E (u00k) = 0, var (u00k) = tb
Partitioning variance • Proportion of variance within classroom: s2 / (s2 + tp + tb) • Proportion of variance among classrooms within schools: tp / (s2 + tp + tb) • Proportion of variance among schools: tb / (s2 + tp + tb)
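The three proportions share one denominator, so they can be computed together and must sum to 1. A minimal sketch, with a hypothetical function name and made-up variance components:

```python
def partition_variance(s2: float, tp: float, tb: float) -> dict:
    """Split total variance into child, classroom, and school shares."""
    total = s2 + tp + tb
    if min(s2, tp, tb) < 0 or total == 0:
        raise ValueError("variances must be non-negative and not all zero")
    return {
        "within_classroom": s2 / total,
        "among_classrooms_within_schools": tp / total,
        "among_schools": tb / total,
    }


# Hypothetical components: s2 = 6, tp = 3, tb = 1
shares = partition_variance(6.0, 3.0, 1.0)
for level, share in shares.items():
    print(f"{level}: {share:.2f}")
# within_classroom: 0.60
# among_classrooms_within_schools: 0.30
# among_schools: 0.10
```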
3 Level HLM – level 2 longitudinal and level 3 random intercepts • Typically: treatment randomly assigned at classroom level, children followed longitudinally (e.g., Purdue Curriculum Study) • (within child) Level 1: Yijk = p0jk + p1jk Ageijk + eijk; E (eijk) = 0, var (eijk) = s2 • (between child) Level 2: p0jk = B00k + r0jk; p1jk = B10k + r1jk; E (r0jk) = 0, var (r0jk) = tp0; E (r1jk) = 0, var (r1jk) = tp1 • (between classes) Level 3: B00k = g000 + u00k; B10k = g100 + u10k; E (u00k) = 0, var (u00k) = tb0; E (u10k) = 0, var (u10k) = tb1
Example Purdue Curriculum Study • Level 1 – individual growth curve • Level 2 – classroom growth curve • Level 3 – treatment differences in classroom growth curves
Threats • Homogeneity of variance – at each level • Nonnormal data with heavy tails • Bad data • Differences in variability among groups • Normality assumption • Examine residuals • Robust standard error (large n) • Inferences with small samples
3 Level HLM • Longitudinal assessments of individuals in clustered settings