Issues in Experimental Design

Reliability and ‘Error’

More things to think about in experimental design
• The relationship between reliability and power
• The treatment effect is not the same for everyone
• Some benefit more than others
• This sounds like no big deal (or even obvious), but all of the designs discussed assume an equal treatment effect for every individual

Reliability
• What is reliability?
• It is often thought of as consistency, but consistency is more of a by-product of reliability
• Moreover, perfectly consistent scores that lack variability (i.e. constants) would not allow any measure of reliability to be obtained
• Reliability may refer to a measure’s ability to capture an individual’s true score, and so to distinguish accurately one person from another on that measure
• It is the squared correlation between observed scores on a measure and the true scores for that construct

Classical True Score Theory
• Each subject’s score is true score + error of measurement
• Obsvar = Truevar + Errorvar
• Reliability = Truevar/Obsvar = 1 - Errorvar/Obsvar
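As a rough illustration of the decomposition above, the following sketch simulates true scores and measurement error and checks that the variance ratio lands near the theoretical reliability. The mean of 50, the standard deviations of 8 and 6, the sample size, and the function names are all hypothetical choices made for this example, not values from the slides.

```python
import random
import statistics


def simulate_scores(n, true_sd, error_sd, seed=0):
    """Simulate observed = true + error (all parameter values are hypothetical)."""
    rng = random.Random(seed)
    true = [rng.gauss(50.0, true_sd) for _ in range(n)]
    observed = [t + rng.gauss(0.0, error_sd) for t in true]
    return true, observed


def pearson_r(x, y):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


true, observed = simulate_scores(n=20000, true_sd=8.0, error_sd=6.0)

# Theoretical reliability: Truevar / (Truevar + Errorvar) = 64 / 100 = 0.64
theoretical = 8.0 ** 2 / (8.0 ** 2 + 6.0 ** 2)

# Two sample estimates that should both sit close to the theoretical value
variance_ratio = statistics.pvariance(true) / statistics.pvariance(observed)
squared_corr = pearson_r(true, observed) ** 2

print(f"theoretical reliability: {theoretical:.2f}")
print(f"Truevar / Obsvar:        {variance_ratio:.3f}")
print(f"r(true, observed)^2:     {squared_corr:.3f}")
```

In real data the true scores are unobserved, so reliability has to be estimated indirectly; the simulation only shows what the definition means.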

Reliability and power
• Reliability = Truevar/Obsvar = 1 - Errorvar/Obsvar
• If observed variance goes up, power will decrease
• However, if observed variance goes up, we do not automatically know what happens to reliability
• Obsvar = Truevar + Errorvar
• If error variance is causing the increase in observed variance, reliability will decrease
• Reliability goes down, power goes down
• If true variance is causing the increase in observed variance, reliability will increase
• Reliability goes up, power goes down
• The point is that the psychometric properties of the variables play an important, and not altogether obvious, role in how we interpret results, and not having a reliable measure is a recipe for disaster
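The two scenarios above can be made concrete with a little arithmetic. This sketch assumes a fixed raw mean difference of 5 points and uses the standardized effect (difference divided by the observed standard deviation) as a stand-in for power; both the numbers and that proxy are assumptions for illustration only.

```python
def reliability(true_var, error_var):
    """Reliability = Truevar / Obsvar, with Obsvar = Truevar + Errorvar."""
    return true_var / (true_var + error_var)


def standardized_effect(mean_diff, true_var, error_var):
    """Raw mean difference scaled by the observed SD (used here as a proxy for power)."""
    return mean_diff / (true_var + error_var) ** 0.5


MEAN_DIFF = 5.0  # hypothetical raw treatment effect, held constant

# (true variance, error variance) for three hypothetical situations
scenarios = {
    "baseline": (64.0, 36.0),     # Obsvar = 100
    "more error": (64.0, 72.0),   # Obsvar = 136, the increase comes from error
    "more true": (128.0, 36.0),   # Obsvar = 164, the increase comes from true variance
}

results = {}
for name, (true_var, error_var) in scenarios.items():
    results[name] = (
        reliability(true_var, error_var),
        standardized_effect(MEAN_DIFF, true_var, error_var),
    )
    rel, d = results[name]
    print(f"{name:>10}: reliability = {rel:.3f}, standardized effect = {d:.3f}")
```

Reliability moves in opposite directions in the two non-baseline scenarios, yet the standardized effect shrinks in both, which is the pattern the slide describes.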

Error in ANOVA
• Typical breakdown in a between-groups design
• SStot = SSb/t + SSe
• Variation due to treatment and random variation (error)
• The F statistic is a ratio of these variances (mean squares)
• F = MSb/MSe
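A minimal sketch of this breakdown, computed by hand on a small made-up data set. The group names and scores are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import statistics

# Hypothetical scores for three groups of four subjects each
groups = {
    "control": [4.0, 5.0, 6.0, 5.0],
    "treat_a": [6.0, 7.0, 8.0, 7.0],
    "treat_b": [9.0, 8.0, 10.0, 9.0],
}

all_scores = [score for scores in groups.values() for score in scores]
grand_mean = statistics.fmean(all_scores)

# Between-groups (treatment) sum of squares: group means around the grand mean
ss_between = sum(
    len(scores) * (statistics.fmean(scores) - grand_mean) ** 2
    for scores in groups.values()
)

# Error sum of squares: scores around their own group mean
ss_error = sum(
    (score - statistics.fmean(scores)) ** 2
    for scores in groups.values()
    for score in scores
)

# Total sum of squares: scores around the grand mean
ss_total = sum((score - grand_mean) ** 2 for score in all_scores)

df_between = len(groups) - 1
df_error = len(all_scores) - len(groups)

ms_between = ss_between / df_between
ms_error = ss_error / df_error
f_stat = ms_between / ms_error

print(f"SStot = {ss_total:.1f}, SSb/t = {ss_between:.1f}, SSe = {ss_error:.1f}")
print(f"F({df_between}, {df_error}) = {f_stat:.2f}")
```

For these numbers SStot = 38, SSb/t = 32, and SSe = 6, so the partition holds exactly and F = 16 / (6/9) = 24.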

Error in ANOVA
• Classical True Score Theory
• Each subject’s score = true score + error of measurement
• MSe can thus be further partitioned
• Variation due to true differences in scores between subjects, and error of measurement (unreliability)
• MSe = MSer + MSes
• MSer reflects measurement error
• MSes reflects systematic differences between individuals
• MSes has two sources
• Individual differences
• Treatment differences (subject by treatment interaction)

Error in ANOVA
• The reliability of the measure determines the extent to which the two sources of variability (MSer and MSes) contribute to the overall MSe
• If Reliability = 1.00, MSer = 0
• The error term reflects only systematic individual differences
• If Reliability = 0.00, MSes = 0
• The error term reflects measurement error only
• MSer = (1 - Rel)MSe
• MSes = (Rel)MSe
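The two formulas above amount to splitting MSe in proportion to the reliability. A small sketch, with a hypothetical MSe of 15 and a hypothetical reliability of .80; the function name is introduced here for illustration.

```python
def partition_ms_error(ms_error, rel):
    """Split MSe into measurement error (MSer) and systematic variation (MSes)."""
    if not 0.0 <= rel <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    ms_er = (1.0 - rel) * ms_error  # MSer = (1 - Rel)MSe
    ms_es = rel * ms_error          # MSes = (Rel)MSe
    return ms_er, ms_es


# Hypothetical values: MSe = 15 from the ANOVA, reliability = .80 for the measure
ms_er, ms_es = partition_ms_error(ms_error=15.0, rel=0.80)
print(f"MSer = {ms_er:.2f}, MSes = {ms_es:.2f}")

# Boundary cases described on the slide
perfect = partition_ms_error(15.0, 1.0)    # all of MSe is systematic
worthless = partition_ms_error(15.0, 0.0)  # all of MSe is measurement error
```

With a reliability of .80, four fifths of the error term is systematic variation between subjects rather than noise in the measure.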

Error in ANOVA
• We can actually test whether the systematic variation is significantly larger than the variation due to error of measurement

Error in ANOVA
• With a reliable measure, the bulk of MSe will be attributable to systematic individual differences
• However, with strong main effects or interactions, we might see a significant F for this test even though its contribution to the model is small
• Calculate an effect size (eta-squared)
• SSes/SStotal
• Lyons and Howard suggest (based on Cohen’s rules of thumb) that a value < .33 indicates that further investigation may not be necessary
• How much of the variability seen in our data is due to systematic variation outside of the main effects?
• That is, to subjects responding differently to the treatment
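The effect size above can be sketched as follows. The slides give the partition in terms of mean squares, so this example assumes that SSes can be obtained the same way, as Rel × SSe; that step, the function name, and all the numbers are assumptions for illustration rather than the procedure from Lyons and Howard.

```python
def systematic_eta_squared(ss_error, ss_total, rel):
    """Eta-squared for systematic variation: SSes / SStotal.

    Assumption: SSes = Rel * SSe, mirroring MSes = (Rel)MSe.
    """
    ss_es = rel * ss_error
    return ss_es / ss_total


THRESHOLD = 0.33  # rule of thumb attributed to Lyons and Howard on the slide

# Hypothetical values: SSe = 270, SStotal = 400, reliability = .80
eta_sq = systematic_eta_squared(ss_error=270.0, ss_total=400.0, rel=0.80)
needs_follow_up = eta_sq >= THRESHOLD

print(f"eta-squared for systematic variation = {eta_sq:.2f}")
print("worth investigating further" if needs_follow_up else "further investigation may not be necessary")
```

Here 216 of the 400 units of total variability (.54) would be systematic variation outside the main effects, well above the .33 rule of thumb.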

Gist
• Discerning the true nature of treatment effects, e.g. for clinical outcomes, is not easy, and it is not accomplished just because one has run an experiment and seen a statistically significant effect
• Small but significant effects obtained with unreliable measures are not a reason to adopt any particular treatment, as most of the variance is due to poor measures and to subjects who do not respond similarly to that treatment
• One reason to suspect individual differences in response to the treatment is heterogeneity of variance
• For example, lots of variability in the treatment group, not so much in the control group
• Even with larger effects and reliable measures, a noticeable amount of the unaccounted-for variance may be due to subjects responding differently to the treatment
• Methods for dealing with the problem are outlined in Bryk and Raudenbush (hierarchical linear modeling), but one strategy may be to single out suspected covariates and control for them (ANCOVA or blocking)
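A quick descriptive check for the heterogeneity of variance mentioned above is to compare the group variances directly. The scores below are made up so that the treatment group is far more spread out than the control group; a formal test such as Levene's would be the usual next step and is not shown here.

```python
import statistics

# Hypothetical outcome scores
control = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0]
treatment = [6.0, 14.0, 9.0, 17.0, 4.0, 13.0]

# Sample variances (n - 1 in the denominator)
var_control = statistics.variance(control)
var_treatment = statistics.variance(treatment)

# Ratio of the larger to the smaller variance; values far above 1 suggest
# that subjects are not responding uniformly to the treatment
variance_ratio = max(var_control, var_treatment) / min(var_control, var_treatment)

print(f"control variance:   {var_control:.2f}")
print(f"treatment variance: {var_treatment:.2f}")
print(f"variance ratio:     {variance_ratio:.1f}")
```

The group means here are nearly identical, so a test on means alone would miss that some subjects improved markedly while others got worse.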

Repeated Measures and Hierarchical Linear Modeling
• Another issue with ANOVA designs again concerns the subject by treatment interaction, this time with regard to repeated measurements
• The RM design can be seen as a special case of HLM in which the repeated measure (e.g. time) is nested within subjects
• The outcome is predicted by the repeated measure as before, but one can allow the intercept and slope(s) to vary over subjects, and that variance is taken into account in the model
• In this manner the HLM approach specifically examines the treatment by subject interaction, giving a sense of the correlation between starting point and subsequent change

Repeated Measures and Hierarchical Linear Modeling
• Briefly, HLM is a regression approach in which intercepts and/or coefficients are allowed to vary depending on other variables
• As an example, the basic linear model for RM is the same
• However, the intercept may be allowed to vary as a function of another variable (in this case Subject)
• This gives a new regression equation (note how this compares to RM in the GLM notes)
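The equations from the original slide are not reproduced in this text, so what follows is only a sketch of a standard random-intercept formulation and not necessarily the notation the slide used. It assumes a single time predictor for simplicity, with observation $i$ nested within subject $j$; the symbols $\gamma_{00}$, $u_{0j}$, $\tau_{00}$, and $\sigma^2$ are conventional multilevel notation introduced here for illustration.

```latex
\begin{align*}
\text{Within subject:}  \quad y_{ij} &= \beta_{0j} + \beta_{1}\,\mathrm{Time}_{ij} + e_{ij},
  \qquad e_{ij} \sim N(0, \sigma^{2}) \\
\text{Between subjects:} \quad \beta_{0j} &= \gamma_{00} + u_{0j},
  \qquad u_{0j} \sim N(0, \tau_{00}) \\
\text{Combined:}        \quad y_{ij} &= \gamma_{00} + \beta_{1}\,\mathrm{Time}_{ij} + u_{0j} + e_{ij}
\end{align*}
```

The combined form is the ordinary regression of the outcome on time plus a subject-specific shift $u_{0j}$ in the intercept, which is what the `(1|Subject)` term in an lmer formula requests.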

Example with One-way
• From before: stress measured the week before, the week of, and the week after a midterm exam
• Using lmer (from the lme4 package) in R, a linear model in which time predicts stress level and the intercept is allowed to vary by subject reproduces the same ANOVA

    library(lme4)
    # Random intercept for each subject; Time is the repeated measure
    lmemod0 = lmer(Score ~ Time + (1 | Subject), data = rmdata)
    anova(lmemod0)

    Analysis of Variance Table
         Df Sum Sq Mean Sq F value
    Time  2  204.8   102.4  6.7467

Example with One-way
• However, if I were to allow the coefficients to vary, I would also note that starting point matters, in that there is a negative relation between the intercept and the general effect of time
• If one starts out stressed, there is less of a jump during the midterm and a stronger decline by the end
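The relation between starting point and change can be written out as a random-slope model. This is a sketch in conventional multilevel notation, assumed for illustration rather than taken from the slides, and it uses a single time coefficient even though the example has three time points; observation $i$ is nested within subject $j$.

```latex
\begin{align*}
y_{ij} &= \beta_{0j} + \beta_{1j}\,\mathrm{Time}_{ij} + e_{ij},
  \qquad e_{ij} \sim N(0, \sigma^{2}) \\
\beta_{0j} &= \gamma_{00} + u_{0j}, \qquad
\beta_{1j} = \gamma_{10} + u_{1j} \\
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
  &\sim N\!\left(
    \begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix}
  \right),
\qquad
\rho_{01} = \frac{\tau_{01}}{\sqrt{\tau_{00}\,\tau_{11}}}
\end{align*}
```

A negative $\rho_{01}$ is the pattern described above: subjects with a higher intercept $\beta_{0j}$ tend to have a lower time coefficient $\beta_{1j}$.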

Summary
• Even though ANOVA designs may seem straightforward on the surface, and even when one has control over the administration of the variable of interest, issues remain, and the basic approach may be inadequate for resolving the true nature of effects

Resources
• Zimmerman & Williams (1986)
• Bryk & Raudenbush (1988)
• Lyons & Howard (1991)
