Experiments

Experiments Pre and Post condition

Classic experimental design • Random assignment to control and treatment conditions • Why random assignment and control groups?

Classic experimental design • Random assignment helps with internal validity • Some threats to internal validity: • Experimenter/Subject expectation • Mortality bias • Is there an attrition bias such that subjects later in the research process are no longer representative of the larger initial group? • Selection bias • Without random assignment our treatment effects might be due to age, gender etc. instead of treatments • Evaluation apprehension • Does the process of experimentation alter results that would occur naturally? • Classic experimental design when done properly can help guard against many threats to internal validity

Classic experimental design • Posttest only control group design: • Experimental Group R X O1 • Control Group R O2 • With random assignment, groups should be largely equivalent such that we can assume the differences seen may be largely due to the treatment
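The "R" in the design notation can be made concrete with a small sketch of random assignment. This is only an illustration: the function name `randomly_assign`, the subject IDs, and the seed are assumptions made here, not part of the slides.

```python
import random

def randomly_assign(subject_ids, seed=None):
    """Shuffle subjects and split them into treatment and control halves."""
    rng = random.Random(seed)  # the seed only makes this sketch reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Ten hypothetical subjects, numbered 1 to 10.
treatment, control = randomly_assign(range(1, 11), seed=42)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because chance alone decides who lands in which group, characteristics such as age or gender should be roughly balanced across groups on average, which is the basis for attributing posttest differences to the treatment.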

Classic experimental design • Special problems involving control groups: • Control awareness • Is the control group aware it is a control group and is not receiving the experimental treatment? • Compensatory equalization of treatments • Experimenter compensating the control group's lack of the benefits of treatment by providing some other benefit for the control group • Unintended treatments • The ‘Hawthorne’ effect (as it is understood though not actually shown by the original study) might be an example

Mixed design: prepost experiments • Back to our basic control/treatment setup • A common use of mixed design includes a pre-test post test situation in which the between groups factor includes a control and treatment condition • Including a pretest allows: • A check on randomness • Added statistical control • Examination of within-subject change • 2 ways to determine treatment effectiveness • Overall treatment effect and in terms of change

Pre-test/Post-test • Random assignment • Observation for the two groups at time 1 • Introduction of the treatment for the experimental group • Observation of the two groups at time 2 • Note change for the two groups

Mixed design • 2 x 2 • Between subjects factor of treatment • Within subjects factor of pre/post • Example:

Group      Pre  Post
treatment   20    70
treatment   10    50
treatment   60    90
treatment   20    60
treatment   10    50
control     50    20
control     10    10
control     40    30
control     20    50
control     10    10

SPSS output • Why are we not worried about sphericity here? • No main effect for treatment (though “close” with noticeable effect) • Main effect for prepost (often not surprising) • Interaction

Interaction • The interaction suggests that those in the treatment are benefiting from it while those in the control are not improving due to the lack of the treatment

Another approach • Note that if the interaction is the only thing of interest, in this situation we could have provided those results with a simpler analysis • Essentially the question regards the differences among treatment groups regarding the change from time 1 to time 2. • t-test on the gain (difference) scores from pre to post
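The gain-score t-test can be sketched with the example data from the mixed design slide. This is a minimal sketch using only the standard library, assuming a pooled-variance (equal variances) independent-samples t-test; the helper names `gains` and `pooled_t` are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

# Example scores from the slides: (pre, post) for each subject.
treatment = [(20, 70), (10, 50), (60, 90), (20, 60), (10, 50)]
control = [(50, 20), (10, 10), (40, 30), (20, 50), (10, 10)]

def gains(pairs):
    """Post minus pre for each subject."""
    return [post - pre for pre, post in pairs]

def pooled_t(x, y):
    """Independent-samples t with pooled variance; returns (t, df)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, df

t_gain, df_gain = pooled_t(gains(treatment), gains(control))
print(f"mean gain: treatment={mean(gains(treatment))}, control={mean(gains(control))}")
print(f"t({df_gain}) = {t_gain:.3f}")
```

With these data the treatment group gains 40 points on average while the control group loses 2, giving t(8) of roughly 4.12.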

T-test vs. Mixed output • t² = F: the squared t from the gain-score test equals the F for the interaction in the mixed design
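The equivalence can be checked by hand on the example data. The sketch below computes the interaction F of the 2 x 2 mixed design from sums of squares and compares it with the squared gain-score t. It assumes equal group sizes and a pooled-variance t-test; all variable names are illustrative.

```python
from statistics import mean, variance

# Example scores from the slides: (pre, post) for each subject.
treatment = [(20, 70), (10, 50), (60, 90), (20, 60), (10, 50)]
control = [(50, 20), (10, 10), (40, 30), (20, 50), (10, 10)]
groups = [treatment, control]
n = len(treatment)  # equal group sizes assumed

grand = mean(score for g in groups for pair in g for score in pair)
time_means = [mean(pair[k] for g in groups for pair in g) for k in (0, 1)]

ss_inter = 0.0  # treatment x time interaction
ss_error = 0.0  # subject x time within groups (the within-subjects error)
for g in groups:
    group_mean = mean(score for pair in g for score in pair)
    cell_means = [mean(pair[k] for pair in g) for k in (0, 1)]
    for k in (0, 1):
        ss_inter += len(g) * (cell_means[k] - group_mean - time_means[k] + grand) ** 2
    for pair in g:
        subj_mean = mean(pair)
        for k in (0, 1):
            ss_error += (pair[k] - subj_mean - cell_means[k] + group_mean) ** 2

df_inter = 1
df_error = len(groups) * (n - 1)  # (groups)(n - 1)(times - 1) with 2 times
f_inter = (ss_inter / df_inter) / (ss_error / df_error)

# Gain-score t-test with pooled variance, for comparison.
gain_t = [post - pre for pre, post in treatment]
gain_c = [post - pre for pre, post in control]
pooled_var = ((n - 1) * variance(gain_t) + (n - 1) * variance(gain_c)) / (2 * n - 2)
t_gain = (mean(gain_t) - mean(gain_c)) / (pooled_var * (2 / n)) ** 0.5

print(f"interaction F({df_inter}, {df_error}) = {f_inter:.3f}")
print(f"t squared = {t_gain ** 2:.3f}")
```

Both values come out at roughly 16.96, so the two analyses carry the same information about the treatment-by-time interaction.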

Different approaches • We could analyze this situation in yet another way. • Analysis of covariance would provide a description of differences among treatment groups at post while controlling for individual differences at pre • Note how our research question now shifts to one in which our emphasis is in differences at time 2, rather than describing differences in the change from time1 to time 2.

Pre-test • Special problems of before-after studies: • Instrumentation change • Variables are not measured in the same way in the before and after studies. • A common way for this to occur is when the observers/raters, through experience, become more adept at measurement. • History (intervening events) • Events not part of the study intervene between the before and after studies and have an effect • Maturation • Invalid inferences may be made when the maturation of the subjects between the before and after studies has an effect (e.g., the effect of experience), but maturation has not been included as an explicit variable in the study. • Regression toward the mean • If subjects are chosen because they are above or below the mean, one would expect they will be closer to the mean on remeasurement, regardless of the intervention. For instance, if subjects are sorted by skill and then administered a skill test, the high and low skill groups will probably be closer to the mean than expected. • Test experience • The before study impacts the after study in its own right, or multiple measurement of a concept leads to familiarity with the items and hence a history or fatigue effect.

Pre-test sensitization • So what if exposure to the pretest automatically influences posttest results, or possibly, how well the treatment will have its effect? • Example, testing the effects of a study course for GRE • One would imagine that having taken the test previously would definitely have an effect on the second go around

Solomon 4-group design • Can look at the effects of a pretest

Including a pretest can sensitize participants and create a threat to construct validity. Combining the two basic designs creates the Solomon 4-group design, which can determine if pretest sensitization is a problem:

R     X  O
R        O
R  O  X  O
R  O     O

If the two treated groups (R X O and R O X O) differ at posttest, pretest sensitization is an issue. If the two untreated groups (R O and R O O) differ at posttest, there is a testing effect.

Solomon 4-group design • Why not used so much? • Requires more groups • However, it has been shown that this does not necessarily mean more subjects • Even if overall N is maintained with the switch to S4, it may have more power than a posttest only situation • Not too many are interested in pretest sensitization • Regardless, one should control for it when possible, just like we’d control for other unwanted effects

Solomon 4-group design • Complexity of design and interpretation • Although understandable, as usual this is not a good reason for not doing a particular type of analysis • Lack of understanding of how to analyze • How do we analyze it?

Solomon 4-group design • We can analyze the data in different ways • One-way anova on the four post-test results • Treat all four groups as part of a 4 level factor • Contrast treatment groups vs. non • This would not however allow for us to get a sense of change/gain

Braver & Braver approach • 2 x 2 Factorial design with control/treat, pre/not as between subjects factors • Test A: significant interaction would suggest pretest effect • Effect of treatment changes depending on whether there is pretest exposure or not

Test B & C: simple effects • Significant simple effect for treatment when the pretest is present but not when it is absent • O2 > O4, O5 = O6 • In other words, the treatment works, but only with a pretest • If this is the case, terminate analysis • Treatment effects are due to pretest

Pre-post • However, could there be a treatment effect in spite of the pretest effect? • In other words, could the pretest merely be providing an enhancement of the treatment? • Ex. Kaplan/Princeton Review class helps in addition to the effect of having taken the GRE before • If test C is significant also (still assuming sig interaction) we could conclude that was the case

Pre-post • If no interaction, check main effect of treatment (test D) • If sig, then treatment effect w/o pretest effect • However this is not the most powerful course of action, and if not sig may not be indicative of no treatment effect b/c disregarding the pre data (less power)

Pre-post • Better would be to use analysis of covariance that takes into account differences among individuals at pretest (Test E) • T-test on gain scores (Test F) • Or mixed design (Test G) • As mentioned, F and the interaction in G are identical to one another • However test E will more likely have additional power

Ancova • We can interpret the ANCOVA as allowing for a test of the treatment after posttest scores have been adjusted for the pretest scores • Basically boils down to: • What difference at post would we see if the participants had scored the same at pre? • We are partialling out the effects of pre to determine the effect of the treatment on posttest scores
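The "what difference at post would we see if the participants had scored the same at pre?" idea can be sketched as an adjusted mean difference. This is a rough illustration on the example data, assuming the usual ANCOVA condition of a common within-group slope; it gives only the point estimate, not the significance test, and the helper name `summarize` is made up here.

```python
from statistics import mean

# Example scores from the slides: (pre, post) for each subject.
treatment = [(20, 70), (10, 50), (60, 90), (20, 60), (10, 50)]
control = [(50, 20), (10, 10), (40, 30), (20, 50), (10, 10)]

def summarize(pairs):
    """Return pre mean, post mean, SS of pre, and pre-post cross-products."""
    pre = [p for p, _ in pairs]
    post = [q for _, q in pairs]
    m_pre, m_post = mean(pre), mean(post)
    ss_pre = sum((x - m_pre) ** 2 for x in pre)
    sp = sum((x - m_pre) * (y - m_post) for x, y in pairs)
    return m_pre, m_post, ss_pre, sp

pre_t, post_t, ss_t, sp_t = summarize(treatment)
pre_c, post_c, ss_c, sp_c = summarize(control)

# Pooled within-group regression slope of post on pre (assumed equal across groups).
slope = (sp_t + sp_c) / (ss_t + ss_c)

raw_diff = post_t - post_c
# Remove the part of the posttest difference predicted by the pretest difference.
adjusted_diff = raw_diff - slope * (pre_t - pre_c)

print(f"pooled slope = {slope:.3f}")
print(f"raw posttest difference = {raw_diff}")
print(f"pretest-adjusted difference = {adjusted_diff:.2f}")
```

Here the groups start out nearly equal at pre (means of 24 and 26), so the adjustment is small: the raw difference of 40 becomes about 41.05.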

In SPSS • The ancova (or other tests) will only concern groups one and two as they are the only ones w/ pre-tests

If the Ancova results (or test F or G) show the treatment to still have an effect, we can conclude that the treatment has some utility beyond whatever effects the pre-test has on the post-test • If not significant, we may perform yet another test

Test H • t-test comparing groups 3 and 4 (O5 and O6) • Less power compared to others (only half the data and no pre info) but if it is significant despite the lack of power we can assume some treatment effect
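The sequence of tests A through H can be summarized as a decision sketch. This is one reading of the slides rather than the definitive procedure: it assumes test B is the simple effect of treatment with a pretest and test C the simple effect without one, and every argument is a hypothetical flag for whether that test was significant.

```python
def braver_sequence(a_sig, b_sig=False, c_sig=False, d_sig=False,
                    efg_sig=False, h_sig=False):
    """Map a pattern of significant tests to the conclusion given in the slides."""
    if a_sig:  # Test A: treatment x pretest interaction
        if b_sig and not c_sig:
            return "treatment works only with a pretest; terminate analysis"
        if b_sig and c_sig:
            return "treatment effect, enhanced by the pretest"
        return "pattern not covered in the slides"
    if d_sig:  # Test D: main effect of treatment
        return "treatment effect without a pretest effect"
    if efg_sig:  # Test E (ANCOVA), F (gain t-test), or G (mixed design)
        return "treatment has utility beyond the pretest effect"
    if h_sig:  # Test H: t-test on the two groups with no pretest
        return "some treatment effect"
    return "nothing significant so far; consider combining the tests"

print(braver_sequence(a_sig=True, b_sig=True))
print(braver_sequence(a_sig=False, d_sig=False, efg_sig=True))
```

The flags would come from whatever software produced the tests; the function only encodes the order in which the results are consulted.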

Meta-analysis • Even if this test is not significant, Braver & Braver (1988) suggest a meta-analytic technique that combines the results of the previous two tests (e.g. test E and H) • Note how each is done only with a portion of the data • More power from a consideration of all the data • Take p-value from each test, convert to a one-tailed z-score, add the two z-scores and divide by √2 (i.e. the number of z-scores involved) to give zmeta • If that shows significance* then we can conclude a treatment effect • Nowadays might want to use effect size r or d for the meta-analysis (see Hunter and Schmidt) as there are obvious issues in using p-values • One might also just examine the Cohen’s d for each (without analysis) and draw their conclusion from that *A two-tailed probability is given for zmeta
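The z-score combination described above can be sketched as follows. The function name `z_meta` and the two p-values are hypothetical, and the sketch assumes each input is already a one-tailed p-value in the predicted direction.

```python
from math import sqrt
from statistics import NormalDist

def z_meta(one_tailed_ps):
    """Combine one-tailed p-values: sum the z-scores, divide by sqrt(k)."""
    std_normal = NormalDist()
    zs = [std_normal.inv_cdf(1 - p) for p in one_tailed_ps]
    z = sum(zs) / sqrt(len(zs))
    p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_two_tailed

# Hypothetical one-tailed p-values, e.g. one from test E and one from test H.
z, p = z_meta([0.05, 0.05])
print(f"z_meta = {z:.3f}, two-tailed p = {p:.3f}")
```

With these made-up inputs, two results that each sit at a one-tailed .05 combine to a z of about 2.33 and a two-tailed p of about .02, which illustrates where the added power comes from.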

Problems with the meta-analytic technique for Solomon 4 group design • Note that the meta-analytic approach may not always be the more powerful test depending on the data situation • Sawilowsky and Markman (1990) show a case where the other tests are significant but the meta-analysis is not • They also point out that by only doing the meta-analysis in the face of nonsignificant results we are forcing an inclusion criterion for the meta-analysis (selection bias)

Problems • Braver and Braver acknowledge that the meta-analytic technique should be conducted regardless of the outcomes of the previous tests • If tests A & D are nonsig, do all steps on the right side • However they note that the example Sawilowsky used had a slightly negative correlation b/t pre and post for one setup, and an almost negligible positive corr in the other, and only one mean was significantly different from the others • Probably not a likely scenario • Since their discussion the Braver and Braver approach has been shown to be useful in the applied setting, but there still may be concerns regarding type I error rate • Gist: be cautious in interpretation, but feel free to use if you suspect pre-test effects

MC’s summary/take • 1. Do all the tests on the right side if test A and D nonsig • If there is a treatment effect but not a pretest effect, the meta-analysis is more powerful for moderate and large sample sizes • With small sample sizes the classical ANCOVA is slightly more powerful • As the ANCOVA makes use of pretest scores, it is noticeably more powerful than the meta-analysis, whereas the t test is only slightly more powerful than the meta-analysis. • When a pretest either augments or diminishes the effectiveness of the treatment, the ANCOVA or t test is typically more powerful than the meta-analysis. • 2. Perhaps apply an FDR correction to the analyses conducted on the right side to control for type I error rate • 3. Focus on effect size to aid your interpretation

More things to think about in experimental design • The relationship of reliability and power • Treatment effect not the same for everyone • Some benefit more than others • Sounds like no big deal (or even obvious), but all of these designs discussed assume equal effect of treatment for individuals

Reliability • What is reliability? • Often thought of as consistency, but this is a by-product of reliability • Not to mention that you could have perfectly consistent scores lacking variability for which one could not obtain measures of reliability • Reliability really refers to a measure’s ability to capture an individual’s true score, i.e. to distinguish accurately one person from another on some measure • It is the correlation of scores on some measure with their true scores regarding that construct

Classical True Score Theory • Each subject’s score is true score + error of measurement • Obsvar = Truevar + Errorvar • Reliability = Truevar/ Obsvar = 1 – Errorvar/ Obsvar

Reliability and power • Reliability = Truevar/ Obsvar = 1 – Errorvar/ Obsvar • If observed variance goes up, power will decrease • However if observed variance goes up, we don’t know automatically what happens to reliability • Obsvar = Truevar + Errorvar • If it is error variance that is causing the increase in observed variance, reliability will decrease • Reliability goes down, Power goes down • If it is true variance that is causing the increase in observed variance, reliability will increase • Reliability goes up, Power goes down
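The two scenarios above can be checked with simple arithmetic. The variance values below are made up purely for illustration.

```python
def reliability(true_var, error_var):
    """Reliability = true variance / observed variance."""
    observed_var = true_var + error_var
    return true_var / observed_var

baseline = reliability(true_var=50, error_var=50)     # observed variance 100
more_error = reliability(true_var=50, error_var=100)  # observed variance 150
more_true = reliability(true_var=100, error_var=50)   # observed variance 150

print(f"baseline:            {baseline:.3f}")
print(f"more error variance: {more_error:.3f}")
print(f"more true variance:  {more_true:.3f}")
```

In both altered scenarios the observed variance rises from 100 to 150, yet reliability falls in one case and rises in the other, which is why a change in observed variance alone says nothing definite about reliability.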

Error in Anova • Typical breakdown • SStot = SSb/t + SSe • Variation due to treatment and random variation (error) • F = MSb/MSe

Error in Anova • Classical True Score Theory • Each subject’s score = true score + error of measurement • MSe can thus be further partitioned • Variation due to true differences on scores between subjects and error of measurement (unreliability) • MSe = MSer + MSes • MSer regards measurement error • MSes regards systematic differences between individuals • MSes has two sources • Reliable individual differences • Reliable treatment differences (subject by treatment interaction)

The reliability of the measure will determine the extent to which the two sources of variability (MSer or MSes) contribute to the overall MSe • Rel=1.00, MSer = 0 • Error term is a reflection only of systematic individual differences • Rel=0.00, MSes = 0 • Error term is a reflection of measurement error only • MSer = (1-Rel)MSe • MSes = (Rel)MSe
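The partition of the error term follows directly from the two formulas above. In this small sketch the function name `partition_mse` and the input values are assumptions for illustration only.

```python
def partition_mse(mse, rel):
    """Split MSe into measurement error (MSer) and systematic (MSes) parts."""
    if not 0.0 <= rel <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    ms_er = (1 - rel) * mse  # portion due to error of measurement
    ms_es = rel * mse        # portion due to systematic individual differences
    return ms_er, ms_es

# Hypothetical error mean square and reliability.
ms_er, ms_es = partition_mse(mse=130.0, rel=0.8)
print(f"MSer = {ms_er:.1f}, MSes = {ms_es:.1f}")
```

At the extremes the function reproduces the two boundary cases: a reliability of 1.00 leaves MSer at 0, and a reliability of 0.00 leaves MSes at 0.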

We can test to see if systematic variation is significantly larger than variation due to error of measurement

With a reliable measure, the bulk of MSe will be attributable to systematic individual differences • However with strong main effects/interactions, we might see sig F for this test even though the contribution to model is not very much • Calculate an effect size (eta-squared) • SSes/SStotal • Lyons and Howard suggest (based on Cohen’s rules of thumb) that < .33 would suggest that further investigation may not be necessary • How much of the variability seen in our data is due to systematic variation outside of the main effects? • Subjects responding differently to the treatment

One reason to perhaps suspect individual differences due to the treatment would be heterogeneity of variance • For example, lots of variability in treatment group, not so much in control • Methods for dealing with the problem are outlined in Bryk and Raudenbush (hierarchical linear modeling), but one strategy may be to single out suspected covariates

Resources • Zimmerman & Williams (1986) • Bryk & Raudenbush (1988) • Lyons & Howard (1991)
