This presentation covers the critical preconditions for conducting meaningful and rigorous evaluations of educational interventions: a well-specified intervention, grounding in theory and prior research, an identified target population, intended outcomes, a theory of change, an operators' manual, a rationale for the intervention, implementability, preliminary evidence of effectiveness, and amenable research sites and circumstances. It also reviews experimental design principles, including control by matching, control by randomization, and control by statistical adjustment.
Conditions for Meaningful Investigation of Educational Interventions Lipsey, 2010, IES/NCER Summer Institute on RCTs
Conditions for Meaningful Investigation of Educational Interventions • Caveat: “Rigorous evaluations” are not appropriate for every intervention or every research project involving an intervention • They require special resources (funding, amenable circumstances, expertise, time) • They can produce misleading or uninformative results if not done well • The preconditions for making them meaningful may not be met. Lipsey, 2010, IES/NCER Summer Institute on RCTs
Critical preconditions for rigorous evaluation • A well-specified, fully developed intervention with useful scope • basis in theory and prior research • identified target population • specification of intended outcomes/effects • “theory of change” explication of what it does and why it should have the intended effects for the intended population • operators’ manual: complete instructions for implementing • ready-to-go materials, training procedures, software, etc. Lipsey, 2010, IES/NCER Summer Institute on RCTs
Critical preconditions (continued) • A plausible rationale that the intervention is needed; reason to believe it has advantages over what's currently proven and available • Clarity about the relevant counterfactual: what it is supposed to be better than • Demonstrated "implementability": can be implemented well enough in practice to plausibly have effects • Some evidence that it can produce the intended effects, albeit short of the standards for rigorous evaluation. Lipsey, 2010, IES/NCER Summer Institute on RCTs
Critical preconditions (continued) • Amenable research sites and circumstances: • cooperative schools, teachers, parents, and administrators willing to participate • student sample appropriate in terms of representativeness and size for showing educationally meaningful effects • access to students (e.g., for testing), records, classrooms (e.g., for observations) Lipsey, 2010, IES/NCER Summer Institute on RCTs
Moving from Research Questions to Experimental Design and Analysis
Experimental Design • Recall that we said earlier that: • Experimental Design includes both • Strategies for organizing data collection, and • Data analysis procedures matched to those data collection strategies • Experimental design helps us to answer questions as “validly, objectively, accurately, and economically as possible” – Kerlinger • MaxMinCon principle: maximize systematic variance, minimize error variance, control extraneous variance
Principles of Experimental Design • Three basic principles for controlling background variability • Control by matching • Control by randomization • Control by statistical adjustment Hedges, 2010, IES/NCER Summer Institute on RCTs
Control by Matching • Strengths • Can be used to eliminate known sources of variability • genetic variation in animal learning experiments • organizational variation in school-based research • Limitations: • only possible on observable characteristics • perfect matching is not always possible • matching inherently limits generalizability by removing (possibly desired) variation Hedges, 2010, IES/NCER Summer Institute on RCTs
Control by Randomization • Matching controls for the effects of variation due to specific observable characteristics • Randomization controls for the effects of all (observable or non-observable, known or unknown) characteristics • Randomization makes groups equivalent (on average) on all variables (known and unknown, observable or not) • Randomization also gives us a way to assess whether differences after treatment are larger than would be expected due to chance. Hedges, 2010, IES/NCER Summer Institute on RCTs
Control by Randomization • Random assignment is not assignment with no particular rule. It is a purposeful process. Assignment of schools to treatments is made at random. This does not mean that the experimenter assigns schools to treatments in any order that occurs to her, but that she carries out a physical experimental process of randomization, using means which shall ensure that each treatment will have an equal chance of being tested in any particular school (Hedges, 2007) Hedges, 2010, IES/NCER Summer Institute on RCTs
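To make this concrete, here is a minimal sketch, not from the original slides, of what a purposeful randomization procedure can look like in practice; the school names, the fixed seed, and the even treatment/control split are illustrative assumptions.

```python
# Minimal sketch (illustrative only): randomly assigning schools to treatment
# and control so each school has an equal chance of either condition.
import random

schools = [f"school_{i:02d}" for i in range(1, 21)]  # 20 hypothetical schools

random.seed(42)          # fixed seed so the assignment is reproducible and auditable
random.shuffle(schools)  # the deliberate randomization step

half = len(schools) // 2
assignment = {s: "treatment" for s in schools[:half]}
assignment.update({s: "control" for s in schools[half:]})

for school, condition in sorted(assignment.items()):
    print(school, condition)
```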
Control by Statistical Adjustment • Control by statistical adjustment is a form of pseudo-matching • It uses statistical relations to simulate matching • Statistical control is important for increasing precision but should not be relied upon to control biases that may exist prior to assignment • Statistical control is the weakest of the three experimental design principles because its validity depends on knowing a statistical model for responses Hedges, 2010, IES/NCER Summer Institute on RCTs
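As an illustration of statistical adjustment as pseudo-matching, the sketch below (not from the slides) adjusts a treatment contrast for a pretest covariate with an ordinary regression; the simulated data, variable names, and effect sizes are assumptions, and the adjusted estimate is only as good as the regression model it relies on.

```python
# Minimal sketch (simulated data; variable names are illustrative): statistical
# adjustment of a treatment contrast for a pretest covariate (ANCOVA-style).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
pretest = rng.normal(50, 10, n)
treat = rng.integers(0, 2, n)                       # here, assigned at random
posttest = 0.8 * pretest + 5 * treat + rng.normal(0, 8, n)

df = pd.DataFrame({"pretest": pretest, "treat": treat, "posttest": posttest})

# The adjustment depends on this statistical model being approximately correct.
model = smf.ols("posttest ~ treat + pretest", data=df).fit()
print(model.params["treat"])   # covariate-adjusted estimate of the treatment effect
```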
Using Principles of Experimental Design • You have to know a lot (be smart) to use matching and statistical control effectively • You do not have to be smart to use randomization effectively • But where all three are possible, randomization is not as efficient (it requires larger sample sizes for the same power) as matching or statistical control Hedges, 2010, IES/NCER Summer Institute on RCTs
Randomization • Randomization facilitates causal inferences involving interventions in education • Randomization allows chance to enter into intervention studies in a known, i.e., controlled, way • Randomization is “the only method for controlling all possible extraneous variables”
Research Design Specification of: • The means by which the units of observation (e.g., students, teachers, classrooms, schools) come to be observed (i.e., the method of selection) • The conditions under which observations are made • The means by which the units of observation find themselves in their respective conditions (i.e., the method of assigning units to conditions)
Moving from Research Questions to Design: An Expanded List of Questions to Consider • Who will be assessed/observed? (selection) • What will be assessed/observed? (measurement) • When and how often will assessments/observations take place? • Under what conditions will assessments/observations take place, and what are the means by which units and conditions become paired (assignment)? • How will the assessments/observations be analyzed so as to address the specific aims of the study?
Who will be assessed (selection)? • Clearly relevant for external validity • Distinguish between target and accessible populations • Concerns selection, but also much more • Heterogeneity of target population • Multitude of factors and individuals affecting student outcomes • Sampling units – students, teachers, schools, locations (contexts) • Cost – should all assessments be collected on the entire sample, or should planned missingness be used?
What will be assessed? • What skill domains need to be assessed – • Only those likely to be affected by the treatment, or also • Skills not likely to be affected by the treatment? • Proximal and distal effects of the treatment? • Skills or factors which mediate the effect of treatment? • Skills or factors which moderate the effect of treatment?
What will be assessed (cont.)? • What kind of assessment/observation is needed? • NRT (norm-referenced test)? • CRT (criterion-referenced test)? • Growth measure? • In what language(s)? • Individual or group based? • Paper & pencil, performance based, survey/self-report? • Observations? • Treatment fidelity?
When and how often do I assess? • Beginning of year and/or end of year? • Multiple times (> 2) per year? • What is the window of opportunity for assessment – i.e. how much time can elapse between the assessment of the first subject and last subject at a given occasion of measurement? • Before treatment begins? If so, how many times before treatment begins?
Under what conditions will assessments/observations take place? • What conditions are the most important to study given the aims of the research? • Should these conditions be studied within person, or between person? • Will there be contamination or carry-over if individuals receive more than one condition? • Will individual differences mitigate failures to deliver precisely the same conditions to all individuals?
What are the means by which units and conditions become paired (assignment)? • Is random assignment possible? • Will random assignment be feasible? • Is random assignment desirable? • If random assignment is not possible, can assignment be controlled or can factors affecting assignment be measured prior to assignment (propensity score matching)?
General points to consider • No one study on its own will completely reduce uncertainty about the effects of an intervention. • Knowledge results from an accumulation of evidence about an intervention. • Answers to questions like the ones posed on the preceding slides embody a theory of change for the intervention under study. • This theory of change guides the individual investigations and the overall body of work on an intervention.
General Points to Consider • The IES Goal Structure for intervention development specifies three phases of intervention development and testing: Development, Efficacy, and Scale Up (i.e., Effectiveness) • These three stages of development are analogous, but not identical to the different phases of clinical trials outlined by the FDA. • The phases differ in the scale and focus of the research, and the experimental control that is needed.
IES Goal Structure • Goal 2 (intervention development) for advancing intervention concepts to the point where rigorous evaluation of their effects may be justified • Goal 3 (efficacy studies) for determining whether an intervention can produce worthwhile effects; RCT evaluations preferred. • Goal 4 (effectiveness studies) for investigating the effects of an intervention implemented under realistic conditions at scale; RCT evaluations preferred, with independent evaluators.
Theory of Change • Theory of change in intervention research is analogous to the mechanism of action in pharmaceutical research, but is considerably more extensive. • The mechanism of action is generally restricted to the biological, chemical, and physiological processes through which a drug exerts its influence. • In educational interventions, the list of potential actors is greater and context potentially plays a greater role that must be taken into account.
Specifying the Theory of Change • Nature of the need addressed • what and for whom (e.g., 2nd grade students who don’t read well) • why (e.g., poor decoding skills, limited vocabulary) • where the issues addressed fit in the developmental progression (e.g., prerequisites to fluency and comprehension, assumes concepts of print) • rationale/evidence supporting these specific intervention targets at this particular time Lipsey, 2010, IES/NCER Summer Institute on RCTs
Specifying the theory of change • How the intervention addresses the need and why it should work • content: what the student should know or be able to do; why this meets the need • pedagogy: instructional techniques and methods to be used; why appropriate • delivery system: how the intervention will arrange to deliver the instruction Lipsey, 2010, IES/NCER Summer Institute on RCTs
Specifying the theory of change • Most important: What aspects of the above are different from the counterfactual condition • What are the key factors or core ingredients most essential and distinctive to the intervention Lipsey, 2010, IES/NCER Summer Institute on RCTs
Logic models as theory schematics • Example logic model: Target Population (4-year-old pre-K children) → Intervention (exposed to intervention) → Proximal Outcomes (improved pre-literacy skills; positive attitudes to school; learning appropriate school behavior) → Distal Outcomes (increased school readiness; greater cognitive gains in K) Lipsey, 2010, IES/NCER Summer Institute on RCTs
Mapping variables onto the intervention theory: Sample characteristics • (Same logic model as above, annotated with sample-level variables) • Sample descriptors: basic demographics; diagnostic, need/eligibility identification; nuisance factors (for variance control) • Potential moderators: setting, context; personal and family characteristics; prior experience Lipsey, 2010, IES/NCER Summer Institute on RCTs
Mapping variables onto the intervention theory: Intervention characteristics • (Same logic model as above, annotated with intervention-level variables) • Independent variable: T vs. C experimental condition • Generic fidelity: T and C exposure to the generic aspects of the intervention (type, amount, quality) • Specific fidelity: T and C(?) exposure to distinctive aspects of the intervention (type, amount, quality) • Potential moderators: characteristics of personnel; intervention setting, context (e.g., class size)
Mapping variables onto the intervention theory: Intervention outcomes • (Same logic model as above, annotated with outcome variables) • Focal dependent variables: pretests (pre-intervention); posttests (at end of intervention); follow-ups (lagged after end of intervention) • Other dependent variables: construct controls (related DVs not expected to be affected); side effects (unplanned positive or negative outcomes); mediators (DVs on causal pathways from intervention to other DVs)
Main relationships of (possible) interest • Causal relationship between IV and DVs (effects of causes); tested as T-C differences • Duration of effects post-intervention; growth trajectories • Moderator relationships; ATIs (aptitude-treatment interactions): differential T effects for different subgroups; tested as T x M interactions or T-C differences between subgroups • Mediator relationships: stepwise causal relationship in which the effect on one DV causes an effect on another; tested via Baron & Kenny (1986) or SEM-type techniques. Lipsey, 2010, IES/NCER Summer Institute on RCTs
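A minimal sketch of how a moderator relationship is typically tested as a T x M interaction in a regression model; the simulated data and coefficient values below are illustrative assumptions, not results from any study cited here.

```python
# Minimal sketch (simulated data; names are illustrative): testing moderation
# as a treatment-by-moderator (T x M) interaction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300
T = rng.integers(0, 2, n)                 # treatment vs. control indicator
M = rng.normal(0, 1, n)                   # e.g., a pretest aptitude score
y = 0.3 * T + 0.5 * M + 0.4 * T * M + rng.normal(0, 1, n)  # built-in interaction

df = pd.DataFrame({"T": T, "M": M, "y": y})
model = smf.ols("y ~ T * M", data=df).fit()   # expands to T + M + T:M
print(model.summary().tables[1])              # the T:M row is the moderation test
```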
Formulation of the research questions • Organized around key variables and relationships • Specific with regard to the nature of the variables and relationships • Supported with a rationale for why the question is important to answer • Connected to real-world education issues • What works, for whom, under what circumstances, how, and why? Lipsey, 2010, IES/NCER Summer Institute on RCTs
Selecting Outcomes: Achievement effect sizes from 124 randomized education studies Lipsey, 2010, IES/NCER Summer Institute on RCTs
Other points to Consider in Selecting Measures • Measurement Invariance in Longitudinal Studies • Differential Test and Item Functioning in Special Populations • Vertical Scales in Developmental Research • Precision of the Test within the Targeted Ability Range of the Intervention • Variability in Rater Criteria when Observers are involved in Assessment and Fidelity
Making Statistical Inferences: Moving from Question to Design and Analysis • There are 3 fundamental components to making statistical inferences about unobserved populations: 1) sampling, 2) estimation, 3) probability
Sampling Models in Educational Research • Sampling is where the randomness comes from in social research • As such, sampling has profound consequences for statistical analysis and research design • Consider two simple random samples: • Sample A based on N=20 • Sample B based on N=200 • Which is better?
Sampling Models in Educational Research • Sample B will lead to a more precise estimate of the parameter of interest. • For example, suppose we are estimating the mean with the sample mean and the population has variance σ_T² • Then the sample mean has variance σ_T²/N, which will be smaller for Sample B than for Sample A
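A short sketch of the arithmetic, assuming an illustrative population variance of σ_T² = 100: under simple random sampling the variance of the sample mean is σ_T²/N, so Sample B (N = 200) is ten times as precise as Sample A (N = 20).

```python
# Minimal sketch (sigma_T^2 = 100 is an illustrative value): variance of the
# sample mean under simple random sampling is sigma_T^2 / N.
sigma_t2 = 100.0
for label, n in [("Sample A", 20), ("Sample B", 200)]:
    print(label, "variance of the mean =", sigma_t2 / n)
# Sample A -> 5.0, Sample B -> 0.5
```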
Sampling Models in Educational Research • Simple random samples are rare in field research • Educational populations are hierarchically nested: • Students in classrooms in schools • Schools in districts in states • This structure has implications for sampling and for the precision of estimates Hedges, IES/NCER 2010, Summer Institute on RCT’s
Sampling Models in Educational Research • Suppose we use two-stage sampling to sample schools and then students within schools • Which sample will provide a more precise estimate of the mean? • Sample A, with N = 1,000 • Sample B, with N = 2,000 • We cannot tell unless we know the number of schools (m) and number of students (n) in each school Hedges, IES/NCER 2010, Summer Institute on RCT’s
Precision of Estimates Depends on the Sampling Model • In a clustered sample of n students from each of m schools • The variance of the mean is (σ_T²/mn)[1 + (n – 1)ρ] • The inflation factor [1 + (n – 1)ρ] is called the design effect • ρ is the intra-class correlation – the proportion of total variance that is between schools. Hedges, IES/NCER 2010, Summer Institute on RCTs
Example • Suppose ρ = 0.20 • Sample A: m = 100 and n = 10, so N = 1,000; σ_M² = (σ_T²/(100 × 10))[1 + (10 – 1)(0.20)] = (σ_T²/1000)(2.8) • Sample B: m = 20 and n = 100, so N = 2,000; σ_M² = (σ_T²/(20 × 100))[1 + (100 – 1)(0.20)] = (σ_T²/2000)(20.8) = (σ_T²/1000)(10.4) Hedges, IES/NCER 2010, Summer Institute on RCTs
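The sketch below simply re-computes the example above; the function name is my own, and σ_T² is kept symbolic by reporting the multiplier of σ_T²/1000.

```python
# Minimal sketch reproducing the example (rho = 0.20 as above).
def variance_of_mean(m, n, rho, sigma_t2=1.0):
    """Variance of the grand mean under two-stage (clustered) sampling."""
    design_effect = 1 + (n - 1) * rho
    return (sigma_t2 / (m * n)) * design_effect

for label, m, n in [("Sample A", 100, 10), ("Sample B", 20, 100)]:
    v = variance_of_mean(m, n, rho=0.20)
    print(label, "variance =", round(v * 1000, 1), "x sigma_T^2 / 1000")
# Sample A -> 2.8, Sample B -> 10.4: the sample with more clusters is more precise.
```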
Precision of Estimates Depends on the Sampling Model • If we have n students nested within each of p classrooms nested within each of m schools, the variance inflation factor has two components. • Let ρ_S and ρ_C be the school- and class-level ICCs, respectively. Then • σ_M² = (σ_T²/mpn)[1 + (pn – 1)ρ_S + (n – 1)ρ_C] • The three-level design effect is [1 + (pn – 1)ρ_S + (n – 1)ρ_C] Hedges, IES/NCER 2010, Summer Institute on RCTs
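A corresponding sketch for the three-level case; the values of m, p, n, ρ_S, and ρ_C are illustrative assumptions, chosen only to show how the two ICC terms inflate the variance.

```python
# Minimal sketch (m, p, n, rho_s, rho_c are illustrative assumptions):
# the three-level design effect from the formula above.
def three_level_variance(m, p, n, rho_s, rho_c, sigma_t2=1.0):
    """Variance of the grand mean with n students in p classes in m schools."""
    design_effect = 1 + (p * n - 1) * rho_s + (n - 1) * rho_c
    return (sigma_t2 / (m * p * n)) * design_effect

v = three_level_variance(m=30, p=3, n=20, rho_s=0.15, rho_c=0.10)
print(round(v, 5))  # compare with sigma_T^2 / (m*p*n) = 1/1800 for a simple random sample
```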
Precision of Estimates Depends on the Sampling Model • The total variance can be partitioned into between-cluster (σ_B²) and within-cluster (σ_W²) variance • We define the intraclass correlation as the proportion of total variance that is between clusters • There is typically much more variance within clusters (σ_W²) than between clusters (σ_B²) Hedges, IES/NCER 2010, Summer Institute on RCTs
Precision of Estimates Depends on the Sampling Model • School-level ICCs typically range from 0.10 to 0.25 • This means that σ_W² is between 9 and 3 times as large as σ_B² • So why does σ_B² have such a big effect on the variance of the mean? Hedges, IES/NCER 2010, Summer Institute on RCTs
Precision of Estimates Depends on the Sampling Model • Because averaging (independent things) reduces variance • The variance of the mean of a sample of m clusters of size n can be written as σ_M² = σ_B²/m + σ_W²/(mn) • The cluster effects are only averaged over the number of clusters Hedges, IES/NCER 2010, Summer Institute on RCTs
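A small sketch of why the number of clusters matters more than the number of students per cluster, using the decomposition above with illustrative values σ_B² = 0.2 and σ_W² = 0.8 (an ICC of 0.20): the between-cluster term shrinks only as m grows.

```python
# Minimal sketch (sigma_B^2 = 0.2, sigma_W^2 = 0.8 are illustrative values,
# i.e., an ICC of 0.20): the between-cluster term is averaged only over m clusters.
def var_of_mean(m, n, sigma_b2=0.2, sigma_w2=0.8):
    return sigma_b2 / m + sigma_w2 / (m * n)

print(var_of_mean(m=20, n=10))    # 0.014
print(var_of_mean(m=20, n=1000))  # 0.01004: adding students barely helps
print(var_of_mean(m=200, n=10))   # 0.0014: adding clusters helps a great deal
```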