Estimating Causal Effects: Using Experimental and Observational Designs

Anna Haskins, Nick Mader, and Hilary Shager • ITP Seminar • October 26, 2007

Overview • What are causal effects and how can we measure them? • Randomized controlled trials—benefits and limitations • Quasi-experimental methods for estimating causal effects • Propensity score matching, regression discontinuity design, fixed effects, instrumental variables, others • Applying what we’ve learned • Benefits and limitations of using large-scale databases • Practical resources for YOU

Why focus on causal effects? • There have been many *bad* research papers produced, and we need to do a better job of conducting high quality research in education • Within the world of researchers, educators, and policymakers there is a lack of clarity regarding which analytic or methodological approaches are most appropriate for making causal inferences about the effectiveness of educational interventions

Institutional support for RCTs • IES shows clear preference for Randomized Control Trials (RCTs) • NCLB • What Works Clearinghouse • ITP grant and other gov’t funded research • NRC reports • IERI

Randomized Control Trials (RCTs) “When correctly implemented, the randomized controlled experiment is the most powerful design for detecting treatment effects…” -Schneider et al., p. 11 • Why are they so great, you ask? When implemented correctly… • Assures treatment group assignment is independent of the pretreatment characteristics of group members • Measures the effect of an intervention or “cause” by washing out every other cause (no confounding of treatment effect)
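The claim that random assignment "washes out every other cause" can be checked with a small simulation. The sketch below is illustrative only: the unobserved "ability" variable, the true effect of 2.0, and all other numbers are assumptions, not data from the white paper. It compares the treated-minus-control difference in means under coin-flip assignment and under self-selection by ability.

```python
import numpy as np


def simulate(n=100_000, effect=2.0, randomize=True, seed=0):
    """Simulate a study in which an unobserved factor (ability) also raises the
    outcome; return the naive treated-minus-control difference in mean outcomes."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n)  # unobserved confounder
    if randomize:
        # Coin-flip assignment: independent of ability by construction
        treated = rng.random(n) < 0.5
    else:
        # Self-selection: higher-ability units are more likely to take treatment
        treated = ability + rng.normal(size=n) > 0
    outcome = effect * treated + 3.0 * ability + rng.normal(size=n)
    return outcome[treated].mean() - outcome[~treated].mean()


print(simulate(randomize=True))   # close to the assumed true effect of 2.0
print(simulate(randomize=False))  # inflated by the ability gap between groups
```

Under randomization the two groups have the same expected ability, so the simple difference recovers the effect; under self-selection the ability gap is folded into the estimate.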

However… “My opinion about RCTs is that they are underutilized in educational research and overemphasized in political discussions.” -Gerald E. Sroufe, director of government relations for AERA

Limitations of RCTs • What you can’t learn from RCTs… • Mechanisms (how/why the treatment worked) • External validity (generalizability) often an issue • Ignores real world selection processes • Why not always feasible? • Logistical issues • Ethical issues • Time and money constraints

In reality… • Most of us will do at least some quasi-experimental research • Quasi-experiments are comparative studies that carefully attempt to isolate the effect of an intervention through means other than randomization • There is a need for other methods to inform randomized trials • Defining relevant outcomes • Identifying promising interventions • Targeting populations of interest • Suggesting causal mechanisms

Motivation behind AERA white paper • There is an important role for quasi-experimental methods in education research • Large-scale, longitudinal databases, like those available from NCES, are excellent resources for this work • But we need to remember that we still want to strive for causal inference

Criteria for Making Causal Inferences • Causal Relativity • The effect of a cause must always be evaluated relative to another cause (causal questions ask about the effectiveness of a treatment relative to some control or other treatment) • Causal Manipulation • Each participant must be potentially exposable to the causes under consideration (this excludes attributes such as race or gender as causes, since these are typically not manipulable)

Criteria for Making Causal Inferences • Temporal Ordering • Exposure to a cause must occur at a specific time or within a specific time period (so that pre- and post-exposure measurements can be taken to determine the magnitude of the effect) • Elimination of Alternative Explanations • Alternative explanations for the relationship between possible causes (treatments) and their effects must be ruled out. This is usually done through random assignment, which ensures that any differences in outcomes between the treatment and control groups can be attributed to differences in treatment assignment.

Methods for Observational Data • Four methods approximating RCTs using observational data + assumptions • Propensity Score Matching • Regression Discontinuity • Fixed Effects • Instrumental Variables • Control Functions (my addition)

The Problem in Causal Inference [diagram: Observed Factor, Unobserved Factor, Confounding Influence, Treatment, Outcome]

RCT Solution [diagram: Observed Factor, Unobserved Factor, Confounding Influence, Treatment, Outcome]

Propensity Score Matching • Idea • Compares outcomes of similar units where the only difference is treatment; discards the rest • Example • Low ability students will have lower future achievement, and are also likely to be retained in grade • Naïve comparison of untreated/treated students creates bias, where the untreated do better in the post period • Matching methods make the proper comparison

Propensity Score Matching [diagram: Observed Factor, Unobserved Factor, Confounding Influence, Treatment, Outcome]

Propensity Score Matching • Advantages • Draws inference from only proper comparisons • Focuses on population of interest • Use of propensity score solves the dimensionality problem in matching • Limitations • Cannot correct for unobserved characteristics influencing the outcome

Propensity Score Matching • Implementation • First stage: regress treatment on observables (e.g., with a logit or probit model) • Second stage: form each unit’s predicted probability of treatment (its propensity score) and keep only observations in the region where treated and untreated scores overlap • Third stage: compare outcomes of treated observations with those of similar non-treated observations, giving less weight to less similar comparisons (that is all the second equation expresses). This can be done with “bins” or kernel functions; i.e., the weight a comparison between a treated and a control unit gets in the analysis decreases as the units become less similar
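The three stages above can be sketched as follows. This is a minimal illustration, not the authors’ code: it assumes a logit first stage fit by Newton-Raphson, trims to the overlap region, and uses a Gaussian kernel with a hypothetical bandwidth of 0.02 on the propensity score to estimate the effect on the treated. The simulated retention data (true effect of -1.0, retention more likely for low-ability students) are assumptions, and ability is treated as observed, which is exactly the condition matching requires.

```python
import numpy as np


def fit_logit(X, d, iters=25):
    """Stage 1: logistic regression of treatment d on observables X, fit by
    Newton-Raphson. Returns each unit's estimated propensity score."""
    X1 = np.column_stack([np.ones(len(d)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        grad = X1.T @ (d - p)
        hess = X1.T @ (X1 * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-X1 @ beta))


def kernel_match_att(y, d, pscore, bandwidth=0.02):
    """Stages 2-3: keep the overlap region, then compare each treated unit with a
    Gaussian-kernel-weighted average of control outcomes; controls whose scores
    are farther from the treated unit's score receive less weight."""
    treated, control = d == 1, d == 0
    lo = max(pscore[treated].min(), pscore[control].min())
    hi = min(pscore[treated].max(), pscore[control].max())
    keep = (pscore >= lo) & (pscore <= hi)
    pt, yt = pscore[treated & keep], y[treated & keep]
    pc, yc = pscore[control & keep], y[control & keep]
    w = np.exp(-0.5 * ((pt[:, None] - pc[None, :]) / bandwidth) ** 2)
    counterfactual = (w @ yc) / w.sum(axis=1)
    return float(np.mean(yt - counterfactual))


# Simulated retention example (all numbers are illustrative assumptions)
rng = np.random.default_rng(1)
n = 4000
ability = rng.normal(size=n)  # observed confounder
d = (rng.random(n) < 1.0 / (1.0 + np.exp(1.5 * ability))).astype(float)  # low ability -> retained more often
y = -1.0 * d + 2.0 * ability + rng.normal(size=n)  # assumed true retention effect: -1.0
naive = y[d == 1].mean() - y[d == 0].mean()
att = kernel_match_att(y, d, fit_logit(ability, d))
print(f"naive: {naive:.2f}  matched ATT: {att:.2f}")
```

The naive comparison overstates the harm of retention because retained students had lower ability to begin with; the matched estimate lands near the assumed -1.0. A smaller bandwidth trades less bias for more noise, the same trade-off as choosing bin widths.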

Regression Discontinuity Design • Idea • Focuses on a subsample for which assignment to the treatment is as good as random • Example • Low ability students will have lower future achievement, and are also likely to be retained in grade • Naïve comparison of untreated/treated students creates bias, where the untreated do better in the post period • RDD compares the outcomes of students whose characteristics are in the neighborhood of a sharp cutoff in a retention policy (e.g., held back if scoring 49, promoted if scoring 50)

Regression Discontinuity Design [diagram: Observed Factor, Unobserved Factor, Sample at Policy Threshold, Confounding Influence, Treatment, Outcome]

Regression Discontinuity Design • Advantages • As with RCTs, (near-)random assignment, here at the cutoff, is used to eliminate confounding factors • Unlike RCTs, can give priority to certain units when phasing in treatment • Limitations • Selected subsample may not be the full population of interest • Focus on select subsample reduces sample size • Need for a sharp policy assignment cutoff

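Regression Discontinuity Design • Implementation • Determine the trade-off between a narrow “bandwidth,” which strengthens the argument that assignment is random, and a wide bandwidth, which increases statistical power • We can also handle a “fuzzy” design, where crossing the cutoff changes the probability of treatment rather than determining it: $\tau = \dfrac{Y^{+} - Y^{-}}{p^{+} - p^{-}}$ • Y is the outcome and p is the probability of treatment; + indicates an outcome or probability just above the cutoff, and - indicates the same just below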
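A fuzzy RD estimate divides the jump in the outcome at the cutoff by the jump in the probability of treatment. The sketch below estimates each one-sided limit with a local linear fit inside a bandwidth on either side of the cutoff. The retention policy (80% held back below 50, 10% above), the true effect of -2.0, and the bandwidth of 5 points are all hypothetical choices for illustration, not values from the presentation.

```python
import numpy as np


def limit_at_cutoff(x, v, cutoff, bandwidth, side):
    """Local linear fit of v on (x - cutoff) using only observations within
    `bandwidth` on one side of the cutoff; returns the fitted value at the cutoff."""
    if side == "above":
        mask = (x >= cutoff) & (x < cutoff + bandwidth)
    else:
        mask = (x < cutoff) & (x >= cutoff - bandwidth)
    slope, intercept = np.polyfit(x[mask] - cutoff, v[mask], deg=1)
    return intercept


def fuzzy_rd(x, treated, y, cutoff, bandwidth):
    """Wald-type fuzzy RD estimate: (Y+ - Y-) / (p+ - p-)."""
    y_plus = limit_at_cutoff(x, y, cutoff, bandwidth, "above")
    y_minus = limit_at_cutoff(x, y, cutoff, bandwidth, "below")
    p_plus = limit_at_cutoff(x, treated, cutoff, bandwidth, "above")
    p_minus = limit_at_cutoff(x, treated, cutoff, bandwidth, "below")
    return float((y_plus - y_minus) / (p_plus - p_minus))


# Simulated retention policy: students scoring below 50 are usually (not always) held back
rng = np.random.default_rng(2)
n = 50_000
ability = rng.normal(size=n)  # unobserved
score = 50 + 10 * ability + rng.normal(scale=5, size=n)  # running variable
retained = (rng.random(n) < np.where(score < 50, 0.8, 0.1)).astype(float)
later = 20 + 5 * ability - 2.0 * retained + rng.normal(size=n)  # assumed true effect: -2.0
print(fuzzy_rd(score, retained, later, cutoff=50, bandwidth=5))
```

Widening `bandwidth` uses more students and lowers variance, but relies more heavily on the linear fit being right away from the cutoff.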

Fixed Effects • Idea • Eliminates alternative explanations that are “fixed” across units • Example • Students with good backgrounds (family, IQ) elect to attend college, and college increases wages • If student backgrounds are not perfectly observed, there will be a residual correlation between college attendance and wages leading to a biased finding • Using FE at the family level, we can “soak up” this influence to the extent that family quality is fixed

Fixed Effects Solution [diagram: Observed Factor, Unobserved Factor, Fixed Influences, Confounding Influence, Treatment, Outcome]

Fixed Effects • Advantages • (As with RCTs) we do not need to observe these confounding influences • Limitations • Cannot control for time-varying (non-fixed) influences • Family example: parents get divorced, family finances change • Data demands: can only control for fixed influences at a level higher than the level of treatment • May reduce sample size • Can be solved with better, often longitudinal, data • Sample may no longer be representative • Bias is toward a null finding: FE removes part of the true variation in treatment, but not the noise, so estimated effects shrink toward zero

Fixed Effects • Implementation notes • Only requires inserting dummy variables at the level of the effect. In our example, a dummy variable for each family. Other examples: district level, school level, individual level • Correlation between $T_{jt}$ and $f_j$ represents the tendency of good (or bad) families to take the treatment. If $f_j$ is treated as part of the error term, $T_{jt}$ is endogenous. By controlling for $f_j$, the only remaining potential correlation is between $T_{jt}$ and $e_{ijt}$, i.e., time-varying unobservables.
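The dummy-variable approach can be sketched directly: regress wages on college attendance plus one dummy per family, and compare the result with naive OLS. The sibling data below are simulated under assumptions of our own (1,000 families of three siblings, a true college effect of 1.0, and an unobserved family quality that is constant within each family), so the only confounder is exactly the kind of fixed influence that FE can absorb.

```python
import numpy as np


def ols_slope(y, t):
    """Naive OLS of y on t with an intercept; returns the coefficient on t."""
    X = np.column_stack([np.ones(len(y)), t])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[1])


def family_fe_slope(y, t, family):
    """OLS of y on t plus one dummy variable per family; returns the coefficient on t."""
    _, idx = np.unique(family, return_inverse=True)
    dummies = np.zeros((len(y), idx.max() + 1))
    dummies[np.arange(len(y)), idx] = 1.0  # the family dummies replace the intercept
    coef, *_ = np.linalg.lstsq(np.column_stack([t, dummies]), y, rcond=None)
    return float(coef[0])


# Simulated siblings: unobserved family quality drives both college and wages
rng = np.random.default_rng(3)
n_families, siblings = 1000, 3
family = np.repeat(np.arange(n_families), siblings)
quality = rng.normal(size=n_families)[family]  # fixed within each family
college = (quality + rng.normal(size=family.size) > 0).astype(float)
wage = 1.0 * college + 2.0 * quality + rng.normal(size=family.size)  # assumed true effect: 1.0
print(ols_slope(wage, college), family_fe_slope(wage, college, family))
```

Only families whose siblings differ in college attendance identify the FE coefficient, which is one concrete way FE "may reduce sample size." Demeaning within families gives the same coefficient with far fewer columns when the number of families is large.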

Instrumental Variables • Idea • Separates observed from unobserved explanations for taking treatment, and uses only the observed portion • Example • Students with good backgrounds (family, IQ) elect to attend college, and college increases wages • If student backgrounds are not perfectly observed, there will be a residual correlation between college attendance and wages leading to a biased finding • IV replaces actual college attendance with college attendance predicted by observables, including the instrument(s) (i.e., with the influence of unobserved factors removed)

Instrumental Variables [diagram: Observed Factor, Unobserved Factor, Instrumental Variable(s), Confounding Influence, Treatment, Outcome]

Instrumental Variables • Advantages • Relies on trustworthy (observed) variation in treatment • Can use prior RCTs to find valid instruments, e.g., Nye et al. (2004) • Limitations • Difficult to find valid instruments • Cannot directly test whether an instrument is truly exogenous • Works only to the extent that the instrument is exogenous and strongly correlated with treatment

Instrumental Variables • Implementation • First stage: regress treatment* on observables and the instrument(s) • Second stage: run the outcome regression, replacing the treatment variable with the treatment predicted (by observables only) in the first stage • If the treatment is not continuous, a Heckit procedure may be more appropriate. More on this later. * In the implementation presented, the treatment should be continuous, such as “hours of tutoring received” or “hours instructed with a particular curriculum.” Note, however, that the logic of Instrumental Variables methods can be modified to fit any application, such as binary treatment classes.
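The two stages can be sketched with plain least squares. In this illustration the continuous treatment is hours of tutoring, an unobserved "motivation" raises both hours and test scores, and the instrument is a hypothetical randomly offered tutoring voucher (the kind of variation a prior RCT might supply). All coefficients, including the true effect of 0.5 per hour, are assumptions. The sketch returns only the point estimate: standard errors from a hand-run second stage are not valid without correction.

```python
import numpy as np


def two_stage_least_squares(y, treatment, instrument, exog):
    """Point estimate of the treatment coefficient by 2SLS, done by hand."""
    ones = np.ones(len(y))
    Z = np.column_stack([ones, instrument, exog])  # first-stage regressors
    gamma, *_ = np.linalg.lstsq(Z, treatment, rcond=None)
    predicted = Z @ gamma  # treatment predicted from the instrument and observables
    X_hat = np.column_stack([ones, predicted, exog])  # second stage uses the prediction
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return float(beta[1])


def ols_coef(y, treatment, exog):
    """Naive OLS coefficient on treatment, controlling for exog."""
    X = np.column_stack([np.ones(len(y)), treatment, exog])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])


rng = np.random.default_rng(4)
n = 10_000
motivation = rng.normal(size=n)  # unobserved confounder
prior_score = rng.normal(size=n)  # observed covariate
voucher = rng.integers(0, 2, size=n)  # hypothetical instrument: randomly offered voucher
hours = 5 + 4 * voucher + 2 * motivation + prior_score + rng.normal(size=n)
score = 0.5 * hours + 3 * motivation + 2 * prior_score + rng.normal(size=n)  # assumed true effect: 0.5
print(ols_coef(score, hours, prior_score), two_stage_least_squares(score, hours, voucher, prior_score))
```

OLS credits tutoring with part of motivation’s effect; 2SLS uses only the voucher-driven variation in hours and recovers the assumed 0.5. With a weak first stage (a small voucher coefficient), the 2SLS estimate would become much noisier.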

Control Function Approach (e.g., Heckit) • Idea • Separates observed from unobserved explanations for taking treatment, and uses this to model the confounding influence • In approach, this is very similar to IV, but it also extracts information about the unobserved factors • Example • Students with good backgrounds (family, IQ) elect to attend college, and college increases wages • If student backgrounds are not perfectly observed, there will be a residual correlation between college attendance and wages leading to a biased finding • Control functions study how likely a student is to obtain treatment to determine whether there is an important unobserved influence (e.g., a poor, urban minority student attending Harvard), and adjust expectations in the outcome equation accordingly

Control Functions [diagram: Observed Factor, Unobserved Factor, Factors Determining Treatment, Confounding Influence, Treatment, Outcome]

Control Function Approach (e.g. Heckit) • Advantages • We can understand the choice process (and confirm our prior expectations) • Limitations • Parametric assumptions may be inappropriate in drawing inference on unobserved factors • Non-parametric approaches are available

Control Function Approach (e.g., Heckit) • Implementation* • First stage: regress treatment on observables and the instrument(s), typically with a probit model • Second stage: run the outcome regression, adding as a regressor the control function computed from the first stage (in Heckman’s case, the inverse Mills ratio) * This implementation is the seminal one considered in Heckman (1979), where outcomes are observed only for units who receive treatment. The logic of this method can be broadly applied and is similar across applications.
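A two-step Heckit can be sketched in numpy: a probit selection equation (fit here by Fisher scoring), then an outcome regression on the selected units with the inverse Mills ratio added as the control function. Everything in the simulation is an assumption for illustration: the variable names, the coefficients, the error correlation of 0.6, and the use of a hypothetical "kids" variable that shifts selection but not the outcome (the exclusion restriction that plays the instrument’s role).

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(x):
    return 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)


def fit_probit(Z, d, iters=25):
    """First stage: probit of selection d on Z, fit by Fisher scoring."""
    gamma = np.zeros(Z.shape[1])
    for _ in range(iters):
        index = Z @ gamma
        cdf = np.clip(norm_cdf(index), 1e-10, 1 - 1e-10)
        pdf = norm_pdf(index)
        w = pdf / (cdf * (1.0 - cdf))
        score = Z.T @ (w * (d - cdf))
        info = Z.T @ (Z * (w * pdf)[:, None])
        gamma = gamma + np.linalg.solve(info, score)
    return gamma


def heckit(y, X, d, Z):
    """Second stage: OLS on selected units (d == 1), adding the inverse Mills ratio."""
    gamma = fit_probit(Z, d)
    sel = d == 1
    index = Z[sel] @ gamma
    mills = norm_pdf(index) / norm_cdf(index)  # the control function
    beta, *_ = np.linalg.lstsq(np.column_stack([X[sel], mills]), y[sel], rcond=None)
    return beta  # last entry: coefficient on the control function


rng = np.random.default_rng(5)
n = 20_000
educ = rng.normal(size=n)
kids = rng.integers(0, 3, size=n)  # hypothetical: affects selection only
u, e = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=n).T  # correlated unobservables
d = (0.5 + 0.8 * educ - 0.7 * kids + u > 0).astype(float)  # selection (treatment) equation
y = 1.0 + 2.0 * educ + e  # outcome, used only where d == 1
X = np.column_stack([np.ones(n), educ])
Z = np.column_stack([np.ones(n), educ, kids])
beta = heckit(y, X, d, Z)
naive, *_ = np.linalg.lstsq(X[d == 1], y[d == 1], rcond=None)
print(beta, naive)
```

The probit is fully parametric (joint normal errors), which is the limitation noted on the previous slide; a significant coefficient on the Mills ratio is the signal that an important unobserved influence drives selection.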

Return to NRC’s original questions • Is there a systematic effect? • Maybe experiments are our “best” tool to answer this question • But concerns remain… • Practicality • Access • Ethics • Timeliness • External validity

And there are two other questions… 2) What is happening? 3) Why or how is it happening? • These questions are central to the design of experiments • Also important for development of theory • Also of great interest to policy makers and educational practitioners • Large-scale database research can help us answer these questions

What are the benefits and limitations of large-scale data sets? • Benefits • Widely accessible • Wealth of contextual information and ability to consider multiple counterfactuals • Large samples allow for comparisons across sub-groups of interest • Can be linked with other datasets • Limitations • Missing data • Design/instruments often developed based on precedent rather than need

“However, even with these data, which arguably are among the best we have, the findings have not consistently yielded information that could substantially improve our schools and change the educational opportunities of students, especially those who attend high-poverty schools and whose families have limited social resources.” --Schneider et al., p. 111

AERA board recommendations • Employ decision rules to assess strength of quasi-experimental designs • Move beyond simple OLS to get at causation • E.g., see pp. 113-116 of Schneider et al. or What Works Clearinghouse guidelines • Strengthen future data collection efforts • Embed RCTs within longitudinal studies • Don’t rely on precedent to develop surveys • Don’t ignore processes in favor of products

Large-scale data set resources • Data sources on campus • DISC (3308 Sewell Social Science Building) • Data sources on-line • NCES (http://nces.ed.gov/) • ICPSR (http://www.icpsr.umich.edu/) • OPR (http://opr.princeton.edu/) • High quality research collections and guidelines • WWC (http://ies.ed.gov/ncee/wwc/) • C2 (http://www.campbellcollaboration.org/) • Product from today…

A turn to the practical… • What causal question (of interest to IES…) haunts your discipline? • What database(s) might be used to answer it? • Is there any existing RCT data that might be mined? • Which quasi-experimental method(s) might be used to approach causal inference?

Examples from the policy world

References
Heckman, J.J. and J.A. Smith. 1995. “Assessing the Case for Social Experiments.” Journal of Economic Perspectives, 9(2): 85-110.
Holland, P.W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association, 81: 945-970.
Magnuson, K.A., C. Ruhm, and J. Waldfogel. 2007. “Does Prekindergarten Improve School Preparation and Performance?” Economics of Education Review, 26: 33-51.
Morris, P., L. Gennetian, G. Duncan, and A. Huston. 2007. “How Welfare Policies Affect Child and Adolescent Development: Investigating Pathways of Influence with Experimental Data.” Presented at University of Kentucky Center for Poverty Research, 12 April.
Nye, B., S. Konstantopoulos, and L.V. Hedges. 2004. “How Large Are Teacher Effects?” Educational Evaluation and Policy Analysis, 26: 237-257.
Raudenbush, S.W. 2005. “Learning from Attempts to Improve Schooling: The Contribution of Methodological Diversity.” Educational Researcher, 34(5): 25-31.
Schneider, B., M. Carnoy, J. Kilpatrick, W.H. Schmidt, and R.J. Shavelson. 2007. Estimating Causal Effects: Using Experimental and Observational Designs. AERA: Washington, D.C.
Todd, P.E. and K.I. Wolpin. 2006. “Assessing the Impact of a School Subsidy Program in Mexico: Using a Social Experiment to Validate a Dynamic Behavioral Model of Child Schooling and Fertility.” American Economic Review, 96: 1384-1417.
Viadero, D. 2007. “‘Scientific’ Label in Law Stirs Debate.” Education Week, 27(8): 1, 23.
