Generating Plausible Causal Hypotheses



  1. Generating Plausible Causal Hypotheses By Larry V. Hedges, Northwestern University. Presented at the 2010 IES Research Conference

  2. Goals Provide a brief introduction to causal inference. Explain why experiments provide model-free estimates of causal effects. Examine the possibility of causal inference from a few quasi-experimental designs: assignment based on a covariate, the regression discontinuity design, and the nonequivalent control group design. Examine the difference in differences approach in more detail.

  3. What is Causal Inference? We all think we know what we mean by cause and effect, but a formal treatment is useful. It turns out that there are several treatments of cause and effect. The modern statistical approach is often called the Rubin-Holland-Rosenbaum model (but its roots go back as far as Neyman, 1923).

  4. The Rubin Holland Model Key concepts: units (e.g., individuals), treatments (e.g., 0, 1), and responses (e.g., r0, r1). Here ri0 is the response of unit i if it got treatment 0 and ri1 is the response of unit i if it got treatment 1. The causal effect of treatment 1 versus 0 on unit i is τi = ri1 – ri0.

  5. The Rubin Holland Model The definition of the causal effect of treatment 1 versus 0 on unit i is τi = ri1 – ri0. • This is a relative definition: the effect of treatment 1 compared to treatment 0. • This is a counterfactual definition: you can't observe both ri0 and ri1. • The (relative) causal effect of a treatment on a single unit cannot be estimated without additional assumptions (although with additional assumptions, single subject designs attempt to do so).

  6. Causal Inference and Missing Data Note that causal inference is a missing data problem: you cannot observe both ri0 and ri1; one of them is always missing. Not surprisingly, modern ideas for causal inference sometimes draw on modern ideas for handling missing data. Missing data methods try to find conditions that make the missing data (conditionally) "as if" generated by random sampling. Methods for causal inference try to find conditions that make the missing data (conditionally) "as if" generated by random assignment. We will discuss some of these later.

  7. The Rubin Holland Model Example Note that we assume that both ri0 and ri1 are known for the purposes of illustration

  8. The Rubin Holland Model Example Any particular experiment would assign some units to treatment and others to control, so some ri1's would be observed and some ri0's would be observed.

  9. The Rubin Holland Model Example Each possible experiment would yield a different estimate of the treatment effect, but the average of these estimates over all possible assignments would equal the average treatment effect.

  10. The Rubin Holland Model Example Note that assigning the best treatment to a unit does not give an unbiased estimate of the average treatment effect
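
As a concrete companion to this example, here is a minimal Python sketch (not part of the original slides) with hypothetical potential outcomes for four units: enumerating every assignment of two units to treatment shows that the estimates average to the true average treatment effect, while a rule that gives each unit its better treatment does not.

```python
# Minimal illustration with hypothetical potential outcomes for four units.
# Both r0 and r1 are assumed known only for purposes of illustration.
from itertools import combinations

import numpy as np

r0 = np.array([10.0, 12.0, 9.0, 15.0])   # response of each unit under control
r1 = np.array([14.0, 11.0, 11.0, 13.0])  # response of each unit under treatment
ate = np.mean(r1 - r0)                    # true average treatment effect

# Every experiment that assigns exactly two of the four units to treatment.
estimates = []
units = range(len(r0))
for treated in combinations(units, 2):
    control = [i for i in units if i not in treated]
    estimates.append(r1[list(treated)].mean() - r0[control].mean())

print("true ATE:", ate)                                            # 0.75
print("mean estimate over all assignments:", np.mean(estimates))   # also 0.75

# "Give each unit its better treatment": a non-random rule, and a biased one.
best = r1 > r0
print("estimate under best-treatment assignment:",
      r1[best].mean() - r0[~best].mean())                          # not 0.75
```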

  11. The Rubin Holland Model Randomized experiments Define the assignment variable Z via Z = 0 if a unit gets control and Z = 1 if a unit gets treatment. Random assignment means that Pr(Z = 1 | r0, r1) = Pr(Z = 1), so assignment does not depend on the potential responses. Therefore (r0, r1) is independent of Z (assignment).

  12. The Rubin Holland Model Randomized experiments give model-free estimates of the average (relative) causal effect of a treatment. Why? Because independence of Z (assignment) and (r0, r1) implies E[Y | Z = 1] – E[Y | Z = 0] = E[r1] – E[r0] = E[r1 – r0], the average treatment effect.
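
A worked version of this step, in the notation of the slides (a sketch of the standard argument, not text from the deck):

```latex
% Under random assignment, Z is independent of (r_0, r_1), so
\begin{align*}
E[Y \mid Z = 1] - E[Y \mid Z = 0]
  &= E[r_1 \mid Z = 1] - E[r_0 \mid Z = 0] \\
  &= E[r_1] - E[r_0] \qquad \text{(by independence of } Z \text{ and } (r_0, r_1)\text{)} \\
  &= E[r_1 - r_0],
\end{align*}
% the average treatment effect.
```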

  13. The Rubin Holland Model This is all very simple, but this is deceptive: I have already embedded assumptions into the model (as had Rubin, 1974). Why are there only 2 possible outcomes? What if the treatment I get affects your response to treatment? The assumption ruling this out is called "no interference between units" (e.g., Cox, 1958) or the stable unit treatment value assumption (SUTVA) (e.g., Rubin).

  14. The Rubin Holland Model SUTVA can be wrong! Consider response to vaccines: the response to the smallpox vaccine (or not) depends on who else is vaccinated. This is how eradication is possible. Consider classrooms or schools where social interaction is possible (indeed probable). Contamination is a violation of SUTVA.

  15. The Rubin Holland Model Some associations cannot be causal. Suppose one of ri0 or ri1 does not exist: • Some individuals would never accept treatment (refusers) • Some individuals would always get treatment (always takers) • Some individuals would always do the opposite of what they were assigned (defiers) This leads to the concept of compliers and the complier average treatment effect.

  16. The Rubin Holland Model On a more philosophical level, not all "what if" questions have causal answers. The idea of a randomized experiment helps clarify what effects might be causal: if you cannot imagine an experiment that assigns the treatments being compared, it may not be sensible to talk of causal effects. It may not be sensible to talk of sex differences as causal effects, but it might be sensible to talk of gender (social) differences as causal effects.

  17. The Rubin Holland Model Similarly, it may not make sense to talk about causal effects of treatments on • Never takers • Always takers • Defiers It makes sense to explicitly limit the scope of our attempts at causal inference to the compliers

  18. Scope of Causal Inference Randomized experiments give model-free estimates of average causal effects. Is there any other way to get them? No other model-free methods are known. Many other methods can give estimates of causal effects given that a model is true. The key problem with these methods is that the model must be assumed to be true, and the model assumptions are often difficult or impossible to verify. But such methods are useful when experiments cannot be done, or to suggest plausible causal hypotheses.

  19. Estimating Treatment Effects Consider treatment assignment (dummy variable) Z and outcome Y. Regress Y on Z: Yi = β0 + β1 Zi + εi. The estimate of β1 is just the difference between the mean Y for Z = 1 (the treatment group) and the mean Y for Z = 0 (the control group). Thus the OLS estimate is β̂1 = β1 + (ε̄1 – ε̄0), where ε̄1 and ε̄0 are the average errors in the treatment and control groups.

  20. Estimating Treatment Effects (With Random Assignment) If the treatment is randomly assigned, then Z is uncorrelated with ε (Z is exogenous). Z is uncorrelated with ε if and only if E[ε̄1 – ε̄0] = 0. But if E[ε̄1 – ε̄0] = 0, then the expected value of the mean difference is E[β̂1] = β1 + E[ε̄1 – ε̄0] = β1. This implies that standard methods (OLS) give an unbiased estimate of β1, which is the average treatment effect. That is, the treatment-control mean difference is an unbiased estimate of β1.

  21. What goes wrong without randomization? (Simple Case) If we do not have randomization, there is no guarantee that Z is uncorrelated with ε (Z may be endogenous). Thus the OLS estimate is still β̂1 = β1 + (ε̄1 – ε̄0), but if Z is correlated with ε, then E[ε̄1 – ε̄0] ≠ 0. Hence β̂1 does not estimate β1, but some other quantity that depends on the correlation of Z and ε. If Z is correlated with ε, standard methods give a biased estimate of β1.
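
A minimal simulation sketch of this point (hypothetical data, not from the presentation): the treatment-control mean difference recovers β1 under random assignment but not when Z is correlated with ε.

```python
# Sketch (hypothetical data): OLS of Y on a treatment dummy Z recovers beta1
# under random assignment but not when Z is correlated with the error term.
import numpy as np

rng = np.random.default_rng(0)
n, beta0, beta1 = 100_000, 1.0, 2.0
eps = rng.normal(size=n)

# Random assignment: Z independent of eps.
z_rand = rng.integers(0, 2, size=n)
y = beta0 + beta1 * z_rand + eps
print(y[z_rand == 1].mean() - y[z_rand == 0].mean())  # close to 2.0

# Endogenous assignment: units with large eps are more likely to be treated.
z_endog = (eps + rng.normal(size=n) > 0).astype(int)
y = beta0 + beta1 * z_endog + eps
print(y[z_endog == 1].mean() - y[z_endog == 0].mean())  # biased, well above 2.0
```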

  22. Instrumental Variables One way to see this is in terms of two regression equations: Yi = β0 + β1Zi + εi and Zi = γ0 + γ1Xi + ηi. Note that, in this model, Z is endogenous (it may be correlated with ε). The instrumental variables model requires that: 1. γ1 ≠ 0, so that X predicts Z, and 2. X is uncorrelated with ε (X is exogenous) [Cov{ε, X} = 0].

  23. Estimating Causal Effects (IV Studies) Angrist, Imbens, & Rubin (1996) showed that IV can estimate average causal effects of Z on Y, if the following assumptions hold: • SUTVA • Random assignment of X • Exclusion restriction (exogeneity of X) • Nonzero causal effect of X on Z • Monotonicity (no defiers) Then the IV estimate is an estimate of the average treatment effect for those who comply with assignment
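
A small sketch of the idea, using a hand-rolled Wald (ratio of mean differences) form of the IV estimator on hypothetical simulated data; the variable names and the selection rule below are illustrative assumptions, not taken from the slides.

```python
# Sketch (hypothetical data): Wald / IV estimator with a binary instrument X
# for an endogenous treatment Z, compared with the naive mean difference.
import numpy as np

rng = np.random.default_rng(1)
n, tau = 200_000, 2.0

x = rng.integers(0, 2, size=n)            # randomized instrument (encouragement)
u = rng.normal(size=n)                    # unobserved confounder
z = ((x == 1) | (u > 1.0)).astype(int)    # always-takers have u > 1; no defiers
y = 1.0 + tau * z + u                     # treated group over-represents high u

naive = y[z == 1].mean() - y[z == 0].mean()   # biased upward by self-selection
wald = (y[x == 1].mean() - y[x == 0].mean()) / (z[x == 1].mean() - z[x == 0].mean())
print("naive:", round(naive, 2), "IV estimate:", round(wald, 2))  # IV close to 2.0
```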

  24. Assignment by Covariate Value Let X be a covariate and x be a value of X. Suppose that units with the same X value are randomly assigned with probability π(x), where 0 < π(x) < 1. Thus Pr(Z = 1 | X = x, r0, r1) = π(x). Conditional independence of Z (assignment) and (r0, r1) given X implies E[Y | Z = 1, X = x] – E[Y | Z = 0, X = x] = E[r1 – r0 | X = x]. Thus the experiment estimates the conditional causal effect given X.

  25. Assignment by Covariate Value The conditional causal effect of treatment τ(x) might be called the local average treatment effect at X = x. The weighted average of local average treatment effects estimates the average causal effect of treatment. Note that the overall treatment-control mean difference (even controlling for X) does not necessarily estimate the average causal effect of treatment, because there may be more treated units at some values of X than at others, so the treatment and control groups differ on X.
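
A brief sketch of this point (hypothetical data): when π(x) differs across strata, the raw treatment-control difference is confounded by X, but the weighted average of the within-stratum (local) differences recovers the average treatment effect.

```python
# Sketch (hypothetical data): assignment probability pi(x) depends on X, so the
# raw mean difference is biased; the weighted average of within-stratum (local)
# treatment-control differences recovers the average treatment effect.
import numpy as np

rng = np.random.default_rng(2)
n, tau = 200_000, 1.0
x = rng.integers(0, 2, size=n)                 # two strata of X
pi = np.where(x == 1, 0.8, 0.2)                # 0 < pi(x) < 1, depends on x
z = (rng.random(n) < pi).astype(int)           # random assignment within strata
y = 2.0 * x + tau * z + rng.normal(size=n)     # X also affects the outcome

raw = y[z == 1].mean() - y[z == 0].mean()      # confounded by X
local = [y[(x == s) & (z == 1)].mean() - y[(x == s) & (z == 0)].mean()
         for s in (0, 1)]
weights = [np.mean(x == s) for s in (0, 1)]
print("raw:", round(raw, 2),
      "weighted local average:", round(np.dot(weights, local), 2))  # ~ tau = 1.0
```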

  26. Regression Discontinuity Designs Regression discontinuity designs (RDD) assign treatment by covariate value, but they assign all units with X > c to treatment and so violate the requirement that 0 < π(x) < 1. However, RDDs can estimate the local average causal effect of the treatment at the cutpoint. The reason is that the RDD is a randomized experiment at the cutpoint X = c; more properly, the limit as x → c is a randomized experiment.

  27. Regression Discontinuity Designs Note that the RDD can support estimation of causal effects. The causal effect that can be estimated, τ(c), is the size of the jump in E[Y | X = x] as x crosses c; in other words, the causal effect (local average treatment effect) at the value X = c, which is the gap or discontinuity at X = c. But not every analysis of the design estimates the causal effect: analyses that use models assuming a functional form (e.g., linear regression) depend on that functional form assumption.

  28. Regression Discontinuity Designs Nonparametric regression methods can, in principle, provide model-free estimates of the causal effect of treatment at X = c. But these methods themselves make technical assumptions (e.g., about bandwidth), so estimation of treatment effects in RDDs is in practice somewhat model dependent. Designs with multiple cutpoints can provide estimates of treatment effects at multiple points, or more externally valid average causal effects.
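
A sketch of a local linear RDD estimate on hypothetical data; the bandwidth h below is an illustrative tuning choice of exactly the kind referred to above.

```python
# Sketch (hypothetical data): local linear regression on each side of the
# cutoff c estimates the discontinuity tau(c).  The bandwidth h is a tuning
# choice, i.e., one of the technical assumptions mentioned above.
import numpy as np

rng = np.random.default_rng(3)
n, c, h, tau = 50_000, 0.0, 0.25, 1.5
x = rng.uniform(-1, 1, size=n)                    # forcing variable
z = (x > c).astype(int)                           # sharp RDD assignment rule
y = np.sin(2 * x) + tau * z + rng.normal(scale=0.5, size=n)

def local_linear_at_c(xs, ys):
    """Fit y = a + b*(x - c) by OLS within the bandwidth; return the intercept a."""
    design = np.column_stack([np.ones_like(xs), xs - c])
    coef, *_ = np.linalg.lstsq(design, ys, rcond=None)
    return coef[0]

right = (x > c) & (x <= c + h)
left = (x <= c) & (x >= c - h)
est = local_linear_at_c(x[right], y[right]) - local_linear_at_c(x[left], y[left])
print("estimated discontinuity at c:", round(est, 2))  # close to tau = 1.5
```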

  29. Nonequivalent Control Group Designs These designs compare a treatment group with a (non-randomized) comparison group. There is a huge range of quality in these designs, ranging from pretty good to awful. Often matching or adjustment for covariates (a form of pseudo-matching), or both, are used. Can such designs ever provide estimates of average causal effects? Yes, but essentially never estimates that are model free.

  30. Nonequivalent Control Group Designs How well they work depends on how well the analytic model captures essential features of the data, and this is not always possible to determine empirically. If we can assume conditional independence of Z (assignment) and (r0, r1) given X, or even just that E[r0 | Z, X] = E[r0 | X] and E[r1 | Z, X] = E[r1 | X], then the design can estimate the causal effect of treatment, since then E[Y | Z = 1, X] – E[Y | Z = 0, X] = E[r1 – r0 | X].

  31. Nonequivalent Control Group Designs Note that this is the equivalent of making the treatment assignment "as if random" conditional on the covariate (or matching variable) X. This is the basic strategy of matching for causal inference (e.g., Rubin, Rosenbaum, Cochran). It is also the basic strategy for inference under missing data: find covariates so that, conditional on the observed covariates, the missing data is "as if random". In missing data theory, this is called "strong ignorability".

  32. Nonequivalent Control Group Designs This is all very abstract. Make it concrete by considering response functions, that is, r0 or r1 as a function of covariates or other effects. For example, suppose that ri0 = α + βxi + εi0 and ri1 = α + τ + βxi + εi1, and that εi0 and εi1 are independent of x. Then it easily follows that the usual estimate of the average treatment effect is unbiased.

  33. Nonequivalent Control Group Designs But suppose that the response functions are a little different: ri0 = α + β0xi + εi0 and ri1 = α + τ + β1xi + εi1, with εi0 and εi1 independent of x. Then it easily follows that the usual (covariate-adjusted) estimate of the treatment effect estimates τ + β1x̄1 – β0x̄0 – β̄(x̄1 – x̄0), where x̄1 and x̄0 are the group means of x and β̄ is an "average" of β0 and β1.
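
A small simulation sketch of the bias just described (hypothetical data and parameter values): a common-slope covariate adjustment recovers τ when β0 = β1 but not when the slopes differ and the nonequivalent groups differ on X.

```python
# Sketch (hypothetical data): covariate adjustment with a common slope is fine
# when the two response functions share a slope, but biased when the slopes
# differ and the nonequivalent groups differ on X.
import numpy as np

rng = np.random.default_rng(4)
n, alpha, tau = 100_000, 1.0, 2.0

def adjusted_estimate(beta0, beta1):
    # Nonequivalent groups: the treated group has higher X on average.
    z = rng.integers(0, 2, size=n)
    x = rng.normal(loc=1.0 * z, size=n)
    y = np.where(z == 1,
                 alpha + tau + beta1 * x + rng.normal(size=n),   # r1
                 alpha + beta0 * x + rng.normal(size=n))         # r0
    design = np.column_stack([np.ones(n), z, x])                 # common slope fit
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]                                               # treatment term

print("equal slopes:  ", round(adjusted_estimate(0.5, 0.5), 2))  # ~ tau = 2.0
print("unequal slopes:", round(adjusted_estimate(0.2, 0.8), 2))  # biased (larger here)
```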

  34. Nonequivalent Control Group Designs The analysis could be "fixed up" to remove the bias if we knew the response functions. But that is exactly the point: to get an unbiased estimate of the causal effect, you have to know the right model, so analyses will be model dependent. It is not easy (maybe impossible) to know what the right model is. Moreover, I chose a very simple model (homogeneous treatment effects with responses a linear function of the observed covariates).

  35. Differences in Differences The difference in differences idea can be seen as a particular kind of nonequivalent control group design. It is frequently used to evaluate the effects of policies in education and elsewhere. Assume that there is a series of longitudinal observations in locations (e.g., states), and that a policy has been implemented at some time in some of the locations. Crudely, we estimate the effect of a policy by comparing • the difference in outcome before and after the policy is implemented for individuals affected by the policy with • the corresponding difference for individuals unaffected by the policy. That is why it is a difference in differences estimator.

  36. Differences in Differences More elaborate (and convincing) analyses control for location and time or model variation as random effects. Let Yist be the outcome for individual i in location s at time t, and let Xist be the corresponding individual-level covariates. Then the model might be Yist = αs + πt + γXist + βTst + εist, where αs and πt are location and time fixed effects, γ is a vector of covariate effects, Tst is a dummy variable for treatment, and εist is a residual. There may be clustering by location, which needs to be taken into account.
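
A sketch of this kind of two-way fixed effects difference in differences estimate on a hypothetical state-by-year panel, using plain OLS with location and time dummies; only the point estimate is computed, and clustered standard errors by location would still be needed in practice.

```python
# Sketch (hypothetical panel): difference in differences with location and time
# fixed effects, Y_ist = alpha_s + pi_t + beta*T_st + eps_ist, estimated by OLS
# with dummy variables for states and periods.
import numpy as np

rng = np.random.default_rng(5)
S, T, n_per, beta = 20, 10, 50, 1.0
treated_states = np.arange(S) < S // 2            # half the states adopt the policy
policy_start = 5                                   # period when the policy begins

rows = []
for s in range(S):
    alpha_s = rng.normal()
    for t in range(T):
        pi_t = 0.3 * t                             # common time trend
        T_st = int(treated_states[s] and t >= policy_start)
        for _ in range(n_per):
            rows.append((s, t, T_st, alpha_s + pi_t + beta * T_st + rng.normal()))

s_id, t_id, treat, y = map(np.array, zip(*rows))
state_dummies = (s_id[:, None] == np.arange(1, S)).astype(float)  # drop state 0
time_dummies = (t_id[:, None] == np.arange(1, T)).astype(float)   # drop period 0
design = np.column_stack([np.ones(len(y)), treat, state_dummies, time_dummies])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print("DiD estimate of beta:", round(coef[1], 2))  # close to 1.0
```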

  37. Differences in Differences Obviously the difference in differences estimator has great appeal. Given a good longitudinal data set, it is easy to use, and it is simple to understand and explain to policy makers. It is a natural analysis for learning from "natural experiments" where a policy has been tried in some places and not others, or has been tried at different times in different locations.

  38. Differences in Differences This model may seem hard to formulate in causal model terms. The treatment effect is identified by the difference between post-policy and pre-policy outcomes in the treatment (got policy) group versus the control group. Let ri0 and ri1 be the possible outcomes after treatment and X be the pretreatment variable. This estimator is estimating E[r1 – X | Z = 1] – E[r0 – X | Z = 0]. It can estimate the average causal effect under several circumstances.

  39. Differences in Differences This estimator is estimating E[r1 – X | Z = 1] – E[r0 – X | Z = 0]. It can estimate the average causal effect under some circumstances. For example, if the response functions are ri0 = αi + xi + εi0 and ri1 = αi + τ + xi + εi1, with εi0 and εi1 independent of xi, then the difference in differences estimate does estimate τ, the average causal effect of treatment.
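
Spelling this out (a sketch of the algebra, under the extra assumption, not stated on the slide, that the pretreatment variable X equals the xi appearing in the response functions):

```latex
% With r_{i0} = \alpha_i + x_i + \varepsilon_{i0} and
% r_{i1} = \alpha_i + \tau + x_i + \varepsilon_{i1}, and assuming X_i = x_i,
% the gains no longer depend on x_i:
\begin{align*}
r_{i1} - X_i &= \alpha_i + \tau + \varepsilon_{i1},
\qquad r_{i0} - X_i = \alpha_i + \varepsilon_{i0}, \\
E[r_1 - X \mid Z = 1] - E[r_0 - X \mid Z = 0]
  &= \tau + E[\alpha + \varepsilon_1 \mid Z = 1] - E[\alpha + \varepsilon_0 \mid Z = 0],
\end{align*}
% which reduces to \tau when Z is independent of (r_0 - X, r_1 - X),
% the condition whose failure is the endogeneity problem on the next slide.
```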

  40. What Can Go Wrong? One big problem: Z can be correlated with (r0 – X, r1 – X). • X can cause the policy and also be correlated with the outcome • Something else can cause both X and Z • This is the general endogeneity problem

  41. What Can Go Wrong? Informal checks: • Look at trends beyond the time of policy implementation • Estimate effects of treatment where there is no policy change as a check (you should see no effect) These checks are suggestive, not definitive: they can invalidate an analysis, but they cannot validate one.

  42. What Can Go Wrong? One smaller problem: the data often exhibit large autocorrelations, and this can lead to large underestimates of standard errors, making tests reject (far) too often. There are three reasons for this: • Data are often based on long time series • Data are highly positively correlated over time • The treatment variable does not change much

  43. What Can Go Wrong? The standard error problem is difficult to solve. Parametric analysis (generalized least squares with autocorrelation) can be done, but inference for the autocorrelation is poor. Randomization tests seem to perform well for problems like these. Collapsing the data into two time periods is sometimes useful and improves the performance of tests.
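
A sketch of the "collapse into two time periods" idea on a hypothetical state-by-year panel; the function and data below are illustrative assumptions, not an implementation from the references.

```python
# Sketch (hypothetical state-by-year panel): average each state's outcome over
# its pre-policy and post-policy years, then run the difference in differences
# on the collapsed state-level means, so within-state serial correlation no
# longer enters the comparison.
import numpy as np

def collapsed_did(y, state, year, treated_state, policy_year):
    """DiD estimate computed from one pre mean and one post mean per state."""
    states = np.unique(state)
    pre = np.array([y[(state == s) & (year < policy_year)].mean() for s in states])
    post = np.array([y[(state == s) & (year >= policy_year)].mean() for s in states])
    gain = post - pre                                    # one gain score per state
    is_treated = np.array([treated_state[state == s][0] for s in states])
    return gain[is_treated].mean() - gain[~is_treated].mean()

# Small illustration (hypothetical data: 10 states observed for 40 years).
rng = np.random.default_rng(6)
state = np.repeat(np.arange(10), 40)
year = np.tile(np.arange(40), 10)
treated = np.isin(state, np.arange(5))                   # states 0-4 adopt the policy
y = 0.5 * state + 1.0 * (treated & (year >= 20)) + rng.normal(size=state.size)
print(collapsed_did(y, state, year, treated, policy_year=20))  # near the true effect of 1.0
```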

  44. Conclusion Without randomization, causal inference is much harder and more model dependent

  45. References Abadie, A. (2000). Semiparametric Difference-in-Differences Estimators. Working Paper, Kennedy School of Government, Harvard University. Bertrand, M., Duflo, E., & Mullainathan, S. (2001). How Much Should We Trust Difference in Differences Estimators? MIT Department of Economics Working Paper Series 01-34. Meyer, B. (1995). Natural and Quasi-Natural Experiments in Economics. Journal of Business and Economic Statistics, 13, 151-162. Moulton, B. R. (1990). An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables in Micro Units. Review of Economics and Statistics, 72, 334-338.

  46. References (cont.) Newey, W. & West, K. D. (1987). A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica, 55, 703-708. Nickell, S. (1981). Biases in Dynamic Models with Fixed Effects. Econometrica, 49, 1417-1426. Rosenbaum, P. (1993). Hodges-Lehmann Point Estimates of Treatment Effect in Observational Studies. Journal of the American Statistical Association, 88, 1250-1253. Rosenbaum, P. (1996). Observational Studies and Nonrandomized Experiments. In S. Ghosh & C. R. Rao (Eds.), Handbook of Statistics, 13. Solon, G. (1984). Estimating Autocorrelations in Fixed-Effects Models. NBER Technical Working Paper No. 32.
