
What Would It Take to Change Your Inference? Quantifying the Discourse about Causal Inferences in the Social Sciences. Kenneth A. Frank, with help from Yun-jia Lo and Mike Seltzer. AERA workshop, April 4, 2014 (AERA on-line video; cost is $95).


Presentation Transcript


  1. What Would It Take to Change Your Inference? Quantifying the Discourse about Causal Inferences in the Social Sciences. Kenneth A. Frank, with help from Yun-jia Lo and Mike Seltzer. AERA workshop, April 4, 2014 (AERA on-line video; cost is $95).
Motivation: Statistical inferences are often challenged because of uncontrolled bias. There may be bias due to uncontrolled confounding variables or non-random selection into a sample. We will answer the question of what it would take to change an inference by formalizing the sources of bias and quantifying the discourse about causal inferences in terms of those sources. For example, we will transform challenges such as "But the inference of a treatment effect might not be valid because of pre-existing differences between the treatment groups" into questions such as "How much bias must there have been due to uncontrolled pre-existing differences to make the inference invalid?"
Approaches: In part I we will use Rubin's causal model to interpret how much bias there must be to invalidate an inference in terms of replacing observed cases with counterfactual cases or cases from an unsampled population. In part II, we will quantify the robustness of causal inferences in terms of correlations associated with unobserved variables or in unsampled populations. Calculations for bivariate and multivariate analysis will be presented in the spreadsheet for calculating indices [KonFound-it!], with some links to SPSS, SAS, and Stata.
Format: The format will be a mixture of presentation, individual exploration, and group work. Participants may include graduate students and professors, although all must be comfortable with basic regression and multiple regression. Participants should bring their own laptop, or be willing to work with another student who has one. Participants may choose to bring to the course an example of an inference from a published study or their own work, as well as data analyses they are currently conducting.
Materials to download: spreadsheet for calculating indices [KonFound-it!] powerpoint with examples and calculations

  2. Overview. Replacement Cases Framework (40 minutes to reflection): thresholds for inference and % bias to invalidate an inference; the counterfactual paradigm; application to concerns about non-random assignment to treatments; application to concerns about a non-random sample. Reflection (10 minutes). Examples of the replacement framework: internal validity example: effect of kindergarten retention on achievement (40 minutes to break); external validity example: effect of the Open Court curriculum on achievement. Review and reflection. Extensions of the framework. Exercise and break (20 minutes; Yun-jia Lo and Mike Seltzer support). Correlational Framework (25 minutes to exercise): how regression works; impact of a confounding variable; internal validity: impact necessary to invalidate an inference; example: effect of kindergarten retention on achievement. Exercise (25 minutes; Mike Seltzer supports on-line). External validity (30 minutes): combining estimates from different populations; example: effect of the Open Court curriculum on achievement. Conclusion (10 minutes).

  3. Materials for course https://www.msu.edu/~kenfrank/research.htm#causal

  4. Quick Survey Can you make a causal inference from an observational study?

  5. Answer: Quantifying the Discourse Can you make a causal inference from an observational study? Of course you can. You just might be wrong. It’s causal inference, not determinism. But what would it take for the inference to be wrong?

  6. I: Replacement of Cases Framework How much bias must there be to invalidate an inference? Concerns about Internal Validity • What percentage of cases would you have to replace with counterfactual cases (with zero effect) to invalidate the inference? Concerns about External Validity • What percentage of cases would you have to replace with cases from an unsampled population (with zero effect) to invalidate the inference?

  7. What Would It Take to Change an Inference? Using Rubin's Causal Model to Interpret the Robustness of Causal Inferences. Abstract: We contribute to debate about causal inferences in educational research in two ways. First, we quantify how much bias there must be in an estimate to invalidate an inference. Second, we utilize Rubin's causal model (RCM) to interpret the bias necessary to invalidate an inference in terms of sample replacement. We apply our analysis to an inference of a positive effect of the Open Court curriculum on reading achievement from a randomized experiment, and an inference of a negative effect of kindergarten retention on reading achievement from an observational study. We consider details of our framework, and then discuss how our approach informs judgment of inference relative to study design. We conclude with implications for scientific discourse. Keywords: causal inference; Rubin's causal model; sensitivity analysis; observational studies. Frank, K.A., Maroulis, S., Duong, M., and Kelcey, B. 2013. What Would It Take to Change an Inference? Using Rubin's Causal Model to Interpret the Robustness of Causal Inferences. Educational Evaluation and Policy Analysis, Vol. 35: 437-460. http://epa.sagepub.com/content/early/recent

  8. % bias necessary to invalidate the inference [figure: estimated effect with a bracket marking the % bias necessary to invalidate the inference].

  9. Quantifying the Discourse: Formalizing Bias Necessary to Invalidate an Inference. Let δ = a population effect, δ̂ = the estimated effect, and δ# = the threshold for making an inference. An inference is invalid if: δ̂ > δ# > δ. (1) That is, an inference is invalid if the estimate is greater than the threshold while the population value is less than the threshold. Defining bias as δ̂ − δ, (1) implies an estimate is invalid if and only if: bias = δ̂ − δ > δ̂ − δ#. Expressed as a proportion of the estimate, the inference is invalid if: bias/δ̂ > (δ̂ − δ#)/δ̂ = 1 − δ#/δ̂.
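The proportion-of-bias formula above can be sketched in a few lines of code (a minimal illustration of the slide's arithmetic, not the KonFound-it! spreadsheet itself; the function name is ours):

```python
def pct_bias_to_invalidate(estimate, threshold):
    """Proportion of the estimate that must be due to bias
    for the inference to be invalid: 1 - threshold/estimate."""
    return 1 - threshold / estimate

# Toy example used later in the workshop: estimate = 6, threshold = 4
print(round(pct_bias_to_invalidate(6, 4), 2))  # 0.33
```

With an estimate of 6 and a threshold of 4, one third of the estimate must be bias before the inference becomes invalid.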

  10. % bias necessary to invalidate the inference [figure: estimated effect with the threshold δ# marked].

  11. Interpretation of % Bias to Invalidate an Inference % Bias is intuitive Relates to how we think about statistical significance Better than “highly significant” or “barely significant” But need a framework for interpreting

  12. Framework for Interpreting % Bias to Invalidate an Inference: Rubin’s Causal Model and the Counterfactual • I have a headache • I take an aspirin (treatment) • My headache goes away (outcome) Q) Is it because I took the aspirin? • We’ll never know – it is counterfactual – for the individual This is the Fundamental Problem of Causal Inference

  13. Definition of Replacement Cases as Counterfactual: Potential Outcomes. Definition of the treatment effect for individual i: δi = Yit − Yic, where Yit and Yic are i's potential outcomes under treatment and control. The fundamental problem of causal inference is that we cannot simultaneously observe Yit and Yic.

  14. Fundamental Problem of Inference and Approximating the Counterfactual with Observed Data (Internal Validity). But how well does the observed data approximate the counterfactual? [Figure: treated cases have observed treatment outcomes 9, 10, 11 but unknown counterfactual control outcomes (6?, 6?, 6?); control cases have observed control outcomes 3, 4, 5 but unknown counterfactual treatment outcomes.]

  15. Symbolic: Fundamental Problem of Inference and Approximating the Counterfactual with Observed Data (Internal Validity). But how well does the observed data approximate the counterfactual? [Figure: Yt|X=t and Yc|X=c are observed; Yc|X=t and Yt|X=c are the unobserved counterfactuals (the 6? values).]

  16. Approximating the Counterfactual with Observed Data. But how well does the observed data approximate the counterfactual? [Figure: the treated cases' observed outcomes are 9, 10, 11 (mean 10); their counterfactual control outcomes are 8, 9, 10, so the treatment effect is 1 for each case. Using the observed control cases 3, 4, 5 (mean 4) instead, the treatment effect of 1 is overestimated as 10 − 4 = 6.]

  17. Using the Counterfactual to Interpret % Bias to Invalidate the Inference. How many cases would you have to replace with zero-effect counterfactuals to change the inference? Assume the threshold is 4 (δ# = 4): 1 − δ#/δ̂ = 1 − 4/6 = .33 = 1/3. [Figure: observed treated outcomes 9, 10, 11 and controls 3, 4, 5, with one case replaced by a zero-effect counterfactual (values 6, 6, 6), lowering the estimate from 6 to 4.] The inference would be invalid if you replaced 33% of cases (here, 1 case) with counterfactuals for which there was no treatment effect. New estimate = (1 − % replaced) × δ̂ + % replaced × (no effect) = (1 − % replaced) × δ̂ = (1 − .33) × 6 ≈ 4.
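The replacement arithmetic can be checked directly (a toy sketch of the slide's numbers; the function name is ours, not the workshop's spreadsheet):

```python
def new_estimate(estimate, pct_replaced, replacement_effect=0.0):
    """Estimate after replacing a fraction of cases with cases whose
    effect is replacement_effect (zero for no-effect counterfactuals)."""
    return (1 - pct_replaced) * estimate + pct_replaced * replacement_effect

# Replace 1/3 of cases (original estimate 6) with zero-effect counterfactuals:
print(round(new_estimate(6, 1/3), 6))  # 4.0, exactly the threshold
```

Replacing one third of the cases with zero-effect counterfactuals pulls the estimate down from 6 to the threshold of 4, which is what makes 33% the tipping point.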

  18. Which Cases to Replace. Think of it as an expectation: if you randomly replaced 1 case and repeated this 1,000 times, on average the new estimate would be 4. This assumes a constant treatment effect, conditioning on covariates and interactions already in the model.

  19. % bias necessary to invalidate the inference [figure: estimated effect with the threshold δ# marked]. To invalidate the inference, replace 33% of cases with counterfactual data with zero effect.

  20. Fundamental Problem of Inference and Approximating the Unsampled Population with Observed Data (External Validity). How many cases would you have to replace with cases with zero effect to change the inference? Assume the threshold is δ# = 4: 1 − δ#/δ̂ = 1 − 4/6 = .33 = 1/3. [Figure: observed sampled cases Yt|Z=p (9, 10, 11) and Yc|Z=p (3, 4, 5) alongside zero-effect cases Yt|Z=p´ and Yc|Z=p´ (values 6) from the unsampled population; replacing one third of the cases lowers the estimate from 6 to 4.]

  21. % bias necessary to invalidate the inference [figure: estimated effect with the threshold δ# marked]. To invalidate the inference, replace 33% of cases with cases from an unsampled population with zero effect.

  22. Review & Reflection. Review of the framework: pragmatism → thresholds; how much an estimate exceeds the threshold → % bias to invalidate the inference. Interpretation via Rubin's causal model: internal validity: % bias to invalidate → number of cases that must be replaced with counterfactual cases (for which there is no effect); external validity: % bias to invalidate → number of cases that must be replaced with cases from an unsampled population (for which there is no effect). Reflect: Which part is most confusing to you? Is there more than one interpretation? Discuss with a partner or two.

  23. Example of Internal Validity from an Observational Study: The Effect of Kindergarten Retention on Reading and Math Achievement (Hong and Raudenbush 2005). 1. What is the average effect of kindergarten retention policy? (Example used here.) Should we expect to see a change in children's average learning outcomes if a school changes its retention policy? Propensity-based questions (not explored here): 2. What is the average impact of a school's retention policy on children who would be promoted if the policy were adopted? Use principal stratification. Hong, G. and Raudenbush, S. (2005). Effects of Kindergarten Retention Policy on Children's Cognitive Growth in Reading and Mathematics. Educational Evaluation and Policy Analysis, Vol. 27, No. 3, pp. 205-224.

  24. Data. Early Childhood Longitudinal Study, Kindergarten cohort (ECLS-K), from the US National Center for Education Statistics (NCES). Nationally representative; kindergarten and 1st grade, observed Fall 1998, Spring 1999, and Spring 2000. Student measures: background and educational experiences; math and reading achievement (the dependent variable); experience in class; parenting information and style; teacher assessment of the student; school conditions. Analytic sample: 1,080 schools that do retain some children; 471 kindergarten retainees; 10,255 promoted students.

  25. Effect of Retention on Reading Scores (Hong and Raudenbush) [figure].

  26. Possible Confounding Variables (note: they controlled for these). Gender; two-parent household; poverty; mother's level of education (especially relevant for reading achievement). Extensive pretests, measured in the Spring of 1999 (at the beginning of the second year of school): standardized measures of reading ability, math ability, and general knowledge; indirect assessments of literature, math, and general knowledge that include aspects of a child's process as well as product; teacher's rating of the child's skills in language, math, and science.

  27. Calculating the % Bias to Invalidate the Inference: Obtain the Spreadsheet. From https://www.msu.edu/~kenfrank/research.htm#causal, choose the spreadsheet for calculating indices [KonFound-it!] and access the spreadsheet.

  28. Calculating % Bias to Invalidate an Inference Choose % bias to invalidate

  29. Obtain t critical, estimated effect, and standard error. Estimated effect (δ̂) = -9.01; standard error = .68; n = 7,168 + 471 = 7,639; df > 500, so t critical = -1.96. From: Hong, G. and Raudenbush, S. (2005). Effects of Kindergarten Retention Policy on Children's Cognitive Growth in Reading and Mathematics. Educational Evaluation and Policy Analysis, Vol. 27, No. 3, pp. 205-224.

  30. Calculating the % Bias to Invalidate the Inference: Entering Values and Calculating. δ̂ = the estimated effect = -9.01; standard error = .68; t critical = -1.96. δ# = the threshold for making an inference = se × t critical (df > 230) = .68 × -1.96 = -1.33 [user can specify an alternative threshold]. % bias necessary to invalidate the inference = 1 − δ#/δ̂ = 1 − (-1.33)/(-9.01) = 85%. 85% of the estimate must be due to bias to invalidate the inference.
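The kindergarten-retention numbers can be reproduced with a short script (a sketch of the slide's arithmetic; the KonFound-it! spreadsheet is the workshop's actual tool, and the function name here is ours):

```python
def pct_bias_to_invalidate(estimate, se, t_critical):
    """% bias to invalidate: 1 - threshold/estimate,
    where threshold = se * t_critical."""
    threshold = se * t_critical
    return 1 - threshold / estimate

# Kindergarten retention (Hong & Raudenbush 2005):
# estimate -9.01, se .68, t critical -1.96
print(round(pct_bias_to_invalidate(-9.01, 0.68, -1.96), 2))  # 0.85
```

Note that the signs cancel: with a negative estimate and a negative threshold, 1 − δ#/δ̂ still gives the proportion of the estimate that must be bias.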

  31. Using the Counterfactual to Interpret % Bias to Invalidate the Inference. How many cases would you have to replace with zero-effect counterfactuals to change the inference? Assume the threshold is 4 (δ# = 4): 1 − δ#/δ̂ = 1 − 4/6 = .33 = 1/3. [Figure: observed treated outcomes 9, 10, 11 and controls 3, 4, 5, with one case replaced by a zero-effect counterfactual (values 6, 6, 6), lowering the estimate from 6 to 4.] The inference would be invalid if you replaced 33% of cases (here, 1 case) with counterfactuals for which there was no treatment effect. New estimate = (1 − % replaced) × δ̂ + % replaced × (no effect) = (1 − % replaced) × δ̂ = (1 − .33) × 6 ≈ 4.

  32. Example Replacement of Cases with Counterfactual Data to Invalidate Inference of an Effect of Kindergarten Retention Counterfactual: No effect Retained Promoted Original cases that were not replaced Replacement counterfactual cases with zero effect Original distribution

  33. Interpretation. 1) Consider test scores of a set of children who were retained that are considerably lower (9 points) than those of others who were candidates for retention but who were in fact promoted. No doubt some of the difference is due to advantages the comparable others had before being promoted. But to believe that retention did not have an effect, one must believe that 85% of those comparable others would have enjoyed most (7.2 points) of their advantage whether or not they had been retained. This is even after controlling for differences on pretests, mother's education, etc. 2) The replacement cases would come from the counterfactual condition for the observed outcomes. That is, 85% of the observed potential outcomes must be unexchangeable with the unobserved counterfactual potential outcomes, such that it is necessary to replace those 85% with the counterfactual potential outcomes to invalidate the inference in this sample. Note that this replacement must occur even after observed cases have been conditioned on background characteristics, school membership, and the pretests used to define comparable groups. 3) Could 85% of the children have manifested an effect because of unadjusted differences (even after controlling for prior achievement, motivation, and background) rather than retention itself?

  34. Evaluation of % Bias Necessary to Invalidate the Inference. Compare the bias necessary to invalidate the inference with the bias accounted for by background characteristics: 1% of the estimated effect is accounted for by background characteristics (including mother's education) once controlling for pretests, so more than 85 times more unmeasured bias would be necessary to invalidate the inference. Also compare with the % bias necessary to invalidate inferences in other studies, using the correlation metric, which adjusts for differences in scale.

  35. % Bias Necessary to Invalidate the Inference Based on Correlation, to Compare Across Studies. Convert the estimate to a correlation: r = t/√(t² + df), where t is taken from the HLM (t = -9.01/.68 = -13.25), n is the sample size, q is the number of parameters estimated, and df = n − q. The threshold correlation uses the critical value of t for df > 200. Here r = -.150 and the threshold correlation is -.022, so % bias to invalidate the inference = 1 − .022/.150 = 85%. This accounts for changes in both the regression coefficient and the standard error, because t(r) = t(β).
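The correlation metric can be sketched as follows. The slide does not report q (the number of parameters in the HLM), so the df value below is illustrative, chosen to reproduce the slide's reported r ≈ -.150; the function name is ours:

```python
import math

def r_from_t(t, df):
    """Convert a t-statistic to a correlation: r = t / sqrt(t^2 + df)."""
    return t / math.sqrt(t * t + df)

# Retention example: t = -9.01/.68 = -13.25; df = n - q (illustrative value)
t_est, t_crit, df = -13.25, -1.96, 7627
r_est = r_from_t(t_est, df)         # roughly -.150
r_threshold = r_from_t(t_crit, df)  # roughly -.022
print(round(1 - r_threshold / r_est, 2))  # 0.85
```

Because the same transformation is applied to the estimate and the threshold, the resulting % bias matches the raw-coefficient calculation, while the correlation scale makes studies comparable.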

  36. Calculating % Bias to Invalidate in terms of Correlations to Compare Across Studies Choose impact and replacement

  37. Compare with Bias in Other Observational Studies [figure].

  38. % Bias to Invalidate the Inference for observational studies published on-line in EEPA, July 24-Nov 15, 2012 [figure; the kindergarten retention effect is marked].

  39. Exercise 1 : % Bias necessary to Invalidate an Inference • Take an example from an observational study in your own data or an article • Calculate the % bias necessary to invalidate the inference • Interpret the % bias in terms of sample replacement • What are the possible sources of bias? • Would they all work in the same direction? • Debate your inference with a partner

  40. Application to Randomized Experiment: Effect of Open Court Curriculum on Reading Achievement • Open Court “scripted” curriculum versus business as usual • 917 elementary students in 49 classrooms • Comparisons within grade and school • Outcome Measure: Terra Nova comprehensive reading score Borman, G. D., Dowling, N. M., and Schneck, C. (2008). A multi-site cluster randomized field trial of Open Court Reading. Educational Evaluation and Policy Analysis, 30(4), 389-407.

  41. Value of Randomization. Few differences between groups. But randomization was done at the classroom level: teachers might talk to each other, and school-level randomization is expensive (Slavin, 2009).

  42. [Figure: sample sizes by condition] n = 27 + 22 = 49 classrooms.

  43. Differences between Open Court and Business as Usual. Difference across grades: about 10 units, or 7.95 using the statistical model. "Statistically significant": unlikely (probability < 5%) to have occurred by chance alone if there were really no differences in the population. But is the inference about Open Court valid in other contexts?

  44. Obtaining the number of parameters estimated, t critical, estimated effect, and standard error. 3 parameters estimated; df = number of classrooms − number of parameters estimated = 49 − 3 = 46; t critical = t.05, df=46 = 2.013. Estimated effect (δ̂) = 7.95; standard error = 1.83.

  45. Quantifying the Discourse for Borman et al.: What would it take to change the inference? δ = a population effect; δ̂ = the estimated effect = 7.95; δ# = the threshold for making an inference = se × t critical (df = 46) = 1.83 × 2.013 = 3.68. % bias necessary to invalidate the inference = 1 − δ#/δ̂ = 1 − 3.68/7.95 = 54%. 54% of the estimate must be due to bias to invalidate the inference.

  46. Calculating the % Bias to Invalidate the Inference: Entering Values and Calculating. δ̂ = the estimated effect = 7.95; standard error = 1.83; t critical = 2.013. δ# = the threshold for making an inference = se × t critical (df = 46) = 1.83 × 2.013 = 3.68 [user can override to specify a threshold]. % bias necessary to invalidate the inference = 1 − δ#/δ̂ = 1 − 3.68/7.95 = 54%. 54% of the estimate must be due to bias to invalidate the inference.
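As with the retention example, the Open Court numbers can be checked in a few lines (a sketch of the slide's arithmetic, not the KonFound-it! spreadsheet; the function name is ours):

```python
def pct_bias_to_invalidate(estimate, se, t_critical):
    """% bias to invalidate: 1 - threshold/estimate, threshold = se * t_critical."""
    return 1 - (se * t_critical) / estimate

# Open Court (Borman et al. 2008): estimate 7.95, se 1.83, t critical (df=46) 2.013
print(round(pct_bias_to_invalidate(7.95, 1.83, 2.013), 2))  # 0.54
```

The randomized experiment's inference tolerates less bias (54%) than the retention study's (85%) simply because its estimate exceeds its threshold by a smaller margin.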

  47. % Exceeding Threshold for Open Court Estimated Effect. [Figure: estimated effect 7.95 with threshold δ# = 3.68 marked; 54% above threshold = 1 − 3.68/7.95 = .54.] 54% of the estimate must be due to bias to invalidate the inference.
