150 likes | 267 Vues
This paper explores estimating causal effects using observational data and highlights the challenges involved when treatment assignment is not random. We define causality operationally and present counterfactuals to establish mean treatment effects. Under specific assumptions regarding covariates, we show that mean treatment effects can be estimated from observed data. We introduce a semiparametric approach, utilizing shift functions and structural nested models to better assess treatment effects. The work emphasizes the importance of satisfying assumptions to draw valid conclusions in observational studies.
E N D
Estimating a causal effect using observational data Aad van der Vaart Afdeling Wiskunde, Vrije Universiteit Amsterdam Joint with Jamie Robins, Judith Lok, Richard Gill
CAUSALITY Operational Definition: If individuals are randomly assigned to a treatment and control group, and the groups differ significantly after treatment, then the treatment causes the difference We want to apply this definition with observational data
Counter factuals treatment indicator A {0,1} outcome Y Given observations (A, Y) for a sample of individuals, mean treatment effect might be defined as E( Y | A=1 ) – E( Y | A=0 ) However, if treatment is not randomly assigned this is NOT what we want to know
Counter factuals (2) treatment indicator A {0,1} outcome Y outcome Y1 if individual had been treated outcome Y0 if individual had not been treated mean treatment effect E Y1 – E Y0 Unfortunately, we observe only one of Y1and Y0, namely: Y= YA
Counter factuals (3) ASSUMPTION: there exists a measured covariate Z with A (Y0, Y1 ) given Z Under ASSUMPTION: E Y1 – E Y0 = {E (Y | A=1, Z=z) - E (Y | A=1, Z=z) } dPZ(z) CONSEQUENCE: underASSUMPTION the mean treatment effect is estimable from the observed data (Y,Z,A) ASSUMPTION is more likely to hold if Z is “bigger” means “are statistically independent”
Longitudinal Data times: t0 < t1 < . . . . < tK treatments: a = (a0, a1, . . . , aK ) observed treatments: A = (A0, A1, . . . , AK ) counterfactual outcomes: Ya observed outcome: YA We are interested inE Yafor certaina
Longitudinal Data (2) times: t0 < t1 < . . . . < tK treatments: a = (a0, a1, . . . , aK ) observed treatments: A = (A0, A1, . . . , AK ) observed covariates: Z = (Z0, Z1, . . . , ZK ) ASSUMPTION: Ya Ak given ( Zk , Ak-1 ), for all k Under ASSUMPTION E Ya can be expressed in the distribution of the observed data (Y, Z, A ) “It is the task of an epidemiologist to collect enough information so that ASSUMPTION is satisfied”
Estimation and Testing • Under ASSUMPTION it is possible, in principle • to test whether treatment has effect • to estimate the mean counterfactual treatment effects A standard statistical approach would be to model and estimate all unknowns. However there are too many. We look for a “semiparametric approach” instead.
Shift function The quantile-distribution shift function is the (only monotone) function that transforms a variable “distributionally” into another variable. It is convenient to model a change in distribution.
treatment until time k outcome of this treatment shift map corresponding to these distributions, transforms into IDEA: modelby a parameter and estimate it Structural Nested Models
treatment until time k outcome of this treatment transforms into negative effect no effect tk-1 tk time Structural Nested Models (2) positive effect no effect negative effect
Under ASSUMPTION: • is distributed as • Make regression model for • Make model for g • Add as explanatory variable • Estimate g by the value such that does NOT add explanatory value. Estimation
Estimation (2) Example: if treatment A is binary, then we might use a logistic regression model We estimate (a , b, d ) by standard software for given g. The “true” g is the one such that the estimated d is zero. We can also test whether treatment has an effect at all by testing H0: d=0 in this model with Y instead of Yg .
End Lok, Gill, van der Vaart, Robins, 2004, Estimating the causal effect of a time-varying treatment on time-to-event using structural nested failure time models Lok, 2001 Statistical modelling of causal effects in time Proefschrift, Vrije Universiteit