Causal Inference with Multiple Treatments Yeying Zhu Department of Statistics and QuaSSI Predoctoral Fellow
Problem • Many observational studies have more than two treatment groups • Example: • A survival study of intrahepatic cholangiocarcinoma (IHC) [Shinohara et al., 2008] • The data are collected from the SEER database • Four treatments: No treatment (Control), Radiation only, Surgery only, Surgery and radiation • Potential confounders: Age, Race/ethnicity, Stage of the cancer and Year of diagnosis • Traditional way: compare “No treatment” to “Radiation only” and “Surgery only” to “Surgery and radiation”, which answers whether the use of radiation can improve the survival rate • How do we compare “Surgery only” to “No treatment”, or “Surgery only” to the other three treatments?
Outline • Review of Causal Inference with Binary Treatments • Rubin’s Causal Model • Assumptions • Propensity Scores • Estimation of the Average Causal Effect • Causal Inference with Multiple Treatments • Parameters of Interest • Propensity Scores: definition and modeling • Estimation • Application
Rubin’s Causal Model • In an observational study: • Treatment assignment is usually not randomized • The causal model draws inferences about the possible effect of a treatment on subjects • A subject either receives the treatment or does not. Each subject has a hypothetical (potential) outcome under both states, and the causal effect is defined as the difference between these two potential outcomes • For any subject, only one of the two potential outcomes is observed
Diagram • Observational studies: X denotes pre-treatment variables that may jointly influence treatment and outcome; the treatment and non-treatment groups are systematically different with respect to X • [Figure: causal diagram with nodes Treatment (T), Outcome (Y) and Confounder (X)]
Notation • $T_i$ = treatment for individual $i$ (1 = treated, 0 = control/untreated) • $Y_i$ = the observed outcome • $Y_i(0)$ = the potential outcome if untreated • $Y_i(1)$ = the potential outcome if treated • $D_i = Y_i(1) - Y_i(0)$: causal effect for subject $i$ • The relationship between the observed outcome and the potential outcomes: $Y_i = T_i Y_i(1) + (1 - T_i) Y_i(0)$
Causal Inference • The average causal effect: $\mathrm{ACE} = E[Y(1) - Y(0)]$ • The average causal effect among the treated: $\mathrm{ACET} = E[Y(1) - Y(0) \mid T = 1]$
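The notation can be made concrete with a toy table in which both potential outcomes are written down for every subject. This is only a sketch: the four subjects and their values are hypothetical, and in real data only one potential outcome per subject is ever observed, so ACE and ACET cannot be computed this way.

```python
# Hypothetical subjects: (T_i, Y_i(0), Y_i(1)). Both potential outcomes are
# listed only for illustration; real data never reveal both.
units = [
    (1, 2.0, 5.0),
    (0, 1.0, 2.0),
    (1, 4.0, 4.0),
    (0, 3.0, 6.0),
]

# Observed outcome: Y_i = T_i * Y_i(1) + (1 - T_i) * Y_i(0)
observed = [t * y1 + (1 - t) * y0 for t, y0, y1 in units]

# Individual causal effects D_i = Y_i(1) - Y_i(0)
effects = [y1 - y0 for _, y0, y1 in units]

# ACE averages D_i over everyone; ACET averages D_i over the treated only.
ace = sum(effects) / len(effects)
treated_effects = [d for (t, _, _), d in zip(units, effects) if t == 1]
acet = sum(treated_effects) / len(treated_effects)

print(observed, ace, acet)
```

In this toy table ACE and ACET differ because the treated subjects happen to have different individual effects from the untreated ones.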
Assumptions • Assumption 1: Common Support Condition (CSC): $0 < P(T_i = 1 \mid X_i) < 1$ • If $P(T_i = 1 \mid X_i) = 0$ or $P(T_i = 0 \mid X_i) = 0$, it is not meaningful to speak of a causal effect for that individual • Assumption 2: Stable Unit Treatment Value Assumption (SUTVA) • The treatment applied to one subject does not affect the outcome of any other subject • There is no competition for resources • Assumption 3: Strong Ignorability of Treatment Assignment: $T \perp \{Y(0), Y(1)\} \mid X$ • In a randomized study, $T \perp \{Y(0), Y(1)\}$ • $X$ contains all potential confounders; no unmeasured confounders are left • The assumption cannot be verified from the data; it becomes more plausible as the set of covariates in $X$ grows larger
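Estimation • For a randomized study, $T \perp \{Y(0), Y(1)\}$. Therefore, the treated-group sample mean $\bar{Y}_1 = \sum_i T_i Y_i / \sum_i T_i$ is an unbiased estimator of $E[Y(1)]$, and the control-group sample mean $\bar{Y}_0 = \sum_i (1 - T_i) Y_i / \sum_i (1 - T_i)$ is an unbiased estimator of $E[Y(0)]$ • ACE is estimated as $\bar{Y}_1 - \bar{Y}_0$ • However, for observational studies, the above properties do not hold. How can we estimate ACE and ACET in observational studies?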
Propensity Scores • Definition: the probability of receiving the treatment given $X$, $e(X) = P(T = 1 \mid X)$ • Rosenbaum and Rubin (1983): $T \perp \{Y(0), Y(1)\} \mid X$ implies $T \perp \{Y(0), Y(1)\} \mid e(X)$ • Bear in mind: at any value of the propensity score, the difference between the treatment and control means is an unbiased estimate of the average treatment effect at that value of the propensity score
Drawing Causal Inference • Estimation of the average causal effect based on the strong ignorability assumption • Matching • Stratification • Inverse Probability Weighting • D'Agostino, R. B., Jr. (1998): Tutorial in biostatistics: propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group.
Matching • Matching: • Select control/untreated subjects who are ‘matched’ with the treated subjects on background covariates ($X$) • The matched subjects represent a ‘quasi-randomized’ sample • However, matching directly on $X$ suffers from the “curse of dimensionality” when there are many covariates • Propensity score matching: • Select control/untreated subjects who are ‘matched’ with the treated subjects on propensity scores ($e(X)$) • This allows an investigator to match on a single scalar
Nearest available matching on the estimated propensity score: • Step 1: Randomly order the treated and control subjects • Step 2: Select the first treated subject and find the control subject with the closest propensity score • Step 3: Remove both subjects from the remaining sample • Step 4: Repeat steps 2 and 3 until all treated subjects are matched. The matched pairs form the new sample.
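The matching steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name, the dictionary inputs and the toy propensity scores are all assumptions made for the example, and the scores are taken as already estimated.

```python
import random

def nearest_available_matching(treated, controls, seed=0):
    """Greedy 1:1 matching without replacement.

    treated, controls: dicts mapping a subject id to its estimated
    propensity score (hypothetical input format).
    Returns a list of (treated_id, control_id) pairs.
    """
    rng = random.Random(seed)
    treated_ids = list(treated)
    control_ids = list(controls)
    rng.shuffle(treated_ids)   # randomly order the treated subjects
    rng.shuffle(control_ids)   # randomly order the control subjects
    pairs = []
    for tid in treated_ids:
        if not control_ids:    # no controls left to match
            break
        # control with the closest propensity score among those still available
        cid = min(control_ids, key=lambda c: abs(controls[c] - treated[tid]))
        pairs.append((tid, cid))
        control_ids.remove(cid)  # the matched control leaves the pool
    return pairs

# Toy scores, invented for the example.
treated_scores = {"t1": 0.80, "t2": 0.30}
control_scores = {"c1": 0.75, "c2": 0.35, "c3": 0.10}
pairs = nearest_available_matching(treated_scores, control_scores)
print(sorted(pairs))
```

Because the procedure is greedy, the result can depend on the random order when several treated subjects compete for the same control; the toy scores are chosen so that no such conflict arises.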
Stratification • Divide the sample into $K$ groups (strata) such that, within each group, the propensity scores are close to each other • Pool the treatment group and the control group, and use the quantiles of the propensity scores as the cut-offs between strata • Within each stratum, the ACE is estimated as the difference between the average outcomes of the treatment group and the control group • Rosenbaum and Rubin (1984): five strata remove 90 per cent of the bias in each of the covariates
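A small sketch of the stratification estimator follows. Two choices here are assumptions rather than something stated on the slide: the within-stratum differences are combined by a size-weighted average, and strata that lack either group are skipped. The function name and the toy data are hypothetical.

```python
def stratified_ace(scores, treatments, outcomes, k=5):
    """Estimate the ACE by stratifying on propensity-score quantiles."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    total, used = 0.0, 0
    for s in range(k):
        # indices of the s-th quantile group of the pooled sample
        idx = order[s * n // k:(s + 1) * n // k]
        treated = [outcomes[i] for i in idx if treatments[i] == 1]
        control = [outcomes[i] for i in idx if treatments[i] == 0]
        if not treated or not control:
            continue  # assumption: skip strata without both groups
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        total += diff * len(idx)  # assumption: weight by stratum size
        used += len(idx)
    if used == 0:
        raise ValueError("no stratum contains both treated and control subjects")
    return total / used

# Toy data: the true effect is 2 in both strata, but treated subjects are
# concentrated where outcomes are high, so the raw comparison is biased.
scores = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
treatments = [0, 0, 0, 1, 1, 1, 1, 0]
outcomes = [1.0, 1.0, 1.0, 3.0, 7.0, 7.0, 7.0, 5.0]

naive = (sum(y for y, t in zip(outcomes, treatments) if t == 1) / 4
         - sum(y for y, t in zip(outcomes, treatments) if t == 0) / 4)
stratified = stratified_ace(scores, treatments, outcomes, k=2)
print(naive, stratified)
```

With these numbers the raw difference in means is 4, while the stratified estimate recovers the effect of 2 that was built into the toy data.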
Inverse Probability Weighting • Weighting accounts for unequal probabilities of inclusion in a study and yields an unbiased estimator of the causal effect • When estimating ACE, a treated person receives a weight of $1/e(X)$ while a control person receives a weight of $1/(1 - e(X))$ • When the objective is to estimate ACET, a treated person receives a weight of 1 while a control person receives a weight of $e(X)/(1 - e(X))$ • The estimated average causal effect (ACE) is: $\widehat{\mathrm{ACE}} = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{T_i Y_i}{e(X_i)} - \frac{(1 - T_i) Y_i}{1 - e(X_i)} \right]$
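The two weighting schemes can be sketched as follows. The function names and toy data are hypothetical, the propensity scores are treated as known, and the ACET version normalizes the control weights to sum to one, which is a common choice but an assumption here.

```python
def ipw_ace(e, t, y):
    """ACE with weights 1/e(X) for treated and 1/(1 - e(X)) for controls."""
    n = len(y)
    return sum(ti * yi / ei - (1 - ti) * yi / (1 - ei)
               for ei, ti, yi in zip(e, t, y)) / n

def ipw_acet(e, t, y):
    """ACET with weight 1 for treated and e(X)/(1 - e(X)) for controls."""
    treated_mean = sum(ti * yi for ti, yi in zip(t, y)) / sum(t)
    weights = [(1 - ti) * ei / (1 - ei) for ei, ti in zip(e, t)]
    # assumption: control weights are normalized by their sum
    control_mean = sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
    return treated_mean - control_mean

# Toy data with a built-in effect of 2 and propensities 0.25 / 0.75.
e = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
t = [0, 0, 0, 1, 1, 1, 1, 0]
y = [1.0, 1.0, 1.0, 3.0, 7.0, 7.0, 7.0, 5.0]

ace_hat = ipw_ace(e, t, y)
acet_hat = ipw_acet(e, t, y)
print(ace_hat, acet_hat)
```

In practice $e(X)$ has to be estimated, and scores close to 0 or 1 produce very large weights, which is one reason the common support condition matters.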
Causal Inference with Multiple Treatments • Many observational studies have more than two treatment groups • Compared to causal inference with binary treatments: • The definition of “treatment effect” is more complicated • Modeling the treatment assignment is less straightforward • Multinomial models are needed • The models should account for the similarity among different treatments
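Extension of Rubin’s Causal Model • $T_i$ = treatment for individual $i$ • $T_i \in \mathcal{T}$, where $\mathcal{T}$ is a set of treatments • For example, in the IHC study $\mathcal{T}$ = {No treatment, Radiation only, Surgery only, Surgery and radiation} • $Y_i$ = the observed outcome • $Y_i(t)$ = the potential outcome if receiving treatment $t$ • $X_i$ = pre-treatment covariates; potential confounders • $D_i(t) = I\{T_i = t\}$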
Parameters of Interest • The average causal effect of treatment $t$ relative to treatment $k$: $E[Y_i(t) - Y_i(k)]$ • The average causal effect of treatment $t$ relative to treatment $k$ among those who receive treatment $t$: $E[Y_i(t) - Y_i(k) \mid D_i(t) = 1]$ • The average causal effect of treatment $t$ relative to all other treatments among those who receive treatment $t$: $E[Y_i(t) - Y_i(\bar{t}) \mid D_i(t) = 1]$, where $\bar{t}$ stands for the treatments other than $t$
Ignorability Assumption • Weak Ignorability of Treatment Assignment: $D(t) \perp Y(t) \mid X$ • Strong Ignorability of Treatment Assignment: $T \perp Y(t) \mid X$ • Redefine the propensity score as: $P_t(X) = P(D(t) = 1 \mid X)$ • Imbens (2000) shows: $D(t) \perp Y(t) \mid X$ implies $D(t) \perp Y(t) \mid P_t(X)$
Propensity Score Modeling • Tchernis et al. (2005): On the use of discrete choice models for causal inference • Multinomial logit model • Nested logit model • Multinomial probit model
Multinomial Logit Model • The multinomial logit model assumes $P(T_i = t \mid X_i) = \exp(X_i^\top \beta_t) / \sum_{k} \exp(X_i^\top \beta_k)$ • Independence from irrelevant alternatives (IIA): the ratio of the probabilities of two treatments does not depend on information about the other treatments
Violation of IIA Assumption • Three drugs: drug A, drug B and drug C • Assume drug A and drug B are identical • The probabilities are 0.25, 0.25 and 0.5, so the ratio of the probability of drug A to that of drug C is 1:2 • If drug B is taken off the market, its users switch to the identical drug A and the ratio of drug A to drug C becomes 1:1 • IIA is violated, and the multinomial logit model is not the correct model • Tchernis et al. (2005) show that models which do not account for potential similarity between treatments lead to causal estimates that perform poorly when treatments are indeed correlated
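The drug example can be checked numerically. The sketch below computes multinomial logit probabilities from linear predictors chosen (as an assumption, purely for illustration) to reproduce the 0.25, 0.25, 0.5 split, then removes drug B and recomputes. The function name `mnl_probs` is hypothetical.

```python
import math

def mnl_probs(predictors):
    """Multinomial logit probabilities from a dict of linear predictors."""
    m = max(predictors.values())  # subtract the max for numerical stability
    expv = {k: math.exp(v - m) for k, v in predictors.items()}
    z = sum(expv.values())
    return {k: v / z for k, v in expv.items()}

# Linear predictors chosen so that P(A), P(B), P(C) = 0.25, 0.25, 0.5.
full = mnl_probs({"A": 0.0, "B": 0.0, "C": math.log(2.0)})

# Drug B leaves the market; the logit model simply renormalizes A and C.
reduced = mnl_probs({"A": 0.0, "C": math.log(2.0)})

ratio_full = full["A"] / full["C"]
ratio_reduced = reduced["A"] / reduced["C"]
print(full, reduced, ratio_full, ratio_reduced)
```

Under the logit model the ratio of A to C stays at 1:2 after B is removed (A gets 1/3 and C gets 2/3), whereas the slide's scenario with identical drugs A and B calls for 1:1. That gap is the IIA violation.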
Nested Logit Model • Relaxes the IIA assumption • Divide the treatments into classes and specify the treatment assignment sequentially; treatments within a class can be similar • Assume there are $C$ classes, $J_c$ is the number of treatments in class $c$, and $\lambda_c$ represents the dissimilarity between treatments within class $c$ • $1 - \lambda_c$ measures the average correlation between treatments in the same class • If $\lambda_c = 1$ for every class, the model reduces to the multinomial logit model
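Multinomial Probit Model • A latent variable $U$ (utility) is defined: $U_{it} = X_i^\top \beta_t + \varepsilon_{it}$, with $(\varepsilon_{i1}, \dots, \varepsilon_{iJ}) \sim N(0, \Sigma)$ for $J$ treatments • The covariance matrix $\Sigma$ represents the correlation among different treatments and is not restricted to a diagonal matrix • The subject receives the treatment which yields the largest utility: $T_i = \arg\max_t U_{it}$ • The propensity score $P_t(X)$ is estimated as: $\hat{P}_t(X_i) = P(U_{it} > U_{ik} \text{ for all } k \neq t \mid X_i)$, evaluated at the estimated parameters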
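For given parameters, the probability that each treatment has the largest utility can be approximated by simulation. The sketch below is only an illustration under stated assumptions: the mean utilities and covariance matrices are hypothetical inputs rather than fitted values, the helper names are invented, and estimating the parameters themselves is not shown.

```python
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def probit_propensities(means, cov, draws=20000, seed=0):
    """Monte Carlo estimate of P(treatment t has the largest utility)."""
    rng = random.Random(seed)
    low = cholesky(cov)
    n = len(means)
    counts = [0] * n
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # correlated utilities: mean + L z, with L the Cholesky factor
        u = [means[i] + sum(low[i][k] * z[k] for k in range(i + 1))
             for i in range(n)]
        counts[u.index(max(u))] += 1
    return [c / draws for c in counts]

means = [0.0, 0.0, 0.0]  # hypothetical mean utilities for A, B, C
independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
similar_ab = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]

p_indep = probit_propensities(means, independent)
p_corr = probit_propensities(means, similar_ab)
print(p_indep, p_corr)
```

With independent errors the three probabilities are close to 1/3 each. When the errors of A and B are strongly correlated, the two behave almost like a single alternative and the probability of C rises well above 1/3, which is the kind of similarity the non-diagonal covariance matrix is meant to capture.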
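Estimation • The average causal effect of treatment $t$ relative to treatment $k$: $E[Y_i(t) - Y_i(k)]$ • Using the following property: $E[Y_i(t)] = E[D_i(t) Y_i / P_t(X_i)]$ • Inverse probability weighting: $\frac{1}{n} \sum_{i=1}^{n} \frac{D_i(t) Y_i}{P_t(X_i)} - \frac{1}{n} \sum_{i=1}^{n} \frac{D_i(k) Y_i}{P_k(X_i)}$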
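A sketch of inverse probability weighting with more than two treatments follows. The function name, the per-subject dictionaries of propensity scores and the toy data are all hypothetical, and the scores are taken as known instead of estimated from one of the multinomial models.

```python
def ipw_pairwise_effect(y, treatment, propensity, t, k):
    """IPW estimate of E[Y(t) - Y(k)].

    propensity[i] is a dict mapping each treatment label to P_label(X_i)
    (hypothetical input format).
    """
    n = len(y)
    mean_t = sum(yi / p[t]
                 for yi, ti, p in zip(y, treatment, propensity) if ti == t) / n
    mean_k = sum(yi / p[k]
                 for yi, ti, p in zip(y, treatment, propensity) if ti == k) / n
    return mean_t - mean_k

# Toy data: two covariate profiles with different assignment probabilities.
# Built-in effects relative to treatment "c": a = +1, b = +3.
p1 = {"a": 0.5, "b": 0.25, "c": 0.25}
p2 = {"a": 0.25, "b": 0.5, "c": 0.25}
propensity = [p1, p1, p1, p1, p2, p2, p2, p2]
treatment = ["a", "a", "b", "c", "a", "b", "b", "c"]
y = [1.0, 1.0, 3.0, 0.0, 5.0, 7.0, 7.0, 4.0]

effect_b_vs_a = ipw_pairwise_effect(y, treatment, propensity, "b", "a")
print(effect_b_vs_a)
```

The unweighted difference in group means for b versus a would be about 3.33 here because treatment b is more common in the profile with higher baseline outcomes; the weighted estimate recovers the built-in difference of 2.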
Estimation • The average causal effect of treatment $t$ relative to treatment $k$ among those who receive treatment $t$: $E[Y_i(t) - Y_i(k) \mid D_i(t) = 1]$ • Step 1: Obtain $P_t(X)$ and $P_k(X)$ using one of the three multinomial models • Step 2: Calculate $P_{k|k,t}(X) = P_k(X) / (P_t(X) + P_k(X))$ • Step 3: Randomly select a subject in treatment $t$ and remove it from the sample • Step 4: Match it with the subject in treatment $k$ whose $P_{k|k,t}(X)$ is the closest • Step 5: Repeat steps 3 and 4 until all subjects in treatment $t$ are matched • The estimated causal effect is the average difference within the matched pairs: $\frac{1}{n_t} \sum_{i: T_i = t} (Y_i - Y_{m(i)})$, where $m(i)$ is the treatment-$k$ subject matched to subject $i$ and $n_t$ is the number of subjects in treatment $t$
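The pairwise matching procedure above can be sketched as follows. As on the slide, only the treatment-$t$ subject is removed after each match, so a treatment-$k$ subject may be reused. The function name, input format and toy data are hypothetical, and the propensity scores are taken as given.

```python
import random

def matched_pairwise_effect(y, treatment, propensity, t, k, seed=0):
    """Average of Y_i - Y_m(i) over subjects i in treatment t, where m(i)
    is the treatment-k subject with the closest P_{k|k,t}(X)."""
    rng = random.Random(seed)
    # conditional probability of k given that the treatment is k or t
    cond = [p[k] / (p[t] + p[k]) for p in propensity]
    group_t = [i for i, ti in enumerate(treatment) if ti == t]
    group_k = [i for i, ti in enumerate(treatment) if ti == k]
    rng.shuffle(group_t)  # subjects in t are processed in random order
    diffs = []
    for i in group_t:
        j = min(group_k, key=lambda c: abs(cond[c] - cond[i]))
        diffs.append(y[i] - y[j])
    return sum(diffs) / len(diffs)

# Toy data: built-in effect of b relative to a is 2 in both profiles.
p1 = {"a": 0.5, "b": 0.25, "c": 0.25}
p2 = {"a": 0.25, "b": 0.5, "c": 0.25}
propensity = [p1, p1, p1, p1, p2, p2, p2, p2]
treatment = ["a", "a", "b", "c", "a", "b", "b", "c"]
y = [1.0, 1.0, 3.0, 0.0, 5.0, 7.0, 7.0, 4.0]

effect = matched_pairwise_effect(y, treatment, propensity, "b", "a")
print(effect)
```

Subjects who received treatment c play no role in this comparison; they enter only through the fitted multinomial model that produces $P_t(X)$ and $P_k(X)$.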
Estimation • The average causal effect of treatment $t$ relative to all other treatments among those who receive treatment $t$: $E[Y_i(t) - Y_i(\bar{t}) \mid D_i(t) = 1]$, where $\bar{t}$ stands for the treatments other than $t$ • Step 1: Obtain $P_t(X)$ for each subject using one of the three multinomial models • Step 2: Randomly select a subject in treatment $t$ • Step 3: Match it with the subject in the remaining treatment groups whose $P_t(X)$ is the closest, and remove the matched pair from the sample • Step 4: Repeat steps 2 and 3 until all subjects in treatment $t$ are matched
Application • Example 2: • Lechner (2002): Program Heterogeneity and Propensity Score Matching: An Application to the Evaluation of Active Labor Market Policies • Objective: study the effect of different training programs on the unemployment rate in Zurich, Switzerland • Five treatments: No participation in any program, Basic training, Further vocational training, Employment program and Temporary wage subsidy • Population: persons unemployed on Dec 31, 1997, aged between 25 and 55, who had not participated in a program before the end of 1997 and are not disabled • Outcome: whether or not the person is employed on day 461
Methods • The table shows heterogeneity with respect to program characteristics, such as duration, as well as with respect to characteristics of participants, such as skills, qualifications and employment histories, among others • The analysis focuses on estimating the causal effects by matching • Since the treatments are correlated, a multinomial probit model is used to estimate the propensity scores • Covariates: Age, Gender, Marital Status, Native language, Information about local labor office…
References • [1] Imai, K. and van Dyk, D. A. (2004). Causal inference with general treatment regimes: generalizing the propensity score. Journal of the American Statistical Association 99, 854-866. • [2] Imbens, G. W. (2000). The role of the propensity score in estimating dose-response functions. Biometrika 87, 706-710. • [3] Lechner, M. (2001). Identification and estimation of causal effects of multiple treatments under the conditional independence assumption. Econometric Evaluation of Labour Market Policies, Physica, Springer: Heidelberg, 43-58. • [4] Lechner, M. (2002). Program heterogeneity and propensity score matching: an application to the evaluation of active labor market policies. The Review of Economics and Statistics 84, 205-220. • [5] Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41-55. • [6] D'Agostino, R. B., Jr. (1998). Tutorial in biostatistics: propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Statistics in Medicine 17, 2265-2281. • [7] Tchernis, R., Horvitz-Lennon, M. and Normand, S.-L. T. (2005). On the use of discrete choice models for causal inference. Statistics in Medicine 24, 2197-2212.