This talk explores methods for precise estimation of causal parameters in randomized experiments with covariates, addressing challenges with standard approaches and offering practical solutions. It discusses the importance of covariate adjustment and its impact on results, and introduces empirical efficiency maximization and its role in sharpening outcome analysis.
Empirical Efficiency Maximization: Locally Efficient Covariate Adjustment in Randomized Experiments
Daniel B. Rubin
Joint work with Mark J. van der Laan
Outline • Review adjustment in experiments. • Locally efficient estimation. Problems with standard methods. • New method addressing problems. • Abstract formulation. • Back to experiments, with simulations and numerical results. • Application to survival analysis.
Randomized Experiments • Example: Women recruited. Randomly assigned to diaphragm or no diaphragm. See if they get HIV. • Example: Men recruited. Randomly assigned to circumcision or no circumcision. See if they get HIV.
Randomized Experiments • Randomization allows causal inference. • No confounding. Differences in outcomes between treatment and control groups are due to the treatment. • Unverifiable assumptions needed for causal inference in observational studies.
Randomized Experiments with Covariates • Same setup, only now demographic or clinical measurements are taken prior to randomization. • Question: With this extra information, can we estimate causal parameters more precisely? • Answer: Yes (Fisher, 1932). A subject's covariates carry information about how he or she would have responded in both arms.
Covariate Adjustment (has at least two meanings) • 1: Gaining precision in randomized experiments. • 2: Accounting for confounding in observational studies. This talk only deals with the first meaning.
Covariate Adjustment • Not very difficult when covariates divide subjects into a handful of strata. • One already has to think carefully with a single continuous covariate (e.g. age), and modern studies can collect a great deal of baseline information: gene expression profiles, complete medical histories, biometric measurements. • An important longstanding problem, but one with lots of confusion; not “solved.” Recent work by others follows.
Covariate Adjustment • Pocock et al. (2002) recently surveyed 50 clinical trial reports. • Of 50 reports, 36 used covariate adjustment for estimating causal parameters, and 12 emphasized adjusted over unadjusted analysis. • “Nevertheless, the statistical emphasis on covariate adjustment is quite complex and often poorly understood, and there remains confusion as to what is an appropriate statistical strategy.”
Recent Work on this Problem • Koch et al. (1998). - Modifications to ANCOVA. • Tsiatis et al. (2000, 2007a, 2007b). - Locally efficient estimation. • Moore and van der Laan (2007). - Targeted maximum likelihood. • Freedman (2007a, 2007b). - Classical methods under misspecification.
Covariate Adjustment • We can choose to ignore baseline measurements, so why might the extra precision be worth it? • Narrower confidence intervals for the treatment effect. • Stopping trials earlier. • Subgroup analyses, which have smaller sample sizes.
Locally Efficient Estimation • Primarily motivated by causal inference problems in observational studies. • Origin in Robins and Rotnitzky (1992), Robins, Rotnitzky, and Zhao (1994). • Surveyed in van der Laan and Robins (2003), Tsiatis (2006).
Locally Efficient Estimation in Randomized Experiments • The treatment distribution given covariates is known by design. • So what does local efficiency tell us? • Model the outcome distribution given (covariates, treatment). • We'll be asymptotically efficient if the working model is correct, but still asymptotically normal otherwise (see the sketch below). • But what does this mean if there's no reason to believe the working model? Unadjusted estimators are also asymptotically normal. What about precision?
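To make the construction concrete, here is a minimal sketch of a locally efficient (augmented inverse-probability-weighted) estimator of a treatment effect in a randomized experiment, assuming a binary treatment with known randomization probability p and numpy-array inputs. The function name and the least-squares working fit are illustrative choices, not the talk's prescription.

```python
import numpy as np

def aipw_effect(Y, A, W, p=0.5):
    """Locally efficient (AIPW) estimate of E[Y(1)] - E[Y(0)] in a
    randomized experiment with known treatment probability p.

    Y: (n,) outcomes; A: (n,) 0/1 treatment; W: (n, d) baseline covariates.
    The working outcome regressions are fit here by ordinary least squares
    within each arm (an arbitrary choice; any fit could be plugged in).
    """
    n = len(Y)
    X = np.column_stack([np.ones(n), W])          # intercept + covariates
    b1, *_ = np.linalg.lstsq(X[A == 1], Y[A == 1], rcond=None)
    b0, *_ = np.linalg.lstsq(X[A == 0], Y[A == 0], rcond=None)
    m1, m0 = X @ b1, X @ b0                       # predicted outcomes per arm
    # Augmentation terms have mean zero by randomization (p is known), so
    # the estimator stays consistent and asymptotically normal even if the
    # working regressions are wrong.
    mu1 = np.mean(A * Y / p - (A - p) / p * m1)
    mu0 = np.mean((1 - A) * Y / (1 - p) + (A - p) / (1 - p) * m0)
    return mu1 - mu0
```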
Empirical Efficiency Maximization • The working model for the outcome distribution given (treatment, covariates) is typically fit with likelihood-based methods. • Often linear, logistic, or Cox regression models. • Factorization of the likelihood means such estimates yield double robustness in observational studies, but that robustness is superfluous in controlled experiments. • We instead try to find the working-model element whose parameter estimate has the smallest asymptotic variance (see the sketch below).
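A minimal sketch of the idea, under the same setup as above: rather than fitting the working model by least squares, choose its coefficients to minimize the sample variance of the influence curve, since that variance determines the precision of the final estimate. The function name and the use of a generic BFGS optimizer are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def eem_effect(Y, A, W, p=0.5):
    """Empirical efficiency maximization sketch: fit a linear working model
    m(a, w) = intercept_a + w @ beta_a by minimizing the sample variance of
    the estimator's influence curve, for binary treatment with known p."""
    n, d = W.shape
    X = np.column_stack([np.ones(n), W])

    def ic(theta):
        # First d+1 entries parameterize the treated arm, the rest control.
        m1, m0 = X @ theta[: d + 1], X @ theta[d + 1:]
        return (A * (Y - m1) / p - (1 - A) * (Y - m0) / (1 - p) + m1 - m0)

    # The squared influence curve is the nonstandard loss function: the
    # coefficients minimizing Var(IC) give the most precise estimate, and
    # the estimate remains consistent for any coefficients since p is known.
    loss = lambda theta: np.var(ic(theta))
    theta_hat = minimize(loss, np.zeros(2 * (d + 1)), method="BFGS").x
    return np.mean(ic(theta_hat))
```

For a linear working model this minimization is itself a quadratic problem with a closed-form solution; the generic optimizer just keeps the sketch short.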
Connection to High Dimensional or Complex Data • Suppose a high dimensional covariate is related to the outcome, and we would like to adjust for it to gain precision. • Many steps in data processing can be somewhat arbitrary (e.g. dimension reduction, smoothing, noise removal). • With cross-validation, the new loss function can guide selection of the tuning parameters governing this data processing (see the sketch below).
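A hedged sketch of how that selection might look: cross-validate candidate data-processing pipelines using the squared influence curve as the loss, and keep the candidate with the smallest estimated risk. The helper names (`cv_ic_risk`, `make_fit`) and the fold scheme are hypothetical.

```python
import numpy as np

def cv_ic_risk(Y, A, W, make_fit, folds=5, p=0.5):
    """Cross-validated squared-influence-curve risk of one candidate working
    model.  `make_fit(Y, A, W)` is a hypothetical helper returning functions
    m1(w), m0(w) predicting the outcome under treatment and control."""
    idx = np.arange(len(Y)) % folds            # simple deterministic folds
    risks = []
    for v in range(folds):
        tr, te = idx != v, idx == v
        m1, m0 = make_fit(Y[tr], A[tr], W[tr])  # fit on training split only
        # Influence curve of the AIPW estimator, evaluated on held-out data.
        ic = (A[te] * (Y[te] - m1(W[te])) / p
              - (1 - A[te]) * (Y[te] - m0(W[te])) / (1 - p)
              + m1(W[te]) - m0(W[te]))
        # E[IC^2] = Var(IC) + psi^2, and psi is common to all consistent
        # candidates, so ranking by mean square ranks by variance.
        risks.append(np.mean(ic ** 2))
    return float(np.mean(risks))

# e.g. pick how many principal components of a high-dimensional W to keep:
# best = min(candidates, key=lambda c: cv_ic_risk(Y, A, W, c))
```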
[Simulation figures. Legend: 1 = unadjusted difference in means, ignoring covariates; 2 = standard likelihood-based locally efficient estimator; 3 = empirical efficiency maximization; 4 = efficient estimator.]
Sneak Peek: Survival Analysis • Form the locally efficient estimate; the working model for the full-data distribution is now likely a proportional hazards model. • For estimating, e.g., five-year survival, it will be asymptotically efficient if the model is correct, and otherwise still asymptotically normal. • But the locally efficient estimator can be worse than Kaplan-Meier if the model is wrong.
Sneak Peek: Survival Analysis • Generated data; the target is five-year survival.
[Survival simulation figure. Legend: 1 = Kaplan-Meier; 2 = likelihood-based locally efficient estimator; 3 = empirical efficiency maximization; 4 = efficient estimator.]
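For reference, a minimal sketch of the unadjusted benchmark in the comparison above: the Kaplan-Meier estimate of five-year survival from right-censored data. The function name is illustrative, and a real analysis would use a vetted survival library.

```python
import numpy as np

def km_survival(time, event, t=5.0):
    """Kaplan-Meier estimate of S(t) = P(T > t) from right-censored data.
    `time` = min(T, C) and `event` = 1 if the failure was observed (both
    numpy arrays).  This is the unadjusted benchmark that the covariate-
    adjusted estimators compete with."""
    order = np.lexsort((1 - event, time))  # deaths before censorings at ties
    time, event = time[order], event[order]
    n = len(time)
    surv = 1.0
    for i in range(n):
        if time[i] > t:                    # only failures at or before t count
            break
        if event[i]:
            at_risk = n - i                # everyone with time >= time[i]
            surv *= 1.0 - 1.0 / at_risk    # one-at-a-time Kaplan-Meier factor
    return surv
```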
Summary • Robins and Rotnitzky's locally efficient estimation was developed for causal inference in observational studies. • In experiments, such estimators can gain or lose efficiency, depending on the validity of the working model. • Often there may be no reason to place any credence in a working model. • A robustness result implies we can better fit the working model (or select tuning parameters in data processing) with a nonstandard loss function: the squared influence curve.