Methods to Correct Measures of Effect for Bias due to Exposure Measurement Error Donna Spiegelman, ScD Departments of Epidemiology and Biostatistics Harvard School of Public Health, Boston, MA stdls@hsph.harvard.edu Statistical Society of Canada 2013 Introductory Overview Lecture
 
                
May 28, 2013
In this talk, I will give a brief overview of several problems that my colleagues and I at HSPH have addressed, motivated by ongoing environmental and occupational epidemiologic research at HSPH and elsewhere:
• Regression calibration (two versions)
• Regression calibration for multiple surrogates for the same exposure
• Regression calibration with heteroscedastic error
• Regression calibration for main study/internal validation study designs
• Regression calibration for Cox models with time-varying functions of a mis-measured exposure history
All methods will be illustrated by motivating examples in environmental and occupational epidemiology.
Notation
• Number of participants in the main study
• Number of participants in the validation study
• Binary health outcome
• X: “true” exposure
• Z: surrogate exposure
• U: s perfectly measured covariates (e.g. age, race, smoking status), measured on all participants in the main and validation studies
Studies:
• Main study
• External validation study
• Internal validation study
Rosner et al. regression calibration method for MS/EVS
The Rosner et al. (1989, 1990, 1992) version of regression calibration for the main study/external validation study (MS/EVS) design is a 3-step algorithm, followed by a variance calculation:
1. In the main study, regress Y on Z and U to obtain the uncorrected regression coefficients, where Z is now a vector of mis-measured continuous covariates and U is a vector of perfectly measured covariates.
2. In the validation study, regress X on Z and U to obtain a vector of regression intercepts, a matrix of slopes for the regression of X on Z, adjusted for U, and a matrix of slopes for the regression of X on U, adjusted for Z.
3. Correct the estimates of effect for measurement error by combining the main study coefficients with the validation study slopes. The correction uses a block matrix in which 0 is a matrix of 0’s and I is an identity matrix.
4. Use the multivariate delta method to derive the variance of the corrected estimates. See Appendices 2 and 3 of Rosner et al. (1990) for a derivation, again using the multivariate delta method.
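For a single mis-measured exposure and no other covariates, the correction in step 3 reduces to dividing the uncorrected main study slope by the validation study slope, and the delta method in step 4 gives a two-term variance. The sketch below covers only that scalar case. The function and argument names are illustrative (they are not from the lecture or from the Rosner et al. software), and the variance formula assumes the main study and validation study estimates are independent.

```python
import math


def rc_correct(beta_star, var_beta_star, gamma, var_gamma):
    """Scalar regression calibration correction (illustrative sketch).

    beta_star, var_beta_star: uncorrected slope of the outcome on the
        surrogate Z in the main study, and its estimated variance.
    gamma, var_gamma: slope of the true exposure X on Z in the
        validation study, and its estimated variance.
    Assumes the two studies are independent (external validation).
    """
    beta = beta_star / gamma
    # First-order (delta method) variance of the ratio beta_star / gamma.
    var_beta = var_beta_star / gamma**2 + beta_star**2 * var_gamma / gamma**4
    return beta, var_beta


def odds_ratio_ci(beta, var_beta, z_crit=1.96):
    """Corrected odds ratio with a Wald interval (z_crit=1.96 is about 95%)."""
    half_width = z_crit * math.sqrt(var_beta)
    return math.exp(beta), math.exp(beta - half_width), math.exp(beta + half_width)
```

With hypothetical inputs such as `rc_correct(0.5, 0.01, 0.5, 0.0004)`, the corrected slope is twice the uncorrected one, and its variance is inflated both by the division and by the uncertainty in the validation study slope.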
Regression calibration (Carroll et al.)
Given validation or reliability data, the Carroll et al. version of the regression calibration estimator follows.
Sketch of algorithm:
1. Estimate the parameters of the measurement error model, either in the validation study from the regression of the true exposure on the surrogate, or in the reliability study from the replicate measurements.
2. Use the fitted model to estimate the predicted true exposure for each participant in the main study.
3. Run the usual regression model for Y in the main study, with the predicted exposure in place of X, to obtain estimates of effect adjusted for measurement error. That is, fit a generalized linear model in the main study with a link function (e.g. identity for linear regression, log for Poisson and log-binomial regression, logit for logistic regression, probit for probit regression) to obtain estimates of the regression coefficients that are corrected for measurement error, at least ‘approximately’.
• The variance must be adjusted as well and cannot be obtained from standard regression software.
It remains to show the theoretical justification for what is thus far an ad hoc procedure, and to derive the measurement-error corrected variance.
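The three steps can be tried out on simulated data. The sketch below is an illustration under stated assumptions, not the lecture's implementation: it uses a linear outcome model (identity link) so that ordinary least squares suffices, a classical error model Z = X + error with unit variances, and an external validation study in which both X and Z are observed. All names and parameter values are hypothetical.

```python
import numpy as np


def fit_ols(x, y):
    """Return (intercept, slope) from a least-squares fit of y on x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1]


def simulate_rc(n_main=20000, n_val=2000, beta1=2.0, seed=0):
    """Compare the uncorrected and regression-calibrated slopes."""
    rng = np.random.default_rng(seed)

    # Validation study: true exposure X and surrogate Z are both observed.
    x_val = rng.normal(size=n_val)
    z_val = x_val + rng.normal(size=n_val)

    # Main study: only Z and the outcome Y are used in the analysis.
    x_main = rng.normal(size=n_main)
    z_main = x_main + rng.normal(size=n_main)
    y_main = 1.0 + beta1 * x_main + rng.normal(size=n_main)

    # Step 1: measurement error model, X regressed on Z (validation study).
    alpha_hat, gamma_hat = fit_ols(z_val, x_val)

    # Step 2: predicted true exposure for every main study participant.
    x_pred = alpha_hat + gamma_hat * z_main

    # Step 3: usual outcome regression with the prediction in place of X.
    _, beta_rc = fit_ols(x_pred, y_main)

    # Uncorrected analysis for comparison: Y regressed directly on Z.
    _, beta_naive = fit_ols(z_main, y_main)
    return beta_naive, beta_rc
```

In this setup the uncorrected slope is attenuated toward roughly half of `beta1`, while the calibrated slope should land near `beta1`. The standard errors printed by ordinary regression software for step 3 would still be too small, since they ignore the uncertainty from step 1.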
Motivation for regression calibration estimator (logistic regression model)
Small measurement error justification for the regression calibration estimator: we use a Taylor series expansion of the likelihood of the main study data for this derivation. From the Taylor expansion, the regression calibration model is recovered up to a higher-order term. Hence, if that term is small (i.e. small measurement error), this approximation should work well.
This justification was first suggested by Armstrong (1985) for the more general setting of generalized linear models, of which logistic regression is one example.
Armstrong B. (1985) Measurement error in the generalized linear-model. Communications in Statistics-Simulation and Computation 14:529-544.
Note: Unlike most approximations in statistics, this one does not improve as the sample size increases.
Small measurement error: what is small?
Carroll and Wand (1991), Kuha (1994), Neaton and Bartsch (1992), and Rosner et al. (1989) all reported, based on simulation studies of regression calibration for logistic regression, that the approximation works remarkably well when the quantity governing the approximation is small; Kuha suggested the value of 0.5. A multivariate version is given in Carroll et al. (2006) (see Section 4.7.1.1 and Section B.3.3).
Carroll R.J., Wand M.P. (1991) Semiparametric estimation in logistic measurement error models. Journal of the Royal Statistical Society Series B-Methodological 53:573-585.
Kuha J. (1994) Corrections for exposure measurement error in logistic regression models with an application to nutritional data. Statistics in Medicine 13:1135-1148.
Neaton J.D., Bartsch G.E. (1992) Impact of measurement error and temporal variability on the estimation of event probabilities for risk factor intervention trials. Statistics in Medicine 11:1719-1729.
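One way to get a feel for "small" is to compare the exact surrogate-based risk with the regression calibration approximation numerically. The sketch below assumes that X given Z is normal and that the quantity that needs to be small is the squared log odds ratio times the residual variance, beta1^2 * Var(X | Z). That reading, and all names and values, are assumptions made for illustration; the slide's own expression is not reproduced here.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-t))


def exact_and_rc_risk(beta0, beta1, mean_x_given_z, sd_x_given_z, n_nodes=60):
    """Return (exact, approximate) values of Pr(D = 1 | Z).

    exact:  E[expit(beta0 + beta1 * X) | Z] with X | Z normal, computed by
            Gauss-Hermite quadrature.
    approx: expit(beta0 + beta1 * E[X | Z]), the regression calibration
            approximation.
    """
    nodes, weights = hermegauss(n_nodes)
    weights = weights / weights.sum()  # normalise to standard normal weights
    x = mean_x_given_z + sd_x_given_z * nodes
    exact = float(np.sum(weights * expit(beta0 + beta1 * x)))
    approx = float(expit(beta0 + beta1 * mean_x_given_z))
    return exact, approx


def approximation_error(beta1_sq_times_var, beta0=-2.0, beta1=1.0):
    """Absolute error of the approximation for a given beta1^2 * Var(X | Z)."""
    sd = np.sqrt(beta1_sq_times_var) / abs(beta1)
    exact, approx = exact_and_rc_risk(beta0, beta1, 0.0, sd)
    return abs(exact - approx)
```

Evaluating `approximation_error` over a grid of values shows the error growing with beta1^2 * Var(X | Z), and not depending on sample size at all, which is the point of the note about this approximation not improving as n increases.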
Likelihood-based justification
For the multivariate normal measurement error model, a.k.a. linear regression, the justification was first given by Fuller for the classical measurement error model, and later by Spiegelman et al. for the linear measurement error model and its multivariate extensions.
Fuller, W.A. (1987) Measurement Error Models. New York, Wiley.
Spiegelman, D., McDermott, A. and Rosner, B. (1997) Regression calibration method for correcting measurement-error bias in nutritional epidemiology. American Journal of Clinical Nutrition 65, 1179S-1186S.
The main study likelihood function
The critical identity, which depends entirely on the surrogacy assumption, is used to obtain the main study likelihood in terms of the observed data.
We will first consider a primary regression model in which the outcome is continuous (rather than the binary outcome previously considered).
Derivation of the main study likelihood for multivariate linear models, under assumptions (A1) and (A2).
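Key result
In our notation, with Y, Z, X, U all scalar, the outcome given the surrogate still follows a linear model. The variance is still constant, but larger than in the original model without measurement error.
Key result (continued)
Now that we know the likelihood of the main study data in terms of the surrogate, we can see that replacing the mis-measured exposure with its predicted value gives the likelihood function of a standard regression model, so the estimate of the regression slope will be consistent for the true slope, as desired.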
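For the scalar case without covariates, the key result can be written out as a short worked derivation. The symbols below ($\beta_0, \beta_1, \alpha', \gamma, \sigma^2_{Y|X}, \sigma^2_{X|Z}$) are chosen here for illustration and may differ from those on the original slides; the derivation assumes normal linear models for $Y \mid X$ and $X \mid Z$ together with surrogacy, i.e. $f(y \mid x, z) = f(y \mid x)$.

```latex
\begin{align*}
Y \mid X &\sim N\!\left(\beta_0 + \beta_1 X,\ \sigma^2_{Y|X}\right), \qquad
X \mid Z \sim N\!\left(\alpha' + \gamma Z,\ \sigma^2_{X|Z}\right), \\[4pt]
f(y \mid z) &= \int f(y \mid x)\, f(x \mid z)\, dx
  \quad \text{(by surrogacy)}, \\[4pt]
Y \mid Z &\sim N\!\left(\beta_0 + \beta_1 (\alpha' + \gamma Z),\
  \sigma^2_{Y|X} + \beta_1^2\, \sigma^2_{X|Z}\right).
\end{align*}
```

The mean is linear in $\hat{X} = \alpha' + \gamma Z$ with slope $\beta_1$, so regressing $Y$ on $\hat{X}$ targets $\beta_1$, while regressing $Y$ on $Z$ targets $\beta_1 \gamma$. The residual variance is constant in $Z$ but exceeds $\sigma^2_{Y|X}$ by $\beta_1^2 \sigma^2_{X|Z}$.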
The logistic regression model (rare disease)
Recall the logistic regression model. Under the rare disease assumption, the logistic function is well approximated by the exponential function. We need the likelihood of the main study data in terms of the surrogate exposure, Z. As in the multivariate normal setting above, we integrate over x.
Hence, approximately the same regression calibration estimator is obtained as in the multivariate normal case. This justification was given by Rosner et al. (1989) and later generalized to the multivariate case (Rosner et al., 1990).
Rosner, B., Spiegelman, D., and Willett, W.C. (1989) Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Statistics in Medicine 8:9, 1051-69.
Rosner, B., Spiegelman, D., and Willett, W.C. (1990) Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. Am J Epidemiol 132, 734-745.
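The integration step can be sketched for a single exposure with no covariates. The notation is assumed for illustration ($D$ for the binary outcome, $X \mid Z \sim N(\alpha' + \gamma Z, \sigma^2_{X|Z})$), and the second line uses the moment generating function of the normal distribution.

```latex
\begin{align*}
\Pr(D = 1 \mid X) &= \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}
  \approx e^{\beta_0 + \beta_1 X}
  \quad \text{(rare disease)}, \\[4pt]
\Pr(D = 1 \mid Z) &\approx \int e^{\beta_0 + \beta_1 x}\, f(x \mid Z)\, dx
  = \exp\!\left\{\beta_0 + \tfrac{1}{2}\beta_1^2 \sigma^2_{X|Z}
  + \beta_1 (\alpha' + \gamma Z)\right\}.
\end{align*}
```

Under these assumptions the slope on $Z$ is $\beta_1 \gamma$, so dividing the uncorrected slope by $\gamma$ recovers $\beta_1$, as in the linear case; only the intercept picks up the extra term $\tfrac{1}{2}\beta_1^2 \sigma^2_{X|Z}$.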
An example
Home Endotoxin Exposure and Wheeze in Infants: Correction for Bias Due to Exposure Measurement Error
Nora Horick, Edie Weller, Donald K. Milton, Diane R. Gold, Ruifeng Li, and Donna Spiegelman
Department of Biostatistics and Department of Environmental Health, Harvard School of Public Health, Boston, Massachusetts, USA; Channing Laboratory, Harvard Medical School, Boston, Massachusetts, USA; Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, USA
Environmental Health Perspectives, Volume 114, Number 1, January 2006
Regression calibration for logistic regression with multiple surrogates for one exposure
Edie A. Weller, Donna Spiegelman, Don Milton, Ellen Eisen
Departments of Biostatistics, Epidemiology, and Environmental Health, Harvard School of Public Health and Dana Farber Cancer Institute
Journal of Statistical Planning and Inference, 2007; 137:449-461
• Occupational exposures are often characterized by numerous factors of the workplace and by work duration in a particular area, so multiple surrogates describe one exposure.
• Validation study: personal exposure is commonly measured on a subset of the subjects, and these values are then used to estimate average exposure by job or exposure zone.
• No adjustment is made for bias or uncertainty in the exposure estimates.
• Current methods typically assume that there is one surrogate for each exposure (for example, Rosner et al., 1989, 1990).
• We propose an adjustment method that allows for multiple surrogates for one exposure using a regression calibration approach.
Main Study
• Aim: to assess the relationship between exposure to metal working fluids (MWF) and respiratory function (study sponsored by the United Automobile Workers Union and General Motors Corporation; Greaves et al., 1997).
• The outcome here is prevalence of wheeze.
• Job characteristics include MWF type, plant, and machine operation (grinding or not).
• Assembly workers are considered the non-exposed group.
• Possible confounders include age, smoking status and race.
Exposure Assessment Study (generically, the validation study)
• Exposure was measured in various job zones (Woskie et al., 1994).
• Intensity of exposure to MWF aerosol was measured by the thoracic aerosol fraction (i.e. the sum of the two smallest size fractions measured with the personal monitors).
• Full shift (8 hour) personal samples of aerosol exposure in the breathing zone of automobile workers were collected in various job zones.
Assumptions
• The true exposure and the s-vector of covariates are related to the probability of the binary outcome by the logistic function.
• A linear regression model is appropriate to relate the r surrogates and the s covariates to the true exposure.
• A variable is a surrogate if knowledge of the surrogates provides no additional information once the true exposure is known.
• Either the disease is rare, or the parameter of the small measurement error approximation is small.
Goal: To obtain point and interval estimates of the parameters relating exposure to outcome, adjusting for the covariates.
Problem
• A quantitative measure of exposure is not measured on all subjects.
• The surrogate information is measured on all of the subjects.
• The true exposure and the surrogates are both measured only on the validation study subjects.
• Multiple surrogates describe exposure.
Solution: An extension of two closely related approaches
• Rosner, Spiegelman and Willett (RSW, 1989, 1990)
• Carroll, Ruppert and Stefanski (CRS, 1995)
Procedure: We propose the following approach, which follows RSW and assumes normality in the measurement error model and either rare disease or a small value of the parameter of the small ME approximation:
1. Estimate the uncorrected coefficients from a logistic regression model of the outcome on the surrogates and covariates among the subjects in the main study.
2. Estimate the measurement error model parameters among the validation study subjects using ordinary least squares regression.
3. Optimally combine the adjusted estimates for each surrogate, using weights based on the estimated variance-covariance matrix of the adjusted estimates.
Software: SAS PROC GENMOD or PROC LOGISTIC for step 1, PROC REG for step 2. A SAS macro for step 3 can be downloaded from my website; the input to the macro is the output from PROC LOGISTIC and PROC REG.
http://www.hsph.harvard.edu/faculty/spiegelman/multsurr.html
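The combination in step 3 can be illustrated with a generic minimum-variance (generalized least squares) weighting of several estimates of the same parameter. This is a sketch under the assumption that each surrogate yields its own adjusted estimate of a common exposure effect and that their covariance matrix has already been estimated; it is written in Python for illustration and is not the SAS macro mentioned above. All names are hypothetical.

```python
import numpy as np


def gls_combine(estimates, cov):
    """Minimum-variance linear combination of estimates of one parameter.

    estimates: length-r sequence of adjusted estimates, one per surrogate.
    cov:       r x r estimated variance-covariance matrix of those estimates.
    Returns (combined estimate, its variance, weights summing to 1).
    """
    estimates = np.asarray(estimates, dtype=float)
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(len(estimates))

    # Solve cov @ v = 1 rather than inverting cov explicitly.
    cov_inv_ones = np.linalg.solve(cov, ones)
    normaliser = ones @ cov_inv_ones

    weights = cov_inv_ones / normaliser
    combined = float(weights @ estimates)
    variance = float(1.0 / normaliser)
    return combined, variance, weights
```

When the estimates are uncorrelated this reduces to inverse-variance weighting, so a surrogate whose adjusted estimate is very imprecise receives a weight close to zero.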
Footnote 1: Estimated GLS weights are 0.857 for straight, 0.127 for synthetic, 0.15 for grinding, and 0.0001 for plant.
Regression Calibration With Heteroscedastic Variance Donna Spiegelman, Roger Logan, Douglas Grove International Journal of Biostatistics: 2011 Vol. 7, Issue 1, Article 4. PMCID: PMC3404553
Derivation of estimator • Let under the rare disease assumption, and • Then,
The Procedure
1. A logistic regression model of D on X and g(X) is run in the main study to obtain the coefficient estimates and their estimated variances.
2. A weighted linear regression is run in the validation study, with weights 1/g(X), to obtain the measurement error model parameter estimates.
3. Two corrected estimates are calculated as functions of the main study and validation study estimates, and are efficiently combined to produce a single estimate.
• The asymptotically minimum variance weights and their derivation, as well as the formula for the variance of the combined estimate, are given in the Appendix of the manuscript.
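Step 2 is an ordinary weighted least squares fit. A minimal NumPy sketch is given below; the function name is illustrative, and the weights are simply passed in by the caller (for example 1/g evaluated at each validation study participant's measured exposure, for whatever variance function g has been chosen). It does not reproduce steps 1 or 3 of the procedure.

```python
import numpy as np


def weighted_least_squares(x, y, weights):
    """Weighted least squares fit of y on x with an intercept.

    Minimises sum_i weights[i] * (y[i] - a - b * x[i])**2 and returns
    the array [a, b]. Weights must be positive.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    root_w = np.sqrt(np.asarray(weights, dtype=float))

    design = np.column_stack([np.ones_like(x), x])
    # Scaling rows by sqrt(weight) turns WLS into an ordinary LS problem.
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return coef
```

Observations with a larger error variance get a smaller weight, so they pull less on the fitted calibration line than they would in an unweighted regression.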
Example: ACE study (Valanis et al., 1993)
• Outcome: prevalence of fever
• Surrogate exposure: average weekly chemotherapeutics exposure, self-reported on questionnaire
• Validation measure: the same, from on-site diary for 1-2 weeks
• 104 cases, 6 in validation study
• Control for age (years), shift work (yes/no)
Examples: ACE study
• Corr = 0.21 (0.26 with outliers out)
• Estimates for 52 drugs mixed/day (90th-10th percentile), controlling for age, shift, community hospital: uncorrected 1.13 (1.03 - 1.23); corrected 1.22 (1.04 - 1.43) and 1.24 (1.05 - 1.48)
Simulation study of estimators under heteroscedastic measurement error variance
A comparison of regression calibration approaches for designs with internal validation data
Sally W. Thurston, Paige L. Williams, Russ Hauser, Howard Hu, Mauricio Hernandez-Avila, and Donna Spiegelman
Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, P.O. Box 630, Rochester, NY 14642, USA
Department of Biostatistics, Harvard School of Public Health, USA
Department of Environmental Health, Harvard School of Public Health, USA
Centro de Investigaciones en Salud Poblacional, Instituto Nacional de Salud Publica, Cuernavaca, Morelos, Mexico
Department of Epidemiology, Harvard School of Public Health, USA
Channing Laboratory, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, USA
Journal of Statistical Planning and Inference, 2005; 131:175-190.
We compare the asymptotic relative efficiency of several regression calibration methods of correcting for measurement error in studies with internal validation data, when a single covariate is measured with error. The estimators we consider are appropriate in main study/hybrid validation study designs, where the latter study includes internal validation data and may include external validation data. Although all of the methods we consider produce consistent estimates, the method proposed by Spiegelman et al. (Statistics in Medicine, 2001; 20:139-160) has an asymptotically smaller variance than the other methods. The methods for measurement error correction are illustrated using a study of the effect of in utero lead exposure on infant birth weight.
Internal validation
Methods to compare:
1. “As external”: Treat internal validation (IV) data as external validation data, i.e. ignore the outcome in the IV study.
2. “Same intercept”: Regress the outcome on the true exposure for IV study participants, and on the predicted exposure for the remaining main study participants.
3. “Different intercept”: Same as (2), but allow the IV study and the main study to have different intercepts.
4. “Weighted” (Spiegelman, Carroll, Kipnis, SIM, 2001): Calculate one estimate of the exposure effect from the IV study and another from the bias-corrected main study. Combine them by weighting each estimate by its inverse variance.
Closed-form estimators can be obtained for these methods.
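The weighting in method 4 can be sketched for two scalar estimates of the same exposure effect. The sketch treats the IV study estimate and the corrected main study estimate as independent, which is a simplifying assumption made here for illustration; the published method gives the exact weights and variance. The function and argument names are hypothetical.

```python
def inverse_variance_combine(est_iv, var_iv, est_main, var_main):
    """Combine two estimates of one parameter by inverse-variance weighting.

    est_iv, var_iv:     estimate from the internal validation study, where
                        the true exposure is observed, and its variance.
    est_main, var_main: measurement-error-corrected estimate from the rest
                        of the main study, and its variance.
    Assumes the two estimates are independent (a simplification).
    Returns (combined estimate, variance of the combined estimate).
    """
    precision_iv = 1.0 / var_iv
    precision_main = 1.0 / var_main
    total_precision = precision_iv + precision_main

    weight_iv = precision_iv / total_precision
    combined = weight_iv * est_iv + (1.0 - weight_iv) * est_main
    return combined, 1.0 / total_precision
```

The combined variance is never larger than the smaller of the two input variances, which is why pooling the two sources of information in this way cannot do worse than using either one alone under the independence assumption.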
The different intercept method (CRS, 1st edition, p. 46)
The model uses two indicator variables: one equal to 1 when the participant is in the internal validation study, 0 otherwise, and one equal to 1 when the participant is in the main study, 0 otherwise.
When sampling into the internal validation study satisfies the required independence condition, the two intercepts coincide, so estimation of this additional parameter, if its estimate is correlated with that of the exposure effect, could only increase the variance of the different intercept method relative to the same intercept method. This estimator is not considered any further.
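Asymptotic relative efficiencies
• “As external” does the same as or worse than the other 2 methods.
• The “weighted” (SCK) method does much better than the “same intercept” method when:
  - the correlation between the true and the measured exposure is small;
  - the correlation between the outcome and the true exposure is large;
  - the internal validation study is small relative to the main study.
• Based on a grid search, the “weighted” method never does worse than the “same intercept” method.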
Fig. 1. A comparison of the asymptotic standard error of the estimated exposure effect as a function of the correlation between the true exposure, x, and the exposure measured with error, w, for two values of the percentage of subjects in the internal validation study, and two values of the correlation between the outcome, y, and the true exposure, x. Plots were constructed assuming no additional covariates and equal variances of the true exposure and the “proxy” exposure.
Fig. 2. A comparison of the asymptotic standard error of the estimated exposure effect as a function of the correlation between the outcome, y, and the true exposure, x, for two values of the percentage of subjects in the internal validation study, and two values of the correlation between the true exposure, x, and the “proxy” exposure, w. Plots were constructed assuming no additional covariates and equal variances of the true exposure and the “proxy” exposure.
Effect of bone lead on birth weight ( =577, =485) (Gonzalez-Cossio, 1997) . Corr(X,W) = 0.19
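Summary and conclusions
• With an internal validation (IV) study, 3 methods were compared:
1. “As external”: ignores the outcome in the IV data.
2. “Same intercept”: uses the true exposure in the IV data, and the predicted exposure otherwise.
3. “Weighted” (SCK): combines the estimate from the IV study with the estimate from the corrected main study, weighting each by its inverse variance.
• (1) performs the same as or worse than (2), and (2) the same as or worse than (3).
• It is especially important to use (3) when the correlation between the true and the measured exposure is small, the correlation between the outcome and the true exposure is large, and/or the validation sample is small relative to the main study.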