Improve Sample Size and Power Estimation with Covariate Measurement Error Solutions

Sample size and power estimation when covariates are measured with error • Michael Wallace, London School of Hygiene and Tropical Medicine

Outline • Measurement error – what is it and what problems can it cause? • What can we do about it? • The problem of power – introducing autopower

Measurement error – a crash course • It is often impossible to measure covariates accurately, e.g. dietary intake, blood pressure, weight • Instead, we have error-prone observations • How these relate to the underlying true values is our 'measurement error model' • Common model: 'classical' error: • Observed = True + Measurement Error • ...but other models are available.
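A minimal sketch of the classical model, assuming normally distributed true values and errors (the slides do not specify distributions; the function name and parameter values here are illustrative only). It shows that the observed values are more variable than the true ones, since Var(Observed) = Var(True) + Var(Error) when the error is independent of the truth.

```python
import random
import statistics


def simulate_classical(n, sd_x=1.0, sd_u=0.5, seed=1):
    """Return (true, observed) lists under Observed = True + Error.

    Assumption for illustration: True ~ N(0, sd_x^2), Error ~ N(0, sd_u^2),
    with the error independent of the true value.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sd_x) for _ in range(n)]
    w = [xi + rng.gauss(0.0, sd_u) for xi in x]
    return x, w


x, w = simulate_classical(20000)
var_x = statistics.variance(x)  # close to sd_x^2 = 1.0
var_w = statistics.variance(w)  # close to sd_x^2 + sd_u^2 = 1.25
print(round(var_x, 2), round(var_w, 2))
```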


Why does it matter? • Simple linear regression: • Classical measurement error: • Regress Y on W to obtain an estimate of where
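The standard attenuation result for this setting can be written out as follows. The symbols are assumptions chosen for this sketch, not notation taken from the slide: X is the true covariate, W the observed one, U the measurement error, and \lambda the reliability ratio.

```latex
\begin{align*}
Y &= \beta_0 + \beta_1 X + \varepsilon, \\
W &= X + U, \qquad U \text{ independent of } X \text{ and } \varepsilon,\quad
     \operatorname{Var}(X) = \sigma_X^2,\ \operatorname{Var}(U) = \sigma_U^2, \\
\beta_1^{*} &= \frac{\operatorname{Cov}(Y, W)}{\operatorname{Var}(W)}
            = \frac{\beta_1 \sigma_X^2}{\sigma_X^2 + \sigma_U^2}
            = \lambda \beta_1,
\qquad \lambda = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_U^2} \le 1 .
\end{align*}
```

Under these assumptions, regressing Y on W estimates \beta_1^{*} rather than \beta_1, so the slope is biased towards zero whenever \sigma_U^2 > 0.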

What can we do about it? • Need additional data to tell us about the measurement error • Validation (accurate measurements on some subjects) • Replication (multiple measurements per subject) • Validation is 'best', but replication is more practical • Huge variety of 'correction methods' available that try to remove the bias induced by measurement error. • Two that are already available in Stata: • Regression calibration (Stata command: rcal) • Simulation extrapolation (Stata command: simex) • ...but these don't produce consistent effect estimates in general.
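To illustrate how replication carries information about the error, here is a minimal sketch assuming two replicates per subject, classical error, and a linear outcome (chosen because the correction is exact there; this is not the Stata rcal implementation, and all names and parameter values are illustrative). The error variance is estimated from within-subject differences, and the averaged measurement is then shrunk towards its mean before regressing.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(2)
n, beta1, sd_u = 5000, 1.0, 1.0  # illustrative values
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.5 + beta1 * xi + rng.gauss(0, 1) for xi in x]
w1 = [xi + rng.gauss(0, sd_u) for xi in x]  # first replicate
w2 = [xi + rng.gauss(0, sd_u) for xi in x]  # second replicate

# Var(W1 - W2) = 2 * Var(U), so halve the variance of the differences.
var_u_hat = statistics.variance([a - b for a, b in zip(w1, w2)]) / 2

# The mean of two replicates carries error variance Var(U) / 2.
wbar = [(a + b) / 2 for a, b in zip(w1, w2)]
var_wbar = statistics.variance(wbar)
reliability = (var_wbar - var_u_hat / 2) / var_wbar

# Calibration step: replace wbar by an estimate of E[X | wbar].
mu = statistics.fmean(wbar)
x_cal = [mu + reliability * (v - mu) for v in wbar]

naive = ols_slope(wbar, y)        # attenuated towards zero
calibrated = ols_slope(x_cal, y)  # close to beta1 in this linear setting
print(round(naive, 2), round(calibrated, 2))
```

In nonlinear models such as logistic regression the same calibration step only gives an approximate correction, which is the caveat raised on the slide.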

Conditional Score • If there is measurement error, then solving estimating equations as normal will give inconsistent effect estimates. • Conditional score solves modified estimating equations to avoid this. • Unlike regression calibration and simulation extrapolation, it produces consistent effect estimates for a range of models, including logistic regression. • We have produced cscore for Stata to implement this method in the case of logistic regression.
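For univariate logistic regression, the idea can be sketched as follows, assuming classical normal error with known variance \sigma_u^2 (in practice it would be estimated, e.g. from replicates). The notation is chosen for this sketch and is not taken from the slides or from the cscore documentation.

```latex
\begin{align*}
W &= X + U, \qquad U \sim N(0, \sigma_u^2) \text{ independent of } (X, Y), \\
\Pr(Y = 1 \mid X) &= H(\beta_0 + \beta_1 X), \qquad H(t) = \frac{1}{1 + e^{-t}}, \\
\Delta &= W + Y \sigma_u^2 \beta_1, \\
\Pr(Y = 1 \mid \Delta, X) &= H\!\left(\beta_0 + \beta_1 \Delta - \tfrac{1}{2}\beta_1^2 \sigma_u^2\right), \\
0 &= \sum_{i=1}^{n} \left\{ Y_i - H\!\left(\beta_0 + \beta_1 \Delta_i - \tfrac{1}{2}\beta_1^2 \sigma_u^2\right) \right\}
     \begin{pmatrix} 1 \\ \Delta_i \end{pmatrix}.
\end{align*}
```

Because the conditional probability given \Delta does not involve the unobserved X, the last line is an unbiased estimating equation in (\beta_0, \beta_1); it is one common form of the conditional score, and it reduces to the usual logistic score when \sigma_u^2 = 0.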

The problem of power • Measurement error hits us with a 'double whammy': • Bias • Wider confidence intervals • Bias will often remain a problem even if a correction method is used. • Analytic sample size calculations are generally not possible. • Simulation studies are the only recourse. • autopower aims to remove the leg work.

autopower in brief • autopower simulates datasets that suffer from measurement error. • It then sees how the chosen methods perform on these datasets. • Variety of methods available: • 'naïve', rcal, simex, cscore • Assumes: • Univariate logistic regression • Subjects are measured either once or twice
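The simulate-then-analyse idea can be sketched in a few lines. This is not autopower's code or interface: it is a minimal illustration that covers only the naïve analysis, assumes a standard normal true covariate with classical normal error, and uses a hand-written Newton-Raphson logistic fit; all function names and parameter values are hypothetical.

```python
import math
import random


def fit_logistic(w, y, iters=12):
    """Newton-Raphson fit of logit P(Y=1) = b0 + b1*w; returns (b1, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for wi, yi in zip(w, y):
            eta = max(-30.0, min(30.0, b0 + b1 * wi))  # guard against overflow
            p = 1.0 / (1.0 + math.exp(-eta))
            r, v = yi - p, p * (1.0 - p)
            g0 += r
            g1 += r * wi
            h00 += v
            h01 += v * wi
            h11 += v * wi * wi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b1, math.sqrt(h00 / det)  # se from the inverse information matrix


def simulate_power(n, beta1, sd_u, reps=200, beta0=0.0, seed=3):
    """Share of simulated datasets in which the naive Wald test rejects b1 = 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]              # true covariate
        y = [1 if rng.random() < 1 / (1 + math.exp(-(beta0 + beta1 * xi))) else 0
             for xi in x]                                    # binary outcome
        w = [xi + rng.gauss(0, sd_u) for xi in x]            # error-prone covariate
        b1, se = fit_logistic(w, y)
        if abs(b1 / se) > 1.96:                              # two-sided 5% test
            rejections += 1
    return rejections / reps


power_clean = simulate_power(200, 0.5, sd_u=0.0)  # no measurement error
power_noisy = simulate_power(200, 0.5, sd_u=1.0)  # error as variable as the truth
print(power_clean, power_noisy)
```

Comparing the two calls shows how much power is lost to measurement error for a given design; a corrected analysis would replace the call to fit_logistic.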

Example: specific design • “How well should regression calibration perform on this dataset?”

Example: estimating sample size • “What sample size do I need to achieve 80% power?”
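One way to turn a power estimate into a sample size is a doubling search followed by bisection, sketched below. This is an illustration of the general approach, not autopower's algorithm. The function toy_power is a hypothetical closed-form stand-in so that the example runs instantly; in practice power_fn would be a simulation-based estimate, whose Monte Carlo noise can mislead the bisection unless many replications are used.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def toy_power(n, effect=0.15):
    """Hypothetical stand-in for a simulated power estimate (increasing in n)."""
    return normal_cdf(effect * math.sqrt(n) - 1.96)


def smallest_n(power_fn, target=0.8, n_start=50, n_max=100000):
    """Smallest n with power_fn(n) >= target, assuming power increases with n."""
    lo, hi = 0, n_start
    while power_fn(hi) < target:  # doubling phase: bracket the answer
        lo, hi = hi, hi * 2
        if hi > n_max:
            raise ValueError("target power not reached below n_max")
    while hi - lo > 1:            # bisection phase: narrow the bracket
        mid = (lo + hi) // 2
        if power_fn(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


n_needed = smallest_n(toy_power, target=0.8)
print(n_needed, round(toy_power(n_needed), 3))
```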

Example: cost minimization • “Obtaining second observations is expensive, can I save money by considering a design where not everyone is measured twice?” • User specifies how much more it costs to measure a subject twice rather than once. • autopower then searches the 'r1-r2' space: • r1 = subjects measured once • r2 = subjects measured twice • Various tricks for practical speed.
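The search over the r1-r2 space can be sketched as a grid search that keeps the cheapest design reaching the target power. This is a plain brute-force illustration, not autopower's search or its speed-ups. The cost is taken as r1 + cost_ratio * r2, with cost_ratio the user-specified relative cost of measuring a subject twice, and toy_power is a hypothetical closed-form stand-in for a simulated power estimate.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def toy_power(r1, r2, sd_u=1.0, k=0.2):
    """Hypothetical stand-in: subjects contribute in proportion to reliability."""
    lam1 = 1.0 / (1.0 + sd_u ** 2)      # reliability of a single measurement
    lam2 = 1.0 / (1.0 + sd_u ** 2 / 2)  # reliability of the mean of two
    return normal_cdf(k * math.sqrt(r1 * lam1 + r2 * lam2) - 1.96)


def cheapest_design(power_fn, cost_ratio, target=0.8, step=10, n_max=1000):
    """Return (cost, r1, r2) for the cheapest grid design reaching the target."""
    best = None
    for r2 in range(0, n_max + 1, step):
        for r1 in range(0, n_max + 1, step):
            if r1 + r2 == 0:
                continue
            if power_fn(r1, r2) >= target:
                cost = r1 + cost_ratio * r2
                if best is None or cost < best[0]:
                    best = (cost, r1, r2)
                break  # for this r2, any larger r1 only adds cost
    return best


cheap_repeat = cheapest_design(toy_power, cost_ratio=1.1)
costly_repeat = cheapest_design(toy_power, cost_ratio=2.0)
print(cheap_repeat, costly_repeat)
```

When second measurements are cheap relative to recruiting another subject, the search favours designs with many subjects measured twice; as the cost ratio grows, designs with fewer repeated measurements become cheaper.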

References • General overview: Carroll, R. J., D. Ruppert, L. A. Stefanski, and C. M. Crainiceanu. 2006. Measurement Error in Nonlinear Models: A Modern Perspective. 2nd ed. Chapman & Hall/CRC. • Regression calibration: Carroll, R. J., and L. A. Stefanski. 1990. Approximate quasilikelihood estimation in models with surrogate predictors. Journal of the American Statistical Association 85: 652–663. • Simulation extrapolation: Cook, J. R., and L. A. Stefanski. 1994. Simulation-extrapolation estimation in parametric measurement error models. Journal of the American Statistical Association 89: 1314–1328. • Conditional score: Stefanski, L. A., and R. J. Carroll. 1987. Conditional scores and optimal scores in generalized linear measurement error models. Biometrika 74: 703–716.
