Handling Missing Data on ALSPAC

Presentation Transcript

  1. Handling Missing Data on ALSPAC • Paul Clarke (CMPO, University of Bristol) • ALSPAC Social Science User Group meeting, 21 May 2008

  2. Outline • What causes missing data? • ‘Types’ of missing data • Methods for missing data: quick overview • ALSPAC ‘Blitz’ on non-respondents • Investigating MNAR data in ALSPAC

  3. Example ‘ALSPAC’ analysis • At age 11 • Outcome: Mood (ordinal, 3 categories): depressive symptoms, maternally rated • Main exposure: Physical activity (score): measured on actigraph, 3 days • Adjustment: BMI (score); Sex; Age at screening • Ordinal logistic regression
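The slide names the model only as "ordinal logistic regression". As a sketch, and assuming the common proportional-odds form (the slides do not say which variant was fitted), the analysis model for child i could be written as below. The symbols α, β and the abbreviations PA, BMI, Sex and Age are notation introduced here for illustration, not taken from the presentation.

```latex
\operatorname{logit}\,\Pr(\text{Mood}_i \le k)
  = \alpha_k - \left(\beta_1\,\text{PA}_i + \beta_2\,\text{BMI}_i
      + \beta_3\,\text{Sex}_i + \beta_4\,\text{Age}_i\right),
  \qquad k = 1, 2
```

With three ordered mood categories there are two cut-points, α₁ and α₂, and β₁ is the log odds ratio of interest for physical activity.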

  4. Missing Value (MV) pattern (footnote: all MV patterns with < 200 cases ignored)

  5. What causes missing data? [Diagram; labels: Incomplete questionnaire, Refusal, Follow-up, Fail to attend clinic, Parent characteristics, Parent & child characteristics, Interviewer effectiveness, Incentive for participant, Loyalty, Letter, Telephone calls, Interviewer visits, Non-contact]

  6. Result of processes leading to: • Refusal to answer questions (item) • Refusal to participate (unit) • No contact (unit) • Longitudinal-specific: attrition & drop-out • Non-response mechanism(s) - NRM

  7. Rubin’s definitions (Little & Rubin (2002) Statistical Analysis with Missing Data) • Missing Completely At Random (MCAR): NRM independent of observed and missing variables • Missing At Random (MAR): NRM depends only on observed variables • Missing Not At Random (MNAR): NRM depends on missing variables too
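The three mechanisms can be made concrete with a small simulation. This is a minimal sketch under assumed settings, not anything from the presentation: the function name `simulate`, the standard-normal covariate, the logistic response probabilities and the 60% MCAR response rate are all hypothetical choices. It compares the mean of Y among respondents (a complete-case estimate) with the mean in the full simulated sample.

```python
import math
import random


def simulate(mechanism, n=20000, seed=1):
    """Return (full-sample mean of Y, complete-case mean of Y).

    All distributions and response probabilities here are illustrative
    assumptions, not values from the ALSPAC analysis.
    """
    rng = random.Random(seed)
    all_y, observed_y = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)        # covariate, always observed
        y = x + rng.gauss(0, 1)    # outcome, subject to non-response
        if mechanism == "MCAR":
            p_respond = 0.6                        # same for everyone
        elif mechanism == "MAR":
            p_respond = 1 / (1 + math.exp(-x))     # depends on observed x only
        elif mechanism == "MNAR":
            p_respond = 1 / (1 + math.exp(-y))     # depends on y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        all_y.append(y)
        if rng.random() < p_respond:
            observed_y.append(y)
    return sum(all_y) / n, sum(observed_y) / len(observed_y)


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR", "MNAR"):
        full, cc = simulate(mechanism)
        print(f"{mechanism}: full-sample mean {full:.3f}, complete-case mean {cc:.3f}")
```

In this setup the complete-case mean is close to the full-sample mean only under MCAR. Under MAR the discrepancy can in principle be removed by using the observed X (for example by weighting or imputation), whereas under MNAR no function of the observed data alone is guaranteed to remove it.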

  8. Directed Acyclic Graph (DAG) • R independent ⇒ data MCAR [DAG nodes: Y, X, R, C]

  9. MAR data • R indirectly related to Y through X and C [DAG nodes: Y, X, R, C]

  10. Methods for MAR data • Complete-case analysis / listwise deletion • Weighting: weighting classes, post-stratification • (Single) imputation methods, e.g. regression, hot-deck/nearest-neighbour • Multiple imputation methods, e.g. Norm, MICE • Semiparametric estimators
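Of the methods listed, weighting classes is the simplest to show in a few lines. The sketch below is illustrative only: the function name `weighting_class_mean`, the class labels and the toy data are hypothetical. Each respondent is weighted by the inverse of the response rate in their class, which corrects the complete-case mean if response is MAR given class membership.

```python
from collections import defaultdict


def weighting_class_mean(records):
    """Weighted mean of y using inverse response rates within classes.

    records: list of (class_label, y) pairs, with y set to None for
    non-respondents. A class with no respondents contributes nothing,
    so its members are not represented in the estimate.
    """
    n_total = defaultdict(int)
    n_resp = defaultdict(int)
    for cls, y in records:
        n_total[cls] += 1
        if y is not None:
            n_resp[cls] += 1

    numerator = denominator = 0.0
    for cls, y in records:
        if y is not None:
            weight = n_total[cls] / n_resp[cls]   # 1 / class response rate
            numerator += weight * y
            denominator += weight
    return numerator / denominator


# Toy data (made up): class "a" has a 50% response rate, class "b" 100%.
sample = [
    ("a", 1.0), ("a", 3.0), ("a", None), ("a", None),
    ("b", 5.0), ("b", 5.0), ("b", 7.0), ("b", 7.0),
]

if __name__ == "__main__":
    respondents = [y for _, y in sample if y is not None]
    print(sum(respondents) / len(respondents))   # unweighted, about 4.67
    print(weighting_class_mean(sample))          # weighted: 4.0
```

The unweighted mean over-represents class "b" because all of its members responded. Weighting restores each class to its share of the original sample, giving the average of the two class means.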

  11. Imputation in practice: pitfalls (Sterne et al. (2008) British Medical Journal) • Omitting the outcome • Imputing non-normal variables • MAR completely implausible • Convergence of iterative procedures
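The first pitfall, omitting the outcome from the imputation model, can be illustrated by simulation. The sketch below rests on assumptions made for illustration: a true slope of 1, 50% of the exposure missing completely at random, and single stochastic regression imputation standing in for full multiple imputation. The names `ols_slope` and `compare_imputations` are hypothetical.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least-squares slope from regressing ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def compare_imputations(n=20000, seed=2):
    """Return slopes of y on x after imputing x (without y, with y)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]            # true slope is 1
    missing = [rng.random() < 0.5 for _ in range(n)]  # x missing, MCAR
    x_obs = [xi for xi, m in zip(x, missing) if not m]
    y_obs = [yi for yi, m in zip(y, missing) if not m]

    # (a) Imputation model ignores the outcome: draw from the marginal of x.
    mu, sd = statistics.fmean(x_obs), statistics.stdev(x_obs)
    x_without_y = [rng.gauss(mu, sd) if m else xi
                   for xi, m in zip(x, missing)]

    # (b) Imputation model includes the outcome: regress x on y in the
    #     complete cases, then impute prediction plus residual noise.
    b = ols_slope(y_obs, x_obs)
    a = statistics.fmean(x_obs) - b * statistics.fmean(y_obs)
    resid_sd = statistics.stdev(
        [xi - (a + b * yi) for xi, yi in zip(x_obs, y_obs)]
    )
    x_with_y = [a + b * yi + rng.gauss(0, resid_sd) if m else xi
                for xi, yi, m in zip(x, y, missing)]

    return ols_slope(x_without_y, y), ols_slope(x_with_y, y)


if __name__ == "__main__":
    without_y, with_y = compare_imputations()
    print(f"slope, outcome omitted:  {without_y:.2f}")
    print(f"slope, outcome included: {with_y:.2f}")
```

When the outcome is left out, the imputed exposure values are unrelated to Y, so the estimated slope is pulled towards zero roughly in proportion to the fraction of imputed values. Including Y in the imputation model preserves the association.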

  12. Complex methods • Analysis model • e.g. Ordinal logistic regression • Imputation model: Missing given Observed • ALL assume MAR data

  13. MAR data in reality • Unknown factors drive non-response • …correlated with model predictors • …but not with Y [DAG nodes: Y, X, R, C, ? (unknown factors)]

  14. Why is this important? • Weakness of MAR: How do we know? • Central problem: missing data is missing! • MAR is a “leap of faith”

  15. MNAR data • Unknowns directly correlated with Y? [DAG nodes: Y, X, R, C, ? (unknown factors)]

  16. Physical activity example • NRM is mother-driven (child age 11) • Child must wear actigraph for 3 days • Mother must assess her child’s mood [DAG nodes: Mood, Phys Act, R, (BMI, Sex, Age), ? (unknown factors)]

  17. ALSPAC ‘Blitz’ • Co-ordinated by Family Liaison Unit • 4 tranches: Nov 2007 to May 2008 • Target: 5000 teenagers not in last 2 waves • Mini-clinic for those difficult to persuade

  18. Proposed analysis • MAR is context dependent • Risky behaviours (Glyn Lewis et al.) • Outcomes: cannabis use, sexual practices, etc. • Risk factors: mental health, sensation seeking, etc. • Basic analysis: compare follow-up with main sample; still differences after adjustment?

  19. Unit non-response • 100% follow-up rate unlikely! • Directly model NRM • Continuum of non-response: the hard-to-contact are less like the main sample • Weighting scheme (Alho 1990; Wood et al. 2006) • Lower bound for MNAR bias

  20. Item non-response • Parallel qualitative post • Items: questions on risky behaviours • What mechanisms drive non-response? • Test hypotheses from this project