Causal Diagrams for Epidemiological Research
Eyal Shahar, MD, MPH
Professor, Division of Epidemiology & Biostatistics
Mel and Enid Zuckerman College of Public Health, The University of Arizona
What is it and why does it matter?
A tool (method) that:
• clarifies our wordy or vague causal thoughts about the research topic
• helps us to decide which covariates should enter the statistical model, and which should not
• unifies our understanding of confounding bias, selection bias, and information bias
What is the key question in a non-randomized study?
When estimating the effect of E (“exposure”) on D (“disease”), what should we adjust for?
or: confounder selection strategy
Adjusting for Confounders: Common Practice
• The “change-in-estimate” method
  • List “potential confounders”
  • Adjust for (condition on) potential confounders
  • Compare adjusted estimate to crude estimate (or “fully adjusted” to “partially adjusted”)
  • Decide whether “potential confounders” were “real confounders”
  • Decide how much confounding existed
• Premise: The data informs us about confounding.
• Are we asking too much from the data?
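The comparison step of the “change-in-estimate” method can be sketched as follows. This is a minimal illustration: the function names are hypothetical, the 10% cutoff is an assumed convention that does not appear in the slides, and the two rate ratios are made-up numbers.

```python
def relative_change(crude, adjusted):
    """Relative change of the adjusted estimate from the crude one."""
    return (adjusted - crude) / crude


def flags_confounding(crude, adjusted, cutoff=0.10):
    """Apply the change-in-estimate rule with an assumed cutoff (10%)."""
    return abs(relative_change(crude, adjusted)) >= cutoff


# Hypothetical crude and adjusted rate ratios, for illustration only.
crude_rr, adjusted_rr = 2.0, 1.7
print(relative_change(crude_rr, adjusted_rr))
print(flags_confounding(crude_rr, adjusted_rr))
```

The rule only compares two numbers. It cannot tell whether the adjustment removed bias or introduced it, which is why the premise that the data inform us about confounding deserves scrutiny.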
Adjusting for Confounders: Common Practice
• What is “a potential confounder”?
  • Typically, “a cause of the disease that is associated with the exposure”
[Diagram: the confounder triangle linking Confounder, E, and D]
• What is the effect of a confounder?
  • Contributes to the crude (observed, marginal) association between E and D
Adjusting for Confounders: Common Practice
• Extension to multiple confounders
[Diagrams: six separate confounder triangles, one for each of C1 to C6, each with E and D]
Adjusting for Confounders: Common Practice, Problems
• A sequence of isolated, independent, causal diagrams
  • but C1, C2, C3, C4, C5, ... might be connected causally
• Unidirectional arrow = a causal direction
  • but what is the meaning of the bidirectional arrow?
• Even with a single confounder, the “change-in-estimate” method could fail
Adjusting for Confounders: Problems
• An example where the “change-in-estimate” method fails
[Diagram: U1, U2, C, E, D]
• The crude estimate may be closer to the truth than the C-adjusted estimate
• To be explained
Alternative: A Causal Diagram
• A method for selecting covariates
• Extension of the confounder triangle
• Premises displayed in the diagram
• New terms:
  • Path
  • Collider on a path
  • Confounding path
Selected references
• Pearl J. Causality: models, reasoning, and inference. Cambridge University Press; 2000
• Greenland S et al. Causal diagrams for epidemiologic research. Epidemiology 1999;10:37-48
• Robins JM. Data, design, and background knowledge in etiologic inference. Epidemiology 2001;12:313-320
• Hernan MA et al. A structural approach to selection bias. Epidemiology 2004;15:615-625
• Shahar E. Causal diagrams for encoding and evaluation of information bias. J Eval Clin Pract (forthcoming)
A Causal Diagram: Notation and Terms
• An arrow = causal direction between two variables: E → D
• An arrow could abbreviate both direct and indirect effects: E → D could summarize a larger diagram
[Diagram: E and D with intermediate variables U1, U2, U3]
A Causal Diagram: Notation and Terms
• A path between E and D: any sequence of causal arrows that connects E to D
[Diagrams: one path linking E and D directly, and three paths through E, U1, U2, D]
A Causal Diagram: Notation and Terms
• Circularity (self-causation) does not exist: Directed Acyclic Graph
[Diagram: E, U1, D, U2]
• A collider on the path between E and D
[Diagram: path E → U1 ← U2, continuing to D]
  • E and U2 collide at U1
A Causal Diagram: Notation and Terms
• A confounding path for the effect of E on D: any path between E and D that meets the following criteria:
  • The arrow next to E points to E
  • There are no colliders on the path
[Diagram: C, U1, U2, U3, V1, V2, E, D]
In short: a path showing a common cause of E and D
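The two criteria for a confounding path can be checked mechanically. The sketch below is illustrative only: a diagram is represented as a set of (cause, effect) pairs, a path is a list of variables starting at E, and the function names are hypothetical. The arrow directions in `chain` (C → U1 → U2 → U3 → E and C → V1 → V2 → D) are an assumption about the slide's diagram, as is the second diagram in which U1 and U2 collide at C.

```python
def colliders_on_path(path, edges):
    """Return interior variables at which both neighbouring arrows point in."""
    found = []
    for left, mid, right in zip(path, path[1:], path[2:]):
        if (left, mid) in edges and (right, mid) in edges:
            found.append(mid)
    return found


def is_confounding_path(path, edges):
    """Slide criteria: the arrow next to E points to E, and no colliders."""
    exposure, neighbour = path[0], path[1]
    points_to_exposure = (neighbour, exposure) in edges
    return points_to_exposure and not colliders_on_path(path, edges)


# Assumed arrows for the diagram with C at the top of the triangle.
chain = {("C", "U1"), ("U1", "U2"), ("U2", "U3"), ("U3", "E"),
         ("C", "V1"), ("V1", "V2"), ("V2", "D"), ("E", "D")}
print(is_confounding_path(["E", "U3", "U2", "U1", "C", "V1", "V2", "D"], chain))  # True
print(is_confounding_path(["E", "D"], chain))  # False: this is the causal path

# Assumed diagram in which U1 and U2 collide at C.
collide = {("U1", "E"), ("U1", "C"), ("U2", "C"), ("U2", "D")}
print(colliders_on_path(["E", "U1", "C", "U2", "D"], collide))    # ['C']
print(is_confounding_path(["E", "U1", "C", "U2", "D"], collide))  # False
```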
• The paths below are NOT confounding paths for the effect of E on D
[Diagrams: three variants of the diagram with C, U1, U2, U3, V1, V2, E, D]
What can affect the association between E and D?
(Why do we observe an association between two variables?)
• Causal path: E causes D (E → D)
• Causal path: D causes E (D → E)
• Confounding paths (C with E and D)
• Adjustment for colliders on a path from E to D (later…)
Why does a confounding path affect the crude (marginal) association between E and D?
Intuitively:
• Association = being able to “guess” the value of one variable (D) from the value of another (E)
• E → D allows us to guess D from E (and E from D)
• A confounding path allows for sequential guesses along the path
[Diagram: C, U1, U2, U3, V1, V2, E, D]
How can we block a confounding path between E and D?
• Condition on a variable on the path (on any variable)
• Methods for conditioning
  • Restriction
  • Stratification
  • Regression
[Diagram: C, U1, U2, U3, V1, V2, E, D]
A point to remember
• We don’t need to adjust for the confounder itself (C, the top of the triangle). Adjustment for any U or V below it will do.
• U and V are surrogates for the confounder C
[Diagram: C, U1, U2, U3, V1, V2, E, D]
Example
• If the diagram below corresponds to reality, then we have several options for conditioning
• For example:
  • On C and U2
  • Only on U2
  • Only on U3
[Diagram: C, U1, U2, U3, V1, V2, E, D]
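A small simulation can illustrate why any single variable on the confounding path is enough. The sketch below assumes linear effects with standard normal noise, the arrow directions C → U1 → U2 → U3 → E and C → V1 → V2 → D, and a true effect of E on D of 0.5. All of these are illustrative assumptions, not values from the slides, and the helper names are hypothetical.

```python
import random


def cov(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def slope(y, x):
    """Crude least-squares slope of y on x."""
    return cov(x, y) / cov(x, x)


def slope_adjusted(y, x, z):
    """Coefficient of x in the least-squares regression of y on x and z."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    return (cov(x, y) * szz - cov(z, y) * sxz) / (sxx * szz - sxz ** 2)


random.seed(0)
N = 20000
TRUE_EFFECT = 0.5  # assumed effect of E on D
data = {name: [] for name in ("C", "U1", "U2", "U3", "V1", "V2", "E", "D")}
for _ in range(N):
    c = random.gauss(0, 1)
    u1 = c + random.gauss(0, 1)
    u2 = u1 + random.gauss(0, 1)
    u3 = u2 + random.gauss(0, 1)
    e = u3 + random.gauss(0, 1)
    v1 = c + random.gauss(0, 1)
    v2 = v1 + random.gauss(0, 1)
    d = TRUE_EFFECT * e + v2 + random.gauss(0, 1)
    for name, value in zip(data, (c, u1, u2, u3, v1, v2, e, d)):
        data[name].append(value)

crude = slope(data["D"], data["E"])
adjusted = {name: slope_adjusted(data["D"], data["E"], data[name])
            for name in ("C", "U2", "U3", "V1")}
print("crude:", round(crude, 2))
for name, estimate in adjusted.items():
    print("adjusted for", name, ":", round(estimate, 2))
```

Under these assumptions the crude slope is biased upward (about 0.7 in expectation), while conditioning on C, U2, U3, or V1 alone each recovers an estimate close to 0.5.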
What can affect the association between E and D?
• Causal path: E causes D (E → D)
• Causal path: D causes E (D → E)
• Confounding paths (C with E and D)
• Adjustment for colliders on a path from E to D (NOW!)
Conditioning on a Collider: A Trap
• A collider may be viewed as the opposite of a confounder
• Collider and confounder are symmetrical entities, like matter and anti-matter
[Diagram: C, U1, U2, U3, V1, V2, E, D, with a collider and a confounder labeled]
Conditioning on a Collider: A Trap
• A path from E to D that contains a collider is NOT a confounding path. There is no transfer of “guesses” across a collider.
• A path from E to D that contains a collider does NOT generate an association between E and D
• Conditioning on the collider, however, will turn that path into a confounding path. Why?
Conditioning on a Collider: A Trap
[Diagram: C, U1, U2, U3, V1, V2, E, D, with a horizontal line joining two variables]
The horizontal line indicates an association (the possibility of “guesses”) that was induced by conditioning on a collider
Properties of a Collider: Intuitive Explanation
• A dataset contains three variables for N cars:
  • Brake condition (good/bad)
  • Street condition in the owner’s town (good/bad)
  • Involved in an accident in the owner’s town? (yes/no)
[Diagram: Brake condition (good, bad) → Accident (yes, no) ← Street condition (good, bad)]
• Accident is a collider.
• Brake condition and street condition are not associated in the dataset. We cannot use the data to guess one from the other.
Properties of a Collider: Intuitive Explanation
• Why can’t we make a guess from the data?
• Let’s try. Suppose we are told:
  • Car A has good brakes and car B has bad brakes.
  • This information tells us nothing about the street condition in each owner’s town.
• Intuition: a common effect (collider) does not induce an association between its causes (colliding variables)
Properties of a Collider: Intuitive Explanation
• If, however, we condition (stratify) on the collider “accident”, we can make some guesses about the street condition from the brake condition.
Stratum #1: Accident = yes
Properties of a Collider: Intuitive Explanation
• Similarly, in the other stratum
Stratum #2: Accident = no
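The car example can be reproduced with simulated data. In the sketch below, every probability (30% bad brakes, 30% bad streets, and the accident risks) is an assumed value chosen only for illustration, and the helper names are hypothetical. Brake condition and street condition are generated independently, and both raise the risk of an accident.

```python
import random

random.seed(1)
N = 50000
cars = []
for _ in range(N):
    bad_brakes = random.random() < 0.3
    bad_streets = random.random() < 0.3  # independent of brake condition
    # Both causes raise the (assumed) accident risk: accident is a collider.
    p_accident = 0.05 + 0.25 * bad_brakes + 0.25 * bad_streets
    accident = random.random() < p_accident
    cars.append((bad_brakes, bad_streets, accident))


def share_bad_streets(rows, brakes_bad):
    picked = [streets for brakes, streets, _ in rows if brakes == brakes_bad]
    return sum(picked) / len(picked)


def difference(rows):
    """P(bad streets | bad brakes) minus P(bad streets | good brakes)."""
    return share_bad_streets(rows, True) - share_bad_streets(rows, False)


all_cars = difference(cars)
with_accident = difference([row for row in cars if row[2]])
without_accident = difference([row for row in cars if not row[2]])
print("all cars:        ", round(all_cars, 3))
print("accident = yes:  ", round(with_accident, 3))
print("accident = no:   ", round(without_accident, 3))
```

With these assumed risks, the difference is near zero across all cars but clearly negative among cars that had an accident: knowing that a crashed car had good brakes makes bad streets the more likely explanation.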
Properties of a Collider
In summary:
• Conditioning on a collider creates an association between the colliding variables and, therefore, may open a confounding path
[Diagrams: U1, U2, C, E, D before conditioning on C and after conditioning on C]
Derivations
• The “change-in-estimate” method could fail if we condition on colliders, and thereby open confounding paths
• To (rationally) select covariates for adjustment, we must commit to a causal diagram (premises)
  (But we often say that we don’t know and can’t commit, and hope that the change-in-estimate method will work.)
Causal inference, like all scientific inference, is conditional on premises (which may be false), not on ignorance
Derivations
• Do not condition on colliders, if possible
• If you condition on a collider,
  • Connect the colliding variables by a line
  • Check if you opened a new confounding path
  • Condition on another variable to block that new path
[Diagrams: U1, U2, C, E, D when conditioning on C alone, and when conditioning on C and (U1 or U2)]
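The following sketch simulates this situation under stated assumptions: U1 causes E and C, U2 causes C and D, E has no effect on D, and all effects are linear with standard normal noise. These arrow directions and coefficients are illustrative choices, not taken from the slides. Adjustment for two covariates is done by first removing U1 from every variable (residualization), which gives the same coefficient as a regression on all three covariates.

```python
import random


def cov(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def slope(y, x):
    return cov(x, y) / cov(x, x)


def slope_adjusted(y, x, z):
    """Coefficient of x in the least-squares regression of y on x and z."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    return (cov(x, y) * szz - cov(z, y) * sxz) / (sxx * szz - sxz ** 2)


def residuals(y, x):
    """What is left of y after a least-squares regression on x."""
    b, mx, my = slope(y, x), sum(x) / len(x), sum(y) / len(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


random.seed(2)
N = 20000
U1 = [random.gauss(0, 1) for _ in range(N)]
U2 = [random.gauss(0, 1) for _ in range(N)]
E = [u1 + random.gauss(0, 1) for u1 in U1]
D = [u2 + random.gauss(0, 1) for u2 in U2]  # assumed: no effect of E on D
C = [u1 + u2 + random.gauss(0, 1) for u1, u2 in zip(U1, U2)]  # collider

crude = slope(D, E)
c_only = slope_adjusted(D, E, C)
c_and_u1 = slope_adjusted(residuals(D, U1), residuals(E, U1), residuals(C, U1))
print("crude:          ", round(crude, 3))
print("adjusted for C: ", round(c_only, 3))
print("adjusted for C and U1:", round(c_and_u1, 3))
```

Under these assumptions the true effect is zero, the crude estimate is close to zero, and the C-adjusted estimate is biased (about -0.2 in expectation). Adding U1 blocks the path opened by conditioning on C. A change-in-estimate comparison would wrongly credit the C-adjusted value.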
Practical advice
• Study one exposure at a time
  • A model that may be good for exposure A might not be good for exposure B (even if B is in the model)
• Never adjust for an effect of the exposure
• Never adjust for an effect of the disease
• Never select covariates by stepwise regression
• Never look at p-values to decide on confounding
  • (actually, never look at p-values…)
Extension to other problems of causal inquiry
• Causation always remains uncertain, even if we deal with a single confounder
We draw: [Diagram: C, E, D]
And naively condition on C
Unbeknown to us, the reality happens to be: [Diagram: U1, U2, C, E, D]
And our adjustment may fail
Extension to other problems of causal inquiry
• Estimating the “direct” effect by conditioning on an intermediary variable, I
[Diagram: E, I, D]
• We should remember that variable I may be a collider
[Diagrams: E, I, D with an additional variable U]
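A simulation can show how conditioning on an intermediary that is also a collider distorts the “direct” effect. The sketch assumes E → I → D, a direct arrow E → D with coefficient 0.5, and an unmeasured U that causes both I and D, so that E and U collide at I. All coefficients are illustrative assumptions, and adjustment for U is done by residualization.

```python
import random


def cov(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def slope(y, x):
    return cov(x, y) / cov(x, x)


def slope_adjusted(y, x, z):
    """Coefficient of x in the least-squares regression of y on x and z."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    return (cov(x, y) * szz - cov(z, y) * sxz) / (sxx * szz - sxz ** 2)


def residuals(y, x):
    """What is left of y after a least-squares regression on x."""
    b, mx, my = slope(y, x), sum(x) / len(x), sum(y) / len(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


random.seed(5)
N = 20000
DIRECT_EFFECT = 0.5  # assumed direct effect of E on D
E = [random.gauss(0, 1) for _ in range(N)]
U = [random.gauss(0, 1) for _ in range(N)]
I = [e + u + random.gauss(0, 1) for e, u in zip(E, U)]  # E and U collide at I
D = [DIRECT_EFFECT * e + 0.5 * i + u + random.gauss(0, 1)
     for e, i, u in zip(E, I, U)]

naive = slope_adjusted(D, E, I)  # conditions on I only
with_u = slope_adjusted(residuals(D, U), residuals(E, U), residuals(I, U))
print("conditioning on I:      ", round(naive, 3))
print("conditioning on I and U:", round(with_u, 3))
```

With these coefficients, conditioning on I alone yields an estimate near 0 rather than 0.5, because it opens the path from E through U to D. Conditioning on U as well recovers the assumed direct effect.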
Extension to other problems of causal inquiry
• Causal diagrams explain the mechanism of selection bias
• Example: What happens if we estimate the effect of marital status on dementia in a sample of nursing home residents?
Assume:
• marital status has no effect on dementia
• both variables affect “place of residence” (home, or nursing home)
Extension to other problems of causal inquiry
[Diagram: Marital status → Place of residence (home, nursing home) ← Dementia]
• By studying a sample of nursing home residents, we are conditioning on a collider (on a “sampling collider”) and might create an association between marital status and dementia in that stratum
Extension to other problems of causal inquiry
[Diagram: Marital status → Place of residence (home, nursing home) ← Dementia]
“Stratification”: Home, Nursing home
[Diagram: Marital status and Dementia within a stratum]
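The nursing home example can be sketched with simulated data. All probabilities below (50% married, 20% with dementia, and the admission risks) are assumed values for illustration, and the helper names are hypothetical. Marital status and dementia are generated independently, and both affect admission to a nursing home.

```python
import random

random.seed(3)
N = 50000
people = []
for _ in range(N):
    married = random.random() < 0.5
    dementia = random.random() < 0.2  # assumed: independent of marital status
    # Being unmarried and having dementia both raise the admission risk.
    p_nursing_home = 0.05 + 0.30 * (not married) + 0.50 * dementia
    in_nursing_home = random.random() < p_nursing_home
    people.append((married, dementia, in_nursing_home))


def dementia_risk(rows, married):
    group = [dem for mar, dem, _ in rows if mar == married]
    return sum(group) / len(group)


def risk_difference(rows):
    """Dementia risk in married people minus the risk in unmarried people."""
    return dementia_risk(rows, True) - dementia_risk(rows, False)


everyone = risk_difference(people)
residents = risk_difference([row for row in people if row[2]])
at_home = risk_difference([row for row in people if not row[2]])
print("whole population:      ", round(everyone, 3))
print("nursing home residents:", round(residents, 3))
print("living at home:        ", round(at_home, 3))
```

Under these assumptions there is no association in the whole population, yet among nursing home residents married people show a much higher dementia risk: if a married person was admitted, dementia is the likelier reason.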
Extensions: control selection bias (Source: Hernan et al, Epidemiology 2004)
[Diagram: Estrogen (E), MI (D), S (0,1)]
• Source cohort: no effect of E on D
• Selection into a case-control sample: S = 1
• D → S because disease status affects selection. Diseased members of the cohort are over-sampled (cases) relative to non-diseased (controls)
[Diagram: Estrogen (E), MI (D), F, S (0,1)]
• Suppose: F is hip fracture
• Suppose: E → F
• Suppose: controls are preferentially selected from women with hip fracture
Extensions: control selection bias (Source: Hernan et al, Epidemiology 2004)
[Diagram: Estrogen (E), MI (D), F, S (0,1)]
• S = 1 (our case-control sample)
• S = 0 (remainder of the source cohort)
[Diagram: HRT (E) and MI (D) within the sample]
Association of E and D was created
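The control selection scenario can be sketched in a simulated cohort. Every number below is an assumption made for illustration (30% estrogen use, 10% MI risk, fracture risk of 5% with estrogen and 20% without, and the sampling fractions), and `odds_ratio` is a hypothetical helper. Estrogen has no effect on MI in the simulated cohort.

```python
import random

random.seed(4)
N = 100000
cohort = []
for _ in range(N):
    estrogen = random.random() < 0.3
    mi = random.random() < 0.1  # assumed: no effect of E on D
    # E -> F: estrogen users are assumed to have fewer hip fractures.
    fracture = random.random() < (0.05 if estrogen else 0.20)
    if mi:
        selected = True  # D -> S: every case enters the sample
    else:
        # F -> S: controls come preferentially from women with hip fracture.
        selected = random.random() < (0.50 if fracture else 0.05)
    cohort.append((estrogen, mi, selected))


def odds_ratio(rows):
    """Odds ratio for the E-D association in a list of (E, D, S) rows."""
    exposed_cases = sum(1 for e, d, _ in rows if e and d)
    exposed_noncases = sum(1 for e, d, _ in rows if e and not d)
    unexposed_cases = sum(1 for e, d, _ in rows if not e and d)
    unexposed_noncases = sum(1 for e, d, _ in rows if not e and not d)
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


source_or = odds_ratio(cohort)
sample_or = odds_ratio([row for row in cohort if row[2]])
print("source cohort odds ratio:      ", round(source_or, 2))
print("case-control sample odds ratio:", round(sample_or, 2))
```

With these assumed values, the odds ratio is close to 1 in the source cohort but well above 1 in the selected sample (S = 1), because estrogen users are under-represented among the fracture-based controls.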
Extensions: information bias (last example)
[Diagram: Estrogen use (E), Endometrial cancer (D), Diagnosed endometrial cancer (D*), Z, Frequency of exams, Vaginal bleeding; a “?” marks the effect in question]
Summary Points
• The “change-in-estimate” method could fail if we condition on colliders, and thereby open confounding paths
• The theory of causal diagrams extends the idea of a confounder to the multi-confounder case
• Unification of confounding bias, selection bias, and information bias under a single theoretical framework
• “Back-door algorithm”
• Sufficient set for adjustment
• Minimally sufficient set
• Differential losses to follow-up
• Time-dependent confounders
• Interpretation of hazard ratios
• Conditioning on a common effect always induces an association between its causes, but this association could be restricted to some levels of the common effect
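The ideas of a sufficient set and a minimally sufficient set can be sketched in code. The version below uses one standard formulation, Pearl's back-door criterion: a set is sufficient if it contains no effect (descendant) of E and blocks every path between E and D whose first arrow points to E. The function names and the two example diagrams are assumptions made for illustration, and the search simply enumerates subsets, so it is suitable only for small diagrams.

```python
from itertools import combinations


def descendants(node, edges):
    """All variables reachable from node by following arrows."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def simple_paths(start, goal, edges, path=None):
    """Yield every path from start to goal, ignoring arrow direction."""
    path = path or [start]
    if path[-1] == goal:
        yield path
        return
    for a, b in edges:
        for here, nxt in ((a, b), (b, a)):
            if here == path[-1] and nxt not in path:
                yield from simple_paths(start, goal, edges, path + [nxt])


def is_blocked(path, edges, adjusted):
    for left, mid, right in zip(path, path[1:], path[2:]):
        if (left, mid) in edges and (right, mid) in edges:  # collider
            if mid not in adjusted and not (descendants(mid, edges) & adjusted):
                return True  # an unconditioned collider blocks the path
        elif mid in adjusted:
            return True  # a conditioned non-collider blocks the path
    return False


def is_sufficient(adjusted, exposure, disease, edges):
    adjusted = set(adjusted)
    if adjusted & descendants(exposure, edges):
        return False  # never adjust for an effect of the exposure
    backdoor = [p for p in simple_paths(exposure, disease, edges)
                if (p[1], p[0]) in edges]
    return all(is_blocked(p, edges, adjusted) for p in backdoor)


def minimal_sufficient_sets(exposure, disease, edges, candidates):
    found = []
    for size in range(len(candidates) + 1):
        for subset in map(set, combinations(candidates, size)):
            if not any(smaller <= subset for smaller in found):
                if is_sufficient(subset, exposure, disease, edges):
                    found.append(subset)
    return found


# Assumed diagram in which U1 and U2 collide at C.
collide = {("U1", "E"), ("U1", "C"), ("U2", "C"), ("U2", "D"), ("E", "D")}
print(is_sufficient(set(), "E", "D", collide))        # True
print(is_sufficient({"C"}, "E", "D", collide))        # False
print(is_sufficient({"C", "U1"}, "E", "D", collide))  # True

# Assumed diagram with C at the top of the confounder triangle.
chain = {("C", "U1"), ("U1", "U2"), ("U2", "U3"), ("U3", "E"),
         ("C", "V1"), ("V1", "V2"), ("V2", "D"), ("E", "D")}
minimal = minimal_sufficient_sets("E", "D", chain,
                                  ["C", "U1", "U2", "U3", "V1", "V2"])
print(len(minimal))  # 6: any single variable on the path is enough
```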
[Diagram: Age (young, old), Sex, Smoking drive (low, high), Physical activity (low, high), Asthma (yes, no), Smoking status, FEV1, and a “?”]
[Diagram: Ulcer, Pneumonia, Abdominal Pain, Coughing, and Hospitalization Status (hospitalized, not hospitalized), with a “?”]
Stratification: hospitalized patients, other patients
[Diagram: Ulcer, Pneumonia, Abdominal Pain, Coughing within a stratum, with a “?”]
Example: Do men have higher systolic blood pressure than women?
(In other words: estimate the gender effect on systolic blood pressure)
The following table summarizes the answer to this question from two regression models
So, which is the true estimate and which is biased?
[Diagram: Gender, SBP, WHR, BMI, Z1, Z2, ...]
[Diagram: U, Gender, SBP, WHR, BMI, Z1, Z2, ...]