Causal Inference in Epidemiology: A Primer on Bias & Confounding

Chirag V. Shah, MD, MSc • Pulmonary & Allergy Associates • Atlantic Health – Morristown Medical Center • January 18, 2012

Causal Inference • One of the most important aspects in clinical research is the inference that an association between an exposure and outcome represents a cause-effect relationship

Causal Inference • Imagine it is the 1920s and you are asked to evaluate the relationship between alcohol consumption and lung cancer risk • 1000 people who frequent a local pub: 100 cases of lung cancer • 1000 people who go to church every Sunday: 10 cases of lung cancer • You conclude there is an association between alcohol use and lung cancer

Explanation of Associations

Bias • Any systematic error in the design, conduct, or analysis of a study that results in a mistaken estimate of an exposure’s effect on the risk of disease • Evaluate magnitude and direction of effect • Must distinguish between the research question and the question actually answered by the study • Statistics cannot fix bias in the design

Bias – Common types • Selection bias • Who is sampled • Non-responders • Self-selection • Ascertainment bias (cohort) • Information bias • Recall bias • Interviewer bias • Misclassification bias • Indication bias (confounding by indication) • Lead time, length, and overdiagnosis bias are important in studies of screening tests

Selection Bias • Sampling frame • Need study population = target population • Especially problematic in case-control studies • Selection of cases/controls is related to the probability of exposure • Who is an appropriate control? • Controls should be representative of persons who would have been identified as cases if they had developed the disease of interest (outcome)

Selection Bias – Example • MacMahon B, Yen S, Trichopoulos D, Warren K, Nardi G. “Coffee and cancer of the pancreas”. N Engl J Med. 1981;304(11):630-3. • Compared 367 pancreatic cancer patients to 647 control patients selected from hospitalized patients with the same attending as the cases • Found higher coffee consumption in cancer patients • ≤ 2 cups/day: OR 1.8 (95% CI 1.0 to 3.0) • ≥ 3 cups/day: OR 2.7 (95% CI 1.6 to 4.7)

Selection Bias – Example • Bias was introduced because: • Controls were often patients with GERD/PUD and avoided caffeine/coffee • So they may not be representative of the coffee consumption of the general public (i.e., Study pop ≠ target pop) • Subsequent studies with different controls failed to confirm this finding

Selection Bias • Non-responders • In case-control studies • Example - questionnaire about high risk behaviors and risk of developing disease • Those with the “riskiest” behavior may not respond because of fear of repercussions and thus an association may be lost • In cohort studies • “loss to follow up” • Those with “riskiest” behavior may fail to return for subsequent visit to see if the disease in question has developed

Lost to Follow-up Bias in Cohort Studies • Full cohort (disease in 50 of 1000 exposed and 10 of 1000 not-exposed): RR = (50/1000) / (10/1000) = 5.0 • 10% of the diseased and 0% of the non-diseased lost to f/u in both exposure groups: RR = (45/995) / (9/999) = 5.02 • 20% of the diseased and 5% of the non-diseased lost to f/u among the exposed, 10% of the diseased and 10% of the non-diseased lost to f/u among the not-exposed: RR = (40/942) / (9/900) ≈ 4.2
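The loss-to-follow-up arithmetic on this slide can be recomputed directly. The following is a minimal sketch, assuming the stated loss percentages apply to a cohort of 50 diseased / 950 non-diseased exposed subjects and 10 diseased / 990 non-diseased not-exposed subjects; the function names are illustrative, and unrounded counts are used, so results may differ slightly from figures rounded on the slide.

```python
def risk_ratio(exp_cases, exp_noncases, unexp_cases, unexp_noncases):
    """Risk ratio from the subjects still under observation in each group."""
    risk_exposed = exp_cases / (exp_cases + exp_noncases)
    risk_unexposed = unexp_cases / (unexp_cases + unexp_noncases)
    return risk_exposed / risk_unexposed


def after_loss(cases, noncases, loss_cases, loss_noncases):
    """Counts remaining after a fraction of cases and of non-cases is lost."""
    return cases * (1 - loss_cases), noncases * (1 - loss_noncases)


# Full cohort: (diseased, non-diseased) in each exposure group.
exposed = (50, 950)
unexposed = (10, 990)

rr_full = risk_ratio(*exposed, *unexposed)

# Loss depends on disease status only (same in both exposure groups).
rr_equal_loss = risk_ratio(*after_loss(*exposed, 0.10, 0.0),
                           *after_loss(*unexposed, 0.10, 0.0))

# Loss depends on both exposure and disease status.
rr_differential_loss = risk_ratio(*after_loss(*exposed, 0.20, 0.05),
                                  *after_loss(*unexposed, 0.10, 0.10))

print(f"full cohort:            RR = {rr_full:.2f}")
print(f"loss by disease only:   RR = {rr_equal_loss:.2f}")
print(f"loss by exposure too:   RR = {rr_differential_loss:.2f}")
```

In this sketch the RR barely moves when loss is related to disease status alone, and shifts noticeably once loss is related to both exposure and disease.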

Selection Bias • Self-selection • Who volunteers? • Example – spouses of ill patients may be more motivated to participate but are also more likely to share the patients’ exposures, so the ability to find meaningful associations may be lost

Selection Bias • Ascertainment bias • Form of selection bias seen in prospective cohort studies • How did the investigators “ascertain” or “obtain” their sample to study? • Potential bias in ICU studies • DM is associated with a decreased risk of ARDS

Information Bias • Due to a systematic error in the way exposure or outcome data are measured/obtained after subjects have been entered into the study • Types: • Recall • Reporting • Wish • Surveillance • Interview • Misclassification

Recall Bias • When knowledge of the disease status influences the determination of the exposure status • Subjects with a disease will remember and report exposures differently than those without the disease • Can lead to either under or overestimating a true association

Recall Bias – Example • Ask IPF subjects and controls about exposures to multiple possible triggers • Many patients with lung disease assume there must be an inhalational trigger and thus have thought about anything they have inhaled • IPF subjects may over-report an exposure and thus create a false association

Interviewer Bias • Any systematic difference in the soliciting, recording, or incorporation of data from study subjects • Due to differential probing for exposures or outcomes of interest among study subjects • Can affect all types of studies • Can be avoided, at least in part, by blinding data abstractors/interviewers to disease status and/or study question

Misclassification • Subjects are erroneously categorized with respect to either their exposure or disease status • 2 types • Non-differential • Differential

Non-differential Misclassification • The proportion of erroneous classifications is equal between the two groups • Common because measures of exposure or disease are frequently inexact • Because this tends to increase the similarity between the two groups, the bias is usually “toward the null,” i.e., the true effect is diluted

Differential Misclassification • When the proportions of measurement error differ between the two groups being studied • Can either under- or overestimate the real relationship depending on where the error is made
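The difference between the two kinds of misclassification can be illustrated with expected counts. The following is a rough sketch using hypothetical numbers that are not from the slides: 100 cases and 100 controls, a true exposure prevalence of 60% in cases and 30% in controls, and assumed sensitivity/specificity values for how exposure is recorded.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a, b = exposed cases, exposed controls;
    c, d = unexposed cases, unexposed controls."""
    return (a * d) / (b * c)


def observed_exposure(n_exposed, n_unexposed, sensitivity, specificity):
    """Expected numbers classified as exposed / unexposed when exposure is
    measured with the given sensitivity and specificity."""
    classified_exposed = n_exposed * sensitivity + n_unexposed * (1 - specificity)
    classified_unexposed = n_exposed * (1 - sensitivity) + n_unexposed * specificity
    return classified_exposed, classified_unexposed


def observed_or(case_accuracy, control_accuracy):
    """OR after misclassifying exposure in cases and controls (hypothetical data)."""
    case_exp, case_unexp = observed_exposure(60, 40, *case_accuracy)
    ctrl_exp, ctrl_unexp = observed_exposure(30, 70, *control_accuracy)
    return odds_ratio(case_exp, ctrl_exp, case_unexp, ctrl_unexp)


true_or = odds_ratio(60, 30, 40, 70)

# Non-differential: same (sensitivity, specificity) in cases and controls.
nondifferential_or = observed_or((0.8, 0.9), (0.8, 0.9))

# Differential: cases report exposure more completely than controls ...
overestimated_or = observed_or((0.95, 0.9), (0.7, 0.9))
# ... or less completely than controls.
underestimated_or = observed_or((0.7, 0.9), (0.95, 0.9))

print(f"true OR             = {true_or:.2f}")
print(f"non-differential OR = {nondifferential_or:.2f}")
print(f"differential ORs    = {overestimated_or:.2f}, {underestimated_or:.2f}")
```

With these assumed values the non-differential error pulls the OR toward 1, while the differential error moves it in either direction depending on which group is measured less accurately.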

Lead Time and Length Bias

Controlling Bias • Choosing correct study population • Rigorous data collection • Careful construction of measurement tools and information abstraction forms • Blinding abstractors/interviewers • Standardized training of interviewers/abstractors • Clearly written protocols that don’t allow room for interpretation • Build in checks for bias – ask the same info several ways, dummy variables known to be associated with exposure of interest, etc.

Evaluating the Role of Bias • Bias can have both direction and magnitude • Generally can’t assess magnitude but can postulate a direction • [Table not reproduced: italicized “red” rows indicate “acceptable” effect of biases]

Confounding • Due to inherent differences between study groups other than the exposure of interest • A third factor “mixed” in, affecting the distribution of disease and exposure among the study groups • Can over- or underestimate the association of interest • Unlike bias, confounding can be adjusted for in the analysis, in addition to being prevented by good study design

Confounding • Requirements of a confounder: • Must be associated with both the exposure and the outcome of interest • Must not be on the causal pathway between exposure and disease (i.e., must not be a mediating variable) • [Diagrams of exposure, confounder, and outcome; one arrangement is marked “No!”]

Confounding, Causal Pathway, or Interacting Variable? • [Diagram: Obesity, DM, CAD] • Whether a variable is a confounder or lies on the causal pathway is decided on biological grounds, not statistical or epidemiological ones • If “interaction” or “effect modification” is suspected, the association between exposure and risk must be evaluated with the interacting variable taken into account

Confounding – Example • Case-control study to examine the effect of alcohol use on lung cancer • OR for lung cancer = (a x d)/(b x c) = (90 x 90)/(60 x 60) = 2.25

Confounding – Example • OR for alcohol and smoking = (120x120)/(30x30) = 16.0 • OR for lung cancer and smoking = (100x100)/(50x50) = 4.0 • Therefore smoking is related to both lung cancer and alcohol use and thus may be a confounder

Confounding – Example • OR for lung cancer and alcohol • In smokers = (80x10)/(20x40) = 1.0 • In non-smokers = (10x80)/(20x40) = 1.0 • So it’s safe to drink alcohol (but not smoke) if you’re worried about lung cancer
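The whole alcohol, smoking, and lung cancer example can be reproduced from two stratum tables. The following is a sketch in which the cell layout is a reconstruction, chosen so that the strata add up to the crude table and to the two smoking ORs quoted in the example; the Mantel-Haenszel pooled OR is added as one common way to summarize the strata and is not something the slides themselves compute.

```python
def odds_ratio(a, b, c, d):
    """OR = (a x d) / (b x c)."""
    return (a * d) / (b * c)


# (a, b, c, d) = (drinkers with cancer, drinkers without cancer,
#                 non-drinkers with cancer, non-drinkers without cancer)
smokers = (80, 40, 20, 10)
non_smokers = (10, 20, 40, 80)

# Crude table: add the strata cell by cell -> (90, 60, 60, 90).
crude = tuple(s + n for s, n in zip(smokers, non_smokers))
crude_or = odds_ratio(*crude)

# Is smoking associated with the exposure (alcohol) and with the outcome?
smoking_alcohol_or = odds_ratio(
    smokers[0] + smokers[1],          # smokers who drink
    non_smokers[0] + non_smokers[1],  # non-smokers who drink
    smokers[2] + smokers[3],          # smokers who do not drink
    non_smokers[2] + non_smokers[3],  # non-smokers who do not drink
)
smoking_cancer_or = odds_ratio(
    smokers[0] + smokers[2],          # smokers with cancer
    smokers[1] + smokers[3],          # smokers without cancer
    non_smokers[0] + non_smokers[2],  # non-smokers with cancer
    non_smokers[1] + non_smokers[3],  # non-smokers without cancer
)

# Stratum-specific ORs for alcohol and lung cancer.
or_in_smokers = odds_ratio(*smokers)
or_in_non_smokers = odds_ratio(*non_smokers)


def mantel_haenszel_or(tables):
    """Pooled OR across strata: sum(a*d/n) / sum(b*c/n)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


adjusted_or = mantel_haenszel_or([smokers, non_smokers])

print(crude_or, smoking_alcohol_or, smoking_cancer_or)
print(or_in_smokers, or_in_non_smokers, adjusted_or)
```

The crude OR of 2.25 disappears (OR = 1.0) within each smoking stratum and in the pooled estimate, which is the pattern expected when smoking fully accounts for the crude association.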

Controlling for Confounding • Study design • Randomization • Restriction • Matching • Analysis • Stratification • Multivariable analysis

Confounding • Randomization • Evenly distributes known and unknown confounders • This is really why everyone considers RCTs the gold standard study design
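A small simulation can make the point about unknown confounders concrete. This is only an illustrative sketch with made-up numbers: a hypothetical population of 10,000 subjects, 30% of whom carry a confounder that the investigators never measure, assigned to study arms by a coin flip.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: True means the subject carries the unmeasured confounder.
population = [random.random() < 0.30 for _ in range(10_000)]

# Assign each subject to an arm at random, without looking at the confounder.
arms = {"treatment": [], "control": []}
for has_confounder in population:
    arm = random.choice(["treatment", "control"])
    arms[arm].append(has_confounder)

# Prevalence of the confounder in each arm.
prevalence = {arm: sum(members) / len(members) for arm, members in arms.items()}

for arm, value in prevalence.items():
    print(f"{arm}: n = {len(arms[arm])}, confounder prevalence = {value:.3f}")
```

The two arms end up with nearly the same prevalence of the confounder even though it was never measured. The balance holds on average and improves with sample size; small trials can still be imbalanced by chance.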

Confounding • Restriction • Limits entrance into the study to individuals who fall within a specified category or categories of the confounder • e.g., only including smokers in your study on alcohol use and lung cancer • Obviously must know ahead of time what is a confounder

Confounding • Matching • Match cases and controls on known confounders (one or more) • Example - have one smoker in the controls for every smoker in the cases • Makes it harder to identify controls but may be useful when the confounder would otherwise be very rare in one of the groups (increases statistical power) • Can’t then study the relationship of the confounder with outcome • Matching must be maintained in both design and analysis
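One way to see why matching must be maintained in the analysis is to compare a matched-pair OR with the OR obtained by ignoring the pairing. The sketch below uses hypothetical counts for 150 one-to-one matched case-control pairs; the matched-pair OR is the standard ratio of discordant pairs, and none of the numbers come from the slides.

```python
# Hypothetical 1:1 matched pairs, keyed by (case exposure, control exposure).
pairs = {
    ("exposed", "exposed"): 30,
    ("exposed", "unexposed"): 40,    # discordant: only the case is exposed
    ("unexposed", "exposed"): 20,    # discordant: only the control is exposed
    ("unexposed", "unexposed"): 60,
}

# Matched analysis: only discordant pairs carry information about the OR.
matched_or = pairs[("exposed", "unexposed")] / pairs[("unexposed", "exposed")]

# Unmatched analysis: break the pairs and build an ordinary 2x2 table.
cases_exposed = pairs[("exposed", "exposed")] + pairs[("exposed", "unexposed")]
cases_unexposed = pairs[("unexposed", "exposed")] + pairs[("unexposed", "unexposed")]
controls_exposed = pairs[("exposed", "exposed")] + pairs[("unexposed", "exposed")]
controls_unexposed = pairs[("exposed", "unexposed")] + pairs[("unexposed", "unexposed")]

unmatched_or = (cases_exposed * controls_unexposed) / (controls_exposed * cases_unexposed)

print(f"matched-pair OR = {matched_or:.2f}")
print(f"unmatched OR    = {unmatched_or:.2f}")
```

With these counts the analysis that ignores the pairing gives an OR closer to 1 than the matched-pair analysis does.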

Confounding • Adjusted in the Analysis • Stratified analysis • Like we did above with smokers vs. non-smokers, comparing the ORs in the two groups • Can’t be done easily if you have lots of confounders • Multivariable analysis • Very commonly done • Not without its issues as well • How many confounders to include • Which to include • Which model of analysis

Indication Bias or Confounding by Indication • An error introduced into a study because the “indication” for an exposure is associated with the outcome • Pulmonary artery catheters and risk of death in ICU patients • Patients receive a PAC precisely because they are the “sickest,” so it may be the “indication” (being really sick), not the PAC itself, that increases their risk of death • “Propensity score matching” is often used in the analysis

Bias vs. Confounding • Confounding is not an error in the study, but rather a true phenomenon that is identified in a study and must be understood • Bias is a result of an error in the way that the study has been carried out • Confounding is a valid finding that describes the nature of the relationship among several factors and the risk of disease • However, failure to account for confounding in interpreting the results of a study is indeed an error and can bias the conclusions of the study

Conclusion • Bias is a systematic error in collecting or interpreting data • Bias is a flaw in design and cannot be analyzed away • Confounders are extraneous factors that distort the relationship between the exposure and the outcome • Confounders may be adjusted away if they are measured • Confounding can sometimes be prevented by proper study design
