Review: Causal Inference in Epidemiology Confounding

Beate Ritz, MD, Ph.D. EPI 200B Winter 2010 NOTE: Many of the following slides are based on the lecture notes provided by Dr. Hal Morgenstern (Epi Methods I and II)

Major Methodologic Concerns in Epidemiologic (Observational) Population Research Three biases we try to avoid or control for: • Information bias – measurement error of exposure or disease • Selection bias – does selection of the control/reference group depend on the outcome and the exposure of interest? • Confounding bias – lack of comparability (lack of exchangeability) between exposed and unexposed populations • In addition, we try to assess differences of effect estimates in subgroups, e.g. men vs. women (statistical interaction or effect measure modification)

Counterfactual Causality “What would have happened to the same fixed individual at the same fixed time under one (‘exposed’) versus another (‘unexposed’) condition” Counterfactual causal thinking • provides a useful concept of causation • allows us to draw probabilistic causal inferences in observational studies • provides a framework for statistical procedures to estimate causal effects • demonstrates the limitations of observational data See Hoefler. Causal inference based on counterfactuals. BMC Med Res Methodol 2005;5:28.

Exploring Causes of Disease in Human Populations: Use of Counterfactual Causality In counterfactual causal thinking we imagine the consequences of changing the value of a single factor in a comprehensive (complex) causal system. The counterfactual is by definition unobservable. Instead, we identify a valid comparison group, i.e. one that is similar in every aspect except for exposure.

“Causal Models” (but NOT a causal pathway diagram (DAG)!): From: Marbury MC, Maldonado G, Waller L. The indoor air and children's health study: methods and incidence rates. Epidemiology. 1996 Mar;7(2):166-74.

Causal Inference: Rothman’s sufficient-component-cause model of causation Builds a conceptual model for inferential considerations as a bridge between metaphysics and epi studies • Similar to but finer than the counterfactual model • Entities in this model are not individuals but mechanisms of causation • A mechanism is defined as a combination of events/factors that are jointly sufficient to induce a binary outcome event (diseased / non-diseased)

Rothman’s sufficient-component-cause model • A cause of a disease is an event, condition, or characteristic that plays an essential role in producing an occurrence of the disease • Sufficient and component causes • A causal mechanism consists of a constellation of components that act in concert • A “sufficient” cause may be defined as a set of minimal conditions and events that inevitably produce disease • “Minimal” implies that none of the conditions or events are superfluous • The completion of a sufficient cause may be considered equivalent to the onset of disease • A factor present in every sufficient cause constellation/mechanism constitutes a necessary component cause

Rothman’s model of causation [Figure: three pies labeled Sufficient Cause I, Sufficient Cause II and Sufficient Cause III; each pie is one causal mechanism and each slice is a single component cause.] Fig. 2-1. Conceptual schematization of three sufficient causes for a disease [Rothman, 1976].

Causes of Complex Diseases in Populations Rothman’s model of causation [Figure: pies labeled Sufficient Cause 0, I, II and III; each pie is one causal mechanism and each slice is a single component cause, e.g. toxin A, gene B, immune response, injury, drug action etc.] Note: A is a necessary component in all 3 causes. • NOTE: • For biologic systems, most and sometimes all of the components of a sufficient cause are unknown • Generally, there is more than one sufficient cause for a disease • Conceptual schematization of three sufficient causes for a disease [Rothman, 1976].

Examples • Suppose component causes A, B, and C in sufficient causes I-III are all factors commonly present or experienced by people, and E is rare. Although all factors are causes, E would appear to be a stronger determinant of disease because those with E differ greatly in risk from those without E. Thus, the strength of a cause is determined by the relative prevalence of component causes. • G is a substance created in and confined to a laboratory. Thus, any causal pie that includes G will not cause disease until G is released into the environment. • A is a necessary but not a sufficient cause. What proportion of disease is caused by A? Note: • No disease is caused solely by A, since A is not a sufficient cause. • A single cause or category of causes that is present in every sufficient cause will have an attributable fraction of 100% • What if component C in cause III were a B instead?
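The claim that the apparent strength of a cause depends on how common its partner components are can be sketched numerically. The sketch below assumes a deliberately simple setup that is not taken from the slides: one sufficient cause made of components X and Y, one background sufficient cause U, and independent component prevalences. All names and prevalences are hypothetical.

```python
def risk(x_present: bool, prev_y: float, prev_u: float) -> float:
    """Risk of disease when disease occurs if (X and Y) or U.

    Hypothetical model: Y and U occur independently with the given prevalences.
    """
    # The sufficient cause {X, Y} can only be completed when X is present.
    p_xy_complete = prev_y if x_present else 0.0
    # Disease is avoided only if neither sufficient cause is completed.
    return 1.0 - (1.0 - p_xy_complete) * (1.0 - prev_u)


def risk_ratio_for_x(prev_y: float, prev_u: float) -> float:
    """Risk ratio comparing people with X to people without X."""
    return risk(True, prev_y, prev_u) / risk(False, prev_y, prev_u)


# Same mechanism, different prevalence of the complementary component Y.
rr_common_y = risk_ratio_for_x(prev_y=0.50, prev_u=0.01)
rr_rare_y = risk_ratio_for_x(prev_y=0.01, prev_u=0.01)
print(f"RR for X when Y is common: {rr_common_y:.2f}")
print(f"RR for X when Y is rare:   {rr_rare_y:.2f}")
```

Under these assumptions X looks like a strong cause when its partner Y is common and a weak one when Y is rare, although the biologic mechanism is identical in both runs.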

Rothman’s sufficient-component-cause model NOTE: For biologic effects, most and sometimes all of the components of a sufficient cause are unknown. Generally, there is more than one sufficient cause for a disease. Example: Breast cancer causes • BRCA1 and BRCA2 = J • Early age at menarche = E • Late age at first pregnancy etc.

Sufficient Cause Models [Figure: one sufficient cause with several toxins and genes as component causes: gene SNP 1, toxin A (paraquat), gene SNP 2, toxin B (maneb).]

Point-Counterpoint Commentary: Positivized epidemiology and the model of sufficient and component causes. Charles Poole. International Journal of Epidemiology 2001;30:707-709 • The Rothman model of sufficient and component causes (SCC) gives epidemiologists engaged in etiological research on any disease a clear choice between two options at any point in time: • 1. Consider all remaining variability in the disease's occurrence, conditional on its known determinants, to be due to chance or some other source of irreducible stochastic uncertainty, and close up shop (Peto [1]) • 2. Keep searching for additional determinants • One authority (Colditz [2]) on cancer epidemiology very recently declared the search for cancer risk factors to be over. • For a health outcome, a way of emphasizing a working agreement on option 2 is to include unlabelled slices in pie-chart depictions of sufficient causes. [1] Peto R. Cancer risk. New Scientist 1977;73:480–81. [2] Colditz G. Cancer culture: Epidemics, human behavior, and the dubious search for new risk factors. Am J Public Health 2001;91:357–64.

Figure 1. Modified pie-chart depiction of all hypothetically possible classes of sufficient causes (etiologic mechanisms) of an outcome with regard to a well-specified index condition (X = 1) and reference condition (X = 0). Each label states the specific causal contrast postulated by the hypothetical class of sufficient causes. Unlabelled slices represent known or hypothesized component causes that are unspecified in this particular analysis, as well as unknown component causes that might be discovered in future research. Example: If X = 1 is the presence of an air bag, X = 0 is its absence, and the outcome is death in an automobile collision, the first pie chart represents mechanisms in which ‘air bags kill’, the second represents mechanisms in which ‘air bags save lives’, and the third represents fatal etiologies in which air bags, by their presence or absence, play no role

Bradford Hill. The environment and disease: association or causation? Proc R Soc Med 1965;58:295-300. • The seldom-quoted bottom line of the so-called “Hill criteria” (which he called ‘viewpoints’) and the fundamental question is: • “Is there any other way of explaining the set of facts before us, is there any other answer equally, or more, likely than cause and effect?”

Confounding - definition • Confounding is bias in the estimation of the effect of exposure on disease occurrence, due to a lack of comparability (lack of exchangeability) between exposed and unexposed populations; • thus, disease risks would be different even if the exposure were absent in both populations. • Note: a confounded estimate of effect is not expected to equal the causal parameter of interest in the source population.

Confounding • To quantify the exposure effect, we compare • the # of new cases occurring in the exposed population with • the # of cases that would have occurred in the absence of exposure (a causal parameter). • Thus, confounding occurs when the exchangeability assumption (= the reference or unexposed population exhibits the risk the exposed population would have experienced if exposure had been absent) is not met • Note: this counterfactual contrast can never be made directly, i.e. the same population is never both exposed and unexposed at the same time

Confounding • In practice we compare a group of exposed subjects with another group of unexposed subjects. • Thus, the validity of this comparison depends on the assumption that the risk of disease in the unexposed group is equal to the risk that would have occurred in the exposed group in the absence of exposure. • When this assumption is not true, the observed comparison between exposure groups is confounded.

Confounding in experiments • Confounding may occur in any type of study, including experiments. • Randomized experiments: • Randomization tends to make assigned (treatment) groups exchangeable (comparable), thus confounding is usually not a major source of bias in well-conducted experiments, provided the sample size is not too small

Confounding in experiments • Furthermore, randomization yields known treatment probabilities; thus, confidence intervals (CIs) in randomized studies actually reflect possible confounding, which might have occurred in either direction; • Note: the amount of possible bias and the CI width become smaller as the sample size increases. • Thus, the interpretation of CIs in observational studies requires the assumption of no bias, whereas in randomized studies, CIs reflect possible confounding (which in randomized studies becomes part of the random error), although they do not reflect other biases (such as measurement error or differential loss to follow-up).
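The statement that chance imbalance between randomized groups shrinks as the sample size grows can be illustrated with a small simulation. This is a minimal sketch, assuming a single binary baseline risk factor with a hypothetical 30% prevalence and equal-sized arms; the function name and all parameter values are illustrative and not from the lecture.

```python
import random


def covariate_imbalance(n_per_arm: int, prevalence: float, rng: random.Random) -> float:
    """Absolute difference in risk-factor prevalence between two randomized arms."""
    # Draw a binary baseline risk factor for every subject.
    covariate = [rng.random() < prevalence for _ in range(2 * n_per_arm)]
    # Random allocation: shuffle subjects, first half -> arm 1, second half -> arm 0.
    rng.shuffle(covariate)
    arm1, arm0 = covariate[:n_per_arm], covariate[n_per_arm:]
    return abs(sum(arm1) / n_per_arm - sum(arm0) / n_per_arm)


rng = random.Random(2010)  # fixed seed so the run is repeatable
for n in (20, 200, 2000, 20000):
    print(f"n per arm = {n:>5}: imbalance = {covariate_imbalance(n, 0.3, rng):.3f}")
```

Any single small trial can still be noticeably imbalanced, which is why such chance confounding is treated as part of the random error that the confidence interval describes.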

Causal types • We could determine whether confounding exists if we knew the counterfactual risk of disease in the exposed group in the absence of exposure (R1*). • To determine the counterfactual risk, we need to know the distribution of 4 "causal types" (i.e. doomed, causative, preventive, immune).

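Table 4-1, p. 60, ME2 (Rothman and Greenland). An elementary model of causal types and their distribution in two distinct cohorts (1 = gets disease, 0 = does not get disease). Here $p_1, p_2, p_3, p_4$ are the proportions of type 1 (doomed), type 2 (causative), type 3 (preventive) and type 4 (immune) individuals in cohort 1. • Causal risk difference in cohort 1: $(p_1+p_2) - (p_1+p_3) = p_2 - p_3$, where $p_1+p_2$ is the proportion who get disease among the exposed and $p_1+p_3$ is the proportion who would get disease if unexposed • Causal risk ratio in cohort 1: $\frac{p_1+p_2}{p_1+p_3}$ • Causal odds ratio in cohort 1: $\frac{(p_1+p_2)/(p_3+p_4)}{(p_1+p_3)/(p_2+p_4)}$ • NOTE: if $p_2 - p_3 = 0$, then the causal risk ratio and odds ratio equal 1 (balance between causative and preventive effects)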

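Table 4-1, p. 60, ME2 (Rothman and Greenland). An elementary model of causal types and their distribution in two distinct cohorts (1 = gets disease, 0 = does not get disease). Here $p_1,\dots,p_4$ and $q_1,\dots,q_4$ are the proportions of the four causal types in cohort 1 (exposed) and cohort 0 (unexposed). Measures comparing cohort 1 with cohort 0: • Risk difference: $(p_1+p_2) - (q_1+q_3)$, where $p_1+p_2$ is the proportion who get disease in cohort 1 (= exposed) and $q_1+q_3$ is the proportion who get disease in cohort 0 (= unexposed) • Risk ratio: $\frac{p_1+p_2}{q_1+q_3}$ • Odds ratio: $\frac{(p_1+p_2)/(p_3+p_4)}{(q_1+q_3)/(q_2+q_4)}$ • NOTE: if $q_1+q_3 \neq p_1+p_3$, then $q_1+q_3$ cannot be exchanged or substituted for $p_1+p_3$, and the association measures (risk comparisons) are confounded by the discrepancy between these two quantities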
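The causal-type formulas can be checked with a few lines of code. The sketch below assumes the usual numbering of the types (1 doomed, 2 causative, 3 preventive, 4 immune) and uses made-up proportions for the two cohorts; the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CausalTypes:
    """Proportions of the four causal types in one cohort (should sum to 1)."""
    doomed: float      # type 1: gets disease whether exposed or not
    causative: float   # type 2: gets disease only if exposed
    preventive: float  # type 3: gets disease only if unexposed
    immune: float      # type 4: never gets disease

    def risk_if_exposed(self) -> float:
        return self.doomed + self.causative

    def risk_if_unexposed(self) -> float:
        return self.doomed + self.preventive


def odds(risk: float) -> float:
    return risk / (1.0 - risk)


def contrast(risk_index: float, risk_reference: float) -> dict:
    """Risk difference, risk ratio and odds ratio for two risks."""
    return {
        "RD": risk_index - risk_reference,
        "RR": risk_index / risk_reference,
        "OR": odds(risk_index) / odds(risk_reference),
    }


def causal_measures(p: CausalTypes) -> dict:
    """Compare cohort 1 exposed with the same cohort unexposed (counterfactual)."""
    return contrast(p.risk_if_exposed(), p.risk_if_unexposed())


def association_measures(p: CausalTypes, q: CausalTypes) -> dict:
    """Compare cohort 1 exposed with a different cohort 0 that is unexposed."""
    return contrast(p.risk_if_exposed(), q.risk_if_unexposed())


# Hypothetical type distributions (not from Table 4-1).
cohort1 = CausalTypes(doomed=0.10, causative=0.20, preventive=0.05, immune=0.65)
cohort0 = CausalTypes(doomed=0.20, causative=0.10, preventive=0.10, immune=0.60)

print("causal:     ", causal_measures(cohort1))
print("association:", association_measures(cohort1, cohort0))
```

With these invented numbers the exposure doubles the risk in cohort 1, yet the comparison with cohort 0 shows no association at all, because q1 + q3 (0.30) is not a valid substitute for p1 + p3 (0.15).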

Causal types (example from Morgenstern) Example: Frequency distribution (in %) of 4 causal types, by exposure status (E vs. Ē), in 3 closed cohorts; R0* = counterfactual risk in the unexposed group if everyone were exposed.

Confounding • In all three cohorts, we would expect to observe a risk ratio (RR) of 2. • In Cohort 1, this expected RR is biased (confounded) because the exchangeability assumption is not met – i.e., R0 does not equal R1*, the risk the exposed group would have had in the absence of exposure. Thus, the expected RR = 2 does not equal the causal risk ratio in the exposed group (RR1 = 1). • In Cohorts 2 and 3, however, the expected RRs are not biased because the exchangeability assumption is met – i.e., R0 = R1*. Thus, the expected RR is equal to the causal risk ratio in the exposed group (RR1 = 2).

Comments: When focusing on causal parameters in an exposed source population (e.g., RR1 = R1/R1* = a/a0, where R1* is the risk the exposed group would have had in the absence of exposure), there is no confounding if the total proportion of Type 1 and Type 3 individuals is the same in exposed and unexposed groups. • In this situation, the risk of disease in the unexposed group (R0) is equal to what the risk would have been in the exposed group in the absence of exposure (R1*). • NOTE: this condition is met in Cohorts 2 and 3, but not Cohort 1. This is the usual (often implied) meaning of confounding in epidemiology.

If we were interested in what the risk would have been in the unexposed source population had they been exposed (i.e., focusing on causal parameters in the unexposed source population, e.g., RR0 = R0*/R0 = c1/c, where R0* is the risk the unexposed group would have had in the presence of exposure), no confounding would mean that the total proportion of Type 1 and Type 2 individuals is the same in exposed and unexposed groups. • In this situation, the risk of disease in the exposed group (R1) is equal to what the risk would have been in the unexposed group in the presence of exposure (R0*). • This condition is met in Cohort 3, but not Cohorts 1 and 2. Note that the causal risk ratio in Cohort 2 is different in the exposed and unexposed groups.

Confounding • If we were interested in estimating causal parameters for the total source population, no confounding would mean that both conditions described above would hold. • That is, the two exposure groups would be completely exchangeable: The same exposure-risk relation would exist if the two exposure states were exchanged (i.e., if the exposed became unexposed and the unexposed became exposed). • Note that complete exchangeability does not necessarily require that the total distribution of causal types be the same in exposed and unexposed populations (e.g., see Cohort 3; if exposure groups were reversed, RR would still be 2). • Conclusion: In practice, we do not know the distribution of the 4 causal types. Thus, we cannot measure confounding without introducing untestable assumptions!
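The two exchangeability conditions can be made concrete with a short calculation. The type distributions below are hypothetical values chosen only to reproduce the pattern described for Cohorts 1 to 3 (observed RR of 2 everywhere, different causal answers); they are not the numbers from the original table, and the function names are illustrative.

```python
# Percent of each causal type: (doomed, causative, preventive, immune).
# HYPOTHETICAL values that mimic the pattern described for Cohorts 1-3.
COHORTS = {
    1: {"exposed": (20, 0, 0, 80), "unexposed": (10, 0, 0, 90)},
    2: {"exposed": (10, 10, 0, 80), "unexposed": (10, 20, 0, 70)},
    3: {"exposed": (10, 10, 0, 80), "unexposed": (5, 15, 5, 75)},
}


def risk_under_exposure(types):
    doomed, causative, _preventive, _immune = types
    return doomed + causative      # types 1 and 2 get disease when exposed


def risk_without_exposure(types):
    doomed, _causative, preventive, _immune = types
    return doomed + preventive     # types 1 and 3 get disease when unexposed


def summarize(cohort):
    exposed, unexposed = cohort["exposed"], cohort["unexposed"]
    r1 = risk_under_exposure(exposed)         # observed risk, exposed group
    r0 = risk_without_exposure(unexposed)     # observed risk, unexposed group
    r1_cf = risk_without_exposure(exposed)    # exposed group, had it been unexposed
    r0_cf = risk_under_exposure(unexposed)    # unexposed group, had it been exposed
    return {
        "observed_rr": r1 / r0,
        "causal_rr_exposed": r1 / r1_cf,
        "causal_rr_unexposed": r0_cf / r0,
        "no_confounding_for_exposed": r0 == r1_cf,
        "no_confounding_for_unexposed": r1 == r0_cf,
    }


for label, cohort in COHORTS.items():
    print(f"Cohort {label}: {summarize(cohort)}")
```

In the third hypothetical cohort both conditions hold even though the exposed and unexposed type distributions differ, which is the point made about complete exchangeability.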

Confounders • In practice, there is no empirical method for directly examining the correctness of the comparability (exchangeability) assumption that defines “no confounding”. What we do instead is • attempt to identify and control for empirical sources of confounding. • search for differences between exposure groups in the distribution of extraneous risk factors for the disease. • such differences could produce a violation of the exchangeability assumption, which would bias (confound) the exposure effect estimator. Extraneous risk factors responsible for confounding are called confounders or confounding variables, and they serve as a means for the identification and control of confounding.

Confounders - example • Suppose age is a risk factor for the disease in the source population. • If exposed persons are older than unexposed persons, how do we know whether the estimated exposure effect (e.g., RR > 1) is actually due to the effect of the exposure or to being older? • Thus, age is a confounder in this population; the two exposure groups are probably not exchangeable because of the age difference.

Confounders • If we have adequately measured confounders in all subjects, we can control or adjust for their distorting effect in the analysis. • Analytic control is achieved by examining the desired association within categories (or strata) of the confounders (i.e., stratified analysis). • Within strata (defined by the cross-classification of a sufficient set of accurately measured confounders), the exposure groups are exchangeable, and our causal effect estimator is not confounded.

Confounders • Although we cannot observe what the frequency of disease would have been in the exposed group in the absence of exposure, we can identify predictors of the disease in the unexposed group. • When we adjust the effect estimate for differences in these predictors between exposure groups, we are attempting to remove that portion of confounding produced by these differences. • Thus, a confounder is defined as a variable that, when properly controlled, produces an expected estimate of effect that is closer to the unknown effect parameter in the source population than when it is not controlled–i.e., bias is reduced.

Properties of a confounder • In general, a necessary (but not sufficient) characteristic of a confounder is that it be associated with both exposure status and disease occurrence. • It is difficult to assess this criterion from data, however, because data associations are influenced by: • effects of other variables on the association between the suspected confounder, the exposure, and the disease in the source population; • the manner in which subjects are selected, e.g., via restrictions; • flaws in data collection, subject classification, and data analysis.

Properties of a confounder Consequently, the assessment of confounding for a given effect in a particular study involves: • Prior (external) information of effects in the source population • evaluation of study design and conduct • statistical analysis of relevant associations in the data Study-design issues relevant to the assessment of confounding include • randomization • various selection procedures (such as restriction and matching) • identification of the source population

Properties of a confounder The direction of the bias due to a particular confounder will be • positive if the confounder-exposure (C-E) association and the confounder-disease (C-D) association are in the same direction • negative if the C-E and C-D associations are in opposite directions NOTE: Confounding is defined in terms of the source population. Recall that in a follow-up design (cohort study or experiment, but not case-control study), the source population is the baseline study cohort (and not the person-time at risk). Thus, we at least partially observe all members of the source population in a cohort study, whereas in a case-control study we do not. This difference has important implications for the identification and control of confounders in observational studies.

Direction of Bias: θ is a difference or log ratio effect measure in a source population and E(θ̂) is the expected value of the estimator θ̂ of θ
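The sign rule for the direction of confounding bias (same-direction C-E and C-D associations give positive bias, opposite directions give negative bias) can be written as a tiny helper. This is only a sketch of that rule of thumb, assuming both associations are expressed on a difference or log-ratio scale so that 0 means "no association"; the function name is made up, and the second pair of ratios is hypothetical.

```python
import math


def confounding_direction(ce_association: float, cd_association: float) -> str:
    """Expected direction of bias from the signs of the C-E and C-D associations.

    Both inputs are on a difference or log-ratio scale (0 = no association).
    """
    product = ce_association * cd_association
    if product > 0:
        return "positive"
    if product < 0:
        return "negative"
    return "none expected from this covariate"


# Both associations positive (odds ratios 3 and 2 on the log scale).
print(confounding_direction(math.log(3.0), math.log(2.0)))
# Covariate inversely related to exposure but positively related to disease
# (hypothetical ratios 0.5 and 2.0).
print(confounding_direction(math.log(0.5), math.log(2.0)))
# Covariate unrelated to exposure (ratio 1.0).
print(confounding_direction(math.log(1.0), math.log(2.0)))
```

The rule only gives the expected direction for one covariate considered alone; it says nothing about the size of the bias.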

Confounding Example 1: Oral contraceptive use, SES and Breast cancer • Hypothesis and design: Consider a case-control study designed to estimate the possible effect of oral contraceptive (OC) use on breast cancer. • Potential confounder: Since socioeconomic status (SES) is a known risk factor for the disease and since it is probably related to OC use, we will control for SES as a confounder, using stratified analysis. • Hypothetical results: Expected number of breast cancer cases (D) and controls (D̄) by OC use and SES.

Confounding Example 1: Oral contraceptive use, SES and Breast cancer Conclusion: Because the crude (marginal or unadjusted) OR (1.89), ignoring SES, is larger than the stratum-specific ORs (1.00), SES appears to positively confound the estimated effect of OC use on breast cancer. Thus, the crude (marginal) OR appears to be confounded by SES, and we would generally infer from the stratum-specific ORs that OC use does not appear to be a risk factor for this disease in this source population. (Note: We should also consider other possible sources of bias and the precision of these estimates by estimating confidence intervals.)

Confounding Example 1: Oral contraceptive use, SES and Breast cancer Comment: SES appears to be a confounder because SES is positively associated with • exposure status (among noncases, who represent the source population): [(50x150)/(50x50)=3 and (40x150)/(50x10)=12] and • disease status (among nonusers): [(50x150)/(75x50)=2 and (30x150)/(75x10)=6], presumably because it affects both. The fact that these two associations are in the same direction makes the bias positive – i.e., the crude OR is larger than the stratum-specific ORs.
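The crude and stratum-specific odds ratios for Example 1 can be recomputed from a reconstructed table. The control counts and the nonuser case counts below are the ones that appear in the cross-products quoted above; the case counts among OC users (25, 50, 120) are back-calculated from the stated stratum-specific OR of 1.00, and the stratum labels are placeholders, so treat the table as a reconstruction rather than the original. The Mantel-Haenszel summary is added as one common adjusted estimator; the slides do not say which summary they had in mind.

```python
# SES stratum -> (cases OC users, cases nonusers, controls OC users, controls nonusers)
# Reconstructed table: user case counts are back-calculated, labels are placeholders.
STRATA = {
    "SES level 1": (25, 75, 50, 150),
    "SES level 2": (50, 50, 50, 50),
    "SES level 3": (120, 30, 40, 10),
}


def odds_ratio(a, b, c, d):
    """Exposure odds ratio: a, b = exposed, unexposed cases; c, d = exposed, unexposed controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over a list of (a, b, c, d) strata."""
    strata = list(strata)
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


stratum_ors = {name: odds_ratio(*cells) for name, cells in STRATA.items()}
crude_cells = [sum(column) for column in zip(*STRATA.values())]  # collapse over SES
crude_or = odds_ratio(*crude_cells)
mh_or = mantel_haenszel_or(STRATA.values())

print("stratum-specific ORs:", stratum_ors)
print(f"crude OR: {crude_or:.2f}")
print(f"Mantel-Haenszel OR: {mh_or:.2f}")
```

Collapsing over SES mixes strata in which both OC use and breast cancer become more common, which is what pushes the crude OR above the stratum-specific value.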

Confounding Example 2: Wood dust, respiratory disease and smoking • Hypothesis and design: Suppose that we conduct a fixed cohort study to estimate the effect of exposure to wood dust on the occurrence of chronic respiratory disease (CRD) in middle-aged, male furniture workers. • Potential confounder: Since cigarette smoking is a known cause of the disease, we will control for smoking as a confounder, using stratified analysis. • Hypothetical results: Expected numbers of subjects at risk (N), new CRD cases (D), and risk (R), by wood-dust exposure and smoking

Confounding Example 2: Wood dust, respiratory disease and smoking Conclusion: The crude (unadjusted) RR (1.29) is less than the stratum-specific estimates (1.65-1.66), thus smoking appears to negatively confound the estimated effect of wood-dust exposure on CRD. Thus, the crude RR is biased for the effect, and we would infer from the stratum-specific RRs that exposed workers in this source population are about 65% more likely to develop the disease than are unexposed workers–assuming no further confounding or other bias is present.

Confounding Example 2: Wood dust, respiratory disease and smoking Comment: Confounding appears to have occurred in this study because smoking is positively associated with CRD risk (among the unexposed) and inversely associated with wood-dust exposure (in the source population). The latter association may be due to the fact that smokers elect or are selected to work in dust-free jobs where they can more easily and safely smoke.

Confounding Example 3: Physical activity, coronary heart disease (CHD), and age and gender • Hypothesis and design: Suppose that we conduct a cohort study to estimate the effect of physical activity level on the occurrence of CHD in a population of adults, aged 50-69. • Potential confounders: Since age and sex are known risk factors for CHD, we will control for these variables as confounders, using stratified analysis. The different strata are formed from the cross-classification of both variables (covariates)–i.e., younger men, older men, younger women, and older women. • Hypothetical results: Expected number of new CHD cases (D) over 10 years, by sex, age, and physical activity level at baseline (active vs. sedentary), in the absence of loss to follow-up:

Confounding Example 3: Physical activity, coronary heart disease (CHD), and age and gender Conclusion: Because the crude RR (0.54) is equal to the stratum-specific estimates, age and sex do not appear to confound the estimated effect of physical activity level on CHD. Thus, the crude RR is not confounded by age and sex (but may be confounded by other factors), and we would infer that the risk in active adults is nearly half the risk in sedentary adults (assuming no other confounding occurred).

Confounding Example 3: Physical activity, coronary heart disease (CHD), and age and gender Comment: Confounding did not appear to occur in this study because activity level was not associated with age and sex (in the source population)–even though both age and sex were predictors of CHD (in the sedentary group). Thus, the two exposure groups appear comparable–at least with respect to age and sex. NOTE: it would be technically incorrect (although rarely an important error) to use person-time and rates instead of persons to do this evaluation – if loss to follow-up occurred, one should estimate the risks using methods for censored data and base the evaluation on those risk ratio estimates.

Confounding Example 4: Social Support, hypertension, and race/ethnicity • Hypothesis and design: Suppose that we conduct a cross-sectional study to estimate the effect of social-support level on the presence of hypertension (elevated BP and/or maintained on antihypertensive medication) in a rural adult population. • Potential confounder: Since race is a known risk factor for hypertension, we will control for race as a confounder, using stratified analysis. • Hypothetical results: Expected number of subjects, by disease status, social-support level, and race.

Confounding Example 4: Social Support, hypertension, and race/ethnicity Conclusion: Although the crude OR (1.47) differs from both stratum-specific ORs (1.12 and 1.85), the latter two ORs also differ from each other. In this situation, we assess possible confounding by comparing the crude (marginal) measure to a summary measure that has been properly adjusted (standardized) for the covariates. Since, in this example, that summary OR (not shown) is almost identical to the crude OR, race does not appear to be a confounder.

Confounding Example 4: Social Support, hypertension, and race/ethnicity Comment: Confounding by race appears to be absent in these data because race was not associated with social-support level among noncases [(270x385)/(690x151) ≈ 1]. It appears, however, that race modifies the effect of social support on hypertension–i.e., the magnitude of the estimated social-support OR is different for whites and blacks (effect measure modification).

Example 5: Confounding vs. Noncollapsibility • To show one problem with the change-in-estimate criterion for identifying confounders, consider the results of this hypothetical fixed cohort study in which the covariate is known to be a risk factor for the disease. The table below shows the number of subjects (N) at baseline, the estimated disease risk and 4 estimated measures of association, by covariate status (C vs. C̄).

Example 5: Confounding vs. Noncollapsibility [Table not reproduced; columns: N, R, RR, RD, IOR, corr.] Conclusion: Although C is a risk factor for D (reflected in the data), it is not associated with exposure status in the total sample (source population). Thus, C is not a confounder– a fact that is properly conveyed by comparing the crude and stratum-specific RD or RR estimates. (Since the RR estimates differ between strata, we must compare the crude (marginal) RR with a properly standardized estimate; they are equal).
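Noncollapsibility of the odds ratio can be reproduced with a few counts. The numbers below are hypothetical and are not the slide's table: the covariate C is a risk factor but has the same distribution in exposed and unexposed subjects, so there is no confounding by C. The standardized RR uses the total cohort as the standard, which is one reasonable choice among several.

```python
# Covariate stratum -> (N exposed, cases exposed, N unexposed, cases unexposed)
# HYPOTHETICAL counts: C is a risk factor but is unrelated to exposure.
STRATA = {
    "C": (100, 80, 100, 60),
    "not C": (100, 40, 100, 20),
}


def measures(n1, d1, n0, d0):
    """Risk difference, risk ratio and odds ratio for one 2x2 layout."""
    r1, r0 = d1 / n1, d0 / n0
    return {
        "RD": r1 - r0,
        "RR": r1 / r0,
        "OR": (d1 * (n0 - d0)) / (d0 * (n1 - d1)),
    }


def standardized_rr(strata):
    """RR standardized to the covariate distribution of the total cohort."""
    strata = list(strata)
    total = sum(n1 + n0 for n1, _, n0, _ in strata)
    risk_exposed = sum((n1 + n0) / total * d1 / n1 for n1, d1, n0, _ in strata)
    risk_unexposed = sum((n1 + n0) / total * d0 / n0 for n1, _, n0, d0 in strata)
    return risk_exposed / risk_unexposed


by_stratum = {name: measures(*cells) for name, cells in STRATA.items()}
crude = measures(*[sum(column) for column in zip(*STRATA.values())])
std_rr = standardized_rr(STRATA.values())

for name, m in by_stratum.items():
    print(name, {k: round(v, 3) for k, v in m.items()})
print("crude", {k: round(v, 3) for k, v in crude.items()})
print("standardized RR", round(std_rr, 3))
```

In this made-up cohort the crude RD and the standardized RR agree with the stratum-based answers, while the crude OR is closer to 1 than the common stratum-specific OR. A change-in-estimate rule applied to the OR would therefore flag C as a confounder when it is not.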
