Case-Control Studies

Case-Control Studies Lecture 7 June 20, 2005 K. Schwartzman MD

Case-Control Studies
Readings
• Fletcher, chapter 10
• Walker, chapter 6 [Case-Control Studies] from Observation and Inference, 1991 [course pack]

Case-Control Studies - Slide 1
Objectives
Students will be able to:
1. Define the term “case-control study”
2. Explain the relationship between case-control and cohort studies
3. Understand the difference between cumulative incidence and incidence density designs

Case-Control Studies - Slide 2
Objectives
4. Calculate parameters which may be validly obtained from case-control studies, namely:
   a. Odds parameters:
      - odds of exposure in cases
      - odds of exposure in controls
      - odds ratio
   b. Risk parameters:
      - approximation of relative risk
      - attributable fraction
   c. Incidence rate parameters:
      - incidence rate ratio
      - attributable fraction among the exposed
      - attributable fraction for the population

Case-Control Studies - Slide 3
Objectives
5. Indicate situations in which case-control studies permit estimation of rate differences between exposure groups
6. Highlight advantages and disadvantages of case-control studies, including key biases
7. List possible sources of controls in case-control studies
8. Identify biases which may result from different types of control selection

Case-Control Studies - Slide 4
Case-Control Studies
Fletcher, p. 213: “Patients who have the disease and a group of otherwise similar people who do not have the disease are selected. The researchers then look backward in time to determine the frequency of exposure in the two groups.”
In other words, a study population is first assembled based on a determination as to whether subjects have or have not developed an outcome of interest. Subjects (or person-time) are then classified as to whether an exposure of interest took place. Data on other variables (e.g. potential confounders) is also obtained.

Case-Control Studies - Slide 5 Walker, 1991: “Case-control studies constitute the major advance in epidemiologic methods of our time” Classic example: Doll & Hill, relationship between lung cancer and cigarette smoking (1950)

Case-Control Studies - Slide 6
Advantages
Useful for study of conditions that are rare and/or characterized by a long latency between exposure(s) and outcomes of interest. May be useful in evaluating the impact of multiple types of exposure.
Disadvantages
May be particularly vulnerable to biases arising from selection of subjects (most often of the control group), and measurement (estimation) of exposure.

Case-Control Studies - Slide 7
In case-control studies, data about exposure status is collected after first determining outcome status. However, subjects may be recruited “prospectively” (concurrently), e.g.:
- Cases: all persons aged 30-50 who are diagnosed with hypertension on the island of Montreal during 2005, recruited within 2 weeks of diagnosis.
- Controls: recruited among persons of the same age who are newly diagnosed with appendicitis in Montreal during the same time period.

Case-Control Studies - Slide 8
Often, outcome status is already available for all subjects (“historical”) at the time of initiation, e.g.:
- During 2005, a researcher identifies all women aged 40-50 who were diagnosed with breast cancer on the island of Montreal in 2004.
- In 2005, she recruits a control group among women of the same age who had negative screening mammograms in Montreal in 2004.

Case-Control Studies - Slide 9 Note that the terms “prospective” and “retrospective” are not very useful with respect to case-control studies, since data about exposure status is always retrospective (by definition).

Case-Control Studies - Slide 10
Cohort and Case-Control Studies
Every case-control study corresponds to an underlying cohort study, which is (ordinarily) hypothetical.
Example (from Doll & Hill, 1950): women diagnosed with lung cancer vs other diseases at 20 London hospitals

                             Smokers   Non-Smokers   Total
Lung cancer (cases)             41          19          60
No lung cancer (controls)       28          32          60
Total                           69          51         120

Crude odds ratio = odds of exposure in cases / odds of exposure in controls = (a/b)/(c/d) = ad/bc = (41 x 32)/(19 x 28) = 2.5
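The crude odds ratio above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `odds_ratio` and the variable `or_doll_hill` are made up for this example, and the cell labels follow the slide's convention (a, b = exposed and unexposed cases; c, d = exposed and unexposed controls).

```python
def odds_ratio(a, b, c, d):
    """Odds of exposure in cases (a/b) divided by odds of exposure in controls (c/d)."""
    return (a / b) / (c / d)

# Doll & Hill (1950), women: 41 and 19 smoking/non-smoking cases,
# 28 and 32 smoking/non-smoking controls.
or_doll_hill = odds_ratio(41, 19, 28, 32)
print(round(or_doll_hill, 2))  # 2.47, which the slide rounds to 2.5
```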

Case-Control Studies - Slide 11 In the corresponding cohort study, women from the same geographic area would be recruited and classified as to smoking status, then followed for the development vs non-development of lung cancer.

Case-Control Studies - Slide 12
Assuming all cases of lung cancer during the period of interest were detected, one possible 2x2 table would be:

                             Smokers   Non-Smokers   Total
Lung cancer                     41          19          60
No lung cancer (controls)      859         981       1,840
Total                          900       1,000       1,900

OR = 2.5

but it could also be:

                             Smokers   Non-Smokers   Total
Lung cancer                     41          19          60
No lung cancer (controls)       70          81         151
Total                          111         100         211

OR = 2.5

Case-Control Studies - Slide 13
• The cases diagnosed and included, and the controls sampled, relate to the exposure experience of an underlying source population.
• In each scenario, the estimated odds of cigarette smoking among cases are 2.5 times those among controls.
• In each scenario, all cases of lung cancer were included. The size of the source population (and hence the number of non-cases) was varied.

Case-Control Studies - Slide 14
Cumulative incidence case-control studies
The goal is to derive an estimate of the relative risk (relative cumulative incidence) of the outcome among exposed vs. unexposed.
Design:
- Cases are ascertained during a defined observation period.
- Controls are persons who did not become cases during the period of observation.
- The underlying cohort is a fixed one (not open or dynamic).

Case-Control Studies - Slide 15
Doll and Hill, 1950
Assume that the source population was as follows: 900 smokers and 1,000 non-smokers, followed for 5 years. Then the 2x2 table would be:

              Smokers   Non-Smokers   Total
Cancer +         41          19          60
Cancer -        859         981       1,840
Total           900       1,000       1,900

Risk of cancer in smokers: 41/900 = 0.046
Risk of cancer in non-smokers: 19/1000 = 0.019
Risk ratio: 0.046/0.019 = 2.4
Odds of smoking in women with cancer: 41/19 = 2.2
Odds of smoking in women without cancer: 859/981 = 0.88
Odds ratio = 2.5

Case-Control Studies - Slide 16
In the corresponding case-control study we take 100% of cases, but sample the controls (60/1840 or 3.3% of all potential controls - those who happened to be admitted to hospital for some other reason). Hence the new table is:

              Smokers             Non-smokers         Total
Cancer +      100% x 41 = 41      100% x 19 = 19         60
Cancer -      3.3% x 859 = 28     3.3% x 981 = 32        60
Total         69                  51                    120

“Risk” of cancer in smokers: 41/69 = 0.59 INVALID
“Risk” of cancer in non-smokers: 19/51 = 0.37 INVALID
The “risk ratio” from this 2x2 table is also invalid
Odds of smoking among cases: 41/19 = 2.2 (as before)
Odds of smoking among controls: 28/32 = 0.88 (as before)
Odds ratio: 2.2/0.88 = 2.5 (as before)
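The effect of sampling controls can be reproduced numerically. The sketch below assumes the hypothetical source cohort from the preceding slide (900 smokers, 1,000 non-smokers) and applies the same sampling fraction to exposed and unexposed non-cases; all variable names are illustrative only. It uses the exact fraction 60/1840 rather than the rounded 3.3%, so the sampled control counts are non-integer expectations.

```python
# Full (hypothetical) cohort: cases and non-cases by smoking status
cases_exp, cases_unexp = 41, 19
noncases_exp, noncases_unexp = 859, 981

# Cohort measures, valid because the denominators are known
cohort_rr = (cases_exp / (cases_exp + noncases_exp)) / (
    cases_unexp / (cases_unexp + noncases_unexp))
cohort_or = (cases_exp / cases_unexp) / (noncases_exp / noncases_unexp)

# Case-control version: keep all cases, sample a fraction f of non-cases,
# with the same f for exposed and unexposed
f = 60 / 1840
ctrl_exp = f * noncases_exp      # about 28
ctrl_unexp = f * noncases_unexp  # about 32

cc_or = (cases_exp / cases_unexp) / (ctrl_exp / ctrl_unexp)
# "Risk ratio" computed naively from the case-control table (invalid)
pseudo_rr = (cases_exp / (cases_exp + ctrl_exp)) / (
    cases_unexp / (cases_unexp + ctrl_unexp))

print(f"cohort RR = {cohort_rr:.2f}, cohort OR = {cohort_or:.2f}")
print(f"case-control OR = {cc_or:.2f}, naive 'RR' = {pseudo_rr:.2f}")
```

The sampling fraction cancels out of the odds ratio, so the case-control odds ratio matches the cohort odds ratio, while the naive "risk ratio" (about 1.6) no longer resembles the cohort risk ratio (about 2.4).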

Case-Control Studies - Slide 17
General Form: Cumulative incidence case-control studies

              exposure +        exposure -
outcome +     a                 b                  | total cases
outcome -     c                 d                  | total controls
              total exposed     total unexposed    | total subjects

Odds of exposure in cases = a/b
Odds of exposure in controls = c/d
Odds ratio = (odds of exposure in cases)/(odds of exposure in controls) = (a/b)/(c/d) = ad/bc

But equivalently:
Odds of disease among exposed = a/c
Odds of disease among unexposed = b/d
Odds ratio = (odds of disease among exposed)/(odds of disease among unexposed) = (a/c)/(b/d) = ad/bc

Case-Control Studies - Slide 18
Risk parameter estimation in cumulative incidence case-control studies:
Recall that relative risk = (risk of disease in exposed)/(risk of disease in unexposed)
From our 2x2 table, this is: [a/(a+c)]/[b/(b+d)] = a(b+d)/[b(a+c)]
If the disease is rare, then a << c and b << d among the source population,
so a+c ~ c and b+d ~ d,
and a(b+d)/[b(a+c)] ~ ad/bc
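A quick numerical check of the rare-disease approximation may help. The first cohort below is the hypothetical Doll and Hill source population used earlier; the second, with ten times as many cases, is invented purely to show what happens when the outcome is common. Function and variable names are illustrative.

```python
def rr_and_or(a, b, c, d):
    """Return (relative risk, odds ratio) for a full-cohort 2x2 table.

    a, b = exposed/unexposed with the outcome; c, d = exposed/unexposed without it.
    """
    rr = (a / (a + c)) / (b / (b + d))
    odds_ratio = (a * d) / (b * c)
    return rr, odds_ratio

# Rare outcome: 60 cases among 1,900 people
rare_rr, rare_or = rr_and_or(41, 19, 859, 981)

# Common outcome (invented numbers): 600 cases among 1,900 people
common_rr, common_or = rr_and_or(410, 190, 490, 810)

print(f"rare:   RR = {rare_rr:.2f}, OR = {rare_or:.2f}")
print(f"common: RR = {common_rr:.2f}, OR = {common_or:.2f}")
```

With the rare outcome the two measures nearly coincide (about 2.40 vs 2.46); with the common outcome the odds ratio (about 3.57) overstates the relative risk (about 2.40).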

Case-Control Studies - Slide 19
In a case-control study, it is then possible to estimate the attributable risk (fraction) among the exposed, even if the risk for the population is unknown.
In a cohort study, the attributable risk fraction is:
(Rexp - Runexp)/Rexp = [(Rexp/Runexp) - (Runexp/Runexp)]/(Rexp/Runexp) = (RR - 1)/RR
In a case-control study, this is estimated by (OR - 1)/OR

Case-Control Studies - Slide 20
Hence, from Doll and Hill (1950), the estimated fraction of lung cancer among female smokers which is attributable to smoking is:
(2.5 - 1)/2.5 = 0.6 or 60%
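As a small sketch, the attributable fraction among the exposed can be written as a one-line function of the odds ratio. The function name `attributable_fraction_exposed` is hypothetical, and the calculation assumes the odds ratio is an acceptable stand-in for the relative risk (the rare-disease condition described earlier).

```python
def attributable_fraction_exposed(odds_ratio):
    """(OR - 1)/OR, the case-control estimate of (RR - 1)/RR."""
    return (odds_ratio - 1) / odds_ratio

af_rounded = attributable_fraction_exposed(2.5)                    # slide's rounded OR
af_exact = attributable_fraction_exposed((41 * 32) / (19 * 28))   # unrounded OR

print(round(af_rounded, 2), round(af_exact, 2))  # 0.6 0.59
```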

Case-Control Studies - Slide 21 Incidence Density Case-Control Studies The incidence density case-control study involves the implicit comparison of the person-time experience of cases and controls with respect to the exposure(s) of interest. The absolute quantity of person-time sampled - and hence the sampling fraction - is unknown. This is analogous to the situation with respect to persons in a cumulative incidence case-control study.

Case-Control Studies - Slide 22
Hence the underlying (hypothetical) cohort is an open or dynamic one. Persons considered controls at one point in time may then become cases; they can then appear twice in the 2x2 table.
For this cohort, the general form of the 2x2 table is:

               exposure +    exposure -
outcome +      a             b
person-time    Pe            Po

where Pe = person-time among exposed and Po = person-time among unexposed
IRe = a/Pe and IRo = b/Po
IRR = (a/Pe)/(b/Po) = aPo/bPe

Case-Control Studies - Slide 23
Suppose that all cases are counted, but the controls are sampled with respect to person-time, with sampling fraction “f”, generating the incidence density case-control study. Then the 2x2 table is:

               exposure +    exposure -
outcome +      a             b
outcome -      c = fPe       d = fPo

Then OR = ad/bc = afPo/bfPe = aPo/bPe
which is equivalent to the IRR above.

Case-Control Studies - Slide 24
Note that this formulation does not involve any assumptions about disease rarity.
It requires that each individual's likelihood of being sampled from the source “population” of person-time be proportional to the person-time that individual potentially “contributes”.
For example: a potential control subject who was absent from the geographic area of interest during most of the accrual period should have less chance of being selected than a potential subject who was present throughout.
As with the cumulative incidence design, validity hinges on the assumption that f (the sampling fraction) does not vary with exposure status.

Case-Control Studies - Slide 25
An example of an incidence density case-control study:
• A researcher wishes to evaluate the association between the use of nonsteroidal anti-inflammatory drugs (NSAIDs) and ventricular tachycardia (VT)
• In an open cohort study lasting 2 years, subjects are recruited and classified as to exposure status (NSAID use), then followed for development of VT
• In principle, it is possible to document periods of exposure and non-exposure for individuals, e.g. months on/off medication, as long as exposure is somehow reassessed

Case-Control Studies - Slide 26
Then for the cohort, incidence rates and an incidence rate ratio can be calculated for the exposed vs unexposed person-time experience, e.g.

                 NSAID       No NSAID     Total
VT, cases           80           40         120
Person-years       800         1200        2000
Incidence       0.1/p-y    0.033/p-y    0.06/p-y

The estimated incidence rate ratio is: (80/800)/(40/1200) = 3
So, assuming no confounding, we estimate that the incidence of ventricular tachycardia among NSAID users is 3 times that among non-users.

Case-Control Studies - Slide 27
Suppose we instead devise a case-control study. Here, cases will be defined by a first diagnosis of VT at Montreal hospitals, and controls will be recruited among persons who visit the eye clinics of the same hospitals, both over a 2-year accrual period. They will be compared with respect to use of NSAIDs within the 24 hours prior to presentation.
• If sampling is done correctly (i.e. the probability of selection is unrelated to NSAID use), then the controls should represent the person-time experience of the source population

Case-Control Studies - Slide 28
• If a possible control spent half the accrual period on NSAIDs, and half off, he has a 50% chance of contributing to the “exposed” group and a 50% chance of contributing to the “unexposed” group
• This individual will contribute to one or the other, depending on the date of the visit chosen as control; but in a larger group of people, the control days sampled will reflect the proportion of exposed person-time
• A person can be a control early in the accrual period and a case later
• In principle, a single person can also be sampled repeatedly as a control if the time window for exposure definition is short (more complicated in terms of analysis)

Case-Control Studies - Slide 29
Suppose that the case-control study includes all cases which would have been detected with the open cohort design. Two controls are recruited per case. This (unbeknownst to the researchers) corresponds to a sampling fraction for controls of 0.12 person-day sampled per person-year of follow-up that would have occurred in the open cohort.
Then the 2x2 table is:

                    NSAID             No NSAID           Total
VT, cases              80                 40               120
No VT (controls)    800 x 0.12 = 96   1200 x 0.12 = 144   2000 x 0.12 = 240
Total                 176                184               360

OR = (80 x 144)/(40 x 96) = 3.0, the same as the earlier IRR
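The equality between the case-control odds ratio and the cohort incidence rate ratio can be verified for the NSAID example, and for any other sampling fraction, as long as the fraction is the same for exposed and unexposed person-time. The helper name `density_sampled_or` and the alternative fractions tried below are assumptions made for illustration.

```python
# Open-cohort data from the NSAID example
cases_exp, cases_unexp = 80, 40
py_exp, py_unexp = 800, 1200  # person-years, exposed and unexposed

irr = (cases_exp / py_exp) / (cases_unexp / py_unexp)

def density_sampled_or(f):
    """OR when controls are sampled as a fraction f of person-time,
    with the same f for exposed and unexposed person-time."""
    c = f * py_exp    # exposed controls
    d = f * py_unexp  # unexposed controls
    return (cases_exp * d) / (cases_unexp * c)

print(f"IRR = {irr:.1f}")
for f in (0.12, 0.05, 0.5):  # 0.12 is the slide's value; the others are arbitrary
    print(f"f = {f}: OR = {density_sampled_or(f):.1f}")
```

Every line prints 3.0, because f cancels from the odds ratio; no rare-disease assumption is needed.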

Case-Control Studies - Slide 30
Another example of an incidence density design:
• Bronchodilators are used for the treatment of asthma
• There is concern that overuse may be associated with an increased risk of adverse events, including death
• Side effects can include arrhythmias, which may lead to sudden death
• Suissa et al conducted a case-control study using the Saskatchewan health insurance database
• They identified 30 persons prescribed anti-asthma medications who died of cardiovascular events, rather than of asthma; the date of death was termed the index date

Case-Control Studies - Slide 31
• 4080 control days were then sampled randomly from the 574,103 person-months of follow-up for the entire asthmatic group; each such day was also an index date
• Cases and controls were then compared as to use of theophylline and beta-agonists during the 3 months preceding the index date
• These were the main exposures of concern

Case-Control Studies - Slide 32 Questions for discussion: • Why do you think the researchers chose this study design? • What would have been the corresponding cohort study?

Case-Control Studies - Slide 33
With respect to the relationship between theophylline use and sudden cardiac death, the authors found the following:

                       Theophylline in last 3 months
                       Yes       No     | Total
Cardiac death   Yes     17       13     |    30
                No     956     3124     |  4080

Note that numbers in the table refer to person-days (not to persons)
OR (crude) = ad/bc = (17 x 3124)/(13 x 956) = 4.3 (2.1 - 8.8)
IRR (crude) = 4.3 (2.1 - 8.8)
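The slide does not say how the interval (2.1 - 8.8) was obtained. As an assumption, the sketch below uses the common log-based (Woolf) approximation, with standard error sqrt(1/a + 1/b + 1/c + 1/d) on the log odds ratio and a 95% level; for these data it happens to give the same rounded limits. The function name is hypothetical.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Crude OR with a Woolf (log-based) confidence interval.

    a, b = exposed/unexposed cases; c, d = exposed/unexposed controls.
    z = 1.96 corresponds to an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

or_theo, ci_low, ci_high = odds_ratio_with_ci(17, 13, 956, 3124)
print(f"OR = {or_theo:.1f} ({ci_low:.1f} - {ci_high:.1f})")  # OR = 4.3 (2.1 - 8.8)
```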

Case-Control Studies - Slide 34 The odds of recent theophylline use among persons aged 5-54 years prescribed anti-asthma drugs who died of cardiovascular events were 4.3 times those among other persons in the same age range who were also prescribed anti-asthma drugs, but did not die. “Asthmatics” aged 5-54 who are prescribed theophylline have an estimated 4.3 fold increase in incidence of fatal cardiovascular events, compared with “asthmatics” who are not prescribed theophylline.

Case-Control Studies - Slide 35
As with the cumulative incidence design, an attributable rate fraction can be estimated for exposed persons. It is:
(Ie - Io)/Ie = (IRR - 1)/IRR = (OR - 1)/OR
where Ie = incidence among the exposed and Io = incidence among the unexposed
For the Saskatchewan study, the estimated attributable rate fraction among “asthmatics” who were prescribed theophylline is:
(4.3 - 1)/4.3 = 0.77
Among “asthmatics” aged 5-54 prescribed theophylline, an estimated 77% of fatal cardiovascular events were related to its prescription.

Case-Control Studies - Slide 36
It is also possible to estimate the attributable rate fraction for the entire population (PAR%)
In a cohort study, this is simply (It - Io)/It
where It = incidence among the total population and Io = incidence among the unexposed
For the corresponding incidence density case-control study, the population attributable rate fraction is
[(IRR - 1)/IRR] x proportion of cases who were exposed,
estimated as [(OR - 1)/OR] x [a/(a+b)]
Similar parameters involving risk can be generated for the cumulative incidence design

Case-Control Studies - Slide 37
For the Saskatchewan study, recall the 2 x 2 table

                       Theophylline in last 3 months
                       Yes       No     | Total
Cardiac death   Yes     17       13     |    30
                No     956     3124     |  4080

OR = 4.3
Pexp|case = 17/30 = 0.57
then PAR fraction = [(OR - 1)/OR] x Pexp|case = [(4.3 - 1)/4.3] x 0.57 = 0.44
Among Saskatchewan “asthmatics” aged 5-54, an estimated 44% of cardiovascular deaths relate to theophylline prescriptions.
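The population attributable fraction calculation can be sketched directly from the four cells of the table. The function name `population_attributable_fraction` is made up for this example.

```python
def population_attributable_fraction(a, b, c, d):
    """[(OR - 1)/OR] x a/(a + b), where a/(a + b) is the proportion of cases exposed."""
    odds_ratio = (a * d) / (b * c)
    prop_cases_exposed = a / (a + b)
    return (odds_ratio - 1) / odds_ratio * prop_cases_exposed

par_theo = population_attributable_fraction(17, 13, 956, 3124)
print(round(par_theo, 3))  # 0.434
```

Carrying unrounded values through gives about 0.43; the slide's 0.44 results from multiplying the already rounded intermediates 0.77 and 0.57.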

Case-Control Studies - Slide 38 Attributable rates (rate difference) The absolute rate difference (i.e., the absolute rate of disease attributable to exposure) is Ie - Io Data from a standard case-control study alone cannot validly be used to estimate absolute rates of disease. Even if case ascertainment is complete, the controls represent an unknown and arbitrary fraction of the true person-time at risk. Hence the rate difference cannot be estimated.

Case-Control Studies - Slide 39
However, incidence rates can be estimated if there is additional knowledge about the amount of person-time at risk

               Exposure (+)       Exposure (-)
Disease (+)    a                  b
Disease (-)    c = f x π x Pt     d = f x (1-π) x Pt

where π = proportion of person-time which is exposed, f = sampling fraction, and Pt = total person-time at risk
Then Ie = a/(π x Pt) = a/([c/(c+d)] x Pt)
and Io = b/[(1-π) x Pt] = b/([d/(c+d)] x Pt)
and the rate difference is Ie - Io

Case-Control Studies - Slide 40
Example: In this nested case-control study, the researchers knew that in the source cohort (Saskatchewan “asthmatics” aged 5-54), there were 47,842 person-years at risk during the study period
The 2x2 table was:

                       Theophylline in last 3 months
                       Yes       No     | Total
Cardiac death   Yes     17       13     |    30
                No     956     3124     |  4080

Case-Control Studies - Slide 41
Then the estimated incidence of cardiac death in “asthmatics” prescribed theophylline (Ie) is:
a/([c/(c+d)] x Pt) = 17/[(956/4080) x 47,842] = 0.0015 per person-year
And in “asthmatics” who were not prescribed theophylline the estimated incidence (Io) is:
b/([d/(c+d)] x Pt) = 13/[(3124/4080) x 47,842] = 0.00035 per person-year
The estimated rate difference is therefore 0.0015 - 0.00035 = 0.00115 per person-year.
Note that the IRR computed as Ie/Io remains 4.3
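These absolute rates can be reproduced by apportioning the known person-time according to the exposure distribution among the sampled control days. This is a sketch with illustrative variable names; it relies on the same assumption as the slide, namely that the controls reflect the exposed share of person-time in the source cohort.

```python
a, b = 17, 13        # exposed / unexposed cases
c, d = 956, 3124     # exposed / unexposed control days
total_py = 47_842    # person-years at risk in the source cohort

prop_exposed = c / (c + d)               # estimated exposed share of person-time
py_exposed = prop_exposed * total_py
py_unexposed = (1 - prop_exposed) * total_py

rate_exposed = a / py_exposed            # Ie
rate_unexposed = b / py_unexposed        # Io
rate_difference = rate_exposed - rate_unexposed

print(f"Ie = {rate_exposed:.5f}, Io = {rate_unexposed:.5f} per person-year")
print(f"RD = {rate_difference:.5f} per person-year")
print(f"Ie/Io = {rate_exposed / rate_unexposed:.1f}")
```

Unrounded, the rate difference is about 0.00116 per person-year; the slide's 0.00115 comes from subtracting the rounded rates. The ratio Ie/Io equals ad/bc exactly, since the total person-time cancels.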

Case-Control Studies - Slide 42
Ie and Io may also be estimated if It is known for the source population
Recall that It = (Ie x π) + [Io x (1-π)], where π = proportion of person-time which is exposed, estimated as c/(c+d)
But Ie = Io x OR
Then It = Io x [(OR x π) + (1-π)]
So Io = It/[(OR x π) + (1-π)] = It/({OR x [c/(c+d)]} + [d/(c+d)])
Then use Ie = Io x OR
Then RD = Ie - Io as usual [= Io x (OR - 1)]

Case-Control Studies - Slide 43
Example: The total incidence (It) of cardiovascular death in the Saskatchewan cohort was 30 deaths/47,842 person-years = 0.00063 per person-year.
Then Io = 0.00063/{[4.3 x (956/4080)] + (3124/4080)} = 0.00035
and Ie = 0.00035 x 4.3 = 0.0015
RD = 0.0015 - 0.00035 = 0.00115
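The same result can be reached from the overall incidence alone. The sketch below, with hypothetical variable names, solves for Io from It and the odds ratio and then checks that it agrees with the person-time calculation; the two routes are algebraically identical when unrounded values are used.

```python
a, b = 17, 13        # exposed / unexposed cases
c, d = 956, 3124     # exposed / unexposed control days
total_py = 47_842    # person-years at risk in the source cohort

overall_rate = (a + b) / total_py        # It
odds_ratio = (a * d) / (b * c)
prop_exposed = c / (c + d)               # exposed share of person-time

# Io = It / [OR x pi + (1 - pi)], then Ie = Io x OR
rate_unexp = overall_rate / (odds_ratio * prop_exposed + (1 - prop_exposed))
rate_exp = rate_unexp * odds_ratio
rate_diff = rate_exp - rate_unexp

# Cross-check against the direct person-time estimate of Io
direct_rate_unexp = b / ((1 - prop_exposed) * total_py)

print(f"It = {overall_rate:.5f}, Io = {rate_unexp:.5f}, Ie = {rate_exp:.5f}")
print(f"RD = {rate_diff:.5f} per person-year")
```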

Case-Control Studies - Slide 44
Additional points
• Corresponding estimates of attributable risks and risk differences can be made for cumulative incidence case-control studies, if the corresponding additional data is available
• Estimates of absolute risks/incidence rates and risk/rate differences can be made only if the total amount of persons/person-time at risk is known, or at least one absolute risk/incidence rate is known (i.e. for the total population, the exposed, or the unexposed)
• Nested case-control studies are a special type of study where cases and controls are explicitly drawn from a defined larger cohort (as in the Saskatchewan asthma study)

Case-Control Studies - Slide 45
Case-Control Studies: Strengths and Limitations
Advantages of case-control studies:
• Efficiency - much less expensive/intensive than cohort studies.
• Very useful for outcomes that are rare or occur after a long latency period.
• Most outcomes are relatively rare over short-term follow-up.
• Permit evaluation of multiple exposures.
• Can rapidly “accrue” person-time experience.
• Avoid losses to follow-up inherent in cohort studies.

Case-Control Studies - Slide 46
Disadvantages
• Not useful/efficient for very rare exposures (may not be present in either cases or controls).
• Cannot directly compute incidence rates.
• Cannot usually evaluate more than one outcome.
• Temporality may be lost or distorted.
• Potential for considerable bias, i.e. loss of validity. Bias relates to:
  - Measurement of exposure status
  - Selection of subjects (usually controls)

Case-Control Studies - Slide 47 With respect to measurement, exposure ascertainment must be consistent for cases and controls. There may be potential for misclassification of exposure in relation to disease status

Case-Control Studies - Slide 48 Example 1 Differential recall of exposures among cases vs controls e.g. medication use and congenital malformations - particularly if mothers “attuned” to study hypothesis. If cases more likely to recall exposure, results will be biased toward a positive association between exposure and outcome. The more objective the source of exposure data, the better.
