Subgroup Analysis

Subgroup Analysis “Fun to look at but don’t believe them!” (P Sleight, 2000) Deciding on analysis after looking at the data is “dangerous, useful, and often done.” (IJ Good, 1983)

Most trials report subgroup analyses (median=4 subgroups) Assmann SF, Lancet 2000; 355:1064-1069

Influence of Study Characteristics on Reporting of Subgroups • 44% of 469 randomized trials published in major journals reported subgroup analyses • Subgroup analyses were more likely to be reported in high impact journals, non-surgical trials, and large trials. • There was an interaction between source of funding and reporting of subgroups in trials without significant overall results – a subgroup finding! Sun et al, BMJ, 2011

Conclusion: Sun et al, BMJ 2011 “Industry funded randomised controlled trials, in the absence of statistically significant primary outcomes, are more likely to report subgroup analyses than non-industry funded trials. Industry funded trials less frequently test for interaction than non-industry funded trials. Subgroup analyses from industry funded trials with negative results for the primary outcome should be viewed with caution.”

Aims of Subgroup Analysis • To show consistency of trial findings for major endpoints across important patient subsets • To assess whether there are large differences in the treatment effect among different types of patients and, if so, to identify hypotheses for future research (i.e., to assess the possibility of treatment × subgroup or treatment × covariate interactions). The aim should not be to salvage a trial whose overall results were not as hoped!

Subgroup Analysis by Astrological Birth Sign
ISIS-2: Streptokinase and Aspirin for Acute MI. Percentage change in 5-week vascular mortality (negative values are reductions):
• Gemini or Libra: +9% (NS)
• Other signs: -28% (p < 0.00001)
• Overall: -23% (p < 0.00001)
“Lack of evidence of benefit just in one particular subgroup is not good evidence of lack of benefit.”

Subgrouping Considerations • Most trials are not designed to look at subgroups; sample size and power based on overall treatment effect (power is lower for subgroups than overall comparison). • For subgroup analysis, it is often not clear how to control for type 1 error (the more subgroups examined, the greater the risk of a type 1 error). • Not all subgroups of interest can be pre-specified (we are not that smart). • The subgroup may not be what it appears to be (it may be a marker or label for some other characteristic).
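The type 1 error point can be made concrete. The sketch below assumes k independent subgroup tests, each run at two-sided α = 0.05, with no true treatment-by-subgroup differences; real subgroup tests are usually correlated, so this is only an approximation. The helper name `familywise_error` is hypothetical. With the median of 4 subgroups, the chance of at least one spurious "significant" subgroup is about 0.19; with 12 subgroups it is about 0.46.

```python
def familywise_error(k: int, alpha: float = 0.05) -> float:
    """Probability that at least one of k independent tests is 'significant'
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k


for k in (1, 4, 12, 24):
    print(f"{k:2d} subgroups: P(at least one false positive) = {familywise_error(k):.2f}")
```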

Subgroup Definitions • Proper subgroup – grouping of patients according to baseline characteristics • Improper subgroup – grouping of patients according to characteristics following randomization (i.e., factors potentially affected by treatment) • Interaction – evidence that treatment effects differ by subgroup (quantitative versus qualitative) Yusuf S, et al., JAMA, 266:93-98, 1991.

A Priori and A Posteriori Subgroups • A priori: written in the protocol in advance of the study (hypothesis driven) • A posteriori (post hoc or exploratory): - specified … later - before unblinding - after unblinding • Both have inflated error rates, but the problem is greater for a posteriori subgroups.

INSIGHT START Protocol: Early Treatment for HIV “Subgroup analyses for the primary endpoint and major secondary outcomes will be performed to determine whether the treatment effect (early versus deferred) differs qualitatively across various baseline-defined subgroups. Subgroup analysis will be performed by age, gender, race/ethnicity, geographic region, the presence of risk factors for serious non-AIDS conditions, baseline CD4+ cell count, baseline HIV RNA level, calendar date of enrollment to assess the effect of different treatment patterns that may emerge, and the ART-regimen pre-specified at the time of randomization….An overall test of heterogeneity will provide evidence of whether the magnitude of the treatment difference varies across baseline subgroups.”

Pre- and Post-Stratification and Subgroup Analysis • Pre-stratification variables are often, but not always, subgroups of interest. • The aim of post-stratified analysis is to obtain a “better” estimate of the overall treatment effect. • The aim of subgroup analysis is to determine whether treatment differences are consistent. • Like post-stratification, plans for subgroup analysis should be pre-specified, although there are sometimes surprises.

Subgrouping vs. Stratification
| Grouping | Purpose |
| Pre-stratification | “Insurance” for balance in randomization |
| Post-stratification | Increase the accuracy of estimates of the treatment effect |
| Subgroups | Check the consistency of the treatment effect |

Stratified Design for Comparing Treatments
| Stratum | Treatment A | Treatment B | Stratum overall |
| 1 | m1A | m1B | m1 |
| 2 | m2A | m2B | m2 |
| 3 | m3A | m3B | m3 |
| 4 | m4A | m4B | m4 |
| Sample size | na | nb | |
• Typical situation: m1 ≠ m2 ≠ m3 ≠ m4
• Study is designed/powered based on na and nb
• Goal: miA = miB for all i.

Subgrouping Factors Determined Experimentally
• 2 x 2 factorial: A vs. no A, crossed with B vs. no B, where B is determined by randomization
versus
• A vs. no A within B and no B, where B is a baseline characteristic

NIH Policy on Subgroups “When an NIH-defined Phase III clinical trial is proposed, evidence must be reviewed to show whether or not clinically important sex/gender and race/ethnicity differences in the intervention effect are to be expected.” “Inclusion of the results of sex/gender, race/ethnicity and relevant subpopulations analyses is strongly encouraged in all publication submissions.” http://grants.nih.gov/grants/funding/women_min/guidelines_amended_10_2001.htm

ICH Guidelines on Subgroups • If the size of the study permits, important demographic or baseline value-defined subgroups should be examined. • These analyses are not intended to “salvage” an otherwise unsupportive study. • Subgroup analyses may suggest hypotheses to be examined in other studies • If there is a prior hypothesis about a subgroup, this should be part of the statistical analysis plan.

Issues to Consider • Appropriate significance level? The Bonferroni method may be too conservative, causing a loss of power in a situation where power is already low. • Should subgroup analysis be performed if the overall result is negative? A much harder sell. • Should only a priori subgroups be described? Not necessarily; we are not always smart enough to anticipate them. • How should subgroup analyses be presented? Interaction tests are important. • Should analyses be based on post-randomization measures? No.
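The power concern can be illustrated with a normal-approximation sketch. It assumes the subgroup shares the overall true effect, that the overall comparison was designed for about 80% power (expected z of 2.80), and that a subgroup with a quarter of the patients therefore has twice the standard error and half the expected z. The constant `Z_OVERALL` and the helper name are hypothetical. Under these assumptions power falls from about 0.80 overall to about 0.29 in the subgroup, and to about 0.14 once a Bonferroni correction for 4 subgroups is applied.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_sided(z_expected: float, alpha: float) -> float:
    """Approximate power of a two-sided z-test whose statistic is N(z_expected, 1)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return (1 - STD_NORMAL.cdf(z_crit - z_expected)) + STD_NORMAL.cdf(-z_crit - z_expected)


Z_OVERALL = 2.80  # hypothetical: overall comparison with about 80% power at alpha = 0.05
# A subgroup with a quarter of the patients has twice the SE, so half the expected z.
z_subgroup = Z_OVERALL / 2

print(f"overall, alpha = 0.05:            {power_two_sided(Z_OVERALL, 0.05):.2f}")
print(f"one subgroup, alpha = 0.05:       {power_two_sided(z_subgroup, 0.05):.2f}")
print(f"one subgroup, Bonferroni (k = 4): {power_two_sided(z_subgroup, 0.05 / 4):.2f}")
```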

A Consumer’s (and Producer’s?) Guide to Subgroup Analysis • Document heterogeneity between subgroups • Argue consistency with biologic phenomena • Argue consistency with other data from the trial • Argue consistency with other studies • It is easy to build a story after the fact!

Data from Neonatal Hypocalcemia Trial: All Calcium Levels in mmol/l
| | Breast-fed, Supplement | Breast-fed, Placebo | Bottle-fed, Supplement | Bottle-fed, Placebo |
| Treatment mean | 2.445 | 2.408 | 2.300 | 2.195 |
| No. babies | 64 | 102 | 169 | 285 |
| SE | 0.0365 | 0.0311 | 0.0211 | 0.0189 |
Treatment effect (supplement minus placebo): breast-fed 0.037 (SE 0.0480, P-value 0.44); bottle-fed 0.105 (SE 0.0283, P-value 0.0002).
Reference: Cockburn et al, BMJ, 281:11-14; 1980. See also Pocock, Clinical Trials: A Practical Approach.

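Data from Neonatal Hypocalcemia Trial (cont.): test for interaction comparing the bottle-fed and breast-fed treatment effects (difference 0.105 - 0.037 = 0.068, SE 0.0557): P-value = 0.22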
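The three P-values on these two slides can be reproduced from the summary table with normal-approximation z-tests. In each case the test divides a difference of independent estimates by the square root of the sum of their squared SEs, and the interaction test applies the same step to the two treatment effects. This is a sketch assuming independent groups and approximately normal means; the helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def difference(est_a: float, se_a: float, est_b: float, se_b: float) -> tuple[float, float]:
    """Difference of two independent estimates and its standard error."""
    return est_a - est_b, sqrt(se_a**2 + se_b**2)


def two_sided_p(z: float) -> float:
    return 2 * (1 - NormalDist().cdf(abs(z)))


# (mean calcium in mmol/l, SE) from the trial table
breast_supp, breast_plac = (2.445, 0.0365), (2.408, 0.0311)
bottle_supp, bottle_plac = (2.300, 0.0211), (2.195, 0.0189)

eff_breast, se_breast = difference(*breast_supp, *breast_plac)
eff_bottle, se_bottle = difference(*bottle_supp, *bottle_plac)
# Interaction: does the treatment effect differ between feeding groups?
inter, se_inter = difference(eff_bottle, se_bottle, eff_breast, se_breast)

for label, est, se in [("breast-fed", eff_breast, se_breast),
                       ("bottle-fed", eff_bottle, se_bottle),
                       ("interaction", inter, se_inter)]:
    print(f"{label:11s} effect={est:.3f}  SE={se:.4f}  p={two_sided_p(est / se):.2g}")
```

The bottle-fed effect is highly significant and the breast-fed effect is not, yet the test that actually addresses whether the effects differ gives p of about 0.22.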

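HDFP Study Deaths (SC = stepped care, RC = referred care)
| Race, sex, age | Deaths, SC | Deaths, RC | Percent difference in mortality |
| Black men | 112 | 140 | -18.5 |
| Black women | 70 | 98 | -27.8 |
| White men | 109 | 126 | -14.7 |
| White women | 58 | 55 | +2.1 |
| Age 30-49 | 81 | 82 | -5.7 |
| Age 50-59 | 115 | 159 | -25.3 |
| Age 60-69 | 153 | 178 | -16.4 |
| Overall | 349 | 419 | -16.9 |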

HDFP Subgroups (SC = stepped care, RC = referred care)
| Subgroup | SC dead | SC alive | RC dead | RC alive | Odds ratio Ô_i | Weight W_i |
| Black men (1) | 112 | 952 | 140 | 944 | 0.79 | 55.0 |
| Black women (2) | 70 | 1274 | 98 | 1256 | 0.70 | 38.3 |
| White men (3) | 109 | 1783 | 126 | 1735 | 0.84 | 54.8 |
| White women (4) | 58 | 1026 | 55 | 1101 | 1.13 | 26.8 |

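$$\log \hat{O} = \frac{\sum_i W_i \log \hat{O}_i}{\sum_i W_i} = \frac{55.0 \log 0.79 + 38.3 \log 0.70 + 54.8 \log 0.84 + 26.8 \log 1.13}{174.9}$$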
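The HDFP odds ratios and weights can be recomputed from the 2 x 2 counts. This sketch assumes each W_i is the Woolf inverse-variance weight of log Ô_i, 1/(1/a + 1/b + 1/c + 1/d), which matches the slide values to rounding. It then adds Cochran's Q test of heterogeneity, which is not shown on the slide; the helper names are hypothetical, and the chi-square tail uses the closed form for 3 degrees of freedom only. The unrounded weights sum to about 175.0 (the 174.9 on the slide adds the rounded values), the pooled odds ratio is about 0.83, and Q is roughly 3.7 on 3 df (p about 0.29). So the apparent reversal in white women is not strong evidence of a true interaction.

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist


def log_or_and_weight(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Log odds ratio ad/(bc) and its Woolf (inverse-variance) weight.
    a, b = dead, alive on SC; c, d = dead, alive on RC."""
    return log((a * d) / (b * c)), 1.0 / (1 / a + 1 / b + 1 / c + 1 / d)


def chi2_sf_3df(x: float) -> float:
    """Upper-tail probability of a chi-square distribution with exactly 3 df."""
    r = sqrt(x)
    return 2 * (1 - NormalDist().cdf(r)) + sqrt(2 * x / pi) * exp(-x / 2)


tables = {  # (SC dead, SC alive, RC dead, RC alive)
    "Black men": (112, 952, 140, 944),
    "Black women": (70, 1274, 98, 1256),
    "White men": (109, 1783, 126, 1735),
    "White women": (58, 1026, 55, 1101),
}
per_group = {name: log_or_and_weight(*cells) for name, cells in tables.items()}

total_w = sum(w for _, w in per_group.values())
pooled = sum(w * y for y, w in per_group.values()) / total_w    # pooled log OR
q = sum(w * (y - pooled) ** 2 for y, w in per_group.values())  # Cochran's Q

for name, (y, w) in per_group.items():
    print(f"{name:12s} OR = {exp(y):.2f}  W = {w:.1f}")
print(f"sum of W = {total_w:.1f}, pooled OR = {exp(pooled):.2f}")
print(f"Q = {q:.2f} on {len(tables) - 1} df, p = {chi2_sf_3df(q):.2f}")
```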

Cox Model for Interaction • Treatment × gender interaction: Z1 = 1 if eplerenone, 0 if placebo; Z2 = 1 if male, 0 if female; Z3 = Z1 × Z2
$$h(t; Z) = h_0(t)\exp(\beta_1 Z_1 + \beta_2 Z_2 + \beta_3 Z_3), \qquad H_0: \beta_3 = 0$$
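Writing out the treatment hazard ratio separately in each sex shows why H0: β3 = 0 is the no-interaction hypothesis. This follows directly from the model with Z3 = Z1 × Z2:

$$
\mathrm{HR}_{\text{female}} = \frac{h(t; Z_1{=}1, Z_2{=}0)}{h(t; Z_1{=}0, Z_2{=}0)} = e^{\beta_1},
\qquad
\mathrm{HR}_{\text{male}} = \frac{h(t; Z_1{=}1, Z_2{=}1)}{h(t; Z_1{=}0, Z_2{=}1)} = \frac{e^{\beta_1+\beta_2+\beta_3}}{e^{\beta_2}} = e^{\beta_1+\beta_3}
$$

$$
\frac{\mathrm{HR}_{\text{male}}}{\mathrm{HR}_{\text{female}}} = e^{\beta_3},
$$

so β3 = 0 means the eplerenone hazard ratio is the same in men and women, and the Wald test of β3 is the test for interaction.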

Subgroup Analyses According to Follow-up Time • Heart and estrogen/progestin Replacement Study (HERS) • JAMA 1998; 280: 605-613. • Adenomatous Polyp Prevention on Vioxx (APPROVe) Trial • N Engl J Med 2005; 352:1092-1102 • Lancet 2008; 372:1756-1764.

HERS

APPROVe: It was later determined that a different test for interaction had been pre-specified, and that including events after treatment discontinuation changed the findings.

Barrett-Connor on HERS*. A Fable: Looking for the Pony. A man has two sons, one a hopeless pessimist and the other an unrealistic optimist. Determined to move their thinking to a less extreme position, the man buys a room full of toys for the pessimist and a room full of horse manure for the optimist. When he returns, the pessimist is crying because he has broken all of his toys. In contrast, the optimist is shoveling through his gift and proclaims: “With all that manure, there must be a pony in there somewhere.” *Circulation 2002;105:902-903.

“New Study Reassures Most Users of Hormones. For Newly Menopausal, There’s No Heart Risk; A Reversal of Findings.” “At Issue is something called the P value…” Wall Street Journal, April 4, 2007

Cardiovascular and Global Index Events by Years Since Menopause at Baseline (WHI Study)
Column sizes as printed: Hormone Therapy (n=3608), Hormone Therapy (n=4483), Hormone Therapy (n=3608); Placebo (n=3529) in each category.
| Outcome | <10 y: HT cases | <10 y: Placebo cases | <10 y: HR (95% CI) | 10-19 y: HT cases | 10-19 y: Placebo cases | 10-19 y: HR (95% CI) | ≥20 y: HT cases | ≥20 y: Placebo cases | ≥20 y: HR (95% CI) | P-value for trend† |
| CHD‡ | 39 | 51 | 0.76 (0.50-1.16) | 113 | 103 | 1.10 (0.84-1.45) | 194 | 158 | 1.28 (1.03-1.58) | .02 |
| Stroke | 41 | 23 | 1.77 (1.05-2.98) | 100 | 79 | 1.23 (0.92-1.66) | 142 | 113 | 1.26 (0.98-1.62) | .36 |
| Total mortality | 53 | 67 | 0.76 (0.53-1.09) | 142 | 149 | 0.98 (0.78-1.24) | 267 | 240 | 1.14 (0.96-1.36) | .51 |
| Global index§ | 222 | 203 | 1.05 (0.86-1.27) | 482 | 440 | 1.12 (0.98-1.27) | 675 | 632 | 1.09 (0.98-1.22) | .62 |
† Test for trend (interaction) using years since menopause as a continuous (linear) form of the categorical coded values. Cox regression models stratified according to active vs. placebo and trial, including terms for years since menopause and the interaction between trials and years since menopause.
JAMA 2007;297:1465-1477

CHD Events by Years Since Menopause at Baseline
| Outcome | <10 y: HR (95% CI) | 10-19 y: HR (95% CI) | ≥20 y: HR (95% CI) | P-value for trend† |
| CHD‡ | 0.76 (0.50-1.16) | 1.10 (0.84-1.45) | 1.28 (1.03-1.58) | .02 |
“These analyses, although not definitive, suggest that the health consequences of hormone therapy may vary by distance from menopause…”
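A rough check of the CHD trend can be made from the published hazard ratios alone. The sketch below assumes each 95% CI is symmetric on the log scale, scores the three menopause categories 0, 1 and 2, and fits an inverse-variance weighted regression of log HR on the score (a fixed-effect meta-regression). This is a swapped-in summary-data method, not the Cox model with individual data used in the paper, so it gives p of roughly 0.04 rather than the published .02; the helper names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_hr_and_se(hr: float, lower: float, upper: float) -> tuple[float, float]:
    """Log HR and its SE, assuming the 95% CI is symmetric on the log scale."""
    return log(hr), (log(upper) - log(lower)) / (2 * Z975)


def weighted_trend(scores, estimates, ses):
    """Inverse-variance weighted least-squares slope of estimate on score,
    with its SE (variances treated as known) and a two-sided z-test p-value."""
    w = [1 / s**2 for s in ses]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, scores)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, estimates)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, scores))
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, scores, estimates))
    slope, se = sxy / sxx, 1 / sqrt(sxx)
    return slope, se, 2 * (1 - NormalDist().cdf(abs(slope / se)))


# CHD hazard ratios (95% CI) for <10, 10-19 and >=20 years since menopause
chd = [(0.76, 0.50, 1.16), (1.10, 0.84, 1.45), (1.28, 1.03, 1.58)]
est, ses = zip(*(log_hr_and_se(*row) for row in chd))
slope, se, p = weighted_trend([0, 1, 2], est, ses)
print(f"log-HR slope per category = {slope:.3f} (SE {se:.3f}), p = {p:.3f}")
```

A single trend test uses one degree of freedom for the ordered categories, instead of comparing each category with the others separately.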

AIDS Vaccine Trial (Science, 28 February 2003)
| | Infected | Not infected | Total |
| Vaccine | 191 | 3,139 | 3,330 |
| Placebo | 98 | 1,581 | 1,679 |
| Total | 289 | 4,720 | 5,009 |
Infection rate 5.7% vs. 5.8%; relative risk 95% CI (0.78 to 1.24)

AIDS Vaccine Trial: Subgroup Analysis
| Subgroup | Arm | Infected | Not infected | Infection rate |
| White and Hispanic | Vaccine | 179 | 2,824 | 6.0% |
| White and Hispanic | Placebo | 81 | 1,427 | 5.4% |
| Black, Asian, Other | Vaccine | 12 | 315 | 3.7% |
| Black, Asian, Other | Placebo | 17 | 154 | 9.9% |
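The overall confidence interval corresponds to the relative risk of infection computed on the log scale, and the same arithmetic gives a test for the subgroup interaction. This is a sketch using the usual large-sample approximation; the helper names are hypothetical. The interaction p-value comes out near 0.005. It is nominally "significant", but it is a subgroup finding in a trial with no overall effect, which is exactly the situation the preceding slides warn about.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_rr(inf_vac: int, n_vac: int, inf_plac: int, n_plac: int) -> tuple[float, float]:
    """Log relative risk of infection (vaccine vs. placebo) and its large-sample SE."""
    est = log((inf_vac / n_vac) / (inf_plac / n_plac))
    se = sqrt(1 / inf_vac - 1 / n_vac + 1 / inf_plac - 1 / n_plac)
    return est, se


def ci95(est: float, se: float) -> tuple[float, float]:
    """95% CI for the relative risk, back-transformed from the log scale."""
    return exp(est - Z975 * se), exp(est + Z975 * se)


overall = log_rr(191, 3330, 98, 1679)
white_hisp = log_rr(179, 179 + 2824, 81, 81 + 1427)
other = log_rr(12, 12 + 315, 17, 17 + 154)  # Black, Asian, Other

# Interaction test: do the two subgroup log relative risks differ?
diff = white_hisp[0] - other[0]
se_diff = sqrt(white_hisp[1] ** 2 + other[1] ** 2)
p_inter = 2 * (1 - NormalDist().cdf(abs(diff / se_diff)))

for label, (est, se) in [("overall", overall), ("White/Hispanic", white_hisp),
                         ("Black/Asian/Other", other)]:
    lo, hi = ci95(est, se)
    print(f"{label:18s} RR = {exp(est):.2f} (95% CI {lo:.2f} to {hi:.2f})")
print(f"interaction p = {p_inter:.3f}")
```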

Example: ACTG 155
• Arms (randomization allocation ratio): AZT 2, ddC 2, AZT + ddC 3
• Primary outcome: disease progression (AIDS/death)
• Secondary outcomes: CD4+ cell count change, toxicities
• Sample size: 991
• Number by CD4+ subgroup: CD4 < 50: 269; 50 ≤ CD4 < 150: 336; CD4 ≥ 150: 386

“We found no overall benefits of zalcitabine used alone or with zidovudine. However, a trend analysis suggested a better outcome for combination therapy compared with zidovudine as the pretreatment CD4 cell count increased.” “Our study suggests that combination therapy may be beneficial in patients with higher CD4 cell counts.”

Pooled Analysis of AZT + ddX vs. AZT: Treatment-Naïve Patients
| Baseline CD4+ | No. AIDS/death events | Hazard ratio* |
| < 100 | 382 | 0.66 (0.53-0.82) |
| 100-199 | 319 | 0.63 (0.50-0.81) |
| 200-299 | 186 | 0.62 (0.45-0.84) |
| 300-499 | 90 | 0.63 (0.40-0.98) |
*AZT + ddX vs. AZT

Some Lessons From ACTG 155 Presentation 1. What does “a priori” mean? If it is important, amend the protocol. 2. Confusion about stratification and subgrouping.

Lessons Continued 3. It is easy to develop explanations for possible subgroup effects. 4. By chance some subgroups will be more extreme than others.
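Lesson 4 can be seen in a small simulation. All values are hypothetical: a true log hazard ratio of -0.25, an overall SE of 0.05 (a clearly positive trial), 12 equal-sized subgroups and a fixed random seed. Every subgroup shares the same true effect, yet the estimates scatter widely, and only about 30% of subgroups are expected to reach p < 0.05, which echoes the ISIS-2 birth-sign example. The function name is hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simulate_subgroups(true_effect: float, overall_se: float, k: int, seed: int = 1):
    """Draw k subgroup estimates that all share the same true effect.
    Each subgroup holds 1/k of the patients, so its SE is overall_se * sqrt(k)."""
    rng = random.Random(seed)
    se_sub = overall_se * k ** 0.5
    return [rng.gauss(true_effect, se_sub) for _ in range(k)], se_sub


k, true_effect, overall_se = 12, -0.25, 0.05  # hypothetical log-HR scale values
estimates, se_sub = simulate_subgroups(true_effect, overall_se, k)

z_crit = STD_NORMAL.inv_cdf(0.975)
n_sig = sum(abs(est / se_sub) > z_crit for est in estimates)
print(f"subgroup estimates range from {min(estimates):.2f} to {max(estimates):.2f}")
print(f"{n_sig} of {k} subgroups reach p < 0.05")

# Expected proportion reaching p < 0.05 = power of each subgroup test
shift = abs(true_effect) / se_sub
power = (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)
print(f"expected number reaching p < 0.05: {k * power:.1f}")
```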

Lessons Continued 5. For an ordered/continuous variable, a test for trend is important. With CD4+ categories < 50, 50-149 and 150+: a 4 df test for interaction (3 treatment groups × 3 CD4 categories) or a 2 df test (3 treatment groups and continuous CD4). 6. The “subgroup label” may be a marker for something else.

Guidelines to Follow for Interpreting Subgroup Analysis • Assess magnitude of interaction before focusing on separate subgroups and their tests of significance • Assess consistency with biologic phenomenon realizing that “human imagination is capable of developing a rationale for most findings” (Ware, NEJM, 2003). • Assess consistency with other data from trial • Assess consistency with other studies

Guidelines For Reporting Subgroup Analyses (NEJM 2007;2189-2194) • Abstract: only if based on the primary outcome and pre-specified • Methods: number pre-specified; any of special interest; endpoint; methods used to assess heterogeneity; number performed; potential effect on type 1 error • Results: present tests of heterogeneity; forest plot • Discussion: cautious in interpretation; state limitations; cite supporting or contradictory data

Criteria Used to Assess Credibility of Subgroup Effect (BMJ 2012;344:e1553) • Design • Baseline characteristic? • Stratification factor? • A priori specified? • Fewer than 5 subgroups tested? • Analysis • Test for interaction performed? • If multiple interactions, independent? • Context • Direction correctly pre-specified? • Consistent with evidence from previous studies? • Consistent across outcomes? • Indirect evidence (e.g., biologic rationale) supports finding?

Methods Section of ESPRIT Paper (N Engl J Med 2009; 361: p. 1550) “Data on the primary end point were summarized for pre-specified subgroups defined according to baseline characteristics. A total of 12 subgroup analyses were pre-specified. The heterogeneity of hazard-ratio estimates between subgroups were assessed by including an interaction term between treatment and subgroup in expanded Cox models. The results of subgroup analyses should be interpreted with caution; a significant interaction could be due to chance, because there was no adjustment made to type 1 error for the number of subgroups examined.” Five subgroups were reported: age, gender, race/ethnicity, baseline CD4+ count and baseline HIV RNA level.

Summary • P-values for individual subgroups are misleading – report CIs. • Calculate subgroup by treatment interactions, but be cognizant of low power • Keep in mind most trials are designed assuming no interaction. • Define key subgroups to be investigated in the protocol. • Report subgroup findings very cautiously – ultimately want validation in another study or meta-analysis. “Only one thing is worse than doing subgroup analyses --- believing the results.” Richard Peto
