PRACTICAL STATISTICAL REASONING IN CLINICAL TRIALS FOR NON-STATISTICIANS

Presented on November 14, 2012 by:
• Paul Wakim, PhD
• Abigail G. Matthews, PhD
Clinical Trials Network
National Institute on Drug Abuse ─ National Institutes of Health ─ U.S. Department of Health and Human Services

Presenters
• Abigail G. Matthews, PhD, Biostatistician, NIDA CTN Data and Statistics Center, EMMES Corporation
• Paul Wakim, PhD, Senior Mathematical Statistician, NIDA CCTN

Outline:
• Introduction
• Trial Design
• Q&A
• Analysis Plan
• Trial Monitoring and Interim Analyses
• Q&A
• Primary Analysis
• Subgroup Analyses
• Q&A

Goals
• Improve communication between researchers and biostatisticians
• Importance of collaboration
• Role of the biostatistician in clinical trials research
• Basic statistical concepts
• Discussion with participants from all backgrounds
NO technical information, and NO formulas

Lack of Communication

Why is Communication So Important?
• Biostatisticians cannot:
  • Propose research questions
  • Be subject-matter experts
  • Design a study without clinical input
  • Design statistical analyses without clinical input
  • Interpret results and place in clinical context
• Investigators cannot:
  • Be knowledgeable about all statistical issues involved in sample size estimation and development of analysis plans
  • Implement the often complex statistical analyses involved in clinical trials
  • Interpret statistical analyses
» Without communication, neither can do their jobs

Role of a Biostatistician
• Work with investigators on trial design
  • Ensure design will yield results that answer research question of interest
  • Aid in defining primary outcome
  • Conduct sample size calculations
  • Write appropriate sections of protocol
• Develop analysis plan
  • Identify interim analyses and procedures for trial monitoring
  • Design primary analysis
  • Specify methods for subset analyses, sensitivity analyses and other exploratory analyses

Role of a Biostatistician (cont’d)
• Implement trial monitoring and interim analyses
  • Develop monitoring reports for investigators, site staff, and sponsor, for example:
    • Recruitment rates
    • Demographics
    • Availability of primary outcome
  • Prepare and present DSMB reports for open and closed sessions
  • Conduct interim analyses such as efficacy, futility and sample size re-estimation
  • Aid in preparation of IND Annual Reports

Role of a Biostatistician (cont’d)
• Implement analysis plan
  • Aid in creation of the final/clinical study report
    • Tables
    • Figures
    • Interpretation
  • Perform any additional analyses for manuscripts
  • Contribute to IND reports as necessary
• Develop novel statistical methodologies to analyze clinical trial data more appropriately (if necessary)

TRIAL DESIGN

Trial Design
• Basic designs
• Primary outcome measure (a.k.a. primary endpoint)
• Sample size and power analysis

Basic Design: Superiority
Clinical hypothesis: Experimental treatment is more effective than the control treatment
Statistical hypotheses:
• Null hypothesis H0: Experimental – Control = 0
• Alternative hypothesis H1: Experimental – Control ≠ 0
We expect (hope) to reject H0 in favor of H1

Basic Design: Superiority
[Figure: 95% confidence intervals around the difference Experimental – Control, plotted against the reference line Diff. = 0. High numbers (on the right) represent good outcome. Interval labels in the figure: Superior, Inconclusive, Inconclusive, Inferior. Based on Piaggio 2006.]

Basic Design: Non-Inferiority
Clinical hypothesis: Experimental treatment is not less effective than the control treatment
Statistical hypotheses:
• Null hypothesis H0: Experimental – Control < – M
• Alternative hypothesis H1: Experimental – Control ≥ – M
We expect (hope) to reject H0 in favor of H1

Basic Design: Non-Inferiority
[Figure: 95% confidence intervals around the difference Experimental – Control, plotted against the reference lines Diff. = -M and Diff. = 0. High numbers (on the right) represent good outcome. Interval labels in the figure: Inconclusive, Non-inferior(?), Inconclusive(?), Superior, Non-inferior, Inferior, Non-inferior. Based on Piaggio 2006.]

Basic Design: Equivalence
Clinical hypothesis: Experimental treatment is as effective as the control treatment
Statistical hypotheses:
• Null hypothesis H0: Experimental – Control < – M or Experimental – Control > + M
• Alternative hypothesis H1: – M ≤ Experimental – Control ≤ + M
We expect (hope) to reject H0 in favor of H1

Basic Design: Equivalence
[Figure: 95% confidence intervals around the difference Experimental – Control, plotted against the reference lines Diff. = -M, Diff. = 0 and Diff. = +M. High numbers (on the right) represent good outcome. Interval labels in the figure: Inferior, Inconclusive, Inconclusive(?), Equivalent(?), Equivalent(?), Superior, Equivalent, Inconclusive(?). Based on Piaggio 2006.]
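Illustration (not part of the original slides): the three designs read the same 95% confidence interval against different reference lines. The sketch below assumes the simplest reading of the hypotheses stated above, with higher values meaning a better outcome. The function name and the returned labels are hypothetical, and the borderline cases marked "(?)" in the figures are not resolved here.

```python
def classify_interval(lower, upper, margin):
    """Read a 95% CI for (Experimental - Control) against 0, -M and +M.

    lower, upper: bounds of the confidence interval.
    margin: the positive margin M chosen by the investigators.
    Returns one flag per design; illustrative only.
    """
    if margin <= 0:
        raise ValueError("margin M must be positive")
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return {
        # Superiority: the whole interval lies to the right of 0.
        "superior": lower > 0,
        # Non-inferiority: the whole interval lies to the right of -M.
        "non_inferior": lower > -margin,
        # Equivalence: the whole interval lies between -M and +M.
        "equivalent": lower > -margin and upper < margin,
    }


# Hypothetical interval: difference between -0.5 and 0.5, with M = 1.
print(classify_interval(-0.5, 0.5, margin=1.0))
```

Note that one interval can satisfy more than one flag: an interval entirely above 0 is both superior and non-inferior.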

Primary Outcome Measure (aka primary endpoint)
• Clinically meaningful
• Simple vs. composite

Three “Deadly Sins” in Measuring Clinical Trial Outcomes
• Treating ordinal data as categorical
• Creating dichotomies from continuous data
• Using change from baseline
From Stephen Senn, 2011

Sample Size & Power Analysis
• What the biostatistician needs and why:
  • Number of treatment groups
  • Superiority or non-inferiority or equivalence
  • One-sided or two-sided
  • Expected drop-out rate

Expected Drop-Out Rate (amount of missing primary data)
• Expected drop-out rate ↑ → Sample size ↑
• Expected drop-out rate ↓ → Sample size ↓
Increase the sample size to account for the expected amount of missing data in the primary analysis
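Illustration (not part of the original slides): one common way to account for drop-out is to divide the number of participants needed with complete primary data by the expected retention rate. This is a sketch under that assumption; the function name and the numbers are hypothetical.

```python
import math


def inflate_for_dropout(n_evaluable, dropout_rate):
    """Number to randomize so that about n_evaluable provide primary data.

    dropout_rate is the expected fraction (0 to <1) with missing primary data.
    """
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n_evaluable / (1 - dropout_rate))


# Hypothetical trial: 150 evaluable participants needed, 25% expected drop-out.
print(inflate_for_dropout(150, 0.25))  # 200
```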

Sample Size & Power Analysis
• What the biostatistician needs and why:
  • Number of treatment groups
  • Superiority or non-inferiority or equivalence
  • One-sided or two-sided
  • Expected drop-out rate
  • Smallest meaningful clinical difference to detect

Smallest Meaningful Clinical Difference to Detect
• Difference to detect ↑ → Sample size ↓
• Difference to detect ↓ → Sample size ↑

Sample Size & Power Analysis
• What the biostatistician needs and why:
  • Number of treatment groups
  • Superiority or non-inferiority or equivalence
  • One-sided or two-sided
  • Expected drop-out rate
  • Smallest meaningful clinical difference to detect
  • Alpha, aka chance of Type I error, e.g. 5%

Alpha aka probability of making a Type I error
• Non-technical definition (superiority trial): Chance of concluding that the experimental treatment is (more) effective when in fact it is not
• Technical definition: Probability of rejecting H0 when H0 is true
• Different perspectives: FDA, Pharmaceutical company
• Bottom line: Most commonly used value for α: 0.05 (two-sided)

Alpha aka probability of making a Type I error
• Alpha ↑ → Sample size ↓
• Alpha ↓ → Sample size ↑

Sample Size & Power Analysis
• What the biostatistician needs and why:
  • Number of treatment groups
  • Superiority or non-inferiority or equivalence
  • One-sided or two-sided
  • Expected drop-out rate
  • Smallest meaningful clinical difference to detect
  • Alpha, aka chance of Type I error, e.g. 5%
  • Power to detect an effect, e.g. 80% or 90%

Power to Detect an Effect
• Non-technical definition (superiority trial): Chance of concluding that the experimental treatment is (more) effective when in fact it is
• Technical definition: Probability of rejecting H0 when H0 is false (i.e. when H1 is true)
• Different perspectives: FDA, Pharmaceutical company
• Bottom line: Most commonly used value for power: between 0.80 & 0.90

Power to Detect an Effect
• Power ↑ → Sample size ↑
• Power ↓ → Sample size ↓

Sample Size & Power Analysis
• What the biostatistician needs and why:
  • Number of treatment groups
  • Superiority or non-inferiority or equivalence
  • One-sided or two-sided
  • Expected drop-out rate
  • Smallest meaningful clinical difference to detect
  • Alpha, aka chance of Type I error, e.g. 5%
  • Power to detect an effect, e.g. 80% or 90%
  • Variability of primary outcome measure

Variability of Primary Outcome Measure
• Variability ↑ → Sample size ↑
• Variability ↓ → Sample size ↓
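Illustration (not part of the original slides): how the difference to detect, alpha, power and variability combine can be seen in a textbook normal-approximation formula for a two-group, two-sided superiority comparison of means. This is only a sketch under those assumptions (equal group sizes, known common standard deviation, no drop-out, no clustering); the function name and the numbers are hypothetical, and a real trial calculation may use a different method.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means.

    delta: smallest meaningful clinical difference to detect.
    sd:    standard deviation of the primary outcome (its variability).
    alpha: two-sided Type I error rate.
    power: probability of rejecting H0 when the true difference is delta.
    """
    if delta == 0 or sd <= 0:
        raise ValueError("delta must be non-zero and sd must be positive")
    z = NormalDist().inv_cdf          # standard normal quantile function
    z_alpha = z(1 - alpha / 2)        # about 1.96 for alpha = 0.05
    z_power = z(power)                # about 0.84 for power = 0.80
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Hypothetical outcome with sd = 10 and a difference of 5 to detect.
print(n_per_group(delta=5, sd=10))               # baseline
print(n_per_group(delta=2.5, sd=10))             # smaller difference: larger n
print(n_per_group(delta=5, sd=10, power=0.90))   # more power: larger n
print(n_per_group(delta=5, sd=10, alpha=0.01))   # smaller alpha: larger n
print(n_per_group(delta=5, sd=20))               # more variability: larger n
```

Halving the difference to detect, or doubling the standard deviation, roughly quadruples the sample size in this formula, which is why these two inputs deserve the most discussion with investigators.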

Sample Size & Power Analysis
• What the biostatistician needs and why:
  • Number of treatment groups
  • Superiority or non-inferiority or equivalence
  • One-sided or two-sided
  • Expected drop-out rate
  • Smallest meaningful clinical difference to detect
  • Alpha, aka chance of Type I error, e.g. 5%
  • Power to detect an effect, e.g. 80% or 90%
  • Variability of primary outcome measure
  • Correlation between measurements within the same cluster (aka Intra-Class Correlation or ICC)

[Figure not reproduced. Source: Wikipedia]

Correlation Between Measurements within the Same Cluster (e.g. repeated measures)
• Intra-class correlation ↑ → Sample size ↑
• Intra-class correlation ↓ → Sample size ↓
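Illustration (not part of the original slides): a commonly used way to express the effect of within-cluster correlation is the design effect 1 + (m − 1) × ICC, where m is the number of measurements per cluster. The sketch below assumes equal cluster sizes and an analysis in which correlated measurements carry less independent information; the function names and numbers are hypothetical, and other designs (for example, some repeated-measures analyses) behave differently.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from correlation within clusters of equal size."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0 <= icc <= 1:
        raise ValueError("icc must be between 0 and 1")
    return 1 + (cluster_size - 1) * icc


def inflate_for_clustering(n_independent, cluster_size, icc):
    """Total measurements needed once the design effect is applied."""
    return math.ceil(n_independent * design_effect(cluster_size, icc))


# Hypothetical: 100 independent measurements needed, clusters of 3.
print(inflate_for_clustering(100, cluster_size=3, icc=0.0))  # 100
print(inflate_for_clustering(100, cluster_size=3, icc=0.5))  # 200
```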


One Final Note About Sample Size
• Cost, which has nothing to do with biostatistics, is most often a key factor in the final decision on sample size.

Questions?

ANALYSIS PLAN

Purpose
• Identify primary outcome measure a priori
• Spell out analytic methods a priori
• Remove criticism of data-driven analyses
In CTN:
• Analysis plan must be finalized before data lock
• Developed by DSC, but approved by Lead Node

Key Components of an Analysis Plan
• Population to analyze: Intent-to-Treat (ITT) vs. per-protocol (PP) analysis
• Statistical test or model for primary outcome
• Adjustment for multiple comparisons
• Handling of missing data
• Handling of outliers
• Interim analyses
• Sensitivity analyses
• Secondary and subgroup analyses

1. Population Analyzed
Intent-to-Treat (ITT)
• ALL randomized participants are analyzed
• “Once randomized, analyzed”
• Participants with completely missing data are included
Per-Protocol (PP)
• Analyze a select subset of randomized participants as stated in protocol
• For example:
  • Only participants who had at least 80% of study medication
  • Only participants who attended at least 50% of the expected TAU sessions
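Illustration (not part of the original slides): the difference between the two populations comes down to which randomized participants are kept. The records, field names and functions below are hypothetical; the 80% medication threshold is the example criterion from the slide.

```python
# Hypothetical randomized participants. medication_taken is the fraction of
# study medication taken; None means the data are completely missing.
participants = [
    {"id": 1, "arm": "experimental", "medication_taken": 0.95},
    {"id": 2, "arm": "control", "medication_taken": 0.60},
    {"id": 3, "arm": "experimental", "medication_taken": 0.80},
    {"id": 4, "arm": "control", "medication_taken": None},
]


def itt_population(randomized):
    """Intent-to-treat: once randomized, analyzed (missing data included)."""
    return list(randomized)


def per_protocol_population(randomized, min_medication=0.80):
    """Per-protocol: only participants meeting the protocol-defined criterion."""
    return [
        p for p in randomized
        if p["medication_taken"] is not None
        and p["medication_taken"] >= min_medication
    ]


print([p["id"] for p in itt_population(participants)])           # [1, 2, 3, 4]
print([p["id"] for p in per_protocol_population(participants)])  # [1, 3]
```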

2. Statistical Test or Model
Test
• What statistical test should be used?
• What time points are of interest?
• Measure of treatment effect
Modeling
• Must have parameter(s) to test primary outcome and hypothesis
• Longitudinal model/repeated measures, single time point or composite score
• Consider inclusion of stratification factors, time by treatment interactions, additional covariates (e.g. level of baseline substance use)
• Potential site effects

3. Adjustment for Multiple Comparisons
Why?
• Need to control the study-wise false positive rate (type I error)
• If 100 tests are performed at α = 0.05 and there are no true effects, about 5 are expected to be significant by chance alone
When?
• More than one primary outcome
• Multiple treatment comparisons (e.g. multiple doses vs. placebo)
• Multiple time points of interest, but not longitudinal model
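Illustration (not part of the original slides): the arithmetic behind the "100 tests" example. The second function additionally assumes that the tests are independent, which is an assumption made here and rarely holds exactly in a real trial; both function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of significant results when no true effect exists."""
    return n_tests * alpha


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


print(expected_false_positives(100))
for k in (1, 2, 5, 10):
    print(k, round(familywise_error_rate(k), 3))
```

With a single test the study-wise rate equals α, but it grows quickly as tests are added, which is the reason for the adjustments described next.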

3. Adjustment for Multiple Comparisons (cont’d)
How?
• Bonferroni
  • Very conservative, but simple
  • Split type I error rate equally between all statistical tests
• Stepwise procedures
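Illustration (not part of the original slides): Bonferroni compares every p-value with α divided by the number of tests. The slide does not name a specific stepwise procedure; Holm's step-down method is used below only as one well-known example. The function names and p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 for each test whose p-value is at most alpha / number of tests."""
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down: test p-values from smallest to largest, stop at first failure."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    rejected = [False] * k
    for rank, i in enumerate(order):
        # The smallest p-value faces alpha/k, the next alpha/(k-1), and so on.
        if p_values[i] <= alpha / (k - rank):
            rejected[i] = True
        else:
            break
    return rejected


p = [0.01, 0.04, 0.02, 0.005]   # hypothetical p-values from four tests
print(bonferroni(p))  # [True, False, False, True]
print(holm(p))        # [True, True, True, True]
```

In this example the stepwise procedure rejects more hypotheses than Bonferroni while still controlling the study-wise error rate, which illustrates why Bonferroni is described as very conservative.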

4. Handling of Missing Data
Based on the first 24 multi-site CTN trials on substance abuse conducted between 2001 and 2010, the percentage of missing data for the primary outcome measure ranged from 2% to 60% (Wakim 2011).
There are many methods of handling missing data with varying levels of complexity, e.g.,
• Simple: imputing missing abstinence data as positive
• Complex: pattern mixture models
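Illustration (not part of the original slides): the simple approach mentioned above treats every missing abstinence result as positive (that is, as substance use). The data and function names below are hypothetical, and the sketch says nothing about whether this imputation is appropriate for a given trial.

```python
def impute_missing_as_positive(results):
    """Replace missing results (None) with 'positive'."""
    return ["positive" if r is None else r for r in results]


def percent_negative(results):
    """Percentage of negative (abstinent) results after imputation."""
    imputed = impute_missing_as_positive(results)
    if not imputed:
        raise ValueError("no results to summarize")
    return 100 * imputed.count("negative") / len(imputed)


# Hypothetical weekly results for one participant; None marks a missed visit.
weekly = ["negative", None, "positive", "negative"]
print(impute_missing_as_positive(weekly))
print(percent_negative(weekly))  # 50.0
```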

Types of Missing Data
• Missing Completely at Random (MCAR)
  • Whether an observation is missing or not is completely random
  • Example: participant does not attend visit due to snow storm
• Missing at Random (MAR)
  • The missingness of unobserved data can be explained by observed data
  • Most common statistical methods will yield valid results under MAR
• Missing Not at Random (MNAR)
  • The missingness of unobserved data cannot be explained by observed data
  • Example: participant does not attend study visit because they were using
  • Standard statistical methods cannot be used

5. Handling of Outliers
An outlier is a value that is so far from the others that it appears to have come from a different population. The presence of outliers can invalidate many statistical analyses. (Motulsky 2010)

6. Interim Analyses
• Specify type of interim analyses to be performed
  • Sample size re-estimation
  • Futility
  • Efficacy
• Specify when analyses will be performed
  • e.g., sample size re-estimation when 50% of participants have completed active treatment
• Specify frequency of these analyses
  • e.g., DSMB meetings every six months
