Statistics for Clinical Trials in Neurotherapeutics Barbara C. Tilley, Ph.D. Medical University of South Carolina
Funding: NIA Resource Center on Minority Aging 5 P30 AG21677 NINDS Parkinson’s Disease Statistical Center U01NS043127 and U01NS43128
Issues in Neurotherapeutics • What is the outcome? • How will this be measured? • One or many measures of outcome? • How will you analyze the data? • (nQuery $700, STPLAN free, etc.)
Sample Size: Putting it all together Continuous (Normal) Distribution • Need all but one: $\alpha$, $\beta$, $\sigma^2$, $\delta$, $N$ • $Z_\alpha = 1.96$ (2-sided, 0.05); $Z_\beta = 1.645$ (always one-sided, 0.05, 95% power) • $\delta$ = difference between means • $\sigma^2$ = pooled variance • $2n = \dfrac{4(Z_\alpha + Z_\beta)^2 \sigma^2}{\delta^2}$
Adjusting for Drop-outs/Drop-ins • With 10% dropout, increasing sample size by 10% is not enough • Use: 1/(1-R)^2, where R is the drop-out rate (Friedman, Furberg, DeMets)
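The inflation factor on this slide is easy to check numerically. The sketch below is illustrative only: the function name `inflate_for_dropout` and the planned sample size of 100 are hypothetical, and R is taken to be the proportion of patients expected to drop out.

```python
import math

def inflate_for_dropout(n_planned: int, dropout_rate: float) -> int:
    """Inflate a planned sample size by 1/(1-R)^2 (hypothetical helper)."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n_planned / (1 - dropout_rate) ** 2)

# With 10% dropout the factor is 1/0.81, about 1.23, so a 10% increase is too small.
print(inflate_for_dropout(100, 0.10))  # 124
```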
Sample Size for Multiple Primary Outcomes • Choose largest sample size for any single outcome. • If multiple aims, use largest sample size for any aim.
Sample Size: Food for Thought • Is detectable difference biologically/clinically meaningful? • Is sample size too small to be believable? WHERE DID YOU GET the estimate???? • Report power (for design), not conditional power for negative study.
Sample Size: Keeping It Small • Study continuous outcome (if variability does not increase) • UPDRS score rather than “above or below cut-point” • Study surrogate outcome where effect is large • Rankin at 3 months rather than stroke mortality • Reduce variability (ANCOVA, training, equipment, choosing model)
Sample Size: Keeping It Small • Difference between two means = 1 • Standard deviation = 2; N = 64/group • Standard deviation = 1; N = 17/group
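A rough check of these numbers, using the normal-approximation formula for two means. The slide does not state the power or alpha used; the sketch assumes two-sided alpha = 0.05 and 80% power, and the function name `n_per_group` is hypothetical. Under those assumptions the approximation gives 63 and 16 per group; the slide's 64 and 17 are one larger, which would be consistent with a t-distribution correction, though that is an inference rather than something the slide says.

```python
import math
from statistics import NormalDist

def n_per_group(delta: float, sigma: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for comparing two means: 2 (Z_alpha + Z_beta)^2 sigma^2 / delta^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # one-sided, for power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

print(n_per_group(delta=1, sigma=2))  # 63
print(n_per_group(delta=1, sigma=1))  # 16
```

Halving the standard deviation cuts the required sample size by a factor of four, which is the point of the slide.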
Analysis • Parametric? • Normal • Binomial • Nonparametric? • Ranked
Sample Size Sample size to detect effect of size observed in NINDS t-PA Stroke Trial Barthel: • Non-parametric N = 507 • Binary N = 335 Rankin: • Non-parametric N = 394 • Binary N = 286
Multiple Comparisons • Different questions, can argue no adjustment (O’Brien, 1983) • Effect on blood pressure • Effect on quality of life • All pair-wise comparisons or multiple measures of same outcome, adjust • Pairwise comparisons of Drugs A, B, C (same outcome)
Multiple Comparisons • Bonferroni (or less conservative Simes, or Hochberg) • α/#tests = 0.05/5 = 0.01 • Sample size, use adjusted α • ANOVA methods – Tukey’s, etc. • Sample size for ANOVA
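To make the difference between Bonferroni and Hochberg concrete, here is a minimal sketch. The function names and the five p-values are hypothetical, chosen only to show a case where Hochberg rejects and Bonferroni does not.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level under Bonferroni."""
    return alpha / n_tests

def hochberg_reject(p_values, alpha=0.05):
    """Hochberg step-up procedure; returns reject flags in input order."""
    m = len(p_values)
    # Walk from the largest p-value down; compare with alpha/1, alpha/2, ...
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    reject = [False] * m
    for step, idx in enumerate(order):
        if p_values[idx] <= alpha / (step + 1):
            # Reject this hypothesis and every one with a smaller p-value.
            for j in order[step:]:
                reject[j] = True
            break
    return reject

p = [0.012, 0.04, 0.03, 0.011, 0.2]          # hypothetical p-values
threshold = bonferroni_alpha(0.05, len(p))    # 0.05/5 = 0.01
print([x <= threshold for x in p])            # [False, False, False, False, False]
print(hochberg_reject(p))                     # [True, False, False, True, False]
```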
Bonferroni for Different Primary Outcomes, Same Construct • All outcomes measure same construct • Stroke recovery • PD progression • May lack power when most measures of efficacy are improved, but no single measure is overwhelmingly so. • Problem exacerbated when outcomes are highly correlated.
Use Global Tests When: • No one outcome sufficient or desirable • Outcome is difficult to measure and combination of correlated outcomes useful
Properties of Global Test • If all outcome measures perfectly correlated, • test statistic, p-value same as for single (univariate) test • power = power of univariate test • Assumes common dose effect • Power increases as correlation among outcomes decreases
O’Brien’s Non-parametric Procedure (Biomet., 1984) • Separately rank each outcome in the two treatment groups combined. • Sum ranks for each subject. • Compare mean ranks in the two treatment groups using • Wilcoxon or t-test • ANOVA if more than two treatments
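The three steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the published implementation: every outcome is assumed to be oriented so that larger is better, the function names are hypothetical, and only the pooled-variance t statistic is returned (a p-value would need a t distribution, for example from a statistics library).

```python
import math
from statistics import mean, variance

def average_ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def obrien_rank_sum(outcomes, groups):
    """outcomes: one tuple per subject (one value per outcome); groups: 0/1 labels."""
    rank_sums = [0.0] * len(outcomes)
    for k in range(len(outcomes[0])):
        # Step 1: rank each outcome over both groups combined.
        for i, r in enumerate(average_ranks([subj[k] for subj in outcomes])):
            rank_sums[i] += r          # Step 2: sum ranks per subject.
    a = [s for s, g in zip(rank_sums, groups) if g == 1]
    b = [s for s, g in zip(rank_sums, groups) if g == 0]
    # Step 3: two-sample t statistic on the rank sums (pooled variance).
    pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / len(a) + 1 / len(b)))
    return rank_sums, t

# Hypothetical data: 4 subjects, 2 outcomes each.
sums, t = obrien_rank_sum([(1, 10), (2, 20), (3, 30), (4, 40)], [0, 0, 1, 1])
print(sums)  # [2.0, 4.0, 6.0, 8.0]
```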
Sample Size for Global Test • Use largest sample size for single outcome
NINDS t-PA Trial Observed Agreement & Correlations for Binary Outcomes
Randomization • Stratification • Age, prior stroke, years with PD, site • Greatest gain if N < 20 • Too many strata, difficult to balance • 3 age x 2 years with PD x gender = 12 • Blocking – balance number in each treatment group • Important if number expected per site is small • Minimization • Can be complicated to implement, cause delays
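As an illustration of blocking, the sketch below generates a permuted-block assignment list. The function name, arm labels, block size of 4, and seed are all hypothetical choices; for a stratified design one such list would be generated separately for each stratum (for example, each site).

```python
import random

def permuted_block_schedule(n_subjects, arms=("A", "B"), block_size=4, seed=None):
    """Assignment list in which every complete block is balanced across arms."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_subjects:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)              # random order within the block
        schedule.extend(block)
    return schedule[:n_subjects]

schedule = permuted_block_schedule(12, seed=2024)
print(schedule)
```

After every fourth patient the two arms are exactly balanced, which is what matters when the number expected per site is small.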
Interim Analyses • Who? • Why? • When? • How?
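Stopping “Guidelines” [Figure: group-sequential stopping boundaries for O’Brien-Fleming, Pocock, and Peto. Vertical axis: standard normal statistic (Zi), from -5.0 to 5.0; horizontal axis: number of looks (1 to 5). Regions: Reject Ho (beyond the upper or lower boundary); Continue / Fail to Reject Ho (between the boundaries).]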
Intent-to-Treat (ITT) Intent-to-treat means analyzing ALL patients as randomized. • Patients lost to follow-up (LTF) • Patients who do not adhere to treatment • Patients who were randomized and did not receive treatment • Patients incorrectly randomized
Imputation • Definition - replacing a value for those lost to follow-up or not adhering. • Imputation may or may not be ITT.
Optimal Approach MAKE IMPUTATION UNNECESSARY!
Optimal Approach Continued • Make follow-up a high priority • Monitor follow-up closely • Build in patient incentives • “gifts” for patients (t-shirts, mugs, etc.) • free parking, meal ticket • Transportation • Follow even those off treatment
Hypertension Detection and Follow-up Program/MRFIT • Outcome was mortality • HDFP 21/10,940 • MRFIT 30/12,866 • Used Death Index, Social Security, detectives
NINDS t-PA Stroke Trial • Four 3-month outcomes • Barthel, NIHSS, GOS, Rankin • NINDS Project Officer pushed for complete ascertainment • Study staff made house calls, searched medical records • 5/612 (<1%) lost to follow-up on at least one of the four outcome measures • Used worst value possible
NET-PD Futility Studies: LTF for 1-year outcome (used worst outcome in assigned group) • FS-1 3/200 • Creatine 2 • Minocycline 0 • Placebo 1 • FS-2 4/213 • GPI 3 • CoQ10 1 • Placebo 0
Handling Missing Values • Why? • How?
Subgroup Analyses (Sub-set) • Pre-specified based on rationale • NINDS t-PA Stroke Trial • Those randomized 0-90 minutes and 91-180 minutes from stroke onset • Post-hoc in the presence of interaction • (Yusuf, 1991)
Subgroup Analyses • The more subgroups examined, the more likely analyses will lead to finding a difference by chance alone. • 10 mutually exclusive subgroups; • 20% chance that in one group the treatment will be better than control and that the converse will be true in another
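A related and simpler calculation shows how quickly chance findings accumulate. It is not the same quantity as the 20% figure on the slide (which concerns a reversal of direction between subgroups); it assumes independent tests with no true effect, each at level 0.05, and the function name is hypothetical.

```python
def prob_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one test is 'significant' by chance) = 1 - (1 - alpha)^k."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 5, 10):
    print(k, round(prob_any_false_positive(k), 3))
# 1 0.05
# 5 0.226
# 10 0.401
```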
Trial of Org10172 for Stroke (TOAST) Trial • N = 379 (M), 238 (F) • N = 372 (M), 239 (F) • Test for interaction p = 0.251
Pooled Analysis: Carotid Endarterectomy (Rothwell, 2004) • NASCET & ECST • N (men) 4175 • N (women) 1718 • Test for interaction p = 0.007 (Cox model)
Pooled Analysis: ECASS, Atlantis, NINDS (Kent, 2005) • N (men) 4175 • N (women) 1718 • Test for interaction p = 0.04 (logistic model)
References • Rubin DB. More powerful randomization-based p-values in double-blind trials with non-compliance. Statistics in Medicine (1998) 17:371-385. • Little R, Yau L. Intent-to-treat analysis for longitudinal studies with drop-outs. Biometrics (1996) 52:1324-1333. • NINDS t-PA Stroke Trial Study Group. Tissue Plasminogen Activator for Acute Stroke. N Engl J Med (1995) 333:1581-1587. • Curb JD, et al. Ascertainment of vital status through the National Death Index and the Social Security Administration. Am J Epidemiol (1985) 121:754-766. • Multiple Risk Factor Intervention Trial Research Group. Multiple risk factor intervention trial: risk factor changes and mortality results. JAMA (1982) 248:1466-77.
Completers • Retain only those patients who remain on treatment • Was used frequently in past in trials in rheumatoid arthritis • Not intent-to-treat • Obvious potential for bias • patients not responding to treatment drop-out
Last Observation Carried Forward • For those missing a final value, use most recent previous observation. • Potential for bias in disease with downward course
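A minimal sketch of last observation carried forward for one patient's visit series. The function name `locf` and the scores are hypothetical; `None` marks a missing visit, and a value missing before any observation stays missing.

```python
def locf(visits):
    """Fill each missing value (None) with the most recent observed value."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled

# Hypothetical declining scores; the patient is lost after the second visit.
print(locf([30, 28, None, None]))  # [30, 28, 28, 28]
```

In a disease with a downward course the carried-forward 28 is probably better than the patient's true final value, which is the source of bias noted on the slide.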
Worst case • Replace missing values with worst outcome • assumes that those who are lost to follow-up were not successfully treated • generally variance is not inflated • could inflate or deflate differences
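Worst-case imputation is a one-line substitution once the worst possible value of the scale is fixed. In this sketch the function name, the scores, and the choice of 0 as the worst value are hypothetical; the worst value depends on the outcome scale in use.

```python
def impute_worst(values, worst):
    """Replace each missing value (None) with the worst possible outcome."""
    return [worst if v is None else v for v in values]

# Hypothetical scores on a scale where 0 is the worst outcome.
print(impute_worst([95, None, 60, None], worst=0))  # [95, 0, 60, 0]
```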
Best Case/Worst Case • Replace missing values in treatment group by worst outcome and missing values in comparison group with best outcome. • Rarely used • Generally overly conservative as both treatment and placebo group drop-out for lack of efficacy.
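The best case/worst case rule differs only in which value is filled in for each arm. The sketch below uses a hypothetical binary outcome (1 = success, 0 = failure) and hypothetical data to show how strongly it penalizes the treatment arm.

```python
def best_worst_case(treatment, control, worst, best):
    """Missing treatment values get the worst outcome; missing control values the best."""
    filled_treatment = [worst if v is None else v for v in treatment]
    filled_control = [best if v is None else v for v in control]
    return filled_treatment, filled_control

# Hypothetical binary outcomes with one missing value per arm.
t, c = best_worst_case([1, 1, None, 0], [0, None, 1, 0], worst=0, best=1)
print(sum(t) / len(t), sum(c) / len(c))  # 0.5 0.5
```

Among observed patients the success rates were 2/3 versus 1/3; after imputation both arms show 0.5, so the apparent treatment difference disappears.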