Topics in Clinical Trials (6) - 2012. J. Jack Lee, Ph.D., Department of Biostatistics, University of Texas M. D. Anderson Cancer Center.
N=1? N=14? N=1000? How many patients are needed in a clinical trial? As small as it takes to get the trial approved? As large as possible until it bankrupts your bank account? It depends on what you want to achieve. Adequate N is needed for proper statistical inference. Inadequate N may lead to inconclusive or wrong results. Unduly large N may not be feasible and can also be unethical. Clinical Trials should have sufficient statistical power to detect clinically meaningful differences between groups. Sample size should be considered early in the planning phase.
Examples
• Pilot study (feasibility): N ≤ 18
• Phase I (toxicity): 20 ≤ N ≤ 40
• Phase II (efficacy): 30 ≤ N ≤ 100
• Phase III (confirmatory): N > 100
• Primary Prevention Trials: N > 10,000
  • e.g. BCPT (Tamoxifen): N = 16,000 (13,388)
  • PHS (aspirin, β-carotene): N = 22,071
Essence of Sample Size Calculation
• Adequate N is needed for proper statistical inference.
  • Central limit theorem, large sample approximation
  • Control false positive (type I) and false negative (type II) errors
• Inadequate N may lead to inconclusive or wrong results.
  • 71 RCTs failed to find significant differences between groups
    • 67 had > 10% risk of missing a 25% treatment improvement
    • 50 had > 10% risk of missing a 50% improvement (false negative results)
  • Spurious findings can occur by chance alone when N is small (false positive results)
• Unduly large N may not be feasible.
  • It can also be unethical. Why?
Fundamental Points
• Clinical trials should be designed with good operating characteristics to yield valid scientific inference.
• Sufficient sample size is needed for estimation and/or hypothesis testing.
• Sample size calculation should be based on the identification of
  • A primary endpoint
  • The objective to be achieved on the primary endpoint
Examples
• In a single-arm Phase II trial
  • Primary endpoint: Response rate
  • Objective: find out whether the new treatment can achieve a target response rate
• In a randomized controlled Phase III trial
  • Primary endpoint: Overall survival
  • Objective: find out whether the new treatment can yield a longer overall survival compared to the standard treatment
Sample Size Calculation Is Only An Estimate
• Parameters used in the calculation are themselves estimates with a level of uncertainty.
  • The estimated treatment effect may be based on a different population.
  • The estimated treatment effect is often overly optimistic when based on highly selected pilot studies.
• Patient eligibility criteria may change, which alters the sampled population.
• Rule of thumb:
  • Be conservative
  • May need a pilot study to refine the estimates
  • Better to design a larger study with early stopping than a smaller study that later tries to expand N or extend follow-up during the trial.
Statistical Concepts
• Estimation
  • 1-sample: estimating treatment effect
  • 2-sample: estimating treatment difference
  • Methods for 1-sample binary endpoint, e.g., response rate
    • Exact: e.g. Clopper-Pearson interval
    • Asymptotic Gaussian approximation
• Hypothesis Testing
Estimation
• Let X = systolic blood pressure (SBP)
• X ~ N(90, 10²)
• With sample size N, mean(X) ~ N(90, 100/N)
For binary response, e.g. 3 out of 10 metastatic breast cancer patients responded to Taxol. What is the estimated response rate p*?
p* ~ N(p, p(1-p)/N) = N(0.3, 0.021)
SE = sqrt(0.021) = 0.145
95% CI for p: p* ± 1.96 SE(p*) = 0.30 ± 0.28 = (0.02, 0.58)

Sample size N needed for a given standard error, by probability of response:

                     Probability of Response
Standard Error    0.1    0.2    0.3    0.4    0.5
0.2                 3      4      6      6      7
0.1                 9     16     21     24     25
0.05               36     64     84     96    100
0.025             144    256    336    384    400
For binary response, e.g. 3 out of 10 metastatic breast cancer patients responded to Taxol. What is the estimated response rate p*?
Point estimate: p* = 0.30, SE(p*) = sqrt(0.3 x 0.7 / 10) = 0.145
95% CI for p: p* ± 1.96 SE(p*) = 0.30 ± 0.28 = (0.02, 0.58)
Suppose 30 out of 100 metastatic breast cancer patients responded to Taxol. What is the estimated p*?
Point estimate: p* = 0.30, SE(p*) = sqrt(0.3 x 0.7 / 100) = 0.046
95% CI for p: p* ± 1.96 SE(p*) = 0.30 ± 0.09 = (0.21, 0.39)
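The two Taxol calculations can be checked with a few lines of Python. This is a minimal sketch of the normal-approximation (Wald) interval only; the function name `wald_ci` is an illustrative choice, not something from the slides, and an exact Clopper-Pearson interval would give different limits for N = 10.

```python
import math
from statistics import NormalDist

def wald_ci(responders, n, level=0.95):
    """Normal-approximation (Wald) CI for a binomial proportion."""
    p_hat = responders / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

# The two examples from the slide: 3/10 and 30/100 responders.
for x, n in [(3, 10), (30, 100)]:
    p_hat, se, (lo, hi) = wald_ci(x, n)
    print(f"{x}/{n}: p* = {p_hat:.2f}, SE = {se:.3f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Both runs have the same point estimate; only the standard error, and hence the interval width, shrinks with N.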
Sample Size Calculation Based on Estimation
• SE = SD / sqrt(N)
• Width of 95% CI = 2 x 1.96 x SE
• Compute N such that the SE or the width of the CI is within a pre-specified precision
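As a sketch of precision-based planning for a binary endpoint, the code below solves sqrt(p(1-p)/N) ≤ target SE for N. The helper names are hypothetical; the rounding step before `ceil` is only there to absorb floating-point noise.

```python
import math

def n_for_se(p, target_se):
    """Smallest N with sqrt(p * (1 - p) / N) <= target_se."""
    raw = p * (1 - p) / target_se ** 2
    return math.ceil(round(raw, 9))  # round first so 4.0000000001 does not become 5

def n_for_ci_width(p, width, z=1.96):
    """Smallest N whose 95% Wald CI has total width <= width (width = 2 * z * SE)."""
    return n_for_se(p, width / (2 * z))

# Rebuild the table of N by standard error and response probability.
for se in (0.2, 0.1, 0.05, 0.025):
    print(se, [n_for_se(p, se) for p in (0.1, 0.2, 0.3, 0.4, 0.5)])
```

Halving the target SE quadruples N, which is why the rows of such a table grow by a factor of four.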
Hypothesis Testing
• Framework of hypothesis testing

                      Truth
Action             H0             H1
Conclude H0        correct        β
Conclude H1        α              correct

α: Type I error (level of significance)
β: Type II error (1 - β = Power)
Sample Size Calculation: Find N such that α and β are under control.
Typically, compute N for a given α to yield (1 - β) x 100% power.
For example, compute N for α = 0.05 to yield 80% power.
P-values
• P-value = probability of obtaining data as extreme as, or more extreme than, the observed result when the null hypothesis is true.
• Smaller p-values ⇒ stronger evidence against H0.
• Nothing sacred about p = 0.05.
  • (p = 0.045 vs. p = 0.055)
• Statistical Significance ≠ Clinical Significance
  • Large samples: small differences may be significant
  • Small samples: large differences may not be significant
• Frequentist inference depends on the sample space, i.e. the design.
Tools for Sample Size Calculation
• STPLAN
  • http://biostatistics.mdanderson.org/SoftwareDownload/
• nQuery
• PASS
• East
• Many web sites
Power Example: Let Y = reduction in SBP in an anti-hypertension trial
Y ~ Normal(μ, σ²); H0: μ ≤ 0 vs. H1: μ > 0
If σ = 20 and α = 0.05, how big should N be to have 80% power for testing H0 vs. H1 if the true μ = 5?
N = 98.9 ≈ 100 (DSTPLAN)
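The N = 98.9 above can be reproduced with the usual normal-theory formula N = (Zα + Zβ)² σ² / δ². The sketch below assumes a one-sided z-test with known σ; the function name is illustrative, and this is not a description of how STPLAN computes it internally.

```python
from statistics import NormalDist

def n_one_sample(delta, sigma, alpha=0.05, power=0.80, two_sided=False):
    """N for a one-sample z-test to detect a mean shift delta with SD sigma."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2

# SBP example: sigma = 20, true mean reduction = 5, one-sided alpha = 0.05, 80% power.
print(round(n_one_sample(delta=5, sigma=20), 1))
```

Switching to `two_sided=True` replaces Zα with Zα/2 and raises the required N.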
[Figure: n = 5. Courtesy of Don Berry]
Selecting Appropriate Statistical Methods for Categorical Data

Goal                                                  Analysis
Describe one group                                    Proportion
Compare one group to a hypothetical value             Chi-square test
Compare two unpaired groups                           Chi-square test*
Compare two paired groups                             McNemar's test
Compare three or more unmatched groups                Chi-square test*
Model the effect of multiple prognostic variables     Logistic regression

*: When sample size is small, use Fisher's exact test
Selecting Appropriate Statistical Methods for Gaussian Data

Goal                                                  Analysis
Describe one group                                    Mean, SD
Compare one group to a hypothetical value             One-sample t-test
Compare two unpaired groups                           Two-sample t-test
Compare paired data                                   Paired t-test
Compare three or more unmatched groups                One-way ANOVA
Selecting Appropriate Statistical Methods for Non-Gaussian Data

Goal                                                  Analysis
Describe one group                                    Median, Percentiles
Compare one group to a hypothetical value             Signed-rank test
Compare two unpaired groups                           Mann-Whitney test (Wilcoxon rank-sum test)
Compare paired data                                   Signed-rank test
Compare three or more unmatched groups                Kruskal-Wallis test
Selecting Appropriate Statistical Methods for Survival Data

Goal                                                  Analysis
Describe one group                                    Kaplan-Meier
Compare two unpaired groups                           Log-rank test
Compare three or more unmatched groups                Cox regression
Model the effect of multiple prognostic factors       Cox regression
Sample Size Based on Hypothesis Testing for Continuous Outcome
• One-sample test
• Two-sample test
Note: Use Zα for one-sided tests; replace it with Zα/2 for two-sided tests.
(δ/σ) is often called the effect size (ES).
Cohen defined ES = 0.2, 0.5, and 0.8 as small, medium, and large, respectively.
Cohen (1988): Statistical Power Analysis for the Behavioral Sciences
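The formulas for the two cases did not survive in this transcript. The standard normal-theory versions are sketched below as an assumption rather than a reproduction of the slide: δ is the difference to detect, σ the common standard deviation, and the two-sample N is per group with equal allocation. The one-sample form agrees with the SBP power example (σ = 20, δ = 5, one-sided α = 0.05, 80% power gives N ≈ 98.9).

```latex
% One-sample test (total N)
N = \frac{(Z_{\alpha} + Z_{\beta})^{2}\,\sigma^{2}}{\delta^{2}}

% Two-sample test (N per group, equal allocation, common \sigma)
N = \frac{2\,(Z_{\alpha} + Z_{\beta})^{2}\,\sigma^{2}}{\delta^{2}}
```

In both cases N depends on δ and σ only through the effect size δ/σ.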
Sample Size Based on Hypothesis Testing for 2 Independent Binary Outcomes
How big should N be for comparing the response rates between doxorubicin (control) and FTI (new intervention) in pancreatic cancer patients?
H0: pC = pI vs. H1: pC < pI
Estimated pC = 0.1, pI = 0.3, 1-sided α = 0.05, 1 - β = 0.80
2N = 98.9, N ≈ 50 (DSTPLAN)
Ex: Two-sample Binomial Probability
• Annual event rate:
  • pC = 0.4, pI = 0.3
• Two-sided α = 0.05
• 90% power
• Sample size
  • Total N = 956
  • Each group = 478
(DSTPLAN)
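Both two-group binomial examples (doxorubicin vs. FTI, and the annual event rates 0.4 vs. 0.3) can be reproduced with a simple pooled-variance formula, 2N = 4 (Zα + Zβ)² p̄(1 - p̄) / (pC - pI)², where p̄ is the average of the two rates. That this is the formula behind the quoted numbers is an assumption; it simply matches them, and other methods (arcsine transform, continuity correction) give somewhat different answers. The function name is illustrative.

```python
from statistics import NormalDist

def total_n_two_proportions(p_c, p_i, alpha, power, two_sided=True):
    """Total sample size (both arms) using the pooled-variance approximation."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    p_bar = (p_c + p_i) / 2  # average rate under equal allocation
    return 4 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / (p_c - p_i) ** 2

# Response rates 0.1 vs. 0.3, one-sided alpha = 0.05, 80% power.
print(round(total_n_two_proportions(0.1, 0.3, 0.05, 0.80, two_sided=False), 1))
# Event rates 0.4 vs. 0.3, two-sided alpha = 0.05, 90% power.
print(round(total_n_two_proportions(0.4, 0.3, 0.05, 0.90), 1))
```

The second example needs roughly ten times as many patients because the difference to detect is half as large and the error rates are stricter.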
Sample Size Calculation for Survival Outcome – Instantaneous Entry – No Censoring
• Assume exponential survival
• To test H0: λC = λI vs. H1: λC ≠ λI
Example: Exponential Survival
• Assume λC = 0.30 and λI = 0.20. What sample size is needed to test the equality of the hazard rates with two-sided α = 0.05 and 1 - β = 0.90?
• 5-yr mortality rates are 0.7769 and 0.6321 for the control and intervention groups, respectively.
• Median survival time = -ln(0.5)/λ = 2.31 and 3.47, respectively
• Plugging into the formula, N = 128 or 2N = 256.
• Using the comparison of two proportions, 2N = 412
• The survival approach is more efficient.
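The numbers in this example can be checked with a short script. The sample-size formula used here, 2N = 4 (Zα/2 + Zβ)² / [ln(λC/λI)]², is the standard one for exponential survival with all events observed; it is assumed rather than taken from the transcript (where the formula is missing), but it reproduces 2N = 256.

```python
import math
from statistics import NormalDist

def total_n_exponential(lam_c, lam_i, alpha=0.05, power=0.90):
    """Total N (both arms), exponential survival, instantaneous entry, no censoring."""
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)  # two-sided test
    return 4 * z_sum ** 2 / math.log(lam_c / lam_i) ** 2

for lam in (0.30, 0.20):
    mortality_5yr = 1 - math.exp(-5 * lam)  # P(death within 5 years)
    median = math.log(2) / lam              # equals -ln(0.5) / lambda
    print(f"lambda = {lam}: 5-yr mortality = {mortality_5yr:.4f}, median = {median:.2f}")

print(math.ceil(total_n_exponential(0.30, 0.20)))
```

With no censoring every patient contributes an event, so the required N equals the required number of events.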
Sample Size Calculation for Survival Outcome – Instantaneous Entry – With Censoring
• All patients are entered at the same time and censored at time T.
• In the previous example, if a 5-yr study is planned, the required sample size is 2N = 376.
Sample Size Calculation for Survival Outcome – Staggered Entry
• Assume participants are recruited uniformly over a period of T0
• The trial continues for T years (T > T0)
• With 3 years of accrual in a 5-yr study, 2N = 466, using a formula similar to the previous one but adjusted for the accrual period
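The censored and staggered-entry sample sizes (376 and 466) can be reproduced with the textbook form 2N = 2 (Zα/2 + Zβ)² [φ(λC) + φ(λI)] / (λI - λC)², where φ(λ) = λ² / P(event observed). The formulas are missing from this transcript, so this form and the symbol φ are assumptions; they are used here because they match the quoted numbers for λC = 0.30, λI = 0.20, two-sided α = 0.05 and 90% power.

```python
import math
from statistics import NormalDist

def phi_censored(lam, T):
    """lambda^2 / P(event by T) when everyone enters at time 0."""
    return lam ** 2 / (1 - math.exp(-lam * T))

def phi_staggered(lam, T0, T):
    """lambda^2 / P(event by T) with uniform entry over (0, T0), study ending at T."""
    p_event = 1 - (math.exp(-lam * (T - T0)) - math.exp(-lam * T)) / (lam * T0)
    return lam ** 2 / p_event

def total_n(phi_c, phi_i, lam_c, lam_i, alpha=0.05, power=0.90):
    """Total N (both arms) for a two-sided test of equal hazards."""
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    return 2 * z_sum ** 2 * (phi_c + phi_i) / (lam_i - lam_c) ** 2

lam_c, lam_i = 0.30, 0.20
n_censored = total_n(phi_censored(lam_c, 5), phi_censored(lam_i, 5), lam_c, lam_i)
n_staggered = total_n(phi_staggered(lam_c, 3, 5), phi_staggered(lam_i, 3, 5), lam_c, lam_i)
print(round(n_censored), round(n_staggered))
```

The less follow-up each patient gets, the smaller the chance of observing an event, so φ and the required N both increase.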
The Key Quantity – Expected # of Events
• The expected # of events is a function of sample size, hazard rate, recruitment rate, and censoring distribution.
• Assume uniform accrual over (0, T0) and f/u over (0, T)
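A minimal sketch of the expected number of events, assuming exponential survival, uniform accrual over (0, T0), and administrative censoring at T as the only censoring mechanism. Averaging the survival probability exp(-λ(T - u)) over a uniform entry time u gives the closed form used below; the function names are illustrative.

```python
import math

def prob_event(lam, T0, T):
    """P(event observed by T) for hazard lam with uniform entry over (0, T0)."""
    return 1 - (math.exp(-lam * (T - T0)) - math.exp(-lam * T)) / (lam * T0)

def expected_events(n_per_arm, lam_c, lam_i, T0, T):
    """Expected total events across two equally sized arms."""
    return n_per_arm * (prob_event(lam_c, T0, T) + prob_event(lam_i, T0, T))

# 233 patients per arm, hazards 0.30 vs. 0.20, 3 years of accrual in a 5-year study.
print(round(expected_events(233, 0.30, 0.20, T0=3, T=5)))
```

As T0 shrinks toward zero the probability approaches 1 - exp(-λT), the instantaneous-entry case.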
Sample Size Based on CI Estimation for 2 Independent Binary Outcomes
How big should N be such that the width of a 100(1 - α)% CI for pI - pC will not exceed W_CI?
Choose pI - pC = W_CI / 2
The same as the hypothesis testing formula with Zβ = 0
Sample Size Based on Hypothesis Testing for Paired Binary Outcomes
How big should N be for comparing the response rates between control and intervention given to each of the two eyes, respectively?
H0: pC = pI vs. H1: pC < pI
Estimated pC = 0.2, pI = 0.4, α = 0.05, 1 - β = 0.90, and the proportion of patients with discordant response f = 0.5
Np = 132
Sample Size for McNemar's Test
H0: pC = pI vs. H1: pC < pI
Estimated pC = 0.2, pI = 0.4, α = 0.05, 1 - β = 0.90, and the proportion of patients with discordant response f = 0.5
Use Connor (1987), N = 104
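The two paired-design numbers can be explored with the sketch below. Both formulas are assumptions, since none survive in this transcript: a simple form Np = (Zα + Zβ)² f / d², and Connor's form N = [Zα sqrt(f) + Zβ sqrt(f - d²)]² / d², with d = pI - pC and f the discordant proportion. Under these assumptions, Np = 132 is obtained when Zα is taken as the two-sided value (1.96), while N = 104 is obtained with the one-sided value (1.645), so the sidedness is left as a parameter.

```python
import math
from statistics import NormalDist

def z_values(alpha, power, two_sided):
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    return z_alpha, z(power)

def n_pairs_simple(d, f, alpha=0.05, power=0.90, two_sided=True):
    """Number of pairs from the simple discordant-proportion formula."""
    z_alpha, z_beta = z_values(alpha, power, two_sided)
    return (z_alpha + z_beta) ** 2 * f / d ** 2

def n_pairs_connor(d, f, alpha=0.05, power=0.90, two_sided=False):
    """Number of pairs from the assumed Connor-type formula for McNemar's test."""
    z_alpha, z_beta = z_values(alpha, power, two_sided)
    return (z_alpha * math.sqrt(f) + z_beta * math.sqrt(f - d ** 2)) ** 2 / d ** 2

d, f = 0.2, 0.5  # d = pI - pC = 0.4 - 0.2; f = proportion of discordant pairs
print(math.ceil(n_pairs_simple(d, f)), math.ceil(n_pairs_connor(d, f)))
```

When comparing the two results, it is worth checking which Zα each software package or reference actually uses.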
Impact of Noncompliance
• "Diluting" the treatment effect
• Increases the required sample size
• Raises questions about the study validity
• Difficult to make proper inference to the population
Sample Size for Other Designs
• Repeated measures
• Equivalence trials
• Historical control trials
• Cluster randomization trials (Reading Assignment)
• Efficient targeted design trials
Premise (binary endpoint)
• In the study population
  • R+: marker-positive patients (likely to respond)
  • R-: marker-negative patients (less likely to respond)
  • Proportion of R-: γ
• Patients are randomized into control and experimental groups
• Response probability
  • For R- patients: pC + δ0
  • For R+ patients: pC + δ1
• Relative efficiency
Gefitinib Trials
• INTACT I & II Trials had 2,130 patients. Results were negative.
• We want to study an EGFR inhibitor in high-risk oral IEN
• Only a fraction (1 - γ = 0.10) of subjects present the target
• Response rates are:
  • Standard treatment (e.g., retinoids): pC = 0.40
  • EGFR inhibitor: with target pC + δ1, without target pC + δ0
• Sample size needed for 90% power at 2-sided 5% significance level
Adherence/Compliance Monitoring
• Pill diary
• Pill count
  • Forget to bring in the bottle
  • Dump the remaining drugs into the toilet
  • Over-subscribe
  • Dispense by weight – not precise
• Special pill dispenser to monitor when the bottle is opened
• Laboratory test of drug level in serum or urine
  • Half-life of the drug
  • Choosing cutoff value to declare (+) or (-)
• Percent compliance
  • % compliance = # of pills taken / # of pills prescribed
• Dose intensity
  • # of pills taken / # of pills that should have been taken per protocol
• Measure the actual amount of drug taken
Main Reasons for Noncompliance
• Toxicity or side effects
• Involving lifestyle/behavior change
• Complex or inconvenient interventions
• Insufficient understanding of the instructions
• Change of mind, refusal
• Lack of family support
Other Adjustments for Sample Size
• Increase the number of screened/registered patients to take ineligibility into consideration
  • 10% ineligible, Total N = N/0.9
• Increase the number of randomized patients if not everyone is evaluable
  • 5% inevaluable, Total N = N/0.95
• Drop out, loss to f/u
  • Be aware of informative censoring
• Interim analysis, sample size re-estimation
  • To be covered later in the course
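The inflation rules amount to dividing the required N by the expected retained fraction. The sketch below applies them; combining both adjustments multiplicatively in one call is an assumption made for illustration, and the function name is hypothetical.

```python
import math

def inflate_n(n_required, ineligible=0.0, inevaluable=0.0):
    """Patients to enroll so that about n_required remain eligible and evaluable."""
    return math.ceil(n_required / (1 - ineligible) / (1 - inevaluable))

print(inflate_n(100, ineligible=0.10))                     # N / 0.9
print(inflate_n(100, inevaluable=0.05))                    # N / 0.95
print(inflate_n(100, ineligible=0.10, inevaluable=0.05))   # both adjustments together
```

Rounding up keeps the planned enrollment from falling short of the target after attrition.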
Sample Size/Power Calculation via Simulations for Hypothesis Testing
1. Generate data according to the study design
2. Compute the test statistics
3. Determine whether you reject H0 or not
4. Repeat steps 1-3
• Useful tips
  • Set the seed to initialize the random number generator
  • Check the distribution of the data to make sure they are accurately generated
  • Run the test under H0 to verify the level of significance
  • Do it for at least 1,000 trials.
  • Precision of statistical power?
    sqrt((0.8 x 0.2)/1,000) = 0.013
    sqrt((0.5 x 0.5)/1,000) = 0.016
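The simulation recipe can be sketched on the SBP example (σ = 20, true mean reduction 5, one-sided α = 0.05). This is an illustrative sketch only: it assumes a z-test with known σ, and the function name, seed, and number of trials are arbitrary choices. Running it with the true mean set to 0 checks the type I error, as the tips suggest.

```python
import random
from statistics import NormalDist

def simulate_power(n, mu, sigma=20.0, alpha=0.05, n_trials=2000, seed=2012):
    """Fraction of simulated trials in which H0: mu <= 0 is rejected."""
    rng = random.Random(seed)                  # fixed seed for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value
    rejections = 0
    for _ in range(n_trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # step 1: generate data
        z_stat = (sum(sample) / n) / (sigma / n ** 0.5)    # step 2: test statistic
        if z_stat > z_crit:                                # step 3: reject H0?
            rejections += 1
    return rejections / n_trials                           # step 4: repeat and average

power = simulate_power(n=99, mu=5)   # should be near 0.80
size = simulate_power(n=99, mu=0)    # should be near 0.05
print(power, size)
```

With 2,000 trials the Monte Carlo standard error of an estimated power near 0.8 is about sqrt(0.8 x 0.2 / 2000) ≈ 0.009, so small deviations from the nominal values are expected.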
Homework #7 (due 2/21)
Sample size calculation for comparing two binomial probabilities
In a randomized Phase II trial, patients are randomized to receive either the standard treatment or a new targeted treatment. The goal is to compare the response rate between the two treatments by testing the following hypothesis:
H0: pS = pT vs. H1: pS ≠ pT
Assume pS = 0.3, pT = 0.5, α = 0.05, and β = 0.1.
1. Calculate the sample size required assuming equal randomization between the two treatments. (Use STPLAN)
2. Applying Bayesian response-adaptive randomization, compute the required sample size using the following decision rules: At the end of the trial, if Prob(pS > pT) > 0.95, conclude the standard treatment is better. Otherwise, if Prob(pS < pT) > 0.95, conclude the new treatment is better. (The AR software can be downloaded from http://biostatistics.mdanderson.org/SoftwareDownload.)
3. As in 2, compute the sample size but add an early stopping rule: at any given time during the study, if Prob(pS > pT) > 0.999, terminate the study and conclude the standard treatment is better. Otherwise, if Prob(pS < pT) > 0.999, terminate the study and conclude the new treatment is better.
4. Compare the maximum sample size, average sample size, type I error, statistical power, probability of early stopping, probability of patients being randomized into each arm, and the average number of responses observed in the trial in 1, 2, and 3.
Homework #8 (due 2/21)
Sample size calculation for comparing survival endpoints in two groups
Instead of using the binary endpoint, we now assume that the anti-tumor activity is measured by a survival endpoint. Assume the 5-yr survival rate for recurrent head and neck cancer is about 30% for the standard treatment. Assume a new agent can increase the 5-yr survival rate to 50%. Please design a two-arm randomized study comparing the standard versus new treatments with a two-sided α = 5% and 90% power for testing equal hazard rates assuming exponential survival. Compute the sample size needed (e.g., use STPLAN).
1. Assume instantaneous accrual and no censoring.
2. Assume instantaneous accrual with 5 years of f/u.
3. Compute the accrual rate and total sample size needed with 2 years of accrual and 3 years of additional follow-up, i.e. the total study duration is 5 years.
4. Please verify the result in 3 above by conducting simulation studies with at least 1,000 runs.
5. Compute the f/u time and total study duration required if the accrual time is 3 years with a rate of 5 patients per month.