Biostatistics Case Studies 2014 Session 1: Sample Size & Power for Inequality and Equivalence Studies I Youngju Pak, PhD. Biostatistician ypak@labiomed.org
Announcements
• All class materials will be uploaded to the following website:
• http://research.labiomed.org/Biostat/Education/CaseStudies_Fall2014/CaseStudies2014Outline.htm
• Try to read the posted articles before each session as best you can, and pay particular attention to the statistical components as you read.
• Send me an e-mail (ypak@labiomed.org) so I can communicate with you if necessary.
• Send me a copy of an article that you want to discuss, if you have one. It might be used for the last session.
Inequality study:
• Two or more treatments are assumed equal (H0), and the study is designed to find overwhelming evidence of a difference (superiority and/or inferiority).
• Most common comparative study type.
• It is rare to assess only one of superiority or inferiority (“one-sided” statistical tests), unless one of them is biologically impossible.
• Hypotheses:
• Ha: | mean(treatment) - mean(control) | ≠ 0
• H0: | mean(treatment) - mean(control) | = 0
Non-significant p-values for inequality tests
• A non-significant p-value (> 0.05) usually means that you did not find statistically sufficient evidence to support Ha. It does not necessarily mean that H0 is true.
• H0 might or might not be true => your study is still “INCONCLUSIVE”.
• Non-significant p-values do NOT prove your null!
Equivalence Study:
• Two treatments are assumed to differ (H0) and the study is designed to find overwhelming evidence that they are equal.
• Usually, the quantity of interest is a measure of biological activity or potency (the amount of drug required to produce an effect) and “treatments” are drugs or lots or batches of drugs.
• AKA bioequivalence.
• Sometimes used to compare clinical outcomes for two active treatments if neither treatment can be considered standard or accepted. This usually requires LARGE numbers of subjects.
Hypotheses for equivalence tests
• Ha: mean(trt1) - mean(trt2) = 0
• H0: mean(trt1) - mean(trt2) ≠ 0
• With a finite sample size, it is very hard to show that two group means are exactly the same.
• So we set a tolerance level for equivalence, AKA the equivalence margin, usually denoted as Δ.
• Practical hypotheses would be:
• Ha: Δ1 < mean(trt1) - mean(trt2) < Δ2
• H0: mean(trt1) - mean(trt2) ≤ Δ1 or mean(trt1) - mean(trt2) ≥ Δ2
Non-inferiority
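One common way to test the practical hypotheses above is the two one-sided tests (TOST) approach, which is equivalent to checking whether a 90% confidence interval for the difference lies entirely inside (Δ1, Δ2). The sketch below is only an illustration under a normal approximation. The function name `equivalence_by_ci` and all numbers are hypothetical and are not taken from the slides or the papers.

```python
from statistics import NormalDist

def equivalence_by_ci(diff, se, lower_margin, upper_margin, alpha=0.05):
    """Declare equivalence if the (1 - 2*alpha) confidence interval for the
    difference in means lies strictly inside (lower_margin, upper_margin).
    Uses a normal approximation; diff and se are assumed to be given."""
    z = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value (about 1.645)
    ci_low = diff - z * se
    ci_high = diff + z * se
    is_equivalent = lower_margin < ci_low and ci_high < upper_margin
    return is_equivalent, (ci_low, ci_high)

# Hypothetical numbers for illustration only.
equivalent, ci = equivalence_by_ci(diff=0.5, se=1.0,
                                   lower_margin=-3.0, upper_margin=3.0)
print(equivalent, ci)
```

With these made-up inputs the interval sits inside the margins, so H0 (non-equivalence) would be rejected. A larger observed difference or a larger standard error pushes the interval past a margin and leaves the study inconclusive.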
Today, we are going to learn how to determine sample size for inequality tests using software, working through three papers. Then we will discuss some of the logic.
How was N=498 determined? What reduction in CVD events can 224 + 224 subjects detect? How many subjects would be needed to detect this Δ?
Software Output for % of CVD Events 224 + 224 → detect 6.7% vs. 1.13%, i.e., 88% ↓. Need 3115 + 3115 to detect 25% ↓ from 6.7% to 5%, i.e., a total of (3115+3115)/0.9 = 6922.
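The calculation for 6.7% vs. 5% can be roughly reproduced with the simple normal-approximation formula for comparing two proportions. This is a sketch under assumptions: 80% power and two-sided α = 0.05 are assumed (the slide does not state them for this calculation), and the function name is hypothetical. The simple formula gives a somewhat smaller N than the 3115 per group in the software output, presumably because the software applies a different variance formula or a continuity correction.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group N to detect p1 vs. p2 (two-sided test),
    using the unpooled normal-approximation formula."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = NormalDist().inv_cdf(power)           # about 0.842
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2

n = n_per_group_proportions(0.067, 0.05)
total_with_dropout = 2 * n / 0.9  # inflate for 10% dropout, as on the slide
print(ceil(n), ceil(total_with_dropout))
```

Either way, the message of the slide holds: detecting a 25% reduction in a rare event needs thousands of subjects per group, far more than 224.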
From earlier design paper (Russell 2007): Δ = 0.85(0.05)mm = 0.0425 mm
Software Output for Mean IMT Each group N for 10% Dropout → 0.9N = 224 → N = 224/0.9 = 249. Total study size = 2(249)=498
Software Output - Means Can detect 0.4 SDs. Units? Since normal range =~ 6SD, this corresponds to ~0.4/6=7% shift in normal range. Applies to any continuously measured outcome.
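The detectable effect in SD units comes from rearranging the sample size formula for means: Δ/SD = (1.96 + 0.842)√(2/N). Below is a minimal sketch, assuming 80% power and two-sided α = 0.05. The group size of 98 is a hypothetical value, chosen only because it gives roughly 0.4 SDs, and the function name is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist

def detectable_effect_in_sd(n_per_group, alpha=0.05, power=0.80):
    """Smallest difference in means, in SD units, detectable with the
    given per-group size (normal approximation, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * sqrt(2 / n_per_group)

for n in (98, 224):  # 98 is hypothetical; 224 is the IMT group size
    print(n, round(detectable_effect_in_sd(n), 3))
```

For 224 per group this gives about 0.26 SDs, which with SD = 0.16 mm is close to the Δ = 0.0425 mm used for the IMT outcome.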
From Nance paper: Δ = ~8%. Set Δ / (SD√(1/N1 + 1/N2)) = 2.82 and solve for SD to get SD =~ 6.8%.
Back to: How was 498 determined?
How IMT Change Comparison Will be Made
Strength of Treatment Effect: Signal:Noise Ratio
t = Observed Δ / (SD√(1/N1 + 1/N2))
Δ = Aggressive - Standard mean difference in IMT changes
SD = Std Dev of within-group IMT changes
N1 = N2 = Group size
| t | > ~1.96 ↔ p<0.05
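As an illustration of the signal:noise ratio, the sketch below computes t for the IMT design values used elsewhere in these slides (Δ = 0.0425 mm, SD = 0.16 mm, 224 per group). Treating the observed Δ as exactly equal to the design Δ is an assumption made only for this example, and the function name is hypothetical.

```python
from math import sqrt

def t_statistic(observed_delta, sd, n1, n2):
    """Signal:noise ratio for comparing two group means."""
    return observed_delta / (sd * sqrt(1 / n1 + 1 / n2))

# Assumes the sample difference happens to equal the design difference.
t = t_statistic(observed_delta=0.0425, sd=0.16, n1=224, n2=224)
print(round(t, 2), abs(t) > 1.96)  # 2.81 True
```

A t of about 2.81 is well above 1.96, which leaves room for the sample Δ to fall short of the real Δ and still reach p<0.05.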
Could Solve for N
t = Observed Δ / (SD√(1/N1 + 1/N2)) ≥ ~1.96 if (with N = N1 = N2):
Δ ≥ 1.96 SD√(2/N), or N ≥ (1.96)² × 2SD² / Δ²
This is not quite right. The Δ here is the actual observed difference. This sample Δ will vary from the real Δ in “everyone”. Need to increase N in case the sample happens to have a Δ that is lower than the real Δ (50% possibility).
Need to Increase N for Power
Power is the probability that p<0.05 if Δ is the real effect, incorporating the possibility that the Δ in our sample could be smaller.
N = (1.96)² × 2SD² / Δ² for 50% power.
Need to increase N to:
N = (1.96 + 0.842)² × 2SD² / Δ² for 80% power.
N = (1.96 + 1.282)² × 2SD² / Δ² for 90% power.
(0.842 and 1.282 are from Normal tables.)
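The link between N and power can be checked in the other direction: for a given N, approximate power is Φ(Δ / (SD√(2/N)) - 1.96), where Φ is the standard normal cumulative distribution. The sketch below uses the IMT values Δ = 0.0425 and SD = 0.16. The group sizes 109 and 298 are what the 50% and 90% formulas give for those values, rounded up. The function name is hypothetical, and the negligible chance of significance in the wrong direction is ignored.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two means
    (normal approximation, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    signal_to_noise = delta / (sd * sqrt(2 / n_per_group))
    return NormalDist().cdf(signal_to_noise - z_alpha)

delta, sd = 0.0425, 0.16
for n in (109, 224, 298):
    print(n, round(approx_power(delta, sd, n), 2))
```

The three group sizes give roughly 50%, 80%, and 90% power, matching the three formulas.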
Info Needed for Study Size: Comparing Means
N = (1.96 + 0.842)² × 2SD² / Δ²
• Effect (Δ)
• Subject variability (SD)
• Type I error (1.96 for α=0.05; 2.58 for α=0.01)
• Power (0.842 for 80% power; 1.645 for 95% power)
Same four quantities, but different formula, if comparing %s, hazard ratios, odds ratios, etc.
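The multipliers for Type I error and power are quantiles of the standard normal distribution, so they can be looked up in code instead of tables. A small sketch using Python's standard library follows; the helper names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def z_for_alpha(alpha):
    """Two-sided critical value for Type I error alpha."""
    return std_normal.inv_cdf(1 - alpha / 2)

def z_for_power(power):
    """Multiplier added to the critical value for the desired power."""
    return std_normal.inv_cdf(power)

for alpha in (0.05, 0.01):
    print("alpha", alpha, round(z_for_alpha(alpha), 3))
for power in (0.80, 0.90, 0.95):
    print("power", power, round(z_for_power(power), 3))
```

Note that 50% power corresponds to a multiplier of 0, which is why the 50% power formula contains only 1.96.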
N = (1.96 + 0.842)² × 2SD² / Δ²
N = (1.96 + 0.842)² × 2(0.16)² / (0.0425)² ≈ 223 by this formula; the software output gives 224.
Each group N for 10% Dropout → 0.9N = 224 → N = 224/0.9 = 249. Total study size = 2(249) = 498
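The arithmetic behind N = 498 can be traced step by step. In this sketch the normal-approximation formula gives about 222.5 per group, slightly below the 224 reported by the software (software packages often use a t-distribution based calculation, which is an assumption here). The dropout inflation then starts from the software value of 224. The function name is hypothetical.

```python
from math import ceil

def n_per_group_means(delta, sd, z_alpha=1.96, z_beta=0.842):
    """Per-group N for comparing two means (normal approximation)."""
    return (z_alpha + z_beta) ** 2 * 2 * sd ** 2 / delta ** 2

n_formula = n_per_group_means(delta=0.0425, sd=0.16)  # about 222.5
n_software = 224                     # per-group size from the software output
n_enrolled = ceil(n_software / 0.9)  # allow for 10% dropout
total = 2 * n_enrolled

print(round(n_formula, 1), n_enrolled, total)  # 222.6 249 498
```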
SD Estimate Could be Wrong Should examine SD as study progresses. May need to increase N if SD was underestimated.
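Because N is proportional to SD², even a modest underestimate of SD has a large effect on the required study size. The sketch below keeps Δ = 0.0425 and the planning SD of 0.16, and tries two larger SD values (0.18 and 0.20) that are hypothetical, chosen only to show the sensitivity. The function name is also an assumption.

```python
def required_n(delta, sd, z_alpha=1.96, z_beta=0.842):
    """Per-group N for comparing two means (normal approximation)."""
    return (z_alpha + z_beta) ** 2 * 2 * sd ** 2 / delta ** 2

planned = required_n(0.0425, 0.16)
for true_sd in (0.16, 0.18, 0.20):  # 0.18 and 0.20 are hypothetical
    n = required_n(0.0425, true_sd)
    print(true_sd, round(n), round(n / planned, 2))
```

If the true SD were 0.20 rather than 0.16, the required N per group would be about 56% larger, since (0.20/0.16)² = 1.5625.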
Free Study Size Software www.stat.uiowa.edu/~rlenth/Power
Study Size Software in GCRC Lab ncss.com ~$500