Biostatistics Case Studies 2014

Session 1: Sample Size & Power for Inequality and Equivalence Studies I
Youngju Pak, PhD., Biostatistician
ypak@labiomed.org

Class Schedule

Announcements
• All class materials will be uploaded to the following website: http://research.labiomed.org/Biostat/Education/CaseStudies_Fall2014/CaseStudies2014Outline.htm
• Try to read the posted articles before each session as best you can, and pay particular attention to the statistical components when you read them.
• Send me an e-mail (ypak@labiomed.org) so I can communicate with you if necessary.
• Send me a copy of an article that you want to discuss, if you have one. It might be used for the last session.

Inequality study:
Two or more treatments are assumed equal (H0), and the study is designed to find overwhelming evidence of a difference (superiority and/or inferiority).
• Most common comparative study type.
• It is rare to assess only one of superiority or inferiority (“one-sided” statistical tests), unless one of them is biologically impossible.
• Hypotheses:
• Ha: | mean(treatment) − mean(control) | ≠ 0
• H0: | mean(treatment) − mean(control) | = 0

Nonsignificant p-values for Inequality tests
• A nonsignificant p-value (> 0.05) usually means that you did not find statistically sufficient evidence to support Ha. It does not necessarily mean that H0 is true.
• H0 might or might not be true => your study is still “INCONCLUSIVE”.
• Nonsignificant p-values do NOT prove your null!

Equivalence Study:
• Two treatments are assumed to differ (H0) and the study is designed to find overwhelming evidence that they are equal.
• Usually, the quantity of interest is a measure of biological activity or potency (the amount of drug required to produce an effect), and the “treatments” are drugs, or lots or batches of drugs.
• AKA bioequivalence.
• Sometimes used to compare clinical outcomes for two active treatments if neither treatment can be considered standard or accepted. This usually requires LARGE numbers of subjects.

Hypotheses for equivalence tests
• Ha: mean(trt 1) − mean(trt 2) = 0
• H0: mean(trt 1) − mean(trt 2) ≠ 0
• With a finite sample size, it is very hard to show that two group means are exactly the same.
• So we set a tolerance level for equivalence, AKA the equivalence margin, usually denoted Δ.
• Practical hypotheses would be
• Ha: Δ1 < mean(trt 1) − mean(trt 2) < Δ2
• H0: mean(trt 1) − mean(trt 2) ≤ Δ1 or mean(trt 1) − mean(trt 2) ≥ Δ2
Non-inferiority
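One common way to test the practical hypotheses above is two one-sided tests (TOST): equivalence is concluded when the 90% confidence interval for the difference lies entirely inside (Δ1, Δ2). The sketch below assumes a normal approximation with a known standard error; the function name and all numbers are hypothetical illustrations, not values from any of the papers.

```python
from statistics import NormalDist


def equivalent_by_tost(diff, se, margin_low, margin_high, alpha=0.05):
    """Two one-sided tests via the (1 - 2*alpha) confidence interval.

    diff        : observed mean(trt 1) - mean(trt 2)
    se          : standard error of that difference (assumed known)
    margin_low  : lower equivalence margin (Delta1)
    margin_high : upper equivalence margin (Delta2)
    """
    z = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    ci_low = diff - z * se
    ci_high = diff + z * se
    # Reject H0 (non-equivalence) only if the whole interval is inside the margins.
    return margin_low < ci_low and ci_high < margin_high


# Hypothetical example: same observed difference, two different precisions.
print(equivalent_by_tost(diff=0.5, se=1.0, margin_low=-5.0, margin_high=5.0))
print(equivalent_by_tost(diff=0.5, se=4.0, margin_low=-5.0, margin_high=5.0))
```

With the larger standard error the interval spills past the margins, so the study is inconclusive rather than evidence of a difference.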

Today, we are going to learn how to determine sample size for inequality tests, using software, for three papers. Then we will discuss some logic.

Paper #1

How was N=498 determined?
What reduction in CVD events can 224 + 224 subjects detect?
Nevertheless, how many subjects would be needed to detect this Δ?

Software Output for % of CVD Events
224 + 224 → detect 6.7% vs. 1.13%, i.e., an 83% ↓.
Need 3115 + 3115 to detect a 25% ↓ from 6.7% to 5%, i.e., a total of (3115+3115)/0.9 = 6922.
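The slides do not say which method the software used for comparing percentages. As a rough cross-check, here is a sketch using the usual normal-approximation formula for two proportions (two-sided α = 0.05, 80% power), with an optional Fleiss-type continuity correction. The function name is hypothetical, and the results should only be expected to land near the software's numbers, not to reproduce them exactly.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80, continuity=False):
    """Approximate per-group N to detect p1 vs. p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = NormalDist().inv_cdf(power)           # about 0.842
    diff = abs(p1 - p2)
    n = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / diff ** 2
    if continuity:
        # Continuity correction (an assumption about what the software does).
        n = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return ceil(n)


# 25% reduction, from 6.7% to 5% of subjects with CVD events
print(n_per_group_proportions(0.067, 0.05))
print(n_per_group_proportions(0.067, 0.05, continuity=True))

# Very large reduction, from 6.7% to 1.13%
print(n_per_group_proportions(0.067, 0.0113, continuity=True))

# Inflating the slide's 3115 + 3115 for 10% dropout
print(round((3115 + 3115) / 0.9))
```

The continuity-corrected values come out in the same neighborhood as the 3115 and 224 per group shown in the software output.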

From earlier design paper (Russell 2007): Δ = 0.85(0.05)mm = 0.0425 mm

Software Output for Mean IMT
Each group N for 10% dropout → 0.9N = 224 → N = 224/0.9 = 249.
Total study size = 2(249) = 498

Paper #2

Williamson paper

Software Output - Percentages

Software Output - Means
Can detect 0.4 SDs. Units? Since the normal range ≈ 6 SD, this corresponds to a shift of ~0.4/6 = 7% of the normal range. Applies to any continuously measured outcome.
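Working in SD units means the calculation does not depend on the measurement scale. A minimal sketch of the smallest detectable difference, in SDs, for a given per-group N, assuming a two-sided α = 0.05, 80% power and a normal approximation. The group sizes in the loop are hypothetical and are not taken from the Williamson paper.

```python
from math import sqrt
from statistics import NormalDist


def detectable_effect_in_sds(n_per_group, alpha=0.05, power=0.80):
    """Smallest true difference (in SD units) detectable with equal groups."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * sqrt(2 / n_per_group)


# Hypothetical per-group sizes, for illustration only
for n in (50, 100, 200):
    print(n, round(detectable_effect_in_sds(n), 2))
```

About 100 subjects per group corresponds to a detectable difference of roughly 0.4 SDs.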

Paper #3

From Nance paper
Δ ≈ 8%
$$\frac{\Delta}{\mathrm{SD}\sqrt{1/N_1 + 1/N_2}} = 2.82$$
Solve for SD to get SD ≈ 6.8%
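Solving the signal-to-noise equation for SD is a one-line rearrangement. The sketch below shows it; the group sizes are hypothetical placeholders, since the slide does not state N1 and N2, so the printed SD is not meant to reproduce the 6.8% above.

```python
from math import sqrt


def sd_from_t(delta, t, n1, n2):
    """Rearrange t = delta / (SD * sqrt(1/n1 + 1/n2)) to solve for SD."""
    return delta / (t * sqrt(1 / n1 + 1 / n2))


def t_from_sd(delta, sd, n1, n2):
    """Signal-to-noise ratio for a given SD (the forward direction)."""
    return delta / (sd * sqrt(1 / n1 + 1 / n2))


# delta = 8 (%), t = 2.82 from the slide; n1 = n2 = 12 is a HYPOTHETICAL choice.
sd = sd_from_t(delta=8.0, t=2.82, n1=12, n2=12)
print(round(sd, 2))

# Plugging the SD back in should recover t = 2.82.
print(round(t_from_sd(8.0, sd, 12, 12), 2))
```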

Software Output for Gilchrist Paper

Some Logic

Back to: How was 498 determined?

How IMT Change Comparison Will be Made
Strength of treatment effect: signal-to-noise ratio
$$t = \frac{\text{Observed } \Delta}{\mathrm{SD}\sqrt{1/N_1 + 1/N_2}}$$
Δ = Aggressive − Standard mean difference in IMT changes
SD = standard deviation of within-group IMT changes
N1 = N2 = group size
| t | > ~1.96 ↔ p < 0.05
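As a quick illustration of the signal-to-noise ratio, the sketch below plugs in the design values used elsewhere in these slides (Δ = 0.0425 mm, SD = 0.16 mm, 224 completers per group). The function name is hypothetical.

```python
from math import sqrt


def signal_to_noise(delta, sd, n1, n2):
    """t = observed difference / (SD * sqrt(1/n1 + 1/n2))."""
    return delta / (sd * sqrt(1 / n1 + 1 / n2))


# If the observed difference equals the design difference exactly:
t = signal_to_noise(delta=0.0425, sd=0.16, n1=224, n2=224)
print(round(t, 2))      # well above the ~1.96 cutoff
print(abs(t) > 1.96)
```

The result is about 2.8, that is, 1.96 plus a margin that allows for the sample Δ falling below the true Δ.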

Could Solve for N
$$t = \frac{\text{Observed } \Delta}{\mathrm{SD}\sqrt{1/N_1 + 1/N_2}} \ge\ \sim 1.96$$
if (with N = N1 = N2):
$$\Delta \ge 1.96\,\mathrm{SD}\sqrt{2/N} \quad\text{or}\quad N \ge (1.96)^2\,\frac{2\,\mathrm{SD}^2}{\Delta^2}$$
This is not quite right. The Δ is the actual observed difference. This sample Δ will vary from the real Δ in “everyone”. Need to increase N in case the sample happens to have a Δ that is lower than the real Δ (50% possibility).
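A small simulation can make the "50% possibility" concrete. The sketch below sizes the study with the 1.96-only formula, then draws many observed Δ values around the true Δ and counts how often |t| > 1.96. It assumes SD is known (so t is treated as normal) and uses Δ = 0.0425 and SD = 0.16 from the IMT example; function names are hypothetical.

```python
import random
from math import ceil, sqrt


def naive_n(delta, sd, z=1.96):
    """Per-group N from the 1.96-only formula (no allowance for power)."""
    return ceil(z ** 2 * 2 * sd ** 2 / delta ** 2)


def fraction_significant(delta, sd, n, sims=20000, seed=1):
    """Share of simulated studies with |t| > 1.96 when the true effect is delta."""
    rng = random.Random(seed)
    se = sd * sqrt(2 / n)  # standard error of the observed difference
    hits = sum(1 for _ in range(sims) if abs(rng.gauss(delta, se)) / se > 1.96)
    return hits / sims


n = naive_n(0.0425, 0.16)
print(n)
print(fraction_significant(0.0425, 0.16, n))  # close to one half
```

Only about half of the simulated studies reach p < 0.05, even though the true effect is exactly the one the study was sized for.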

Need to Increase N for Power
Power is the probability that p<0.05 if Δ is the real effect, incorporating the possibility that the Δ in our sample could be smaller.
$$N = (1.96)^2\,\frac{2\,\mathrm{SD}^2}{\Delta^2}$$ for 50% power.
Need to increase N to:
$$N = (1.96 + 0.842)^2\,\frac{2\,\mathrm{SD}^2}{\Delta^2}$$ for 80% power.
$$N = (1.96 + 1.282)^2\,\frac{2\,\mathrm{SD}^2}{\Delta^2}$$ for 90% power.
(0.842 and 1.282 are from Normal tables.)
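The same relationship can be read in the other direction: given N, what is the power? A minimal sketch under a normal approximation, ignoring the negligible chance of a significant result in the wrong direction. It uses Δ = 0.0425 and SD = 0.16 from the IMT example; the function name and the list of N values are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def power_for_n(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test with equal group sizes."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    expected_t = delta / (sd * sqrt(2 / n_per_group))
    return NormalDist().cdf(expected_t - z_alpha)


for n in (109, 224, 298):
    print(n, round(power_for_n(0.0425, 0.16, n), 2))
```

These per-group sizes give roughly 50%, 80% and 90% power respectively.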

Info Needed for Study Size: Comparing Means
$$N = (1.96 + 0.842)^2\,\frac{2\,\mathrm{SD}^2}{\Delta^2}$$
• Effect
• Subject variability
• Type I error (1.96 for α=0.05; 2.58 for α=0.01)
• Power (0.842 for 80% power; 1.645 for 95% power)
Same four quantities, but different formula, if comparing %s, hazard ratios, odds ratios, etc.
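The multipliers quoted above are standard normal quantiles, so they can be looked up in code rather than in tables. A short sketch using Python's standard library; the helper names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1


def z_for_alpha(alpha):
    """Two-sided Type I error multiplier."""
    return std_normal.inv_cdf(1 - alpha / 2)


def z_for_power(power):
    """Power multiplier."""
    return std_normal.inv_cdf(power)


print(round(z_for_alpha(0.05), 2), round(z_for_alpha(0.01), 2))
print(round(z_for_power(0.80), 3), round(z_for_power(0.90), 3), round(z_for_power(0.95), 3))
```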

$$N = (1.96 + 0.842)^2\,\frac{2\,\mathrm{SD}^2}{\Delta^2}$$
$$N = (1.96 + 0.842)^2\,\frac{2(0.16)^2}{(0.0425)^2} = 224$$
Each group N for 10% dropout → 0.9N = 224 → N = 224/0.9 = 249.
Total study size = 2(249) = 498
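The arithmetic above can be checked with a few lines of code. Note that the plain normal-approximation formula gives about 222.5, which rounds up to 223; the 224 on the slide comes from the software output, which may apply a small-sample (t distribution) refinement, though that is an assumption. Function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Per-group N for comparing two means (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil((z_alpha + z_beta) ** 2 * 2 * sd ** 2 / delta ** 2)


def inflate_for_dropout(n_completers, dropout=0.10):
    """Number to enroll per group so that n_completers remain after dropout."""
    return ceil(n_completers / (1 - dropout))


print(n_per_group_means(delta=0.0425, sd=0.16))  # normal approximation
enroll = inflate_for_dropout(224)                # using the slide's 224
print(enroll, 2 * enroll)
```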

Change Effect Size to be Detected

SD Estimate Could be Wrong
Should examine SD as study progresses. May need to increase N if SD was underestimated.
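Because N is proportional to SD², a modest underestimate of SD has a large effect on the required size. The sketch below keeps Δ = 0.0425 and varies SD around the planned 0.16; the alternative SD values are hypothetical, and the formula is the normal approximation rather than the software's exact method.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Per-group N for comparing two means (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil((z_alpha + z_beta) ** 2 * 2 * sd ** 2 / delta ** 2)


planned = n_per_group_means(0.0425, 0.16)
for sd in (0.14, 0.16, 0.18, 0.20):  # hypothetical "true" SDs
    n = n_per_group_means(0.0425, sd)
    print(sd, n, round(n / planned, 2))
```

If the true SD were 0.20 instead of 0.16 (25% larger), the required N would rise by a bit more than 50%.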

Some Study Size Software

Free Study Size Software www.stat.uiowa.edu/~rlenth/Power

Study Size Software in GCRC Lab ncss.com ~$500

nQuery - Used by Most Drug Companies
