Equivalence Testing

Equivalence Testing. Peter A. Lachenbruch Director, Division of Biostatistics CBER. Do Two Antigens Evoke Similar Responses?. Do new antigen and standard antigen have the same mean response? A challenge of each antigen is applied to two sides on the back of a subject.

Presentation Transcript


  1. Equivalence Testing • Peter A. Lachenbruch, Director, Division of Biostatistics, CBER

  2. Do Two Antigens Evoke Similar Responses? • Do new antigen and standard antigen have the same mean response? • A challenge of each antigen is applied to two sides on the back of a subject. • The application should be randomly determined • This assumes no systemic effects that could muddy the comparison

  3. Two Antigens (2) • After a period of time, the sizes of the wheals from each antigen are measured • To show equivalence, we wish to show that the wheals are within a small limit of each other. • What is small? • What’s the correct method?

  4. Two Antigens (3) • Set the margin as a fraction (10% is often used) of the mean of the standard • Select an absolute value for the margin • This will often be based on previous experience, so may not differ much from the fraction. • Require the difference of the means to be within a margin of 0

  5. [Figure: number line for the difference in means, marked at -0.3 and 0.3] • The heavy lines indicate a difference in mean level. If the difference in means is in the heavy area, the means are different. • We want to show that the true mean difference is in the lightly shaded area.

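  6. Two Antigens (4) • Suppose we found the mean difference of the pairs and their standard deviation to be 0.2 cm and 2.5 cm (the values used in the confidence interval on slide 8, whose limits imply n = 100 pairs)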

  7. Two Antigens (5) • Concluding equivalence since t is “not significant” is an error because we can choose a small sample size and have no power to detect a difference. • Suppose the historically known wheal size was 3 cm, then 10% would be 0.3 cm
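
The point about small samples can be illustrated with a short sketch. The observed difference of 1.0 cm is a hypothetical value chosen to lie well outside a 0.3 cm margin, the standard deviation of 2.5 cm is the value used in the interval on slide 8, and the critical values are standard two-sided 5% t-table entries. None of these sample sizes come from the slides.

```python
import math

def paired_t(mean_diff, sd_diff, n):
    """t statistic for H0: true mean difference = 0 in a paired design."""
    return mean_diff / (sd_diff / math.sqrt(n))

# Hypothetical observed difference, far larger than a 0.3 cm margin.
mean_diff, sd_diff = 1.0, 2.5

# Two-sided 5% critical values from a t table (df = n - 1); assumed inputs.
t_crit = {10: 2.262, 100: 1.984}

for n in (10, 100):
    t = paired_t(mean_diff, sd_diff, n)
    significant = abs(t) > t_crit[n]
    print(f"n={n}: t={t:.3f}, significant={significant}")
```

With 10 pairs the test is "not significant" even though the observed difference is more than three times the margin, which is why a non-significant t cannot be read as evidence of equivalence.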

  8. Two Antigens (6) • We can compute a 90% confidence interval, and note if it is entirely contained in the equivalence region: 0.2 ± 1.66 × 2.5/√100 = (-0.215, 0.615) • This interval overlaps the region (-0.3, 0.3) but is not entirely contained in it
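
A minimal sketch of the confidence-interval check follows. It assumes a paired design with n = 100 pairs (inferred from the reported limits, not stated on the slide) and takes the critical value 1.66 from the slide rather than computing it. The function names are illustrative only.

```python
import math

def paired_ci(mean_diff, sd_diff, n, t_crit):
    """Confidence interval for the mean of paired differences."""
    half_width = t_crit * sd_diff / math.sqrt(n)
    return mean_diff - half_width, mean_diff + half_width

def ci_within_margin(ci, margin):
    """True only if the whole interval lies inside (-margin, margin)."""
    lower, upper = ci
    return -margin < lower and upper < margin

# Slide values: mean difference 0.2, SD 2.5, margin 0.3; n = 100 is assumed.
ci = paired_ci(mean_diff=0.2, sd_diff=2.5, n=100, t_crit=1.66)
print(ci, ci_within_margin(ci, margin=0.3))
```

The upper limit exceeds 0.3, so the containment check fails and equivalence cannot be concluded.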

  9. Confidence Interval • [Figure: acceptable region from -0.3 to 0.3 drawn above the observed confidence interval from -0.215 to 0.615] • The observed interval extends beyond the acceptable region, so we can’t conclude equivalence.

  10. Two Antigens (7) • Another method is to test the joint null hypothesis • H01: μ1 - μ2 > δ OR • H02: μ1 - μ2 < -δ • By rejecting both of these hypotheses, we can conclude that |μ1 - μ2| < δ

  11. Conclusions • Discuss selection of margin • Sample size based on null hypothesis of non-equivalence • Use confidence interval or two one-sided tests • Show both ITT and PP analyses • Provide evidence of trial validity • design • efficacy of control

  12. Concerns • In comparative trials, detection of a difference implies the trial had sufficient quality to detect it • In equivalence trials, sloppy design, execution, or analysis moves conclusions toward equivalence • Finding equivalence doesn’t mean that both treatments were effective • Sometimes a third, placebo arm is valuable • Some entry criteria let non-diseased patients into the trial

  13. Concerns (2) • Some clearly effective drugs do not show effectiveness in each study • Population may be less responsive than in licensure study • Sample size may be too small to show difference • Must show that active control is superior to untreated control by a given amount • This is principal advantage over the historical control study • With large variability, it may be useful to include a placebo group

  14. Concerns (3) • Design - since much information is available, there is no excuse for poor design • Double blinding • Randomization • Inclusion/Exclusion criteria should be same as for trials of active comparator • Dosing and schedule for active comparator should be as licensed • Outcomes for active comparator should be similar to previous trials

  15. Concerns (4) • Paired or unpaired • Paired design can reduce variance substantially • If permanent change occurs (e.g., vaccination) a paired design is not possible • Analysis - which patients to include? • Intention to treat (ITT) - as randomized tends to obscure differences - just what we don’t want • Per protocol (PP)- by treatment actually received - but can also lead to biases if there are unequal dropout patterns

  16. Concerns (5) • Useful to perform both ITT and PP analyses • Analysis may show: • Confidence interval lies entirely within the equivalence range - conclude equivalence • Some points of the confidence interval are outside the equivalence range - can’t conclude equivalence • It is possible for the two treatments to be significantly different, yet for the confidence interval to lie within the equivalence range
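
The three outcomes listed above can be sketched as a small classifier. The function name and the example interval (0.05, 0.25) are hypothetical. "Excludes zero" is used here as a stand-in for "significantly different" at the interval's own confidence level.

```python
def classify_interval(ci, margin):
    """Return (equivalent, excludes_zero) for a confidence interval.

    equivalent:    the whole interval lies inside (-margin, margin)
    excludes_zero: the interval does not contain 0
    """
    lower, upper = ci
    equivalent = -margin < lower and upper < margin
    excludes_zero = lower > 0 or upper < 0
    return equivalent, excludes_zero

# Interval from the slides: not inside (-0.3, 0.3), and it contains 0.
print(classify_interval((-0.215, 0.615), 0.3))

# Hypothetical interval: inside the margin, yet it excludes 0.
print(classify_interval((0.05, 0.25), 0.3))
```

The second case is the one the last bullet describes: a difference that is detectably non-zero but still small enough to count as equivalent.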

  17. Concerns (6) • The test of the null hypothesis of no difference is not appropriate • A test of the relevant null hypothesis (H0: |μ1 - μ2| > δ) should be done • Even when TOST is used, the other concerns remain

  18. Two Antigens (8) • We may test that the difference is greater than 0.3 cm or less than -0.3 cm • We perform two one-sided t tests

  19. Two Antigens (9) • First we test the hypothesis that the mean difference is greater than 0.3. If t1 < -1.66 we reject the hypothesis (t1 = -0.4 > -1.66, so we don’t reject) • We then test the hypothesis that the mean difference is less than -0.3. If t2 > 1.66 we reject (t2 = 2.0 > 1.66, so we reject)

  20. Two Antigens (10) • Since t1 does not reject H01: d > 0.3, we cannot reject the hypothesis of non-equivalence, even though t2 rejects H02: d < -0.3; both must be rejected to conclude equivalence. • A 90% confidence interval is (-0.215, 0.615), which extends beyond (-0.3, 0.3)
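
The two one-sided tests worked through on slides 18 to 20 can be reproduced with a short sketch. It assumes a paired design with n = 100 pairs (inferred from the reported statistics) and uses the slide's critical value of 1.66 directly. The function name is illustrative.

```python
import math

def tost_paired(mean_diff, sd_diff, n, margin, t_crit):
    """Two one-sided t tests for equivalence of paired means.

    Returns (t1, t2, equivalent). Equivalence is concluded only
    when both one-sided null hypotheses are rejected.
    """
    se = sd_diff / math.sqrt(n)
    t1 = (mean_diff - margin) / se  # H01: d > margin, reject if t1 < -t_crit
    t2 = (mean_diff + margin) / se  # H02: d < -margin, reject if t2 > t_crit
    reject_h01 = t1 < -t_crit
    reject_h02 = t2 > t_crit
    return t1, t2, reject_h01 and reject_h02

# Slide values: mean difference 0.2, SD 2.5, margin 0.3; n = 100 is assumed.
t1, t2, equivalent = tost_paired(0.2, 2.5, 100, margin=0.3, t_crit=1.66)
print(round(t1, 2), round(t2, 2), equivalent)
```

Only the second hypothesis is rejected, so the function reports that equivalence is not established, matching the confidence-interval result.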

  21. Testing for no difference • The null hypothesis is that the two allergens are the same (H0: μ1 = μ2) • The alternative is that they are not (Ha: μ1 ≠ μ2) • Failing to reject the null hypothesis is not the way to show equivalence

  22. No difference test • If we fail to reject, this has not given much evidence that they really aren’t different • Study is not powered to show a difference? • Sloppy study conduct tends to show no effect • Wrong treatment assignment reduces difference

  23. Equivalence • The proper null hypothesis is H0: |μ1 - μ2| > δ, which says the difference of means is “too large” • The alternative is the converse. We usually compute the power at μ1 - μ2 = 0

  24. Equivalence model • This tests the null hypothesis that the treatments are different. By rejecting it, we conclude the alternative: they are the same. • Usually, the sample size requirements are a little larger than for a test of equality. By rejecting the hypothesis of equality, we conclude they are different.
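
The sample-size remark can be illustrated with the usual normal-approximation formulas for a paired design. This is a sketch under stated assumptions, not the presenter's calculation: the equality test is powered to detect a difference equal to the margin, the equivalence test is powered at a true difference of zero, and α = 0.05 with 80% power are illustrative choices. The standard deviation 2.5 and margin 0.3 are the slide values.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function

def n_equality(sd_diff, delta, alpha=0.05, power=0.80):
    """Pairs needed for a two-sided test of equality to detect delta."""
    factor = (z(1 - alpha / 2) + z(power)) ** 2
    return math.ceil(factor * sd_diff ** 2 / delta ** 2)

def n_equivalence(sd_diff, margin, alpha=0.05, power=0.80):
    """Pairs needed for TOST, assuming the true difference is 0."""
    beta = 1 - power
    factor = (z(1 - alpha) + z(1 - beta / 2)) ** 2
    return math.ceil(factor * sd_diff ** 2 / margin ** 2)

n_eq = n_equality(sd_diff=2.5, delta=0.3)
n_tost = n_equivalence(sd_diff=2.5, margin=0.3)
print(n_eq, n_tost)
```

Under these assumptions the equivalence design needs somewhat more pairs than the equality design, in line with the slide's "a little larger".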

  25. TOST = Two one-sided tests • The null hypothesis is usually tested by performing two tests at the α level:

  26. Equivalence • [Figure: number line marked at -0.3 and 0.3] • The heavy line is the region of equivalence. If the mean difference is in this region, they are equivalent.
