Clinical Significance for Quality of Life Endpoints in Clinical Trials

Jeff A. Sloan, Ph.D., Mayo Clinic, Rochester, MN, USA. FDA/Industry Statistics Workshop, Washington, September 16, 2005

Primary goal: advance the state of the science to help cancer patient QOL soar

Take-home message: there is good news • There are problems with using QOL assessments as indicators of efficacy in clinical trials. • There are scientifically sound solutions to these problems. The problems have been disseminated widely and consistently. The solutions have not.

It takes a certain amount of bravery to work in QOL research

Science is a candle in the dark - Carl Sagan We will use the candle of science to improve the QOL of cancer patients

How do you determine the clinical significance of QOL assessments?

What is a clinically meaningful QOL burden?

Why is it difficult to define “clinical significance” for QOL? • Pain analogy • 25 years ago physicians were the sole raters of patient pain • JCAHO 2000 guideline: every patient’s pain to be assessed upon intake on a 0-10 scale • Time and experience alleviate novelty and skepticism, and guidelines evolve

Why is it difficult to define “clinical significance” for QOL? • Blood pressure analogy • 100 years ago, the clinical significance of BP scores was unknown (Lancet 1899) • massage therapy was the gold standard • guidelines for BP clinical significance are still being redrawn today (McCrory DC, Lewis SZ. Chest. 126(1 Suppl):11S-13S, 2004)

The solution found for tumor response cutoffs may provide guidance • We call a reduction of 50% a response. • We see reductions of 49% all the time, but do not worry about misclassification. • Moertel (1976) basis for 50% cutoff • Find a cutoff and stick to it? (RECIST)

What clinical significance is NOT • Statistical significance • Example drawn from JCO 2001 (anonymous) • HSQ before / after scores on 1300 patients • all p-values <0.0001 • conclusion: all domains of QOL were significantly different across treatment groups • problem: 1300 patients provide 80% power to detect a change of 1 unit on a 0-100 point scale
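The power argument on this slide can be sketched numerically. The snippet below is a rough illustration rather than the analysis behind the slide: it assumes a two-sided paired z-test (normal approximation) at alpha = 0.05, and an SD of 16.7 points for the before/after differences (one-sixth of a 0-100 range). Neither the design nor the SD is stated on the slide, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_change(n, sd_diff, alpha=0.05, power=0.80):
    """Smallest mean before/after change detectable by a two-sided
    paired z-test (normal approximation) with n patients."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)          # about 0.84 for 80% power
    return (z_alpha + z_power) * sd_diff / sqrt(n)


# Assumption: SD of the paired differences is 16.7 points on a 0-100 scale.
change = min_detectable_change(n=1300, sd_diff=16.7)
print(f"Detectable change with 1300 patients: {change:.2f} points")
```

Under these assumptions the detectable change is a little over 1 point, which is why every domain can reach p < 0.0001 without any of the differences being clinically meaningful.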

EORTC QLQ-LC13
Item       n=537   n=346   Effect Size
Coughing   46.2    44.3    small
Dyspnea    17.2    16.2    small
Pain       26.9    25.5    small
• all p-values were statistically significant
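To see how small these differences are, the two columns of scores can be expressed in SD units. This is only a sketch: the slide does not report the SDs, so the code assumes a hypothetical SD of one-sixth of the 0-100 scale (about 16.7 points), and the helper name is illustrative.

```python
# Assumption: SD of 100/6 points on a 0-100 scale (not given on the slide).
ASSUMED_SD = 100 / 6

# (item, score for n=537 group, score for n=346 group), as listed on the slide
items = [
    ("Coughing", 46.2, 44.3),
    ("Dyspnea", 17.2, 16.2),
    ("Pain", 26.9, 25.5),
]


def effect_size(score_a, score_b, sd=ASSUMED_SD):
    """Absolute difference between two group scores, in SD units."""
    return abs(score_a - score_b) / sd


effect_sizes = {name: effect_size(a, b) for name, a, b in items}
for name, es in effect_sizes.items():
    print(f"{name}: {es:.2f} SD")
```

With this assumed SD, each difference falls below the 0.2 SD shift conventionally labelled a small effect, even though every p-value was statistically significant.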

The Six Papers • 1) Methods used to date • 2) Group versus individual differences • 3) Single item versus multi-item • 4) Patient, clinician, population perspectives • 5) Changes over time • 6) Practical considerations for specific audiences • MCP, April, May, June 2002

No single statistical decision rule or procedure can take the place of well-reasoned consideration of all aspects of the data by a group of concerned, competent, and experienced persons with a wide range of scientific backgrounds and points of view. Canner (1981) If it looks like a duck, sounds like a duck, and walks like a duck, the odds of it being a worm or an elephant in a clever disguise are small in the extreme. Sloan (2001)

Bottom Line • Assessing the clinical significance of QOL can be as simple as a 10-point change on a 100-point scale, if that is consistent with the goals of the scientific enquiry. The real issue underlying the controversy over QOL is the relative novelty and lack of experience that presently exists with QOL. With time and familiarity this too shall pass. (Sloan, J Chronic Obs. Pul. Dis. 2: 57-62, 2005.)

Presenting global solutions is always interesting

Two general methods for clinical significance • Anchor-based methods requirements • independent interpretable measure (the anchor) which has appreciable correlation between anchor and target • Distribution-based methods • rely on expression of magnitude of effect in terms of measure of variability of results (effect size)

The MID method in one slide

The Empirical Rule Effect Size (ERES) Approach (Sloan et al, Cancer Integrative Medicine 1(1):41-47, 2003) • QOL tool range = 6 standard deviations • SD estimate = 100 percent / 6 = 16.7% of theoretical range • Two-sample t-test effect sizes (J Cohen, 1988): small, moderate, large effect (0.2, 0.5, 0.8 SD shift) • S, M, L effects = 3%, 8%, 12% of range
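The ERES arithmetic can be written out as a short calculation. This is a minimal sketch of the steps listed on the slide, not code from the cited paper; the function and constant names are illustrative.

```python
# Cohen's conventional effect sizes, in SD units.
COHEN_SHIFTS = {"small": 0.2, "moderate": 0.5, "large": 0.8}


def eres_thresholds(scale_min=0.0, scale_max=100.0):
    """Small/moderate/large thresholds in scale points, taking the
    SD estimate to be one-sixth of the theoretical range."""
    scale_range = scale_max - scale_min
    sd_estimate = scale_range / 6
    return {label: shift * sd_estimate for label, shift in COHEN_SHIFTS.items()}


thresholds = eres_thresholds(0, 100)
for label, points in thresholds.items():
    print(f"{label}: {points:.1f} points on a 0-100 scale")
```

On a 0-100 scale this gives roughly 3.3, 8.3, and 13.3 points, close to the 3%, 8%, and 12% quoted on the slide. The same function can be applied to any tool by passing its theoretical minimum and maximum.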

The Empirical Rule • Tchebyshev’s Theorem: at least 1 - 1/k^2 of any distribution will fall within k standard deviations (SDs) of the mean • If the distribution is symmetric, 99% will fall within 3 standard deviations • The pdf for the range is a function of the SD • an estimate of the SD can be obtained via • range = 6 SD
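The gap between the distribution-free bound and a bell-shaped distribution can be checked directly. The sketch below compares the Tchebyshev lower bound with the coverage of a normal distribution, used here as an assumed example of a symmetric distribution; the function names are illustrative.

```python
from statistics import NormalDist


def chebyshev_lower_bound(k):
    """Minimum fraction of any distribution within k SDs of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, 1 - 1 / k**2)


def normal_coverage(k):
    """Fraction of a normal distribution within k SDs of the mean."""
    std_normal = NormalDist()
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"k={k}: bound >= {chebyshev_lower_bound(k):.3f}, "
          f"normal = {normal_coverage(k):.3f}")
```

For k = 3 the bound guarantees only about 89% coverage for an arbitrary distribution, while a normal distribution places more than 99% within 3 SDs, which is the basis for treating the range as roughly 6 SDs.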

Assumption Checking for ERES (Dueck, Sloan, 2006, J. Biopharm. Stats, in press) • Tchebyshev’s Inequality is conservative • Tested the effect of various distributional assumptions • Only a uniform distribution results in deviation from the assumption of a 6 SD-based estimate (28% instead of 17%)
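The uniform-distribution figure can be reproduced from first principles, since the SD of a uniform distribution is its range divided by the square root of 12. The sketch below is an illustration, not the simulation from the cited paper: it computes that fraction analytically and confirms it with a seeded simulation on an assumed 0-100 scale.

```python
import random
from math import sqrt
from statistics import pstdev


def uniform_sd_fraction():
    """SD of a uniform distribution as a fraction of its range."""
    return 1 / sqrt(12)


def simulated_sd_fraction(n_samples=20_000, seed=0):
    """Empirical SD / range for uniform draws on an assumed 0-100 scale."""
    rng = random.Random(seed)
    draws = [rng.uniform(0, 100) for _ in range(n_samples)]
    return pstdev(draws) / 100


analytic = uniform_sd_fraction()
simulated = simulated_sd_fraction()
print(f"Uniform SD as share of range: {analytic:.1%} (simulated {simulated:.1%})")
print(f"ERES assumption (range / 6):  {1 / 6:.1%}")
```

Both values come out near 29% of the range, in line with the 28% quoted on the slide and well above the 17% implied by range = 6 SD.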

All Methods Give Similar Answers • Cohen - 1/2 SD is moderate effect • MCID - 1/2 point on 7-point Likert • 7-1 = 6 point range ==> SD of 1 unit • so 1/2 point ==> 1/2 SD • Cella - 10 point on FACT-G • 10/1.12 = 8.9% / 16.7% = 1/2 SD • Feinstein - correlation approach • Cohen was arbitrary, should be 0.6 SD
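The conversions on this slide can be followed step by step. The sketch below simply repeats the slide's arithmetic, with each method's threshold expressed in SD units under the range = 6 SD assumption; the 1.12 divisor is taken from the slide as-is, and the variable names are illustrative.

```python
# SD as a percentage of the theoretical range under the range = 6 SD assumption.
ERES_SD_PCT = 100 / 6  # about 16.7%

# MCID: half a point on a 7-point Likert scale.
likert_range = 7 - 1              # 6-point range
likert_sd = likert_range / 6      # SD of 1 unit
likert_es = 0.5 / likert_sd       # half a point in SD units

# Cella: 10 points on the FACT-G, converted to percent of range
# using the divisor shown on the slide.
cella_pct = 10 / 1.12             # about 8.9% of range
cella_es = cella_pct / ERES_SD_PCT

print(f"MCID (Likert): {likert_es:.2f} SD")
print(f"Cella (FACT-G): {cella_es:.2f} SD")
```

Both thresholds land at about half an SD, between Cohen's moderate effect (0.5 SD) and the 0.6 SD attributed to Feinstein.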

The Good News • Statistical, Philosophical, Empirical, Clinical, Historical, Practical approaches to defining a clinically significant effect for symptom assessments are all in the same ballpark • A 10 point difference on a 100-point scale (1/2 SD) is almost always going to be clinically significant • Smaller differences may also be meaningful (data) • Applies to groups or individuals (just different SD) Norman GR, Sloan JA, Wyrwich KW. Expert Review of Pharmacoeconomics and Outcomes Research Sept 2004; 4(5): 515 – 519 Sloan JA, Cella D, Hays R. J Clin Epidemiol (in press).

Four Guidelines (Sloan, Cella, Hays, JCE 2005, in press) • The method used to obtain an estimate of clinical significance should be scientifically supportable. • The ½ SD is a conservative estimate of an effect size that is likely to be clinically meaningful. An effect size greater than ½ SD is not likely to be one that can be ignored. In the absence of other information, the ½ SD is a reasonable and scientifically supportable estimate of a meaningful effect.

Four Guidelines (Sloan, Cella, Hays, JCE 2005, in press) • Effect sizes below ½ SD, supported by data regarding the specific characteristics of a particular QOL assessment or application, may also be meaningful. The minimally important difference may be below ½ SD in such cases. • If feasible, multiple approaches to estimating a tool’s clinically meaningful effect size in multiple patient groups are helpful in assessing the variability of the estimates. However, the lack of multiple approaches with multiple groups should not preemptively restrict application of information gained to date.

Summary • Defining clinical significance for QOL assessments is today where pain was 25 years ago, tumor response was 50 years ago and blood pressure was 100 years ago • Define clinical significance a priori, and use the definition in the analytical process • Consensus is building as the answers from different approaches are similar and relatively robust

New ideas have enabled us to make advances in QOL science

A Mayo/FDA meeting regarding guidance on patient-reported outcomes (PRO): Discussion, Education, and Operationalization • FDA to release guidances for assessing PROs in all clinical trials (3rd quarter 2005?) • Meeting co-sponsored with FDA to: • provide a focused process to facilitate discussion among all stakeholders • educate stakeholders on background, content, and concerns • provide an opportunity for input • delineate ways to best operationalize the guidance into clinical trials • February 23-25, 2006, DC (Westfields Marriott, Chantilly, VA, 7 miles from Dulles)

The NCCTG QOL Team

Thank you Email: jsloan@mayo.edu
