Evaluating Health-Related Quality of Life Measures

Ron D. Hays, Ph.D.
UCLA GIM & HSR
February 10, 2014 (9:00-11:50 am)
HPM 214, Los Angeles, CA

Where are we now in HPM 214? (http://hpm214.med.ucla.edu/)
• Introduction
• Profile Measures
• Preference-Based Measures
• Designing Measures
• Evaluating Measures
• Use of Measures in HIV/AIDS
• PROMIS/IRT
• Course Review (Cognitive interview assignment due)
• Final Exam (3/17/14)

Four Levels of Measurement
• Nominal (categorical)
• Ordinal (rank)
• Interval (numerical)
• Ratio (numerical)

Ordinal Scale
• In general, would you say your health is …
• Excellent?
• Very good?
• Good?
• Fair?
• Poor?

Ordinal Scale
• In general, would you say your health is …
• 100 = Excellent?
• 75 = Very good? [85]
• 50 = Good? [60]
• 25 = Fair?
• 0 = Poor?

Interval Scales
• "Everyday" temperature scales
• Fahrenheit
• Centigrade
• 20°C + 20°C = 40°C
• 40°C ≠ 2 times as hot as 20°C
A 4-year-old is twice as old as a 2-year-old. If you subtract 1 from both of their ages, 4 becomes 3 and 2 becomes 1. The 4-year-old is still twice as old as the 2-year-old even though the new age values are 3 versus 1 (i.e., "0" no longer means zero years).
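Here is a minimal Python sketch of the interval-scale point. It uses the standard Celsius-to-Fahrenheit and Celsius-to-Kelvin conversions to show that ratios of differences survive a change of unit and zero point, while ratios of raw values do not. The helper names are illustrative only.

```python
def c_to_f(c: float) -> float:
    """Celsius to Fahrenheit: a linear transform with a new unit and a new zero."""
    return c * 9 / 5 + 32


def c_to_k(c: float) -> float:
    """Celsius to Kelvin: same unit, zero moved to absolute zero."""
    return c + 273.15


def ratio(a: float, b: float) -> float:
    """Ratio of two raw scale values."""
    return a / b


def diff_ratio(a: float, b: float, base: float) -> float:
    """Ratio of two differences, (a - base) / (b - base)."""
    return (a - base) / (b - base)


temps_c = (40.0, 20.0)
print("Celsius ratio   :", ratio(*temps_c))               # 2.0
print("Fahrenheit ratio:", ratio(*map(c_to_f, temps_c)))  # 104/68, not 2
print("Kelvin ratio    :", ratio(*map(c_to_k, temps_c)))  # about 1.07, not 2

# Ratios of differences are invariant: (40 - 10) / (25 - 10) = 2 on every scale.
for convert in (lambda c: c, c_to_f, c_to_k):
    print(diff_ratio(convert(40.0), convert(25.0), convert(10.0)))
```

"Twice as hot" depends on where zero sits, so it is meaningful only on the Kelvin (ratio) scale. Statements about differences hold on any interval scale.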

Ratio Scales • Kelvin Temperature Scale (absolute 0) • Age • Days spent in hospital in last 30 days

Measurement Range for HRQOL Measures [Figure: Nominal, Ordinal, Interval, Ratio]

Levels of Measurement and Their Properties

Four Types of Data Collection Errors
• Coverage Error: Does each person in the target population have an equal chance of selection?
• Sampling Error: Only some members of the target population are sampled.
• Nonresponse Error: Do people in the sample who respond differ from those who do not?
• Measurement Error: Inaccuracy in answers given to survey questions.

Characteristics of Good Measures • Acceptability • Variability • Reliability • Validity • Interpretability

Indicators of Acceptability • Response rate • Administration time • Missing data (item, scale)

Variability
• Responses fall in each response category
• Distribution approximates a bell-shaped "normal" curve (about 68.3%, 95.4%, and 99.7% of scores within ±1, ±2, and ±3 SD of the mean)
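Both variability checks can be sketched in a few lines of Python. The normal-curve percentages come from the error function, since P(|Z| < k) = erf(k/√2). Category coverage is a simple frequency count. The response data below are hypothetical.

```python
import math
from collections import Counter


def share_within_k_sd(k: float) -> float:
    """Proportion of a normal distribution within +/- k SD of the mean."""
    return math.erf(k / math.sqrt(2))


def category_counts(responses, categories):
    """Count responses per category, listing unused categories with 0."""
    counts = Counter(responses)
    return {c: counts.get(c, 0) for c in categories}


for k in (1, 2, 3):
    print(f"within +/-{k} SD: {100 * share_within_k_sd(k):.2f}%")
# within +/-1 SD: 68.27%
# within +/-2 SD: 95.45%
# within +/-3 SD: 99.73%

# Hypothetical responses on a 1 (Poor) to 5 (Excellent) item.
print(category_counts([3, 4, 3, 2, 5, 2], categories=range(1, 6)))
```

An empty category (here, nobody chose 1 = Poor) is the kind of gap the first variability criterion is meant to flag.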

Reliability
Reliability is the degree to which the same score is obtained for the thing being measured (person, plant, or whatever) when that thing hasn't changed.
• Ratio of signal to noise

Observed Score is: True Score + Error
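In the classical test theory model (observed score = true score + error), reliability is the share of observed-score variance that is true-score variance. The simulation sketch below assumes normally distributed true scores (SD 2) and independent errors (SD 1). Under those assumptions it should give a reliability near 4 / (4 + 1) = 0.80, both as a variance ratio and as the correlation between two parallel administrations. The function and its parameters are hypothetical.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_reliability(n=50_000, sd_true=2.0, sd_error=1.0, seed=214):
    """Return (var(T)/var(X), r between two parallel forms) for simulated data."""
    rng = random.Random(seed)
    true = [rng.gauss(0, sd_true) for _ in range(n)]
    # Two parallel administrations: same true score, fresh independent error.
    form1 = [t + rng.gauss(0, sd_error) for t in true]
    form2 = [t + rng.gauss(0, sd_error) for t in true]
    variance_ratio = statistics.pvariance(true) / statistics.pvariance(form1)
    return variance_ratio, pearson(form1, form2)


ratio, parallel_r = simulate_reliability()
print(f"var(T)/var(X) = {ratio:.3f}; r(form1, form2) = {parallel_r:.3f}")
```

The parallel-forms correlation is why test-retest and inter-rater agreement can estimate reliability: the shared true score is the only source of correlation between the two forms.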

Flavors of Reliability • Inter-rater (rater) • Need 2 or more raters of the thing being measured • Test-retest (administrations) • Need 2 or more time points • Internal consistency (items) • Need 2 or more items

Reliability Minimum Standards
• 0.70 or above (for group comparisons)
• 0.90 or higher (for individual assessment)
$SEM = SD\sqrt{1 - \text{reliability}}$
95% CI = observed score ± 1.96 × SEM
If z-score = 0 and reliability = 0.90, then CI: -0.62 to +0.62 (width of CI is 1.24 z-score units)
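The SEM and confidence-interval arithmetic can be checked with a small Python sketch. It works in the z-score metric (SD = 1) and compares the two minimum reliability standards. The function names are illustrative.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def ci95(score: float, sd: float, reliability: float):
    """95% confidence interval around a score: score +/- 1.96 * SEM."""
    half_width = 1.96 * sem(sd, reliability)
    return score - half_width, score + half_width


for rel in (0.70, 0.90):
    lo, hi = ci95(0.0, sd=1.0, reliability=rel)  # z-score metric
    print(f"reliability {rel:.2f}: CI {lo:+.2f} to {hi:+.2f}, width {hi - lo:.2f}")
# reliability 0.70: CI -1.07 to +1.07, width 2.15
# reliability 0.90: CI -0.62 to +0.62, width 1.24
```

Passing `sd=10.0` gives the same intervals in a T-score metric. The interval is still about 1.2 SD wide at reliability 0.90, which is why individual-level use demands the higher standard.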

Hypothetical Ratings of Performance of Six Students in HPM 214 by Two Raters Using Excellent to Poor Scale
[1 = Poor; 2 = Fair; 3 = Good; 4 = Very good; 5 = Excellent]
1 = John (Good, Very Good)
2 = Ida (Very Good, Excellent)
3 = Di (Good, Good)
4 = Claire (Fair, Poor)
5 = Adriane (Excellent, Very Good)
6 = Ara (Fair, Fair)
(Target = 6 students; assessed by 2 raters)

Kappa Coefficient of Agreement (Corrects for Chance)

Cross-Tab of Ratings (Rater 1 by Rater 2)

Calculating KAPPA
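Here is a minimal Python sketch of unweighted Cohen's kappa, kappa = (p_o - p_e)/(1 - p_e), applied to the six students' ratings from the hypothetical performance example (Poor = 1 to Excellent = 5). The steps work out as follows:
• The raters agree on 2 of 6 students, so p_o = 0.333.
• Chance agreement from the raters' marginals is p_e = 7/36 = 0.194.
• Kappa = 5/29 ≈ 0.17.

The function name is illustrative.

```python
from collections import Counter


def cohen_kappa(r1, r2):
    """Unweighted Cohen's kappa for two raters' category labels."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of the raters' marginal proportions per category.
    p_exp = sum(c1[c] * c2[c] for c in c1) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


# John, Ida, Di, Claire, Adriane, Ara (1 = Poor ... 5 = Excellent)
rater1 = [3, 4, 3, 2, 5, 2]
rater2 = [4, 5, 3, 1, 4, 2]
print(round(cohen_kappa(rater1, rater2), 2))  # 0.17
```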

Guidelines for Interpreting Kappa

Weighted Kappa (Linear and Quadratic)
$W_l = 1 - \frac{i}{k - 1}$
$W_q = 1 - \frac{i^2}{(k - 1)^2}$
i = number of categories ratings differ by; k = n of categories
Linear weighted kappa = 0.52; Quadratic weighted kappa = 0.77
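The weighted kappas can be reproduced with a short Python sketch. It applies the linear and quadratic agreement weights to both the observed and the chance-expected agreement, using the same six pairs of ratings (Poor = 1 to Excellent = 5). The function name and the `scheme` argument are illustrative.

```python
from collections import Counter


def weighted_kappa(r1, r2, categories, scheme="linear"):
    """Weighted kappa with linear or quadratic agreement weights.

    i = number of categories the two ratings differ by, k = number of categories:
    linear w = 1 - i/(k-1); quadratic w = 1 - i**2/(k-1)**2.
    """
    k = len(categories)
    pos = {c: j for j, c in enumerate(categories)}

    def weight(a, b):
        i = abs(pos[a] - pos[b])
        if scheme == "linear":
            return 1 - i / (k - 1)
        if scheme == "quadratic":
            return 1 - i ** 2 / (k - 1) ** 2
        raise ValueError(f"unknown scheme: {scheme}")

    n = len(r1)
    p_obs = sum(weight(a, b) for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Expected weighted agreement from every pairing of the two raters' marginals.
    p_exp = sum(c1[a] * c2[b] * weight(a, b)
                for a in categories for b in categories) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


rater1 = [3, 4, 3, 2, 5, 2]
rater2 = [4, 5, 3, 1, 4, 2]
scale = [1, 2, 3, 4, 5]
print(round(weighted_kappa(rater1, rater2, scale, "linear"), 2))     # 0.52
print(round(weighted_kappa(rater1, rater2, scale, "quadratic"), 2))  # 0.77
```

Partial credit for near-misses explains the jump from the unweighted value. Four of the six pairs differ by only one category, and the quadratic scheme penalizes those least.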

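Intraclass Correlation and Reliability
Model: reliability; intraclass correlation (ICC)
• One-way: reliability = $\frac{BMS - WMS}{BMS}$; ICC = $\frac{BMS - WMS}{BMS + (k - 1)WMS}$
• Two-way mixed: reliability = $\frac{BMS - EMS}{BMS}$; ICC = $\frac{BMS - EMS}{BMS + (k - 1)EMS}$
• Two-way random: reliability = $\frac{N(BMS - EMS)}{N \cdot BMS + JMS - EMS}$; ICC = $\frac{N(BMS - EMS)}{N \cdot BMS + N(k - 1)EMS + k(JMS - EMS)}$
BMS = Between Ratee Mean Square; WMS = Within Mean Square; JMS = Item or Rater Mean Square; EMS = Ratee x Item (Rater) Mean Square; N = n of ratees; k = n of items or raters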

Two-Way Random Effects (Reliability of Performance Ratings)
Data (student, rater, rating): 01 1 3; 01 2 4; 02 1 4; 02 2 5; 03 1 3; 03 2 3; 04 1 2; 04 2 1; 05 1 5; 05 2 4; 06 1 2; 06 2 2
Source                  df     SS      MS
Students (BMS)           5   15.67    3.13
Raters (JMS)             1    0.00    0.00
Stud. x Raters (EMS)     5    2.00    0.40
Total                   11   17.67
2-way R = $\frac{6(3.13 - 0.40)}{6(3.13) + 0.00 - 0.40} = 0.89$
ICC = 0.80
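The mean squares in this ANOVA table can be recomputed from the raw ratings with a minimal Python sketch of a two-way (ratee x rater) decomposition with one score per cell. The sketch then plugs them into the two-way random reliability (mean of k raters) and single-rater ICC formulas. The function name is illustrative.

```python
def two_way_mean_squares(data):
    """Mean squares for a ratee x rater (or item) table with one score per cell.

    Returns (BMS, JMS, EMS): between-ratee, rater/item, and residual mean squares.
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_resid = ss_total - ss_rows - ss_cols
    return (ss_rows / (n - 1),
            ss_cols / (k - 1),
            ss_resid / ((n - 1) * (k - 1)))


# Rows = students 01-06, columns = raters 1 and 2 (1 = Poor ... 5 = Excellent).
ratings = [[3, 4], [4, 5], [3, 3], [2, 1], [5, 4], [2, 2]]
bms, jms, ems = two_way_mean_squares(ratings)
n, k = len(ratings), len(ratings[0])
reliability = n * (bms - ems) / (n * bms + jms - ems)            # mean of k raters
icc = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)  # single rater
print(f"BMS={bms:.2f} JMS={jms:.2f} EMS={ems:.2f}")  # BMS=3.13 JMS=0.00 EMS=0.40
print(f"2-way R={reliability:.2f} ICC={icc:.2f}")    # 2-way R=0.89 ICC=0.80
```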

Responses of Students to Two Questions about Their Health
1 = John (Good, Very Good)
2 = Ida (Very Good, Excellent)
3 = Di (Good, Good)
4 = Claire (Fair, Poor)
5 = Adriane (Excellent, Very Good)
6 = Ara (Fair, Fair)
(Target = 6 students; assessed by 2 items)

Two-Way Mixed Effects (Cronbach’s Alpha)
Data (respondent, item 1, item 2): 01 3 4; 02 4 5; 03 3 3; 04 2 1; 05 5 4; 06 2 2
Source                  df     SS      MS
Respondents (BMS)        5   15.67    3.13
Items (JMS)              1    0.00    0.00
Resp. x Items (EMS)      5    2.00    0.40
Total                   11   17.67
Alpha = $\frac{3.13 - 0.40}{3.13} = \frac{2.73}{3.13} = 0.87$
ICC = 0.77
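Cronbach's alpha can also be computed directly from item and total-score variances, alpha = k/(k - 1) × (1 - Σ item variances / total variance). This Python sketch applies that formula to the two health items. For these data it gives the same 0.87 as (BMS - EMS)/BMS (exactly 41/47). The function name is illustrative.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha from a list of item-score lists (one list per item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(statistics.variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / statistics.variance(totals))


# Item 1 and item 2 responses for respondents 01-06 (1 = Poor ... 5 = Excellent).
item1 = [3, 4, 3, 2, 5, 2]
item2 = [4, 5, 3, 1, 4, 2]
print(round(cronbach_alpha([item1, item2]), 2))  # 0.87
```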

Rating of 6 Students’ Health by 12 Family Members (2 per student)
1. John (fam1: Good, fam2: Very Good)
2. Ida (fam3: Very Good, fam4: Excellent)
3. Di (fam5: Good, fam6: Good)
4. Claire (fam7: Fair, fam8: Poor)
5. Adriane (fam9: Excellent, fam10: Very Good)
6. Ara (fam11: Fair, fam12: Fair)
(Target = 6 students; assessed by 2 family members each)

One-Way ANOVA (Reliability of Ratings of Students)
Data (student, family member, rating): 01 1 3; 01 2 4; 02 3 4; 02 4 5; 03 5 3; 03 6 3; 04 7 2; 04 8 1; 05 9 5; 05 10 4; 06 11 2; 06 12 2
Source                  df     SS      MS
Respondents (BMS)        5   15.67    3.13
Within (WMS)             6    2.00    0.33
Total                   11   17.67
1-way R = $\frac{3.13 - 0.33}{3.13} = \frac{2.80}{3.13} = 0.89$
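Because each student is rated by different family members, rater effects cannot be separated from error. Only between- and within-student mean squares are available. Here is a minimal Python sketch of that one-way decomposition for equal group sizes. It reproduces R = 0.89 and also computes the single-rating ICC, (BMS - WMS)/(BMS + (k - 1)WMS), from the same mean squares. The function name is illustrative.

```python
def one_way_mean_squares(groups):
    """Between-ratee (BMS) and within-ratee (WMS) mean squares, equal group sizes."""
    n, k = len(groups), len(groups[0])
    grand = sum(map(sum, groups)) / (n * k)
    means = [sum(g) / k for g in groups]
    ss_between = k * sum((m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return ss_between / (n - 1), ss_within / (n * (k - 1))


# Each student rated by two different family members (fam1-fam12).
ratings = [[3, 4], [4, 5], [3, 3], [2, 1], [5, 4], [2, 2]]
bms, wms = one_way_mean_squares(ratings)
k = len(ratings[0])
reliability = (bms - wms) / bms             # reliability of the k-rating mean
icc = (bms - wms) / (bms + (k - 1) * wms)   # single-rating ICC
print(f"BMS={bms:.2f} WMS={wms:.2f}")               # BMS=3.13 WMS=0.33
print(f"1-way R={reliability:.2f} ICC={icc:.2f}")   # 1-way R=0.89 ICC=0.81
```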

Standardized Alpha for Different Numbers of Items and Average Inter-item Correlation
Number of items (k) by average inter-item correlation ($\bar{r}$):
k     r=.0    .2     .4     .6     .8     1.0
2     .000   .333   .571   .750   .889   1.000
4     .000   .500   .727   .857   .941   1.000
6     .000   .600   .800   .900   .960   1.000
8     .000   .667   .842   .923   .970   1.000
$\alpha_{st} = \frac{k \cdot \bar{r}}{1 + (k - 1) \cdot \bar{r}}$
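The standardized-alpha table can be regenerated from its formula with a short Python sketch. The function name is illustrative.

```python
def standardized_alpha(k: int, r_bar: float) -> float:
    """Standardized alpha for k items with average inter-item correlation r_bar."""
    return k * r_bar / (1 + (k - 1) * r_bar)


correlations = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
print("k    " + "  ".join(f"{r:5.1f}" for r in correlations))
for k in (2, 4, 6, 8):
    print(f"{k:<4} " + "  ".join(f"{standardized_alpha(k, r):5.3f}" for r in correlations))
```

Reading down a column shows the item-count effect. With the average correlation held at 0.4, going from 2 to 8 items lifts alpha from 0.571 to 0.842.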

Spearman-Brown Prophecy Formula
$\alpha_y = \frac{N \cdot \alpha_x}{1 + (N - 1) \cdot \alpha_x}$
N = how much longer scale y is than scale x

Example Spearman-Brown Calculations
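Here are some illustrative Spearman-Brown calculations as a Python sketch. The starting reliabilities (0.70, 0.80) and the 0.90 target are hypothetical values chosen for illustration. The second function solves the prophecy formula for N, giving N = α_y(1 - α_x) / [α_x(1 - α_y)]. The function names are illustrative.

```python
def spearman_brown(alpha_x: float, n: float) -> float:
    """Projected reliability of a scale n times as long as scale x."""
    return n * alpha_x / (1 + (n - 1) * alpha_x)


def length_factor_needed(alpha_x: float, target: float) -> float:
    """Solve the prophecy formula for n: how many times longer scale x must be."""
    return target * (1 - alpha_x) / (alpha_x * (1 - target))


print(round(spearman_brown(0.70, 2), 3))           # 0.824 (doubling the items)
print(round(spearman_brown(0.80, 0.5), 3))         # 0.667 (halving the items)
print(round(length_factor_needed(0.70, 0.90), 2))  # 3.86
```

Under these assumptions, a scale with reliability 0.70 would need to be almost four times as long to reach the 0.90 standard for individual assessment.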

Number of Items and Reliability: Three Versions of the Mental Health Inventory (MHI)

Multitrait Scaling Analysis • Internal consistency reliability • Item convergence • Item discrimination

Item-scale correlation matrix
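The multitrait scaling checks listed above (item convergence and item discrimination) can be sketched by correlating each item with every scale score. Each item is removed from its own scale's sum (correction for overlap). The item names, scale names, and responses below are hypothetical. The 0.40 convergence cutoff is an assumed rule of thumb, not a value taken from this lecture.

```python
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5


def item_scale_matrix(items, scales):
    """Correlate every item with every scale score.

    items: {item name: list of scores}; scales: {scale name: list of item names}.
    An item is dropped from its own scale's sum (correction for overlap);
    other scales use the full sum.
    """
    n = len(next(iter(items.values())))
    matrix = {}
    for item, scores in items.items():
        row = {}
        for scale, members in scales.items():
            others = [m for m in members if m != item]
            total = [sum(items[m][i] for m in others) for i in range(n)]
            row[scale] = pearson(scores, total)
        matrix[item] = row
    return matrix


items = {  # hypothetical 1-5 responses from 8 respondents
    "pf1": [1, 2, 2, 3, 4, 4, 5, 5],
    "pf2": [1, 1, 2, 3, 3, 4, 5, 4],
    "pf3": [2, 2, 3, 3, 4, 5, 5, 5],
    "mh1": [3, 5, 1, 4, 2, 5, 1, 3],
    "mh2": [2, 5, 2, 4, 1, 4, 2, 3],
    "mh3": [3, 4, 1, 5, 2, 5, 1, 2],
}
scales = {"PF": ["pf1", "pf2", "pf3"], "MH": ["mh1", "mh2", "mh3"]}
home = {item: s for s, members in scales.items() for item in members}
for item, row in item_scale_matrix(items, scales).items():
    own = row[home[item]]
    converges = own >= 0.40  # assumed rule-of-thumb cutoff
    discriminates = all(own > r for s, r in row.items() if s != home[item])
    print(item, {s: round(r, 2) for s, r in row.items()}, converges, discriminates)
```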

Validity
• Does the instrument measure what it is supposed to measure?
• A "validated" instrument is a holy grail

Reliability and Validity

Threats to Validity • Acquiescent Response Set • Socially Desirable Response Set

Listed below are a few statements about your relationships with others. How much is each statement TRUE or FALSE for you?
1. I am always courteous even to people who are disagreeable.
2. There have been occasions when I took advantage of someone.
3. I sometimes try to get even rather than forgive and forget.
4. I sometimes feel resentful when I don’t get my way.
5. No matter who I’m talking to, I’m always a good listener.
Response options: Definitely true; Mostly true; Don’t know; Mostly false; Definitely false

Two Types of Validity • Content Validity • Includes face validity • Construct Validity • Many Synonyms

Content Validity
• Does the measure adequately represent the domain?
• Do items operationalize the concept?
• Do items cover all aspects of the concept?
• Does the scale name represent item content?
• Face validity is the extent to which a measure "appears" to reflect what it is intended to measure
• Assessed, e.g., by expert judges or by patient focus groups

Construct Validity • Do scores on a measure relate to other variables in ways consistent with hypotheses?

Evaluating Construct Validity
Cohen effect size rules of thumb (d = 0.2, 0.5, and 0.8) correspond to:
• Small correlation = 0.100
• Medium correlation = 0.243
• Large correlation = 0.371
$r = \frac{d}{\sqrt{d^2 + 4}} = \frac{0.8}{\sqrt{0.8^2 + 4}} = \frac{0.8}{\sqrt{0.64 + 4}} = \frac{0.8}{\sqrt{4.64}} = \frac{0.8}{2.154} = 0.371$ (for two groups of equal size)
(Beware: r values of 0.10, 0.30, and 0.50 are often cited as small, medium, and large.)
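The d-to-r conversion, and its inverse d = 2r/√(1 - r²), can be sketched in Python. Running the inverse on the often-cited r values of 0.10, 0.30, and 0.50 shows why the "beware" note matters: they imply much larger d values than Cohen's 0.2/0.5/0.8. Both formulas assume two groups of equal size, and the function names are illustrative.

```python
import math


def d_to_r(d: float) -> float:
    """Convert Cohen's d to a point-biserial r (assumes equal group sizes)."""
    return d / math.sqrt(d ** 2 + 4)


def r_to_d(r: float) -> float:
    """Inverse conversion: point-biserial r to Cohen's d (equal group sizes)."""
    return 2 * r / math.sqrt(1 - r ** 2)


for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    print(f"{label:6} d = {d}: r = {d_to_r(d):.3f}")
# small  d = 0.2: r = 0.100
# medium d = 0.5: r = 0.243
# large  d = 0.8: r = 0.371

for r in (0.10, 0.30, 0.50):
    print(f"r = {r:.2f}: d = {r_to_d(r):.3f}")
# r = 0.10: d = 0.201
# r = 0.30: d = 0.629
# r = 0.50: d = 1.155
```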

Average HRQOL Scores for Comparison Groups and Deviation Scores for Patients With Chronic Conditions
From Stewart AL, Greenfield S, Hays RD, et al. Functional status and well-being of patients with chronic conditions. JAMA 1989;262:907-913.

Relative Validity Analyses • Form of "known groups" validity • Relative sensitivity of measure to important clinical difference • One-way between group ANOVA

Relative Validity Example
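Relative validity is commonly summarized as the ratio of each measure's one-way ANOVA F statistic to the F of a reference measure, often the measure that best discriminates the groups. The Python sketch below shows the calculation. The measure names, groups, and scores are hypothetical and made up for illustration, and using "Measure A" as the reference is an assumption.

```python
from statistics import fmean


def one_way_f(groups):
    """One-way between-groups ANOVA F statistic."""
    scores = [x for g in groups for x in g]
    grand = fmean(scores)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def relative_validity(f_by_measure, reference):
    """F of each measure divided by the F of the reference measure."""
    return {m: f / f_by_measure[reference] for m, f in f_by_measure.items()}


# Hypothetical 0-100 scores for mild, moderate, and severe groups.
measures = {
    "Measure A": [[80, 85, 90, 75], [65, 70, 60, 72], [45, 50, 55, 40]],
    "Measure B": [[70, 60, 85, 75], [65, 55, 75, 60], [50, 65, 55, 60]],
}
f_stats = {m: one_way_f(groups) for m, groups in measures.items()}
for m, rv in relative_validity(f_stats, reference="Measure A").items():
    print(f"{m}: F = {f_stats[m]:.1f}, relative validity = {rv:.2f}")
```

A relative validity below 1 means the measure separates the known groups less sharply than the reference does.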

Responsiveness to Change
• HRQOL measures should be responsive to interventions that change HRQOL
• Need external indicators of change (anchors)

Self-Report Indicator of Change
• Overall, has there been any change in your asthma since the beginning of the study?
Much improved; Moderately improved; Minimally improved; No change; Minimally worse; Moderately worse; Much worse
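Here is a minimal Python sketch of how an anchor question like this one can be used. Each patient's change score on the HRQOL measure is grouped by their self-reported change, and the sketch checks that mean change rises across the anchor categories. The change scores are hypothetical. The monotonicity check is one simple summary, not a full responsiveness analysis.

```python
from collections import defaultdict
from statistics import fmean

ANCHOR_LEVELS = [
    "Much worse", "Moderately worse", "Minimally worse", "No change",
    "Minimally improved", "Moderately improved", "Much improved",
]


def mean_change_by_anchor(records):
    """records: (anchor response, follow-up minus baseline score) pairs."""
    grouped = defaultdict(list)
    for anchor, change in records:
        grouped[anchor].append(change)
    # Keep the anchor order; skip categories nobody chose.
    return {a: fmean(grouped[a]) for a in ANCHOR_LEVELS if grouped[a]}


def increases_with_anchor(means):
    """True if mean change strictly increases from 'worse' to 'improved'."""
    ordered = list(means.values())  # dict preserves ANCHOR_LEVELS order
    return all(a < b for a, b in zip(ordered, ordered[1:]))


# Hypothetical patients: (anchor response, change in asthma HRQOL score).
records = [
    ("Much worse", -12), ("Minimally worse", -3), ("No change", 1),
    ("No change", -1), ("Minimally improved", 4), ("Moderately improved", 8),
    ("Much improved", 15), ("Much improved", 11),
]
means = mean_change_by_anchor(records)
print(means)
print("Mean change increases across anchor levels:", increases_with_anchor(means))
```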
