Best-Worst Scaling in Health: Pros and Cons

Advantages and disadvantages of the use of best-worst scaling in the field of health. Terry Flynn PhD, MRC HSRC, Bristol

Outline • What is best-worst scaling? • How has it been used in HSR to date? • Application: dermatology trial • Application: quality of life • Advantages and disadvantages • Areas for research

Traditional DCEs • Discrete Choice Experiments increasingly used in HSR • Respondents choose a preferred specification of the good or service • Aim is to obtain quantitative estimates of utility (benefit) associated with different attribute levels describing the good or service

The issue of interest here: Generally, dermatology patients would prefer: • Being seen by a consultant-led team rather than a GP with a part-time interest in dermatology • An appointment this week rather than one in 3 months. But suppose the choice is between an appointment this week with a GP specialist and one in 3 months with a consultant. Which do patients value more: doctor expertise or waiting time?

An example
Appointment A          Appointment B
Appt this week         Appt in 3 months
GP specialist          Consultant
Easy to get to         Difficult to get to
(S)he is thorough      Isn't thorough
You pay £5             You do not pay
Which appointment would you choose?

Application 1: Estimating preferences for aspects of a dermatology appointment

Dermatology trial example

Best-Worst Scaling • Devised by Finn & Louviere (JPPM 1992) • Introduced to health care by McIntosh & Louviere (HESG 2002) • Statistical proof paper: Marley & Louviere (J Math Psych 2005) • 'User guide' by Flynn et al (JHE 2006) • Differs from traditional DCEs in the nature of the choice task • Individuals choose the best and the worst attribute level from among those displayed in a given specification
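To make the choice task concrete, here is a minimal sketch of the maxdiff model often used to analyse best-worst data: the probability of picking level i as best and level j as worst is taken to be proportional to exp(u_i - u_j), normalised over every ordered pair of distinct displayed levels. The function name, level labels and utility values are hypothetical illustrations, not estimates from the studies described here.

```python
import math
from itertools import permutations


def maxdiff_probabilities(utilities):
    """Return P(best = i, worst = j) for every ordered pair of distinct levels.

    `utilities` maps each displayed attribute level to a (hypothetical) utility.
    Under the maxdiff model the weight of (i, j) is exp(u_i - u_j).
    """
    weights = {
        (best, worst): math.exp(utilities[best] - utilities[worst])
        for best, worst in permutations(utilities, 2)
    }
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


# Hypothetical utilities for the four levels shown in one dermatology scenario
profile = {
    "wait 1 week": 1.5,
    "GPSI": -0.2,
    "difficult access": -0.6,
    "thorough": 1.2,
}
probs = maxdiff_probabilities(profile)
best_pair = max(probs, key=probs.get)
print(len(probs))   # 12 possible best-worst answers for 4 displayed levels
print(best_pair)    # ('wait 1 week', 'difficult access')
```

The most likely answer is the pair with the largest utility difference, which is what makes best-worst choices informative about the whole utility scale rather than only its top.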

Dermatology trial • Patients who had been referred to secondary care for a skin complaint • Postal questionnaire • Randomly assigned to a short version (8 DCE scenarios) or a long version (16) • 202 out of 240 questionnaires returned (139 complete) • Each scenario is a SINGLE consultation described by waiting time, expertise of doctor, ease of attending and thoroughness

Attributes & levels • Waiting time • 3 months • 2 months • 1 month • 1 week • Doctor expertise • Part time specialist (GPSI) • Full time specialist (consultant) • Ease of access • Easy • Difficult • Individualised care • Thorough • Not thorough

Attribute levels

Attribute impacts

BWS estimated differences

Multinomial (conditional) logit analysis • Effect of patient characteristics (clinical or sociodemographic) upon preferences • Separate effects of age/sex etc. upon attribute importance from effects upon level scales • Independent variables are a version of effects coding. Epidemiological example: the mean effect across both sexes is estimated, with the effect code giving the additional effect for one sex (the effect for the other sex is this multiplied by minus 1)
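A minimal sketch of effects coding for one categorical variable, assuming the last listed level is the omitted one: the omitted level is coded -1 in every column, so its effect is minus the sum of the estimated effects. The function name and level labels are hypothetical; which sex receives +1 is an arbitrary choice here.

```python
def effects_code(level, levels):
    """Effects-code one value of a categorical variable.

    `levels` lists every level; the last one is treated as the omitted level.
    Returns len(levels) - 1 columns: 1 in the observed level's column and 0
    elsewhere, or -1 in every column when the observed level is the omitted one.
    """
    if level not in levels:
        raise ValueError(f"unknown level: {level!r}")
    *coded, omitted = levels
    if level == omitted:
        return [-1] * len(coded)
    return [1 if level == lv else 0 for lv in coded]


waiting = ["3m", "2m", "1m", "0m"]     # "0m" (one week) omitted, as in the tables
print(effects_code("2m", waiting))     # [0, 1, 0]
print(effects_code("0m", waiting))     # [-1, -1, -1]

sex = ["female", "male"]               # hypothetical: "male" is the omitted level
print(effects_code("female", sex))     # [1]
print(effects_code("male", sex))       # [-1]
```

With two levels this reduces to a single ±1 column, which is the "multiplied by minus 1" relationship described above.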

Fully adjusted MNL results
              |   Estimate   Std Error       z    p>|z|   [95% confidence interval]
Attributes    |
Waiting time  |          -           -       -        -            -            -
Dr            |   1.342555    .1117852   12.01    0.000      1.12346     1.561650
Convenience   |   .5544422    .1045399    5.30    0.000     .3495477     .7593367
Indivcare     |   .3628801    .1053237    3.45    0.001     .1564495     .5693108
Levels        |
wait3m        |  -1.958953    .1605818  -12.20    0.000    -2.273687    -1.644218
wait2m        |  -1.117335    .1493553   -7.48    0.000    -1.410066    -.8246039
wait1m        |   .2137621    .1457884    1.47    0.143    -.0719779      .499502
wait0m        |   2.862526           -       -        -            -            -
drpttime      |  -1.470253    .1035633  -14.20    0.000    -1.673234    -1.267273
drfulltime    |   1.470253           -       -        -            -            -
convhard      |  -1.185982     .102335  -11.59    0.000    -1.386555    -.9854091
conveasy      |   1.185982           -       -        -            -            -
indivno       |  -2.843362    .1205684  -23.58    0.000    -3.079671    -2.607052
indivyes      |   2.843362           -       -        -            -            -
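The table can be checked by hand. A minimal sketch, assuming the usual Wald statistics (z = estimate / standard error, two-sided normal p-value, estimate ± 1.96 SE) and that the omitted effects-coded level equals minus the sum of the estimated ones; the helper name `wald_summary` is hypothetical.

```python
import math


def wald_summary(estimate, se, z_crit=1.959964):
    """z statistic, two-sided normal p-value and 95% CI for one coefficient."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return z, p, (estimate - z_crit * se, estimate + z_crit * se)


# Estimated waiting-time level coefficients from the table (wait0m is omitted)
waiting = {"wait3m": -1.958953, "wait2m": -1.117335, "wait1m": 0.2137621}
wait0m = -sum(waiting.values())  # effects-coded levels sum to zero
print(round(wait0m, 6))          # 2.862526, matching the table

z, p, ci = wald_summary(0.2137621, 0.1457884)  # the wait1m row
print(round(z, 2), round(p, 3))                # 1.47 0.143
print(ci)  # close to the table's (-.0719779, .499502)
```

The same "minus the sum" rule gives drfulltime = -drpttime, conveasy = -convhard and indivyes = -indivno for the two-level attributes.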

Higher education
              |   Estimate   Std Error       z    p>|z|   [95% confidence interval]
Attributes    |
educ_dr       |  -.1317751    .0923188   -1.43    0.153    -.3127166     .0491665
educ_conv     |   .0564573    .0860605    0.66    0.512    -.1122183     .2251328
educ_indiv    |   .0145355    .0862812    0.17    0.866    -.1545725     .1836435
Levels        |
educ_3m       |  -.4883798    .1332613   -3.66   0.000*    -.7495672    -.2271924
educ_2m       |  -.2318958    .1232679   -1.88    0.060    -.4734964     .0097049
educ_1m       |   .2727426    .1212335    2.25   0.024*     .0351292     .5103559
educ_0m       |   .4475330           -       -        -            -            -
educ_drpt     |  -.1920152     .085256   -2.25   0.024*    -.3591139    -.0249165
educ_drft     |   .1920152           -       -        -            -            -
educ_convh~d  |  -.3173861    .0854444   -3.71   0.000*    -.4848541    -.1499182
educ_conve~y  |   .3173861           -       -        -            -            -
educ_indivno  |  -.4161934    .0982534   -4.24   0.000*    -.6087665    -.2236203
educ_indivye  |   .4161934           -       -        -            -            -
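To see what an interaction row means for a subgroup, here is a minimal sketch that combines the waiting-time level values from the fully adjusted table with the higher-education terms above. It assumes, following the effects-coding description, that the covariate is coded +1 for the higher-education group and -1 for everyone else (that direction is an assumption), and holds all other covariates at zero; the helper name is hypothetical.

```python
# Waiting-time level values: main effects from the fully adjusted table and
# higher-education interaction terms; the "0m" entries are the omitted levels.
MAIN = {"3m": -1.958953, "2m": -1.117335, "1m": 0.2137621, "0m": 2.862526}
EDUC = {"3m": -0.4883798, "2m": -0.2318958, "1m": 0.2727426, "0m": 0.4475330}


def subgroup_values(main, interaction, code):
    """Level values for one subgroup, ASSUMING a +1 / -1 effects-coded covariate."""
    if code not in (1, -1):
        raise ValueError("effects code must be +1 or -1")
    return {level: main[level] + code * interaction[level] for level in main}


higher = subgroup_values(MAIN, EDUC, +1)   # assumed: +1 = higher education
others = subgroup_values(MAIN, EDUC, -1)
for label, values in (("higher education", higher), ("others", others)):
    gain = values["0m"] - values["3m"]     # 3 months -> 1 week, logit scale
    print(f"{label}: {gain:.4f}")
```

Differences between groups on the logit scale can also reflect differences in error variance, so this is a descriptive reading of the coefficients rather than a formal comparison.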

Scoring 7+/30 on skin severity
              |   Estimate   Std Error       z    p>|z|   [95% confidence interval]
Attributes    |
score7_dr     |  -.3202987    .0886460   -3.61   0.000*    -.4940416    -.1465559
score7_conv   |  -.1181401    .0826505   -1.43    0.153    -.2801322     .0438519
score7_indiv  |  -.1738758    .0823243   -2.11   0.035*    -.3352284    -.0125232
Levels        |
score7_3m     |  -.1215269    .1246303   -0.98    0.330    -.3657979      .122744
score7_2m     |  -.2264255     .116925   -1.94    0.053    -.4555942     .0027433
score7_1m     |    .022866    .1132425    0.20    0.840    -.1990853     .2448173
score7_0m     |   .3250864           -       -        -            -            -
score7_drpt   |   .1038593    .0744481    1.40    0.163    -.0420564     .2497749
score7_drft   |  -.1038593           -       -        -            -            -
score7_con~d  |   .1480246    .0727303    2.04   0.042*     .0054759     .2905734
score7_con~y  |  -.1480246           -       -        -            -            -
score7_ind~n  |   .2354545    .0826623    2.85   0.004*     .0734394     .3974696
score7_ind~y  |  -.2354545           -       -        -            -            -

Implications for dermatology • Policies to improve 'process' aspects of the consultation will benefit higher sociodemographic groups most • Policies to improve waiting times will benefit most those patients who themselves feel most affected by their skin condition

Statistical issues • MNL is (usually) a first step • Is there heterogeneity? • Likely covariates that characterise it? • More complex methods? • Mixed logit • what distributional assumption? • lots of parameters in BWS: 72 possible pairs here • Latent class analysis • Non/semi parametric
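One way to arrive at the figure of 72 is to count every ordered (best, worst) pair of levels drawn from two different attributes. A minimal sketch of that count, using the dermatology levels listed earlier (the function name is hypothetical):

```python
from itertools import permutations

LEVELS = {
    "waiting time": ["3 months", "2 months", "1 month", "1 week"],
    "doctor expertise": ["GPSI", "consultant"],
    "ease of access": ["easy", "difficult"],
    "individualised care": ["thorough", "not thorough"],
}


def possible_bw_pairs(levels):
    """All (best level, worst level) pairs whose levels come from different attributes."""
    return [
        (best, worst)
        for attr_best, attr_worst in permutations(levels, 2)
        for best in levels[attr_best]
        for worst in levels[attr_worst]
    ]


pairs = possible_bw_pairs(LEVELS)
print(len(pairs))  # 72
```

Equivalently, with 10 levels in total, 10^2 minus the within-attribute squares (16 + 4 + 4 + 4) leaves 72, which is why a mixed logit over pairs quickly becomes parameter-heavy.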

Application 2: Estimating tariffs for the ICECAP quality of life instrument for older people

Heterogeneity It's one thing to know what the 'average' preference for an impaired health state is in the population... but suppose the poor/ill regard that state as particularly dreadful. Any decision to take (or not take) this into consideration requires us to find out whether the poor/ill have different preferences

Heterogeneity (2) • The use of population-level tariffs might mean some interventions are deemed cost-ineffective when, for the poor/ill, they are highly cost-effective • Even if we don't want to move away from population-level provision, society should have the data to debate this

Aim • To produce a set of 'tariffs' for the 4^5 = 1024 possible quality of life scenarios that a British older person might experience • An older person could tick the box to indicate which of 4 levels (s)he is experiencing for each of 5 questions • e.g. before the meals-on-wheels service, a score of 0.6 on a zero to one scale • after the meals-on-wheels service, a score of 0.75 on a zero to one scale
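The 1024 figure is simply every combination of 4 levels across 5 questions. The sketch below enumerates those states and shows one way an additive tariff could be put on a zero to one scale, assuming worst state = 0 and best state = 1. The level utilities are made-up placeholders, NOT the ICECAP tariff, and the names are hypothetical.

```python
from itertools import product

ATTRIBUTES = ["attachment", "security", "role", "enjoyment", "control"]
N_LEVELS = 4  # 0 = "all", 1 = "a lot", 2 = "a little", 3 = "none"

states = list(product(range(N_LEVELS), repeat=len(ATTRIBUTES)))
print(len(states))  # 1024

# HYPOTHETICAL additive level utilities, identical for every attribute,
# used only to illustrate rescaling; these are not estimated values.
HYPOTHETICAL_UTILITY = {attr: [1.0, 0.6, 0.2, -0.8] for attr in ATTRIBUTES}


def raw_score(state):
    """Sum of the (hypothetical) utilities of the levels in one state."""
    return sum(HYPOTHETICAL_UTILITY[a][lvl] for a, lvl in zip(ATTRIBUTES, state))


raw = {s: raw_score(s) for s in states}
lo, hi = min(raw.values()), max(raw.values())
tariff = {s: (v - lo) / (hi - lo) for s, v in raw.items()}  # worst -> 0, best -> 1
print(tariff[(0, 0, 0, 0, 0)], tariff[(3, 3, 3, 3, 3)])    # 1.0 0.0
```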

The ICECAP quality of life instrument • Four levels: • all • a lot (many) • a little (few) • none • Example (role): [ ] I am able to do all of the things that make me feel valued [x] I am able to do many of the things that make me feel valued [ ] I am able to do a few of the things that make me feel valued [ ] I am unable to do any of the things that make me feel valued

The ICECAP quality of life instrument (contd) Similarly for: Attachment (love and friendship) Security (thinking about the future without concern) Enjoyment (enjoyment and pleasure) Control (independence)

A complete quality of life state

The best-worst scaling study • 315 completed interviews (478 approached to take part) • 255 had complete best-worst data • Average length of interview: 35 minutes • Administered in the older person's own home • All had participated in the Health Survey for England (HSE) • Data available from the previous round of HSE (6-12 months earlier) included sociodemographic and health variables (n=226)

Statistical design • Respondents randomised to: • Orthogonal main effects plan in 16 scenarios or • Its foldover
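For five 4-level attributes, a 16-scenario orthogonal main effects plan can be built from linear forms over the finite field GF(4). The sketch below shows that textbook construction and a foldover that reverses every level (0↔3, 1↔2). Both the construction and that foldover convention are assumptions for illustration; they are not necessarily the plan used in the study, and the function names are hypothetical.

```python
from itertools import combinations


def gf4_mul(a, b):
    """Multiply two GF(4) elements encoded 0..3 (polynomials mod x^2 + x + 1)."""
    result = 0
    for bit in range(2):
        if (b >> bit) & 1:
            result ^= a << bit
    if result & 0b100:      # reduce modulo x^2 + x + 1
        result ^= 0b111
    return result


def main_effects_plan_16():
    """16 runs x 5 four-level columns: a, b, a+b, a+2b, a+3b over GF(4)."""
    # Addition in GF(4) with this encoding is bitwise XOR.
    return [
        [a, b, a ^ b, a ^ gf4_mul(2, b), a ^ gf4_mul(3, b)]
        for a in range(4)
        for b in range(4)
    ]


def foldover(runs, n_levels=4):
    """Assumed foldover convention: reverse every level (0<->3, 1<->2)."""
    return [[n_levels - 1 - x for x in run] for run in runs]


def is_orthogonal(runs, n_levels=4):
    """True if every pair of columns shows each level combination exactly once."""
    n_cols = len(runs[0])
    return all(
        len({(r[i], r[j]) for r in runs}) == n_levels ** 2 == len(runs)
        for i, j in combinations(range(n_cols), 2)
    )


plan = main_effects_plan_16()
fold = foldover(plan)
print(is_orthogonal(plan), is_orthogonal(fold))              # True True
print(len({tuple(r) for r in plan} & {tuple(r) for r in fold}))  # 0 shared scenarios
```

Because level reversal is a relabelling within each column, the foldover keeps the orthogonality of the original plan; in this particular construction it also shares no scenario with it.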

Population-level BWS estimates (n=255)

Heterogeneity in ICECAP

Latent class analysis • Performed on the choice data • Conditional logit results for each class • No adjustment for covariates • First need to know whether internally homogeneous subgroups exist • Then see if we can characterise these in terms of health/wealth/other factors • Covariate-adjusted conditional logit regressions (1-class) suggested there was heterogeneity…
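Characterising classes after the fact usually starts from each respondent's posterior probability of class membership, which is also the E-step of the EM algorithm for a latent class logit. A minimal sketch, assuming class shares and class-specific MNL utilities are already estimated; all numbers and names below are hypothetical. For best-worst data, each "alternative" would be one best-worst pair.

```python
import math


def log_mnl_prob(chosen, utilities):
    """Log MNL probability of alternative index `chosen` among `utilities`."""
    m = max(utilities)
    log_denom = m + math.log(sum(math.exp(u - m) for u in utilities))
    return utilities[chosen] - log_denom


def posterior_class_membership(choices, class_shares, class_utilities):
    """Posterior probability that one respondent belongs to each latent class.

    choices: (task_id, chosen_index) pairs for the respondent.
    class_shares: prior class proportions, summing to 1.
    class_utilities: per class, a dict task_id -> utilities of that task's alternatives.
    """
    log_weights = []
    for share, utils in zip(class_shares, class_utilities):
        loglik = sum(log_mnl_prob(chosen, utils[task]) for task, chosen in choices)
        log_weights.append(math.log(share) + loglik)
    m = max(log_weights)  # subtract the max before exponentiating for stability
    weights = [math.exp(lw - m) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical two-class example
class_utilities = [
    {"task1": [1.0, 0.0, -1.0], "task2": [0.5, -0.5]},   # class 1
    {"task1": [-1.0, 0.0, 1.0], "task2": [-0.5, 0.5]},   # class 2
]
choices = [("task1", 0), ("task2", 0)]  # alternative 0 chosen in both tasks
posterior = posterior_class_membership(choices, [0.6, 0.4], class_utilities)
print([round(p, 3) for p in posterior])  # about [0.968, 0.032]
```

These posteriors can then be related to the HSE covariates to ask who the members of each class are.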

LCA results

Statistical vs policy significance

Who are these people? • Class three is easy to distinguish; its members are disproportionately: • Male • Without any qualifications • Married (but only at the 10% level) But so what? What distinguishes class 1 from class 2?

Class 1 versus class 2 • Difficult to distinguish them • Having had a total joint replacement was a predictor of membership of class 2 (more bothered about attachments than control) • Being unable to climb 12 stairs was a predictor of membership of class 1 (more bothered about control than attachments) • Work with UTS researchers to investigate alternative characterisations of the clustering

Advantages of BWS • All attribute levels on the same scale • More data per scenario (a best and a worst choice) • Can estimate attribute impacts • Heterogeneity easier to understand; distributional assumptions not needed when individual respondent utilities are available • Can be used to obtain a set of rankings consistent with random utility theory • Easier choice task? • Simpler statistical design
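A simple, commonly used individual-level summary of best-worst data is the best-minus-worst count: for each attribute level, the number of times a respondent chose it as best minus the number of times it was chosen as worst. The sketch below computes it for one respondent; the scenarios are hypothetical, and this is not necessarily the individual-level analysis used in these studies.

```python
from collections import Counter


def best_minus_worst(responses):
    """Best-minus-worst count for every attribute level a respondent was shown.

    responses: one (shown_levels, best_level, worst_level) tuple per scenario.
    """
    best = Counter(b for _, b, _ in responses)
    worst = Counter(w for _, _, w in responses)
    shown = {level for levels, _, _ in responses for level in levels}
    # Counter returns 0 for levels never chosen, so shown-but-unchosen levels score 0
    return {level: best[level] - worst[level] for level in sorted(shown)}


# Two hypothetical scenarios answered by one respondent
responses = [
    (("wait 3 months", "consultant", "easy", "thorough"), "thorough", "wait 3 months"),
    (("wait 1 week", "GPSI", "difficult", "thorough"), "wait 1 week", "difficult"),
]
scores = best_minus_worst(responses)
print(scores["thorough"], scores["wait 3 months"], scores["consultant"])  # 1 -1 0
```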

Disadvantages of BWS • The problem of the numeraire (money) • Conditional not unconditional demand • Nest within a DCE and adjust for different random utility components • Getting individual respondent models not practical in many contexts

Future research in Best-Worst methods • Individual patient preferences • clustering using other taxonomic methods • investigate decision rules (lexicographic preferences) • Estimating attribute importance (rather than simply impact) • Alternative conceptualisation of utility • Anchoring (the unconditional demand issue)

Investigating Choice Experiments for the Preferences of Older People (ICEPOP): Professors Joanna Coast (Birmingham), Jordan Louviere (UTS) and Tim Peters (Bristol), and Dr Terry Flynn. We would like to thank Dr Tony Marley for comments and assistance

Bristol sample: 198 of the 1024 QoL states represented in Bristol
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .3477733       .1051461
 5%     .5968553       .2584114
10%     .6542614       .2636209       Obs                  810
25%     .7704228       .2659647       Sum of Wgt.          810
50%     .8608195                      Mean            .8291571
                        Largest       Std. Dev.       .1323457
75%     .9135509              1
90%     .9852881              1       Variance        .0175154
95%            1              1       Skewness        -1.44312
99%            1              1       Kurtosis        6.231093

ICECAP sample (313): 137 of the 1024 QoL states represented in BWS study
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .2659647              0
 5%     .5297861              0
10%     .6329976       .1483945       Obs                  313
25%     .7576444       .2659647       Sum of Wgt.          313
50%     .8515525                      Mean            .8137987
                        Largest       Std. Dev.       .1524833
75%     .9135509              1
90%     .9623603              1       Variance        .0232512
95%     .9982446              1       Skewness       -2.020194
99%            1              1       Kurtosis        9.229487

Random Utility Theory • Let the latent utility for item i be $U_i = d_i + e_i$, where $U_i$ = latent utility, $d_i$ = explainable portion and $e_i$ = unexplainable portion. • Probability that i is chosen: $P(i \mid C_n) = P[(d_i + e_i) > (d_j + e_j)\ \ \forall j \in C_n,\ j \neq i]$ • If the $e$'s are distributed $\mathrm{EV1}(0, \sigma^2)$, this gives McFadden's MNL model: $P(i \mid C_n) = \exp(d_i) / \sum_{j \in C_n} \exp(d_j)$
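The link between EV1 errors and the MNL formula can be checked by simulation: add independent standard Gumbel (EV1) draws to fixed explainable utilities, record which item has the highest total utility, and compare the choice shares with exp(d_i) / Σ_j exp(d_j). A minimal sketch with hypothetical utilities and hypothetical function names:

```python
import math
import random


def mnl_probabilities(d):
    """McFadden's MNL: P(i) = exp(d_i) / sum_j exp(d_j)."""
    m = max(d)
    weights = [math.exp(x - m) for x in d]
    total = sum(weights)
    return [w / total for w in weights]


def simulated_choice_shares(d, n_draws=100_000, seed=2024):
    """Share of draws in which each item has the highest d_i + e_i, e_i ~ standard Gumbel."""
    rng = random.Random(seed)
    counts = [0] * len(d)
    for _ in range(n_draws):
        # If E ~ Exponential(1), then -log(E) follows a standard Gumbel (EV1) distribution
        u = [di - math.log(rng.expovariate(1.0)) for di in d]
        counts[max(range(len(d)), key=u.__getitem__)] += 1
    return [c / n_draws for c in counts]


d = [1.0, 0.0, -0.5]  # hypothetical explainable utilities
probs = mnl_probabilities(d)
shares = simulated_choice_shares(d)
print([round(p, 3) for p in probs])   # [0.629, 0.231, 0.14]
print([round(s, 3) for s in shares])  # should be close to the line above
```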
