Outcome Measures in Psychiatric Research Vaughan Bell School of Psychology, Cardiff University
Why bother ? • The theory can be complex, the subject matter dry and the work hard. • But, knowing how to create and assess measures opens up an important skill set. • Critically assess research • Assess the way patients (or you!) are measured • Design and implement your own research • Deploy / assess empirical methods in patient care
Outline • Part 1 • What is outcome ? • Types of measure • What attributes does a good measure need ? • Possible confounding factors and caveats. • Ethical issues • Part 2 • Workshop on creating / assessing measures
What is Outcome ? • In the schizophrenia literature (Brekke et al, 1993), outcome is classified in three ways: • Clinical outcome: signs, symptoms, service utilisation. • Functional outcome: social, vocational, independent living. • Subjective outcome: patient’s experiences of outcome. • It is important to be clear on what sort of outcome you want to assess and why.
Types of Measure • Clinical metrics: length of hospital stay, readmissions, use of PRN medication etc. • Psychometric scales: quantify some aspect of psychological function or behaviour. • Self-report • Structured / semi-structured interview • Multi-rater measures: multiple people are asked to rate the same material – perhaps with one of the methods above. • Tasks: e.g. experimental / neuropsychological tests
Purpose • Screening tool: Determines whether a symptom or psychological trait is present. • Aid to diagnosis: typically increases the reliability of ‘bedside’ diagnoses. • Quantification: allows symptoms or traits to be quantified by intensity, duration etc. • Dimensional scale: measures an attitude or trait through its range in the population.
Attributes of a Good Measure • The two essential features of a good measure are: • Reliability: measures consistently. • Validity: measures what it is designed for. • Each of these has various sub-categories that need to be fulfilled.
Reliability • Reliability must be established first, as validity relies on it. • i.e. A reliable scale could measure nonsense, but do so consistently… • …but it is impossible for a measure to be inconsistent and measure what it is designed for.
Reliability • Common forms… • For psychometric scales: • Internal reliability – “are similar items answered in similar ways ?” • Test-retest reliability – “does the test produce similar results when used on the same people on different occasions ?” • For multi-rater observational measures: • Inter-rater reliability – “does the measure produce similar results when used by different observers ?”
Internal Reliability “Are similar items answered in similar ways ?” • Typically tested with Cronbach’s Alpha. • An alpha above 0.7 is usually considered satisfactory (Kline, 1993) • If a measure has multiple independent factors, they may need to be tested separately.
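As an illustration of the internal reliability check, here is a minimal sketch of Cronbach's Alpha using the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name `cronbach_alpha` and the response data are hypothetical, made up for this example; in practice a statistics package would normally be used.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items in the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Transpose rows (respondents) into columns (items).
    items = list(zip(*responses))
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical data: 5 respondents answering 3 similar items on a 1-5 scale.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}, satisfactory: {alpha > 0.7}")
```

If the measure has several independent factors, the same function would be run once per factor, on that factor's items only.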
Test-Retest Reliability “Does the test produce similar results when used with the same people on different occasions ?” • Typically tested with Pearson correlation. • Results from the first occasion are correlated with results from the second occasion. • Correlation should be above 0.8 (Kline, 1993) • Assumes that whatever is being measured is stable between occasions. • Therefore, this is usually tested on non-clinical groups.
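The test-retest check can be sketched as a plain Pearson correlation between the two occasions. The function name `pearson_r` and the scores below are hypothetical, made up for this example; each position in the two lists is assumed to be the same person.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same six people on two occasions.
occasion_1 = [12, 18, 25, 31, 40, 22]
occasion_2 = [14, 17, 27, 30, 38, 24]
r = pearson_r(occasion_1, occasion_2)
print(f"test-retest r = {r:.2f}, above 0.8: {r > 0.8}")
```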
Inter-rater Reliability “Does the measure produce similar results when used by different observers ?” • Typically tested with Cohen’s Kappa. • Cohen’s Kappa controls for problems with directly comparing multiple raters’ scores. • e.g. one rater consistently scoring five points more than the other will correlate despite not agreeing. • A two-point rating (e.g. symptom present / absent) will have 50% agreement just by chance, if each rater picks either option equally often.
Cohen’s Kappa • Kappa values and level of agreement between raters (Landis and Koch, 1977): • Fair: 0.21 - 0.40 • Moderate: 0.41 - 0.60 • Substantial: 0.61 - 0.80 • Almost perfect: 0.81 - 1.00
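To make the chance correction concrete, here is a minimal sketch of Cohen's Kappa for two raters: kappa = (observed agreement − chance agreement) / (1 − chance agreement). The function names and the ratings are hypothetical, made up for this example; the labels follow the bands listed above, and anything below 0.21 is simply reported as falling below them.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same cases."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must score the same, non-empty set of cases")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each simply followed their own overall rating frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Map a kappa value onto the bands quoted on the slide."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "below the listed bands"


# Hypothetical ratings of eight patients: symptom present (P) or absent (A).
rater_a = ["P", "P", "P", "P", "A", "A", "A", "A"]
rater_b = ["P", "P", "P", "A", "A", "A", "A", "A"]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f} ({agreement_label(kappa)})")
```

Here the raters agree on 7 of 8 cases (87.5%), but half of that agreement would be expected by chance, so kappa is noticeably lower than the raw agreement rate.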
Validity • Face validity – “does the perception of the measure influence the outcome ?” • Content validity – “does the measure cover everything it needs to cover ?” • Construct validity – “do the results of the measure agree with what theory predicts ?” • Criterion validity – “does the measure fulfil expected criteria ?” – usually the performance of a certain group • Incremental validity – “does it measure anything new ?”
Face Validity “does the perception of the measure influence the outcome ?” • If participants or testers misperceive the nature of the measure, it may affect the results. • e.g. if someone takes a verbal memory test but thinks it is a creative thinking test, it may look like they are confabulating. • Similarly, asking ‘are you depressed ?’ may have good face validity for depression • …but asking ‘are you deluded ?’ has poor face validity for delusions.
Content Validity “does the measure cover everything it needs to cover ?” • i.e. is it comprehensive ? • If a measure of anxiety asks only about social anxiety, it doesn’t have good content validity. • This is usually assessed by generating the measure from, or comparing it against: • Known phenomena • Literature reviews
Construct Validity “do the results of the measure agree with what theory predicts ?” • There are two ways of assessing this: • Convergence – correlates with measures of things known to be associated • Divergence – negatively correlates with things known to be mutually exclusive. • e.g. a good measure of depression should correlate with a measure of low mood • …but negatively correlate with a measure of self-esteem.
Construct Validity • Or, a good measure of anxiety should predict performance, as per the Yerkes-Dodson Law (1908).
Criterion Validity “does the measure fulfil certain criteria ?” • Often, this can be the same as construct validity (e.g. correlates with similar measures) • It is often tested by asking members of a certain group to take the test… • …who are known to have high levels of the attribute being measured. • e.g. people with psychosis should score higher than the general population on a good measure of anomalous perceptual experience.
Incremental Validity “does it measure anything new ?” • An assessment of what the measure adds to the ‘toolkit’ of psychological assessment. • If it measures exactly the same as something else, in the same way… • …there may not be any point in developing it.
Wider Validity Issues • Rarely tackled in the textbooks, but it is important to assess how the validity tests were carried out. • Has validity been established for: • different cultural groups ? • different ages ? • someone with a disability ? • etc.
Other Ongoing Changes • Particularly in the clinical environment, there may be a number of difficult-to-disentangle influences. • Children present a particular challenge as it might be difficult to separate the effects of: • Cognitive development • Psychopathology • Treatment • Task engagement / motivation
Covariates • One way of controlling for this is to use covariates in statistical analysis, particularly ANOVA (with a covariate, this becomes ANCOVA). • e.g. I want to see if my Special Brain Power Training™ boosts children’s intelligence. • I give class A the training and see whether they score more highly on an IQ test than class B. • However, class A are, on average, older, so they are likely to do better anyway. • I introduce age as a covariate into the analysis, to cancel out its effects and make a fairer comparison.
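A rough sketch of what "cancelling out" age means in an analysis of covariance: each class mean is adjusted to what it would be if both classes had the same average age, using the pooled within-class slope of score on age. This shows only the adjusted means, not the significance test, and it assumes the age-score slope is the same in both classes. The function name `adjusted_means` and all the data are hypothetical, constructed so that each year of age adds 2 points and the training adds 3.

```python
from statistics import mean


def adjusted_means(groups):
    """Covariate-adjusted group means.

    groups maps a group name to a list of (covariate, outcome) pairs.
    """
    grand_cov_mean = mean(x for pairs in groups.values() for x, _ in pairs)
    sxy = sxx = 0.0
    group_means = {}
    for name, pairs in groups.items():
        cov_mean = mean(x for x, _ in pairs)
        out_mean = mean(y for _, y in pairs)
        group_means[name] = (cov_mean, out_mean)
        # Accumulate within-group sums so the slope ignores group differences.
        sxy += sum((x - cov_mean) * (y - out_mean) for x, y in pairs)
        sxx += sum((x - cov_mean) ** 2 for x, _ in pairs)
    slope = sxy / sxx  # pooled within-group slope of outcome on covariate
    return {
        name: out_mean - slope * (cov_mean - grand_cov_mean)
        for name, (cov_mean, out_mean) in group_means.items()
    }


# Hypothetical (age, test score) pairs; class A is older and received the training.
classes = {
    "A": [(10, 103), (11, 105), (12, 107)],
    "B": [(8, 96), (9, 98), (10, 100)],
}
raw_gap = mean(y for _, y in classes["A"]) - mean(y for _, y in classes["B"])
adjusted = adjusted_means(classes)
adjusted_gap = adjusted["A"] - adjusted["B"]
print(f"raw difference = {raw_gap}, age-adjusted difference = {adjusted_gap}")
```

In this made-up data the raw gap between classes is 7 points, but most of that is age; after adjustment the gap shrinks to the 3 points attributable to the training.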
Confounding Factors • Other factors can sometimes be more difficult to deal with: • Floor / ceiling effects: Where the test is so hard or easy that it is not possible to differentiate between participants. • Therefore, it is important to pilot the measure, and compare performance with norms. • Emotional impact: Testing can be stressful for healthy individuals. • For patients, especially so, particularly if they see themselves as ‘failing’ the tests.
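When piloting a measure, one simple check for floor / ceiling effects is the proportion of participants scoring at the minimum or maximum possible score. The sketch below assumes a test scored 0-20; the function name, the pilot scores and the 15% warning threshold are all hypothetical choices for illustration, not values from the slides.

```python
def floor_ceiling_rates(scores, min_score, max_score):
    """Return the proportions of scores at the floor and at the ceiling."""
    if not scores:
        raise ValueError("no scores supplied")
    n = len(scores)
    at_floor = sum(s <= min_score for s in scores) / n
    at_ceiling = sum(s >= max_score for s in scores) / n
    return at_floor, at_ceiling


# Hypothetical pilot data on a test scored from 0 to 20.
pilot_scores = [20, 19, 20, 20, 18, 20, 17, 20, 20, 15]
floor_rate, ceiling_rate = floor_ceiling_rates(pilot_scores, 0, 20)

WARNING_THRESHOLD = 0.15  # assumed cut-off, chosen for this example only
if ceiling_rate > WARNING_THRESHOLD:
    print(f"{ceiling_rate:.0%} at ceiling: the test may be too easy to differentiate participants")
if floor_rate > WARNING_THRESHOLD:
    print(f"{floor_rate:.0%} at floor: the test may be too hard to differentiate participants")
```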
Ethical Issues • There are distinct ethical (and potentially legal) implications when using such tests, e.g.: • Are you using the measure for clinical decisions ? • If so, are you qualified and competent to develop / deploy / interpret the measure ? • If not, do you have adequate supervision from someone who is ?
Ethical Issues • Is the measure being used for research? • If so, has the research been given ethical approval? • Are you asking for informed consent ? • Does the patient know this is not part of their standard care ? • Are the results being kept private or anonymised ?
Conclusions • The purpose of measurement and the type of measure are crucial. • Measures need to be reliable and valid. • You need to be aware of possible confounding factors. • You need to be clear whether the context is research or clinical… • …and know and abide by the ethics of each.