Reliability and Validity Testing

Definitions • Validity - the extent to which a test measures what it is designed to measure • Reliability - the extent to which a test or measure is reproducible

Validity • Logical (face) - how much the measure obviously involves the performance. • Construct - how well the measure relates to the theory • Content - how well the outcome evaluates the intervention • Criterion - how well the test measures against a set standard

Assessment of Validity • Criterion validity • Concurrent • Predictive • Prescriptive

Bland and Altman • Bias • Dispersion of the bias • Relationship of bias to value • M = experimental measured value • GS = gold standard measured value

Bland and Altman Limits of Agreement • Advantages: easy to interpret visually; can indicate bias in measurements; can be clinically useful; useful for validity • Disadvantages: difficult for more than two raters or datasets; more complex to interpret; needs high numbers; should also report raw data to interpret variation
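
The bias and its dispersion can be sketched numerically. The example below is a minimal illustration, not the presenter's own calculation: it assumes the usual convention of bias ± 1.96 standard deviations of the paired differences, and the values in `m` (experimental measure) and `gs` (gold standard) are made-up data.

```python
from statistics import mean, stdev


def limits_of_agreement(m, gs, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements.

    m  : experimental measured values (M)
    gs : gold standard measured values (GS)
    z  : multiplier for the limits (1.96 is an assumed convention)
    """
    if len(m) != len(gs):
        raise ValueError("m and gs must be paired, equal-length sequences")
    diffs = [a - b for a, b in zip(m, gs)]   # M - GS for each subject
    bias = mean(diffs)                       # systematic difference
    sd = stdev(diffs)                        # dispersion of the bias
    return bias, bias - z * sd, bias + z * sd


# Hypothetical paired data, for illustration only.
m = [10.2, 11.1, 9.8, 12.4, 10.9]
gs = [10.0, 11.0, 10.0, 12.0, 11.0]

bias, lower, upper = limits_of_agreement(m, gs)
print(f"bias = {bias:.3f}, limits of agreement = [{lower:.3f}, {upper:.3f}]")

# Points for the plot: x = mean of the two methods, y = difference.
plot_points = [((a + b) / 2, a - b) for a, b in zip(m, gs)]
```

Plotting `plot_points` with horizontal lines at the bias and the two limits gives the visual check on whether the bias changes with the size of the measured value.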

Reliability • A measure CANNOT be valid if it is NOT reliable • However, a measure CAN BE reliable but NOT valid

Reliability • Observed score = True score + Error score • The true score is hard to evaluate, but we can estimate the error score

Sources of Error: The Participants

Sources of Error: The Testing • Poor directions • Additional motivation • Inconsistent protocol

Sources of Error: The Scoring • The scorers • Type of scoring system

Sources of Error: The Instrumentation • Calibration • Inaccuracies • Sensitivity

Statistical techniques • Pearson's r • ICC • Limits of agreement • Cronbach's alpha • Kappa statistic • Weighted kappa statistic

Pearson's r: Weaknesses • Bivariate • Limited to two variables • Does not consider differences in variance • Only measures association, not agreement • Not really appropriate for reliability
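
The "association, not agreement" point can be shown with a small sketch. The trial values below are hypothetical: the second trial is simply the first plus a constant 5 units, so the two trials never agree, yet the correlation is perfect.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical test-retest data: trial 2 reads 5 units higher every time.
trial1 = [10.0, 12.0, 14.0, 16.0, 18.0]
trial2 = [t + 5.0 for t in trial1]

r = pearson_r(trial1, trial2)
mean_diff = mean([b - a for a, b in zip(trial1, trial2)])

print(f"r = {r:.2f}, mean difference between trials = {mean_diff:.1f}")
```

A correlation near 1 alongside a systematic difference of 5 units is the situation in which r overstates reliability.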

Intra-class correlation (ICC) • Strengths: univariate; allows for unequal cell numbers; value from -1 to +1; allows any number of raters or subjects • Weaknesses: has several formulae; does not imply usefulness; ratios can be difficult to compare; between-subject variation should reflect population

Calculation • Variance between (due to) repeated trials • Variance between (due to) repeated observers/observations • Variance from ANOVA model = Mean Squares

Shrout and Fleiss formulae • Case 1: Each subject is rated by a different set of k raters, randomly selected from a larger population of raters • Case 2: A random sample of k raters, selected from a larger population of raters, rates each subject • Case 3: Each subject is rated by k raters who are the only raters of interest

• Cases (1,1), (2,1) & (3,1) are used when the unit of measurement is obtained from only one measurement • Cases (1,k), (2,k) & (3,k) are used when the unit of measurement is obtained from more than one measurement (i.e. a mean measurement)
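
The six cases can be computed directly from ANOVA mean squares. The sketch below uses the formulae as they are commonly written for the Shrout and Fleiss cases, with BMS (between subjects), WMS (within subjects), JMS (between raters) and EMS (residual) as assumed names for the mean squares. The ratings table is illustrative data only, and the function names are hypothetical.

```python
def anova_mean_squares(ratings):
    """Mean squares for an n-subjects x k-raters table (list of rows)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_raters
    return {
        "n": n,
        "k": k,
        "BMS": ss_subjects / (n - 1),                   # between subjects
        "WMS": (ss_raters + ss_error) / (n * (k - 1)),  # within subjects (one-way)
        "JMS": ss_raters / (k - 1),                     # between raters
        "EMS": ss_error / ((n - 1) * (k - 1)),          # residual
    }


def icc_all_cases(ratings):
    """ICC for Cases 1-3, single (x,1) and mean-of-k (x,k) measurements."""
    ms = anova_mean_squares(ratings)
    n, k = ms["n"], ms["k"]
    bms, wms, jms, ems = ms["BMS"], ms["WMS"], ms["JMS"], ms["EMS"]
    return {
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
        "ICC(1,k)": (bms - wms) / bms,
        "ICC(2,k)": (bms - ems) / (bms + (jms - ems) / n),
        "ICC(3,k)": (bms - ems) / bms,
    }


# Illustrative data: 6 subjects (rows) each scored by 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

icc = icc_all_cases(ratings)
for case, value in icc.items():
    print(f"{case}: {value:.2f}")
```

The same data give quite different coefficients depending on the case chosen, which is why the case used should be reported alongside the value.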

How to calculate • Use equations and values obtained from ANOVAs (Rankin and Stokes, 1998) • Use macros downloaded from SPSS.com (may not work with all versions of SPSS)

Cronbach's Alpha • Generalised measure of reliability • Easy to interpret • Similar to intraclass correlation
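
As a rough illustration, alpha can be computed from the item variances and the variance of the total score. The sketch assumes the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of totals); the score table and the function name `cronbach_alpha` are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a subjects x items table (list of rows)."""
    k = len(scores[0])                                   # number of items
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])   # variance of totals
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Illustrative data: 6 subjects (rows) scored on 4 items or trials (columns).
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.2f}")
```

For a complete table like this one, alpha works out to the same value as the Case 3 mean-measurement ICC, (3,k), which is one sense in which the two are similar.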

Kappa statistics • Kappa statistic: nominal data • Weighted kappa statistic: ordinal data
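
Both statistics can be sketched with one function for two raters. This is an illustration under stated assumptions: the weighted version uses linear disagreement weights (the distance between category positions), which is only one of several weighting schemes, and all ratings shown are made up.

```python
def kappa(rater_a, rater_b, categories, weighted=False):
    """Cohen's kappa for two raters; linear-weighted kappa if weighted=True.

    categories must be listed in rank order when weighted=True.
    """
    n = len(rater_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Cross-tabulate the paired ratings.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        counts[index[a]][index[b]] += 1
    row_tot = [sum(counts[i]) for i in range(k)]
    col_tot = [sum(counts[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal; off the diagonal either
        # 1 (unweighted) or the distance between categories (linear).
        if i == j:
            return 0
        return abs(i - j) if weighted else 1

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * counts[i][j] for i, j in cells) / n
    expected = sum(weight(i, j) * row_tot[i] * col_tot[j] for i, j in cells) / n**2
    return 1 - observed / expected


# Nominal example (hypothetical yes/no judgements from two raters).
a_nominal = ["yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no"]
b_nominal = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "yes"]
k_nominal = kappa(a_nominal, b_nominal, ["yes", "no"])

# Ordinal example (hypothetical grades 0-2 from two raters).
a_ordinal = [0, 0, 1, 1, 2, 2, 1, 0]
b_ordinal = [0, 1, 1, 2, 2, 2, 0, 0]
k_unweighted = kappa(a_ordinal, b_ordinal, [0, 1, 2])
k_weighted = kappa(a_ordinal, b_ordinal, [0, 1, 2], weighted=True)

print(f"nominal kappa = {k_nominal:.2f}")
print(f"ordinal: unweighted = {k_unweighted:.2f}, weighted = {k_weighted:.2f}")
```

In the ordinal example every disagreement is between adjacent grades, so the weighted statistic penalises them less than the unweighted one does.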

Generating ICCs: what you need • Correct macro • Data laid out appropriately • Two lines of syntax to run macros • All files resident in the same directory

References • Sim J (1993) Measurement validity in physical therapy research. Physical Therapy, 73 (2); 48-55. • Rankin G, Stokes M (1998) Reliability of assessment tools in rehabilitation: an illustration of appropriate statistical analyses. Clinical Rehabilitation, 12; 187. • Bland JM, Altman DG (1986) Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, Feb 8; 307-310. • Krebs DE (1984) Intraclass correlation coefficients: use and calculation. Physical Therapy, 64 (10); 1581-1582. • Thomas JR, Nelson JK (2001) Research Methods in Physical Activity, 4th Ed. Human Kinetics, Leeds. • George K, Batterham A, Sullivan I (2000) Validity in clinical research: a review of basic concepts and definitions. Physical Therapy in Sport, 1; 19-27.

More references • Eliasziw M, Young SL, Woodbury MG, Fryday-Field K (1994) Statistical methodology for the concurrent assessment of interrater and intrarater reliability: using goniometric measurements as an example. Physical Therapy, 74 (8); 777-788. • Keating J, Matyas T (1998) Unreliable inferences from reliable measurements. Australian Journal of Physiotherapy, 44 (1); 5-10. • Greenfield MLH, Kuhn JE, Wojtys EM (1998) Validity and reliability. American Journal of Sports Medicine, 26 (3); 483-485. • Batterham AM, George KP (2000) Reliability in evidence-based clinical practice: a primer for allied health professionals. Physical Therapy in Sport, 1; 54-61.
