Reliability and Validity Testing
Definitions • Validity - the extent to which a test measures what it is designed to measure • Reliability - the extent to which a test or measure is reproducible
Validity • Logical (face) - how much the measure obviously involves the performance. • Construct - how well the measure relates to the theory • Content - how well the outcome evaluates the intervention • Criterion - how well the test measures against a set standard
Assessment of Validity • Criterion validity • Concurrent • Predictive • Prescriptive
Bland and Altman • Bias • Dispersion of the bias • Relationship of bias to value • M = experimental measured value • GS = gold standard measured value
Bland and Altman Limits of Agreement • Advantages: easy to interpret visually; can indicate bias in measurements; can be clinically useful; useful for validity • Disadvantages: difficult for more than two raters or datasets; more complex to interpret; needs high numbers; should also report raw data to interpret variation
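The three Bland and Altman quantities named above (bias, dispersion of the bias, and the relationship of bias to value) can be sketched in a few lines. This is a minimal illustration, not the presenter's own calculation: the function names and the example values are hypothetical, the 1.96 multiplier is the conventional choice for 95% limits of agreement, and the "relationship of bias to value" is approximated here by the least-squares slope of the differences against the mean of each pair.

```python
from statistics import mean, stdev

def bland_altman(m, gs):
    """Return bias, SD of the differences, and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(m, gs)]  # M - GS for each participant
    bias = mean(diffs)                       # systematic difference
    sd = stdev(diffs)                        # dispersion of the bias (sample SD)
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)

def slope_of_bias(m, gs):
    """Least-squares slope of (M - GS) against the pair mean (M + GS) / 2."""
    diffs = [a - b for a, b in zip(m, gs)]
    avgs = [(a + b) / 2 for a, b in zip(m, gs)]
    d_bar, a_bar = mean(diffs), mean(avgs)
    sxy = sum((a - a_bar) * (d - d_bar) for a, d in zip(avgs, diffs))
    sxx = sum((a - a_bar) ** 2 for a in avgs)
    return sxy / sxx

# Hypothetical example values, not data from the slides
gs = [10, 12, 14, 16, 18]   # gold standard measurements
m = [11, 12, 15, 16, 19]    # experimental measurements
bias, sd, (lower, upper) = bland_altman(m, gs)
slope = slope_of_bias(m, gs)
print(f"bias={bias:.2f}, sd={sd:.2f}, limits=({lower:.2f}, {upper:.2f}), slope={slope:.3f}")
```

A slope near zero suggests the bias does not change much across the measurement range; a clearly non-zero slope suggests the disagreement grows or shrinks with the size of the value.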
Reliability • A measure CANNOT be valid if it is NOT reliable • However, a measure CAN BE reliable but NOT valid
Reliability • Observed score = True score + Error score • The true score is hard to evaluate, but we can estimate the error score
Sources of Error: The Participants
Sources of Error: The Testing • Poor directions • Additional motivation • Inconsistent protocol
Sources of Error: The Scoring • The scorers • Type of scoring system
Sources of Error: The Instrumentation • Calibration • Inaccuracies • Sensitivity
Statistical techniques • Pearson's r • ICC • Limits of agreement • Cronbach's alpha • Kappa statistic • Weighted kappa statistic
Pearson's r: Weaknesses • Bivariate • Limited to two variables • Does not consider differences in variance • Only measures association, not agreement • Not really appropriate for reliability
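The point that Pearson's r measures association rather than agreement can be shown with a small example. The sketch below uses hypothetical scores in which the second trial is exactly 5 units higher than the first for every participant: the correlation is perfect even though no pair of scores agrees.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation, computed from first principles."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical test-retest scores: trial 2 is shifted by a constant 5 units
trial_1 = [20, 22, 25, 27, 30]
trial_2 = [score + 5 for score in trial_1]

r = pearson_r(trial_1, trial_2)
mean_shift = mean([b - a for a, b in zip(trial_1, trial_2)])
print(f"r = {r:.3f}, mean difference = {mean_shift}")
```

A systematic shift like this leaves r untouched, which is why an agreement-based statistic is usually preferred for reliability.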
Intra-class correlation (ICC) • Strengths: univariate; allows for unequal cell numbers; value from -1 to +1; allows any number of raters or subjects • Weaknesses: has several formulae; does not imply usefulness; ratios can be difficult to compare; between-subject variation should reflect the population
Calculation • Variance between (due to) repeated trials • Variance between (due to) repeated observers/observations • Variance from ANOVA model = Mean Squares
Shrout and Fleiss formulae • Case 1: Each subject is rated by a different set of k raters, randomly selected from a larger population of raters • Case 2: A random sample of k raters, selected from a larger population of raters, rates each subject • Case 3: Each subject is rated by k raters who are the only raters of interest
• Cases (1,1), (2,1) & (3,1) are used when the unit of measurement is obtained from only one measurement • Cases (1,k), (2,k) & (3,k) are used when the unit of measurement is obtained from more than one measurement (i.e. a mean measurement)
How to calculate • Use equations and values obtained from ANOVAs (Rankin and Stokes, 1998) • Use macros downloaded from SPSS.com (may not work with all versions of SPSS)
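As a sketch of the "equations and ANOVA values" route, the six Shrout and Fleiss coefficients can be computed from the mean squares of a subjects-by-raters table. The code below assumes a complete table (every subject scored by every rater) and uses the usual mean-square labels: BMS (between subjects), WMS (within subjects), JMS (between raters) and EMS (residual error). The function names and the ratings are illustrative only.

```python
def anova_mean_squares(ratings):
    """Mean squares for an n-subjects by k-raters table (list of rows)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_within = ss_total - ss_subjects
    ss_error = ss_within - ss_raters
    bms = ss_subjects / (n - 1)
    wms = ss_within / (n * (k - 1))
    jms = ss_raters / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return n, k, bms, wms, jms, ems

def shrout_fleiss_icc(ratings):
    """Single-measure and average-measure ICCs for Cases 1, 2 and 3."""
    n, k, bms, wms, jms, ems = anova_mean_squares(ratings)
    return {
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
        "ICC(1,k)": (bms - wms) / bms,
        "ICC(2,k)": (bms - ems) / (bms + (jms - ems) / n),
        "ICC(3,k)": (bms - ems) / bms,
    }

# Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns)
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = shrout_fleiss_icc(ratings)
for name, value in icc.items():
    print(f"{name} = {value:.2f}")
```

The spread of values from the same table shows why the chosen case and form should always be reported alongside the coefficient.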
Cronbach's Alpha • Generalised measure of reliability • Easy to interpret • Similar to intraclass correlation
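A minimal sketch of Cronbach's alpha, using the standard formula based on the item variances and the variance of the total score. The scores are illustrative (6 participants by 4 items or raters), and the function name is hypothetical.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a participants-by-items table (list of rows)."""
    k = len(scores[0])
    # Sample variance of each item (column) across participants
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each participant's total score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Illustrative scores: 6 participants (rows) by 4 items (columns)
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.3f}")
```

For a complete table like this one, alpha is numerically equal to the Case 3 average-measures intraclass correlation, ICC(3,k), which is one sense in which the two statistics are similar.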
Kappa statistics • Kappa statistic: nominal data • Weighted kappa statistic: ordinal data
Generating ICCs • Need: correct macro; data laid out appropriately; two lines of syntax to run macros; all files resident in the same directory
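The difference between the two kappa statistics can be sketched with one function. This is an illustrative implementation for two raters, not a reference one: the function name, category labels and ratings are hypothetical, and the weighted version assumes linear disagreement weights (the distance between category positions), which is one common choice among several.

```python
def kappa(rater_a, rater_b, categories, weighted=False):
    """Cohen's kappa for two raters; linear weights if weighted=True."""
    index = {c: i for i, c in enumerate(categories)}
    q, n = len(categories), len(rater_a)
    # Cross-tabulate the two raters' classifications
    observed = [[0] * q for _ in range(q)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(observed[i][j] for i in range(q)) for j in range(q)]

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, larger further from it
        return abs(i - j) if weighted else (0 if i == j else 1)

    obs = sum(weight(i, j) * observed[i][j]
              for i in range(q) for j in range(q)) / n
    exp = sum(weight(i, j) * row_tot[i] * col_tot[j]
              for i in range(q) for j in range(q)) / n ** 2
    return 1 - obs / exp

# Hypothetical ordinal ratings from two raters on ten participants
levels = ["low", "moderate", "high"]
a = ["low", "low", "moderate", "moderate", "high",
     "high", "low", "moderate", "high", "moderate"]
b = ["low", "moderate", "moderate", "moderate", "high",
     "moderate", "low", "moderate", "high", "high"]

k_plain = kappa(a, b, levels)
k_weighted = kappa(a, b, levels, weighted=True)
print(f"kappa = {k_plain:.3f}, weighted kappa = {k_weighted:.3f}")
```

Here every disagreement is between adjacent categories, so the weighted statistic treats them as minor and comes out higher than the unweighted one.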
References • Sim J (1993) Measurement validity in physical therapy research. Physical Therapy, 73 (2); 48-55. • Rankin G, Stokes M (1998) Reliability of assessment tools in rehabilitation: an illustration of appropriate statistical analyses. Clinical Rehabilitation, 12; 187. • Bland JM, Altman DG (1986) Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, Feb 8; 307-310. • Kreb DE (1984) Intraclass correlation coefficients: use and calculation. Physical Therapy, 64 (10); 1581-1582. • Thomas JR, Nelson JK (2001) Research Methods in Physical Activity, 4th Ed. Human Kinetics, Leeds. • George K, Batterham A, Sullivan I (2000) Validity in clinical research: a review of basic concepts and definitions. Physical Therapy in Sport, 1; 19-27.
References (continued) • Eliasziw M, Young SL, Woodbury MG, Fryday-Field K (1994) Statistical methodology for the concurrent assessment of interrater and intrarater reliability: using goniometric measurements as an example. Physical Therapy, 74 (8); 777-788. • Keating J, Matyas T (1998) Unreliable inferences from reliable measurements. Australian Journal of Physiotherapy, 44 (1); 5-10. • Greenfield MLH, Kuhn JE, Wojtys EM (1998) Validity and reliability. American Journal of Sports Medicine, 26 (3); 483-485. • Batterham AM, George KP (2000) Reliability in evidence-based clinical practice: a primer for allied health professionals. Physical Therapy in Sport, 1; 54-61.