
Reliability and Validity Testing



Presentation Transcript


  1. Reliability and Validity Testing

  2. Definitions • Validity - the extent to which a test measures what it is designed to measure • Reliability - the extent to which a test or measure is reproducible

  3. Validity • Logical (face) - how far the measure obviously involves the performance in question • Construct - how well the measure relates to the underlying theory • Content - how well the outcome evaluates the intervention • Criterion - how well the test measures up against a set standard

  4. Assessment of Validity • Criterion validity • Concurrent • Predictive • Prescriptive

  5. Bland and Altman • Bias • Dispersion of the bias • Relationship of the bias to the measured value (M = experimental measured value, GS = gold standard measured value)
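For reference, these quantities are conventionally defined as follows (a standard statement of the Bland and Altman (1986) method, summarised here rather than copied from the slide), with differences taken over n paired measurements:

```latex
d_i = M_i - GS_i, \qquad
\text{Bias} = \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}, \qquad
\text{Limits of agreement} = \bar{d} \pm 1.96\, s_d
```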

  6. Bland and Altman Limits of Agreement • Advantages: easy to interpret visually; can indicate bias in measurements; can be clinically useful; useful for validity • Disadvantages: difficult for more than two raters or datasets; more complex to interpret; needs high numbers; the raw data should also be reported to interpret variation

  7. Reliability • A measure CANNOT be valid if it is NOT reliable • However, a measure CAN BE reliable but NOT valid

  8. Reliability • Observed score = True score + Error score • The true score is hard to evaluate, but we can estimate the error score
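In classical test theory terms (a standard elaboration of this slide, not spelled out on it), reliability is the proportion of observed-score variance attributable to true scores:

```latex
X = T + E, \qquad
r_{XX} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```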

  9. Sources of Error: The Participants

  10. Sources of Error: The Testing • Poor directions • Additional motivation • Inconsistent protocol

  11. Sources of Error: The Scoring • The scorers • Type of scoring system

  12. Sources of Error: The Instrumentation • Calibration • Inaccuracies • Sensitivity

  13. Statistical techniques • Pearson's r • ICC • Limits of agreement • Cronbach's alpha • Kappa statistic • Weighted kappa statistic

  14. Pearson's r: Weaknesses • Bivariate • Limited to two variables • Does not consider differences in variance • Only measures association, not agreement (illustrated below) • Not really appropriate for reliability
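To make the association-versus-agreement point concrete, here is a small illustration with hypothetical numbers (Python/NumPy, not part of the original slides): two raters whose scores differ by a constant never agree, yet their correlation is perfect.

```python
import numpy as np

# Hypothetical scores: rater B is always 10 units higher than rater A,
# so the raters never agree, yet the scores are perfectly correlated.
rater_a = np.array([12.0, 15.0, 18.0, 21.0, 24.0])
rater_b = rater_a + 10.0

r = np.corrcoef(rater_a, rater_b)[0, 1]
print(r)  # 1.0 - perfect association despite a constant bias of 10
```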

  15. Intra-class correlation (ICC) • Strengths: univariate; allows for unequal cell numbers; value from -1 to +1; allows any number of raters or subjects • Weaknesses: has several formulae; does not imply usefulness; ratios can be difficult to compare; between-subject variation should reflect the population

  16. Calculation • Variance between (due to) repeated trials • Variance between (due to) repeated observers/observations • Variances come from the ANOVA model as mean squares

  17. Shrout and Fleiss formulae • Case 1: each subject is rated by a different set of k raters randomly selected from a larger population of raters • Case 2: a random sample of k raters, selected from a larger population of raters, rates each subject • Case 3: each subject is rated by k raters who are the only raters of interest
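For reference, the single-measure forms of the three cases (the standard Shrout and Fleiss results, summarised here rather than taken from the slide) are, writing BMS = between-subjects mean square, WMS = within-subjects mean square, JMS = between-raters mean square, EMS = residual mean square, with n subjects and k raters:

```latex
ICC(1,1) = \frac{BMS - WMS}{BMS + (k-1)\,WMS}, \qquad
ICC(2,1) = \frac{BMS - EMS}{BMS + (k-1)\,EMS + k\,(JMS - EMS)/n}, \qquad
ICC(3,1) = \frac{BMS - EMS}{BMS + (k-1)\,EMS}
```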

  18. • Cases (1,1), (2,1) & (3,1) are used when the unit of measurement is a single measurement • Cases (1,k), (2,k) & (3,k) are used when the unit of measurement is the mean of more than one measurement
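The average-measures forms (1,k), (2,k) and (3,k) follow from the corresponding single-measure forms via the Spearman-Brown relationship (a standard result, not stated on the slide):

```latex
ICC(\cdot,k) = \frac{k \cdot ICC(\cdot,1)}{1 + (k-1) \cdot ICC(\cdot,1)}
```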

  19. How to calculate • Use equations and values obtained from ANOVAs (Rankin and Stokes, 1998) • Use macros downloaded from SPSS.com (these may not work with all versions of SPSS)
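Where the SPSS macros are unavailable, the mean squares can also be computed directly from the data. Below is a minimal Python sketch of ICC(2,1) using the Shrout and Fleiss formula given earlier; the function name `icc_2_1` and the data are hypothetical, and a complete subjects x raters table with no missing values is assumed.

```python
import numpy as np

def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    data: n subjects x k raters array with no missing values.
    Mean squares come from a two-way ANOVA without replication.
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ss_subjects = k * ((data.mean(axis=1) - grand) ** 2).sum()
    ss_raters = n * ((data.mean(axis=0) - grand) ** 2).sum()
    ss_error = ((data - grand) ** 2).sum() - ss_subjects - ss_raters
    bms = ss_subjects / (n - 1)             # between-subjects mean square
    jms = ss_raters / (k - 1)               # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

# Hypothetical example: 5 subjects each rated by 3 raters
scores = [[9, 2, 5], [6, 1, 3], [8, 4, 6], [7, 1, 2], [10, 5, 6]]
print(icc_2_1(scores))
```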

  20. Cronbach's Alpha • Generalised measure of reliability • Easy to interpret • Similar to the intraclass correlation
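A minimal sketch of the standard alpha calculation (Python; the function name `cronbach_alpha` and the data are hypothetical), for scores laid out as respondents x items:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an n respondents x k items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_vars = items.var(axis=0, ddof=1).sum()  # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of total scores
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

# Hypothetical example: 4 respondents answering 3 items
scores = [[3, 4, 3], [5, 5, 4], [2, 3, 2], [4, 4, 5]]
print(cronbach_alpha(scores))
```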

  21. Kappa statistics • Kappa statistic - for nominal data • Weighted kappa statistic - for ordinal data
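A minimal sketch of the unweighted kappa for two raters' nominal codes (Python; the function name `cohens_kappa` and the data are hypothetical; the weighted variant additionally requires a weight matrix over the categories):

```python
import numpy as np

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' nominal codes."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    p_observed = (a == b).mean()
    # Chance agreement from each rater's marginal proportions
    categories = np.union1d(a, b)
    p_chance = sum((a == c).mean() * (b == c).mean() for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical example: two raters classifying 6 cases into 3 categories
print(cohens_kappa([1, 2, 2, 1, 3, 3], [1, 2, 1, 1, 3, 2]))
```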

  22. Generating ICCs - you need • The correct macro • Data laid out appropriately • Two lines of syntax to run the macros • All files resident in the same directory

  23. References • Sim J (1993) Measurement validity in physical therapy research. Physical Therapy, 73 (2); 48-55. • Rankin G, Stokes M (1998) Reliability of assessment tools in rehabilitation: an illustration of appropriate statistical analyses. Clinical Rehabilitation, 12; 187. • Bland JM, Altman DG (1986) Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, Feb 8; 307-310. • Krebs DE (1984) Intraclass correlation coefficients: use and calculation. Physical Therapy, 64 (10); 1581-1582. • Thomas JR, Nelson JK (2001) Research Methods in Physical Activity, 4th Ed. Human Kinetics, Leeds. • George K, Batterham A, Sullivan I (2000) Validity in clinical research: a review of basic concepts and definitions. Physical Therapy in Sport, 1; 19-27.

  24. More references • Eliasziw M, Young SL, Woodbury MG, Fryday-Field K (1994) Statistical methodology for the concurrent assessment of interrater and intrarater reliability: using goniometric measurements as an example. Physical Therapy, 74 (8); 777-788. • Keating J, Matyas T (1998) Unreliable inferences from reliable measurements. Australian Journal of Physiotherapy, 44 (1); 5-10. • Greenfield MLH, Kuhn JE, Wojtys EM (1998) Validity and reliability. American Journal of Sports Medicine, 26 (3); 483-485. • Batterham AM, George KP (2000) Reliability in evidence-based clinical practice: a primer for allied health professionals. Physical Therapy in Sport, 1; 54-61.
