
URBDP 591 A Lecture 13: Measurement Reliability and Validity






Presentation Transcript


  1. URBDP 591 A Lecture 13: Measurement Reliability and Validity Objectives • True Score • Measurement Error • Types of Reliability • Relationship between Reliability and Validity

  2. Measurement Reliability and Validity Reliability The extent to which the variables are free from random error, usually determined by measuring the variables more than once. Validity The extent to which a measured variable actually measures the conceptual variable that it is designed to assess, i.e., the extent to which it reflects that conceptual variable rather than other measured variables

  3. The Classical Theory of Measurement • Reliability (=precision) is the extent to which measurement is consistent or reproducible • Validity (=accuracy) is the extent to which what is measured is what the investigator wants to measure

  4. What is Reliability? • the “repeatability” of a measure • the “consistency” of a measure • the “dependability” of a measure

  5. Components of an Observed Score Observed Score = True Score + Measurement Error • The best approximation to the true score is obtained by making multiple independent observations and averaging the results • Reliability of measurement is increased (and error decreased) by increasing the number of observations • Note that the true score is not all its name implies
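To make the decomposition concrete, here is a minimal sketch in Python (NumPy) using made-up numbers (a hypothetical true score of 50 and an assumed error spread). It shows that the average of repeated independent observations drifts toward the true score as the number of observations grows:

```python
import numpy as np

rng = np.random.default_rng(0)

true_score = 50.0   # hypothetical true value of the attribute
error_sd = 5.0      # assumed spread of the random measurement error

def observe(n):
    # Each observation is the true score plus independent random error.
    return true_score + rng.normal(0.0, error_sd, size=n)

for n in (1, 5, 25, 100):
    obs = observe(n)
    print(f"n = {n:3d}  mean of observations = {obs.mean():.2f}")
# The mean approaches 50 as n grows: more observations, less error.
```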

  6. Reliability and Validity Writing the observed score as X_O = X_T + E_S + E_R (true score plus systematic and random error): when a measure is VALID, E_S + E_R = 0, and X_O = X_T; when a measure is RELIABLE, E_R = 0, and X_O = X_T + E_S. RELIABILITY is a necessary but not a sufficient condition for VALIDITY

  7. Accuracy and Precision Accuracy is the degree to which the observed value matches the true value. Accuracy is an issue pertaining to the quality of data and the number of errors contained in a data set. Precision refers to the level of measurement and exactness of description. Precision is the extent to which a value can be reproduced.

  8. Accuracy and Precision [Figure: panels illustrating Accuracy, Precision, and Accuracy and Precision]

  9. If a Measure is Reliable... We expect that a measure taken twice gives similar results: X1 ≈ X2

  10. If a Measure is Reliable... The only thing common to the two measures, X1 = T + e1 and X2 = T + e2, is the true score, T. The true score must determine the reliability.

  11. Reliability is... a ratio: the variability due to the true level on the measure divided by the variability of the entire measure, i.e., reliability = var(T) / var(X)
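A small simulation sketch (Python/NumPy, with assumed values for the true-score and error variability) illustrating the ratio var(T) / var(X); with a true-score SD of 10 and an error SD of 5, the expected reliability is 100 / 125 = 0.80:

```python
import numpy as np

rng = np.random.default_rng(0)

n_people = 10_000
true_sd, error_sd = 10.0, 5.0                 # assumed population parameters

T = rng.normal(100.0, true_sd, n_people)      # true scores
X = T + rng.normal(0.0, error_sd, n_people)   # observed = true + random error

reliability = T.var() / X.var()
print(f"var(T)/var(X) = {reliability:.2f}")
# Expected value: true_sd**2 / (true_sd**2 + error_sd**2) = 100 / 125 = 0.80
```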

  12. What is Random Error? • any factors that randomly affect measurement of the variable across the sample • for instance, the natural variability in the occurrence of a phenomenon can generate different reflectance values • random error adds variability to the data but does not affect average values for the overall sample

  13. Random Error [Figure: frequency distribution of X with no random error]

  14. Random Error [Figure: frequency distributions of X with and without random error]

  15. Systematic Error [Figure: frequency distribution of X with no systematic error]

  16. Systematic Error [Figure: frequency distributions of X with and without systematic error]

  17. Sources of Error • Precision error (“random error”) leads to poor precision • Bias error (“systematic error”) leads to poor accuracy

  18. Reducing Measurement Error • pilot test your instruments -- get feedback from respondents • train your interviewers or observers • make observation/measurement as unobtrusive as possible • double-check your data • triangulate across several measures that might have different biases

  19. Measurement Error and Confirmatory Studies • In confirmatory studies, random errors will not bias estimations of group means • Random errors will inflate standard errors of the mean and confidence intervals, and in effect diminish the power of confirmatory studies
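A minimal sketch of this point (Python/NumPy, illustrative numbers only): adding random measurement error leaves the sample mean essentially unbiased but inflates the standard error of the mean, and hence the confidence interval.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 200
group_mean, between_sd, error_sd = 50.0, 8.0, 6.0   # assumed illustrative values

true_values = rng.normal(group_mean, between_sd, n)        # error-free measurements
noisy_values = true_values + rng.normal(0.0, error_sd, n)  # plus random measurement error

for label, x in (("without measurement error", true_values),
                 ("with measurement error", noisy_values)):
    se = x.std(ddof=1) / np.sqrt(n)
    print(f"{label:26s} mean = {x.mean():5.2f}  SE of mean = {se:.3f}")
# Both sample means estimate the same group mean (about 50), but the standard
# error is larger when random measurement error is present.
```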

  20. Two Reliability Estimates • Test-Retest • more conservative • lower-bound estimate • Internal Consistency (Cronbach’s alpha) • not affected by change over time • upper-bound estimate

  21. Reliability: Consistency of What? • observers or raters • tests over time • different versions of the same test • a test at one point in time

  22. Interrater or Interobserver Reliability [Diagram: observer 1 and observer 2 each rate the same object or phenomenon; do their ratings agree?]

  23. Interrater or Interobserver Reliability • are different observers consistent? • can establish this outside of the study, in a pilot study • can look at percent of agreement (especially with category ratings) • can use correlation (with continuous ratings)
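A short sketch of both options in Python, using hypothetical ratings from two observers: percent agreement for categorical ratings and a Pearson correlation for continuous ratings.

```python
import numpy as np

# Hypothetical pilot data: two observers classify the same 8 sites.
rater1_cat = ["urban", "forest", "urban", "water", "forest", "urban", "water", "forest"]
rater2_cat = ["urban", "forest", "water", "water", "forest", "urban", "water", "urban"]

agree = sum(a == b for a, b in zip(rater1_cat, rater2_cat))
print(f"percent agreement = {100 * agree / len(rater1_cat):.1f}%")

# Hypothetical continuous ratings from the same two observers.
rater1_num = np.array([3.2, 4.1, 2.8, 5.0, 3.9, 4.4])
rater2_num = np.array([3.0, 4.3, 3.1, 4.8, 3.7, 4.6])
r = np.corrcoef(rater1_num, rater2_num)[0, 1]
print(f"interrater correlation r = {r:.2f}")
```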

  24. Test-Retest Reliability • administer the instrument at two times to multiple persons • compute the correlation between the two measures • assumes there is no change in the underlying trait between time 1 and time 2
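A minimal test-retest sketch in Python with hypothetical scores for six respondents; the correlation between the two administrations serves as the reliability estimate.

```python
import numpy as np

# Hypothetical scores for the same 6 respondents at time 1 and time 2.
time1 = np.array([12, 18, 25, 30, 22, 15])
time2 = np.array([14, 17, 27, 29, 21, 16])

r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest reliability r = {r_test_retest:.2f}")
# Interpreting r as reliability assumes the underlying trait did not
# change between the two administrations.
```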

  25. Parallel-Forms Reliability • administer both forms to the same people • get correlation between the two forms • usually done in educational contexts where we need alternative forms because of the frequency of retesting and where you can sample from lots of equivalent questions

  26. Internal Consistency Reliability [Diagram: items 1–6 combine into a single test] Average inter-item correlation, computed from the inter-item correlation matrix:

        I1    I2    I3    I4    I5    I6
  I1  1.00
  I2   .89  1.00
  I3   .91   .92  1.00
  I4   .88   .93   .95  1.00
  I5   .84   .86   .92   .85  1.00
  I6   .88   .91   .95   .87   .85  1.00

  27. Internal Consistency Reliability: Cronbach’s alpha [Diagram: the six-item test is repeatedly split into two halves, e.g., items 1, 3, 4 vs. items 2, 5, 6, and the correlation between the halves is computed] Split-half correlations: SH1 .87, SH2 .85, SH3 .91, SH4 .83, SH5 .86, ..., SHn .85; Cronbach’s alpha = .85, the average of all possible split-half correlations
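For reference, a small Python sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), applied to a hypothetical data set of 5 respondents and 4 items:

```python
import numpy as np

def cronbach_alpha(items):
    """items: 2-D array, rows = respondents, columns = scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses of 5 people to a 4-item scale.
data = [[4, 5, 4, 4],
        [2, 3, 2, 3],
        [5, 5, 4, 5],
        [3, 3, 3, 2],
        [4, 4, 5, 4]]
print(f"alpha = {cronbach_alpha(data):.2f}")
```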

  28. Validity and Reliability Validity of a Measurement Instrument: Differences in the “values” assigned to the objects reflect true differences among the objects with respect to the characteristic being measured Predictive (Pragmatic) Validity: The extent to which the data are useful as a predictor of some related characteristic being measured. Content (Face) Validity: The extent to which the “domain” of the characteristic is “sampled” by the measure. Construct Validity: The extent to which the process in fact measures the construct of interest. Reliability of a Measurement Instrument: The extent to which an instrument produces consistent results over time and across independent but comparable measures.

  29. Accuracy Terms • errors of omission (the complement of the map producer's accuracy) = incorrect in column / total in column. Producer's accuracy measures how well the map maker was able to represent the ground features • errors of commission (the complement of the map user's accuracy) = incorrect in row / total in row. User's accuracy measures how likely the map user is to encounter correct information while using the map

  30. Accuracy Assessment • Overall Accuracy: total number of correctly classified elements divided by the total number of reference elements • Accuracies of Individual Categories • Producer’s Accuracy: number of correctly classified elements divided by the reference elements for that category (omission) • User’s Accuracy: correctly classified elements in each category divided by the total elements that were classified in that category (commission)
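A sketch of these computations in Python, using a hypothetical 3-class error matrix (rows = classified map, columns = reference data):

```python
import numpy as np

# Hypothetical error matrix: rows = map (classified), columns = reference.
classes = ["urban", "forest", "water"]
matrix = np.array([[45,  4,  1],
                   [ 6, 38,  2],
                   [ 2,  3, 49]])

overall = np.trace(matrix) / matrix.sum()            # correct / total elements

producers = matrix.diagonal() / matrix.sum(axis=0)   # correct / column (reference) totals
users     = matrix.diagonal() / matrix.sum(axis=1)   # correct / row (classified) totals

print(f"overall accuracy = {overall:.2f}")
for c, p, u in zip(classes, producers, users):
    print(f"{c:6s} producer's = {p:.2f}  user's = {u:.2f}")
```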

  31. What is Cohen’s Kappa • A measure of agreement that compares the observed agreement to the agreement expected by chance if the observer ratings were independent • Expresses the proportionate reduction in error generated by a classification process, compared with the error of a completely random classification • For perfect agreement, kappa = 1 • A value of .82 would imply that the classification process avoids 82% of the errors that a completely random classification would generate
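And a matching sketch of Cohen's kappa in Python, computed from the same kind of hypothetical confusion matrix by comparing observed agreement to chance agreement:

```python
import numpy as np

def cohens_kappa(matrix):
    """Kappa from a square confusion/agreement matrix."""
    matrix = np.asarray(matrix, dtype=float)
    n = matrix.sum()
    p_observed = np.trace(matrix) / n
    # Chance agreement: product of matching row and column totals, summed.
    p_chance = (matrix.sum(axis=0) * matrix.sum(axis=1)).sum() / n**2
    return (p_observed - p_chance) / (1 - p_chance)

matrix = [[45,  4,  1],
          [ 6, 38,  2],
          [ 2,  3, 49]]
print(f"kappa = {cohens_kappa(matrix):.2f}")
# kappa = 1 means perfect agreement; 0 means agreement no better than chance.
```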

  32. Reliability and Validity How the relation between two measures (each a concept assessed by a method) is interpreted:

                       Same concept    Different concept
  Same method          reliability     discriminant
  Different method     convergent      very discriminant

  33. Validity Construct Validity The extent to which a measured variable actually measures the conceptual variable (that is, the construct) that it is designed to assess. Criterion Validity The extent to which a self-report measure correlates with a behavioral measured variable. Face Validity The extent to which the measured variable appears to be an adequate measure of the conceptual variable. Content Validity The degree to which the measured variable appears to have adequately sampled from the potential domain of questions that might relate to the conceptual variable of interest.

  34. Convergent Validity The extent to which a measured variable is found to be related to other measured variables designed to measure the same conceptual variable. Discriminant Validity The extent to which a measured variable is found to be unrelated to other measured variables designed to measure different conceptual variables. Predictive Validity The extent to which the measurement can predict the future. Concurrent Validity The extent to which the self-report measure correlates with a behavioral measure that is assessed at the same time.

  35. [Diagram: overview of the validity types over time. Content validity: the items/scales sample the domain of the conceptual variables. Face validity: the measured variables appear to reflect the conceptual variables. Convergent validity: the items/scales relate to similar items/scales. Discriminant validity: they are unrelated to other items/scales. Concurrent validity: the self-report measured variables correlate with behavioral measured variables assessed at the same time. Predictive validity: the measures predict future behaviors.]
