Reliability and Validity of Measurement


Presentation Transcript


  1. Reliability and Validity of Measurement RCS 6740 6/28/04

  2. Validity of Measurement • Validity of measurement refers to the extent to which a test accurately measures what it is intended to measure. The three major types of validity are content, construct, and criterion-related.

  3. Validity of Measurement Cont. • Validity has a different meaning in the context of measurement than in the context of research design (Pedhazur & Schmelkin, 1991). • In measurement, “validity, or rather validation, refers not to a measure in question but to inferences made on the bases of scores obtained on it” (Pedhazur & Schmelkin, 1991, p. 31).

  4. Validity of Measurement Cont. • Validity is a unitary concept. The widely used types of validity are not mutually exclusive. • To borrow the parable of the blind men and the elephant: if validity is the elephant, the types of validity refer to different ways of measuring the elephant, or perhaps to its different facets.

  5. Validity of Measurement Cont. • Much in the way that we use multiple measures to operationalize constructs, we also have a variety of ways to measure validity. • The most commonly used types of validity are content, criterion, and construct. • Pedhazur and Schmelkin (1991) point out that although the content of a measure is important, content validity is not validity in the sense of the definition presented above, and the focus should be on the construct being measured.

  6. Types of Measurement Validity Content Validity • The test is a representative sample of performance in some defined area of job-related knowledge, skill, ability, or other characteristic. The extent to which a test adequately samples the domain of information, knowledge, or skill that it purports to measure. Most important for achievement and job sample tests.

  7. Types of Validity Cont. Criterion-related Validity • The test is shown to be statistically related to some criterion of successful job performance. The type of validity that involves determining the relationship (correlation) between the predictor and the criterion. The correlation coefficient is referred to as the criterion-related validity coefficient. Criterion-related validity can be either concurrent or predictive.
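
A minimal sketch of that computation in Python: correlate hypothetical predictor scores with hypothetical criterion ratings, and the resulting Pearson r is the criterion-related validity coefficient. All values below are invented for illustration.

```python
import numpy as np

# Hypothetical data: a selection test (predictor) and supervisor
# performance ratings (criterion) for the same eight people.
test_scores = np.array([72, 85, 90, 65, 78, 88, 70, 95])
performance = np.array([3.1, 4.0, 4.2, 2.8, 3.5, 4.1, 3.0, 4.6])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# element is the criterion-related validity coefficient.
r = np.corrcoef(test_scores, performance)[0, 1]
print(f"criterion-related validity coefficient r = {r:.2f}")
```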

  8. Types of Validity Cont. Construct Validity • The test is demonstrated to be a measure of a job-relevant characteristic (e.g., reasoning ability). The extent to which a test measures the hypothetical trait (construct) it is intended to measure. • Methods for establishing construct validity include correlating test scores with scores on measures that do and do not measure the same trait (convergent and discriminant validity); conducting a factor analysis to assess the test’s factorial validity; determining if changes in test scores reflect expected developmental changes; and seeing if experimental manipulations have the expected impact on test scores.

  9. Criterion-Related Validity • Criterion-related validation focuses on prediction, the overriding concern being the degree of successful prediction of a criterion, regardless of whether or not it is possible to explain the process or processes leading to the phenomenon that is being predicted (Pedhazur & Schmelkin, 1991, p. 32).

  10. Criterion-Related Validity Cont. Criterion-Related Validation Approaches • A criterion is any variable (e.g., academic achievement, voting, aggression, productivity, drug use, absenteeism, delinquency) one wishes to explain and/or predict by resorting to information from another variable(s) (Pedhazur & Schmelkin, 1991, p. 32).

  11. Criterion-Related Validity Cont. Criterion-Related Validation Approaches • Predictive validity: the test is administered first and the criterion is measured at a later point (e.g., admissions test scores used to forecast subsequent grades). • Concurrent validity: predictor and criterion scores are collected at roughly the same time. • Regression equations are common tools in criterion-related validation, as in the sketch below.
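
A sketch of the regression approach, assuming made-up validation data: fit a simple linear regression of the criterion on the predictor, then use the resulting equation to predict the criterion for a new examinee (the predictive-validity use case).

```python
import numpy as np

# Hypothetical validation sample (all values invented).
predictor = np.array([55, 62, 70, 58, 80, 75, 66, 90], dtype=float)
criterion = np.array([2.4, 2.9, 3.2, 2.6, 3.8, 3.5, 3.0, 4.3])

# Least-squares fit of a degree-1 polynomial, i.e., simple linear regression.
slope, intercept = np.polyfit(predictor, criterion, 1)
print(f"prediction equation: criterion' = {intercept:.2f} + {slope:.3f} * score")

# Predict the criterion for a new examinee's test score.
new_score = 68
print(f"predicted criterion for score {new_score}: {intercept + slope * new_score:.2f}")
```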

  12. Construct Validity • Construct validation is concerned with validity of inferences about unobserved variables (the constructs) on the basis of observed variables (their presumed indicators) (Pedhazur & Schmelkin, 1991, p. 52). • A construct derives its meaning and relevance from the theoretical context, the nomological network, within which it is embedded (Pedhazur & Schmelkin, 1991, p. 69).

  13. Construct Validity Cont. Construct Validation Approaches • Construct validation is a never-ending enterprise. Gains, or losses, in credibility of inferences made on the bases of responses to (or status on) indicators of a construct depend on the nature and quality of the accumulated evidence involving the construct under consideration. • Because tests of hypotheses have a bearing on its validation, it is clear that the approaches to validation are limited only by the researcher’s imagination and acumen, and the theoretical formulation and expectations regarding the construct under consideration (Pedhazur & Schmelkin, 1991, p. 59).

  14. Construct Validity Cont. • Pedhazur and Schmelkin (1991) group approaches to construct validation under the following topics: • (a) logical analysis • (b) internal-structure analysis • (c) cross-structure analysis • Following is a summary of their points:

  15. Construct Validity Cont. Logical Analysis • Is the construct well defined? • Is the definition of the construct grounded in knowledge of theories and relevant research findings? • Are the items consistent with the definition of the construct? • To what extent are the obtained scores adversely influenced by specific measurement procedures used (e.g., X-rays of the elephant)?

  16. Construct Validity Cont. Internal Structure Analysis • If appropriate, was factor analysis used? • Factor analysis: an approach that, like cluster analysis, identifies relationships without using an outcome (dependent) variable. • By grouping related characteristics, factor analysis reveals unobserved "dimensions" that underlie a larger number of observed variables. This technique can either identify a subset of variables to represent these dimensions, or derive new variables that are composites of the original variables associated with each dimension. • In either case, subsequent analyses (e.g., regression or cluster) can benefit from variable reduction; see the sketch below.
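
A minimal factor-analysis sketch, assuming scikit-learn is available. Six simulated items are generated from two latent dimensions; the estimated loadings show which items group together. The data and the two-factor structure are invented for illustration.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 500
verbal = rng.normal(size=n)    # latent dimension 1
spatial = rng.normal(size=n)   # latent dimension 2

# Items 0-2 reflect "verbal" and items 3-5 reflect "spatial", plus noise.
items = np.column_stack(
    [verbal + rng.normal(scale=0.5, size=n) for _ in range(3)]
    + [spatial + rng.normal(scale=0.5, size=n) for _ in range(3)]
)

fa = FactorAnalysis(n_components=2).fit(items)
# Loadings: rows = items, columns = factors; each item should load
# strongly on one factor and weakly on the other.
print(np.round(fa.components_.T, 2))
```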

  17. Construct Validity Cont. Cross-Structure Analysis • Convergent validity refers to the convergence among different methods (preferably maximally different ones) designed to measure the same construct (Pedhazur & Schmelkin, 1991, p. 74). • Discriminant validity refers to the distinctiveness of constructs (Pedhazur & Schmelkin, 1991, p. 74). • Measures of similar constructs should have reasonably high correlations, whereas measures of different constructs should not be highly correlated.
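
A small illustration of that expectation, using simulated, hypothetical measures: two indicators of the same construct should correlate highly (convergent evidence), while an indicator of an unrelated construct should not (discriminant evidence).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
anxiety = rng.normal(size=n)                              # latent construct
anxiety_self_report = anxiety + rng.normal(scale=0.4, size=n)
anxiety_clinician = anxiety + rng.normal(scale=0.4, size=n)
math_skill = rng.normal(size=n)                           # different construct

r = np.corrcoef([anxiety_self_report, anxiety_clinician, math_skill])
print(f"convergent:   r(self-report, clinician) = {r[0, 1]:.2f}")  # high
print(f"discriminant: r(self-report, math)      = {r[0, 2]:.2f}")  # near zero
```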

  18. Reliability of Measurement • Reliability of measurement refers to the degree to which the results of an assessment are dependable and consistent. • Reliability is an indication of the consistency of scores over time, between scorers, or across different tasks or items that measure the same thing. • If scores from an assessment are unreliable, interpretations based on those scores, and the decisions that follow from them, will not be valid.

  19. Reliability of Measurement Cont. • In the sense of measurement, the reliability of a measure is the “ratio of true-score variance to observed-score variance” (Pedhazur & Schmelkin, 1991, p. 85). • The same instrument may be more or less reliable in different populations. Therefore, “the relevant reliability estimate is the one obtained for the sample used in the study under consideration” (Pedhazur & Schmelkin, 1991, p. 86).
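
That ratio can be made concrete with simulated data: generate true scores, add random error to form observed scores, and compute the variance ratio. The distribution parameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
true_scores = rng.normal(loc=100, scale=15, size=10_000)  # sigma_T = 15
error = rng.normal(loc=0, scale=5, size=10_000)           # sigma_E = 5
observed = true_scores + error

reliability = true_scores.var() / observed.var()
# Theoretical value: 15**2 / (15**2 + 5**2) = 0.90
print(f"reliability = {reliability:.3f}")
```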

  20. Reliability of Measurement Cont. • An important thing to remember is that different formulations of reliability imply different treatment of error variance. Thus, we cannot talk about reliability without talking about the formulation.

  21. Reliability of Measurement Cont. Test-Retest Reliability • Test-retest reliability: individuals take the test of interest and then take the same test again at a later date, and the two sets of scores are compared. The closer the scores, the more reliable the test. Reliability is an important factor in testing because it paves the way for accuracy. • Note that test-retest reliability should be used with caution, because “a low test-retest correlation, for example, may be indicative of a measure with low reliability, of true changes in the individuals measured, or a combination of both” (Pedhazur & Schmelkin, 1991, p. 89).
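
A minimal test-retest sketch with invented scores: correlate the two administrations. A high r suggests stable measurement, though, per the caution above, a low r may also reflect true change in the individuals.

```python
import numpy as np

time1 = np.array([88, 92, 75, 60, 81, 70, 95, 84])  # first administration
time2 = np.array([90, 89, 78, 63, 80, 72, 93, 86])  # retest at a later date

r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r_test_retest:.2f}")
```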

  22. Reliability of Measurement Cont. Equivalent Forms • Equivalent-forms reliability gauges the consistency of measurement based on the correlation between scores on two similar forms of the same test taken by the same individuals. • This method also has problems, because the coefficients of equivalence reflect both the extent of equivalence of the forms and the actual reliability.

  23. Reliability of Measurement Cont. Internal Consistency • Internal consistency is a method of estimating whether different parts of a test are measuring the same variable. One of the most commonly used reliability coefficients for internal consistency is Cronbach's alpha (α). • This type of reliability tells us the extent to which the items in a scale measure the same phenomenon and the extent to which the scale hangs together. Two methods are split-half and coefficient alpha; alpha is preferred.
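
A sketch of coefficient alpha computed directly from its definition, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), on a made-up matrix of 5 respondents by 4 items.

```python
import numpy as np

# Rows = respondents, columns = items (hypothetical Likert responses).
items = np.array([
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 5],
], dtype=float)

k = items.shape[1]
sum_item_variances = items.var(axis=0, ddof=1).sum()
total_score_variance = items.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - sum_item_variances / total_score_variance)
print(f"Cronbach's alpha = {alpha:.2f}")
```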

  24. Reliability of Observation • As Pedhazur and Schmelkin (1991) suggest, there are multiple potential sources of error in observational data. • The simplest is addressed through inter-observer agreement: the percentage of agreements with respect to the total number of decisions. • Tinsley and Weiss (1975) suggest multiple types of computation to examine the agreement and reliability among observers. • Generalizability theory (Shavelson, Webb, & Rowley, 1992) is one approach to evaluating the multiple sources of error in such data.
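
A sketch of the simple percent-agreement computation for two hypothetical observers coding the same six observation intervals.

```python
# Hypothetical behavior codes assigned by two independent observers.
observer_a = ["on-task", "off-task", "on-task", "on-task", "off-task", "on-task"]
observer_b = ["on-task", "off-task", "on-task", "off-task", "off-task", "on-task"]

agreements = sum(a == b for a, b in zip(observer_a, observer_b))
percent_agreement = 100 * agreements / len(observer_a)
print(f"inter-observer agreement = {percent_agreement:.0f}%")  # 5/6, about 83%
```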
