Validity and Reliability • Define different types of validity • Define reliability
Criteria for Good Instruments Validity • Validity refers to the degree to which a test measures what it is supposed to measure. • Validity is the most important test characteristic.
Criteria for Good Instruments • There are several established types of validity: • Content validity • Criterion-related validity (concurrent and predictive) • Construct validity
Content Validity • Content validity addresses whether the test measures the intended content area; it is sometimes loosely called face validity. • Content validity is the extent to which the test questions are representative of all the questions that could be asked about that content. • Content validity is established through expert assessment and judgment (content validation).
Content Validity • Content validity is concerned with both: • Item validity: Do the individual items measure the intended content? • Sampling validity: Do the items, taken together, cover the full range of the content area being tested? • One example of a lack of content validity is a math test with heavy reading requirements. It may measure not only math but also reading ability and is therefore not a valid math test.
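Expert judgments from content validation are sometimes summarized numerically. One common index (not mentioned on the slides above) is Lawshe's content validity ratio; the sketch below uses hypothetical panel counts to show how it is computed.

```python
# Sketch: Lawshe's content validity ratio (CVR) for expert item ratings.
# CVR = (n_e - N/2) / (N/2), where n_e = experts rating the item "essential"
# and N = total number of experts. Panel data below are hypothetical.

def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Return Lawshe's CVR for one item; ranges from -1 to +1."""
    half = n_experts / 2
    return (n_essential - half) / half

# Hypothetical ratings: number of experts (out of 10) calling each item essential.
essential_counts = {"item_1": 9, "item_2": 6, "item_3": 3}

for item, n_e in essential_counts.items():
    print(item, round(content_validity_ratio(n_e, 10), 2))
# Items with CVR near +1 are judged representative of the content domain;
# low or negative values flag items for revision or removal.
```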
Criterion-Related Validity • Criterion-related validity is determined by relating performance on a test to performance on an alternative test or other measure (the criterion).
Criterion-Related Validity • The two types of criterion-related validity are: • Concurrent: The scores on a test are correlated with scores on an alternative test given at the same time (e.g., two measures of reading achievement). • Predictive: The degree to which a test can predict how well a person will do in a future situation (e.g., the GRE, where the predictor is the GRE score and the criterion is success in graduate school).
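Because criterion-related validity is reported as a correlation between test scores and a criterion measure, it can be sketched in a few lines of Python. The GRE and graduate GPA values below are invented purely for illustration.

```python
# Sketch: predictive validity as a correlation between a predictor (test score)
# and a later criterion. Scores below are fabricated for illustration.
import numpy as np

gre_scores = np.array([310, 325, 300, 335, 315, 320, 305, 330])  # predictor
grad_gpa   = np.array([3.2, 3.7, 3.0, 3.9, 3.4, 3.6, 3.1, 3.8])  # criterion

# Pearson correlation: the validity coefficient.
validity_coefficient = np.corrcoef(gre_scores, grad_gpa)[0, 1]
print(f"Predictive validity coefficient: {validity_coefficient:.2f}")

# For concurrent validity, the criterion would simply be scores on an
# alternative test administered at (about) the same time.
```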
Construct Validity • Construct validity is considered the most important form of validity. • Construct validity assesses whether the test actually measures the intended construct and whether the results are meaningful and useful. • Construct validity is very challenging to establish.
Construct Validity • Construct validity requires confirmatory and disconfirmatory evidence. • Scores on tests should relate to scores on similar tests and NOT relate to scores on tests of other constructs. • For example, scores on a math test should be more highly correlated with scores on another math test than they are to scores from a reading test.
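The math/reading example above can be illustrated directly: construct validity is supported when the convergent correlation (two math tests) clearly exceeds the discriminant correlation (math vs. reading). The scores below are hypothetical.

```python
# Sketch: convergent vs. discriminant evidence for construct validity.
# Hypothetical scores for the same students on two math tests and a reading test.
import numpy as np

math_a  = np.array([55, 62, 70, 48, 80, 66, 59, 73])
math_b  = np.array([58, 60, 74, 50, 78, 68, 61, 70])   # similar construct
reading = np.array([72, 50, 65, 60, 55, 75, 68, 58])   # different construct

convergent   = np.corrcoef(math_a, math_b)[0, 1]    # expected to be high
discriminant = np.corrcoef(math_a, reading)[0, 1]   # expected to be much lower

print(f"math A vs. math B:  r = {convergent:.2f}")
print(f"math A vs. reading: r = {discriminant:.2f}")
# Construct validity is supported when the convergent correlation clearly
# exceeds the discriminant one.
```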
Validity • Some factors that threaten validity: • Unclear directions • Confusing or unclear items • Vocabulary or required reading ability too difficult for test takers • Subjective scoring • Cheating • Errors in administration
Reliability • Reliability refers to the consistency with which an instrument measures a construct. • Reliability is expressed as a reliability coefficient based on a correlation. • Reliability coefficients should be reported for all measures. • Reliability affects validity. • There are several forms of reliability.
Reliability • Test-Retest (Stability) reliability measures the stability of scores over time. • To assess test-retest reliability, a test is given to the same group twice and a correlation is computed between the two sets of scores. • This correlation is referred to as the Coefficient of Stability.
Reliability • Alternate forms (Equivalence) reliability measures the relationship between two versions of a test that are intended to be equivalent. • To assess alternate forms reliability, both tests are given to the same group and the scores on each test are correlated. • The correlation is referred to as the Coefficient of Equivalence.
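Both the Coefficient of Stability and the Coefficient of Equivalence described above are simply Pearson correlations between two sets of scores from the same examinees; a minimal sketch with invented scores follows.

```python
# Sketch: Coefficient of Stability (test-retest) or Coefficient of Equivalence
# (alternate forms). Both are Pearson correlations between two score sets
# obtained from the same group. Scores below are invented for illustration.
import numpy as np

administration_1 = np.array([78, 85, 62, 90, 71, 68, 95, 74])  # first testing / Form A
administration_2 = np.array([80, 83, 65, 88, 70, 72, 93, 76])  # retest / Form B

reliability = np.corrcoef(administration_1, administration_2)[0, 1]
print(f"Reliability coefficient: {reliability:.2f}")
```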
Reliability • Internal Consistency reliability represents the extent to which items in a test are similar to one another. • Split-half: The test is divided into halves and a correlation is computed between the scores on the two halves. • Coefficient alpha (Cronbach's alpha) and the Kuder-Richardson formulas estimate the relationships among all items and between each item and the total score of the test.
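Internal consistency can be sketched from a persons-by-items score matrix. The data below are hypothetical, and the split-half estimate is stepped up with the Spearman-Brown formula, a standard adjustment that the slide does not name explicitly.

```python
# Sketch: internal-consistency estimates from a persons x items score matrix.
# Data are hypothetical (8 examinees, 6 items scored 0-5).
import numpy as np

scores = np.array([
    [4, 5, 3, 4, 5, 4],
    [2, 3, 2, 2, 3, 2],
    [5, 5, 4, 5, 4, 5],
    [3, 2, 3, 3, 2, 3],
    [1, 2, 1, 2, 1, 1],
    [4, 4, 5, 4, 5, 4],
    [3, 3, 3, 2, 3, 3],
    [5, 4, 4, 5, 5, 4],
])

# Split-half: correlate odd-item totals with even-item totals, then apply the
# Spearman-Brown correction to estimate full-length reliability.
odd, even = scores[:, 0::2].sum(axis=1), scores[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]
split_half = 2 * r_half / (1 + r_half)

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total scores).
k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1)
total_var = scores.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)

print(f"Split-half (Spearman-Brown): {split_half:.2f}")
print(f"Cronbach's alpha:            {alpha:.2f}")
```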
Reliability • Scorer and rater reliabilities reflect the extent to which independent scorers or a single scorer over time agree on a score. • Interjudge (inter-rater) reliability: Consistency of two or more independent scorers. • Intrajudge (intra-rater) reliability: Consistency of one person over time.
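Inter-rater consistency is often reported as simple percent agreement or, for categorical scores, as Cohen's kappa (a chance-corrected index not named on the slide, shown here as one common choice). The ratings below are hypothetical.

```python
# Sketch: inter-rater agreement for two independent scorers assigning
# categorical ratings. Cohen's kappa corrects raw agreement for chance.
# Ratings below are hypothetical.
from collections import Counter

rater_1 = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
rater_2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

n = len(rater_1)
observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n

# Chance agreement: product of each rater's marginal proportions, summed over categories.
c1, c2 = Counter(rater_1), Counter(rater_2)
expected = sum((c1[cat] / n) * (c2[cat] / n) for cat in set(rater_1) | set(rater_2))

kappa = (observed - expected) / (1 - expected)
print(f"Percent agreement: {observed:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```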