Classroom Assessment Reliability


  1. Classroom Assessment Reliability

  2. Classroom Assessment Reliability • Reliability = assessment consistency. • Consistency within a teacher across students. • Consistency within a teacher over multiple occasions for the same students. • Consistency across teachers for the same students. • Consistency across teachers and across students.

  3. Three Types of Reliability • Stability reliability. • Alternate-form reliability. • Internal consistency reliability.

  4. Stability Reliability • Concerned with the question: Are assessment results consistent over time (over occasions)? • Think of some examples where stability reliability might be important. • Why might test results NOT be consistent over time?

  5. Evaluating Stability Reliability • Test-Retest Reliability. Compute the correlation between a first and later administration of the same test. • Classification-consistency. Compute the percentage of consistent student classifications over time. (Example on next slide). • Main concern is with the stability of the assessment over time.
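
A minimal sketch of the test-retest computation, in Python; the student scores below are hypothetical, not from the slides:

    import statistics

    def pearson_r(x, y):
        # Pearson correlation between paired score lists
        mx, my = statistics.mean(x), statistics.mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    # Hypothetical scores for the same six students on two occasions
    first = [72, 85, 90, 64, 78, 88]
    retest = [70, 83, 92, 60, 80, 86]
    print(round(pearson_r(first, retest), 2))  # a high r suggests stable results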

  6. Example of Classification Consistency

  7. Example of Classification Consistency (Good Reliability)

  8. Example of Classification Consistency (Poor Reliability)
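
The tables from the three slides above are not reproduced in this transcript, but the computation they illustrate is simple: count the students who receive the same classification on both occasions. A minimal sketch with hypothetical mastery/non-mastery labels:

    # Hypothetical classifications ("M" = mastery, "N" = non-mastery)
    occasion_1 = ["M", "M", "N", "M", "N", "M", "N", "M", "M", "N"]
    occasion_2 = ["M", "M", "N", "M", "N", "M", "M", "M", "M", "N"]

    consistent = sum(a == b for a, b in zip(occasion_1, occasion_2))
    print(f"{100 * consistent / len(occasion_1):.0f}% classified consistently")
    # 90% here; a percentage near chance agreement indicates poor reliability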

  9. Alternate-form Reliability • Are two supposedly equivalent forms of an assessment in fact equivalent? • The two forms do not have to yield identical scores. • The correlation between two or more forms of the assessment should be reasonably substantial.

  10. Evaluating Alternate-form Reliability • Administer two forms of the assessment to the same individuals and correlate the results. • Determine the extent to which the same students are classified the same way by the two forms. • Alternate-form reliability is established by evidence, not by proclamation.

  11. Example of Using a Classification Table to Assess Alternate-Form Reliability

  12. Example of Using a Classification Table to Assess Alternate-Form Reliability
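
As with the classification-consistency slides, the tables themselves are not reproduced here. A minimal sketch of both checks, assuming Python 3.10+ (for statistics.correlation) and hypothetical Form A / Form B scores:

    import statistics

    # Hypothetical scores for the same students on two supposedly equivalent forms
    form_a = [55, 62, 70, 48, 81, 66]
    form_b = [57, 60, 72, 45, 79, 68]

    # Evidence, not proclamation: the correlation should be reasonably substantial
    print(round(statistics.correlation(form_a, form_b), 2))

    # Classification agreement at a hypothetical cut score of 60
    agree = sum((a >= 60) == (b >= 60) for a, b in zip(form_a, form_b))
    print(f"{agree}/{len(form_a)} students classified the same way by both forms")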

  13. Internal Consistency Reliability Concerned with the extent to which the items (or components) of an assessment function consistently. To what extent do the items in an assessment measure a single attribute? For example, consider a math problem-solving test. To what extent does reading comprehension play a role? What is being measured?

  14. Evaluating Internal Consistency Reliability • Split-half correlations. • Kuder-Richardson Formula 20 (KR-20): used with binary-scored (dichotomous) items; the average of all possible split-half correlations. • Cronbach's coefficient alpha: similar to KR-20, except used with non-binary-scored (polytomous) items (e.g., items that measure attitude).
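
A minimal sketch of coefficient alpha with hypothetical 0/1 item scores; because the items here are binary, the same computation yields KR-20:

    import statistics

    # Hypothetical responses: rows = students, columns = items (0/1 scoring)
    scores = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
        [1, 1, 1, 1],
    ]

    k = len(scores[0])
    item_vars = [statistics.pvariance(col) for col in zip(*scores)]
    total_var = statistics.pvariance([sum(row) for row in scores])
    alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
    print(round(alpha, 2))  # ~0.89 here; values nearer 1.0 mean more consistent items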

  15. Reliability: Components of an Observation • O = T + E • Observation = True Status + Error.

  16. Standard Error of Measurement • Provides an index of the reliability of an individual’s score. • The standard deviation of the theoretical distribution of errors (i.e. the E’s). • The more reliable a test, the smaller the SEM.
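
The SEM can be estimated from the test's standard deviation and its reliability coefficient; a standard formula (not shown on the slide) is SEM = s * sqrt(1 - r), where s is the standard deviation of observed scores and r is the reliability estimate. For example, with s = 10 and r = .91, SEM = 10 * sqrt(.09) = 3, so a student's observed score likely falls within a few points of their true score.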

  17. Sources of Error in Measurement • Individual characteristics: anxiety, motivation, health, fatigue, understanding (of the task), "bad hair day." • External characteristics: directions, environmental disturbances, scoring errors, observer differences/biases, sampling of items.

  18. Things to Do to Improve Reliability • Use more items or tasks (see the Spearman-Brown example below). • Use items or tasks that differentiate among students. • Use items or tasks that measure within a single content domain. • Keep scoring objective. • Eliminate (or reduce) extraneous influences. • Use shorter assessments more frequently.
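
The first suggestion can be quantified with the Spearman-Brown formula (not on the slide): lengthening a test by a factor k changes reliability r to k*r / (1 + (k - 1)*r). For example, doubling a test with r = .60 gives 2(.60) / (1 + .60) = .75, which is why adding items or tasks tends to improve reliability.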

  19. End
