Assessment: Why do it? What kinds are there?
Reliability • Observed score = true score + error (we can’t measure true scores directly) • Reported as a correlation coefficient (“r”). • E.g., a reliability coefficient of .80 between two sets of test scores is interpreted as meaning that 80% of the variability in observed scores is true-score variability. • Methods of establishing reliability: • Test-retest • Alternate forms • Split-half • Internal consistency
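The methods listed above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the small data set are made up, the split-half estimate assumes an odd/even item split with the Spearman-Brown correction, and internal consistency is assumed to mean Cronbach's alpha, which the slide does not name.

```python
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def split_half_reliability(items):
    """Correlate odd-item and even-item half scores, then apply the
    Spearman-Brown correction to estimate full-length reliability.
    items: one row per test taker, one column per item score."""
    odd = [sum(row[0::2]) for row in items]
    even = [sum(row[1::2]) for row in items]
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)


def cronbach_alpha(items):
    """Internal consistency (Cronbach's alpha) from item-level scores."""
    k = len(items[0])
    item_vars = [pvariance([row[j] for row in items]) for j in range(k)]
    total_var = pvariance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 test takers, 4 items each (invented for illustration).
item_scores = [
    [3, 4, 2, 3],
    [4, 4, 3, 4],
    [2, 3, 2, 2],
    [5, 4, 5, 4],
    [3, 4, 3, 4],
]
first_sitting = [sum(row) for row in item_scores]
second_sitting = [13, 14, 10, 17, 15]  # same people, later date (invented)

print("test-retest r:", round(pearson_r(first_sitting, second_sitting), 2))
print("split-half:", round(split_half_reliability(item_scores), 2))
print("alpha:", round(cronbach_alpha(item_scores), 2))
```

Alternate-forms reliability would use the same `pearson_r` call, with scores from two parallel versions of the test in place of the two sittings.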
Improving reliability (> 0.70) • The more items, the greater the reliability. • Avoid ceiling and floor effects (test too easy or too hard). • The wider the range of abilities among those taking the test, the greater the reliability. • If the people taking the test have similar backgrounds, the reliability will be higher. (Note: even with similar backgrounds there will still be a range of abilities.) • The more objective the scoring, the greater the reliability.
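The first point, that more items give greater reliability, is commonly quantified with the Spearman-Brown prophecy formula. The slide does not name this formula, so treat the sketch below as an assumption: it holds only when the added items are comparable in quality to the existing ones, and the function names are hypothetical.

```python
def spearman_brown(r, n):
    """Predicted reliability when a test with reliability r is made
    n times as long using comparable items (Spearman-Brown formula)."""
    return n * r / (1 + (n - 1) * r)


def length_factor_needed(r, target):
    """How many times longer the test must be to reach the target
    reliability, obtained by solving the formula above for n."""
    return target * (1 - r) / (r * (1 - target))


# Hypothetical case: a short quiz with reliability 0.50.
print(round(spearman_brown(0.50, 2), 2))           # reliability if doubled
print(round(length_factor_needed(0.50, 0.70), 2))  # factor to reach 0.70
```

Under these assumptions, doubling a test never doubles its reliability; each added item contributes less than the one before.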
Improving reliability • Use straightforward, clearly worded questions. • Good directions can increase reliability. • Reliability is increased if the people taking the test are rested, calm, well, and taking the test seriously. • Handout: Writing test questions
Validity of a psychological test • Content • Criterion: predictive, concurrent • Construct: convergent, discriminant • Face • Consequential • Note: A criterion is an extra-test variable (something outside of the test that the test should be able to predict).
Relationship between reliability and validity
Don’t get confused • The terms reliability and validity are used both for assessments and for research studies • E.g., validity of psychological experiments: • Internal validity • External validity
Types of Tests • Published achievement tests • Standardized empirically documented tests • Non-standardized, not empirically documented tests • Teacher-made tests
Norm-referenced • “Interpret a student’s assessment performance by comparing it to the performance of a well-defined group of other students who have also taken the same assessment” (p. 393). • The norm group needs to be well-defined to ensure validity. • “Numbers” compare a student to the norming group.
Criterion-referenced • “infer the kinds of performances the student can do in a domain, rather than the student’s relative standing in a norm group” (p. 393). • These lose validity when the domain is not well-defined or when the assessment is a poor sample of the domain. • “Numbers” compare a student to a standard.
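The difference between the two kinds of “numbers” can be shown with one score interpreted both ways. This is a minimal sketch with invented data: the percentile rank here counts only scores strictly below the student's (other definitions count half of the ties), and the cut score is a hypothetical standard.

```python
def percentile_rank(score, norm_group):
    """Norm-referenced: percentage of the norm group scoring below
    this student (one common definition of percentile rank)."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)


def meets_standard(score, cut_score):
    """Criterion-referenced: compare the student to a fixed standard,
    regardless of how anyone else performed."""
    return score >= cut_score


# Hypothetical norm group and cut score, invented for illustration.
norm_group = [50, 60, 70, 80, 90]
student_score = 70

print(percentile_rank(student_score, norm_group))
print(meets_standard(student_score, cut_score=75))
```

The same raw score can look strong against one norm group and weak against another, while the criterion-referenced result changes only if the standard itself changes.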
Be sure to review the following • Appropriate use of test results • Inappropriate use of test results
Teacher-made tests • How can you strengthen reliability and validity? • Your tasks: • Exam questions in EP • Feedback rubric/scale for MA I