
Large-scale testing: Uses and abuses

Large-scale testing: Uses and abuses. Richard P. Phelps, Universidad Finis Terrae, Santiago, Chile, January 7, 2014. Topics: 3 types of large-scale tests; measuring test quality; a chronology of mistakes; economists misunderstand testing; how SIMCE is affected.


Presentation Transcript


  1. Large-scale testing: Uses and abuses. Richard P. Phelps, Universidad Finis Terrae, Santiago, Chile, January 7, 2014

  2. Large-scale testing: Uses and abuses
     - 3 types of large-scale tests
     - Measuring test quality
     - A chronology of mistakes
     - Economists misunderstand testing
     - How SIMCE is affected

  3. 1. Three types of large-scale tests: Achievement, Aptitude, Non-cognitive

  4. Achievement tests
     - Historically, were larger versions of classroom tests
     - ~1900: “scientific” achievement tests developed (Germany & USA)
     - J.M. Rice: systematically analyzed test structures & effects
     - E.L. Thorndike: developed scoring scales
     SOURCE: Phelps, Standardized Testing Primer, 2007

  5. Achievement tests
     - Purpose: to measure how much you know and can recall
     - Developed using: content coverage analysis
     - How validated: retrospective or concurrent validity (correlation with past measures, such as high school grades)
     - Requires mastery of content prior to the test
     - Fairness assumes that all have the same opportunity to learn the content
     - Coachable: specific content is known in advance
     SOURCE: Phelps, Standardized Testing Primer, 2007

  6. Aptitude tests
     - 1890s – A. Binet & T. Simon (France)
       - Pre-school children with mental disabilities
       - Achievement test not possible
       - Developed content-free test of mental abilities (association, attention, memory, motor skills, reasoning)
     - 1917 – Adapted by U.S. Army to select and assign soldiers in World War I
     - 1930s – Harvard University president J. Conant wanted a new admission test to identify students from lower social classes with the potential to succeed at Harvard; developed the first Scholastic Aptitude Test (SAT)
     SOURCE: Phelps, Standardized Testing Primer, 2007

  7. Aptitude tests
     - Purpose: to predict how much can be learned
     - Developed using: skills/job analysis
     - How validated: predictive validity, correlation with future activity (e.g., university or job evaluations)
     - Content independent. Measures:
       - what the student does with content provided
       - how the student applies skills & abilities developed over a lifetime
     - Not easily coachable. The content is either:
       - not known in advance;
       - basic, broad, commonly known by all, curriculum-free;
       - less dependent on the quality of schools
     SOURCE: Phelps, Standardized Testing Primer, 2007

  8. Aptitude tests can identify:
     - Students bored in school who study what interests them on their own
     - Students not well adapted to high school, but well adapted to university
     - Students of high ability stuck in poor schools
     SOURCE: Phelps, Standardized Testing Primer, 2007

  9. Comparing Achievement & Aptitude tests

  10. Non-cognitive tests
      - More recently developed; measure values, attitudes, preferences
      - Types:
        - integrity tests
        - career exploration
        - matchmaking
        - employment “fit”

  11. Non-cognitive tests
      - Purpose: to identify “fit” with others or a situation
      - Developed using: surveys, personal interviews
      - How validated: success rate in future activities
      - Content is personal, not learned
      - “Faking” can be an issue (e.g., “honesty” tests)

  12. Comparing Achievement, Aptitude, & Non-Cognitive Tests

  13. 2. Measuring test quality
      Test reports can be “data dumps”. 3 measures are important:
      1. Predictive validity
      2. Content coverage
      3. Sub-group differences

  14. Predictive validity (values from -1.0 to +1.0) measures how well higher scores on an admission test match better outcomes at university (e.g., grades, completion). A test with low predictive validity provides little information.
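
As a rough illustration of the idea on this slide, predictive validity can be estimated as the correlation between admission test scores and a later university outcome. The sketch below assumes Pearson's correlation coefficient is the measure (the slide does not name one). The function name `pearson_r` and both data lists are hypothetical, invented only for illustration; they are not results from any real test.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: admission scores and first-year grades for five students
admission_scores = [450, 520, 580, 610, 700]
first_year_grades = [4.1, 4.8, 5.0, 5.6, 6.2]

r = pearson_r(admission_scores, first_year_grades)
print(f"Estimated predictive validity: {r:.2f}")
```

A value near +1.0 would suggest that higher admission scores go with better later outcomes; a value near 0 would suggest the test says little about them.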

  15. A positive correlation between two measures Source: NIST, Engineering Statistics Handbook

  16. A negative correlation between two measures Source: NIST, Engineering Statistics Handbook

  17. No correlation between two measures Source: NIST, Engineering Statistics Handbook

  18. How does one measure predictive capacity? Correlation coefficient, on a scale from -1 through 0 to +1.
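
The slide does not say which correlation coefficient is meant. Assuming the common choice, Pearson's r, it can be written as follows, where x_i is a student's admission test score, y_i is the later outcome (for example, a university grade), and the barred symbols are sample means. These symbols are introduced here for illustration and do not appear in the slides.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

The two ends of the scale, -1 and +1, correspond to a perfectly negative and a perfectly positive linear relationship; 0 corresponds to no linear relationship.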

  19. Predictive validities: SAT and PSU SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013

  20. Predictive validities: SAT and PSU (faculty: Administracion) SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013
