
Test Equating: Designs, Methods, and Applications to Progress Testing



Presentation Transcript


  1. Test Equating: Designs, Methods, and Applications to Progress Testing Michelle M. Langer, Ph.D. National Board of Medical Examiners August 30, 2009

  2. Why is Equating Necessary? • Most testing programs administer multiple test forms • Impossible to construct test forms that have identical characteristics • Comparison of raw scores is unfair to those taking the more difficult form • Equating the different forms of a test addresses these issues

  3.–5. Why is Equating Necessary? (graphical slides; content not captured in the transcript)

  6. Overview • Equating Requirements • Equating Designs • Equating Methods • Applications to Progress Testing

  7. Equating Requirements • Alternate forms built to same test specification • Test content and conditions of measurement for alternate forms are held constant

  8. Random Groups Design for Equating • Examinees are randomly assigned to take Form X or Form Y • Conditions of measurement for Form X and Form Y are the same • Random Subgroup 1 takes Form X; Random Subgroup 2 takes Form Y

  9. Single Group Design with Counterbalancing for Equating • Random Subgroup 1: Form X taken first, Form Y taken second • Random Subgroup 2: Form Y taken first, Form X taken second

  10. Single Group Design with Counterbalancing for Equating • Counterbalancing is used to control for order effects such as fatigue and practice • When there are no differential order effects, this design leads to much greater equating precision than the random groups design because each examinee serves as his or her own control

  11. Common-Item Nonequivalent Groups Design for Equating • Used when only one form can be administered per test date • Group 1 differs systematically from Group 2 • Group 1 takes Form X plus the common items; Group 2 takes Form Y plus the common items

  12. Common-Item Nonequivalent Groups Design for Equating • Two variations: internal and external common items • Scores on the common items indicate how the performance of Group 1 and Group 2 differs • Common items proportionally represent the test content • Conditions of measurement for the common items must be the same in Form X and Form Y (similar position) • Requires strong statistical assumptions regarding the relationship between scores on the common items and the total test
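
One standard family of methods that relies on those assumptions (not named on the slide) is chained linear equating: Form X is linked to the common-item (anchor) scale within Group 1, and the anchor scale is linked to Form Y within Group 2. The sketch below is a hypothetical illustration with simulated scores, not the procedure used in the presentation.

```python
import numpy as np

def linear_link(x, from_scores, to_scores):
    """Linear linking: match the mean and standard deviation of the target scores."""
    a = np.std(to_scores) / np.std(from_scores)
    return a * (x - np.mean(from_scores)) + np.mean(to_scores)

def chained_linear_equate(x, x_total_g1, anchor_g1, anchor_g2, y_total_g2):
    """Chained linear equating for the common-item nonequivalent groups design:
    Form X -> anchor (using Group 1), then anchor -> Form Y (using Group 2)."""
    x_on_anchor = linear_link(x, x_total_g1, anchor_g1)
    return linear_link(x_on_anchor, anchor_g2, y_total_g2)

# Hypothetical data: total and common-item (anchor) scores for each group
rng = np.random.default_rng(0)
x_total_g1, anchor_g1 = rng.normal(70, 10, 400), rng.normal(12, 3, 400)
y_total_g2, anchor_g2 = rng.normal(74, 11, 450), rng.normal(13, 3, 450)

print(chained_linear_equate(70.0, x_total_g1, anchor_g1, anchor_g2, y_total_g2))
```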

  13. Traditional Statistical Methods for Equating • The intent of traditional equating methods is for scores on alternate forms to have the same distributional characteristics after they are transformed to a common scale • Mean equating results in the same mean • Linear equating results in the same mean and standard deviation • Equipercentile equating results in approximately the same score distribution • Traditional equating functions are defined for a particular population and conditions of measurement
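
A minimal sketch of the three traditional methods listed above, applied to scores from a random groups design. The data, sample sizes, and function names are hypothetical and are not part of the presentation.

```python
import numpy as np
from scipy import stats

def mean_equate(x, scores_x, scores_y):
    """Mean equating: shift Form X scores so both forms have the same mean."""
    return x + (np.mean(scores_y) - np.mean(scores_x))

def linear_equate(x, scores_x, scores_y):
    """Linear equating: match both the mean and the standard deviation of Form Y."""
    slope = np.std(scores_y) / np.std(scores_x)
    return slope * (x - np.mean(scores_x)) + np.mean(scores_y)

def equipercentile_equate(x, scores_x, scores_y):
    """Equipercentile equating: map x to the Form Y score with the same percentile rank."""
    p = stats.percentileofscore(scores_x, x)
    return np.percentile(scores_y, p)

# Hypothetical raw scores from two randomly equivalent groups
rng = np.random.default_rng(0)
scores_x = rng.normal(70, 10, size=500)   # Form X (the harder form)
scores_y = rng.normal(75, 12, size=500)   # Form Y

for f in (mean_equate, linear_equate, equipercentile_equate):
    # Form X raw score of 70 expressed on the Form Y scale
    print(f.__name__, round(f(70.0, scores_x, scores_y), 2))
```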

  14. IRT Methods for Equating • Unidimensional IRT methods assume that proficiency, θ, can be described by a single latent variable • In the common-item nonequivalent groups design, scale transformation methods are often used to place IRT parameter estimates on the same proficiency scale
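
As an illustration of one such transformation (the slide does not name a specific method), the mean/sigma approach below solves for a slope A and intercept B from the common items' difficulty estimates on the two scales; the estimates shown are hypothetical. Under the Rasch model the slope is typically fixed at 1, so only the intercept shift is needed.

```python
import numpy as np

def mean_sigma_transform(b_common_new, b_common_old):
    """Mean/sigma linking: find A and B so that A*theta_new + B is on the old scale,
    using the common items' difficulty estimates from the two separate calibrations."""
    A = np.std(b_common_old) / np.std(b_common_new)
    B = np.mean(b_common_old) - A * np.mean(b_common_new)
    return A, B

# Hypothetical common-item difficulties estimated on the old and new scales
b_old = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
b_new = np.array([-1.0, -0.2, 0.3, 1.0, 1.7])

A, B = mean_sigma_transform(b_new, b_old)
theta_on_old_scale = A * 0.5 + B          # rescale a new-scale proficiency of 0.5
b_items_on_old_scale = A * b_new + B      # rescale the new form's item difficulties
print(A, B, theta_on_old_scale)
```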

  15. Applications to Progress Testing • Challenges • Examinee Population: Examinees with different levels of training are tested several times a year. • Equating Design: Students must be prevented from taking the same items at subsequent administrations, test security must be considered, and multiple forms are required. • Equating Method: The method chosen must operate relative to a specific population, which may not necessarily encompass all levels of training.

  16. Examinee Population • UK medical students • Years 1 through 5 • Multiple tracks (4-year and 5-year programs, etc.) • Reference group for equating: 601 graduating seniors

  17. Hybrid Random Groups Common Items Equating Design • 8 forms, 120 items each • Forms 1 through 6 randomly assigned to the reference group • 40 additional items from a pool composed of Forms 7 and 8 randomly assigned to each examinee; these compose the common-item set • Random Groups 1–6 take Forms 1–6, respectively, each together with items from Forms 7 and 8
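
A minimal sketch of the assignment logic of this hybrid design. The item counts follow the slide; the pool size (two 120-item forms combined) and the item identifiers are assumptions made only for illustration.

```python
import random

FORM_IDS = range(1, 7)                               # Forms 1-6, 120 items each
COMMON_POOL = [f"item_{k}" for k in range(240)]      # assumed pool: Forms 7 and 8 combined

def assign_examinee(rng=random):
    """Randomly assign one of Forms 1-6 plus 40 common items drawn from the pool."""
    form = rng.choice(list(FORM_IDS))
    common_items = rng.sample(COMMON_POOL, 40)       # per-examinee common-item set
    return form, common_items

form, common_items = assign_examinee()
print(form, common_items[:5])
```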

  18. Equating Method: IRT Concurrent Calibration • Puts the IRT parameter estimates on the same scale across forms • IRT true score equating used to relate raw scores on the forms • Resulting scale scores are comparable across forms • Rasch IRT model used: P(Xij = 1 | θi, bj) = exp(θi − bj) / [1 + exp(θi − bj)], where θi is examinee proficiency and bj is item difficulty
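
The sketch below illustrates how IRT true score equating works under the Rasch model: invert Form X's test characteristic curve to find the proficiency corresponding to a given raw score, then evaluate Form Y's characteristic curve at that proficiency. The item difficulties are simulated placeholders, not the operational estimates from the concurrent calibration.

```python
import numpy as np
from scipy.optimize import brentq

def rasch_prob(theta, b):
    """Rasch model: probability of a correct response given proficiency theta and difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - np.asarray(b))))

def true_score(theta, b):
    """Test characteristic curve: expected number-correct score at proficiency theta."""
    return rasch_prob(theta, b).sum()

def irt_true_score_equate(raw_x, b_form_x, b_form_y):
    """Find the theta whose expected Form X score equals raw_x, then return the
    expected Form Y score at that same theta."""
    theta = brentq(lambda t: true_score(t, b_form_x) - raw_x, -8.0, 8.0)
    return true_score(theta, b_form_y)

# Simulated difficulty estimates for two 120-item forms on a common scale
rng = np.random.default_rng(1)
b_x = rng.normal(0.2, 1.0, size=120)   # Form X slightly harder
b_y = rng.normal(0.0, 1.0, size=120)

print(irt_true_score_equate(70, b_x, b_y))   # Form X raw score of 70 on the Form Y scale
```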

  19. Recommended Resources • Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 187–220). Westport, CT: American Council on Education/Praeger. • Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). New York: Springer-Verlag. • Thissen, D., & Wainer, H. (2001). Test scoring. Mahwah, NJ: Lawrence Erlbaum Associates.
