
Getting More from your Data

Session 2530, 2:15 – 4:00, Room 318 B. Paper AC 2007-1783: Getting More from your Data. Your humble narrator: Kirk Allen, post-doctoral person, Purdue University Department of Engineering Education. Application of Item Response Theory to the Statistics Concept Inventory.


Presentation Transcript


  1. Session 2530, 2:15 – 4:00, Room 318 B • Paper AC 2007-1783 • Getting More from your Data • Your humble narrator: Kirk Allen, post-doctoral person, Purdue University Department of Engineering Education • Application of Item Response Theory to the Statistics Concept Inventory

  2. Item Response Theory • Context • Students are answering questions on some form of test or survey • What is it?? • Models probability of student response as a function of latent trait “ability” • In general… • Prob(Response x) = Function(“ability”)

  3. Why you’re “getting more” • Not all questions are treated equally • Sports analogy • Not all questions have to be answered • Adjust for topic coverage • Adaptive testing

  4. About the Data • Statistics Concept Inventory • In development since Fall 2002 • My dissertation topic (May 2006) • 38 multiple choice questions on introductory statistics concepts • Fall 2005, 422 students • Variety of mathematics experience and statistics exposure

  5. Software • IRT Command Language (http://www.b-a-h.com/software/irt/icl/) • Free! • Maybe a little hard to use • BILOG and MULTILOG (http://www.ssicentral.com/irt/index.html) • Not free • Name recognition • Does more (yay for graphs!)

  6. Assumptions • Unidimensionality • A single trait is responsible for examinee responses • There’s only one “ability” and we label it “θ” • Empirical, not inferred from test design • Local independence • Similarly, items are uncorrelated aside from the general ability factor • No subtests or other reasons for response carry-over from item to item

  7. Assumptions: Assessment • Unidimensionality • Scree plot (Exploratory Factor Analysis) • “Oh look, it’s really steep so I’m ok!” • Local independence • Generally, unidimensionality is good enough • Also just sorta say “Well, there aren’t testlets either” • These guys could mess you up… • Question 7… blah blah blah. • Question 8… What is your reason for #7?

  8. Multiple Choice Models • 1PL : items vary by difficulty (location) • “One-parameter logistic” • 2PL : … and discrimination (slope) • 3PL : … and lower asymptote (guessing)

  9. Multiple Choice Models (3PL) • Because engineers love equations… • What it means… • Probability of responding correctly to an item, for a given ability (θ), is a function of guessing (c), discrimination (α), and difficulty (β) • Simplify to 2PL by assuming c = 0 and further to 1PL by assuming α = 1
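
The equation shown on this slide did not survive in the transcript. A common textbook form of the 3PL model, written with the slide's symbols, is P(θ) = c + (1 − c) / (1 + exp(−α(θ − β))). The sketch below assumes that form; some references also multiply the exponent by a scaling constant D ≈ 1.7, which is omitted here. The function name and the item parameters are hypothetical and chosen only for illustration.

```python
import math


def p_correct(theta, alpha=1.0, beta=0.0, c=0.0):
    """Probability of a correct response at ability theta (assumed 3PL form).

    alpha: discrimination (slope), beta: difficulty (location),
    c: lower asymptote (guessing).
    """
    logistic = 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))
    return c + (1.0 - c) * logistic


# Hypothetical item parameters, not taken from the Statistics Concept Inventory.
item = {"alpha": 1.2, "beta": 0.5, "c": 0.2}

# Tabulating the function over a range of abilities traces out the item's curve.
for theta in (-3.0, -1.5, 0.0, 0.5, 1.5, 3.0):
    print(f"theta={theta:+.1f}  P(correct)={p_correct(theta, **item):.3f}")

# Setting c = 0 gives the 2PL; additionally fixing alpha = 1 gives the 1PL.
p_2pl = p_correct(0.0, alpha=1.2, beta=0.5)
p_1pl = p_correct(0.0, beta=0.5)
```

At θ = β the probability sits halfway between the guessing floor and 1, which is (1 + c) / 2.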

  10. Item Characteristic Curves • For each item, the graph of the previous function is called the Item Characteristic Curve (ICC) • X-axis: ability (θ) • Y-axis: probability of correct response • Following slides: comparison of three models for three questions

  11. Item Characteristic Curves (1-PL)

  12. Item Characteristic Curves (2-PL)

  13. Item Characteristic Curves (3-PL)

  14. Does it fit? • Divide students into bins based on ability estimates • Compare observed to model probabilities • Statistical assessment based on χ² (chi-square) • Informal assessment graphically
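
The binning procedure above can be sketched roughly as follows. This is a minimal illustration and not the computation BILOG or ICL performs: it assumes the 3PL form P(θ) = c + (1 − c) / (1 + exp(−α(θ − β))), equal-count bins, and a Pearson-type statistic that evaluates the model at each bin's mean ability. The data are simulated, and every name and parameter value is hypothetical.

```python
import math
import random


def p_correct(theta, alpha, beta, c):
    """Assumed 3PL form of the item characteristic curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-alpha * (theta - beta)))


def binned_fit(thetas, responses, alpha, beta, c, n_bins=5):
    """Return (chi_square, rows) for one item.

    Students are sorted by ability and split into n_bins groups of roughly
    equal size. Each row is (mean_theta, observed, expected, n).
    """
    paired = sorted(zip(thetas, responses))
    size = len(paired) / n_bins
    chi_square, rows = 0.0, []
    for b in range(n_bins):
        chunk = paired[round(b * size):round((b + 1) * size)]
        if not chunk:
            continue
        n = len(chunk)
        mean_theta = sum(t for t, _ in chunk) / n
        observed = sum(r for _, r in chunk) / n          # proportion correct
        expected = p_correct(mean_theta, alpha, beta, c)  # model probability
        chi_square += n * (observed - expected) ** 2 / (expected * (1.0 - expected))
        rows.append((mean_theta, observed, expected, n))
    return chi_square, rows


# Simulated data: abilities drawn from N(0, 1), responses generated by the model.
# In practice the thetas would be the ability estimates produced by the software.
rng = random.Random(0)
alpha, beta, c = 1.2, 0.5, 0.2
thetas = [rng.gauss(0.0, 1.0) for _ in range(400)]
responses = [1 if rng.random() < p_correct(t, alpha, beta, c) else 0 for t in thetas]

chi_square, rows = binned_fit(thetas, responses, alpha, beta, c)
for mean_theta, observed, expected, n in rows:
    print(f"theta~{mean_theta:+.2f}  observed={observed:.2f}  model={expected:.2f}  n={n}")
print(f"chi-square = {chi_square:.2f}")
```

The printed rows are the numbers behind the informal graphical check: plotting observed against model probabilities per bin shows where an item departs from its curve.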

  15. Graphical fit assessment of one item

  16. More “getting more” • Items can be strongly discriminating at extremes of the ability scale • Previous slide: steep at high ability • Poor discriminator by simpler metrics • Discrimination index 0.25, ranked 28 of 38

  17. Wait, there’s more! • Information replaces reliability • Based on item’s discrimination across the ability spectrum • Test information is the sum of item informations

  18. Information and Standard Error
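
The graphs for slides 17 and 18 are not in the transcript. As a rough sketch of the idea, the code below uses the standard 3PL item information function, I(θ) = α² · (Q/P) · ((P − c)/(1 − c))² with Q = 1 − P, sums it over items to get test information, and takes the standard error of the ability estimate as 1/√(test information). The 3PL form and the three items are assumptions for illustration; none of the values come from the Statistics Concept Inventory.

```python
import math


def p_correct(theta, alpha, beta, c):
    """Assumed 3PL form of the item characteristic curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-alpha * (theta - beta)))


def item_information(theta, alpha, beta, c=0.0):
    """3PL item information; reduces to alpha^2 * P * Q when c = 0."""
    p = p_correct(theta, alpha, beta, c)
    return alpha ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def total_information(theta, items):
    """Test information: the sum of the item information functions."""
    return sum(item_information(theta, *item) for item in items)


def standard_error(theta, items):
    """Standard error of the ability estimate at theta."""
    return 1.0 / math.sqrt(total_information(theta, items))


# Hypothetical (alpha, beta, c) triples for a three-item test.
items = [(0.8, -1.0, 0.20), (1.2, 0.0, 0.25), (1.8, 1.5, 0.15)]

for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    info = total_information(theta, items)
    print(f"theta={theta:+.1f}  information={info:.3f}  SE={standard_error(theta, items):.3f}")
```

Unlike a single reliability coefficient, the output varies with θ: the test measures most precisely where its items are most discriminating.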

  19. Other IRT Models • Multiple Choice • Nominal Response Model • Item Response Curves for each multiple choice option • Likert scales (e.g., 1-2-3-4-5) • Graded Response Model • Similarly, ICC for each numeric option • Can help evaluate whether 3 on this question is the “same” as 3 on that question

  20. Where I hang out • https://engineering.purdue.edu/SCI • allenk@purdue.edu • Use at your peril • kcallen@hotmail.com • More permanently trustworthy
