Session 2530, 2:15 – 4:00, Room 318 B • Paper AC 2007-1783 • Getting More from your Data • Your humble narrator: Kirk Allen, post-doctoral person • Purdue University, Department of Engineering Education • Application of Item Response Theory to the Statistics Concept Inventory
Item Response Theory • Context • Students are answering questions on some form of test or survey • What is it?? • Models probability of student response as a function of latent trait “ability” • In general… • Prob(Response x) = Function(“ability”)
Why you’re “getting more” • Not all questions are treated equally • Sports analogy • Not all questions have to be answered • Adjust for topic coverage • Adaptive testing
About the Data • Statistics Concept Inventory • In development since Fall 2002 • My dissertation topic (May 2006) • 38 multiple choice questions on introductory statistics concepts • Fall 2005, 422 students • Variety of mathematics experience and statistics exposure
Software • IRT Command Language (ICL) • Free! • Maybe a little hard to use • http://www.b-a-h.com/software/irt/icl/ • BILOG and MULTILOG • Not free • Name recognition • Does more (yay for graphs!) • http://www.ssicentral.com/irt/index.html
Assumptions • Unidimensionality • A single trait is responsible for examinee responses • There’s only one “ability” and we label it “θ” • Empirical, not inferred from test design • Local independence • Similarly, items are uncorrelated aside from the general ability factor • No subtests or other reasons for response carry-over from item to item
Assumptions • Assessment • Unidimensionality • Scree plot (Exploratory Factor Analysis; sketched in code below) • “Oh look, it’s really steep so I’m ok!” • Local independence • Generally, unidimensionality is good enough • Also just sorta say “Well, there aren’t testlets either” • These guys could mess you up… • Question 7… blah blah blah • Question 8… What is your reason for #7?
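A minimal sketch of the scree-plot check, assuming responses are scored 0/1 in a students-by-items array; the random matrix below is a placeholder matching the SCI's Fall 2005 dimensions, not real data, and Pearson correlations stand in for the tetrachoric correlations often preferred with dichotomous items.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for the real data: a (students x items) 0/1
# response matrix with the SCI's Fall 2005 dimensions (422 x 38).
rng = np.random.default_rng(0)
responses = (rng.random((422, 38)) < 0.6).astype(int)

# Eigenvalues of the inter-item correlation matrix, largest first.
corr = np.corrcoef(responses, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Scree plot: a single dominant eigenvalue supports unidimensionality.
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, "o-")
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```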
Multiple Choice Models • 1PL : items vary by difficulty (location) • “One-parameter logistic” • 2PL : … and discrimination (slope) • 3PL : … and lower asymptote (guessing)
Multiple Choice Models (3PL) • Because engineers love equations… $P(\theta) = c + (1 - c)\,\frac{e^{\alpha(\theta - \beta)}}{1 + e^{\alpha(\theta - \beta)}}$ • What it means… • Probability of responding correctly to an item, for a given ability (θ), is a function of guessing (c), discrimination (α), and difficulty (β) • Simplify to 2PL by assuming c = 0, and further to 1PL by assuming α = 1
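A minimal sketch of the equation above and its nested 2PL and 1PL simplifications; the parameter values in the example call are made up for illustration, not SCI estimates.

```python
import numpy as np

def p_3pl(theta, alpha, beta, c):
    """3PL: probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + np.exp(-alpha * (theta - beta)))

def p_2pl(theta, alpha, beta):
    """2PL: 3PL with guessing removed (c = 0)."""
    return p_3pl(theta, alpha, beta, c=0.0)

def p_1pl(theta, beta):
    """1PL: 2PL with a common slope (alpha = 1)."""
    return p_2pl(theta, alpha=1.0, beta=beta)

# Illustrative values: a student of average ability on a slightly easy,
# moderately discriminating item with a 1-in-5 guessing floor.
print(p_3pl(theta=0.0, alpha=1.2, beta=-0.5, c=0.2))  # ~0.72
```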
Item Characteristic Curves • For each item, the graph of the previous function is called the Item Characteristic Curve (ICC) • X-axis: ability (θ) • Y-axis: probability of correct response • Following slides: comparison of three models for three questions
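A hedged sketch of ICCs for one item under all three models; the parameters are illustrative, not the calibrated SCI values shown on the original slides.

```python
import numpy as np
import matplotlib.pyplot as plt

def p_3pl(theta, alpha=1.0, beta=0.0, c=0.0):
    # Defaults reduce this to the 1PL curve.
    return c + (1.0 - c) / (1.0 + np.exp(-alpha * (theta - beta)))

theta = np.linspace(-4, 4, 200)
plt.plot(theta, p_3pl(theta, beta=0.3), label="1PL")
plt.plot(theta, p_3pl(theta, alpha=1.8, beta=0.3), label="2PL")
plt.plot(theta, p_3pl(theta, alpha=1.8, beta=0.3, c=0.2), label="3PL")
plt.xlabel("Ability (theta)")
plt.ylabel("P(correct)")
plt.legend()
plt.show()
```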
Does it fit? • Divide students into bins based on ability estimates • Compare observed to model probabilities • Statistical assessment based on χ² (chi-square) • Informal graphical assessment
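A minimal sketch of the binned chi-square fit check for one item; theta_hat and correct are simulated stand-ins for real ability estimates and responses (both names are hypothetical), and taking df as bins minus the three 3PL parameters is one common convention, not necessarily the one used in the paper.

```python
import numpy as np
from scipy.stats import chi2

def p_model(t):
    # A fitted 3PL curve for the item (illustrative parameter values).
    return 0.2 + 0.8 / (1.0 + np.exp(-1.2 * (t - 0.3)))

rng = np.random.default_rng(1)
theta_hat = rng.normal(size=422)                      # ability estimates
correct = (rng.random(422) < p_model(theta_hat)).astype(int)

# Ten equal-count ability bins; compare observed vs. model proportions.
edges = np.quantile(theta_hat, np.linspace(0, 1, 11))[1:-1]
bin_idx = np.digitize(theta_hat, edges)

chi_sq = 0.0
for b in range(10):
    in_bin = bin_idx == b
    obs = correct[in_bin].mean()              # observed proportion correct
    exp = p_model(theta_hat[in_bin]).mean()   # model-implied proportion
    chi_sq += in_bin.sum() * (obs - exp) ** 2 / (exp * (1.0 - exp))

# df here: bins minus estimated item parameters (one common convention).
print(f"chi-square = {chi_sq:.2f}, p = {chi2.sf(chi_sq, df=10 - 3):.3f}")
```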
More “getting more” • Items can be strongly discriminating at extremes of the ability scale • Previous slide: steep at high ability • Poor discriminator by simpler metrics • Discrimination index 0.25, ranked 28th of 38
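For contrast, a hedged sketch of one common classical discrimination index (proportion correct in the top 27% by total score minus the bottom 27%); the SCI analysis may use a different variant, and the data below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
responses = (rng.random((422, 38)) < 0.6).astype(int)  # placeholder data

totals = responses.sum(axis=1)
cut = int(0.27 * len(totals))              # size of each extreme group
order = np.argsort(totals)
bottom, top = order[:cut], order[-cut:]

item = responses[:, 0]                     # any single item column
disc = item[top].mean() - item[bottom].mean()
print(f"discrimination index = {disc:.2f}")
```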
Wait, there’s more! • Information replaces reliability • Based on item’s discrimination across the ability spectrum • Test information is the sum of item informations
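A minimal sketch of item information under the 3PL model and its sum across items; the three parameter triples are illustrative, not the SCI calibration.

```python
import numpy as np

def info_3pl(theta, alpha, beta, c):
    """Fisher information of a 3PL item at ability theta."""
    p = c + (1.0 - c) / (1.0 + np.exp(-alpha * (theta - beta)))
    return alpha**2 * ((p - c) ** 2 / (1.0 - c) ** 2) * ((1.0 - p) / p)

theta = np.linspace(-4, 4, 200)
items = [(1.5, -1.0, 0.2), (1.0, 0.0, 0.25), (2.0, 1.2, 0.2)]  # (alpha, beta, c)

# Test information is simply the sum of the item informations.
test_info = sum(info_3pl(theta, a, b, c) for a, b, c in items)
print("information peaks near theta =", round(float(theta[np.argmax(test_info)]), 2))
```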
Other IRT Models • Multiple Choice • Nominal Response Model • Item Response Curves for each multiple choice option • Likert scales (e.g., 1-2-3-4-5) • Graded Response Model • Similarly, ICC for each numeric option • Can help evaluate whether 3 on this question is the “same” as 3 on that question
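A hedged sketch of category probabilities under a graded response model for one 1–5 Likert item; alpha and the ordered thresholds are illustrative assumptions. Comparing these curves across two items shows whether a “3” sits at the same ability level on both.

```python
import numpy as np

def grm_probs(theta, alpha, thresholds):
    """P(response = k), k = 1..len(thresholds)+1, for one GRM item."""
    # Cumulative P(x >= k): 1 for the lowest category, logistic above it.
    cum = [1.0] + [1.0 / (1.0 + np.exp(-alpha * (theta - b))) for b in thresholds]
    cum.append(0.0)
    # Category probabilities are differences of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

probs = grm_probs(theta=0.5, alpha=1.3, thresholds=[-2.0, -0.7, 0.4, 1.6])
print([round(p, 3) for p in probs])  # five probabilities summing to 1
```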
Where I hang out • https://engineering.purdue.edu/SCI • allenk@purdue.edu • Use at your peril • kcallen@hotmail.com • More permanently trustworthy