
Item Response Theory


Presentation Transcript


  1. Item Response Theory

  2. What’s wrong with the old approach? • Classical test theory (CTT) • Item and test statistics are sample dependent • Building truly parallel test forms is difficult • Examinee scores can’t be compared across different forms • Reliability is a single, test-level index • No way to predict performance on an individual item • “Error” is assumed to be the same for everybody

  3. So, what is IRT? • A family of mathematical models that describe the interaction between examinees and test items • Examinee performance can be predicted in terms of the underlying trait • Provides a means for estimating scores for people and characteristics of items • Common framework for describing people and items

  4. Some Terminology • “Ability” • We use this as a generic term for the “thing” that we are trying to measure • The “thing” can be any old “thing” and we need not concern ourselves with labeling the “thing”, but examples of the “thing” include: • Reading ability • Math performance • Depression

  5. The ogive • A naturally occurring form that describes something about people • Used throughout science, engineering, and the social sciences • Also used in architecture, carpentry, photography, art, and so forth

  6. The ogive

  7. The ogive

  8. The Item Characteristic Curve (ICC) • This function really does everything: • Scales items & people onto a common metric • Helps in standard setting • Foundation of equating • Gives scores some meaning in terms of student ability

  9. The ICC • Any line in a Cartesian system can be defined by a formula • The simplest formula for the ogive is the logistic function: P_i(θ_j) = 1 / (1 + e^(-(θ_j - b_i)))

  10. The ICC • Where b_i is the item parameter and θ_j is the person parameter • The equation represents the probability of responding correctly to item i given the ability of person j.

  11. b_i is the inflection point [figure: ICC for item i with its inflection point at b_i = 0.125]

  12. We can now use the item parameter to calculate p • Let’s assume we have a student with θ = 1.0, and we have our b = 0.125 • Then we can simply plug the numbers into our formula

  13. Using the item parameters to calculate p [figure: ICC showing p = 0.705 at θ = 1.00]
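
As a quick check on the worked example, here is a minimal Python sketch of the logistic ICC given above (the function name p_correct is ours, not from the slides):

```python
import math

def p_correct(theta, b):
    """1PL logistic ICC: probability of a correct response given
    person ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# The worked example from the slides: theta = 1.0, b = 0.125
print(p_correct(1.0, 0.125))  # 0.7057... (the slides report p = 0.705)
```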

  14. Wait a minute • What do you mean a student with an ability of 1.0?? • Does an ability of 0.0 mean that a student has NO ability? • What if my student has a reading ability estimate of -1.2?

  15. The ability scale • Ability is on an arbitrary scale that just so happens to be centered around 0.0 • We use arbitrary scales all the time: • Fahrenheit • Celsius • Decibels • DJIA

  16. Scaled Scores • Although ability estimates are centered around zero, reported scores are not • However, scaled scores are typically a linear transformation of ability estimates • Example of a linear transformation: (Ability × Slope) + Intercept
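
For instance, a hypothetical reporting scale with slope 50 and intercept 200 (illustrative values only, not from the slides) maps every plausible θ to a positive score:

```python
def scaled_score(theta, slope=50.0, intercept=200.0):
    """Linear transformation from the theta metric to a reporting scale.
    Slope and intercept are hypothetical; real programs pick them so the
    reported score range is convenient and non-negative."""
    return theta * slope + intercept

print(scaled_score(0.175))  # 208.75
print(scaled_score(-1.2))   # 140.0 -- a negative theta, a positive scaled score
```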

  17. The need for scaled scores [figure: ability distribution centered at zero; half the kids will have negative ability estimates]

  18. The Two Scales of Measurement • Reporting Scale (Scaled Scores) • Student/parent level report • School/district report • Cross-year comparisons • Performance level categorization • The Psychometric Scale (θ) • IRT item and person parameters • Equating • Standard setting

  19. Unfortunately, life can get a lot worse • Items vary from one another in a variety of ways: • Difficulty • Discrimination • Guessing • Item type (multiple choice vs. constructed response)

  20. Items can vary in terms of difficulty [figure: two ICCs plotted against student ability, one for an easier item and one for a harder item]

  21. Items can vary in terms of discrimination • Discrimination is reflected by the “pitch” (slope) of the ICC • Thus, we allow the ICCs to vary in terms of their slope

  22. Good item discrimination [figure: steep ICC; a noticeable difference in p between two close ability levels]

  23. Poor item discrimination [figure: flatter ICC; a much smaller difference in p for the same two ability levels]

  24. Guessing [figure: ICC whose lower asymptote approaches 0.25]
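
The slides introduce difficulty, discrimination, and guessing one at a time; a common way to combine all three is the three-parameter logistic (3PL) model. A minimal sketch, with illustrative parameter values (the slides themselves never state this formula):

```python
import math

def icc_3pl(theta, a, b, c):
    """3PL ICC.
    a: discrimination (how steep the curve is at its inflection point)
    b: difficulty (location of the inflection point)
    c: guessing (lower asymptote)"""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A guessable four-option MC item: even a very low-ability examinee
# has roughly a 1-in-4 chance of answering correctly.
print(icc_3pl(-3.0, a=1.0, b=0.0, c=0.25))  # ~0.286
```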

  25. Constructed Response Items

  26. Items and people • Interact in a variety of ways • We can use IRT to show that there exists a nice little s-shaped curve that shows this interaction • As ability increases, the probability of a correct response increases

  27. Advantages of IRT • Because of the stochastic nature of IRT there are many statistical principles we can take advantage of • A test is a sum of its parts

  28. The test characteristic curve • A test is made up of many items • The TCC can be used to summarize across all of our items • The TCC is simply the summation of ICCs along our ability continuum • For any ability level we can use the TCC to estimate the overall test score for an examinee
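
A minimal sketch of a TCC, building on the 1PL ICC defined earlier (the five difficulty values are hypothetical):

```python
import math

def icc(theta, b):
    """1PL ICC, as defined earlier."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def tcc(theta, difficulties):
    """Test characteristic curve: the sum of the ICCs, i.e. the
    expected total test score at a given ability level."""
    return sum(icc(theta, b) for b in difficulties)

b_values = [-1.0, -0.5, 0.125, 0.5, 1.0]  # a hypothetical five-item test
print(tcc(0.0, b_values))  # ~2.47 expected raw score for an average examinee
```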

  29. Several ICCs from a single test [figure]

  30. The test characteristic curve

  31. The test characteristic curve • From an observed test score (i.e., a student’s total test score) we can estimate ability • The TCC is used in standard setting to establish performance levels • The TCC can also be used to equate tests from one year to the next

  32. Estimating Ability [figure: reading the TCC in reverse; a total score of 3 maps to ability ≈ 0.175]
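
Because the TCC increases monotonically in θ, we can invert it numerically to recover an ability estimate from an observed total score. A sketch using bisection, reusing tcc() and b_values from the earlier block (operational programs typically use maximum likelihood estimation instead):

```python
def ability_from_score(total, difficulties, lo=-4.0, hi=4.0, tol=1e-6):
    """Find the theta whose expected total score equals the observed
    total by bisection on the (monotone) TCC."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid, difficulties) < total:
            lo = mid   # expected score too low -> true theta is higher
        else:
            hi = mid
    return (lo + hi) / 2.0

print(ability_from_score(3.0, b_values))  # ~0.48 for the hypothetical test
```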

  33. Psychometric “Information” • The amount that an item contributes to estimating ability • Items that are close to a person’s ability provide more information than items that are far away • An item is most informative around the point of inflection

  34. Item Information [figure: the item is most informative near its inflection point, because that is where it best discriminates among nearby θ values]

  35. Item Information [figure: the item is much less informative at points along θ where the ICC has little slope]

  36. Test Information • Test information is the sum of item information • Tests are also most “informative” where the slope of the TCC is the greatest • Information (like everything else in IRT) is a function of ability • Test information really is test “precision”
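
For the 1PL model used in this presentation, item information takes the simple form P(1 − P), and test information is its sum across items. A sketch, again reusing icc() and b_values from the TCC block:

```python
def item_information(theta, b):
    """1PL item information: P(1 - P). It peaks at theta = b,
    where the ICC is steepest."""
    p = icc(theta, b)
    return p * (1.0 - p)

def test_information(theta, difficulties):
    """Test information: the sum of the item informations."""
    return sum(item_information(theta, b) for b in difficulties)

print(test_information(0.0, b_values))  # ~1.11 for an average examinee
```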

  37. Let’s start with a TCC

  38. Information Functions [figure: information function with a cutpoint marked BP/P] • We can evaluate information at a given cutpoint

  39. Information and CTT • CTT has reliability and, of course, the famous α coefficient • IRT has the test information function • Test quality can be evaluated conditionally along the performance continuum • In IRT, information is conveniently reciprocally related to standard error
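
That reciprocal relationship is simply SE(θ) = 1 / √I(θ). A sketch building on test_information() above; the ±1.96 multiplier for an approximate 95% confidence region is our addition, not from the slides:

```python
import math

def standard_error(theta, difficulties):
    """SE of the ability estimate: reciprocal square root of test
    information, so SE is smallest where information peaks."""
    return 1.0 / math.sqrt(test_information(theta, difficulties))

theta_hat = 0.175
se = standard_error(theta_hat, b_values)
# Approximate 95% confidence region for the ability estimate;
# the band is wide here because a five-item test gives little information.
print(theta_hat - 1.96 * se, theta_hat + 1.96 * se)
```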

  40. Standard Error as a function of ability [figure: at θ = 0.175, SE = 0.25]

  41. Standard Error of Ability [figure: a total score of 3 maps to ability ≈ 0.175]

  42. Standard Error of Ability [figure: the same mapping, with a confidence region drawn around the ability estimate]

  43. Item Response Theory • A vast kingdom of equations and a dizzying array of complex concepts • Ultimately, we use IRT to explain the interaction between students and test items • The cornerstone of IRT is the ICC, which depicts that as ability increases, the chance of getting an item correct increases

  44. Item Response Theory • Everything in IRT can be studied conditionally along the performance continuum • What CTT calls reliability, IRT calls test information, and we can think of this as test precision • SE is related to information and can also be studied along θ

  45. The Utility of Item Response Theory • Can be used to estimate characteristics of items and people • Can be used in the test development process to maximize information (minimize SE) at critical points along θ • Can even be used for test administration purposes (e.g., adaptive testing)
