
Computerized Adaptive Testing: What is it and How Does it Work?



Presentation Transcript


  1. Computerized Adaptive Testing: What is it and How Does it Work?

  2. Goals of this session • Learn about Computerized Adaptive Testing (CAT) • Review Item Response Theory (IRT) • See how CAT combines with IRT • Weigh the pros and cons of CAT • Answer questions

  3. Not to be confused with… Computerized Adaptive Testing: Not as cute, but far fewer hairballs.

  4. PART I Introduction to CAT

  5. Motivation for Understanding CAT • There are already operational assessments that use CAT • Some believe it will revolutionize classroom testing in the future • Interesting idea that speaks to potential of computers to have new uses in education • Item Response Theory is all over testing now

  6. OK, so what is CAT? • A type of assessment where a question is displayed on a monitor • Students use mouse to select answer • Computer chooses next question based on previous responses • Next question is displayed on monitor, or else test ends
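
A bare-bones sketch of that loop in Python may help fix the idea. The function names and placeholder rules below are mine, not the presentation's; Parts II and III replace the placeholders with IRT-based scoring and item selection.

```python
# A bare-bones CAT administration loop (illustrative placeholders only).
import random

def simulate_response(item):
    """Placeholder for displaying the item and capturing the student's answer."""
    return random.random() < 0.5        # pretend: 50/50 chance of a correct answer

def select_next_item(responses, pool):
    """Placeholder selection rule; Part III replaces this with IRT-based rules."""
    return random.choice(pool)

def run_cat(pool, max_items=20):
    responses = []                       # (item, correct?) pairs seen so far
    while pool and len(responses) < max_items:
        item = select_next_item(responses, pool)
        pool.remove(item)                # never re-administer the same item
        responses.append((item, simulate_response(item)))
    return responses                     # scored afterwards with IRT (Part II)

print(run_cat(list(range(100)), max_items=5))
```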

  7. A graphical representation Questions chosen depend on prior responses

  8. Analogy: A Game of 20 Questions • I am thinking of an object. You have 20 “yes-or-no” questions to figure it out. • Would you write out all your questions ahead of time? 1) Is it an animal? 2) Is it a vegetable? 3) Is it blue? 4) Is it red? 5) Is it bigger than a car? 6) Etc.

  9. 20 Questions, Continued • Isn’t it more effective to base your next question on previous answers? 1) Is it an animal? NO. 2) Is it a vegetable? YES. 3) Is it commonly found in a salad? YES. 4) Is it green? NO. 5) Would Bugs Bunny eat it? YES.

  10. Same principle used in CAT • Computer keeps track of each student’s pattern of responses so far • As test progresses, learn more about individual student • Choose next question (item) to get maximal info about that particular student’s level of ability • Purpose of assessment: Get best possible information about students

  11. Some items are more informative than others? Sure! • Some items are easier than others: 2 + 2 vs. 54389 + 34697 • Some items are more relevant than others: 3 + 7 vs. an Academy Awards question • Some items are better at distinguishing proficient students from those who need improvement

  12. Which is most informative? • Suppose we have only 2 types of students: “Advanced” and “Beginning” • Use the test to classify each student • Which item below is the best for this purpose?

  13. Item 3 is the best • Item 1 is completely useless • Item 2 gives some information • Item 3 is all you need!

  14. But wait… • Wouldn’t we choose Item 3 for ALL students? • If so, why customize a test for an individual student? • Answer: For some students, Item A is more informative. For others, Item B is more informative.

  15. When is one item more informative than another? • Item A: 2 + 2 • Item B: (34 + 68) / 2 • If you’ve answered many difficult items correctly, Item A is a waste of time • If you’ve answered many easy items incorrectly, Item B is too hard • Thus, give Item B to high-performing students, Item A to low-performing students

  16. Isn’t that unfair? • It seems like CAT penalizes students for performing well at the start • If we give different items to different students, how can we compare their performances? • The above question arises whether we use CAT or not • Item Response Theory to the rescue!

  17. Summary of Part I • CAT customizes assessment based on previous responses, as in 20 Questions • Certain items are more informative than others • For some students, Item A is more informative; for others, Item B is • When we give different items to different students, we need a way to relate student performances (Item Response Theory)

  18. PART II Review of Item Response Theory

  19. Item Response Theory (IRT) • Quantifies the relation between examinees and test items • For each item, gives probability of correct response by ability level • Provides a means for describing characteristics of items, estimating ability of examinees • Places examinees on common scale when they have taken different items

  20. The IRT Model: One item

  21. Different items have different curves

  22. Where did those curves come from? • In IRT, ability is denoted by θ • Probability of a correct response is P(θ) = c + (1 - c) / (1 + e^(-a(θ - b))) • Each item has its own values of a, b, and c. We know them from field testing • a is the “discrimination”: Related to the slope • b is the “difficulty”: Harder item, higher b • c is the “guessing parameter”: Chance of a lucky guess
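
As a concrete illustration of that three-parameter logistic (3PL) curve, here is a small Python sketch; the a, b, c values are made up for the example.

```python
import math

def p_correct(theta, a, b, c):
    """3PL model: probability that a student with ability theta answers correctly."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# An illustrative item: moderately discriminating (a), average difficulty (b),
# with a 20% chance of a lucky guess (c).
a, b, c = 1.2, 0.0, 0.2
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"theta = {theta:+.1f}  P(correct) = {p_correct(theta, a, b, c):.2f}")
```

Raising a steepens the curve near θ = b, raising b shifts the curve toward higher ability (a harder item), and c sets the lower asymptote, which is what the next three slides show.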

  23. Effect of the a parameter • All curves shown have equal b and c parameters • Larger a increases the slope in the middle

  24. Effect of the b parameter • All curves shown have equal a and c parameters • Larger b means harder item

  25. Effect of the c parameter • All curves shown have equal a and b parameters • c is the lower (left-hand) asymptote: the chance of a correct answer for a very low-ability student

  26. Wait a minute • What do you mean by a student with an ability of 1.0? • Does an ability of 0.0 mean that a student has NO ability? • What if my student has a reading ability of -1.2? What in the world does that mean???

  27. The ability scale • Ability is on an arbitrary scale that just happens to be centered around 0.0 • We use arbitrary scales all the time: • Fahrenheit • Celsius • Decibels • Nevertheless, need more “user-friendly” reporting: “scaled” scores on conventional scale like 200-300

  28. Giving a score for each student • First assign an ability (θ) value to each student (say, -4 to 4) • Student is given the value of θ that is most consistent with his/her responses • The better he/she does on the test, the higher the value of θ that he/she receives • Computer converts the θ score to a scaled score • Report final score!

  29. Assigning scores • Set of answers: (C,C,I,C,C,I,I,C,C,C,I,C,C), where C = correct and I = incorrect • We know which items were taken by each student: a, b, c parameters • If Student 1’s items were harder than Student 2’s, take that into account through the item parameters • Student 1: θ = 1.25, scaled score = 290 • Student 2: θ = 0.65, scaled score = 268 • Can compare students who took different items!!!
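
A rough Python sketch of that scoring idea, assuming the 3PL model from slide 22: estimate each student's θ by maximizing the likelihood of his or her responses over a grid, using the parameters of the items that student actually took, then convert to a reporting scale. The item parameters and the linear scaled-score conversion below are hypothetical, not the presentation's operational values.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def estimate_theta(responses, items):
    """Grid-search maximum-likelihood estimate of theta from scored responses.

    responses: list of 1 (correct) / 0 (incorrect), one per administered item
    items:     matching list of (a, b, c) parameter triples
    """
    best_theta, best_loglik = 0.0, -math.inf
    for step in range(-400, 401):                 # grid over theta in [-4, 4]
        theta = step / 100.0
        loglik = 0.0
        for x, (a, b, c) in zip(responses, items):
            p = p_correct(theta, a, b, c)
            loglik += math.log(p) if x else math.log(1.0 - p)
        if loglik > best_loglik:
            best_theta, best_loglik = theta, loglik
    return best_theta

def scaled_score(theta):
    """Hypothetical linear conversion onto a 200-300 reporting scale."""
    return round(36.7 * theta + 244)

# Two students who took DIFFERENT items can still be placed on the same scale.
student1 = ([1, 1, 0, 1, 1], [(1.1, 0.8, 0.2), (0.9, 1.2, 0.2), (1.3, 1.5, 0.25),
                              (1.0, 0.5, 0.2), (1.2, 1.0, 0.2)])   # harder items
student2 = ([1, 0, 1, 1, 0], [(1.0, -0.5, 0.2), (1.1, 0.2, 0.2), (0.8, 0.0, 0.25),
                              (1.2, 0.4, 0.2), (1.0, 0.9, 0.2)])   # easier items
for name, (resp, its) in (("Student 1", student1), ("Student 2", student2)):
    th = estimate_theta(resp, its)
    print(f"{name}: theta = {th:.2f}, scaled score = {scaled_score(th)}")
```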

  30. Summary of Part II • If you didn’t get all that, don’t worry • Just remember: • In IRT, different items have different curves (depending on a, b, c parameters) • IRT allows us to give scores on the same scale, even when students take different items • These features critical in CAT • So how do we choose which items to give?

  31. PART III Combining CAT with IRT

  32. CAT Reminder • CAT customizes assessment based on previous responses • For some students, Item A is more informative; for others, Item B is • With IRT, it’s OK to give different items to different students

  33. Which item would you choose next? PREVIOUS RESPONSES: • 10 + 19 = ? Answered correctly. • 27 + 38 = ? Answered incorrectly. • 12 + 26 = ? Answered incorrectly. POSSIBLE ITEMS TO GIVE NEXT: • 18 + 9 = ? • 13 + 17 = ? • 14 + 20 = ?

  34. Item selection to match ability/difficulty • Want to give items appropriate to ability • 2 + 2 is not informative for high-performing students; (34 + 68) / 2 is not informative for low-performing students • Student has taken 10 items, awaits 11th • Classic approach: Give item whose difficulty (b) is closest to current ability estimate (θ)
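
A minimal sketch of that classic rule, assuming we already have a current θ estimate and a difficulty parameter b for each remaining item; the item pool below is invented.

```python
def next_item_by_difficulty(theta_hat, pool):
    """Classic CAT rule: pick the item whose difficulty b is closest to theta_hat.

    pool: dict mapping item id -> difficulty parameter b
    """
    return min(pool, key=lambda item_id: abs(pool[item_id] - theta_hat))

# Hypothetical remaining items and a current ability estimate of -1.2
remaining = {"item_17": -2.0, "item_42": -1.1, "item_63": 0.3, "item_88": 1.5}
print(next_item_by_difficulty(-1.2, remaining))   # -> item_42
```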

  35. Which item is better for θ = -1.2? • Easier item • Harder item

  36. More complex item selection • Previous method: Match difficulty to ability • This criterion only uses b parameter and θ • Recall that a parameter is related to slope, c is guessing parameter • Shouldn’t we consider those when choosing next item?

  37. Another item selection method • Ideal item: High value of a; value of b close to θ; low value of c • “Fisher Information” combines these factors into a single number • Choose item with highest Fisher Info
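
To make that concrete: for the 3PL model, the standard item information is I(θ) = a² · [(P − c)/(1 − c)]² · (1 − P)/P, where P is the probability of a correct response at the current θ estimate. The sketch below evaluates it over a hypothetical item pool.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta, a, b, c):
    """Standard 3PL item information: higher means more informative at theta."""
    p = p_correct(theta, a, b, c)
    return a ** 2 * ((p - c) / (1.0 - c)) ** 2 * (1.0 - p) / p

def next_item_by_information(theta_hat, pool):
    """Maximum-information rule: pick the item with the largest Fisher information."""
    return max(pool, key=lambda item_id: fisher_information(theta_hat, *pool[item_id]))

# Hypothetical pool: item id -> (a, b, c)
pool = {"item_1": (1.8, 0.9, 0.20), "item_2": (1.0, 0.7, 0.15), "item_3": (0.7, 0.1, 0.25)}
print(next_item_by_information(0.7, pool))   # -> item_1
```

This made-up pool happens to show the same flavor of disagreement as the Round 2 result a few slides later: at θ = 0.7 the highly discriminating item wins on information even though another item's difficulty matches θ exactly. The numbers here are not the presentation's.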

  38. Game: Which item would you choose? • Suppose our current estimate of θ is 0.6

  39. Results • If matching ability estimate (0.6) with difficulty, we would give Item 2 • If using Fisher Info, we would give Item 2

  40. Round 2 • Suppose our current estimate of θ is 0.7

  41. Round 2 Results • If matching ability estimate (0.7) with difficulty, we would give Item 2 • If using Fisher Info, we would give Item 1

  42. Summary of Part III • Tailor items to be most informative about individual student’s ability • Do this by combining CAT with IRT • One method: Match difficulty with current estimate of θ • Another method: Take all parameters into account via Fisher Info

  43. PART IV Practical Considerations

  44. Problem: Content Balance • In operational testing, must balance content (e.g., a math test covering algebra, geometry, and number sense) • What if all your most informative items come from the same content strand? • In practice, dozens of constraints for each CAT: content, topics, enemy items (items that cannot appear together), etc. • CAT solution: Pick the most informative item among those “in play”
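
One way to sketch that “in play” idea in Python: filter the pool to items from strands whose quota is not yet met (a stand-in for the dozens of real constraints), then apply maximum information within the filtered set. The strands, quotas, and information values below are invented.

```python
def eligible_items(pool, counts, quotas):
    """Keep only items from content strands that have not yet hit their quota."""
    return {iid: item for iid, item in pool.items()
            if counts.get(item["strand"], 0) < quotas[item["strand"]]}

def next_item(pool, counts, quotas):
    """Most informative item among those still 'in play' under the constraints."""
    in_play = eligible_items(pool, counts, quotas)
    return max(in_play, key=lambda iid: in_play[iid]["info"])

# Hypothetical pool with information precomputed at the current theta estimate
pool = {
    "alg_1": {"strand": "algebra",      "info": 0.90},
    "alg_2": {"strand": "algebra",      "info": 0.80},
    "geo_1": {"strand": "geometry",     "info": 0.40},
    "num_1": {"strand": "number sense", "info": 0.30},
}
quotas = {"algebra": 1, "geometry": 1, "number sense": 1}
counts = {"algebra": 1}                  # one algebra item already administered
print(next_item(pool, counts, quotas))   # -> geo_1 (algebra is already at quota)
```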

  45. Problem: Test security • CAT administered on multiple occasions • Person A takes the exam, memorizes items, and tells Person B; Person B takes the exam and benefits from Person A’s information • Different students get different items; however, some items are more popular than others • CAT solution: Limit the number of times each item can be administered
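
A minimal sketch of one simple exposure cap, assuming we track how often each item has been administered; operational programs use more sophisticated schemes (e.g., probabilistic Sympson-Hetter exposure control), so treat this as illustrative only.

```python
def next_item_with_exposure_cap(candidates, exposure, administered, max_rate=0.25):
    """Pick the most informative item whose exposure rate is still under the cap.

    candidates:   dict of item id -> information at the current theta estimate
    exposure:     dict of item id -> number of past administrations
    administered: total number of test administrations so far
    """
    under_cap = {iid: info for iid, info in candidates.items()
                 if exposure.get(iid, 0) / max(administered, 1) < max_rate}
    chosen = under_cap or candidates            # fall back if everything is capped
    return max(chosen, key=chosen.get)

# Hypothetical numbers: the most informative item is already over-exposed
candidates = {"item_A": 0.9, "item_B": 0.6, "item_C": 0.5}
exposure = {"item_A": 400, "item_B": 90, "item_C": 30}
print(next_item_with_exposure_cap(candidates, exposure, administered=1000))  # -> item_B
```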

  46. CAT “Pros” • Convenient administration • Immediate scoring • Items maximally informative: Exams just as accurate, with shorter tests • Items at correct level: High-performing students not bored, low-performing students not overwhelmed

  47. CAT “Cons” • Limited by technology • Potential bias against students with less computer experience • Content balance less exact than in paper-and-pencil testing • Test security • Expensive

  48. Final summary • Introduction to CAT: Benefits of giving different items to different students • Review of IRT • Using IRT to select items in a CAT • Pros and cons of CAT
