Multiple Choice Test Item Analysis Facilitator: Sophia Scott
Workshop Format • What is Multiple Choice Test Item Analysis? • Background information • Fundamentals • Guided Practice • Individual Practice
What is Multiple Choice Test Item Analysis? Statistically analyzing your multiple choice test items so that you can ensure that your items are effectively evaluating student learning.
Background information • What does a test score mean? • Reliability and Validity • Norm-referenced or Criterion-referenced
What does a Test Score Mean? • A test score reflects both what you really knew (true score) and error (things like atmosphere or nerves that shift the observed score away from your true score). • The purpose of a systematic approach to test design is to reduce error in test taking.
Reliability and Validity • Reliability – the test scores are consistent • Test-retest reliability (an individual's score is consistent over time) • Inter-rater reliability (consistency of individual judges’ ratings of a performance) • Validity – the test measures what it was supposed to measure. You want your test to be both reliable and valid.
Norm-referenced or Criterion-referenced • Norm-referenced – defines the performance of test-takers in relation to one another. Uses the frequency distribution and can rank students. Often used to predict success, as with the GRE or GMAT. • Criterion-referenced – defines the performance of each test-taker without regard to the performance of others. Success means being able to perform a specific task or set of competencies. Uses a mastery curve.
Item analysis: how you interpret the results of a test and use individual item statistics to improve the quality of the test. Terms used • Standard deviation (SD) – a measure of how far scores typically fall above and below the average score; the more the scores are spread out, the higher the SD • Mean – average score • N – number of items on the test • Raw scores – actual scores • Variance = standard deviation squared
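As a small illustration of the terms above, the following sketch computes the mean, standard deviation, and variance of a set of raw scores. The scores are made-up illustrative data, and the choice of the population forms (`pstdev`, `pvariance`) is an assumption; the slides do not say whether the population or sample form is intended.

```python
import statistics

# Hypothetical raw scores for ten test-takers (illustrative data only).
raw_scores = [12, 15, 9, 18, 14, 11, 16, 13, 10, 17]

# Mean: the average score.
mean_score = statistics.mean(raw_scores)

# Population standard deviation and variance (assumption: the class is
# treated as the whole group of interest, not a sample from a larger one).
sd = statistics.pstdev(raw_scores)
variance = statistics.pvariance(raw_scores)

print(f"mean = {mean_score}, SD = {sd:.3f}, variance = {variance}")
# Variance equals the standard deviation squared (up to rounding error).
print(abs(sd ** 2 - variance) < 1e-9)
```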
Fundamentals of Item Analysis • Were any of the items too difficult or easy? • Do the items discriminate between those students who really knew the material from those that did not? • What is the reliability of the exam?
1. Were any of the items too difficult or too easy? • Use the Difficulty Factor of a question • Proportion of respondents selecting the right answer to that item • D = c / n, where D = difficulty factor, c = number of correct answers, n = number of respondents • Range 0–1 • The HIGHER the difficulty factor, the easier the question is, so a value of 1 would mean all the students got the question correct and it may be too easy
Difficulty Factor • Optimal Level is .5 • To be able to discriminate between different levels of achievement, the difficulty factor should be between .3 and .7 • If you want the students to master the topic area, high difficulty values should be expected. D = c / n
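A minimal sketch of the difficulty factor calculation, D = c / n, with the .3 to .7 guideline applied as a simple check. The function names and the response data are hypothetical, chosen only for illustration; responses are assumed to be coded 1 for correct and 0 for incorrect.

```python
def difficulty_factor(responses):
    """Return D = c / n for one item.

    responses: list with one entry per respondent, 1 = correct, 0 = incorrect.
    """
    n = len(responses)
    if n == 0:
        raise ValueError("need at least one respondent")
    c = sum(responses)  # number of correct answers
    return c / n


def interpret_difficulty(d):
    """Apply the .3 to .7 guideline for discriminating between achievement levels."""
    if d > 0.7:
        return "may be too easy"
    if d < 0.3:
        return "may be too difficult"
    return "within the .3 to .7 range"


# Hypothetical item answered by ten students, eight of them correctly.
item_responses = [1, 1, 1, 1, 0, 1, 1, 1, 0, 1]
d = difficulty_factor(item_responses)
print(d, interpret_difficulty(d))
```

If the goal is mastery of the topic area rather than ranking students, a high D is expected and the "may be too easy" label need not signal a problem.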
Guided Practice What is the D for Items 1-3
Difficulty Factor • Item # 1 = .8 • Item # 2 = .6 • Item # 3 = .4 What does it mean? • Item # 1 = .8 may be too easy • Item # 2 = .6 good • Item # 3 = .4 good
Individual Practice What is the D for Items 4-5
Difficulty Factor • Item # 4 = .5 • Item # 5 = .6 What does it mean? • Item # 4 = .5 optimal • Item # 5 = .6 good Overall, you can say that only item #1 may be too easy
2. Do the items discriminate between those students who really knew the material from those that did not? • The Discrimination Index • DI = (a - b) / n • a = response frequency of the High group (number answering correctly) • b = response frequency of the Low group (number answering correctly) • n = number of respondents in each group • Point-Biserial Correlation
2. Do the items discriminate between those students who really knew the material from those that did not? • The point-biserial correlation correlates the test-taker's performance on a single test item with their total score. • Range +1.00 to -1.00 • Items which discriminate well are those which have difficulties between .3 and .7
2. Do the items discriminate between those students who really knew the material from those that did not? • Positive coefficient means that test-takers who got the item right generally did well on the test as a whole, while those who got the item wrong did poorly on the test. • Negative coefficient means that test-takers who did well on the test tended to miss the item, while those who did poorly tended to get the item right. • Zero coefficient means that performance on the item is unrelated to the total score; it is also what is reported when all test-takers got the item correct or all got it incorrect.
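The point-biserial coefficient described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the facilitator's own procedure: the function name and data are hypothetical, the total score is assumed to include the item itself, and the population standard deviation is used. Strictly, the correlation is undefined when every test-taker answers the item the same way; the sketch returns 0.0 in that case.

```python
import math


def point_biserial(item, totals):
    """Correlate one item (1 = correct, 0 = incorrect) with total test scores."""
    n = len(item)
    if n == 0 or n != len(totals):
        raise ValueError("item and totals must be non-empty and the same length")

    n_right = sum(item)
    p = n_right / n  # proportion answering correctly
    q = 1 - p

    mean_total = sum(totals) / n
    sd_total = math.sqrt(sum((t - mean_total) ** 2 for t in totals) / n)

    # No variation in the item or in the totals: report zero by convention.
    if n_right == 0 or n_right == n or sd_total == 0:
        return 0.0

    mean_right = sum(t for x, t in zip(item, totals) if x == 1) / n_right
    mean_wrong = sum(t for x, t in zip(item, totals) if x == 0) / (n - n_right)
    return (mean_right - mean_wrong) / sd_total * math.sqrt(p * q)


# Hypothetical data: five test-takers, ordered from highest to lowest total.
totals = [5, 4, 3, 2, 1]
r_positive = point_biserial([1, 1, 1, 0, 0], totals)  # strong students got it right
r_negative = point_biserial([0, 0, 1, 1, 1], totals)  # strong students missed it
r_zero = point_biserial([1, 1, 1, 1, 1], totals)      # everyone got it right
print(round(r_positive, 3), round(r_negative, 3), r_zero)
```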
2. Do the items discriminate between those students who really knew the material from those that did not? The Discrimination Index Steps • Rank test scores from highest to lowest, so the highest is at the top of the list • Define high group (top 27%) • Define low group (bottom 27%) • Calculate DI = (a - b) / n
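The steps above can be sketched in a few lines. The function name and the records are hypothetical. The sketch divides by the size of each group, which is a common convention and an assumption about what n means here; rounding the 27% cut to the nearest whole student is also an assumption.

```python
def discrimination_index(records, fraction=0.27):
    """Compute DI = (a - b) / n for one item.

    records: list of (total_score, item_correct) pairs, item_correct is 1 or 0.
    fraction: share of test-takers placed in each of the high and low groups.
    """
    if not records:
        raise ValueError("need at least one test-taker")

    # Step 1: rank test scores from highest to lowest.
    ranked = sorted(records, key=lambda r: r[0], reverse=True)

    # Steps 2 and 3: define the high (top 27%) and low (bottom 27%) groups.
    group_size = max(1, round(len(ranked) * fraction))
    high = ranked[:group_size]
    low = ranked[-group_size:]

    # Step 4: a and b are the numbers answering correctly in each group.
    a = sum(correct for _, correct in high)
    b = sum(correct for _, correct in low)
    return (a - b) / group_size


# Hypothetical class of ten: (total test score, 1 if this item was correct).
records = [(95, 1), (90, 1), (88, 1), (80, 1), (75, 0),
           (70, 1), (65, 0), (60, 0), (55, 1), (50, 0)]
di = discrimination_index(records)
print(round(di, 3))
```

With ten test-takers each group holds three students; all three in the high group and one in the low group answered correctly, so DI = (3 - 1) / 3.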
What does it mean? Point Biserial • Item # 1 = .48 • Item # 2 = .43 • Item # 3 = .47 • Item # 4 = .62 • Item # 5 = .83 Item 5 is close to not discriminating Overall the test does discriminate
3. What is the reliability of the exam? • Kuder-Richardson 20 • Kuder-Richardson 21 • Cronbach alpha
3. What is the reliability of the exam? • Range 0–1 • A higher value indicates a stronger relationship between the items and the test • A lower value indicates a weaker relationship between the items and the test • Kuder-Richardson 20: r = [n / (n - 1)] × [(s² - Σ p_i q_i) / s²] • n = number of items on the test • s = standard deviation of the total test scores (s² = variance) • p_i = proportion of correct responses to item i • q_i = 1 - p_i
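A minimal sketch of the Kuder-Richardson 20 calculation, using the usual form in which the summed item variances Σpq are subtracted from the variance of the total scores before dividing by that variance. The function name and the score matrix are hypothetical, and using the population variance of the totals is an assumption (some texts use the sample variance).

```python
def kr20(score_matrix):
    """Kuder-Richardson 20 reliability for dichotomously scored items.

    score_matrix: one row per test-taker, one 0/1 entry per item.
    """
    n_people = len(score_matrix)
    n_items = len(score_matrix[0]) if n_people else 0
    if n_people == 0 or n_items < 2:
        raise ValueError("need at least one test-taker and two items")

    # Variance of the total test scores (population form, by assumption).
    totals = [sum(row) for row in score_matrix]
    mean_total = sum(totals) / n_people
    variance = sum((t - mean_total) ** 2 for t in totals) / n_people
    if variance == 0:
        raise ValueError("total scores have no variance; KR-20 is undefined")

    # Sum of p * q over items, where p is the proportion correct on the item.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in score_matrix) / n_people
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * ((variance - sum_pq) / variance)


# Hypothetical results: five test-takers, four items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(scores)
print(round(reliability, 3))
```

Here the total-score variance is 2 and Σpq is 0.8, so r = (4 / 3) × (1.2 / 2) = 0.8.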
What does it mean? Kuder 20 • Item # 1 = .88 • Item # 2 = .63 • Item # 3 = .40 • Item # 4 = .76 • Item # 5 = .89 Item 3 may not relate as well Overall the test is reliable
Review Purpose - statistically analyze multiple choice test items to ensure items are effectively evaluating student learning. • Were any of the items too difficult or easy? (Difficulty index) • Do the items discriminate between those students who really knew the material from those that did not? (Discrimination index or Point Biserial) • What is the reliability of the exam? (Kuder 20)
Thank you for your Time Any Questions or Comments?