This piece explores the intricate process of test development under the No Child Left Behind (NCLB) initiative, emphasizing the importance of quality control and best practices. It discusses the essential steps in test development, from interpreting content standards to field testing, and highlights the challenges of ensuring fair and comparable assessments. Key considerations such as cost factors, validity, and the impact of group differences are examined, alongside practical strategies to achieve high standards.
Developing the Tests for NCLB: No Item Left Behind • Steve Dunbar, Iowa Testing Programs, University of Iowa
Test Development: A Technical Concern • Procedures are well-established – it’s sort of a ‘rocket-art’ • Aspects of ‘quality’ that seem distinct to an observer are inseparable to a developer • Quality control requires resources – talent, time, and money – to do well • TD is the grunt work of assessment
Best Practice in Test Development • Interpret content standards; translate into test specifications • Search for stimulus material; draft items • Do the 3Rs: REVIEW-REVISE-REPLACE • Prepare material for field testing • Oops – we forgot about finding the kids to participate in field testing, many comparable samples of them
More Best Practice in TD • Administer, retrieve, and score tryout materials; get item analysis results to TDers • Do the 3Rs: REVIEW-REVISE-REPLACE • Prepare more material for field testing • Oops – more kids for field testing, more comparable samples
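The "item analysis results" that go back to test developers after a tryout usually include, at minimum, an item difficulty (proportion correct) and an item discrimination index. The sketch below is illustrative only and is not taken from the presentation: it assumes dichotomously scored (0/1) items, uses a corrected item-total (point-biserial) correlation as the discrimination index, and the function name `item_analysis` is hypothetical.

```python
from statistics import mean, pstdev


def item_analysis(responses):
    """Classical item statistics for 0/1-scored tryout data.

    responses: list of examinee rows, each a list of 0/1 item scores
    (an assumed layout, not one specified in the slides).
    Returns one dict per item with difficulty p and discrimination r_pb.
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    results = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        # Difficulty: proportion of examinees answering the item correctly.
        p = mean(item)
        # Rest score: total score with this item removed, so the item
        # is not correlated with itself.
        rest = [t - x for t, x in zip(totals, item)]
        sd_item, sd_rest = pstdev(item), pstdev(rest)
        if sd_item == 0 or sd_rest == 0:
            r_pb = None  # undefined when everyone gets the same score
        else:
            m_item, m_rest = mean(item), mean(rest)
            cov = mean([(x - m_item) * (y - m_rest) for x, y in zip(item, rest)])
            r_pb = cov / (sd_item * sd_rest)
        results.append({"item": j, "p": p, "r_pb": r_pb})
    return results


# Tiny made-up tryout sample: 4 examinees, 4 items (item 3 is answered
# correctly by everyone, so its discrimination is undefined).
tryout = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]
stats = item_analysis(tryout)
```

Items with very high or very low p, or with low or negative r_pb, are the usual candidates for the REVISE or REPLACE step.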
What do we get from Best Practice? • Something elusive (important content, interesting materials, good questions, cognitive complexity, comparability) • Something intangible (fairness, alignment with standards, intended consequences) • Something concrete (coverage, rater reliability, a validity or generalizability coefficient, acceptable cost)
Some TD Half Truths • Multiple Choice Items: Development is hard; Scoring is easy (and public); Quality Control built in to TD process • Open-ended Items: Development is easy; Scoring is hard (and private); Quality Control elusive due to scoring
Comparability in Test Materials • Test form as the unit for judging comparability • Easy to achieve with many items on the test and many potential throwaways in the pool • Experienced test development staff • Good field testing and scoring needed
Group Differences and Fairness • TD seeks a balance • Tension is that balance requires questions, lots of them • Instructional influences confounded with group effects • DIF requires good matching questions
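One common way to screen an item for DIF is the Mantel-Haenszel procedure, which compares reference-group and focal-group performance on the studied item within strata of examinees matched on a score from the matching questions. The slides do not say which DIF method is used, so the sketch below is an assumption for illustration; the function name `mantel_haenszel_dif` and the record layout are hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Mantel-Haenszel common odds ratio for one studied item.

    records: iterable of (group, matching_score, correct), where group is
    "ref" or "focal" and correct is 0/1 (an assumed layout).
    Returns None when the odds ratio cannot be computed.
    """
    # One 2x2 table per matching-score stratum:
    # A = ref correct, B = ref incorrect, C = focal correct, D = focal incorrect.
    strata = defaultdict(lambda: {"A": 0, "B": 0, "C": 0, "D": 0})
    for group, score, correct in records:
        cell = strata[score]
        if group == "ref":
            cell["A" if correct else "B"] += 1
        else:
            cell["C" if correct else "D"] += 1

    num = den = 0.0
    for cell in strata.values():
        n = sum(cell.values())
        num += cell["A"] * cell["D"] / n
        den += cell["B"] * cell["C"] / n
    if num == 0 or den == 0:
        return None
    alpha = num / den
    # ETS delta scale: negative values indicate the item is relatively
    # harder for the focal group after matching.
    return {"alpha": alpha, "mh_d_dif": -2.35 * math.log(alpha)}


# Made-up data in which both groups perform identically within each stratum.
no_dif = (
    [("ref", 1, 1)] * 2 + [("ref", 1, 0)] * 2
    + [("focal", 1, 1)] * 1 + [("focal", 1, 0)] * 1
    + [("ref", 2, 1)] * 3 + [("ref", 2, 0)] * 1
    + [("focal", 2, 1)] * 3 + [("focal", 2, 0)] * 1
)
result = mantel_haenszel_dif(no_dif)
```

The quality of the result depends on the matching score: with few or weak matching questions the strata are coarse, and instructional differences can show up as apparent group effects.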
Cost Factors in Large-Scale Testing • Development Costs: recur with each test form; are fixed by instrument design • Scoring Costs: recur with each test administration; may change because of ‘unexpected’ circumstances
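The two cost drivers above can be put into a small back-of-the-envelope model: development cost scales with the number of test forms, and scoring cost scales with each administration. This is a minimal sketch under that reading of the slide; the function name `program_cost` and every figure in the example are hypothetical, not data from the presentation.

```python
def program_cost(n_forms, dev_cost_per_form, administrations):
    """Rough program cost split into development and scoring.

    n_forms: number of test forms developed.
    dev_cost_per_form: development cost per form (set by instrument design).
    administrations: list of (n_examinees, scoring_cost_per_examinee) pairs,
        one per administration; the per-examinee rate may differ between
        administrations when circumstances change.
    """
    development = n_forms * dev_cost_per_form
    scoring = sum(n * rate for n, rate in administrations)
    return {
        "development": development,
        "scoring": scoring,
        "total": development + scoring,
    }


# Hypothetical figures: 2 forms, 2 administrations, with the scoring rate
# rising in the second administration.
costs = program_cost(2, 100_000, [(1_000, 2), (1_000, 3)])
```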
Validity in Test Development • Best practice ensures content quality, balance, and alignment with standards – critical aspects of validity & reliability • TD is predicated on anticipated use • Other aspects of validity & reliability aren’t understood until it’s too late, i.e. when the test is operational
Validity & Capacity in NCLB • NCLB is census testing • Census testing places heavy demands on TD and other aspects of an accountability system • Limit on capacity in TD means: only 1R, or 2Rs; fewer rounds of field testing; dwindling pools of test materials • No item left behind