Challenges of Piloting Test Items Branka Petek School of Foreign Languages Slovenia
Content • Challenges Slovenia had to face when piloting test items • What we learned from experience
Why pilot test items? To get a clear picture of candidates’ language skills. To get a clear picture we need good test items. It is impossible to have good test items without pre-testing.
Challenges SFL had to face • Appropriate population for piloting • Administration of the items • Test format • Statistical analyses
Population for piloting • Size • Similarity to the Slovenian testing population • Level of proficiency • Test fatigue
Lessons learned • SIZE: the population should be as big as possible, but anything is better than nothing; • SIMILARITY: the population should be similar to the testing population; • LEVEL OF PROFICIENCY: the distribution should be normal (or near normal), otherwise the results will be unreliable; • TEST FATIGUE: Have the candidates piloted before? Are they tired of taking tests and piloting?
Administration • Administrators • Time • Courses • Collecting data on test takers
Lessons learned • ADMINISTRATORS: the results are most reliable when we administer the tests ourselves; • TIME: depends on the course cycle; • COURSES: courses designed to prepare students for STANAG tests normally give the most reliable results; • QUESTIONNAIRES: if well designed, they help investigate the face validity of tests, the time allocated, the clarity of rubrics, and the appropriacy of test methods and text topics.
Test format • Length • Number of items • Task types • Topics (cultural background, influence of the course)
Lessons learned • LENGTH: similar to the live test version; • NUMBER OF ITEMS: approximately the same number of items as the live test; • TASK TYPES: different countries use different methods, so candidates might not be familiar with the task types we use; • FAMILIARITY WITH THE TOPICS: e.g. military topics (cultural background).
Statistical analyses • CTT • IRT • ‘Manual check’ • The influence of a particular population
Lessons learned • With a small population, CTT is the only option; • Sometimes there are fewer than 30 test takers: manual checking for odd answers and strange behaviour can help eliminate some problems and improve the items; • With a small population the data are less reliable, so there is always an element of risk.
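As an illustration of what a CTT item analysis on a small pilot might look like, here is a minimal sketch. It is not the procedure SFL used: the function name `item_analysis`, the made-up response matrix, and the choice of a corrected item-total correlation as the discrimination index are all assumptions for the example. For each dichotomously scored item it reports the facility value (proportion correct) and how well the item separates stronger from weaker candidates.

```python
from statistics import mean, pstdev

def item_analysis(responses):
    """Classical (CTT) item statistics for 0/1-scored items.

    responses: one list per candidate, one 0/1 score per item.
    Hypothetical helper written for this illustration only.
    """
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]
    results = []
    for i in range(n_items):
        scores = [r[i] for r in responses]
        facility = mean(scores)  # proportion of candidates answering correctly
        # Total score on the remaining items, so the item is not
        # correlated with itself (corrected item-total correlation).
        rest = [t - s for t, s in zip(totals, scores)]
        sd_item, sd_rest = pstdev(scores), pstdev(rest)
        if sd_item == 0 or sd_rest == 0:
            discrimination = None  # no variance: the index is undefined
        else:
            cov = mean(s * r for s, r in zip(scores, rest)) - facility * mean(rest)
            discrimination = cov / (sd_item * sd_rest)
        results.append(
            {"item": i + 1, "facility": facility, "discrimination": discrimination}
        )
    return results

# Invented data: 6 candidates x 5 items (assumption, not real pilot data).
responses = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
]
results = item_analysis(responses)
for row in results:
    print(row)
```

In this invented data set, item 1 is answered correctly by everyone, so it has no measurable discrimination, and item 5 is answered correctly mainly by weaker candidates, which gives it a negative index. Items like these are the ones a manual check would look at first. With so few candidates the figures are unstable, so they are better treated as prompts for inspection than as firm evidence.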
Perfect and real world of piloting • In a perfect world, a piloting session would mean at least 300 test takers, IRT analysis, revision of the test items, re-piloting, a second IRT analysis, a final version of the test and experts to determine cut-off scores. • In the real world, piloting is difficult to plan and carry out. • It is nevertheless an absolutely essential part of the testing cycle. • Piloting internationally can produce more reliable results but also presents many pitfalls we have to be aware of. • Being aware of possible problems can help us plan. • The more we invest (in time, effort and money), the more we get.
Thank you Questions, suggestions?