
Scoring Provincial Large-Scale Assessments

Learn about the five-step scoring process for provincial large-scale assessments, advantages and challenges of scoring models, and lessons from British Columbia's switch to a decentralized model.



Presentation Transcript


  1. Scoring Provincial Large-Scale Assessments María Elena Oliveri, University of British Columbia Britta Gundersen-Bryden, British Columbia Ministry of Education Kadriye Ercikan, University of British Columbia

  2. Objectives • Describe and discuss: • the five-step process used to score provincial large-scale assessments (LSAs) • advantages and challenges associated with diverse scoring models (e.g., centralized versus decentralized) • lessons learned in British Columbia when switching from a centralized to a decentralized scoring model

  3. Scoring Provincial Large-Scale Assessments • LSAs are administered to: • collect data to evaluate the efficacy of school systems • guide policy-making • make decisions regarding improving student learning • An accurate scoring process, examined in relation to the purposes of the test and the decisions the assessment data are intended to inform, is key to obtaining useful data from these assessments

  4. Accuracy in Scoring • Essential to having accurate & meaningful scores is the degree to which scoring rubrics: • (1) appropriately and accurately identify relevant aspects of responses as evidence of student performance, • (2) are accurately implemented, and • (3) are consistently applied across examinees • Uniformity in scoring LSAs is central to achieving comparability of students’ responses: it ensures that differences in results are attributable to differences among examinees’ performance rather than to biases introduced by the use of differing scoring procedures • A five-step process is typically used

  5. Step One: “Test Design Stage” • Design test specifications • that match the learning outcomes or construct(s) assessed • that include the particular weights & number of items needed to assess each intended construct
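The idea of test specifications with construct weights can be sketched as a simple data structure plus a validation check. The constructs, weights, and item counts below are hypothetical illustrations, not taken from any actual provincial blueprint:

```python
# Hypothetical blueprint: each construct carries a target weight
# (its share of the total score) and the number of items assessing it.
blueprint = {
    "reading comprehension": {"weight": 0.40, "items": 24},
    "writing":               {"weight": 0.35, "items": 2},
    "numeracy":              {"weight": 0.25, "items": 18},
}

def weights_are_complete(spec, tolerance=1e-9):
    """True if the construct weights account for 100% of the total score."""
    return abs(sum(c["weight"] for c in spec.values()) - 1.0) < tolerance

print(weights_are_complete(blueprint))  # True
```

A check like this makes the weighting scheme explicit before item writing begins, so that the scored test matches the intended emphasis on each construct.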

  6. Step Two: “Scoring Open-Response Items” Decide which model to use to score open-response items: • Centralized models are directly supervised by provincial Ministries or Departments of Education in a central location • Decentralized models often take place across several locations & are performed by a considerably greater number of teachers; they are typically used for scoring medium- to low-stakes LSAs

  7. Step Three: “Preparing Training Materials” • Identify common tools to train scorers, including: • Exemplars of students’ work demonstrating each of the scale points in the scoring rubric • Illustrate potential biases arising in the scoring process (e.g., differences in scores given to hand- vs. type-written essays)

  8. Step Four: “Training of Scorers” • Training occurs prior to scoring and can recur during the session itself, especially if the session spans more than one day • A “train the trainer” approach is often used • a small cadre of more experienced team leaders is trained first; they then train the other scorers who will actually score the responses • Team leaders often make final judgement calls on the assignment of scores differing from exemplars • This serves to reinforce common standards and consistency in the assignment of scores and leads to fair and accurate scores

  9. Step Five: “Monitoring Scores” • Includes checks for inter-marker reliability, wherein a sample of papers is re-scored to check consistency in scoring across raters • May serve as re-training or “re-calibration” activity, with raters discussing scores and rationales for their scoring procedures
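The inter-marker reliability check described above is commonly quantified with an exact-agreement rate and a chance-corrected statistic such as Cohen's kappa. A minimal sketch for two raters, using hypothetical rubric scores rather than data from any provincial program:

```python
from collections import Counter

def exact_agreement(rater1, rater2):
    """Proportion of re-scored papers given identical scores by both raters."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)

def cohens_kappa(rater1, rater2):
    """Agreement corrected for the level expected by chance alone."""
    n = len(rater1)
    p_observed = exact_agreement(rater1, rater2)
    counts1, counts2 = Counter(rater1), Counter(rater2)
    p_chance = sum(counts1[s] * counts2[s] for s in counts1) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical rubric scores (scale points 1-4) for six re-scored papers
original_scores = [1, 2, 3, 4, 1, 2]
rescore = [1, 2, 3, 3, 1, 2]
print(round(exact_agreement(original_scores, rescore), 2))  # 0.83
print(round(cohens_kappa(original_scores, rescore), 2))     # 0.77
```

Raw agreement overstates consistency when the score scale has few points, which is why a chance-corrected index is usually reported alongside it during monitoring.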

  10. The Foundation Skills Assessment • The Foundation Skills Assessment (FSA) will be used as a case study to illustrate advantages and challenges associated with switching from a centralized to a decentralized scoring model • The FSA assesses Grade 4 and 7 students’ skills in reading, writing and numeracy • Several changes were made to the FSA in 2008 in response to stakeholders’ demands for more meaningful LSAs that informed classroom practice

  11. Changes to the FSA • Earlier administration • moved from May to February • Online administration of closed-response sections • Parents or guardians received their child’s open-response test portions & a summary statement of reading, writing and numeracy skills • Scoring model changed from a centralized to a decentralized model • Ministry held “train the trainer” workshops to prepare school district personnel to organize and conduct local scoring sessions • School districts could decide how to conduct scoring sessions • score individually, in pairs or in groups • double-score only a few, some or all of the responses

  12. Advantages of a Decentralized Model • Professional Development • A decentralized model allowed four times as many teachers to work with scoring rubrics and exemplars • Educators were able to develop a deeper understanding of provincial standards and expectations for student achievement • If scorers are educators, they may later apply knowledge of rubrics and exemplars in their classroom practice and school environments and consider the performance of their own students in a broader provincial context

  13. Advantages of a Decentralized Model • Earlier return of test results & earlier provision of feedback to teachers, students and the school • More immediate feedback may help improve learning and guide teaching • The data inform teachers about students’ strengths and areas for improvement in relation to provincial standards • May be helpful in writing school plans and targeting the areas upon which particular schools may focus

  14. Challenges of a Decentralized Scoring Model • Increased difficulty associated with • Less time allocated to implementing cross-check procedures • Decreased standardization of scoring instructions given to raters • Increased costs (higher number of teachers scoring) • Reduced training time

  15. Potential Solutions • Provide teachers with adequate training time • e.g., one to two days of training prior to scoring the assessments • Increase discussion among teachers, which may involve reviewing exemplars falling in between scale points in the rubric • Have table leaders • e.g., teachers with prior scoring experience • Regroup teachers to work through difficulties or uncertainties related to the scoring process

  16. Final Note • Closer collaboration among educators and Ministries and Departments of Education may lead to improved tests as educators bring their professional experience of how students learn in the classroom to bear on test design itself • Strong alignment between the overall purposes of the test, the test design and the scoring model used may add value to score interpretation and subsequent use of assessment results

  17. Thank you
