
Assessment Literacy Module


Presentation Transcript


  1. California Department of Education Tom Torlakson, State Superintendent of Public Instruction Assessment Literacy Module Unit 7: Analyze and Act: Appropriate and Effective Use of Assessment Data

  2. Welcome to Unit 7 The purpose of this unit is to help educators analyze and interpret assessment data, modify and differentiate instruction, and provide students with effective feedback and opportunities for involvement.

  3. Learning Objectives for Unit 7 By the end of this unit, participants will be able to:
• Describe the characteristics of a quality rubric
• Explain approaches for analyzing data in making instructional decisions
• State the characteristics of effective feedback
• Discuss the benefits of involving students in using assessment results

  4. Assessment Data Assessment data is used in a number of ways throughout all levels of the educational system, from long-range policy making to guiding instructional decisions. This unit will focus on:
• Diagnosing student learning needs
• Monitoring student progress
• Identifying the need for additional support

  5. Assessment Data Here is where we are in the teaching-assessment cycle and the assessment-literacy attributes to be highlighted.

  6. Using Rubrics to Evaluate Student Work A rubric is a scoring tool that gives assessment-literate educators and students accurate information about current performance as well as feedback on how to improve. Rubrics are more sophisticated than checklists; they also articulate gradations of quality for each criterion. View this video for an explanation of rubric basics: “Creating and Judging Rubrics” http://www.k-state.edu/ksde/alp/module7/

  7. Using Rubrics to Evaluate Student Work Scoring selected-response assessments is straightforward; answers are either correct or incorrect. Scoring constructed-response assessments, such as essays and performance tasks, requires a scoring rubric.

  8. Developing Rubrics Constructed-Response (CR) assessments prompt students to generate a text or numerical response in order to collect evidence about their knowledge or understanding of a given assessment target. CR assessments include both short and extended responses. Short items may require test-takers to enter a single word, phrase, sentence, number, or set of numbers, whereas extended items require more elaborated answers and explanations of reasoning. These kinds of CR items allow students to demonstrate their use of complex thinking skills.
Performance Tasks (PTs) measure a student's ability to integrate knowledge and skills across multiple standards ‒ a key component of college and career readiness. PTs are used to measure capacities such as depth of understanding, research skills, and complex analysis, which cannot be adequately assessed with selected- or constructed-response items. These item types challenge students to apply their knowledge and skills to respond to complex real-world problems.

  9. Developing Rubrics Now view another video to hear highlights of the steps for developing quality rubrics. "How Are Rubrics Developed" http://www.k-state.edu/ksde/alp/module7/ Using rubrics in combination with exemplars or pieces of work that typify the score descriptors helps scorers interpret the rubrics the same way, increasing consistency between raters.
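
One rough way to gauge consistency between raters is percent exact agreement. The short Python sketch below is an illustration only, not part of the module; the two score lists are made-up data for the same set of student responses.

# Percent exact agreement between two raters scoring the same responses
# with the same rubric (made-up scores; a minimal sketch, not module content).
rater_a = [3, 2, 4, 1, 3, 2, 4, 3]
rater_b = [3, 2, 3, 1, 3, 2, 4, 2]

matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
agreement = matches / len(rater_a)
print(f"Exact agreement: {agreement:.0%}")  # prints "Exact agreement: 75%"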

  10. Exploring the Parts of a Rubric The video introduced the basic parts of a quality rubric:
• Criteria: “What counts” in a product or performance (e.g., purpose, organization, details, voice, and mechanics for a writing rubric). The criteria cover important features that are relevant to the learning target.
• Scoring Levels: Two or more levels of quality, ranging from excellent to poor. The number of levels should match the complexity of the task; more complex tasks have more levels. There should be just enough levels to allow a clear distinction between the different gradations of quality and to adequately track student progress.
• Descriptors: Define the levels. They are clearly written so that students and teachers can agree not only on what makes work excellent or poor, but also on any level in between.

  11. Exploring the Parts of a Rubric “Chocolate Chunk Cookies” rubric example:
• Row headings name the criteria used to judge cookie quality: Chocolate Chunk Balance, Texture, Color/Appearance, Taste
• Column headings identify the levels of quality with numerical values: Poor (1), Fair (2), Good (3)
• Descriptors are in the body of the table. For example, a cookie with good texture is “chewy, soft, very few crumbs.”
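
To make these parts concrete, here is a minimal Python sketch (an illustration only, not part of the module) that stores part of the cookie rubric as a data structure: criteria map to scoring levels, and each level holds a descriptor. Only the Good-level texture descriptor comes from the slide; the other descriptors are invented placeholders.

# Rubric as a data structure: criteria -> scoring levels -> descriptors.
COOKIE_RUBRIC = {
    "Texture": {
        1: "dry and hard, crumbles apart",      # Poor (placeholder wording)
        2: "somewhat dry, noticeable crumbs",   # Fair (placeholder wording)
        3: "chewy, soft, very few crumbs",      # Good (from the slide)
    },
    "Chocolate Chunk Balance": {
        1: "few or no chunks",                  # placeholder wording
        2: "chunks in only some bites",         # placeholder wording
        3: "a chocolate chunk in every bite",   # placeholder wording
    },
}

def descriptor(criterion, level):
    """Look up the descriptor for a criterion at a given scoring level."""
    return COOKIE_RUBRIC[criterion][level]

print(descriptor("Texture", 3))  # chewy, soft, very few crumbs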

  12. Exploring the Parts of a Rubric Here's how the completed rubric looks: How would you rate a cookie that has a chocolate chunk in every bite and is burned on the bottom?

  13. Smarter Balanced ELA Performance Task Review Grade 4 Smarter Balanced ELA Performance Task: “Animal Defenses” http://www.smarterbalanced.org/wordpress/wp-content/uploads/2012/09/performance-tasks/animal-performance.pdf
1. What genre of writing is this?
2. What grade span?
3. How many scoring levels for this item?
4. In your opinion, are there enough levels to allow a clear distinction between the different gradations of quality?
5. What are the criteria?

  14. Smarter Balanced ELA Performance Task Review
6. In your opinion, do the criteria cover important features that are relevant to this genre of writing? Explain your answer.
7. What is the descriptor under the criterion “Statement of Purpose/Focus and Organization” for a score of 4?
8. Compare a descriptor for a 4 with the corresponding descriptor for a 1. Are they written clearly enough that you could rate student work? Explain.
9. How do the descriptors in this rubric compare to those in the “Chocolate Chunk Cookie” rubric?

  15. Smarter Balanced ELA Performance Task Review Select the performance task closest to your grade span and familiarize yourself with the task. Consider each question presented on the previous slides as you review the rubric on the pages indicated. This information is useful to educators of all disciplines in evaluating student writing, not only for ELA:
Grade 6 ELA Performance Task (see pp. 16‒18): “Garden of Learning” http://www.smarterbalanced.org/wordpress/wp-content/uploads/2012/09/performance-tasks/garden.pdf
Grade 11 ELA Performance Task (see pp. 13‒15): “Nuclear Power: Friend or Foe?” http://www.smarterbalanced.org/wordpress/wp-content/uploads/2012/09/performance-tasks/nuclear.pdf

  16. Smarter Balanced ELA Performance Task Review Evaluating Rubric Quality: An Internet search for rubrics will yield an abundance of hits, but how does an educator sort the wheat from the chaff? Review a quality rubric that can be used to evaluate any rubric: http://www.k-state.edu/ksde/alp/activities/Activity7-5.pdf Source: Kansas Department of Education

  17. Involving Students in Using Rubrics Here is where we are in the teaching-assessment cycle and the assessment literacy attributes that will be covered.

  18. Benefits of Involving Students Recall the many benefits of involving students in the formative assessment process. Once students are familiar with how educators use rubrics, they can begin using rubrics to score their own work. Educators who create a climate where students are a community of learners can also have students score each other’s work.

  19. Benefits of Involving Students Positive research outcomes of students being involved in assessing their own learning:
• All students show gains, and the lowest-achieving students show the largest gains overall (Black & Wiliam 1998).
• Mistakes become feedback that students can use to adjust their learning activities and strategies (Black & Wiliam 1998; Butler & Nisan 1986; Butler 1987; Shepard & Smith 1987).
• Students make active choices about their learning, which has been demonstrated to increase achievement (Gearhardt & Wolfe 1995; Harlen & Deakin-Crick 2003; Jensen 1998).
• Students have to think about their own learning and talk about their understanding, which adds to their learning (Schon 1983, 1990; Walters, Seidel & Gardner 1994; Wolf 1989; Young 2000; Zessoules & Gardner 1991).
• Students’ self-assessments help teachers design instruction to better meet the needs of learners (Anthony, Johnson, Mickelson & Preece 1991; Davies, Cameron, Politano & Gregory 1992; Elbow 1986; Preece 1995; Wiggins 1993).
Source: Making Classroom Assessment Work, Davies, Herbst, & Reynolds 2008

  20. Students Assessing the Quality of Their Own Work Watch students in a grade 7 math class assess the quality of their work using a rubric: “Quality Evidence Rubrics” http://successatthecore.com/teacher-development/featured-video.aspx?v=40 As you watch, think about the following questions:
1. What is the main objective of the Quality Evidence Rubric?
2. According to the teacher, how have Quality Evidence Rubrics affected his students’ work?

  21. Students Assessing the Quality of Others’ Work Students are excellent learning resources for each other, and peer assessment complements self-assessment. Consider how the logistics of peer assessment in this video compare to how peer assessment works in your situation. Watch students present their work and listen to what they say as they rate each other using rubrics. What happens to their knowledge of assessment language and understanding of the assignment over time? “That Would Never Work Here, Either” http://www.learner.org/vod/vod_window.html?pid=1039

  22. Analyzing and Interpreting Data: Mining Item-Level Data Here is where we are in the teaching-assessment cycle with the assessment literacy attributes that are to be covered.

  23. Analyzing and Interpreting Data: Mining Item-Level Data Item-level data is a gold mine for diagnosing student learning needs. Five approaches to mining item-level data are:
1. Looking at the percentage of items correct for a group of students (Selected-Response)
2. Analyzing patterns in incorrect answer choices for a group of students (Multiple Choice)
3. Analyzing patterns in incorrect answer choices for individual students (Multiple Choice)
4. Looking at the percentage of students scoring at each level and individual student scores (Constructed-Response)
5. Digging into individual test items (Selected- or Constructed-Response)

  24. Item Analysis This table shows how to organize and display data for the first two approaches. Columns 1 and 2 show that data is organized by Domain and Content Standard. The data shown is from the items that test CA CCSS Mathematics Standard 3 of the Measurement and Data domain. Table 1: Item Analysis: Multiple-Choice Percentage Correct & Percentage Choosing Each Option

  25. Item Analysis Approach 1: Looking at the percentage of items answered correctly for selected-response items. The data in Table 1, Columns 1–7, answer these questions:
• Which standards had the highest percentage of correct answers? The lowest percentage?
• Which items had the lowest percentage of correct answers?
• How did my class compare with the whole grade?
• How did my period compare with the whole department? With the whole district (for common assessments)?

  26. Item Analysis 85% of students answered #6 correctly (the highest percentage of correct answers of the three items). This was about the same as the entire grade level and the district. Which item had the lowest percentage of correct answers? How does this compare with the rest of the grade? The district?
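
As a rough illustration of Approach 1 (not part of the module), the Python sketch below computes the percentage of a class answering each item correctly from an answer key and a set of made-up responses; repeating the same calculation for the grade or district supports the comparisons above.

# Approach 1 sketch: percent correct per selected-response item (made-up data).
answer_key = {"1a": "C", "5": "B", "6": "A"}

class_responses = [              # one dictionary of answers per student
    {"1a": "C", "5": "B", "6": "A"},
    {"1a": "D", "5": "B", "6": "A"},
    {"1a": "D", "5": "A", "6": "A"},
    {"1a": "C", "5": "B", "6": "D"},
]

for item, key in answer_key.items():
    correct = sum(1 for student in class_responses if student.get(item) == key)
    print(f"Item {item}: {correct / len(class_responses):.0%} correct")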

  27. Item Analysis: Group Data Approach 2: Analyzing patterns in choices of incorrect responses — group data. Columns 8–11 go deeper into the data, showing the percentage of students who selected each option of the multiple-choice item. Which wrong answer was selected most often?

  28. Item Analysis: Group Data Approach 2: Analyzing patterns in choices of incorrect responses — group data. The correct answer for 1.a was C, so look at the data for distractors A, B, and D. The distractor that was selected most often was D. What does this mean? When distractors (wrong answers) are written for the typical misunderstanding and errors that students make, diagnosing learning needs is easier.

  29. Item Analysis: Group Data Approach 2: Analyzing patterns in choices of incorrect responses — group data (“error analysis”)
1.a. A rectangle is 6 feet long and has a perimeter of 20 feet. What is the width of this rectangle?
A. 3.33 feet (confused perimeter with area)
B. 8 feet (subtracted twice the length from the perimeter but did not divide the remainder by 2)
C. 4 feet (correct answer)
D. 14 feet (subtracted the length only once and did not divide the remainder by 2)
What student learning needs do you diagnose in students who selected D for item #1?
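
For reference, a brief worked solution (not shown on the slide): the perimeter of a rectangle is P = 2L + 2W, so 20 = 2(6) + 2W, which gives 2W = 8 and W = 4 feet (option C). A student who chose D most likely computed 20 − 6 = 14, neither doubling the length nor halving what remained; a student who chose B computed 20 − 12 = 8 but skipped the final division by 2.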

  30. Item Analysis: Analyzing Patterns in Individual Data Approach 3: Analyzing patterns in choices of incorrect responses — individual student data. Table 2: Item Analysis: Student Responses (Item #1, Answer C). When a learning need is diagnosed from looking at group data, the next step is to examine individual student data. Table 2 displays the multiple-choice responses made by individual students for item 1; the column for option C marks the correct answer. Which students have the learning need?
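
The Python sketch below (made-up names and responses, not the module's Table 2) illustrates Approaches 2 and 3 together: it tallies how often each option was chosen for item 1.a and then lists the students who picked the most common distractor, D.

from collections import Counter

# Approaches 2 and 3 sketch for item 1.a (correct answer C; made-up data).
responses_1a = {
    "Ana": "C", "Ben": "D", "Cal": "D", "Dee": "A",
    "Eli": "C", "Fay": "D", "Gus": "B", "Hana": "C",
}

counts = Counter(responses_1a.values())
total = len(responses_1a)
for option in "ABCD":
    print(f"Option {option}: {counts.get(option, 0) / total:.0%}")

# Approach 3: flag the individual students who chose distractor D.
chose_d = [name for name, choice in responses_1a.items() if choice == "D"]
print("Selected D:", chose_d)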

  31. Item Analysis: Analyzing Constructed-Response Items Approach 4: Analyzing constructed-response items to determine the percentage scoring at each possible rubric score. Item 1b. “Explain your answer. (4 points)” Table 3 shows how the data for constructed-response items can be displayed: Table 3: Item Analysis: Percentage Scoring Each of the Possible Points on Constructed-Response Items E = Extended response (4 possible points); Proficient score = 3 or 4

  32. Item Analysis: Analyzing Constructed-Response Items Approach 4: Analyzing constructed-response items to determine the percentage scoring at each possible rubric score. Table 3 answers the questions:
• What percentage of students got each of the possible rubric scores?
• What percentage of students got a proficient score?
• What percentage of students left the item blank?
• How did my class compare with the whole grade?
• How did my period compare with the whole department (for common assessments)?

  33. Item Analysis: Analyzing Constructed-Response Items Approach 4: Analyzing constructed-response items to determine the percentage scoring at each possible rubric score. For item 1b, 12% of the students got a score of 3 and 8% got a score of 4, for a total of 20% of the students scoring proficient. Based on the criterion of proficiency for Extended-Response items, what percentage of students would benefit from re-teaching?
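
A Python sketch of Approach 4 (made-up scores chosen only to echo the 12% and 8% figures above, not the module's Table 3): it tallies the percentage of students at each rubric score, the percentage who left the item blank, and the percentage proficient (3 or 4).

from collections import Counter

# Approach 4 sketch for item 1b (0-4 rubric; proficient = 3 or 4; made-up data).
# None represents a blank response.
scores_1b = [2, 3, 0, 1, 4, 2, 2, 3, None, 1, 2, 0, 2,
             1, 2, 3, 1, 2, 4, 1, 2, 0, 1, 2, 1]

total = len(scores_1b)
counts = Counter(scores_1b)
for level in (0, 1, 2, 3, 4):
    print(f"Score {level}: {counts.get(level, 0) / total:.0%}")
print(f"Blank: {counts.get(None, 0) / total:.0%}")

proficient = sum(1 for s in scores_1b if s in (3, 4)) / total
print(f"Proficient (3 or 4): {proficient:.0%}")  # 20% with this data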

  34. Item Analysis: Analyzing Constructed-Response Items The next step is to dig into the data for individual students scoring 0–2. Look at the scores each student got for each of the rubric criteria. Which of the rubric criteria had the lowest scores? Compare the constructed-response data for 1b to the multiple-choice data: 50% of the students answered the multiple-choice item correctly, while only 20% were able to explain their thinking. Discuss: What’s the learning need? What might this imply about instruction?

  35. Item Analysis: Item Deconstruction After identifying student learning needs, it is time to dig deeper into individual test items to unwrap the knowledge, skills, and concepts that are being assessed. When analysis reveals frequently missed items in the same content standard or cluster, the next step is to deconstruct those items into the knowledge, skills, and big ideas required to correctly answer them.

  36. Item Analysis: Item Deconstruction Content standards must be broken into smaller building blocks to provide a road map for instruction and assessment. If the items were developed by unwrapping standards, conducting task analysis, and creating clear learning targets, this work is already done and the items only need to be revisited. Table 4 shows how to organize deconstructed information from ready-made tests or assessments developed from item banks. Table 4: Item Deconstruction Table
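
As one possible way to organize this work (an assumption, not the module's official Table 4 format; the field names simply mirror the description above), each deconstructed item could be recorded like this:

from dataclasses import dataclass

# One row of a hypothetical item-deconstruction table (field names assumed).
@dataclass
class DeconstructedItem:
    item_number: str
    content_standard: str   # e.g., Measurement and Data, Standard 3
    knowledge: str          # facts and vocabulary the item requires
    skills: str             # procedures the student must perform
    big_idea: str           # the underlying concept being assessed

row = DeconstructedItem(
    item_number="1a",
    content_standard="CA CCSS Mathematics, Measurement and Data, Standard 3",
    knowledge="perimeter formula for a rectangle",
    skills="solve for an unknown side length",
    big_idea="perimeter relates the side lengths of a figure",
)
print(row)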

  37. Item Analysis: Item Deconstruction After deconstructing frequently missed items for the same standard, educators should then look for patterns. The probing questions below may also suggest next steps for adjusting instruction.
1. What made the items challenging or difficult for your students?
2. Reflect on your practice: Have students been exposed to this kind of item before? Have you provided opportunities for all students to learn the knowledge, skills, and concepts?
3. What kinds of learning experiences can you plan next to support student progress? What do you need to teach differently?

  38. Hazards of Mining Item-Level Data Mining item-level data is a rich source of information, but it can also be hazardous. The following rescue strategies can be used when faced with mining hazards (Love et al. 2008):

  39. Item Analysis Review The steps for detailed item analysis are consolidated below to assist you in completing a review and analysis of your own test items:
1. Fill in Table 1 using item-level data from a multiple-choice assessment of your own where the distractors are diagnostic. Identify the items that were problematic or diagnostic for most students. What are the misunderstandings that need correction?
2. Using the data from the problematic or diagnostic items identified in Table 1, fill in Table 2 with the item-level data for individual students. Which students showed the misunderstandings?
3. Fill in Table 3 with your constructed-response assessment data. What student learning needs are indicated? Now look at individual student scores. Which students have learning needs?

  40. Item Analysis Review
4. Using Table 4, deconstruct frequently missed items that measure the same content standards and the same strands. What patterns do you see in the knowledge, skills, or big ideas? Do these items show the qualities discussed in Unit 6? Are they fair? Do they have any of Popham’s Roadblocks?
5. Reflect on instruction. Have students had opportunities to learn?
6. If you have selected-response and constructed-response items measuring the same content standards, compare the data from both methods of assessment. If there are discrepancies, reflect on instruction and assessment. Have students had opportunities to be assessed in multiple ways?

  41. Acting on the Data: Now What, So What? “Quality comes not from inspection but from improvement of the process.” ―W. Edwards Deming, Out of the Crisis (1982) Here is where we are in the teaching-assessment cycle and the assessment literacy attributes that will be covered:

  42. Developing Instructional Adjustments and Interventions Once you have analyzed information from an assessment, it is time to act and adjust instruction appropriately. In this section, we will consider two examples of instructional adjustments and interventions. The adjustments are based on ongoing analyses of data and are intended to help support teaching and learning. These examples can be mapped onto the Sources of Assessment Data figure on the next slide, according to how quickly an educator acts on assessment information and how often.

  43. Developing Instructional Adjustments and Interventions
Example 1: In-the-moment instructional adjustments – Guided Groups
Example 2: End-of-week or end-of-unit instructional adjustments – Alternate Ranking
Sources of Assessment Data

  44. Example 1: Guided Groups Observe how students in an 8th grade ELA class assess their conceptual understanding, and how their teacher uses the results to differentiate instructional support: “Guided Groups” http://successatthecore.com/teacher-development/featured-video.aspx?v=38
1. How does the teacher adjust instruction?
2. How do students respond to these adjustments?
3. What implications does this practice have for your teaching?

  45. Example 2: Alternate Ranking Assessment-literate educators use weekly assessment data formatively to temporarily group low-performing students for re-teaching and high-performing students for enhanced learning. End-of-unit and annual assessment data can be used for establishing groups needing targeted intervention. Alternate Ranking is a simple and versatile way of analyzing constructed- or selected-response data for either formative or summative uses. It is a somewhat formal method of analysis to identify and group students for differentiated instruction.

  46. Example 2: Alternate Ranking Assessment-literate educators use Alternate Ranking to rank students in an alternating fashion from highest to lowest performance to form temporary groups for targeted instruction. Follow the procedure in the Alternate Ranking handout using the data set provided. Handout Alternate Ranking for Flexible Grouping
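
The handout spells out the actual procedure; purely as an illustration (made-up scores and a simplified grouping rule that may differ from the handout), the Python sketch below builds an alternating ranking by repeatedly taking the highest and then the lowest remaining scorer, and then forms temporary groups from the ends of that ranking.

# Alternate ranking sketch (made-up data; the handout's exact procedure may differ).
scores = {"Ana": 92, "Ben": 58, "Cal": 75, "Dee": 81,
          "Eli": 64, "Fay": 88, "Gus": 70, "Hana": 95}

remaining = sorted(scores, key=scores.get)   # names, lowest to highest score
ranking, take_high = [], True
while remaining:
    ranking.append(remaining.pop() if take_high else remaining.pop(0))
    take_high = not take_high
print("Alternate ranking:", ranking)

# One possible (assumed) grouping rule: cut points for enrichment and re-teaching.
enrichment = [name for name in ranking if scores[name] >= 85]
reteach = [name for name in ranking if scores[name] < 70]
print("Enhanced learning:", enrichment, "| Re-teaching:", reteach)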

  47. Example 2: Alternate Ranking Discuss:
• If the data set is weekly data, what actions would you decide to take?
• What if it is end-of-unit data?
• What if it is final exam data?

  48. Effective Feedback Here is where we are in the teaching-assessment cycle and the assessment literacy attributes that are to be covered.

  49. The Effectiveness of Feedback “Feedback is effective when it consists of information about progress, and/or about how to proceed.” ―Hattie & Timperley 2007 There is abundant research clearly indicating that feedback is one of the most powerful positive influences on students’ academic achievement (Hattie 2009). Consider: What makes feedback effective?

  50. The Effectiveness of Feedback In his book Transformative Assessment in Action, Popham (2008, 130) describes some of the characteristics that make feedback effective:
• It needs to be given to students as quickly as possible in a useful format.
• Errors and mistakes should be treated as helpful indicators of what needs to be worked on.
• It should be descriptive and focus on areas of both strength and weakness.
• It should include suggestions about ways students might address their weaknesses.
