New England Common Assessment Program

New England Common Assessment Program Science Test Item Review Committee Meeting August 14-15, 2007 Killington, VT

Welcome and Introductions • New Hampshire: Tim Kurtz, Jan McLaughlin, Brain Cochrane, Stan Freeda • Rhode Island: Mary Ann Snider, Heather Heineke Agnew, Linda Jzyk, Peter McLaren • Vermont: Michael Hock, Gail Hall, Pat Fitzsimmons, Dave White • Measured Progress: Harold Stephens, Elliot Scharff, Amanda Smith, Josh Evans, Jim Manhart, Tori Henkes, Beneta Brown, Susan Tierney

New England Common Assessment Program An Emerging Vision Cabot School, Vermont, Web Project Artwork

NECAP – Where are we now? • Grades 3–8 (Reading, Math, and Writing) • Oct 2007 Third Administration • Jan 2008 Release Results • Grade 11 (Reading, Math, and Writing) • Oct 2007 First Operational Administration • Feb 2008 Release Results • Grades 4, 8, and 11 (Science) • May 2008 First Operational Administration • Oct 2008 Release Results

Science Overview • 2007–2008 Schedule • Test Form Construction • Bias/Sensitivity • Depth-of-Knowledge • Test Item Review & Role of Committees • Universal Design for Assessment

NECAP 2007–2008 Schedule • Item Review Committee meeting: August 14–15 • 36 teachers: 12 from each state • Bias Committee meeting: August 14–16 • 18 teachers: 6 from each state • Face-to-Face meetings: October/November • Test Form Production: January/February • DOE Reviews: late February / early March • Printing: March • Test Administration Workshops: April 2008 • Shipments to schools: April 25, 2008 • Test Administration Window: May 12–29, 2008 • 108,000 students from the 3 states

Overview of Test Design • Collaborative effort among NH, RI, and VT • Based on common content from all three states • Used “Big Ideas of Science” and the domains of science as organizing foundations • Less about isolated facts and more about use and application of information

Test Design – Who? • Who? • The NECAP includes “all” students educated at public expense in grades 3–8 and 11 in NH, RI, and VT. • Through explicit planning during test construction and the use of accommodations, the tests will be accessible to as many students as possible. • The NECAP does not include each state’s alternate assessment and English language proficiency assessment programs.

Test Design – What? • What? • The content, skills, and depth of knowledge contained in the Assessment Targets of each state’s Grade Span Expectations (GSEs). The Assessment Targets were developed jointly by the three states expressly for this assessment program. • Physical Science, Life Science, and Earth Space Science at the end of grades 4, 8, and 11. • Each test will be designed to measure a range of student achievement across four performance levels.

Test Design – Why Spring Testing? • Why spring testing? • Critical transition points • Grade 4 to 5, 8 to 9, and HS to beyond • National Standards • General agreement at transition points • High School Schedule • 4-by-4 block scheduling • Science is not (yet?) part of AYP

Test Design – How? • How? • Operational Test • Three Sessions • Sessions 1 and 2: MC and CR items grouped together in three domains—Life Science, Physical Science, and Earth Space Science • Session 3: Performance Task

Test Design – Performance Task • Performance Task • Session 3 will be a performance task • Looking at inquiry and science process • Focus on one assessment target within “INQ” code • Scenario (story) driven • Work in groups of two or three to begin the session, then answer questions individually • Focus will vary by grade • Grade 4: Always hands-on “design an experiment” • Grade 8: Sometimes like Grade 4, sometimes like Grade 11 • Grade 11: Students will be given data and asked to draw conclusions

Test Design – Forms Construction • Forms Construction—Common/Matrix Design • Common Items • A common set of items completed by all students • All achievement level scores (student, school, district, and state) are based solely on common items • Matrix-Sampled Items • Unique sets of items distributed across forms • Includes equating and field test items
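The common/matrix design above can be illustrated with a minimal sketch. Everything in it is a hypothetical illustration rather than the NECAP procedure: the names `Form`, `build_forms`, and `common_item_score`, the item labels, and the simple round-robin rule (each matrix item placed on exactly one form) are all assumptions. It shows only the two properties stated on the slide: every form carries the same common items, and scores are based solely on those common items.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Form:
    """One test form: shared common items plus its own matrix-sampled items."""
    number: int
    common: List[str]
    matrix: List[str] = field(default_factory=list)


def build_forms(common_items: List[str], matrix_items: List[str], n_forms: int) -> List[Form]:
    """Give every form all common items; deal matrix items out round-robin.

    Assumption: each matrix item (equating or field test) lands on exactly
    one form. A real design may repeat some items across forms.
    """
    forms = [Form(number=i + 1, common=list(common_items)) for i in range(n_forms)]
    for idx, item in enumerate(matrix_items):
        forms[idx % n_forms].matrix.append(item)
    return forms


def common_item_score(form: Form, points: Dict[str, int]) -> int:
    """Sum points earned on common items only; matrix items are ignored."""
    return sum(points.get(item, 0) for item in form.common)


# Hypothetical item labels: three common items, four matrix items, two forms.
forms = build_forms(["C1", "C2", "C3"], ["M1", "M2", "M3", "M4"], n_forms=2)

# A student on form 1 earns points on C1, C3, and the matrix item M1.
student_points = {"C1": 1, "C2": 0, "C3": 1, "M1": 1}
score = common_item_score(forms[0], student_points)  # 2: M1 does not count
```

Because the score depends only on the common items, students who took different forms can be compared directly, while the matrix slots let more items be field tested than any one student sees.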

Bias/Sensitivity Review How do we ensure that this test works well for students from diverse backgrounds?

What Is Item Bias? • Bias is the presence of some characteristic of an assessment item that results in the differential performance of two individuals of the same ability but from different student subgroups. • Bias is not the same thing as stereotyping, although we don’t want either in NECAP. • We need to ensure that ALL students have an equal opportunity to demonstrate their knowledge and skills.

Role of the Bias/Sensitivity Review Committee The Bias/Sensitivity Review Committee DOES need to make recommendations concerning… • Sensitivity to different cultures, religions, ethnic and socio-economic groups, and disabilities • Balance of gender roles • Use of positive language, situations, and images • In general, items and text that may elicit strong emotions in specific groups of students, and as a result, may prevent those groups of students from accurately demonstrating their skills and knowledge

Role of the Bias/Sensitivity Review Committee The Bias/Sensitivity Review Committee will not make recommendations concerning… • Reading Level • Grade-Level Appropriateness • Assessment Target Alignment • Instructional Relevance • Language Structure and Complexity • Accessibility • Overall Item Design

Depth of Knowledge How do we ensure that the test contains a range of complexity?

Depth of Knowledge • Level 1, Recall and Reproduction: Recall of a fact, information, or procedure • Level 2, Skills and Concepts: Use information or conceptual knowledge, two or more steps, etc. • Level 3, Strategic Thinking: Requires reasoning, developing a plan or a sequence of steps, some complexity, more than one possible answer • Level 4, Extended Thinking: Requires an investigation, time to think and process multiple conditions of the problem

Test Item Review Committees This assessment has been designed to support a quality program in science. It has been informed by the input of hundreds of NH, RI, and VT educators. Because we intend to release assessment items each year, the development process continues to depend on the experience, professional judgment, and wisdom of classroom teachers from our three states.

Role of the Test Item Review Committees Today you will be looking at test items in science. The role of Measured Progress staff is to facilitate the discussion and capture recommendations that are clear and defensible for test items. The role of DOE content specialists is to listen, ask clarifying questions as necessary, and explain background information. Your role is to advise the states by actively offering opinions based on content knowledge and grade-level expertise.

Role of Test Item Review Committees • You will be asked to review all items against the following criteria: • Assessment Target Alignment • Correctness • Depth of Knowledge • Language • Universal Design • Finally you will recommend each item for field testing, revision, or rejection. • Each committee member will complete a form to gather this information about each item.

Role of Test Item Review Committees • You will also be asked to provide group feedback on the following question: Does this item measure more specific knowledge and ideas that might be part of an end-of-unit test or does it measure extended learning that would be part of a cumulative science assessment?

Role of Test Item Review Committees • You will also be asked to provide group feedback on the inquiry task by answering the following questions: 1. Is it possible for students at this grade level to answer the questions without completing the task? 2. Do the questions related to this task require scientific knowledge and understanding to answer?

Role of the Test Item Review Committees You are here today to represent your diverse perspectives. We hope that you… • share your thoughts vigorously and listen just as intensely—we have different expertise and we can learn from each other, • use the pronouns “we” and “us” rather than “they” and “them”—we are all working together to make this the best assessment possible, and • grow from this experience—I know we will. And we hope that today will be the beginning of some new interstate friendships.
