Teacher evaluation: A multiple-measures system

Teacher evaluation: A multiple-measures system Laura Goe, Ph.D. Research Scientist, ETS, and Principal Investigator for the National Comprehensive Center for Teacher Quality Workshop Honolulu, HI July 19, 2011

Today’s presentation available online • To download a copy of this presentation or look at it on your iPad, smart phone or laptop, go to www.lauragoe.com • Go to Publications and Presentations page. • Today’s presentation is at the bottom of the page

Laura Goe, Ph.D. • Former teacher in rural & urban schools • Special education (7th & 8th grade, Tunica, MS) • Language arts (7th grade, Memphis, TN) • Graduate of UC Berkeley’s Policy, Organizations, Measurement & Evaluation doctoral program • Principal Investigator for the National Comprehensive Center for Teacher Quality • Research Scientist in the Performance Research Group at ETS

The National Comprehensive Center for Teacher Quality • A federally-funded partnership whose mission is to help states carry out the teacher quality mandates of ESEA • Vanderbilt University • Learning Point Associates, an affiliate of American Institutes for Research • Educational Testing Service

The goal of teacher evaluation

How did we get here? • Value-added research shows that teachers vary greatly in their contributions to student achievement (Rivkin, Hanushek, & Kain, 2005). • The Widget Effect report (Weisberg et al., 2009) “…examines our pervasive and longstanding failure to recognize and respond to variations in the effectiveness of our teachers.” (from Executive Summary)

Defining teacher effectiveness

Definitions in the research & policy worlds • Anderson (1991) stated that “… an effective teacher is one who quite consistently achieves goals which either directly or indirectly focus on the learning of their students” (p. 18).

Goe, Bell, & Little (2008) definition of teacher effectiveness • Have high expectations for all students and help students learn, as measured by value-added or alternative measures. • Contribute to positive academic, attitudinal, and social outcomes for students, such as regular attendance, on-time promotion to the next grade, on-time graduation, self-efficacy, and cooperative behavior. • Use diverse resources to plan and structure engaging learning opportunities; monitor student progress formatively, adapting instruction as needed; and evaluate learning using multiple sources of evidence. • Contribute to the development of classrooms and schools that value diversity and civic-mindedness. • Collaborate with other teachers, administrators, parents, and education professionals to ensure student success, particularly the success of students with special needs and those at high risk for failure.

Race to the Top definition of effective & highly effective teacher • Effective teacher: students achieve acceptable rates (e.g., at least one grade level in an academic year) of student growth (as defined in this notice). States, LEAs, or schools must include multiple measures, provided that teacher effectiveness is evaluated, in significant part, by student growth (as defined in this notice). Supplemental measures may include, for example, multiple observation-based assessments of teacher performance. (pg 7) • Highly effective teacher: students achieve high rates (e.g., one and one-half grade levels in an academic year) of student growth (as defined in this notice).

Teacher evaluation measures • Wherein we will consider the statement “When all you have is a hammer, everything looks like a nail.”

Measures and models: Definitions • Measures are the instruments, assessments, protocols, rubrics, and tools that are used in determining teacher effectiveness • Models are the state or district systems of teacher evaluation including all of the inputs and decision points (measures, instruments, processes, training, and scoring, etc.) that result in determinations about individual teachers’ effectiveness

Multiple measures of teacher effectiveness • Evidence of growth in student learning and competency • Standardized tests, pre/post tests in untested subjects • Student performance (art, music, etc.) • Curriculum-based tests given in a standardized manner • Classroom-based tests such as DIBELS • Evidence of instructional quality • Classroom observations • Lesson plans, assignments, and student work • Student surveys such as Harvard’s Tripod • Evidence binder (next generation of portfolio) • Evidence of professional responsibility • Administrator/supervisor reports, parent surveys • Teacher reflection and self-reports, records of contributions

Measures that help teachers grow • Measures that motivate teachers to examine their own practice against specific standards • Measures that allow teachers to participate in or co-construct the evaluation (such as “evidence binders”) • Measures that give teachers opportunities to discuss the results with evaluators, administrators, colleagues, teacher learning communities, mentors, coaches, etc. • Measures that are directly and explicitly aligned with teaching standards • Measures that are aligned with professional development offerings • Measures which include protocols and processes that teachers can examine and comprehend

Considerations for choosing and implementing measures • Consider whether human resources and capacity are sufficient to ensure fidelity of implementation • Conserve resources by encouraging districts to join forces with other districts or regional groups • Establish a plan to evaluate measures to determine if they can effectively differentiate among teacher performance • Examine correlations among measures • Evaluate processes and data each year and make needed adjustments
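One way to act on "Examine correlations among measures" is a plain Pearson correlation between two measures for the same teachers. The sketch below is illustrative only: the scores, the measure names, and the `pearson` helper are hypothetical and do not come from any state or district model.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical scores for five teachers on two measures.
observation_score = [2.1, 2.8, 3.0, 3.4, 3.9]   # rubric average, 1-4 scale
growth_score = [-0.3, 0.0, 0.2, 0.1, 0.5]       # student growth estimate

r = pearson(observation_score, growth_score)
print(f"observation vs. growth: r = {r:.2f}")
```

A very low correlation between two measures does not by itself say which one is wrong; it is a prompt to look more closely at both.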

Teacher observations: strengths and weaknesses • Strengths • Great for teacher formative evaluation (if observation is followed by opportunity to discuss) • Helps evaluators (principals or others) understand teachers’ needs across school or across district • Weaknesses • Only as good as the instruments and the observers • Considered “less objective” • Expensive to conduct (personnel time, training, calibrating) • Validity of observation results may vary with who is doing them, depending on how well trained and calibrated they are

Example: University of Virginia’s CLASS observation tool

Example: Charlotte Danielson’s Framework for Teaching

Example: Kim Marshall’s Rubric

Validity of classroom observations is highly dependent on training • A teacher should get the same score no matter who observes him • This requires that all observers be trained on the instruments and processes • Occasional “calibrating” should be done; more often if there are discrepancies or new observers • Who the evaluators are matters less than adequate training • Teachers should be trained on the observation forms and processes
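A calibration check can be as simple as having two observers score the same lessons and counting how often they agree. The sketch below assumes a 1-4 rubric and made-up scores; the `agreement` helper and the choice of "within one point" as a second threshold are assumptions for illustration, not part of any observation instrument named here.

```python
def agreement(rater_a, rater_b):
    """Return (exact, adjacent) agreement rates for two raters.

    exact    = share of lessons given the identical score
    adjacent = share of lessons where scores differ by at most one point
    """
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, adjacent


# Hypothetical scores from two observers on the same six lessons.
observer_1 = [3, 2, 4, 1, 3, 2]
observer_2 = [3, 3, 4, 1, 2, 4]

exact, adjacent = agreement(observer_1, observer_2)
print(f"exact agreement: {exact:.0%}, within one point: {adjacent:.0%}")
```

Low agreement would be a signal to recalibrate observers before scores are used for decisions about teachers.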

Cincinnati study results • Study by Kane et al. (2010) used teacher evaluation scores plus value-added scores • “…policies and programs that help a teacher get better on all eight ‘teaching practice’ and ‘classroom environment’ skills measured by TES will lead to student achievement gains” (p. 28) • “…helping teachers improve their ‘classroom environment’ management will likely also generate higher student achievement” (p. 28) • “…[adding] pedagogy that utilizes ‘questioning and discussion’ practices will generate higher reading achievement, but not higher math achievement” (p. 28)

Teacher behaviors & practices that correlate with achievement • High ratings on learning environment in classroom observations (Kane et al., 2010) • Positive student/teacher relationships (Howes et al., 2008) • Parent engagement efforts by teachers and schools (Redding et al., 2004) • Teachers’ participation in intensive professional development with follow-up (Yoon et al., 2007) • IN MANY CURRENT TEACHER EVALUATION MODELS, THESE ARE NEVER MEASURED.

Teacher evaluation models

Measuring teachers’ contributions to student learning growth: A summary of current models

Model highlight: Ensuring rigor

Austin Reach Program: Rubric for Determining SLO Rigor (DRAFT)

Model highlight: Multiple measures of student learning

Rhode Island DOE Model: Framework for Applying Multiple Measures of Student Learning • Student learning rating + Professional practice rating + Professional responsibilities rating combine into the Final evaluation rating • The student learning rating is determined by a combination of different sources of evidence of student learning. These sources fall into three categories: • Category 1: Student growth on state standardized tests (e.g., NECAP, PARCC) • Category 2: Student growth on standardized district-wide tests (e.g., NWEA, AP exams, Stanford-10, ACCESS, etc.) • Category 3: Other local school-, administrator-, or teacher-selected measures of student performance

Model highlight: Triangulating results for validity

New Haven “matrix” Asterisks indicate a mismatch—teacher is very high on one area (practice or growth) and very low on the other area.

Model highlight: Transparency

Washington DC IMPACT: Educator Groups

Model highlight: Training, opportunity to discuss results, growth opportunity

Measuring teachers’ contributions to student learning growth (classroom)

Race to the Top definition of student growth • Student growth means the change in student achievement (as defined in this notice) for an individual student between two or more points in time. A State may also include other measures that are rigorous and comparable across classrooms. (pg 11)

Validity • There is little research-based support for the validity of using any measures, including student growth measures, for teacher evaluation • Herman et al. (2011) state, “Validity is a matter of degree (based on the extent to which an evidence-based argument justifies the use of an assessment for a specific purpose).” (pg. 1)

Validity is a process • Starts with defining the criteria and standards you want to measure • Requires judgment about whether the instruments and processes are giving accurate, helpful information about performance • Verify validity by • Comparing results on multiple measures • Using multiple time points and multiple raters

Growth vs. Proficiency Models • [Figure: achievement (vertical axis) from start of school year to end of year, with a “Proficient” line. Teacher A: “Success” on achievement levels. Teacher B: “Failure” on achievement levels.] • In terms of growth, Teachers A and B are performing equally • Slide courtesy of Doug Harris, Ph.D., University of Wisconsin-Madison

Growth vs. Proficiency Models (2) • [Figure: achievement (vertical axis) from start of school year to end of year, with a “Proficient” line, showing Teacher A and Teacher B.] • A teacher with low-proficiency students can still be high in terms of GROWTH (and vice versa) • Slide courtesy of Doug Harris, Ph.D., University of Wisconsin-Madison
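The difference between a growth view and a proficiency view can be made concrete with a few numbers. In this sketch, every score, the cut score of 70, and the `summarize` helper are hypothetical; the point is only that two classrooms can show identical average growth while looking like "success" and "failure" against a proficiency line.

```python
from statistics import mean

PROFICIENT = 70  # hypothetical cut score


def summarize(start, end, cut=PROFICIENT):
    """Return (average growth, share of students proficient at year end)."""
    growth = mean(e - s for s, e in zip(start, end))
    share_proficient = sum(e >= cut for e in end) / len(end)
    return growth, share_proficient


# Hypothetical start-of-year and end-of-year scores for two classrooms.
teacher_a = summarize(start=[65, 70, 75], end=[75, 80, 85])
teacher_b = summarize(start=[40, 45, 50], end=[50, 55, 60])

for name, (growth, share) in [("A", teacher_a), ("B", teacher_b)]:
    print(f"Teacher {name}: growth = {growth}, proficient = {share:.0%}")
```

Both classrooms gain 10 points on average, yet all of Teacher A's students end the year above the cut score and none of Teacher B's do.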

Most popular growth models: Value-added and Colorado Growth Model • EVAAS uses prior test scores to predict the next score for a student • Teachers’ value-added is the difference between actual and predicted scores for a set of students • http://www.sas.com/govedu/edu/k12/evaas/index.html • Colorado Growth Model • Betebenner (2008): Focus on “growth to proficiency” • Measures students against “academic peers” • www.nciea.org
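The "actual minus predicted" idea behind value-added can be sketched in a few lines. This is not the EVAAS model, which is far more elaborate; a single-predictor least-squares line is swapped in here only to show the arithmetic. The student records, the `fit_line` and `value_added` helpers, and the use of one prior score per student are all assumptions for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with one predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def value_added(records):
    """records: list of (teacher, prior_score, current_score).

    Predict each current score from the prior score using all students,
    then average (actual - predicted) over each teacher's students.
    """
    a, b = fit_line([r[1] for r in records], [r[2] for r in records])
    residuals = {}
    for teacher, prior, current in records:
        predicted = a + b * prior
        residuals.setdefault(teacher, []).append(current - predicted)
    return {teacher: mean(vals) for teacher, vals in residuals.items()}


# Hypothetical records: (teacher, prior-year score, current-year score).
records = [
    ("A", 40, 52), ("A", 50, 61), ("A", 60, 72),
    ("B", 40, 44), ("B", 50, 55), ("B", 60, 64),
]

scores = value_added(records)
for teacher, va in scores.items():
    print(f"Teacher {teacher}: value-added = {va:+.2f}")
```

A positive number means the teacher's students scored higher than predicted on average. The estimate says nothing about why, and it reflects everything that happened in the classroom, not the teacher alone.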

Linking student learning results to professional growth opportunities Slide courtesy of Damian Betebenner at www.nciea.org

What value-added and growth models cannot tell you • Value-added and growth models are really measuring classroom, not teacher, effects • Value-added models can’t tell you why a particular teacher’s students are scoring higher than expected • Maybe the teacher is focusing instruction narrowly on test content • Or maybe the teacher is offering a rich, engaging curriculum that fosters deep student learning. • How the teacher is achieving results matters!

What assessments are teachers and schools going to use? • Existing measures • Curriculum-based assessments (come with packaged curriculum) • Classroom-based individual testing (DRA, DIBELS) • Formative assessments such as NWEA • Progress monitoring tools (for Response to Intervention) • National tests, certification tests • Rigorous new measures (may be teacher created) • The 4 Ps: Portfolios/products/performance/projects • School-wide or team-based growth • Pro-rated scores in co-teaching situations • Student learning objectives • Any measure that demonstrates students’ growth towards proficiency in appropriate standards

Measuring other teacher practices and outcomes for special populations • “Stacy Bermingham, the head teacher at the Monarch School [for homeless students], described her students as an invisible population with special needs. All are at least two grade levels behind their peers, and many routinely come to school hungry, without a shower, and with psychological challenges. Her students have large gaps of conceptual knowledge, and many have learning disabilities and behavioral issues that have been undiagnosed and untreated because they move frequently or simply attend schools that don’t attend to their needs.” Enrollment surges at schools for the homeless. EdWeek 4/11/11; vol 30, issue 28. http://www.edweek.org/ew/articles/2011/04/11/28homeless.h30.html?tkn=URNFtM06cJqs98qRZ2kG7WkUMyuGT7fafAEG&cmp=ENL-EU-NEWS1

How to use evidence of student learning growth • Teacher preparation for measuring student learning growth is limited or non-existent • Most principals, support providers, instructional managers, and coaches are poorly prepared to make judgments about teachers’ contribution to student learning growth • They need to know how to • Evaluate the appropriateness of various measures of student learning for use in teacher evaluation • Work closely with teachers to select appropriate student growth measures and ensure that they are using them correctly and consistently

Georgia CLASS KEYS

Washington DC IMPACT: Rubric for Determining Success (for teachers in non-tested subjects/grades)

Teachers as instructional leaders • “In February, fifth-grade teacher Miguel Aguilar stood in the front of a class, nervous and sweating. The subject — reading and comprehension — was nothing new. But on this day, his students weren't 11-year-olds in sneakers and sweatshirts: They were 30 of his fellow teachers. It was the first time anyone at Broadous Elementary School in Pacoima could remember a teacher there being singled out for his skill and called upon to share his secrets school-wide.” Singled-out L.A. Unified teacher shares skills with colleagues. Los Angeles Times 4/3/11. http://www.latimes.com/news/local/la-me-broadous-teachers-20110403,0,4961288.story
