
Teacher Evaluation and Performance Measurement


Presentation Transcript


  1. Doug Staiger, Dartmouth College. Teacher Evaluation and Performance Measurement

  2. Not this. Satisfactory (or equivalent) / Unsatisfactory (or equivalent). Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The Widget Effect: Our National Failure to Acknowledge and Act on Differences in Teacher Effectiveness. New York: The New Teacher Project.

  3. Not this.

  4. Transformative Feedback

  5. Recent Work on Teacher Evaluation • Efforts to identify effective teaching using achievement gains: work with Tom Kane & others in LAUSD, NYC, Charlotte… (www.dartmouth.edu/~dstaiger) • Efforts to better identify effective teaching: Measures of Effective Teaching (MET) Project (Bill & Melinda Gates Foundation, www.metproject.org); National Center for Teacher Effectiveness (NCTE) (US Department of Education, www.gse.harvard.edu/ncte)

  6. The Measures of Effective Teaching Project Participating Teachers • Two school years: 2009-10 and 2010-11 • Grades 4-8: ELA and Math • High School: ELA I, Algebra I and Biology

  7. The MET data is unique… • in the variety of indicators tested: 5 instruments for classroom observations (FFT used here), student surveys (Tripod Survey), and value-added on state tests • in its scale: 3,000 teachers; 22,500 observation scores (7,500 lesson videos x 3 scores); 900+ trained observers; 44,500 students completing surveys and supplemental assessments in year 1; 3,120 additional observations by principals/peer observers in Hillsborough County, FL • and in the variety of student outcomes studied: gains on state math and ELA tests; gains on supplemental tests (BAM & SAT9 OE); student-reported outcomes (effort and enjoyment in class, grit)

  8. What is “Effective” Teaching? Can be an inputs-based concept (observable actions or characteristics) or an outcomes-based concept (measured by student success). Ultimately, we care about impact on student outcomes: the current focus is on standardized exams, with interest in other outcomes (college, non-cognitive).

  9. Multiple Measures of Teaching Effectiveness

  10. Student Achievement Gains (“Value Added”) Measure #1

  11. Basics of Value-Added Analysis Teacher value added compares actual student achievement at the end of the year to an expectation for each student: the difference between actual and expected achievement, averaged over all of the teacher’s students. Expected achievement is the typical achievement of other students who looked similar at the start of the year: same prior-year test scores, same demographics and program participation, same characteristics of peers in the classroom or school. Various flavors all work similarly: student growth percentiles; average change in score or percentile; based on a prior-year test or a fall pre-test.
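The expectation-and-residual logic is simple enough to sketch in code. Below is a minimal Python illustration with hypothetical column names; real value-added models also condition on demographics, program participation, and peer characteristics, and typically shrink noisy estimates.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def simple_value_added(df: pd.DataFrame) -> pd.Series:
    """Toy value-added: a teacher's average actual-minus-expected score.

    Assumes columns 'prior_score', 'end_score', 'teacher_id'
    (hypothetical names). Expected achievement here is predicted
    from the prior-year score alone.
    """
    X = df[["prior_score"]].to_numpy()
    y = df["end_score"].to_numpy()
    expected = LinearRegression().fit(X, y).predict(X)
    # Average each teacher's students' residuals (actual - expected).
    return df.assign(residual=y - expected).groupby("teacher_id")["residual"].mean()
```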

  12. There are Large Differences in Teacher Effects on Student Achievement Gains Most evidence comes from “value added” analysis, but randomized experiments yield similar findings. A huge literature on “teacher effects” on achievement shows: large, persistent variation across teachers; effectiveness that is difficult to predict at hire but partially predictable after hire; improvement only in the first few years of teaching; and little relation to most determinants of pay (certification, degrees, experience beyond the first few years).

  13. Large Variation in Value Added of LAUSD Teachers is Not Related to Teacher Certification

  14. Variation in Value Added of LAUSD Teachers is Related to Prior Performance

  15. Why Not Just Hire Good Teachers? • “Wise selection is the best means of improving the school system, and the greatest lack of economy exists wherever teachers have been poorly chosen.” (Frank Pierrepont Graves, NYS Commissioner, 1932) • Unfortunately, this is easier said than done: decades of work on type of certification, graduate education, exam scores, GPA, college selectivity, and TFA find only (very) small, positive effects on student outcomes.

  16. Large Variation in Value Added of NYC Teachers is Not Related to Recruitment Channel

  17. Of Course, Teacher Impact on State Test Scores is Not All We Care About Test scores are proximate measures whose meaning depends on the design & content of the test, but recent evidence suggests they capture long-run impact on student learning and other outcomes. Test scores are also only one dimension of performance; non-cognitive skills (grit, dependability, …) matter as well.

  18. Value Added is Controversial “We need to find a way to measure classroom success and teacher effectiveness. Pretending that student outcomes are not part of the equation is like pretending that professional basketball has nothing to do with the score.” (Arne Duncan 2009) “There is no way that any of this current data could actually, fairly, honestly or with any integrity be used to isolate the contributions of an individual teacher.” (Randi Weingarten 2008)

  19. What we learned from MET: Value-added measures • Identified teachers who caused students to learn more on state tests following random assignment. • The same teachers also caused students to learn more on supplemental assessments and enjoy class more. • Low year-to-year correlations in value-added (and other performance measures) understate year-to-career correlations.
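The last bullet can be made concrete with a small simulation, a sketch with assumed variance components rather than MET's actual estimates: when each year's measure is a stable teacher effect plus independent noise, one year of data correlates far better with a career average than with any other single year.

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers, n_years = 10_000, 10
signal_sd, noise_sd = 1.0, 1.5   # assumed values, for illustration only

true_effect = rng.normal(0.0, signal_sd, n_teachers)
yearly = true_effect[:, None] + rng.normal(0.0, noise_sd, (n_teachers, n_years))

year_to_year = np.corrcoef(yearly[:, 0], yearly[:, 1])[0, 1]
year_to_career = np.corrcoef(yearly[:, 0], yearly[:, 1:].mean(axis=1))[0, 1]
print(f"year-to-year:   {year_to_year:.2f}")    # ~.31 with these parameters
print(f"year-to-career: {year_to_career:.2f}")  # ~.50: the noise averages out
```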

  20. Classroom Observations Measure #2

  21. Classroom Observation Using Digital Video

  22. What you can expect from us: helping districts test their own new classroom observations, with access to a Validation Engine.

  23. Two Cross-Subject Observation Instruments (*not scored: “flexibility & responsiveness” and “organization of physical space”)

  24. FFT competencies scored: • CLASSROOM ENVIRONMENT: creating an environment of respect and rapport; establishing a culture of learning; managing classroom procedures; managing student behavior • INSTRUCTION: communicating with students; using questioning and discussion techniques; engaging students in learning; using assessments in instruction

  25. Math Observation Instruments

  26. ELA Observation Instrument

  27. What we learned from MET: Classroom observations: • Observation scores were correlated with a teacher’s value-added (.15-.27). • Different instruments were highly correlated with each other (although subject-specific instruments were distinct from the general-pedagogical instruments). • Reliability requires certified observers and more than one observer per teacher (because rater judgments differ). • Principals rate their own teachers higher than other observers do, but their rankings are similar. • When teachers select their own videos, scores are higher, but ranking remains the same.

  28. Four Steps to High-Quality Classroom Observations

  29. Step 1: Define Expectations. Framework for Teaching (Danielson); actual scores for 7,500 lessons.

  30. Step 2: Ensure Accuracy of Observers

  31. Step 3: Monitor Reliability

  32. Use more than one observer: in the MET data, adding one more observer raised reliability by .16, and adding one more lesson raised it by .07.
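The payoff from averaging additional independent ratings follows the familiar Spearman–Brown logic; a rough illustration (the .35 single-observation reliability is an assumed value for the example, not a MET estimate):

```python
def reliability_of_average(single_rating_reliability: float, k: int) -> float:
    """Spearman-Brown: reliability of the average of k parallel ratings."""
    r = single_rating_reliability
    return k * r / (1 + (k - 1) * r)

# With an assumed single-observation reliability of .35:
for k in (1, 2, 3, 4):
    print(k, round(reliability_of_average(0.35, k), 2))
# 1 0.35 / 2 0.52 / 3 0.62 / 4 0.68 -- each added rating helps,
# with diminishing returns.
```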

  33. Step 4: Verify Alignment with Outcomes. Teachers with higher observation scores had students who learned more.

  34. What do students say? Measure #3

  35.–39. Students Distinguish Between Teachers: Percent of Students by Classroom Agreeing (a series of charts, one per student-survey item).

  40. What we learned from MET: Student surveys: • Surveys are a low-cost way to cover untested grades and subjects. • Student surveys are related to teacher value-added (.15-.25). • Student surveys are the most reliable measures we tested.

  41. The “Dynamic Trio”: classroom observations, student feedback, and student achievement gains.

  42. Three Criteria • Predictive power: which measure could most accurately identify teachers likely to have large gains when working with another group of students? • Reliability: which measures were most stable from section to section or year to year for a given teacher? • Potential for diagnostic insight: which have the potential to help a teacher see areas of practice needing improvement? (We’ve not tested this yet.)

  43. Measures have different strengths… and weaknesses • Value-added: predictive power H, reliability M, diagnostic insight L • Student surveys: predictive power M, reliability H, diagnostic insight M • Classroom observations: predictive power M/H, reliability L, diagnostic insight H

  44. The Reliability and Predictive Power of Measures of Teaching [Chart: difference in math value-added between teachers in the top 25% and bottom 25% on each measure (vertical axis) against reliability (horizontal axis), for VA alone, student survey alone, observation alone (FFT), a combination with equal weights, and a combination with criterion weights. Takeaway: combining measures improved reliability as well as predictive power.] Note: Table 16 of the research report; reliability based on one course section and 2 observations. For the equally weighted combination, we assigned a weight of .33 to each of the three measures. The criterion weights were chosen to maximize ability to predict a teacher’s value-added with other students. The next MET report will explore different weighting schemes.

  45. What we learned from MET: Combining measures: • The teachers identified as more effective caused students to learn more following random assignment. • Combining value added with student surveys and classroom observations produces two benefits: • Increased reliability • Increased correlation with other outcomes such as value-added on supplemental assessments and happiness in class • Weighting value-added below .33, though, lowered correlation with other outcomes and lowered reliability.
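A composite of this kind is, at bottom, a weighted sum of standardized measures. A minimal sketch, with hypothetical column names and the equal (.33) weights described above:

```python
import pandas as pd

def composite_score(measures: pd.DataFrame, weights: dict) -> pd.Series:
    """Weighted sum of z-scored columns, one row per teacher."""
    z = (measures - measures.mean()) / measures.std(ddof=0)
    return sum(w * z[col] for col, w in weights.items())

# Equal weights, as in the MET comparison; the criterion weights were
# instead fit to best predict value-added with a teacher's other students.
equal_weights = {"value_added": 1/3, "observation": 1/3, "student_survey": 1/3}
# scores = composite_score(teacher_measures, equal_weights)  # hypothetical df
```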

  46. Can the measures be used for “high stakes”? • High-stakes decisions are being made now, with little or no data. • No information is perfect, but better information should lead to better decisions and fewer mistakes.

  47. No information is perfect. But better information → better decisions. How do these compare to existing measures? • Master’s Degrees • Years of Experience • Classroom Observations Alone
