
Teacher Evaluation and Performance Measurement


Presentation Transcript


  1. Doug Staiger, Dartmouth College. Teacher Evaluation and Performance Measurement

  2. Not this. Satisfactory (or equivalent) / Unsatisfactory (or equivalent). Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The Widget Effect: Our National Failure to Acknowledge and Act on Differences in Teacher Effectiveness. New York: The New Teacher Project.

  3. Not this.

  4. Transformative Feedback

  5. Recent Work on Teacher Evaluation • Efforts to identify effective teaching using achievement gains: work with Tom Kane & others in LAUSD, NYC, Charlotte… (www.dartmouth.edu/~dstaiger) • Efforts to better identify effective teaching: Measures of Effective Teaching (MET) Project (Bill & Melinda Gates Foundation, www.metproject.org); National Center for Teacher Effectiveness (NCTE) (US Department of Education, www.gse.harvard.edu/ncte)

  6. The Measures of Effective Teaching Project Participating Teachers • Two school years: 2009-10 and 2010-11 • Grades 4-8: ELA and Math • High School: ELA I, Algebra I and Biology

  7. The MET data is unique… • in the variety of indicators tested: 5 instruments for classroom observations (FFT used here), student surveys (Tripod Survey), and value-added on state tests • in its scale: 3,000 teachers; 22,500 observation scores (7,500 lesson videos x 3 scores); 900+ trained observers; 44,500 students completing surveys and supplemental assessments in year 1; 3,120 additional observations by principals/peer observers in Hillsborough County, FL • and in the variety of student outcomes studied: gains on state math and ELA tests; gains on supplemental tests (BAM & SAT9 OE); student-reported outcomes (effort and enjoyment in class, grit)

  8. What is “Effective” Teaching? Can be an inputs-based concept (observable actions or characteristics) or an outcomes-based concept (measured by student success). Ultimately, we care about impact on student outcomes: the current focus is on standardized exams, with interest in other outcomes (college, non-cognitive).

  9. Multiple Measures of Teaching Effectiveness

  10. Student Achievement Gains (“Value Added”) Measure #1

  11. Basics of Value-Added Analysis Teacher value added compares actual student achievement at the end of the year to an expectation for each student: the difference between actual and expected achievement, averaged over all of the teacher’s students. Expected achievement is the typical achievement of other students who looked similar at the start of the year: same prior-year test scores, same demographics and program participation, same characteristics of peers in the classroom or school. Various flavors all work similarly: student growth percentiles; average change in score or percentile; based on a prior-year test or a fall pre-test.
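The expectation-and-residual logic is simple enough to sketch in code. Below is a minimal Python illustration with hypothetical column names; real value-added models also condition on demographics, program participation, and peer characteristics, and typically shrink noisy estimates.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def simple_value_added(df: pd.DataFrame) -> pd.Series:
    """Toy value-added: a teacher's average actual-minus-expected score.

    Assumes columns 'prior_score', 'end_score', 'teacher_id'
    (hypothetical names). Expected achievement here is predicted
    from the prior-year score alone.
    """
    X = df[["prior_score"]].to_numpy()
    y = df["end_score"].to_numpy()
    expected = LinearRegression().fit(X, y).predict(X)
    # Average each teacher's students' residuals (actual - expected).
    return df.assign(residual=y - expected).groupby("teacher_id")["residual"].mean()
```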

  12. There are Large Differences in Teacher Effects on Student Achievement Gains Most evidence comes from “value added” analysis, but randomized experiments yield similar findings. A huge literature on “teacher effects” on achievement shows: large, persistent variation across teachers; effectiveness that is difficult to predict at hire but partially predictable after hire; improvement only in the first few years of teaching; and little relation to most determinants of pay (certification, degrees, experience beyond the first few years).

  13. Large Variation in Value Added of LAUSD Teachers is Not Related to Teacher Certification

  14. Variation in Value Added of LAUSD Teachers is Related to Prior Performance

  15. Why Not Just Hire Good Teachers? • “Wise selection is the best means of improving the school system, and the greatest lack of economy exists wherever teachers have been poorly chosen.” (Frank Pierrepont Graves, NYS Commissioner, 1932) • Unfortunately, this is easier said than done: decades of work on type of certification, graduate education, exam scores, GPA, college selectivity, and TFA find only (very) small, positive effects on student outcomes.

  16. Large Variation in Value Added of NYC Teachers is Not Related to Recruitment Channel

  17. Of Course, Teacher Impact on State Test Scores is Not All We Care About Test scores are proximate measures whose meaning depends on the design & content of the test, but recent evidence suggests they capture long-run impact on student learning and other outcomes. Test scores are also only one dimension of performance; non-cognitive skills (grit, dependability, …) matter as well.

  18. Value Added is Controversial “We need to find a way to measure classroom success and teacher effectiveness. Pretending that student outcomes are not part of the equation is like pretending that professional basketball has nothing to do with the score.” (Arne Duncan 2009) “There is no way that any of this current data could actually, fairly, honestly or with any integrity be used to isolate the contributions of an individual teacher.” (Randi Weingarten 2008)

  19. What we learned from MET: Value-added measures • Identified teachers who caused students to learn more on state tests following random assignment. • The same teachers also caused students to learn more on supplemental assessments and enjoy class more. • Low year-to-year correlations in value-added (and other performance measures) understate year-to-career correlations.
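The last bullet can be made concrete with a small simulation, a sketch with assumed variance components rather than MET's actual estimates: when each year's measure is a stable teacher effect plus independent noise, one year of data correlates far better with a career average than with any other single year.

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers, n_years = 10_000, 10
signal_sd, noise_sd = 1.0, 1.5   # assumed values, for illustration only

true_effect = rng.normal(0.0, signal_sd, n_teachers)
yearly = true_effect[:, None] + rng.normal(0.0, noise_sd, (n_teachers, n_years))

year_to_year = np.corrcoef(yearly[:, 0], yearly[:, 1])[0, 1]
year_to_career = np.corrcoef(yearly[:, 0], yearly[:, 1:].mean(axis=1))[0, 1]
print(f"year-to-year:   {year_to_year:.2f}")    # ~.31 with these parameters
print(f"year-to-career: {year_to_career:.2f}")  # ~.50: the noise averages out
```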

  20. Classroom Observations Measure #2

  21. Classroom Observation Using Digital Video

  22. What you can expect from us: helping districts test their own new classroom observations, with access to a Validation Engine.

  23. Two Cross-Subject Observation Instruments (*not scored: “flexibility & responsiveness” and “organization of physical space”)

  24. FFT competencies scored: • CLASSROOM ENVIRONMENT: creating an environment of respect and rapport; establishing a culture of learning; managing classroom procedures; managing student behavior • INSTRUCTION: communicating with students; using questioning and discussion techniques; engaging students in learning; using assessments in instruction

  25. Math Observation Instruments

  26. ELA Observation Instrument

  27. What we learned from MET: Classroom observations: • Observation scores were correlated with a teacher’s value-added (.15-.27). • Different instruments were highly correlated with each other (although subject-specific instruments were distinct from the general-pedagogical instruments). • Reliability requires certified observers and more than one observer per teacher (because rater judgments differ). • Principals rate their own teachers higher than other observers do, but their rankings are similar. • When teachers select their own videos, scores are higher, but ranking remains the same.

  28. Four Steps to High-Quality Classroom Observations

  29. Step 1: Define Expectations. Framework for Teaching (Danielson); actual scores for 7,500 lessons.

  30. Step 2: Ensure Accuracy of Observers

  31. Step 3: Monitor Reliability

  32. Use more than one observer: in the MET data, adding one more observer raised reliability by .16, and adding one more lesson raised it by .07.
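The payoff from averaging additional independent ratings follows the familiar Spearman–Brown logic; a rough illustration (the .35 single-observation reliability is an assumed value for the example, not a MET estimate):

```python
def reliability_of_average(single_rating_reliability: float, k: int) -> float:
    """Spearman-Brown: reliability of the average of k parallel ratings."""
    r = single_rating_reliability
    return k * r / (1 + (k - 1) * r)

# With an assumed single-observation reliability of .35:
for k in (1, 2, 3, 4):
    print(k, round(reliability_of_average(0.35, k), 2))
# 1 0.35 / 2 0.52 / 3 0.62 / 4 0.68 -- each added rating helps,
# with diminishing returns.
```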

  33. Step 4: Verify Alignment with Outcomes. Teachers with higher observation scores had students who learned more.

  34. What do students say? Measure #3

  35.–39. Students Distinguish Between Teachers: Percent of Students by Classroom Agreeing (a series of charts, one per student-survey item).

  40. What we learned from MET: Student surveys: • Surveys are a low-cost way to cover untested grades and subjects. • Student surveys are related to teacher value-added (.15-.25). • Student surveys are the most reliable measures we tested.

  41. The “Dynamic Trio”: classroom observations, student feedback, and student achievement gains.

  42. Three Criteria • Predictive power: which measure could most accurately identify teachers likely to have large gains when working with another group of students? • Reliability: which measures were most stable from section to section or year to year for a given teacher? • Potential for diagnostic insight: which have the potential to help a teacher see areas of practice needing improvement? (We’ve not tested this yet.)

  43. Measures have different strengths… and weaknesses • Value-added: predictive power H, reliability M, diagnostic insight L • Student surveys: predictive power M, reliability H, diagnostic insight M • Classroom observations: predictive power M/H, reliability L, diagnostic insight H

  44. The Reliability and Predictive Power of Measures of Teaching [Chart: difference in math value-added between teachers in the top 25% and bottom 25% on each measure (vertical axis) against reliability (horizontal axis), for VA alone, student survey alone, observation alone (FFT), a combination with equal weights, and a combination with criterion weights. Takeaway: combining measures improved reliability as well as predictive power.] Note: Table 16 of the research report; reliability based on one course section and 2 observations. For the equally weighted combination, we assigned a weight of .33 to each of the three measures. The criterion weights were chosen to maximize ability to predict a teacher’s value-added with other students. The next MET report will explore different weighting schemes.

  45. What we learned from MET: Combining measures: • The teachers identified as more effective caused students to learn more following random assignment. • Combining value added with student surveys and classroom observations produces two benefits: • Increased reliability • Increased correlation with other outcomes such as value-added on supplemental assessments and happiness in class • Weighting value-added below .33, though, lowered correlation with other outcomes and lowered reliability.
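A composite of this kind is, at bottom, a weighted sum of standardized measures. A minimal sketch, with hypothetical column names and the equal (.33) weights described above:

```python
import pandas as pd

def composite_score(measures: pd.DataFrame, weights: dict) -> pd.Series:
    """Weighted sum of z-scored columns, one row per teacher."""
    z = (measures - measures.mean()) / measures.std(ddof=0)
    return sum(w * z[col] for col, w in weights.items())

# Equal weights, as in the MET comparison; the criterion weights were
# instead fit to best predict value-added with a teacher's other students.
equal_weights = {"value_added": 1/3, "observation": 1/3, "student_survey": 1/3}
# scores = composite_score(teacher_measures, equal_weights)  # hypothetical df
```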

  46. Can the measures be used for “high stakes”? • High-stakes decisions are being made now, with little or no data. • No information is perfect, but better information should lead to better decisions and fewer mistakes.

  47. No information is perfect. But better information → better decisions. How do these compare to existing measures? • Master’s Degrees • Years of Experience • Classroom Observations Alone
