Evaluating Teacher Effectiveness Laura Goe, Ph.D. Presentation to the Hawaii Department of Education July 20, 2011 Honolulu, HI
Today’s presentation available online • To download a copy of this presentation or look at it on your iPad, smart phone or laptop, go to www.lauragoe.com • Go to Publications and Presentations page. • Today’s presentation is at the bottom of the page
Laura Goe, Ph.D. • Former teacher in rural & urban schools • Special education (7th & 8th grade, Tunica, MS) • Language arts (7th grade, Memphis, TN) • Graduate of UC Berkeley’s Policy, Organizations, Measurement & Evaluation doctoral program • Principal Investigator for the National Comprehensive Center for Teacher Quality • Research Scientist in the Performance Research Group at ETS
The National Comprehensive Center for Teacher Quality • A federally-funded partnership whose mission is to help states carry out the teacher quality mandates of ESEA • Vanderbilt University • Learning Point Associates, an affiliate of American Institutes for Research • Educational Testing Service
Trends in teacher evaluation • Policy is way ahead of the research in teacher evaluation measures and models • Though we don’t yet know which model and combination of measures will identify effective teachers, many states and districts are compelled to move forward at a rapid pace • Inclusion of student achievement growth data represents a huge “culture shift” in evaluation • Communication and teacher/administrator participation and buy-in are crucial to ensure change • The implementation challenges are enormous • Few models exist for states and districts to adopt or adapt • Many districts have limited capacity to implement comprehensive systems, and states have limited resources to help them
Research behind the push for new evaluation measures and systems • Value-added research shows that teachers vary greatly in their contributions to student achievement (Rivkin, Hanushek, & Kain, 2005). • The Widget Effect report (Weisberg et al., 2009) “…examines our pervasive and longstanding failure to recognize and respond to variations in the effectiveness of our teachers.” (from Executive Summary)
Definitions in the research & policy worlds • Anderson (1991) stated that "… an effective teacher is one who quite consistently achieves goals which either directly or indirectly focus on the learning of their students" (p. 18).
Goe, Bell, & Little (2008) definition of teacher effectiveness • Have high expectations for all students and help students learn, as measured by value-added or alternative measures. • Contribute to positive academic, attitudinal, and social outcomes for students, such as regular attendance, on-time promotion to the next grade, on-time graduation, self-efficacy, and cooperative behavior. • Use diverse resources to plan and structure engaging learning opportunities; monitor student progress formatively, adapting instruction as needed; and evaluate learning using multiple sources of evidence. • Contribute to the development of classrooms and schools that value diversity and civic-mindedness. • Collaborate with other teachers, administrators, parents, and education professionals to ensure student success, particularly the success of students with special needs and those at high risk for failure.
Race to the Top definition of effective & highly effective teacher • Effective teacher: students achieve acceptable rates (e.g., at least one grade level in an academic year) of student growth (as defined in this notice). States, LEAs, or schools must include multiple measures, provided that teacher effectiveness is evaluated, in significant part, by student growth (as defined in this notice). Supplemental measures may include, for example, multiple observation-based assessments of teacher performance. (p. 7) • Highly effective teacher: students achieve high rates (e.g., one and one-half grade levels in an academic year) of student growth (as defined in this notice).
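To make the quoted thresholds concrete, here is a minimal sketch in Python. It assumes growth has already been expressed in grade-level equivalents per academic year; the function name and the cutoffs-as-code are illustrative only, not an official Race to the Top rule.

```python
# Illustrative only: classify a growth estimate against the RTTT examples
# quoted above (>= 1.0 grade level = effective, >= 1.5 = highly effective).
def rtt_rating(grade_levels_of_growth: float) -> str:
    if grade_levels_of_growth >= 1.5:
        return "highly effective"
    if grade_levels_of_growth >= 1.0:
        return "effective"
    return "below the effective threshold"

print(rtt_rating(1.6))  # highly effective
print(rtt_rating(1.1))  # effective
print(rtt_rating(0.8))  # below the effective threshold
```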
Measures and models: Definitions • Measures are the instruments, assessments, protocols, rubrics, and tools that are used in determining teacher effectiveness • Models are the state or district systems of teacher evaluation including all of the inputs and decision points (measures, instruments, processes, training, and scoring, etc.) that result in determinations about individual teachers’ effectiveness
Multiple measures of teacher effectiveness • Evidence of growth in student learning and competency • Standardized tests, pre/post tests in untested subjects • Student performance (art, music, etc.) • Curriculum-based tests given in a standardized manner • Classroom-based tests such as DIBELS • Evidence of instructional quality • Classroom observations • Lesson plans, assignments, and student work • Student surveys such as Harvard’s Tripod • Evidence binder (next generation of portfolio) • Evidence of professional responsibility • Administrator/supervisor reports, parent surveys • Teacher reflection and self-reports, records of contributions
Measures that help teachers grow • Measures that motivate teachers to examine their own practice against specific standards • Measures that allow teachers to participate in or co-construct the evaluation (such as “evidence binders”) • Measures that give teachers opportunities to discuss the results with evaluators, administrators, colleagues, teacher learning communities, mentors, coaches, etc. • Measures that are directly and explicitly aligned with teaching standards • Measures that are aligned with professional development offerings • Measures which include protocols and processes that teachers can examine and comprehend
Considerations for choosing and implementing measures • Consider whether human resources and capacity are sufficient to ensure fidelity of implementation • Conserve resources by encouraging districts to join forces with other districts or regional groups • Establish a plan to evaluate measures to determine whether they effectively differentiate among levels of teacher performance • Examine correlations among measures • Evaluate processes and data each year and make needed adjustments
Validity of classroom observations is highly dependent on training • Even with a terrific observation instrument, the results are meaningless if observers are not trained to agree on evidence and scoring • A teacher should get the same score no matter who conducts the observation • This requires that all observers be trained on the instruments and processes • Occasional "calibrating" should be done, and more often if there are discrepancies or new observers (a simple calibration check is sketched below) • Who the evaluators are matters less than whether they are adequately trained and calibrated • Teachers should also be trained on the observation forms and processes to improve the validity of results
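One way such a calibration check could look is sketched below in Python. The scenario and scores are hypothetical: two observers rate the same set of lessons on a 1-4 rubric, and we compare how often they agree exactly and within one point.

```python
# A minimal calibration check, assuming two observers scored the same
# lessons on a 1-4 rubric. All scores here are hypothetical.
import numpy as np

observer_a = np.array([3, 2, 4, 3, 1, 2, 3, 4, 2, 3])
observer_b = np.array([3, 3, 4, 2, 1, 2, 3, 4, 3, 3])

exact = np.mean(observer_a == observer_b)                  # identical scores
adjacent = np.mean(np.abs(observer_a - observer_b) <= 1)   # within one point

print(f"Exact agreement:    {exact:.0%}")
print(f"Adjacent agreement: {adjacent:.0%}")
# If exact agreement falls below whatever threshold the system has agreed on,
# that is a signal to recalibrate observers before scores are used.
```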
Most popular growth models: Value-added and Colorado Growth Model • EVAAS uses prior test scores to predict the next score for a student • A teacher's value-added is the difference between actual and predicted scores for that teacher's students (a simplified calculation is sketched below) • http://www.sas.com/govedu/edu/k12/evaas/index.html • Colorado Growth Model • Betebenner (2008): Focus on "growth to proficiency" • Measures students against "academic peers" • www.nciea.org
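The sketch below illustrates the core idea in Python: predict each student's current score from the prior score, then average the prediction errors by teacher. The data and the single-predictor regression are hypothetical simplifications; EVAAS itself uses a far more elaborate longitudinal mixed model.

```python
# Simplified value-added illustration (not the EVAAS model itself).
import numpy as np

rng = np.random.default_rng(0)                   # hypothetical data
n = 200
prior = rng.normal(500, 50, n)                   # prior-year scale scores
teacher = rng.integers(0, 10, n)                 # assignment to 10 teachers
true_effect = rng.normal(0, 5, 10)               # unobserved teacher effects
current = 0.9 * prior + 60 + true_effect[teacher] + rng.normal(0, 20, n)

# Predict current scores from prior scores with a simple linear fit.
slope, intercept = np.polyfit(prior, current, 1)
predicted = slope * prior + intercept

# A teacher's "value-added" here is the mean of (actual - predicted)
# across that teacher's students.
residual = current - predicted
value_added = {t: residual[teacher == t].mean() for t in range(10)}
for t, va in sorted(value_added.items()):
    print(f"Teacher {t}: value-added estimate = {va:+.1f} points")
```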
What nearly all state and district models have in common • Value-added or Colorado Growth Model will be used for those teachers in tested grades and subjects (4-8 ELA & Math in most states) • States want to increase the number of tested subjects and grades so that more teachers can be evaluated with growth models • States are generally at a loss when it comes to measuring teachers’ contribution to student growth in non-tested subjects and grades
Measuring teachers’ contributions to student learning growth: A summary of current models
SLOs + "Ask a Teacher" (Hybrid model) • Concerns about SLOs are 1) rigor, 2) comparability, and 3) administrator burden • A "rigor rubric" helps with the first concern • Combining SLOs with aspects of the "Ask a Teacher" model helps with all three concerns • Teachers discuss and agree to use particular assessments and measures of student learning growth, ensuring greater rigor and comparability • Teachers work together on aspects of scoring, which improves validity and comparability and lightens the administrator burden
Frontier Model: Assessing student growth for teacher evaluation • Mobile student populations • Short-cycle assessments will work better for students who are highly mobile • High student absenteeism • Develop specific guidelines for how many total days, consecutive days, etc. a student must be on a teacher's roll to "count" toward that teacher's score on contribution to student learning • Students who need support • Evaluate teachers' efforts to address students' physical, social, and emotional needs • Evaluate contacts and relationships with parents
Frontier Model: Teacher collaboration • Teachers don’t need to assess in isolation • Collaborate/share great lesson plans, materials, assessments, etc. across classrooms, schools, and districts (by content area, grades taught) • Work together to grade projects, essays, etc. by using technology when meeting in person is not feasible • Develop consistency in scoring, ensuring that results from student assessments are more valid • Webex and other web-based programs allow you to share files, videos, assessments, and rubrics
Frontier Model: Gaining parent support for teaching and learning • Support teachers in building relationships with community and parents • Especially important for teacher retention • Connect them with a community guide/mentor • Engage community in celebrating student success • Share student work throughout the year in community exhibits, performances, etc. • Ask parents to assist in and contribute their talents and skills to these events
Frontier model: District/state support • Invest in technology and infrastructure that will enable teachers to connect with each other and with internet-based resources • Form regional consortiums to share resources including personnel • Isolated rural schools may not be able to afford their own data analysts, curriculum specialists, etc. • Need a model of sharing personnel across regions
Considerations • Consider whether human resources and capacity are sufficient to ensure fidelity of implementation • Poor implementation threatens the validity of results • Establish a plan to evaluate measures to determine whether they effectively differentiate among levels of teacher performance • Need to identify potential "widget effects" in measures • If a measure is not differentiating among teachers, the cause may be faulty training or poor implementation rather than the measure itself • Examine correlations among results from measures (a simple correlation check is sketched below) • Evaluate processes and data each year and make needed adjustments
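A minimal sketch of that correlation check in Python, assuming each teacher has a score on each measure. The measures, the data, and the relationships among them are all hypothetical, purely to show the mechanics.

```python
# Pairwise correlations among hypothetical evaluation measures.
import numpy as np

rng = np.random.default_rng(1)
n_teachers = 50
observation = rng.normal(3.0, 0.5, n_teachers)                    # rubric score, 1-4
student_survey = 0.6 * observation + rng.normal(0, 0.4, n_teachers)
growth = 0.3 * observation + rng.normal(0, 0.8, n_teachers)

scores = np.vstack([observation, student_survey, growth])
labels = ["observation", "student survey", "student growth"]

corr = np.corrcoef(scores)   # Pearson correlations between measures
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        print(f"{labels[i]} vs. {labels[j]}: r = {corr[i, j]:+.2f}")
# Correlations near zero across the board, or a measure that gives nearly
# every teacher the same score, are the "widget effect" warning signs
# mentioned above and call for a look at training and implementation.
```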
Final thoughts • The limitations: • There are no perfect measures • There are no perfect models • Changing the culture of evaluation is hard work • The opportunities: • Evidence can be used to trigger support for struggling teachers and acknowledge effective ones • Multiple sources of evidence can provide powerful information to improve teaching and learning • Evidence is more valid than “judgment” and provides better information for teachers to improve practice
Evaluation System Models • Austin (Student learning objectives with pay-for-performance; group and individual SLOs assessed with a comprehensive rubric) http://archive.austinisd.org/inside/initiatives/compensation/slos.phtml • Delaware Model (Teacher participation in identifying grade/subject measures, which then must be approved by the state) http://www.doe.k12.de.us/csa/dpasii/student_growth/default.shtml • Georgia CLASS Keys (Comprehensive rubric, includes student achievement; see last few pages) System: http://www.gadoe.org/tss_teacher.aspx Rubric: http://www.gadoe.org/DMGetDocument.aspx/CK%20Standards%2010-18-2010.pdf?p=6CC6799F8C1371F6B59CF81E4ECD54E63F615CF1D9441A92E28BFA2A0AB27E3E&Type=D • Hillsborough, Florida (Creating assessments/tests for all subjects) http://communication.sdhc.k12.fl.us/empoweringteachers/
Evaluation System Models (cont'd) • New Haven, CT (SLO model with strong teacher development component and matrix scoring; see Teacher Evaluation & Development System) http://www.nhps.net/scc/index • Rhode Island DOE Model (Student learning objectives combined with teacher observations and professionalism) http://www.ride.ri.gov/assessment/DOCS/Asst.Sups_CurriculumDir.Network/Assnt_Sup_August_24_rev.ppt • Teacher Advancement Program (TAP) (Value-added for tested grades only, no info on other subjects/grades, multiple observations for all teachers) http://www.tapsystem.org/ • Washington DC IMPACT Guidebooks (Variation in how groups of teachers are measured: 50% standardized tests for some groups, 10% other assessments for non-tested subjects and grades) http://www.dc.gov/DCPS/In+the+Classroom/Ensuring+Teacher+Success/IMPACT+(Performance+Assessment)/IMPACT+Guidebooks
References • Betebenner, D. W. (2008). A primer on student growth percentiles. Dover, NH: National Center for the Improvement of Educational Assessment (NCIEA). http://www.cde.state.co.us/cdedocs/Research/PDF/Aprimeronstudentgrowthpercentiles.pdf • Braun, H., Chudowsky, N., & Koenig, J. A. (2010). Getting value out of value-added: Report of a workshop. Washington, DC: National Academies Press. http://www.nap.edu/catalog.php?record_id=12820 • Finn, C. (July 12, 2010). Blog response to topic "Defining Effective Teachers." National Journal Expert Blogs: Education. http://education.nationaljournal.com/2010/07/defining-effective-teachers.php • Glazerman, S., Goldhaber, D., Loeb, S., Raudenbush, S., Staiger, D. O., & Whitehurst, G. J. (2011). Passing muster: Evaluating evaluation systems. Washington, DC: Brown Center on Education Policy at Brookings. http://www.brookings.edu/reports/2011/0426_evaluating_teachers.aspx# • Glazerman, S., Goldhaber, D., Loeb, S., Raudenbush, S., Staiger, D. O., & Whitehurst, G. J. (2010). Evaluating teachers: The important role of value-added. Washington, DC: Brown Center on Education Policy at Brookings. http://www.brookings.edu/reports/2010/1117_evaluating_teachers.aspx
References (continued) • Goe, L. (2007). The link between teacher quality and student outcomes: A research synthesis. Washington, DC: National Comprehensive Center for Teacher Quality. http://www.tqsource.org/publications/LinkBetweenTQandStudentOutcomes.pdf • Goe, L., Bell, C., & Little, O. (2008). Approaches to evaluating teacher effectiveness: A research synthesis. Washington, DC: National Comprehensive Center for Teacher Quality. http://www.tqsource.org/publications/EvaluatingTeachEffectiveness.pdf • Hassel, B. (Oct 30, 2009). How should states define teacher effectiveness? Presentation at the Center for American Progress, Washington, DC. http://www.publicimpact.com/component/content/article/70-evaluate-teacher-leader-performance/210-how-should-states-define-teacher-effectiveness • Howes, C., Burchinal, M., Pianta, R., Bryant, D., Early, D., Clifford, R., et al. (2008). Ready to learn? Children's pre-academic achievement in pre-kindergarten programs. Early Childhood Research Quarterly, 23(1), 27-50. http://www.eric.ed.gov/ERICWebPortal/detail?accno=EJ783140 • Kane, T. J., Taylor, E. S., Tyler, J. H., & Wooten, A. L. (2010). Identifying effective classroom practices using student achievement data. Cambridge, MA: National Bureau of Economic Research. http://www.nber.org/papers/w15803
References (continued) • Koedel, C., & Betts, J. R. (2009). Does student sorting invalidate value-added models of teacher effectiveness? An extended analysis of the Rothstein critique. Cambridge, MA: National Bureau of Economic Research. http://economics.missouri.edu/working-papers/2009/WP0902_koedel.pdf • McCaffrey, D., Sass, T. R., Lockwood, J. R., & Mihaly, K. (2009). The intertemporal stability of teacher effect estimates. Education Finance and Policy, 4(4), 572-606. http://www.mitpressjournals.org/doi/abs/10.1162/edfp.2009.4.4.572 • Pianta, R. C., Belsky, J., Houts, R., & Morrison, F. (2007). Opportunities to learn in America's elementary classrooms. [Education Forum]. Science, 315, 1795-1796. http://www.sciencemag.org/cgi/content/summary/315/5820/1795 • Prince, C. D., Schuermann, P. J., Guthrie, J. W., Witham, P. J., Milanowski, A. T., & Thorn, C. A. (2006). The other 69 percent: Fairly rewarding the performance of teachers of non-tested subjects and grades. Washington, DC: U.S. Department of Education, Office of Elementary and Secondary Education. http://www.cecr.ed.gov/guides/other69Percent.pdf • Race to the Top application. http://www2.ed.gov/programs/racetothetop/resources.html • Rivkin, S. G., Hanushek, E. A., & Kain, J. F. (2005). Teachers, schools, and academic achievement. Econometrica, 73(2), 417-458. http://www.econ.ucsb.edu/~jon/Econ230C/HanushekRivkin.pdf
References (continued) • Sartain, L., Stoelinga, S. R., & Krone, E. (2010). Rethinking teacher evaluation: Findings from the first year of the Excellence in Teaching Project in Chicago public schools. Chicago, IL: Consortium on Chicago School Research at the University of Chicago. http://ccsr.uchicago.edu/publications/Teacher%20Eval%20Final.pdf • Schochet, P. Z., & Chiang, H. S. (2010). Error rates in measuring teacher and school performance based on student test score gains. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. http://ies.ed.gov/ncee/pubs/20104004/pdf/20104004.pdf • Redding, S., Langdon, J., Meyer, J., & Sheley, P. (2004). The effects of comprehensive parent engagement on student learning outcomes. Paper presented at the American Educational Research Association. http://www.adi.org/solidfoundation/resources/Harvard.pdf • Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The widget effect: Our national failure to acknowledge and act on differences in teacher effectiveness. Brooklyn, NY: The New Teacher Project. http://widgeteffect.org/downloads/TheWidgetEffect.pdf
References (continued) • Yoon, K. S., Duncan, T., Lee, S. W.-Y., Scarloss, B., & Shapley, K. L. (2007). Reviewing the evidence on how teacher professional development affects student achievement (No. REL 2007-No. 033). Washington, D.C.: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Southwest. • http://ies.ed.gov/ncee/edlabs/regions/southwest/pdf/REL_2007033.pdf
Laura Goe, Ph.D. • 609-734-1076 • lgoe@ets.org • National Comprehensive Center for Teacher Quality • 1100 17th Street NW, Suite 500, Washington, DC 20036-4632 • 877-322-8700 • www.tqsource.org