
Designing a Statewide System for Measuring Teacher and Leader Effectiveness

Presentation Transcript


  1. Designing a Statewide System for Measuring Teacher and Leader Effectiveness. New Hampshire Task Force on Effective Teaching: Phase II. Scott Marion, Center for Assessment. September 20, 2011

  2. Overview of presentation… • Some background • Outline just some of the key decisions that go into creating educator evaluation systems • Focus on the growth model being proposed as one source of evidence • Our purpose today is to highlight some of the focused work we will be doing throughout this year • I’ll be asking a lot more questions than providing answers, but we will need to answer these questions going forward…

  3. The Center for Assessment • National non-profit consulting firm based in Dover since 1998 • Focus on improving assessment (large- and small-scale) and accountability systems • 11 professionals, all with doctoral degrees in psychometrics, measurement, evaluation, and/or curriculum • All worked at high levels in state DOEs, districts, and/or testing companies • Currently contracting with 30 states, several large districts, and numerous federal projects, including all five assessment consortia • Working on educator evaluation in many of these entities, including CO, RI, GA, VA, UT, WY, and NYC

  4. Introduction • New Hampshire, like an increasing number of states, intends to revise its teacher and leader evaluation practices • Educator effectiveness will be determined “in part by student achievement” • This enterprise holds great promise, but also presents real challenges • We are fortunate to be able to build on the good work of the Phase I task force…

  5. Rationale • Why the interest in new forms of teacher evaluation? • Nobody doubts the critical influence of teacher quality on student achievement • Current (traditional) evaluation systems rarely identify either highly effective or ineffective teachers • The following slide depicts what might be a fairly typical policy maker theory of action for these reformed systems

  6. A Simplified Theory of Action for Reformed Educator Evaluation Systems [Diagram: measures of educator effectiveness and evaluation processes inform hiring, placement, professional development, compensation, dismissal, and career-ladder decisions, leading to improved student outcomes.]

  7. A Theory of Action… • Grounds our design • Clarifies the assumptions, purposes, and goals of the system • Specifies the various indicators and mechanisms by which the system will fulfill its purposes (and minimize unintended negative consequences) • Serves as a framework for evaluation • The ToA on the previous slide was oversimplified and somewhat naïve. We’ll be working with more complex and honest ToAs as we do our work.

  8. Basic Structure of a Theory of Action [Diagram: assumptions or antecedents are linked through activities and mechanisms to proximal indicators, intermediate indicators, and distal indicators (intended outcomes), with consequences attached along the way.]

  9. Current reform initiatives • Almost all reformed educator evaluation systems being proposed by states and districts include two major components: • Student growth/achievement • Data from “tested” and “non-tested” grades and subjects • Indicators of educator knowledge & skills • Often multiple standards that characterize the teaching profession. From Phase I, these include: • Learner and Learning • Content knowledge • Instructional Practice • Professional Responsibility

  10. Sounds simple, right? [Diagram: Student Performance and Educator Knowledge & Skills combine to yield Determinations of Educator Effectiveness.]

  11. Not so simple… [Diagram: Determinations of Educator Effectiveness draw on two strands. Effective Teaching: indicators of Learner & Learning, Instructional Practice, etc., each with multiple indicators (#1, #2, … n) and performance levels (#1, #2, … n). Aggregate Student Performance Information: student growth measures from NECAP-tested grades and aggregations of student performance from non-NECAP grades & content areas.]

  12. Not so simple… • As complex as this graphic is, it is still a simplification of the full system • Each box AND arrow represents at least one, and often many, very challenging decisions • This graphic embeds certain assumptions that should be challenged, or at least examined

  13. Decisions and challenges: Knowledge & Skills • What are the indicators that operationalize the knowledge & skills defined in Phase I? • What are the measurement approaches for collecting information on these indicators (hint: classroom observations can’t do it all)? • What are the various levels of performance on each of the indicators? • How should indicator ratings be combined to arrive at a judgment about each aspect of knowledge & skills? • Should these ratings be combined across the various components of knowledge & skills to arrive at a single “knowledge and skills” determination? If so, how? • Weighting schemes • Scales • Validity • Reliability
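To make the weighting and combination questions above concrete, here is a minimal sketch of one possible compensatory approach: a weighted average of indicator ratings within each standard, rolled up with a second set of weights into a single knowledge & skills score. All weights, indicator names, and the 1-4 rating scale are hypothetical illustrations, not recommendations for NH.

```python
# A minimal sketch of a compensatory (weighted-average) roll-up.
# All weights, indicator names, and the 1-4 rating scale are hypothetical.

INDICATOR_WEIGHTS = {
    "Learner and Learning":        {"observation_rubric": 0.6, "student_survey": 0.4},
    "Content Knowledge":           {"observation_rubric": 0.5, "artifact_review": 0.5},
    "Instructional Practice":      {"observation_rubric": 0.7, "peer_review": 0.3},
    "Professional Responsibility": {"self_assessment": 0.5, "administrator_rating": 0.5},
}
STANDARD_WEIGHTS = {
    "Learner and Learning": 0.25,
    "Content Knowledge": 0.25,
    "Instructional Practice": 0.30,
    "Professional Responsibility": 0.20,
}

def standard_score(ratings, weights):
    """Weighted average of indicator ratings (1-4 scale) for one standard."""
    return sum(weights[name] * ratings[name] for name in weights)

def knowledge_skills_score(all_ratings):
    """Roll the standard scores up into a single knowledge & skills determination."""
    return sum(
        STANDARD_WEIGHTS[std] * standard_score(all_ratings[std], INDICATOR_WEIGHTS[std])
        for std in STANDARD_WEIGHTS
    )

example = {
    "Learner and Learning":        {"observation_rubric": 3.0, "student_survey": 3.5},
    "Content Knowledge":           {"observation_rubric": 3.0, "artifact_review": 2.5},
    "Instructional Practice":      {"observation_rubric": 3.5, "peer_review": 3.0},
    "Professional Responsibility": {"self_assessment": 4.0, "administrator_rating": 3.0},
}
print(round(knowledge_skills_score(example), 2))  # 3.19 with these made-up numbers
```

Even this tiny sketch surfaces the slide's questions: every weight is a policy choice, and a compensatory average allows strength in one standard to offset weakness in another, which a conjunctive rule would not.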

  14. Decisions and challenges: Student Performance • What indicators of student growth should be used for NECAP grades and content areas? • What performance (growth) indicators should be used for non-NECAP grades and content areas? • This is a huge issue! • Should state-level measures of student growth be combined with local measures of student performance for each educator determination? If so, how?

  15. Student Performance: Analyzing Growth • What analytic approach (model) will be used? • What are the technical and policy issues that need to be considered in choosing a model? • What is the standard for ‘good enough’ growth? • Should growth expectations be “conditioned” on factors other than prior performance such as poverty, etc.? • What information should be reported to whom and at what level?
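The "conditioning" question lends itself to a small illustration: the sketch below fits two ordinary least squares models on simulated data, one conditioning growth expectations on prior scores only and one adding a poverty indicator, and compares the resulting residuals (observed minus expected score). The data, coefficients, and variable names are assumptions for illustration only.

```python
import numpy as np

# Simulated data (illustrative only): prior score, poverty flag, current score.
rng = np.random.default_rng(0)
n = 1000
prior = rng.normal(500, 50, n)
poverty = rng.binomial(1, 0.3, n)
current = 0.8 * prior + 110 - 8 * poverty + rng.normal(0, 20, n)

def residuals(X, y):
    """OLS residuals: observed score minus the model's expected score."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return y - X1 @ beta

# Growth expectations conditioned on prior achievement only...
r_prior_only = residuals(prior.reshape(-1, 1), current)
# ...versus prior achievement plus a poverty indicator.
r_conditioned = residuals(np.column_stack([prior, poverty]), current)

# Average "growth" judgment for students in poverty under each definition:
print(round(r_prior_only[poverty == 1].mean(), 1))   # negative: below expectation
print(round(r_conditioned[poverty == 1].mean(), 1))  # near zero once poverty is conditioned on
```

Whether that second adjustment is desirable is exactly the policy question on the slide: conditioning on poverty avoids penalizing teachers for context, but it can also institutionalize lower expectations for some students.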

  16. Major Decisions: Combining Measures • How should we arrive at an overall judgment of educator effectiveness? • Weighting of student performance and knowledge & skills • What are the different types of information that should be employed when evaluating principals compared with teachers? • We know the specific indicators and even standards will differ.
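Weighting is not the only way to combine the two major components; many systems instead use a combination (lookup) matrix that maps each pair of component ratings to an overall rating. The matrix below is a hypothetical sketch with invented category labels and cell values, shown only to make the decision visible, not to propose NH's rules.

```python
# Hypothetical combination matrix: rows are practice ratings, columns are growth ratings.
# Category labels and cell values are invented for illustration.
COMBINATION_MATRIX = {
    "ineffective": {"low": "ineffective", "typical": "ineffective", "high": "developing"},
    "developing":  {"low": "ineffective", "typical": "developing",  "high": "effective"},
    "effective":   {"low": "developing",  "typical": "effective",   "high": "effective"},
    "exemplary":   {"low": "effective",   "typical": "exemplary",   "high": "exemplary"},
}

def overall_rating(practice: str, growth: str) -> str:
    """Look up the overall effectiveness rating from the two component ratings."""
    return COMBINATION_MATRIX[practice][growth]

print(overall_rating("effective", "low"))      # -> "developing"
print(overall_rating("exemplary", "typical"))  # -> "exemplary"
```

A matrix makes trade-offs explicit cell by cell (e.g., whether low growth can ever coexist with a top rating) and can encode different rules for principals and teachers, whereas a single weighted score hides those judgments inside the weights.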

  17. Major Decisions: Attribution • Attribution: linking educator behavior to student outcomes • Assigning accountability • Data system requirements • Dealing with student mobility • Multiple educators contribute to instruction
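One common way to handle mobility and shared instruction is roster-based "dosage" weighting, where each student counts toward an educator in proportion to their shared enrolled time. The sketch below applies that idea to made-up roster records; the field layout and the proportional rule are illustrative assumptions.

```python
from collections import defaultdict

# Made-up roster: (student, teacher, fraction of course on that teacher's roster, growth score)
roster = [
    ("s1", "t1", 1.00, 70),
    ("s2", "t1", 0.60, 45),   # s2 moved mid-year...
    ("s2", "t2", 0.40, 45),   # ...so both teachers share attribution for s2
    ("s3", "t2", 1.00, 30),
    ("s4", "t2", 0.50, 55),   # s4 is co-taught, split between t2 and t3
    ("s4", "t3", 0.50, 55),
]

def dosage_weighted_means(records):
    """Per-teacher mean growth, weighting each student by the enrolled fraction."""
    totals, weights = defaultdict(float), defaultdict(float)
    for _student, teacher, fraction, growth in records:
        totals[teacher] += fraction * growth
        weights[teacher] += fraction
    return {t: round(totals[t] / weights[t], 1) for t in totals}

print(dosage_weighted_means(roster))  # {'t1': 60.6, 't2': 39.7, 't3': 55.0}
```

Even this toy version shows why the data system requirements matter: it only works if course enrollment spans, teacher-of-record links, and mobility events are captured accurately throughout the year.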

  18. Major Decisions: Consequences & Support • How will results be communicated? • What sanctions, rewards, and/or consequences are appropriate to advance prioritized outcomes? • What strategies can use this information to support schools, teachers, and students? • Is there capacity in the state (in the districts) to improve educator quality in NH? • What resources will be required for this improvement to occur? • Where will they come from?

  19. Negative Consequences • As we consider the implementation plans of NH’s new educator evaluation system, we must be mindful that the likelihood of getting this wrong (i.e., leading to unintended negative consequences) is at least as high as the chance of getting it right (i.e., improving teacher quality and student learning) • Unintended consequences could include: • Narrowing curriculum • Competition vs. Cooperation • Assignment of students or teachers to selected classes for reasons unrelated to educational benefit • Educator transition • Educator attrition

  20. Monitoring and Evaluation • What types of formative evaluation approaches need to be put in place to monitor implementation and consequences? • Evaluate claims in theory of action • Evaluate impact • Establish criteria to determine if results are reasonable • Develop methods and standards to assess the precision and stability of results • Does the system meet important utility criteria?
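Part of the precision-and-stability question can be examined empirically once results exist, for example by correlating educator-level results across years and counting large rating swings. The snippet below is a sketch of that check on simulated ratings; the noise levels and the quintile rule are arbitrary choices for illustration.

```python
import numpy as np

# Simulate two years of educator results: a stable "true" effect plus yearly noise.
rng = np.random.default_rng(1)
n_teachers = 300
true_effect = rng.normal(0, 1, n_teachers)
year1 = true_effect + rng.normal(0, 1.5, n_teachers)
year2 = true_effect + rng.normal(0, 1.5, n_teachers)

# Year-to-year correlation: one simple stability index for educator results.
print(round(np.corrcoef(year1, year2)[0, 1], 2))  # around 0.3 with this much noise

# Share of educators whose quintile rating jumps by two or more quintiles.
q1 = np.digitize(year1, np.quantile(year1, [0.2, 0.4, 0.6, 0.8]))
q2 = np.digitize(year2, np.quantile(year2, [0.2, 0.4, 0.6, 0.8]))
print(round(float(np.mean(np.abs(q1 - q2) >= 2)), 2))
```

Monitoring of this kind belongs in the formative evaluation plan from the start, since low stability or precision undercuts every consequence attached to the results.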

  21. Coherence • For both the accountability and teaching and learning systems to work well, the various components of the full assessment system need to be coherent and send a consistent message… • In other words, if we want students to learn things deeply and engage in rich formative activities, then focusing the accountability system only on low-level outcomes will lead to incoherence and possible corruption.

  22. Coherence • Also involves ensuring that the school accountability and educator accountability systems are sending similar messages to schools and stakeholders • The Commissioner’s Task Force designing the adequacy performance-based accountability system has already selected Student Growth Percentiles as the growth measure • It would make sense to use the same system for educator evaluation • Only for tested subjects and grades

  23. Student Growth Percentiles The Student Growth Percentile (SGP) is a regression-based measure of growth that works by evaluating current achievement based on prior achievement and describing performance relative to other students with the “same” prior achievement histories. This provides a familiar basis for interpreting performance – the percentile, which indicates the probability of that outcome given the student’s starting point. This can be used to gauge whether or not the student’s growth was atypically high or low.

  24. The Bivariate Distribution How does it work? Think of a group of students, where each student has two test scores – one for 2009 and one for 2010. We could show the joint distribution of these two scores at the same time, as pictured on the slide.

  25. Slicing the distribution at the Year 1 score We could ‘slice’ through the picture to show the 2010 distribution for just one 2009 score. This is called a conditional distribution. The red shaded curve shows the conditional distribution in 2010 for all students who scored 600 in 2009.

  26. Comparing Year 2 scores for all who scored 600 in Year 1 Assume we are interested in just one score, 650, in 2010. We could ask, what percentage of students who scored 600 in 2009 scored at or below a 650 in 2010? In this case, that turns out to be 75%. In other words, a score of 650 is at the 75th percentile.
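The conditional-percentile logic on the last three slides can be mimicked in a few lines. The sketch below builds simulated score pairs and computes an empirical growth percentile for the slide's example of a student moving from 600 to 650; the simulated distributions are assumptions, not NECAP data, and operational SGP models use quantile regression to smooth across prior-score histories rather than exact matching.

```python
import numpy as np

# Simulated 2009 and 2010 scores for a large cohort (illustrative, not NECAP data).
rng = np.random.default_rng(2)
score_2009 = np.round(rng.normal(600, 50, 50_000), -1)          # rounded to the nearest 10
score_2010 = 0.9 * score_2009 + 93 + rng.normal(0, 25, 50_000)  # chosen so 600 -> 650 lands near the 75th

def conditional_percentile(prior, current):
    """Percent of students with the same prior score who scored at or below `current`."""
    peers = score_2010[score_2009 == prior]   # the conditional ("sliced") distribution
    return 100 * np.mean(peers <= current)

# A student who scored 600 in 2009 and 650 in 2010:
print(round(conditional_percentile(600, 650)))  # approximately 75, echoing the slide's example
```

This is the sense in which an SGP describes how atypical a student's current score is, given where that student started.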

  27. It all starts from the individual student

  28. And moves to various types of aggregation
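Individual SGPs are typically rolled up with a median: the median growth percentile (MGP) for a teacher's, school's, or district's students. Here is a minimal sketch with made-up rosters; the 35-65 "typical growth" band used to describe the results is a placeholder, not an NH standard.

```python
import statistics

# Hypothetical rosters of student growth percentiles for three teachers.
teacher_sgps = {
    "teacher_a": [12, 35, 44, 51, 60, 72],
    "teacher_b": [48, 50, 55, 61, 66, 70, 83],
    "teacher_c": [20, 22, 31, 38, 40],
}

# Median growth percentile: a robust roster-level summary of student growth.
mgp = {t: statistics.median(s) for t, s in teacher_sgps.items()}
print(mgp)  # {'teacher_a': 47.5, 'teacher_b': 61, 'teacher_c': 31}

# One hypothetical way to describe results against a placeholder 35-65 "typical" band.
for teacher, median_sgp in mgp.items():
    label = "typical" if 35 <= median_sgp <= 65 else ("high" if median_sgp > 65 else "low")
    print(teacher, label)
```

The same median can be computed at any level of aggregation (classroom, school, district), which is the progression from individual students to aggregates that these two slides describe.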

  29. Student Growth Percentiles • Answers: To what degree is performance higher or lower than expectations, based on students with similar academic history? • Advantages: • Provides a familiar basis to interpret performance – the percentile • Provides a definition of ‘typical growth’ • Expectations are adjusted for students of various abilities • Disadvantages: • More complex to implement than simple gain scores (but likely more straightforward than some VAM approaches). • No ‘built-in’ relationship to status, but growth targets can account for this

  30. Student Growth Percentiles • SGPs are really limited to statewide analyses of standardized test data • What about interim/benchmark assessments? • If administered statewide, it is possible to use SGPs to evaluate changes in student performance • But do we want to… • What is the major purpose of the interim/benchmark tests? • If the answer is “improving teaching and learning” or something similar, we run a tremendous risk of corrupting those purposes by incorporating these results into accountability systems

  31. Campbell’s Law • "The more any quantitative social indicator is used for social decision-making [e.g., accountability], the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor." • Campbell, Donald T. (1976). Assessing the Impact of Planned Social Change. The Public Affairs Center, Dartmouth College, Hanover, New Hampshire.

  32. Non-tested subjects & grades • We have not discussed the other 65-75% of educators in the non-tested subjects and grades • This is one of the most challenging aspects of reformed approaches to educator evaluation • This does not mean we don’t have any ideas and proposals • It does mean that there are not any silver bullets in this area now! • We will spend considerable time at upcoming meetings talking about this topic

  33. Major steps along our path: measurement & evaluation • How will we operationalize and measure: • Learner and Learning • Content knowledge • Instructional Practice • Professional Responsibility • Who will conduct these measurements? • How shall we attribute responsibility? • How will we measure educators’ contribution to student learning in “tested” and “non-tested” grades and subjects? • How will we weight and combine the various indicators into an overall judgment?

  34. Major steps along our path: policy & practice • What will be the relative roles of the state and schools in teacher evaluation? • Will all… • Will the state monitor local evaluation practices? If so, how? • What policies and policy changes are necessary to support a reformed system? • What kind of training is necessary to implement and learn from this system?

  35. Next steps… • How should we plan our work going forward? • Who’s going to do what? • How will we work? • Goals for next meeting…
