Rachel A. Gordon, University of Illinois at Chicago; Kerry G. Hofer, Peabody Research Institute, Vanderbilt University. Assuring Quality Preschool: Where Are We and Where Do We Need to Go? Presentation in the Presidential Session on Universal Preschool: What Have We Learned, and What Does It Mean for Practice and Policy? Annual Meeting of the American Educational Research Association (April 6, 2014).
Acknowledgments: Current • Institute of Education Sciences Grant #R305A130118 • Principal Investigators • Rachel Gordon • Kerry Hofer • Other Investigators • Sandra Wilson • Everett Smith • Graduate Students • Elisabeth Stewart • Jenny Kushto-Hoban • Hillary Rowe • Anna Colaner • Rowena Crabbe • Fang Peng • Danny Lambouths • Ken Fujimoto • Consultant • Betsy Becker
Word of Caution We present some results from our preliminary investigations in this new project. Although we have confidence in what is presented here, these analyses are the first steps towards a more thorough look at the validity of two quality measures. The results may change as we move forward, including as we revise the details of the regression models and the meta-analytic techniques used and as we take the products through peer review.
Acknowledgments: Prior • IES R305A090065 • NIH R01HD060711 • Principal Investigator • Rachel Gordon • Other Investigators • Everett Smith • Robert Kaestner • Sanders Korenman • Graduate Students • Ken Fujimoto • Kristin Abner • Anna Colaner • Nicole Colwell • Xue Wang
Policy Focus on Quality Early Care and Education Policy initiatives focus on high-quality preschool… • “high quality early childhood education” • “high-quality early learning programs” Source: http://www.whitehouse.gov/issues/education/early-childhood
But, what is high quality? • This question seems simple at first glance, but upon reflection is difficult to answer.
But, what is high quality? (cont.) • In the case of early care and education, thinking carefully about this question, and how to answer it, has increasing importance. • With the push toward expanding access to high quality preschool, measures designed for other purposes have been adopted for high stakes use.
Example: Illinois’ QRIS (Quality Rating and Improvement System), Learning Environment • ECERS-R: Average overall score of at least 4.5 with no classroom below 4.0, verified by on-site independent assessment • CLASS: Emotional support and classroom organization average scores above 5.0 with no classroom below 4.0, as verified by on-site independent assessment Source: http://www.excelerateillinois.com/docman/resources/2-gold-excelerate-illinois-chart/file
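To make the cut-score logic concrete, here is a minimal Python sketch that classifies one center against the two criteria quoted above. The classroom scores are invented; reading "above 5.0" as strictly greater than 5.0 and applying the 4.0 floor to each CLASS domain separately are our assumptions; the on-site verification step is not modeled; and the function names are hypothetical.

```python
from statistics import mean

# Cut points transcribed from the criteria quoted above.
ECERS_R_MEAN_MIN = 4.5   # average overall score of at least 4.5
CLASS_MEAN_MIN = 5.0     # domain averages "above 5.0" (assumed strictly greater)
CLASSROOM_FLOOR = 4.0    # no classroom below 4.0


def meets_ecers_r(classroom_scores):
    """classroom_scores: overall ECERS-R score for each observed classroom."""
    return (mean(classroom_scores) >= ECERS_R_MEAN_MIN
            and min(classroom_scores) >= CLASSROOM_FLOOR)


def meets_class(classrooms):
    """classrooms: one dict per classroom with Emotional Support ("ES")
    and Classroom Organization ("CO") domain scores (assumed layout)."""
    for domain in ("ES", "CO"):
        scores = [room[domain] for room in classrooms]
        if mean(scores) <= CLASS_MEAN_MIN or min(scores) < CLASSROOM_FLOOR:
            return False
    return True


if __name__ == "__main__":
    # Invented scores for illustration only.
    print(meets_ecers_r([4.6, 4.9, 4.1]))  # True: mean about 4.53, lowest room 4.1
    print(meets_ecers_r([5.5, 3.9]))       # False: one room below 4.0
    print(meets_class([{"ES": 5.6, "CO": 5.2}, {"ES": 5.1, "CO": 4.9}]))  # True
```

A center whose classrooms sit just above or below 4.5 or 5.0 can change status on a small scoring difference, which is why measurement error near these cut points matters for high stakes use.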
Is There Evidence for this High Stakes Use of the Measures? • I will review the case that: • Both were developed for other purposes. • Both have limitations for these high stakes uses. • And discuss the rationale for: • Rethinking how we verify program quality. • Establishing research-policy-practice partnerships to support a “next generation” of quality measures.
Do Measures Predict Child Outcomes? I will begin by examining evidence regarding whether the scales predict child outcomes. This aspect of validity is relevant to policy, to the extent that public investments in early care and education are meant to promote optimal child development and school readiness.
Other Aspects of Scale Validity • There may be many different reasons for these small associations of ECERS-R and CLASS scores with child achievement outcomes. • One possibility is low validity in other, more fundamental features of the measures… • at least for assessing aspects of quality that promote school readiness in ways suitable for high stakes uses.
Preview • ECERS-R • Items mix different aspects of quality. • Standard scoring makes it difficult to “pull out” aspects of quality most relevant for school readiness. • CLASS • Highly inferential rating process may limit inter-rater reliability. • Limited empirical evidence for theoretical dimensions.
ECERS-R • Developed in the 1970s from a checklist to help practitioners improve the quality of their settings. • Reflects the early childhood education field’s concept of developmentally appropriate practice: • predominance of child-initiated activities selected from a wide array of options; • a “whole child” approach that integrates physical, emotional, social and cognitive development; • teacher facilitation of development by being responsive to children’s age-related and individual needs.
ECERS-R • Over 400 indicators across the 43 items! • Standard “stop scoring” structure reflects this checklist, practice, and philosophical origin. • Categories from 1 to 7 have several “indicators.” • Conditions in the indicators of lower scores must be met before indicators of higher scores are evaluated. • Especially within some items, indicators are often organized around contexts of practice and reflect multiple aspects of quality.
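The stop-scoring logic can be sketched in Python. The sketch follows our reading of the published ECERS-R scoring instructions, which should be treated as an assumption and checked against the manual: indicators are listed under anchors 1, 3, 5, and 7; a "Yes" under 1 describes inadequate practice; and the in-between scores 2, 4, and 6 require at least half of the next anchor's indicators. "Not applicable" codes are ignored, and `stop_score` is a hypothetical helper name.

```python
from typing import Dict, List


def stop_score(indicators: Dict[int, List[bool]]) -> int:
    """Score one ECERS-R-style item from Yes/No indicator codes.

    indicators maps each anchor (1, 3, 5, 7) to its indicator codes,
    True meaning the indicator was scored "Yes" (assumed layout).
    """
    # Under anchor 1, indicators describe inadequate practice:
    # any "Yes" there fixes the item score at 1.
    if any(indicators[1]):
        return 1
    score = 1
    # Move up through the higher anchors, stopping at the first one not fully met.
    for anchor in (3, 5, 7):
        codes = indicators[anchor]
        n_yes = sum(codes)
        if n_yes == len(codes):
            score = anchor        # all indicators met: award the anchor score
            continue
        if 2 * n_yes >= len(codes):
            score = anchor - 1    # at least half met: in-between score
        break                     # indicators above this anchor are never read
    return score


if __name__ == "__main__":
    # Hypothetical codes: strong practice at 5 and 7, one miss at 3.
    item = {1: [False, False], 3: [True, True, False],
            5: [True, True, True], 7: [True, True]}
    print(stop_score(item))  # 2
```

In the usage example, a classroom that meets every indicator under 5 and 7 but misses one of three indicators under 3 receives a 2, because the higher indicators are never evaluated. When indicators within an item mix different aspects of quality, a single missed low-level indicator can mask strengths elsewhere.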
ECERS-R Item 10: Meals/Snacks Source: Harms, T., Clifford, R.M., & Cryer, D. (1998). Early Childhood Environment Rating Scale, Revised Edition. New York, NY: Teachers College Press.
Evidence of Category Disorder If higher scores reflect higher quality, then centers rated in higher categories of an item should have higher average quality scores than centers rated in lower categories. In item response theory models, the thresholds between adjacent categories should likewise show a stair-step progression, rising from each category to the next, if higher categories mark higher quality.
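Both checks can be sketched in a few lines of Python. This is a minimal illustration, not the analysis in the cited papers: it proxies a center's quality by the mean of its other items (a "rest score"), which is our assumption, and the ratings and thresholds are invented. The same `is_ordered` helper (a hypothetical name) applies to estimated IRT category thresholds.

```python
from collections import defaultdict
from statistics import mean


def mean_rest_score_by_category(centers, item):
    """Average quality of centers in each category of one item.

    centers: one list of item ratings (1-7) per center.
    Quality is proxied by the mean of the center's other items, so the
    item being checked does not inflate its own averages.
    """
    groups = defaultdict(list)
    for ratings in centers:
        rest = (sum(ratings) - ratings[item]) / (len(ratings) - 1)
        groups[ratings[item]].append(rest)
    return {cat: mean(vals) for cat, vals in sorted(groups.items())}


def is_ordered(values):
    """True when every value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    # Invented ratings for four centers on three items.
    centers = [[1, 2, 3], [3, 4, 4], [5, 3, 3], [7, 6, 5]]
    by_cat = mean_rest_score_by_category(centers, item=0)
    print(by_cat)                             # {1: 2.5, 3: 4.0, 5: 3.0, 7: 5.5}
    print(is_ordered(list(by_cat.values())))  # False: category 5 is out of step
    # Hypothetical estimated thresholds for the same item.
    print(is_ordered([-1.2, 0.3, 0.1, 1.4]))  # False: disordered thresholds
```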
Evidence of Category Disorder (cont.) Source: Gordon, Rachel A., Ken Fujimoto, Robert Kaestner, Sanders Korenman, and Kristin Abner. 2013. “An Assessment of the Validity of the ECERS-R with Implications for Assessments of Child Care Quality and its Relation to Child Development.” Developmental Psychology, 49: 146-160.
Evidence of Category Disorder (cont.) Source: Gordon, Rachel, Kerry Hofer, Ken Fujimoto, Nicole Colwell, Robert Kaestner, and Sanders Korenman. “Measuring Aspects of Child Care Quality Specific to Domains of Child Development: An Indicator-level Analysis of the ECERS-R.” Presented in the Paper Symposium “Measuring Early Care and Education Quality: New Insights about the Early Childhood Environment System Rating Scale - Revised” (Chair: Rachel Gordon; Discussant: Margaret Burchinal), Saturday, April 20, 2013, Seattle, WA.
CLASS Source: Pianta, R. C., La Paro, K. M., & Hamre, B. K. (2008). Classroom Assessment Scoring System Manual, PreK. Baltimore, MD: Brookes Publishing. • Unlike the ECERS-R, with its checklist and practice origins several decades ago… • The CLASS was developed more recently based on “developmental theory and research suggesting that interactions between students and adults are the primary mechanism of student development and learning.” (Pianta, La Paro & Hamre, p. 1) • Its predecessor was part of a research study, and the CLASS was aimed at professional development and coaching before being adopted in high stakes policy contexts. • The CLASS manual requires observers to assimilate what they see in order to assign scores on just a few items. • The manual advises: “Because of the highly inferential nature of the CLASS, scores should never be given without referring to the manual.” (Pianta, La Paro & Hamre, p. 17, bold in original)
CLASS Results • A recent publication from the CLASS developers (Cash, Hamre, Pianta, & Myers, 2012) reveals: • Exact-agreement reliability is low: 41% overall exact agreement with the master score in training of 2,093 Head Start staff. • Black and Latino raters placed their Instructional Support scores farther from the master score, as did raters who disagreed with intentional teaching beliefs.
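The difference between exact agreement and a looser criterion can be seen in a short Python sketch. The codes are invented, the function names are hypothetical, and the "within one point" criterion is included only for contrast; we are not asserting which criterion any particular CLASS certification uses.

```python
def exact_agreement(rater, master):
    """Proportion of codes on which a trainee matches the master code exactly."""
    if len(rater) != len(master) or not master:
        raise ValueError("need equal-length, non-empty score lists")
    return sum(r == m for r, m in zip(rater, master)) / len(master)


def within_one_agreement(rater, master):
    """Proportion of codes within one scale point of the master code."""
    if len(rater) != len(master) or not master:
        raise ValueError("need equal-length, non-empty score lists")
    return sum(abs(r - m) <= 1 for r, m in zip(rater, master)) / len(master)


if __name__ == "__main__":
    # Invented 1-7 codes from one trainee on five dimensions of a training video.
    trainee = [4, 5, 3, 6, 2]
    master = [4, 4, 3, 5, 4]
    print(exact_agreement(trainee, master))       # 0.4
    print(within_one_agreement(trainee, master))  # 0.8
```

The same codes can look weak on exact agreement yet strong within one point, so which criterion is reported matters when judging whether observers are reliable enough for high stakes decisions.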
CLASS Results (cont.) • The CLASS developers also recently found (Hamre, Hatfield, Pianta & Jamil, in press): • a bi-factor structure with one general dimension (responsive teaching) and two specific dimensions (proactive management and routines; cognitive facilitation). • These dimensions differ from the subscales written into policy. • In our work, we are replicating these results and also examining the targeting and content of items with IRT models.
Next Steps This body of evidence highlights the way in which measures developed for other purposes have been adopted for high stakes policy uses. Not surprisingly, there are limitations in the validity of these measures for this high stakes purpose.
Next Steps (cont.) • Consistent with the latest Standards for Educational and Psychological Testing, we contend there is a need to step back and consider the intents of these policy uses, build in continuous and local validation of measures selected for these uses, and allow for the refinement of measures over place and time. • We need to consider big picture questions such as: • What are the goals of public investments in preschool? • How do we design quality measures to help assure we are meeting those specific goals? http://www.apa.org/science/programs/testing/standards.aspx
Next Steps (cont.) As a concrete example, if it is desirable to distinguish classrooms that fall above and below specific thresholds, as in current policy uses, then measures with very high information (and low error) at those thresholds are needed. If instead it is desirable to invest public dollars in improving quality through coaching, then we would like to have a measure (or two linked measures) that covers the continuum of quality over which growth is expected.
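The contrast between targeting a cut point and covering the continuum can be illustrated with a Python sketch under a simplified dichotomous Rasch model, where each item contributes p(1 - p) to test information at quality level theta and the standard error is 1/sqrt(information). ECERS-R and CLASS items are polytomous, so this is a deliberate simplification; the cut point and both item sets are hypothetical.

```python
import math


def rasch_information(theta, difficulties):
    """Test information at quality level theta under a dichotomous Rasch model."""
    info = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))  # chance of meeting the item
        info += p * (1.0 - p)                      # each item adds p(1 - p)
    return info


if __name__ == "__main__":
    cut = 0.5                                 # hypothetical policy cut point
    targeted = [0.3, 0.4, 0.5, 0.6, 0.7]      # items located near the cut
    spread = [-2.0, -1.0, 0.0, 1.0, 2.0]      # items spanning the continuum
    for theta in (cut, -2.0):
        for name, items in (("targeted", targeted), ("spread", spread)):
            info = rasch_information(theta, items)
            se = 1.0 / math.sqrt(info)
            print(f"theta={theta:+.1f} {name:8s} info={info:.2f} se={se:.2f}")
```

The targeted set gives more information (smaller error) at the cut, which suits above/below classification; the spread set gives more information far from the cut, which suits tracking growth across the quality continuum, as in coaching.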
Next Steps (cont.) As another example, it is essential to think carefully about variation in quality across children, classrooms, times of day, days of the week, and weeks of the year. We currently have very little evidence about such variation, or about the extent to which choices about when and how to observe classrooms affect measure validity, including for high stakes uses.
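One simple way to quantify how much observed quality varies across observation occasions, relative to differences between classrooms, is a one-way random-effects intraclass correlation, ICC(1). The Python sketch below assumes a balanced design (the same number of observations per classroom) and invented scores; it illustrates the general technique and is not an analysis from this project. Designs that cross children, times of day, days, and weeks would call for multilevel or generalizability-theory models.

```python
from statistics import mean


def icc1(classrooms):
    """One-way random-effects ICC(1) for a balanced design.

    classrooms: equal-length lists, one per classroom, each holding the
    scores from repeated observations (e.g., different days) of that room.
    """
    n = len(classrooms)        # number of classrooms
    k = len(classrooms[0])     # observations per classroom
    if n < 2 or k < 2 or any(len(c) != k for c in classrooms):
        raise ValueError("need >= 2 classrooms with the same number (>= 2) of observations")
    grand = mean(x for c in classrooms for x in c)
    room_means = [mean(c) for c in classrooms]
    ss_between = k * sum((m - grand) ** 2 for m in room_means)
    ss_within = sum((x - m) ** 2 for c, m in zip(classrooms, room_means) for x in c)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Invented scores: three classrooms, each observed on four occasions.
    stable = [[3.0, 3.1, 2.9, 3.0], [5.0, 5.1, 4.9, 5.0], [6.0, 6.1, 5.9, 6.0]]
    noisy = [[3.0, 5.5, 4.0, 6.0], [5.0, 3.5, 6.0, 4.0], [4.5, 6.0, 3.0, 5.0]]
    print(round(icc1(stable), 3))  # near 1: classrooms differ, occasions agree
    print(round(icc1(noisy), 3))   # below 0: occasion-to-occasion noise dominates
```

A low ICC means a single observation says little about a classroom's typical quality, which bears directly on how many observations, and at what times, a high stakes rating should rest on.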
Next Steps (cont.) • Even with these challenges, there is much potential to build new evidence.