
School administration and student assessment: A comparative analysis using SACMEQ data

This paper presents a comparative analysis of the school administration of student assessment using SACMEQ data. It reviews the context and literature, applies quantitative and statistical methods to uncover problems in school administration, and uses preliminary and comparative analysis to draw policy implications for improving standardisation.


Presentation Transcript


  1. School administration and student assessment: A comparative analysis using SACMEQ data. SACHES Conference, October 2015

  2. Topics to be covered: a contribution to the work on improving standardisation of school-level administration of student assessments
  • The context
  • The literature
  • Using quantitative and statistical means to uncover problems in school administration
  • Comparative application
  • SACMEQ data and analysis
  • Preliminary analysis and comparative work
  • Policy implications

  3. > The literature: context of school administration of student assessment
  • US: state-based test burden and anti-testing sentiment have been high since federal legislation institutionalised rewards and incentives for state-based testing in 2001 (Jacob and Levitt, 2003; Bishop, 2007), although decades of different forms of state-based student assessment preceded the federal legislation.
  • South Africa: weak, diverse and inconsistent assessment capacity, practice and effectiveness, with some good practice (Taylor and Vinjevold, 1999; Umalusi, 2004; Grussendorff et al., 2014; Department of Basic Education, 2015).

  4. > The literature (contd.)
  • Statistical analysis of item-level data can provide the means to detect (teacher- or official-initiated) cheating of a particular magnitude (Gustafsson, 2014).
  • Cheating is related to incentives and to teachers' perceptions of their own weaknesses. Documented instances of manipulation, cheating, exclusion of students, teaching to the test and extra feeding on test day are reported in Jacob (2002), Hanushek and Raymond (2002) and Carnoy and Loeb (2002).
  • Our analysis focuses on the empirical detection of teacher-initiated cheating:
  • Direct instruction or explanation/assistance during the test
  • Illicit prior access to the test (leading to drilling on items)
  • The method cannot pick up test-score adjustment or teaching to the test

  5. > The literature (contd.)
  • Jacob and Levitt (2003): research in Chicago Public Schools using the Iowa Test of Basic Skills (Grades 3 to 8, 1993 to 2000).
  • Two types of strange patterns explored empirically (both associated with classrooms with a high probability of cheating):
  • Unexpected test score fluctuations from year to year (large gains followed by small or insignificant gains thereafter)
  • Unexpected patterns of answers within a classroom
  • Four indicators were developed to capture the different manifestations of illicit activities by teachers.
  • The methodology was applied to the verification Annual National Assessment data for 2013 and to SACMEQ 2000 and 2007 mathematics data (Gustafsson, 2014), with emphasis on detecting abuses in the school administration of student assessment.
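The year-to-year fluctuation idea can be made concrete in code. Below is a minimal sketch, not the authors' actual procedure: it assumes a pandas DataFrame with hypothetical columns class_id, year and mean_score (one row per class per year, same subject and grade), and flags classes whose gain in one year is large relative to peers while the following year's gain is small or negative.

```python
import pandas as pd

def score_fluctuation(df: pd.DataFrame) -> pd.DataFrame:
    """Flag classes with a large gain followed by a drop (M1-style sketch).

    Assumed columns: class_id, year, mean_score.
    """
    df = df.sort_values(["class_id", "year"]).copy()
    df["gain"] = df.groupby("class_id")["mean_score"].diff()
    df["next_gain"] = df.groupby("class_id")["gain"].shift(-1)
    # Percentile-rank gains within each year so cohorts are comparable.
    df["gain_pct"] = df.groupby("year")["gain"].rank(pct=True)
    df["next_gain_pct"] = df.groupby("year")["next_gain"].rank(pct=True)
    # High indicator: big gain now, small or negative gain afterwards.
    df["m1"] = df["gain_pct"] ** 2 + (1 - df["next_gain_pct"]) ** 2
    return df
```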

  6. > The literature (contd.)
  • Jacob and Levitt (2003) indicators of teacher-initiated cheating (detected in 5% of classes):
  • Measure 1 (M1): large fluctuations between years in average test score relative to others (same subject, grade and year) – requires extensive year-on-year data.
  • M2: suspicious blocks or strings of answers:
  • Identical answers on consecutive questions, irrespective of student background, subject, grade and year, produce within-classroom correlation.
  • Higher within-classroom correlation of responses (which can also be due to emphasis on certain topics) is detected via a regression of how unexpected each student's response is relative to the predicted response per item. J&L used the specific response chosen; Gustafsson used correct responses.
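To illustrate the "suspicious strings" idea behind M2, here is a rough sketch assuming a DataFrame of raw answer choices with a hypothetical class_id column and item columns in test order. It computes, for each class, the largest share of students giving an identical answer string on a window of consecutive items; J&L's actual M2 models expected responses via regression, so this is a deliberately simplified illustration.

```python
import pandas as pd

def suspicious_strings(resp: pd.DataFrame, window: int = 5) -> pd.Series:
    """Per class, the largest share of students sharing an identical
    answer string on a window of consecutive items (M2-style sketch).

    Assumed layout: rows = students, a 'class_id' column, plus item
    columns in test order holding the chosen option (e.g. A-D).
    """
    items = [c for c in resp.columns if c != "class_id"]
    out = {}
    for cid, grp in resp.groupby("class_id"):
        best = 0.0
        for start in range(len(items) - window + 1):
            block = grp[items[start:start + window]]
            # Share of the class giving the single most common string.
            share = (block.apply(tuple, axis=1)
                          .value_counts(normalize=True)
                          .iloc[0])
            best = max(best, share)
        out[cid] = best
    return pd.Series(out, name="m2")
```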

  7. > The literature (contd.)
  • Jacob and Levitt (2003) indicators (contd.):
  • M3: high variance in cross-question correlation in cheating classrooms. On items where cheating occurred, within-class variation is higher than on other items where it is normal, leading to higher variance in correlations in cheating classrooms.
  • M4: high similarity in response patterns among students in a class, relative to students with similar performance elsewhere in the system. The answers students in one class give are compared with those of students in the system with the same score. High performers generally get easy questions right, so if, in a particular class (compared with the rest of the system), students of a given ability get easy questions wrong and hard questions right, cheating is likely to be occurring.
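A simplified version of Measure 4 can be sketched as follows, assuming a DataFrame of 0/1 correct-answer indicators with a hypothetical class_id column. For each total-score group systemwide it computes the expected share correct per item, then measures how far each class's students deviate from that pattern; the published measure uses a more refined comparison, so treat this as illustrative only.

```python
import pandas as pd

def measure4(correct: pd.DataFrame) -> pd.Series:
    """How far a class's item-level pattern sits from students of the
    same total score elsewhere in the system (M4-style sketch).

    Assumed layout: rows = students, a 'class_id' column, plus 0/1
    item columns indicating a correct answer.
    """
    items = [c for c in correct.columns if c != "class_id"]
    df = correct.copy()
    df["total"] = df[items].sum(axis=1)
    # Expected share correct per item for each total-score group.
    expected = df.groupby("total")[items].transform("mean")
    # Squared deviation of each student from the system-wide pattern.
    dev = ((df[items] - expected) ** 2).sum(axis=1)
    # Class-level indicator: mean deviation across the class.
    return dev.groupby(df["class_id"]).mean().rename("m4")
```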

  8. > The literature (contd.) [Graphs illustrating J&L Measure 4: panels titled "Illustration of non-strange school" and "Illustration of strange school"]

  9. > The SACMEQ programme
  • Regional learning assessment programme spanning 15 school systems, with rounds in 2000, 2007 and more recently 2014: Botswana, Kenya, Lesotho, Malawi, Mauritius, Mozambique, Namibia, Seychelles, South Africa, Swaziland, Tanzania (Mainland), Tanzania (Zanzibar), Uganda, Zambia and Zimbabwe.
  • Grade 6 assessment in mathematics and language, administered as a national random sample survey (grade-level sampling) – in English in all countries, with translations in Mozambique (Portuguese) and in Tanzania and Zanzibar (Kiswahili).
  • Comparative education quality data – including effective enrolment, teacher competence, subject knowledge, learner background and achievement (Spaull and Taylor, 2012).

  10. > Preliminary analysis
  • How dissimilar are the item-level patterns of countries? [Graph: across-country similarity in item-level results, 2000 mathematics]
  • The same neat pattern is not found when the 2007 data are used. Language test results were not analysed, but this should be interesting. Note that in 2007 the language situation became more complex, as South Africa (SOU) introduced Afrikaans; it is not clear whether Namibia (NAM) did too.
  • The expected correlation between item-level similarity and overall performance is not found, but the language of the tests does seem to matter for results.
  • Calculation of the vertical axis: first, each country's correlation of percentage correct per item was computed against every other country; then the simple mean of each country's correlation coefficients was taken.
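The vertical-axis calculation just described is straightforward to reproduce. A minimal sketch, assuming a DataFrame with items as rows and countries as columns, holding the (weighted) percentage correct per item:

```python
import numpy as np
import pandas as pd

def cross_country_similarity(pct_correct: pd.DataFrame) -> pd.Series:
    """Mean correlation of each country's item-level percentage-correct
    values with every other country's (the vertical-axis calculation).

    Assumed layout: rows = items, columns = countries.
    """
    corr = pct_correct.corr()              # country-by-country matrix
    np.fill_diagonal(corr.values, np.nan)  # drop self-correlations of 1
    # Row means skip NaN by default, giving each country's mean
    # correlation with all other countries.
    return corr.mean(axis=1).rename("mean_similarity")
```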

  11. > Preliminary analysis (contd.)
  • Certain countries display marked discrepancies between the 2000 and 2007 item-level distributions (school weights used). [Graphs: panels for Kenya and Tanzania]
  • In the case of Tanzania (but also Zanzibar), certain items which were difficult in 2000 were easy in 2007. The graphs compare the distributions in 2000 and 2007: if in each year the percentage correct had been identical across all items, horizontal lines at 100 would appear.
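One simple way to quantify the 2000-versus-2007 discrepancies shown in these graphs, though not one the slides themselves describe, is a rank correlation of item difficulties across the two years: a low value flags the Tanzania-style pattern where items hard in one year were easy in the other. A sketch, assuming two percentage-correct-per-item series indexed by item id, with school weights already applied:

```python
import pandas as pd

def year_consistency(p2000: pd.Series, p2007: pd.Series) -> float:
    """Spearman rank correlation of item difficulties between years
    for one country; a low value flags 2000-vs-2007 discrepancies.

    Assumed: percentage correct per common item, indexed by item id.
    """
    common = p2000.index.intersection(p2007.index)
    return p2000[common].corr(p2007[common], method="spearman")
```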

  12. > Preliminary analysis (contd.) Effect on rankings of ignoring problematic items: excluding problematic items makes, on the whole, little difference to the 2000-to-2007 ranking changes.

  13. > Application of J&L cheating measures
  • M4 is easy to compute, intuitive and easier to explain than the other measures (Gustafsson, 2014).
  • The M4 indicator was computed per school within countries, using item-level SACMEQ data for 2000 and 2007 (see slide 7 for the logic of the measure).
  • Composite M4 values at the 90th percentile (high values) were identified for each country, as was the proportion of schools exceeding the threshold in the SACMEQ 2007 country data.
  • Some items were eliminated from the analysis due to their performance, and some test items were eliminated in the SACMEQ process between years (60 out of 63 in the 2000 analysis, 45 out of 47 in 2007). Only the 44 items common to both years were used for this analysis.
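The per-country thresholding step could look like the following sketch, assuming a per-school Measure 4 series and an aligned country-code series (both hypothetical names). It finds each country's 90th-percentile cut-off and the share of schools above it.

```python
import pandas as pd

def flag_schools(m4: pd.Series, country: pd.Series) -> pd.DataFrame:
    """Per-country 90th-percentile cut-off on Measure 4 and the share
    of schools above it (the 'high inconsistency' flag).

    Assumed: m4 = Measure 4 per school; country = country code per
    school, aligned on the same index.
    """
    df = pd.DataFrame({"m4": m4, "country": country})
    # Each school's cut-off is its own country's 90th percentile.
    cut = df.groupby("country")["m4"].transform(lambda s: s.quantile(0.9))
    df["flagged"] = df["m4"] > cut
    return df.groupby("country")["flagged"].mean().to_frame("share_high")
```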

  14. > Application of J&L cheating measures (contd.) [Graph: test performance and predicted Measure 4 value, 2000] To check the original J&L assumption that Measure 4 is not directly influenced by average test performance in a school, average percentage correct per school was regressed on Measure 4. Whilst several countries do demonstrate a statistically significant and mostly negative correlation, five countries do not (KEN, MOZ, SEY, SWA, ZAN), and in one country a positive correlation is found (MAU). On the whole, the J&L assumption seems to hold.
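The check described here amounts to a per-country OLS regression. A sketch using statsmodels, assuming a DataFrame with hypothetical columns country, mean_correct (average percentage correct per school) and m4:

```python
import pandas as pd
import statsmodels.api as sm

def m4_vs_performance(df: pd.DataFrame) -> pd.DataFrame:
    """Per-country OLS of average percentage correct per school on
    Measure 4, to check that M4 is not driven by performance.

    Assumed columns: country, mean_correct, m4 (one row per school).
    """
    rows = []
    for c, grp in df.groupby("country"):
        X = sm.add_constant(grp["m4"])          # intercept + M4
        res = sm.OLS(grp["mean_correct"], X).fit()
        rows.append({"country": c,
                     "slope": res.params["m4"],
                     "p_value": res.pvalues["m4"]})
    return pd.DataFrame(rows)
```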

  15. > Application of J&L cheating measures (contd.) Only the 44 items used in both years were counted; this provided more consistent results, as one would expect. [Maps: item-level inconsistencies by country, 2000 and 2007] The maps reflect item-level inconsistencies using Jacob and Levitt's Measure 4. A value of 10 was treated as the maximum acceptable Measure 4 value for a school, so any school exceeding 10 was considered to have a 'high' level of item-level inconsistency; essentially, the patterns for the best countries served as the basis for setting the threshold of 10. The key finding is probably that overall inconsistencies fell across the SACMEQ countries between 2000 and 2007, insofar as 11 countries experienced improvements whilst 3 did not.

  16. > Application of J&L cheating measures (contd.) Improvements were driven by many countries, and all but three improved at least a little. By 2007 Swaziland and Botswana were leading the way, while South Africa had emerged as the 'item-consistency laggard'.

  17. > Application of J&L cheating measures (contd.) What do breakdowns by region within country tell us? For a few countries there are especially telling patterns, insofar as the same region shows consistently poor or good values in both years. The South Africa graph is roughly consistent with other data on the effectiveness of provincial administrations. For several SACMEQ countries the conclusion can be drawn that regions do influence the integrity of the SACMEQ process (and presumably of other system-wide testing programmes). [Graph: % of 'problem schools' by region, with regions marked best or worst in both years]

  18. > Application of J&L cheating measures (contd.) [Graph: % of 'problem schools' by school type in Lesotho] What incentives may exist for private schools to cheat? Are Measure 4 values higher or lower in private schools? Only Lesotho really has enough data on each sub-sector, and it displays clear differences between the two.

  19. > Policy conclusions
  • Translation (or 'versioning') of tests should be done very carefully: different languages do produce different result patterns.
  • Deeper research is needed on the factors underlying the observed country-level patterns.
  • We propose a focus on effective assessment practices, procedures and support at classroom and school level, in support of both formative and summative assessment.
  • Assessment administration should include disincentives to cheat (or take into account likely levels of cheating), especially in large and diverse systems at country and sub-country level.
  • Fraud needs to be monitored, even at low levels or in pockets.

  20. Key sources
  • Department of Basic Education (2015). Action Plan to 2019: Towards the Realisation of Schooling 2030. Pretoria. Available from: <http://www.education.gov.za> [Accessed June 2015].
  • Grussendorff, S., Booysen, C. & Burroughs, E. (2014). What's in the CAPS Package? A Comparative Study of the National Curriculum Statement (NCS) and the Curriculum and Assessment Policy Statement (CAPS), Further Education and Training (FET) Phase: Overview Report. Pretoria: Umalusi.
  • Gustafsson, M. (2014). A check on item-level data patterns in the 2013 ANA associated with possible cheating. Unpublished mimeograph.
  • Jacob, B.A. & Levitt, S.D. (2003). Rotten apples: An investigation of the prevalence and predictors of teacher cheating. The Quarterly Journal of Economics, 118(3): 843-877.
  • Onsomu, E., Nzomo, J. & Obiero, C. (2005). The SACMEQ II Project in Kenya: A Study of the Conditions of Schooling and the Quality of Education. Harare: SACMEQ.
  • Spaull, N. & Taylor, S. (2012). 'Effective enrolment': Creating a composite measure of educational access and educational quality to accurately describe education system performance in sub-Saharan Africa. Stellenbosch Economic Working Papers 21/12.
  • Umalusi (2004). Investigation into the Standard of the Senior Certificate Examination: A Report on Research Conducted by Umalusi. Pretoria: Umalusi.
