Issues relating to Large-scale Assessments

Margaret Wu, Victoria University

International large-scale assessments • Main problem: interpretations of the results • Focus on country rankings • An example: • In August 2012, Julia Gillard, then Prime Minister of Australia, declared that Australia would strive to be ranked in the ‘top five’ in international education assessments by 2025. • So strong is this ambition that it has been inscribed into the Australian Education Act of 2013 as its very first objective, which reads: ‘Australia to be placed, by 2025, in the top 5 highest performing countries based on the performance of school students in reading, mathematics and science’ (Australian Education Act, 2013, p. 3)

Does a high ranking mean a good education system? • A Vietnamese researcher queried why Vietnam did well in PISA despite its poor education system • Gorur & Wu, 2014 (former OECD official, interview transcript): What’s the good of [the rankings]? What is the benefit to the US to be told that it is number seven or number 10? It’s useless, meaningless, except for a media beat up and political huffing and puffing. It’s very important for the US to know, having defined certain goals like improving participation rates for impoverished students from suburbs in large cities – whether in fact that is happening, and if it is, why it is happening and if not, why not. And it is irrelevant whether Chile or Russia or France is doing better or worse – that doesn’t help one bit – in fact it probably hinders. Makes people feel uncertain, unsure, nervous, and they rush over there and find out why they are doing better.

And rushed there, they did… • (Australian) Grattan Institute’s Catching Up: Learning from the Best School Systems in East Asia (Jensen et al., 2012) • … researchers from Grattan Institute visited the four education systems [Hong Kong, Shanghai, Korea and Singapore] studied in this report. They met educators, government officials, school principals, teachers and researchers. They collected extensive documentation at central, District and school levels. Grattan Institute has used this field research and the lessons taken from the Roundtable to write this report (p. 6)

Suggested factors for high ranking (performance) • One observation made by the Grattan Institute… • “Shanghai, for example, has larger class sizes to give teachers more time for school-based research to improve learning and teaching.” (p. 2) • (Observation also made by OECD PISA, 2010) • The New Zealand government proposed increasing class sizes to free up money to fund initiatives to raise the quality of teaching (NZ Treasury briefing paper, March 2012)

Discussion points • These “policies” are often said to be “evidence-based”, with large-scale assessments frequently quoted as the source of evidence. • Why should we be concerned about these policies? • Consider • Validity issues • Reliability issues

Validity issues • Linking factors to performance • Korea and China perform well, and have large class sizes. • Can we conclude that large class sizes lead to good performance? • Making inferences: • The number of storks is positively correlated with the number of babies born • The crime rate is positively correlated with ice cream sales • People who take care of their teeth have better general health • Mediating variables at play
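The ice cream and crime example can be simulated. The sketch below is a hypothetical illustration: the variable names, the coefficients and the choice of temperature as the third variable are all assumptions, not data from the slides. Temperature drives both series, which never influence each other, yet they come out clearly correlated. Once the temperature effect is removed from each series, the correlation largely disappears.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing a least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed data-generating process: temperature affects both outcomes,
# and neither outcome affects the other.
temperature = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
crime = [1.5 * t + rng.gauss(0, 5) for t in temperature]

r_raw = pearson(ice_cream, crime)
r_partial = pearson(residuals(ice_cream, temperature),
                    residuals(crime, temperature))

print(f"raw correlation:                  {r_raw:.2f}")
print(f"after removing temperature effect: {r_partial:.2f}")
```

The same pattern is why a correlation between class size and country performance cannot, on its own, support a policy conclusion.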

Linking PISA to Policies • PISA tells us about student performance, and background of students/schools/countries • Linking background to performance is done by people, not proven by statistics. • Any interpretation is an inference. • PISA cannot substantiate the validity of the inferences. • Need other in-depth studies.

A common misunderstanding about statistical analysis • Regression equation: Y = a + bX • X is termed the explanatory variable • Y is termed the dependent variable • Does X explain Y? • Try X = a′ + b′Y • Exactly the same fit: the same correlation, R² and significance (only the coefficients differ) • Regression does not test for causal inference. Regression only reflects correlation.
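A short check of the point about swapping X and Y, using simulated data (the data and the helper name `fit_line` are assumptions made for illustration). The two regressions give different intercepts and slopes, but the same R², and the product of the two slopes equals R². Nothing in the fit identifies which variable "explains" the other.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit ys = a + b * xs. Returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    a = my - b * mx
    return a, b, sxy ** 2 / (sxx * syy)

rng = random.Random(1)  # fixed seed so the run is repeatable
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]

a_yx, b_yx, r2_yx = fit_line(x, y)  # Y = a + bX
a_xy, b_xy, r2_xy = fit_line(y, x)  # X = a' + b'Y

print(f"Y on X: slope {b_yx:.3f}, R^2 {r2_yx:.3f}")
print(f"X on Y: slope {b_xy:.3f}, R^2 {r2_xy:.3f}")
print(f"product of slopes: {b_yx * b_xy:.3f}")
```

Because the t-statistic for the slope in simple regression depends only on the correlation and the sample size, the significance test is also identical in both directions.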

Regressing Reading scores on GDP

Reliability issues • How strong is the relationship between two variables? • p-value = 0.11, not significant (n.s.) at the 95% confidence level
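One way to see what a p-value such as 0.11 measures is a permutation test: shuffle one variable many times and count how often the shuffled correlation is at least as strong as the observed one. The sketch below uses made-up numbers and a hypothetical helper, `permutation_p_value`. It is not the analysis behind the slide.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation of xs and ys."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(pearson(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # The +1 terms count the observed arrangement itself.
    return (hits + 1) / (n_perm + 1)

# Hypothetical values for eight made-up countries (not PISA data).
gdp = [12, 18, 25, 31, 36, 40, 47, 55]
reading = [470, 505, 480, 520, 490, 530, 500, 525]

p = permutation_p_value(gdp, reading)
print(f"r = {pearson(gdp, reading):.2f}, permutation p-value = {p:.3f}")
```

With only a handful of countries, even a visibly positive correlation can fail to reach significance, because the p-value depends on the sample size as well as on the strength of the relationship.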

Top five in what? • Interview transcript of a senior OECD official (Gorur & Wu): • OECD Official: Well, Australia is doing pretty well! • RG: It’s doing well, right? But you know what we want to do now? Our Prime Minister says we want to be in the top five in PISA! • OECD Official: Top five in what? • RG: In PISA. • OECD Official: Yes, but for which students? The average student in Canada, in Korea, Finland, Shanghai, China – that’s one thing. If you then look at high performing students or how low performing students do, then we may get a completely different picture. And that’s where policy efforts are most interesting for me.

Australian 2009 PISA Reading results, by state (figure annotations: “In top 5 already”; “Below OECD average”)

Ranking by item content

Differential Item Functioning (DIF) • Australia performed extremely well on Items M408Q01TR and M420Q01TR, ranking third and second internationally, respectively. For Item M408Q01TR, Shanghai-China ranked 20th, despite the fact that Shanghai took the top spot internationally in mathematics literacy, with a mean score much higher than that of the second-placed country, Singapore. For Item M420Q01TR, Australia outperformed all top-ranking countries. • In contrast, for Item M462Q01DR, Australia ranked 43rd internationally, with an average score of only 0.1 out of a maximum of two, while Shanghai had an average score of 1.5 out of a maximum of two.

Implications of DIF • Average score (and ranking) hides DIF. • Existence of DIF threatens comparisons across countries, as the achievement results depend on which items are in the test.
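A toy illustration of how an average can hide DIF. The three countries, three items and all scores below are invented for the example. The overall ranking depends on which items are included, and no single item reproduces it.

```python
# Hypothetical mean scores (proportion of the maximum) on three items
# for three made-up countries. None of these numbers come from PISA.
item_means = {
    "item_1": {"A": 0.80, "B": 0.60, "C": 0.70},
    "item_2": {"A": 0.40, "B": 0.75, "C": 0.60},
    "item_3": {"A": 0.70, "B": 0.65, "C": 0.50},
}

def ranking(scores):
    """Country names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

def overall_means(items):
    """Average each country's score over the given items."""
    countries = next(iter(items.values())).keys()
    return {c: sum(items[i][c] for i in items) / len(items) for c in countries}

per_item_rank = {item: ranking(scores) for item, scores in item_means.items()}
overall_rank = ranking(overall_means(item_means))

# Drop one item and rank again.
without_item_2 = {k: v for k, v in item_means.items() if k != "item_2"}
rank_without_item_2 = ranking(overall_means(without_item_2))

print("per item:       ", per_item_rank)
print("all items:      ", overall_rank)
print("without item_2: ", rank_without_item_2)
```

In this made-up case, country B leads on the full test but country A leads once item_2 is dropped, which is the sense in which results depend on the items in the test.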

An example: Japan • PISA reading • 2000: 522 • 2003: 498 • A 24-point drop, about 6 months of growth! • Triggered huge reactions in Japan • Blame fell on a reform started two years earlier • New reforms and policies followed

How PISA trends are established • Select some items from 2000 as “anchoring items” • Place them in the 2003 test • The 2003 results can then be placed on the 2000 scale

Item Bias • Items don’t work in the same way in all countries. • One item may be relatively more difficult for one country than for other countries. • Differential Item Functioning (DIF)

Differential Item Functioning • Hypothetical example (figure labels): “Biased against B / Favours A”; “Biased against A / Favours B”

Japan vs International Item Parameters

Anchoring items in Reading 2003 • Many anchoring items were biased against Japan • Japan’s mean score would increase by 10 score points if one particular reading unit was removed from the set of eight anchoring units. (Monseur & Berezner, 2007).
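Why the choice of anchoring units matters can be sketched with a simple mean-mean linking of item difficulties. This is a simplified stand-in, not PISA's operational procedure, and the unit names and logit values are invented. The linking constant is the average difference between the old and new difficulty estimates of the anchoring units, and it is recomputed with each unit left out in turn.

```python
# Hypothetical difficulty estimates (in logits) for four anchoring units,
# as calibrated in two assessment cycles. Invented values, not PISA data.
anchors_2000 = {"unit_1": -0.50, "unit_2": 0.10, "unit_3": 0.40, "unit_4": 0.80}
anchors_2003 = {"unit_1": -0.45, "unit_2": 0.20, "unit_3": 0.45, "unit_4": 1.30}

def linking_shift(old, new, exclude=()):
    """Mean-mean linking constant.

    Adding it to a value on the new (2003) scale expresses that value
    on the old (2000) scale. Units listed in `exclude` are ignored.
    """
    units = [u for u in old if u in new and u not in exclude]
    return sum(old[u] - new[u] for u in units) / len(units)

shift_all = linking_shift(anchors_2000, anchors_2003)

# Recompute the constant with each anchoring unit removed in turn.
leave_one_out = {
    u: linking_shift(anchors_2000, anchors_2003, exclude=(u,))
    for u in anchors_2000
}

print(f"all units: {shift_all:+.3f}")
for unit, shift in leave_one_out.items():
    print(f"without {unit}: {shift:+.3f} (change {shift - shift_all:+.3f})")
```

In this toy setup, unit_4 behaves very differently across the two calibrations, so removing it moves the linking constant far more than removing any other unit, and every score placed on the old scale moves with it.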

Fluctuation of country results • Results vary with the items selected for a test, for reasons such as • Cultural differences • Language differences • Curriculum differences

2000 – 2009 trends • It has often been claimed that Australia is slipping in Reading.

-13 points

What PISA tells us • Big picture • Australia is doing pretty well • Australia and New Zealand lead the English-speaking countries • Asian countries (Confucian culture) lead in academic performance • Finland does very well among non-Asian countries • May suggest something for further investigation

Limitations of large-scale assessments • Not able to collect data on all factors related to education. • For example, private spending on education has not been captured • Nor have students’ lives outside school • Look beyond international ranks • Focus on within-country comparisons • Don’t jump to conclusions on policy implications
