
Strengthening Year 12 learning and certification through school-based and external assessment



  1. Strengthening Year 12 learning and certification through school-based and external assessment. Barry McGaw, Director, University of Melbourne Education Research Institute; former Director for Education, OECD. Principals’ Forum, WA Curriculum Council, Perth, 4 March 2008

  2. Australia’s ranking in OECD/PISA Reading

                          PISA 2000                       PISA 2003            PISA 2006
  Ahead of Australia      Finland                         Finland              Finland, Korea, Canada, NZ, Hong Kong
  Same as Australia       Korea, Canada, NZ, Hong Kong    Korea, Canada, NZ    —
  Behind Australia        —                               Hong Kong            —

  Reading ranks:
  • PISA 2000: 4th, but tied for 2nd
  • PISA 2003: 4th, but tied for 2nd
  • PISA 2006: 7th, but tied for 6th

  3. Mean performances in OECD/PISA reading. [Chart: countries’ mean reading scores plotted against the change in score from PISA 2000 to PISA 2006, with quadrants labelled “higher performers improved” (Korea, Finland, Hong Kong-China), “lower performers improved”, and “higher performers declined” (Australia, with Canada and New Zealand nearby). Changes for Finland, Canada and New Zealand are not significant.] OECD (2007), PISA 2006: science competencies for tomorrow’s world, Vol. 1 – analysis, Fig. 6.21, p. 319.

  4. Trends in Australian reading performances. [Chart: Australian reading scores across PISA cycles at the 5th, 10th and 25th percentiles, the mean, and the 75th, 90th and 95th percentiles.] OECD (2007), PISA 2006: science competencies for tomorrow’s world, Vol. 1 – analysis, Fig. 6.21, p. 319.

  5. Observations: We have just mixed ranks and performance levels in educational assessment; that is, we have mixed norm-referenced and criterion-referenced assessment. Question: Could we do this with Year 12 assessment?

  6. Allowing for differences in question difficulty. In examinations offering a choice of questions, examiners may consider question difficulty in marking. How well do they do? The 1996 New South Wales Year 12 Geography examination offered a choice of one of 32 combinations of three questions. Taking statistical account of question difficulty, students with the same overall performance would have received around 4 marks less from the examiners if they had chosen the most difficult 3-question combination rather than the easiest. [Chart: results when examiners allow for difficulty differences versus results accounting for difficulty statistically.] McGaw (1997), Shaping their future: Recommendations for reform of the Higher School Certificate. Sydney: Department of Training and Education Co-ordination, Appendix C.
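To make the idea concrete: one simple way of “taking statistical account of question difficulty” is to fit an additive model, mark ≈ ability − difficulty, to the sparse student-by-question matrix and report difficulty-adjusted totals. This is a minimal sketch, not the method of the 1997 analysis, and all data here are invented:

```python
import numpy as np

# Invented data: marks[i, j] is student i's mark on question j, NaN where
# the question was not attempted (candidates choose 3 of 8 questions).
rng = np.random.default_rng(0)
n_students, n_questions = 200, 8
ability = rng.normal(60, 10, n_students)
difficulty = rng.normal(0, 4, n_questions)   # harder questions cost marks
marks = ability[:, None] - difficulty[None, :] + rng.normal(0, 5, (n_students, n_questions))
for i in range(n_students):                  # each student attempts only 3 questions
    skipped = rng.choice(n_questions, n_questions - 3, replace=False)
    marks[i, skipped] = np.nan

# Alternately estimate abilities and difficulties for the additive model.
ability_hat = np.nanmean(marks, axis=1)
for _ in range(50):
    difficulty_hat = np.nanmean(ability_hat[:, None] - marks, axis=0)
    difficulty_hat -= difficulty_hat.mean()  # pin the arbitrary zero point
    ability_hat = np.nanmean(marks + difficulty_hat[None, :], axis=1)

# ability_hat is now a difficulty-adjusted score: students who chose hard
# question combinations are no longer penalised relative to raw averages.
print(np.round(difficulty_hat, 1))
```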

  7. Observation: We make some strange assumptions in examinations and examination marking when there is a choice of questions to answer: either that all questions are of equal difficulty, or that markers can somehow adjust for the differences. Questions: Could we do better than this with Year 12 assessments? Could we do so with school-based assessments?

  8. Potential power of assessments…

  9. Potential power of assessment • Assessment is a powerful educational tool • Helps students see their own progress • Enables teachers to monitor students and themselves • Expresses what systems take to be important • Can drive reform • When stakes are high, it can be counterproductive • Driving attention to the narrow and measurable • Causing others to ignore the important but unmeasurable • Causing others to ignore the longer term • Discouraging risk-taking • Assessment must be focused on what is important • Debate about whether assessment is properly focused is… • A debate about validity

  10. Conceptions of validity • Content validity • Face validity • Curricular validity • Criterion-related validity • Predictive – note problems of attenuation with selection • Concurrent – e.g. paper & pencil as proxy for performance • Construct validity • Convergent – similar to other tests of same construct • Discriminant – different from tests of other constructs • Consequential validity • Not any adverse consequence • Only adverse consequences due to test invalidity • Does not mean unreasonable expectations must be satisfied • Having to tell teachers only what they don’t know • Providing the mechanism for improvement

  11. Distinguishing purposes of assessment

  12. Purpose of educational assessment E. F. Lindquist (Ed.). (1951). Educational Measurement. “the functions of educational measurement are concerned…with the facilitation of learning” (Cook, 1951). “educational measurement is conceived, not as a process quite apart from instruction, but rather as an integral part of it” (Tyler, 1951).

  13. Formative vs summative assessment

  14. Distinguishing purposes of assessment • Formative • Frequent, interactive assessments of a student to identify learning needs and shape teaching • Barriers to widespread use: • tension between classroom-based formative assessment and high-visibility summative assessments • lack of connection between systemic, school and classroom approaches to assessment and evaluation • Summative • To provide summary assessments of a student’s achievement at a particular stage • Status of stages can vary: • Annual reports to parents from schools: low stakes • System-level cohort assessment: low stakes for students but high stakes for schools • Public examinations: high stakes

  15. Benefits of formative assessment Paul Black & Dylan Wiliam (1998), Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5, 7–74. “Assessment becomes ‘formative assessment’ when evidence is actually used to adapt the teaching work to meet student needs.” Formative assessment experiments produce effect sizes of 0.40 to 0.70, larger than those found for most educational interventions. Many studies show that improved formative assessment helps low achievers most.
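(For reference, an effect size here is the standardised mean difference, Cohen’s d = (mean of the intervention group − mean of the control group) / pooled standard deviation. An effect of 0.5 would move the average student from the 50th to roughly the 69th percentile of the control distribution.)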

  16. Aligning formative and summative assessment • Scope of alignment • Basic: ensure policies do not conflict • Sophisticated: formative and summative reinforce each other • Strategies • Ensure summative assessments measure key skills on which development is expected to occur • Convince teachers that use of formative assessment will lead to better summative assessment results • Encourage risk-taking in teachers as they explore better ways of assessing and teaching • Broaden the basis for judging teachers to include, for example, students’ capacity to judge their own progress and (possibly) the progress of others, student motivation… OECD (2005), Formative assessment: improving learning in secondary classrooms. Paris: Author.

  17. Norm-referenced vs criterion-referenced assessment

  18. Point of reference in measurement: from external criteria to norms…

  19. Point of reference for judging individuals • Using an independent external measure (Psychophysics) • Judgements of phenomenon (e.g. brightness of light) • requiring judgements of differences, not absolute values • comparing judgements with direct measures of phenomenon • developing a scale of human judgement of phenomenon • Interest in nature of human judgement not phenomenon • Using performance of others to judge individuals • Psychological phenomena without external measure • developed in the context of studies of individual differences • Individual performances judged in relation to: • the performance of others • in particular, the average performance of others (or norm) • Want to look better? • Choose different company!

  20. In search of external criteria…

  21. In search of an external criterion • Criterion-referenced measurement • Specify learning required (Glaser, 1963) • Judge students against requirements (not each other) • Criteria disaggregated, often to level of items Glaser, R. (1963), Instructional technology and the measurement of learning outcomes: some questions. American Psychologist, 18, 519-521.

  22. In search of well defined external criteria…

  23. In search of an external criterion • Criterion-referenced measurement • Specify learning required (Glaser, 1963) • Judge students against requirements (not each other) • Criteria disaggregated, often to level of items • New psychometric models • Simultaneous scale construction and measurement • locate tasks on scale by difficulty • locate individuals on same scale by performance • interpret performance with reference to tasks Glaser, R. (1963), Instructional technology and the measurement of learning outcomes: some questions. American Psychologist, 18, 519-521.
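The best-known of these models is the Rasch model, in which persons and items share one scale and the probability of success depends only on the gap between them. A minimal sketch of the core relationship (illustrative; operational programmes such as PISA use elaborated members of this model family):

```python
import math

def p_correct(ability: float, difficulty: float) -> float:
    """Rasch model: probability that a person located at `ability` succeeds
    on an item located at `difficulty`, both on the same logit scale."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A person level with an item succeeds half the time; one logit above, ~73%.
print(p_correct(0.0, 0.0))   # 0.5
print(p_correct(1.0, 0.0))   # ~0.731
```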

  24. PISA 2003 mathematics item: Exchange Rate. Mei-Ling from Singapore was preparing to go to South Africa for 3 months as an exchange student. She needed to change some Singapore dollars (SGD) into South African rand (ZAR).
  Question 1: Mei-Ling found out that the exchange rate between Singapore dollars and South African rand was 1 SGD = 4.2 ZAR. Mei-Ling changed 3000 Singapore dollars into South African rand at this exchange rate. How much money in South African rand did Mei-Ling get?
  Question 2: On returning to Singapore after 3 months, Mei-Ling had 3900 ZAR left. She changed this back to Singapore dollars, noting that the exchange rate had changed to 1 SGD = 4.0 ZAR. How much money in Singapore dollars did Mei-Ling get?
  Question 3: During these 3 months the exchange rate had changed from 4.2 to 4.0 ZAR per SGD. Was it in Mei-Ling’s favour that the exchange rate now was 4.0 ZAR instead of 4.2 ZAR, when she changed her South African rand back to Singapore dollars? Give an explanation to support your answer.
  [Item map: the three questions located on the PISA mathematics proficiency scale, whose level boundaries are 358, 420, 482, 544, 607 and 669; Question 1 sits at 406 (Level 1), Question 2 at 439 (Level 2) and Question 3 at 586 (Level 4).]
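For reference, the intended calculations: Question 1: 3000 SGD × 4.2 = 12,600 ZAR. Question 2: 3900 ZAR ÷ 4.0 = 975 SGD. Question 3: yes, the change was in her favour: at 4.2 ZAR per SGD her 3900 ZAR would have bought only 3900 ÷ 4.2 ≈ 928.57 SGD, whereas at 4.0 she received 975 SGD.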

  25. Tapping science beliefs Doig, B.A. & Adams, R.J. (1993), Tapping students' science beliefs: a resource for teaching and learning. Hawthorn Vic: Australian Council for Educational Research

  26. Classification of responses

  Content of response                     Score
  condensation when air cools             4
  from atmosphere – no mechanism          3
  condensation – no atmosphere            2
  liquid on outside comes from inside     1
  liquid passed through sides of jug      0
  uninterpretable responses               0

  27. Percent of responses by type

  Content of response                     Grade 5   Grade 9
  condensation when air cools             2         16
  from atmosphere – no mechanism          7         30
  condensation – no atmosphere            36        23
  liquid on outside comes from inside     23        15
  liquid passed through sides of jug      6         3
  uninterpretable responses               25        13

  28. Category boundaries (scale locations of the thresholds between adjacent response categories)

  condensation when air cools             63
  from atmosphere – no mechanism          61
  condensation – no atmosphere            60
  liquid on outside comes from inside     58
  liquid passed through sides of jug      (no boundary shown)
  uninterpretable responses               (no boundary shown)

  29. Structure of matter scale (using further items). [Scale diagram, roughly 56–64, with descriptors in ascending order: examples or ‘magic’ as explanation; simple observations and definitions; identifies only observed components; knows processes not easily observed; particulate model of matter; notion of chemical reactions.]

  30. Map of selected PISA 2003 mathematics tasks. [Item map, from most to least difficult: Walking, score 3; Walking, score 2; Walking, score 1; Exchange Rate Q3, score 1; Skateboard, score 2; Skateboard, score 1; Exchange Rate Q2, score 1; Exchange Rate Q1, score 1.] OECD (2004), Learning for tomorrow’s world: First results from PISA 2003, p. 48.

  31. Using school-based and external assessments…

  32. Use of school achievement data • School-based assessments • Can measure things not readily measured in examinations • Can measure work requiring sustained effort • Comparability across schools difficult to assure • Examinations • Comparability across schools is assured • Can measure only a limited range of things • Constraints of form • Constraints of time • Combining the two • Need to be measures of the same construct • Bring to same scale before adding • Use normative or criterion information to ‘scale’ the two forms of assessment
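A common way to “bring to the same scale before adding” is statistical moderation: linearly rescale each school group’s assessment marks so that their mean and spread match those of the same students on the external examination, preserving the school’s rank order. A minimal sketch of that idea (illustrative only; actual moderation procedures vary by jurisdiction, and the marks below are invented):

```python
import numpy as np

def moderate(school_marks: np.ndarray, exam_marks: np.ndarray) -> np.ndarray:
    """Rescale one school group's assessment marks to the mean and standard
    deviation of the same students' external examination marks, keeping the
    school's rank order intact."""
    z = (school_marks - school_marks.mean()) / school_marks.std()
    return exam_marks.mean() + z * exam_marks.std()

# Invented marks for five students at one school.
school = np.array([62.0, 70.0, 75.0, 81.0, 90.0])
exam   = np.array([55.0, 64.0, 66.0, 75.0, 85.0])
print(np.round(moderate(school, exam), 1))   # same order, exam-aligned scale
```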

  33. Using criterion-referencing in public examinations…

  34. Public examinations • High-stakes assessments based on curriculum • secondary certification and university entrance • selection for highly competitive courses (top 1½%) • The comparability-over-time problem… • Grade distributions used to monitor standards • failure rate used as a measure of ‘standards’ • claim that, if participation rates grow, grades should decline to ensure that an ‘A’ is still an ‘A’, etc. • do enough students fail? • Criterion (standard) and norm (cohort) referencing • ‘standards’ are never absent (in curriculum, examination) • ‘standards’ are ignored in the norm-based award of results • cannot use link items over time when the whole test is made public • marrying criterion- and norm-referencing with judgments

  35. Marrying criterion and norm-referencing • England • use of criteria defined for some grade boundaries • review of previous years’ scripts at grade boundaries • reference to prior grade distributions • reference to evidence of change in student cohort to justify shifts in grade distributions between years • Australia (New South Wales) • development of band descriptors • ‘consistent’ definition of bands over years. • reporting with norm and criterion-referencing

  36. [Diagram: reporting under the NSW HSC. A student’s school assessment mark and examination mark combine into the student’s overall mark. Norm-referenced element: the distribution of results for all students (number of candidates across the 0–100 mark range). Standards-referenced element: bands describing what students know and can do, with the minimum standard expected set at 50.]

  37. [Sample HSC record: all HSC courses listed with (School) Assessment Mark, Examination Mark, (Overall) HSC Mark and Performance Band; all Preliminary courses listed.]

  38. How New South Wales got there… • Review and recommendations for change • New NSW Higher School Certificate • McGaw (1997), Shaping their future: Recommendations for reform of the Higher School Certificate. Sydney: Department of Training and Education Co-ordination • Scaling process • standards-referencing to curriculum and over time • Bennett, J. (2001), Standards-setting and the NSW Higher School Certificate. www.boardofstudies.nsw.edu.au/manuals/pdf_doc/bennett.pdf • Developing grade descriptors • used past examinations • experienced examiners for each subject • reviewed examination papers and students’ marked papers • developed band descriptors • described performance for Bands 6 down to 2; low Band 1 not described

  39. Using grade descriptors • Stage 1 • Examiners independently form ‘image of band’ • Set cut mark for each band boundary on each question • Stage 2 • Examiners work together to reach agreement on boundary locations for bands on each question • Boundary locations for total scores also established • Stage 3 • Student work at boundaries on total scores reviewed • Cut points reviewed and determined • Boundaries located on mark scale • 5/6 boundary set to 90 • 4/5 boundary set to 80 • … • 1/2 boundary set to 50
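Once the judged cut points are fixed, mapping raw examination marks onto the reported scale is a piecewise-linear transformation anchored at those boundaries. A minimal sketch (the raw cut values below are invented; only the reported anchors 50, 60, …, 90 come from the slide):

```python
import numpy as np

raw_cuts    = [22, 34, 47, 58, 71, 100]   # invented raw marks at the 1/2 ... 5/6 boundaries and the maximum
report_cuts = [50, 60, 70, 80, 90, 100]   # reported marks those raw cuts are mapped to

def report_mark(raw: float) -> float:
    """Piecewise-linear map from a raw mark to the reported 0-100 scale,
    anchoring each judged band boundary at its fixed reported value."""
    return float(np.interp(raw, [0.0] + raw_cuts, [0.0] + report_cuts))

print(report_mark(47.0))   # exactly at the 3/4 boundary -> 70.0
print(report_mark(52.0))   # interpolated between the 3/4 and 4/5 boundaries
```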

  40. Back to the validity…

  41. Conceptions of validity • Content validity • Face validity • Curricular validity • Criterion-related validity • Predictive • Concurrent – school-based and external (within schools) • Construct validity • Convergent – similar to other tests of same construct • Discriminant – different from tests of other constructs • Consequential validity

  42. And to the benefits…

  43. Benefits of good assessment – for students • Makes goals clear to learners • Makes improvement clear • Normative assessment offers only improvement in rank (thus improvement at the expense of others) • Criterion (or standards) referenced assessment shows improvement in terms of knowledge and skills • Can teach learners how to monitor own learning • Key meta-cognitive capacity • Builds the base for lifelong learning

  44. Benefits of good assessment – for schools • Makes clear what is possible for learners like theirs • Shows schools where they stand in a broad picture • Shows where they stand with fair comparisons • Makes improvement clear • School’s movement on a criterion scale clear • Buy-in will depend on: • Data being relevant • Information being helpful (and novel) • Comparisons being fair • Results being used productively (not necessarily protectively)

  45. Benefits of good assessment – for systems • Establishes current status • Makes improvement clear • Permits richer reflection on the state of system…

  46. Thank you. Contact: barry.mcgaw@mcgawgroup.org, bmcgaw@unimelb.edu.au
