
EALTA MILSIG: Standardising the assessment of writing across nations



Presentation Transcript


  1. EALTA MILSIG: Standardising the assessment of writing across nations. Ülle Türk, Language Testing Unit, Estonian Defence Forces. STANAG 6001 testing conference, 7-9 July 2009, Zagreb, Croatia

  2. Outline Background Aims of the project Procedure Standard setting Results Conclusions

  3. Background: EALTA • EALTA = European Association for Language Testing and Assessment • Established in 2004 as a professional association for language testers in Europe. • Mission: to promote the understanding of theoretical principles of language testing and assessment, and the improvement and sharing of testing and assessment practices throughout Europe. • Annual conferences • Discussion lists • ealta-members@lists.lancs.ac.uk • specialist lists

  4. Background: MILSIG • March 2008 – MILSIG mailing list established: ealta-mil@lists.lancs.ac.uk • EALTA conference in 2008: • a meeting of language testers working in the military • participating countries/ institutions: Denmark, Estonia, Latvia, Lithuania, SHAPE, Slovenia, Sweden • agreement to co-operate in standardising writing assessment

  5. Aims of the project • To select a number of sample scripts that • have been written in response to a variety of prompts • demonstrate English language proficiency at STANAG levels 1-3 (4) • could later be used as • benchmark performances in assessing writing and in rater training • sample performances for teachers and test takers • To study the possibility of carrying out standardisation via email.

  6. Procedure and timeline • Each participating country/institution selects 4 scripts, including problem scripts, at levels 1-3 – end of May • Scripts are collected, coded and sent to all participants – middle of June • Scripts are marked following the procedures established in each country – end of September • STANAG level descriptors used • Weak, standard and strong performances at each level identified • Comments provided • Results analysed; decisions taken

  7. Participants • Denmark (1) • Estonia (5) • Latvia (4) • Lithuania (3) • SHAPE (2) • Slovenia (5)

  8. Council of Europe: A Manual • Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR) • Pilot version: September 2003 • Final version: January 2009 • ‘Relating an examination or test to the CEFR can best be seen as a process of “building an argument” based on a theoretical rationale.’ (p 9) • Standard setting procedures: Familiarisation, Specification, Standardisation training/benchmarking, Standard setting, Validation

  9. Table 5.2: Time Management for Assessing Written Performance Samples

  10. Familiarisation: raters rating descriptors • Mean correlation: 0.89 (SD = 0.04) • Range: 0.83 (R14) to 0.98 (R05)

  11. Task types and original ratings • 27 scripts: 6 L1, 14 L2, 7 L3 • 12 letters: 3 L1, 8 L2, 1 L3 • 4 (+ 5) essays: 2 L1, 4 L2, 3 L3 • 1 report: L3 • 1 memorandum: L2 • A first draft of a lecture (2): 1 L2, 1 L3 • Paper for a newsletter (1): L1 • Paper/letter/essay (1): L3

  12. Rating scripts • Task: • Use STANAG 6001 writing descriptors, NOT your own rating scale. • If the script was written for a STANAG 6001 test in your country/ institution, which level would it be awarded? • Do you consider it a weak, standard or strong performance at the awarded level? • Why?

  13. Analysis of ratings • Coding: • L1 weak = 1 • L1 standard = 2 • L1 strong = 3 • L2 weak = 4 • L2 standard = 5 • L2 strong = 6 • L3 weak = 7 • L3 standard = 8 • L3 strong = 9
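
A minimal sketch of this coding step, in Python (the language, function name and dictionaries are my own illustration, not part of the presentation), showing how a rater's level-plus-strength judgement could be turned into the 1-9 scale above:

```python
# Hypothetical helper: apply the 1-9 coding scheme from slide 13.
LEVELS = {"L1": 0, "L2": 3, "L3": 6}            # base offset per STANAG level
STRENGTHS = {"weak": 1, "standard": 2, "strong": 3}

def code_rating(level: str, strength: str) -> int:
    """Map a judgement such as ("L2", "standard") to its numeric code (here 5)."""
    return LEVELS[level] + STRENGTHS[strength]

assert code_rating("L1", "weak") == 1
assert code_rating("L3", "strong") == 9
```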

  14. Scripts recoded • MILSIGPR_01–MILSIGPR_12a = MSP-01–MSP-12 • MILSIGPR_12b = MSP-13 • MILSIGPR_12c = MSP-14 • MILSIGPR_12d = MSP-15 • MILSIGPR_12e = MSP-16 • MILSIGPR_12f = MSP-17 • MILSIGPR_12g = MSP-18 • MILSIGPR_12h = MSP-19 • MILSIGPR_13 = MSP-20 • MILSIGPR_14 = MSP-21 • etc

  15. Script ratings • Mean ratings: 2.8–7.8 (SD: 0.00–1.47) • 1-3 (L1): 1 script (6 scripts) • 4-6 (L2): 24 scripts (12 scripts) • 7-9 (L3): 2 scripts (7 scripts) • 15 scripts (55.6%) – agreement on the level, though usually not on whether it is a weak, standard or strong performance at that level
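
As an illustration only (Python, with invented script data; the real ratings are not reproduced here), the per-script mean, standard deviation and level agreement reported on this slide could be derived from the coded ratings like this:

```python
from statistics import mean, pstdev

# Invented example data: script code -> coded ratings (1-9) from the participating raters.
ratings = {
    "MSP-07": [2, 2, 3, 2, 3],   # illustrative values, not the project's real data
    "MSP-20": [7, 6, 7, 8, 7],
}

def level_band(code: float) -> str:
    """Band a coded rating into a STANAG level: 1-3 -> L1, 4-6 -> L2, 7-9 -> L3."""
    return "L1" if code < 3.5 else ("L2" if code < 6.5 else "L3")

for script, r in ratings.items():
    same_level = len({level_band(x) for x in r}) == 1   # do all raters agree on the level?
    print(f"{script}: mean={mean(r):.1f}, SD={pstdev(r):.2f}, "
          f"level={level_band(mean(r))}, level agreement={same_level}")
```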

  16. Three examples • MILSIGPR_07 (MSP-07) • A lot of grammatical mistakes, spelling, very basic range. Not enough for Level 2. • MILSIGPR_13 (MSP-20) • task at level 3, but the writing is not coherent, very incorrect, sometimes difficult to understand the meaning and very uninteresting – getting even worse towards the end • MILSIGPR_14 (MSP-21) • well written with control of grammar, good vocabulary and abstract concepts and arguments clearly conveyed, the person might be able to write at a high level 3, but does not quite prove it here

  17. Mean ratings for scripts • Overall mean rating: 5.2 (SD = 1.44)

  18. Script ratings by country

  19. Correlations between country ratings • N = 27; N = 23 • All significant at the 0.01 level
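
A hedged sketch of how such pairwise correlations between countries' ratings might be computed (Python with pandas as an assumed library; the country columns and values are invented for illustration):

```python
import pandas as pd

# Invented example: one row per script, one column of coded ratings (1-9) per country.
# None marks a script a country did not rate (one reason N can differ, e.g. 27 vs 23).
df = pd.DataFrame({
    "Denmark": [5, 4, 6, 7, 2, None],
    "Estonia": [5, 5, 6, 7, 3, 4],
    "Latvia":  [4, 5, 6, 8, 2, 5],
})

# Pairwise Pearson correlations; pandas uses only the scripts both countries rated.
print(df.corr(method="pearson").round(2))
```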

  20. Mean ratings by task type

  21. Conclusions • Such a project is indeed needed!

  22. Way forward • 1 L1 script, 12 L2 scripts, 2 L3 scripts • Analysis of scripts → good benchmarks? • Collecting more scripts, particularly at L3 • Scripts based on a variety of task types • Did we start at the wrong end? • Looking at scripts that caused disagreement • Can we reach agreement? • What features make them problematic? • Expanding the circle to include more countries

  23. References • EALTA website: http://www.ealta.eu.org • Council of Europe. 2009. Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR): http://www.coe.int/t/dg4/linguistic/Manuel1_EN.asp
