Scoring Validity in Austrian E8 National Writing Tests E8 Baseline-Test 2009

Scoring Validity in Austrian E8 National Writing Tests E8 Baseline-Test 2009. Klaus Siller BIFIE (Federal Institute for Education Research, Innovation and Development of the Austrian School System) IATEFL TEA-SIG and University of Innsbruck Conference Innsbruck, September 2011.

Presentation Transcript


  1. Scoring Validity in Austrian E8 National Writing Tests: E8 Baseline-Test 2009. Klaus Siller, BIFIE (Federal Institute for Education Research, Innovation and Development of the Austrian School System). IATEFL TEA-SIG and University of Innsbruck Conference, Innsbruck, September 2011

  2. Overview • Background: • Baseline 2009 • Test-takers • Purpose • Structure. Framework: Shaw, S. D. & Weir, C. J. 2007. Examining Writing. Research and practice in assessing second language writing. Cambridge: Cambridge University Press.

  3. Overview • Rating • Criteria/Rating Scale • Raters/Rating Process • Data Analyses • Methods • Results • Rater Feedback

  4. Background: Test Takers • Pupils from the last form of lower secondary schools in Austria (Year 8) • 14-year-olds • All ability groups • General Secondary School (APS) • Academic Secondary School (AHS)

  5. Background: Purpose • Identifying strengths and weaknesses in test takers' writing competence • System monitoring • Improvement of classroom procedures • [Individual feedback for test taker] • Low-stakes exam: motivation?

  6. Background: Structure /1 • Difficulty level: A2/B1 • Short Task: • Expected response 40-60 words • 10 minutes • Long Task: • Expected response 120-150 words • 20 minutes • 5 minutes revision/editing

  7. Background: Structure /2 • 2 different short tasks and 2 different long tasks in 4 booklets • N = ca. 5100 students/task/form

  8. Rating: Criteria & Rating Scale • Range • Accuracy • Relevance • Range of grammatical structures • Accuracy • Clear and meaningful mention/elaboration of expected content points • Text-type • Text-length • Production of fluent text (using adequate devices at sentence, paragraph, text level). Adapted from: Tankó 2005, 127. Tankó, G. 2005. Into Europe. The Writing Handbook. Budapest: Teleki László Foundation.

  9. Rating: Raters & Rater Training • 43 teachers of English • Different experiential background and professional training • 4 Writing-Rater-Trainings • 2006/07; 2007/08; 2008/09; 2009

  10. Rating: Rating Process/1 • Standardisation-Meeting (2 days) • Standardisation with benchmarked scripts • On-Site-Rating • Individual Rating-Phase • Ca. 6-8 weeks

  11. Rating: Rating Process/2 • Scanning of texts at BIFIE • 8.1% APS / 1.1% AHS excluded from scanning process • Production of Rating-Booklets • 1 booklet per rater incl. 300 Short Texts • 1 booklet per rater incl. 300 Long Texts • Overlap for multiple/double-rating • 10 texts / 500 texts per task • 2 corresponding booklets with rating-sheets

  12. Rating: Rating Process/3 • Rating-Sheets: Ratings electronically scanned at BIFIE

  13. Data Analyses: Calibration and Scaling • Influences on ratings: student ability, dimension, task difficulty, rater leniency, interaction effects • To quantify the extent of variances of effect • To give feedback to raters (self-reflexion) • To improve procedures

  14. Data Analyses: Methods • Variance Component Analysis: quantification • Rater feedback: • Comparison of means: rater leniency • Correlations*: rater agreement. * Correlations between the observed ratings and the "true" ratings (i.e. the most frequent rating of all ratings in multiple marking; 43 ratings)
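
The two rater-feedback statistics named on this slide can be illustrated with a minimal sketch. It assumes that every rater marked the same set of scripts, that ties for the most frequent rating go to the lower band, and that leniency is read as a rater's mean minus the mean of all rater means; the function names, the tie rule and the toy data are illustrative and not taken from the presentation.

```python
from collections import Counter
from statistics import mean

def modal_rating(ratings):
    """Most frequent rating; ties go to the lower band (an assumption)."""
    counts = Counter(ratings)
    top = max(counts.values())
    return min(r for r, c in counts.items() if c == top)

def pearson(xs, ys):
    """Pearson correlation; NaN if either list has no variation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5

def rater_report(ratings_by_rater):
    """ratings_by_rater maps a rater id to ratings for scripts 0..n-1."""
    raters = list(ratings_by_rater)
    n_scripts = len(ratings_by_rater[raters[0]])
    # "True" rating per script: the most frequent rating across all raters.
    true = [
        modal_rating([ratings_by_rater[r][i] for r in raters])
        for i in range(n_scripts)
    ]
    grand_mean = mean(mean(v) for v in ratings_by_rater.values())
    return {
        r: {
            "agreement": pearson(ratings_by_rater[r], true),
            "leniency": mean(ratings_by_rater[r]) - grand_mean,
        }
        for r in raters
    }

# Toy data: 3 hypothetical raters, 5 multiply-marked scripts.
toy = {
    "R1": [3, 5, 2, 6, 4],
    "R2": [3, 5, 2, 6, 4],
    "R3": [4, 6, 3, 7, 5],  # always one band above the others
}
report = rater_report(toy)
```

In the toy data, R3 ranks the scripts exactly like the modal rating but sits one band higher throughout, so the correlation alone would not reveal the difference in leniency. This is one reason to report the two statistics side by side.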

  15. Purpose: Variance Component Analysis • How big is the effect of the student's writing ability on the score? Source of variance = 100% • How much is the student's writing ability affected by components like task, dimension or interaction effects?
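
The questions on this slide can be read as questions about shares of total score variance. The following is a sketch only, assuming an additive random-effects model with independent effects; the notation (s for student, t for task, r for rater, d for dimension) is illustrative and does not come from the slides.

```latex
% Assumed additive model for the rating of student s on task t
% by rater r on dimension d (illustrative notation)
X_{strd} = \mu + \theta_s + \beta_t + \alpha_r + \delta_d
           + \text{(interaction terms)} + \varepsilon_{strd}

% Total score variance as a sum of variance components
\sigma^2_X = \sigma^2_s + \sigma^2_t + \sigma^2_r + \sigma^2_d
             + \sigma^2_{\text{int}} + \sigma^2_{\varepsilon}

% Share of score variance attributable to student ability
\rho_s = \frac{\sigma^2_s}{\sigma^2_X}
```

Under this reading, "Source of variance = 100%" would correspond to the ideal case in which the student-ability share equals 1 and every other component is zero.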

  16. Results: VarianceComponent Analysis

  17. Purpose: Variance Component Analysis • How big is the effect of rater severity on the score? Source of variance = 0% • Is rater severity affected by components like task, dimension or interaction effects? Variance = 0% • How big is the effect of measurement errors? (Halo effect; residual) Variance = 0%
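
How such variance shares might be estimated can be sketched for a much simpler design than the one in the presentation: a fully crossed students x raters table with one rating per cell, using the standard two-way random-effects ANOVA estimators. The task and dimension facets are left out, and the function name and toy data are hypothetical; this is not the procedure used for the Baseline 2009 data.

```python
def variance_components(scores):
    """Estimate student, rater and residual variance components.

    scores[i][j] is the rating of student i by rater j (fully crossed,
    one rating per cell). Negative estimates are truncated at zero.
    """
    n_s, n_r = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_s * n_r)
    row_means = [sum(row) / n_r for row in scores]
    col_means = [sum(row[j] for row in scores) / n_s for j in range(n_r)]

    # Mean squares for students, raters and the residual.
    ms_s = n_r * sum((m - grand) ** 2 for m in row_means) / (n_s - 1)
    ms_r = n_s * sum((m - grand) ** 2 for m in col_means) / (n_r - 1)
    ss_e = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n_s)
        for j in range(n_r)
    )
    ms_e = ss_e / ((n_s - 1) * (n_r - 1))

    comps = {
        "student": max(0.0, (ms_s - ms_e) / n_r),
        "rater": max(0.0, (ms_r - ms_e) / n_s),
        "residual": ms_e,
    }
    total = sum(comps.values())
    shares = {k: (v / total if total else 0.0) for k, v in comps.items()}
    return comps, shares

# Toy data: 3 students, 2 raters; the second rater is one band more lenient.
toy_scores = [
    [2, 3],
    [4, 5],
    [6, 7],
]
comps, shares = variance_components(toy_scores)
```

In this toy table the ratings are perfectly additive, so the residual component is zero and the variance splits between students and raters only. The slide's ideal values would correspond to a student share of 100% and rater and residual shares of 0%.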

  18. Results: VarianceComponent Analysis

  19. Individual Rater Feedback • Purpose: • To highlight effects on ratings • To start a process of self-reflexion • Individual Rater Brochure: • General explanations • Sample charts and interpretations (incl. "ideal" values) re. rater agreement and rater severity • Guiding questions to support self-reflexion • Individual results (charts) re. rater agreement and severity

  20. Rater Feedback: Rater Agreement

  21. Rater Feedback: Rater Agreement

  22. Rater Feedback: Rater Agreement

  23. Rater Feedback: Rater Leniency/Harshness

  24. Rater Feedback: Rater Leniency/Harshness

  25. Rater Feedback: Rater Leniency/Harshness

  26. Rater Feedback: Sample Texts + Individual Ratings

  27. Conclusions / Further Research • Rater Training/Rating: • Political decisions to be applied (e.g. duration of training) • Improved material for trainings • Clarifications re. rating scale (e.g. additional scale interpretations for all dimensions) • Further Research: • On all aspects of the scoring process (e.g. correlation between school type, gender, year of training, age and rater leniency) • CEF-Linking!

  28. References • Breit, S. & Schreiner, C. (Eds.) (2010). Bildungsstandards: Baseline 2009 (8. Schulstufe). Technischer Bericht. Salzburg: BIFIE. Available as download from http://www.bifie.at/buch/1056 [14 April 2011]. • Eckes, T. (2011). Introduction to Many-Facet Rasch Measurement. Frankfurt: Peter Lang. • Gassner, O., Mewald, C., Brock, R., Lackenbauer, F. & Siller, K. (to be published). Testing Writing for the E8 Standards. Technical Report 2011. Salzburg: BIFIE. • Lumley, T. (2005). Assessing Second Language Writing. The Rater's Perspective. Frankfurt: Peter Lang. • Shaw, S. D. & Weir, C. J. (2007). Examining Writing. Research and practice in assessing second language writing. Cambridge: Cambridge University Press. • Tankó, G. (2005). Into Europe. The Writing Handbook. Budapest: Teleki László Foundation.

  29. Thank you! www.bifie.at/bildungsstandards k.siller@bifie.at
