Aligning Program Goals, Instructional Practices, and Outcomes Assessment

Dr. Ray T. Clifford
BILC Conference, Budapest, 29 May 2006

What connects the instructional components that are included in this year’s conference theme?

BILC Standards-Based Projects • The BILC-developed interpretation of STANAG 6001 has been approved as an official part of that STANAG. • A BILC Working Group has prepared descriptors for optional plus levels. • A survey was conducted on the desirability of producing a BILC-sponsored STANAG 6001 “benchmark” test with advisory ratings.

Participation in the Survey • 16 countries responded to the survey: Austria, Bulgaria, Canada, Denmark, Estonia, Finland, Germany, Hungary, Italy, Latvia, Lithuania, Poland, Romania, Spain, Sweden, Turkey.

Survey Results • Would your country use a Benchmark Test if one were available? Definitely yes: 8; Probably yes: 5; Perhaps: 2; Most likely not: 0; Definitely not: 1

Survey Results • Does your country use “plus levels” when assigning STANAG ratings? Definitely yes: 3; Probably yes: 0; Perhaps: 1; Most likely not: 1; Definitely not: 11

Survey Results • Would you like to have plus levels incorporated into a Benchmark Test? Definitely yes: 5; Probably yes: 5; Perhaps: 2; Most likely not: 2; Definitely not: 2
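As a quick arithmetic check of the three survey questions, the minimal Python sketch below tallies each one. Every question should total 16 responses (one per responding country). Grouping “Definitely yes” with “Probably yes” as a positive answer is our own grouping, not a category used in the survey; on that basis 13 of 16 countries were positive about a benchmark test and 10 of 16 about including plus levels in it. The dictionary layout and the `summarize` helper are illustrative only.

```python
# Survey counts as reported on the slides (16 responding countries).
SURVEY = {
    "Would your country use a Benchmark Test if one were available?": {
        "Definitely yes": 8, "Probably yes": 5, "Perhaps": 2,
        "Most likely not": 0, "Definitely not": 1,
    },
    "Does your country use plus levels when assigning STANAG ratings?": {
        "Definitely yes": 3, "Probably yes": 0, "Perhaps": 1,
        "Most likely not": 1, "Definitely not": 11,
    },
    "Would you like to have plus levels incorporated into a Benchmark Test?": {
        "Definitely yes": 5, "Probably yes": 5, "Perhaps": 2,
        "Most likely not": 2, "Definitely not": 2,
    },
}

# Our own grouping of "positive" answers (an assumption, not a survey category).
POSITIVE = ("Definitely yes", "Probably yes")


def summarize(counts: dict[str, int]) -> tuple[int, int, float]:
    """Return (total responses, positive responses, positive share)."""
    total = sum(counts.values())
    positive = sum(counts[answer] for answer in POSITIVE)
    return total, positive, positive / total


for question, counts in SURVEY.items():
    total, positive, share = summarize(counts)
    print(f"{question}\n  {positive} of {total} positive ({share:.1%})")
```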

Summary • A “benchmark” test would be welcomed by most countries. • The scores should be advisory in nature. • Providing “plus” level ratings would allow those ratings to be used or ignored. • BILC should proceed with plans to: • Develop a benchmark STANAG test of reading comprehension. • Explore internet delivery options.
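The point that plus-level ratings could be “used or ignored” can be made concrete with a small sketch. It assumes ratings are written as strings such as "2" or "2+", and that plus levels exist for levels 0 through 4; both are assumptions for illustration, and `report_rating` is a hypothetical helper. A country that ignores plus levels simply reports the base level, which is safe because a “2+” candidate has already satisfied every level 2 requirement.

```python
# Assumed rating format: "0".."5", plus the optional plus levels "0+".."4+".
VALID_RATINGS = {str(n) for n in range(6)} | {f"{n}+" for n in range(5)}


def report_rating(rating: str, use_plus_levels: bool) -> str:
    """Return the rating as a country would report it.

    When plus levels are ignored, a plus rating collapses to its base level,
    since a "2+" candidate has already satisfied all level 2 criteria.
    """
    if rating not in VALID_RATINGS:
        raise ValueError(f"not a recognised rating: {rating!r}")
    return rating if use_plus_levels else rating.rstrip("+")


print(report_rating("2+", use_plus_levels=True))   # 2+
print(report_rating("2+", use_plus_levels=False))  # 2
print(report_rating("3", use_plus_levels=False))   # 3
```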

ACT will Assist with Funding • ACT has approved funding to support the development of the BILC Advisory Test (Reading): • Part-time project coordinator. • Computer programming and server support. • Travel expenses for the next meeting of the Test Working Group. • Work is underway. • Test specifications have been completed. • Texts and items are being reviewed.

A Comparison of Testing Standards • STANAG 6001 • The Common European Framework of Reference for Languages: Learning, teaching, assessment

Every Performance Standard has Three Essential Components • Task • A statement of what is to be done or accomplished. • Conditions • A description of the conditions under which (or context in which) the task is to be performed. • For language, this includes the topics to be addressed. • Accuracy • A definition of how well the task must be performed under the conditions stated.
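The three-part structure can be written down as a small data type. In this sketch (Python dataclasses; the class and field names are our own, not taken from STANAG 6001 or the CEF), a judgment records whether each component was satisfied, and the standard counts as met only when all three are. The example values are the level 2 speaking entries from the STANAG 6001 summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceStandard:
    task: str        # what is to be done or accomplished
    conditions: str  # the context of the task; for language, the topics addressed
    accuracy: str    # how well the task must be performed under those conditions


@dataclass(frozen=True)
class Judgment:
    """A rater's (hypothetical) verdict on one performance against one standard."""
    standard: PerformanceStandard
    task_done: bool
    conditions_met: bool
    accuracy_met: bool

    def meets_standard(self) -> bool:
        # No component can compensate for another: all three must be satisfied.
        return self.task_done and self.conditions_met and self.accuracy_met


level_2 = PerformanceStandard(
    task="Narrate, describe, give directions",
    conditions="Concrete, real-world, factual",
    accuracy="Intelligible even if not used to dealing with non-NS",
)
print(Judgment(level_2, True, True, False).meets_standard())  # False
print(Judgment(level_2, True, True, True).meets_standard())   # True
```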

STANAG 6001 - Speaking (Summarized) as a Standard
LEVEL | TASKS | CONTEXT/TOPICS | ACCURACY
5 | All expected of an educated NS | All subjects | Accepted as an educated NS
4 | Tailor language, counsel, motivate, persuade, negotiate | Wide range of professional needs | Extensive, precise, and appropriate
3 | Support opinions, hypothesize, explain, deal with unfamiliar topics | Practical, abstract, special interests | Errors never interfere with communication & rarely disturb
2 | Narrate, describe, give directions | Concrete, real-world, factual | Intelligible even if not used to dealing with non-NS
1 | Q & A, create with the language | Everyday survival | Intelligible with effort or practice
0 | Use memorized phrases | Random | Unintelligible

STANAG 6001 Scale Validation Exercise • Conducted at Sofia, Bulgaria, 13 October 2005

Instructions • At the top of a blank piece of paper, write the following information: • Your current work assignment: Teacher, Tester, Administrator, Other______ • Your first (or dominant) language: _________ • You do not need to write your name!

Instructions • Next, write the numbers: 0 1 2 3 4 5 down the left side of the paper.

Instructions • You will now be shown 6 descriptions of language speaking proficiency. • Each description will be labeled with a color.

Instructions • Rank the descriptions according to their level of difficulty by writing their color designation next to the appropriate number: 0 (easiest) = Color ? 1 (next easiest) = Color ? 2 (next easiest) = Color ? 3 (next easiest) = Color ? 4 (next easiest) = Color ? 5 (most difficult) = Color ?

Ready? • The descriptions will now be presented… • One at a time, • In a random sequence, • For 15 seconds each. • You will see each of the descriptors 4 times. • Thank you for participating in this experiment.
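The slides do not say how the submitted rankings were scored. One common way to compare a submitted ordering with the intended one is Spearman's rank correlation, which for rankings without ties is rho = 1 - 6·Σd² / (n(n² - 1)), where d is each item's difference in rank. The sketch below assumes that method and uses hypothetical color labels; it is not the analysis actually used in the exercise.

```python
def spearman_rho(intended: list[str], submitted: list[str]) -> float:
    """Spearman rank correlation between two orderings of the same items (no ties)."""
    if sorted(intended) != sorted(submitted) or len(set(intended)) != len(intended):
        raise ValueError("orderings must contain the same distinct items")
    n = len(intended)
    rank_given = {item: rank for rank, item in enumerate(submitted)}
    d_squared = sum((rank - rank_given[item]) ** 2 for rank, item in enumerate(intended))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def exact_matches(intended: list[str], submitted: list[str]) -> int:
    """Number of descriptions placed at exactly the intended level."""
    return sum(a == b for a, b in zip(intended, submitted))


# Hypothetical color labels, listed from level 0 (easiest) to level 5 (most difficult).
intended = ["red", "orange", "yellow", "green", "blue", "violet"]
# A participant who swapped the level 1 and level 2 descriptions.
submitted = ["red", "yellow", "orange", "green", "blue", "violet"]

print(round(spearman_rho(intended, submitted), 3))  # 0.943
print(exact_matches(intended, submitted))           # 4
```

A perfect ranking scores 1.0 and a fully reversed one scores -1.0, so the statistic rewards near-misses more than a simple count of exact matches does.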

STANAG 6001 Scale Validation: A Timed Exercise Without Training • 74 people turned in their rankings. • They marked their current work assignments as: • Administrator 49 • Teacher 26 • Tester 19 • Other 1

Results of the STANAG Scale Validation (n = 74)

The CEF can also be presented as a standard by dividing each of the descriptions into the three components of… • Task(s) • Conditions/Topics • Accuracy expectations

CEF: OVERALL ORAL PRODUCTION (CEF, p. 58)

Why are topics not specified at the higher ability levels? • The CEF manual gives the answers…

Ambiguity of Expectations • Three types of “proficiency” are recognized in CEF “communicative testing” (pp. 180 and 184): • “Emerging competence” in relevant situations. • Competence on tasks in a “relevant syllabus”. • “The generalisable competencies” evidenced by a candidate’s overall performance. • For STANAG 6001, only the last type, generalisable competence, is considered “proficiency”.

Ambiguity of Expectations • The CEF acknowledges a third, “blended” category between achievement and real-world proficiency, but does not label it (p. 184). • STANAG tester training documents label this “in-between” category “rehearsed performance” or “pro-chievement” ability, to distinguish it from unrehearsed, general ability.

Some Other Examples • “Table 2. Common Reference Levels: Self Assessment” (CEF p. 24) • Contains almost no accuracy statements. • “Table 3. Common Reference Levels: qualitative aspects of spoken language use” (CEF pp. 28 and 29) • Contains accuracy statements not only under the column labeled “ACCURACY”, but also interwoven in the descriptions found under the columns labeled “RANGE”, “FLUENCY”, “INTERACTION”, and “COHERENCE”.

Why not combine two CEF scales to match the “standard” format? • This evidently creates too many rating options for the CEF developers. • However, every testing system should decide how to deal with the complexity of the interactions between two factors: • The difficulty of the Communication Tasks tested. • The varying levels of competency demonstrated by the test candidates.

Example #1 • Consider, for instance, the combination of the CEF “Overall Oral Production” scale and the “General Linguistic Range” scale (pp. 58 and 110). • The “Overall Oral Production” scale has 7 defined levels. • The “General Linguistic Range” scale has 9 defined levels. • Combined, they could yield 63 (7 × 9) different rating combinations.

Options for Reducing Complexity • Select a progressive subset of the possible combinations as major progress milestones. • Or conclude, as the CEF does, that… • It is not “practical” to “use all the scales at all levels” (p. 192). • The test rating criteria should be linked to the learner’s textbook and defined by criteria that are appropriate to the “requirements of the assessment task concerned” (p. 193).

The CEF Approach to Handling Language Complexity • Therefore, the CEF suggests… • “Features need to be combined, renamed and reduced into a smaller set of assessment criteria appropriate to the needs of learners”. (p. 193) [Emphasis added] • Test rating criteria should be restricted to those criteria that are appropriate to the “style of the pedagogic culture concerned”. (p. 193) [Emphasis added]

Example #2 • Compare this approach with how STANAG 6001 deals with rating complexity. • 6 task levels. • 6 content levels. • 6 accuracy levels. • Combined, they could yield 216 (6 × 6 × 6) different rating combinations.
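The rating counts in Examples #1 and #2 are sizes of Cartesian products: every level of one scale paired with every level of the others. A minimal sketch using Python's `itertools.product`, with level numbers standing in for the actual descriptors:

```python
from itertools import product

# Example #1: CEF "Overall Oral Production" (7 levels) x "General Linguistic Range" (9 levels).
cef_pairs = list(product(range(7), range(9)))

# Example #2: STANAG 6001 task, content, and accuracy scales, 6 levels each (0-5).
stanag_triples = list(product(range(6), repeat=3))

print(len(cef_pairs))       # 63
print(len(stanag_triples))  # 216
```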

STANAG 6001 Approach to Handling Language Complexity • Therefore, STANAG 6001… • Combined, renamed, and reduced features into a smaller set of assessment criteria appropriate to the needs of employers. • Reduced rating complexity by aligning each task level with a correspondingly broader range of content areas and a correspondingly higher level of accuracy. • Stipulated that (as with other performance standards) all of the task, condition, and accuracy statements for a given level must be satisfied before proficiency at that level can be awarded.
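The alignment and the all-criteria rule can be sketched together. Aligning the three scales keeps only the 6 “diagonal” profiles (task, content, and accuracy at the same level) out of the 216 possible combinations. The rating function assumes the hierarchy is cumulative, so it scans upward from level 1 and stops at the first level whose task, conditions, and accuracy are not all satisfied; level 0 is the floor. The input format and the function name `assign_level` are hypothetical, not part of STANAG 6001.

```python
from itertools import product

LEVELS = range(6)  # STANAG 6001 levels 0-5

# Only profiles where task, content, and accuracy sit at the same level are rated.
aligned = [p for p in product(LEVELS, repeat=3) if p[0] == p[1] == p[2]]
print(len(aligned))  # 6

# Summarized speaking tasks per level, from the STANAG 6001 summary table.
TASKS = {
    0: "Use memorized phrases",
    1: "Q & A, create with the language",
    2: "Narrate, describe, give directions",
    3: "Support opinions, hypothesize, explain, deal with unfamiliar topics",
    4: "Tailor language, counsel, motivate, persuade, negotiate",
    5: "All expected of an educated NS",
}


def assign_level(judgments: dict[int, tuple[bool, bool, bool]]) -> int:
    """Return the highest level whose (task, conditions, accuracy) are all met.

    `judgments` maps a level to a rater's three yes/no verdicts (hypothetical
    input format). Levels are checked in order; the first level not fully
    satisfied ends the search, so a strength higher up cannot offset a gap below.
    """
    awarded = 0
    for level in range(1, 6):
        if not all(judgments.get(level, (False, False, False))):
            break
        awarded = level
    return awarded


candidate = {
    1: (True, True, True),
    2: (True, True, True),
    3: (True, True, False),  # level 3 tasks and topics handled, but accuracy not sustained
    4: (True, False, False),
}
level = assign_level(candidate)
print(level, "-", TASKS[level])  # 2 - Narrate, describe, give directions
```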

Technically, STANAG 6001 also adheres to the recommendations of the CEF, because it… • Is a “metasystem”. (CEF, pp. 192 - 196) • Has combined features to create a reduced set of assessment criteria… • That match the tasks being assessed. (CEF, p. 193) • With between 4 and 7 rating levels. (CEF, p. 193) • Meets the needs of “employers” by testing for generalisable, real-world proficiency. (CEF, p. 183)

STANAG 6001 Diverges from the recommendations of the CEF, because it… • Assigns ratings based on employment needs without considering “the needs of the learners concerned” or “the style of the pedagogic culture concerned.” (CEF, p. 193) • Uses criterion-referenced grading of a “task, topics, and accuracy” hierarchy – rather than a norm-referenced scalar analysis. (CEF, p. 185) • Rates ability based on one’s unrehearsed, real-world proficiency in the language being tested.
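The criterion-referenced versus norm-referenced distinction can be shown in a few lines. In this sketch (hypothetical cohorts, with levels standing in for ratings), a candidate whose highest fully satisfied level is 2 keeps that criterion-referenced rating in any cohort, while a norm-referenced percentile rank for the same performance changes with the group being compared. The `percentile_rank` helper is illustrative, not a method prescribed by either framework.

```python
def percentile_rank(value: int, cohort: list[int]) -> float:
    """Norm-referenced view: percentage of the cohort scoring strictly below `value`."""
    return 100 * sum(v < value for v in cohort) / len(cohort)


# The candidate's criterion-referenced rating: the highest level whose task,
# topic, and accuracy criteria were all met. It does not depend on other test takers.
candidate_level = 2

cohorts = {
    "cohort A": [1, 1, 2, 3, 3],  # hypothetical ratings of other candidates
    "cohort B": [0, 1, 1, 1, 2],
}
for name, cohort in cohorts.items():
    print(f"{name}: criterion rating {candidate_level}, "
          f"percentile rank {percentile_rank(candidate_level, cohort):.0f}")
```

The same level 2 performance sits at the 40th percentile in cohort A and the 80th in cohort B, while its criterion-referenced rating stays 2 in both.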

A Summary of the Major Contrasts
STANAG 6001 • The primary purpose is to test individuals’ general proficiency across a wide range of topics regardless of their course of study. • The primary users of the information are employers and administrators. • By design, STANAG 6001 is under-specified for measuring step-by-step progress within a specific curriculum.
CE Framework of Reference • The primary purpose is to check learners’ progress in developing communicative competence within a specific course of study. • The primary users of the information are the teachers and students. • By design, the CE Framework of Reference is under-specified for testing of general, real-world proficiency.

These contrasts are not a problem! • No single test or testing framework can meet both the formative needs of learners and the summative needs of employers, so… • Use the CE Framework of Reference for designing curriculum-appropriate achievement and performance tests. • Use the STANAG 6001 assessment as a culminating, independent measure of graduates’ general, real-world ability.

STANAG 6001 Proficiency Scale (figure, levels 1 to 5) • STANAG 6001 focus: “General, unrehearsed, real-world proficiency” • CEF focus: “Competence in a relevant syllabus” • “Emerging competency”

What happens when you compare rehearsed performance ratings with unrehearsed proficiency ratings? • Those who can pass an unrehearsed, general proficiency test can also pass a curriculum-based performance test. • Those who can pass a rehearsed performance test may or may not be able to pass a general, unrehearsed proficiency test.

Conclusion • “The solutions to our problems should be as simple as possible, but no simpler.” Albert Einstein • Language tests should match the purpose for which the results will be used. • Use achievement tests for testing mastery of lessons in a textbook. • Use performance tests for checking rehearsed abilities. • Use proficiency tests for determining general, unrehearsed ability in real-world situations.
