Designing LSP tests

Designing LSP tests: Pointers for good practice
LSP Testing: Good Practice Procedure

Executive summary
Ideally, a test is written by a small team of writers and reviewers. Subject specialists should be consulted for specific knowledge. Before a test is ready for administration, it passes through four phases of development, during which feasibility, validity, authenticity and reliability are determined. When composing an LSP test, it is important to make sure that the skill-based tasks are representative of professional practice. Test tasks are ideally composed of a rubric and subject-specific input materials, which contribute to authenticity. Quality control is an essential step in the writing process, and a number of qualitative and quantitative procedures are available for it.

Contents
• Test writing
• Test timing
• Test content
• Test tasks
• Test analysis
• Further reading

Test writing

The people involved in test development:
• Test writers: ideally, two or more people are responsible for designing, writing and revising the test.
• Test reviewers: one or two reviewers give feedback to the writers.
• Subject specialists: people who work in the LSP subject field give input concerning test content and goals.
• Representative end users: people who resemble the actual test-taking population as closely as possible make up the sample population in test piloting.

Test timing

Overview of the four phases:
Phase 1: Planning
• Test writers decide on test goals, test format and test content.
Phase 2: Design
• Test writers collect material and compose a first draft.
• Test reviewers evaluate and rewrite the draft.
Phase 3: Development
• The final draft is piloted with a group of representative test takers.
• The final draft is adjusted based on qualitative and quantitative conclusions from the pilot.
Phase 4: Live test
• The test is ready for use.

Phase 1: Planning
Test writers decide on test goals, test format and test content.
• Goal: Ideally, the test goals should be linked to professional reality. To ensure this, subject specialists can be consulted; getting an idea of their routine language tasks may help in drawing up the test goals. Research has shown that non-representative tasks cause irritation or uncertainty among test takers.
• Format: Although computer-based tests are as reliable as paper-based ones, their limitations and possibilities differ from those of their paper-based counterparts.
• Task type: Depending on the language level, the test goals, the format and the time available, various task types can be used.

Phase 2: Design
Test writers collect material and compose a first draft. Test reviewers evaluate and rewrite the draft.
• Collecting material: In collaboration with subject specialists, the writers collect authentic and representative material.
• First draft: Ideally, the first draft contains a large number of tasks and task types, so that the reviewers can select which tasks are carried forward to the next phase.
• Evaluation: The reviewers compare the draft to the intended test goals. They check the test for validity (i.e. do we test what we want to test?) and authenticity (i.e. does this test represent realistic interactions and situations?) and suggest revisions.

Phase 3: Development
The final test draft is now piloted: a group of representative end users take the test in conditions similar or identical to the live test setting. Ideally, 30 to 50 respondents are used. This pilot will offer information concerning the test's:
- authenticity: to what extent does the test include situations and interactions that are meaningful or representative for the test taker?
- validity: to what extent does the test measure what it is meant to measure?
- reliability: to what extent do test scores reflect actual language ability?
- feasibility: this concerns practicalities such as timing and rating.
Note: If representative end users cannot be reached, a group of colleagues can be used instead. In this case, the test’s reliability cannot be determined, but the feedback will be useful nonetheless.

Phase 4: Live test
When the test has been adjusted based on the conclusions from the third phase, it can move on to the final phase: live testing. The test is ready for use. If any remarks still arise during administration, they are reported to the test writers, who keep them in mind for later versions.

Test content

The specificity of content in LSP tests is the subject of much debate among researchers. One side of the spectrum states that both content and tasks cannot be too specific, whereas the other extreme argues that LSP testing does not make sense, since all language has a specific purpose.
Example: A test of English for biomedical science should not use the same material as a test of English for the humanities, even if the required language proficiency is identical.
Both sides of the debate do agree on the importance of face validity (i.e. how test takers perceive a test and how representative they feel it is). Interviews with representative end users also show the importance of using familiar material dealing with familiar topics.

Example: Whereas writing a reflective essay might be a representative task within the humanities, it is alien to the biomedical sciences. Test takers will respond negatively to tasks they perceive as non-representative.
For skill-based exercises, the importance of authenticity cannot be overstated. Make sure that both the content and the context of the task relate to the professional reality of the specific purpose.
• Task content: Ask the students to write, speak or read about something within their professional field of expertise.
• Task context: Clarify as accurately as possible the context in which a communicative act takes place. If you ask students to present at a conference, state which one and give ample information regarding the setting and audience.

For knowledge-related exercises, such as grammar or vocabulary exercises, the context does not appear to matter as much as for skill-based tasks. Face validity is important here as well, though: the texts, examples and stimuli should be related to the test takers’ field of expertise.
Quote: “LSP testing cannot be about testing for subject specific knowledge. It must be about testing for the ability / abilities to manipulate language functions appropriately in a wide variety of ways. […] No doubt for face validity reasons, the stimuli in such tests will be field related, however.” (Davies, 2001)

The role of subject specialists
Using the expertise of subject specialists is a much-contested theme in LSP testing research. In any case, when designing a test for specific purposes in a field outside your expertise, it is always useful to get in touch with people who are familiar with the specific purpose. They will be able to tell you which tasks and texts are representative and which are not.

Test tasks

Task types
There is a myriad of possible task types that can be used in LSP testing. Since this overview is restricted to online testing with Curios, however, the next pages will only cover those task types that are available through Curios. To ensure task completeness, Bachman & Palmer (1996) and Douglas (2001) suggest including the following elements in each task.


Bachman & Palmer (1996) suggest a framework of analysis which considers the following:
Also according to Bachman & Palmer (1996), the rater’s manual should include at least the following:
Example 3: assessment criteria (taken from TOEFL iBT)

Task types and Curios
Curios is the Ghent University online testing environment. It can be accessed through Minerva and Zephyr and allows for the following multiple choice task types:
• Single response: A multiple choice task where only one option can be selected.
• Multiple response: A multiple choice task where more than one option can be selected.
• True/False: The student is given a statement and should indicate whether it is correct or not.
• Matching: A multiple choice task in which the test taker combines two or more items.
• Hotspot: The test taker digitally pinpoints or highlights areas on a picture or in a text.

Curios also allows for the following open answer task types:
• Text/numeric: As an answer to a question, students fill in words, short sentences or numbers.
• Cloze: In a running text, one or more words or numbers have been deleted. It is up to the test taker to fill in the gaps.
• C-Cloze: A cloze test which includes the first letter of each deleted word.
• Extended: Students can be asked to reply to an open question or to produce longer answers.
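The difference between a cloze and a C-cloze item can be illustrated with a short Python sketch. This is not how Curios builds its items: the function name make_cloze, the fixed deletion interval and the sample sentence are all assumptions made for illustration.

```python
import re

def make_cloze(text, every=5, keep_first_letter=False):
    """Replace every n-th word with a gap (hypothetical helper).

    With keep_first_letter=True the gap keeps the first letter of the
    deleted word, which turns the cloze into a C-cloze.
    Returns the gapped text and the list of deleted words (the key).
    """
    gapped, answers = [], []
    for position, token in enumerate(text.split(), start=1):
        # Separate surrounding punctuation from the word itself.
        match = re.match(r"^(\W*)(\w+)(\W*)$", token)
        if position % every == 0 and match:
            lead, word, trail = match.groups()
            hint = word[0] if keep_first_letter else ""
            gapped.append(f"{lead}{hint}____{trail}")
            answers.append(word)
        else:
            # Tokens that are not plain words (e.g. hyphenated) stay intact.
            gapped.append(token)
    return " ".join(gapped), answers

# Invented sample sentence for a veterinary English test.
sample = "The vet examined the dog and prescribed a short course of antibiotics."

cloze_text, cloze_key = make_cloze(sample, every=5)
c_cloze_text, c_cloze_key = make_cloze(sample, every=5, keep_first_letter=True)

print(cloze_text)
print(c_cloze_text)
print(cloze_key)
```

In practice a test writer would usually choose the gaps by hand so that they target field-specific vocabulary; the fixed interval is used here only to keep the sketch small.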

Getting started with Curios
Step 1: Accessing Curios. Access Curios via Zephyr or Minerva.
Step 2: “Nieuwe vragenreeks” (new question series). Each new test starts with this.
Step 3a: “Nieuwe vraag” (new question), multiple choice. Creating a multiple choice question.
Step 3b: “Nieuwe vraag” (new question), cloze. Creating a cloze question.

Step 4: Double-checking the scoring. Always double-check the questions using “geavanceerde editeermethode” (advanced editing method).
Step 5: Publishing the test. Students can only access tests through Minerva or Zephyr.
Step 6: Taking the test. Try a test before making it public.
Step 7: Checking the results. Have a look at the scores.

Test analysis

• Determining the quality of a language test should take place in the third phase of development, but it should also be an ongoing concern of test developers.
• When the test has been piloted, qualitative and quantitative analyses can help to improve the reliability and validity of the live test.
• In the case of LSP tests, three concepts are of vital importance:
- Reliability: reliable scores reflect one’s ability.
- Validity: valid questions test what is intended to be tested.
- Authenticity: authentic tests reflect real-life interactions and situations.

Reliability
An efficient way to check a test’s reliability is to perform an Item Reliability Analysis. This statistical procedure indicates the discriminating potential of a test item. In other words, it checks to what degree more able students get a hard item right and less able students do not. The graph on the right shows the statistical data resulting from a reliability analysis. The column showing the Corrected Item-Total Correlation indicates the reliability of each item. Items scoring between -.3 and .3 are considered unreliable and should be removed or rewritten.

Reliability: How to perform an Item Reliability Analysis
Enter the results of all test takers on all test items in SPSS (available on Athena). The easiest way to do this is by assigning a score of 1 to a correctly answered question and 0 to an incorrect answer. Next, click Analyze > Scale > Reliability Analysis and select the items you want to analyse. Do not forget to check the box labelled “scale if item deleted”.
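For readers without access to SPSS, the corrected item-total correlation can be approximated with a few lines of Python. The sketch below is not the SPSS procedure itself. It assumes the 1/0 scoring described above, uses an invented pilot data set, and correlates each item with the total score on the remaining items. The names corrected_item_total and flag_items are hypothetical, and the .3 cut-off follows the rule of thumb given earlier.

```python
def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if spread_x == 0 or spread_y == 0:
        # An item everybody gets right (or wrong) cannot discriminate.
        return 0.0
    return cov / (spread_x * spread_y)

def corrected_item_total(scores):
    """scores: one row per test taker, one 0/1 entry per item.

    For each item, correlate the item score with the total score on
    all OTHER items (hence "corrected").
    """
    n_items = len(scores[0])
    correlations = []
    for i in range(n_items):
        item = [row[i] for row in scores]
        rest = [sum(row) - row[i] for row in scores]
        correlations.append(pearson(item, rest))
    return correlations

def flag_items(correlations, threshold=0.3):
    """Indices of items to review. Assumption of this sketch: items
    below -threshold are flagged too, not only those in the
    -.3 to .3 band."""
    return [i for i, r in enumerate(correlations) if r < threshold]

# Invented pilot data: 6 test takers, 4 items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

correlations = corrected_item_total(scores)
for index, r in enumerate(correlations, start=1):
    print(f"item {index}: {r:.2f}")
print("items to review:", [i + 1 for i in flag_items(correlations)])
```

With only six invented test takers the figures are purely illustrative; a real pilot would use the 30 to 50 respondents recommended for the development phase.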

Validity
• Validity is the extent to which scores on a test enable inferences to be made which are appropriate, meaningful and useful, given the purpose of the test (i.e. does the test measure what it intends to measure?).
• There are various subclassifications of validity. The most significant ones in this LSP testing project are:
- Construct validity: A test has construct validity if scores reflect a theory about a construct. It could be predicted, for example, that two valid tests of listening comprehension would rank learners in the same way, but each would have a weaker relationship with scores on a test of grammatical competence.
- Content validity: A test is said to have content validity if the items or tasks of which it is made up constitute a representative sample of items or tasks for the area of knowledge or ability to be tested. These are often related to a syllabus or course.
- Face validity: This refers to the extent to which a test appears to candidates, or those choosing it on behalf of candidates, to be an acceptable measure of the ability they wish to measure. This is a subjective judgement rather than one based on any objective analysis of the test.

• Determining construct validity implies a thorough knowledge of the construct to be tested.
• A construct is a theoretical concept related to linguistic knowledge, e.g. listening comprehension, metacognition or pragmatic competence.
• Example: If you mean to test listening skills and ask students to write an essay about an audio sample, are you then testing receptive or productive skills?
• Example: If you ask students to type an essay within thirty minutes, are you then testing writing skills or typing speed?
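The construct validity prediction, that two valid listening tests rank learners in the same way while each relates more weakly to a grammar test, can be checked with a rank correlation. The Python sketch below uses Spearman's rho as one possible measure; the source does not name a specific statistic. All scores are invented and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (lowest) upwards; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (spread_x * spread_y)

# Invented scores of six learners on three tests.
listening_a = [12, 15, 9, 18, 14, 7]
listening_b = [11, 16, 10, 17, 13, 8]
grammar = [14, 9, 12, 13, 8, 15]

rho_listening = spearman(listening_a, listening_b)
rho_grammar = spearman(listening_a, grammar)

print(f"listening A vs listening B: {rho_listening:.2f}")
print(f"listening A vs grammar:     {rho_grammar:.2f}")
```

A clearly higher correlation between the two listening tests than between a listening test and the grammar test is in line with the prediction; it does not by itself prove construct validity.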

The most effective way of determining an LSP test’s content validity is to interview subject specialists. Various interview types can be used to determine whether the test tasks correspond to reality. Note that the interviewer should get the chance to practise his/her interview skills beforehand. Ideally, the pilot setting will resemble the actual conditions as closely as possible.

Since face validity is a subjective measure imposed by the test takers, only test takers can judge it. During or after the pilot test, ask the representative end users to give a verbal report. There are various ways of going about this.

Authenticity
• If you have interviewed subject specialists and representative end users have given a verbal report, you will also have a good understanding of a test’s situational authenticity (the extent to which a test / task represents real situations) and interactional authenticity (the extent to which a test / task represents realistic conversational interactions).
• Note that material intended for L1 users does not always appear interactionally authentic to test takers, since it may depict interactions that are meaningless to them.
• Example: The abovementioned example from the OET of English for veterinary sciences is not representative of the professional practice of researchers within the field of veterinary sciences. A doctor-patient dialogue is, however, very relevant for students who would like to start working in a practice later.

Further reading

For more info on LSP testing, please consult:
• ABRASKEVICIUTE, Ausra et al. (2003). Handbook of LSP Examinations. Tut Press.
• BACHMAN, Lyle F. (2000). “Modern Language Testing at the Turn of the Century: assuring that what we count counts”. Language Testing. 17(1)
• BROADFOOT, Patricia and Paul BLACK. (2004). “Redefining Assessment? The first ten years of Assessment in Education”. Assessment in Education. 11(1)
• CLAPHAM, C. (2000). “Assessment for academic purposes: where next?”. System. 28(4)
• DAVIES, A. (2001). “The logic of testing Languages for Specific Purposes”. Language Testing. 18(2)
• DOUGLAS, Dan. (2001). “Language for Specific Purposes assessment criteria: where do they come from?”. Language Testing. 18(2)
• DOUGLAS, Dan. (2000). Assessing Languages for Specific Purposes. Cambridge University Press.
• DOVEY, T. (2006). “What purposes, specifically? Re-thinking purposes and specificity in the context of the ‘new vocationalism’”. English for Specific Purposes. 25(4)
• ROEVER, Carsten. (2001). “Web-Based Language Testing”. Language Learning & Technology. 5(2)
• HYLAND, K. (2002). “Specificity revisited: how far should we go now?”. English for Specific Purposes. 21(4)
