
The Item Bias Detection of the Reading Tests and the Development of the Item Bank Software


Presentation Transcript


  1. The Item Bias Detection of the Reading Tests and the Development of the Item Bank Software Used for English Reading Courses at King Mongkut’s University of Technology North Bangkok Supalak Nakhornsri, Ph.D. supalak_nak@yahoo.com Department of Languages, King Mongkut’s University of Technology North Bangkok, Thailand

  2. Outline of the Presentation • Background of the study • Objectives • Research Questions • Definitions of Terms • Methodology • Test Development • Item Bias Detection • Results • Item Bank Software • Conclusion

  3. Background of the Study • Trying out a test • The results can be used to make changes to improve the potential usefulness of the test. (Bachman, 2004) • For classroom teachers, it is not advisable to try out a test with the same students who will take it. (Bachman, 2004)

  4. Background of the Study • Item analysis: the essential part of analyzing the results of tests to improve their usefulness (Brown, 2004) • Classical Item Analysis (IA): calculating descriptive statistics for individual items • Item Response Theory (IRT): a more sophisticated procedure for estimating the statistical characteristics of items (Bachman, 2004)
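
As a rough illustration of the classical side of this contrast, here is a minimal sketch that computes item difficulty (the p-value, or proportion correct) and a point-biserial discrimination index from a 0/1 response matrix. The function name and example data are hypothetical, not taken from the study.

```python
# Minimal sketch of classical item analysis (IA) on a 0/1 response matrix.
# Hypothetical data and names; for illustration only.
import numpy as np

def classical_item_analysis(responses: np.ndarray):
    """responses: (n_examinees, n_items) matrix of 0/1 item scores."""
    totals = responses.sum(axis=1)           # total score per examinee
    difficulty = responses.mean(axis=0)      # p-value: proportion correct per item
    # Point-biserial correlation of each item with the total score,
    # a common classical discrimination index.
    discrimination = np.array([
        np.corrcoef(responses[:, j], totals)[0, 1]
        for j in range(responses.shape[1])
    ])
    return difficulty, discrimination

# Example: 5 examinees, 4 items
data = np.array([[1, 0, 1, 1],
                 [1, 1, 0, 1],
                 [0, 0, 1, 0],
                 [1, 1, 1, 1],
                 [0, 1, 0, 1]])
p, r = classical_item_analysis(data)
print("difficulty:", p, "discrimination:", r)
```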

  5. Background of the Study • IA vs. IRT • IA • Subpopulation dependent • Implicitly averaged across all ability levels (Bachman, 1990) • IRT • Independent of the group of examinees used • Test-taker ability estimated independently of the particular items used • Known precision of ability estimates

  6. Background of the Study • Bias vs. Unfairness • Unfairness: concerns the conditions of test administration • Bias: a type of invalidity (Shepard, 1982)

  7. Background of the Study • Differences among three methods • Delta-plot (IA): transforming item difficulty into Delta values and plotting the values onto the bivariate graph (Angoff, 1982) • Chi-square (IA): measuring the frequency of correct or incorrect responses in each ability group (Shepard, 1982) • Three-parameter model (IRT): detecting the presence of some characteristic of an item that results in differential performance for individuals of the same ability but from different ethnic, sex, cultural, or religious groups (Hambleton et al., 1991)

  8. Background of the Study • Delta-plot
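
For reference, the delta transformation behind this method (the figure is not reproduced in the transcript) converts each group’s proportion-correct value for an item into a normal-deviate scale. Assuming Angoff’s standard scaling (mean 13, standard deviation 4), it can be written as:

Δ_j = 4·Φ⁻¹(1 − p_j) + 13

where Φ⁻¹ is the inverse standard normal distribution function and p_j is the proportion of a group answering item j correctly. Each item is then plotted as a point whose coordinates are its delta values in the two groups, and items lying far from the major axis of the resulting scatter are flagged as potentially biased.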

  9. Background of the Study • Chi-square
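
A minimal sketch of this idea, assuming a simplified Scheuneman-style procedure: examinees are stratified into ability levels by total score, and within each level a 2×2 chi-square test compares correct/incorrect frequencies across the two groups. The function and variable names, and the three-level split, are assumptions for illustration.

```python
# Simplified chi-square DIF sketch (in the spirit of Scheuneman, 1979).
# Hypothetical names; the three-level split is an assumed choice.
import numpy as np
from scipy.stats import chi2_contingency

def chi_square_dif(item, totals, group, n_levels=3):
    """item: 0/1 responses to one item; totals: total test scores;
    group: 0/1 group membership (e.g., gender), one entry per examinee."""
    # Assign each examinee to an ability stratum by total-score quantile.
    inner_edges = np.quantile(totals, np.linspace(0, 1, n_levels + 1))[1:-1]
    strata = np.digitize(totals, inner_edges)
    results = []
    for s in range(n_levels):
        in_stratum = strata == s
        # 2x2 table: group (rows) vs. correct/incorrect on the item (columns).
        table = np.array([
            [np.sum((group == g) & in_stratum & (item == 1)),
             np.sum((group == g) & in_stratum & (item == 0))]
            for g in (0, 1)
        ])
        if table.min() > 0:          # skip strata too sparse to test
            chi2, p, _, _ = chi2_contingency(table)
            results.append((s, chi2, p))
    return results  # a large chi2 / small p in a stratum suggests DIF
```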

  10. Background of the Study • IRT: Three-parameter model
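
The figure is not reproduced in the transcript, but the three-parameter logistic (3PL) model it refers to is conventionally written (Hambleton et al., 1991) as:

P_j(θ) = c_j + (1 − c_j) / (1 + e^(−D·a_j·(θ − b_j)))

where a_j is the discrimination parameter, b_j the difficulty parameter, c_j the guessing (lower-asymptote) parameter, and D ≈ 1.7 is a scaling constant that makes the logistic curve approximate the normal ogive.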

  11. Background of the Study • Sources of Bias • Gender • In some languages there are differences between the vocabulary items used by women and men. • Women tend to use more of the standard forms than men do. (Holmes, 2001)

  12. Background of the Study • Background Knowledge • Schema Theory • Readers can fully comprehend what they read when their schema, or prior knowledge, is activated by the words on the page. (Chandavimol, 1998)

  13. Background of the Study • Sources of Bias Studied • Gender: male and female • Prior knowledge: the Faculty in which the students are studying

  14. Background of the Study • Problem of the Assessment of Reading Courses • Reading I is an elective course for second- to fourth-year undergraduate students at King Mongkut’s University of Technology North Bangkok. • The same reading tests are administered to all students to evaluate their achievement; students’ grades therefore depend on both the midterm and final tests. • The quality of the tests consequently needs to be evaluated: although both classical IA and IRT can help analyze the quality of test items, the issue of test bias cannot be neglected, since students with different background knowledge, or even of different genders, may perform differently on each test item.

  15. Research Questions • 1. Are there any biased items in the reading tests when comparisons are made for Faculties and genders? • 2. What is the quality of the reading tests after being analyzed by the statistical procedures (classical item analysis and Item Response Theory)? • 3. What is the efficiency of the item bank software reported by the English instructors (both native and non-native)?

  16. Research Objectives • 1. To detect item bias in the reading tests when comparisons are made for Faculties (Engineering, Applied Science, Technical Education and Industrial Technology and Management) and genders (males and females); • 2. To analyze the test items in the reading tests used for the English Reading course; • 3. To develop the item bank software for managing the test items.

  17. Significance Analyzing test quality with both classical IA and IRT provides the following advantages. 1. Diagnostic feedback can be given to test takers on how they performed on individual test tasks. 2. The findings can provide feedback to teachers and course developers relevant to the improvement of instruction. 3. Test developers and test writers will obtain feedback to improve the usefulness of the test, enabling the test developer to: 3.1 control the characteristics of the total score distribution, specifically the level of difficulty of the test and the dispersion of test scores; 3.2 increase the internal-consistency reliability of the test; and 3.3 diagnose why items fail to function appropriately.

  18. Significance Since IRT offers advantages over classical IA, the results obtained from IRT analysis have the following strengths. 1. Item parameter estimates are independent of the group of examinees used. 2. Test takers’ ability estimates are independent of the particular set of test items used. 3. The precision of ability estimates is known. In terms of the development of the item bank software, the following benefits will be gained. 1. Test developers can easily build tests to measure objectives of interest. 2. Test developers, within the limits of an item bank, can produce tests with the desired number of test items per objective. 3. If item banks consist of content-valid and technically sound items, test quality will usually be better than what test developers could produce if they were to prepare the test items themselves.
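
To make the item-bank idea concrete, here is a minimal, hypothetical sketch of an item record and objective-based test assembly. The field names mirror the indices used in this study, but the structure itself is an illustration, not the software actually developed.

```python
# Hypothetical sketch of an item-bank record and objective-based test
# assembly; NOT the software developed in the study.
from dataclasses import dataclass
import random

@dataclass
class Item:
    item_id: int
    objective: str         # e.g. "scanning", "skimming", "making inferences"
    difficulty: float      # classical p-value
    discrimination: float  # classical discrimination index
    a: float               # IRT discrimination
    b: float               # IRT difficulty
    c: float               # IRT guessing

def draw_test(bank, per_objective, seed=0):
    """Draw the desired number of technically sound items per objective."""
    rng = random.Random(seed)
    test = []
    for objective, n in per_objective.items():
        pool = [it for it in bank
                if it.objective == objective
                and 0.2 <= it.difficulty <= 0.8     # classical selection
                and it.discrimination >= 0.20]      # criteria used in the study
        test.extend(rng.sample(pool, min(n, len(pool))))
    return test
```

A real bank would persist these records (for example, in a database) and expose the same selection through the software’s interface.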

  19. Significance The item bias detection can lead to the following advantages. 1. Test writers can provide items whose content is equally familiar to all students. 2. Students get an item correct or incorrect according to their true ability. 3. Test writers can provide item content reflecting information and/or skills within the educational background of all students. 4. Test writers can avoid clues in an item that would facilitate the performance of students from one faculty over another. 5. Test writers can ensure adequacy and clarity in the test instructions, item stems, keyed responses, and distractors.

  20. Definitions of Terms • 1. Item bias detection: Item bias detection in this study refers to the detection of the presence of some characteristic of an item that results in differential performance for individuals of the same ability when comparisons are made in the following conditions: • 1.1 when comparisons are made for the four Faculties: Engineering, Applied Science, Technical Education and Industrial Technology and Management • 1.2 when comparisons are made for gender: males and females • 2. Item bias detection methods: This study will detect item bias according to both classical IA and IRT. • 2.1 Regarding classical IA, the Delta Plot method and the chi-square technique are employed. • 2.2 In terms of IRT, the three-parameter model is used. Accordingly, item parameters that vary across groups indicate Differential Item Functioning (DIF), or item bias. • 3. Item analysis: This study will analyze the tests based on two main test theories, Classical Item Analysis (IA) and Item Response Theory (IRT). The indices obtained from the two theories are as follows: • 3.1 The indices obtained from classical IA are item difficulty and item discrimination. • 3.2 The indices obtained from IRT are the a-parameter (discrimination), b-parameter (difficulty) and c-parameter (guessing). • 4. Text readability: The readability index derived from the Fry Readability formula and the Flesch Reading Ease formula. • 5. The efficiency of the item bank software: The efficiency of the item bank software is judged from the opinions of the English instructors who use the software. They are required to respond to the questionnaire. In order to triangulate the data, the retrospective method using the semi-structured interview will be conducted with them.
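
As an aside on definition 4, the Flesch Reading Ease formula can be sketched as follows. The syllable counter here is a crude vowel-group heuristic assumed purely for illustration; real readability tools count syllables more carefully.

```python
# Sketch of the Flesch Reading Ease formula named in definition 4.
# The syllable counter is a rough vowel-group heuristic (an assumption).
import re

def count_syllables(word):
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    # Flesch (1948): higher scores indicate easier texts.
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

print(flesch_reading_ease("The cat sat on the mat. It was happy."))
```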

  21. Methodology Scope of the study 1. Population 1.1 In terms of analyzing the quality of the reading tests, the population is second- to fourth-year undergraduate students at King Mongkut’s University of Technology North Bangkok in the first semester of the academic year 2008. The students study in the Faculties of Engineering, Applied Science, Technical Education and Industrial Technology and Management. Typically, most are male, aged between 18 and 22. 1.2 Regarding the efficiency of the item bank software, the population in this study is all the English instructors (both native and non-native) from the Department of Languages, Faculty of Applied Arts, King Mongkut’s University of Technology North Bangkok. 2. Reading tests The reading tests used in this study are those written to evaluate students’ learning achievement in the Reading I course, namely the midterm and final tests of the first semester of the academic year 2008.

  22. Methodology • Phase I: The development and validation of the research instruments. The instruments used in this study can be divided into three main sets: 1) the reading tests, 2) the item bank software, and 3) the questionnaire and the retrospective method. • Phase II: The implementation of the study • Phase III: Analysis of the data and report writing

  23. Methodology • Test Development Phase 1: The development and validation of the research instruments • Stage 1: Design • Stage 2: Operationalization • Stage 3: Test administration

  24. Bloom’s Taxonomy • Course Objectives • Preview: Previewing to get an idea of what you will find in the text • Scanning: Looking for specific information • Skimming: Getting the general sense of a passage • Using vocabulary knowledge for effective reading: Guessing word meaning in context • Making inferences: Guessing about the text or the writer’s idea when some ideas are not directly stated. • Finding topics: Identifying topics

  25. Table of Test Specification for Reading I Midterm Test • Midterm score (30%) = raw score (out of 50) × 0.6

  26. Proportion of the Test Items (based on Bloom’s Taxonomy)

  27. Phase II: The implementation of the study 1. Research design 1.1 This study is conducted as a Research and Development (R&D) project. Hence, the findings, obtained through systematic and objective methods of analysis, can lead to the improvement of the quality of the language assessment instruments, including advanced technology such as the item bank software. 1.2 The population of this study consists of two main groups: 1.2.1 The population used for trying out the tests in order to analyze the test quality is second- to fourth-year undergraduate students from the Faculties of Engineering, Applied Science, Technical Education and Industrial Technology and Management. 1.2.2 The population asked to evaluate the efficiency of the item bank software is all the English instructors (both native and non-native) from the Department of Languages, Faculty of Applied Arts. Six instructors will be randomly selected. Each instructor will be recorded for about 10–15 minutes. All the reports will be transcribed. The data will be coded and analyzed by the researchers.

  28. 2. Research procedures 2.1 All the students who enroll in the Reading course will take the reading tests written for the midterm and final examinations. 2.2 The students’ answer sheets will be collected to analyze the test quality. The values obtained from the analysis are: the text readability, test reliability, item difficulty and item discrimination (from classical IA), the a-, b- and c-parameters (from IRT), and the item bias values obtained from the Delta Plot method, the chi-square technique and the three-parameter IRT model. 2.3 The values obtained in 2.2 will be used to create the item bank software. 2.4 The English instructors will be asked to use the item bank software. Then the questionnaire will be given to evaluate its efficiency. A week after trying out the software, the retrospective method will be conducted in order to collect information about the efficiency of the software.

  29. Phase III: Data Analysis In order to answer the research questions, the data will be analyzed using the following techniques. Research question 1: Are there any biased items in the reading tests when comparisons are made for Faculties and genders? 1. Mathcad will be used to detect the item bias by means of the Delta Plot method and IRT. 2. SPSS will be implemented to analyze the item bias by means of the chi-square technique. Research question 2: What is the quality of the reading tests after being analyzed by the statistical procedures (classical item analysis and Item Response Theory)? 1. Regarding classical IA, SPSS will be used to analyze the tests; test reliability (KR-20), item difficulty and item discrimination will be obtained. 2. In terms of IRT, XCALIBRE will be used; the indices obtained are the a-, b- and c-parameters. Research question 3: What is the efficiency of the item bank software reported by the English instructors (both native and non-native)? 1. For the questionnaire used to evaluate the efficiency of the item bank software, the mean and standard deviation are calculated. 2. For the retrospective interview, the mode is used as a descriptive technique to examine the English instructors’ opinions; the opinions with the highest frequency will be reported.
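
For reference, the KR-20 reliability index named under research question 2 can be computed directly from a 0/1 response matrix, as in this minimal sketch (variable names are hypothetical):

```python
# Minimal sketch of KR-20 reliability from a 0/1 response matrix.
import numpy as np

def kr20(responses):
    """responses: (n_examinees, n_items) matrix of 0/1 item scores."""
    k = responses.shape[1]
    p = responses.mean(axis=0)                     # proportion correct per item
    # Sample variance of total scores (ddof=1 is one common convention).
    total_var = responses.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - (p * (1 - p)).sum() / total_var)
```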

  30. Research question 1: Are there any biased items in the reading tests when comparisons are made for Faculties and genders? Mathcad will be used to detect the item bias by means of the Delta Plot method and IRT. Delta-plot • Major axis: Y = bX + a • slope b = [(S_Y² − S_X²) + √((S_Y² − S_X²)² + 4·r²·S_X²·S_Y²)] / (2·r·S_X·S_Y), where S_X and S_Y are the standard deviations of the delta values in the two groups and r is their correlation

  31. Research question 1: Are there any biased items in the reading tests when comparisons are made for Faculties and genders? Mathcad will be used to detect the item bias by means of the Delta Plot method and IRT. Delta-plot • constant a = Ȳ − b·X̄, the intercept obtained from the mean delta values of the two groups
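
Putting the two formulas together, the whole delta-plot check might be sketched as follows. The 1.5-unit flagging distance is a commonly cited threshold, assumed here rather than taken from the study.

```python
# Delta-plot DIF sketch assembled from the slope/intercept formulas above.
# The 1.5 threshold is an assumed, commonly cited flagging distance.
import numpy as np
from scipy.stats import norm

def delta_plot_dif(p_group1, p_group2, threshold=1.5):
    """p_group1, p_group2: per-item proportions correct in each group."""
    # Delta transformation: Delta = 4 * z(proportion incorrect) + 13.
    x = 4 * norm.ppf(1 - np.asarray(p_group1)) + 13
    y = 4 * norm.ppf(1 - np.asarray(p_group2)) + 13
    sx2, sy2 = x.var(), y.var()
    r = np.corrcoef(x, y)[0, 1]
    # Major-axis slope b and intercept a (see slides 30-31).
    b = ((sy2 - sx2) + np.sqrt((sy2 - sx2) ** 2 + 4 * r**2 * sx2 * sy2)) \
        / (2 * r * x.std() * y.std())
    a = y.mean() - b * x.mean()
    # Perpendicular distance of each item's point from the major axis.
    d = (b * x - y + a) / np.sqrt(b**2 + 1)
    return np.where(np.abs(d) > threshold)[0]   # indices of flagged items
```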

  32. Research question 1: Are there any biased items in the reading tests when comparisons are made for Faculties and genders? Mathcad will be used to detect the item bias by means of the Delta Plot method and IRT. Three-Parameter Model • Analyze the test items in order to estimate the three parameters: a (discrimination), b (difficulty) and c (guessing).

  33. Research question 1: Are there any biased items in the reading tests when comparisons are made for Faculties and genders? Mathcad will be used to detect the item bias by means of the Delta Plot method and IRT. Three-Parameter Model • Calculate the area under the item characteristic curve of each item.

  34.–35. Research question 1: Are there any biased items in the reading tests when comparisons are made for Faculties and genders? Mathcad will be used to detect the item bias by means of the Delta Plot method and IRT. Three-Parameter Model • Compare the areas under the item characteristic curves of the same item obtained from different groups.
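
A sketch of the comparison step described on slides 32 to 35: evaluate each group’s fitted 3PL curve on a grid of ability values and integrate the unsigned gap between the curves; a larger area suggests differential item functioning. The example parameter values are hypothetical.

```python
# Sketch of the ICC-area comparison: integrate the unsigned gap between
# two groups' 3PL curves over a theta grid. Example values are hypothetical.
import numpy as np

def icc(theta, a, b, c, D=1.7):
    """Three-parameter logistic item characteristic curve."""
    return c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))

def icc_area_difference(params_g1, params_g2, lo=-4.0, hi=4.0, n=2001):
    """Unsigned area between two groups' ICCs for the same item."""
    theta = np.linspace(lo, hi, n)
    gap = np.abs(icc(theta, *params_g1) - icc(theta, *params_g2))
    return np.trapz(gap, theta)

# Hypothetical item: same guessing, different discrimination/difficulty.
print(icc_area_difference((1.0, 0.0, 0.2), (0.6, 0.3, 0.2)))
```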

  36.–37. Research question 1: Are there any biased items in the reading tests when comparisons are made for Faculties and genders? SPSS will be implemented to analyze the item bias by means of the chi-square technique.

  38. Results • Test Quality: IA vs. IRT
  • IA: test reliability = .8893. Item-selection criteria: difficulty level between 0.2 and 0.8; discrimination index .20 and above. Good items: 36 of 50 (difficulty ranged from .26 to .80; discrimination from .21 to .54).
  • IRT: test reliability = .884. Item-selection criteria: a-parameter (discrimination) > 0.50; b-parameter (difficulty) between −3.0 and +3.0; c-parameter (guessing) < 0.30. Good items: 49 of 50 (a = .56–1.37; b = −2.41–3.00; c = .17–.26).

  39.–40. Results • Comparison was made for faculty (summary tables shown on the original slides)

  41.–45. Results • Comparison was made for faculty • Biased Item Characteristics (faculty) (detailed tables shown on the original slides)

  46. Results • Comparison was made for faculty • Reading Ability Levels (faculty)

  47. Results • Comparison was made for faculty • Biased Reading Topics (faculty)

  48. Results • Comparison was made for faculty • Biased Text Types (faculty)

  49.–50. Results • Comparison was made for gender
