Adventures in Equating Land: Facing the Intra-Individual Consistency Index Monster* *Louis Roussos retains all rights to the title
Overview of Equating Designs and Methods • Designs • Single Group • Random Groups • Common Item Nonequivalent Groups (CING) • Methods • Mean • Linear • Equipercentile • IRT True or Observed
Guidelines for Selecting Common Items for Multiple-Choice (MC) Only Exams • Representative of the total test (Kolen & Brennan, 2004) • 20% of the total test • Same item positions • Similar average/spread of item difficulties (Dorans, Kubiak, & Melican, 1997) • Content representative (Klein & Jarjoura, 1985)
Challenges in Equating Mixed-Format Tests (Kolen & Brennan, 2004; Muraki, Hombo, & Lee, 2000) • Constructed Response (CR) scored by raters • Small number of tasks • Inadequate sampling of construct • Changes in construct across forms • Common Items • Content/difficulty balance of common items • MC only may result in inadequate representation of groups/construct • IRT • Small number of tasks may result in unstable parameter estimates • Typically assume a single dimension underlies both item types • Format Effects
Current Research • Number of CR Items • Smaller RMSD with larger numbers of items and/or score points (Li & Yin, 2008; Fitzpatrick & Yen, 2001) • Misclassification (Fitzpatrick & Yen, 2001) • With fewer than 12 items, more score points resulted in smaller error rates • With more than 12 items, error rates were less than 10% regardless of score points • Trend Scoring (Tate, 1999, 2000; Kim, Walker, & McHale, 2008) • Rescoring samples of CR items • Smaller bias and equating error
Cont. • Format Effects (FE) • MC and CR measure similar constructs (Ercikan et al., 1993; Traub, 1993) • Males scored higher on MC; females higher on CR (DeMars, 1998; Garner & Engelhard, 1999) • Kim & Kolen, 2006 • Narrow-range tests (e.g., credentialing) • Wide-range tests (e.g., achievement) • Individual Consistency Index (Tatsuoka & Tatsuoka, 1982) • Detecting aberrant response patterns • Not specifically studied in the context of mixed-format tests
Purpose and Research Questions Purpose: Examine the impact of equating mixed-format tests when student subscores differ across item types. Specifically, • To what extent does the intra-individual consistency of examinee responses across item formats impact equating results? • How does the selection of common items differentially impact equating results with varying levels of intra-individual consistency?
Data • “Old Form” (OL) treated as “truth” • Large-scale 6th grade testing program • Mathematics • 54 point test • 34 multiple choice (MC) • 5 short answer (SA) • 5 constructed response (CR) worth 4 points each • Approx. 70,000 examinees • “New Form” (NE) • Exactly the same items as OL • Samples of examinees from OL
[Diagram] NE (new form): samples of 3,000 examinees • OL (old form): all examinees • Both forms use the same 2006-07 scoring test (39 items) • The only difference between the forms is the examinees
Intra-Individual Consistency • Consistency of student responses across formats • Regression of polytomous item subscores (CR) onto dichotomous item subscores (MC and SA) • Standardized residuals • Range from approximately -4.00 to +8.00 • Example: Index of +2.00 • Student subscore on CR under-predicted by two standard deviations based on MC subscores
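The index described above can be sketched in a few lines: regress each examinee's CR subscore on the dichotomous subscore and standardize the residuals. This is an illustrative reconstruction, not the study's actual code; all variable names and the simulated scores are assumptions.

```python
import numpy as np

# Hypothetical data standing in for examinee subscores.
rng = np.random.default_rng(0)
n = 1000
mc_sa = rng.integers(0, 35, size=n).astype(float)            # dichotomous (MC + SA) subscore
cr = np.clip(0.5 * mc_sa + rng.normal(0, 3, size=n), 0, 20)  # polytomous (CR) subscore

# Ordinary least-squares fit: cr = b0 + b1 * mc_sa
b1, b0 = np.polyfit(mc_sa, cr, 1)
residuals = cr - (b0 + b1 * mc_sa)

# Standardized residuals: the intra-individual consistency index.
# An index of +2.00 means the CR subscore is two standard deviations
# higher than predicted from the dichotomous subscore.
index = residuals / residuals.std(ddof=1)
```

Examinees with large positive indices perform better on CR than their MC/SA performance predicts; large negative indices indicate the reverse.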
Samples • Three groups of examinees based on intra-individual consistency index • Below -1.50 (NEG) • -1.50 to +1.50 (MID) • Above +1.50 (POS) • 3,000 examinees per sample • Sampled from each group based on percentages • Samples selected to have same quartiles and median as whole group of examinees
Sampling Conditions • 60/20/20 • 60% sampled from one of the groups (i.e., NEG, MID, POS) • 20% sampled from each of the remaining groups • Repeated for each of the three groups • 40/30/30 • 40% sampled from one group; 30% from each of the remaining groups
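One sampling condition can be sketched as a stratified draw from the three consistency groups. The group cut points (-1.50, +1.50) and the 60/20/20 weights come from the slides; the placeholder index values and function names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
index = rng.normal(0, 1.5, size=70000)  # placeholder consistency indices for the pool

# Group examinees by intra-individual consistency index.
neg = np.flatnonzero(index < -1.50)
mid = np.flatnonzero((index >= -1.50) & (index <= 1.50))
pos = np.flatnonzero(index > 1.50)

def draw_sample(focal, other1, other2, n=3000, weights=(0.6, 0.2, 0.2)):
    """Draw an n-examinee sample with the focal group oversampled."""
    parts = [rng.choice(g, size=int(n * w), replace=False)
             for g, w in zip((focal, other1, other2), weights)]
    return np.concatenate(parts)

# 60% NEG, 20% MID, 20% POS; repeat with each group as focal for the
# other conditions, and with weights=(0.4, 0.3, 0.3) for 40/30/30.
sample_neg60 = draw_sample(neg, mid, pos)
```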
Common Items • Six sets of common items • MC only (12 points) • CR only (12 points) • MC (4) and CR (8) • MC (8) and CR (4) • MC (4), CR (4), and SA (4) • MC (7), CR (4), and SA (1) • Representative of total test in terms of content, difficulty and length
Equating • Common-item nonequivalent groups design • Item parameters calibrated using Parscale 4.1 • 3-parameter logistic model (3PL) for MC items • 2PL model for SA items • Graded Response Model for CR items • IRT scale transformation • Mean/mean, mean/sigma, Stocking-Lord, and Haebara • IRT true score equating
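Of the four scale transformations listed, mean/sigma is the simplest to show: it matches the mean and standard deviation of the common-item difficulty estimates across calibrations. A minimal sketch, with hypothetical difficulty values (Stocking-Lord and Haebara instead minimize characteristic-curve loss criteria):

```python
import numpy as np

# Hypothetical common-item difficulty estimates from the two calibrations.
b_old = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])  # items on the OL scale
b_new = np.array([-1.0, -0.2, 0.3, 1.0, 1.9])  # same items on the NE scale

# Mean/sigma transformation, new scale -> old scale:
# theta_old = A * theta_new + B
A = b_old.std(ddof=1) / b_new.std(ddof=1)
B = b_old.mean() - A * b_new.mean()

# NE difficulties placed on the OL scale.
b_transformed = A * b_new + B
```

After transformation the common-item difficulties have the same mean and standard deviation on both scales, which is the defining property of the method.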
[Diagram] Equating OL and NE • All items shared in common: "truth" established by equating NE to OL using all items as common items • "Common" items: study equatings conducted using only a selected set of items treated as common
Evaluation • Bias and RMSE • At each score point • Averaged over score points • Classification Consistency
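The evaluation criteria can be sketched as follows: compare the study equating to the criterion ("truth") equating at each raw score point, then average. The arrays below are hypothetical stand-ins, not study results; with a single replication the per-point RMSE reduces to the absolute error.

```python
import numpy as np

# Hypothetical equated scores on the 0-54 raw score scale.
true_equiv = np.linspace(0, 54, 55)  # criterion ("truth") equated scores
est_equiv = true_equiv + np.random.default_rng(2).normal(0, 0.3, 55)

# At each score point.
bias_per_point = est_equiv - true_equiv
abs_error_per_point = np.abs(est_equiv - true_equiv)  # per-point RMSE, one replication

# Averaged over score points.
mean_bias = bias_per_point.mean()
mean_rmse = np.sqrt(np.mean((est_equiv - true_equiv) ** 2))
```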
Discussion • Different equating results based on sampling conditions • Differences more exaggerated when using common-item sets with mostly CR items • MID 60/20/20 condition most similar to the full data, with small differences across common-item selections
Limitations and Implications • Limitations • Sampling conditions • Common item selections • Only one equating method • Implications for future research • Sampling conditions, common item selections, additional equating methods • Other content areas and grade levels • Other testing programs • Simulation studies
Thanks! • Rob Keller • Mike, Louis, Won, Candy, and Jessalyn