Identification of Misfit Item Using IRT Models Dr Muhammad Naveed Khalid

Outline • Item Response Theory • Model Fit • Fit Procedures • Issues and Limitations • Lagrange Multiplier (LM) Test • Simulation Design • Results • Conclusions

Item Response Theory • Item response theory (IRT), also known as latent trait theory, strong true score theory, or modern mental test theory, is a paradigm for the design, analysis, and scoring of tests, questionnaires, and similar instruments measuring abilities, attitudes, or other variables. • Some well-documented advantages of IRT over classical test theory (CTT) are: • Invariance of item and ability estimates • Computer adaptive testing • Equating • Development of item banks • Reliability
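The 1PL, 2PL, and 3PL models used later in this talk differ only in how many item parameters enter the item response function. A minimal sketch, assuming the common logistic parameterization without the scaling constant D = 1.7; the function name `irf` is illustrative, not taken from any particular IRT package:

```python
import math

def irf(theta: float, b: float, a: float = 1.0, c: float = 0.0) -> float:
    """Probability of a correct response under the 3PL model.

    theta: latent ability, b: difficulty, a: discrimination,
    c: lower asymptote (pseudo-guessing).
    With c = 0 this is the 2PL; with a = 1 and c = 0 it is the 1PL (Rasch-type).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# An examinee whose ability equals the item difficulty, under each model
print(irf(0.0, 0.0))                # 1PL
print(irf(0.0, 0.0, a=1.5))         # 2PL
print(irf(0.0, 0.0, a=1.5, c=0.2))  # 3PL
```

At theta = b the 1PL and 2PL both give 0.5, while the 3PL gives c + (1 − c)/2, which shows why the guessing parameter shifts the whole curve upward.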

Model Fit • IRT models are based on a number of explicit assumptions. • Unidimensionality: the items/test should measure only one ability, trait, or construct. • DIF (MI): the item responses can be described by the same item parameters in all subpopulations (measurement invariance). • ICC: the shape of the item response function, which describes the relation between the latent variable and the observable responses to the items, is invariant. • Local independence: responses to different items are independent given the value of the latent trait. • Speededness: the score-oriented perspective focuses on the effect of speededness on examinees' test scores, while the fairness-oriented perspective focuses on the degree to which speededness adversely affects some examinees relative to others.
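Local independence is what allows the likelihood of a whole response pattern to be written as a product of item-level probabilities, so its log-likelihood is a simple sum. A minimal sketch under the 2PL; the helper names `p_correct` and `pattern_log_likelihood` and the item parameters in the example are illustrative assumptions:

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (illustrative helper)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def pattern_log_likelihood(responses, theta, items):
    """Log-likelihood of one examinee's 0/1 response pattern.

    Under local independence the joint probability factors into a
    product over items, so the log-likelihood is a sum of item terms.
    `items` is a list of (a, b) pairs.
    """
    total = 0.0
    for x, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        total += x * math.log(p) + (1 - x) * math.log(1.0 - p)
    return total

items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]  # hypothetical (a, b) values
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(pattern_log_likelihood([1, 1, 0], theta, items), 3))
```

If responses were locally dependent (for example, items sharing a common stimulus), this factorization would no longer hold, which is exactly what a local-independence fit test probes.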

Consequences of Misfit • Yen (1981) and Wainer & Thissen (1987) have shown that inadequate model-data fit has adverse consequences, such as • Biased ability estimates • Unfair rankings • Wrongly equated scores • Threats to validity

Fit Procedures • The fit of item response theory models can be evaluated by computing residuals and the associated test statistics. Chi-Square Statistics • Tests of the discrepancy between observed and expected frequencies. • Pearson-type item-fit indices (Yen, 1984; Bock, 1972). • Likelihood-ratio-based item-fit indices (McKinley & Mills, 1985).
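A Pearson-type item-fit index groups examinees by estimated ability and compares the observed proportion correct in each group with the model-expected proportion. A minimal sketch in the spirit of Yen's Q1 statistic (ten ability groups, each contributing N_g(O_g − E_g)²/(E_g(1 − E_g))); the function name `yen_q1` and its arguments are illustrative assumptions, and the ability estimates are treated as given:

```python
def yen_q1(thetas, responses, prob, n_groups=10):
    """Pearson-type item-fit statistic in the spirit of Yen's Q1.

    thetas    : ability estimates for the examinees
    responses : their 0/1 responses to the item being checked
    prob      : function theta -> model probability of a correct response
    Examinees are sorted by ability and split into `n_groups` groups of
    (nearly) equal size; each group adds N_g*(O_g - E_g)**2 / (E_g*(1 - E_g)).
    """
    pairs = sorted(zip(thetas, responses))
    n = len(pairs)
    q1 = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        n_g = len(group)
        observed = sum(x for _, x in group) / n_g        # O_g
        expected = sum(prob(t) for t, _ in group) / n_g  # E_g
        q1 += n_g * (observed - expected) ** 2 / (expected * (1.0 - expected))
    return q1
```

Note that the groups are formed from model-dependent ability estimates, which is precisely why Orlando and Thissen (2000) question the nominal degrees of freedom of such statistics.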

Issues and Limitations • Glas and Suarez Falcon (2003) note that the standard theory for chi-square statistics does not hold in the IRT context because the observations on which the statistics are based do not have a multinomial or Poisson distribution. • Glas and Suarez Falcon (2003) have also criticized these procedures for failing to take into account the stochastic nature of the item parameter estimates. • Orlando and Thissen (2000) argued that because the observed proportions correct are based on model-dependent trait estimates, the degrees of freedom may not be as claimed.

Issues and Limitations (continued) • Their power becomes very large in large samples, so even trivial misfit is flagged. • They lose their validity when the model is grossly violated. • They do not directly reveal the impact of the model violation on the envisioned application. • They do not provide diagnostic information.
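The large-sample power problem can be seen with simple arithmetic. Assuming a single ability group whose observed proportion correct is 0.52 while the model predicts 0.50, the Pearson contribution N(O − E)²/(E(1 − E)) grows linearly with N: about 0.16 for N = 100 but 16 for N = 10,000, far above the 5% critical value of 3.84 for one degree of freedom. The helper name below is hypothetical:

```python
def pearson_x2_one_group(n, observed_p, expected_p):
    """Contribution of one ability group to a Pearson-type fit statistic."""
    return n * (observed_p - expected_p) ** 2 / (expected_p * (1.0 - expected_p))

# The same small discrepancy (52% observed vs. 50% expected correct)
for n in (100, 1_000, 10_000):
    print(n, round(pearson_x2_one_group(n, 0.52, 0.50), 2))
```

The discrepancy itself never changes; only the sample size does, which is why such statistics say little about the practical severity of misfit.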

Lagrange Multiplier (LM) Test • Glas (1999) proposed the LM test for evaluating model fit. • LM tests are used to test a restricted model against a more general alternative. • The LM test is based on the first-order partial derivatives of the log-likelihood function of the general model, evaluated at the maximum likelihood estimates of the restricted model. Consider a null hypothesis about a model with parameters $\eta_0$. This model is a special case of a general model with parameters $\eta$, obtained by fixing one or more parameters of the general model to known constants.
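A minimal sketch of the LM statistic, assuming the standard partition used for LM tests: the restricted model's parameters $\eta_1$ are estimated by maximum likelihood, the extra parameters $\eta_2$ of the general model are fixed at 0, $h_2$ is the vector of first-order derivatives of the general model's log-likelihood with respect to $\eta_2$ at those estimates, and $\Sigma$ is the information matrix. Because the derivatives with respect to $\eta_1$ vanish at the restricted estimates, $LM = h_2^\top W^{-1} h_2$ with $W = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$, which is asymptotically chi-square with as many degrees of freedom as $\eta_2$ has elements. The names `lm_statistic` and `chi2_sf` are hypothetical, and the pure-Python linear algebra is only meant for the small matrices of item-level tests:

```python
import math

def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lm_statistic(h2, info11, info12, info22):
    """LM = h2' W^-1 h2 with W = S22 - S21 S11^-1 S12 (S = information matrix).

    info11 may be empty ([]) when no nuisance parameters are estimated.
    """
    p1, p2 = len(info11), len(info22)
    # Columns of S11^-1 S12, one linear solve per column
    cols = [_solve(info11, [info12[k][j] for k in range(p1)]) for j in range(p2)]
    W = [[info22[i][j] - sum(info12[k][i] * cols[j][k] for k in range(p1))
          for j in range(p2)] for i in range(p2)]
    return sum(h * v for h, v in zip(h2, _solve(W, h2)))

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed forms)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return total

# One extra parameter (df = 1) and one estimated nuisance parameter
lm = lm_statistic([3.0], [[2.0]], [[1.0]], [[2.0]])
print(lm, chi2_sf(lm, 1))
```

Only the restricted model has to be estimated; the general model enters solely through its derivatives, which is what makes LM tests cheap to run item by item.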

LM Item-Fit Statistics • DIF: null model vs. alternative model • LOC (local independence): null model vs. alternative model • ICC: null model vs. alternative model
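For the DIF case, one way to picture the null and alternative models is a Rasch item whose difficulty is b for everyone under the null model and b + δ for a focal group under the alternative. Differentiating the log-likelihood with respect to δ at δ = 0 gives a score that is the focal group's expected minus observed number correct. A minimal sketch, assuming for simplicity that abilities are known and that b is the null-model ML estimate; the function `dif_lm_1pl` is hypothetical and ignores the uncertainty in ability estimates that a full treatment would handle:

```python
import math

def dif_lm_1pl(thetas, responses, focal, b):
    """Hypothetical LM-type DIF check for one Rasch item, thetas treated as known.

    focal : 0/1 indicators, 1 for members of the focal group
    b     : item difficulty estimated under the null (no-DIF) model
    """
    info_bb = info_dd = h = 0.0
    for theta, x, g in zip(thetas, responses, focal):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        pq = p * (1.0 - p)
        info_bb += pq            # information for b (all examinees)
        if g:
            info_dd += pq        # information for delta; equals the b-delta cross term
            h += p - x           # expected minus observed, focal group only
    w = info_dd - info_dd ** 2 / info_bb
    if w <= 0:
        raise ValueError("need examinees in both the reference and focal groups")
    return h * h / w             # compare with chi-square, df = 1

print(dif_lm_1pl([0.0] * 4, [0, 0, 1, 1], [0, 0, 1, 1], 0.0))
```

Analogous alternative models (an extra dependence parameter for an item pair, or extra slope terms for the ICC) would give the LOC and ICC versions of the test.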

Simulation Design • The 1PL, 2PL, and 3PL models were used for data generation and calibration. • Test length (10, 20, 40 items) and sample size (100, 400, 1000 examinees) were varied. • Item difficulty and discrimination parameters were drawn from standard normal and lognormal distributions, respectively. • Ability parameters were drawn from a standard normal distribution. • The effect size (degree of misfit) was set to 0.5 or 1.0. • The proportion of misfitting items in each test varied from 10% to 40%. • A nominal significance level of 5% was used. • 100 replications were carried out in each condition.
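The data-generation step of this design can be sketched for the 2PL as follows. The lognormal parameters (mu = 0, sigma = 0.25) are an assumption, since the slide does not state them, and the sketch does not reproduce how misfit of size 0.5 or 1.0 was introduced, because that procedure is not described here. The function name `simulate_2pl` is hypothetical:

```python
import math
import random

def simulate_2pl(n_persons, n_items, seed=0):
    """Generate item parameters, abilities, and 0/1 responses under the 2PL.

    b ~ N(0, 1), a ~ lognormal, theta ~ N(0, 1), as in the design above.
    ASSUMPTION: lognormal(mu=0, sigma=0.25) for discriminations.
    """
    rng = random.Random(seed)
    b = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    a = [rng.lognormvariate(0.0, 0.25) for _ in range(n_items)]
    theta = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    data = [
        [int(rng.random() < 1.0 / (1.0 + math.exp(-a[j] * (t - b[j]))))
         for j in range(n_items)]
        for t in theta
    ]
    return a, b, theta, data

# Crossed conditions: 3 test lengths x 3 sample sizes (one replication each)
for n_items in (10, 20, 40):
    for n_persons in (100, 400, 1000):
        _, _, _, data = simulate_2pl(n_persons, n_items, seed=n_items + n_persons)
        p_plus = sum(map(sum, data)) / (n_persons * n_items)
        print(n_items, n_persons, round(p_plus, 3))
```

A fixed seed per condition keeps each replication reproducible; the full study would repeat this 100 times per condition and calibrate each data set before running the fit tests.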

Power and Type I error rates by test length, effect size, and sample size under the Rasch model
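The two quantities reported for each condition can be computed from the test decisions across replications: power is the proportion of misfitting items that are flagged, and the Type I error rate is the proportion of well-fitting items that are flagged. A minimal sketch, assuming decisions are pooled over items and replications (the slides do not state the exact aggregation); the function name is hypothetical:

```python
def power_and_type1(flags, misfit):
    """Empirical power and Type I error rate for one simulation condition.

    flags  : one list per replication; flags[r][j] is True if the fit
             test rejected item j at the nominal 5% level
    misfit : misfit[j] is True if item j was simulated as misfitting
    """
    hits = tests_bad = false_alarms = tests_good = 0
    for rep in flags:
        for flagged, bad in zip(rep, misfit):
            if bad:
                tests_bad += 1
                hits += flagged
            else:
                tests_good += 1
                false_alarms += flagged
    power = hits / tests_bad if tests_bad else float("nan")
    type1 = false_alarms / tests_good if tests_good else float("nan")
    return power, type1

print(power_and_type1([[True, False, True, False], [True, False, False, False]],
                      [True, False, False, False]))
```

A well-behaved statistic should give a Type I error rate close to the nominal 0.05 while its power rises with sample size, test length, and degree of misfit.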

An Empirical Example

Conclusions • The LM fit statistics have a known asymptotic null distribution. • The fit statistics have sound statistical properties in terms of power and Type I error rates. • Detection rates increase in the order LM (MI), LM (LI), LM (ICC). • Power increases in the order 1PL, 2PL, 3PL. • These fit indices also provide a measure of effect size, which has the practical advantage of gauging the severity of misfit. • The performance of these indices deteriorates only slightly when a large number of items misfit. • Sample size, test length, and degree of misfit are factors that influence Type I error rates and power.

Thank You for Your Kind Attention & Questions
