How Should We Assess the Fit of Rasch-Type Models?

Approximating the Power of Goodness-of-Fit Statistics in Categorical Data Analysis
Alberto Maydeu-Olivares and Rosa Montano

Outline • Introduction • Rasch-Type Models for Binary Data • Rationale of Goodness-of-Fit Statistics • Full Picture • M2, R1 and R2 • Estimating the Power • Empirical Comparison of R1, R2 and M2 • Numerical Examples • Discussion and Conclusion

Introduction • Two properties of Rasch-type models: • Sufficient statistics (the sum score is sufficient for ability) • Specific objectivity • Estimation methods: • Specific to Rasch-type models: conditional maximum likelihood (CML) • General procedures: marginal maximum likelihood (MML) via the EM algorithm • Goodness-of-fit testing procedures: • Specific to Rasch-type models • General to IRT models or other models for multivariate discrete data
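The sufficiency property that CML exploits can be checked numerically. The sketch below is a minimal illustration, assuming the Rasch form P(X_j = 1 | θ) = 1 / (1 + exp(−(θ − b_j))) with hypothetical difficulties b_j. The probability of a response pattern given its sum score r equals Π ε_j^{x_j} / γ_r, with ε_j = exp(−b_j) and γ_r the elementary symmetric function of order r, so it does not depend on θ. The function names are illustrative only.

```python
import itertools
import math


def rasch_prob(theta: float, b: float) -> float:
    """P(X = 1 | theta) under the Rasch model with item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def pattern_prob(x, theta, bs):
    """Probability of response pattern x given theta (local independence)."""
    p = 1.0
    for xj, b in zip(x, bs):
        pj = rasch_prob(theta, b)
        p *= pj if xj == 1 else 1.0 - pj
    return p


def elementary_symmetric(eps):
    """gamma[r] = sum, over patterns with sum score r, of prod eps_j ** x_j."""
    gamma = [1.0] + [0.0] * len(eps)
    for e in eps:
        # Update from the top down so each item enters each term at most once.
        for r in range(len(gamma) - 1, 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma


def conditional_prob(x, bs):
    """P(x | sum score r) from the elementary symmetric functions; no theta needed."""
    eps = [math.exp(-b) for b in bs]
    numerator = math.prod(e for xj, e in zip(x, eps) if xj == 1)
    return numerator / elementary_symmetric(eps)[sum(x)]


def direct_conditional(x, theta, bs):
    """P(x | sum score r) computed by brute force at a given theta, for comparison."""
    r = sum(x)
    same_score = [y for y in itertools.product((0, 1), repeat=len(bs)) if sum(y) == r]
    return pattern_prob(x, theta, bs) / sum(pattern_prob(y, theta, bs) for y in same_score)


bs = [-1.0, 0.0, 0.5, 1.2]  # hypothetical item difficulties
x = (1, 0, 1, 0)
for theta in (-2.0, 0.0, 2.0):
    print(theta, direct_conditional(x, theta, bs), conditional_prob(x, bs))
```

The brute-force value is the same at every θ and matches the elementary-symmetric formula, which is why CML can estimate the item parameters without specifying the ability distribution.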

Introduction • Compare the performance of selected goodness-of-fit statistics for testing Rasch-type models estimated by MML via EM • Binary data • 1PL (random effects) model • R1 and R2, specific to the 1PL • M2, general to models for multivariate discrete data

Rasch model and 1PL • Fixed effects (Rasch model) • The distribution of ability is not specified • Random effects (1PL) • A standard normal distribution is specified for ability • The less restrictive definition of specific objectivity still holds
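Under the random-effects (1PL) specification, ability is integrated out, giving the marginal probability of each response pattern; these model-implied cell probabilities are what the goodness-of-fit statistics compare with the observed proportions. A minimal sketch, assuming the parametrization P(X_j = 1 | θ) = 1 / (1 + exp(−a(θ − b_j))) with a common slope a, and approximating the N(0, 1) integral with a simple equally spaced grid rather than the Gauss-Hermite quadrature typically used in MML software. The values of a, the difficulties and the grid settings are hypothetical, and enumerating all 2^n patterns is only practical for a handful of items.

```python
import itertools
import math


def normal_grid(n_points=121, lo=-6.0, hi=6.0):
    """Equally spaced ability nodes with normalized N(0, 1) density weights."""
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + i * step for i in range(n_points)]
    dens = [math.exp(-0.5 * t * t) for t in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]


def marginal_pattern_probs(a, bs, n_points=121):
    """Marginal probability of every response pattern under a 1PL with common
    slope a and theta ~ N(0, 1), integrating theta out numerically."""
    nodes, weights = normal_grid(n_points)
    probs = {}
    for x in itertools.product((0, 1), repeat=len(bs)):
        total = 0.0
        for t, w in zip(nodes, weights):
            p = 1.0
            for xj, b in zip(x, bs):
                pj = 1.0 / (1.0 + math.exp(-a * (t - b)))
                p *= pj if xj else 1.0 - pj
            total += w * p
        probs[x] = total
    return probs


probs = marginal_pattern_probs(a=1.2, bs=[-1.0, 0.0, 1.0])  # hypothetical 1PL
for pattern, p in sorted(probs.items()):
    print(pattern, round(p, 4))
```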

Rationale 1. High-dimensional contingency table • The table has C = 2^n cells, where n is the number of items. For example, a 20-item test has C = 2^20 = 1,048,576 cells. • To satisfy the rule of thumb of an expected frequency of at least 5 in every cell, a sample size of at least 1,048,576 × 5 = 5,242,880 is needed.
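The arithmetic behind the rule of thumb can be written out. Because the cell probabilities sum to 1, the smallest one is at most 1/2^n, so an expected count of at least 5 in every cell needs N ≥ 5 · 2^n even in the most favorable case of equally likely patterns; with unequal probabilities N must be larger still. A minimal sketch; the function names are hypothetical.

```python
def contingency_cells(n_items: int) -> int:
    """Number of cells (response patterns) for n binary items."""
    return 2 ** n_items


def min_sample_size(n_items: int, min_expected: int = 5) -> int:
    """Smallest N giving every cell an expected count of at least min_expected,
    in the best case where all response patterns are equally likely."""
    return contingency_cells(n_items) * min_expected


for n in (5, 10, 20):
    print(n, contingency_cells(n), min_sample_size(n))
```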

3. Limited information approach (M2) • Pools the cells of the contingency table into low-order marginal tables • With order r = 2, Mr becomes M2 • M2 uses the univariate and bivariate information • The degrees of freedom are n(n+1)/2 − q, where q is the number of estimated parameters • It is the statistic of choice for testing IRT models
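M2 replaces the 2^n cells with low-order margins: for binary items, the n univariate proportions P(X_j = 1) and the n(n − 1)/2 bivariate proportions P(X_j = 1, X_k = 1), so its degrees of freedom are n(n + 1)/2 − q for q estimated parameters. The sketch below only computes these sample moments and the df. The M2 quadratic form itself also needs the model-implied moments and a weight matrix built from their asymptotic covariance and Jacobian, which are omitted here. Using q = n + 1 for the 1PL (n difficulties plus one common slope) is an assumption about the parametrization, and the function names are hypothetical.

```python
import itertools


def univariate_bivariate_moments(data):
    """Sample proportions p_j = P(X_j = 1) and p_jk = P(X_j = 1, X_k = 1), j < k."""
    n_obs = len(data)
    n_items = len(data[0])
    p1 = [sum(row[j] for row in data) / n_obs for j in range(n_items)]
    p2 = [sum(row[j] * row[k] for row in data) / n_obs
          for j, k in itertools.combinations(range(n_items), 2)]
    return p1, p2


def m2_df(n_items: int, n_params: int) -> int:
    """df of M2 for binary items: univariate plus bivariate moments, minus q."""
    return n_items + n_items * (n_items - 1) // 2 - n_params


data = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (1, 0, 0)]  # toy 0/1 responses
p1, p2 = univariate_bivariate_moments(data)
print(p1, p2)
print(m2_df(20, 20 + 1))  # 20-item 1PL, assuming q = n + 1
```

For a 20-item test this reduces the 1,048,576 cells to 210 moments.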

3. Limited information approach (R1 and R2) • R1: degrees of freedom n(n-2) • Targets the assumptions of monotonically increasing and parallel item response functions • R2: degrees of freedom (n(n-2)+2)/2 • Targets the unidimensionality assumption
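The assumptions R1 targets are easy to state for logistic items: an item response function increases in θ exactly when its slope is positive, and two functions are parallel exactly when the gap between their logits is the same at every θ. The sketch below checks both properties for hypothetical item parameters (slope a, location b). Under a 2PL with unequal slopes the logit gap changes with θ, which is the kind of departure from the 1PL that R1 is aimed at. The function names are illustrative only.

```python
def logit_irf(theta: float, a: float, b: float) -> float:
    """Logit of P(X = 1 | theta) for a logistic item with slope a and location b."""
    return a * (theta - b)


def is_increasing(item) -> bool:
    """A logistic IRF is monotonically increasing exactly when its slope is positive."""
    a, _ = item
    return a > 0


def is_parallel(item_j, item_k, thetas, tol=1e-9) -> bool:
    """Two logistic IRFs are parallel when their logit gap does not depend on theta."""
    (aj, bj), (ak, bk) = item_j, item_k
    gaps = [logit_irf(t, aj, bj) - logit_irf(t, ak, bk) for t in thetas]
    return max(gaps) - min(gaps) < tol


thetas = [-3.0, -1.0, 0.0, 1.0, 3.0]
one_pl = [(1.0, -0.5), (1.0, 0.8)]  # hypothetical 1PL items: common slope
two_pl = [(1.0, -0.5), (1.8, 0.8)]  # hypothetical 2PL items: different slopes
print(is_parallel(*one_pl, thetas), is_parallel(*two_pl, thetas))  # prints: True False
```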

Estimating the Asymptotic Power • Under a sequence of local alternatives, each statistic is asymptotically distributed as a noncentral chi-square; given the df of M2, R1 and R2, the noncentrality parameter can be calculated • The Kullback-Leibler discrepancy function (DKL) can be used to find the null model closest to a “true” model • The parameter values that minimize DKL between the “true” model and the null model are the same values that maximize the likelihood of the null model when the data follow the “true” model
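Once a statistic's df and noncentrality parameter λ are known, its asymptotic power follows directly. The critical value comes from a central χ²(df), and the power is the probability that a noncentral χ²(df, λ) exceeds it. The sketch below is dependency-free: it evaluates the noncentral CDF as a Poisson(λ/2) mixture of central chi-squares and finds the critical value by bisection. In practice `scipy.stats.chi2.ppf` and `scipy.stats.ncx2.sf` do the same job. The value of λ for each statistic, obtained from the DKL-closest null model, is assumed to be given, and the function names are hypothetical.

```python
import math


def reg_lower_gamma(a: float, x: float, tol: float = 1e-15, max_iter: int = 100_000) -> float:
    """Regularized lower incomplete gamma function P(a, x), by its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a  # k = 0 term of sum_k x^k / (a (a + 1) ... (a + k))
    total = term
    for k in range(1, max_iter):
        term *= x / (a + k)
        total += term
        if term < tol * total:
            break
    # Prefactor x^a e^(-x) / Gamma(a), computed in log space to avoid overflow.
    return min(1.0, math.exp(a * math.log(x) - x - math.lgamma(a)) * total)


def chi2_cdf(x: float, df: float) -> float:
    """CDF of a central chi-square with df degrees of freedom."""
    return reg_lower_gamma(df / 2.0, x / 2.0)


def ncx2_cdf(x: float, df: float, lam: float) -> float:
    """CDF of a noncentral chi-square: Poisson(lam / 2) mixture of central chi-squares."""
    if lam == 0.0:
        return chi2_cdf(x, df)
    half = lam / 2.0
    j_max = int(half + 10.0 * math.sqrt(half) + 50)  # truncate the Poisson tail
    total = 0.0
    for j in range(j_max + 1):
        log_weight = -half + j * math.log(half) - math.lgamma(j + 1)
        total += math.exp(log_weight) * chi2_cdf(x, df + 2 * j)
    return min(1.0, total)


def chi2_critical(df: float, alpha: float = 0.05) -> float:
    """Upper-alpha critical value of chi-square(df), by bisection on the CDF."""
    lo, hi = 0.0, df + 20.0 * math.sqrt(2.0 * df) + 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def asymptotic_power(df: float, lam: float, alpha: float = 0.05) -> float:
    """Probability of rejecting when the statistic is noncentral chi-square(df, lam)."""
    return 1.0 - ncx2_cdf(chi2_critical(df, alpha), df, lam)


for df in (1, 10, 50):
    print(df, round(asymptotic_power(df, lam=10.0), 3))
```

Under a fixed misspecification λ grows roughly in proportion to the sample size, so comparing `asymptotic_power` across M2, R1 and R2 at the same N shows how each statistic trades df against noncentrality: for a fixed λ, more degrees of freedom mean less power.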

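Study 1: Accuracy of p-values under the correct model • Check that each statistic's empirical mean equals its df and that half its empirical variance equals its df (a chi-square variable with df degrees of freedom has mean df and variance 2df) • In another study by Montano (2009), M2 performed better than R1, and the discrepancies between the empirical and asymptotic rejection rates were not large. • For R1, the sum scores are grouped into score ranges:

The degrees of freedom are adjusted accordingly • The score ranges are found with an iterative procedure • When appropriate score ranges are used, the empirical rejection rates of R1 should closely match the theoretical rejection rates. • The same should also be done for R2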
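The Study 1 checks amount to comparing simulated statistics with their reference χ²(df). Under the correct model the mean should be close to df, half the variance close to df, and the empirical rejection rate close to the nominal α. A minimal sketch: the draws below are exact χ²(10) variates, used only as a hypothetical stand-in for M2, R1 or R2 values computed on data simulated from the fitted 1PL, and 18.307 is the tabled 95th percentile of χ²(10).

```python
import random
import statistics


def calibration_summary(stats, df, crit):
    """Compare simulated statistics with their reference chi-square(df) distribution."""
    return {
        "df": df,
        "mean": statistics.fmean(stats),                 # should be close to df
        "half_var": statistics.variance(stats) / 2.0,    # should be close to df
        "reject_rate": sum(s > crit for s in stats) / len(stats),  # close to alpha
    }


# Hypothetical stand-in: exact chi-square(10) draws instead of fit statistics
# computed on replicated data sets generated under the correct model.
rng = random.Random(1)
df = 10
sims = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(20000)]
summary = calibration_summary(sims, df, crit=18.307)
print(summary)
```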

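Study 2: Asymptotic Power to Reject the 1PL When the True Model Is a 2PL

Study 3: Empirical Power to Reject the 1PL When the True Model Is a 2PL

Study 4: Asymptotic Power to Reject the 1PL When the True Model Is a 3PL

Study 5: Asymptotic Power to Reject the 1PL When the True Model Is Multidimensional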

Empirical Example 1: LSAT 7 Data • The ordering of the statistics by their value/df ratio agrees with their ordering by power

Empirical Example 2: Chilean Mathematical Proficiency Data

Discussion and Conclusions • In general, M2 is more powerful than R1 and R2. • That is, R1 and R2, which were developed specifically for Rasch-type models, are not superior to the general-purpose M2.
