
Estimating Growth when Content Specifications Change: A Multidimensional IRT Approach

Estimating Growth when Content Specifications Change: A Multidimensional IRT Approach. Mark D. Reckase and Tianli Li, Michigan State University. The Problem: State curriculum frameworks often change from one grade to the next, reflecting the addition of new instructional content.


Presentation Transcript


  1. Estimating Growth when Content Specifications Change: A Multidimensional IRT Approach
     Mark D. Reckase, Tianli Li
     Michigan State University

  2. The Problem
     • State curriculum frameworks often change from one grade to the next, reflecting the addition of new instructional content.
     • For example, at grade 7 algebra may be introduced as an instructional goal.
     • At grade 6, algebra is not an important component of the curriculum.
     • Tests at the two grades reflect the instructional content, so the 6th grade test does not include algebra and the 7th grade test does.
     • How can the score scales of these tests be linked?

  3. Research Questions
     • What do changes on the linked score scale mean when the scale is produced using the usual unidimensional IRT models?
     • Can multidimensional IRT be used to form vertical scales? If so, how do the results compare to the unidimensional results?

  4. The Approach
     • State testing data were analyzed using multidimensional IRT to develop a realistic model for the test data at two grade levels.
     • The results of the real data analyses were idealized to create the specifications for simulating the tests at two grade levels.
     • Data with known structure were simulated to determine how unidimensional and multidimensional procedures function.

  5. The Simulated Data Design
     • Grade 6 – two major constructs
       • Arithmetic
       • Problem Solving
     • Grade 7 – three major constructs
       • Arithmetic
       • Problem Solving
       • Algebra

  6. Simulated Test Structure
     Note: The numbers in parentheses are the common items between the two forms of the tests.

  7. Mean Vectors at Each Grade Level
     Note: Values in parentheses are the observed means from the simulated data.

  8. Covariance Matrices
     Panels: Covariance Matrix for Grade 6; Covariance Matrix for Grade 7.
     Note: Values in parentheses are estimated from the simulated data.
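As a rough illustration of the simulation design on slides 5 through 8, the sketch below draws ability vectors from a multivariate normal distribution and generates dichotomous responses under a compensatory multidimensional two-parameter logistic model. The slide tables did not survive in this transcript, so the mean vector, covariance matrix, and item parameters are invented placeholders, and the model form itself is an assumption (the transcript does not state which MIRT model generated the data).

```python
import math
import random

# Hypothetical grade 7 specification: arithmetic, problem solving, algebra.
# These numbers are placeholders, NOT the values from the presentation.
GRADE7_MEAN = [0.4, 0.3, 1.0]
GRADE7_COV = [[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]]
# Each item is (a-vector, intercept d); also placeholders.
ITEMS = [([1.0, 0.2, 0.0], 0.0),
         ([0.2, 1.0, 0.0], -0.5),
         ([0.1, 0.3, 1.1], 0.3)]


def cholesky(cov):
    """Lower-triangular factor L with L * L^T == cov (cov must be positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def draw_theta(mean, chol, rng):
    """One draw from N(mean, cov) using the Cholesky factor of cov."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(chol[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


def p_correct(theta, a, d):
    """Compensatory M2PL: P(u = 1) = 1 / (1 + exp(-(a . theta + d)))."""
    logit = sum(ai * ti for ai, ti in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-logit))


def simulate(n_students, mean, cov, items, seed=0):
    """Return (thetas, responses) for n_students simulated examinees."""
    rng = random.Random(seed)
    chol = cholesky(cov)
    thetas, responses = [], []
    for _ in range(n_students):
        theta = draw_theta(mean, chol, rng)
        thetas.append(theta)
        responses.append([1 if rng.random() < p_correct(theta, a, d) else 0
                          for a, d in items])
    return thetas, responses
```

A grade 6 sample could be produced the same way with a two-element mean vector, a 2 x 2 covariance matrix, and two-element a-vectors.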

  9. Orientation of Items

  10. Effect Size Built into Data
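The transcript does not preserve how the effect size was defined. One common choice, assumed here purely for illustration, is the standardized mean difference (Cohen's d with a pooled standard deviation), computed separately on each dimension or on a unidimensional scale. The function name and the sample values are hypothetical.

```python
import math
import statistics


def cohens_d(lower_grade, upper_grade):
    """Standardized mean difference (upper minus lower) using the pooled SD.

    Assumption: this is one conventional effect-size definition; the
    presentation may have used a different one.
    """
    n1, n2 = len(lower_grade), len(upper_grade)
    v1 = statistics.variance(lower_grade)  # sample variance (n - 1 denominator)
    v2 = statistics.variance(upper_grade)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(upper_grade) - statistics.mean(lower_grade)) / pooled_sd


# Toy ability estimates, invented for the example.
grade6 = [0.0, 1.0, 2.0, 3.0, 4.0]
grade7 = [1.0, 2.0, 3.0, 4.0, 5.0]
effect = cohens_d(grade6, grade7)
```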

  11. Unidimensional Basis for Comparison
      • Imagine that the full set of 70 items from both test levels is administered to the students at both grade levels.
      • The matrix of 2000 + 2000 students from the two grades by 70 items can be analyzed with the unidimensional models to serve as a basis for comparison for the vertical scaling result.
      • Analyze the matrix using the 2PL and Rasch models.
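For readers less familiar with the two unidimensional models named on this slide, here is a minimal sketch of their item response functions. It uses the plain logistic form without the optional 1.7 scaling constant, which is an assumption; calibration software may apply a different metric.

```python
import math


def irf_2pl(theta, a, b):
    """2PL: probability of a correct response given ability theta,
    discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def irf_rasch(theta, b):
    """Rasch model: the 2PL with every item's discrimination fixed at 1."""
    return irf_2pl(theta, 1.0, b)
```

The only structural difference is that the Rasch model forces all items to share one discrimination, so it cannot weight items differently when forming the scale.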

  12. 2PL Solution

  13. Rasch Model Solution

  14. Vertical Scaling Analysis
      • Common-item concurrent calibration
      • BILOG-MG
      • Off-grade items coded as not reached
      • Both the 2PL and Rasch models used for analysis
      • Determine the effect size of the difference in means between the two grade levels
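The data layout behind common-item concurrent calibration can be sketched as follows: responses from both grades are stacked into one matrix over the union of items, and items a student never saw are given a missing code. The item labels, the `NOT_REACHED` placeholder, and the function name are hypothetical; how the calibration program itself encodes not-reached responses is not shown in the transcript.

```python
NOT_REACHED = None  # placeholder code for an item not administered to a student


def stack_for_concurrent_calibration(forms):
    """Combine several test forms into one student-by-item matrix.

    forms: list of (item_ids, response_rows), where each row holds one
    student's 0/1 responses in the order given by item_ids.
    Returns (all_items, combined_rows).
    """
    all_items = []
    for item_ids, _ in forms:
        for item in item_ids:
            if item not in all_items:
                all_items.append(item)

    combined = []
    for item_ids, rows in forms:
        position = {item: idx for idx, item in enumerate(item_ids)}
        for row in rows:
            combined.append([row[position[item]] if item in position else NOT_REACHED
                             for item in all_items])
    return all_items, combined


# Toy forms: "C1" is the common item linking the two grades.
grade6_form = (["A1", "A2", "C1"], [[1, 0, 1]])
grade7_form = (["C1", "G1", "G2"], [[0, 1, 1]])
items, matrix = stack_for_concurrent_calibration([grade6_form, grade7_form])
```

In this layout the common items are the only columns with responses from both grades, which is what ties the two grade levels to a single scale.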

  15. Vertically Scaled Effect Sizes

  16. Vertically Scaled Effect Sizes
      • Linked effect size is smaller than full data effect size.
      • Rasch effect size is less than 2PL effect size.
      • Full data set effect size is less than modeled effect size.

  17. Alternative Linking Method
      • Common-item, separate calibration
      • Common item parameter relationship was poor

  18. MIRT Analysis
      • Full data analysis with TESTFACT
      • Three-dimensional analysis
      • Determine effect size for each dimension
      • Correlate each estimated θ with the generating θs to determine the meaning of the results.
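The last step on this slide, correlating each estimated ability dimension with each generating dimension, might look like the sketch below. The function names and the toy values are illustrative assumptions; in the study, the estimates would come from the calibration program's output.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def cross_correlations(estimated, generating):
    """Rows are students. Entry [r][c] is the correlation between
    estimated dimension r and generating dimension c."""
    est_cols = list(zip(*estimated))
    gen_cols = list(zip(*generating))
    return [[pearson(e, g) for g in gen_cols] for e in est_cols]


# Toy data: the first estimated dimension is the first generating
# dimension with its sign reversed; the second is recovered exactly.
generating = [[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 1.0]]
estimated = [[-g[0], g[1]] for g in generating]
table = cross_correlations(estimated, generating)
```

A strongly negative entry, as in the first row here, signals a dimension that tracks a construct but with its scale reversed.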

  19. MIRT Effect Sizes

  20. Correlation between True and Estimated θ Values

  21. Interpretation of MIRT Solution
      • Results are difficult to interpret because of the default procedures in TESTFACT.
      • Solution needs to be rotated to have axes align with content dimensions.
      • Current solution shows that θ1 is related to algebra and shows the big algebra effect.
      • θ2 is a combination of arithmetic and problem solving, with the emphasis on problem solving.
      • Most likely it has the sign of the a-parameters reversed.
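The remark about reversed a-parameter signs reflects a general property of compensatory MIRT models: negating both the a-parameters and the θ values on one dimension leaves every response probability unchanged, so a program is free to return either orientation. A minimal sketch, with hypothetical function names and toy values, assuming the logit has the form a · θ + d:

```python
def reflect_dimension(a_params, thetas, k):
    """Flip the sign of dimension k in both the item a-vectors and the
    person theta-vectors. Returns new lists; inputs are not modified."""
    flipped_a = [[-v if i == k else v for i, v in enumerate(a)] for a in a_params]
    flipped_t = [[-v if i == k else v for i, v in enumerate(t)] for t in thetas]
    return flipped_a, flipped_t


def logit(a, theta, d):
    """Linear predictor of a compensatory MIRT item: a . theta + d."""
    return sum(ai * ti for ai, ti in zip(a, theta)) + d


# Toy values, invented for the example.
a_params = [[0.8, -1.2], [0.3, -0.9]]
thetas = [[0.5, -1.0], [-0.2, 0.4]]
new_a, new_thetas = reflect_dimension(a_params, thetas, k=1)
```

After the reflection the second-dimension a-parameters are positive, and the model-implied probabilities are identical, which is why the sign has to be fixed by interpretation rather than by fit.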

  22. Concurrent MIRT Analysis
      • Use concurrent calibration of data from the two grade levels.
      • Three-dimensional solution
      • No rotation
      • Determine effect sizes and correlations with true θ values.

  23. Concurrent MIRT Calibration

  24. Concurrent MIRT Calibration

  25. Concurrent MIRT Calibration
      • Scale on Dimension 3 is reversed and it has a large effect size (algebra).
      • Dimension 1 is most related to arithmetic and problem solving with a moderate effect size.
      • Dimension 2 is moderately related to algebra and has a large effect size.
      • The overall result gives a reasonable estimate of effects, but the dimensions need to be rotated to match the constructs.
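The need for rotation follows from the rotational indeterminacy of compensatory MIRT models: applying the same orthogonal rotation to the item a-vectors and the person θ-vectors leaves every a · θ product, and hence every response probability, unchanged. The sketch below demonstrates this with a rotation in one coordinate plane. The function names and values are hypothetical, and it does not show how to choose the rotation that best matches the constructs, which the presentation leaves as an open question.

```python
import math


def rotate_plane(vectors, i, j, angle):
    """Rotate every vector by `angle` radians in the plane of axes i and j.
    This is an orthogonal transformation, so dot products are preserved
    when it is applied to both sets of vectors."""
    c, s = math.cos(angle), math.sin(angle)
    rotated = []
    for v in vectors:
        w = list(v)
        w[i] = c * v[i] - s * v[j]
        w[j] = s * v[i] + c * v[j]
        rotated.append(w)
    return rotated


def dot(a, theta):
    return sum(ai * ti for ai, ti in zip(a, theta))


# Toy three-dimensional item and person parameters, invented for the example.
a_params = [[1.0, 0.2, 0.1], [0.1, 0.3, 1.1]]
thetas = [[0.4, -0.3, 1.0], [-1.2, 0.5, 0.0]]
angle = math.radians(30)
rot_a = rotate_plane(a_params, 0, 2, angle)
rot_thetas = rotate_plane(thetas, 0, 2, angle)
```

Because the fit cannot distinguish between rotated solutions, the orientation has to be chosen on substantive grounds, for example by aligning axes with the arithmetic, problem solving, and algebra item sets.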

  26. Conclusions
      • Unidimensional linking of the tests at the two levels underestimates the effect size.
      • The Rasch model gives a smaller effect size than the two-parameter logistic model.
      • The MIRT solution shows promise.
      • Need to determine how to rotate the solution to match the constructs.
      • TESTFACT has problems converging on estimates because of a mismatch between assumptions and reality.
