Presentation Transcript

    1. Reliability and Item Response Modelling: Myths, Observations and Applications (Raymond J. Adams, University of Melbourne)

    2. Overview
       - What is reliability?
       - Does reliability have a role in an IRT context? (JML, MML, CML)
       - How reliable can a test be?
       - Is high reliability important?

    3. Classical Reliability
       - Idea introduced by Spearman (1904): accidental errors attenuate the relations between observed scores
       - Reliability coefficient (Spearman, 1910): the correlation between one half and the other half of several measures of the same thing

    4. Spearman's Definition
       - Under the classical model, the observed score is the true score plus error
       - Reliability is true variance divided by observed variance
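The slide's formulas are not in the transcript, but the classical model is easy to demonstrate with a small simulation. A minimal sketch in Python (all numbers invented for illustration): observed scores are generated as true score plus independent error, and reliability is recovered as the ratio of true to observed variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulate the classical model X = T + E with independent error.
true_var, error_var = 4.0, 1.0
T = rng.normal(0.0, np.sqrt(true_var), n)   # true scores
E = rng.normal(0.0, np.sqrt(error_var), n)  # accidental errors
X = T + E                                   # observed scores

# Reliability: true variance divided by observed variance.
reliability = T.var() / X.var()
print(round(reliability, 2))  # close to 4 / (4 + 1) = 0.8
```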

    5. An aside
       - Standard formulae are estimates of this under certain assumptions: Kuder-Richardson formula 20 (KR-20), Cronbach's alpha
       - They have the features I describe in the following
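As a concrete illustration of one such formula, here is a minimal Python implementation of Cronbach's alpha, which reduces to KR-20 for dichotomous items; the score matrix is invented for the example.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a persons-by-items score matrix.

    alpha = k/(k-1) * (1 - sum of item variances / variance of total score).
    For dichotomous (0/1) items this reduces to KR-20.
    """
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Tiny invented dataset: 6 persons x 4 dichotomous items.
data = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
])
print(round(cronbach_alpha(data), 3))  # → 0.711
```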

    6. Properties of R_X -- 1

    7. Properties of R_X -- 2

    8. Properties of R_X -- 3

    9. Properties of R_X -- 4

    10. What about reliability and IRT?
       - Person separation reliability (Wright & Stone, 1979)
       - Assuming an unbiased estimator of ability
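The slide's formula is not in the transcript; person separation reliability is commonly computed from the ability estimates and their standard errors as (observed variance minus mean squared standard error) divided by observed variance. A sketch in Python, with invented estimates and standard errors:

```python
import numpy as np

def person_separation_reliability(theta_hat, se):
    """Person separation reliability (Wright & Stone style).

    Treats the observed variance of the (assumed unbiased) ability
    estimates as true variance plus error variance, and subtracts the
    mean squared standard error to recover the true part.
    """
    observed_var = np.var(theta_hat, ddof=1)
    error_var = np.mean(np.square(se))
    return (observed_var - error_var) / observed_var

# Invented ability estimates (logits) and their standard errors.
theta_hat = np.array([-1.2, -0.4, 0.1, 0.6, 1.5])
se = np.array([0.4, 0.35, 0.3, 0.35, 0.45])
print(round(person_separation_reliability(theta_hat, se), 2))  # → 0.87
```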

    11. Some features of Person Separation Reliability

    12. Properties of Person Separation Reliability
       - Requires an unbiased ability estimate (Warm; JML?; not EAP)
       - Has all the properties of Spearman's definition
       - Implications:
         - Variance estimates are biased
         - Correlations are biased
         - Loss of precision in population parameters is hidden
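One of the listed implications, biased (attenuated) correlations, can be checked by simulation: under Spearman's model the correlation between two fallible measures equals the true correlation times the square root of the product of the two reliabilities. A sketch with invented parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Two true scores correlated at rho_true, each observed with error.
rho_true = 0.8
cov = [[1.0, rho_true], [rho_true, 1.0]]
Tx, Ty = rng.multivariate_normal([0.0, 0.0], cov, n).T
X = Tx + rng.normal(0, 0.5, n)   # reliability 1 / (1 + 0.25) = 0.8
Y = Ty + rng.normal(0, 1.0, n)   # reliability 1 / (1 + 1.00) = 0.5

r_x, r_y = 0.8, 0.5
observed_r = np.corrcoef(X, Y)[0, 1]
predicted_r = rho_true * np.sqrt(r_x * r_y)  # Spearman's attenuation formula
print(round(observed_r, 2), round(predicted_r, 2))  # both close to 0.51
```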

    13. How reliable can a test be?

    14. Measurement Error Design Effect
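The slide body is not in the transcript; a common way to express the measurement error design effect is the factor by which unreliability inflates the sampling variance of a group mean, namely 1/reliability (equivalently, the effective sample size is n times the reliability). A simulation sketch with invented parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 400, 5_000
true_var, error_var = 4.0, 1.0
reliability = true_var / (true_var + error_var)  # 0.8

# Sampling variance of the group mean, without and with measurement error.
means_true = rng.normal(0, np.sqrt(true_var), (reps, n)).mean(axis=1)
means_obs = rng.normal(0, np.sqrt(true_var + error_var), (reps, n)).mean(axis=1)

deff = means_obs.var() / means_true.var()
print(round(deff, 2))  # close to 1 / reliability = 1.25
```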

    15. Increasing the Accuracy of the Estimate of Group Means

    16. Reliability and Fit

    17. Reliability and Fit

    18. Summary so far
       - Person separation and classical reliability are analogous
       - Reliability doesn't describe the accuracy of individuals' measures
       - Reliability describes biases in population parameter estimates when they are based upon fallible measures
       - For many applications, unreliability can be compensated for by larger samples
       - Reliability doesn't depend on fit
       - Is reliability required for validity? I don't think so

    19. Reliability and Marginal IRT Models
       - Abilities are often not estimated; population parameters are estimated directly from the item responses
       - Reliability as the ratio of true to estimated variance is then meaningless
       - If abilities are estimated (EAP), they are biased: the observed variance is less than the latent variance, so reliability as the ratio of true to estimated variance is greater than one
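The EAP shrinkage described above can be seen in the conjugate normal-normal case: the posterior mean shrinks each observation toward the prior mean, so the variance of the EAPs falls below the latent variance and the "true over observed" ratio exceeds one. A sketch with invented parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

tau2, sigma2 = 1.0, 0.5  # latent (prior) variance, measurement error variance
theta = rng.normal(0, np.sqrt(tau2), n)            # latent abilities
x = theta + rng.normal(0, np.sqrt(sigma2), n)      # fallible observations

# Normal-normal EAP: the posterior mean shrinks x toward the prior mean (0).
shrink = tau2 / (tau2 + sigma2)
eap = shrink * x

# Latent variance over EAP variance: greater than one under shrinkage.
print(round(theta.var() / eap.var(), 2))  # close to 1 / shrink = 1.5
```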

    20. Expected A-Posteriori Predictions
       - The EAP is the mean of the posterior distribution
       - The variance of the posterior is used to represent uncertainty
       - EAPs can be viewed as predictions; the posterior variance is the uncertainty in that prediction

    21. EAP Reliability -- 1
       - Mislevy, Beaton, Kaplan and Sheehan (1992) argued that reliability can be viewed as the amount by which the measurement process has reduced uncertainty in the prediction of each individual's ability

    22. EAP Reliability -- 2
       - r_E is an individual-level reliability that describes how much we have improved the prediction of an individual's ability over assuming they were randomly sampled from the population with no item responses observed
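In the conjugate normal-normal case this uncertainty-reduction view has a closed form: the posterior variance is tau^2 * sigma^2 / (tau^2 + sigma^2), and one minus its ratio to the prior variance gives the reliability. A sketch with invented parameters:

```python
# EAP reliability as uncertainty reduction, in the conjugate normal-normal
# case: prior theta ~ N(0, tau2), observation x | theta ~ N(theta, sigma2).
tau2, sigma2 = 1.0, 0.5

prior_var = tau2
posterior_var = tau2 * sigma2 / (tau2 + sigma2)  # uncertainty after seeing x

# One minus the remaining fraction of the prior uncertainty.
r_eap = 1 - posterior_var / prior_var
print(round(r_eap, 3))  # → 0.667: two thirds of the uncertainty removed
```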

    23. EAP Reliability -- 3

    24. EAP Reliability -- 4 Adams (2005) shows that under the marginal model the variance in the direct estimate of the mean is:

    25. Properties of EAP Reliability
       - Shares all of the characteristics of person separation reliability
       - EAP reliability describes biases in population parameter estimates when they are based upon fallible measures
       - EAP reliability doesn't depend upon fit
       - EAP reliability doesn't describe the accuracy of individuals' measures; it describes how much a prediction has been improved
       - For many applications, unreliability can be compensated for by larger samples

    26. Reliability: What is it good for?
       - Evidence of fit to the IRT model?
       - Evidence of test validity?
       - Information about the accuracy of individuals' estimates?

    27. Measurement Error Design Effect

    28. Examples

    29. Reliability and Design Effect: The Functional Relationship

    30. Conclusion
       - Reliability is of limited importance:
         - It doesn't describe the accuracy of measurement of individuals
         - It doesn't indicate fit or validity
         - Unreliability can be compensated for by increased samples (if the analyses are done correctly; another story)
       - Perhaps most valuable as an indicator of the loss of precision due to the test design