## Reliability and Item Response Modelling: Myths, Observations and Applications


**1.** Reliability and Item Response Modelling: Myths, Observations and Applications
Raymond J. Adams
University of Melbourne

**2.** Overview
What is reliability?
Does reliability have a role in an IRT context?
Joint (JML), marginal (MML) and conditional (CML) maximum likelihood
How reliable can a test be?
Is high reliability important?

**3.** Classical Reliability
Idea introduced by Spearman (1904)
…accidental errors attenuated relations between observed scores
Reliability coefficient (Spearman, 1910)
The correlation between one half and the other half of several measures of the same thing
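To make the split-half idea concrete, here is a minimal Python sketch on simulated dichotomous item scores. The data-generating model (a logistic item response function with N(0, 1) abilities and evenly spaced difficulties), the odd/even split and the function names are illustrative assumptions. The step from the half-test correlation to full-test length uses the Spearman-Brown formula 2r / (1 + r).

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(scores):
    """scores: one list of 0/1 item scores per person.
    Correlates the odd-numbered and even-numbered half scores, then
    applies the Spearman-Brown step-up to estimate full-length reliability."""
    odd = [sum(row[0::2]) for row in scores]
    even = [sum(row[1::2]) for row in scores]
    r_half = pearson(odd, even)
    return r_half, 2 * r_half / (1 + r_half)


def simulate(n_persons=2000, n_items=20, seed=1):
    """Hypothetical data: logistic (Rasch-type) responses, N(0, 1) abilities."""
    rng = random.Random(seed)
    difficulties = [-2 + 4 * i / (n_items - 1) for i in range(n_items)]
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0, 1)
        data.append([int(rng.random() < 1 / (1 + math.exp(-(theta - b))))
                     for b in difficulties])
    return data


r_half, r_full = split_half_reliability(simulate())
print(f"half-test r = {r_half:.3f}, Spearman-Brown full-test = {r_full:.3f}")
```

The odd/even split is only one of many possible splits, which is part of why later formulae such as KR-20 and alpha were developed.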

**4.** Spearman's Definition
Under the classical model, the observed score is the true score plus error
Reliability is the true-score variance divided by the observed-score variance
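In symbols, writing $X$ for the observed score, $T$ for the true score and $E$ for the error (notation chosen here for illustration), and adopting the usual classical assumptions that the error has mean zero and is uncorrelated with the true score, Spearman's definition reads as follows. The last line assumes a parallel measure $X' = T + E'$ with the same variance and with errors uncorrelated with each other and with $T$. It shows why the correlation between halves in the previous slide estimates the same quantity.

```latex
\begin{aligned}
X &= T + E, \qquad \mathbb{E}[E] = 0, \quad \operatorname{Cov}(T, E) = 0 \\
\sigma^2_X &= \sigma^2_T + \sigma^2_E \\
R_X &= \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E} \\
\rho_{XX'} &= \frac{\operatorname{Cov}(T + E,\; T + E')}{\sigma_X \, \sigma_{X'}} = \frac{\sigma^2_T}{\sigma^2_X} = R_X
\end{aligned}
```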

**5.** An aside
Standard formulae are estimates of this ratio under certain assumptions:
Kuder-Richardson formula 20 (KR-20)
Cronbach's alpha
Both have the properties described in the following slides
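As a sketch of the two standard formulae, the Python below computes Cronbach's alpha from item and total-score variances, and KR-20 from item proportions correct. It assumes population (divide-by-n) variances are used consistently. Under that choice the two coincide for 0/1 items, because each item's variance is then exactly p(1 - p). The function names and the small data set are made up for illustration.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: one list of item scores per person (any item scale)."""
    k = len(scores[0])
    items = list(zip(*scores))                    # one tuple per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)


def kr20(scores):
    """KR-20 for dichotomous (0/1) items: item variance is p * q."""
    k = len(scores[0])
    n = len(scores)
    pq = 0.0
    for col in zip(*scores):
        p = sum(col) / n                          # proportion correct
        pq += p * (1 - p)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - pq / total_var)


data = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
print(round(cronbach_alpha(data), 4), round(kr20(data), 4))
```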

**6.** Properties of $R_X$ -- 1

**7.** Properties of $R_X$ -- 2

**8.** Properties of $R_X$ -- 3

**9.** Properties of $R_X$ -- 4

**10.** What about reliability and IRT?
Person separation reliability (Wright & Stone, 1979)
Assumes an unbiased estimator of ability
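A common way to compute person separation reliability in the Wright and Stone tradition is to subtract the mean squared standard error from the observed variance of the ability estimates, then divide by that observed variance. This mirrors Spearman's true-over-observed ratio. The sketch below assumes the estimates are unbiased, as the slide requires, and uses population variance. The function name and the logit values are hypothetical.

```python
from statistics import fmean, pvariance


def person_separation_reliability(estimates, std_errors):
    """(observed variance - mean error variance) / observed variance.

    estimates:  ability estimates in logits, assumed unbiased
    std_errors: the standard error of each estimate
    """
    observed_var = pvariance(estimates)
    error_var = fmean(se ** 2 for se in std_errors)
    return (observed_var - error_var) / observed_var


thetas = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.6]
ses = [0.45, 0.40, 0.38, 0.38, 0.41, 0.50]
print(round(person_separation_reliability(thetas, ses), 3))
```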

**11.** Some features of Person Separation Reliability

**12.** Properties of Person Separation Reliability
Requires an unbiased ability estimate
Warm's estimates: yes; JML: perhaps; EAP: no
Has all the properties of Spearman's definition
Implications:
Variance estimates are biased (inflated by the error variance)
Correlations are biased (attenuated)
Loss of precision in population parameters is hidden
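The first two implications follow in a few lines. Assume, for this illustration, that each estimate is $\hat\theta = \theta + e$ with error unbiased given ability. For the correlation, assume two such estimates whose errors are uncorrelated with each other. Write $R_i = \sigma^2_{\theta_i} / \sigma^2_{\hat\theta_i}$ for their reliabilities. Then the variance of the estimates overstates the latent variance, and their correlation understates the latent correlation:

```latex
\begin{aligned}
\hat\theta &= \theta + e, \qquad \mathbb{E}[e \mid \theta] = 0 \\
\sigma^2_{\hat\theta} &= \sigma^2_\theta + \overline{\sigma^2_e} \;\ge\; \sigma^2_\theta \\
\rho(\hat\theta_1, \hat\theta_2) &= \frac{\operatorname{Cov}(\theta_1, \theta_2)}{\sigma_{\hat\theta_1}\,\sigma_{\hat\theta_2}}
  = \rho(\theta_1, \theta_2)\,\sqrt{R_1 R_2}
\end{aligned}
```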

**13.** How reliable can a test be?

**14.** Measurement Error Design Effect

**15.** Increasing the Accuracy of the Estimate of Group Means

**16.** Reliability and Fit

**17.** Reliability and Fit

**18.** Summary so far
Person separation and classical reliability are analogous
Reliability doesn't describe the accuracy of individuals' measures
Reliability describes the biases in population parameter estimates that are based on fallible measures
For many applications, unreliability can be compensated for by larger samples
Reliability doesn't depend on fit
Is reliability required for validity? I don't think so
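The claim that larger samples can compensate for unreliability can be made concrete for the simplest case, the mean of a group. Under the classical model and simple random sampling, which are assumptions for this sketch, the sampling variance of the mean observed score is $\sigma^2_X / n = \sigma^2_T / (n R_X)$. A sample must therefore be inflated by a factor of $1/R_X$ to match the precision that error-free scores would give. The function name is hypothetical. Reading $1/R_X$ as the "measurement error design effect" of the earlier slides is an interpretation, not something the transcript states.

```python
import math


def sample_size_for_fallible_scores(n_error_free, reliability):
    """Persons needed so that the standard error of a mean of observed
    scores matches that of a mean of error-free scores from n_error_free
    persons, using Var(mean X) = sigma_T^2 / (n * R_X)."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return math.ceil(n_error_free / reliability)


for r in (0.95, 0.8, 0.6):
    print(r, sample_size_for_fallible_scores(1000, r))
```

This only covers the mean. Other population parameters, such as variances and correlations, are biased rather than merely less precise, and need the corrections shown elsewhere in the deck.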

**19.** Reliability and Marginal IRT Models
Abilities are often not estimated
Population parameters are estimated directly from the item responses
Reliability as the ratio of true to estimated variance is therefore meaningless
If abilities are estimated (EAP), they are biased
The variance of the EAP estimates is less than the latent variance
Reliability as the ratio of true to estimated variance is then greater than one
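The shrinkage described here can be seen in the simplest conjugate case. This sketch replaces the IRT posterior with a normal population distribution and a single normally distributed measurement per person, so that the EAP has a closed form. That simplification is an assumption for illustration only, and the variances and seed are arbitrary.

```python
import random
from statistics import pvariance


def eap_normal(x, prior_var, error_var):
    """Posterior mean of theta given x = theta + e, with
    theta ~ N(0, prior_var) and e ~ N(0, error_var)."""
    return prior_var / (prior_var + error_var) * x


def simulate_shrinkage(n=50_000, prior_var=1.0, error_var=0.5, seed=7):
    """Return (variance of true abilities, variance of their EAPs)."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, prior_var ** 0.5) for _ in range(n)]
    xs = [t + rng.gauss(0.0, error_var ** 0.5) for t in thetas]
    eaps = [eap_normal(x, prior_var, error_var) for x in xs]
    return pvariance(thetas), pvariance(eaps)


true_var, eap_var = simulate_shrinkage()
print(f"latent variance ~ {true_var:.3f}, EAP variance ~ {eap_var:.3f}")
print(f"true / estimated variance ~ {true_var / eap_var:.2f}")
```

With these settings the EAPs are shrunk by a factor of 2/3 toward zero. Their variance is therefore about two thirds of the latent variance, and the "reliability" ratio comes out near 1.5.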

**20.** Expected A Posteriori Predictions
The EAP is the mean of the posterior distribution
The variance of the posterior is used to represent uncertainty
EAPs can be viewed as predictions
The posterior variance is the uncertainty in that prediction
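Here is a minimal sketch of computing an EAP and its posterior variance for one response pattern. It assumes a Rasch model with known item difficulties and a normal population distribution, approximated on a fixed, equally spaced quadrature grid. The function name, grid settings and difficulties are illustrative choices, not taken from the presentation or from any particular software.

```python
import math


def eap_rasch(responses, difficulties, prior_sd=1.0, n_points=81, bound=5.0):
    """EAP and posterior variance of theta for one response pattern under
    a Rasch model with a N(0, prior_sd^2) prior, on an equally spaced grid
    spanning +/- bound prior standard deviations."""
    step = 2 * bound / (n_points - 1)
    grid = [prior_sd * (-bound + i * step) for i in range(n_points)]
    weights = []
    for theta in grid:
        # unnormalised prior density times the likelihood of the pattern
        w = math.exp(-0.5 * (theta / prior_sd) ** 2)
        for x, b in zip(responses, difficulties):
            p = 1.0 / (1.0 + math.exp(-(theta - b)))
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, var


difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]
for pattern in ([0, 0, 0, 0, 0], [1, 1, 0, 1, 0], [1, 1, 1, 1, 1]):
    eap, post_var = eap_rasch(pattern, difficulties)
    print(pattern, f"EAP = {eap:+.3f}", f"posterior SD = {math.sqrt(post_var):.3f}")
```

Extreme patterns (all wrong or all right) still get finite EAPs because the prior keeps the posterior proper. This is one reason EAPs are better read as predictions than as unbiased estimates.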

**21.** EAP Reliability -- 1
Mislevy, Beaton, Kaplan and Sheehan (1992) argued that reliability can be viewed as the amount by which the measurement process has reduced uncertainty in the prediction of each individual's ability

**22.** EAP Reliability -- 2
$r_E$ is an individual-level reliability: it expresses how much the prediction of this individual's ability has improved over a prediction that assumes they were randomly sampled from the population and that no item responses were observed
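One way to write this down is to take $r_E$ as one minus the ratio of posterior to population (prior) variance. In the special case sketched below, the population distribution is assumed to be $N(\mu, \sigma^2_\theta)$, and person $n$'s item responses are assumed to act like one normal observation $x_n = \theta_n + e_n$ with error variance $\sigma^2_e$. These are assumptions made here, not statements from the slides. Under them, $r_E$ is the same for everyone and equals Spearman's true-over-observed ratio for $x_n$. The EAPs have variance $r_E \sigma^2_\theta$, smaller than the latent variance.

```latex
\begin{aligned}
r_{E,n} &= 1 - \frac{\operatorname{Var}(\theta_n \mid x_n)}{\sigma^2_\theta} \\
\operatorname{Var}(\theta_n \mid x_n) &= \frac{\sigma^2_\theta \, \sigma^2_e}{\sigma^2_\theta + \sigma^2_e}
  \quad\Rightarrow\quad
  r_E = 1 - \frac{\sigma^2_e}{\sigma^2_\theta + \sigma^2_e} = \frac{\sigma^2_\theta}{\sigma^2_\theta + \sigma^2_e} \\
\operatorname{EAP}_n &= \mu + r_E \, (x_n - \mu)
  \quad\Rightarrow\quad
  \operatorname{Var}(\operatorname{EAP}_n) = r_E^2 \, (\sigma^2_\theta + \sigma^2_e) = r_E \, \sigma^2_\theta
\end{aligned}
```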

**23.** EAP Reliability -- 3

**24.** EAP Reliability -- 4
Adams (2005) shows that under the marginal model the variance in the direct estimate of the mean is:

**25.** Properties of EAP Reliability
Shares all of the characteristics of person separation reliability
EAP reliability describes the biases in population parameter estimates that are based on fallible measures
EAP reliability doesn't depend on fit
EAP reliability doesn't describe the accuracy of individuals' measures; it describes how much a prediction has been improved
For many applications, unreliability can be compensated for by larger samples

**26.** Reliability: What is it good for?
Evidence of fit to the IRT model?
Evidence of test validity?
Information about the accuracy of individuals' estimates?

**27.** Measurement Error Design Effect

**28.** Examples

**29.** Reliability and Design Effect: The Functional Relationship

**30.** Conclusion
Limited importance of reliability:
Doesn't describe the accuracy of measurement of individuals
Doesn't indicate fit or validity
Can be compensated for by larger samples (if the analyses are done correctly, which is another story)
Perhaps most valuable as an indicator of the loss of precision due to the test design