
Statistical Considerations for Educational Screening & Diagnostic Assessments

Statistical Considerations for Educational Screening & Diagnostic Assessments. A discussion of methodological applications that have long existed in the literature and are used in other disciplines, but are only now emerging in education. Yaacov Petscher, Ph.D.


Presentation Transcript


  1. Statistical Considerations for Educational Screening & Diagnostic Assessments A discussion of methodological applications that have long existed in the literature and are used in other disciplines, but are only now emerging in education Yaacov Petscher, Ph.D. Florida Center for Reading Research Florida State University

  2. Discussion Points • Assessment Assumptions • Contexts of Assessments • Statistical Considerations • Reliability • Validity • Benchmarking • “Disclaimer” • Focusing on Breadth not Depth • Based on applied contract and grant research • One slide of equations

  3. Assumptions of Assessment - Researchers • Constructs exist but we can’t see them • Constructs can be measured • Although we can measure constructs, our measurement is not perfect • There are different ways to measure any given construct • All assessment procedures have strengths and limitations

  4. Assumptions of Assessment - Practitioner • Multiple sources of information should be part of the assessment process • Performance on tests can be generalized to non-test behaviors • Assessment can provide information that helps educators make better educational decisions • Assessment can be conducted in a fair manner • Testing and assessment can benefit our educational institutions and society as a whole

  5. Contexts of Assessments • Instructional • Formative • Interim • Summative • Research • Individual Differences • Group Differences (RCT) • Growth • Legislative Initiatives • NCLB • Reading First • Race to the Top • Common Core

  6. Common Core Adoption

  7. PARCC

  8. Smarter Balanced

  9. Within Common Core • USDOE • PARCC Assessments • Smarter Balanced Assessments • Reading for Understanding Assessments • I3 Assessments • Private Sector

  10. Underlying “Code” of Assumptions • Researcher: Constructs exist but we can’t see them; constructs can be measured; although we can measure constructs, our measurement is not perfect; there are different ways to measure any given construct; all assessment procedures have strengths and limitations • Practitioner: Multiple sources of information should be part of the assessment process; performance on tests can be generalized to non-test behaviors; assessment can provide information that helps educators make better educational decisions; assessment can be conducted in a fair manner; testing and assessment can benefit our educational institutions and society as a whole

  11. Statistical Considerations - Reliability • Stability, accuracy, or consistency of test scores • Many types • Internal consistency • Retest • Parallel-form • Split-half • Should not be viewed as interchangeable • One could have very high stability but very poor internal consistency • Date of Birth/Height/SSN
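Because these reliability types answer different questions, they can disagree on the same data. Below is a minimal sketch, in Python, of two of them computed on simulated dichotomous item responses; all names and values are illustrative, not from the talk.

```python
# Internal consistency vs. split-half reliability on one simulated
# item-response matrix (illustrative data, not from the presentation).
import numpy as np

rng = np.random.default_rng(1)
ability = rng.normal(size=500)                     # examinee abilities
difficulty = np.linspace(-1.5, 1.5, 20)            # 20 item difficulties
p = 1 / (1 + np.exp(-(ability[:, None] - difficulty)))
X = (rng.uniform(size=p.shape) < p).astype(float)  # 500 x 20 scored responses

def cronbach_alpha(X):
    """Internal consistency: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = X.shape[1]
    return k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum() / X.sum(axis=1).var(ddof=1))

def split_half(X):
    """Odd-even split-half correlation, stepped up with the Spearman-Brown formula."""
    r = np.corrcoef(X[:, ::2].sum(axis=1), X[:, 1::2].sum(axis=1))[0, 1]
    return 2 * r / (1 + r)

print(f"alpha = {cronbach_alpha(X):.3f}, split-half = {split_half(X):.3f}")
```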

  12. Statistical Considerations - Reliability • Most frequently used framework is classical test theory: X = T + e (observed score = true score + error) • What does this assume?
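A minimal simulation of that decomposition makes the assumptions concrete: under classical test theory, errors have mean zero, are uncorrelated with true scores, and have the same variance for every examinee, so reliability is the share of observed-score variance due to true scores. All numbers below are illustrative.

```python
# CTT decomposition X = T + e on simulated scores (illustrative values).
import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(50, 10, size=10_000)  # true scores
e = rng.normal(0, 5, size=10_000)    # error: mean 0, independent of T,
                                     # and the SAME variance for everyone
X = T + e                            # observed scores

reliability = T.var() / X.var()      # var(T) / (var(T) + var(e))
print(f"reliability ~ {reliability:.3f}")  # ~ 100 / 125 = .80
```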

  13. Benefits of IRT • Puts persons and items on the same scale • CTT looks at total score by p-value (difficulty) • Can result in shorter tests • CTT reliability increases with more items • Can estimate the precision of scores at the individual level • CTT assumes error is the same for all examinees
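For concreteness, here is a sketch of the 2PL item response function behind these claims: person ability (theta) and item difficulty (b) share one logit scale, so the success probability depends on their difference. Parameter values are hypothetical.

```python
# 2PL item response function: persons and items on the same scale.
import numpy as np

def p_correct(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1 / (1 + np.exp(-a * (theta - b)))

theta = np.array([-2.0, 0.0, 2.0])     # low / average / high ability
print(p_correct(theta, a=1.2, b=0.5))  # one item, three examinees
```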

  14. Item Difficulty by Total Score Decile Groups

  15. Item Difficulty by Ability

  16. Items Don’t Always Do What We Want

  17. Item Information

  18. Test Information – Standard Error
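The figures themselves are not reproduced in the transcript, but the relationship they illustrate can be sketched: under the 2PL, item information is I(theta) = a^2 * P * (1 - P), test information sums the item informations, and SE(theta) = 1 / sqrt(test information), so measurement error varies across the ability range instead of being constant as in CTT. The parameters below are hypothetical.

```python
# Test information and ability-specific standard errors under the 2PL.
import numpy as np

def item_information(theta, a, b):
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

a = np.array([1.0, 1.5, 0.8, 2.0])   # discriminations (hypothetical)
b = np.array([-1.0, 0.0, 0.5, 1.0])  # difficulties (hypothetical)

for theta in (-2.0, 0.0, 2.0):
    info = item_information(theta, a, b).sum()  # test information at theta
    print(f"theta={theta:+.1f}  info={info:.2f}  SE={1 / np.sqrt(info):.2f}")
```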

  19. Precision/Reliability

  20. Statistical Considerations - Reliability • While precision improves on the idea of reliability, can precision itself be improved further? • Account for context effects (Wainer et al., 2000) • Petscher & Foorman, 2011 • Account for time (Verhelst, Verstralen, & Jansen, 1997) • Prindle, Petscher, & Mitchell, 2013

  21. Statistical Considerations - Reliability • Context effects • Any influence or interpretation that an item may acquire as a result of its relationship to other items • A greater problem in CAT, where each examinee receives a unique set of items • Emerges as both an item-level and a passage-level problem

  22. Statistical Considerations - Reliability Common stimulus

  23. Statistical Considerations - Reliability “If several questions within a test are experimentally linked so that the reaction to one question influences the reaction to another, the entire group of questions should be treated preferably as an ‘item’ when the data arising from application of split-half or appropriate analysis-of-variance methods are reported in the test manual” APA Standards of Educational and Psychological Testing (1966)

  24. Expressed in IRT
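The slide’s equation is not reproduced in the transcript. One common way to express the APA recommendation in IRT, following the testlet approach of Wainer et al. cited above, is to add a person-by-passage effect to the 2PL so that items sharing a passage are no longer conditionally independent given ability alone. The sketch below is illustrative, not the exact model from the talk.

```python
# Testlet-augmented 2PL: a person-specific effect (gamma) for each passage
# absorbs the dependency among items that share a common stimulus.
import numpy as np

def p_correct_testlet(theta, gamma_d, a, b):
    """2PL with a testlet effect for the passage the item belongs to."""
    return 1 / (1 + np.exp(-a * (theta - b - gamma_d)))

theta = 0.5                                  # person ability
gamma = {"passage1": 0.4, "passage2": -0.3}  # hypothetical person-by-passage effects
print(p_correct_testlet(theta, gamma["passage1"], a=1.2, b=0.0))
print(p_correct_testlet(theta, gamma["passage2"], a=1.2, b=0.0))
```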

  25. Study 1: Reading Comprehension in Florida

  26. Precision – After 3 passages

  27. FAIR Technical Manual

  28. Simulations are all well and good… How does accounting for item dependency improve testing in the real world?

  29. RCT • N ≈ 800, randomly assigned to testing condition • Control was current 2PL scoring • Experimental was unrestricted bi-factor • Evaluate • Precision • # of passages • Prediction to state achievement

  30. What this suggests • “Newer” models help us model the data more appropriately • Precision/reliability improve just by modeling the context effect • The efficiency and precision of a computer-adaptive test improve by modeling the item dependency

  31. Study 2: Morphology CAT

  32. Accounting for Time • Somewhat similar to the item dependency model • IRT models are concerned with accuracy • What about fluency? • CBM (DIBELS, AIMSweb, easyCBM) • Brief assessments (TOWRE, TOSREC, etc.) • Prindle, Petscher, & Mitchell (2013) • N = 200 • Word knowledge test • Limited to 60 sec • Compared a 1PL model with a 1PL response-time model

  33. Results • 1PL marginal α = .80 • 1PL-RT marginal α = .87
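For context, marginal reliability summarizes score-specific IRT precision in a single alpha-like number. One common form divides ability variance by ability variance plus the mean squared standard error; the sketch below uses hypothetical numbers, not the study’s data.

```python
# One common marginal reliability estimate from IRT scoring output.
import numpy as np

rng = np.random.default_rng(2)
theta_hat = rng.normal(0, 1, size=200)  # estimated abilities (hypothetical)
se = rng.uniform(0.3, 0.5, size=200)    # score-specific standard errors

err_var = np.mean(se**2)                # average error variance
rel = theta_hat.var(ddof=1) / (theta_hat.var(ddof=1) + err_var)
print(f"marginal reliability ~ {rel:.2f}")
```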

  34. What this suggests • Accounting for response time of items can improve precision for most participants • Limitations • More difficult to do with younger children • Requires computer delivery to record accuracy and time • Cannot be done with connected text

  35. Validity

  36. Statistical Considerations – Factor Validity • Assessments are measures of hypothetical constructs • Assessments are measured with error • Use a latent variable to leverage the common variance • How is this modeled? • Unidimensional • Multidimensional • Three illustrations • Petscher & Foorman, 2012 (Syntactic Awareness) • Kieffer & Petscher, 2013 (Morphology/Vocabulary) • Justice, Petscher, & Pentimonti, 2013 (Early Literacy)
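As a concrete, simulated illustration of leveraging common variance with a latent variable, the sketch below fits a unidimensional factor model with scikit-learn’s FactorAnalysis. This is a stand-in for exposition only, not the SEM software or data used in the cited studies.

```python
# Unidimensional factor model: four error-prone indicators of one construct.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
f = rng.normal(size=(300, 1))                # one latent construct
loadings = np.array([[0.8, 0.7, 0.9, 0.6]])  # hypothetical loadings
X = f @ loadings + rng.normal(0, 0.5, size=(300, 4))

fa = FactorAnalysis(n_components=1).fit(X)
print(fa.components_)       # estimated loadings on the single factor
scores = fa.transform(X)    # factor scores, one per examinee
```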

  37. Study 1: Syntactic Awareness

  38. Distribution of Ability

  39. Precision (reliability) of Ability Scores

  40. Predictive Validity of Factor Scores

  41. Study 2: Morphological Awareness/Vocabulary

  42. Morphological Awareness (MA) predicts Reading Comprehension (RC) • We have long known that MA is correlated with reading comprehension (e.g., Carlisle, 2000; Freyd & Baron, 1982; Tyler & Nagy, 1990) • Path diagram: MA → RC
