
Making Sense of Data from Complex Assessments


Presentation Transcript


  1. Making Sense of Data from Complex Assessments Robert J. Mislevy, University of Maryland; Linda S. Steinberg & Russell G. Almond, Educational Testing Service. FERA, November 6, 2001

  2. Buzz Hunt, 1986: How much can testing gain from modern cognitive psychology? So long as testing is viewed as something that takes place in a few hours, out of the context of instruction, and for the purpose of predicting a vaguely stated criterion, then the gains to be made are minimal.

  3. Opportunities for Impact
  • Informal / local use
  • Conceptual design frameworks (e.g., Grant Wiggins, CRESST)
  • Toolkits & building blocks (e.g., Assessment Wizard, IMMEX)
  • Building structures into products (e.g., HYDRIVE, Mavis Beacon)
  • Building structures into programs (e.g., AP Studio Art, DISC)

  4. For further information, see... www.education.umd.edu/EDMS/mislevy/

  5. Don Melnick, NBME: “It is amazing to me how many complex ‘testing’ simulation systems have been developed in the last decade, each without a scoring system. “The NBME has consistently found the challenges in the development of innovative testing methods to lie primarily in the scoring arena.”

  6. The DISC Project
  • The Dental Interactive Simulations Corporation (DISC)
  • The DISC Simulator
  • The DISC Scoring Engine
  • Evidence-Centered Assessment Design
  • The Cognitive Task Analysis (CTA)

  7. Evidence-centered assessment design: The three basic models

  8. Evidence-centered assessment design • What complex of knowledge, skills, or other attributes should be assessed? (Messick, 1992)

  9. Evidence-centered assessment design Student Model Variables • What complex of knowledge, skills, or other attributes should be assessed? (Messick, 1992)

  10. Evidence-centered assessment design • What behaviors or performances should reveal those constructs?

  11. Evidence-centered assessment design • What behaviors or performances should reveal those constructs? Work product

  12. Evidence-centered assessment design • What behaviors or performances should reveal those constructs? Observable variables Work product

  13. Evidence-centered assessment design • What behaviors or performances should reveal those constructs? Observable variables

  14. Evidence-centered assessment design Student Model Variables • What behaviors or performances should reveal those constructs? Observable variables

  15. Evidence-centered assessment design • What tasks or situations should elicit those behaviors?

  16. Evidence-centered assessment design Stimulus Specifications • What tasks or situations should elicit those behaviors?

  17. Evidence-centered assessment design Work Product Specifications • What tasks or situations should elicit those behaviors?

  18. Implications for Student Model SM variables should be consistent with…
  • The results of the CTA.
  • The purpose of assessment: What aspects of skill and knowledge should be used to accumulate evidence across tasks, for pass/fail reporting and finer-grained feedback?

  19. Simplified Version of the DISC Student Model

  20. Implications for Evidence Models
  • The CTA produced ‘performance features’ that characterize recurring patterns of behavior and differentiate levels of expertise.
  • These features ground generally defined, re-usable ‘observable variables’ in evidence models.
  • We defined re-usable evidence models for recurring scenarios, for use with many tasks.

  21. An Evidence Model

  22. Evidence Models: Statistical Submodel
  • What’s constant across cases that use the EM:
    • Student-model parents.
    • Identification of observable variables.
    • Structure of conditional probability relationships between SM parents and observable children.
  • What’s tailored to particular cases:
    • Values of conditional probabilities.
    • Specific meaning of observables.
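
To make the constant/tailored split concrete, here is a minimal sketch in Python. The class name, variable names, states, and all probability values are illustrative assumptions, not DISC's actual statistical submodel.

```python
from dataclasses import dataclass

@dataclass
class EvidenceModelFragment:
    # Constant across cases that use this evidence model: which student-model
    # parent feeds which observable, i.e., the structure of the CPT.
    sm_parent: str      # student-model parent variable
    observable: str     # observable (child) variable
    # Tailored to a particular case: the conditional probability values.
    cpt: dict           # cpt[parent_state][observable_value] = probability

# A case-specific instantiation (hypothetical names and numbers):
fragment = EvidenceModelFragment(
    sm_parent="Proficiency",      # states: Expert / Competent / Novice
    observable="FindingQuality",  # values: good / bad
    cpt={
        "Expert":    {"good": 0.85, "bad": 0.15},
        "Competent": {"good": 0.60, "bad": 0.40},
        "Novice":    {"good": 0.35, "bad": 0.65},
    },
)
```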

  23. Evidence Models: Evaluation Submodel
  • What’s constant across cases:
    • Identification and formal definition of observable variables.
    • Generally stated “proto-rules” for evaluating their values.
  • What’s tailored to particular cases:
    • Case-specific rules for evaluating the values of observables: instantiations of proto-rules, tailored to the specifics of the case.
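
The same split can be sketched for the evaluation submodel: a proto-rule is a generally stated function of the work product, and each case instantiates it by supplying the case's specifics. Everything below, including the example findings, is hypothetical rather than taken from a DISC case.

```python
def adequacy_of_findings(reported, essential):
    """Proto-rule, constant across cases: compare the findings an examinee
    reported against the findings essential to this case, and return a value
    of the observable variable: 'All', 'Some', or 'None'."""
    found = set(reported) & set(essential)
    if found == set(essential):
        return "All"
    return "Some" if found else "None"

# Case-specific instantiation: the essential findings for one virtual patient.
essential_findings = {"periodontal pocketing", "radiographic bone loss"}

print(adequacy_of_findings({"radiographic bone loss"}, essential_findings))
# -> 'Some'
```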

  24. “Docking” an Evidence Model (diagram: Student Model, Evidence Model)

  25. “Docking” an Evidence Model (diagram: Student Model, Evidence Model)

  26. Initial Status: All .33, Some .33, None .33; Expert .28, Competent .43, Novice .28

  27. Status after four ‘good’ findings: All 1.00, Some .00, None .00; Expert .39, Competent .51, Novice .11

  28. Status after one ‘good’ and three ‘bad’ findings: All .00, Some .00, None 1.00; Expert .15, Competent .54, Novice .30
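
Slides 26-28 show the scoring engine's Bayes-net beliefs being revised as findings arrive. The sketch below reproduces the mechanics (weight the current belief by the likelihood of each finding, then renormalize); the likelihoods are hypothetical placeholders, so it illustrates the direction of the updates on the slides rather than their exact numbers, which come from the DISC network's own conditional probabilities.

```python
prior = {"Expert": 0.28, "Competent": 0.43, "Novice": 0.28}  # slide 26

# Assumed P(finding = 'good' | proficiency); 'bad' is the complement.
p_good = {"Expert": 0.85, "Competent": 0.60, "Novice": 0.35}

def update(belief, finding):
    """One Bayes-rule step: weight each state by the finding's likelihood,
    then renormalize so the posterior sums to 1."""
    post = {s: p * (p_good[s] if finding == "good" else 1.0 - p_good[s])
            for s, p in belief.items()}
    total = sum(post.values())
    return {s: p / total for s, p in post.items()}

belief = dict(prior)
for finding in ["good", "good", "good", "good"]:  # cf. slide 27
    belief = update(belief, finding)
print({s: round(p, 2) for s, p in belief.items()})  # belief shifts toward Expert
```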

  29. “Docking” another Evidence Model (diagram: Student Model, Evidence Model)

  30. “Docking” another Evidence Model (diagram: Student Model, Evidence Model)

  31. Implications for Task Models Task models are schemas for phases of cases, constructed around key features that...
  • the simulator needs for its virtual-patient data base,
  • we need to evoke specified aspects of skill/knowledge,
  • affect the difficulty of tasks,
  • we need to assemble tasks into tests.
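
As a rough illustration of "task model as schema," the sketch below encodes the four kinds of key features as fields of one record; every name and value is a hypothetical stand-in, not the DISC task-model specification.

```python
from dataclasses import dataclass

@dataclass
class TaskModel:
    phase: str                 # phase of the patient-interaction cycle
    patient_data: dict         # what the simulator needs for its virtual-patient data base
    targeted_skills: list      # features meant to evoke specified aspects of skill/knowledge
    difficulty_features: dict  # features of the task that affect difficulty
    assembly_features: dict    # features used to assemble tasks into tests

# One task instantiated from the schema (all values hypothetical):
task = TaskModel(
    phase="initial assessment",
    patient_data={"age": 54, "chief_complaint": "bleeding gums"},
    targeted_skills=["information gathering", "hypothesis formation"],
    difficulty_features={"atypical_presentation": True},
    assembly_features={"content_area": "periodontics", "time_minutes": 20},
)
```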

  32. Implications for Simulator Once we’ve determined the kind of evidence we need about targeted knowledge, how must we construct the simulator to provide the data we need?
  • Nature of problems:
    • Distinguish phases in the patient interaction cycle.
    • Use typical forms of information & control availability.
    • Dynamic patient condition & cross-time cases.
  • Nature of affordances: Examinees must be able to...
    • seek and gather data,
    • indicate hypotheses,
    • justify hypotheses with respect to cues,
    • justify actions with respect to hypotheses.

  33. Payoff
  • Re-usable student model
    • Can project to overall score for licensing
    • Supports mid-level feedback as well
  • Re-usable evidence and task models
    • Can write indefinitely many unique cases using schemas
    • Framework for writing case-specific evaluation rules
  • Machinery can generalize to other uses & domains

  34. Part 2 Conclusion Two ways to “score” complex assessments:
  • THE HARD WAY: Ask ‘How do you score it?’ after you’ve built the assessment and scripted the tasks or scenarios.
  • A DIFFERENT HARD, BUT MORE LIKELY TO WORK, WAY: Design the assessment and the tasks/scenarios around what you want to make inferences about, what you need to see to ground them, and the structure of the interrelationships.

  35. Grand Conclusion We can attack new assessment challenges by working from generative principles:
  • Principles from measurement and evidentiary reasoning, coordinated with...
  • inferences framed in terms of current and continually evolving psychology,
  • using current and continually evolving technologies to help gather and evaluate data in that light,
  • in a coherent assessment design framework.
