A “Sweet Approach” to Understanding Basic Principles of Educational Measurement

kcalhoun

Presentation Transcript


  1. A “Sweet Approach” to Understanding Basic Principles of Educational Measurement: Assessing the Performance of Chocolates

  2. Objectives • At the end of instruction, participants will: • Describe sources of error that threaten the reliability and validity of performance assessment measures • List specific strategies to address these threats • Define and appropriately employ performance measurement terminology • Anchors, Likert, Horns and Halo Effects • Develop and test a performance-based measurement instrument • Construct scale • Train raters in using it effectively • Assess validity and reliability of measures • Identify/explain sources of error

  3. Lesson Plan: Set the task • Judge at the Wisconsin State Fair for “Open Class” Commercial Chocolates • Develop key factors to rate chocolates • Develop the rating scale • Train other “judges” • Taste chocolates and rate • Overview Key Measurement Principles • Goal: to identify sources of error and strategies to control them • Step-by-step approach to the task parallels the process in educational measurement

  4. Timeline • Introduction to measurement 20 min • Development of criteria 20 min • Develop scale 10 min • Train raters 10 min • Sample and rate chocolates 20 min • Identify sources of error 10 min

  5. Underlying Assumption of Performance-Based Measures • An individual’s observed performance/score is a combination of: • True Score + Errors of Measurement • Random • Controllable • All measurement seeks to control errors so that the measured score = true score • Does OBSERVED score = TRUE score?
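The model on this slide can be made concrete with a small simulation. The sketch below is only an illustration: the true score, the rater bias of +0.5 and the noise level are invented values, not data from the workshop.

```python
import random

def observed_score(true_score, systematic_error, random_sd, rng):
    """Observed = true score + controllable (systematic) error + random error."""
    return true_score + systematic_error + rng.gauss(0.0, random_sd)

rng = random.Random(0)   # fixed seed so the run is repeatable
true_score = 4.0         # hypothetical "true" quality of one chocolate
lenient_bias = 0.5       # hypothetical controllable error: a lenient rater

ratings = [observed_score(true_score, lenient_bias, 0.3, rng) for _ in range(1000)]
mean_rating = sum(ratings) / len(ratings)

# Random error averages out over many ratings; the systematic bias does not.
print(round(mean_rating - true_score, 2))
```

Averaging more ratings shrinks the random part, but the gap between the mean rating and the true score stays close to the bias, which is why the controllable errors need their own strategies.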

  6. Familiar EBM terminology: • Validity: • Relevance: does the measure actually reflect the variable of interest? • Appropriateness: relevant to purpose of the study • Meaningfulness: measure reflects variable of interest • Usefulness: aids decision-making • Accuracy: is the measurement free from error? • Random error: who takes the test • Systematic error: bias

  7. Principles of Measurement: Validity – What are you measuring?

  8. Principles of Measurement: Common Types of Validity Evidence • 1. Content-related evidence: “Looks like a duck, sounds like a duck = duck” • Appropriateness: items logically get at the intended performance • Expert review (face validity) • Representative sample from content domain (objectives)

  9. Principles of Measurement: 3 Types of Validity Evidence • 2. Criterion-related evidence: “Sugar content of the grapes = best wine” • Relationship between this measure and others • Predictive: MCAT and medical school • Concurrent: comparison to a gold standard • Often expressed as: • Correlation coefficients for continuous variables • Cross-tabulations for dichotomous variables
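The two ways of expressing criterion-related evidence named on this slide can be sketched in a few lines. The scores and pass/fail decisions below are hypothetical, and the helper names `pearson_r` and `cross_tab` are introduced only for this illustration.

```python
from collections import Counter
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def cross_tab(new_measure, gold_standard):
    """Count each (new measure, gold standard) pair of dichotomous outcomes."""
    return Counter(zip(new_measure, gold_standard))

# Continuous case: hypothetical scores on a new scale vs. a gold standard.
new_scores = [1, 2, 3, 4, 5]
gold_scores = [2, 1, 4, 3, 5]
r = pearson_r(new_scores, gold_scores)

# Dichotomous case: hypothetical pass/fail decisions from the two measures.
new_pass = [True, True, False, False, True]
gold_pass = [True, False, False, False, True]
table = cross_tab(new_pass, gold_pass)
print(round(r, 2), dict(table))
```

In the cross-tabulation, the cells where the two measures disagree, such as `(True, False)`, are the ones that weaken the concurrent evidence.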

  10. Principles of Measurement: 3 Types of Validity Evidence • 3. Construct-related evidence • Psychological construct or characteristic where a gold standard does not exist • Ex. Medicine: Self-efficacy → medication adherence • Ex. Educ.: Body/Kinesthetic IQ → Suturing Skills • Test relationship between performance and a theoretical model • Multiple regression, correlation, factor analysis

  11. Validity: Accuracy and Reliability. Examples of Error • Random error: things that you cannot control • Subject variability • Snow storm delays • Controllable error: things you CAN control • Instrument variability • Observer variability (intra- vs. inter-observer) • Halos/horns

  12. Principles of Measurement: Reliability • Defined: consistency of the scores obtained • at one time • over time • Which color is: • Not reliable • Reliable but not valid • Reliable and valid?

  13. Controlling Error in Performance Measures • Subject: motivation, energy, anxiety; location; maturation; history; regression (high/low) • Rater: characteristics (age, gender, ethnicity); biases; halo/horns; fatigue • Instrument: scales (normative, criterion); length; inadequate instructions; poor formatting; illogical order; vague terminology; responses that fail to fit question/scale; items favor one group over another

  14. KEY: Always Think of the 4 Common Categories of ERRORS • Instrument • Raters • Design/Administration • Subjects

  15. Strategies to Control Errors • Standardize conditions (location, instrument, attitude) • Clear, specific descriptors of desired behaviors • Trained raters/proctors • How and under what conditions data are collected • Obtain more information on subjects • relevant characteristics (e.g., do they like chocolate?) • Obtain more information on details • location, instrumentation, history, subject attitude • Appropriate design

  16. AND NOW . . . • You have been selected to create the measurement tool with which to judge . . . WISCONSIN’s BEST CHOCOLATE

  17. Review Your Tasks: Chocolate Judge • List criteria indicative of “best chocolate” • Develop a Likert-scale rating item, with descriptive anchors, for each criterion • Train other raters to use your item • Pilot all items on samples of chocolate • Identify potential sources of error • Examine reliability of ratings • Which errors contributed, and which could be controlled?

  18. A few words about scales . . . • Summated rating scale (Likert): • Length of scale = accuracy with which raters can make decisions • Assumes equal intervals between decision points (interval scale) • Gap between Excellent and Good = gap between Good and Satisfactory • So you need to provide scale anchors to inform raters of your intent (control error)
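A summated rating of the kind described on this slide can be sketched as follows. This is an illustration under stated assumptions: the 5-point scale, the two lowest anchor labels and the three criteria are hypothetical, since the slide names only "Excellent", "Good" and "Satisfactory".

```python
# Hypothetical 5-point scale; only "Excellent", "Good" and "Satisfactory"
# appear on the slide, the two lowest anchors are assumed for illustration.
ANCHORS = {5: "Excellent", 4: "Good", 3: "Satisfactory", 2: "Fair", 1: "Poor"}

def summated_score(item_ratings):
    """Add up one rater's item ratings for one chocolate.

    Summing treats the gap between any two adjacent anchors as equal,
    which is the assumption the slide points out.
    """
    for criterion, rating in item_ratings.items():
        if rating not in ANCHORS:
            raise ValueError(f"{criterion}: {rating!r} is not a point on the scale")
    return sum(item_ratings.values())

# Hypothetical criteria for a single chocolate sample.
sample_ratings = {"texture": 4, "aroma": 5, "appearance": 3}
total = summated_score(sample_ratings)
print(total, "out of", max(ANCHORS) * len(sample_ratings))
```

Rejecting off-scale values at entry is one small way to control instrument error before any scores are summed.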

  19. A few words about scale assumptions . . . • Normative: Compared/relative to other learners • Standard score or mean score (USMLE Steps) • Compared to other chocolates in the group, it’s “average” or “above average” • Criterion-based: Compared to a “gold standard” or a pre-established minimum threshold • 80% correct on examination • Godiva? Smooth like silk or granular like sand
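The difference between the two scale assumptions shows up when the same raw score is interpreted both ways. The sketch below uses hypothetical exam percentages and the 80% threshold mentioned on the slide; the function names are made up for this example.

```python
from statistics import mean, pstdev

def normative_score(score, group_scores):
    """Standard (z) score: where a score sits relative to the rest of the group.

    Assumes the group scores are not all identical (spread must be non-zero).
    """
    return (score - mean(group_scores)) / pstdev(group_scores)

def meets_criterion(score, threshold):
    """Criterion-based: compare to a pre-established threshold, ignoring the group."""
    return score >= threshold

group = [60, 70, 80, 90, 100]   # hypothetical percent-correct scores
for score in (70, 90):
    print(score, round(normative_score(score, group), 2), meets_criterion(score, 80))
```

A normative interpretation changes whenever the comparison group changes, while the criterion-based decision depends only on the fixed threshold.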

  20. STEP 1: Develop Valid Indicators of Best Chocolate

  21. Chocolate Performance Criteria

  22. STEP 2: Developing Rating Scale

  23. Step 3: Train Your “Raters”

  24. Step 4: Sampling

  25. Step 5: Sources of Error True Score ≠ Observed Score • Turn rating sheets in now • Note on the Sources of Error Worksheet any factors that would affect reliability (consistency) of your ratings • Goal: to identify variance due to true variability (in chocolate) • + Variance due to raters + variance due to instrumentation + variance due to administration + etc.
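One way to see how much of the spread in a rating sheet comes from the chocolates and how much from the raters is to split the total sum of squares of a raters-by-chocolates table. The sketch below swaps in a standard two-way layout without replication, which is not necessarily how the workshop analysed its sheets, and the ratings are hypothetical. The residual lumps together instrument, administration and random error, because those cannot be separated from this table alone.

```python
def decompose_variation(ratings):
    """Split the total sum of squares for a raters x chocolates table.

    ratings[r][c] is rater r's score for chocolate c (no missing cells).
    Returns sums of squares for chocolates, raters, and what is left over.
    """
    n_raters = len(ratings)
    n_items = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n_raters * n_items)

    item_means = [sum(row[c] for row in ratings) / n_raters for c in range(n_items)]
    rater_means = [sum(row) / n_items for row in ratings]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_items = n_raters * sum((m - grand) ** 2 for m in item_means)
    ss_raters = n_items * sum((m - grand) ** 2 for m in rater_means)
    return {
        "chocolates": ss_items,
        "raters": ss_raters,
        "residual": ss_total - ss_items - ss_raters,
    }

# Hypothetical sheet: 3 raters (rows) scoring 3 chocolates (columns).
sheet = [[1, 2, 3],
         [2, 3, 4],
         [3, 4, 5]]
parts = decompose_variation(sheet)
print(parts)
```

In this made-up sheet the raters differ only in overall leniency, so the rater component is as large as the chocolate component even though every rater orders the chocolates the same way.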

  26. Potential Sources of Error • . • . • . • . • . • . • . • . Sources of Error: Instrument, Raters, Design/Admin, Subjects (chocolate)

  27. Step 6: Review of Scores • Within individual variance • all “high” • Between individuals • Halo/Horns
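A quick screen for the pattern this slide describes, a rater whose scores barely move across samples, is to compare each rater's mean and spread. The judges, ratings and the `min_spread` cut-off below are all hypothetical; a narrow spread is only a prompt to look closer, since the samples may truly be similar.

```python
from statistics import mean, pstdev

def rater_profiles(ratings_by_rater):
    """Return {rater: (mean rating, spread)} across all samples a rater scored."""
    return {
        rater: (mean(scores), pstdev(scores))
        for rater, scores in ratings_by_rater.items()
    }

def flag_narrow_raters(profiles, min_spread=0.5):
    """Raters whose scores barely vary; min_spread is an arbitrary cut-off."""
    return [rater for rater, (_, spread) in profiles.items() if spread < min_spread]

# Hypothetical ratings of seven samples by two judges.
ratings_by_rater = {
    "judge_a": [5, 5, 5, 4, 5, 5, 5],
    "judge_b": [1, 4, 2, 5, 3, 2, 4],
}
profiles = rater_profiles(ratings_by_rater)
flagged = flag_narrow_raters(profiles)
print(flagged)
```

Comparing the means across raters addresses the "between individuals" bullet, while the spread within each rater addresses the "within individual" one.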

  28. Measurement = Chocolate • Validity: Do the criteria measure the essence of Wisconsin’s “best” chocolate? • Reliability: control errors due to • Instrumentation • sample #’s confused; clarity of criterion • Raters • competent to judge; biases • Administration • allowed to drink soda; eat in any order; time, directions, location, standardization

  29. Chocolates • #1: Confections Solid Milk Chocolate • 1997 Wisconsin State Farm Seal of Excellence • #2: Hershey’s Extra Dark 60% Cocoa • #3: Regal Dynasty Milk • #4: Hershey’s Cookies ‘n’ Cream • #5: Dove Dark Hearts • #6: Palmer Milk Hearts • #7: Nestle’s Milk Hearts

  30. Do Your Sources of Error Explain Ratings? • Raters • Did they chat amongst themselves? • Contamination? Drink diet coke, coffee? • Rating strategy: • Taste them all then rate? • Rate each independently? • Recognize brands? Bias? • Rater fatigue? • Instrument • Scale descriptors clear? • All variables considered in scale development? (e.g., cookie pieces? // non-traditional students/learners)

  31. Beyond Chocolate: Apply Principles to Assessment of Learner Performance • What are common errors/problems in learner assessment?

  32. Beyond Chocolate: Apply Principles to Assessment of Learner Performance • Rating Forms for Resident/Student Performance • Criterion (Behavioral Anchors) vs. Normative • What content/dimensions • Scales • OSCEs (added variability – why?) • What else needs to be controlled? • Why OSVE?

  33. SUMMARY: Part I • True Score + Controllable Errors (Sources of Error) + Random Error = Observed Score (Faculty rating, MCQ) • Sources of error: • Instrument (Valid dimensions, clear directions, pilot and revise) • Administration/Design (Standardize) • Raters (Train)

  34. SUMMARY: Part II (to the tune of “Row, Row, Row Your Boat”) • Think, think, think a-bout • Errors of mea-sure-ment • Rater Bias • Va-lid-i-ty Too • Every time you test

  35. Supplemental Slides

  36. Follow-up: Chocolates • Seven different chocolates were evaluated on seven different indicators during the 2006 Chocolate Survey.

  37. Results: Descriptive Stats *Scale: 7=Lowest Rating 1=Highest Rating

  38. Inter-rater reliability • Kendall’s coefficient of concordance (W) ranges from 0 (no agreement) to 1 (perfect agreement) • Hope for concordance > 0.7
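For readers who want to see where the 0-to-1 range comes from, here is a minimal sketch of Kendall's concordance for the simplest case, using W = 12S / (m²(n³ − n)) for m raters and n items, where S is the sum of squared deviations of the items' rank sums from their mean. It assumes each rater supplies a complete ranking with no ties; ratings on a Likert scale usually contain ties and need the tie-corrected version, and the slides do not say how the workshop's figures were computed. The three judges' rankings are hypothetical.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance for untied, complete rankings.

    rankings: one list per rater, each a permutation of 1..n over the same
    n items, listed in the same item order.
    """
    m = len(rankings)
    n = len(rankings[0])
    if m < 2 or n < 2:
        raise ValueError("need at least two raters and two items")
    expected = list(range(1, n + 1))
    for ranks in rankings:
        if sorted(ranks) != expected:
            raise ValueError("each rater must rank every item once, without ties")

    rank_sums = [sum(ranks[i] for ranks in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((total - mean_rank_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical rankings of four chocolates by three judges (1 = best).
judges = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
]
w = kendalls_w(judges)
print(round(w, 2), "meets the 0.7 target" if w > 0.7 else "below the 0.7 target")
```

Identical rankings from every judge give W = 1, and rankings that cancel each other out so that every item has the same rank sum give W = 0.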

  39. Results: Inter-rater agreement

  40. Lack of concordance – Why?
