A “Sweet Approach” to Understanding Basic Principles of Educational Measurement

Assessing the Performance of Chocolates

Objectives • At the end of instruction, participants will: • Describe sources of error that threaten the reliability and validity of performance assessment measures • List specific strategies to address these threats • Define and appropriately employ performance measurement terminology • Anchors, Likert, Horns and Halo Effects • Develop and test a performance-based measurement instrument • Construct scale • Train raters in using it effectively • Assess validity and reliability of measures • Identify/explain sources of error

Lesson Plan: Set the task • Judge at the Wisconsin State Fair for “Open Class” Commercial Chocolates • Develop key factors to rate chocolates • Develop the rating scale • Train other “judges” • Taste chocolates and rate • Overview Key Measurement Principles • Goal – to ID sources and strategies to control errors • Step-by-step approach to task // process in educational measurement

Timeline • Introduction to measurement 20 min • Development of criteria 20 min • Develop scale 10 min • Train raters 10 min • Sample and rate chocolates 20 min • Identify sources of error 10 min

Underlying Assumption of Performance-Based Measures • An individual's observed performance/score is a combination of: • True Score + Errors of Measurement • Random • Controllable • All measurement seeks to control errors so that the measured score = true score • Does OBSERVED score = TRUE score?
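
The assumption above can be illustrated with a small simulation. This is a minimal sketch, not part of the original workshop: the function name, the "true" scores, the bias, and the noise level are all hypothetical values chosen for illustration.

```python
import random
import statistics


def simulate_observed(true_scores, rater_bias=0.0, noise_sd=0.5, seed=0):
    """Observed score = true score + controllable (systematic) error + random error."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [t + rater_bias + rng.gauss(0, noise_sd) for t in true_scores]


# Hypothetical "true" quality of five chocolates (not data from the workshop).
true_scores = [2.0, 3.0, 4.0, 5.0, 6.0]

# A lenient rater adds a constant bias; random error scatters scores around it.
observed = simulate_observed(true_scores, rater_bias=1.0, noise_sd=0.5)

# Systematic error shows up as a shift in the mean of the observed scores.
mean_shift = statistics.mean(observed) - statistics.mean(true_scores)
print(round(mean_shift, 2))
```

With `noise_sd=0` and `rater_bias=0` the observed scores equal the true scores, which is the ideal that error control aims for.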

Familiar EBM terminology: • Validity: • Relevance: does the measure actually reflect the variable of interest? • Appropriateness: relevant to purpose of the study • Meaningfulness: measure reflects variable of interest • Usefulness: aids decision-making • Accuracy: is the measurement free from error? • Random error: who takes the test • Systematic error: bias

Principles of Measurement: Validity – What are you measuring?

Principles of Measurement: Common Types of Validity Evidence 1. Content-related evidence • Looks like a duck, sounds like a duck = duck • Appropriateness: logically gets at intended performance • Expert review (face validity) • Representative sample from content domain (objectives)

Principles of Measurement: 3 Types of Validity Evidence 2. Criterion-related evidence • Sugar content of the grapes = best wine • Relationship between this measure and other measures • Predictive – MCAT and medical school • Concurrent – comparison to a gold standard • Often expressed as: • Correlation coefficients for continuous variables • Cross-tabulations for dichotomous variables
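
The two ways of expressing criterion-related evidence can be sketched in a few lines. This is an illustrative example only: the helper names and all data values are hypothetical, and the correlation shown is the Pearson coefficient, one common choice for continuous variables.

```python
from collections import Counter
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between a new measure and a criterion measure."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def cross_tab(new_measure, gold_standard):
    """Count each (new measure, gold standard) pair for dichotomous outcomes."""
    return Counter(zip(new_measure, gold_standard))


# Hypothetical continuous scores: new rating scale vs. an established one.
new_scale = [2.0, 3.5, 4.0, 5.5, 6.0]
established = [2.5, 3.0, 4.5, 5.0, 6.5]
print(round(pearson_r(new_scale, established), 2))

# Hypothetical pass/fail decisions: new test vs. gold standard.
table = cross_tab(["pass", "pass", "fail", "fail", "pass"],
                  ["pass", "fail", "fail", "fail", "pass"])
print(table[("pass", "pass")], table[("pass", "fail")])
```

Cells where the two measures disagree, such as ("pass", "fail"), are the ones that weaken concurrent validity evidence.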

Principles of Measurement: 3 Types of Validity Evidence 3. Construct-related evidence • Psychological construct or characteristic where a gold standard does not exist • Ex. Medicine: Self-efficacy → medication adherence • Ex. Educ: Body/Kinesthetic IQ → Suturing Skills • Test relationship between performance and a theoretical model • Multiple Regression, Correlation, Factor analysis

Validity: Accuracy and Reliability. Examples of Error: • Random error: things that you cannot control • Subject variability • Snow storm delays • Controllable error: things you CAN control • Instrument variability • Observer variability (intra- vs. inter-observer) • Halos/horns

Principles of Measurement: Reliability • Defined: consistency of the scores obtained • at one time • over time • Which color is: • Not reliable • Reliable but not valid • Reliable and valid?

Controlling Error in Performance Measures • Subject: motivation, energy, anxiety; location; maturation; history; regression (high/low) • Rater: characteristics (age, gender, ethnicity); biases; halo/horns; fatigue • Instrument: scales (normative, criterion); length; inadequate instructions; poor formatting; illogical order; vague terminology; responses that fail to fit question/scale; items that favor one group over another

KEY: Always Think of the 4 Common Categories of ERRORS • Instrument • Raters • Design/Administration • Subjects

Strategies to Control Errors • Standardize conditions (location, instrument, attitude) • Clear, specific descriptors of desired behaviors • Trained raters/proctors • How and under what conditions data are collected • Obtain more information on subjects • relevant characteristics (e.g., do they like chocolate?) • Obtain more information on details • location, instrumentation, history, subject attitude • Appropriate design

AND NOW . . . • You have been selected to create the measurement tool with which to judge . . . WISCONSIN’s BEST CHOCOLATE

Review Your Tasks: Chocolate Judge • List criteria indicative of “best chocolate” • Develop a Likert-scale rating item, with descriptive anchors, for each criterion • Train other raters to use your item • Pilot all items on samples of chocolate • Identify potential sources of error • Examine reliability of ratings • Which errors contributed, and which could be controlled?

A few words about scales . . . • Summated rating scale (Likert): • Length of scale = accuracy with which raters can make decisions • Assumes equal intervals between decision points (interval scale) • Gap between excellent and good = gap between good and satisfactory (equal intervals) • So you need to provide scale anchors to inform raters of your intent (controls error)
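
One way to picture a summated rating scale with anchors is as a small data structure. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, the 4-point scale is arbitrary, and every anchor except the two texture end points is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class LikertItem:
    """One rating item: a criterion plus a descriptive anchor for each scale point."""
    criterion: str
    anchors: dict  # scale point (int) -> anchor text shown to raters

    def check(self, rating):
        # Reject ratings that do not correspond to a defined scale point.
        if rating not in self.anchors:
            raise ValueError(f"{rating!r} is not a scale point for {self.criterion}")
        return rating


def summated_score(items, ratings):
    """Sum one rater's ratings across all items (a summated rating scale)."""
    return sum(item.check(ratings[item.criterion]) for item in items)


# Hypothetical items; higher numbers are assumed to mean better chocolate.
texture = LikertItem("texture", {1: "granular like sand", 2: "slightly gritty",
                                 3: "mostly smooth", 4: "smooth like silk"})
flavor = LikertItem("flavor", {1: "bland", 2: "mild", 3: "rich", 4: "intense"})

total = summated_score([texture, flavor], {"texture": 4, "flavor": 3})
print(total)
```

Summing the item ratings only makes sense if the steps between scale points are treated as equal, which is why the anchors need to describe evenly spaced levels.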

A few words about scale assumptions . . . • Normative: compared/relative to other learners • Standard score or mean score (USMLE Steps) • Compared to other chocolates in the group, it’s “average” or “above average” • Criterion-based: compared to a “gold standard” or a pre-established minimum threshold • 80% correct on examination • Godiva? Smooth like silk or granular like sand
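
The difference between the two scale assumptions can be shown on the same set of scores. This is an illustrative sketch: the function names, the sample labels, the scores, and the cut-off are hypothetical.

```python
import statistics


def normative_z_scores(scores):
    """Normative view: each score relative to the group mean, in SD units."""
    values = list(scores.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # Every sample scored the same, so none stands out from the group.
        return {name: 0.0 for name in scores}
    return {name: (value - mean) / sd for name, value in scores.items()}


def meets_criterion(scores, threshold):
    """Criterion view: each score against a threshold fixed in advance."""
    return {name: value >= threshold for name, value in scores.items()}


# Hypothetical mean ratings for three chocolate samples.
scores = {"sample_1": 2.0, "sample_2": 4.0, "sample_3": 6.0}

z = normative_z_scores(scores)                   # depends on who else is in the group
passed = meets_criterion(scores, threshold=5.0)  # does not depend on the group
print(z)
print(passed)
```

A sample can be "above average" in a weak group and still fail a criterion, so the choice of scale assumption changes what a rating means.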

STEP 1: Develop Valid Indicators of Best Chocolate

Chocolate Performance Criteria

STEP 2: Developing Rating Scale

Step 3: Train Your “Raters”

Step 4: Sampling

Step 5: Sources of Error True Score ≠ Observed Score • Turn rating sheets in now • Note on the Sources of Error Worksheet any factors that would affect reliability (consistency) of your ratings • Goal: to identify variance due to true variability (in chocolate) • + Variance due to raters + variance due to instrumentation + variance due to administration + etc.
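
A rough way to look at where the variance in a set of ratings comes from is to compare the spread of chocolate means with the spread of rater means. This is a descriptive sketch only, not a full variance-components or generalizability analysis; the function name and the ratings are hypothetical.

```python
import statistics


def variance_summary(ratings):
    """ratings[r][c] is the score rater r gave chocolate c."""
    chocolate_means = [statistics.mean(col) for col in zip(*ratings)]
    rater_means = [statistics.mean(row) for row in ratings]
    return {
        # Spread among chocolates: the variability the instrument is meant to capture.
        "between_chocolates": statistics.pvariance(chocolate_means),
        # Spread among raters: leniency or severity differences, a source of error.
        "between_raters": statistics.pvariance(rater_means),
    }


# Hypothetical ratings: 3 raters (rows) by 4 chocolates (columns).
ratings = [
    [2, 4, 5, 7],
    [3, 5, 6, 7],
    [1, 3, 4, 6],
]
print(variance_summary(ratings))
```

If the between-rater figure is large relative to the between-chocolate figure, rater training or clearer anchors are the first things to revisit.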

Potential Sources of Error (blank list for participants to fill in) • Sources of Error: Instrument, Raters, Design/Admin, Subjects (chocolate)

Step 6: Review of Scores • Within-individual variance • all “high” • Between individuals • Halo/Horns
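
A simple screen for raters who score everything "high" (or everything low) is to look at each rater's own spread. This is a hedged sketch: the function name, the 1 to 7 scale with higher meaning better, the spread threshold, and the ratings are all assumptions for illustration, and a flag is only a prompt to look closer, not proof of bias.

```python
import statistics


def flag_halo_horns(ratings_by_rater, scale_min=1, scale_max=7, min_spread=0.5):
    """Flag raters whose scores barely vary and sit at one end of the scale."""
    midpoint = (scale_min + scale_max) / 2
    flags = {}
    for rater, scores in ratings_by_rater.items():
        spread = statistics.pstdev(scores)  # within-individual variability
        if spread >= min_spread:
            flags[rater] = "ok"
        elif statistics.mean(scores) > midpoint:
            flags[rater] = "possible halo"   # uniformly high ratings
        else:
            flags[rater] = "possible horns"  # uniformly low ratings
    return flags


# Hypothetical ratings from three raters across four chocolates.
flags = flag_halo_horns({
    "rater_A": [7, 7, 6, 7],
    "rater_B": [1, 1, 2, 1],
    "rater_C": [2, 5, 7, 3],
})
print(flags)
```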

Measurement = Chocolate • Validity: Do the criteria measure the essence of Wisconsin’s “best” chocolate? • Reliability: control errors due to • Instrumentation • sample #’s confused; clarity of criteria • Raters • competent to judge; biases • Administration • allowed to drink soda; eat in any order; time, directions, location, standardization

Chocolates • #1: Confections Solid Milk Chocolate • 1997 Wisconsin State Farm Seal of Excellence • #2: Hershey’s Extra Dark 60% Cocoa • #3: Regal Dynasty Milk • #4: Hershey’s Cookies ‘n’ Cream • #5: Dove Dark Hearts • #6: Palmer Milk Hearts • #7: Nestle’s Milk Hearts

Do Your Sources of Error Explain Ratings? • Raters • Did they chat amongst themselves? • Contamination? Drink Diet Coke, coffee? • Rating strategy: • Taste them all, then rate? • Rate each independently? • Recognize brands? Bias? • Rater fatigue? • Instrument • Scale descriptors clear? • All variables considered in scale development (e.g., cookie pieces? // Non-traditional students/learners)

Beyond Chocolate: Apply Principles to Assessment of Learner Performance • What are common errors/problems in learner assessment?

Beyond Chocolate: Apply Principles to Assessment of Learner Performance • Rating Forms for Resident/Student Performance • Criterion (Behavioral Anchors) vs. Normative • What content/dimensions • Scales • OSCEs (added variability – why?) • What else needs to be controlled? • Why OSVE?

SUMMARY: Part I • True Score + Controllable Errors (Sources of Error) + Random Error = Observed Score (Faculty rating, MCQ) • Sources of error: • Instrument (Valid dimensions, clear directions, pilot and revise) • Administration/Design (Standardize) • Raters (Train)

SUMMARY: Part II (to the tune of “Row, Row, Row Your Boat”) • Think, think, think a-bout • Errors of mea-sure-ment • Rater Bias • Va-lid-i-ty Too • Every time you test

Supplemental Slides

Follow-up: Chocolates • Seven different chocolates were evaluated on seven different indicators during the 2006 Chocolate Survey.

Results: Descriptive Stats *Scale: 7=Lowest Rating 1=Highest Rating

Inter-rater reliability • Kendall’s coefficient of concordance (W) ranges from 0 (no agreement) to 1 (perfect agreement) • Hope for concordance > 0.7
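
Kendall’s W can be computed by hand from the raters’ rankings. The sketch below uses the standard formula W = 12S / (m²(n³ − n)) for m raters and n items, with tied scores given average ranks and no tie correction applied; the function names and the ratings are hypothetical and are not the survey data.

```python
def average_ranks(scores):
    """Rank scores from 1..n, giving tied scores the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """ratings[r][c] is the score rater r gave item c. No tie correction."""
    m, n = len(ratings), len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    totals = [sum(col) for col in zip(*rank_rows)]  # rank sum per item
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical ratings: 3 raters (rows) by 4 chocolates (columns).
ratings = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
]
print(round(kendalls_w(ratings), 2))
```

When ratings contain many ties, the uncorrected formula understates agreement, so a tie-corrected version would be the safer choice for real rating data.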

Results: Inter-rater agreement

Lack of concordance – Why?
