Using Diagnostic Assessment To Guide Timely Interventions Natalie Rathvon, Ph.D.
What We’ll Cover • A research-based framework for selecting and using diagnostic reading assessments • Steps in the diagnostic assessment process • Issues related to assessing the five reading components • Diagnostic assessment options for each component • Case examples
Reading First Assessments • Screening: Brief measures to identify which students are at risk for reading problems • Progress monitoring: Brief measures to determine if students are making adequate progress in acquiring reading skills • Diagnostic: A comprehensive assessment to locate the source(s) of reading difficulty for individual students to guide instruction • Outcome: An assessment to determine the extent to which all students have achieved grade-level expectations in reading
Questions to be Answered by Diagnostic Assessments • In which reading skill areas is this student achieving at expected levels? • In which reading skill areas is the student making less than expected progress? • What types, intensity, and duration of interventions are likely to be effective in addressing this student’s skill needs?
So many tests, so few guidelines . . . • Growing number of print and online tests purporting to assess reading • Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) • Gives general guidelines, not specific criteria, for evaluating psychometric quality
Myths about Reading Assessment • All claims that a measure is “scientifically based” are equally valid. • A valid and reliable measure is equally valid and reliable for all examinees. • All measures of the same reading component yield similar results for the same examinee.
Accelerating Student Outcomes: Assessment, Data-Based Instructional Planning, and Instruction
Reading Assessment Models • Traditional “standard battery” (one size fits all): Assumes reading problems arise from internal child deficits; designed to provide a categorical label for educational programming • Component-based: Targets domains related to the identified deficits; assumes most reading problems arise from experiential and/or instructional deficits; designed to provide information for guiding instruction
Two Sets of Considerations in Selecting Assessments • Technical adequacy: Psychometric soundness • Usability: Degree to which practitioners can actually use a measure in applied settings
Assessment Checklists • Checklist 1: Evaluating the technical adequacy of diagnostic reading measures • Checklist 2: Evaluating the usability of diagnostic reading measures
Five Key Technical Adequacy Characteristics • Norms • Test floors • Item gradients • Reliability • Validity • Checklist 1: Evaluating Technical Adequacy
Norms: How Do We Interpret Performance? • Norm-referenced measures: Comparisons with age/grade peers • Criterion-referenced measures: Comparisons with pre-determined performance standards • Nonstandardized measures: Research norms or examiner judgment
Evaluating the Adequacy of Norms • Are they representative? • Criteria: Should match a national or appropriate reference population • Are they recent? • Criteria: No more than 7-12 years old • Are subgroup and sample sizes large enough? • Criteria: At least 100 and 1,000, respectively
Evaluating Norms, II • Are norm table intervals small enough to reflect changes in skill development? • Criteria: • No more than 6 months for students aged 7 years, 11 months and younger • No more than 1 year for students aged 8 years, 0 months through 18
Reliability: Are Scores Consistent and Accurate? • Alternate-form: Form A vs. Form B • Internal consistency: Item A vs. Item B • Test-retest: Time A vs. Time B • Interscorer: Scorer A vs. Scorer B • Criteria: .80 or higher for screening measures and .90 or higher for diagnostic measures
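To make the reliability criteria concrete, here is a minimal sketch (in Python) of estimating an alternate-form reliability coefficient as the Pearson correlation between students' Form A and Form B scores and comparing it against the .80/.90 benchmarks above. The pearson_r helper and the score lists are hypothetical, for illustration only.

```python
# Minimal sketch: alternate-form reliability estimated as the Pearson
# correlation between Form A and Form B scores (hypothetical data).

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

form_a = [88, 92, 75, 101, 95, 83, 110, 78, 97, 90]   # hypothetical scores
form_b = [85, 95, 72, 99, 98, 80, 107, 81, 94, 93]

r = pearson_r(form_a, form_b)
print(f"Alternate-form reliability estimate: {r:.2f}")
print("Meets screening criterion (>= .80):", r >= 0.80)
print("Meets diagnostic criterion (>= .90):", r >= 0.90)
```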
Hidden Threat to Reliability • Examiner variance: Differences among assessors in administering tasks and recording responses • Especially likely on: • Live-voice tasks (phoneme blending) • Fluency-based tasks (CBM, TOWRE) • Tasks with complex administration or scoring systems (DIBELS ISF, LAC–3)
Test Floors: Can the Test Detect Poor Readers? • Test floor: Lowest possible standard score when a student answers 1 item correctly • Adequate floors: Permit identification of students with very weak skills • Inadequate floors: Overestimate students’ level of skills
Test Floor Criteria • A subtest raw score of 1 should yield a standard score > 2 SDs below the subtest mean. • SS of 3 or less for a subtest mean of 10 • SS of 69 or less for a subtest mean of 100
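The arithmetic behind the floor criterion can be written as a quick check; the floor_is_adequate helper below is a hypothetical sketch, and the floor values are illustrative except where noted. The standard deviation is 3 on a mean-of-10 subtest metric and 15 on a mean-of-100 metric.

```python
# Minimal sketch of the test-floor check: the standard score earned with a
# raw score of 1 should fall more than 2 SDs below the subtest mean.

def floor_is_adequate(floor_ss, mean=100, sd=15):
    """Return True if the floor (SS for a raw score of 1) is more than
    2 SDs below the subtest mean."""
    return floor_ss < mean - 2 * sd

# Subtest metric with mean 10, SD 3: the floor must be 3 or less.
print(floor_is_adequate(3, mean=10, sd=3))    # True  (adequate)
print(floor_is_adequate(5, mean=10, sd=3))    # False (inadequate)

# Composite metric with mean 100, SD 15: the floor must be 69 or less.
print(floor_is_adequate(69))                  # True  (adequate)
print(floor_is_adequate(97))                  # False (cf. the TOWRE example below)
```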
Which Tests and Tasks Are Likely to Display Floor Effects? • “Cradle-to-grave” tests (WJ III) • Phonemic manipulation tasks (deletion, substitution, reversal) • Oral reading fluency tests • Pseudoword reading tests • Spelling tests • Reading comprehension tests
Why Floor Effects Matter • TOWRE Phoneme Decoding Efficiency • A student in the 2nd month of Grade 1 with 1 item correct earns a SS of 97 (average). • WJ III Reading Vocabulary • A student in the 3rd month of Grade 1 with 1 item correct earns a SS of 94 (average).
Item Gradients: Can the Test Detect Small Differences? • Item gradient: Steepness with which standard scores change from 1 raw score unit to another • Adequate gradient: Sensitive to small differences in performance • Steep gradient: Obscures differences among performance levels
Item Gradient Criteria • 6 or more items between subtest floor and mean (M = 10) or • 10 or more items between subtest floor and mean (M = 100) • Example of a steep gradient: GRADE Listening Comprehension (K) • 17 items correct = 5th stanine • 18 items correct (100%) = 8th stanine
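One way to operationalize this criterion is to count how many raw-score items separate the subtest floor from the first raw score that reaches the subtest mean. The sketch below assumes a hypothetical raw-score-to-standard-score conversion table for a single age/grade interval; the items_floor_to_mean helper is illustrative, not drawn from any published test manual.

```python
# Minimal sketch of an item-gradient check using a hypothetical
# raw-score -> standard-score conversion table (mean = 10 metric).

def items_floor_to_mean(raw_to_ss, mean_ss):
    """Count raw-score items between the subtest floor (raw score of 1)
    and the first raw score whose standard score reaches the mean."""
    floor_raw = 1
    mean_raw = min(raw for raw, ss in raw_to_ss.items() if ss >= mean_ss)
    return mean_raw - floor_raw

table = {1: 4, 2: 5, 3: 6, 4: 7, 5: 8, 6: 9, 7: 10, 8: 11}  # hypothetical

n_items = items_floor_to_mean(table, mean_ss=10)
print(n_items)        # 6
print(n_items >= 6)   # True: meets the criterion for a mean-of-10 metric
```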
Test Floors and Item Gradients: Special Cases • Screening tests • Critical issue is cutoff score accuracy, not floor/gradient violations • Tests not yielding standard scores • Deciles, percentiles, quartiles, stanines • Rasch-model tests • Preclude direct inspection of raw score-standard score relationships • WJ family: WJ III, WRMT-R/NU, WDRB
Validity: Are the Results Meaningful? • Content validity: Effectiveness in assessing the relevant domain • Criterion-related validity: Effectiveness in predicting performance now (concurrent validity) or later (predictive validity) • Construct validity: Effectiveness in measuring what the test is supposed to measure • Criteria: Evidence of all three types of validity for the target population
Predictive and Diagnostic Validity • Does the test predict reading outcomes for the target age/grade group? • Concurrent vs. predictive validity evidence • Does the test differentiate between students with and without reading problems? • Group differentiation studies
The Rest of the Story: Usability Considerations • Usability often has more influence on test selection and use than technical adequacy. • “I know how to give it.” • “It doesn’t take long to give.” • “It’s easy to carry around.” • “I think I saw one in the storage closet.”
Practical Characteristics • Test construction • Administration • Accommodations and adaptations • Scores and scoring • Interpretation • Links to intervention • Checklist 2: Evaluating Usability
The Critical Usability Issue in Diagnostic Assessment • Is there evidence that test results can be used to design instruction to address the reading deficits that have been identified?
The Diagnostic Assessment Process • What can we learn from the results of screening and/or progress monitoring measures? • Are there weaknesses in fluency, phonics, or phonemic awareness? • What can we learn from the results of outcome measures (if available)? • Are there weaknesses in vocabulary and/or comprehension?
Types of Students with Reading Problems (all presenting with a reading performance problem) • Students with specific phonological processing problems • Students with global language deficits • Students with attentional problems • Students with disruptive behavior problems
Identified Deficit • Comprehension • Fluency • Phonics • Vocabulary • Phonemic Awareness • Reading-Related Cognitive Abilities
Issues in Assessing Fluency • Floor effects common • Task variations: foundational skills vs. word reading vs. contextual reading • Variations in level of text difficulty • Oral vs. silent reading formats • Interexaminer variance • Differences in fluency definitions
Fluency Options • BEAR = WPM + Fluency Scale • CBM (student’s own text) = WCPM • CBM (DIBELS) = WCPM • GORT–4 Rate & Fluency = SS, PR, GE, AE • FOX Fluency = WCPM + Fluency Scale • Virginia PALS = WPM + Fluency Scale • Center City Consortium PALS = WCPM • TPRI = WCPM
Best Practices in Assessing Fluency • Administer graded passages with documented readability levels. • Use WCPM (words correct per minute) as the fluency metric (see the sketch after this list). • Assess at the passage level (i.e., more than 1 minute of reading). • Take running records to obtain diagnostic and intervention-planning information. • Beware of floor effects in norm-referenced tests.
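Because WCPM is the recommended metric, here is a minimal sketch of the calculation from a timed passage reading: words read correctly divided by minutes of reading. The wcpm helper and the counts are hypothetical.

```python
# Minimal sketch: WCPM (words correct per minute) from a timed passage
# reading, using hypothetical counts.

def wcpm(words_attempted, errors, seconds):
    """Words correct per minute = (words attempted - errors) / minutes read."""
    return (words_attempted - errors) / (seconds / 60)

# A student reads 114 words of a graded passage in 90 seconds with 6 errors.
print(round(wcpm(words_attempted=114, errors=6, seconds=90)))  # 72 WCPM
```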
Issues in Assessing Phonics • Wide differences in content coverage for alphabet knowledge • WJ III Letter-Word ID: 13 letters • TERA–3 Alphabet: 13 letters • ERDA–2 Letter Recognition: 26 letters • WRMT–R/NU Letter ID: 51 letters • Floor effects common for pseudoword reading and spelling tests
Phonics Issues, II • Differences in task types • Pseudoword reading = recognition • Spelling = recall (more sensitive) • Differences in pseudoword construction • vake = many neighbors (easier to read) • vaik = few neighbors (harder to read) • Pseudoword reading tests vulnerable to examiner variance and interscorer inconsistency
Alphabet Knowledge Options (NS = nonstandardized, NR = norm-referenced, CR = criterion-referenced) • Book Buddies NS • CORE Phonics Survey NS • ERDA–2 NR • FOX CR • PALS CR • TPRI CR • Random letter arrays NS
Spelling Options: Looking in through the “Phonics Window” • Book Buddies (NS - developmental scoring) • CORE Phonics Survey (CR) • FOX (CR) • PALS (CR - developmental scoring) • TPRI (CR) • WIAT–II Spelling (NR) • WJ III Spelling, Spelling of Sounds (NR)
Pseudoword Reading Options • CORE Phonics Survey NS • ERDA–2/WIAT–II NR • FOX Decoding & Sight Words CR • PAT Decoding NR & CR • Phonics-Based Reading Test NR & CR • WRMT–R/NU Word Attack NR • WJ III Word Attack NR • Informal pseudoword measures
Best Practices in Assessing Phonics • Assess all relevant phonics components. • Select measures with adequate content coverage. • Include both recognition (pseudoword reading) and recall measures (spelling). • Include developmental spelling measures with differentiated scoring systems.
Phonological vs. Phonemic Awareness • Phonological awareness: General awareness of the sound structure of language, as opposed to its meaning • Phonemic awareness: Understanding that speech is composed of individual sounds that can be analyzed and manipulated
Issues in Assessing Phonemic Awareness • Variations in linguistic unit, presentation and response formats, coverage, item types, and scoring (all or nothing vs. partial credit) • Variations in predictive power, depending on children’s stage of literacy development • Vulnerable to examiner and interscorer variance, especially for live-voice measures
Phonemic Awareness Options • CTOPP (7 tasks) NR • FOX (7 tasks) CR • LAC-3 (2 tasks) NR & CR • PALS (4 tasks) CR • PAT (6 tasks) NR & CR • TPRI (5 tasks) CR
Best Practices in Assessing Phonemic Awareness • Select multiple measures with adequate content coverage for the domain. • Maximize diagnostic power by matching measures to children’s stage of literacy development. • Use individually administered measures with oral response formats. • Provide training and reliability checks for complex and live-voice measures.