

  1. Children’s Oral Reading Corpus (CHOREC): Description & Assessment of Annotator Agreement • L. Cleuren, J. Duchateau, P. Ghesquière, H. Van hamme • The SPACE project

  2. Overview Presentation • The SPACE project • Development of a reading tutor • Development of CHOREC • Annotation procedure • Annotation agreement • Conclusions

  3. 1. The SPACE project • SPACE = SPeech Algorithms for Clinical & Educational applications • http://www.esat.kuleuven.be/psi/spraak/projects/SPACE • Main goals: • Demonstrate the benefits of speech-technology-based tools for: • An automated reading tutor • A recognizer for pathological speech (e.g. dysarthria) • Improve automatic speech recognition and speech synthesis to use them in these tools

  4. 2. Development of a reading tutor • Main goals: • Computerized assessment of word decoding skills • Computerized training for slow and/or inaccurate readers • Accurate speech recognition is needed to detect reading errors reliably

  5. 3. Development of CHOREC • To improve the recognizer’s reading error detection, CHOREC (Children’s Oral Reading Corpus) is being developed: a Dutch database of recorded, transcribed, and annotated children’s oral readings • Participants: • 400 Dutch-speaking children • 6-12 years old • without (n = 274, regular schools) or with (n = 126, special schools) reading difficulties

  6. 3. Development of CHOREC (b) • Reading material: • existing REAL WORDS • non-existing but well-pronounceable words (i.e. PSEUDOWORDS) • STORIES • Recordings: • 22050 Hz, 2 microphones • 42 GB or 130 hours of speech

  7. 4. Annotation procedure • Segmentations, transcriptions and annotations by means of PRAAT (http://www.Praat.org)
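
Praat stores segmentations and annotations as plain-text TextGrid files, so a minimal sketch of extracting the interval tiers from a long-format TextGrid with only the Python standard library could look as follows. The file name and tier name in the usage line are invented rather than CHOREC's actual naming:

```python
import re
from collections import defaultdict

def read_interval_tiers(path):
    """Return {tier_name: [(xmin, xmax, text), ...]} for the interval tiers
    of a long-format Praat TextGrid (a sketch, not a full parser)."""
    with open(path, encoding="utf-8") as f:
        content = f.read()

    tiers = defaultdict(list)
    # 'item [1]:', 'item [2]:', ... each introduce one tier.
    for chunk in re.split(r"item \[\d+\]:", content)[1:]:
        name = re.search(r'name = "(.*)"', chunk)
        if name is None or 'class = "IntervalTier"' not in chunk:
            continue  # skip point tiers and anything unexpected
        # Each labelled interval carries xmin, xmax and a text label.
        for xmin, xmax, text in re.findall(
            r'xmin = ([\d.]+)\s+xmax = ([\d.]+)\s+text = "(.*)"', chunk
        ):
            tiers[name.group(1)].append((float(xmin), float(xmax), text))
    return dict(tiers)

# Hypothetical usage; file name and tier name are invented.
for start, end, label in read_interval_tiers("child042_story.TextGrid").get("orthography", []):
    print(f"{start:8.3f} {end:8.3f}  {label}")
```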

  8. 4. Annotation procedure (b) • Pass 1 → p-files • Orthographic transcriptions • Broad-phonetic transcription • Utterances made by the examiner • Background noise • Pass 2 → f-files (only for those words that contain reading errors or hesitations) • Reading strategy labeling • Reading error labeling
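
One rough way to picture the two-pass scheme is one record per word, where the pass 1 (p-file) tiers are always filled in and the pass 2 (f-file) labels only exist for words flagged with errors or hesitations. The field names below are illustrative assumptions, not the actual CHOREC tier names:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WordAnnotation:
    """One word of one recording; field names are illustrative assumptions."""
    # Pass 1 (p-file) tiers, always filled in
    orthographic: str                  # orthographic transcription of what was read
    phonetic: str                      # broad-phonetic transcription
    examiner_speech: bool = False      # an examiner utterance overlaps this word
    background_noise: bool = False     # noise flag
    error_or_hesitation: bool = False  # word is flagged for the second pass
    # Pass 2 (f-file) labels, only filled in for flagged words
    reading_strategy: Optional[str] = None
    reading_error: Optional[str] = None

def words_needing_pass2(words):
    """Pass 2 only revisits words that contain reading errors or hesitations."""
    return [w for w in words if w.error_or_hesitation]
```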

  9. 4. Annotation procedure (c) Expected: Els zoekt haar schoen onder het bed. [Els looks for her shoe under the bed.] Observed: Als (says ‘something’) zoekt haar sch…schoen onder bed. [Als (says ‘something’) looks for her sh…shoe under bed.]
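
Detecting such deviations automatically amounts to aligning the observed reading against the expected text. The word-level alignment below, using Python's difflib, is only an illustration of the idea, not the project's recognizer-based approach (hesitations such as "sch…schoen" would need sub-word handling):

```python
from difflib import SequenceMatcher

expected = "Els zoekt haar schoen onder het bed".split()
observed = "Als zoekt haar schoen onder bed".split()  # restart "sch…" and noises ignored here

for tag, i1, i2, j1, j2 in SequenceMatcher(a=expected, b=observed).get_opcodes():
    if tag == "replace":
        print("substitution:", expected[i1:i2], "read as", observed[j1:j2])
    elif tag == "delete":
        print("omission:    ", expected[i1:i2], "not read")
    elif tag == "insert":
        print("insertion:   ", observed[j1:j2], "added")

# Prints (word level only):
#   substitution: ['Els'] read as ['Als']
#   omission:     ['het'] not read
```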

  10. 5. Annotation agreement • Quality of annotations relies heavily on various annotator characteristics (e.g. motivation) and external influences (e.g. time pressure). • Analysis of inter- and intra-annotator agreement to measure quality of annotations • INTER: triple p-annotations by 3 different annotators for 30% of the corpus (p01, p02, p03) • INTRA: double f-annotations by the same annotator for 10% of the corpus (f01, f01b, f02)

  11. 5. Annotation agreement (b) • Remark about the double f-annotations: • f01 = p01 + reading strategy & error tiers • f01b = f01 with the reading strategy & error tiers removed and re-annotated • f02 = p02 + reading strategy & error tiers • Agreement metrics • Percentage agreement + 95% CI • Kappa statistic + 95% CI
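
Both metrics are standard. As a sketch (not the authors' analysis scripts), percentage agreement and Cohen's kappa between two annotators, with an approximate 95% confidence interval from a percentile bootstrap, can be computed as below; the label values are invented and agreement is returned as a fraction rather than a percentage:

```python
import random
from collections import Counter

def percent_agreement(a, b):
    """Fraction of items labelled identically by the two annotators."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement: kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[lab] * cb[lab] for lab in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

def bootstrap_ci(metric, a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling annotated items with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(a, b))
    stats = sorted(
        metric(*zip(*[rng.choice(pairs) for _ in pairs])) for _ in range(n_boot)
    )
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Invented labels standing in for, e.g., two annotators' reading-error tiers.
ann1 = ["ok", "ok", "sub", "ok", "omit", "ok", "ok", "sub", "ok", "ok"] * 20
ann2 = ["ok", "ok", "sub", "ok", "ok",   "ok", "ok", "sub", "ok", "sub"] * 20

print("% agreement: ", 100 * percent_agreement(ann1, ann2))
print("kappa:       ", cohens_kappa(ann1, ann2))
print("kappa 95% CI:", bootstrap_ci(cohens_kappa, ann1, ann2))
```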

  12. 5. Annotation agreement (c) • overall high agreement! κ: 0.717 – 0.966 %: 86.37 – 98.64 • INTER • κ: PT > RED * • %: RED > OT > PT * • INTRA • κ: RSL > REL * • (1) > (2) * • %: RSL > REL * for (1) • RSL < REL * for (2) • (1) > (2) * * p < .05

  13. 5. Annotation agreement (d) • overall high agreement! κ: 0.706 – 0.971 %: 82.18 – 98.72 • When looking at % agreement scores: • regular > special * (except for f01-f01b comparison) • However, when looking at kappa values: • No systematic or sign. differences: RED: regular < special * PT: regular > special * RSL(2): regular > special * * p < .05

  14. 5. Annotation agreement (e) • overall substantial agreement! κ: 0.575 – 0.966 %: 68.45 – 99.32 • When looking at % agreement scores: • S > RW > PW * * p < .05 • However, when looking at kappa values: • Always best agreement for S (except for RSL: no sign. diff. OR RW > S * in case of (2)) • No systematic or sign. differences w.r.t. RW and PW: RED: RW < PW * PT: RW > PW * RSL: RW = PW REL: RW > PW (1) or RW = PW (2)

  15. 5. Annotation agreement (f) Remarkable finding: Systematic differences in % agreement disappear when looking at kappa values! Explanation: Differences go hand in hand with differences in the number of errors made: • children from special schools make more errors than children from regular schools • pseudowords are harder to read than real words, which are in turn harder to read than words embedded in a text → Kappa is better suited to assess annotation quality
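
This is exactly the chance correction built into kappa: when "no error" dominates, two annotators agree on most words even by chance, so raw % agreement is inflated. A small illustration with invented numbers and a binary error/no-error label:

```python
# Invented numbers, only to show why kappa absorbs base-rate differences.
def kappa(p_o, error_rate):
    # Chance agreement for a binary error/no-error label, assuming equal marginals.
    p_e = error_rate ** 2 + (1 - error_rate) ** 2
    return (p_o - p_e) / (1 - p_e)

# Condition A: few errors (regular schools, words in a story), high raw agreement.
print(kappa(p_o=0.97, error_rate=0.05))   # ~0.68
# Condition B: many errors (special schools, pseudowords), lower raw agreement.
print(kappa(p_o=0.88, error_rate=0.30))   # ~0.71
```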

  16. 6. Conclusions • The SPACE project • SPeech Algorithms for Clinical and Educational applications • http://www.esat.kuleuven.be/psi/spraak/projects/SPACE • CHOREC • Dutch database of recorded, transcribed, and annotated children’s oral readings • Assessment of annotator agreement • High overall agreement → reliable annotations • Kappa better suited to assess annotation quality
