These slides cover the detection of misrecognitions and user corrections in speech recognition systems using prosodic cues such as fundamental frequency, energy, and speaking rate. Results from studies using a rule-based learner show that prosodic features improve prediction. The slides also cover why misrecognitions and user corrections matter, the types of corrections, and the prosodic features associated with them.
Detecting misrecognitions: predicting with prosody
Misrecognitions - papers
• “Predicting automatic speech recognition performance using prosodic cues” – TooT
• “Generalizing prosodic prediction of speech recognition errors” – W99
Misrecognitions - generalities
• What are they?
  • WER – word error rate
  • CA – concept accuracy
• Why is it important to detect them?
  • Users have difficulty correcting system misunderstandings
  • Users are frustrated by unnecessary confirmations or rejections
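As a reminder of what WER measures, here is a minimal sketch that computes it as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the sample utterances are illustrative and do not come from the papers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
print(word_error_rate("leave from boston at nine", "leave from austin at nine"))
```

A turn can then be labeled as misrecognized when its WER is above zero, which is one possible labeling choice and not necessarily the one used in the studies.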
Prosody to the rescue!
• Prosodic features used:
  • Fundamental frequency (f0)
  • Energy (rms)
  • Duration of speaker turn (dur)
  • Pause preceding turn (ppau)
  • Speaking rate (tempo)
  • Silence in speaker turn (zeros)
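The feature list above can be sketched as a per-turn feature vector. This is only an illustration under stated assumptions: the inputs are frame-level f0 and rms tracks already produced by some pitch tracker, an f0 of 0 marks an unvoiced or silent frame, `zeros` is taken to be the fraction of such frames, and `tempo` is taken to be syllables per second. The names `TurnFeatures` and `turn_features` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class TurnFeatures:
    f0_max: float    # highest pitch in the turn (Hz)
    f0_mean: float   # mean pitch over voiced frames (Hz)
    rms_max: float   # peak energy
    rms_mean: float  # mean energy
    dur: float       # duration of the speaker turn (s)
    ppau: float      # pause preceding the turn (s)
    tempo: float     # assumed: syllables per second
    zeros: float     # assumed: fraction of frames with f0 == 0


def turn_features(f0_frames: Sequence[float], rms_frames: Sequence[float],
                  dur: float, ppau: float, n_syllables: int) -> TurnFeatures:
    """Summarize one speaker turn from frame-level pitch and energy tracks."""
    if not f0_frames or not rms_frames or dur <= 0:
        raise ValueError("need non-empty tracks and a positive duration")
    voiced = [f for f in f0_frames if f > 0]  # drop unvoiced/silent frames
    return TurnFeatures(
        f0_max=max(voiced) if voiced else 0.0,
        f0_mean=mean(voiced) if voiced else 0.0,
        rms_max=max(rms_frames),
        rms_mean=mean(rms_frames),
        dur=dur,
        ppau=ppau,
        tempo=n_syllables / dur,
        zeros=(len(f0_frames) - len(voiced)) / len(f0_frames),
    )


features = turn_features([0.0, 200.0, 220.0, 0.0], [0.1, 0.5, 0.7, 0.1],
                         dur=2.0, ppau=0.5, n_syllables=4)
print(features)
```

One such vector per turn, paired with a misrecognized/correct label, is the kind of input a rule learner would be trained on.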
Predicting misrecognitions - results
• Rule-based learner (RIPPER)
• Characteristics of misrecognitions:
  • Higher in pitch
  • Louder, longer
  • Less internal silence
• Improved prediction with prosody
  • TooT – 6.53% vs 22.23%
  • W99 – 22.77% vs 26.14%
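To put the two pairs of figures on a common scale, the short sketch below computes the relative reduction between them. It assumes that the first number in each pair is the error rate with prosodic features and the second is the comparison figure; the slide does not state this explicitly. The helper name `relative_reduction` is illustrative.

```python
def relative_reduction(with_prosody: float, comparison: float) -> float:
    """Relative error reduction in percent, given two error rates in percent."""
    return 100.0 * (comparison - with_prosody) / comparison


# Assumed reading of the slide: (error with prosody, comparison error).
results = {"TooT": (6.53, 22.23), "W99": (22.77, 26.14)}
for corpus, (with_prosody, comparison) in results.items():
    print(f"{corpus}: {relative_reduction(with_prosody, comparison):.1f}% relative reduction")
```

Under that reading, the gain is roughly 70.6% relative on TooT and roughly 12.9% relative on W99, so the benefit of prosody differs considerably between the two corpora.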
Predicting misrecognitions - comments
• Is WER an adequate measure?
• Do we model the ASR capabilities or its training set?
• Is it OK to compare against learning from ASR confidence scores?
Detecting user corrections: predicting with prosody
User corrections - papers
• “Corrections in spoken dialog systems”
• “Identifying user corrections automatically in spoken dialog systems”
User corrections - generalities
• What are they?
• Why is it important to detect them?
  • Recognized much more poorly
  • Tuning dialog strategies
  • ASR for hyperarticulated speech
  • Change of initiative and confirmation strategy
User corrections - insights
• Types:
  • REP – repetition
  • PAR – paraphrase
  • ADD – content added
  • OMIT – content omitted
  • ADD/OMIT
• Characterized by prosodic features associated with hyperarticulation – but not the same
Predicting user corrections
• Rule-based learner on TooT corpus
• Features: PROS, ASR, SYS, POS, DIA
• 15.72% error rate on Raw+ASR+SYS+POS+PreTurn