
Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech

Presentation Transcript


  1. Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech Keith Vertanen July 30th, 2004

  2. The problem
  • Speech recognizers make mistakes
  • Correcting mistakes is inefficient
    • 140 WPM: uncorrected dictation
    • 14 WPM: corrected dictation, mouse/keyboard
    • 32 WPM: corrected typing, mouse/keyboard
  • Voice-only correction is even slower and more frustrating

  3. Research overview
  • Make correction of dictation:
    • More efficient
    • More fun
    • More accessible
  • Approach:
    • Build a word lattice from a recognizer’s n-best list
    • Expand lattice to cover likely recognition errors
    • Make a language model from expanded lattice
    • Use model in a continuous gesture interface to perform confirmation and correction

  4. Building lattice
  • Example n-best list:
    1: jack studied very hard
    2: jack studied hard
    3: jill studied hard
    4: jill studied very hard
    5: jill studied little
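
A minimal sketch of one way such an n-best list could be merged into a word lattice: align each hypothesis against the top hypothesis with Python's difflib and collect the alternative words seen at each position. The confusion-network-style data structure and the alignment choices are illustrative assumptions, not the exact construction used in the talk.

    from difflib import SequenceMatcher

    def build_lattice(nbest):
        """Merge an n-best list (lists of words) into position slots,
        each holding the alternative words observed at that position."""
        top = nbest[0]
        slots = [{w} for w in top]          # one slot per top-hypothesis word
        for hyp in nbest[1:]:
            for op, i1, i2, j1, j2 in SequenceMatcher(a=top, b=hyp).get_opcodes():
                if op in ("equal", "replace"):
                    for offset, word in enumerate(hyp[j1:j2]):
                        slots[min(i1 + offset, i2 - 1)].add(word)
                # insertions/deletions would become optional arcs in a
                # full lattice; this sketch simply skips them
        return slots

    nbest = [h.split() for h in [
        "jack studied very hard",
        "jack studied hard",
        "jill studied hard",
        "jill studied very hard",
        "jill studied little",
    ]]
    print(build_lattice(nbest))
    # [{'jack', 'jill'}, {'studied'}, {'very', 'little'}, {'hard'}]  (set order may vary)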

  5. Insertion errors

  6. Acoustic confusions
  • Given a word, find words that sound similar
  • Look pronunciation up in dictionary:
    studied → s t ah d iy d
  • Use observed phone confusions to generate alternative pronunciations:
    s t ah d iy d → s t ah d iy d, s ao d iy, s t ah d iy, …
  • Map pronunciations back to words:
    s t ah d iy d → studied
    s ao d iy → saudi
    s t ah d iy → study
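
A minimal sketch of that phone-confusion expansion: look up the word's pronunciation, generate variants by substituting or deleting confusable phones, and keep the variants that map back to a dictionary word. The tiny pronunciation dictionary and confusion table below are illustrative stand-ins for a real lexicon and confusion matrix.

    from itertools import product

    PRON = {
        "studied": ["s", "t", "ah", "d", "iy", "d"],
        "study":   ["s", "t", "ah", "d", "iy"],
        "saudi":   ["s", "ao", "d", "iy"],
    }
    # Each phone maps to itself plus plausible confusions; None marks a
    # possible deletion.  Values here are made up for the example.
    CONFUSIONS = {
        "s":  ["s"],
        "t":  ["t", None],
        "ah": ["ah", "ao"],
        "d":  ["d", None],
        "iy": ["iy"],
    }

    def acoustic_confusions(word):
        """Words whose pronunciation can be reached from `word` by
        substituting or deleting confusable phones."""
        by_pron = {tuple(p): w for w, p in PRON.items()}
        options = [CONFUSIONS.get(ph, [ph]) for ph in PRON[word]]
        found = set()
        for choice in product(*options):     # exponential, fine for a sketch
            pron = tuple(ph for ph in choice if ph is not None)
            if pron in by_pron:
                found.add(by_pron[pron])
        return found

    print(acoustic_confusions("studied"))    # {'studied', 'study', 'saudi'} (order may vary)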

  7. Acoustic confusions: “Jack studied hard”

  8. Language model confusions: “Jack studied hard”
  • Look at words before or after a node, add likely alternate words based on n-gram LM
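
A minimal sketch of that idea for a single gap: given the word before and the word after a node, propose words that a bigram model rates as likely in that context. The toy bigram table and the probability threshold are assumptions for illustration.

    # Toy bigram probabilities P(w2 | w1), illustrative values only.
    BIGRAM = {
        ("jack", "studied"): 0.4,
        ("jack", "worked"):  0.3,
        ("jack", "tried"):   0.1,
        ("studied", "hard"): 0.5,
        ("worked", "hard"):  0.4,
    }

    def lm_alternatives(left_word, right_word, threshold=0.2):
        """Words w with both P(w | left_word) and P(right_word | w)
        above the threshold under the toy bigram table."""
        candidates = {w2 for (w1, w2), p in BIGRAM.items()
                      if w1 == left_word and p >= threshold}
        return {w for w in candidates
                if BIGRAM.get((w, right_word), 0.0) >= threshold}

    print(lm_alternatives("jack", "hard"))   # {'studied', 'worked'}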

  9. Expansion results (on WSJ1)

  10. Probability model • Our confirmation and correction interface requires probability of a letter given prior letters:
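
Writing ℓ_i for the i-th letter the user enters, the required quantity is presumably of the form

    P(ℓ_i | ℓ_1 ℓ_2 … ℓ_{i-1})

i.e. a next-letter distribution conditioned on everything typed so far.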

  11. Probability model
  • Keep track of possible paths in lattice
  • Prediction based on next letter on paths
  • Interpolate with default language model
  • Example, user has entered “the_cat”:
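
A minimal sketch of that prediction step: keep only the lattice sentences consistent with the prefix typed so far, read a next-letter distribution off them, and interpolate with a default letter model. The sentence list, the uniform default model, and the interpolation weight LAMBDA are illustrative assumptions.

    from collections import Counter
    import string

    LATTICE_SENTENCES = [
        "the cat sat on the mat",
        "the cap sat on the mat",
        "a cat sat on the mat",
    ]
    ALPHABET = string.ascii_lowercase + " "
    LAMBDA = 0.9    # weight on the lattice-based prediction

    def default_lm(prefix):
        """Stand-in default letter model: uniform over the alphabet."""
        return {c: 1.0 / len(ALPHABET) for c in ALPHABET}

    def next_letter_probs(prefix):
        """P(next letter | prefix): lattice paths interpolated with the default model."""
        live = [s for s in LATTICE_SENTENCES if s.startswith(prefix)]
        counts = Counter(s[len(prefix)] for s in live if len(s) > len(prefix))
        total = sum(counts.values())
        fallback = default_lm(prefix)
        probs = {}
        for c in ALPHABET:
            lattice_p = counts[c] / total if total else 0.0
            probs[c] = LAMBDA * lattice_p + (1 - LAMBDA) * fallback[c]
        return probs

    p = next_letter_probs("the ca")
    print(round(p["t"], 2), round(p["p"], 2), round(p["x"], 3))
    # 0.45 0.45 0.004 -- 't' and 'p' dominate; everything else falls back
    # to the default model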

  12. Handling word errors
  • Use default language model during entry of erroneous word
  • Rebuild paths allowing for an additional deletion or substitution error
  • Example, user has entered “the_cattle_”:
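
A sketch of one way that error-tolerant rebuild could be checked: the typed word sequence is still treated as consistent with a lattice path if it can be aligned to a prefix of that path with at most one extra substitution or deletion. The recursive matcher below is an illustrative assumption, not the talk's actual algorithm.

    def consistent_with_one_error(typed, path):
        """True if `typed` (words the user has entered) matches a prefix of
        `path` (words on a lattice path) allowing at most one recognizer
        error: a substituted word or a word missing from the path."""
        def align(t, p, budget):
            if t == len(typed):                 # everything typed is accounted for
                return True
            if p < len(path) and typed[t] == path[p] and align(t + 1, p + 1, budget):
                return True
            if budget == 0:
                return False
            substitution = p < len(path) and align(t + 1, p + 1, budget - 1)
            deletion = align(t + 1, p, budget - 1)   # typed word absent from path
            return substitution or deletion
        return align(0, 0, 1)

    # "cattle" is accepted as a substitution for "cat" on this path
    print(consistent_with_one_error(
        ["the", "cattle"], ["the", "cat", "sat", "on", "the", "mat"]))  # True
    print(consistent_with_one_error(
        ["the", "cattle", "dog"], ["the", "cat", "sat"]))               # False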

  13. Evaluating expansion • Assume a good model requires as little information from the user as possible
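
One way to make that measurable, sketched below: score a model by the average number of bits per letter it needs from the user, i.e. −log₂ P(letter | prefix) averaged over a reference transcript. The uniform toy model is just a stand-in for the default and lattice-based models being compared.

    import math

    def bits_per_letter(model, reference):
        """Average -log2 P(letter | prefix) over the reference text;
        `model` maps a prefix string to a next-letter distribution."""
        total = 0.0
        for i, letter in enumerate(reference):
            probs = model(reference[:i])
            total += -math.log2(probs.get(letter, 1e-12))
        return total / len(reference)

    # Toy model: uniform over 26 letters plus space -> log2(27) bits/letter.
    uniform = lambda prefix: {c: 1 / 27 for c in "abcdefghijklmnopqrstuvwxyz "}
    print(round(bits_per_letter(uniform, "the cat sat"), 2))   # 4.75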

  14. Results on test set
  • Model evaluated on held-out test set (Hub1)
  • Default language model:
    • 2.4 bits/letter
    • User decides between 5.3 letters
  • Best speech-based model:
    • 0.61 bits/letter
    • User decides between 1.5 letters
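
The "user decides between N letters" figures are presumably the per-letter perplexities of the two models, i.e. 2 raised to the bits-per-letter: 2^2.4 ≈ 5.3 for the default model and 2^0.61 ≈ 1.5 for the best speech-based model.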

  15. “To the mouse snow means freedom from want and fear”

  16. Questions?
