Speech recognition, understanding and conversational interfaces
Presentation Transcript

  1. Speech recognition, understanding and conversational interfaces Alexander Rudnicky School of Computer Science http://www.cs.cmu.edu/~air

  2. Outline • Speech • Types of speech interfaces • Speech systems and their structure • Designing speech interfaces • Some applications • SpeechWear • Communicator

  3. Speech as a signal • The difference between speech and sound • “CD” quality vs. intelligible quality • high-quality audio is sampled at 44.1 or 48 kHz • desirable speech bandwidth: 0-8 kHz (i.e. 16 kHz sampling), 16 bits/sample • at 16 bits/sample: 16,000 samples/s × 16 bits = 256 kbps (tethered mic) • telephone: 64 kbps (8 kHz × 8 bits) and lower • Compression: • MPEG: 64 kbps/channel and up (but not speech-optimal) • CELP: 2.4 to 16 kbps (optimized for speech)
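The bit rates on this slide follow from sampling rate × bits per sample, where the sampling rate must be at least twice the highest frequency kept (the Nyquist rate). A minimal sketch of that arithmetic, assuming uncompressed mono PCM for speech and 16-bit stereo PCM for CD audio; the function names are illustrative only:

```python
def nyquist_rate_hz(max_freq_hz: float) -> float:
    """Minimum sampling rate that preserves frequencies up to max_freq_hz."""
    return 2 * max_freq_hz


def pcm_bitrate_kbps(sample_rate_hz: float, bits_per_sample: int, channels: int = 1) -> float:
    """Bit rate of uncompressed PCM audio, in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


wideband = pcm_bitrate_kbps(nyquist_rate_hz(8_000), 16)  # 0-8 kHz speech, 16 bits
telephone = pcm_bitrate_kbps(8_000, 8)                   # 8 kHz sampling, 8-bit samples
cd_audio = pcm_bitrate_kbps(44_100, 16, channels=2)      # CD: 44.1 kHz, 16-bit stereo

print(wideband, telephone, cd_audio)  # 256.0 64.0 1411.2
print(round(wideband / 2.4))          # 107: reduction factor of a 2.4 kbps CELP coder
```

The last line shows why speech-specific coders matter: the low end of the CELP range is roughly two orders of magnitude smaller than raw wideband speech.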

  4. Speech for communication • The difference between speech and language • Speech recognition and speech understanding

  5. Computers and speech • Transcription • dictation, information retrieval • Command and control • data entry, device control, navigation • Information access • airline schedules, stock quotes • Problem solving • travel planning, logistics

  6. Speech system architecture • SIGNAL PROCESSING • DECODING • UNDERSTANDING • DISCOURSE • ACTION
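The five stages form a chain in which each stage consumes the previous stage's output. The sketch below shows only that data flow. Every stage is a hypothetical toy stand-in that works on text (real signal processing and decoding work on audio), and none of the function names come from an actual system:

```python
from functools import reduce
from typing import Any, Callable, Dict, List


# Toy stand-ins for each stage (hypothetical; real stages process audio).
def signal_processing(raw: str) -> str:
    return raw.strip().lower()            # stand-in for noise conditioning


def decoding(signal: str) -> List[str]:
    return signal.split()                 # stand-in for speech-to-words


def understanding(words: List[str]) -> Dict[str, str]:
    # Take the word after the last "to" as the destination, if there is one.
    hits = [i for i, w in enumerate(words) if w == "to" and i + 1 < len(words)]
    return {"destination": words[hits[-1] + 1]} if hits else {}


def make_discourse(context: Dict[str, str]) -> Callable[[Dict[str, str]], Dict[str, str]]:
    def discourse(meaning: Dict[str, str]) -> Dict[str, str]:
        context.update(meaning)           # new content overrides older context
        return dict(context)
    return discourse


def action(state: Dict[str, str]) -> str:
    dest = state.get("destination")
    return f"Searching flights to {dest.title()}" if dest else "Where would you like to go?"


def run_pipeline(stages: List[Callable[[Any], Any]], data: Any) -> Any:
    """Feed the output of each stage into the next one."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


context: Dict[str, str] = {}
stages = [signal_processing, decoding, understanding, make_discourse(context), action]
print(run_pipeline(stages, "  I want to fly to Boston "))  # Searching flights to Boston
print(run_pipeline(stages, "what about tomorrow"))         # context still holds Boston
```

The second call illustrates the DISCOURSE stage: an utterance with no destination still leads to an action, because the destination is carried over from context.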

  7. Varieties of speech systems

  8. A generic speech system [figure: components shown are signal processing, decoder, parser, post parser, dialog manager, three domain agents, language generator and speech synthesizer; input is speech; outputs are speech, display and effector]

  9. Decoding speech • Signal processing • reduce dimensionality of signal • noise conditioning • Decoder • transcribe speech to words • uses acoustic models and language models (corpus-based statistical models)
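A decoder that combines acoustic and language models is commonly framed as choosing the word sequence W that maximizes P(O | W) · P(W): the acoustic models score how well W explains the observed signal O, and the language model scores how plausible W is as text. The sketch below rescores a fixed list of candidate transcriptions in log space. The acoustic scores, bigram probabilities, floor value and language-model weight are all made-up assumptions; a real decoder searches an enormous hypothesis space rather than a three-item list.

```python
import math
from typing import Iterable

# Hypothetical acoustic log-likelihoods log P(O | W) for competing transcriptions.
acoustic_logprob = {
    "recognize speech": -12.0,
    "wreck a nice beach": -11.5,
    "recognize peach": -12.4,
}

# Hypothetical bigram language model P(word | previous word); "<s>" marks the start.
bigram = {
    ("<s>", "recognize"): 0.02, ("recognize", "speech"): 0.30,
    ("recognize", "peach"): 0.001, ("<s>", "wreck"): 0.001,
    ("wreck", "a"): 0.10, ("a", "nice"): 0.02, ("nice", "beach"): 0.05,
}
FLOOR = 1e-6  # crude stand-in probability for unseen bigrams


def lm_logprob(sentence: str) -> float:
    words = ["<s>"] + sentence.split()
    return sum(math.log(bigram.get(pair, FLOOR)) for pair in zip(words, words[1:]))


def decode(candidates: Iterable[str], lm_weight: float = 1.0) -> str:
    """argmax over W of log P(O|W) + lm_weight * log P(W)."""
    return max(candidates, key=lambda w: acoustic_logprob[w] + lm_weight * lm_logprob(w))


print(decode(acoustic_logprob))                 # recognize speech
print(decode(acoustic_logprob, lm_weight=0.0))  # wreck a nice beach
```

With the language model switched off, the acoustically best but implausible string wins; with it on, the more plausible word sequence overtakes it.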

  10. Creating models for recognition • Speech data → transcribe* → train → acoustic models • Text data → train → language models
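Training a language model from text data can be as simple as counting word pairs. A minimal sketch, assuming a maximum-likelihood bigram model with sentence-boundary markers and no smoothing (practical systems smooth these estimates and train on far larger corpora); the three-sentence corpus is invented. Acoustic-model training, which needs transcribed speech, is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def train_bigram_lm(sentences: Iterable[str]) -> Dict[Tuple[str, str], float]:
    """Maximum-likelihood estimates of P(word | previous word)."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        history_counts.update(words[:-1])           # every word that has a successor
        bigram_counts.update(zip(words, words[1:]))
    return {(prev, word): count / history_counts[prev]
            for (prev, word), count in bigram_counts.items()}


corpus = [
    "show me flights to boston",
    "show me flights to denver",
    "show me hotels in boston",
]
lm = train_bigram_lm(corpus)
print(lm[("me", "flights")])  # 2 of the 3 "me" tokens are followed by "flights"
print(lm[("to", "boston")])   # 0.5
```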

  11. Understanding speech • Parser • extract semantic content from utterance • draws on a grammar (built through ontology design and language acquisition) • Post parser • introduce context and world knowledge into interpretation • draws on context and domain agents (built through grounding and knowledge engineering)
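The split between the parser (what did the utterance say?) and the post parser (what does it mean here and now?) can be sketched with a tiny pattern grammar. Everything below is a hypothetical illustration: the city lexicon, the regular-expression "grammar", the home-city context and the reference date are assumptions, not the grammar formalism of any particular system.

```python
import re
from datetime import date, timedelta
from typing import Dict, Optional

CITIES = {"boston", "denver", "pittsburgh"}       # hypothetical city lexicon
PATTERNS = {                                       # hypothetical mini-grammar
    "origin": re.compile(r"\bfrom (\w+)"),
    "destination": re.compile(r"\bto (\w+)"),
    "day": re.compile(r"\b(today|tomorrow)\b"),
}


def parse(utterance: str) -> Dict[str, str]:
    """Parser: extract semantic content (slot/value pairs) from one utterance."""
    text = utterance.lower()
    frame: Dict[str, str] = {}
    for slot, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group(1)
            if slot == "day" or value in CITIES:  # "to go" is not a destination
                frame[slot] = value
                break
    return frame


def post_parse(frame: Dict[str, str], context: Dict[str, str],
               today: date) -> Dict[str, Optional[str]]:
    """Post parser: add context (default origin) and world knowledge (dates)."""
    result: Dict[str, Optional[str]] = dict(frame)
    result.setdefault("origin", context.get("home_city"))
    if "day" in result:
        offset = 1 if result.pop("day") == "tomorrow" else 0
        result["date"] = (today + timedelta(days=offset)).isoformat()
    return result


frame = parse("I want to go to Boston tomorrow")
print(frame)  # {'destination': 'boston', 'day': 'tomorrow'}
print(post_parse(frame, {"home_city": "pittsburgh"}, date(2024, 3, 14)))
```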

  12. Interacting with the user • Dialog manager • guide interaction through task • map user inputs and system state into actions • draws on task schemas (from task analysis) and context • Domain agents • interact with back-end(s): database, live data (e.g. Web) • interpret information using domain knowledge (from a domain expert, via knowledge engineering)
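"Map user inputs and system state into actions" can be read as a function from (input, state) to the next action. A minimal sketch under that reading: the task schemas, the two domain agents and the ("ask", …)/("inform", …) action format are hypothetical, and the agents return canned strings where a real agent would query a database or live Web data.

```python
from typing import Callable, Dict, Tuple


# Hypothetical domain agents; real ones would query back ends.
def flight_agent(state: Dict[str, str]) -> str:
    return f"flights to {state['destination']} on {state['date']}"


def hotel_agent(state: Dict[str, str]) -> str:
    return f"hotels in {state['city']}"


AGENTS: Dict[str, Callable[[Dict[str, str]], str]] = {"flight": flight_agent, "hotel": hotel_agent}
TASK_SCHEMAS = {"flight": ["destination", "date"], "hotel": ["city"]}  # hypothetical


def dialog_manager(user_input: Dict[str, str], state: Dict[str, str]) -> Tuple[str, str]:
    """Map the user's input plus the current system state to the next action."""
    state.update(user_input)
    task = state.get("task")
    if task not in TASK_SCHEMAS:
        return ("ask", "task")
    for slot in TASK_SCHEMAS[task]:
        if slot not in state:
            return ("ask", slot)            # guide the user through the task
    return ("inform", AGENTS[task](state))  # hand the complete request to a domain agent


state: Dict[str, str] = {}
print(dialog_manager({"task": "flight", "destination": "Boston"}, state))  # ('ask', 'date')
print(dialog_manager({"date": "Friday"}, state))  # ('inform', 'flights to Boston on Friday')
```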

  13. Communicating with the user • Language generator • decide what to say to user (and how to phrase it) • Output generators: speech synthesizer, display generator, action generator
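One simple way to "decide what to say and how to phrase it" is template filling keyed by the dialog act. A minimal sketch, assuming hypothetical (act, topic) templates; the resulting string would then go to the speech synthesizer or display generator. Template-based generation is only one option and is not necessarily what the system on this slide used.

```python
from typing import Any

# Hypothetical templates keyed by (dialog act, topic).
TEMPLATES = {
    ("ask", "destination"): "Where would you like to fly to?",
    ("ask", "date"): "What day would you like to leave?",
    ("inform", "flights"): "I found {count} {noun} to {destination}.",
}


def generate(act: str, topic: str, **slots: Any) -> str:
    """Choose wording for a dialog act by filling a template."""
    if "count" in slots:
        # Phrasing decision: make the noun agree with the number.
        slots["noun"] = "flight" if slots["count"] == 1 else "flights"
    return TEMPLATES[(act, topic)].format(**slots)


print(generate("ask", "date"))                                       # What day would you like to leave?
print(generate("inform", "flights", count=1, destination="Boston"))  # I found 1 flight to Boston.
print(generate("inform", "flights", count=3, destination="Boston"))  # I found 3 flights to Boston.
```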

  14. Speech recognition and understanding • Sphinx system • speaker-independent • continuous speech • large vocabulary • ATIS system • air travel information retrieval • context management • film clip

  15. Command and control systems • Small vocabularies, fixed syntax • OPEN WINDOW <window_id> • MOVE OBJECT <object_id> to <position> • Applications: • data entry (e.g., zip codes), process control (e.g., electron microscope, darkroom equipment) • Large vocabulary, fixed syntax • Web browsing (?)
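A small-vocabulary, fixed-syntax command language can be described by a handful of patterns that must match the whole utterance. The sketch below encodes the slide's two command forms as regular expressions over recognized text; the command names and argument names are hypothetical. In a deployed recognizer the grammar would normally also constrain the recognition search itself, which this text-only sketch does not attempt.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical fixed-syntax grammar modeled on the slide's two command forms.
COMMANDS = [
    ("open_window", re.compile(r"open window (?P<window_id>\w+)")),
    ("move_object", re.compile(r"move object (?P<object_id>\w+) to (?P<position>\w+)")),
]


def interpret(utterance: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (command, arguments) if the whole utterance fits the grammar."""
    text = " ".join(utterance.lower().split())   # normalize case and spacing
    for name, pattern in COMMANDS:
        match = pattern.fullmatch(text)
        if match:
            return name, match.groupdict()
    return None                                   # out of grammar: reject, don't guess


print(interpret("OPEN WINDOW three"))          # ('open_window', {'window_id': 'three'})
print(interpret("move object cube to left"))
print(interpret("could you open the window"))  # None
```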

  16. SpeechWear • Vehicle inspection task • USMC mechanics, fixed inspection form • Wearable computer (COTS components) • html-based task representation • film clip

  17. Information access • Moderate to very large vocabulary • IVR and frame-based systems • Commercial systems: • Nuance: http://www.nuance.com/demo/index.html • SpeechWorks: http://www.speechworks.com/demos/demos.htm • lots of others

  18. IVR and frame-based systems • Interactive voice response (IVR) • interactions specified by a graph (typically a tree) • Frame systems • ergodic graphs • states defined by multi-item forms

  19. Graph-based systems • “Welcome to Bank ABC! Please say one of the following: Balance, Hours, Loan, ...” • (after Loan) “What type of loan are you interested in? Please say one of the following: Mortgage, Car, Personal, ...”
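An IVR menu of this kind is a tree: each node plays a prompt, and each recognized option follows one arc to a child. A minimal sketch, assuming the Bank ABC menu above; the leaf prompts and the re-prompt wording are invented placeholders, and the caller's answers are scripted rather than recognized.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MenuNode:
    prompt: str
    options: Dict[str, "MenuNode"] = field(default_factory=dict)


# Hypothetical tree following the Bank ABC example; leaf prompts are placeholders.
ROOT = MenuNode(
    "Welcome to Bank ABC! Please say one of the following: Balance, Hours, Loan.",
    {
        "balance": MenuNode("(balance information)"),
        "hours": MenuNode("(branch hours)"),
        "loan": MenuNode(
            "What type of loan are you interested in? "
            "Please say one of the following: Mortgage, Car, Personal.",
            {
                "mortgage": MenuNode("(mortgage information)"),
                "car": MenuNode("(car loan information)"),
                "personal": MenuNode("(personal loan information)"),
            },
        ),
    },
)


def run_ivr(node: MenuNode, responses: List[str]) -> List[str]:
    """Walk the menu tree with scripted caller responses; return the prompts played."""
    played = [node.prompt]
    for said in responses:
        child = node.options.get(said.strip().lower())
        if child is None:                     # out-of-menu answer: re-prompt in place
            played.append("Sorry, I didn't get that. " + node.prompt)
        else:                                 # follow the arc for the chosen option
            node = child
            played.append(node.prompt)
    return played


for prompt in run_ivr(ROOT, ["Loan", "Car"]):
    print(prompt)
```

Because every path is fixed in advance, the caller can only answer the current question; this is what makes IVR reliable for shallow tasks and awkward for anything else.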

  20. Frame-based systems [form: Destination_City: Boston; Departure_Date: ______; Departure_Time: ______; Preferred_Airline: ______; ...] • I would like to fly to Boston • I’d like to go to Boston on Friday, … • When would you like to fly?
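A frame system accepts whatever slots the user happens to supply, in any order, and then prompts for the first slot still empty. A minimal sketch using the slot names from the form above; the prompts other than "When would you like to fly?" are hypothetical, and treating every slot as required is a simplifying assumption.

```python
from typing import Dict, Optional

SLOTS = ["Destination_City", "Departure_Date", "Departure_Time", "Preferred_Airline"]
QUESTIONS = {  # hypothetical prompts, one per slot
    "Destination_City": "Where would you like to fly?",
    "Departure_Date": "When would you like to fly?",
    "Departure_Time": "What time would you like to leave?",
    "Preferred_Airline": "Do you have a preferred airline?",
}


def update_frame(frame: Dict[str, Optional[str]], extracted: Dict[str, str]) -> None:
    """Fill whichever slots the user mentioned, in any order and any number."""
    for slot, value in extracted.items():
        if slot in frame:
            frame[slot] = value


def next_prompt(frame: Dict[str, Optional[str]]) -> Optional[str]:
    """Ask for the first empty slot; None means the form is complete."""
    for slot in SLOTS:
        if frame[slot] is None:
            return QUESTIONS[slot]
    return None


frame: Dict[str, Optional[str]] = dict.fromkeys(SLOTS)
update_frame(frame, {"Destination_City": "Boston"})  # "I would like to fly to Boston"
print(next_prompt(frame))                            # When would you like to fly?
update_frame(frame, {"Departure_Date": "Friday", "Departure_Time": "morning"})
print(next_prompt(frame))                            # Do you have a preferred airline?
```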

  21. Frame-based systems [figure: several multi-slot forms (slot names shown as placeholders), connected by arcs labeled “Transition on keyword or phrase”]
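With several forms in an ergodic graph, any form can follow any other, and the active form changes when a trigger keyword or phrase is heard. A minimal sketch, assuming hypothetical frames and keyword sets; when an utterance contains no trigger, the current form stays active, and if several forms match, the first one listed wins.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical frames and their trigger keywords.
FRAME_KEYWORDS: Dict[str, Set[str]] = {
    "flight": {"fly", "flight", "flights", "airline"},
    "hotel": {"hotel", "room", "stay"},
    "car": {"car", "rental"},
}


def select_frame(utterance: str, current: Optional[str]) -> Optional[str]:
    """Switch to the first frame whose keywords appear; otherwise stay put."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    for name, keywords in FRAME_KEYWORDS.items():
        if words & keywords:
            return name
    return current


active: Optional[str] = None
for said in ["I want to fly to Boston", "on Friday", "I also need a hotel", "and a rental car"]:
    active = select_frame(said, active)
    print(f"{said!r} -> {active}")
```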

  22. Some problems • IVR systems work great, but only for well-structured (& “shallow”) tasks • Frame systems are good for “tasks” that correspond to a single form leading to an action • Neither approach does well with more complex problem-solving activities

  23. Dialog Systems • Problem solving activity; complex task • Order of progression through task depends on user goals (which can change) and system state (e.g. the result of a back-end retrieval) and is not predictable • Track progress and help task along • mixed-initiative dialog • Discourse phenomena • Users expect to “converse” with the system

  24. Carnegie Mellon Communicator • A dialog system that supports complex problem solving in a travel planning domain • create an itinerary using air schedule, hotel and car information • 186 U.S. airports (>140k enplanements/yr) • currently: >500 world airports • Web-based data resources • Live and cached flight information • Airport, airline, etc. information

  25. Value schema/handlers [figure: receptors → transform → value; domain agent]

  26. Compound schema [figure: Value_1, Value_2, Value_3 → transform → value (e.g. SQL query) → domain agent]
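A compound schema's transform can turn several component values into a single back-end request, such as an SQL query. A minimal sketch using Python's built-in `sqlite3` with an in-memory table; the table layout, column names, flight rows and the choice of a parameterized query are hypothetical stand-ins for a domain agent's real data source.

```python
import sqlite3
from typing import Dict, Tuple


def flight_leg_query(values: Dict[str, str]) -> Tuple[str, Tuple[str, ...]]:
    """Compound-schema transform: combine three component values into one query."""
    sql = ("SELECT flight, depart FROM flights "
           "WHERE dest = ? AND travel_date = ? AND depart >= ? ORDER BY depart")
    return sql, (values["destination"], values["date"], values["earliest"])


# Hypothetical in-memory table standing in for the domain agent's back end.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE flights (flight TEXT, dest TEXT, travel_date TEXT, depart TEXT)")
db.executemany("INSERT INTO flights VALUES (?, ?, ?, ?)", [
    ("F1", "BOS", "2024-03-15", "07:30"),
    ("F2", "BOS", "2024-03-15", "13:10"),
    ("F3", "DEN", "2024-03-15", "09:00"),
])

sql, params = flight_leg_query({"destination": "BOS", "date": "2024-03-15", "earliest": "12:00"})
rows = db.execute(sql, params).fetchall()
print(rows)  # [('F2', '13:10')]
```

Passing the values as query parameters rather than pasting them into the SQL string keeps recognized (and possibly misrecognized) user input from altering the query itself.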

  27. Schema ordering [figure: a flight leg is built from destination airport, date and time; a database lookup returns available flights; schemas i, j, k each hold a value (value i, value j, value k), and a transform combines them into a value]
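The value schemas and schema ordering on these slides can be sketched as small objects: a value schema has a receptor that picks its raw value out of a parse and a transform that normalizes it, and a compound schema holds its parts in order so that the first unresolved part is the next thing to ask about. This is a hypothetical reading of the diagrams; the class names, receptor functions and airport table are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

AIRPORTS = {"boston": "BOS", "denver": "DEN"}  # hypothetical city -> airport table


@dataclass
class ValueSchema:
    """Receptor pulls a raw value from a parse; transform normalizes it."""
    name: str
    receptor: Callable[[Dict[str, str]], Optional[str]]
    transform: Callable[[str], str] = str.strip
    value: Optional[str] = None

    def receive(self, parse: Dict[str, str]) -> None:
        raw = self.receptor(parse)
        if raw is not None:
            self.value = self.transform(raw)


@dataclass
class CompoundSchema:
    """Ordered parts: earlier parts are resolved (and asked for) first."""
    name: str
    parts: List[ValueSchema]

    def receive(self, parse: Dict[str, str]) -> None:
        for part in self.parts:
            part.receive(parse)

    def next_missing(self) -> Optional[str]:
        return next((p.name for p in self.parts if p.value is None), None)

    def values(self) -> Dict[str, Optional[str]]:
        return {p.name: p.value for p in self.parts}


flight_leg = CompoundSchema("flight_leg", [
    ValueSchema("destination_airport", lambda p: p.get("city"),
                lambda city: AIRPORTS.get(city.lower(), city)),
    ValueSchema("date", lambda p: p.get("date")),
    ValueSchema("time", lambda p: p.get("time")),
])
flight_leg.receive({"city": "Boston"})
print(flight_leg.next_missing())  # date
flight_leg.receive({"date": "2024-03-15", "time": "morning"})
print(flight_leg.values())        # ready for a lookup of available flights
```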

  28. Carnegie Mellon Communicator • CMU Communicator • Call: 268-5144 • the information is accurate; you can use it for your own travel planning...

  29. User-aware speech interfaces • Predictable behavior on the system’s part • Users communicate at different levels • http://www.speech.cs.cmu.edu/air/papers/InterfaceChars.html

  30. User-aware speech interfaces • Content: task-centric utterances • Possibility: What can I do? • Orientation: Where are we? • Navigation: moving through the task space • Control: verbose/terse, listen! • Customization: define this word
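Utterances about possibility, orientation and control are about the interaction rather than the task, so one option is to intercept them before task processing. A minimal sketch, assuming hypothetical exact-match phrases; a real system would recognize many paraphrases, and navigation and customization are left out for brevity.

```python
from typing import List, Optional


class MetaCommands:
    """Hypothetical handler for task-independent utterances."""

    def __init__(self, task_name: str, options: List[str]) -> None:
        self.task_name = task_name
        self.options = options
        self.verbose = True

    def handle(self, utterance: str) -> Optional[str]:
        text = utterance.lower().strip(" ?!.")
        if text == "what can i do":                # possibility
            return "You can say: " + ", ".join(self.options)
        if text == "where are we":                 # orientation
            return f"We are working on {self.task_name}."
        if text in ("be terse", "be verbose"):     # control
            self.verbose = text == "be verbose"
            return "OK."
        return None  # not a meta-command: pass it on to the task dialog


meta = MetaCommands("the flight to Boston", ["departure time", "airline", "start over"])
print(meta.handle("What can I do?"))  # You can say: departure time, airline, start over
print(meta.handle("Be terse."), meta.verbose)  # OK. False
print(meta.handle("Leave on Friday"))  # None
```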

  31. Speech interface guidelines • Speech recognition is errorful • System state is often opaque to the user • http://www.speech.cs.cmu.edu/air/papers/SpInGuidelines/SpInGuidelines.html

  32. Interface guidelines • State transparency • Input control • Error recovery • Error detection • Error correction • Log performance • Application integration
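Because recognition is errorful, a common approach to error detection and correction is to act on the recognizer's confidence score: reject and re-prompt when it is low, confirm explicitly when it is middling, and accept otherwise, logging each decision so performance can be reviewed later. A minimal sketch of that approach; the thresholds, prompts and logger name are hypothetical, and this is not necessarily the method behind the guidelines above.

```python
import logging
from typing import Optional, Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("speech_interface")


def handle_hypothesis(text: str, confidence: float,
                      reject_below: float = 0.3,
                      confirm_below: float = 0.7) -> Tuple[str, Optional[str]]:
    """Choose accept / confirm / re-prompt from a recognizer confidence score."""
    if confidence < reject_below:
        log.info("reject text=%r conf=%.2f", text, confidence)
        return "reprompt", "Sorry, I didn't catch that. Please say it again."
    if confidence < confirm_below:
        log.info("confirm text=%r conf=%.2f", text, confidence)
        return "confirm", f"Did you say {text}?"
    log.info("accept text=%r conf=%.2f", text, confidence)
    return "accept", None


print(handle_hypothesis("Boston", 0.92))  # ('accept', None)
print(handle_hypothesis("Austin", 0.55))  # ('confirm', 'Did you say Austin?')
print(handle_hypothesis("Boston", 0.10)[0])  # reprompt
```

Confirming only in the uncertain band keeps the dialog short when recognition is good, while still giving the user a chance to catch misrecognitions before the system acts on them.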

  33. Summary • Speech and language communication • Dialog structure • Interface design