VISIONS, TECHNOLOGY, AND BUSINESS OF CONVERSATIONAL MACHINES

Mazin Gilbert, Director, AT&T. August 7-10, 2006. Joint work with Roberto Pieraccini, Tell-Eureka. A brief history of spoken language technology: Von Kempelen (1791), Homer Dudley, Bell Labs (1939).


Presentation Transcript


  1. VISIONS, TECHNOLOGY, AND BUSINESS OF CONVERSATIONAL MACHINES. Mazin Gilbert, Director, AT&T. August 7-10, 2006. Joint work with Roberto Pieraccini, Tell-Eureka.

  2. A brief history of spoken language technology

  3. Talking Machines: First Steps into Spoken Language Technology. Von Kempelen (1791); Joseph Faber (1835); Homer Dudley, Bell Labs (1939).

  4. The 60's to 90's: Technology Evolution. [Diagram labels: Isolated Words; Speaker Dependent; Connected Words; Speaker Independent; Context Dependent Sub-Word Units; Stochastic Language Models; Template Matching; Acoustic/Phonetic; Hidden Markov Models. Inset: HMM with states S1, S2, S3 and transition probabilities a11, a22, a33, a12, a23.] The statistical approach becomes ubiquitous.
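The HMM inset on slide 4 (states S1, S2, S3 with transitions a11, a22, a33, a12, a23) describes a left-to-right model. As a minimal sketch of how such a model scores an observation sequence, the forward algorithm below computes the sequence likelihood. All probability values, the two-symbol discrete emission table, and the names `A`, `B`, `PI` and `forward_likelihood` are illustrative assumptions, not taken from the slides.

```python
# Left-to-right 3-state HMM: self-loops a11, a22, a33 and forward
# transitions a12, a23. The numbers are made up for illustration.
A = [
    [0.6, 0.4, 0.0],  # from S1: a11, a12
    [0.0, 0.7, 0.3],  # from S2: a22, a23
    [0.0, 0.0, 1.0],  # from S3: a33
]
PI = [1.0, 0.0, 0.0]  # assumption: the model always starts in S1

# Hypothetical discrete emissions B[state][symbol] over two symbols (0 and 1).
B = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]


def forward_likelihood(obs):
    """Return P(obs | model) computed with the forward algorithm."""
    n = len(PI)
    # Initialization: start probability times emission of the first symbol.
    alpha = [PI[s] * B[s][obs[0]] for s in range(n)]
    # Induction: propagate through the transitions, then emit the next symbol.
    for o in obs[1:]:
        alpha = [
            sum(alpha[r] * A[r][s] for r in range(n)) * B[s][o]
            for s in range(n)
        ]
    # Termination: sum over all states the model may end in.
    return sum(alpha)


if __name__ == "__main__":
    print(forward_likelihood([0, 1, 1]))
```

Real recognizers use continuous emission densities over acoustic feature vectors and work in the log domain; the discrete table here only keeps the sketch short.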

  5. The 90's: the Birth of the Spoken Dialog Industry. [Diagram labels: Technology Vendors; Platform Integrators; Application Developers; Hosting; Tools; Standards.]

  6. Modern speech technology

  7. The Speech Technology Chain. [Diagram: Speech -> ASR (Automatic Speech Recognition) -> Words -> SLU (Spoken Language Understanding) -> Meaning -> DM (Dialog Management) -> Action -> SLG (Spoken Language Generation) -> Words -> TTS (Text-to-Speech Synthesis) -> Speech. Center: Data, Rules.]

  8. The Speech Technology Chain (ASR). Accurately and efficiently convert a speech signal into a text message independent of the device, speaker or the environment. [Chain diagram as on slide 7.]

  9. ASR - The Big Picture! [Diagram: Input Speech -> Feature Extraction -> Decoder / Pattern Classification -> Confidence Scoring -> "Hello World" (0.9) (0.8). Knowledge sources feeding the decoder: Acoustic Model P(X|W), Word Lexicon, Language Model P(W).] Change the AM for each new language. Change the LM and lexicon for each new language and application.
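Slide 9 names the acoustic model P(X|W) and the language model P(W). In the standard statistical formulation of ASR, which the slide does not spell out and which is given here as background, the decoder combines the two through Bayes' rule, with X the sequence of acoustic feature vectors and W a candidate word sequence:

```latex
\hat{W} = \arg\max_{W} P(W \mid X)
        = \arg\max_{W} \frac{P(X \mid W)\, P(W)}{P(X)}
        = \arg\max_{W} P(X \mid W)\, P(W)
```

P(X) drops out because it does not depend on W. This is consistent with the slide's notes: the acoustic model changes with the language, while the language model and lexicon change with both the language and the application.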

  10. Human Speech Recognition vs. ASR. [Chart labels: Accuracy; Efficiency; Operational Performance; On-line Learning; Robustness. Scale: x1, x10, x100. Region: "Machines Outperform Humans".] Machines are 5-50 times worse than humans on virtually any recognition task.

  11. The Speech Technology Chain (SLU). Extract the meaning from recognized speech and interpret a user's request. [Chain diagram as on slide 7.]

  12. Why is SLU a Difficult Problem? Ways to say “question about my bill”

  13. Knowledge Sources for SLU / Enabling Applications. Knowledge sources: Acoustic/Phonetic; Lexical; Syntactic; Semantic; Pragmatic. Enabling applications: Speech Data Mining; Call routing; Customer care; Speech Translation; Problem solving.

  14. SLU - The Big Picture! [Diagram: From ASR/DM (text, lattices, n-best, history) -> Text Normalization (morphology, synonyms) -> Parsing/Decoding (extracting named entities, semantic concepts, syntactic tree) -> Interpretation (slot filling, reasoning, task knowledge representation) -> To DM (concepts, entities, parse tree). Also shown: Database Access.]
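The SLU stages on slide 14 (text normalization with synonyms, extraction of concepts and entities, output to the DM) can be illustrated with a toy keyword-based pipeline. This is a rough sketch under stated assumptions: the synonym table, concept lexicon, city list and the function names `normalize` and `interpret` are all hypothetical, and a deployed system would use statistical or grammar-based parsing rather than table lookups.

```python
import re

# Hypothetical resources, for illustration only.
SYNONYMS = {"invoice": "bill", "statement": "bill", "charges": "bill"}
CONCEPTS = {"bill": "BILLING", "balance": "BILLING", "cancel": "CANCEL_SERVICE"}
CITIES = {"chicago", "newark"}


def normalize(text):
    """Text normalization: lowercase, tokenize, map synonyms to one form."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [SYNONYMS.get(t, t) for t in tokens]


def interpret(text):
    """Return the concepts and entities that would be passed on to the DM."""
    tokens = normalize(text)
    concepts = sorted({CONCEPTS[t] for t in tokens if t in CONCEPTS})
    entities = [t for t in tokens if t in CITIES]
    return {"concepts": concepts, "entities": entities}


if __name__ == "__main__":
    # Two different ways to ask a "question about my bill" (slide 12).
    print(interpret("I have a question about my invoice"))
    print(interpret("Why are the charges on my statement so high?"))
```

Both example utterances map to the same concept, which is the point of normalizing before interpretation: many surface forms, one meaning.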

  15. The Speech Technology Chain (DM). Manage elaborate exchanges with the user, providing access to information. [Chain diagram as on slide 7.]

  16. The Dialog Flow. [Diagram labels: Context Interpretation; Dialog Strategies; Backend; Action; Observation; Dialog State Transition.]

  17. Mixed-Initiative Dialog. Who manages the dialog? [Scale: User initiative ... System initiative.]
      System: How may I help you?
      User: I need to travel from Chicago to Newark tomorrow night.
      System: Please say just your departure city.
      User: Chicago.
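The exchange on slide 17 moves from an open prompt (user initiative) to a directed prompt (system initiative). A minimal slot-filling sketch of that behavior follows. The slot names, the prompts other than the two quoted on the slide, and the functions `update` and `next_prompt` are assumptions made for illustration, not the presenters' dialog manager.

```python
SLOTS = ["departure_city", "arrival_city", "date"]

# Only the departure-city prompt is quoted from the slide; the rest are invented.
PROMPTS = {
    "departure_city": "Please say just your departure city.",
    "arrival_city": "Please say just your arrival city.",
    "date": "Please say just your travel date.",
}


def update(state, understood):
    """Dialog state transition: merge every known slot the SLU returned.

    Accepting more slots than were asked for is what makes the dialog
    mixed-initiative rather than strictly system-driven.
    """
    new_state = dict(state)
    new_state.update({k: v for k, v in understood.items() if k in SLOTS})
    return new_state


def next_prompt(state):
    """Choose the next system prompt, or None when all slots are filled."""
    if not state:
        return "How may I help you?"  # open prompt: user initiative
    for slot in SLOTS:
        if slot not in state:
            return PROMPTS[slot]  # directed prompt: system initiative
    return None  # ready for the backend action


if __name__ == "__main__":
    state = {}
    print(next_prompt(state))
    # Assume the SLU understood everything except the departure city.
    state = update(state, {"arrival_city": "Newark", "date": "tomorrow night"})
    print(next_prompt(state))
    state = update(state, {"departure_city": "Chicago"})
    print(next_prompt(state))
```

The fallback to "Please say just your departure city." models the system taking back the initiative when part of the user's open answer was not understood.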

  18. The Speech Technology Chain (SLG). Translate the action of the DM into a textual representation. [Chain diagram as on slide 7.]

  19. The Speech Technology Chain (TTS). Provide completely natural, high intelligibility speech from text for any talker, language or accent. [Chain diagram as on slide 7.]

  20. Concatenative Synthesis. [Diagram: Text (alphabetic characters) -> Text Analysis, Letter-to-Sound, Prosody (uses Dictionary and Rules) -> phonetic symbols, prosody targets -> Assemble Units that Match Input Targets (uses Store of Sound Units) -> Speech Waveform Modification and Synthesis -> Speech.] Change the front-end for each new language. Change the sound store for each new voice and/or language.
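The "Assemble Units that Match Input Targets" step on slide 20 is commonly implemented as a search for the unit sequence that minimizes a target cost plus a join cost. The slide does not say which method the presenters used, so the dynamic-programming sketch below is an assumption. The unit inventory, the pitch-only cost functions and the name `select_units` are hypothetical.

```python
# Hypothetical store of sound units: phone -> [(unit_id, pitch_hz), ...].
INVENTORY = {
    "h": [("h_1", 110.0), ("h_2", 140.0)],
    "ay": [("ay_1", 115.0), ("ay_2", 150.0)],
}


def select_units(targets):
    """Pick one stored unit per target phone.

    targets: list of (phone, target_pitch_hz).
    Target cost: |unit pitch - target pitch|.
    Join cost: |pitch difference between adjacent units|.
    Returns the unit ids with the lowest total cost.
    """
    best = []  # best[i][k] = (lowest cost ending in candidate k, back-pointer)
    for i, (phone, pitch) in enumerate(targets):
        row = []
        for _, unit_pitch in INVENTORY[phone]:
            target_cost = abs(unit_pitch - pitch)
            if i == 0:
                row.append((target_cost, None))
                continue
            prev = INVENTORY[targets[i - 1][0]]
            # Cheapest way to reach this unit from any previous candidate.
            cost, back = min(
                (best[i - 1][j][0] + abs(prev[j][1] - unit_pitch), j)
                for j in range(len(prev))
            )
            row.append((cost + target_cost, back))
        best.append(row)

    # Backtrace from the cheapest final candidate.
    k = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(INVENTORY[targets[i][0]][k][0])
        k = best[i][k][1]
    return path[::-1]


if __name__ == "__main__":
    print(select_units([("h", 112.0), ("ay", 118.0)]))
```

Production systems score many more features (duration, spectral continuity, phonetic context); pitch alone is used here to keep the cost functions readable.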

  21. Conversational Machines: Lessons Learned
      • "There is no data like more data", but data is expensive to collect and label, and typically unavailable in large quantities for every speaker, language and environment.
      • Significant resources and expertise are necessary for creating, maintaining and customizing conversational machines.
      • Speech input/output is insufficient for accommodating system failures and for creating complex automated applications for anyone, anywhere.

  22. Multimodal Technology Components. [Chain diagram as on slide 7, with Pen, Gesture and Visual shown alongside Speech.]

  23. Commercial Spoken Dialog Systems

  24. The Speech Application Lifecycle. [Diagram: cycle with steps numbered 1-10. Stages: requirements; high level system design; VUI design; VUI development; system engineering; integration; speech science; usability testing; partial deployment; full deployment. Roles: Analyst; VUI Designer; Architect; App Developer; Engineer; Speech Scientist.]

  25. The Voice Web. [Diagram labels: Telephone; Telephony Platform; Voice Browser; ASR; TTS; Internet; Web Server. Standards shown: VoiceXML/SALT; MRCP; SSML, SRGF, EMMA.]

  26. The Speech Technology Market. [Diagram labels: Server-based; Desktop; Embedded; Telephony; Call Center Automation; Dictation; Conversational; Car; Cell; Security; Entertainment; Speech to Speech Translation; Multimodal/Multimedia.]

  27. Business in Conversational Technology
      • Return on Investment (ROI)
        • Reduce cost
        • Enable self service options
        • New revenue opportunities
      • Customer Retention
        • Better user interface
        • Reduce waiting time for callers
        • Reduce misrouting
      • Branding
        • Project a new image and brand awareness
        • Use of persona
      [Diagram labels: Technology Vendors; Platform Integrators; Application Developers; Hosting; Tools.]

  28. VISIONS, TECHNOLOGY, AND BUSINESS OF CONVERSATIONAL MACHINES
