
Introduction and overview



Presentation Transcript


  1. Introduction and overview

  2. Outline • A short history of the field • Speech synthesis (TTS) • Automatic speech recognition (ASR) • Dialog system architectures • Voice on the Web (perhaps show the Siri video) • Voice on the Web and W3C Standards • Relation to linguistic theory • A brief look at the course plan • What this course is not about • What this course could mean to you • Introduction to lab assignments and platforms • Designing and developing spoken dialog systems • Present project • Give home assignment 1: Call flow design and evaluation • Present lab assignment 1

  3. A short history of the field • 1966, Joseph Weizenbaum, ELIZA • SUNDIAL, ATIS, Verbmobil • AIML NLP system • VXML (VoiceXML), a dialog markup language (primarily for telephony), developed initially by AT&T, then administered by an industry consortium, and finally a W3C specification. • Voxeo, Tropo

  4. Speech synthesis [diagram: text → speech]

  5. Speech recognition [diagram: speech → text (or some semantic representation)]

  6. Dialog management • Finite-state-based dialog management • Frame-based (form-based) dialog management • Information-state-based dialog management • Plan-based dialog management
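Of the four approaches above, finite-state-based dialog management is the easiest to show in a few lines. The following Python sketch is purely illustrative: the travel-booking states, the prompts and the `run_dialog` helper are hypothetical, and the speech recognizer is replaced by a scripted list of already-recognized user turns. Each state owns a prompt and a transition function, so the system keeps the initiative and the dialog can only follow the paths drawn in the state graph.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class State:
    prompt: str
    # Maps the normalized user input to the next state's name; None means "re-prompt".
    next_state: Callable[[str], Optional[str]]


# Hypothetical travel-booking call flow; state names and prompts are illustrative only.
STATES: Dict[str, State] = {
    "ask_origin": State("Where are you leaving from?",
                        lambda u: "ask_destination" if u else None),
    "ask_destination": State("Where are you going?",
                             lambda u: "confirm" if u else None),
    "confirm": State("Shall I book the trip? Say yes or no.",
                     lambda u: {"yes": "done", "no": "ask_origin"}.get(u)),
    "done": State("Your trip is booked. Goodbye.", lambda u: None),
}


def run_dialog(user_turns: List[str]) -> List[str]:
    """Drive the state machine with scripted user turns; return the system prompts."""
    state = "ask_origin"
    prompts: List[str] = []
    turns = iter(user_turns)
    while True:
        prompts.append(STATES[state].prompt)
        if state == "done":
            break
        utterance = next(turns, None)
        if utterance is None:  # no more scripted input: stop mid-dialog
            break
        nxt = STATES[state].next_state(utterance.strip().lower())
        if nxt is not None:  # otherwise stay in the same state and repeat its prompt
            state = nxt
    return prompts
```

For instance, `run_dialog(["Gothenburg", "Stockholm", "yes"])` walks from the origin question to the goodbye message. The rigidity is deliberate: any input the current transition function does not expect simply triggers a re-prompt, which is the limitation that frame-based, information-state-based and plan-based managers try to relax.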

  7. Spoken dialogue system

  8. Why voice? • Wireless devices have small screens and limited input capabilities. • A telephone keypad can give users only a limited number of choices. • Speech technology is improving. • The exchange of information between a person and a computer is becoming more like a real conversation. • Users want hands-free or eyes-free use. • From a business viewpoint, voice applications open up a host of new revenue opportunities. • There are many more telephones than computers, and each telephone could potentially serve as a way to access the Internet.

  9. Traditional Interactive Voice Response (IVR)

  10. Speech versus Touch Tone

  11. Applications • Information-providing systems: • weather reports • stock quotes • timetables • Transaction-based systems: • calendar functions • shopping • financial transactions • travel reservations

  12. Architecture 1

  13. Architecture 2

  14. Components • Natural language understanding • Proper name identification • part-of-speech tagging • parser • dialog manager • output generator • natural language generator • gesture generator • layout engine • input recognizer/decoder • automatic speech recognizer • gesture recognizer • handwriting recognizer • output renderer • text-to-speech engine • talking head • robot or avatar • multi-modal fusion
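One common way to connect a subset of these components in a spoken dialog system is a cascade: recognizer, language understanding, dialog manager, generator, synthesizer. The sketch below assumes that ordering and uses hypothetical stub functions: `asr` and `tts` pass text through instead of handling audio, and the city lexicon and "weather" intent are invented. It only shows how the data changes shape from stage to stage, not how any real engine works.

```python
from typing import Callable, Dict, List

# Hypothetical stubs for a cascaded pipeline. Real components would process audio
# and use trained models; only the data flow between stages is meant to be realistic.


def asr(audio: str) -> str:
    """Stub recognizer: treats its input as an already-transcribed utterance."""
    return audio.strip().lower()


def nlu(text: str) -> Dict[str, str]:
    """Toy understanding: spot a city from a small, invented lexicon."""
    cities = {"boston", "denver", "stockholm"}
    found = [w for w in text.replace("?", " ").split() if w in cities]
    return {"intent": "weather", "city": found[0]} if found else {"intent": "unknown"}


def dialog_manager(frame: Dict[str, str]) -> Dict[str, str]:
    """Choose the next system act from the semantic frame."""
    if frame["intent"] == "weather":
        return {"act": "inform_weather", "city": frame["city"]}
    return {"act": "reprompt"}


def nlg(act: Dict[str, str]) -> str:
    """Template-based natural language generation."""
    if act["act"] == "inform_weather":
        return f"Here is the weather for {act['city'].title()}."
    return "Sorry, which city did you mean?"


def tts(text: str) -> str:
    """Stub synthesizer: a real engine would return audio, not a tagged string."""
    return f"<audio:{text}>"


PIPELINE: List[Callable] = [asr, nlu, dialog_manager, nlg, tts]


def run_turn(audio: str) -> str:
    """Pass one user turn through every component in order."""
    data = audio
    for component in PIPELINE:
        data = component(data)
    return data
```

Calling `run_turn("What is the weather in Boston?")` moves the utterance through a semantic frame and a dialog act before returning the tagged output string; anything without a known city falls through to a re-prompt. Keeping each stage a separate function mirrors the component list: any stub could be swapped for a real recognizer or synthesizer without touching the others.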

  15. Types of systems • by modality • text-based • spoken dialog system • graphical user interface • multi-modal • by device • telephone-based systems • PDA systems • in-car systems • robot systems • desktop/laptop systems • native • in-browser systems • in-virtual machine • in-virtual environment • robots • by style • command-based • menu-driven • natural language • speech graffiti • by initiative • system initiative • user initiative • mixed initiative • by application • information service • command-and-control • entertainment • education/tutorial • edutainment • reminder systems • companion systems • healthcare • eldercare • assistive/access systems

  16. Mobile voice apps • Voice on the Web • http://www.youtube.com/watch?v=OURZpqh-35A&eurl=&feature=player_embedded

  17. Relation to other fields • Phonetics • Phonology • Syntax • Semantics • Pragmatics • spoken language understanding • psycholinguistics • human communication • discourse analysis • human-computer interaction • computational linguistics • NL-parsing • NL-generation • language modeling • multi-modal fusion • multi-modal fission • psychology • cognitive science • affective dialog • user modeling • embodied communication

  18. A brief look at the course plan

  19. What this course is not about • Sophisticated dialog management • Multi-modal systems • Non-spoken dialog systems

  20. What this course could mean to you • Will prepare you for writing a thesis in the area of dialog systems (if you so choose) • Will prepare you for work in the industry • A link to the LinkedIn page

  21. Is this something for a linguist?

  22. Roles in the process • Dialog designer • VoiceXML programmer • Voice talent • Grammar writer • TTS specialist • Speech recognition specialist • Quality assurance specialist • Server specialist • Manager

  23. Who are the big players in the area? • Google • http://googleblog.blogspot.com/2010/12/can-we-talk-better-speech-technology.html • Microsoft • http://gigaom.com/2010/12/06/microsoft-claims-its-place-in-a-voice-enabled-world/ • Apple • http://www.dailyfinance.com/story/company-news/apples-siri-purchase-heats-up-the-race-toward-a-voice-activated/19458344/ • IBM • http://www.ibm.com/news/in/en/2010/08/20/a896686u56875f96.html • Nuance • http://gigaom.com/2011/01/19/nuance-releases-mobile-sdk-to-speechify-apps/ • Voxeo • AT&T

  24. The Emergence of Speech as a Mobile Platform • Market Trends • Speech-Enabled Mobile Apps Gaining Acceptance • Voice Control in a Mission-Critical Environment • Search Engine for Audio-Visual Content • Instantaneous Language Translation • IBM's Spoken Web • What's Driving Speech as a Mobile Platform? • Mobile Devices and Peripherals • Cloud Computing • Open Technologies • Mashups and the Programmable Web • Legislation • Closing the (Mobile) Digital Divide • An Overview of Emerging SaaP Applications • Current Speech-Equipped Devices Are Merely the Tip of the Iceberg • SaaP Enables New Application Interaction • Spoken Alerts • Mobile Reminders • Synthesized Speech • Email and Text Messages • Speech-to-Text for Voicemail • SaaP Enables Voice User Interfaces • Speech Recognition: The Foundation of Speech-Enabled Apps • Constrained vs. Natural Language Processing • Automated vs. Hybrid Speech Recognition • Applications for Speech Recognition • Speaker Authentication • Email and Text Message Composition • Launch and Control Mobile Apps • Special Case: Voice Activation

  25. Call flow and call flow diagrams

  26. Evaluating speech and dialog technology

  27. W3C Speech Standards Torbjörn Lager

  28. The big picture: HTML, web browser, web servers

  29. The place of speech technology • "… speech technology itself has a very long way to go. … the most important thing may turn out to be not the speech technology itself, but the way in which speech technology connects to all the other technologies." (Tim Berners-Lee)

  30. The big picture: HTML, HTML browser, VoiceXML, web servers, VoiceXML browser (ASR, TTS)

  31. The What and Why of Standards • Software standards include terminology, languages and protocols specified by committees of experts for widespread use in the software industry. Software standards have both advantages and disadvantages. • Advantages: • developers can create applications using the standard languages that are portable across a variety of platforms; • products from different vendors are able to interact with each other; • a community of experts evolves around the standard and is available to develop products and services based on the standard. • Disadvantages: • some developers feel that standards may inhibit creativity and stall the introduction of superior technology. • However, in the area of speech, vendors are enthusiastic about standards and frequently complain that standards are not developed fast enough. • Emerging speech-technology standards could give a boost to an industry hampered by proprietary software and hardware.

  32. World Wide Web Consortium http://www.w3.org/

  33. W3C Speech Standards • Speech Recognition Grammar Specification (SRGS) – • What the user can say • Semantic Interpretation for Speech Recognition (SISR) – • What the user means • Speech Synthesis Markup Language (SSML) – • What the user hears • Pronunciation Lexicon Specification (PLS) – • How words are pronounced
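To make "what the user can say" concrete, the sketch below enumerates every phrase accepted by a tiny grammar written in the XML form of SRGS. It handles only a small subset, by assumption: `<rule>`, `<one-of>`, `<item>` and local `<ruleref uri="#...">`. Repeats, weights, special rules, SISR `<tag>` content and recursive rules are not supported. The grammar itself (a hypothetical city list) is invented for illustration; a real recognizer compiles such grammars rather than expanding them into a list.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Set, Tuple

SRGS = "{http://www.w3.org/2001/06/grammar}"

# Hypothetical grammar: "I want to go to" followed by one of three cities.
GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="request">
  <rule id="request" scope="public">
    I want to go to <ruleref uri="#city"/>
  </rule>
  <rule id="city">
    <one-of>
      <item>Boston</item>
      <item>New York</item>
      <item>Stockholm</item>
    </one-of>
  </rule>
</grammar>"""

Phrase = Tuple[str, ...]


def tokens(text: Optional[str]) -> Phrase:
    """Split character data into word tokens (None or whitespace gives no tokens)."""
    return tuple(text.split()) if text else ()


def expand_sequence(elem: ET.Element, rules: Dict[str, ET.Element]) -> List[Phrase]:
    """Expand the mixed content of a <rule> or <item> into all token sequences."""
    results = [tokens(elem.text)]
    for child in elem:
        options = expand_element(child, rules)
        tail = tokens(child.tail)
        results = [r + o + tail for r in results for o in options]
    return results


def expand_element(elem: ET.Element, rules: Dict[str, ET.Element]) -> List[Phrase]:
    tag = elem.tag.replace(SRGS, "")
    if tag == "one-of":  # alternatives: union of the expansions of each <item>
        return [p for item in elem.findall(SRGS + "item") for p in expand_sequence(item, rules)]
    if tag == "item":
        return expand_sequence(elem, rules)
    if tag == "ruleref":  # only local references of the form "#ruleId"
        return expand_sequence(rules[elem.get("uri").lstrip("#")], rules)
    raise ValueError(f"SRGS element not supported in this sketch: {tag}")


def accepted_phrases(grammar_xml: str) -> Set[str]:
    """Return every sentence the (non-recursive) grammar's root rule accepts."""
    root = ET.fromstring(grammar_xml)
    rules = {rule.get("id"): rule for rule in root.findall(SRGS + "rule")}
    return {" ".join(p) for p in expand_sequence(rules[root.get("root")], rules)}
```

`accepted_phrases(GRAMMAR)` yields the three "I want to go to …" sentences. This is the sense in which an SRGS grammar defines what the user can say: an utterance outside those phrases cannot be matched by this grammar.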

  34. Intro to XML • Standard for storage and transportation of data • Maintained by W3C (w3.org/TR/REC-xml) • Elements and tags • Well-formedness • Validity • DTD • Editor (TextMate + XMLmate)
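The difference between well-formedness and validity can be checked directly. This sketch uses only Python's standard `xml.etree.ElementTree` module, which rejects documents that are not well-formed (mismatched or badly nested tags, more than one root element, unescaped `&`) but does not validate against a DTD; checking validity needs a validating parser. The `<prompt>` snippets are hypothetical test inputs.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """True if the string parses as XML. Says nothing about validity against a DTD."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Hypothetical inputs illustrating common well-formedness errors.
examples = [
    "<prompt>Welcome</prompt>",
    "<prompt>Welcome</Prompt>",       # tag names are case-sensitive
    "<prompt><b>Hi</prompt></b>",     # elements must nest properly
    "<prompt/><prompt/>",             # only one root element is allowed
    "<prompt>Fish & chips</prompt>",  # a bare & must be escaped as &amp;
]

for doc in examples:
    print(is_well_formed(doc), doc)
```

Only the first example is well-formed. Whether it is also valid depends on a DTD or schema that declares a `prompt` element, which this check never consults.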

  35. Speech synthesis

  36. Speech synthesis [diagram: text, lang, voice persona → speech]
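The inputs in the diagram map naturally onto SSML: the language goes in the `xml:lang` attribute of `<speak>`, and a voice can be requested with a `<voice>` element. The sketch below builds such a document with Python's standard library. The `build_ssml` helper is hypothetical, and only the `gender` attribute is used to select a voice; which voices are actually available, and how a request is matched to one, depends on the synthesizer.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang

# Emit SSML as the default namespace instead of an auto-generated "ns0:" prefix.
ET.register_namespace("", SSML_NS)


def build_ssml(text: str, lang: str, gender: str) -> str:
    """Wrap text in <speak xml:lang=...><voice gender=...> (hypothetical helper)."""
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.1", XML_LANG: lang})
    voice = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"gender": gender})
    voice.text = text  # ElementTree escapes &, < and > on output
    return ET.tostring(speak, encoding="unicode")


print(build_ssml("Welcome to the weather service.", "en-US", "female"))
```

Building the markup through an XML library rather than string concatenation means that characters such as `&` in the input text are escaped automatically, so the result stays well-formed.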

  37. A peek inside the black box • http://www.explainthatstuff.com/how-speech-synthesis-works.html
