Exploring Voice Technologies: TTS, ASR, and Dialog Systems

Introduction and overview

Outline • A short history of the field • Speech synthesis (TTS) • Automatic speech recognition (ASR) • Dialog system architectures • Voice on the Web (perhaps show the Siri video) • Voice on the Web and W3C Standards • Relation to linguistic theory • A brief look at the course plan • What this course is not about • What this course could mean to you • Introduction to lab assignments and platforms • Designing and developing spoken dialog systems • Present project • Give home assignment 1: Call flow design and evaluation • Present lab assignment 1

A short history of the field • 1966, Joseph Weizenbaum, Eliza • Sundial, ATIS, Verbmobil • AIML NLP system • VXML ("Voice XML"), a dialog markup language (primarily for telephony) developed initially by AT&T, then administered by an industry consortium, and finally a W3C specification. • Voxeo, Tropo

Speech synthesis • text → speech

Speech recognition • speech → text (or some semantic representation)

Dialog management • Finite-state-based dialog management • Frame-based (form-based) dialog management • Information-state-based dialog management • Plan-based dialog management
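As an illustration of the frame-based (form-based) approach listed above, here is a minimal sketch in Python. It is not taken from the course material: the slot names, the prompts, and the helper functions `next_prompt` and `update` are all hypothetical, and the parsed user input is assumed to arrive as a dictionary from some upstream understanding component. The idea is that the system keeps a form with slots and asks only for what is still missing, so the user may fill several slots in one turn.

```python
from dataclasses import dataclass, field

# Hypothetical prompts for a travel-booking form (illustrative only).
PROMPTS = {
    "origin": "Where are you leaving from?",
    "destination": "Where do you want to go?",
    "date": "What day do you want to travel?",
}


@dataclass
class Frame:
    """A form with named slots; None marks a slot that is still empty."""
    slots: dict = field(default_factory=lambda: {name: None for name in PROMPTS})


def update(frame, parsed):
    """Copy any recognized slot values from the parsed user turn into the frame."""
    for name, value in parsed.items():
        if name in frame.slots:
            frame.slots[name] = value


def next_prompt(frame):
    """Return the prompt for the first empty slot, or None when the form is complete."""
    for name, value in frame.slots.items():
        if value is None:
            return PROMPTS[name]
    return None


frame = Frame()
print(next_prompt(frame))  # asks for the origin first
# The user answers more than was asked: two slots are filled in one turn.
update(frame, {"origin": "Gothenburg", "date": "Monday"})
print(next_prompt(frame))  # only the destination is still missing
```

A finite-state dialog manager would instead fix the order of questions in advance; the frame lets the order follow what the user happens to say.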

Spoken dialogue system

Why voice • Wireless devices have small screens and limited input capabilities. • A telephone keypad can give users only a limited number of choices. • Speech technology is improving. • The exchange of information between a person and a computer is becoming more like a real conversation. • Users want hands-free or eyes-free use. • From a business viewpoint, voice applications open up a host of new revenue opportunities. • There exist many more telephones than computers with the potential to access the Internet.

Traditional Interactive Voice Response (IVR)

Speech versus Touch Tone

Applications • Information providing systems: • weather reports • stock quotes • timetables • Transaction-based systems: • calendar functions • shopping • financial transactions • travel reservations

Architecture 1

Architecture 2

Components • Natural language understanding • Proper name identification • part-of-speech tagging • parser • dialog manager • output generator • natural language generator • gesture generator • layout engine • input recognizer/decoder • automatic speech recognizer • gesture recognizer • handwriting recognizer • output renderer • text-to-speech engine • talking head • robot or avatar • multi-modal fusion

Types of systems • by modality • text-based • spoken dialog system • graphical user interface • multi-modal • by device • telephone-based systems • PDA systems • in-car systems • robot systems • desktop/laptop systems • native • in-browser systems • in-virtual machine • in-virtual environment • robots • by style • command-based • menu-driven • natural language • speech graffiti • by initiative • system initiative • user initiative • mixed initiative • by application • information service • command-and-control • entertainment • education/tutorial • edutainment • reminder systems • companion systems • healthcare • eldercare • assistive/access systems

Mobile voice apps • Voice on the Web • http://www.youtube.com/watch?v=OURZpqh-35A&eurl=&feature=player_embedded

Relation to other fields • Phonetics • Phonology • Syntax • Semantics • Pragmatics • spoken language understanding • psycholinguistics • human communication • discourse analysis • human-computer interaction • computational linguistics • NL-parsing • NL-generation • language modeling • multi-modal fusion • multi-modal fission • psychology • cognitive science • affective dialog • user modeling • embodied communication

A brief look at the course plan

What this course is not about • Sophisticated dialog management • Multi-modal systems • Non-spoken dialog systems

What this course could mean to you • Will prepare you for writing a thesis in the area of dialog systems (if you so choose) • Will prepare you for work in the industry • A link to the LinkedIn page

Is this something for a linguist?

Roles in the process • Dialog designer • VoiceXML programmer • Voice talent • Grammar writer • TTS specialist • Speech recognition specialist • Quality assurance specialist • Server specialist • Manager

Who are the big players in the area? • Google • http://googleblog.blogspot.com/2010/12/can-we-talk-better-speech-technology.html • Microsoft • http://gigaom.com/2010/12/06/microsoft-claims-its-place-in-a-voice-enabled-world/ • Apple • http://www.dailyfinance.com/story/company-news/apples-siri-purchase-heats-up-the-race-toward-a-voice-activated/19458344/ • IBM • http://www.ibm.com/news/in/en/2010/08/20/a896686u56875f96.html • Nuance • http://gigaom.com/2011/01/19/nuance-releases-mobile-sdk-to-speechify-apps/ • Voxeo • AT&T

The Emergence of Speech as a Mobile Platform • Market Trends • Speech-Enabled Mobile Apps Gaining Acceptance • Voice Control in a Mission-Critical Environment • Search Engine for Audio-Visual Content • Instantaneous Language Translation • IBM's Spoken Web • What's Driving Speech as a Mobile Platform? • Mobile Devices and Peripherals • Cloud Computing • Open Technologies • Mashups and the Programmable Web • Legislation • Closing the (Mobile) Digital Divide • An Overview of Emerging SaaP Applications • Current Speech-Equipped Devices are Merely the Tip of the Iceberg • SaaP Enables New Application Interaction • Spoken Alerts • Mobile Reminders • Synthesized Speech • Email and Text Messages • Speech-to-Text for Voicemail • SaaP Enables Voice User Interfaces • Speech Recognition: The Foundation of Speech-Enabled Apps • Constrained vs. Natural Language Processing • Automated vs. Hybrid Speech Recognition • Applications for Speech Recognition • Speaker Authentication • Email and Text Message Composition • Launch and Control Mobile Apps • Special Case: Voice Activation

Call flow and call flow diagrams

Evaluating speech and dialog technology

W3C Speech Standards Torbjörn Lager

The big picture • HTML • Web browsers • Web servers

The place of speech technology • "… speech technology itself has a very long way to go. … the most important thing may turn out to be not the speech technology itself, but the way in which speech technology connects to all the other technologies." (Tim Berners-Lee)

The big picture • HTML • HTML browser • VoiceXML • Web servers • VoiceXML browser (ASR, TTS)

The What and Why of Standards • Software standards include terminology, languages and protocols specified by committees of experts for widespread use in the software industry. Software standards have both advantages and disadvantages. • Advantages: • developers can create applications using the standard languages that are portable across a variety of platforms; • products from different vendors are able to interact with each other; • a community of experts evolves around the standard and is available to develop products and services based on the standard. • Disadvantages: • some developers feel that standards may inhibit creativity and stall the introduction of superior technology. • However, in the area of speech, vendors are enthusiastic about standards and frequently complain that standards are not developed fast enough. • Emerging speech-technology standards could give a boost to an industry hampered by proprietary software and hardware.

World Wide Web Consortium http://www.w3.org/

W3C Speech Standards • Speech Recognition Grammar Specification (SRGS): what the user can say • Semantic Interpretation for Speech Recognition (SISR): what the user means • Speech Synthesis Markup Language (SSML): what the user hears • Pronunciation Lexicon Specification (PLS): how words are pronounced
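To make "what the user can say" and "what the user hears" concrete, the sketch below embeds a tiny SRGS grammar and a tiny SSML prompt as strings and inspects them with Python's standard `xml.etree.ElementTree`. The grammar content, the prompt text, and the helper names `allowed_phrases` and `spoken_text` are illustrative assumptions, not part of the course material. A real speech platform would hand these documents to its recognizer and synthesizer; here they are only parsed.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"    # SRGS 1.0 namespace
SSML_NS = "http://www.w3.org/2001/10/synthesis"  # SSML 1.0 namespace

# What the user can say: one rule offering a choice of city names.
GRAMMAR = f"""<grammar xmlns="{SRGS_NS}" version="1.0"
         xml:lang="en-US" mode="voice" root="city">
  <rule id="city">
    <one-of>
      <item>Gothenburg</item>
      <item>Stockholm</item>
      <item>Malmo</item>
    </one-of>
  </rule>
</grammar>"""

# What the user hears: text with emphasis and a pause.
PROMPT = f"""<speak xmlns="{SSML_NS}" version="1.0" xml:lang="en-US">
  Your train to <emphasis>Gothenburg</emphasis> leaves soon.
  <break time="500ms"/>
  Have a nice trip.
</speak>"""


def allowed_phrases(grammar_xml):
    """List the text of every <item> in an SRGS grammar (flat grammars only)."""
    root = ET.fromstring(grammar_xml)
    return [item.text.strip() for item in root.iter(f"{{{SRGS_NS}}}item")]


def spoken_text(ssml_xml):
    """Return the plain text of an SSML prompt with the markup stripped."""
    root = ET.fromstring(ssml_xml)
    return " ".join("".join(root.itertext()).split())


print(allowed_phrases(GRAMMAR))
print(spoken_text(PROMPT))
```

SISR and PLS are left out of the sketch: SISR would add `<tag>` elements to the grammar items to attach a meaning to each phrase, and PLS would supply pronunciations in a separate lexicon document.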

Intro to XML • Standard for storage and transportation of data • Maintained by W3C (w3.org/TR/REC-xml) • Elements and tags • Well-formedness • Validity • DTD • Editor (Textmate + XMLmate)
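The difference between well-formedness and validity can be tried out with a few lines of Python. This is a minimal sketch using the standard library parser; the helper name `is_well_formed` and the sample documents are made up for illustration. Note that it checks well-formedness only: `xml.etree.ElementTree` does not validate against a DTD, so validity needs a validating parser or editor.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Properly nested elements with one root: well-formed.
print(is_well_formed("<prompt><emphasis>Hello</emphasis></prompt>"))
# Overlapping tags: not well-formed.
print(is_well_formed("<prompt><emphasis>Hello</prompt></emphasis>"))
# Two root elements: not well-formed.
print(is_well_formed("<prompt/><prompt/>"))
```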

Speech synthesis

Speech synthesis • Inputs: text, lang, voice, persona • Output: speech

A peek inside the black box • http://www.explainthatstuff.com/how-speech-synthesis-works.html
