Voice XML and Speech Applications
Outline
  • VoiceXML – Tellme.com, BeVocal.com
  • Speech.NET – Microsoft
What is Voice XML?
  • Language for specifying voice dialogs
  • Output: prerecorded audio and text-to-speech (TTS)
  • Input: touch-tone keys and Automatic Speech Recognition (ASR)
  • An XML-based markup language
  • Designed to interact with web-based applications
VoiceXML’s History
  • 1995
      • Phone Web project by AT&T Research
  • 1999
      • Lucent and AT&T have incompatible dialects of Phone Markup Language
      • So the VoiceXML Forum is created by AT&T, Lucent, Motorola, and IBM
      • Team develops VoiceXML 0.9, a first pass at standardization
  • 2000
      • VoiceXML 1.0 is released and submitted to the World Wide Web Consortium (W3C)
  • 2001
      • VoiceXML 2.0 by W3C’s Voice Browser Working Group
XML in 30 seconds
  • Tags and body
        <cmu>
          <welcome>Welcome to CMU!</welcome>
          <ecom>
            <welcome>Welcome to the E-commerce!</welcome>
          </ecom>
        </cmu>
  • Zero or more attributes
        <welcome accent="texan">Welcome</welcome>
        <welcome accent="pittsburgh">Welcome</welcome>
  • Tag with no body
        <breath/>
  • XML is picky about syntax
      • Tag names are case-sensitive (VoiceXML’s are all lowercase), the " " around attribute values are not optional, and a document can be validated against a Document Type Definition (DTD).
What is VoiceXML
  • VoiceXML ≈ XML ≈ XHTML ≈ HTML
  • What is Tellme.com (800.555.TELL)
      • A VoiceXML gateway that is easy to use!
Dissecting a simple VXML program
    <vxml version="2.0">
      <form>
        <block>Hello, world!</block>
      </form>
    </vxml>
What you need to get started
  • Go to Tellme.com
  • Click on Studio for Developers in the lower right
  • Join and log in
  • You’ll see the VoiceXML scratchpad
  • Type this in and hit update:
        <vxml version="2.0">
          <form>
            <block>Hello, world!</block>
          </form>
        </vxml>
  • Congratulations, you’ve written your first VoiceXML application. Call 1-800-555-VXML to try it out.
Playing a Sound
  • The audio element
  • Must be contained within a block:
        <block>
          <audio src="ui/welcome.wav">
            Welcome to the HCII
          </audio>
        </block>
  • src="ui/welcome.wav" is a relative file reference
Moving around
  • The goto element
      • Stop what you’re doing and go execute this other VoiceXML document.
      • Like clicking a link on a webpage.
        <block>
          <audio>Thanks for calling!</audio>
          <goto next="document2.vxml" />
        </block>
Getting user input
  • We can talk, and we can go to a different page. We just need to know what the user wants!
      • Find out using fields
  • Fields are different than blocks
      • Blocks just speak; fields listen
  • “But the computer doesn’t hear too good”
      • So tell it what to expect
More on fields
  • Prompt: asks the user a question
  • Grammar: defines the possible answers
      • Can use built-in or custom
  • Name: name of the variable that stores what the user said
      • e.g. goingtoBeach
  • Instructions: what the program should do based on the input
      • If going, "that’s great!" otherwise "bummer, maybe next year"
Summary of VXML elements (1/2)
  • Input: <form>, <field>, <prompt>
  • Output: <audio>
  • Events: <filled>, <noinput>, <nomatch>, <help>, <catch>
  • Transition: <goto>, <submit>
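The event elements listed above can be attached to a field roughly as follows. This is a minimal sketch in the same style as the other examples in these slides; the wording of the messages is made up for illustration, and how a given gateway (Tellme or otherwise) times out or triggers help is not covered here.

```xml
<vxml version="2.0">
  <form>
    <field name="goingtoBeach" type="boolean">
      <prompt>Are you going to Daytona Beach this year?</prompt>

      <!-- caller stayed silent until the timeout -->
      <noinput>
        Sorry, I didn't hear you.
        <reprompt/>
      </noinput>

      <!-- caller said something the grammar does not cover -->
      <nomatch>
        Please answer yes or no.
        <reprompt/>
      </nomatch>

      <!-- caller asked for help -->
      <help>
        Say yes if you are going, or no if you are not.
        <reprompt/>
      </help>

      <!-- input was recognized and stored in goingtoBeach -->
      <filled>
        <if cond="goingtoBeach">
          That's great!
        <else/>
          Bummer, maybe next year.
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

<noinput>, <nomatch>, and <help> are shorthand forms of <catch> for those specific events; <reprompt/> asks the interpreter to play the field's prompt again.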
Summary of VXML elements (2/2)
  • Grammars:
        <grammar>
          <![CDATA[
            [
              [visa]             {<element "visa">}
              (master card)      {<element "mastercard">}
              (american express) {<element "amex">}
            ]
          ]]>
        </grammar>
  • Selection: <menu>, <choice>, <option>
  • ECMAScript (i.e. JavaScript): <script>, <var>, <foreach>, <if>
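The selection elements can be sketched with a small menu. The target file names (visa.vxml and so on) are hypothetical placeholders, not files from this talk.

```xml
<vxml version="2.0">
  <menu>
    <!-- <enumerate/> reads the choices back to the caller -->
    <prompt>Which card would you like to use? <enumerate/></prompt>

    <!-- each choice's text doubles as the phrase the caller can say -->
    <choice next="visa.vxml">visa</choice>
    <choice next="mastercard.vxml">master card</choice>
    <choice next="amex.vxml">american express</choice>

    <noinput>Please say one of: <enumerate/></noinput>
    <nomatch>I did not understand. Please say one of: <enumerate/></nomatch>
  </menu>
</vxml>
```

A menu is a convenient shortcut when every answer simply leads to another document; a field with a grammar is the better fit when the answer needs to be stored and tested.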
Application overview
  • Components:
      • VXML file (to prompt for information)
      • ASP file (to retrieve balance)
  • Tellme Studio:
      • Get developer account (free)
      • Enable your Tellme extension (“free”)
  • Web Server
      • To host VXML and HTML files
      • .NET enabled for this demo
      • Not provided by Tellme
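One way the VXML side of this setup might look is sketched below. The URL http://www.example.com/balance.asp and the field name account are assumptions for illustration only; the actual demo files are not shown in the slides.

```xml
<vxml version="2.0">
  <form id="balance">
    <!-- built-in digits grammar: caller can speak or key in the number -->
    <field name="account" type="digits">
      <prompt>Please say or key in your account number.</prompt>
    </field>

    <block>
      <!-- send the collected value to the server-side page (hypothetical URL) -->
      <submit next="http://www.example.com/balance.asp"
              namelist="account" method="get"/>
    </block>
  </form>
</vxml>
```

The server-side page would then be expected to return another VoiceXML document that speaks the balance, just as a web page would return HTML.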
A simple example
    <vxml version="2.0">
      <form>
        <!-- Grammar and prompt: type="boolean" selects the built-in yes/no grammar -->
        <field name="goingtoBeach" type="boolean">
          <prompt>Are you going to Daytona Beach this year?</prompt>
          <!-- Executes if the answer is successfully recognized -->
          <filled>
            Ohh...
            <!-- if/else used with goto to control flow -->
            <if cond="goingtoBeach">
              That’s great!
              <goto next="goingDocument.vxml" />
            <else />
              Bummer, maybe next year.
              <goto next="notgoingDocument.vxml" />
            </if>
          </filled>
        </field>
      </form>
    </vxml>
Custom Grammars
  • What if you want the user to be able to say something that’s not built-in?
      • e.g. Which hotel will you be staying at? The Hilton, the Hyatt, the Doubletree, or the Bates Motel?
Custom Grammars
  • Which hotel will you be staying at? The Hilton, the Hyatt, the Doubletree, or the Bates Motel?
        <grammar type="application/x-gsl" mode="voice">
          <![CDATA[
            [
              [ doubletree ]                 {<hotel "doubletree">}
              [ hilton (convention center) ] {<hotel "hilton">}
              [ (bates motel) ]              {<hotel "bates">}
              [ (?the hyatt ?hotel) ]        {<hotel "hyatt">}
            ]
          ]]>
        </grammar>
  • What they can say: the part of each line in [ ]
      • Different options are separated by spaces
      • Options that are more than one word long are in ( )’s
      • Put a ? before optional words
  • Result: the part of each line in { }
      • "hotel" is the field name; the field variable is set to the quoted value that matches what the user says
      • We can use this result later in our if’s and goto’s
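To show how the result can be used later, here is a sketch of the hotel grammar placed inside a field, with the filled handler branching on the value. The target documents hilton.vxml and hyatt.vxml are hypothetical, and the GSL grammar format is specific to gateways that support it.

```xml
<vxml version="2.0">
  <form>
    <field name="hotel">
      <prompt>
        Which hotel will you be staying at? The Hilton, the Hyatt,
        the Doubletree, or the Bates Motel?
      </prompt>

      <grammar type="application/x-gsl" mode="voice">
        <![CDATA[
          [
            [ doubletree ]                 {<hotel "doubletree">}
            [ hilton (convention center) ] {<hotel "hilton">}
            [ (bates motel) ]              {<hotel "bates">}
            [ (?the hyatt ?hotel) ]        {<hotel "hyatt">}
          ]
        ]]>
      </grammar>

      <filled>
        <!-- hotel now holds one of the quoted values from the grammar -->
        <if cond="hotel == 'hilton'">
          <goto next="hilton.vxml"/>
        <elseif cond="hotel == 'hyatt'"/>
          <goto next="hyatt.vxml"/>
        <else/>
          You said <value expr="hotel"/>.
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Because the grammar maps several phrasings to one value ("the hyatt", "hyatt hotel", and so on all yield "hyatt"), the conditions only need to test the short result strings.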
Grammar Tips
  • Grammar languages
      • W3C Speech Recognition Grammar Specification, XML form (GRXML)
      • Nuance Grammar Specification Language (GSL)
  • Tools for testing your grammars
      • Syntax checker, Parse, Generate
VoiceXML
  • VXML is ideal for non-experts in speech recognition
  • The basics are easy to understand, so you can build simple apps quickly
  • Could not do this with Speech.NET
Speech.NET
  • Voice
      • No (current) Voice Portal
  • Multimodal
      • Voice, mouse, stylus, etc.
      • Compaq TabletPC
Speech.NET
  • Millions use Visual Studio
  • A few new controls to “speech enable” apps
  • Millions of potential speech developers
SALT
  • Speech.NET (ASP) compiles down to SALT
  • Speech Application Language Tags
      • Prompt, listen, record, dtmf
  • http://www.saltforum.org/
Speech.NET
  • BUT application developers never see SALT
  • Microsoft Speech.NET
      • ASP.NET web application
      • Visual (GUI) controls
      • Speech controls
      • WAV prompt database
      • Grammars
Summary
  • General impressions of Speech.NET
      • Where’s the logic? In an external JS file, an HTML <script> block, or in properties?
      • Forced to scroll/expand the property window
      • Auto-complete
      • Prompt Editor is very nice
Resources • Microsoft.public.netspeechsdk
Benefits of VXML?
  • Brings the web development paradigm to the IVR market
      • Existing HTTP gateways to existing enterprise services/data built with Internet technologies can be seamlessly extended to the phone
  • Anytime, anywhere access to the web via voice interface
      • Keypads and small displays are made moot
      • Great for the car (personal experience)
  • Standardized technology, high interoperability
      • Thin layer that sits on the entire web technology stack
      • Interoperable with infrastructure, software, other standards for web deployment
          • Security: VPNs, SSL, cookies
          • Application servers: Java Servlets, Perl, IBM WebSphere, Microsoft Active Server Pages
          • Data abstraction: XML, XSL
          • Database connectivity: ODBC, SQL
          • Streaming media: WAV, Real, MP3
  • Open development
      • 15,000 developers at Tellme alone
Business Applications with Tellme
  • Airlines: flight information, flight delay notification, baggage tracking, employee reservations and more.
  • Banking: telephone banking, bill payment, mortgage tools, ATM and branch locators and more.
  • Brokerages: telephone trading applications, retirement account management, stock alerts, financial content and more.
  • Government: travel hotlines, benefits management for government services, alerts and notifications for public announcements.
      • 511 travel directory services to Utah Government, why?
  • Retail: catalog shopping applications, store locators and more.
Tellme and Nuance
  • Nuance
      • Speech recognition software/hardware
      • Nuance 8.0: speech recognition and natural language understanding server
      • Nuance Vocalizer: synthesizes text to speech
      • Nuance Verifier: identifies and authenticates callers based on their voiceprint (biometrics)
  • Both a partner and a competitor
For more information
  • Tellme Studio Developer
      • http://studio.tellme.com/
  • W3C VXML 2.0 Specification
      • http://www.w3.org/TR/voicexml20
  • Example of a VXML application with Perl
      • http://www.webreference.com/perl/tutorial/20/
  • Creating Voice applications with VXML and ASP .NET
      • http://www.devhood.com/Tutorials/tutorial_details.aspx?tutorial_id=147
VoiceXML and XML
  • Based on XML tag/attribute format
  • Elements must be properly nested!
        <element_name attribute_name="attribute_value">
          ..contained elements..
        </element_name>
  • All documents start with
        <?xml version="1.0"?>
  • All other instructions are enclosed within the <vxml> tag, called the “root element”
        <vxml version="1.0">
          ..VoiceXML Instructions..
        </vxml>
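Putting the XML declaration and the root element together with the earlier hello-world form gives a complete document along these lines. It uses version 2.0, like the earlier examples. The xmlns attribute is not shown anywhere in the slides; it is added here because the W3C VoiceXML 2.0 specification defines that namespace for the root element, and some gateways may accept documents without it.

```xml
<?xml version="1.0"?>
<!-- root element: everything else in the document nests inside it -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>Hello, world!</block>
  </form>
</vxml>
```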