Voice XML

Voice XML
Team 1: Matt Ganis, Jonathan Hill, Henry Wong, Anne I. Mannette-Wright

Agenda
• History of Voice Applications and Voice XML
• Related Voice Type Languages
• Advantages of Voice XML
• Architecture of VoiceXML
• Paper 1
• Paper 2
• Paper 3
• Demonstration
• Voice XML 2.0
• Differences between Voice XML 1.0 and 2.0
• The Future – Voice XML 2.1
Team 1 VoiceXML

History of Voice Applications
• Voice technologies emerged in the 1990s:
  • Automatic Speech Recognition (ASR)
    • Small vocabulary and speech recognition problems were solved
  • Text-to-Speech Systems
    • Can generate speech responses on the fly
  • Interactive Voice Response (IVR) applications

History of Voice Applications
IVRs became programmable, but programmable IVRs:
• Were difficult to program: call scripting was often vendor specific, so each vendor had to "reinvent the wheel"
• Did not allow an application to be moved easily from one IVR to another, because IVRs were proprietary

History of Voice XML
• 1995: AT&T started work on Phone Markup Language (PML)
• Oct. 1998: Motorola developed VoxML (Voice Markup Language)
• Feb. 1999: IBM developed SpeechML technology
• Mar. 1999: VoiceXML Forum was formed by IBM, AT&T, Lucent, and Motorola
  • Mission was to design a standard dialog design language that developers could use to build conversational applications
• Mar. 2000: VoiceXML Forum released VoiceXML 1.0 to the general public
• May 2000: accepted by W3C

W3C Speech Interface Framework
From: McGlashan, Dr. Scott, “VoiceXML 2.0 from the Inside”, retrieved from www.voicexmlreview.org/Dec2001/features/inside.html

Related Voice Type Languages
• Related to VoiceXML
  • Grammar XML (grXML)
    • Provides speech grammars used by speech recognition engines
  • Speech Synthesis Markup Language (SSML)
    • The SSML specification is based upon the JSML (J Speech Markup Language) and JSGF (J Speech Grammar Format) specifications, which are owned by Sun
    • Became a W3C standard in September 2004 and is currently at Version 1.0
    • Standardized way of specifying how text is rendered as speech; includes tags for pronunciation, tone, inflection, etc.
    • Often embedded in VoiceXML scripts to drive interactive telephony systems
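As a rough illustration of the SSML tags mentioned above, the sketch below shows a small standalone SSML 1.0 document. The sentence content is invented for illustration, and the `interpret-as` and `format` values on `<say-as>` are assumptions: SSML 1.0 does not standardize them, so platforms may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- sub: speak the alias instead of the written text -->
  Welcome to the <sub alias="World Wide Web Consortium">W3C</sub> demo.
  <!-- break: insert a pause of a given length -->
  <break time="500ms"/>
  <!-- say-as: hint at how to read the text; attribute values are platform dependent -->
  Your flight leaves on <say-as interpret-as="date" format="md">5/6</say-as>.
  <!-- prosody and emphasis: control rate and stress -->
  <prosody rate="slow">Please arrive <emphasis level="strong">two hours</emphasis> early.</prosody>
</speak>
```

The same elements can appear inside a VoiceXML 2.0 `<prompt>`, which is how SSML is usually embedded in a dialog.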

Related Voice Type Languages
• Related to VoiceXML (Continued)
  • Call Control XML (CCXML)
    • W3C standard markup language for controlling telephony and telephony equipment; currently at Version 1.0
    • Performs tasks such as setting up conference calls, transferring incoming calls, etc.
    • Works hand-in-hand with VoiceXML
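To make the "hand-in-hand" relationship concrete, here is a minimal CCXML sketch that answers an incoming call and hands the caller to a VoiceXML dialog. The file name `hello.vxml` is hypothetical, and a real application would typically also handle error and disconnect events.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0" xmlns="http://www.w3.org/2002/09/ccxml">
  <eventprocessor>
    <!-- An incoming call is ringing: answer it -->
    <transition event="connection.alerting">
      <accept/>
    </transition>
    <!-- The call is connected: start a VoiceXML dialog.
         src is an ECMAScript expression, hence the inner quotes. -->
    <transition event="connection.connected">
      <dialogstart src="'hello.vxml'"/>
    </transition>
    <!-- The dialog has finished: end the CCXML session -->
    <transition event="dialog.exit">
      <exit/>
    </transition>
  </eventprocessor>
</ccxml>
```

CCXML handles the call itself (answer, transfer, conference), while the VoiceXML document it starts handles the conversation with the caller.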

Architecture of VoiceXML
From: http://www.w3.org/TR/voicexml/ Voice eXtensible Markup Language (VoiceXML™) version 1.0

Advantages of Voice XML
• VoiceXML is a markup language that:
  • Minimizes client/server interactions by specifying multiple interactions per document
  • Shields application authors from low-level and platform-specific details
  • Separates user interaction code (in VoiceXML) from service logic (e.g. CGI scripts)
  • Promotes service portability across implementation platforms; VoiceXML is a common language for content providers, tool providers, and platform providers
  • Is easy to use for simple interactions, yet provides language features to support complex dialogs

Paper 1
• Authored by Bruce Lucas: “VoiceXML for Web-based Distributed Conversational Applications”
• Presents an introduction to VoiceXML
• Comparison to HTML
• Support for Natural Dialogue

Paper 1
• VoiceXML is an XML application, which results in the following benefits:
  • Allows the reuse and easy retooling of existing tools for creating, transforming, and parsing XML documents
  • Allows VoiceXML to make use of other complementary XML-based standards. Example: Java Speech Markup Language for speech synthesis
• A form is VoiceXML’s basic dialogue unit
  • Contains a set of inputs (fields)
  • Specifies what to do with a set of fields after data is collected
• A field includes a prompt and a specification of what the user is allowed to say

Paper 1 - VoiceXML Code Example

    <?xml version="1.0"?>
    <vxml version="1.0">
      <!-- Top-level menu: <enumerate/> reads out the choices -->
      <menu>
        <prompt>Say one of: <enumerate/></prompt>
        <choice next="http://www.sports.example/sports.vxml">
          Sports scores
        </choice>
        <choice next="http://www.weather.example/weather.vxml">
          Weather information
        </choice>
        <choice next="#login">
          Log in
        </choice>
      </menu>
      <!-- Login form: collects two fields, then submits them -->
      <form id="login">
        <field name="phone_number" type="phone">
          <prompt>Please say your complete phone number</prompt>
        </field>
        <field name="pin_code" type="digits">
          <prompt>Please say your PIN code</prompt>
        </field>
        <block>
          <submit next="/servlet/login"/>
        </block>
      </form>
    </vxml>

Paper 1
• VoiceXML includes support for common field types (numbers, digits, phone, date, and time) and for user-specified fields using grammars

    <form>
      <!-- Inline grammar: the allowed answers are listed in place -->
      <field name="drink">
        <prompt>What would you like to drink?</prompt>
        <grammar>
          coffee | tea | orange juice | milk | nothing
        </grammar>
      </field>
      <!-- External grammar: the allowed answers live in a separate file -->
      <field name="sandwich">
        <prompt>What sandwich would you like?</prompt>
        <grammar src="sandwiches.gram"/>
      </field>
      <block>
        <submit next="/servlet/order"/>
      </block>
    </form>

Paper 1 – The Distributed Model
• VoiceXML provides support for advanced features such as:
  • Local validation and processing
  • Audio playback and recording
  • Support for context-specific and tapered help and reusable subdialogues
From: Lucas, Bruce, “VoiceXML for Web-based Distributed Conversational Applications”, Communications of the ACM, Vol. 43, No. 9, September 2000.

Paper 1 – VoiceXML compared with HTML
• An HTML document is a single unit specified by a URI and presented to the user all at once
  • A VoiceXML document contains a number of dialogue units (menus or forms) presented sequentially
• An HTML document has no markup to identify distinct units
  • A VoiceXML document is structured to reflect the sequential nature of the voice medium
• An HTML document is like one single dialogue
  • A VoiceXML document requires dialogue elements so they can be presented one at a time
• VoiceXML has application logic for sequencing among dialogue units

Paper 1 – Support for Natural Dialogue
• VoiceXML supports “directed” and “mixed initiative” dialogues
• “Directed” dialogues: the computer directs the conversation at each step by prompting the user for the next piece of information. Example:
    C: On what date do you wish to fly?
    H: May 6th
• “Mixed initiative” dialogues: each participant can take the initiative in leading the conversation. VoiceXML does this by allowing input grammars to be specified at the form level. Example:
    C: How can I help you?
    H: I’d like to fly from New York on May 8th
    C: Where would you like to fly to?
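A mixed-initiative form along the lines of the flight dialogue above might look like the following sketch. It uses VoiceXML 2.0 syntax rather than the 1.0 syntax of the paper; the grammar files `flight.grxml` and `city.grxml` and the `/servlet/book` URL are hypothetical, and support for the built-in `date` type varies by platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="book_flight">
    <!-- Form-level grammar (hypothetical file): one utterance may fill
         several of the fields below at once -->
    <grammar src="flight.grxml" type="application/srgs+xml"/>
    <!-- Open-ended opening prompt; visited before any field -->
    <initial name="start">
      <prompt>How can I help you?</prompt>
    </initial>
    <!-- Directed fallback: each field still unfilled is prompted in turn -->
    <field name="origin">
      <prompt>Where would you like to fly from?</prompt>
      <grammar src="city.grxml" type="application/srgs+xml"/>
    </field>
    <field name="destination">
      <prompt>Where would you like to fly to?</prompt>
      <grammar src="city.grxml" type="application/srgs+xml"/>
    </field>
    <field name="date" type="date">
      <prompt>On what date do you wish to fly?</prompt>
    </field>
    <block>
      <submit next="/servlet/book" namelist="origin destination date"/>
    </block>
  </form>
</vxml>
```

If the caller says "I'd like to fly from New York on May 8th", a suitable form-level grammar would fill origin and date, leaving only the destination prompt to be asked.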

Paper 2
• Concepts of Programming by Voice
• Motivated by the need to program without typing, thereby preventing repetitive stress injuries (RSI), a common injury among those who spend long hours typing
• Voice-activated software for the disabled is a prime motivator in development
• Paper proposes a system that creates an environment for voice-activated programming

Paper 2
• The cost of such software has fallen dramatically:
  • $7500 in 1998
  • $100 in 2005
• Products include:
  • Dragon NaturallySpeaking
  • IBM ViaVoice
  • Lernout & Hauspie Voice Xpress

Paper 2
• Authors developed a generator called VocalGenerator using Dragon NaturallySpeaking with MS Visual C++
  • Input = a context-free grammar compatible with most programming languages
  • Output = an environment in which a syntax-directed program can be written by voice input alone
• Allows for better recognition and selection of sections of code

Paper 2
• Evaluation of the product
  • Programming is faster using a syntax-directed voice recognition system than a natural language DVR
  • A programmer suffering from repetitive stress injuries will be able to program at a speed sufficient to ‘maintain competitive employment’

Paper 3
• Focuses on ‘V-commerce’ through a survey of VoiceXML applications for business communication
• Looks at the inherent risks in human-to-human communication and the challenges these pose to human-to-computer communication
• Examines speech recognition
• Seeks to leverage the predominance of telephone usage globally

Paper 3
• Utilizes the W3C Voice Browser Working Group design criteria, including:
  • Consistency
  • Interoperability
  • Generality
  • Internationalization
  • Generalization and Readability
  • Implementation

Paper 3
• Looks at the potential for a voice-activated Web interface
• Looks at a transactional communication method with six phases:
  • Sender has an idea
  • Sender transforms the idea into a message
  • Sender transmits the message
  • Receiver gets the message
  • Receiver interprets the message
  • Receiver reacts and sends feedback

Paper 3
• Challenges include:
  • Unproven business models
  • Business process change requirements
  • Channel conflicts
  • Technology hurdles
  • Legal issues
  • Security & privacy

Paper 3
• Conclusions
  • Speech is natural, flexible and efficient
  • Voice technology will improve
  • Voice recognition capabilities will improve
  • The intersection of voice recognition, telecom and Web technologies may lead to a large market for products that take advantage of it

Demo
• Using TellMe Studio (http://studio.tellme.com)
• TellMe Studio provides you with resources to:
  • Build and test your own Internet-powered "phone sites" with nothing but your Web browser and an ordinary telephone, in either of the following ways:
    • Type VoiceXML directly into an area called the “Scratchpad” and then call the Studio phone number to preview the code
    • Publish the VoiceXML and audio files on a publicly accessible Web server, point Studio at the URL for your application's "home page", and call the Studio phone number to preview the application
  • Browse and leverage an extensive library of sample code, grammars, audio, and VoiceXML documentation
  • Participate in the Voice Web development community through open newsgroups

Demo (Continued)
• This demo, Drink Recipes I, uses one of the “prebuilt” VoiceXML scripts available from the TellMe Studio Code Library
• This version of Drink Recipes:
  • Asks the caller for a drink name
  • In response, plays back the drink's ingredients list and mixing instructions
  • Demonstrates the use of large grammars and how to create data-driven applications

VoiceXML 2.0
From: McGlashan, Dr. Scott, “VoiceXML 2.0 from the Inside”, retrieved from http://www.voicexmlreview.org/Dec2001/features/inside.html

Differences Between VoiceXML 1.0 and 2.0
• Interoperability
• Functional Completeness
• Clarity

VoiceXML 2.0
Interoperability: VoiceXML 2.0 requires the following formats so that applications can run on any platform that conforms to the VoiceXML 2.0 specification:
• input: the XML Format of the Speech Recognition Grammar Specification is used for speech and DTMF input; VoiceXML 1.0 did not require any particular speech grammar format
• output: Speech Synthesis Markup Language (SSML) is used for text-to-speech and audio output; VoiceXML 1.0 did not use SSML, and the speech markup elements of VoiceXML 1.0 are not supported in VoiceXML 2.0
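As a sketch of what these two formats look like in practice, the drink field from the Paper 1 example could be rewritten for VoiceXML 2.0 roughly as follows, with an inline grammar in the XML grammar format and an SSML element in the prompt. The form id and the `/servlet/order` URL are carried over or assumed for illustration only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="order">
    <field name="drink">
      <!-- output: SSML markup such as emphasis is allowed inside a prompt -->
      <prompt>What would you like to <emphasis>drink</emphasis>?</prompt>
      <!-- input: inline grammar in the XML form of the grammar specification -->
      <grammar mode="voice" version="1.0" root="drink" xml:lang="en-US">
        <rule id="drink" scope="public">
          <one-of>
            <item>coffee</item>
            <item>tea</item>
            <item>orange juice</item>
            <item>milk</item>
            <item>nothing</item>
          </one-of>
        </rule>
      </grammar>
    </field>
    <block>
      <submit next="/servlet/order" namelist="drink"/>
    </block>
  </form>
</vxml>
```

Because every conforming 2.0 platform must accept this grammar format, the same document should not need a vendor-specific grammar rewrite when moved between platforms.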

VoiceXML 2.0
Interoperability: (Continued)
• protocol: support for the HTTP protocol for fetching documents and resources is required; VoiceXML 1.0 did not require support for HTTP
• audio: audio formats that were recommended for support in VoiceXML 1.0 are now required in VoiceXML 2.0

VoiceXML 2.0
Functional Completeness: New elements, attributes and variables have been added in VoiceXML 2.0 so that the key aspects of the cycle of generating system output, interpreting user input and transitioning from one dialog to another are fully described.
NOTE: VoiceXML 1.0 contained “gaps”, for example regarding when prompts were played to the user.
Some of the new/enhanced elements, variables and support include:
• application.lastresult$ variable: provides information about the last recognition in the application
• <log> element: generates a debug message
• <throw> and <catch> elements: enhanced to provide more information
• <audio> element: enhanced with an “expr” attribute
• <menu>: enhanced with an “accept” attribute
• Enhanced support for greater control over universal grammars
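A minimal sketch of three of these additions working together is shown below. The grammar file `city.grxml` and the audio path are hypothetical; `utterance` and `confidence` are properties of `application.lastresult$` defined by VoiceXML 2.0.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="ask_city">
    <field name="city">
      <prompt>Which city are you flying to?</prompt>
      <grammar src="city.grxml" type="application/srgs+xml"/>
      <filled>
        <!-- log: write a debug message about the last recognition -->
        <log>Heard: <value expr="application.lastresult$.utterance"/>
             confidence: <value expr="application.lastresult$.confidence"/></log>
        <!-- audio expr: the audio URI is computed at run time;
             the enclosed text is spoken if the file cannot be played -->
        <prompt>
          <audio expr="'audio/' + city + '.wav'">Thank you.</audio>
        </prompt>
      </filled>
    </field>
  </form>
</vxml>
```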

VoiceXML 2.0
Clarity: VoiceXML 2.0 provides a clear description and interpretation of ALL elements (and their attributes), how they interact with one another, and their expected behavior.
NOTE: VoiceXML 1.0 contains omissions and contradictions in this respect.
Some clarification changes include:
• Subdialogs: <subdialog> description clarified
• Root and Leaf document definitions explicitly defined
• Prompt queueing and input collection: relationship between these two clarified
• Relationship between VoiceXML 2.0 and ECMAScript variables clarified
• VoiceXML 2.0 clarifies conformance between VoiceXML documents and VoiceXML processors
• Alignment of VoiceXML 2.0 with Speech Grammar and Speech Synthesis specifications

VoiceXML 2.1
• VoiceXML 2.1 was released on June 13, 2005 by the W3C as a “candidate” recommendation
• VoiceXML 2.1 proposes 8 enhancements to VoiceXML 2.0, as follows:
  • Referencing grammars dynamically
  • Referencing scripts dynamically
  • Using <mark> to detect barge-in during prompt playback
  • Using <data> to fetch XML without requiring a dialog transfer
  • Concatenating prompts dynamically using <foreach>
  • Recording user utterances while attempting recognition
  • Adding namelist to <disconnect>
  • Adding type to <transfer>
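Two of these enhancements, dynamic grammar references and `<foreach>`, can be sketched as follows. The variable names, the item list, and the `grammars/` path are invented for illustration; `srcexpr`, `item`, and `array` are the attribute names introduced by VoiceXML 2.1.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Hypothetical variable used to pick the grammar at run time -->
  <var name="menuName" expr="'drinks'"/>
  <form id="menu">
    <var name="items" expr="['coffee', 'tea', 'milk']"/>
    <block>
      <prompt>
        Today we have
        <!-- foreach: build the prompt from an ECMAScript array -->
        <foreach item="drink" array="items">
          <value expr="drink"/> <break time="300ms"/>
        </foreach>
      </prompt>
    </block>
    <field name="choice">
      <!-- srcexpr: the grammar URI is an expression, not a fixed string -->
      <grammar srcexpr="'grammars/' + menuName + '.grxml'"
               type="application/srgs+xml"/>
      <prompt>Which would you like?</prompt>
    </field>
  </form>
</vxml>
```

In VoiceXML 2.0 the prompt list and the grammar URI would have to be fixed in the document or generated on the server; 2.1 lets the document compute both on the client side.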

References
• Ali, Sanwar, Albohali, Mohamed, Wibowo, Kustim, “VoiceXML for Business Applications: A Survey”, First Annual ABIT Conference, May 3-5, 2001, Pittsburgh, Pennsylvania.
• Arnold, Stephen A., Mark, Leo and Goldthwaite, John, “Programming by Voice, VocalProgramming”, ASSETS’00, November 13-15, Arlington, Virginia.
• Lucas, Bruce, “VoiceXML for Web-based Distributed Conversational Applications”, Communications of the ACM, September 2000, Vol. 43, No. 9, pp. 53-57.
• http://www.w3.org/TR/voicexml/ Voice eXtensible Markup Language (VoiceXML version 1.0)
• http://www.w3.org/TR/voicexml/ Voice eXtensible Markup Language (VoiceXML version 2.0)
• http://www.w3.org/TR/voicexml/ Voice eXtensible Markup Language (VoiceXML version 2.1)
• https://studio.tellme.com/vxml2/ovw/migrating21.html
• http://www.voicexmlreview.org/Dec2001/features/inside-full.html
• McGlashan, Dr. Scott, “VoiceXML 2.0 from the Inside”, retrieved from www.voicexmlreview.org/Dec2001/features/inside.html
