Understanding VoiceXML 2.0: Enhancing Interactive Voice Applications
VoiceXML 2.0 is a markup language for building voice user interfaces. It minimizes client/server interactions by specifying multiple interactions per document, and it shields application authors from low-level, platform-specific details. Because services are portable across implementation platforms, it works as a common language for content, tool, and platform providers. Its features cover synthesized speech output, audio playback, recognition of spoken and DTMF input, recording, dialog flow control, and telephony functions such as call transfer. It is easy to use for simple prototypes and also provides language features for more complex dialogs.
About VoiceXML 2.0
Stefanie Shriver
(A lot of this stuff is pulled directly from the 2.0 spec: http://www.w3.org/TR/voicexml20/)
Why use VoiceXML?
• Minimizes client/server interactions by specifying multiple interactions per document
• Shields application authors from low-level and platform-specific details
• Separates user interaction code (in VoiceXML) from service logic (CGI scripts)
• Promotes service portability across implementation platforms. VoiceXML is a common language for content providers, tool providers, and platform providers
• Easy to use for simple interactions, yet provides language features to support complex dialogs
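To make the first and third bullets concrete, here is a minimal sketch modeled on the weather example in the 2.0 spec. One document collects two answers before anything goes back to the server, and the service logic sits behind the submit URL. The grammar files (state.grxml, city.grxml) and the /servlet/weather URL are placeholders, not real resources.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="weather_info">
    <block>Welcome to the weather information service.</block>

    <!-- First interaction: handled entirely on the voice platform -->
    <field name="state">
      <prompt>What state?</prompt>
      <grammar src="state.grxml" type="application/srgs+xml"/>
    </field>

    <!-- Second interaction: still no round trip to the server -->
    <field name="city">
      <prompt>What city?</prompt>
      <grammar src="city.grxml" type="application/srgs+xml"/>
    </field>

    <!-- One request carries both answers to the service logic (placeholder URL) -->
    <block>
      <submit next="/servlet/weather" namelist="city state"/>
    </block>
  </form>
</vxml>
```

The dialog lives in the markup; whatever answers the submit (a CGI script, a servlet) only sees the filled-in values.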
VoiceXML has features to handle:
• Output of synthesized speech (text-to-speech)
• Output of audio files
• Recognition of spoken input
• Recognition of DTMF input
• Recording of spoken input
• Control of dialog flow
• Telephony features such as call transfer and disconnect
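A rough sketch of how the features listed above map onto VoiceXML 2.0 elements. The audio file name and the phone number are made up for illustration, and the built-in digits type is assumed to be available (the spec leaves built-in grammar support up to the platform).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="features">
    <block>
      <!-- Synthesized speech (text-to-speech) -->
      <prompt>Welcome.</prompt>
      <!-- Audio file output; the enclosed text is spoken if the file cannot be played -->
      <audio src="welcome.wav">Welcome to the service.</audio>
    </block>

    <!-- Spoken or DTMF input via the built-in digits type (platform support assumed) -->
    <field name="pin" type="digits">
      <prompt>Please say or key in your PIN.</prompt>
    </field>

    <!-- Recording of spoken input; any DTMF key ends the recording -->
    <record name="msg" beep="true" maxtime="10s" dtmfterm="true">
      <prompt>Leave a message after the beep.</prompt>
    </record>

    <!-- Telephony: bridged call transfer to a made-up number -->
    <transfer name="call" dest="tel:+15555550100" bridge="true">
      <prompt>Transferring you now.</prompt>
    </transfer>

    <!-- Telephony: hang up once the transfer returns -->
    <block>
      <disconnect/>
    </block>
  </form>
</vxml>
```

Dialog flow here is just the default: the form visits its items in document order until each one has been filled or executed.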
What can you do with VoiceXML?
• Create simple dialogs, simply.
• Good for prototyping (hmm, would this have worked for the USI keyword experiment?)
• Create more complex dialogs with some work.
• "VoiceXML supports a limited type of mixed initiative. VoiceXML does NOT support the user asking arbitrary questions during a dialog."
• I think it can actually be more arbitrary than this, though, with more complex grammars.
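For reference, the "limited" mixed initiative looks roughly like this: a form-level grammar lets the caller answer the open prompt with a city, a state, or both, and the form falls back to directed prompts for whatever is still missing. This is a sketch modeled on the mixed-initiative example in the 2.0 spec; the grammar files and the submit URL are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="weather_info">
    <!-- Form-level grammar (placeholder file): may fill city, state, or both -->
    <grammar src="cityandstate.grxml" type="application/srgs+xml"/>

    <!-- Open-ended prompt; the caller takes the initiative here -->
    <initial name="start">
      <prompt>For what city and state would you like the weather?</prompt>
      <nomatch count="2">
        <!-- Second miss: give up on the open prompt and use the directed fields -->
        <prompt>Sorry, let's take this one step at a time.</prompt>
        <assign name="start" expr="true"/>
      </nomatch>
    </initial>

    <!-- Directed prompts for anything the open prompt did not fill -->
    <field name="state">
      <prompt>What state?</prompt>
      <grammar src="state.grxml" type="application/srgs+xml"/>
    </field>
    <field name="city">
      <prompt>What city in <value expr="state"/>?</prompt>
      <grammar src="city.grxml" type="application/srgs+xml"/>
    </field>

    <!-- Runs once both fields are filled, however the caller supplied them -->
    <filled mode="all" namelist="city state">
      <submit next="/servlet/weather" namelist="city state"/>
    </filled>
  </form>
</vxml>
```

How "arbitrary" the user can be is bounded by what the form-level grammar accepts, which is why a richer grammar buys more flexibility.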
[ SUB_PLST:plst {<option strcat($plst "^d=na^ps=true")>}
  SUB_TMST:tmst {<option strcat($tmst "^d=na^ps=false")>}
  SUB_HELP:hst {<option strcat($hst "^d=na^ps=hh")>}
  SUB_STARTOVER:sst {<option strcat($sst "^d=na^ps=hh")>}
  SUB_QUIT:qst {<option strcat($qst "^d=na^ps=hh")>}
  SUB_UPDATE:updt {<option strcat($updt "^d=na^ps=updt")>}
  SUB_LEADER:ldr {<option strcat($ldr "^d=na^ps=ldr")>}
  SUB_NEWGAME:ngm {<option strcat($ngm "^ps=go")>}]

SUB_STARTOVER [
  (start over) {return (strcat("event=" "home"))}
  (?(go) home) {return (strcat("event=" "home"))}]

SUB_NEWGAME [
  (?(i want to) go to SUB_ALLTEAMS:t ?(game)) {return (strcat(strcat("team=" "")strcat($t "^d=today")))}
  ([(tell me) what] about the SUB_ALLTEAMS:t ?(game)) {return (strcat(strcat("team=" "")strcat($t "^d=today")))}
  (?(i want to) go to the last SUB_ALLTEAMS:t ?(game)) {return (strcat(strcat("team=" "")strcat($t "^d=last")))}
  ([(tell me) what] about the last SUB_ALLTEAMS:t ?(game)) {return (strcat(strcat("team=" "")strcat($t "^d=last")))}
  (?(i want to) go to yesterday's SUB_ALLTEAMS:t ?(game)) {return (strcat(strcat("team=" "")strcat($t "^d=yesterday")))}
  ([(tell me) what] about yesterday's SUB_ALLTEAMS:t ?(game)) {return (strcat(strcat("team=" "")strcat($t "^d=yesterday")))}]

SUB_QUIT [
  (quit) {return (strcat("event=" "quit"))}
  (good bye) {return (strcat("event=" "quit"))}]

SUB_PLST [ [
  (SUB_PLAYER:p SUB_STAT:s)
  (SUB_STAT:s SUB_PLAYER:p)
  (?[(give me) (tell me ?(about)) (what is)] ?(the) SUB_STAT:s ?(for) SUB_PLAYER:p)
  (?(SUB_TELLME) how many SUB_STAT:s SUB_PLAYER:p has ?(had))
Resources
• http://www.w3.org/TR/voicexml20/
• http://www.voicexml.org/
• Development platforms:
  • http://studio.tellme.com/
  • http://cafe.bevocal.com
  • http://freespeech.heyanita.com
  • http://developer.voicegenie.com/
• See http://www.commweb.com/article/COM20010129S0003 for an article comparing these platforms
What about SALT?
• http://www.saltforum.org/
• SALT developed/promoted by Microsoft, Philips, SpeechWorks, Intel, Cisco, Comverse
• VoiceXML developed/promoted by AT&T, Lucent, IBM, Motorola
SALT features
• Focus on multi-modal development
• Supports XML form of SRGS
• Parallel tasks
• Applications are DOM based
• Uses SSML for speech synthesis
• Call control
• Applications are scripted in ECMAScript (aka JavaScript)
• Uses fewer XML elements (see http://www.voicexmlplanet.com/articles/saltspec.html)
Multi-modality in SALT

<xhtml xmlns:salt="urn:schemas.saltforum.org/2002/02/SALT">
  <!-- HTML -->
  ...
  <input name="txtBoxCity" type="text" onpendown="listenCity.Start()"/>
  ...
  <!-- SALT -->
  <salt:listen id="listenCity">
    <salt:grammar name="gramCity" src="./city.xml" />
    <salt:bind targetelement="txtBoxCity" value="//city" />
  </salt:listen>
</xhtml>
Discussion
• Can something this "simple" really handle user- or mixed-initiative that well?
• What are the implications of having a standard, but having different development platforms with different supported & proprietary features?
• What do we really need to solve dialog system development? (per Alex…)
• Can multi-modality be successfully integrated (i.e. via SALT)?