Understanding VoiceXML 2.0: Enhancing Interactive Voice Applications

About VoiceXML 2.0
Stefanie Shriver
Much of this material is drawn directly from the VoiceXML 2.0 specification: http://www.w3.org/TR/voicexml20/

Why use VoiceXML?
• Minimizes client/server interactions by specifying multiple interactions per document
• Shields application authors from low-level and platform-specific details
• Separates user interaction code (in VoiceXML) from service logic (e.g., CGI scripts)
• Promotes service portability across implementation platforms: VoiceXML is a common language for content providers, tool providers, and platform providers
• Easy to use for simple interactions, yet provides language features to support complex dialogs

VoiceXML has features to handle:
• Output of synthesized speech (text-to-speech)
• Output of audio files
• Recognition of spoken input
• Recognition of DTMF input
• Recording of spoken input
• Control of dialog flow
• Telephony features such as call transfer and disconnect
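To make the feature list concrete, here is a minimal sketch of a single VoiceXML 2.0 document that touches most of these features. The form and field names, the prompt wording, and the audio file thanks.wav are assumptions made up for illustration; platform behavior may vary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="order">
    <!-- Spoken or DTMF input: each option can be said or keyed in -->
    <field name="drink">
      <!-- Text inside a prompt is rendered with text-to-speech -->
      <prompt>Would you like coffee, tea, or milk?</prompt>
      <option dtmf="1" value="coffee">coffee</option>
      <option dtmf="2" value="tea">tea</option>
      <option dtmf="3" value="milk">milk</option>
      <!-- Dialog flow control: event handlers for silence and misrecognition -->
      <noinput>I did not hear you. <reprompt/></noinput>
      <nomatch>Sorry, I did not understand. <reprompt/></nomatch>
      <filled>
        <prompt>You chose <value expr="drink"/>.</prompt>
      </filled>
    </field>

    <!-- Recording of spoken input, limited to ten seconds -->
    <record name="note" beep="true" maxtime="10s">
      <prompt>After the beep, record any special instructions.</prompt>
    </record>

    <block>
      <!-- Audio file output; the enclosed text is the fallback if
           thanks.wav (a hypothetical file) cannot be played -->
      <audio src="thanks.wav">Thank you.</audio>
      <!-- Telephony: end the call -->
      <disconnect/>
    </block>
  </form>
</vxml>
```

Call transfer is not shown; VoiceXML 2.0 provides a transfer element for it, which would take the place of the final block.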

What can you do with VoiceXML?
• Create simple dialogs, simply.
• Good for prototyping (hmm, would this have worked for the USI keyword experiment?)
• Create more complex dialogs with some work.
• "VoiceXML supports a limited type of mixed initiative. VoiceXML does NOT support the user asking arbitrary questions during a dialog."
• I think it can actually be more flexible than this suggests, given more complex grammars.
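The "limited type of mixed initiative" mentioned above is usually built from a form-level grammar plus an initial element: the caller may fill several fields in one utterance, and the form falls back to asking for one field at a time if that fails. The following is a sketch under stated assumptions; the grammar files game_request.grxml, teams.grxml, and days.grxml and the field names are hypothetical, and the form-level grammar is assumed to return slots named team and day.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="score_request">
    <!-- Form-level grammar (hypothetical file): one utterance such as
         "yesterday's Pirates game" may fill both team and day -->
    <grammar src="game_request.grxml" type="application/srgs+xml"/>

    <!-- Open-ended opening question for the mixed-initiative attempt -->
    <initial name="start">
      <prompt>Which game would you like to hear about?</prompt>
      <nomatch count="2">
        Let's take it one step at a time.
        <!-- Marking the initial item as filled switches the form
             to directed, field-by-field prompting -->
        <assign name="start" expr="true"/>
        <reprompt/>
      </nomatch>
    </initial>

    <!-- Directed fallback: each field has its own grammar and prompt -->
    <field name="team">
      <grammar src="teams.grxml" type="application/srgs+xml"/>
      <prompt>Which team?</prompt>
    </field>
    <field name="day">
      <grammar src="days.grxml" type="application/srgs+xml"/>
      <prompt>Today's game, yesterday's game, or the last game?</prompt>
    </field>

    <!-- Runs once both fields have values -->
    <filled mode="all" namelist="team day">
      <prompt>Looking up the <value expr="team"/> game for <value expr="day"/>.</prompt>
    </filled>
  </form>
</vxml>
```

The caller can answer only what the active grammars cover, which is why the initiative is limited: richer grammars widen what can be said, but the form still drives the dialog.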

[ SUB_PLST:plst {<option strcat($plst "^d=na^ps=true")>}
  SUB_TMST:tmst {<option strcat($tmst "^d=na^ps=false")>}
  SUB_HELP:hst {<option strcat($hst "^d=na^ps=hh")>}
  SUB_STARTOVER:sst {<option strcat($sst "^d=na^ps=hh")>}
  SUB_QUIT:qst {<option strcat($qst "^d=na^ps=hh")>}
  SUB_UPDATE:updt {<option strcat($updt "^d=na^ps=updt")>}
  SUB_LEADER:ldr {<option strcat($ldr "^d=na^ps=ldr")>}
  SUB_NEWGAME:ngm {<option strcat($ngm "^ps=go")>}]

SUB_STARTOVER [
  (start over) {return (strcat("event=" "home"))}
  (?(go) home) {return (strcat("event=" "home"))}]

SUB_NEWGAME [
  (?(i want to) go to SUB_ALLTEAMS:t ?(game)) {return (strcat(strcat("team=" "")strcat($t "^d=today")))}
  ([(tell me) what] about the SUB_ALLTEAMS:t ?(game)) {return (strcat(strcat("team=" "")strcat($t "^d=today")))}
  (?(i want to) go to the last SUB_ALLTEAMS:t ?(game)) {return (strcat(strcat("team=" "")strcat($t "^d=last")))}
  ([(tell me) what] about the last SUB_ALLTEAMS:t ?(game)) {return (strcat(strcat("team=" "")strcat($t "^d=last")))}
  (?(i want to) go to yesterday's SUB_ALLTEAMS:t ?(game)) {return (strcat(strcat("team=" "")strcat($t "^d=yesterday")))}
  ([(tell me) what] about yesterday's SUB_ALLTEAMS:t ?(game)) {return (strcat(strcat("team=" "")strcat($t "^d=yesterday")))}]

SUB_QUIT [
  (quit) {return (strcat("event=" "quit"))}
  (good bye) {return (strcat("event=" "quit"))}]

SUB_PLST [ [
  (SUB_PLAYER:p SUB_STAT:s)
  (SUB_STAT:s SUB_PLAYER:p)
  (?[(give me) (tell me ?(about)) (what is)] ?(the) SUB_STAT:s ?(for) SUB_PLAYER:p)
  (?(SUB_TELLME) how many SUB_STAT:s SUB_PLAYER:p has ?(had))

Resources
• http://www.w3.org/TR/voicexml20/
• http://www.voicexml.org/
• Development platforms:
  • http://studio.tellme.com/
  • http://cafe.bevocal.com
  • http://freespeech.heyanita.com
  • http://developer.voicegenie.com/
• See http://www.commweb.com/article/COM20010129S0003 for an article comparing these platforms

What about SALT?
• http://www.saltforum.org/
• SALT developed/promoted by Microsoft, Philips, SpeechWorks, Intel, Cisco, Comverse
• VoiceXML developed/promoted by AT&T, Lucent, IBM, Motorola

SALT features
• Focus on multi-modal development
• Supports XML form of SRGS
• Parallel tasks
• Applications are DOM based
• Uses SSML for speech synthesis
• Call control
• Applications are scripted in ECMAScript (aka JavaScript)
• Uses fewer XML elements (see http://www.voicexmlplanet.com/articles/saltspec.html)

Multi-modality in SALT
<xhtml xmlns:salt="urn:schemas.saltforum.org/2002/02/SALT">
  ...
  <input name="txtBoxCity" type="text" onpendown="listenCity.Start()"/>
  ...
  <salt:listen id="listenCity">
    <salt:grammar name="gramCity" src="./city.xml" />
    <salt:bind targetelement="txtBoxCity" value="//city" />
  </salt:listen>
</xhtml>

Discussion
• Can something this "simple" really handle user- or mixed-initiative that well?
• What are the implications of having a standard, but having different development platforms with different supported and proprietary features?
• What do we really need to solve dialog system development? (per Alex…)
• Can multi-modality be successfully integrated (i.e., via SALT)?
