Multimodal user interfaces: Implementation

Chris Vandervelpen chris.vandervelpen@uhasselt.be

Overview • Introduction • VoiceXML • X+V • From models to X+V • Demo: ACCESS NetFront • Conclusions • Questions

Introduction • Focus on speech and direct manipulation on mobile devices • How can we deploy a multimodal UI? • Option 1: build our own framework, using speech synthesizers and recognizers, that interprets the designed models (reinventing the wheel) • Option 2: build software that generates standardized markup from the models (reusing existing technologies) → our starting point

VoiceXML • Markup language for speech-only interfaces • Telephone interfaces • Grammars for speech recognition • Java Speech Grammar Format (JSGF) • Nuance Grammar Specification Language (NGSL) • Speech output • Synthesis • Prerecorded audio • http://www.voicexml.org

VoiceXML

<vxml:form>
  <!-- Each field collects one value from the caller -->
  <vxml:field name="departure_city">
    <vxml:grammar>
      <![CDATA[
        #JSGF V1.0;
        grammar cities;
        public <city> = brussels | antwerp | amsterdam;
      ]]>
    </vxml:grammar>
    <vxml:prompt>Which city would you like to depart from?</vxml:prompt>
    <!-- Played when the caller asks for help, is not understood, or stays silent -->
    <vxml:catch event="help nomatch noinput">
      For example, brussels, antwerp or amsterdam
    </vxml:catch>
    <!-- Runs once the field has a value -->
    <vxml:filled>
      <vxml:prompt>Your departure city is <vxml:value expr="departure_city" /></vxml:prompt>
    </vxml:filled>
  </vxml:field>
  <vxml:field name="destination_city">
    ………
  </vxml:field>
</vxml:form>

VoiceXML • Mixed-initiative forms • A single user utterance can fill several fields • Supports more natural language • For example: "I want to fly from brussels to amsterdam" fills in both the departure_city and destination_city fields
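A mixed-initiative form can be sketched as follows. This is a minimal illustration, not the markup used in the talk: it assumes a standalone VoiceXML 2.0 document (no vxml: prefix), an inline SRGS XML grammar with script-style semantic tags instead of the JSGF grammar shown earlier, and a platform that supports both. The form-level grammar returns one property per field name, so a single utterance can fill both fields; the names flight, trip and start are made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="flight">
    <!-- Form-level grammar: active for the whole form, may fill several fields at once.
         Assumption: the platform accepts SRGS XML with tag-format "semantics/1.0". -->
    <grammar version="1.0" mode="voice" root="trip" tag-format="semantics/1.0">
      <rule id="trip" scope="public">
        <item repeat="0-1">i want to fly</item>
        from <ruleref uri="#city"/> <tag>out.departure_city = rules.city;</tag>
        to <ruleref uri="#city"/> <tag>out.destination_city = rules.city;</tag>
      </rule>
      <rule id="city">
        <one-of>
          <item>brussels <tag>out = "brussels";</tag></item>
          <item>antwerp <tag>out = "antwerp";</tag></item>
          <item>amsterdam <tag>out = "amsterdam";</tag></item>
        </one-of>
      </rule>
    </grammar>

    <!-- Open question: the caller may answer with both cities in one sentence. -->
    <initial name="start">
      <prompt>Where would you like to fly from and to?</prompt>
      <catch event="help nomatch noinput">
        For example, say: from brussels to amsterdam.
        <!-- Skip the open question from now on and ask field by field. -->
        <assign name="start" expr="true"/>
        <reprompt/>
      </catch>
    </initial>

    <!-- Directed fallback: only visited for fields the utterance did not fill. -->
    <field name="departure_city">
      <prompt>Which city would you like to depart from?</prompt>
      <option>brussels</option>
      <option>antwerp</option>
      <option>amsterdam</option>
    </field>
    <field name="destination_city">
      <prompt>Which city would you like to fly to?</prompt>
      <option>brussels</option>
      <option>antwerp</option>
      <option>amsterdam</option>
    </field>

    <!-- Runs once both fields have a value, however they were filled. -->
    <filled mode="all" namelist="departure_city destination_city">
      <prompt>
        Flying from <value expr="departure_city"/> to <value expr="destination_city"/>.
      </prompt>
    </filled>
  </form>
</vxml>
```

If the caller says only "from brussels", a platform that accepts partial matches would have to be given a grammar that allows them; the sketch above requires both cities in the open answer and otherwise falls back to the directed fields.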

X+V • X+V (XHTML+Voice) • XHTML: visual channel • VoiceXML snippets: speech channel • Synchronization between modalities using XML Events • Multimodal browsers supporting X+V • ACCESS NetFront multimodal browser (PocketPC) • Opera • http://www.voicexml.org/specs/multimodal/x+v/12/

X+V

<html>
  <body>
    <form>
      <!-- Giving focus to a text box activates the matching voice field -->
      <input id="from" name="from" size="20" ev:event="inputfocus" ev:handler="#voice_city_from" />
      <input id="to" name="to" size="20" ev:event="inputfocus" ev:handler="#voice_city_to" />
    </form>
  </body>
</html>

X+V

<vxml:form id="voice_city">
  <vxml:field name="departure_city_field" id="voice_city_from">
    <vxml:grammar>
      <![CDATA[
        #JSGF V1.0;
        grammar cities;
        public <city> = brussels | antwerp | amsterdam;
      ]]>
    </vxml:grammar>
    <vxml:prompt>Which city would you like to depart from?</vxml:prompt>
    <vxml:catch event="help nomatch noinput">
      For example, brussels, antwerp or amsterdam
    </vxml:catch>
    <!-- Copy the recognized city into the visual "from" text box -->
    <vxml:filled>
      <vxml:assign name="document.getElementById('from').value" expr="departure_city_field" />
    </vxml:filled>
  </vxml:field>
  <vxml:field name="destination_city_field" id="voice_city_to">
    …….
  </vxml:field>
</vxml:form>

X+V • Also usable with XForms • VoiceXML snippets and XForms controls update the same XForms instance model → synchronization

From models to X+V

From models to X+V • Annotate the UI description for speech [Shao2003: Transcoding HTML to VoiceXML Using Annotations] • Extend this approach to UIML and X+V • Identify particular information structures • Text areas • Menu/list structures • Top-level visual regions • Define their representation in XHTML and VoiceXML • Generate the XML Events synchronization code
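The generation step can be sketched as an XSLT 1.0 stylesheet. This is an illustration under stated assumptions, not the transformation used in the talk: the input is taken to be a UIML-like document without a namespace, in which each text entry is a part element with class TextEntry and an id, and its spoken prompt is stored in a property named speechPrompt (a hypothetical annotation). For every such part the stylesheet emits an XHTML text input, a VoiceXML form, and the XML Events attributes that connect the two. The event name inputfocus is copied from the X+V slide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:vxml="http://www.w3.org/2001/vxml"
    xmlns:ev="http://www.w3.org/2001/xml-events">

  <xsl:output method="xml" indent="yes"/>

  <!-- One X+V page per model: voice forms in the head, visual controls in the body. -->
  <xsl:template match="/">
    <html>
      <head>
        <title>Generated X+V page</title>
        <xsl:apply-templates select="//part[@class='TextEntry']" mode="voice"/>
      </head>
      <body>
        <form>
          <xsl:apply-templates select="//part[@class='TextEntry']" mode="visual"/>
        </form>
      </body>
    </html>
  </xsl:template>

  <!-- Speech channel: one voice form per text entry.
       The grammar is left out here; a real generator would have to emit one. -->
  <xsl:template match="part" mode="voice">
    <vxml:form id="voice_{@id}">
      <vxml:field name="{@id}_field">
        <vxml:prompt>
          <!-- Hypothetical annotation: <property part-name="..." name="speechPrompt"> -->
          <xsl:value-of
              select="//property[@part-name = current()/@id][@name = 'speechPrompt']"/>
        </vxml:prompt>
        <!-- Synchronization, voice to visual: copy the result into the text box. -->
        <vxml:filled>
          <vxml:assign name="document.getElementById('{@id}').value" expr="{@id}_field"/>
        </vxml:filled>
      </vxml:field>
    </vxml:form>
  </xsl:template>

  <!-- Visual channel, plus synchronization from visual to voice:
       focusing the text box starts the matching voice form. -->
  <xsl:template match="part" mode="visual">
    <input type="text" id="{@id}" name="{@id}"
           ev:event="inputfocus" ev:handler="#voice_{@id}"/>
  </xsl:template>

</xsl:stylesheet>
```

Unlike the hand-written slide example, this sketch generates one voice form per text entry and points the handler at the form rather than at a field inside a shared form; that choice keeps the templates independent of each other.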

From models to X+V • Define a generic UIML widget vocabulary that maps to both GUI and speech [Plomp2002] • TextEntry: <field> (VoiceXML), <input type="text" /> (XHTML), System.Windows.Forms.TextBox (.NET Windows Forms) • Collection: <form> (VoiceXML), <form> (XHTML), System.Windows.Forms.Panel (.NET Windows Forms)
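Such a vocabulary could be written down as UIML peers sections, one presentation per target. The sketch below is an assumption-laden illustration: the element and attribute names (presentation, d-class, used-in-tag, maps-type, maps-to) follow UIML 3.0 vocabulary conventions as best recalled, and the presentation ids and the vxml: and html: prefixes in the maps-to values are invented for the example. It is not the vocabulary from [Plomp2002].

```xml
<?xml version="1.0" encoding="UTF-8"?>
<uiml>
  <peers>
    <!-- Speech target: generic classes map to VoiceXML elements. -->
    <presentation id="voicexml-vocabulary">
      <d-class id="TextEntry" used-in-tag="part" maps-type="tag" maps-to="vxml:field"/>
      <d-class id="Collection" used-in-tag="part" maps-type="tag" maps-to="vxml:form"/>
    </presentation>

    <!-- Visual web target: the same classes map to XHTML elements.
         The renderer is assumed to add type="text" to the generated input. -->
    <presentation id="xhtml-vocabulary">
      <d-class id="TextEntry" used-in-tag="part" maps-type="tag" maps-to="html:input"/>
      <d-class id="Collection" used-in-tag="part" maps-type="tag" maps-to="html:form"/>
    </presentation>

    <!-- Desktop GUI target: the same classes map to .NET Windows Forms classes. -->
    <presentation id="winforms-vocabulary">
      <d-class id="TextEntry" used-in-tag="part" maps-type="class"
               maps-to="System.Windows.Forms.TextBox"/>
      <d-class id="Collection" used-in-tag="part" maps-type="class"
               maps-to="System.Windows.Forms.Panel"/>
    </presentation>
  </peers>
</uiml>
```

Because the interface description only refers to TextEntry and Collection, the same description can be rendered for speech, XHTML, or Windows Forms by switching the presentation.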

Demo • ACCESS NetFront multimodal browser • PocketPC • Ordering pizza • Ordering Chinese

Conclusions • X+V • Built-in modality synchronization • Alternative to building our own multimodal implementation • Declarative • Transformation from UIML is possible

Questions?
