Multimodal user interfaces: Implementation
This presentation by Chris Vandervelpen covers the design and implementation of multimodal user interfaces that combine speech recognition with direct manipulation on mobile devices. It discusses VoiceXML for speech-only interfaces and X+V for synchronized multimodal interaction. Rather than building a custom speech framework, it proposes generating standardized markup from UI models so that existing technologies can be reused. The session includes a demo of the ACCESS Netfront multimodal browser and closes with conclusions on X+V as an implementation route.
Presentation Transcript
Multimodal user interfaces: Implementation
Chris Vandervelpen
chris.vandervelpen@uhasselt.be
Overview
• Introduction
• VoiceXML
• X+V
• From models to X+V
• Demo: ACCESS Netfront
• Conclusions
• Questions
Introduction
• Focus: speech and direct manipulation on mobile devices
• How can we deploy a multimodal UI?
  • Build our own framework with speech synthesizers/recognizers that interpret the designed models (reinventing the wheel)
  • Build software that generates standardized markup from the models, reusing existing technologies (our starting point)
VoiceXML
• Markup language for speech-only interfaces
  • e.g. telephone interfaces
• Grammars for speech recognition
  • Java Speech Grammar Format (JSGF)
  • Nuance Grammar Specification Language (NGSL)
• Speech output
  • Synthesis
  • Prerecorded audio
• http://www.voicexml.org
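A JSGF rule such as `<city> = brussels | antwerp | amsterdam;` restricts the recognizer to a fixed set of phrases. The sketch below only illustrates that idea for flat alternative lists; the function names are hypothetical, and real JSGF features (weights, tags, optional items, rule references, grouping) are deliberately not handled.

```python
import re
from typing import Dict, List

# Optional 'public' modifier, the rule name, then its expansion up to ';'
RULE_PATTERN = re.compile(r"(?:public\s+)?<(\w+)>\s*=\s*([^;]+);")


def parse_jsgf_alternatives(grammar_text: str) -> Dict[str, List[str]]:
    """Return {rule_name: [alternatives]} for flat 'a | b | c' rules only."""
    rules = {}
    for name, expansion in RULE_PATTERN.findall(grammar_text):
        rules[name] = [alt.strip().lower() for alt in expansion.split("|")]
    return rules


def matches(rules: Dict[str, List[str]], rule_name: str, utterance: str) -> bool:
    """True if the utterance is exactly one of the rule's alternatives."""
    return utterance.strip().lower() in rules.get(rule_name, [])


GRAMMAR = """
#JSGF V1.0;
grammar cities;
public <city> = brussels | antwerp | amsterdam;
"""
```

The header lines (`#JSGF V1.0;`, `grammar cities;`) contain no `<rule>` and are therefore skipped by the pattern; only the rule definition is collected.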
VoiceXML

<vxml:form>
  <vxml:field name="departure_city">
    <vxml:grammar type="application/x-jsgf">
      <![CDATA[
        #JSGF V1.0;
        grammar cities;
        public <city> = brussels | antwerp | amsterdam;
      ]]>
    </vxml:grammar>
    <vxml:prompt>Which departure city would you like?</vxml:prompt>
    <vxml:catch event="help nomatch noinput">
      For example, brussels, antwerp or amsterdam
    </vxml:catch>
    <vxml:filled>
      <vxml:prompt>Your departure city is <vxml:value expr="departure_city"/></vxml:prompt>
    </vxml:filled>
  </vxml:field>
  <vxml:field name="destination_city">
    …
  </vxml:field>
</vxml:form>
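The form above is executed field by field: the browser picks the first unfilled field, plays its prompt, listens, and either fills the field (running `<filled>`) or throws `help`/`nomatch`/`noinput` to the catch handler. The sketch below is a heavily simplified, hypothetical model of that loop, loosely following the select/collect/process phases of the VoiceXML Form Interpretation Algorithm; it has no event counters, reprompts or form-level grammars, and a plain set of words stands in for the JSGF grammar.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional, Set


@dataclass
class Field:
    name: str
    prompt: str
    grammar: Set[str]   # accepted utterances, a stand-in for a JSGF grammar
    help_text: str      # what <catch event="help nomatch noinput"> says
    value: Optional[str] = None


def run_form(fields: List[Field], inputs: Iterator[str],
             say: Callable[[str], None]) -> Dict[str, Optional[str]]:
    """Fill fields in document order; raises StopIteration if inputs run out."""
    while True:
        # Select: the first field whose value is still undefined
        current = next((f for f in fields if f.value is None), None)
        if current is None:
            return {f.name: f.value for f in fields}
        say(current.prompt)
        # Collect: an empty string stands for silence (noinput)
        utterance = next(inputs).strip().lower()
        # Process: help, noinput and nomatch all end up in the catch handler
        if utterance not in current.grammar:
            say(current.help_text)
            continue
        current.value = utterance
        # <filled>: confirm the recognized value
        say(f"Your {current.name.replace('_', ' ')} is {utterance}")
```

A rejected answer ("paris") or silence leads to the help text and the same prompt again, while an accepted one moves on to the next field.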
VoiceXML
• Mixed-initiative forms
  • A single user utterance can fill several fields
  • Supports more natural language
• For example
  • "I want to fly from brussels to amsterdam"
  • fills in both the departure_city and destination_city fields
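In a mixed-initiative form, one form-level grammar can bind several fields from one utterance. The sketch below imitates that with two hypothetical regular-expression "slots" ("from <city>" and "to <city>"); it is an illustration of the idea, not how a VoiceXML platform interprets form-level grammars.

```python
import re
from typing import Dict, Optional

CITIES = ("brussels", "antwerp", "amsterdam")
CITY = "|".join(CITIES)

# Hypothetical slot patterns standing in for a form-level grammar:
# "from <city>" fills departure_city, "to <city>" fills destination_city
SLOT_PATTERNS = {
    "departure_city": re.compile(rf"\bfrom\s+({CITY})\b"),
    "destination_city": re.compile(rf"\bto\s+({CITY})\b"),
}


def fill_mixed_initiative(utterance: str,
                          form: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Fill every still-empty field that a single utterance mentions."""
    # Lowercase and replace punctuation (e.g. quotes) with spaces
    text = re.sub(r"[^\w\s]", " ", utterance.lower())
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match and form.get(slot) is None:
            form[slot] = match.group(1)
    return form
```

Fields the utterance does not mention stay empty, so the dialog can fall back to prompting for them one by one.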
X+V
• X+V = XHTML + VoiceXML
  • XHTML: visual channel
  • VoiceXML snippets: speech channel
• Synchronization between modalities using XML Events
• Multimodal browsers supporting X+V
  • ACCESS Netfront multimodal browser (PocketPC)
  • Opera
• http://www.voicexml.org/specs/multimodal/x+v/12/
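X+V

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <!-- the vxml:form voice handlers are placed in the head -->
  </head>
  <body>
    <form>
      <input id="from" name="from" size="20"
             ev:event="inputfocus" ev:handler="#voice_city_from"/>
      <input id="to" name="to" size="20"
             ev:event="inputfocus" ev:handler="#voice_city_to"/>
    </form>
  </body>
</html>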
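The XML Events attributes on each input (`ev:event`, `ev:handler`) are what tie a visual control to its voice handler. As a sketch, the standard-library `xml.etree.ElementTree` can list those bindings from a page like the one above; the helper name `event_bindings` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

EV_NS = "http://www.w3.org/2001/xml-events"

PAGE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <body>
    <form>
      <input id="from" name="from" size="20"
             ev:event="inputfocus" ev:handler="#voice_city_from"/>
      <input id="to" name="to" size="20"
             ev:event="inputfocus" ev:handler="#voice_city_to"/>
    </form>
  </body>
</html>"""


def event_bindings(xhtml_source: str) -> List[Tuple[str, str, str]]:
    """Return (element id, event name, handler id) for each bound element."""
    root = ET.fromstring(xhtml_source)
    bindings = []
    for element in root.iter():
        # Namespaced attributes are keyed as '{namespace-uri}localname'
        event = element.get(f"{{{EV_NS}}}event")
        handler = element.get(f"{{{EV_NS}}}handler")
        if event and handler:
            bindings.append((element.get("id"), event, handler.lstrip("#")))
    return bindings
```

Because the binding is declarative, a tool (or a generator working from UI models) can inspect or produce it without touching any browser code.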
X+V

<vxml:form id="voice_city">
  <vxml:field name="departure_city_field" id="voice_city_from">
    <vxml:grammar type="application/x-jsgf">
      <![CDATA[
        #JSGF V1.0;
        grammar cities;
        public <city> = brussels | antwerp | amsterdam;
      ]]>
    </vxml:grammar>
    <vxml:prompt>Which departure city would you like?</vxml:prompt>
    <vxml:catch event="help nomatch noinput">
      For example, brussels, antwerp or amsterdam
    </vxml:catch>
    <vxml:filled>
      <!-- copy the recognized city into the visual "from" input -->
      <vxml:assign name="document.getElementById('from').value" expr="departure_city_field"/>
    </vxml:filled>
  </vxml:field>
  <vxml:field name="destination_city_field" id="voice_city_to">
    …
  </vxml:field>
</vxml:form>
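Taken together, the two X+V fragments form a round trip: focusing a visual input fires an event, the bound voice handler runs, and its `<filled>` block writes the recognized value back into the input. The classes below are a hypothetical, in-memory model of that round trip for illustration only; they do not reflect how a multimodal browser such as ACCESS Netfront is implemented.

```python
from typing import Dict, Optional, Set, Tuple


class VisualInput:
    """Stand-in for an XHTML <input>."""
    def __init__(self, element_id: str) -> None:
        self.id = element_id
        self.value = ""


class VoiceField:
    """Stand-in for a <vxml:field> with a fixed word list as its grammar."""
    def __init__(self, name: str, grammar: Set[str]) -> None:
        self.name = name
        self.grammar = grammar

    def recognize(self, utterance: str) -> Optional[str]:
        word = utterance.strip().lower()
        return word if word in self.grammar else None


class MultimodalPage:
    """Routes (input id, event) pairs to voice fields, like ev:handler."""
    def __init__(self) -> None:
        self.inputs: Dict[str, VisualInput] = {}
        self.handlers: Dict[Tuple[str, str], VoiceField] = {}

    def bind(self, visual: VisualInput, event: str, voice: VoiceField) -> None:
        self.inputs[visual.id] = visual
        self.handlers[(visual.id, event)] = voice

    def dispatch(self, element_id: str, event: str, utterance: str) -> bool:
        voice = self.handlers.get((element_id, event))
        if voice is None:
            return False  # no handler bound for this event
        result = voice.recognize(utterance)
        if result is None:
            return False  # nomatch: the visual field stays unchanged
        self.inputs[element_id].value = result  # mirrors <vxml:assign>
        return True
```

The visual field only changes when recognition succeeds, which matches the slide's design of assigning inside `<filled>`.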
X+V
• Also usable with XForms
  • VoiceXML snippets and XForms controls update the same XForms instance data, so the modalities stay synchronized
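With XForms, synchronization follows from both modalities writing to one shared instance model instead of to each other. The observer-style sketch below illustrates that idea; `InstanceModel` is a hypothetical stand-in, not the XForms processing model.

```python
from typing import Callable, Dict, List, Optional

Listener = Callable[[str, str, str], None]  # (ref, new value, source modality)


class InstanceModel:
    """One shared store of values; every change is broadcast to all views."""

    def __init__(self, **initial: str) -> None:
        self._data: Dict[str, str] = dict(initial)
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def set(self, ref: str, value: str, source: str) -> None:
        self._data[ref] = value
        # Notify every modality, including the one that made the change
        for listener in self._listeners:
            listener(ref, value, source)

    def get(self, ref: str) -> Optional[str]:
        return self._data.get(ref)
```

A visual view and a voice view would each subscribe once; a city spoken into the voice channel then shows up in the visual form without any direct link between the two.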
Models to X+V
• Annotate the UI description for speech [Shao2003: Transcoding HTML to VoiceXML Using Annotations]
• Extend this approach to UIML and X+V
  • Identify particular information structures
    • Text areas
    • Menu/list structures
    • Top-level visual regions
  • Define their representation in XHTML and VoiceXML
  • Generate the XML Events synchronization code
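For one identified structure, a text entry, the three generation steps (XHTML representation, VoiceXML representation, XML Events wiring) could look roughly like the sketch below. The naming scheme (`voice_<id>`, `<id>_field`), the default `inputfocus` event and the function name are hypothetical choices modeled on the slides' example, not the generator described in [Shao2003] or in this talk.

```python
from typing import Dict, List
from xml.sax.saxutils import escape, quoteattr


def generate_text_entry(widget_id: str, prompt: str, choices: List[str],
                        event: str = "inputfocus") -> Dict[str, str]:
    """Emit the XHTML input, the VoiceXML form and the XML Events wiring
    for one text-entry structure."""
    handler_id = f"voice_{widget_id}"
    field_name = f"{widget_id}_field"
    # Visual channel: the input points at its voice handler via XML Events
    xhtml = (f"<input id={quoteattr(widget_id)} name={quoteattr(widget_id)} "
             f"type=\"text\" ev:event={quoteattr(event)} "
             f"ev:handler={quoteattr('#' + handler_id)}/>")
    # Speech channel: a flat JSGF alternative list over the allowed values
    jsgf = (f"#JSGF V1.0; grammar {widget_id}; "
            f"public <{widget_id}> = {' | '.join(choices)};")
    # Synchronization: copy the recognized value back into the visual input
    target = f"document.getElementById('{widget_id}').value"
    vxml = (f"<vxml:form id={quoteattr(handler_id)}>"
            f"<vxml:field name={quoteattr(field_name)}>"
            f"<vxml:grammar type=\"application/x-jsgf\"><![CDATA[{jsgf}]]></vxml:grammar>"
            f"<vxml:prompt>{escape(prompt)}</vxml:prompt>"
            f"<vxml:filled><vxml:assign name={quoteattr(target)} "
            f"expr={quoteattr(field_name)}/></vxml:filled>"
            f"</vxml:field></vxml:form>")
    return {"xhtml": xhtml, "vxml": vxml}
```

Deriving both ids from the same widget id is what keeps the generated `ev:handler` reference and the generated `vxml:form` consistent.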
Models to X+V
• Define a generic UIML widget vocabulary that maps to both GUI and speech [Plomp2002]
  • TextEntry
    • <field> (VoiceXML)
    • <input type="text" /> (XHTML)
    • System.Windows.Forms.TextBox
  • Collection
    • <form> (VoiceXML)
    • <form> (XHTML)
    • System.Windows.Forms.Panel
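The vocabulary mapping above can be read as a lookup table from generic widget classes to concrete elements per target. The sketch below encodes exactly the two listed classes and renders a small generic part tree as an outline for one target; the `Part` type and the `render` function are hypothetical simplifications, not UIML's actual structure or rendering engine.

```python
from dataclasses import dataclass, field
from typing import List

# Generic widget class -> concrete element per target (from the slide)
VOCABULARY = {
    "TextEntry": {"voicexml": "field", "xhtml": "input type=text",
                  "winforms": "System.Windows.Forms.TextBox"},
    "Collection": {"voicexml": "form", "xhtml": "form",
                   "winforms": "System.Windows.Forms.Panel"},
}


@dataclass
class Part:
    """Simplified UIML-like part: a generic widget class, an id, children."""
    widget_class: str
    part_id: str
    children: List["Part"] = field(default_factory=list)


def render(part: Part, target: str, depth: int = 0) -> List[str]:
    """Map a generic part tree onto one target vocabulary as an outline."""
    name = VOCABULARY[part.widget_class][target]
    lines = ["  " * depth + f"{name} ({part.part_id})"]
    for child in part.children:
        lines.extend(render(child, target, depth + 1))
    return lines
```

The same `Part` tree renders to VoiceXML, XHTML or Windows Forms by changing only the `target` key, which is the point of a shared generic vocabulary.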
Demo
• ACCESS Netfront multimodal browser
  • PocketPC
• Ordering pizza
• Ordering Chinese food
Conclusions
• X+V
  • built-in modality synchronization
  • an alternative to a custom multimodal implementation
  • declarative
  • transformation from UIML is possible