This paper explores the technical and design challenges of combining speech interfaces with virtual environments, focusing on the TALKING AGENT system and its architecture within the DIVE multi-user virtual reality system. We discuss speech recognition limits, interaction metaphors, and agent modeling frameworks, detailing how intelligent agents can operate in a VR world for functions like object transport and manipulation. The findings highlight the need for accurate speech recognition and effective user interaction metaphors, paving the way for future innovations in virtual reality interfaces.
A Speech Interface to Virtual Environment • Authors: Scott McGlashan and Tomas Axling • Swedish Institute of Computer Science
Presentation Agenda • Introduction • The TALKING AGENT system • DIVE • SR/TTS • Agent Modeling Framework • Interaction Metaphor • Reference Resolution • Future Work • Conclusion
Purposes of this paper • Analyze the technical and design issues in combining a virtual world with a speech interface. • Describe the system architecture of the TALKING AGENT system.
Problems of Integration • Speech Recognition : The vocabulary is limited to gain accuracy. • Language Understanding : The system's knowledge is limited to maximize understanding. • Interaction Metaphor : Who does the user talk to? (These questions are discussed in detail in the authors’ earlier paper “Speech Interfaces to Virtual Reality”.)
Innovation of this System • Combines intelligent agents with a speech interface to carry out specialized functions in the VR world. • Functions implemented so far: • Transporting objects • Fetching objects • Painting objects • Increasing the size of objects
DIVE Virtual Reality System • DIVE (Distributed Interactive Virtual Environment) is a multi-user virtual environment. • DIVE allows users and the environment to interact in real time. • DIVE contains a database of hierarchically organized objects.
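A database of hierarchically organized objects can be pictured as a tree of named nodes. The Python sketch below is only an illustration of that idea and is not DIVE's actual data model or API; the class `WorldObject`, its fields, and the example objects are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class WorldObject:
    """Hypothetical node in a hierarchical world database (not DIVE's API)."""
    name: str
    properties: dict = field(default_factory=dict)
    children: List["WorldObject"] = field(default_factory=list)

    def add(self, child: "WorldObject") -> "WorldObject":
        """Attach a child object and return it so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self) -> Iterator["WorldObject"]:
        """Yield this object and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def find(self, name: str) -> Optional["WorldObject"]:
        """Return the first object with the given name, or None."""
        return next((o for o in self.walk() if o.name == name), None)


# Example world: a room containing two pieces of furniture.
world = WorldObject("world")
room = world.add(WorldObject("room"))
room.add(WorldObject("chair", {"color": "red"}))
room.add(WorldObject("table", {"color": "blue"}))
```

A call such as `world.find("table")` then returns the node for the table, and `world.walk()` visits every object from the root downwards.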
Speech Recognition • SR restricted to a limited set of pre-defined phrases gives good recognition performance. • A grammar constrains the recognizer's search space. • A commercial SR engine (Nuance) is used.
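The idea of using a grammar to constrain the search space can be shown with a toy example. The Python sketch below is not the Nuance grammar format and not the system's real grammar; the rule, the word lists, and the function names are assumptions made for illustration. It enumerates every phrase a small grammar licenses and rejects any recognizer hypothesis outside that set.

```python
from itertools import product

# Hypothetical toy grammar: COMMAND -> VERB "the" [COLOR] OBJECT
VERBS = ["fetch", "paint", "enlarge"]
COLORS = ["", "red", "blue"]  # the empty string means "no color word"
OBJECTS = ["chair", "table"]


def phrases():
    """Enumerate every phrase the toy grammar licenses."""
    for verb, color, obj in product(VERBS, COLORS, OBJECTS):
        yield " ".join(w for w in (verb, "the", color, obj) if w)


LICENSED = set(phrases())


def accept(hypothesis: str) -> bool:
    """Keep a recognizer hypothesis only if the grammar licenses it."""
    normalized = " ".join(hypothesis.lower().split())
    return normalized in LICENSED
```

With three verbs, three color options, and two objects, the recognizer only has to choose among 18 phrases, which is why a constrained grammar can trade coverage for accuracy.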
Agent Modeling Framework • Agent modeling needs a high-level language that supports complex symbolic computation. • Oz is well suited for this purpose. • ODI serves as the interface between Oz and DIVE. • The parent agent provides the basic functions. • More specific agents are defined by extending the parent agent.
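The parent-agent pattern can be sketched as ordinary class inheritance. The system itself is written in Oz; the Python below is only an analogy, and the classes `Agent` and `PainterAgent`, their methods, and the `do_<command>` naming scheme are hypothetical.

```python
class Agent:
    """Hypothetical parent agent providing the basic functions."""

    def __init__(self, name: str):
        self.name = name
        self.log = []  # everything this agent has said so far

    def say(self, text: str) -> str:
        message = f"{self.name}: {text}"
        self.log.append(message)
        return message

    def handle(self, command: str, target: str) -> str:
        # Dispatch to a method named do_<command> if a subclass defines one.
        action = getattr(self, f"do_{command}", None)
        if action is None:
            return self.say(f"I cannot {command}.")
        return action(target)


class PainterAgent(Agent):
    """A more specific agent, defined by extending the parent agent."""

    def __init__(self, name: str, color: str):
        super().__init__(name)
        self.color = color

    def do_paint(self, target: str) -> str:
        return self.say(f"Painting the {target} {self.color}.")
```

A `PainterAgent` inherits speaking and command dispatch from the parent and adds only the painting behavior, so an unsupported command such as "fetch" falls back to the parent's refusal message.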
Interaction Metaphor • Direct manipulation: Personal Presence. • Various metaphors for spoken interaction have been proposed: • Proxy • Divinity • Telekinesis • Interface Agent • This system adopts the Proxy metaphor.
Addressing Agent • Inside the user’s field of view • Dialogue is initiated by clicking on the agent. • Outside the user’s field of view • Phone agent: first press the phone agent, then connect to the remote agent.
Feedback • Given speech input, the system should give the user visual feedback. • Is the agent listening or not? • What feedback is given when talking to an agent that is far away?
Reference Resolution • Given a description, the reference resolution engine maps it to the object the user is referring to. • Considerations • Object focus. • Property perception. • Discourse modeling.
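One possible reading of these considerations is sketched below in Python. It is not the system's reference resolution engine: the `Obj` fields, the visibility flag standing in for what the user can perceive, and the recency rule standing in for the discourse model are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Obj:
    ident: str
    kind: str
    color: str
    visible: bool = True


def resolve(description: dict, objects: List[Obj], history: List[str]) -> Optional[Obj]:
    """Return the object a description most plausibly refers to, or None.

    description: e.g. {"kind": "chair", "color": "red"}; an empty dict
    stands for a pronoun such as "it".
    history: identifiers of previously mentioned objects, oldest first.
    """
    # Property check: keep visible objects that match every described property.
    candidates = [
        o for o in objects
        if o.visible and all(getattr(o, k) == v for k, v in description.items())
    ]
    if len(candidates) == 1:
        return candidates[0]
    # Discourse model: among the candidates, prefer the most recent mention.
    for ident in reversed(history):
        for o in candidates:
            if o.ident == ident:
                return o
    return None  # no match, or still ambiguous: the caller should ask the user
```

When the description alone is ambiguous ("the chair" with two chairs in view), the sketch falls back on the discourse history, and when that does not help either it returns `None` so that a clarification dialogue can take over.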
Robust Interaction • When errors don’t matter • The user can view the results and correct them by direct manipulation. • Safety-critical applications • Confirm the user's command. • Clarify incomplete or ambiguous commands.
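The choice between executing, confirming, and clarifying can be written as a small decision function. The Python below is a minimal sketch under stated assumptions, not the system's dialogue manager; the slot table `REQUIRED_SLOTS`, the command dictionary layout, and the returned move names are hypothetical.

```python
# Hypothetical table of the slots each action needs before it can run.
REQUIRED_SLOTS = {"paint": ["object", "color"], "fetch": ["object"]}


def next_move(command: dict, safety_critical: bool, confirmed: bool = False) -> str:
    """Decide the next dialogue move for a parsed command."""
    action = command.get("action")
    if action not in REQUIRED_SLOTS:
        return "clarify:action"  # the action itself was not understood
    missing = [slot for slot in REQUIRED_SLOTS[action] if slot not in command]
    if missing:
        return f"clarify:{missing[0]}"  # incomplete command: ask for the gap
    if safety_critical and not confirmed:
        return "confirm"  # errors matter: check with the user first
    return "execute"  # errors are cheap: act, and let the user correct later
```

For example, "paint the chair" without a color yields a clarification request, while a complete command is executed at once unless the application is marked safety-critical.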
Future Work • An agent's behavior should relate to its previous actions. • Add mental components to the agents. • Address agents through aura-driven interaction. • Evaluate the system in a realistic scenario, e.g. a virtual travel agency.
Conclusions • Adding a speech interface to a VR system. • Using constrained SR to achieve high accuracy. • Developing an appropriate interaction metaphor. • The agents modeled in this system provide specific functions in the virtual world.
Paper Source • McGlashan, S. Speech Interfaces to Virtual Reality. In Proceedings of the Second Conference on the Military Applications of Synthetic Environments and Virtual Reality, Stockholm, Sweden, 1995.