
AdaptO Adaptive Multimodal Output








  1. AdaptO: Adaptive Multimodal Output University of Aveiro, Dep. Electronics Telecom & Informatics, IEETA António Teixeira, Carlos Pereira, Miguel Oliveira e Silva, Osvaldo Pacheco, António Neves, José Casimiro

  2. Outline Background Motivation Living Usability Lab Project A Tele-Rehabilitation Service for Elderly People Multimodality Adaptive Multimodal Output Adaptation Scenarios & Demo Conclusions & Future Work

  3. Motivation The present diversity of environments, systems and user profiles leads to a need for contextualization of the interaction. Users have individual differences due to anatomical and physiological factors (e.g., gender, vision, hearing and mobility). Efficient multimodal interfaces should also be able to take users' requirements and needs into account.

  4. Living Usability Lab www.livinglab.pt LUL is a Portuguese industry-academia collaborative R&D project, active in the field of live usability testing, focusing on the development of technologies and services to support healthy, productive and active citizens. It adopts the principles of universal design and natural user interfaces (speech, gesture) making use of the benefits of next generation networks and distributed computing.

  5. LUL Scenario Demographic ageing is probably the greatest achievement of mankind. With suitable natural interfaces and the possibilities offered by next generation networks, the introduction of technological solutions can facilitate the daily life of the elderly, fighting isolation and exclusion and increasing their pro-activity, work capacity and autonomy. The envisaged services are: multimedia information access and exchange of personal data; tele-health and automatic medication delivery; support of daily activities and community, social and civic life; and automatic management of the environment to improve both quality of life and security.

  6. Tele-Rehabilitation “You must again be the director of your own life! You must train and rehearse to again be able to help yourself and gain independent living”. From: AAL Forum 2010 Track R5 (Rehabilitation, Training, Assistive Technologies)

  7. Advantages and Disadvantages Tele-rehabilitation has some advantages. Particularly relevant for older adults are: availability of therapists, rehabilitation at home, reduced cost, and reduced isolation. Some disadvantages: equipment cost, network bandwidth, technical expertise, safety at home, sterilization for redeployment, efficacy studies, and psychological factors.

  8. Reducing disadvantages Next generation networks will certainly decrease the problems of network bandwidth; the need for technical expertise can be addressed by easy-to-use interaction; costs will certainly drop as the number of users increases. …

  9. Vision and hearing in older adults Age-related eye problems are a major cause of vision loss or distortion in people over 40. Presbycusis is age-related hearing loss; it becomes more common as people get older. People with this kind of hearing loss may have a hard time hearing what others say, or may be unable to tolerate loud sounds. The degree of hearing loss varies from person to person. Another common complaint is difficulty in understanding rapid speech.

  10. LUL Telerehabilitation Service with Multimodal Interaction Service Description

  11. Objective: New Generation of Remote Rehabilitation Services for the Elderly Combining the needs of the elderly to have professionally monitored exercises without leaving their homes with other factors, such as the availability and costs of qualified health professionals, a Tele-Rehabilitation Service was considered as one of the priority new services to develop and test in the scenario of our Living Usability Lab.

  12. Description (What) Remote and supervised exercise sessions at home or in community centres, for maintaining health and preventing illness. Sessions are carried out concurrently at several sites. A health professional supervises everything from the training centre/hospital, including the biosensor signals captured remotely and the images from the (multiple) cameras.

  13. Global view of the prototype - Three main blocks, running on three different networked computers; - The application on the elderly user's side is subdivided in two: • one part, at a server, handling the more performance-demanding operations; • the other on the home TV/personal PC.

  14. Technological description (How) The main user interface is a large computer monitor (acting as a large-screen TV) with speakers and video cameras. In addition, it should be possible to use a set of biosensors. The sensors gather vital signals from the patient and send them to the health professional.

  15. User and Context Services A user model service is also provided in order to register and fetch specific user-related capabilities and preferences. The Context/Environment module is responsible for registering all the relevant environment conditions.

  16. Some HCI requirements Output characteristics, such as the volume of the synthesized speech, must adjust themselves according to the user and the environment. Speech rate must be adapted to the listener and the listening conditions. Redundancy of modalities must be used in order to increase the chances of message delivery. Several output modalities must be used to make the system usable by speech- and hearing-impaired persons. Output modalities must be capable, based on the environment and user, of deciding to activate/deactivate themselves.

  17. Multimodality

  18. What is Multimodality? A modality, or, more explicitly, a modality of information representation, is a way of representing information in some medium. By definition, a multimodal interactive system has both input and output, and uses at least two different modalities for input and/or output. It allows an integrated use of various forms of interaction simultaneously.

  19. Generic Architecture for Multimodality Fusion: its goal is to extract/combine meaning from a set of input modalities and pass it to the dialog manager. Interaction Manager: coordinates all of the interaction between the user and the system. It analyzes the user's intentions, using the fusion module's information, past interactions or context, and decides what to do next and what to answer to the user, while communicating with other mechanisms such as the application, databases or services. Fission: its main objective is to transmit the message, obtained from the dialog manager, to the user.

  20. Advantages of Multimodality The current migration from WIMP interfaces towards multimodal interaction has many potential benefits for seniors. The use of modalities such as speech or gestures allows developers to widen the spectrum of possible users for their systems, allowing for people with no IT knowledge or with disabilities to also interact with the systems.

  21. Less research on output According to a study by (Dumas et al., 2008), “less research has been done on multimodal fission than on fusion”. Most applications use few output modalities and, consequently, employ straightforward fission mechanisms.

  22. Related work The WWHT model by Rousseau et al. divides the life cycle of a multimodal presentation into four steps: 1. What is the information to present? 2. Which modality(ies) should be used? 3. How to present the information using the modality(ies) selected? 4. Then, how to deal with the output results? The Cost Model by Coetzee et al. is a mathematical tool that takes into consideration the user's profile and preferences. The user profile is defined in terms of abilities rather than disabilities.

  23. Problems with Common Approach The WWHT architectural scheme implicitly assumes that input and output devices are simple dummy devices, responsible only for sending input information to the system and for receiving output messages already adapted to the context and user. Fusion and fission coordination services are required to be very knowledgeable of all the available input and output devices, making them potentially very complex and making it more difficult to scale and extend our applications with new input and output devices.

  24. AdaptO Adaptive Multimodal Output

  25. Architecture Overview

  26. Agent-Oriented Implementation A natural choice to achieve more autonomous and intelligent behavior in the output devices is to make them JADE agents. JADE is a mature and quite stable technology, able to provide a solution for a distributed heterogeneous system, supported on a standard communication protocol (FIPA-ACL).

  27. Main Modules Agents were also used to implement all services in the architecture: the input devices; the context, user and history engines; and the fusion, fission and dialog manager services. Communication is done through an event-based scheme, used to simplify and abstract away the knowledge required for the system to operate.
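A minimal sketch of such an event-based scheme, in Python, standing in for the prototype's JADE/FIPA-ACL messaging (all names here are illustrative, not the prototype's API): agents subscribe to event types and are notified without knowing which agent produced the event.

```python
from collections import defaultdict

class EventBus:
    """Illustrative event-based scheme: publishers and subscribers are
    decoupled; neither needs to know the other's identity."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """Register a handler (e.g. an output agent) for an event type."""
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Deliver an event (e.g. from a monitoring agent) to all handlers."""
        for handler in self._subscribers[event_type]:
            handler(payload)

bus = EventBus()
received = []
bus.subscribe("noise_level", received.append)  # e.g. the speech synthesizer
bus.publish("noise_level", {"db": 72})         # e.g. the noise-monitoring agent
```

The point of the pattern, as in the prototype, is that adding a new input or output agent only requires a new subscription, not changes to the other agents.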

  28. Context/Environment service Responsible for registering all the relevant environment conditions. Three environment-monitoring agents are available for now: noise; luminosity; distance of the user to the output devices.

  29. User Model A user model service is also provided, in order to register and fetch specific user-related capabilities and preferences. Examples of capabilities are vision and hearing acuity and mobility. An example preference: the personal preference for receiving information visually.

  30. Speech I/O The text-to-speech and speech-to-text European Portuguese capabilities of the system are made available by a speech recognizer and a synthesizer, implemented as separate agents using Microsoft's Speech Platform.

  31. AdaptO

  32. AdaptO mechanisms Two important mechanisms: (1) an output modality is capable of deciding whether, in the current environment conditions and taking into consideration the user's capabilities, it is in a position to be active and fulfil the request. Example: if the user is hearing-impaired or the noise level is too high, the speech synthesizer deactivates itself. (2) changing some of the message properties based on contextual and user information: the text synthesizer varies, using simple heuristics, the font size as a function of the user's vision capacity, environment lighting conditions and distance of the user to the screen; the speech synthesizer is capable of varying both volume and speech rate.
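The first mechanism can be sketched as follows. This is a hypothetical heuristic for the speech-synthesizer example: the function name and the 70 dB threshold are illustrative assumptions, not values from the prototype.

```python
def speech_synth_can_fulfil(noise_db, user_hearing_impaired,
                            noise_threshold_db=70.0):
    """Decide whether the speech synthesizer is in a position to be
    active, given the current context and user-model information.
    (Threshold is an illustrative value.)"""
    if user_hearing_impaired:
        return False  # audio output would not reach the user
    if noise_db > noise_threshold_db:
        return False  # environment too noisy for intelligible speech
    return True
```

A modality that returns False here would, as described above, deactivate itself by notifying the modality registry, leaving the fission agent to route the message through the remaining modalities.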

  33. AdaptO Characteristics An output agent may need to know about aspects such as the noise level of the environment, or the user's distance to the speakers. It may register itself with the context agent to receive this type of information, decoupling itself from the specific input agents able to extract it. It may also register itself with the user model service to become aware of possible hearing or comprehension problems of the user. All available output agents register themselves with the fission agent for the output message types they are able to transmit, thus making the fission agent knowledgeable of the available (abstract) output agents and hence able to ensure fault tolerance.

  34. Synthesizer Adaptation Presently, and as proof of concept, the text synthesizer varies, using simple heuristics, the font size as a function of the user's visual capacity, environment lighting conditions and distance of the user to the screen. In general, the process of adapting a synthesizer parameter consists of 3 steps: calculation of individual gains/multiplication factors, kᵢ, based on the factors chosen to affect the parameter; combination of the gains to obtain a single gain, K = ∏ᵢ kᵢ; transformation of K into the range accepted by the renderer/device.
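The three steps above can be sketched directly. The combination step (K = ∏ᵢ kᵢ) and the clamping into the device range follow the slide; the individual heuristics k_vision, k_distance and k_lighting are hypothetical stand-ins for the prototype's simple heuristics, with made-up constants.

```python
def adapt_parameter(base, factors, lo, hi):
    """Three-step adaptation of a renderer parameter:
    step 1's individual gains arrive in `factors`; step 2 combines
    them into a single gain K = prod(k_i); step 3 maps the result
    into the range [lo, hi] accepted by the renderer/device."""
    K = 1.0
    for k in factors:
        K *= k                      # step 2: K = product of the k_i
    return max(lo, min(hi, base * K))  # step 3: clamp to the device range

# Step 1 (hypothetical heuristics for the font-size example):
def k_vision(acuity):       # acuity in (0, 1]; worse vision -> larger text
    return 1.0 / max(acuity, 0.25)

def k_distance(meters):     # farther from the screen -> larger text
    return max(meters / 2.0, 0.5)

def k_lighting(lux_ratio):  # dim room -> slightly larger text
    return 1.2 if lux_ratio < 0.5 else 1.0

# 16 pt base font, user at 3 m with 0.8 acuity in a dim room:
size = adapt_parameter(base=16,
                       factors=[k_vision(0.8), k_distance(3.0), k_lighting(0.3)],
                       lo=10, hi=72)
```

With these illustrative gains (1.25 × 1.5 × 1.2 = 2.25), the 16 pt base becomes 36 pt; the same `adapt_parameter` skeleton would serve the speech synthesizer's volume and rate.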

  35. Scenario 1 Choice of one synthesizer based on the user’s preferences

  36. Scenario 1 The system intends to output a message to the user, and the user is close to the screen. The system reads the modality registry and gets the information that both the text and the speech synthesizer are ready to output the message. By consulting the user model, the system knows that the user prefers to be notified via text messages. The message is transmitted via text.
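The selection logic of this scenario might be sketched like this (the registry and preference structures are illustrative, not the prototype's API):

```python
def choose_modality(registry, preferences):
    """Pick, among the synthesizers currently marked ready in the
    modality registry, the one ranked highest in the user model's
    ordered preferences; fall back to any ready modality."""
    ready = [m for m, online in registry.items() if online]
    for preferred in preferences:
        if preferred in ready:
            return preferred
    return ready[0] if ready else None

# Scenario 1: both synthesizers ready, user prefers text -> text is chosen.
choice = choose_modality({"text": True, "speech": True}, ["text", "speech"])
```

The same function also covers Scenario 2: once the text synthesizer marks itself offline in the registry, speech becomes the only candidate and is used.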

  37. Scenario 2 Synthesizer becomes unavailable due to context

  38. Scenario 2 The system needs to transmit another message to the user. Assume that, initially, both synthesizers were available and registered in the modality registry. It was detected that the user went beyond the text synthesizer's range. Since the text synthesizer depends on distance parameters, the context model alerts it to this fact. The text synthesizer recalculates its parameters, sees that it cannot output messages in these conditions and, as such, notifies the modality registry that it is offline. The speech synthesizer is the only one available and is used.

  39. Scenario 3 Synthesizer adjusts its parameters due to context

  40. Scenario 3 Another message is available to be transmitted to the user. The user is once again within range of the text output but, this time, a high level of background noise is present in the room. The text synthesizer is online. Following the same pattern as in the previous scenario, the speech synthesizer disconnects itself. Since the distance changed, the font size is also recalculated proportionately. As the user is in range, the system outputs the message.

  41. Small demo …

  42. Conclusions The focus on the communication between the system and the user – multimodal output – is the main novelty of our architecture and prototype. AdaptO gives modalities independence and self-adaptability to the user and the context of use (e.g., the environment). Systems in this area of application incorporate various modalities to communicate with the user (such as voice, text or images), but those modalities are completely devoid of any autonomy. The design choice has several advantages, for example: simplification of coordination in the fission algorithm, which is no longer required to know everything about all the available output devices.

  43. Future Work Ongoing and future work includes: refining adaptation heuristics; use of more advanced user models, possibly making use of ontologies such as GUMO (Heckmann et al., 2005); tests with elderly users (the target for our work); gather and learn from user related information (preferences and history of usage); creating new output agents such as 3D dynamic graphics and avatars.

  44. Thank you. This work is funded by COMPETE and the European Union (FEDER), under QREN, within the project Living Usability Lab for Next Generation Networks (http://www.livinglab.pt/).
