This document outlines the development of evaluation protocols for the fixed and mobile platforms, presented in Torino on March 9-10, 2006. It covers planning, the required test material, the evaluation criteria, and the methodology for testing vocal interaction systems. The protocols specify the environmental conditions, the grammar and commands, and the performance measures to be used in the evaluations. It also stresses that both objective and subjective assessment methods are needed to determine system performance under simulated real-world conditions.
Development of protocols: WP4 – T4.2, Torino, March 9th-10th 2006
Presentation plan
• Planning and partners
• Definition
• Test material: what is needed for the evaluation test
• Evaluation criteria
• To do: define the protocols for both platforms
Calendar (timeline figure, 2005 to 2007)
• T4.2: Development of protocols
• T4.3&4: Evaluation on the fixed and mobile platforms
• M4.1 / D4.2: Specification of evaluation protocols
• M3.2: Functional integration on both platforms completed
• Timeline markers: m18, m24, m27, m29; Dec. 05, June 06, Sept. 06, Nov. 06
• Partners
  • TRT (leader, 3 m*m)
  • Loquendo (2), TUC (2)
  • UGR (1), Loria (1), THAV (1)
Definition
• Evaluation protocol
  • Defines precisely what must be evaluated, in which environment, what criteria are used and how to proceed.
  • e.g. wine tasting protocols
• “Define the measures that will be applied during experiments in order to assess the performances of the vocal interaction system as well on a quantitative basis or on a more context dependent, qualitative basis.”
• What
  • The performance of the Hiwire recognition systems
  • The integration quality on the fixed and mobile platforms
• How: see the following slides
Test material (1/2)
• Test grammar
  • One for each platform
  • Vocabulary
  • Number of commands
• Speech input
  • Live speakers
    • Who? (professional pilots, mechanics)
    • Type of microphone (close-talking / multi-mic array)
    • Simulation of real conditions (hangar noise added through loudspeakers)
  • Recorded speech
    • Hiwire database
    • Sampling rate / quantization
    • Mixed cockpit noise
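For the recorded-speech case, "mixed cockpit noise" can be read as adding a noise recording to clean speech at a chosen signal-to-noise ratio. The slides do not say how the mixing is done or which levels are used, so the following is only a minimal sketch under that assumption; the function name `mix_at_snr` and the power-ratio definition of SNR are illustrative choices, not the project's procedure.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the speech-to-noise power ratio is snr_db.

    speech, noise: lists of float samples at the same sampling rate
    (assumption: the noise recording is at least as long as the speech).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Choose gain so that p_speech / (gain**2 * p_noise) == 10 ** (snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

# Toy signals, purely illustrative.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixed = mix_at_snr(speech, noise, snr_db=10)
```

Calling the function once per target level would produce a set of test files from clean down to the noisiest condition, whatever that lower bound turns out to be.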
Test material (2/2)
• Location
  • A simulation room
    • PDA
    • Microphone + push-to-talk (PtoT)
  • A cockpit simulator
    • Graphical interface
    • Microphone + VAD
• Panel
  • Professional pilots, mechanics, … (both platforms)
  • Hiwire database (fixed platform)
• Scenario
  • A list of commands
  • Definition of the interaction (synthetic voice, vocal feedback)
Evaluation criteria (1/3)
• Objective measures
  • WAC, word accuracy [0-100] %
  • SAC, sentence accuracy [0-100] %
  • CAC, command accuracy [0-100] %
  • Response time [s]
    • Time between the end of speech and the system response
  • Task completion rate, TCR (+ timeout) [% of completed tasks]
• Measured by an analyzer plugged into the system
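The objective measures above could be computed along the following lines. This is a sketch, not the project's analyzer: it assumes word accuracy is derived from the word-level edit distance between reference and recognized sentences (clamped at 0 to match the stated [0-100] range), that sentence accuracy is exact string match, and that a task counts as completed only if it finished within the timeout. All function names and the example commands are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]

def word_accuracy(refs, hyps):
    """WAC in percent: 100 * (N - errors) / N, clamped at 0."""
    n_words = sum(len(r.split()) for r in refs)
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return max(0.0, 100.0 * (n_words - errors) / n_words)

def sentence_accuracy(refs, hyps):
    """SAC in percent: share of sentences recognized exactly."""
    return 100.0 * sum(r == h for r, h in zip(refs, hyps)) / len(refs)

def response_time(speech_end_s, system_response_s):
    """Seconds between the end of speech and the system response."""
    return system_response_s - speech_end_s

def task_completion_rate(durations_s, timeout_s):
    """TCR in percent; a duration of None marks an abandoned task."""
    done = sum(d is not None and d <= timeout_s for d in durations_s)
    return 100.0 * done / len(durations_s)

# Hypothetical commands, for illustration only.
refs = ["set heading two seven zero", "gear down"]
hyps = ["set heading two seven zero", "gear up"]
wac = word_accuracy(refs, hyps)
sac = sentence_accuracy(refs, hyps)
tcr = task_completion_rate([3.0, None, 12.5, 7.0], timeout_s=10.0)
```

Command accuracy (CAC) could reuse `sentence_accuracy` on the interpreted command labels rather than on the word strings, assuming the grammar maps several wordings to the same command.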
Evaluation criteria (2/3)
• Subjective measures
  • Usability
    • Learning time* [s]
    • Memorisation effort* [1-5]
    • Ease of use* [1-5]
  • Workload
    • Number of added tasks correctly achieved [#]
  • Naturalness of the interaction [1-5]
  • Acceptance level [1-5]
• A form to fill in at the end of the test session, with subjective scales
• Sensors
  • Heart rate
  • EEG
  • Eye movement
Evaluation criteria (3/3)
• Results analysis
  • Gathering objective data
  • Transforming subjective data into a numerical form
    • Subjective scales
  • Comparison with Wizard of Oz (WoOz)
  • Comparison with non-vocal text input
  • Statistical features
    • Average, standard deviation
  • Classification
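Turning questionnaire answers into numbers and summarizing them might look like the sketch below. The five answer labels and their mapping to the [1-5] scale are hypothetical, since the slides only give the numeric range; the sample standard deviation is used, which is one of several reasonable choices.

```python
from statistics import mean, stdev

# Hypothetical wording of the five-point scale; only the 1-5 range is from the slides.
SCALE = {"very low": 1, "low": 2, "medium": 3, "high": 4, "very high": 5}

def summarize(answers):
    """Map scale labels to scores and return count, average and standard deviation."""
    scores = [SCALE[a] for a in answers]
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return {"n": len(scores), "mean": mean(scores), "stdev": spread}

# Example: four participants rating the naturalness of the interaction.
summary = summarize(["high", "medium", "very high", "high"])
```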
Summary: List of the protocol definition features
• Fixed platform
  • Material
    • Grammar
      • THAV grammar (provided at the end of April)
    • Speech input
      • Colleagues
      • ~20 non-native speakers (accent ranging from bad to good)
    • Location
      • The THAV cockpit simulator
      • Multi-speaker noise diffusion system
      • Multi-microphone (MM) array
    • A test scenario
      • Depends on the grammar
• Mobile platform
  • Material
    • Grammar
      • Extended version
    • Panel / the users
      • Colleagues
      • 10 to 20
    • Location
      • An equipped room with noise diffusion
      • Factory noise, hangar noise (ask Airbus…)
      • Different levels (from clean to ? dB, at the microphone capsule level)
    • A test scenario
      • Aircraft maintenance
Summary: List of the protocol definition features
• Fixed platform
  • Criteria
    • Objective measures
      • SAC (average and statistics across speakers)
      • Response time
    • Subjective measures
      • … no pilot
    • Comparison with the Hiwire baseline
  • Results analysis
    • Statistics across speakers
• Mobile platform
  • Criteria
    • Objective measures
      • Response time
      • SAC
      • TCR
    • Subjective measures
      • Ease of use
      • Naturalness of interaction
  • Results analysis
    • Comparison with a text input / pen input system
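For the fixed platform, "SAC (average and statistics across speakers)" suggests computing sentence accuracy per speaker first and then summarizing over the panel. A minimal sketch, assuming each trial is logged as a (speaker, correct) pair; the log format, speaker identifiers and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def sac_per_speaker(trials):
    """trials: iterable of (speaker_id, correct) pairs; returns SAC in percent per speaker."""
    counts = defaultdict(lambda: [0, 0])  # speaker -> [correct sentences, total sentences]
    for speaker, correct in trials:
        counts[speaker][0] += bool(correct)
        counts[speaker][1] += 1
    return {s: 100.0 * ok / total for s, (ok, total) in counts.items()}

def across_speakers(per_speaker):
    """Average and sample standard deviation of the per-speaker SAC values."""
    values = list(per_speaker.values())
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)

# Hypothetical log: speaker s1 utters four sentences, s2 two.
trials = [("s1", True), ("s1", True), ("s1", False), ("s1", True),
          ("s2", True), ("s2", False)]
per_speaker = sac_per_speaker(trials)
avg, sd = across_speakers(per_speaker)
```

Averaging per-speaker scores gives each speaker the same weight regardless of how many sentences they uttered, which is useful when the panel mixes stronger and weaker accents. Running the same computation on the baseline system's output would give the figures for the comparison.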