Development of Evaluation Protocols for Fixed and Mobile Platforms in Torino, March 2006

Development of protocols: WP4 – T4.2
Torino, March 9th-10th, 2006

Presentation plan
• Planning and partners
• Definition
• Test material: what is needed for the evaluation test
• Evaluation criteria
• To do: define the protocols for both platforms

Calendar
• [Timeline chart, 2005-2007; project months m18, m24, m27, m29; dates Dec. 05, June 06, Sept. 06, Nov. 06]
• T4.2: Development of protocols
• T4.3 & 4: Evaluation on the fixed and mobile platforms
• M4.1, D4.2: Specification of evaluation protocols
• M3.2: Functional integration on both platforms completed
• Partners
  • TRT (leader, 3 m*m)
  • Loquendo (2), TUC (2)
  • UGR (1), Loria (1), THAV (1)

Definition
• Evaluation protocol
  • Defines precisely what must be evaluated, in which environment, what criteria are used and how to proceed.
  • e.g. wine tasting protocols
• “Define the measures that will be applied during experiments in order to assess the performances of the vocal interaction system as well on a quantitative basis or on a more context dependent, qualitative basis.”
• What
  • The performance of the Hiwire recognition systems
  • The integration quality on the fixed and mobile platforms
• How >>>

Test material (1/2)
• Test grammar
  • One for each platform
  • Vocabulary
  • Number of commands
• Speech input
  • Live speakers
    • Who? (professional pilots, mechanics)
    • Type of microphone (close-talking / multi-mic array)
    • Simulation of real conditions (hangar noise added through loudspeakers)
  • Recorded speech
    • Hiwire database
    • Sampling rate / quantization
    • Mixed cockpit noise

Test material (2/2)
• Location
  • A simulation room
    • PDA
    • Microphone + PtoT (push-to-talk)
  • A cockpit simulator
    • Graphical interface
    • Microphone + VAD
• Panel
  • Professional pilots, mechanics, … (both platforms)
  • Hiwire database (fixed platform)
• Scenario
  • A list of commands
  • Definition of the interaction (synthetic voice, vocal feedback)

Evaluation criteria (1/3)
• Objective measures
  • WAC, word accuracy [0-100] %
  • SAC, sentence accuracy [0-100] %
  • CAC, command accuracy [0-100] %
  • Response time [s]
    • Time between the end of speech and the system response
  • Task completion rate, TCR (+ timeout): % of completed tasks
• An analyzer plugged inside the system
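
The objective measures listed above could be computed along the following lines. This is only a minimal sketch under stated assumptions: the slides do not define the formulas, so word accuracy is taken here as the usual (N - S - D - I) / N from a Levenshtein alignment, sentence accuracy as the share of exactly matching sentences, and TCR as the share of tasks finished within the timeout. All function names and the example commands are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def word_errors(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions (Levenshtein)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion of a reference word
                           cur[j - 1] + 1,            # insertion of a hypothesis word
                           prev[j - 1] + (r != h)))   # match or substitution
        prev = cur
    return prev[-1]


def word_accuracy(pairs: List[Tuple[str, str]]) -> float:
    """WAC in %, assumed to be (N - errors) / N over all reference words."""
    n_ref = sum(len(ref.split()) for ref, _ in pairs)
    errors = sum(word_errors(ref.split(), hyp.split()) for ref, hyp in pairs)
    return 100.0 * (n_ref - errors) / n_ref


def sentence_accuracy(pairs: List[Tuple[str, str]]) -> float:
    """SAC in %: share of sentences recognized without any word error."""
    correct = sum(ref.split() == hyp.split() for ref, hyp in pairs)
    return 100.0 * correct / len(pairs)


def task_completion_rate(durations: List[Optional[float]], timeout_s: float) -> float:
    """TCR in %: a task counts as completed if it finished within the timeout.

    durations holds the time in seconds per task, or None if the task was abandoned.
    """
    done = sum(d is not None and d <= timeout_s for d in durations)
    return 100.0 * done / len(durations)


# Hypothetical (reference, recognized) command pairs, for illustration only.
pairs = [
    ("set heading two seven zero", "set heading two seven zero"),
    ("gear down", "gear up"),
    ("select radio one", "select radio"),
]
wac = word_accuracy(pairs)
sac = sentence_accuracy(pairs)
tcr = task_completion_rate([12.0, None, 45.0, 20.0], timeout_s=30.0)
```

With this definition, word accuracy can fall below 0 % when a hypothesis contains many insertions, so a protocol that reports it on a [0-100] scale would need to state how such cases are handled.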

Evaluation criteria (2/3)
• Subjective measures
  • Usability
    • Learning time* [s]
    • Memorisation effort* [1-5]
    • Ease of use* [1-5]
  • Workload
    • Number of added tasks correctly achieved [#]
  • Naturalness of the interaction [1-5]
  • Acceptance level [1-5]
• A form to fill in at the end of the test session, with subjective scales
• Sensors
  • Heart rate
  • EEG
  • Eye movement

Evaluation criteria (3/3)
• Results analysis
  • Gathering objective data
  • Transforming subjective data into a numerical form
    • Subjective scales
  • Comparison with WoOz (Wizard of Oz)
  • Comparison with non-vocal text input
  • Statistical features
    • Average, standard deviation
  • Classification
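
As an illustration of the analysis steps above, the sketch below maps a 1-5 subjective rating onto a 0-100 scale and summarizes a measure across speakers by its average and standard deviation. The linear rescaling, the use of the sample (n - 1) standard deviation, the function names and the example values are all assumptions; the slides do not specify any of them.

```python
from statistics import mean, stdev
from typing import Dict, List


def rescale(score: float, low: float = 1, high: float = 5) -> float:
    """Map a rating on [low, high] linearly onto [0, 100] (assumed mapping)."""
    return 100.0 * (score - low) / (high - low)


def summarize(values: List[float]) -> Dict[str, float]:
    """Average and sample standard deviation of one measure across speakers."""
    return {
        "mean": mean(values),
        # stdev needs at least two values; report 0.0 for a single speaker.
        "std": stdev(values) if len(values) > 1 else 0.0,
    }


# Hypothetical per-speaker values, for illustration only.
ease_of_use = summarize([rescale(s) for s in [3, 4, 5]])
sac_per_speaker = summarize([80.0, 90.0, 100.0])
```

Putting subjective scores on the same 0-100 range as the accuracy measures makes the two kinds of results easier to present side by side, although they remain different quantities.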

Summary: List of the protocol definition features
• Fixed platform
  • Material
    • Grammar
      • THAV grammar (provided at the end of April)
    • Speech input
      • Colleagues
      • ~20 non-native speakers (bad > good accent)
    • Location
      • The THAV cockpit simulator
      • Multi-speaker noise diffusion system
      • MM array
    • A test scenario
      • Depends on the grammar
• Mobile platform
  • Material
    • Grammar
      • Extended version
    • Panel / the users
      • Colleagues
      • 10 to 20
    • Location
      • An equipped room, noise diffusion
      • Factory noise → hangar noise (ask Airbus…)
      • Different levels (from clean to ? dB, at the microphone capsule level)
    • A test scenario
      • The maintenance of aircraft

Summary: List of the protocol definition features
• Fixed platform
  • Criteria
    • Objective measures
      • SAC (average and statistics across speakers)
      • Response time
    • Subjective measures
      • … no pilot
    • Comparison with the Hiwire baseline
  • Results analysis
    • Statistics across speakers
• Mobile platform
  • Criteria
    • Objective measures
      • Response time
      • SAC
      • TCR
    • Subjective measures
      • Ease of use
      • Naturalness of interaction
  • Results analysis
    • Comparison with text input / pen input system
