INVOCA Project Speech Interfaces for Air Traffic Control Tasks

Javier Macías-Guarasa, Speech Technology Group (GTH), Department of Electronic Engineering, E.T.S.I. Telecomunicación (ETSIT), Universidad Politécnica de Madrid (UPM).


Presentation Transcript


  1. INVOCA Project: Speech Interfaces for Air Traffic Control Tasks • Javier Macías-Guarasa, Speech Technology Group (GTH), Department of Electronic Engineering, E.T.S.I. Telecomunicación (ETSIT), Universidad Politécnica de Madrid (UPM)

  2. Overview • Introduction • Tasks (applications, prototypes) • Data collection • System architecture & technical details • Evaluation • Demo • Conclusions

  3. Introduction (I) • INVOCA: Speech Interfaces for Air Traffic Control (INterfaces VOcales para Control de tráfico Aéreo) • Project proposal: • AENA (Spanish Airports and Air Navigation) • Speech Technology Group, ETSIT-UPM • Exploratory project → technology evaluation: • Analyze the state of the art of speech recognition technology and its applications to air traffic control tasks • Feasibility study: can it be integrated into SACTA? (SACTA = Advanced System for Air Traffic Control)

  4. Introduction (II) • SACTA

  5. Introduction (III) • People @ GTH: José M. Pardo, Javier Ferreiros, José Colás, Fernando Fernández, Valentín Sama, Ricardo de Córdoba, Juan M. Montero, Javier Macías, José D. Romeral • More people @ GTH: Sergio Díaz, María J. Pozuelo, Gregoire Prime, Jordi Safont, Eduardo Campos, et al. • AENA staff: Germán González, Myriam Santamaría

  6. Tasks (I) • Identifying suitable target applications (tasks) within the SACTA environment: • Air traffic controllers (ATCs) in control towers in Barajas (Madrid airport) • ‘Feasible’ tasks • ‘Useful’ tasks • Outcome: • Isolated word recognition → IF1 • Spontaneous speech recognition & understanding → IF2

  7. Tasks (II): Speech Interface IF1 (I) • Target: • Air Traffic Controllers (ATCs) in control towers • Must keep an eye on traffic around the airport • Can command-and-control (C&C) speech interfaces help them handle complex control systems? • Application: • Hard to identify in current SACTA status • Instead: replace the FOCUCS system (tactile display) used to control the main display visualization

  8. Tasks (III): Speech Interface IF1 (II)

  9. Tasks (IV): Speech Interface IF1 (III) • Prototype architecture

  10. Tasks (V): Speech Interface IF2 (I) • Target: • Air Traffic Controllers (ATCs) in control towers • ATCs provide aircraft pilots with instructions regarding flight level, transponder code, etc. • Some data must/should be entered in the computer system • Application: • Detect key concepts (slots) and associated data values in [controller → pilot] radio communication

  11. Tasks (VI): Speech Interface IF2 (II) • IF2 subtasks: five, one for every control position in Barajas Airport: • Arrivals • Authorizations • North tower • South tower • Take offs • IF1 & IF2 both handle Spanish and ‘English’ (spoken by Spaniards!)

  12. Data Collection (I): Standard databases • Spanish SpeechDat (M & FDB) • Telephone speech… but ATC radio channels are band limited! • ~4000 speakers, isolated & continuous read speech • Digits, isolated words, digit strings, phonetically rich sentences, etc. (40 items per speaker) • But… not related to the task • Need more data! • For adaptation • For full retraining?

  13. Data Collection (II): Speech Interface IF1 • Read isolated words in the task domain: • 16 kHz, 16-bit linear (downsampled to 8 kHz) • 30 speakers (15 male & 15 female) • 5 repetitions of every command* in the FOCUCS task vocabulary (228 Spanish / 176 English)

  14. Data Collection (III): Speech Interface IF2 (I) • Real recordings: controller → pilot • 16 kHz, 16-bit linear, downsampled to 8 kHz • Stereo recording: speech + PTT signal • 33 h total, ~6 s/sentence, 16 words/sentence

  15. Data Collection (IV): Speech Interface IF2 (II) • Process: • Recording chunks of 15 minutes, continuously • Segmenting into sentences: PTT → easy • Average ‘real speech’ content = 16.4% • Transcribing • Hard, especially in English • Label pauses, respiration and aspiration, tongue, unidentified noise, click, cough • Also concept labeling
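
Segmentation is described as easy because the second stereo channel carries the push-to-talk (PTT) signal. The following is a minimal sketch of that idea, not the project's tool: it assumes the PTT channel is a sequence of samples whose absolute value exceeds a threshold while the key is pressed. The function names, `threshold` and `min_len` are hypothetical.

```python
def ptt_segments(ptt, threshold=0.5, min_len=1):
    """Return (start, end) sample index pairs where the PTT channel is active.

    `end` is exclusive. `threshold` and `min_len` are illustrative values,
    not settings taken from the project.
    """
    segments = []
    start = None
    for i, value in enumerate(ptt):
        active = abs(value) >= threshold
        if active and start is None:
            start = i                          # PTT pressed: a segment opens
        elif not active and start is not None:
            if i - start >= min_len:
                segments.append((start, i))    # PTT released: the segment closes
            start = None
    if start is not None and len(ptt) - start >= min_len:
        segments.append((start, len(ptt)))     # recording ended with PTT pressed
    return segments


def speech_fraction(segments, total_samples):
    """Fraction of the recording covered by PTT-active segments."""
    return sum(end - start for start, end in segments) / total_samples
```

A ratio like `speech_fraction` is presumably the kind of figure the slide reports as the average ‘real speech’ content, although the slide does not say how it was measured.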

  16. Data Collection (V): Speech Interface IF2 (III) • Samples of “Authorizations” sentences: • thai niner four three start up approved qnh one zero one eight !P clear eh !LP fiumicino via flight plan route !P eh nando !P one charlie departure squawk !P on one four two six • alitalia zero six nine roger start up approved and according slot one zero one eight and clear to !P milan malpensa airport via pinar one !P bravo departure squawk one four one six • !RUIDO ok we havent got it yet but the supervisor eh lets me give you start up clearance !ASP and we will give you the atc clearance we when we receive it so start up approved eh report your position again please

  17. Data Collection (VI): Speech Interface IF2 (IV) • Concept labeling sample: • olympic two four eight on stand eighty start up approved with qnh one zero one niner clear to destination athens via flight plan route nando two golf standard departure initial flight level one three zero on the squawk one four seven three • ====== UNDERSTANDING RESULT ====== • identifier=[olympic248] • startup_status=[START UP APPROVED] • destination=[athens] • exit_using=[nando2G] • transponder=[1473] • initial_flight_level=[130] • qnh=[1019] • parking=[stand80] • ======================================
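
The understanding result above is a flat frame of `name=[value]` slots. As an illustration only, here is a small sketch that reads such a frame into a dictionary. The regular expression and the `parse_frame` name are assumptions made for this example, not part of the INVOCA software.

```python
import re

# One slot looks like  name=[value] ; the value may contain spaces.
SLOT_PATTERN = re.compile(r"(\w+)=\[([^\]]*)\]")


def parse_frame(text):
    """Parse 'name=[value]' pairs into a dict mapping slot name to value."""
    return {name: value for name, value in SLOT_PATTERN.findall(text)}


result = ("identifier=[olympic248] startup_status=[START UP APPROVED] "
          "destination=[athens] exit_using=[nando2G] transponder=[1473] "
          "initial_flight_level=[130] qnh=[1019] parking=[stand80]")
frame = parse_frame(result)
```

With the frame as a dictionary, a value such as `frame["qnh"]` can be compared directly against a reference label when scoring concept accuracy.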

  18. Data Collection (VII): Speech Interface IF2 (V) • Samples of “Arrivals” sentences: • airfrance one five zero zero yes swissair six five zero vacating • klm seven zero one good morning continue approach runway three three as number two wind calm precedent traffic seven six seven four miles ahead

  19. Data Collection (VIII): Speech Interface IF2 (VI) • Samples of “Take offs” sentences: • nostrum eight six one five wind two eight zero one zero cleared take off runway three six left • speedbird four six five you are number four behind iberia airbus three twenty on sierra

  20. Data Collection (IX): Speech Interface IF2 (VII) • Samples of “North tower” sentences: • airnostrum eight seven two five continue via alfa behind iberia seven five seven via kilo tango forty i call you back hold short mike taxi way • airnostrum eight ou triple seven roger taxi via kilo mike holding three six left and please give way traffic mike delta spanair coming out via mike ou ten is now crossing juliett gate

  21. Data Collection (X): Speech Interface IF2 (VIII) • Samples of “South tower” sentences: • alitalia zero six niner are you able to enter mike between the airfrance traffic and the aireuropa your right side atp • sabena now proceed via in a taxi way to the left and wait for the follow me car

  22. System Architecture (I): Speech Interface IF1 • [Block diagram; labels:] Feature extraction (12 LPC-cepstrum + log energy, 13 Δ, 13 ΔΔ) • One Pass decoder with Spanish HMMs and Spanish dict. • One Pass decoder with English HMMs and English dict. • Recognized command • Command to UDP • Change in main display

  23. System Architecture (II): Speech Interface IF2 • [Block diagram; labels:] Feature extraction (12 LPC-cepstrum + log energy, 13 Δ, 13 ΔΔ) • One Pass + rescoring with Spanish HMMs, Spanish N-gram and Spanish dict. • One Pass + rescoring with English HMMs, English N-gram and English dict. • Language ID • Recognized sentence in a certain language • Preproc. • Tagger with tagged dict. • CD rules • Tag refiner • Task dependent understanding module (one per language) • Conceptual frame & data

  24. System Architecture (III) • Preprocessing & modeling: • 12 LPC cepstrum + logE + 13 Δ + 13 ΔΔ • CMN + CVN (utterance level) • CD continuous HMMs trained with HTK • Spanish: 1509 states, 8 mixtures per state • English: 1400 states, 8 mixtures per state • Multiple pronunciations in the dictionary! • Training database: Spanish SpeechDat • Further adaptation: task & speaker
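
The front end above stacks 13 static coefficients (12 LPC cepstra plus log energy) with their first and second time derivatives, then applies utterance-level cepstral mean and variance normalisation (CMN + CVN). The sketch below illustrates the bookkeeping in plain Python. It is an assumption-laden simplification: the derivative here is a simple two-point difference, whereas HTK uses a regression formula over a configurable window, and all function names are hypothetical.

```python
from statistics import fmean, pstdev


def deltas(frames):
    """Differences (c[t+1] - c[t-1]) / 2, with the edge frames repeated."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def add_dynamic_features(static):
    """Append delta and delta-delta to each static frame (13 -> 39 dims)."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


def cmvn(frames, eps=1e-8):
    """Utterance-level mean and variance normalisation, per dimension."""
    columns = list(zip(*frames))
    means = [fmean(col) for col in columns]
    stds = [max(pstdev(col), eps) for col in columns]  # avoid division by zero
    return [[(x - m) / s for x, m, s in zip(frame, means, stds)]
            for frame in frames]
```

Normalising per utterance removes a constant channel offset from the cepstra, which is one reason it is attractive when the training data (telephone speech) and the test data (radio channel) differ.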

  25. System Architecture (IV) • Search: First pass • One pass Beam search (for states and for last states) • Search space reduced to 18% w/o performance penalty • Bigram LM guided • Scores on demand • Non-speech models handling (regarding LM scoring) • Able to generate n-best output sentences • Search: Second pass: • Rescores first pass output (graph) with trigram • Task dependent tuned LM and IWP weights
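
Two ideas from this slide can be sketched compactly: beam pruning (drop hypotheses that fall too far below the current best) and second-pass rescoring with a language model weight and a per-word penalty. The code below is a toy illustration under stated assumptions. It rescores an n-best list rather than the word graph used in the project, and the names `beam_prune`, `rescore`, `lm_weight` and `word_penalty` are hypothetical.

```python
def beam_prune(hypotheses, beam_width):
    """Keep (hypothesis, log_score) pairs within `beam_width` of the best score."""
    best = max(score for _, score in hypotheses)
    return [(h, s) for h, s in hypotheses if s >= best - beam_width]


def rescore(nbest, lm_score, lm_weight, word_penalty):
    """Re-rank (words, acoustic_log_score) pairs with a second language model.

    total = acoustic + lm_weight * lm_score(words) + word_penalty * len(words)
    `lm_score` is any callable returning a log probability for a word list.
    """
    rescored = []
    for words, acoustic in nbest:
        total = acoustic + lm_weight * lm_score(words) + word_penalty * len(words)
        rescored.append((words, total))
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

The two weights are the kind of parameters the slide describes as tuned per task: a larger `lm_weight` trusts the language model more, and a more negative `word_penalty` discourages inserted words.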

  26. System Architecture (V) • Language ID: • Spanish speakers: • Great variability in ‘canonical’ pronunciation • Some words pronounced in ‘Spanish’ (e.g. bravo) • ATCs mix languages (to greet or say goodbye) • Initial effort using well known techniques (PPRLM, etc.) • Final system using LM score comparison!
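
The final language ID approach compares language model scores instead of using a dedicated identifier. One way to picture it, as a sketch only: each language's recognizer produces a hypothesis, each hypothesis is scored by that language's LM, and the better length-normalised score wins. The add-one smoothed bigram model, the length normalisation and all names below are assumptions for illustration; the slides do not describe how the scores were actually computed or compared.

```python
import math
from collections import Counter


class BigramLM:
    """Toy add-one smoothed bigram model over word lists."""

    def __init__(self, sentences):
        self.history = Counter()
        self.bigrams = Counter()
        vocab = {"</s>"}
        for words in sentences:
            tokens = ["<s>"] + words + ["</s>"]
            vocab.update(words)
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens[:-1], tokens[1:]))
        self.vocab_size = len(vocab) + 1   # +1 for an unknown-word class

    def log_prob(self, words):
        """Average log probability per predicted token."""
        tokens = ["<s>"] + words + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.history[prev] + self.vocab_size
            total += math.log(num / den)
        return total / (len(tokens) - 1)


def identify_language(hypotheses, lms):
    """Pick the language whose own hypothesis scores best under its own LM."""
    return max(hypotheses, key=lambda lang: lms[lang].log_prob(hypotheses[lang]))
```

For example, `identify_language({"es": spanish_words, "en": english_words}, lms)` returns the key of the better-scoring hypothesis. A scheme like this needs no extra models beyond those the recognizers already use.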

  27. System Architecture (VI) • Understanding module: • Tagger: several categories per word • Number preprocessing • Tags refiner • Understanding module • Understanding module architecture used in other tasks in our Group • Task dependent & time consuming

  28. Evaluation (I) • Multiple environments: • Off line, using recorded database (offline) • With people at GTH, predefined script (online) • With users (advanced ATC trainees): • Predefined script (online) • Predefined scenarios (free online) • Subjective evaluation • English & Spanish • Measuring: • Word accuracy rates (IF1 & IF2) • Concept accuracy rates (IF2)
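
The two measures listed above can be illustrated with a short sketch. Word accuracy is computed here in the usual way, as 1 minus the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words. The slot measure is a deliberately simple assumption: the fraction of reference slots whose value is reproduced exactly. The slides do not give the project's exact definition of concept accuracy.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions (word level)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def word_accuracy(ref, hyp):
    """1 - (S + D + I) / N, where N is the number of reference words."""
    return 1.0 - edit_distance(ref, hyp) / len(ref)


def slot_accuracy(ref_slots, hyp_slots):
    """Fraction of reference slots whose value matches exactly (assumed metric)."""
    correct = sum(1 for name, value in ref_slots.items()
                  if hyp_slots.get(name) == value)
    return correct / len(ref_slots)
```

Because insertions count as errors, word accuracy defined this way can drop below zero for very poor hypotheses.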

  29. Evaluation (II): Speech Interface IF1 (I) • Off line, main results:

  30. Evaluation (III): Speech Interface IF1 (II) • On line, predefined script (11 speakers): • Spanish: 50 commands (98 words/speaker) • English: 30 commands (60 words/speaker)

  31. Evaluation (IV): Speech Interface IF1 (III) • On line, predefined script (11 speakers): • Detailed error analysis

  32. Evaluation (V): Speech Interface IF1 (IV) • On line, real-task test (11 speakers, ATCs): • Form with different questions (subjective) • “The system understands what I say…” (scale 1 to 5) → AVG 4.0

  33. Evaluation (VI): Speech Interface IF1 (V) • On line, real-task test (11 speakers, ATCs): • Form with different questions (subjective) • “I would use this system instead of the current one…” (scale 1 to 5) → AVG 3.4

  34. Evaluation (VII): Speech Interface IF2 (I) • Training, adaptation & recognition issues: • Spanish authorizations task (preliminary experiments): • Full retraining is used (using only the Authorizations DB) • Rescoring improves results by only 4% relative (20% in read speech) → not used in final prototype

  35. Evaluation (VIII): Speech Interface IF2 (II) • Database & LM statistics: • Spanish: • English:

  36. Evaluation (IX): Speech Interface IF2 (III) • Off/on line*, word/concept recognition rates: • Spanish: • English: • * GTH online, 16 speakers: 10 sentences/speaker in Spanish & 6 sentences/speaker in English. Read speech! • * ATC online, 7 speakers: 10 sentences/speaker in Spanish & 6 sentences/speaker in English. Read speech!

  37. Evaluation (X): Speech Interface IF2 (IV) • Off/free on line*, word/concept recognition rates: • Spanish: • English: • * ATC free online, 7 speakers: scenario based, 10 sentences/speaker in Spanish & 6 sentences/speaker in English. 5 additional OOVs

  38. Evaluation (XI): Speech Interface IF2 (V) • Real-world*, real-time system working in tower: • Word/concept recognition rates: • Language ID rates: • * Real world: 205 sentences, 3433 reference words, 588 slots. 10 additional OOVs

  39. Evaluation (XII): Speech Interface IF2 (VI) • Cross task comparison (off line): • Spanish (average rate for all other tasks): • English (average rate for all other tasks):

  40. Demo • Start praying • Wrong microphone & channel • Wrong speaker! • IF1: • Using defined dictionary • Only Spanish, sorry • IF2: • Random sentences, Spanish & English • Will (try to) point out mistakes

  41. Conclusions (I) • Great fun! • Plenty of room for improvement: • Task dependent restrictions (existing frequencies & flight ids, airport layout data, etc.) • Concept refining (current set is very broad) • Rules development • Speaker/gender adaptation • More data!

  42. Conclusions (II) • ASR technology not ready for prime time! • Difficult task • We are talking about planes and people! • ‘Political’ issues • Other applications in this field: • Non critical tasks • Pseudo-pilots for ATC training? • Phraseology trainers • Indexing

  43. Questions?

  44. Evaluation: Speech Interface 1 IF2 • Database & LM statistics: • Spanish • English

  45. Evaluation: Speech Interface 1 IF2 • Off line, recognition rates: • Spanish • English

  46. Evaluation: Speech Interface 1 IF2 • Off line, understanding rates: • Spanish • English

  47. System Architecture: Language ID in IF2 • Preliminary experiments with PPRLM: • Need almost 5 seconds to achieve 96% • Bad performance in real task • Implemented system uses LM score comparison!

  48. System Architecture: Understanding example • Lufthansa four three four seven clearance correct on stand eight one next call one two one decimal seven bye • <lufthansa> -DATA_identifier- • <4> -single_digit- • <3> -single_digit- • <4> -single_digit- • <7> -single_digit- • <clearance> -ID_freq_change- • <correct> -DATA_correct- • <on> -garbage- -ID_freq_change- • <stand> -DATA_park- • <8> -single_digit- • <1> -single_digit- • <next> -garbage- • <call> -ID_standby- -ID_freq_change- • <1> -single_digit- • <2> -single_digit- • <1> -single_digit- • <decimal> -freq_decimal_point- • <7> -single_digit- • <bye> -goodbye- • <lufthansa> -DATA_identifier- • <4347> -single_digit- • <clearance> -ID_freq_change- • <correct> -DATA_correct- • <on> -garbage- -ID_freq_change- • <stand> -DATA_park- • <81> -single_digit- • <next> -garbage- • <call> -ID_standby- -ID_freq_change- • <121> -single_digit- • <decimal> -freq_decimal_point- • <7> -single_digit- • <bye> -goodbye-

  49. System Architecture: Understanding example • Lufthansa four three four seven clearance correct on stand eight one next call one two one decimal seven bye • <lufthansa> -DATA_identifier- • <4347> -single_digit- • <clearance> -ID_freq_change- • <correct> -DATA_correct- • <on> -garbage- -ID_freq_change- • <stand> -DATA_park- • <81> -single_digit- • <next> -garbage- • <call> -ID_standby- -ID_freq_change- • <121> -single_digit- • <decimal> -freq_decimal_point- • <7> -single_digit- • <bye> -goodbye- • <lufthansa4347> -SLOT_identifier- • <clearance> -ID_freq_change- • <correct> -DATA_correct- • <stand> -DATA_park- • <81> -single_digit- • <call> -ID_standby- -ID_freq_change- • <121.7> -SLOT_freq_change- • <bye> -goodbye- • <lufthansa4347> -SLOT_identifier- • <clearance> -ID_freq_change- • <correct> -DATA_correct- • <stand81> -SLOT_park_id- • <call> -ID_standby- -ID_freq_change- • <121.7> -SLOT_freq_change- • <bye> -goodbye- • ====== UNDERSTANDING RESULTS ====== • identifier=[lufthansa4347] • park_id=[stand81] • freq_change=[121.7] • ======================================
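
The example above proceeds in stages: digit words are merged into numbers, then context rules combine neighbouring tokens into slots. The sketch below reproduces that progression for this one sentence. It is a heavily simplified stand-in for the rule-based module: the three rules, the `airlines` parameter and the function names are hypothetical and cover only the slot types visible in the example.

```python
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8",
          "nine": "9", "niner": "9"}


def collapse_digits(words):
    """Map digit words to numerals and merge consecutive digits into one token."""
    tokens = []
    for word in words:
        digit = DIGITS.get(word)
        if digit is not None and tokens and tokens[-1].isdigit():
            tokens[-1] += digit            # extend the running number
        else:
            tokens.append(digit if digit is not None else word)
    return tokens


def fill_slots(tokens, airlines=("lufthansa",)):
    """Apply three toy context rules: identifier, park_id and freq_change."""
    slots = {}
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok in airlines and nxt.isdigit():
            slots["identifier"] = tok + nxt
        elif tok == "stand" and nxt.isdigit():
            slots["park_id"] = tok + nxt
        elif tok == "decimal" and i > 0 and tokens[i - 1].isdigit() and nxt.isdigit():
            slots["freq_change"] = f"{tokens[i - 1]}.{nxt}"
    return slots


sentence = ("lufthansa four three four seven clearance correct on stand "
            "eight one next call one two one decimal seven bye")
tokens = collapse_digits(sentence.split())
slots = fill_slots(tokens)
```

Here `slots` holds the same three values as the understanding result on the slide. A real rule set would also need the tag categories shown in the example (for instance to tell a frequency from a flight level), which this sketch ignores.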
