Verbmobil: Mobile Speech Translation System

Verbmobil: Multilingual Processing of Spontaneous Speech
Wolfgang Wahlster
German Research Center for Artificial Intelligence, DFKI GmbH
Stuhlsatzenhausweg 3, 66123 Saarbruecken, Germany
Phone: (+49 681) 302-5252/4162
Fax: (+49 681) 302-5341
E-mail: wahlster@dfki.de
WWW: http://www.dfki.de/~wahlster

Mobile Speech-to-Speech Translation of Spontaneous Dialogs
As the name Verbmobil suggests, the system supports verbal communication with foreign dialog partners in mobile situations:
1. face-to-face conversations
2. telecommunication

Mobile Speech-to-Speech Translation of Spontaneous Dialogs
Solution: a conference call. The Verbmobil Speech Translation Server is accessed by GSM mobile phones.

Verbmobil is a Multilingual System
It supports bidirectional translation between:
- German and English (American)
- German and Japanese
- German and Chinese (Mandarin)
Further notes on the slide:
- Siemens, Philips, FH Konstanz, 2 Chinese universities
- Final industrial demo at the end of 2000

Challenges for Language Engineering
Four dimensions, each listed in order of increasing complexity:
- Input Conditions: close-speaking microphone/headset, push-to-talk -> telephone, pause-based segmentation -> open microphone, GSM quality
- Naturalness: isolated words -> read continuous speech -> spontaneous speech
- Adaptability: speaker dependent -> speaker independent -> speaker adaptive
- Dialog Capabilities: monolog dictation -> information-seeking dialog -> multiparty negotiation
Verbmobil targets the most complex level of each dimension.

Context-Sensitive Speech-to-Speech Translation
Verbmobil Server examples:
- "Wann fährt der nächste Zug nach Hamburg ab?" -> "When does the next train to Hamburg depart?"
- "Wo befindet sich das nächste Hotel?" -> "Where is the nearest hotel?"
Final Verbmobil Demos:
- CeBIT-2000 (Hannover)
- COLING-2000 (Saarbrücken)
- ECAI-2000 (Berlin)

Verbmobil: The First Speech-Only Dialog Translation System
Devices: Mobile DECT Phone, Mobile GSM Phone
1. German Speaker: “Verbmobil” (voice dialing)
2. Connect to the Verbmobil Speech-to-Speech Translation Server: +49 631 3111911
3. Verbmobil: “Willkommen beim Verbmobil-Sprachserver. Bitte sprechen Sie nach dem Piepton.” (Welcome to the Verbmobil speech server. Please speak after the beep.)
4. German Speaker: “Verbmobil neuer Teilnehmer hinzufügen.” (Verbmobil, add new participant; the speech command that initiates a conference call)
5. Verbmobil: “Bitte sprechen Sie jetzt die Telephonnummer Ihres Gesprächspartners.” (Please speak the telephone number of your dialog partner now.)
6. German Speaker: “0681/302 5253”
7. The foreign participant is placed into the conference call.
   - To the German participant, Verbmobil: “Verbmobil hat eine neue Verbindung aufgebaut. Bitte sprechen Sie jetzt.” (Verbmobil has established a new connection. Please speak now.)
   - To the American participant, Verbmobil: “Welcome to the Verbmobil server. Please start your input after the beep.”

Verbmobil II: Three Domains of Discourse
- Scenario 1, Appointment Scheduling: When? Focus on temporal expressions. Vocabulary size: 2500/6000
- Scenario 2, Travel Planning: When? Where? How? Focus on temporal and spatial expressions. Vocabulary size: 7000/10000
- Scenario 3, Remote PC Maintenance: What? When? Where? How? Integration of special sublanguage lexica. Vocabulary size: 15000/30000

The Control Panel of Verbmobil

From a Multi-Agent Architecture to a Multi-Blackboard Architecture
Verbmobil I: Multi-Agent Architecture (modules M1 to M6 connected directly)
- Each module must know which module produces what data
- Direct communication between modules
- Each module has only one instance
- Heavy data traffic for moving copies around
- Multiparty and telecooperation applications are impossible
- Software: ICE and ICE Master
- Basic platform: PVM
Verbmobil II: Multi-Blackboard Architecture (modules M1 to M6 connected through blackboards BB 1, BB 2, BB 3)
- All modules can register for each blackboard dynamically
- No direct communication between modules
- Each module can have several instances
- No copies of representation structures (word lattice, VIT chart)
- Multiparty and telecooperation applications are possible
- Software: PCA and Module Manager
- Basic platform: PVM
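
The blackboard idea above can be illustrated with a minimal Python sketch. It is not Verbmobil's PCA or Module Manager: the class `Blackboard`, the blackboard names and the toy modules are all hypothetical, chosen only to show dynamic registration, several parser instances on one blackboard, and the absence of direct module-to-module calls.

```python
from typing import Any, Callable, Dict, List


class Blackboard:
    """A named shared store; modules subscribe to it instead of calling each other."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: List[Any] = []
        self._subscribers: List[Callable[[Any], None]] = []

    def register(self, callback: Callable[[Any], None]) -> None:
        # Modules may register at any time (dynamic registration).
        self._subscribers.append(callback)

    def publish(self, item: Any) -> None:
        # The item is stored once; subscribers receive a reference, not a copy.
        self.entries.append(item)
        for callback in list(self._subscribers):
            callback(item)


def build_demo() -> Dict[str, Blackboard]:
    # Hypothetical blackboard names, loosely following the slide.
    boards = {name: Blackboard(name) for name in ("audio", "word_graph", "vit")}

    def recognizer(audio: str) -> None:
        # Toy "recognizer": splits the input string into words.
        boards["word_graph"].publish({"words": audio.split()})

    def make_parser(label: str) -> Callable[[dict], None]:
        def parser(graph: dict) -> None:
            # Toy "parser": reports only how many words it covered.
            boards["vit"].publish({"parser": label, "span": len(graph["words"])})
        return parser

    boards["audio"].register(recognizer)
    # Two parser instances listen on the same blackboard.
    boards["word_graph"].register(make_parser("chunk"))
    boards["word_graph"].register(make_parser("hpsg"))
    return boards


if __name__ == "__main__":
    demo = build_demo()
    demo["audio"].publish("wann fährt der nächste Zug")
    for entry in demo["vit"].entries:
        print(entry)
```

In this sketch the recognizer never names a parser; adding a third parser only requires one more `register` call on the `word_graph` blackboard.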

A Multi-Blackboard Architecture for the Combination of Results from Deep and Shallow Processing Modules
Blackboards:
- Audio Data
- Word Hypotheses Graph with Prosodic Labels
- VITs, Underspecified Discourse Representations
Modules:
- Command Recognizer
- Channel/Speaker Adaptation
- Spontaneous Speech Recognizer
- Prosodic Analysis
- Statistical Parser
- Chunk Parser
- HPSG Parser
- Dialog Act Recognition
- Semantic Construction
- Robust Dialog Semantics
- Semantic Transfer
- Generation

Verbmobil as the First Dialog Translation System that Uses Prosodic Information Systematically at All Processing Stages
Input to the Multilingual Prosody Module: Speech Signal, Word Hypotheses Graph
Prosodic features:
- duration
- pitch
- energy
- pause
Output: Prosodic Feature Vector, Boundary Information, Sentence Mood, Accented Words
Uses of prosodic information: Dialog Act Segmentation and Recognition, Search Space Restriction, Lexical Choice, Speaker Adaptation, Constraints for Transfer
Processing stages served: Dialog Understanding, Parsing, Translation, Generation, Speech Synthesis

Integrating Shallow and Deep Analysis Components in a Multi-Blackboard Architecture
1. The Statistical Parser, the Chunk Parser and the HPSG Parser each read the Augmented Word Hypotheses Graph.
2. Each parser contributes partial VITs to a chart with a combination of partial VITs.
3. Robust Dialog Semantics performs combination and knowledge-based reconstruction of complete VITs.
4. Result: complete and spanning VITs.

Verbmobil's Massive Data Collection Effort
3,200 dialogs (182 hours) with 1,658 speakers; 79,562 turns distributed on 56 CDs, 21.5 GB
The so-called Partitur (German word for musical score) orchestrates fifteen strata of annotations:
1. Transliteration Variant 1
2. Transliteration Variant 2
3. Lexical Orthography
4. Canonical Pronunciation
5. Manual Phonological Segmentation
6. Automatic Phonological Segmentation
7. Word Segmentation
8. Prosodic Segmentation
9. Dialog Acts
10. Noises
11. Superimposed Speech
12. Syntactic Category
13. Word Category
14. Syntactic Function
15. Prosodic Boundaries

Extracting Statistical Properties from Large Corpora
Corpora:
- Transcribed Speech Data
- Segmented Speech with Prosodic Labels
- Treebanks & Predicate-Argument Structures
- Annotated Dialogs with Dialog Acts
- Aligned Bilingual Corpora
Machine Learning for the Integration of Statistical Properties into Symbolic Models for Speech Recognition, Parsing, Dialog Processing, Translation
Model types:
- Hidden Markov Models
- Neural Nets, Multilayered Perceptrons
- Probabilistic Automata
- Probabilistic Grammars
- Probabilistic Transfer Rules

VHG: A Packed Chart Representation of Partial Semantic Representations
- Incremental chart construction and anytime processing
- Rule-based combination and transformation of partial UDRS coded as VITs
- Selection of a spanning analysis using a bigram model for VITs (trained on a treebank of 24 k VITs)
Parsers feeding Semantic Construction:
- Chart parser using cascaded finite-state transducers (Abney, Hinrichs)
- Statistical LR parser trained on treebank (Block, Ruland)
- Very fast HPSG parser (see two papers at ACL99, Kiefer, Krieger et al.)
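
The selection of a spanning analysis with a bigram model can be pictured as a best-path search over chart edges. The following Python sketch is an illustration under stated assumptions, not the Verbmobil implementation: edges are reduced to hypothetical `(start, end, label)` triples, every edge is assumed to end after it starts, and the bigram probabilities are invented toy values.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[int, int, str]  # (start position, end position, fragment label)


def best_spanning_path(
    edges: List[Edge],
    n: int,
    bigram: Dict[Tuple[str, str], float],
    floor: float = 1e-6,
) -> List[Edge]:
    """Return the highest-scoring edge sequence covering positions 0..n.

    Assumes end > start for every edge, so positions can be processed
    from left to right. Unseen label pairs get the probability `floor`.
    """
    # best[(position, last label)] = (log probability, path so far)
    best: Dict[Tuple[int, str], Tuple[float, List[Edge]]] = {(0, "<s>"): (0.0, [])}
    for start in range(n):
        # Snapshot the states at this position before extending them.
        states = [(key, val) for key, val in best.items() if key[0] == start]
        for (_, prev), (score, path) in states:
            for edge in edges:
                if edge[0] != start:
                    continue
                _, end, label = edge
                new_score = score + math.log(bigram.get((prev, label), floor))
                key = (end, label)
                if key not in best or new_score > best[key][0]:
                    best[key] = (new_score, path + [edge])
    finals = [val for key, val in best.items() if key[0] == n]
    return max(finals, key=lambda val: val[0])[1] if finals else []


if __name__ == "__main__":
    # Toy chart over three positions with two competing spanning paths.
    toy_edges = [(0, 1, "np"), (1, 3, "vp"), (0, 2, "pp"), (2, 3, "np")]
    toy_bigram = {
        ("<s>", "np"): 0.6, ("np", "vp"): 0.7,
        ("<s>", "pp"): 0.2, ("pp", "np"): 0.5,
    }
    print(best_spanning_path(toy_edges, 3, toy_bigram))
```

With the toy values, the path `np` then `vp` scores 0.6 * 0.7 and beats `pp` then `np` at 0.2 * 0.5. An empty list is returned when no edge sequence spans the input.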

Robust Dialog Semantics: Deep Processing of Shallow Structures
Goals of robust semantic processing (Pinkal, Worm, Rupp):
- Combination of unrelated analysis fragments
- Completion of incomplete analysis results
- Skipping of irrelevant fragments
Method: transformation rules on the VIT Hypothesis Graph, of the form
Conditions on VIT structures -> Operations on VIT structures
The rules are based on various knowledge sources:
- lattice of semantic types
- domain ontology
- sortal restrictions
- semantic constraints
Results: 20% of the analyses are improved, 0.6% of the analyses get worse.

Robust Dialog Semantics: Combining and Completing Partial Representations
Utterance: "Let us meet (in) the late afternoon to catch the train to Frankfurt"
Fragments in the word hypothesis graph: "Let us", "meet", "the late afternoon", "to catch", "the train to Frankfurt"
The preposition "in" is missing in all paths through the word hypothesis graph.
A temporal NP is transformed into a temporal modifier using an underspecified temporal relation:
[temporal_np(V1)] -> [typeraise_to_mod(V1, V2)] & V2
The modifier is applied to a proposition:
[type(V1, prop), type(V2, mod)] -> [apply(V2, V1, V3)] & V3
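
The two rules above can be mimicked in a few lines of Python. This is a sketch under strong simplifications: a VIT is reduced to a hypothetical `Fragment` with a semantic type and a string, and the string formats `temp_rel(...)` and `mod[prop]` are invented stand-ins for the underspecified temporal relation and the application result.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fragment:
    sem_type: str  # e.g. "prop", "temporal_np", "mod"
    content: str   # stand-in for the semantic representation


def typeraise_to_mod(frag: Fragment) -> Optional[Fragment]:
    """Rule 1: a temporal NP becomes a temporal modifier."""
    if frag.sem_type != "temporal_np":
        return None
    return Fragment("mod", f"temp_rel({frag.content})")


def apply_modifier(prop: Fragment, mod: Fragment) -> Optional[Fragment]:
    """Rule 2: a modifier is applied to a proposition."""
    if prop.sem_type != "prop" or mod.sem_type != "mod":
        return None
    return Fragment("prop", f"{mod.content}[{prop.content}]")


def combine(fragments: List[Fragment]) -> List[Fragment]:
    """Raise temporal NPs, then attach all modifiers to the first proposition."""
    raised = [typeraise_to_mod(f) or f for f in fragments]
    props = [f for f in raised if f.sem_type == "prop"]
    mods = [f for f in raised if f.sem_type == "mod"]
    if not props:
        return raised  # nothing to attach the modifiers to
    result = props[0]
    for mod in mods:
        attached = apply_modifier(result, mod)
        if attached is not None:
            result = attached
    return [result] + props[1:]


if __name__ == "__main__":
    parts = [Fragment("prop", "meet(we)"), Fragment("temporal_np", "late_afternoon")]
    print(combine(parts))
```

Each rule checks its condition on the fragment types before it performs its operation, which mirrors the "conditions -> operations" shape of the rules on the slide.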

The Understanding of Spontaneous Speech Repairs
Original utterance: "I need a car next Tuesday oops Monday"
- Reparandum: "Tuesday"
- Hesitation (editing phase): "oops"
- Reparans (repair phase): "Monday"
Recognition of substitutions and transformation of the word hypothesis graph yield: "I need a car next Monday"
Verbmobil technology understands speech repairs and extracts the intended meaning.
Dictation systems such as ViaVoice, VoiceXpress, FreeSpeech and NaturallySpeaking cannot deal with spontaneous speech and transcribe the corrupted utterances.
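
The substitution repair in this example can be imitated with a toy Python function. It is only a sketch: it works on a flat token list rather than a word hypothesis graph, and the editing-term list and the word-category lexicon are hypothetical. The reparandum is taken to start at the nearest earlier word that shares the category of the reparans.

```python
from typing import Dict, List, Set

# Hypothetical resources, for illustration only.
EDITING_TERMS: Set[str] = {"oops", "uh", "sorry"}
CATEGORY: Dict[str, str] = {
    "Monday": "weekday",
    "Tuesday": "weekday",
    "car": "vehicle",
    "van": "vehicle",
}


def resolve_repair(tokens: List[str]) -> List[str]:
    """Replace the reparandum by the reparans for the first repair found."""
    for i, token in enumerate(tokens):
        if token.lower() not in EDITING_TERMS or i + 1 >= len(tokens):
            continue
        category = CATEGORY.get(tokens[i + 1])  # category of the reparans
        if category is not None:
            # Search backwards for the start of the reparandum.
            for j in range(i - 1, -1, -1):
                if CATEGORY.get(tokens[j]) == category:
                    # Drop reparandum and editing term, keep the reparans.
                    return tokens[:j] + tokens[i + 1:]
        # No matching reparandum: remove the editing term only.
        return tokens[:i] + tokens[i + 1:]
    return tokens


if __name__ == "__main__":
    words = "I need a car next Tuesday oops Monday".split()
    print(" ".join(resolve_repair(words)))
```

An utterance without an editing term is returned unchanged.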

Integrating a Deep HPSG-based Analysis with Probabilistic Dialog Act Recognition for Semantic Transfer
Components:
- HPSG Analysis
- Probabilistic Analysis of Dialog Acts (HMM)
- Robust Dialog Semantics
- Recognition of Dialog Plans (Plan Operators)
- Semantic Transfer
Information exchanged: VIT, Dialog Act Type, Dialog Phase

Combining Statistical and Symbolic Processing for Dialog Processing
Components of the Dialog Module:
- Context Evaluation
- Statistical Prediction
- Plan Recognition
- Dialog Memory
Information exchanged: Dialog Act, Dialog Act Predictions, Main Propositional Content, Focus, Dialog Phase
Connected components: Dialog-Act based Translation, Transfer by Rules, Generation of Minutes

Using Context and World Knowledge for Semantic Transfer
Example: Platz -> room / table / seat
1. "Nehmen wir dieses Hotel, ja." -> "Let us take this hotel."
   "Ich reserviere einen Platz." -> "I will reserve a room."
2. "Machen wir das Abendessen dort." -> "Let us have dinner there."
   "Ich reserviere einen Platz." -> "I will reserve a table."
3. "Gehen wir ins Theater." -> "Let us go to the theater."
   "Ich möchte Plätze reservieren." -> "I would like to reserve seats."
All other dialog translation systems translate word-by-word or sentence-by-sentence.
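
The Platz example can be reduced to a lookup keyed on the preceding utterance. The Python sketch below is an illustration, not Verbmobil's transfer component: the cue words, the context labels and the fallback translation "seat" are assumptions made for this example.

```python
from typing import Dict, Optional

# Hypothetical cue words that signal the dialog context.
CONTEXT_CUES: Dict[str, str] = {
    "Hotel": "hotel",
    "Abendessen": "dinner",
    "Theater": "theater",
}

# Translation of "Platz" per context, taken from the three example dialogs.
PLATZ_BY_CONTEXT: Dict[str, str] = {
    "hotel": "room",
    "dinner": "table",
    "theater": "seat",
}


def detect_context(previous_utterance: str) -> Optional[str]:
    """Return the context label of the first cue word found, if any."""
    for cue, context in CONTEXT_CUES.items():
        if cue in previous_utterance:
            return context
    return None


def translate_platz(previous_utterance: str, default: str = "seat") -> str:
    """Choose the English word for 'Platz' from the previous utterance."""
    context = detect_context(previous_utterance)
    if context is None:
        return default  # assumed fallback; not specified in the source
    return PLATZ_BY_CONTEXT[context]


if __name__ == "__main__":
    for utterance in (
        "Nehmen wir dieses Hotel, ja.",
        "Machen wir das Abendessen dort.",
        "Gehen wir ins Theater.",
    ):
        print(utterance, "->", translate_platz(utterance))
```

The point of the sketch is that the same source sentence yields different target words once the dialog history is consulted, which a sentence-by-sentence translator cannot do.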

Automatic Generation of Multilingual Protocols of Telephone Conversations
- Dialog translation by Verbmobil between the German dialog partner and the American dialog partner
- Multilingual generation of protocols
- HTML document in German, transferred by Internet or fax
- HTML document in English, transferred by Internet or fax

Integrating Deep and Shallow Processing: Combining Results from Concurrent Translation Threads
Input:
- Segment 1: "If you prefer another hotel,"
- Segment 2: "please let me know."
Concurrent translation threads, each producing alternative translations with confidence values:
- Statistical Translation
- Case-Based Translation
- Dialog-Act Based Translation
- Semantic Transfer
Selection Module output:
- Segment 1: translated by Semantic Transfer
- Segment 2: translated by Case-Based Translation
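
A per-segment choice among the threads can be sketched as follows. This Python example is a deliberately simple stand-in for the selection module: it picks the candidate with the highest confidence value and assumes that every segment has at least one candidate. The German translations and all confidence values are invented for illustration.

```python
from typing import Dict, List, Tuple

# thread name -> (translation, confidence value)
Candidates = Dict[str, Tuple[str, float]]


def select_translations(segments: List[Candidates]) -> List[Tuple[str, str]]:
    """For each segment, return (thread, translation) with the top confidence."""
    selected = []
    for candidates in segments:
        thread, (text, _confidence) = max(
            candidates.items(), key=lambda item: item[1][1]
        )
        selected.append((thread, text))
    return selected


if __name__ == "__main__":
    # Hypothetical candidates for the two segments of the example.
    example = [
        {
            "statistical": ("Wenn Sie ein anderes Hotel vorziehen,", 0.61),
            "semantic_transfer": ("Wenn Sie ein anderes Hotel bevorzugen,", 0.83),
        },
        {
            "case_based": ("lassen Sie es mich bitte wissen.", 0.90),
            "dialog_act": ("bitte sagen Sie Bescheid.", 0.55),
        },
    ]
    for thread, text in select_translations(example):
        print(f"{thread}: {text}")
```

With these toy values the first segment is taken from the semantic transfer thread and the second from the case-based thread, as on the slide.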

Verbmobil: Long-Term, Large-Scale Funding and Its Impact
Funding:
- German Ministry for Education and Research (BMBF), Phase I (1993-1996): $33 M
- German Ministry for Education and Research (BMBF), Phase II (1997-2000): $28 M
- 60% industrial funding according to shared cost model: $17 M
- Additional R&D investments of industrial partners: $11 M
- Total: $89 M
Impact:
- > 800 publications (> 600 refereed)
- Many patents
- 17 commercial spin-off products
- 6 spin-off companies
- > 900 trained researchers for the German language industry
- Product announcement for GSM version in 2001
- Philips, DaimlerChrysler and Siemens are leaders in spoken dialog applications

More than 80% of Verbmobil’s Translations are Approximately Correct
Large-scale web-based evaluation: 25,345 translations, 65 evaluators, sentence length 1-60 words
Percentage of approximately correct translations, by word accuracy (WA):

Translation Thread               WA >= 50%       WA >= 75%       WA >= 80%
                                 (5069 turns)    (3267 turns)    (2723 turns)
Case-based Translation           37%             44%             46%
Statistical Translation          69%             79%             81%
Dialog-Act based Translation     40%             45%             46%
Semantic Transfer                40%             47%             49%
Substring-based Translation      65%             75%             79%
Automatic Selection              57% / 78% *     66% / 83% *     68% / 85% *
Manual Selection                 88%             95%             97%

* After training with an instance-based learning algorithm

Checklist for Final Verbmobil System I
- Three domains: appointment scheduling, travel planning, PC hotline
- Bidirectional and speaker-independent translation in the domains appointment scheduling and travel planning
- Translation pairs: German <-> English, German <-> Japanese
- Vocabulary size: 10,000 for German, equivalent English lexicon, 2,500 for Japanese
Operational success criteria:
- Word recognition rate (16 kHz):
  German: spontaneous 75% (cooperative 85%)
  English: spontaneous 72% (cooperative 82%)
  Japanese: spontaneous 75% (cooperative 85%)
- Word recognition rate (8 kHz): spontaneous 70% (cooperative 80%)
- 80% of the translations are approximately correct, and the dialog task success rate should be around 90%.
- The average end-to-end processing time should be four times real time (length of the input signal).

Checklist for Final Verbmobil System II
- The system can work in the open microphone mode and cope with speech over GSM mobile phones.
- Verbmobil can be controlled by speech commands.
- A spelling mode is integrated into the speech recognizer.
- The speech recognizers can cope with simple non-speech input (like coughing).
- Spontaneous speech phenomena like repairs, hesitations and agreement failures can be handled.
- The language identification and speech recognition components are implemented as separate components.
- A three-party conference call with Verbmobil and a foreign partner can be initiated by one speaker.
- A high-quality speech synthesis for German and American English is realized.

Checklist for Final Verbmobil System III
- Prosodic information is used for input segmentation.
- Unknown words can be identified and processed.
- Robust semantic processing integrates partial analysis results of the competing parsing approaches.
- The translation result is selected by a dynamic choice function based on confidence values computed by competing translation threads.
- Some translation ambiguities can be resolved by the exploitation of world and context knowledge, so that the translation quality is improved.
- Verbmobil can generate various forms of dialog protocols in German and English.

Results of the Verbmobil Project have been used in 17 Spin-Off Products by the Industrial Partners DaimlerChrysler, Philips and Siemens
- Spoken Dialog Systems: 4
- Translation Systems: 2
- Command & Control Systems: 5
- Text Classification Systems: 1
- Dictation Systems: 3
- Dialog Engines: 2

Successful Technology Transfer: 6 High-Tech Spin-Off Companies in the Area of Language Technology have been founded by Verbmobil Researchers
- CLT Sprachtechnologie GmbH, Saarbrücken: Language Technology for Text Processing, www.clt-st.de
- XTRAMIND Technologies, Saarbrücken: Language Technology for Customer Interaction Services, www.xtramind.com
- GSDC GmbH, Nürnberg: Multilingual Documentation, www.ic-portal.gsdc.de
- RETIVOX GbR, Bonn: Speech Synthesis Systems, www.retivox.de
- SCHEMA GmbH, Nürnberg: Document Engineering, www.schema.de
- SYMPALOG GmbH, Nürnberg: Spoken Dialog Systems, www.sympalog.de

Verbmobil was the Key Resource for the Education and Training of Researchers and Engineers Needed to Build Up Language Industry in Germany
- Master Students: 238
- Student Research Assistants: 483
- PhD Students: 164
- Habilitations: 16
- Verbmobil Internships: 18
- Total: 919
