AVIVAVOZ: technologies for speech-to-speech translation

Project reference: TEC-2006-13694-C03

The problem
Pipeline: source speech → ASR → source text → SMT → target text → TTS → target speech
• Source speech: broadcast news, parliamentary speech
• Languages: the official languages in Spain (Spanish (Castilian), Catalan, Basque, Galician) and English
• Target speech: source speaker, expressive voice

Applications (I)
• Interpreting
• Audio and video dubbing

Applications (II)
• Video and TV subtitling
• Multilingual video indexing

General goals
• Development and integration of ASR, SMT and TTS technologies.
• Technology evaluation: internal evaluation and participation in national and international evaluation campaigns.
• Development of bilingual text corpora for the language pairs Spanish - Basque, Spanish - Catalan and Spanish - Galician.
• Development of speech corpora for both expressive voice and voice transformation.
• Implementation of an automatic video dubbing demonstrator.

Participants

Technology development: ASR
• Multilingual acoustic-phonetic modeling
  • estimation of pronunciation rules
• Spontaneous speech
  • hesitations (acoustic modeling)
  • false starts and repetitions (vocabulary tree)
• Segmentation of speech
  • insertion of punctuation marks
  • diarization
• Robustness
  • language model adaptation
  • language model rescoring
  • speaker adaptation
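Language model rescoring, listed under robustness, can be illustrated as re-ranking a recognizer's N-best list. Each hypothesis gets a new score: its acoustic log-probability plus a weighted language model log-probability. The following is a minimal sketch under stated assumptions, not the project's recognizer. The `Hypothesis` and `BigramLM` classes, the add-k smoothing and the `lm_weight` and `word_penalty` values are all hypothetical. A production system would use a large, adapted n-gram model, typically over lattices rather than a short N-best list.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: list[str]
    acoustic_logprob: float  # log p(audio | words) from the first decoding pass


class BigramLM:
    """Toy add-k smoothed bigram LM, standing in for a real (adapted) LM."""

    def __init__(self, sentences: list[list[str]], k: float = 0.5):
        self.k = k
        self.history_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            self.history_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))
        self.vocab_size = len({w for s in sentences for w in s} | {"</s>"})

    def logprob(self, words: list[str]) -> float:
        tokens = ["<s>"] + words + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            num = self.bigram_counts[(prev, cur)] + self.k
            den = self.history_counts[prev] + self.k * self.vocab_size
            total += math.log(num / den)
        return total


def rescore(nbest, lm, lm_weight=10.0, word_penalty=0.0):
    """Return the N-best list sorted by the combined score, best first."""
    def combined(h):
        return (h.acoustic_logprob
                + lm_weight * lm.logprob(h.words)
                + word_penalty * len(h.words))
    return sorted(nbest, key=combined, reverse=True)


lm = BigramLM([["the", "house", "is", "red"], ["the", "house", "is", "big"]])
nbest = [
    Hypothesis(["the", "mouse", "is", "red"], acoustic_logprob=-10.0),
    Hypothesis(["the", "house", "is", "red"], acoustic_logprob=-10.5),
]
print(" ".join(rescore(nbest, lm, lm_weight=1.0)[0].words))  # the house is red
```

The acoustically preferred "mouse" hypothesis loses once the language model score is added. The weights that balance the two scores are normally tuned on held-out data.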

Technology development: SMT
• Translation models
  • reordering rules (linguistic and statistical), reordering graph
  • hierarchical phrases
  • language pairs: English / Basque / Catalan / Galician - Spanish
• Use of linguistic knowledge to tackle translation errors
  • enclitics
  • POS tags: polysemic words, gender and number agreement, etc.
  • verb morphology
• New version of MARIE
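Linguistic reordering rules rewrite the source word order before translation so that it is closer to the target language. Below is a minimal sketch of one hand-written English-to-Spanish rule: move a noun in front of the adjectives that precede it, as in "the red house" → "la casa roja". It assumes Penn Treebank-style POS tags from an external tagger. The function name and the rule are illustrative, not the project's rule set, and statistical rules learned from word alignments are not shown.

```python
def reorder_adj_noun(tagged):
    """Move a noun in front of the run of adjectives that precedes it.

    `tagged` is a list of (word, tag) pairs with Penn Treebank tags
    (JJ = adjective, NN/NNS = noun); the tags are assumed to come from a tagger.
    """
    out = list(tagged)
    i = 0
    while i < len(out):
        if out[i][1] != "JJ":
            i += 1
            continue
        # Find the end of the adjective run starting at i.
        j = i
        while j < len(out) and out[j][1] == "JJ":
            j += 1
        if j < len(out) and out[j][1] in ("NN", "NNS"):
            # Adjectives out[i:j] followed by noun out[j]: put the noun first.
            out[i:j + 1] = [out[j]] + out[i:j]
            i = j + 1
        else:
            i = j


    return out


sentence = [("a", "DT"), ("big", "JJ"), ("red", "JJ"), ("car", "NN"), ("runs", "VBZ")]
print(" ".join(w for w, _ in reorder_adj_noun(sentence)))  # a car big red runs
```

A single rule like this over-applies. Some Spanish adjectives normally stay in front of the noun, as in "una gran responsabilidad política", which is one reason to combine hand-written rules with statistically learned ones.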

Technology development: SMT (examples)
Example 1
(O) No quiero verte más por aquí.
(T1) No vull veure et més per aquí.
(T2) No vull veure’t més per aquí.
Example 2
(O) Els meus amics no són els teus.
(T1) Mis amigos no están *tus.
(T2) Mis amigos no son los tuyos.
Example 3
(O) There is huge political responsibility both on the part of the European Union, which legalised the war, and the political groups, which approved the resolution.
(T1) Hay gran responsabilidad política tanto de la UE, que legalizaron la guerra, y los Grupos políticos, que aprobó la resolución.
(T2) Hay una gran responsabilidad política tanto de la UE, que legalizó la guerra, y los Grupos políticos, que aprobaron la resolución.
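Example 1 shows the enclitic problem: in Catalan, the pronoun "et" must be attached to the infinitive "veure". Below is a minimal sketch of a post-processing rule for this. It assumes an upstream tagger marks infinitives and weak pronouns; the tag names `VINF` and `PRON` are hypothetical. After a vowel-final infinitive the pronoun is apostrophized (veure’t). After a consonant-final infinitive it takes a hyphen and its full form (fer-te). Only a subset of pronouns is covered, and clusters of two clitics are left alone.

```python
# Subset of Catalan weak pronouns and their enclitic forms.
AFTER_VOWEL = {"em": "’m", "et": "’t", "es": "’s", "el": "’l",
               "ens": "’ns", "els": "’ls", "us": "-us"}
AFTER_CONSONANT = {"em": "-me", "et": "-te", "es": "-se", "el": "-lo",
                   "ens": "-nos", "els": "-los", "us": "-vos"}


def attach_enclitics(tagged):
    """Join a weak pronoun that follows an infinitive onto the verb.

    `tagged` is a list of (word, tag) pairs. "VINF" (infinitive) and "PRON"
    (weak pronoun) are hypothetical placeholders for a real tagset.
    Clusters of two clitics (e.g. veure-te'l) are deliberately not handled.
    """
    out = []
    for word, tag in tagged:
        prev_is_infinitive = bool(out) and out[-1][1] == "VINF"
        if tag == "PRON" and prev_is_infinitive and word.lower() in AFTER_VOWEL:
            verb = out[-1][0]
            table = AFTER_VOWEL if verb[-1].lower() in "aeiou" else AFTER_CONSONANT
            # Re-tag the merged token so a second pronoun is not attached to it.
            out[-1] = (verb + table[word.lower()], "VINF+PRON")
        else:
            out.append((word, tag))
    return out


t1 = [("No", "ADV"), ("vull", "VERB"), ("veure", "VINF"), ("et", "PRON"),
      ("més", "ADV"), ("per", "PREP"), ("aquí", "ADV")]
print(" ".join(w for w, _ in attach_enclitics(t1)))  # No vull veure’t més per aquí
```

The rule depends on POS tags because forms such as "el" and "els" are also articles. An article after an infinitive ("veure el mar") must not be attached.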

Technology development: TTS
• Linguistic processing
  • POS-tagging
  • phonetic transcription (timbre of vowels, foreign names)
• Prosody generation
• Generation of expressive speech
  • spontaneous (non-fluent) speech
  • emotional speech
Slide examples: pronunciation of foreign names (before / now); prosody generation (before / now); emotional speech (neutral, sad); spontaneous speech

Technology development: TTS (continued)
• Voice transformation
• Corpus-based speech synthesis
Slide examples: voice transformation (target; sources; before / now); corpus-based synthesis (Spanish, Catalan)
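One simple ingredient of voice transformation is converting the source speaker's pitch to the target speaker's range. Below is a minimal sketch of the widely used mean/variance mapping in the log-F0 domain: log F0_t = μ_t + (σ_t / σ_s)(log F0_s − μ_s). It assumes frame-level F0 values in Hz, with 0 marking unvoiced frames. This is not necessarily the method used in the project. Full voice transformation also converts the spectral envelope, which is not shown.

```python
import math
import statistics


def f0_stats(f0_values):
    """Mean and standard deviation of log-F0 over voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_values if f > 0]
    return statistics.mean(logs), statistics.pstdev(logs)


def convert_f0(f0_source, src_stats, tgt_stats):
    """Map a source F0 contour onto the target speaker's log-F0 statistics."""
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    if sd_s == 0:
        raise ValueError("source log-F0 has zero variance")
    out = []
    for f in f0_source:
        if f <= 0:
            out.append(0.0)  # unvoiced frames stay unvoiced
        else:
            out.append(math.exp(mu_t + (sd_t / sd_s) * (math.log(f) - mu_s)))
    return out


# Toy statistics: the target speaker is one octave above the source speaker.
src = f0_stats([100.0, 200.0])
tgt = f0_stats([200.0, 400.0])
print(convert_f0([100.0, 0.0, 200.0], src, tgt))
```

In practice the statistics would be estimated from each speaker's recordings rather than from a few frames.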

Integration of the technologies
• UIMA (Unstructured Information Management Architecture)
  • web-distributed architecture
  • open-source software
  • developed by IBM (alphaWorks)
  • maintained by the Apache Software Foundation
• HTK graph
  • developed and maintained by Cambridge University
  • multiple hypotheses and scores
  • metadata
• Voice synchronization between input and output
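The integration slide mentions passing multiple hypotheses, scores and metadata between components, and synchronizing the input and output voices. The sketch below illustrates both ideas under stated assumptions:

- A flat N-best list stands in for an HTK lattice.
- The `Segment` and `Hypothesis` classes are hypothetical, not the UIMA API.
- The synchronization step only computes the tempo factor a dubbing system would need to fit synthesized speech into the original time slot.
- The clamp limits are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    text: str
    score: float  # log-domain score assigned by the producing component


@dataclass
class Segment:
    start: float  # seconds, position in the source audio
    end: float
    hypotheses: list[Hypothesis]
    metadata: dict[str, str] = field(default_factory=dict)

    def best(self) -> Hypothesis:
        return max(self.hypotheses, key=lambda h: h.score)


def tempo_factor(segment, synth_duration, min_factor=0.8, max_factor=1.25):
    """Speed-up (> 1) or slow-down (< 1) that fits dubbed speech into the slot.

    The clamp limits are arbitrary assumptions; beyond them the translation
    would have to be shortened or the slot extended instead.
    """
    slot = segment.end - segment.start
    if slot <= 0:
        raise ValueError("segment must have a positive duration")
    return min(max(synth_duration / slot, min_factor), max_factor)


seg = Segment(12.0, 15.0,
              [Hypothesis("hola", -3.2), Hypothesis("ola", -4.1)],
              {"lang": "es", "speaker": "spk1"})
print(seg.best().text, tempo_factor(seg, 3.6))
```

Keeping every hypothesis with its score, instead of only the best string, lets a later component such as SMT revisit recognition alternatives.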

Integration of the technologies (figure)

Technology evaluation
• Internal evaluation
  • Initial evaluation: December 2007
• V Jornadas de Tecnología del Habla (VJTH)
• SMT international evaluations
  • International Workshop on Spoken Language Translation (IWSLT)
    • 2007: best system for ASR output in Arabic - English translation
    • 2008: second best (human evaluation) for ASR output in Arabic - English translation
  • International Conference of the Association for Computational Linguistics
    • 2007: best out-of-domain system for Spanish - English
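The slides do not name the metrics used. Recognition output is commonly scored with word error rate (WER): the minimum number of substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. Below is a minimal dynamic-programming sketch, assuming whitespace tokenization and no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("the house is red", "the mouse is red"))  # 0.25
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.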

Technology evaluation
• Speech synthesis international evaluations
  • 2008 Blizzard Challenge
  • ECESS

Coordination
• General meetings
  • Kick-off meeting: Bilbao (February 2007)
  • Barcelona (July 2007)
  • Vigo (February 2008)
  • VJTH (Bilbao, November 2008)
• SMT workshop (July 2007)
• Wiki of the project
• Technology evaluation campaigns
• Integration of the technologies
• Specific meetings for each technology

Developed corpora
• SMT corpora
  • Spanish - English, based on the TC-STAR corpus.
  • Catalan - Spanish, based on the bilingual edition of “El Periódico”.
  • Basque - Spanish, based on the web texts of “Eroski Consumer” (available to VJTH).
  • Galician - Spanish, based on the bilingual web edition of “El Correo Gallego”, “Eroski Consumer” and DOGA.
• Speech synthesis corpora
  • UPC_ESMA Spanish database
  • Expressive voice (Basque)

Publications
• PhD dissertations: 7
• International journal papers:
  • 16 published or accepted for publication
  • 4 submitted for publication
• Spanish journal papers: 5
• International conference papers: 52
• Spanish conference papers: 13

Technology transfer
• “TecnoParla” and “LinkCat” projects, funded by the “Secretaría de Política Lingüística” (Secretariat for Language Policy) of the “Generalitat de Catalunya”.
• “SMARTADAPT: Generación de tecnología asistiva multimodal” (generation of multimodal assistive technology), project funded by the Diputación Foral de Bizkaia.
• “INREDIS: Interfaces de Relación entre el Entorno y las personas con Discapacidad” (interfaces between the environment and people with disabilities), CENIT project (Ministerio de Industria, Turismo y Comercio).
• “C-Extractor: Plataforma de Extracción Automática de Conocimiento a partir de Fuentes de Información Estructuradas” (platform for automatic knowledge extraction from structured information sources), Tractor project (Ministerio de Industria, Turismo y Comercio).

Technology transfer
• Technology transfer contracts with:
  • Verbio Technologies S.L., a spin-off of UPC
  • Natural Vox S.A.
  • Tecnocom S.A.
  • Quobis Networks SLU
  • ELRA-Distribution Agency (ELDA)
• “Fostering Language Resources Network” (FlareNet), ECP 2007-LANG 617001
• COST Action 2102: Cross-modal analysis of verbal and non-verbal communication

Cooperation
• “Human Language Technology and Pattern Recognition” group of RWTH Aachen University.
• “Human Language Technology” group of the “Institute for Infocomm Research”, Singapore.
• “Speech Research Group” of the “Engineering Laboratory” of Cambridge University.
• “Center for Spoken Language Understanding” (CSLU), Portland (Oregon), USA.
• ECESS (European Center of Excellence in Speech Synthesis).
• “Institut des Systèmes Intelligents et de Robotique” of the “Université Pierre et Marie Curie”.

Cooperation
• “Natural Language Processing” group of UPC.
• Dr. David Escudero, Universidad de Valladolid.
• “Grupo de Procesado de Señal” (Signal Processing Group) of the “Instituto de Investigación en Ingeniería de Aragón”.
• “Centro Ramón Piñeiro para a investigación en humanidades”, Xunta de Galicia.

Video dubbing demonstrator
• Spanish - English
• Galician - Spanish

The end
Thanks for your attention! Questions are welcome!
