
Named Entities in Domain Unlimited Speech Translation



Presentation Transcript


  1. Named Entities in Domain Unlimited Speech Translation
     Alex Waibel, Stephan Vogel, Tanja Schultz
     Carnegie Mellon University, Interactive Systems Labs

  2. Objective
     • Extraction and Translation of Arabic Named Entities from Speech
     • Problem:
        • How do we do Domain-Unlimited Speech Translation?
        • What to do with Named Entities in Speech?
        • Named Entities are typically OOVs (out-of-vocabulary words) → the recognizer will replace them with a WRONG word → the Named Entity is unlikely to be handled right
     • Translation of Named Entities → Named Entities are frequently not in the lexicon
     ITIC MT Integration Meeting

  3. Approach – Speech Translation
     • Piggy-Back on STR-DUST (NSF-ITR Project):
        • Speech Translation on Domain Unlimited Speech Tasks
     • Approach:
        • Recognition: Statistical Speech Recognition
        • Consolidation: Statistical Reduction and Extraction
        • Translation: Statistical MT
     • Opportunity:
        • Cascade of Statistical Source-Channel Models
        • Integration and Optimization
        • Combine and Compute Joint Models
        • Working with Errors: Lattices to Communicate between Modules
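One common way to write a cascade of source-channel models for speech translation is sketched below. The slides do not give a formula, so the notation is an assumption: x is the acoustic signal, f the source-language (Arabic) word sequence, and e the target-language (English) word sequence. The sketch also assumes that x depends on e only through f, and it leaves out the Consolidation stage listed above.

```latex
\begin{align}
\hat{e} &= \arg\max_{e} P(e \mid x) \\
        &= \arg\max_{e} \; P(e) \sum_{f} P(f \mid e)\, P(x \mid f) \\
        &\approx \arg\max_{e} \; P(e) \, \max_{f} \, P(f \mid e)\, P(x \mid f)
\end{align}
```

Here P(x | f) plays the role of the recognizer's acoustic model, P(f | e) the translation model, and P(e) the target language model. Passing a lattice of f hypotheses between modules, rather than a single best string, is one way to keep more of the sum (or max) over f available to the translation step.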

  4. Approach – Named Entities
     • Two Pass Decoding Strategy
     • OOVs in Speech:
        • Recover Named Entity in Dictionary
           • Identify Relevant Names from Very Large Name Lists
           • Search for Relevant New Names on Internet
           • Insert Named Entities in Dictionary, Iterate
        • New Word Model
           • Model Unseen Words by New-Word-Model
           • Assign Named Entity Tag to New-Word
     • Bi-Lingual Named Entity Tagging
        • Recover Named Entity
           • Identify Relevant Names from Translation Output
           • IR of Relevant Texts in Target Language
           • Use Transliteration Model to Update Lexicon
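The "identify relevant names, insert in dictionary, iterate" step can be illustrated with a minimal sketch. It assumes the first decoding pass yields rough text for each OOV span and that candidates are chosen by string similarity against a name list. The function names, the similarity cutoff, and the use of Python's `difflib` are illustrative assumptions, not the method used in the project.

```python
import difflib


def recover_names(oov_spans, name_list, n=3, cutoff=0.6):
    """Map each OOV span to up to n similar names from name_list.

    Hypothetical helper: similarity is plain string matching on
    lowercased text, standing in for whatever matching the real
    system uses.
    """
    lowered = {name.lower(): name for name in name_list}
    candidates = {}
    for span in oov_spans:
        matches = difflib.get_close_matches(
            span.lower(), list(lowered), n=n, cutoff=cutoff
        )
        candidates[span] = [lowered[m] for m in matches]
    return candidates


def update_dictionary(dictionary, candidates):
    """Add recovered names to the recognizer dictionary (a set here).

    Returns the names that were newly added, so a caller can decide
    whether a second decoding pass is worthwhile.
    """
    added = set()
    for names in candidates.values():
        for name in names:
            if name not in dictionary:
                dictionary.add(name)
                added.add(name)
    return added


name_list = ["al-Qaida", "Baghdad", "Abu Hafz"]
dictionary = {"Baghdad"}
candidates = recover_names(["al kaida"], name_list)
new_entries = update_dictionary(dictionary, candidates)
```

In a two-pass setup, the second pass would decode again with the enlarged dictionary. The loop can repeat until no new names are added.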

  5. Input/Output
     • Input:
        • Speech in source language (Arabic)
        • Text in source language (Arabic)
     • Output:
        • English translation of transcript
        • English translation of extracted entities
     • Reco: القاعدة بزعامة أسامة بن لادن الهجومين اللذين استهدفا كنيسين يهوديين في إسطنبول واللذين أسفرا عن مقتل 23 شخصا وإصابة 300 آخرين. وهدد البيان بتوجيه مزيد من الضربات للولايات المتحدة وحلفائها في جميع أنحاء العالم.
     • NE Search and Translation:
        • Name: Abu Hafz
        • Orgnz: al-Qaida
        • Location: Baghdad

  6. Evaluation
     • Correct Named Entity Detection
        • Word Correct from Arabic Speech
        • NE-Tag Correct from Arabic Transcript
     • Correct Translation
        • Of Output Text (NIST, BLEU)
        • Of Output Named Entity
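The slides do not say how "NE-Tag Correct" is scored. One plausible reading, sketched below, compares the set of (tag, text) pairs a system produces against a reference set and reports precision, recall, and F1. The function name and the exact-match criterion are assumptions made for illustration.

```python
def ne_scores(reference, hypothesis):
    """Precision, recall and F1 over exact (tag, text) matches.

    Both arguments are iterables of (tag, text) tuples. An entity
    counts as correct only if tag and text both match exactly.
    """
    ref, hyp = set(reference), set(hypothesis)
    correct = len(ref & hyp)
    precision = correct / len(hyp) if hyp else 0.0
    recall = correct / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


reference = [("Name", "Abu Hafz"), ("Orgnz", "al-Qaida"), ("Location", "Baghdad")]
hypothesis = [("Name", "Abu Hafz"), ("Location", "Basra")]
precision, recall, f1 = ne_scores(reference, hypothesis)
```

In this toy case one of two hypothesized entities is right and one of three reference entities is found. The hypothesis list is invented for the example and is not a result from the presentation.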

  7. First Results: NE Translation (Chinese)
     • Online NE translation gives improvements for both tracks
     • Online NE translation works better on uncommon NEs, where it gives more improvement
