
Integrating Speech Recognition and Machine Translation


Presentation Transcript


  1. Integrating Speech Recognition and Machine Translation
     Spyros Matsoukas, Ivan Bulyko, Bing Xiang, Kham Nguyen, Richard Schwartz, John Makhoul

  2. Integration Issues
  • The Machine Translation (MT) system is trained on text data, so it expects:
    • segments that correspond to foreign sentences
    • properly placed punctuation marks
    • numbers, dates, monetary amounts, abbreviations, etc., as they appear in ordinary text
  • However, Speech-To-Text (STT) output:
    • is segmented automatically on long pauses
      • resulting segments may be too short, or may cross sentence boundaries
    • has no punctuation
      • punctuation needs to be added automatically prior to translation
    • has numbers, dates, etc., in spoken form
      • the output can be parsed to convert numbers to written form (a toy sketch follows this list)
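To make the spoken-form issue concrete, here is a minimal illustrative normalizer. It handles only simple English tens-plus-units numbers; the actual system parsed Arabic STT output, so the word tables and function name below are assumptions for illustration only.

```python
# Illustrative only: a toy English spoken-number normalizer. The original
# system handled Arabic; this sketch just shows the idea of rewriting
# spoken-form numbers into written form before translation.
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

def normalize_numbers(tokens):
    """Rewrite simple spoken numbers ('twenty five' -> '25') in a token list."""
    out, i = [], 0
    while i < len(tokens):
        t = tokens[i].lower()
        if t in TENS and i + 1 < len(tokens) and tokens[i + 1].lower() in UNITS:
            out.append(str(TENS[t] + UNITS[tokens[i + 1].lower()]))
            i += 2
        elif t in TENS or t in UNITS:
            out.append(str(TENS.get(t, UNITS.get(t))))
            i += 1
        else:
            out.append(tokens[i])
            i += 1
    return out

print(normalize_numbers("he paid twenty five dollars".split()))
# -> ['he', 'paid', '25', 'dollars']
```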

  3. STT/MT Pipeline
  • An initial set of experiments ran MT on the 1-best hypothesis from STT (a toy sketch of this loose coupling follows)
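A runnable toy of the loose 1-best coupling. Each stage below is a trivial stand-in with a hypothetical name, not the actual component API: the real recognizer, boundary detector, and translator are the systems described on the following slides.

```python
# Toy 1-best STT -> SBD -> MT pipeline. All three stages are stand-ins.
def recognize(segment):
    # Stand-in STT: pretend the segment is already transcribed text.
    return segment.lower().split()

def detect_boundaries(words, max_len=6):
    # Stand-in SBD: cut the word stream into fixed-length chunks; the real
    # system uses a hidden-event LM plus silence features (slide 9).
    return [words[i:i + max_len] for i in range(0, len(words), max_len)]

def translate(sentence):
    # Stand-in MT: identity "translation" for illustration.
    return " ".join(sentence)

def pipeline(segments):
    words = [w for seg in segments for w in recognize(seg)]
    return [translate(s) for s in detect_boundaries(words)]

print(pipeline(["this is the first automatic segment",
                "and this is the second"]))
```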

  4. STT Components
  • STT-A
    • EARS RT04 Arabic BN system
    • Word pronunciations based on graphemes
    • Acoustic models estimated using Maximum Mutual Information (MMI) and Speaker Adaptive Training (SAT) on 100 hours of BN audio data
    • 3-gram language model trained on 400 million words of news text
  • STT-B
    • Uses morphological analyzer and automatic methods to infer short vowels in word pronunciations
    • Trained on an additional 50 hours of acoustic training data
  • STT-C
    • Makes use of additional language model training data

  5. MT Components
  • MT-A
    • System developed during the period Sep 2004 – Apr 2005
    • Phrase-based translation model, trained on 100M words of Arabic/English UN and news bitext
    • 3-gram English LM, trained on 2 billion words of text (mostly newswire)
    • Translation based on the posterior probability P(English | Foreign)
  • MT-B
    • Uses a combination of generative and posterior translation probabilities
    • Includes a phrase segmentation score
    • Uses a method to compensate for over-estimated translation probabilities
    • Optimizes decoding weights by minimizing TER on N-best lists (a rescoring sketch follows this list)
  [Table: TER results on the 2002 and 2004 MT Eval sets]
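The MT-B features above are combined log-linearly. Below is a minimal sketch of rescoring an N-best list under a weight vector; the feature names and values are invented for illustration, and the real system tunes the weights to minimize TER rather than fixing them by hand.

```python
# Illustrative log-linear rescoring of an N-best list. Each hypothesis
# carries feature log-scores (e.g. posterior and generative translation
# models, phrase segmentation score, LM); the names are assumptions.
def rescore(nbest, weights):
    """Pick the hypothesis maximizing the weighted sum of feature scores."""
    def score(hyp):
        return sum(weights[f] * v for f, v in hyp["features"].items())
    return max(nbest, key=score)

nbest = [
    {"text": "the council met today",
     "features": {"tm_post": -2.1, "tm_gen": -3.0, "seg": -1.0, "lm": -4.2}},
    {"text": "council met the today",
     "features": {"tm_post": -1.9, "tm_gen": -2.8, "seg": -1.2, "lm": -7.5}},
]
weights = {"tm_post": 1.0, "tm_gen": 0.5, "seg": 0.3, "lm": 0.8}
print(rescore(nbest, weights)["text"])  # -> 'the council met today'
```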

  6. Test Data
  • Tested integration on bnat05
    • 6-hour set drawn from several sources, dating from Jan 2001 and Nov 2003
    • Test set contains both Modern Standard Arabic (MSA) and Arabic dialect segments
  • All system comparisons are based on Translation Error Rate (TER)
    • MT system output is automatically scored against a single reference transcription, with mixed case (a simplified TER sketch follows this list)
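For reference, TER counts the word edits (insertions, deletions, substitutions, plus block shifts) needed to turn a hypothesis into the reference, normalized by reference length. The sketch below omits the shift operation, so it is a simplified upper bound on true TER, not the official scorer.

```python
def simple_ter(hyp, ref):
    """Word-level edit distance / reference length. Real TER also allows
    block shifts; this simplified version omits them."""
    h, r = hyp.split(), ref.split()
    # Standard Levenshtein DP over words.
    d = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):
        d[i][0] = i
    for j in range(len(r) + 1):
        d[0][j] = j
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            cost = 0 if h[i - 1] == r[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(h)][len(r)] / len(r)

print(simple_ter("the council met today", "the council will meet today"))
# 2 edits / 5 reference words = 0.4
```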

  7. Integration Results
  • Effect of STT accuracy, segmentation and punctuation on MT accuracy
  • At current MT performance level:
    • large improvements in STT accuracy result in small TER gain
    • significant TER reduction (2.7% absolute) can be obtained by improving sentence boundary detection
    • full punctuation helps translation only marginally

  8. Optimizing STT Segmentation for MT
  • Tuned the audio segmentation procedure to output segments that match the reference in terms of average length (a sketch of one such heuristic follows this list)
  • 1.6% absolute TER gain from optimizing segmentation
  • Additional gains can be obtained by:
    • converting spoken numbers to written form prior to translation (0.4–0.5% TER reduction)
    • redefining the STT output segmentation using linguistic information
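The slides say only that the pause-based segmenter was tuned so that the average segment length matches the reference; the greedy merge heuristic below is one assumed realization of that idea, not the documented procedure.

```python
# Assumed sketch of length-targeted resegmentation: segments are
# (start_sec, end_sec) pairs from a pause-based segmenter.
def merge_to_target(segments, target_avg_sec):
    """Greedily merge adjacent segments across the shortest pauses until
    the average segment duration reaches the target."""
    segs = list(segments)
    def avg(s):
        return sum(e - b for b, e in s) / len(s)
    while len(segs) > 1 and avg(segs) < target_avg_sec:
        # Find the adjacent pair separated by the shortest pause; merge it.
        i = min(range(len(segs) - 1),
                key=lambda k: segs[k + 1][0] - segs[k][1])
        segs[i:i + 2] = [(segs[i][0], segs[i + 1][1])]
    return segs

segs = [(0.0, 3.1), (3.6, 5.0), (5.2, 9.8), (10.9, 12.0)]
print(merge_to_target(segs, target_avg_sec=4.0))
# -> [(0.0, 9.8), (10.9, 12.0)]
```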

  9. Sentence Boundary Detection (SBD)
  • Used a hidden-event language model (HELM) to detect sentence boundaries in the 1-best STT output (a toy illustration follows this list)
    • 4-gram HELM, trained on 850M words of Arabic news with Kneser-Ney smoothing
    • Silence duration can be integrated as an observation into the HMM search
  • Explored various configurations:
    • SBD-1: Use only the LM to insert periods within speaker turns
    • SBD-2: Use the LM and silence duration jointly
    • SBD-3: Bias the LM to insert boundaries at a higher rate (by 30–50%), then remove the boundaries with the lowest model posteriors while constraining the maximum sentence length
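The hidden-event idea treats the boundary token as a hidden word whose insertion the LM can prefer or reject, with silence duration as an extra observation and a bias term controlling the insertion rate (as in SBD-3). The toy below decides greedily per word gap with a made-up bigram table; the real system uses a 4-gram HELM inside an HMM search rather than this greedy rule.

```python
# Toy hidden-event boundary detection. `lm_logp` stands in for a real
# n-gram LM score log P(token | history); here a tiny bigram table.
TOY_BIGRAMS = {("today", "</s>"): -0.5, ("</s>", "the"): -1.0,
               ("today", "the"): -4.0}

def lm_logp(token, prev, default=-2.0):
    return TOY_BIGRAMS.get((prev, token), default)

def detect_boundaries(words, silences, bias=0.0, sil_weight=1.0):
    """Insert '</s>' after word i if the LM score with a boundary, plus a
    silence bonus and an insertion-rate bias, beats the score without one."""
    boundaries = []
    for i in range(len(words) - 1):
        with_b = lm_logp("</s>", words[i]) + lm_logp(words[i + 1], "</s>")
        without = lm_logp(words[i + 1], words[i])
        if with_b + sil_weight * silences[i] + bias > without:
            boundaries.append(i)
    return boundaries

words = ["talks", "ended", "today", "the", "minister", "said"]
silences = [0.0, 0.1, 0.9, 0.0, 0.0]  # seconds of pause after each word
print(detect_boundaries(words, silences))  # -> [2]: boundary after "today"
```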

  10. SBD Results
  • Effect of HELM-based SBD on MT accuracy, starting from one of two audio segmentations:
    • audio-seg-1: 9.47 sec / segment
    • audio-seg-2: 13.60 sec / segment
  • HELM has a larger effect on Modern Standard Arabic (MSA) regions, where STT accuracy is high
  • SBD can be applied safely on top of any audio segmentation

  11. Optimizing MT on Speech Data
  • MT accuracy can be enhanced by optimizing MT decoding weights on broadcast speech data
    • Optimization can compensate for differences in style between newswire text and STT transcripts (especially on broadcast conversations)
  • Optimization issue:
    • MT optimization requires a one-to-one mapping between translation hypotheses and references on the tuning set
    • It is non-trivial to tune on translations of automatically segmented STT output
  • Solutions:
    • Re-segment the STT output according to the reference segmentation prior to translation, then use the translation hypotheses for tuning (a sketch of such re-segmentation follows this list)
    • Tune based on translations of the STT reference transcriptions
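The slides do not say how hypotheses are re-segmented to match the reference; a common trick is to align the two word streams and project the reference sentence boundaries through the alignment. The sketch below assumes that approach, using Python's difflib as a stand-in aligner.

```python
import difflib

# Sketch of re-segmenting STT output to match the reference segmentation.
# Assumption: reference sentence ends are projected onto the hypothesis
# through a word-level alignment (difflib here; the slides do not name
# the alignment method actually used).
def resegment(hyp_words, ref_sentences):
    ref_words = [w for s in ref_sentences for w in s]
    # Reference word index (exclusive) of each sentence end.
    ref_ends, n = [], 0
    for s in ref_sentences:
        n += len(s)
        ref_ends.append(n)
    # Map reference indices to hypothesis indices via matching blocks.
    sm = difflib.SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    ref2hyp = {}
    for blk in sm.get_matching_blocks():
        for k in range(blk.size + 1):
            ref2hyp[blk.a + k] = blk.b + k
    # Cut the hypothesis at the projected sentence-end positions.
    cuts = sorted({ref2hyp.get(e, len(hyp_words)) for e in ref_ends})
    segments, prev = [], 0
    for c in cuts:
        if c > prev:
            segments.append(hyp_words[prev:c])
            prev = c
    if prev < len(hyp_words):
        segments.append(hyp_words[prev:])
    return segments

ref = [["talks", "ended", "today"], ["the", "minister", "spoke"]]
hyp = ["talks", "ended", "today", "the", "minister", "said"]
print(resegment(hyp, ref))
# -> [['talks', 'ended', 'today'], ['the', 'minister', 'said']]
```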

  12. MT Optimization Results
  • Updated development sets
  • Results
    • MT02: tuning on translations of the 2002 NIST MT evaluation set
    • BNC-STT: tuning on translations of manually segmented (according to reference) STT output
    • BNC-REF: tuning on translations of reference transcripts

  13. Conclusions and Future Research
  • Results on 1-best STT/MT integration show that sentence boundary detection has a large impact on MT performance
    • Segmentation should be based on both audio and STT transcript
  • Better performance is expected by coupling STT and MT more tightly
    • Have begun running MT on consensus networks from STT output
    • Will explore joint optimization of STT and MT system parameters
  • At current operating point, improvements in MT will have the largest effect
