Stochastic Transductions for Machine Translation Overview

Stochastic Transductions for Machine Translation* Giuseppe Riccardi AT&T Labs-Research *Joint work with Srinivas Bangalore and Enrico Bocchieri January 26, 2000

Overview • Motivation • Stochastic Finite State Machines • Learning Machine Translation Models • Case study • MT for Human-Machine Spoken Dialog • Experiments and Results

Speech Understanding Case • Semantics: P(C), machine λC • Syntax: P(W|C), machine λL • Acoustics: P(A|W), machine λA • Decoding: max P(A,W,C), computed by the speech recognizer as min(λA ∘ λL ∘ λC) • ATIS 1994 DARPA Evaluation (G. Riccardi et al., ICASSP 1995; E. Bocchieri et al., SLT Workshop 1995; Levin et al., SLT Workshop 1995)

Motivation • Finite State Transducers (FST) • Unified formalism to represent symbolic transductions • Learnability • Automatically train transductions from (parallel) corpora • Speech-to-Speech Machine Translation chain • Combining speech and language sciences [Figure: Source Spoken Language → Target Spoken Language]

Finite State Transducers • Weighted Finite State Transducers (FST) • Algebraic Operations (λ1 + λ2, λ1 λ2, …) • Composable: λE-S = λE-J ∘ λJ-S • Minimization: min(λ1) • Stochastic transductions (λE-S : E* × S* → [0,1]) • Joint Probability Decomposition: P(X1, X2, …, XN) factored as P(X1) P(X2|X1) … P(XN|XN-1) • …and computation: λ1 ∘ λ2 ∘ … ∘ λN
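A minimal sketch of composing an English-Japanese transducer with a Japanese-Spanish one, assuming the simplest possible case: single-state machines, so each transducer reduces to a table of word-pair weights. The function name `compose`, the word pairs, and the weights are all made up for illustration; real weighted FST composition also tracks states.

```python
from collections import defaultdict

def compose(t1, t2):
    """Compose two single-state weighted transducers.

    Each transducer is a dict {(input, output): weight}. The weight of
    (a, c) in the result sums weight1(a, b) * weight2(b, c) over every
    intermediate symbol b (probability semiring).
    """
    by_input = defaultdict(list)
    for (b, c), w2 in t2.items():
        by_input[b].append((c, w2))
    result = defaultdict(float)
    for (a, b), w1 in t1.items():
        for c, w2 in by_input[b]:
            result[(a, c)] += w1 * w2
    return dict(result)

# Hypothetical toy tables (weights are illustrative, not from the talk).
eng_to_jap = {("card", "カード"): 1.0, ("call", "電話"): 0.5, ("call", "コール"): 0.5}
jap_to_spa = {("カード", "tarjeta"): 1.0, ("電話", "llamada"): 1.0, ("コール", "llamada"): 1.0}

eng_to_spa = compose(eng_to_jap, jap_to_spa)
```

Here both Japanese renderings of "call" lead to "llamada", so their weights add up in the composed English-Spanish table.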

Stochastic Transducers [Figure: example stochastic transducers; one maps the word "cool" to part-of-speech tags with weights cool,<adv>/0.3, cool,<adj>/0.3, cool,<noun>/0.2, cool,<verb>/0.2]

Learning FSTs • Data-driven learning of FSTs from large corpora. • Learnability • Finite History Context (N-gram). • Generalization • Unseen Event modeling (back-off) • Class-based n-grams. • Phrase grammar (long-distance dependency) • Context-free grammar approximation. • Large-scale transductions • Efficient state and transition function model • Variable Ngram Stochastic Machine (VNSM)

VNSM: the state and transition space • Bottom-up approach • Each state is associated with an n-tuple in the corpus • Each transition is associated with adjacent strings • Parametrization: #states ≤ #n-tuples in the corpus; #transitions ≤ #n-tuples in the corpus; #ε-transitions ≤ #(n-1)-tuples in the corpus • VNSM recognizes any W ∈ V* (V is the dictionary)

VNSM: Unseen Event Modeling (the power of amnesia/reminiscence) [Figure: predicting w4 by backing off through ε-transitions, from History = w2, w3 to History = w3 to History = “”]
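The back-off idea can be sketched as follows: when a word was never seen after the current history, follow an ε-transition to a shorter history and try again. This is a simplified illustration, not the talk's estimation procedure: the fixed `eps_weight` is an assumption, the scores are not renormalized into a proper distribution, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_counts(sentences, max_order=3):
    """Count next-word occurrences for every history of length 0..max_order-1."""
    counts = defaultdict(Counter)
    for words in sentences:
        for i, word in enumerate(words):
            for n in range(max_order):
                if i - n < 0:
                    break
                history = tuple(words[i - n:i])
                counts[history][word] += 1
    return counts

def backoff_score(counts, history, word, eps_weight=0.4):
    """Score `word` after `history`, following epsilon (back-off) transitions
    to shorter histories whenever the word was not seen after the longer one."""
    history = tuple(history)
    penalty = 1.0
    while True:
        seen = counts.get(history)
        if seen and word in seen:
            return penalty * seen[word] / sum(seen.values())
        if not history:
            return 0.0  # word never seen anywhere in the corpus
        history = history[1:]  # forget the oldest word ("amnesia")
        penalty *= eps_weight  # assumed fixed cost of one epsilon transition

# Toy corpus with placeholder words.
corpus = [["w1", "w2", "w3", "w4"], ["w0", "w3", "w5"]]
counts = train_counts(corpus)
seen_score = backoff_score(counts, ["w2", "w3"], "w4")     # found at History = w2, w3
backoff_once = backoff_score(counts, ["w2", "w3"], "w5")   # found at History = w3
backoff_twice = backoff_score(counts, ["w2", "w3"], "w1")  # found at History = ""
```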

VNSM: probability distributions • Probability Distribution over W ∈ V* • Parameter tying • Probability training

Stochastic FST Machine Translation • Decompose language translation into two independent processes. • Lexical Choice: searching the target language words • Word Reordering: searching the correct word order • Modeling the two processes as stochastic finite state transductions • Learning the transductions from bilingual corpora. [Figure: speech and language finite state transduction chain, from Source Spoken Language to Target Spoken Language]

Stochastic Machine Translation (MT) • Noisy-channel paradigm (IBM) • Stochastic Finite State Transducer Model
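In assumed notation (W_S for the source word string, W_T for the target word string; neither symbol appears in the talk), the two paradigms are usually contrasted as follows. This is a sketch of the standard formulations, not a transcription of the talk's equations.

```latex
% Noisy-channel paradigm: a target language model times a translation model
\hat{W}_T = \arg\max_{W_T} P(W_T)\, P(W_S \mid W_T)

% Stochastic finite-state transducer model: the joint probability is modeled directly
\hat{W}_T = \arg\max_{W_T} P(W_S, W_T)
```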

Learning Stochastic Transducers • Given the input-output pair training set • Align the input and output language sequences: • Estimate the joint probability via VNSMs • Local reordering • Sentence-level reordering

Pairing and Aligning (1) • Source-target language pairs • Sentence Alignment • Automatic algorithm (Alshawi, Bangalore and Douglas, 1998)
Spanish: ajá quiero usar mi tarjeta de crédito
English: yeah I wanna use my credit card
Alignment: 1 3 4 5 7 0 6

Learning SFST from Bi-language • Bi-language: each token consists of a source language word with its target language word. • Ordering of tokens: source language order or target language order
Spanish: ajá quiero usar mi tarjeta de crédito
English: yeah I wanna use my credit card
Bi-language: (ajá,yeah) (ε,I) (quiero,wanna) (usar,use) (mi,my) (tarjeta,card) (de,ε) (crédito,credit)
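A minimal sketch of building source-ordered bi-language tokens from an alignment, using the Spanish-English example from the slides. The alignment is read as one 1-based target position per source word, with 0 meaning "no counterpart". Where to place an unaligned target word (here, right after the token carrying the preceding target word) is an assumption chosen to reproduce the slide's example; the function name `make_bilanguage` is hypothetical.

```python
from collections import defaultdict

EPS = "ε"  # empty-string symbol

def make_bilanguage(source, target, alignment):
    """Return (source_word, target_word) tokens in source language order."""
    tokens = []
    owner = {}  # target position -> index of the token that carries it
    for src_word, pos in zip(source, alignment):
        if pos > 0:
            owner[pos] = len(tokens)
            tokens.append((src_word, target[pos - 1]))
        else:
            tokens.append((src_word, EPS))  # source word with no target word
    # Assumption: attach each unaligned target word after the token that
    # carries the nearest preceding aligned target word (or first, if none).
    extras = defaultdict(list)
    for pos in range(1, len(target) + 1):
        if pos in owner:
            continue
        prev = pos - 1
        while prev > 0 and prev not in owner:
            prev -= 1
        anchor = owner[prev] if prev > 0 else -1
        extras[anchor].append((EPS, target[pos - 1]))
    bilanguage = list(extras[-1])
    for index, token in enumerate(tokens):
        bilanguage.append(token)
        bilanguage.extend(extras[index])
    return bilanguage

spanish = "ajá quiero usar mi tarjeta de crédito".split()
english = "yeah I wanna use my credit card".split()
pairs = make_bilanguage(spanish, english, [1, 3, 4, 5, 7, 0, 6])
```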

Learning Bilingual Phrases • Effective translation of text chunks (e.g. collocations) • Learn bilingual phrases • Joint entropy minimization on bi-language corpus • Weighted Mutual Information to rank bilingual phrases • Phrase-based VNST • Local Reordering of phrases • Example: una llamada de larga distancia → (VNST) → a call long distance → (Local Reordering) → a long distance call

Local Reordering • Spanish Reordered Phrase = min(S ∘ TLM) • Word permutation machine: expensive • S is the “sausage” machine • TLM is the target language model
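A rough sketch of local reordering as "pick the word order the target language model likes best". It swaps in brute-force enumeration of permutations for the sausage machine and a hand-written bigram table for the target language model, so it only illustrates the idea on a short phrase. Maximizing log-probability here plays the role of minimizing path cost; the bigram values and function names are invented for illustration.

```python
from itertools import permutations
import math

def bigram_logprob(words, bigram_probs, floor=1e-6):
    """Log-probability of a word sequence under a bigram table.

    Unseen bigrams receive a small floor probability (an assumption)."""
    padded = ["<s>"] + list(words) + ["</s>"]
    return sum(math.log(bigram_probs.get(pair, floor))
               for pair in zip(padded, padded[1:]))

def reorder(words, bigram_probs):
    """Return the permutation of `words` with the highest language model score."""
    return max(permutations(words),
               key=lambda candidate: bigram_logprob(candidate, bigram_probs))

# Hypothetical toy target language model.
toy_lm = {
    ("<s>", "a"): 0.9,
    ("a", "long"): 0.5,
    ("a", "call"): 0.3,
    ("long", "distance"): 0.9,
    ("distance", "call"): 0.8,
    ("call", "</s>"): 0.7,
}

best_order = reorder("a call long distance".split(), toy_lm)
```

Enumerating permutations grows factorially with phrase length, which is one way to see why a full word permutation machine is expensive.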

Lexical Reordering • Output of the lexical choice transducer: sequence of target language phrases. • [like to make] [I'd] [call] [a calling card] [please] • Words in phrases are in target language word order. • However, phrases need to be reordered in target language word order. • Reordered: I'd like to make a calling card call please

Lexical Reordering Models • Tree-based model • Impose a tree structure on a sentence (Alshawi et al., ACL 98) • English: I'd like to charge this to my home phone

Lexical Reordering Models • Reordering using tree-local reordering rules. [Figure: dependency trees before and after reordering, over the nodes 私は (I), これを (this), 私の (my), 家の (home), 電話に (phone), チャージ (charge), したいのです (like)]
Eng-Jap: 私は したいのです チャージ これを 私の 家の 電話に
Japanese: 私は これを 私の 家の 電話に チャージ したいのです

Lexical Reordering Models (contd.) • Dependency tree represented as a bracketed string (bounded) with reordering instructions, e.g. ε:[ したいのです:したいのです ε:-1 ε:[ チャージ:チャージ ε:] ε:] • Training VNSTs from bracketed corpus • Output of lexical reordering VNST: strings with reordering instructions. • Instructions are composed with “interpreter” FST to form target language sentence.

Tree Reordering • Sentence-level reordering • Mapping sentence tree structures
English: my card credit (Spanish order)
English: my credit card (English order)
[Figure: chain of Transduction 1, Transduction 2 (alignment statistics) and Transduction 3; the head "card" with dependents "my" and "credit" carries position labels -1, +1 at one stage and -2, -1 at the next]

ASR-based Speech Translation [Figure: block diagram with the components Acoustic Model Training, Alignment, VNST Learning, Lexicon FSM, Bi-Phrase Learning, Speech Recognizer, Tree Reordering]

MT Evaluation • Lexical Accuracy (LA) • Bag of words. • Translation Accuracy (TA) • Based on string alignment • Application-driven evaluation • “How May I Help You?” • Spoken dialog for call routing • Classification based on salient phrase detection

Automated Services and Customer Care via Natural Spoken Dialog • Prompt is “AT&T. How may I help you?” • User responds with unconstrained fluent speech • Spoken Dialog System for call routing [Figure: routing destinations including HELP, DA, rate, area code, billing credit]

Examples • Yes I like to make this long distance call area code x x x x x x x x x x • Yeah I need the area code for rockmart georgia • Yeah I’m wondering if you could place this call for me I can’t seem to dial it it don’t seem to want to go through for me

Call-Classification Performance • False Rejection Rate: probability of rejecting a call, given that the call-type is one of the 14 call-types in the set. • Probability Correct: probability of correctly classifying a call, given that the call is not rejected.
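The two definitions translate directly into counts. The sketch below assumes every test call belongs to one of the supported call-types and that a rejected call is recorded as `None`; the function name and the sample data are hypothetical.

```python
def call_classification_metrics(results):
    """Compute (false rejection rate, probability correct).

    `results` is a list of (true_call_type, predicted_call_type) pairs,
    where a prediction of None means the call was rejected.
    """
    total = len(results)
    rejected = sum(1 for _, predicted in results if predicted is None)
    accepted = [(true, predicted) for true, predicted in results
                if predicted is not None]
    correct = sum(1 for true, predicted in accepted if true == predicted)
    false_rejection_rate = rejected / total if total else 0.0
    probability_correct = correct / len(accepted) if accepted else 0.0
    return false_rejection_rate, probability_correct

# Made-up outcomes for four calls.
sample = [
    ("billing credit", "billing credit"),
    ("area code", None),   # rejected
    ("rate", "DA"),        # accepted but misclassified
    ("DA", "DA"),
]
frr, p_correct = call_classification_metrics(sample)
```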

MT evaluation on HMIHY

DEMO

Conclusion • Stochastic Finite State based approach is viable and effective for limited domain MT. • Finite-state model chain for complex speech and language constraints. • Multilingual speech application enabled by MT • Coupling of ASR and MT http://www.research.att.com/~srini/Projects/Anuvaad/home.html

Biblio
- J. Berstel, "Transductions and Context Free Languages", Teubner Studienbücher.
- G. Riccardi, R. Pieraccini and E. Bocchieri, "Stochastic Automata for Language Modeling", Computer Speech and Language, 10, pp. 265-293, 1996.
- F. C. N. Pereira and M. Riley, "Speech Recognition by Composition of Weighted Finite Automata", in Finite-State Language Processing, MIT Press, Cambridge, Massachusetts, 1997.
- S. Bangalore and G. Riccardi, "Stochastic Finite-State Models for Spoken Language Machine Translation", Workshop on Embedded Machine Translation Systems, NAACL, pp. 52-59, Seattle, May 2000.
More references at http://www.research.att.com/info/dsp3

Stochastic Finite State Models: from concepts to speech • 1993 • Variable Ngram Stochastic Automata (VNSA) • Concept Modeling for NLU • Word Sequence Modeling for ASR • Phonotactic Transducers (context-to-phone) • Tree-structured Transducers (phone-to-word) • Stochastic-FSM based ASR (context-to-concept) • ATIS Evaluation: it actually worked! • 1994

Why did it work? • Symbolic representation (SFSM) for probabilistic sequence modeling (words, concepts, ...). • Learning algorithms • Cascade (phrase grammar -> {phrases, word classes} -> words) • Machine Combination • Context-to-Phone, Phone-to-Word, Context-to-Grammar (CLG) • Decoding is very simple and fast (Viterbi and Beam-Width Search)

Multilingual Speech Processing • Finite state chains allow for: • Speech and Language coupling (e.g. prosody, recognition errors) • Integrated multilingual processing [Figure: “Tarjeta de credito” and “Credit card” feeding an ASR-MT Engine that outputs “Credit card”]

Speech Translation • Previous approaches to Speech Translation • Source language ASR • Translation Model • Finite-state Model based Speech Translation • Source Language Acoustic Model • Lexical Choice Model • Lexical Reordering Model

Learning the state space and state transition function (revised) • For each suffix in the corpus, we create two states (one for string recognition and the other for the back-off epsilon transition). • The size of the automaton is still linear in the corpus size. • The stochastic automaton is able to compute word probability for all strings in X*.

Word Prediction: …… the President of United ??? …… • State: History(4) = “the president of United”, PrevClass = Adj, PrevPrevClass = Function Word, Trigger(10) = “Elections” • Candidate transitions: States/p1, Airlines/p2, and a third arc with probability p3 • State, Transition, Probability → Stochastic Finite State Automata/Transducers [Figure: automaton fragment for this state, with the trigger word “elections”]

Learning Lexical Choice Models • English utterances recorded from customer calls. • Manually translated into Japanese/Spanish. • “Bunsetsu”-like tokenization for Japanese. • Alignment
English: I'd like to charge this to my home phone
Japanese: 私は これを 私の 家の 電話に チャージ したいのです
Alignment: 1 7 0 6 2 0 3 4 5
• Bilanguage: I'd_私は like_したいのです to_ε charge_チャージ this_これを to_ε my_私の home_家の phone_電話に

Learning Bilingual Stochastic Transducers • Learn stochastic transducers from bilanguage (Embedded MT 2000) • Learn automatically bilingual phrases • Reordering within phrases. • Example bilingual phrases:
エイティーアンドティー : A T and T
私の家の電話に : to my home phone
私はコレクトコールをかける必要があります : I need to make a collect call
tarjeta de credito : credit card
una llamada de larga distancia : a long distance call

Lexical Choice Transducer • Language Model: N-gram model built on phrase-chunked bilanguage. • A combination of phrases and words maximizes predictive power and minimizes the number of parameters. • The resulting finite-state automaton on the bilanguage vocabulary is converted into a finite-state transducer.

Lexical Reordering Models (contd.) • Dependency tree represented as a bracketed string with reordering instructions, e.g. ε:[ したいのです:したいのです ε:-1 ε:[ チャージ:チャージ ε:] ε:] • Lexical reordering FST: result of training a stochastic finite-state transducer on the corpus of bracketed strings. • Output of lexical reordering FST: strings with reordering instructions. • Instructions are interpreted to form the target language sentence.

Translation using stochastic FSTs • Sequence of finite-state transductions [Figure: dependency trees before and after reordering, over the nodes 私は (I), これを (this), 私の (my), 家の (home), 電話に (phone), チャージ (charge), したいのです (like)]
English: I’d like to charge this to my home phone
Eng-Jap: 私は したいのです チャージ これを 私の 家の 電話に
Japanese: 私は これを 私の 家の 電話に チャージ したいのです

Spoken Language Corpora • Prompt: How may I help you? • Examples:
Yeah I need the area code for rockmart georgia
Yes I'd like to make this long distance call area code x x x x x x x x x x
Yeah I'm wondering if you could place this call for me I can't seem to dial it it don't seem to want to go through for me
• Parallel corpora: English, Japanese, Spanish.

Evaluation Metric • Evaluation metric for MT is a complex issue. • String edit distance between reference string and result string (reference length in words: R) • Insertions (I) • Deletions (D) • Substitutions (S) • Moves = pairs of Deletions and Insertions (M) • Remaining Insertions (I') and Deletions (D') • Simple String Accuracy = 1 – (I + D + S) / R • Generation String Accuracy = 1 – (M + I' + D' + S) / R
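A minimal sketch of Simple String Accuracy, assuming a standard word-level edit distance with unit costs for insertions, deletions and substitutions (the talk does not specify the costs). It does not compute moves, so Generation String Accuracy is out of scope here; the function names are hypothetical.

```python
def edit_counts(reference, result):
    """Return (insertions, deletions, substitutions) of one minimum-cost
    word-level alignment between `reference` and `result`."""
    n, m = len(reference), len(result)
    # cost[i][j] = (total, I, D, S) for reference[:i] versus result[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # delete every reference word
    for j in range(1, m + 1):
        cost[0][j] = (j, j, 0, 0)  # insert every result word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            total, ins, dele, sub = cost[i - 1][j - 1]
            if reference[i - 1] == result[j - 1]:
                best = (total, ins, dele, sub)          # match
            else:
                best = (total + 1, ins, dele, sub + 1)  # substitution
            total, ins, dele, sub = cost[i - 1][j]
            best = min(best, (total + 1, ins, dele + 1, sub))  # deletion
            total, ins, dele, sub = cost[i][j - 1]
            best = min(best, (total + 1, ins + 1, dele, sub))  # insertion
            cost[i][j] = best
    return cost[n][m][1:]

def simple_string_accuracy(reference, result):
    """1 - (I + D + S) / R, with R the reference length in words."""
    insertions, deletions, substitutions = edit_counts(reference, result)
    return 1 - (insertions + deletions + substitutions) / len(reference)

reference = "I'd like to make a calling card call please".split()
hypothesis = "I'd like to make call a calling card please".split()
accuracy = simple_string_accuracy(reference, hypothesis)
```

In this example the misplaced word "call" costs one deletion plus one insertion, so the simple accuracy is 1 – 2/9. Treating that pair as a single move, as the Generation String Accuracy definition does, would give 1 – 1/9.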
