
Statistical Phrase Alignment Model Using Dependency Relation Probability


Presentation Transcript


  1. Statistical Phrase Alignment Model Using Dependency Relation Probability. Toshiaki Nakazawa and Sadao Kurohashi, Kyoto University

  2. Outline • Background • Tree-based Statistical Phrase Alignment Model • Model Training • Experiments • Conclusions

  3. Conventional Word Sequence Alignment • [word alignment figure: the Japanese sentence 受 (accept) 光 (light) 素子 (device) に (ni) は (ha) フォト (photo) ゲート (gate) を (wo) 用いた (used) is aligned word-by-word to the English sentence "A photogate is used for the photodetector"]

  4. grow-diag-final-and

  5. Conventional Word Sequence Alignment vs. Proposed Model • [side-by-side figure: the same Japanese-English sentence pair aligned as flat word sequences (left) and on dependency trees (right)]

  6. Proposed Model • Dependency trees • Phrase alignment • Bi-directional agreement • [alignment figure for the example sentence pair]
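
The bi-directional agreement point above can be illustrated with a small sketch: an alignment candidate is scored under both directed models (source-to-target and target-to-source) and the two scores are combined, so that links supported by both directions are preferred. Everything below (function name, probability tables, values) is a hypothetical illustration, not the paper's implementation.

```python
def agreement_score(alignment, p_src2tgt, p_tgt2src):
    """Combine the two directed model scores for one candidate phrase alignment."""
    score = 1.0
    for (src_phrase, tgt_phrase) in alignment:
        # Multiply the phrase translation probabilities of both directions;
        # a small floor stands in for unseen pairs.
        score *= p_src2tgt.get((src_phrase, tgt_phrase), 1e-10)
        score *= p_tgt2src.get((tgt_phrase, src_phrase), 1e-10)
    return score

# Hypothetical example values
p_src2tgt = {("フォト ゲート", "photogate"): 0.8}
p_tgt2src = {("photogate", "フォト ゲート"): 0.7}
alignment = [("フォト ゲート", "photogate")]
print(agreement_score(alignment, p_src2tgt, p_tgt2src))  # ≈ 0.56
```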

  7. grow-diag-final-and vs. Proposed model • [comparison figure of the two alignments for the example sentence pair]

  8. Related Work • Using tree structures • [Cherry and Lin, 2003], [Quirk et al., 2005], [Galley et al., 2006], ITG, … • Considering phrase alignment • [Zhang and Vogel, 2005], [Ion et al., 2006], … • Using two directed models simultaneously • [Liang et al., 2006], [Graca et al., 2008], …

  9. Tree-based Statistical Phrase Alignment Model

  10. Dependency Analysis of Sentences • [figure: dependency trees of the source (Japanese) sentence 受光素子にはフォトゲートを用いた and the target (English) sentence "A photogate is used for the photodetector"; word order runs along each tree and the head node of each sentence is marked]

  11. Overview of the Proposed Model (in comparison to the IBM models) • IBM models find the best alignment A between a source sentence F and a target sentence E by maximizing P(F, A | E), which factors into a lexical probability (word translation) and an alignment probability (word reordering) • Proposed model: the lexical probability is replaced by a phrase translation probability, and the alignment probability by a dependency relation probability
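
A hedged LaTeX reconstruction of the contrast sketched above, using generic IBM-style notation; the exact parameterization in the paper may differ.

```latex
% IBM-style word alignment: lexical (word translation) times alignment (reordering) terms
\hat{A} = \operatorname*{argmax}_{A} P(F, A \mid E)
        \approx \operatorname*{argmax}_{A} \prod_{j}
          \underbrace{t\!\left(f_j \mid e_{a_j}\right)}_{\text{lexical prob.}}\;
          \underbrace{d\!\left(a_j \mid j, \ldots\right)}_{\text{alignment prob.}}

% Proposed model: phrase translation terms over phrases F_i and
% dependency relation terms over parent-child pairs (f_c, f_p)
\hat{A} \approx \operatorname*{argmax}_{A}
          \underbrace{\prod_{i} P_t\!\left(F_i \mid E_{A_i}\right)}_{\text{phrase translation}}\;
          \underbrace{\prod_{(f_c,\,f_p)} p_{rel}\!\left(\mathit{rel}(f_c, f_p)\right)}_{\text{dependency relation}}
```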

  12. Phrase Translation Probability

  13. Phrase Translation Probability (cf. IBM Model) • Note that the sentences are not previously segmented into phrases • [figure: source words f1 ... f5 are grouped into source phrases F1 ... F3 by a segmentation function s(j), with s(1)=1, s(2)=2, s(3)=2, s(4)=3, s(5)=1; target words e1 ... e4 form target phrases E1 ... E3; the phrase alignment A maps source phrases to target phrases, with A1=2, A2=3, A3=0]
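
A minimal sketch of the segmentation and phrase-alignment bookkeeping on this slide; the names s and A follow the slide, everything else (word labels, data layout, the reading of A=0 as NULL) is a hypothetical illustration.

```python
# Source words f1..f5 and target words e1..e4 (indices are 1-based as on the slide).
f_words = ["f1", "f2", "f3", "f4", "f5"]

# s(j): which source phrase the j-th source word belongs to (values from the slide).
s = {1: 1, 2: 2, 3: 2, 4: 3, 5: 1}

# A: which target phrase each source phrase is aligned to (0 read here as NULL).
A = {1: 2, 2: 3, 3: 0}

# Group source words into phrases F1..F3 according to s(j).
phrases = {}
for j, word in enumerate(f_words, start=1):
    phrases.setdefault(s[j], []).append(word)

for i, words in sorted(phrases.items()):
    target = f"E{A[i]}" if A[i] != 0 else "NULL"
    print(f"F{i} = {words} -> {target}")
# F1 = ['f1', 'f5'] -> E2
# F2 = ['f2', 'f3'] -> E3
# F3 = ['f4'] -> NULL
```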

  14. Dependency Relation Probability

  15. Dependency Relations • [figure: for a source parent-child pair (fp, fc), the relation rel(fc, fp) is determined by how their aligned target nodes EAs(p) and EAs(c) are related in the target tree: parent-child (rel(fc, fp) = c), inverted parent-child (rel(fc, fp) = p), grandparent-child (rel(fc, fp) = c;c), and NULL parent-child (rel(fc, fp) = NULL_p)]
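
A minimal, hypothetical sketch of how a relation label of this kind could be computed from the target-side tree for a source parent-child pair; the label set follows the slide, while the tree representation, helper names, and the example data are assumptions.

```python
def rel(fc, fp, align, parent):
    """Relation label for a source parent-child pair (fp, fc).

    align  -- maps a source node to its aligned target node (or None)
    parent -- maps a target node to its parent in the target dependency tree
    Labels follow the slide: "c" (parent-child), "p" (inverted parent-child),
    "c;c" (grandparent-child), "NULL_p" (NULL-aligned case, simplified here).
    """
    e_c, e_p = align.get(fc), align.get(fp)
    if e_p is None or e_c is None:
        return "NULL_p"                          # NULL-aligned case (simplified)
    if parent.get(e_c) == e_p:
        return "c"                               # target child directly under target parent
    if parent.get(e_p) == e_c:
        return "p"                               # inverted parent-child
    if parent.get(parent.get(e_c)) == e_p:
        return "c;c"                             # grandparent-child
    return "other"                               # longer paths, not shown on the slide

# Hypothetical example: a small target tree in which e_c sits two levels below e_p.
align = {"ゲート": "photogate", "用いた": "used"}
parent = {"photogate": "is", "is": "used"}
print(rel("ゲート", "用いた", align, parent))     # "c;c" (grandparent-child)
```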

  16. Dependency Relation Probability • [formula: the dependency relation probability is the product, over all parent-child pairs (fc, fp) in Ds-pc, of the probability of the relation rel(fc, fp)] • Ds-pc is a set of parent-child word pairs in the source sentence • Source-side dependency relation probability is defined in the same manner
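
A hedged LaTeX reconstruction of the product form implied by the bullets above; the notation is assumed, not copied from the paper.

```latex
P_{rel}(A \mid F, E) \;=\; \prod_{(f_c,\, f_p)\,\in\, D_{s\text{-}pc}}
    p_{rel}\!\left( \mathit{rel}(f_c, f_p) \right)
```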

  17. Model Training

  18. Model Training • Step 1 (word base): Estimate word translation prob. (IBM Model 1); initialize dependency relation prob. • Step 2 (tree base): Estimate phrase translation prob. and dependency relation prob. • E-step: create initial alignment, modify the alignment by hill-climbing, generate possible phrases • M-step: parameter estimation • [example values from the slide: p(コロラド|Colorado)=0.7, p(大学|university)=0.6, ...; p(c)=0.4, p(c;c)=0.3, p(p)=0.2, ...; p(コロラド 大学|university of Colorado)=0.9, ...]
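
As a concrete illustration of the M-step, the sketch below re-estimates the dependency relation probabilities as relative frequencies of the relation labels observed in the E-step alignments; the observation list is a hypothetical example.

```python
from collections import Counter

def estimate_rel_prob(relation_observations):
    """M-step style re-estimation: relative frequencies of relation labels
    observed in the current (E-step) alignments."""
    counts = Counter(relation_observations)
    total = sum(counts.values())
    return {r: c / total for r, c in counts.items()}

# Hypothetical observations collected from one E-step over the corpus.
observations = ["c", "c", "c", "c;c", "p", "NULL_p", "c", "c;c"]
print(estimate_rel_prob(observations))
# {'c': 0.5, 'c;c': 0.25, 'p': 0.125, 'NULL_p': 0.125}
```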

  19. Step 2 (E-step): Example of Hill-climbing • Initial alignment is greedily created • Modify the initial alignment with the operations: Swap, Reject, Add, Extend • [figure: successive alignment snapshots of the example sentence pair showing the initial alignment and each operation]
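
A minimal, self-contained sketch of the hill-climbing loop (keep applying the best-scoring modification until nothing improves); the toy score function and neighbor generator stand in for the model score and the Swap / Reject / Add / Extend operations and are not the paper's implementation.

```python
def hill_climb(alignment, neighbors, score):
    """Repeatedly move to the best-scoring neighbor until no improvement."""
    current, current_score = alignment, score(alignment)
    while True:
        best, best_score = current, current_score
        for cand in neighbors(current):          # e.g. Swap / Reject / Add / Extend
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
        if best is current:                      # local optimum reached
            return current
        current, current_score = best, best_score

# Toy example: an "alignment" is a set of (source, target) index pairs; the score
# rewards links in a hypothetical gold set, and neighbors add or reject one link.
GOLD = {(0, 0), (1, 2), (2, 1)}
score = lambda a: len(a & GOLD) - 0.5 * len(a - GOLD)

def neighbors(a):
    for link in [(0, 0), (1, 2), (2, 1), (3, 3)]:
        yield a | {link}       # Add
        yield a - {link}       # Reject
    # Swap and Extend would be defined analogously on real phrase alignments.

print(hill_climb(frozenset(), neighbors, score))  # frozenset({(0, 0), (1, 2), (2, 1)})
```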

  20. Generate Possible Phrases • Generate new possible phrases by merging the NULL-aligned nodes into their parent or child non-NULL-aligned nodes • The new possible phrases are taken into consideration from the next iteration • [figure: the example sentence pair with a NULL-aligned node merged into its neighbor]
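
A minimal sketch of the merging idea, assuming a simple parent-array representation of the source dependency tree; the function name, tree format, and example data are hypothetical.

```python
def generate_possible_phrases(parent, aligned, phrases):
    """Merge each NULL-aligned node into its parent (if that parent is aligned)
    to create new candidate phrases for the next iteration.

    parent[i]  -- index of node i's parent (-1 for the root)
    aligned    -- set of node indices that currently have a non-NULL alignment
    phrases    -- current set of candidate phrases, each a frozenset of node indices
    """
    new_phrases = set(phrases)
    for node in range(len(parent)):
        if node in aligned:
            continue                       # only NULL-aligned nodes are merged
        p = parent[node]
        if p != -1 and p in aligned:       # merge upward into an aligned parent
            new_phrases.add(frozenset({node, p}))
        # Merging downward into an aligned child would be handled analogously.
    return new_phrases

# Toy tree: node 3 is the root; node 1 depends on it and is NULL-aligned.
parent = [1, 3, 3, -1]
aligned = {0, 2, 3}
print(generate_possible_phrases(parent, aligned, set()))
# {frozenset({1, 3})}  -> a new candidate phrase covering nodes 1 and 3
```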

  21. Model Training (overview repeated from slide 18: Step 1 word-base estimation of word translation prob. and initialization of dependency relation prob.; Step 2 tree-base E-step / M-step estimation of phrase translation prob. and dependency relation prob.)

  22. Experiments

  23. Alignment Experiments • Training: JST Ja-En paper abstract corpus (1M sentences, Ja: 36.4M words, En: 83.6M words) • Test: 475 sentences with gold-standard alignments annotated by hand • Parsers: KNP for Japanese, MSTParser for English • Evaluation criteria: Precision, Recall, F1 • For the proposed model, we ran 5 iterations in each step
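
For reference, a minimal sketch of the evaluation criteria on alignment links, assuming both the system output and the gold standard are sets of word-pair links (the actual evaluation in the paper may treat sure/possible links differently).

```python
def precision_recall_f1(system, gold):
    """Precision, recall, and F1 of predicted alignment links against gold links."""
    tp = len(system & gold)
    precision = tp / len(system) if system else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# Hypothetical example
system = {(0, 0), (1, 2), (2, 1), (3, 3)}
gold = {(0, 0), (1, 2), (2, 1)}
print(precision_recall_f1(system, gold))  # (0.75, 1.0, ~0.857)
```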

  24. Experimental Results • [results table: the proposed model improves over the baseline by +1.7]

  25. Effectiveness of Phrase and Tree • Ablation: positional relations (e.g. -1, +1 word offsets) are used instead of dependency relations (c, p) • [results figure]

  26. Discussions • Parsing errors • Parsing accuracy is basically good, but the parsers still sometimes produce incorrect parses • Possible remedy: incorporate parsing probability into the model • Search errors • Hill-climbing sometimes falls into local optima • Possible remedy: random restarts • Function words • Behave quite differently in different languages (e.g. case markers in Japanese, articles in English) • Possible remedy: post-processing

  27. Post-processing for Function Words • Reject correspondences between Japanese particles and English "be" or "have" • Reject correspondences of English articles • Japanese "する" and "れる", or English "be" and "have", are merged into their parent verb or adjective if they are NULL-aligned • [results: improvements of +6.2 and +0.3]
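
A minimal, hypothetical sketch of rule-based post-processing of this kind, covering only the two rejection rules (the merging rule for する/れる/be/have is not shown); the particle and auxiliary lists and the data layout are assumptions made for illustration.

```python
JA_PARTICLES = {"は", "が", "を", "に", "で"}                    # illustrative subset
EN_ARTICLES = {"a", "an", "the"}
EN_AUX = {"be", "is", "are", "was", "were", "have", "has", "had"}

def postprocess(links, ja_words, en_words):
    """Drop links pairing Japanese particles with English be/have,
    and any link involving an English article."""
    kept = []
    for (j, e) in links:
        if ja_words[j] in JA_PARTICLES and en_words[e] in EN_AUX:
            continue   # rule 1: particle vs. be/have
        if en_words[e] in EN_ARTICLES:
            continue   # rule 2: English articles
        kept.append((j, e))
    return kept

# Hypothetical example
ja = ["フォト", "ゲート", "を", "用いた"]
en = ["a", "photogate", "is", "used"]
links = [(0, 1), (1, 1), (2, 2), (3, 3), (3, 0)]
print(postprocess(links, ja, en))   # [(0, 1), (1, 1), (3, 3)]
```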

  28. Conclusion and Future Work • Linguistically motivated phrase alignment • Dependency trees • Phrase alignment • Bi-directional agreement • Significantly better results compared to conventional word alignment models • Future work: • Apply the proposed model to other language pairs (Japanese-Chinese and so on) • Incorporate parsing probability into our model • Investigate the contribution of our alignment results to translation quality
