
A Syntax-Driven Bracketing Model for Phrase-Based Translation

A Syntax-Driven Bracketing Model for Phrase-Based Translation. Deyi Xiong, et al. ACL 2009.


Presentation Transcript


  1. A Syntax-Driven Bracketing Model for Phrase-Based Translation Deyi Xiong, et al. ACL 2009

  2. Introduction • Machine translation, Chinese to English • Chinese source: 把 7月 11日 設立 為 航海 節 • An ideal case: 把 7月 11日 設立 為 航海 節 → "to establish July 11 as Sailing Festival day"

  3. Wrong Linguistic Structure • 航海 節 is a syntactic constituent, but the output splits it into "navigation" and "knots": 把 7月 11日 設立 為 航海 節 → "to set up for navigation on July 11 knots"

  4. A Naive Solution • Employ syntactic constraints • Fully respect linguistic structures

  5. A Naive Solution (2) • Unfortunately, this hurts translation performance • Non-syntactic translations are sometimes useful: 把 今天 設立 為 航海 節 → "establish today as Sailing Festival day"

  6. Syntax-Driven Bracketing Model • SDB model • The translation unit matters more than whether it is syntactic or non-syntactic • Includes, but is not limited to, constituent matching/violation • Preserves the strength of the phrase-based system

  7. Translation Unit • A bracketable source phrase and its corresponding translation • Bracketable • A source phrase is bracketable if its translation is contiguous • A pair of neighboring phrases is bracketable if their translations are contiguous once combined

  8. Translation Unit Examples • Bracketable 把 今天 設立 為 把 今天 設立 為 establish today as establish today as • 把 今天 設立 and 為 are bracketable • 把 今天 設立 為 is bracketable

  9. Translation Unit Examples • Unbracketable 把 今天 設立 為 establish today as • 設立 and 為 are unbracketable • 設立 為 is unbracketable
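The bracketable/unbracketable distinction on slides 7 to 9 can be sketched as a check over a word alignment. This is a minimal sketch, not the authors' code: it assumes that "contiguous translation" means the target words aligned to the source span form a block that no outside source word aligns into (the usual phrase-extraction consistency test). The alignment `ALIGNMENT` and the function name `is_bracketable` are hypothetical.

```python
# Hypothetical word alignment as (source_index, target_index) pairs for
#   source: 把(0) 今天(1) 設立(2) 為(3)
#   target: establish(0) today(1) as(2)
# 把 is left unaligned; this alignment is an assumption, not read off the slides.
ALIGNMENT = [(1, 1), (2, 0), (3, 2)]


def is_bracketable(alignment, i, j):
    """Return True if source words i..j (inclusive) map to a contiguous target block."""
    targets = [t for s, t in alignment if i <= s <= j]
    if not targets:
        # Assumption: a span with no aligned words is treated as unbracketable.
        return False
    lo, hi = min(targets), max(targets)
    # Every alignment link landing inside the target block must come from the span.
    return all(i <= s <= j for s, t in alignment if lo <= t <= hi)


print(is_bracketable(ALIGNMENT, 0, 2))  # 把 今天 設立
print(is_bracketable(ALIGNMENT, 3, 3))  # 為
print(is_bracketable(ALIGNMENT, 2, 3))  # 設立 為
```

Under this alignment, 設立 為 fails the check because "today", which sits between "establish" and "as", is aligned to a word outside the span.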

  10. Bracketing Instances Extraction • Extract bracketable and unbracketable instances from training data • Aligned sentence pair + parsed source sentence • Estimate whether a source phrase is bracketable at run time
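As a rough illustration of the extraction step, the sketch below enumerates source spans of one aligned sentence pair and labels each as bracketable or not. The slides do not give the exact procedure, so the span-length limit `max_len`, the function names, and the alignment-consistency test used as the label are all assumptions; the parse-tree features that would accompany each instance are omitted.

```python
def is_bracketable(alignment, i, j):
    """Assumed labeling rule: the span's aligned target words form a closed block."""
    targets = [t for s, t in alignment if i <= s <= j]
    if not targets:
        return False
    lo, hi = min(targets), max(targets)
    return all(i <= s <= j for s, t in alignment if lo <= t <= hi)


def extract_instances(source, alignment, max_len=4):
    """Return (phrase, label) pairs for every source span of up to max_len words."""
    instances = []
    for i in range(len(source)):
        for j in range(i, min(i + max_len, len(source))):
            phrase = " ".join(source[i:j + 1])
            instances.append((phrase, is_bracketable(alignment, i, j)))
    return instances


# Hypothetical sentence pair: 把 今天 設立 為 / establish today as
source = ["把", "今天", "設立", "為"]
alignment = [(1, 1), (2, 0), (3, 2)]  # (source_index, target_index), assumed
for phrase, label in extract_instances(source, alignment):
    print(phrase, "bracketable" if label else "unbracketable")
```

Labeled instances of this kind are what a classifier such as a maximum entropy model would be trained on.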

  11. SDB Features

  12. Rule Features • Rule Features (RF) • CFG rule • Horizontal context

  13. Rule Features (2) • S1: ADVP → AD • S2: VP → VV AS NP • S: VP → ADVP VP

  14. Path Features • Path features (PF) • Path to roots • S1 to the root of S • S2 to the root of S • S to the root of this tree • Vertical context

  15. Path Features (2) S1: ADVP VP S2: VP VP S: VP IP
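The rule and path features on slides 12 to 15 can be read off a parse tree with parent pointers. The sketch below is illustrative only: the `Node` class, the function names, and the tree shape (an IP root with an extra NP child) are assumptions chosen so that the output reproduces the labels listed on the slides.

```python
class Node:
    """Minimal parse-tree node with a parent pointer (hypothetical helper)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def rule_feature(node):
    """CFG rule rooted at the node: horizontal context."""
    return f"{node.label} -> {' '.join(c.label for c in node.children)}"


def path_feature(node, ancestor):
    """Labels on the path from node up to ancestor: vertical context."""
    labels = [node.label]
    while node is not ancestor:
        if node.parent is None:
            raise ValueError("ancestor is not above node")
        node = node.parent
        labels.append(node.label)
    return " ".join(labels)


# Assumed tree: IP -> NP VP, VP -> ADVP VP, ADVP -> AD, VP -> VV AS NP
s1 = Node("ADVP", [Node("AD")])
s2 = Node("VP", [Node("VV"), Node("AS"), Node("NP")])
s = Node("VP", [s1, s2])
root = Node("IP", [Node("NP"), s])

print(rule_feature(s1), "|", rule_feature(s2), "|", rule_feature(s))
print(path_feature(s1, s), "|", path_feature(s2, s), "|", path_feature(s, root))
```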

  16. Constituent Boundary Matching Features • Constituent Boundary Matching Features (CBMF) • Exact match • Source phrase covers the boundaries of its tree • Inside match • Source phrase covers a sequence of its tree • Crossing match • Source phrase crosses the subtree of its tree

  17. Constituent Boundary Matching Features (3) Exact match Inside match Crossing match
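One way to read the three match types is as a classification of a source span against the constituent spans of the parse tree. The sketch below is a simplification under stated assumptions: constituents are given as a set of inclusive (start, end) word spans, "crossing" means the phrase partially overlaps some constituent, and everything else that is not an exact match counts as "inside". The function name and the example spans are hypothetical.

```python
def classify_span(span, constituents):
    """Classify a source span as an exact, inside, or crossing match."""
    i, j = span
    if span in constituents:
        return "exact"
    for a, b in constituents:
        # Partial overlap: the two spans share words, but neither contains the other.
        if (a < i <= b < j) or (i < a <= j < b):
            return "crossing"
    return "inside"


# Assumed constituent spans for a 5-word sentence tagged NP ADVP VV AS NP:
# IP(0,4), NP(0,0), VP(1,4), ADVP(1,1), VP(2,4), plus single-word spans.
constituents = {(0, 4), (0, 0), (1, 4), (1, 1), (2, 4), (2, 2), (3, 3), (4, 4)}

print(classify_span((2, 4), constituents))  # matches the inner VP
print(classify_span((2, 3), constituents))  # VV AS, a sequence inside the inner VP
print(classify_span((1, 2), constituents))  # ADVP VV, cuts into the inner VP
```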

  18. Integration into Phrase-based MT • The SDB model estimates the probability that a source phrase is bracketable, i.e. whether it can be translated as a unit • Integrated into a BTG MT system • Bracketing Transduction Grammar (Wu, 1997) • Straight: 把 今天 設立 為 → establish today as • Inverted: 把 今天 設立 為 → as establish today
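The two BTG merge orders, and one plausible way the SDB probability could enter the decoder score, can be sketched as follows. The slides do not show the scoring formula, so treating the probability as a weighted log feature is an assumption, and `merge`, `score_with_sdb`, and `sdb_weight` are hypothetical names.

```python
import math


def merge(left, right, order):
    """Combine the translations of two neighboring source phrases under a BTG rule."""
    if order == "straight":
        return left + right   # target order follows source order
    if order == "inverted":
        return right + left   # target order is swapped
    raise ValueError(f"unknown order: {order}")


def score_with_sdb(base_score, p_bracketable, sdb_weight):
    """Assumed log-linear integration: add the weighted log of the SDB probability."""
    return base_score + sdb_weight * math.log(p_bracketable)


left = ["establish", "today"]   # translation of 把 今天 設立
right = ["as"]                  # translation of 為
print(" ".join(merge(left, right, "straight")))
print(" ".join(merge(left, right, "inverted")))
```

With this form, a span the model considers likely to be bracketable incurs almost no cost, while an unlikely one is pushed down in proportion to the tuned weight.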

  19. Experiment • Models compared • Baseline: BTG system • XP+ (Marton and Resnik, 2008) • NP, VP, PP, ADVP, … • Penalizes each violation of a syntactic boundary (soft constraint) • UniSDB • Only S features • BiSDB • S1, S2 and S features

  20. Experiment (2) • Chinese parser • Lexicalized PCFG parser (Xiong et al., 2005) • Parallel corpus • FBIS corpus • Word alignment • GIZA++ • Four-gram language model • Built with SRILM • Xinhua section of the English Gigaword corpus • Maximum Entropy (ME) trainer • (Zhang, 2004)

  21. Result • SDB receives the largest feature weight • This suggests its impact on the decoder • [Table: feature weights for XP+ and SDB, alongside the baseline features common to phrase-based systems]

  22. Result (2) • NIST MT-05 test set • Improvement of 1.67 BLEU over baseline • Improvement of 0.59 BLEU over XP+

  23. Result (3) • On top of CBMF, adding rule and path features yields further improvement • BiSDB is consistently better than UniSDB • Inner contexts (S1 and S2) are useful

  24. XP+ and SDB • Same • Both consider syntactic constituents • Different • XP+ only penalizes non-syntactic source phrases • SDB can encourage a non-syntactic phrase if it is bracketable

  25. XP+ and SDB

  26. Conclusion • The SDB model predicts whether a source phrase can be translated as a unit • Appropriate constituent violations are helpful • Because the model better inherits the strength of the phrase-based approach
