Better MT Using Parallel Dependency Trees

Presentation Transcript


  1. Better MT Using Parallel Dependency Trees Yuan Ding University of Pennsylvania 2003 (c) University of Pennsylvania

  2. Outline • Motivation • The alignment algorithm • Algorithm at a glance • The framework • Heuristics • Walking through an example • Evaluation • Conclusion

  3. Motivation (1) Statistical MT Approaches • Statistical MT approaches • Pioneered by (Brown et al., 1990, 1993) • Leverage a large training corpus • Outperform traditional transfer-based approaches • Major Criticism • No internal representation, syntax/semantics

  4. Motivation (2) Hybrid Approaches • Hybrid approaches • (Wu, 1997) (Alshawi et al., 2000) (Yamada and Knight, 2001, 2002) (Gildea 2003) • Applying statistical learning to structured data • Problems with Hybrid MT Approaches • Structural Divergence (Dorr, 1994) • Vagaries of loose translations in real corpora

  5. Motivation (3) • Holy grail: • Syntax-based MT which captures structural divergence • Accomplished work • A new approach to the alignment of parallel dependency trees (paper published at MT Summit IX) • Allowing non-isomorphism of dependency trees

  6. We are here…

  7. Outline • Motivation • The alignment algorithm • Algorithm at a glance • The framework • Heuristics • Walking through an example • Evaluation • Conclusion

  8. Define the Alignment Problem • Define the alignment problem • In natural language: find word mappings between English and foreign sentences • In math: for each foreign word f_j (j = 1, …, m), find a labeling a_j ∈ {0, 1, …, l}, where e_(a_j) is the English word that f_j maps to and e_0 is the empty word

  9. The IBM Models • The IBM way • Model 1: word order doesn’t matter, i.e. the “bag of words” model • Model 2: condition the probabilities on sentence length and word position • Models 3, 4, 5: • A. generate the fertility of each English word • B. generate the identity of each foreign word • C. generate the position of each foreign word • Gradually adding positioning information
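
Since the rest of the talk bootstraps from Model 1, a minimal sketch of its standard EM training may be useful. This is the textbook formulation, not code from the paper; the function name train_model1 and the NULL-token convention are illustrative choices.

from collections import defaultdict

def train_model1(corpus, iterations=10):
    """Standard EM training for IBM Model 1 lexical probabilities t(f | e).

    corpus: list of (foreign_words, english_words) sentence pairs,
    each side a list of word strings. A NULL token is added to the
    English side so unaligned foreign words can still be explained.
    """
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform initialization

    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for fs, es in corpus:
            es = ["NULL"] + es
            for f in fs:
                z = sum(t[(f, e)] for e in es)    # normalizer for this foreign word
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():           # M-step: re-estimate t(f | e)
            t[(f, e)] = c / total[e]
    return t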

  10. Using Dependency Trees • Positioning information can be acquired from parse trees • Parsers: (Collins, 1999) (Bikel, 2002) • Problems with using parse trees directly • Two types of nodes • Unlexicalized non-terminals control the domain • Using dependency trees • (Fox, 2002): best* phrasal cohesion properties • (Xia, 2001): constructing dependency trees from parse trees using the Tree Adjoining Grammar

  11. The Framework (1) • Step 1: train IBM Model 1 for lexical mapping probabilities • Step 2: find and fix high-confidence mappings according to a heuristic function h(f, e) • A pseudo-translation example: “The girl kissed her kitty cat” / “The girl gave a kiss to her cat”
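
A rough sketch of Step 2, assuming Model 1 probabilities t and a heuristic h(f, e, t) are already available. The function name, the per-treelet selection rule, and the confidence threshold are illustrative assumptions; the slide does not specify how many mappings are fixed per iteration.

def fix_high_confidence_mappings(treelet_pairs, t, h, threshold=0.9):
    # treelet_pairs: list of (foreign_word_nodes, english_word_nodes) pairs.
    # t: lexical probabilities t[(f, e)] from Model 1.
    # h: heuristic confidence function h(f, e, t).
    # threshold: illustrative cutoff; the slides do not give a value.
    fixed = []
    for f_nodes, e_nodes in treelet_pairs:
        scored = [(h(f, e, t), f, e) for f in f_nodes for e in e_nodes]
        if not scored:
            continue
        score, f, e = max(scored)
        if score >= threshold:
            fixed.append((f, e))   # fix the best mapping in this treelet pair
    return fixed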

  12. The Framework (2) • Step 3: • Partition the dependency trees on both sides w.r.t. fixed mappings • One fixed mapping creates one new “treelet” • Create a new set of parallel dependency structures
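
A sketch of the partitioning idea in Step 3 on one side of a tree pair, using a minimal DepNode class invented for illustration: detaching a fixed node from its parent turns one treelet into two, the remainder and the subtree rooted at the fixed node.

class DepNode:
    # Minimal dependency tree node, for illustration only.
    def __init__(self, word, children=None):
        self.word = word
        self.children = children or []

def partition(root, fixed_node):
    """Split the treelet rooted at `root` at `fixed_node`: the fixed node is
    detached from its parent and becomes the root of a new treelet."""
    def detach(node):
        for child in node.children:
            if child is fixed_node:
                node.children.remove(child)
                return True
            if detach(child):
                return True
        return False

    if fixed_node is root or not detach(root):
        return [root]              # nothing to split off
    return [root, fixed_node]      # two treelets: the remainder and the fixed subtree

# Example with the sentence from the walkthrough; the head choices are guesses
# made for illustration, not the trees shown on the slides.
since = DepNode("since", [DepNode("1947")])
root = DepNode("have", [DepNode("I"), DepNode("been", [DepNode("here")]), since])
treelets = partition(root, since)  # -> [treelet rooted at "have", treelet rooted at "since"]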

  13. The Framework (3) • Step 4: go back to Step 1 unless enough nodes are fixed • Algorithm properties • An iterative algorithm • Time complexity O(n * T(h)), where T(h) is the time for the heuristic function in Step 2 • P(f | e) in IBM Model 1 has a unique global maximum • Guaranteed convergence • Results only depend on the heuristic function h(f, e)
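
Putting Steps 1-4 together, the control flow is roughly the loop below. It is only a skeleton: the step functions are passed in as parameters (for example the train_model1 and fix_high_confidence_mappings sketches above), since the slides do not pin down their exact interfaces.

def align(tree_pairs, train_model1, fix_mappings, partition_all, heuristic, enough_fixed):
    # tree_pairs: initial parallel dependency tree pairs, one per sentence pair.
    # train_model1(treelet_pairs) -> lexical probabilities t                  (Step 1)
    # fix_mappings(treelet_pairs, t, heuristic) -> newly fixed (f, e) pairs   (Step 2)
    # partition_all(treelet_pairs, fixed) -> finer set of treelet pairs       (Step 3)
    # enough_fixed(all_fixed) -> True once enough nodes are fixed             (Step 4)
    treelet_pairs = list(tree_pairs)
    all_fixed = []
    while not enough_fixed(all_fixed):
        t = train_model1(treelet_pairs)
        fixed = fix_mappings(treelet_pairs, t, heuristic)
        if not fixed:
            break  # no mapping cleared the confidence bar; stop early
        all_fixed.extend(fixed)
        treelet_pairs = partition_all(treelet_pairs, fixed)
    return all_fixed  # the word alignment accumulated over all iterations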

  14. Heuristics • Heuristic functions for Step 2 • Objective: estimate the confidence of a mapping between a pair of words • First heuristic: entropy • Intuition: model the shape of the probability distribution • Second heuristic: inside-outside probability • Idea borrowed from PCFG parsing • Fertility threshold: rule out unlikely fertility ratios (> 2.0)
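
A sketch of the first heuristic under one plausible reading of the slide: score a candidate pair by how sharply peaked the Model 1 distribution t(· | e) is, so that low entropy means high confidence. The exact functional form, and how the fertility threshold combines with it, are not given on the slide.

import math

def entropy_heuristic(f, e, t):
    # t: Model 1 lexical probabilities t[(f, e)] (e.g. from train_model1 above).
    # Confidence of mapping (f, e): high when the distribution t(. | e) is
    # sharply peaked (low entropy) and f carries most of that mass.
    probs = {x: p for (x, ee), p in t.items() if ee == e}
    z = sum(probs.values())
    if z == 0.0:
        return 0.0
    entropy = -sum((p / z) * math.log(p / z) for p in probs.values() if p > 0.0)
    # Illustrative combination of peakedness and the pair's own probability;
    # the formula actually used in the paper is not stated on the slide.
    return probs.get(f, 0.0) / z / (1.0 + entropy)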

  15. Outline • Motivation • The alignment algorithm • Algorithm at a glance • The framework • Heuristics • Walking through an example • Evaluation • Conclusion

  16. Walking through an Example (1) • [English] I have been here since 1947. • [Chinese] 1947 nian yilai wo yizhi zhu zai zheli. • Iteration 1: • One dependency tree pair. Align “I” and “wo”

  17. Walking through an Example (2) • Iteration 2: • Partition and form two treelet pairs. • Align “since” and “yilai”

  18. Walking through an Example (3) • Iteration 3: • Partition and form three treelet pairs. • Align “1947” and “1947”, “here” and “zheli”

  19. Outline • Motivation • The alignment algorithm • Algorithm at a glance • The framework • Heuristics • Walking through an example • Evaluation • Conclusion

  20. Evaluation • Training: • LDC Xinhua newswire Chinese–English parallel corpus • Roughly 50% filtered out; 60K+ sentence pairs used • The parser generated 53,130 parsed sentence pairs • Evaluation: • 500 sentence pairs provided by Microsoft Research Asia • Word-level alignments created by hand • F-score: • A: set of word pairs produced by the automatic alignment • G: set of word pairs aligned in the gold file
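
With A and G as defined on the slide, the reported F-score is presumably the standard set-based one (the slide itself does not spell out the formula):

\[
P = \frac{|A \cap G|}{|A|}, \qquad
R = \frac{|A \cap G|}{|G|}, \qquad
F = \frac{2PR}{P + R} = \frac{2\,|A \cap G|}{|A| + |G|}
\]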

  21. Results (1) • Results for IBM Model 1 to Model 4 (GIZA) • Bootstrapped from Model 1 to Model 4 • Signs of overfitting • Suspected cause: differences between the genres of the training and testing data

  22. Results (2) • Results for our algorithm: • Heuristic h1 (entropy) • Heuristic h2 (inside-outside probability) • The table shows results after one iteration (M1 = IBM Model 1) • Overfitting problem • Mainly caused by violation of the partition assumption in fine-grained dependency structures

  23. Outline • Motivation • The alignment algorithm • Algorithm at a glance • The framework • Heuristics • Walking through an example • Evaluation • Conclusion

  24. Conclusion • Model based on partitioning sentences according to their dependency structure • Without the unrealistic isomorphism assumption • Outperforms the unstructured IBM models on a large data set • “Orthogonal” to the IBM models • Uses syntactic structure but no linear ordering information

  25. Thank You!
