
Phrase-Based Statistical Machine Translation as a Traveling Salesman Problem






Presentation Transcript


  1. Phrase-Based Statistical Machine Translation as a Traveling Salesman Problem
  Mikhail Zaslavskiy, Marc Dymetman, Nicola Cancedda. ACL 2009.

  2. Introduction
  • Word-based & phrase-based machine translation (MT)
  • Statistical machine translation (SMT) is successful in practice: open-source Moses, Google Translate, etc.
  • Running example: "cette traduction automatique est curieuse" ("this automatic translation is curious")
  • [Figure: biphrase table for the example sentence]

  3. Decoding Complexity
  • Decoding: given the models (translation, language, distortion, etc.), find the best translation
  • Word-based SMT decoding is NP-complete (Knight, 1999):
  • TSP is NP-complete, and any TSP instance can be reduced to word-based SMT decoding, so decoding is NP-hard
  • Decoding is itself in NP
  • Hence it is NP-complete
  • Kevin Knight. 1999. Decoding Complexity in Word-Replacement Translation Models. Computational Linguistics.

  4. Goal
  • TSP is NP-complete and word-based SMT decoding is in NP, so decoding can, in principle, be reduced to TSP
  • Goal of this paper:
  • Reduce SMT decoding to TSP explicitly
  • Directly apply existing TSP solvers to SMT

  5. Traveling Salesman Problem
  • STSP (Symmetric TSP): the most standard and studied variant
  • Undirected graph G on N nodes, where the edges carry real-valued costs
  • Goal: find a Hamiltonian circuit of minimal total cost
  • ATSP (Asymmetric TSP)
  • Graph G is directed
  • Edges (i,j) and (j,i) may carry different costs
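As a concrete illustration (not from the paper), here is a minimal brute-force ATSP solver in Python; an STSP instance is simply the special case where the cost matrix is symmetric. Real instances require heuristic solvers such as the Lin-Kernighan implementation in Concorde used later in the slides.

```python
# Minimal brute-force ATSP sketch (illustration only, O(n!) -- tiny n only).
from itertools import permutations

def atsp_brute_force(cost):
    """Return (best_cost, best_tour) over all Hamiltonian circuits.

    cost[i][j] is the cost of the directed edge i -> j. Node 0 is fixed as
    the start, since a circuit has no distinguished origin.
    """
    n = len(cost)
    best_cost, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        c = sum(cost[tour[k]][tour[(k + 1) % n]] for k in range(n))
        if c < best_cost:
            best_cost, best_tour = c, tour
    return best_cost, best_tour

# Tiny asymmetric example: cost[i][j] != cost[j][i] in general.
cost = [[0, 2, 9, 10],
        [1, 0, 6, 4],
        [15, 7, 0, 8],
        [6, 3, 12, 0]]
print(atsp_brute_force(cost))
```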

  6. Traveling Salesman Problem (2)
  • SGTSP (Symmetric Generalized TSP)
  • Undirected graph G of |G| nodes
  • A partition of these |G| nodes into m non-empty, disjoint clusters is given
  • Goal: find a circular sequence of m nodes of minimal total cost, where each cluster is visited exactly once
  • [Figure: circuit through clusters C1, C2, C3, C4, ..., Cm, one node per cluster]
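A minimal sketch of the generalized variant, again by brute force and purely illustrative: the extra degree of freedom over plain TSP is the choice of one representative node inside each cluster.

```python
# Brute-force generalized TSP sketch: pick a cluster order and one node per
# cluster, minimizing total circuit cost (illustration only).
from itertools import permutations, product

def gtsp_brute_force(cost, clusters):
    """clusters is a list of node-id lists; the tour uses one node per cluster."""
    m = len(clusters)
    best_cost, best_tour = float("inf"), None
    # Fix the first cluster's position; a circuit has no distinguished start.
    for order in permutations(range(1, m)):
        cluster_order = (0,) + order
        # Try every choice of representative node within each cluster.
        for nodes in product(*(clusters[c] for c in cluster_order)):
            c = sum(cost[nodes[k]][nodes[(k + 1) % m]] for k in range(m))
            if c < best_cost:
                best_cost, best_tour = c, nodes
    return best_cost, best_tour
```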

  7. Traveling Salesman Problem (3)
  • AGTSP (Asymmetric Generalized TSP)
  • Directed version of SGTSP
  • Edges (i,j) and (j,i) may carry different costs
  • Chain of reductions:
  • SMT --> AGTSP: this paper
  • AGTSP --> ATSP: C. Noon and J.C. Bean. 1993. An efficient transformation of the generalized traveling salesman problem. INFOR, pages 39-44.
  • ATSP --> STSP: David L. Applegate et al. 2007. The Traveling Salesman Problem: A Computational Study. Princeton Series in Applied Mathematics. Princeton University Press.
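The AGTSP --> ATSP step can be sketched as follows. This is my reading of the Noon-Bean construction, not code from either paper, so treat the details (cycle orientation, the constant M) as assumptions to be checked against the original: within each cluster the nodes are joined into a zero-cost directed cycle, each node's outgoing inter-cluster arcs are re-attached to its cycle predecessor, and a large constant M forces any optimal ATSP tour to sweep each cluster contiguously.

```python
# Hedged sketch of the Noon-Bean AGTSP -> ATSP transformation.
def noon_bean(cost, clusters, M):
    """cost[u][v]: AGTSP arc costs (float('inf') = no arc).

    M must exceed the sum of all finite arc costs; the optimal ATSP tour cost
    then equals the optimal AGTSP cost plus m*M, with m = len(clusters).
    """
    n = len(cost)
    INF = float("inf")
    new_cost = [[INF] * n for _ in range(n)]
    cluster_of = {u: ci for ci, cl in enumerate(clusters) for u in cl}
    for cl in clusters:
        k = len(cl)
        for i, u in enumerate(cl):
            succ = cl[(i + 1) % k]
            if succ != u:
                new_cost[u][succ] = 0.0   # zero-cost intra-cluster cycle
            for v in range(n):
                if cluster_of[v] != cluster_of[u] and cost[succ][v] < INF:
                    # Arcs that left `succ` in the AGTSP now leave `u`: a tour
                    # entering the cluster at `succ` walks the zero cycle,
                    # exits from `u`, and pays succ's original exit cost + M.
                    new_cost[u][v] = cost[succ][v] + M
    return new_cost
```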

  8. Phrase-based Decoding as AGTSP
  • Running example: translating the French sentence "cette traduction automatique est curieuse" into English
  • [Figure: biphrase table for the example sentence]

  9. Clusters in AGTSP
  • Graph nodes are all the possible pairs (w, b)
  • b = a biphrase, w = a source word covered by b
  • Example: biphrase ht contributes the nodes (cette, ht) and (traduction, ht)
  • Clusters are the subsets of graph nodes that share a common source word w
  • # of clusters = # of source words in the sentence (5 in this example)
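A small sketch of the node and cluster construction on the running example. The biphrase table below is invented for illustration (the paper's actual table is in its figures); names like ht follow the slide's usage.

```python
# Toy biphrase table: name -> (source words, target words). Illustrative only.
biphrases = {
    "c":  (("cette",), ("this",)),
    "t":  (("traduction",), ("translation",)),
    "ht": (("cette", "traduction"), ("this", "translation")),
    "mt": (("traduction", "automatique"), ("machine", "translation")),
    "a":  (("automatique",), ("automatic",)),
    "e":  (("est",), ("is",)),
    "s":  (("curieuse",), ("strange",)),
}

sentence = ["cette", "traduction", "automatique", "est", "curieuse"]

# One AGTSP node per (source word, biphrase) pair.
nodes = [(w, b) for b, (src, tgt) in biphrases.items() for w in src]

# One cluster per source word: all nodes sharing that word.
clusters = {w: [(w2, b) for (w2, b) in nodes if w2 == w] for w in sentence}

for w, cl in clusters.items():
    print(w, "->", cl)   # 5 clusters, one per source word
```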

  10. Example Graph
  • [Figure: AGTSP graph with a Start cluster and one cluster per source word: cette, traduction, automatique, est, curieuse]

  11. Transition Cost
  • Transition between nodes M and N within the same biphrase:
  • M = (w1, b) and N = (w2, b), where w1 and w2 are consecutive words in b
  • i.e., the source side of b is "... w1 w2 ..."
  • Cost = 0, because both nodes belong to the same biphrase

  12. Transition Cost (2)
  • M = (w1, b1), where w1 is the rightmost source word in b1, and N = (w2, b2), where w2 is the leftmost source word in b2
  • Meaning: the tour leaves b1 and enters b2, i.e., biphrases b1 and b2 are combined
  • The edge cost bundles:
  • the costs of b1 and b2 themselves (language model, translation model, etc.)
  • the cost of combining them: language model across the boundary, distortion model
  • (a sketch of such a cost function follows below)
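A hedged sketch covering both transition cases (slides 11 and 12) as one cost function. Here lm_logprob, the weights, and the positions mapping are illustrative stand-ins, not the paper's exact parameterization, and the entered biphrase's own model cost is reduced to a comment.

```python
import math

def lm_logprob(prev_word, word):
    # Placeholder bigram LM returning a constant; plug in a real model here.
    return math.log(0.1)

def transition_cost(M, N, biphrases, positions, w_lm=1.0, w_dist=0.5):
    """M, N are (word, biphrase_name) nodes; biphrases as in the earlier sketch.

    positions maps each source word to its sentence index (toy assumption:
    every source word occurs once in the sentence).
    """
    (w1, b1), (w2, b2) = M, N
    src1, tgt1 = biphrases[b1]
    src2, tgt2 = biphrases[b2]
    if b1 == b2:
        # Slide 11: consecutive source words inside one biphrase are free;
        # any other intra-biphrase move is forbidden.
        if src1.index(w1) + 1 == src1.index(w2):
            return 0.0
        return float("inf")
    # Slide 12: must leave b1 from its rightmost source word and enter b2
    # at its leftmost source word.
    if w1 != src1[-1] or w2 != src2[0]:
        return float("inf")
    # LM across the target-side boundary; b2's own translation-model cost
    # would also be charged on this arc (omitted in this sketch).
    lm = -lm_logprob(tgt1[-1], tgt2[0])
    # Distortion: how far the source position jumps between the two biphrases.
    distortion = abs(positions[w1] + 1 - positions[w2])
    return w_lm * lm + w_dist * distortion
```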

  13. Example Circuit
  • [Figure: optimal circuit through the clusters]
  • Output: "this machine translation is strange"
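Reading the output translation off a circuit, using the toy table from the slide-9 sketch. This assumes the tour visits each biphrase's nodes contiguously, which the infinite intra-biphrase costs above enforce.

```python
def read_translation(tour, biphrases):
    """Emit the target side of a biphrase the first time the tour enters it."""
    output, prev_b = [], None
    for (w, b) in tour:
        if b != prev_b:                     # entering a new biphrase
            output.extend(biphrases[b][1])  # append its full target side
        prev_b = b
    return " ".join(output)

tour = [("cette", "c"), ("traduction", "mt"), ("automatique", "mt"),
        ("est", "e"), ("curieuse", "s")]
print(read_translation(tour, biphrases))  # this machine translation is strange
```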

  14. Experiment 1
  • Setup: the English (target) words are given in French (source) order; the goal is to reorder "bad English" into "good English" using a language model alone
  • One node per cluster (no biphrase choice: pure reordering)
  • Example: "this translation automatic is curious" (from "cette traduction automatique est curieuse") should be reordered into "this automatic translation is curious"
  • Corpus
  • Training: 50,000 sentences from the NewsCommentary corpus
  • Testing: 170 sentences, average length 17 words
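With one node per word and a bigram LM, this setup reduces to a plain TSP whose arc cost is the negative bigram log-probability. A minimal sketch, reusing atsp_brute_force from the slide-5 snippet; bigram_logprob is an assumed stand-in for a trained LM.

```python
def reorder_with_bigram_lm(words, bigram_logprob):
    """Reorder `words` to maximize bigram LM probability, via TSP."""
    # Node 0 is a start-of-sentence marker anchoring the circuit; the return
    # arc back to it plays the role of an end-of-sentence transition.
    nodes = ["<s>"] + list(words)
    n = len(nodes)
    cost = [[-bigram_logprob(nodes[i], nodes[j]) if i != j else 0.0
             for j in range(n)] for i in range(n)]
    best_cost, tour = atsp_brute_force(cost)
    return [nodes[i] for i in tour if i != 0]
```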

  15. Experiment 1 (results)
  • Exact TSP solver (Concorde) vs. beam-search decoder (Moses)
  • The TSP approach performs better with both bigram and trigram LMs
  • Note: a wrong sentence can still receive a higher model score than the correct sentence
  • [Figures: bigram and trigram results]

  16. Experiment 2
  • Full machine translation task
  • LK (Lin-Kernighan) TSP solver, as implemented in Concorde
  • Not an exact solver: the graphs have too many nodes for exact search
  • Data: Europarl
  • Training: 2.81 million sentences
  • Testing: 500 sentences

  17. Comment
  • Main contribution
  • Transforms SMT decoding into TSP
  • Directly solves MT with off-the-shelf TSP solvers
  • Problems
  • Experiment 1: pure word reordering is of limited practical interest
  • Experiment 2: no significance test, and the BLEU difference is below 1 point; the absolute BLEU score is low (around 30, a level already reached in 2003)
  • Experimental scale: the average test-sentence length (17 words) and test-set sizes (170 and 500 sentences) are small
