Daniel Gildea (2003): Loosely Tree-Based Alignment for Machine Translation

Presentation Transcript


  1. Daniel Gildea (2003): Loosely Tree-Based Alignment for Machine Translation. Linguistics 580 (Machine Translation). Scott Drellishak, 2/21/2006

  2. Overview • Gildea presents an alignment model he describes as “loosely tree-based” • Builds on Yamada & Knight (2001), a tree-to-string model • Gildea extends it with a clone operation, and also into a tree-to-tree model • Wants to keep performance reasonable (polynomial in sentence length)

  3. Background • Tree-to-String Model • Tree-to-Tree Model • Experiment

  4. Background • Historically, two approaches to MT: transfer-based and statistical • More recently, though, hybrids • Probabilistic models of structured representations: • Wu (1997) Stochastic Inversion Transduction Grammars • Alshawi et al. (2000) Head Transducers • Yamada & Knight (2001) (see below)

  5. Gildea’s Proposal • Need to handle drastic changes to trees (real bitexts aren’t isomorphic) • To do this, Gildea adds a new operation to Y&K’s model: subtree clone • This operation clones a subtree from the source tree to anywhere in the target tree • Gildea also proposes a tree-to-tree model that uses parallel tree corpora

  6. Background • Tree-to-String Model • Tree-to-Tree Model • Experiment

  7. Yamada and Knight (2001) • Y&K’s model is tree-to-string: the input is a tree and the output is a string of words • (Gildea compares it to an “Alexander Calder mobile”. Calder invented that kind of hanging sculpture; it resembles Y&K’s model because each node of the tree can turn either backwards or forwards. Visualize!)

  8. Y&K Tree-to-String Model • Three steps to turn input into output: • Reorder the children of each node (a node with m children has m! orderings; conditioned only on the category of the node and its children) • Optionally insert words at each node, either before or after all the children (conditioned only on the foreign word) • Translate words at leaves (conditioned on P(f|e); words can translate to NULL)
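
  A minimal Python sketch of these three channel operations, assuming a toy Node class; the table names reorder_probs, insert_probs, and translate_probs are illustrative, not from Y&K’s implementation:

    import random

    class Node:
        def __init__(self, label, children=None, word=None):
            self.label = label              # syntactic category, e.g. "NP"
            self.children = children or []
            self.word = word                # set only on leaf nodes

    def sample(dist):
        """Draw one outcome from a dict {outcome: probability}."""
        r, acc = random.random(), 0.0
        for outcome, p in dist.items():
            acc += p
            if r < acc:
                return outcome
        return outcome                      # guard against rounding error

    def transduce(node, reorder_probs, insert_probs, translate_probs):
        """Apply the three operations top-down: reorder, insert, translate."""
        if node.word is not None:
            # 3. Translate the leaf word; None models translation to NULL.
            f = sample(translate_probs[node.word])      # P(f | e)
            return [f] if f is not None else []
        # 1. Reorder the children, conditioned on the node and its children.
        rule = (node.label, tuple(c.label for c in node.children))
        order = sample(reorder_probs[rule])             # a permutation of indices
        out = []
        for i in order:
            out.extend(transduce(node.children[i], reorder_probs,
                                 insert_probs, translate_probs))
        # 2. Optionally insert a foreign word before or after all the children.
        pos, f = sample(insert_probs[node.label])       # ("none"/"left"/"right", word)
        if pos == "left":
            out.insert(0, f)
        elif pos == "right":
            out.append(f)
        return out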

  9. Aside: Y&K Suitability • Recall that this model was used for translating English to Japanese. • Their model is well-suited to this language pair: • Japanese is SOV, while English is SVO. Japanese is also generally head-last where English is head-first. Reordering handles both of these. • Japanese marks subjects/topics and objects with postpositions. Insertion handles this.

  10. Y&K EM Algorithm • EM algorithm estimates inside probabilities β bottom-up:

    for all nodes εi in input tree T do
      for all k, l such that 1 ≤ k ≤ l ≤ N do
        for all orderings ρ of the children ε1 … εm of εi do
          for all partitions of span k, l into k1, l1 … km, lm do
            (accumulate the inside probability β for εi over span k, l)
          end for
        end for
      end for
    end for
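
  The same loop nest as a Python skeleton (a sketch under assumptions: spans are half-open [k, l), order_prob and leaf_beta are hypothetical callbacks supplying the reordering and leaf-translation probabilities, and insertion is omitted for brevity):

    from itertools import permutations

    def span_partitions(k, l, m):
        """Enumerate all ways to split span [k, l) into m contiguous
        (possibly empty) subspans k1,l1 ... km,lm."""
        if m == 1:
            yield [(k, l)]
            return
        for mid in range(k, l + 1):
            for rest in span_partitions(mid, l, m - 1):
                yield [(k, mid)] + rest

    def inside_probs(nodes_bottom_up, N, order_prob, leaf_beta):
        """Bottom-up inside (beta) pass over the input tree T."""
        beta = {}
        for node in nodes_bottom_up:                    # for all nodes in T
            if not node.children:                       # leaves: translation probs
                for k in range(N + 1):
                    for l in range(k, N + 1):
                        beta[(node, k, l)] = leaf_beta(node, k, l)
                continue
            m = len(node.children)
            for k in range(N + 1):                      # for all spans k, l
                for l in range(k, N + 1):
                    total = 0.0
                    for order in permutations(range(m)):        # orderings
                        for part in span_partitions(k, l, m):   # k1,l1 ... km,lm
                            p = order_prob(node, order)
                            for (ki, li), i in zip(part, order):
                                p *= beta[(node.children[i], ki, li)]
                            total += p
                    beta[(node, k, l)] = total
        return beta

  This naive nesting is where the exponential terms on the next slide come from; Y&K’s chart trick interleaves the inner loops to cut the cost down.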

  11. Y&K Performance • Computational complexity is O(|T|Nᵐ⁺²), where T = tree, N = input length, m = fan-out of the grammar • “By storing partially complete arcs in the chart and interleaving the inner two loops”, this improves to O(|T|n³m!2ᵐ) • Gildea says “exponential in m” (looks factorial to me) but polynomial in N/n • If |T| is O(n), then the whole thing is O(n⁴)

  12. Y&K Drawbacks • No alignments with crossing brackets. Given the tree A → (B, Z), B → (X, Y):

        A
       / \
      B   Z
     / \
    X   Y

  • The orderings XZY and YZX are impossible • Recall that Y&K flatten trees to avoid some of this, but they don’t catch all cases
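
  A quick self-contained check: enumerating every string the reordering operation can produce from that tree confirms XZY and YZX are unreachable.

    from itertools import permutations

    def yields(label, tree):
        """All leaf strings reachable by reordering children at each node."""
        if label not in tree:                       # leaf
            return {label}
        results = set()
        for order in permutations(tree[label]):     # reorder this node's children
            combos = {""}
            for child in order:
                combos = {a + b for a in combos for b in yields(child, tree)}
            results |= combos
        return results

    tree = {"A": ["B", "Z"], "B": ["X", "Y"]}
    print(sorted(yields("A", tree)))
    # ['XYZ', 'YXZ', 'ZXY', 'ZYX'] -- no XZY or YZX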

  13. Adding Clone • Gildea adds a clone operation to Y&K’s model • For each node, allow the insertion of a clone of another node as its child • The probability of cloning εi under εj decomposes into two steps: the choice to insert a clone, and the choice of which node to clone • Pclone is a single estimated parameter; Pmakeclone is constant (all nodes equally probable, and nodes are reusable)
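
  As a minimal sketch (the parameter names here are mine, not Gildea’s):

    def clone_prob(p_clone, num_source_nodes):
        """Two-step clone probability: decide to insert a clone (one estimated
        parameter), then pick the node to clone uniformly; nodes are reusable,
        so no bookkeeping over previous clones is needed."""
        p_makeclone = 1.0 / num_source_nodes    # uniform choice of node
        return p_clone * p_makeclone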

  14. Background • Tree-to-String Model • Tree-to-Tree Model • Experiment

  15. Tree-to-Tree Model • Output is a tree, not a string, and it must match the tree in the target corpus • Add two new transformation operations: • one source node → two target nodes • two source nodes → one target node • “a synchronous tree substitution grammar, with probabilities parameterized to generate the target tree conditioned on the structure of the source tree.”

  16. Calculating Probability • Probability is computed from the root down. At each level: • At most one of the node’s children is grouped with it, forming an elementary tree (conditioned on the current node and the CFG rule over its children) • The alignment of the e-tree is chosen (conditioned as above). Like Y&K reordering except: (1) the alignment can include insertions and deletions; (2) two nodes grouped together are reordered together • Lexical leaves are translated as before

  17. Elementary Trees? • Elementary trees allow the alignment of trees with different depths. Treat A, B as an e-tree and reorder their children together:

        A                 A
       / \              / | \
      B   Z     →      X  Z  Y
     / \
    X   Y
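
  Reusing the enumeration idea from slide 12: grouping A and B into one e-tree exposes X, Y, Z as siblings, so all 3! orderings, including the previously impossible XZY and YZX, become reachable.

    from itertools import permutations

    # Children X, Y, Z of the elementary tree {A, B} reorder as one set.
    print(sorted("".join(p) for p in permutations("XYZ")))
    # ['XYZ', 'XZY', 'YXZ', 'YZX', 'ZXY', 'ZYX']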

  18. EM algorithm • Estimates inside probabilities β bottom-up:

    for all nodes εa in source tree Ta in bottom-up order do
      for all elementary trees ta rooted in εa do
        for all nodes εb in target tree Tb in bottom-up order do
          for all elementary trees tb rooted in εb do
            for all alignments α of the children of ta and tb do
              (accumulate the inside probability β for the pair εa, εb)
            end for
          end for
        end for
      end for
    end for
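
  The corresponding Python skeleton (etrees, alignments, and align_prob are hypothetical callbacks; only the loop structure is from the slide):

    def tree_to_tree_inside(source_nodes, target_nodes, etrees,
                            alignments, align_prob):
        """Bottom-up inside (beta) pass for the tree-to-tree model."""
        beta = {}
        for eps_a in source_nodes:                  # source nodes, bottom-up
            for t_a in etrees(eps_a):               # elementary trees at eps_a
                for eps_b in target_nodes:          # target nodes, bottom-up
                    for t_b in etrees(eps_b):
                        total = beta.get((eps_a, eps_b), 0.0)
                        for alpha in alignments(t_a, t_b):  # child alignments
                            total += align_prob(t_a, t_b, alpha, beta)
                        beta[(eps_a, eps_b)] = total
        return beta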

  19. Performance • Outer two loops are O(|T|²) • Elementary trees include at most one child, so choosing e-trees is O(m²) • Alignment is O(2²ᵐ) • Choosing which nodes to insert or clone is O(2²ᵐ) • Choosing how to reorder is O((2m)!) • Overall: O(|T|²m²4²ᵐ(2m)!) (the two 2²ᵐ factors combine into 4²ᵐ), quadratic (!) in the size of the input sentence

  20. Tree-to-Tree Clone • Allowing m-to-n matching of up to two nodes (e-trees) allows only “limited non-isomorphism” • So, as before, add a clone operation • Algorithm unchanged, except alignments may now include cloned subtrees, same probability as in tree-to-string (uniform)

  21. Background • Tree-to-String Model • Tree-to-Tree Model • Experiment

  22. The Data • Parallel Korean-English corpus • Trees annotated by hand on both sides • “in this paper we will be using only the Korean trees, modeling their transformation into the English text.” • (That can’t be right: it’s only true for tree-to-string?) • 5083 sentences: 4982 training, 101 evaluation

  23. Aside: Suitability • Recall that Y&K’s model was suited to the English-to-Japanese task. • Gildea is going to compare their model to his, but using a Korean-English corpus. Is that fair? • In a word, yes. Korean and Japanese are syntactically very similar: agglutinative, head-last (so similar that syntax is the main argument that they’re related).

  24. Results • Alignment Error Rate (Och & Ney, 2000)

  25. Results Detailed • The lexical probabilities come from IBM Model 1, and node-reordering probabilities are initialized to uniform • Best results come when Pins is fixed at 0.5 rather than estimated (!) • “While the model learned by EM tends to overestimate the total number of aligned word pairs, fixing a higher probability for insertions results in fewer total aligned pairs and therefore a better trade-off between precision and recall”

  26. How’d TTS and TTT Do? • The best results were with tree-to-string, surprisingly • Y&K + clone was ≈ the IBM models; fixing Pins was best overall • Tree-to-tree + clone was also ≈ the IBM models, but much more efficient to train (quadratic instead of quartic in sentence length) • Still, disappointing results for TTT

  27. Conclusions • The model allows syntactic information to be used for training without imposing ordering constraints • Clone operations improve alignment results • Tree-to-tree + clone wins only on computational performance, not alignment quality (but he’s hopeful) • Future directions: bigger corpora, conditioning on lexicalized trees
