
Training a Parser for Machine Translation Reordering






Presentation Transcript


  1. Training a Parser for Machine Translation Reordering Jason Katz-Brown, Slav Petrov, Ryan McDonald, Franz Och, David Talbot, Hiroshi Ichikawa, Masakazu Seno, Hideto Kazawa

  2. Dependency Parsing • Given a sentence, label the dependencies • (from nltk.org) • Output is useful for downstream tasks like machine translation • Also of interest to NLP researchers

  3. Overview of Paper • Motivation • Targeted Self Training Algorithm • MT experiments • Domain adaptation

  4. Motivation - Evaluation • Intrinsic • How well does system replicate gold annotations? • Precision/recall/F1, accuracy, BLEU, ROUGE, etc. • Extrinsic • How useful is system for some downstream task? • High performance on one doesn’t necessarily mean high performance on the other • Can be hard to evaluate extrinsically

  5. Motivation • Parsing is not a stand-alone task • Useful as part of a larger system • High-fidelity replication of gold parses won’t necessarily yield the best downstream performance • Try to train a model that will yield better downstream performance than a model trained to replicate gold standard • Maximize extrinsic quality, rather than intrinsic

  6. Targeted Self Training Algorithm • For each sentence S in a corpus: • Parse S with a baseline parser to get a k-best list • Choose the parse of S that optimizes some function F and add it to the training data • Retrain the parser • F measures the extrinsic quality of the parse • Finding a good F can be challenging! • Standard self-training: just choose the 1-best parse
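The loop on this slide can be sketched in a few lines of Python. This is only an illustration of the idea: the arguments `parse_kbest`, `extrinsic_score`, and `retrain` are hypothetical stand-ins for the paper's parser, extrinsic function F, and training procedure, not real interfaces.

```python
def targeted_self_training(parse_kbest, extrinsic_score, corpus, seed_data, retrain):
    """One round of targeted self-training (all callables are hypothetical stand-ins).

    parse_kbest(sentence)            -> list of candidate parses (k-best)
    extrinsic_score(sentence, parse) -> downstream quality of the parse (the function F)
    retrain(data)                    -> a parser trained on the augmented data
    """
    augmented = list(seed_data)  # start from the original gold training trees
    for sentence in corpus:
        candidates = parse_kbest(sentence)
        # Standard self-training would take candidates[0] (the parser's 1-best);
        # targeted self-training instead picks the candidate that maximizes F.
        best = max(candidates, key=lambda parse: extrinsic_score(sentence, parse))
        augmented.append((sentence, best))
    return retrain(augmented)
```

With `extrinsic_score` returning a constant, this degenerates to adding the first candidate; the whole benefit comes from how well F reflects downstream (e.g. reordering) quality.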

  7. Reordering • Reordering changes source-language word order into target-language word order • Here: English (SVO) to Japanese (SOV) • Metrics that account for word order correlate better with human judgment than metrics based on word choice alone • Can use manually or automatically derived tree transforms to reorder • Reordering is useful as a preprocessing step for MT
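A toy example of a tree transform for SVO-to-SOV reordering is head finalization: emit every head after its dependents. This is a minimal sketch, not the paper's actual hand-crafted rules, and the `(word, dependents)` tuple encoding of a dependency tree is an assumption for illustration.

```python
def head_final_order(tree):
    """Reorder a dependency tree head-finally (toy SVO -> SOV transform).

    `tree` is assumed to be (word, [dependent_subtrees]); dependents keep
    their relative order, and the head word is emitted last.
    """
    word, dependents = tree
    out = []
    for dep in dependents:
        out.extend(head_final_order(dep))  # recurse into each dependent subtree
    out.append(word)                       # head comes after all its dependents
    return out
```

For "John saw Mary" with "saw" as root and "John"/"Mary" as dependents, this yields the SOV order "John Mary saw", which is why a wrong parse (e.g. attaching "Mary" elsewhere) directly produces a wrong reordering.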

  8. Reordering • Reordering is evaluated as its own step • Function to evaluate reordering quality, given a gold reordering: 1 – ((# chunks – 1) / (# words – 1)) • Chunks are maximal spans that are contiguous in both the predicted and gold orders • Example: prediction A B E C D; gold A B C D E • 3 chunks (A B | E | C D), so the score is 1 – ((3 – 1) / (5 – 1)) = 0.5
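The chunk-based score from this slide can be computed directly. A minimal sketch, assuming the predicted order is a permutation of the gold order with distinct tokens (a simplification for illustration):

```python
def reordering_score(predicted, gold):
    """Fuzzy reordering score: 1 - ((# chunks - 1) / (# words - 1)).

    A chunk is a maximal span that is contiguous in both orders; we count a
    new chunk each time adjacent predicted words are not adjacent in gold.
    Assumes `predicted` is a permutation of `gold` with unique tokens.
    """
    pos = {word: i for i, word in enumerate(gold)}  # gold position of each word
    chunks = 1
    for prev, cur in zip(predicted, predicted[1:]):
        if pos[cur] != pos[prev] + 1:               # chunk boundary
            chunks += 1
    return 1 - (chunks - 1) / (len(predicted) - 1)
```

On the slide's example (prediction A B E C D vs. gold A B C D E) the chunks are A B | E | C D, giving 1 - 2/4 = 0.5; a perfect reordering scores 1.0.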

  9. Parsing and Reordering • Different parses yield different reorderings • Systems tend to be sensitive to errors

  10. MT Experiment Setup • Train a baseline Nivre dependency parser on the WSJ (a Berkeley parser is also tested) • English/Japanese corpus with literal translations and manual word alignments • 6,268 training / 7,327 test sentences • Alignment annotators need very little training, which makes the data relatively cheap • Annotating dependency parses, by contrast, requires a lot of training

  11. MT Experiment Setup • Use hand-crafted rules for reordering • Phrase-based MT system • Train parser in 3 ways: • Baseline • Standard self-training • Targeted self-training • Look at: • Labeled attachment score (LAS; intrinsic) • Reordering score • MT quality (BLEU and human)

  12. Results

  13. Results • Evaluated MT quality with BLEU and human raters • Varied the training of the dependency parser that feeds the reordering component • Experiments in Korean, Japanese, and Turkish (all SOV languages) • In all cases BLEU and human opinion improve with targeted self-training (10x) compared to the baseline parser • Humans still rate translation quality in the “some meaning/grammar” range (~2.5/6), so the improvement is not drastic

  14. Domain Adaptation Experiment • Use the Question Treebank (QTB) to make the MT system translate questions better than the baseline system • Have 2k questions parsed • Have 2k questions translated and annotated for reordering • Compare translation output from systems that include parsers trained in different ways

  15. Results

  16. Results • BLEU score and human opinion of Japanese translations of QTB test sentences were higher with targeted self-training than with the baseline parser • Training on gold QTB parses yielded a better reordering score, but gold parses are more expensive to produce than word alignments • BLEU/human opinion on the resulting translations was not reported
