40 likes | 175 Vues
This project focuses on systematic studies of heuristics for phrase table generation, utilizing the Moses pipeline. It includes experiments with various alignment models, such as GIZA and Berkeley aligners, to evaluate their impacts on phrase tables and translation quality. Comparative analysis is made between phrase-based and hierarchical decoding along different language pairs. The project also explores methods to detect discontinuous phrases, apply Parts-of-Speech features to improve alignments, and refine MT systems with confidence estimation and DNT lists for better outcomes.
E N D
Suggestions for Class Projects • Systematic study of phrase table generation heuristics (experimentation) • use Moses pipeline, different heuristics, systematic changes of alignment density, alignment quality, corpus size to extract phrase pairs • Comparison of Moses phrase-based and hierarchical decoding (experimentation) • experiment with at least 2 language pairs; analyze the differences • Compare GIZA and Berkeley aligner (experimentation) • compare AER; impact on phrase table; impact on resulting translation • Word alignment for language without word boundaries (experimentation) • can standard work alignment models help to detect word/morpheme boundaries; experiment with simple (e.g. Spanish) and difficult language (e.g Inupiac)
Suggestions for Class Projects • Detect discontinuous phrase pairs (implementation) • analyze dis-contiguous phrases in hand aligned data; implement extension to PESA aligner to detect dis-contiguous phrases • Use Parts-of-Speech features to improve phrase alignment (implementation) • evaluate phrase alignment quality; implement extension to PESA aligner; evaluate impact on translation
Suggestions for Class Projects • Sentence-level Confidence Estimation of MT quality, possibly including identifying poor MT translations • Tuning an MT system to different automatic metrics (including the new METEOR-tuning), and comparing the outcomes • Learning DNT (Do Not Translate) Lists and Incorporating them into Moses • Improving a Baseline Moses MT System by: - Filtering out bad word-alignments - Filtering the phrase-table - Adding decoding features • Building a Hierarchical or Syntax-based MT system for any language-pair