This poster explores the benefits of tuning SMT systems on hard examples: it compares methods for choosing an entire tuning dataset, selecting the best input per sentence, and synthesizing new inputs, and discusses implications for choosing a good translator and for finding useful tuning data. Future work includes other language pairs, other evaluation measures, and quality estimation techniques.
Parameter Optimization for SMT: It Pays to Learn from Hard Examples
Preslav Nakov, Fahad Al-Obaidli, Francisco Guzmán, Stephan Vogel

1. Overview
• Statistical Machine Translation (SMT)
• tuning/testing typically uses multiple references
• direction: X→English (e.g., NIST, IWSLT)
• Reversing the direction to English→X gives:
  • a single reference translation
  • multiple sources (English inputs) for each tuning/testing sentence
• Question: How should we use the multiple tuning inputs?
• Answer: Use the hardest input.

2. Method
• A. Choosing an entire dataset
  • Baselines:
    • select-first: use the first input
    • concat-all: use all inputs
  • choose by BLEU / LEN
  • best-on-tuning: tune using one input, then use the learned parameters to translate all inputs
  • X-vs-all-but-X: score each English input using the other English inputs as references (a minimal sketch follows the poster summary)
  • backtranslate: translate the reference to English, then compare it to each English input
• B. Selecting the best input for each sentence
  • Compare to an "English" reference from:
    • our own system
    • Google Translate
  • Similarity measures (see the second sketch below):
    • BLEU+1 (B1): smooths n-gram counts only
    • BLEU+1 with smoothed brevity penalty (B1-BP): also smooths the length ratio
    • BLEU+1 with sigmoid length penalty (B1-SG): symmetric length penalty
    • Length Difference (DL): minimize the length difference
    • Minimum BLEU+1 (MIN-B1): take the minimum instead of the maximum
    • Minimum Length (MIN-L)
• C. Synthesizing an input for each sentence
  • MEMT: fuse all inputs into one new input at the sub-sentence level

3. Results
[Figure: choosing an entire dataset; tuning on NIST MT04x, testing on MT05x]
[Table: selecting/synthesizing input for each sentence; O = worst according to the selection criteria, X = best according to the selection criteria]

4. Conclusion
• It is best to tune on the hardest available input
• Implications: (i) how we should choose a good translator, and (ii) how to find useful data for tuning
• Or maybe this is just an issue with BLEU?

5. Future Work
• Other datasets from NIST and IWSLT
• Other language pairs, e.g., Chinese-English
• Other evaluation measures: TER, METEOR
• Quality estimation techniques
• Better evaluation with multiple inputs
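The X-vs-all-but-X criterion from Method A can be made concrete in a few lines of Python. Below is a minimal sketch, assuming the sacrebleu library is available; the function name score_inputs_vs_rest and the data layout (a list of parallel English datasets) are our own illustration, not the paper's code.

```python
# Sketch of the X-vs-all-but-X dataset scoring idea (illustrative only).
import sacrebleu

def score_inputs_vs_rest(input_sets):
    """input_sets: a list of English datasets; each dataset is a list of
    sentences, and line i of every dataset translates the same source
    sentence. Returns one corpus-level BLEU score per dataset."""
    scores = []
    for i, candidate in enumerate(input_sets):
        # Score this input set against all *other* input sets as references.
        refs = [other for j, other in enumerate(input_sets) if j != i]
        scores.append(sacrebleu.corpus_bleu(candidate, refs).score)
    return scores
```

Following the poster's conclusion that the hardest input works best, one would then tune on the dataset with the lowest score, e.g. `input_sets[scores.index(min(scores))]`.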
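For the per-sentence similarity measures of Method B, here is a minimal, self-contained sketch of sentence-level BLEU+1 and the MIN-B1 (hardest-input) selection rule. This is our reconstruction: BLEU+1 follows the usual add-one smoothing of the n-gram precisions, and the sigmoid length penalty shown is an illustrative symmetric form, not necessarily the exact formula used in the paper.

```python
# Illustrative reconstruction of the B1 and MIN-B1 selection criteria.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu_plus_one(hyp, ref, max_n=4):
    """Sentence-level BLEU with add-one smoothed n-gram precisions (B1)."""
    hyp_toks, ref_toks = hyp.split(), ref.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp_toks, n), ngrams(ref_toks, n)
        matches = sum(min(count, r[g]) for g, count in h.items())
        total = max(len(hyp_toks) - n + 1, 0)
        # Add-one smoothing: zero n-gram matches no longer zero the score.
        log_prec += math.log((matches + 1) / (total + 1))
    # Standard brevity penalty (B1-BP would smooth this length ratio too).
    bp = min(1.0, math.exp(1.0 - len(ref_toks) / max(len(hyp_toks), 1)))
    return bp * math.exp(log_prec / max_n)

def sigmoid_length_penalty(hyp_len, ref_len):
    """A symmetric length penalty in the spirit of B1-SG: penalizes both
    too-short and too-long hypotheses (illustrative form, not the paper's)."""
    return 2.0 / (1.0 + math.exp(abs(hyp_len - ref_len) / max(ref_len, 1)))

def pick_hardest(inputs, pseudo_ref):
    """MIN-B1: pick the input *least* similar to the pseudo-reference
    (e.g., a translation produced by our own system or Google Translate)."""
    return min(inputs, key=lambda s: bleu_plus_one(s, pseudo_ref))
```

Replacing min with max in pick_hardest gives the standard best-input criterion; the poster's finding is that the minimum (hardest) variant is the one that pays off for tuning.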