10 likes | 119 Vues
Language Knowledge Engineering Lab. Kyoto University. EBMT System of Kyoto University in OLYMPICS Task at IWSLT 2012. Chenhui Chu, Toshiaki Nakazawa , Sadao Kurohashi { chu , nakazawa , kuro }@ nlp.ist.i.kyoto-u.ac.jp. Graduate School of Informatics, Kyoto University. System Description.
E N D
Language Knowledge Engineering Lab. Kyoto University EBMT System of Kyoto University in OLYMPICS Task at IWSLT 2012 Chenhui Chu, Toshiaki Nakazawa, SadaoKurohashi {chu, nakazawa, kuro}@nlp.ist.i.kyoto-u.ac.jp Graduate School of Informatics, Kyoto University System Description Sub-sentence Splitting Optimized Chinese Segmenter Zh: 我带了矿泉水和茶,您喜欢喝什么? En: I’ve brought some mineral water and some tea. Which do you prefer? Zh1: 我带了矿泉水和茶。 En1: I’ve brought some mineral water and some tea. Zh2: 您喜欢喝什么? En2: Which do you prefer? ※ Proposed for Chinese-Japanese MT [Chu+, 2012] Non-parallel Sentence Filtering Rule-based Decoding Constraints • Punctuation-based splitting • Reduce the computational complexity • Avoid examples across punctuation boundaries Zh:我上牛津大学。 (I am studying at Oxford University.) En: What about you? Alignment Model Additional Corpora Zh: 是的,这位女士要一杯曼哈顿酒,我要一杯马丁尼。 (Yes, this lady will have a Manhattan, and I’ll have a martini.) En: Yes, I think so. This lady will have a Manhattan and I’ll have a martini. • Wiki corpora • Bilingual titles in Wikipedia • Bilingual terms in Wiktionary Related Work [DeNero+, 2008] Proposed [Nakazawa+, 2011] Experimental Results 5K parallel sentence pairs from BTEC Step 1 & 2 Cartesian Product Non-parallel sentence pairs 500 Non-parallel sentence pairs Selection Classifier Input: 这饭菜怎么样,女士? Step 3 Output: How about this food, Madam? HIT & BTEC corpora Dictionary Input: 干酪饼很不错,尝一尝吗? Output: The cheese pie is very good, taste? Simple position-based reordering Dependency tree-based reordering IWSLT 2012 OLYMPICS Task, Hong Kong, Dec. 6-7, 2012