Statistical Machine Translation with Rule Based Re-ordering of Source Sentences

Statistical Machine Translation with Rule Based Re-ordering of Source Sentences Amit Sangodkar Vasudevan N Om P. Damani (CSE, IIT Bombay)

Motivation • Combining Linguistic knowledge with Statistical Machine Translation. • Can re-ordering source language sentences as per target language improve the alignment?

Example English: Many Bengali poets have sung songs in praise of this land. Hindi: कई बंगाली कवियों ने इस महान भूमि की प्रशंसा के गीत गाए हैं Re-order: Many Bengali poets this land of praise in songs sung have

Translation Architecture

Dependency Parser Many Bengali poets have sung songs in praise of this land. amod (poets-3, Many-1) nn (poets-3, Bengali-2) nsubj (sung-5, poets-3) aux (sung-5, have-4) dobj (sung-5, songs-6) prep_in (sung-5, praise-8) det (land-11, this-10) prep_of (praise-8, land-11) ------------------------------------ Output of Stanford Parser

Tree Processing • Handling Auxiliary Verbs • remove and postfix to their respective verb • e.g. aux(sung, have)  sung_have • Handling Prepositions/Conjunctions • extract the preposition from the relation and attach to parent/child • e.g. prep_in(sung, praise)  prep(sung, praise_in)

Modified Dependency Tree

Re-ordering • Parent-Child Positioning • Prioritizing the Relations

Re-ordering (Parent-Child Positioning) • parent before child conj (conjunction), appos (apposition), advcl (adverbial clause), ccomp (clausal complement), rcmod (relative clause modifier) • e.g. John cried because he fell advcl(cry, fell). In Hindi, cry is ordered before fell. • child before parent  nsubj(subject), dobj(object) • e.g.Ram eats mango dobj(eat,mango). In Hindi, mango ordered before eat.

Re-ordering (Relation Priority) • Deciding the order in case of multiple children • Priority among relation pairs

Illustration - Re-ordering Input Dependency Tree sung_have nsubj dobj prep poets praise_in songs nn amod prep land_of Bengali Many det this

Illustration - Re-ordering sung_have nsubj dobj prep poets praise_in songs nn amod prep land_of Bengali Many det this

Illustration - Re-ordering Output: Many sung_have nsubj dobj prep poets praise_in songs nn amod prep land_of Bengali Many det this

Illustration - Re-ordering Output:Many Bengali sung_have nsubj dobj prep poets praise_in songs nn amod prep land_of Bengali Many det this

Illustration - Re-ordering Output:Many Bengali poets sung_have nsubj dobj prep poets praise_in songs nn amod prep land_of Bengali Many det this

Illustration - Re-ordering Output: Many Bengali poets sung_have nsubj dobj prep poets praise_in songs nn amod prep land_of Bengali Many det this

Illustration - Re-ordering Output: Many Bengali poets this sung_have nsubj dobj prep poets praise_in songs nn amod prep land_of Bengali Many det this

Illustration - Re-ordering Output: Many Bengali poets this land of sung_have nsubj dobj prep poets praise_in songs nn amod prep land_of Bengali Many det this

Illustration - Re-ordering Output: Many Bengali poets this land of praise in sung_have nsubj dobj prep poets praise_in songs nn amod prep land_of Bengali Many det this

Illustration - Re-ordering Output: Many Bengali poets this land of praise in songs sung_have nsubj dobj prep poets praise_in songs nn amod prep land_of Bengali Many det this

Illustration - Re-ordering Output:Many Bengali poets this land of praise in songs sung have कई बंगाली कवियों ने इस महान भूमि की प्रशंसा के गीत गाए हैं sung_have nsubj dobj prep poets praise_in songs nn amod prep land_of Bengali Many det this

Experimental Setup • Procedure • Train Moses using Training data with 6-gram language model • Tune the Moses using Development data • Decode Testing data using trained Moses • This experimentation procedure on pure data and reordered data

Results

Translation Example - I Actual :इसी वर्ष नील व़्यापार और नील उत़्पादन के इतिहास में एक मोड़ आया. Baseline :इस वर्ष में एक निर्धारित बिंदु रहे के इतिहास में नील व्यापार और नील उत़्पादन. Re-ordered :इस साल नील व्यापार और नील उत़्पादन के इतिहास में यह एक रहा था.

Translation Example - II Actual :वे गुलामी की जिंदगी से रिहाई चाहते हैं. Baseline :वे चाहते हैं कि deliverance का जीवन से गुलामी की है. Re-ordered :वे गुलामी की जिंदगी से रिहाई चाहते हैं.

Conclusion • Using Linguistic knowledge appears to improve the SMT quality • BLEU score applicability in this context needs to be investigated

Acknowledgements • We acknowledge the Department of IT (DIT), Government of India and the English-to-Indian Languages (EILMT) consortium for making the EILMT tourism dataset available. • IIIT Data Set: Data acquired during DARPA TIDES MT project 2003 and later refined at LTRC,IIIT-H.

References • [Hieu2008] Hieu Hoang, Philipp Koehn, Design of the Moses Decoder for Statistical Machine Translation, ACL Workshop on Software engineering, testing, and quality assurance for NLP 2008. • [Marie2006] Marie-Catherine de Marneffe, Bill MacCartney and Christopher D. Manning, Generating Typed Dependency Parses from Phrase Structure Parses. In Proceedings of LREC-06. 2006. • [Manual2008] Stanford Dependencies Manual, Available at http://nlp.stanford.edu/software/dependencies_manual.pdf.. • [Moses] Moses Tutorial, Available at http://www.statmt.org/moses/?n=Moses.Tutorial. . • [Singh2007] Smriti. Singh, Mrugunk. Dalal, Vishal Vachhani, Pushpak Bhattacharyya, Om P. Damani. Hindi Generation from Interlingua (UNL), Machine Translation Summit XI, 2007.

Statistical Machine Translation with Rule Based Re-ordering of Source Sentences