Statistical Machine Translation with Rule Based Re-ordering of Source Sentences

Amit Sangodkar, Vasudevan N, Om P. Damani (CSE, IIT Bombay)

Presentation Transcript


  1. Statistical Machine Translation with Rule Based Re-ordering of Source Sentences • Amit Sangodkar, Vasudevan N, Om P. Damani (CSE, IIT Bombay)

  2. Motivation • Combining linguistic knowledge with statistical machine translation (SMT). • Can re-ordering source-language sentences to follow target-language word order improve the alignment?

  3. Example • English: Many Bengali poets have sung songs in praise of this land. • Hindi: कई बंगाली कवियों ने इस महान भूमि की प्रशंसा के गीत गाए हैं • Re-ordered English: Many Bengali poets this land of praise in songs sung have

  4. Translation Architecture

  5. Dependency Parser • Input: Many Bengali poets have sung songs in praise of this land. • Output of Stanford Parser: amod(poets-3, Many-1), nn(poets-3, Bengali-2), nsubj(sung-5, poets-3), aux(sung-5, have-4), dobj(sung-5, songs-6), prep_in(sung-5, praise-8), det(land-11, this-10), prep_of(praise-8, land-11)
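The following minimal Python sketch makes the parser output concrete. It reads Stanford-style typed dependencies, like the ones on slide 5, into tuples. The `Dep` record, the regular expression, and `parse_dependencies` are hypothetical helpers written for this illustration; they are not part of the Stanford tools. The sketch assumes each relation is printed as `rel(head-i, dep-j)`, optionally with a space before the parenthesis.

```python
import re
from typing import NamedTuple


class Dep(NamedTuple):
    rel: str       # relation name, e.g. "nsubj" or "prep_in"
    head: str      # governor word
    head_idx: int  # governor position in the sentence (1-based)
    dep: str       # dependent word
    dep_idx: int   # dependent position in the sentence (1-based)


# Matches "amod(poets-3, Many-1)"; a space before "(" is tolerated.
DEP_RE = re.compile(r"(\w+)\s*\((\S+?)-(\d+),\s*(\S+?)-(\d+)\)")


def parse_dependencies(text: str) -> list[Dep]:
    """Extract every typed dependency found in the parser's text output."""
    return [
        Dep(rel, head, int(hi), dep, int(di))
        for rel, head, hi, dep, di in DEP_RE.findall(text)
    ]


parser_output = """
amod(poets-3, Many-1)
nn(poets-3, Bengali-2)
nsubj(sung-5, poets-3)
aux(sung-5, have-4)
dobj(sung-5, songs-6)
prep_in(sung-5, praise-8)
det(land-11, this-10)
prep_of(praise-8, land-11)
"""

deps = parse_dependencies(parser_output)
for d in deps:
    print(d.rel, d.head, d.dep)
```

The word positions are kept because the same word can occur twice in a sentence. Only the indices identify a node uniquely.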

  6. Tree Processing • Handling auxiliary verbs • Remove each auxiliary and postfix it to its verb • e.g. aux(sung, have) → sung_have • Handling prepositions/conjunctions • Extract the preposition from the relation name and attach it to the parent or child • e.g. prep_in(sung, praise) → prep(sung, praise_in)
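The two tree-processing rules can be sketched in Python as below. This is a minimal illustration, not the authors' code. The `Dep` record, `process_tree`, and the index-to-word `labels` map are hypothetical. Only `prep_*` relations are rewritten. Conjunctions are left out because the slide says the extracted word may go to the parent or the child without saying which.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    rel: str   # relation name, e.g. "nsubj" or "prep_in"
    head: int  # token index of the governor
    dep: int   # token index of the dependent


def process_tree(labels: dict[int, str], deps: list[Dep]) -> tuple[dict[int, str], list[Dep]]:
    """Fold auxiliaries into their verbs and move prepositions onto the child node."""
    labels = dict(labels)  # work on a copy; the caller's labels are left untouched
    aux_nodes = {d.dep for d in deps if d.rel == "aux"}

    # aux(sung, have): the verb node is relabelled "sung_have"
    for d in deps:
        if d.rel == "aux":
            labels[d.head] = f"{labels[d.head]}_{labels[d.dep]}"

    processed: list[Dep] = []
    for d in deps:
        if d.rel == "aux":
            continue  # the auxiliary now lives in its verb's label
        if d.rel.startswith("prep_"):
            # prep_in(sung, praise) becomes prep(sung, praise_in)
            preposition = d.rel[len("prep_"):]
            labels[d.dep] = f"{labels[d.dep]}_{preposition}"
            processed.append(Dep("prep", d.head, d.dep))
        else:
            processed.append(d)

    # Auxiliary tokens no longer appear in any relation, so drop their labels.
    return {i: w for i, w in labels.items() if i not in aux_nodes}, processed


labels = {1: "Many", 2: "Bengali", 3: "poets", 4: "have", 5: "sung",
          6: "songs", 8: "praise", 10: "this", 11: "land"}
deps = [Dep("amod", 3, 1), Dep("nn", 3, 2), Dep("nsubj", 5, 3), Dep("aux", 5, 4),
        Dep("dobj", 5, 6), Dep("prep_in", 5, 8), Dep("det", 11, 10), Dep("prep_of", 8, 11)]

new_labels, new_deps = process_tree(labels, deps)
print(new_labels)
print(new_deps)
```

If a verb has several auxiliaries, this sketch appends them in the order the parser lists them. The slides do not say how that case is handled.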

  7. Modified Dependency Tree

  8. Re-ordering • Parent-Child Positioning • Prioritizing the Relations

  9. Re-ordering (Parent-Child Positioning) • Parent before child: conj (conjunct), appos (apposition), advcl (adverbial clause), ccomp (clausal complement), rcmod (relative clause modifier) • e.g. "John cried because he fell": advcl(cried, fell). In Hindi, cried is ordered before fell. • Child before parent: nsubj (subject), dobj (object) • e.g. "Ram eats mango": dobj(eats, mango). In Hindi, mango is ordered before eats.
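The following minimal Python sketch applies the parent-child positioning rule to one level of the tree. The parent-first relation set is taken from the slide. The default that every other relation places the child before its parent is an assumption. It fits the nsubj and dobj cases and the worked example in the following slides, since Hindi is largely head-final. `linearize_one_level` is a hypothetical helper.

```python
# Relations whose child follows its parent in the re-ordered sentence (from the slide).
PARENT_FIRST = {"conj", "appos", "advcl", "ccomp", "rcmod"}


def child_precedes_parent(rel: str) -> bool:
    # Assumption: every relation not listed above (nsubj, dobj, amod, det, prep, ...)
    # places the child before its parent.
    return rel not in PARENT_FIRST


def linearize_one_level(parent: str, children: list[tuple[str, str]]) -> list[str]:
    """Place each (relation, child) pair before or after the parent word."""
    before = [child for rel, child in children if child_precedes_parent(rel)]
    after = [child for rel, child in children if not child_precedes_parent(rel)]
    return before + [parent] + after


print(linearize_one_level("eats", [("nsubj", "Ram"), ("dobj", "mango")]))
# ['Ram', 'mango', 'eats']
print(linearize_one_level("cried", [("nsubj", "John"), ("advcl", "fell")]))
# ['John', 'cried', 'fell']
```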

  10. Re-ordering (Relation Priority) • Deciding the order when a node has multiple children • Priorities are defined between pairs of relations
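The slide does not list the actual priority pairs, so the table below is hypothetical. It only encodes orderings visible in the "Many Bengali poets" example: amod before nn, and nsubj before prep before dobj. The sketch sorts siblings by a pairwise comparison using Python's `functools.cmp_to_key`. `BEFORE`, `compare`, and `order_children` are illustrative names.

```python
from functools import cmp_to_key

# Hypothetical pairwise priorities, read off the worked example only:
# (a, b) means a child attached by relation a is placed before a sibling attached by b.
BEFORE = {("amod", "nn"), ("nsubj", "prep"), ("nsubj", "dobj"), ("prep", "dobj")}


def compare(a: tuple[str, str], b: tuple[str, str]) -> int:
    """Compare two (relation, child) pairs using the pairwise priority table."""
    if (a[0], b[0]) in BEFORE:
        return -1
    if (b[0], a[0]) in BEFORE:
        return 1
    return 0  # no rule for this pair of relations


def order_children(children: list[tuple[str, str]]) -> list[tuple[str, str]]:
    # sorted() is stable, so siblings without a rule keep the parser's order,
    # provided the pairs in BEFORE describe a consistent (transitive) ordering.
    return sorted(children, key=cmp_to_key(compare))


print(order_children([("nsubj", "poets"), ("dobj", "songs"), ("prep", "praise_in")]))
# [('nsubj', 'poets'), ('prep', 'praise_in'), ('dobj', 'songs')]
```

A pairwise table is flexible, but it needs care. If it contains a cycle, or leaves gaps that break transitivity, the sort result is no longer well defined. A single numeric rank per relation avoids this, at the cost of expressiveness.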

  11. Illustration - Re-ordering • Input dependency tree (after tree processing): sung_have [nsubj → poets [amod → Many, nn → Bengali], dobj → songs, prep → praise_in [prep → land_of [det → this]]]

  12. Illustration - Re-ordering

  13. Illustration - Re-ordering

  14. Illustration - Re-ordering • Output: Many

  15. Illustration - Re-ordering • Output: Many

  16. Illustration - Re-ordering • Output: Many Bengali

  17. Illustration - Re-ordering • Output: Many Bengali poets

  18. Illustration - Re-ordering • Output: Many Bengali poets

  19. Illustration - Re-ordering • Output: Many Bengali poets this

  20. Illustration - Re-ordering • Output: Many Bengali poets this land of

  21. Illustration - Re-ordering • Output: Many Bengali poets this land of praise in

  22. Illustration - Re-ordering • Output: Many Bengali poets this land of praise in

  23. Illustration - Re-ordering • Output: Many Bengali poets this land of praise in songs

  24. Illustration - Re-ordering • Output: Many Bengali poets this land of praise in songs sung have • Hindi: कई बंगाली कवियों ने इस महान भूमि की प्रशंसा के गीत गाए हैं
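The walkthrough can be reproduced with a short recursive traversal. This minimal Python sketch works under stated assumptions and is not the authors' implementation:

- `PARENT_FIRST` copies the parent-before-child relations from slide 9.
- Every other relation is assumed to put the child's whole subtree first.
- `BEFORE` holds hypothetical pairwise priorities read off this one example.
- Nodes are keyed by their processed word, which only works when no word repeats in the sentence.

```python
from functools import cmp_to_key

# Parent-before-child relations listed on slide 9.
PARENT_FIRST = {"conj", "appos", "advcl", "ccomp", "rcmod"}

# Hypothetical pairwise priorities, read off this example only:
# (a, b) means a child attached by relation a precedes a sibling attached by b.
BEFORE = {("amod", "nn"), ("nsubj", "prep"), ("nsubj", "dobj"), ("prep", "dobj")}

# Maps each node to its (relation, child) pairs; nodes are keyed by word.
Tree = dict[str, list[tuple[str, str]]]


def compare(a: tuple[str, str], b: tuple[str, str]) -> int:
    if (a[0], b[0]) in BEFORE:
        return -1
    if (b[0], a[0]) in BEFORE:
        return 1
    return 0  # no rule for this pair of relations


def reorder(tree: Tree, node: str) -> list[str]:
    """Linearize the subtree rooted at `node` in a Hindi-like order."""
    children = sorted(tree.get(node, []), key=cmp_to_key(compare))
    words: list[str] = []
    # Assumption: relations outside PARENT_FIRST put the whole child subtree first.
    for rel, child in children:
        if rel not in PARENT_FIRST:
            words += reorder(tree, child)
    words.append(node)
    for rel, child in children:
        if rel in PARENT_FIRST:
            words += reorder(tree, child)
    return words


# Processed tree for "Many Bengali poets have sung songs in praise of this land."
tree: Tree = {
    "sung_have": [("nsubj", "poets"), ("dobj", "songs"), ("prep", "praise_in")],
    "poets": [("amod", "Many"), ("nn", "Bengali")],
    "praise_in": [("prep", "land_of")],
    "land_of": [("det", "this")],
}

tokens = reorder(tree, "sung_have")
sentence = " ".join(token.replace("_", " ") for token in tokens)
print(sentence)  # Many Bengali poets this land of praise in songs sung have
```

Underscored tokens such as `sung_have` are split back into words only at the end. This is why the auxiliary and the prepositions travel with their head words during re-ordering.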

  25. Experimental Setup • Procedure • Train Moses on the training data with a 6-gram language model • Tune Moses on the development data • Decode the test data with the trained Moses system • Run this procedure on both the original data and the re-ordered data

  26. Results

  27. Translation Example - I • Actual: इसी वर्ष नील व्यापार और नील उत्पादन के इतिहास में एक मोड़ आया. • Baseline: इस वर्ष में एक निर्धारित बिंदु रहे के इतिहास में नील व्यापार और नील उत्पादन. • Re-ordered: इस साल नील व्यापार और नील उत्पादन के इतिहास में यह एक रहा था.

  28. Translation Example - II • Actual: वे गुलामी की जिंदगी से रिहाई चाहते हैं. • Baseline: वे चाहते हैं कि deliverance का जीवन से गुलामी की है. • Re-ordered: वे गुलामी की जिंदगी से रिहाई चाहते हैं.

  29. Conclusion • Using linguistic knowledge appears to improve SMT quality • The applicability of the BLEU score in this context needs further investigation

  30. Acknowledgements • We acknowledge the Department of IT (DIT), Government of India, and the English to Indian Languages Machine Translation (EILMT) consortium for making the EILMT tourism dataset available. • IIIT dataset: data acquired during the DARPA TIDES MT project (2003) and later refined at LTRC, IIIT-H.

  31. References • [Hieu2008] Hieu Hoang and Philipp Koehn. Design of the Moses Decoder for Statistical Machine Translation. ACL Workshop on Software Engineering, Testing, and Quality Assurance for NLP, 2008. • [Marie2006] Marie-Catherine de Marneffe, Bill MacCartney and Christopher D. Manning. Generating Typed Dependency Parses from Phrase Structure Parses. In Proceedings of LREC-06, 2006. • [Manual2008] Stanford Dependencies Manual. Available at http://nlp.stanford.edu/software/dependencies_manual.pdf. • [Moses] Moses Tutorial. Available at http://www.statmt.org/moses/?n=Moses.Tutorial. • [Singh2007] Smriti Singh, Mrugunk Dalal, Vishal Vachhani, Pushpak Bhattacharyya and Om P. Damani. Hindi Generation from Interlingua (UNL). Machine Translation Summit XI, 2007.
