
Generalising lexical translation strategies for MT using comparable corpora

Bogdan Babych, Serge Sharoff, Anthony Hartley. Centre for Translation Studies, University of Leeds, Leeds, UK. {b.babych,s.sharoff,a.hartley}@leeds.ac.uk



Presentation Transcript


  1. Generalising lexical translation strategies for MT using comparable corpora Bogdan Babych, Serge Sharoff, Anthony Hartley Centre for Translation Studies, University of Leeds Leeds, UK {b.babych,s.sharoff,a.hartley}@leeds.ac.uk

  2. Overview • Indirect translation equivalents in MT: current limitations • Increasing the range of translation equivalents used by MT • Equivalent-oriented vs. strategy-oriented approaches • Methodology for discovering translation strategies using comparable corpora • Applications for terminology research • Conclusions and future work LREC 2008 Generalising Lexical Translation Strategies for MT

  3. Indirect equivalents in MT: Data-driven MT (statistical & example-based) • Reusing equivalents learnt from parallel corpora • Problem: lack of generalisation • Equivalents expressed as word patterns • Do not generalise beyond lemmas • Cannot generate indirect equivalents for ‘unseen’ expressions • Difficult to maintain many specific patterns • Fundamental limits on the range of translation solutions generated by MT

  4. Indirect equivalents: change of perspective. Problems for MT: non-fluent translations & mistranslations • Ru: Из кризисов такого рода как парламентский можно выходить за счет демократических методов. • lit.: 'From crises of such type as parliamentary it is possible to go out by means of democratic methods' • RBMT: Such as parliamentary it is possible to leave crises due to democratic methods. • SMT: This kind of crisis as a parliamentary, can go through democratic methods. • HT: We can escape crises like these through democratic means.

  5. From equivalents to lexical translation strategies • Indirect equivalents = ‘creative’ solutions to non-trivial problems • Parallel corpora: too small, sparse and specialised • The same problem is often solved idiosyncratically: no clear statistical model • The set of ‘indirect’ translation problems is open • Our solution: a higher-order model • Generalising classes of equivalents as strategies • By similarity of usage in comparable corpora • Equivalents for unseen expressions are generated from the discovered strategies

  6. Current methodology • One fixed strategy: rephrasing words using similarity of ‘collocation vectors’ ~ near-synonyms • Generator of equivalents from the ASSIST project • выходить из кризиса (go out of crisis) ~ {to approach, to face, to get over} crisis • Выходить (go out) .sim задходить (come) .dict + collocations of (crisis) → to approach • No other strategies implemented yet • Transposition (change of syntactic perspective) • Modulation (change of lexical perspective) … • Further goal: to find ~ escape from crisis … via …
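The re-phrasing idea on slide 6 can be illustrated with a minimal Python sketch. It is not the ASSIST generator: it assumes, hypothetically, that each Russian word has a sparse collocation vector, that near-synonyms are words whose vectors reach a cosine-similarity threshold, and that candidate English equivalents are dictionary translations of those near-synonyms which are attested as collocates of the English noun. The function `rephrase`, all vectors, the dictionary entries (including преодолевать 'overcome' and подходить 'approach') and the 0.5 threshold are invented for illustration.

```python
import math
from typing import Dict, List, Set

# Sparse collocation vector: collocate -> association score (toy values).
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rephrase(source_verb: str, en_noun: str,
             ru_vectors: Dict[str, Vector],
             ru_en_dict: Dict[str, List[str]],
             en_collocates: Dict[str, Set[str]],
             threshold: float = 0.5) -> List[str]:
    """Hypothetical re-phrasing generator (illustration only).

    1. Collect Russian near-synonyms of source_verb (cosine >= threshold).
    2. Look up their dictionary translations.
    3. Keep translations attested as collocates of the English noun.
    """
    base = ru_vectors[source_verb]
    near = [w for w, v in ru_vectors.items()
            if w == source_verb or cosine(base, v) >= threshold]
    allowed = en_collocates.get(en_noun, set())
    candidates: List[str] = []
    for word in near:
        for translation in ru_en_dict.get(word, []):
            if translation in allowed and translation not in candidates:
                candidates.append(translation)
    return candidates


# Invented toy data: not taken from the BNC, the RNC or any real dictionary.
ru_vectors = {
    "выходить": {"кризис": 2.0, "дом": 1.0, "ситуация": 1.5},            # go out
    "преодолевать": {"кризис": 2.0, "ситуация": 1.2, "трудность": 1.0},  # overcome
    "подходить": {"кризис": 1.0, "проблема": 1.5, "ситуация": 1.0},      # approach
    "есть": {"хлеб": 2.0, "суп": 1.5},                                   # eat
}
ru_en_dict = {
    "выходить": ["go out", "leave"],
    "преодолевать": ["get over", "overcome"],
    "подходить": ["approach", "come up"],
    "есть": ["eat"],
}
en_collocates = {"crisis": {"get over", "overcome", "face", "approach"}}

print(rephrase("выходить", "crisis", ru_vectors, ru_en_dict, en_collocates))
# prints ['get over', 'overcome', 'approach']
```

The literal translations of выходить itself ("go out", "leave") are discarded because they are not attested with "crisis", which is the point of the strategy: the equivalent comes from a near-synonym rather than from the word's own dictionary entry.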

  7. Strategy evaluation • Coverage of problems vs. coverage of solutions • Several strategies cover the same problem (variation) • Ru: Механизм принятия решений будет публичным. (lit.: 'The mechanism of making decisions will be public') • публичный механизм (‘public mechanism’) • Public process / … a greater public interaction (current re-phrasing strategy) • The answer will come from the people. (change-of-perspective strategy) • It is harder to match solutions: diversity of strategies

  8. Coverage of translation problems by re-phrasing strategy • Characterising linguistic productivity of the strategy • Experiment: 12 translators suggest indirect solutions to the same set of problems • 36 translation problems (25 Ru & 11 En) • 210 different human solutions (5.83 solutions / problem) • Task of the system: to generate a possible solution for each problem

  9. Coverage of translation problems by re-phrasing strategy • For 75% of problems, the re-phrasing strategy produces at least one match • Average coverage of a set of human solutions: 34.7%
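The two figures on slide 9 could be computed along the lines of the sketch below. It assumes (the slide does not say) that a problem counts as covered when at least one system output exactly matches a human solution, and that per-problem coverage is the share of human solutions matched, averaged over all problems. The function `coverage_stats`, the problem IDs and the solution sets are hypothetical toy data, not the experiment's data. For reference, 210 solutions over 36 problems gives 210 / 36 ≈ 5.83 solutions per problem, as stated on slide 8.

```python
from typing import Dict, Set, Tuple


def coverage_stats(human: Dict[str, Set[str]],
                   system: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Return (share of problems with >= 1 exact match,
    mean per-problem share of human solutions matched).

    Assumes exact string matching; every problem needs >= 1 human solution.
    """
    matched_any = 0
    per_problem = []
    for problem, solutions in human.items():
        hits = solutions & system.get(problem, set())
        if hits:
            matched_any += 1
        per_problem.append(len(hits) / len(solutions))
    n = len(human)
    return matched_any / n, sum(per_problem) / n


# Hypothetical toy data (four problems), not the experiment's 36 problems.
human = {
    "p1": {"escape crisis", "overcome crisis", "get over crisis"},
    "p2": {"public process", "open process"},
    "p3": {"we had our doubts", "we were sceptical"},
    "p4": {"flee difficulty"},
}
system = {
    "p1": {"get over crisis", "face crisis"},
    "p2": {"public process", "open process"},
    "p3": {"we began with fear"},
    "p4": {"flee difficulty"},
}

any_match, mean_coverage = coverage_stats(human, system)
print(f"{any_match:.0%} of problems matched; mean coverage {mean_coverage:.1%}")
# prints: 75% of problems matched; mean coverage 58.3%
```

The two numbers measure different things: the first rewards a strategy for hitting each problem once, while the second also rewards it for reproducing the variety of human solutions, which is why it is typically much lower.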

  10. Coverage of translation solutions by re-phrasing strategy • Comparing coverage of indirect equivalents by: • (1) bilingual dictionary solutions (Oxford Russian) • (2) solutions extracted from word alignment in a parallel corpus: • Training set: Ru-En news, 700k wd. • Test set: Euronews Ru-En interviews, 100k wd. • (3) strategy-based (i.e. re-phrasing) solutions: • Collocation vectors from monolingual corpora (BNC, RNC) ~100M • Filtered by co-occurrence in news corpora ~200M
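The co-occurrence filter in (3) might work roughly as in the following sketch. The slide only says that candidates were filtered by co-occurrence in news corpora of about 200M words; the window-based counting, the window size of 3, the minimum count of 2, the helper names and the tiny tokenised corpus are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Iterable, List


def cooccurrence_counts(sentences: Iterable[List[str]], noun: str,
                        window: int = 3) -> Counter:
    """Count tokens occurring within `window` positions of `noun`."""
    counts: Counter = Counter()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token != noun:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def filter_candidates(candidates: List[str], sentences: List[List[str]],
                      noun: str, min_count: int = 2, window: int = 3) -> List[str]:
    """Keep candidates that co-occur with `noun` at least `min_count` times."""
    counts = cooccurrence_counts(sentences, noun, window)
    return [c for c in candidates if counts[c] >= min_count]


# Tiny invented "news corpus", already tokenised.
corpus = [
    "the government must face the crisis".split(),
    "ministers will approach the crisis carefully".split(),
    "they face a deep crisis".split(),
    "we eat bread".split(),
    "analysts approach this crisis".split(),
]

print(filter_candidates(["approach", "face", "get", "eat"], corpus, "crisis"))
# prints ['approach', 'face']
```

A real filter at this scale would use an association measure and lemmatised, indexed corpora rather than raw counts over a Python list; the sketch only shows where the filtering step sits relative to candidate generation.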

  11. Coverage of solutions by re-phrasing strategy • Task of the system: to generate an exact solution for each problem

  12. Coverage of solutions by re-phrasing strategy: conclusions • Learning individual equivalents is not efficient • Low coverage of unseen problems • Lower generalisation of idiosyncratic alignments • Re-phrasing strategy: productive but not sufficient

  13. Ongoing project: beyond the re-phrasing strategy. Modelling transposition and modulation strategies • Learning strategies from parallel data • Aligning ‘indirect’ solutions (discontinuous MWEs) • выходить из кризиса (go out of crisis) <~> escape crisis • Generalising equivalents with similarity classes • Covering unseen expressions: • {Выходить / выводить…} из {конфликта / застоя / депрессии…} (go out / lead out from conflict, stagnation, depression) <~> to escape conflict / controversy, to flee difficulty, to survive disaster / tragedy …
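Generalising one aligned indirect equivalent over similarity classes can be sketched as a cross product of class members. This is an illustrative assumption about the mechanism, not the project's implementation: the function `generalise`, the classes and the noun dictionary are hand-written here, whereas the slide implies the classes would come from similarity in comparable corpora. Words are given as lemmas (e.g. застой 'stagnation'), ignoring case forms such as из кризиса.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def generalise(verb_class: Set[str], noun_class: Set[str],
               en_verbs: List[str],
               noun_dict: Dict[str, str]) -> Dict[Tuple[str, str], List[str]]:
    """Expand one aligned indirect equivalent into a class-level rule.

    Every (verb, noun) pair drawn from the two Russian classes, including
    pairs never seen in the parallel data, gets the English verb frame of
    the seen example applied to the dictionary translation of the noun.
    """
    rules: Dict[Tuple[str, str], List[str]] = {}
    for verb, noun in product(sorted(verb_class), sorted(noun_class)):
        rules[(verb, noun)] = [f"{en} {noun_dict[noun]}" for en in en_verbs]
    return rules


# Seen alignment: выходить из кризиса <~> escape crisis.
# Hand-written classes standing in for distributional similarity classes.
verb_class = {"выходить", "выводить"}          # go out, lead out
noun_class = {"кризис", "конфликт", "застой"}   # crisis, conflict, stagnation
noun_dict = {"кризис": "crisis", "конфликт": "conflict", "застой": "stagnation"}
en_verbs = ["escape", "flee"]

rules = generalise(verb_class, noun_class, en_verbs, noun_dict)
print(rules[("выводить", "застой")])  # a combination absent from the seen data
# prints ['escape stagnation', 'flee stagnation']
```

Two verbs and three nouns yield six source patterns from a single aligned example; this multiplication is what lets a strategy cover unseen expressions, and also why the generated candidates still need filtering for fluency.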

  14. MT-oriented evaluation. Improvements for incomprehensible translations and mistranslations: • Es: Es verdad que empezamos vacilantes pero era lógico. (lit.: 'It is true that we started hesitant but it was logical') • HT: Of course we had our doubts to begin with but that's normal • SMT: It is true that we started to waver but was logical (unacceptable literal translation) • empezar vacilante ~ begin doubt (modulation) • Indirect solutions: we had our fears / doubts to start with; we began with fear / scepticism / worries...; we were not convinced then; after our early scepticism; we were soon / gradually / quickly convinced

  15. Application to terminological research • Terminological equivalents are usually direct • Rarely change lexical or syntactic perspective • Standard fixed equivalents preferred • Distributional similarity framework • Yields a network of related terms (not paraphrases) • Useful for automating terminological research • Prototype terminological workbench for translators • English-French corpora in a specialised domain (2M words in total); Giza alignments; termbanks • Translators explore systems of related terms
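A network of related terms like the one described on slide 15 can be sketched by linking every pair of terms whose context vectors reach a cosine-similarity threshold. The function `term_network`, the aviation terms, their context features and the 0.7 threshold are hypothetical, and the sketch covers only the monolingual side: it ignores the Giza alignments and termbanks that the prototype workbench also uses.

```python
import math
from itertools import combinations
from typing import Dict, Set

# Sparse context vector: context feature -> weight (toy values).
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def term_network(vectors: Dict[str, Vector], threshold: float) -> Dict[str, Set[str]]:
    """Undirected graph linking terms whose cosine similarity >= threshold."""
    graph: Dict[str, Set[str]] = {term: set() for term in vectors}
    for t1, t2 in combinations(vectors, 2):
        if cosine(vectors[t1], vectors[t2]) >= threshold:
            graph[t1].add(t2)
            graph[t2].add(t1)
    return graph


# Invented context vectors for a hypothetical aviation domain.
term_vectors = {
    "runway": {"aircraft": 3.0, "land": 2.0, "pavement": 1.0},
    "taxiway": {"aircraft": 3.0, "pavement": 2.0},
    "apron": {"aircraft": 2.0, "park": 2.0, "pavement": 1.0},
    "invoice": {"payment": 3.0, "customer": 2.0},
}

network = term_network(term_vectors, threshold=0.7)
for term in sorted(network):
    print(term, sorted(network[term]))
```

The result is a graph of related terms rather than a list of paraphrases: "taxiway" links to both "runway" and "apron", while "runway" and "apron" fall just below the threshold and "invoice" stays isolated, so a translator browsing the graph sees the neighbourhood of a term instead of a single proposed equivalent.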

  16. Terminological interface for translators • French term plan and the English term plain

  17. Conclusions and future work • Making testable predictions for indirect equivalents • Model for re-phrasing, transposition & modulation strategies • Match human translators’ solutions for unseen phrases • Future work • Automatic identification of phrases which need non-literal translation • Building fluent equivalents around solutions • Integrating strategy-based generator into SMT decoder • Evaluation of the improvement in coverage • Evaluation of the productivity / reusability of strategies
