Current Trends in MT



  1. Current Trends in MT • Andy Way • NCLT, School of Computing, • Dublin City University, • Dublin 9, Ireland • away@computing.dcu.ie • www.nclt.dcu.ie/mt/

  2. Overview of Talk • Current Trends • From EACL-06 to ACL-07 • Topics • Country of Origin • Ongoing and Future Work at DCU • Other Important Research • Future General Directions • Increased convergence within MT • Increased convergence between MT and rest of NLP • Concluding Remarks NCLT, Dublin, April 2007

  3. Current Trends EACL-06 MT Track featured 24 papers in a number of areas:

  4. Current Trends: Country of Origin • Of the 24 MT papers: • 18 (75%) were from Europe • 6 from UK • 6 from Spain • 3 from Germany • 1 each from Romania, Italy & Ireland • 6 (25%) were from N. America (5 from USA) • 0 were from Asia

  5. Current Trends: Success Rates (by Country) • Of the 24 MT papers, 7 (29%) were accepted (general EACL acceptance rate 19.7%: 52/264) • 2 from USA (out of 5) • 2 from Germany (out of 3) • 1 from UK (out of 6) • 1 from Romania (out of 1) • 1 from Canada (out of 1)
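
The percentages on this slide can be rechecked with a few lines of Python. This is only a sketch: the counts are copied from the slide, while the helper name `acceptance_rate` and the dictionary layout are illustrative choices, not part of the talk.

```python
# Counts copied from slide 5; names are illustrative, not from the talk.
def acceptance_rate(accepted, submitted):
    """Return the acceptance rate as a percentage."""
    return 100.0 * accepted / submitted

# MT track versus the conference as a whole.
mt_rate = acceptance_rate(7, 24)
overall_rate = acceptance_rate(52, 264)

# (accepted, submitted) per country, as listed on the slide.
eacl06_by_country = {
    "USA": (2, 5),
    "Germany": (2, 3),
    "UK": (1, 6),
    "Romania": (1, 1),
    "Canada": (1, 1),
}

print(f"MT track: {mt_rate:.1f}%  overall: {overall_rate:.1f}%")
for country, (accepted, submitted) in eacl06_by_country.items():
    rate = acceptance_rate(accepted, submitted)
    print(f"{country}: {accepted}/{submitted} = {rate:.1f}%")
```

The per-country accepted counts sum to the 7 accepted papers reported on the slide.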

  6. Current Trends: Success Rates (by Topic) • Of the 7 accepted MT papers • 2 were on SMT (out of 8) • 2 were on word alignment (out of 4) • 2 were on evaluation (out of 5) • 1 was on hybrid MT (out of 1)

  7. Current Trends ACL-07 MT Track features 67 papers in a number of areas:

  8. Current Trends ACL-07 SMT Track features 29 papers in a number of areas:

  9. Current Trends: Summary of Themes • Of the 67 MT papers: • 54 (80%) involve corpus-based MT • 9 (13%) involve evaluation • 3 (4%) involve RBMT

  10. Current Trends: Country of Origin • Of the 67 MT papers: • 32 (48%) are from Asia • 19 (28%) are from N. America (18 from USA) • 16 (24%) are from Europe

  11. Current Trends: Country of Origin Of the 32 papers from Asia:

  12. Current Trends: Country of Origin Of the 16 papers from Europe:

  13. Change 06-07 (by Topic)

  14. Change 06-07 (by Country)
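
The regional shift between the two conferences can be recomputed from the counts given on slides 4 and 10. The sketch below assumes only those counts; the names `shares` and `shift` are illustrative, and the result is expressed in percentage points.

```python
# Paper counts by region, copied from slides 4 (EACL-06) and 10 (ACL-07).
eacl06 = {"Europe": 18, "N. America": 6, "Asia": 0}
acl07 = {"Europe": 16, "N. America": 19, "Asia": 32}

def shares(counts):
    """Convert raw counts into percentage shares of the total."""
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}

shares_06 = shares(eacl06)
shares_07 = shares(acl07)

# Change in share from 2006 to 2007, in percentage points.
shift = {region: shares_07[region] - shares_06[region] for region in eacl06}

for region in eacl06:
    print(f"{region}: {shares_06[region]:.0f}% -> {shares_07[region]:.0f}% "
          f"({shift[region]:+.1f} points)")
```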

  15. Current Trends: Success Rates (by Country) • Of the 67 MT papers, 17 were accepted (25.4%; overall acceptance rate 22.4%) from the following countries: • USA: 8 (out of 18) • China: 3 (out of 20) • Ireland: 2 (out of 3) • UK: 2 (out of 2) • Canada: 1 (out of 1) • Singapore: 1 (out of 1)

  16. Current Trends: Success Rates (by Topic) • Of the 17 successful MT papers: • 3 were on language modelling/decoding • 2 were on evaluation • 2 were on word alignment • 2 were on reordering • 1 was on word-sense disambiguation • 1 was on treestring models • 1 was on SMT via pivot languages • 1 was on multi-parallel corpora • 1 was on hybrid MT • 1 was on transductive learning

  17. Consequences of these Trends • The ‘system’ is at breaking point • Do we need a pre-selection phase? • As in many other areas, a ‘new world order’ is emerging • There is very little internal QA as yet • Standard of English and basic structure is lacking • But … they’re doing OK already, and they’ll improve! • Relatively few ‘world centres’ in MT at present • Despite massive increase in MT use, big decrease in teaching of MT – paradox!

  18. Ongoing Work in DCU • Integrating Syntax into SMT • Supertag translation and target language models • Adding source language information • Tree-to-Tree Translation (DOT, LFG-DOT: also treestring models), inc. porting monolingual parsing techniques to the bilingual case • Applications • Automatic Translation of DVD subtitles • Sign-Language MT • Large-Scale Open Evaluation (inc. parallel computation) • New Language Pairs, Corpora etc.

  19. System Development

  20. Ongoing Work in DCU (cont’d) • Dependency- (and Semantically) Marked-Up Corpora • New models of Word Alignment • New integrated models of subtree/substring alignment • New dependency-based Evaluation metrics • New Decoders • EBMT • Memory-Based • Open-Source Components

  21. Ongoing Work in DCU (cont’d) Collaborative work: • Tilburg (Memory-based Decoding) • Donostia (Basque MT) • Aachen (Sign-Language MT) • Amsterdam (Integrating Syntax & SMT) • St. Andrew’s (DOT) • Edinburgh (SMT) • CMU (Hybrid SMT-EBMT)

  22. Future Work in DCU • Spoken Language Translation

  23. Future Work in DCU • MT via SMS • Automatic Interpreting • Enhanced hybrid models • Scalability • Tuning MT to text type & genre • MT using Pivot languages (‘triangulation’) • Better quality phrases (cf. CONLL monolingual chunking shared task) • …

  24. Future General Directions • Corpus Building (integrating syntax, semantics … discourse …) • cf. data size vs. data quality … • Filtering/pruning training data (‘safe’ alignments) • Word Alignment • Language Modelling • Decoding • Evaluation Methods • Large-scale Open Evaluations • Further Convergence between models

  25. Dekai Wu’s 3D MT Space

  26. Convergence between MT and Rest of NLP • For some time now, not many MT researchers have been doing syntax, and vice versa. • With move (back) to trees instead of strings: • Reconnect with wealth of tree automata literature • Get lots of implemented algorithms for free!

  27. Concluding Remarks So … there’s plenty for us still to do! Two worries: • MT R&D seems to be at an all-time high, yet we’re not teaching MT any more. • Most (S)MT people come from different backgrounds, but huge danger that some people are merely reinventing the wheel …

  28. Thanks! The end beginning!
