Current Trends in MT • Andy Way • NCLT, School of Computing, • Dublin City University, • Dublin 9, Ireland • away@computing.dcu.ie • www.nclt.dcu.ie/mt/
Overview of Talk • Current Trends • From EACL-06 to ACL-07 • Topics • Country of Origin • Ongoing and Future Work at DCU • Other Important Research • Future General Directions • Increased convergence within MT • Increased convergence between MT and rest of NLP • Concluding Remarks NCLT, Dublin, April 2007
Current Trends EACL-06 MT Track featured 24 papers in a number of areas:
Current Trends: Country of Origin • Of the 24 MT papers: • 18 (75%) were from Europe • 6 from UK • 6 from Spain • 3 from Germany • 1 each from Romania, Italy & Ireland • 6 (25%) were from N. America (5 from USA) • 0 were from Asia
Current Trends: Success Rates (by Country) • Of the 24 MT papers, 7 (29%) were accepted (general EACL acceptance rate 19.7%: 52/264) • 2 from USA (out of 5) • 2 from Germany (out of 3) • 1 from UK (out of 6) • 1 from Romania (out of 1) • 1 from Canada (out of 1)
Current Trends: Success Rates (by Topic) • Of the 7 accepted MT papers • 2 were on SMT (out of 8) • 2 were on word alignment (out of 4) • 2 were on evaluation (out of 5) • 1 was on hybrid MT (out of 1)
Current Trends ACL-07 MT Track features 67 papers in a number of areas:
Current Trends ACL-07 SMT Track features 29 papers in a number of areas:
Current Trends: Summary of Themes • Of the 67 MT papers: • 54 (80%) involve corpus-based MT • 9 (13%) involve evaluation • 3 (4%) involve RBMT
Current Trends: Country of Origin • Of the 67 MT papers: • 32 (48%) are from Asia • 19 (28%) are from N. America (18 from USA) • 16 (24%) are from Europe
Current Trends: Country of Origin Of the 32 papers from Asia:
Current Trends: Country of Origin Of the 16 papers from Europe:
Change 06–07 (by Topic)
Change 06–07 (by Country)
Current Trends: Success Rates (by Country) • Of the 67 MT papers, 17 were accepted (25.4%; overall acceptance rate 22.4%) from the following countries: • USA: 8 (out of 18) • China: 3 (out of 20) • Ireland: 2 (out of 3) • UK: 2 (out of 2) • Canada: 1 (out of 1) • Singapore: 1 (out of 1)
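The per-country success rates implied by the ACL-07 figures above can be recomputed with a few lines of Python. This is only an illustrative sketch: the counts are copied from the slide, while the names `acl07` and `acceptance_rate` are hypothetical and not part of the original talk.

```python
# Figures copied from the slide above: country -> (accepted, submitted).
acl07 = {
    "USA": (8, 18),
    "China": (3, 20),
    "Ireland": (2, 3),
    "UK": (2, 2),
    "Canada": (1, 1),
    "Singapore": (1, 1),
}

def acceptance_rate(accepted, submitted):
    """Return the acceptance rate as a percentage, rounded to one decimal."""
    return round(100.0 * accepted / submitted, 1)

# Rate for each country that had at least one paper accepted.
per_country = {c: acceptance_rate(a, s) for c, (a, s) in acl07.items()}

# The slide reports 67 MT submissions in total, across all countries.
total_accepted = sum(a for a, _ in acl07.values())
mt_track_rate = acceptance_rate(total_accepted, 67)

for country, rate in sorted(per_country.items(), key=lambda kv: -kv[1]):
    print(f"{country}: {rate}%")
print(f"MT track overall: {mt_track_rate}%")
```

Only countries with at least one accepted paper are listed on the slide, so the per-country table is necessarily incomplete; the overall rate uses the full 67 submissions.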
Current Trends: Success Rates (by Topic) • Of the 17 successful MT papers: • 3 were on language modelling/decoding • 2 were on evaluation • 2 were on word alignment • 2 were on reordering • 1 was on word-sense disambiguation • 1 was on treestring models • 1 was on SMT via pivot languages • 1 was on multi-parallel corpora • 1 was on hybrid MT • 1 was on transductive learning
Consequences of these Trends • The ‘system’ is at breaking point • Do we need a pre-selection phase? • As in many other areas, a ‘new world order’ is emerging • There is very little internal QA as yet • Standard of English and basic structure is lacking • But … they’re doing OK already, and they’ll improve! • Relatively few ‘world centres’ in MT at present • Despite massive increase in MT use, big decrease in teaching of MT – paradox!
Ongoing Work in DCU • Integrating Syntax into SMT • Supertag translation and target language models • Adding source language information • Tree-to-Tree Translation (DOT, LFG-DOT: also treestring models), inc. porting monolingual parsing techniques to the bilingual case • Applications • Automatic Translation of DVD subtitles • Sign-Language MT • Large-Scale Open Evaluation (inc. parallel computation) • New Language Pairs, Corpora etc.
System Development
Ongoing Work in DCU (cont’d) • Dependency- (and Semantically) Marked-Up Corpora • New models of Word Alignment • New integrated models of subtree/substring alignment • New dependency-based Evaluation metrics • New Decoders • EBMT • Memory-Based • Open-Source Components
Ongoing Work in DCU (cont’d) Collaborative work: • Tilburg (Memory-based Decoding) • Donostia (Basque MT) • Aachen (Sign-Language MT) • Amsterdam (Integrating Syntax & SMT) • St. Andrew’s (DOT) • Edinburgh (SMT) • CMU (Hybrid SMT–EBMT)
Future Work in DCU • Spoken Language Translation
Future Work in DCU • MT via SMS • Automatic Interpreting • Enhanced hybrid models • Scalability • Tuning MT to text type & genre • MT using Pivot languages (‘triangulation’) • Better quality phrases (cf. CONLL monolingual chunking shared task) • …
Future General Directions • Corpus Building (integrating syntax, semantics … discourse …) • cf. data size vs. data quality … • Filtering/pruning training data (‘safe’ alignments) • Word Alignment • Language Modelling • Decoding • Evaluation Methods • Large-scale Open Evaluations • Further Convergence between models
Dekai Wu’s 3D MT Space
Convergence between MT and Rest of NLP • For some time now, few MT researchers have been working on syntax, and vice versa. • With the move (back) to trees instead of strings: • Reconnect with the wealth of tree automata literature • Get lots of implemented algorithms for free!
Concluding Remarks So … there’s plenty for us still to do! Two worries: • MT R&D seems to be at an all-time high, yet we’re not teaching MT any more. • Most (S)MT people come from different backgrounds, but huge danger that some people are merely reinventing the wheel …
Thanks! The end beginning!