Explore the recent history and current research in Machine Translation at NCLT Dublin, covering hybrid approaches, online systems, evaluation metrics, and more. Collaboration with international partners and future work on scalability, SMS translation, and quality improvements are highlighted.
MT in the NCLT • Andy Way • NCLT, School of Computing, • Dublin City University, • Dublin 9, Ireland • away@computing.dcu.ie • www.nclt.dcu.ie/mt/
MT in the NCLT: Recent History • Marker-Based EBMT • [Nano Gough, PhD 2005] • Computational Linguistics 2003 • NLE 2005; Machine Translation 2005 • AMTA 02, MT Summit 03; TMI 04, EAMT 04 … • Data-Oriented Translation • [Mary Hearne, PhD 2005] • MT Summit 03, COLING 04, IJCNLP 04, EAMT 05, EAMT 06 … • Hybrid Approaches (EBMT & SMT) • [Declan Groves, PhD 2007] • Machine Translation 2006 • ACL 05, EAMT 06, …
MT in the NCLT: Recent History • Improving Online MT Systems (TransBooster) • [Bart Mellebeek, PhD 2007] • [Karolina Owczarzak] • MT Summit 05, AMTA 06, EAMT 05, 06 … • Automatic Translation of DVD subtitles • [Steve Armstrong, MSc 2007] • [Other students’ ongoing PhD work in SALIS] • Perspectives 06 • ASLIB 06 …
Current Research • Hybrid MT (MaTrEx) • Nicolas Stroppa et al. • AMTA 06, OpenLab 06, IWSLT 06, NIST 06, MT Summit 07 • Dependency-Based Automatic Evaluation Metrics • Karolina Owczarzak, Josef Van Genabith • MT Summit 07, Workshops at NAACL 07, ACL 07 • Integrating Syntax into SMT (Using Supertags) • Hany Hassan [& Khalil Sima’an] • IEEE SLT 06, ACL 07 … • Sign Language MT • Sara Morrissey [& RWTH Aachen] • MT Summit 05, LREC 06, MT Summit 07 …
Current Research • Word and Phrase Alignment in SMT • Yanjun Ma, Nicolas Stroppa • ACL 07 … • Sub-Tree Alignment • John Tinsley, Ventzi Zhechev, Mary Hearne • MT Summit 07 … • Parameter Estimation in MT • John Tinsley, Ventzi Zhechev, Mary Hearne [& Khalil Sima’an] • Constraint-Based MT • Yvette Graham, Josef Van Genabith
Language Pairs • French–English (EBMT) • English–German (EBMT) • Spanish–English (SMT, Hybrid) • Spanish–Basque (Hybrid) • Chinese–English (SMT, EBMT) • Arabic–English (SMT, Hybrid) • Italian–English (Hybrid) • Japanese–English (EBMT, Hybrid) • Dutch–English (Hybrid, SMT) • Sign Language–English (Hybrid) • …
Collaboration • Tilburg (Memory-based Decoding) • Donostia (Basque MT) • Aachen (Sign-Language MT) • Amsterdam (Integrating Syntax & SMT) • Edinburgh (SMT) • CMU (Hybrid SMT—EBMT) • Toshiba Beijing (Chinese MT) • …
Future Work • MT via SMS • Automatic Interpreting • Enhanced hybrid models • Scalability • Tuning MT to text type & genre • MT using Pivot languages • Better quality phrases (cf. CoNLL monolingual chunking shared task) • …
Current and Future Funding • Irish Government Sources • Science Foundation Ireland • Enterprise Ireland • IRCSET • Companies • IBM • Microsoft • Under Review • EU STREP (MT for Minority Languages) • UPC, FBK-IRST, Edinburgh … • SFI CSET in Next Generation Localisation • TCD, UCD, UL, IBM, Microsoft, Symantec …