
Josef van Genabith & Andy Way TransBooster (2003-2006)



Presentation Transcript


  1. Previous MT Work & GramLab • Josef van Genabith & Andy Way • TransBooster (2003-2006) • LaDEva: Labelled Dependency-Based MT Evaluation (2006-2008) • GramLab (2001-2008)

  2. TransBooster • TransBooster (2003-2006) • Enterprise Ireland funded Basic Research Project • PI: Josef van Genabith • Col: Andy Way • Students: Bart Mellebeek, Anna Khasin, Karolina Owczarzak

  3. TransBooster • TransBooster basic idea: • MT systems do better on short (= simple) sentences than on long ones. • Capitalise on this! • Divide long sentences (automatically) into shorter components • Feed those components to the MT system • Translate (better results for the shorter components) • Reassemble the (better) component translations in the target language (= a better overall translation); a minimal sketch follows below • A bit like Controlled Language, but automatic and without the restrictions (to a particular syntax etc.)!
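A minimal sketch of that split-translate-reassemble loop in Python, assuming a hypothetical black-box mt_translate() stand-in for any MT engine; the naive comma split below replaces TransBooster's actual parser-based head/argument/adjunct decomposition:

def mt_translate(text: str) -> str:
    # Placeholder for a call to an external MT system (rule-based,
    # example-based or statistical); identity here so the sketch
    # runs stand-alone.
    return text

def transbooster_translate(sentence: str) -> str:
    # Naive decomposition: split at commas as a stand-in for
    # syntactic chunking of the long source sentence.
    chunks = [c.strip() for c in sentence.split(",") if c.strip()]
    # Translate each shorter (= easier) chunk independently.
    translated = [mt_translate(chunk) for chunk in chunks]
    # Reassemble the chunk translations in target order.
    return ", ".join(translated)

print(transbooster_translate(
    "Although the talks collapsed, the delegates, who had travelled far, stayed on."))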

  4. TransBooster • TransBooster example (figure)

  5. TransBooster • Wrapper technology • Tricks the MT system into producing better results …

  6. TransBooster • TransBooster needs • Good parsers • Head and argument/adjunct finding rules • TransBooster with • Rule-Based MT (Systran, Logomedia) • Example-Based MT (DCU system) • Statistical MT (standard Aachen PBSMT) • Multi-engine MT • Improves results! => full details in Bart Mellebeek's PhD & publications

  7. TransBooster • Bart Mellebeek's PhD dissertation, 2007

  8. LaDEva • LaDEva: Labelled Dependency-Based Evaluation for MT (2005-2008) • Microsoft Ireland funded Basic Research Project • PIs: Josef van Genabith/Andy Way • Student: Karolina Owczarzak

  9. LaDEva • Basic idea: • Automatic evaluation methods are extremely important for MT • String-based MT evaluation (BLEU etc.) unfairly penalises perfectly valid • - lexical variation/paraphrases • - syntactic variation/paraphrases • Compare: • John resigned yesterday. • Yesterday, John quit. • Use labelled dependencies (instead of surface strings) for automatic evaluation; see the sketch below
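A minimal sketch of dependency-based scoring, with hand-written (label, head, dependent) triples for the two example sentences; a real run would obtain the triples from a robust parser, and dep_fscore is a hypothetical helper, not the project's actual metric:

def dep_fscore(candidate: set, reference: set) -> float:
    # Precision/recall/F-score over labelled dependency triples
    # instead of surface n-grams.
    overlap = len(candidate & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)

# "John resigned yesterday." vs "Yesterday, John quit."
reference = {("subj", "resign", "John"), ("adjunct", "resign", "yesterday")}
candidate = {("subj", "quit", "John"), ("adjunct", "quit", "yesterday")}

print(dep_fscore(candidate, reference))  # 0.0 on exact triples ...
# ... but once quit/resign are matched lexically (e.g. via WordNet
# synonymy), both triples align and the score is 1.0; note that the
# triples already ignore the word-order change that BLEU would penalise.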

  10. LaDEva • LaDEva example (syntactic variation): use WordNet and PBSMT alignments for lexical variation …

  11. LaDEva • LaDEva needs • Very (!) robust dependency parsers that can parse MT output (as opposed to grammatical language) • DCU GramLab treebank-based LFG parsers • Microsoft parsers • WordNet, PBSMT alignments • Evaluate LaDEva against • BLEU • NIST • GTM • Meteor • in terms of correlation with human judgments; see the sketch below
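A minimal sketch of that meta-evaluation step: Pearson correlation between a metric's segment scores and human judgments on the same segments. The numbers below are made up for illustration, not results from the project; statistics.correlation needs Python 3.10+:

from statistics import correlation  # Pearson's r; Python 3.10+

human = [4.0, 2.5, 3.0, 1.5, 4.5]        # human adequacy judgments
metric = [0.62, 0.35, 0.48, 0.20, 0.70]  # e.g. LaDEva or BLEU segment scores

# The metric whose scores track the human column most closely
# (r nearest 1.0) is the better automatic evaluation measure.
print(f"Pearson r = {correlation(human, metric):.3f}")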

  12. LaDEva

  13. LaDEva • Karolina Owczarzak's PhD thesis, 2008

  14. GramLab • GramLab (2001 – 2008) • - Automatic Annotation of the Penn-II Treebank with LFG F-Structures (2001-2004), Enterprise Ireland funded Basic Research Project • Team: PI: Josef van Genabith, Col: Andy Way, Aoife Cahill, Mairead McCarthy, Mick Burke, Ruth O'Donovan • - GramLab: Chinese, Japanese, Arabic, Spanish, French, German, English (2004-2008), Science Foundation Ireland funded Principal Investigatorship • Team: PI: Josef van Genabith, Grzegorz Chrupala, Natalie Schluter, Ines Rehbein, Yuqing Guo, Masanori Oya, Amine Akrout, Dr. Aoife Cahill, Dr. Yaffa Al-Raheb, Dr. Deirdre Hogan, Dr. Sisay Adafre, Dr. Lamia Tounsi, Dr. Mohammed Attia

  15. GramLab • GramLab (2001 – 2008) • Basic idea: • Handcrafting deep, wide-coverage grammars is time-consuming, expensive and difficult to scale to unrestricted text. • Acquire grammars automatically from treebanks => shallow grammars • New: acquire deep grammars automatically from treebanks

  16. GramLab • Shallow grammar: defines a language as a set of strings and associates a syntactic structure with each string • Deep grammar: shallow grammar + maps strings to information (meaning, dependencies, predicate-argument structure – “who did what to whom”) + non-local dependency resolution; see the sketch below
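An illustrative contrast between the two levels of output for "Who did John see?", with hand-written structures standing in for what a treebank-induced LFG grammar would actually produce:

sentence = "Who did John see?"

# Shallow: a constituency tree over the surface string only.
shallow = ("SBARQ",
           ("WHNP", "Who"),
           ("SQ", ("AUX", "did"), ("NP", "John"), ("VP", ("V", "see"))))

# Deep: predicate-argument structure ("who did what to whom") with the
# non-local dependency resolved: the fronted "who" is the object of "see".
deep = {
    "pred": "see",
    "subj": "John",
    "obj": "who",   # long-distance dependency resolved
    "tense": "past",
}

print(shallow)
print(deep)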

  17. GramLab

  18. GramLab • Probabilistic Parsing & Probabilistic Generation • Used in MT evaluation (Karolina Owczarzak), Question Answering system (Sisay Adafre) • Outperforms the best hand-crafted resources (XLE, RASP) for English • Lots of publications, including 2 Computational Linguistics journal papers and 6 ACL, COLING and EMNLP papers (2004-2008) • Aoife Cahill, Michael Burke, Ruth O'Donovan, Stefan Riezler, Josef van Genabith and Andy Way (2008) Wide-Coverage Deep Statistical Parsing using Automatic Dependency Structure Annotation, Computational Linguistics • Ruth O'Donovan, Michael Burke, Aoife Cahill, Josef van Genabith and Andy Way (2005) Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks, Computational Linguistics • Transfer-based probabilistic data-driven MT … (Yvette Graham) • LORG industry-strength parsers and generators for IE/IR & QA (Jennifer & Deirdre)
