
11-734 Advanced MT Seminar Spring 2008





  1. 11-734 Advanced MT Seminar, Spring 2008
     Instructors: Alon Lavie and Stephan Vogel

  2. Course Objectives
     • Objective: Study and review in depth a selection of important research topics in current state-of-the-art MT
     • Main Focus: Data-driven search-based MT approaches
       • MT resources are primarily acquired automatically from large volumes of monolingual and bilingual corpora
       • Translation process is framed as a computational search optimization problem, driven by various statistical “models” and ML-based features
     • Other important or related topics may also be explored

  3. Course Format
     • Course Format: Graduate-level Seminar
     • Stephan and Alon will present a few introductory lectures
     • Students will present and lead the remaining lectures and discussions
     • Individual Student Tasks:
       • Select and define a specific research topic
       • Identify 1-2 basic research papers (for everyone to read)
       • Conduct a broad literature review of the topic
       • Prepare a class presentation on the topic and lead class discussion
       • Write a 10-15 page literature review “white paper” on the current state of the research topic and on its future directions

  4. Course Format
     • Requirements and Expectations:
       • 1-2 basic readings for each topic should be announced at least one week in advance
       • Everyone is expected to attend all class meetings, read the 1-2 basic papers before class, and prepare questions
       • Student Presentations: present an overview of the topic in class (not just the basic papers) and lead the discussion about important issues, open research questions, future directions, etc.
       • White Papers will be due towards the end of the semester
     • Grading:
       • 40% Presentation
       • 40% White Paper
       • 20% Class Participation

  5. Preliminary List of Topics
     • Models and Approaches for Word, Phrase and Structure Alignment:
       • Hierarchical Alignment Models: ITG-style, Hiero-style, and syntax-based models (tree-to-string, string-to-tree, tree-to-tree)
       • Discriminative Alignment Models
       • Constrained Alignment Models
       • Methods for Phrase Extraction from word-aligned parallel data
       • Methods for Rule Extraction from parsed and word-aligned parallel data
     • Word Reordering Models:
       • Word-based and phrase-based, POS-based, syntax-based

  6. Preliminary List of Topics
     • Search-based Decoding:
       • Basic decoding algorithms, computational complexity and efficiency issues
       • Decoders for various “flavors” of data-driven search-based MT
       • Optimization issues, monotonicity, pruning, hypothesis recombination
     • Language Modeling for MT:
       • Very large-scale statistical LMs: technical challenges and solutions
       • Domain and genre adaptation
       • Syntactic LMs
       • Discriminative LMs, factored LMs, “unconventional” approaches
     • Architecture and Design of Large-scale MT Systems:
       • Training methods and tools
       • MERT and parameter tuning
       • Runtime architectures, online vs. offline systems

  7. Preliminary List of Topics
     • Morphology and Word Segmentation and their integration within MT:
       • Morphological analysis and generation tools
       • Integrating morphological processing within MT
       • Input segmentation issues, ambiguity and confusion networks
     • Multi-Engine MT and System Combination Approaches
     • MT Evaluation:
       • Automatic metrics for MT evaluation; methods for assessing MT eval metrics, strengths and weaknesses
       • Human evaluation, Subjective and Objective metrics, Confidence scores
       • Evaluation campaigns and how they are conducted
     • Online Translation Services and how they work:
       • Google, Babelfish, MS Word tools, instant messaging

  8. Tentative Schedule
     • Jan 16: Organization + Stephan: Basic Word Alignment Models
     • Jan 23: Stephan: Word Alignment Models, Phrase Extraction methods
     • Jan 30: Stephan and/or Alon: TBD (Decoding basics? MT Evaluation?)
     • Feb 6: Student #1
     • Feb 13: Student #2
     • Feb 20: NO CLASS (Stephan and Alon away)
     • Feb 27: Student #3
     • Mar 5: Student #4
     • Mar 12: NO CLASS (Spring Break)
     • Mar 19: Student #5
     • Mar 26: Student #6
     • Apr 2: Student #7
     • Apr 9: NO CLASS (GALE PI Meeting)
     • Apr 16: Student #8
     • Apr 23: Student #9
     • Apr 30: Student #10

  9. Task #1
     • By next week’s class meeting (Wed 1/23):
       • Select a research topic
       • Write a one-page description that outlines and scopes your selected research topic, and lists 1-2 basic readings on the topic
       • Email Alon and Stephan your one-page description, plus three preferred presentation dates
     • Act Fast! We will coordinate topic selections and presentation date preferences primarily by logical order and by receipt time

  10. Students and Topics
     • Abhaya Agarwal: Discriminative Methods for Training Translation Models
     • Aaron Phillips: Methods for Context Incorporation in MT (preferred dates: 3/05, 2/27, 3/19)
     • Jason Adams: WSD and its Integration within MT (preferred dates: 3/26, 2/27, 2/20)
     • Alok Parlikar: Phrase-based SMT and Solutions to the ‘Out of Order’ Problem (preferred dates: 3/19, 3/05, 2/27)
     • Amr Ahmed: Syntax-based Machine Translation Models (preferred dates: 3/26, 4/02, 4/16)
     • Eric Davis: Morphology and Segmentation Issues in MT (preferred dates: 2/06, 2/13, 2/27)
     • Greg Hanneman: Towards Syntactically-Constrained Statistical Word Alignment (preferred dates: 4/16, 4/23, 4/30)
     • Linh Nguyen: Morphology and Word Segmentation and their integration within MT (preferred dates: 3/05 or later)
     • Qin Gao: Large-Scale Architecture for MT Systems (preferred dates: 3/05, 2/27, 4/02)
     • Vamshi Ambati: Dependency Structures in Syntax-oriented Machine Translation (preferred dates: 3/19, 3/26, 4/02)
     • Rashmi Gangadharaiah: Factored and Syntactic Language Models (preferred dates: 4/02, 3/19, 3/05)

  11. Proposed Schedule
     • Jan 16: Organization + Stephan: Basic Word Alignment Models
     • Jan 23: Stephan: Word Alignment Models, Phrase Extraction methods
     • Jan 30: Stephan: Decoding basics
     • Feb 6: Student #1: Eric Davis – Morphology and/or Segmentation
     • Feb 13: Student #2: Linh Nguyen – Morphology and/or Segmentation
     • Feb 20: NO CLASS (Stephan and Alon away)
     • Feb 27: Student #3: Jason Adams – WSD in MT
     • Mar 5: Student #4: Aaron Phillips – Incorporating Context in MT
     • Mar 12: NO CLASS (Spring Break)
     • Mar 19: Student #5: Alok Parlikar – Reordering in Phrase-based SMT
     • Mar 26: Student #6: Amr Ahmed – Syntax-based Models and their training
     • Apr 2: Student #7: Vamshi Ambati – Dependency Structures in MT
     • Apr 9: NO CLASS (GALE PI Meeting)
     • Apr 16: Student #8: Rashmi – Factored and Syntax-based LMs
     • Apr 23: Student #9: Greg Hanneman – Syntactically-constrained WA
     • Apr 30: Student #10: Qin Gao – Large-scale MT Architectures
     • May 7: Student #11: Abhaya Agarwal – Discriminative Training Methods

  12. MT Lunch Slots
     • Currently held reservations (all on Tuesdays):
       • Feb 19 (Alon and Stephan away)
       • Mar 18 (12:30-2:00)
       • Apr 22 (12:30-2:00)
       • May 20
       • Jun 17
