
CS 479, section 1: Natural Language Processing





Presentation Transcript


  1. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. • CS 479, section 1: Natural Language Processing • Lecture #33: Intro. to Machine Translation • Thanks to Dan Klein of UC Berkeley and Chris Manning of Stanford for many of the materials used in this lecture.

  2. Announcements • Reading Report #13: M&S ch. 13 on alignment and MT • Due now; discussing at end of lecture today or on the group • Homework 0.3 Feedback • Question one did not contribute to your grade • Compare with the key • Homework 0.4 • Posted Tuesday

  3. Final Project • Project #4 • Note the updates to the tutorial with the flowchart slides from lecture #29 • Project #5 • Instructions to be updated today • Help session: Tuesday • Propose-your-own • Move forward • Feedback to be sent today • Project Report: • Early: Wednesday after Thanksgiving • Due: Friday after Thanksgiving • Check the schedule • Plan enough time to succeed!

  4. Quiz – keep the ideas fresh • What are the four steps of the Expectation Maximization (EM) algorithm? • Think of the document clustering example, if that helps • What is the primary purpose of EM?

  5. Objectives • Introduce the problem of machine translation • Appreciate the need for alignment in statistical approaches to translation

  6. Machine Translation is Hard • REF: According to the data provided today by the Ministry of Foreign Trade and Economic Cooperation, as of November this year, China has actually utilized 46.959 billion US dollars of foreign capital, including 40.007 billion US dollars of direct investment from foreign businessmen. • IBM4: the Ministry of Foreign Trade and Economic Cooperation, including foreign direct investment 40.007 billion US dollars today provide data include that year to November china actually using foreign 46.959 billion US dollars and • Yamada & Knight: today’s available data of the Ministry of Foreign Trade and Economic Cooperation shows that china’s actual utilization of November this year will include 40.007 billion US dollars for the foreign direct investment among 46.959 billion US dollars in foreign capital

  7. But MT is Real http://www.microsofttranslator.com/ http://translate.google.com/

  8. Why so hard? • What makes translation so hard?

  9. History • 1950’s: Intensive research activity in MT • Roll video …

  10. History • 1950’s: Intensive research activity in MT • Roll video … • 1960’s: Direct word-for-word replacement • 1966 (ALPAC): NRC Report on MT • Conclusion: MT no longer worthy of serious scientific investigation. • 1966-1975: “Recovery period” • 1975-1985: Resurgence (Europe, Japan) • 1985-present: Gradual Resurgence (US)

  11. How? • How would you implement automatic translation on a computer?

  12. Big Idea: Word Alignment • Start with parallel corpora • Learn word alignment • Hidden variable: alignment from foreign (target) word to source word. • Use EM!
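The idea on this slide can be sketched in a few lines of Python. This is a minimal illustration in the style of IBM Model 1, not the lecture's own code: the toy corpus, the function names `train_model1` and `align`, and the decision to leave out the NULL word are all assumptions made for the example. The hidden variable is, for each foreign word, which source word generated it; EM alternates between computing soft alignment counts (E-step) and re-estimating the translation table t(f | e) from them (M-step).

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: (foreign tokens, source/English tokens).
CORPUS = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]


def train_model1(pairs, iterations=20):
    """Estimate t[(f, e)] ~ P(f | e) with EM (Model 1 style, no NULL word)."""
    foreign_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform table

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        # E-step: spread each foreign word over the source words of its sentence.
        for fs, es in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    posterior = t[(f, e)] / z
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t


def align(fs, es, t):
    """For each foreign word, return the index of its most likely source word."""
    return [
        max(range(len(es)), key=lambda j: t[(f, es[j])])
        for f in fs
    ]


if __name__ == "__main__":
    table = train_model1(CORPUS)
    print(align(["das", "Haus"], ["the", "house"], table))
```

Even with three sentence pairs, the co-occurrence pattern is enough for EM to pull "das" toward "the" and "Haus" toward "house", although no alignment was ever observed directly.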

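  13. Vauquois Triangle [figure] • Analysis (source side, bottom to top): Source Text → Morphological Analysis → Word Structure → Syntactic Analysis → Syntactic Structure → Semantic Analysis → Semantic Structure → Semantic Composition → Interlingua • Generation (target side, top to bottom): Interlingua → Semantic Decomposition → Semantic Structure → Semantic Generation → Syntactic Structure → Syntactic Generation → Word Structure → Morphological Generation → Target Text • Transfer between the two sides at each level: Direct (Word Structure), Syntactic Transfer (Syntactic Structure), Semantic Transfer (Semantic Structure)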

  14. Methods • Rule-based Methods • Expert system-like rewrite systems • Lexicons constructed by people • Can be very fast, and can accumulate a lot of knowledge over time • e.g., SysTran – the engine behind the venerable Babelfish • Statistical Methods • Word-to-word translation • Phrase-based translation • Syntax-based translation (tree-to-tree, tree-to-string, etc.) • Trained on parallel corpora • Usually noisy-channel (at least in spirit), but increasingly direct
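As a reminder of what "noisy-channel" means on the Methods slide, the standard formulation can be written as below. The symbols are assumptions chosen for this note: f is the observed foreign sentence, e a candidate English (channel-source) sentence, and a a hidden word alignment.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(e)\,P(f \mid e)}{P(f)}
        \;=\; \arg\max_{e} \underbrace{P(e)}_{\text{language model}}\;
                           \underbrace{P(f \mid e)}_{\text{translation model}},
\qquad
P(f \mid e) \;=\; \sum_{a} P(f, a \mid e)
```

The denominator P(f) drops out because it does not depend on e. A "direct" model, by contrast, parameterizes P(e | f) itself rather than factoring it through Bayes' rule.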

  15. Your Questions • Take the discussion online

  16. To be continued …
