
Machine Translation

Machine Translation. Surma Mukhopadhyay, 29th March 2007. Followed by: The Soldiers are in the Coffee – An Introduction to Machine Translation, by Marieke Napier, Cultivate Interactive, issue 2, 16 October 2000, and Language Technology Machine Translation.



Presentation Transcript


  1. Machine Translation Surma Mukhopadhyay 29th March, 2007

  2. Followed By: • The Soldiers are in the Coffee – An Introduction to Machine Translation, by Marieke Napier, Cultivate Interactive, issue 2, 16 October 2000 • Language Technology Machine Translation (from the course material COMP248), by Rolf Schweitzer, Department of Computing, Macquarie University, NSW 2109, Australia

  3. Introduction • Though research in Machine Translation (MT) has already celebrated its fiftieth birthday, understanding of its successes is still minimal. • The increase in availability of Machine Translation software due to the globalization of the Internet has had little impact on that understanding. • Users' knowledge of the complexities behind translating remains limited, and judgments are based on one-off personal experiences.

  4. What is Machine Translation? • The European Association for Machine Translation gives the following definition for MT: "Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another."

  5. Types of Machine Translation: Unassisted Machine Translation • Unassisted MT takes pieces of text and translates them into output for immediate use with no human involvement. • The result is unpolished text that gives only the gist of the source, hence the term 'gisting'. • The ultimate aim of this type of MT is sometimes known as Fully Automatic High Quality Translation (FAHQT): perfect translation created solely by a computer. • Examples of this form of MT include IBM alphaWorks Native Search, Babel Fish, WorldLingo and Dragon Systems.

  6. Types of Machine Translation: Assisted Machine Translation • Assisted MT uses a human translator to clean up after, and sometimes before, translation in order to get better quality results. • Usually the process is improved by limiting the vocabulary through use of a dictionary and by restricting the types of sentences and grammar allowed. • The use of a 'controlled language' has been fairly successful. Some systems have also been set up to learn from corrections.

  7. Assisted Machine Translation • Assisted MT can be divided into Human Aided Machine Translation (HAMT), a machine that uses human help, and Machine Aided Human Translation (MAHT), a human translator who uses machine help. • Computer Aided Translation (CAT) is a more recent form of MAHT.

  8. Natural Language Processing • Another area that is worth mentioning here is Natural Language Processing (NLP). • NLP parses sentences and determines their underlying meaning, for example so that a database can answer a query entered in the form of a natural-language question. • For further information on the structure of MT systems, see the recent special report on the future of translation featured in ‘Wired’ magazine (www.wired.com).

  9. Transfer Component & Interlingua

  10. Concept of Transfer Component • The structure of MT systems can vary, but all use some sort of transfer component. • This component is specific to a pair of languages and produces the target sentence. • The transfer component has a correspondence lexicon, which is a comprehensive list of source-language patterns and phrases mapped to the target language. • Some MT systems use systematic transfer systems, which apply software parsers to analyze the source-language sentences. • With this type of transfer system, a new correspondence lexicon must be created for every pair of languages between which translation is required.

  11. Concept of Interlingua • An alternative to the transfer component is an Interlingua, a type of intermediate language. • A translation is made from the source language into the Interlingua and then into the target language. • The benefit of using an Interlingua is that only one part is required for each language, so further languages can be added easily.

  12. Why Is Machine Translation Difficult? • A single word can have more than one meaning • Lexical gaps: single-word concepts with no simple translation • Idioms • Different languages use different syntactic structures • Some syntactic structures are not possible in some languages

  13. Why Is Machine Translation Difficult? • We need to find the correct interpretation • Literal translation does not produce fluent text • Literal translation does not preserve semantic information • Literal translation does not preserve pragmatic information

  14. Various approaches to MT

  15. Approaches • Direct Machine Translation: word-for-word substitution with some local adjustment. • Transfer-based MT: analysis of the source into a syntactic structure representation; transfer of that representation into the target structure; synthesis of the output from that structure. • Interlingua-based MT: analysis of the source into an abstract meaning representation; generation of the target language from this interlingua.
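
As a minimal sketch of the direct approach on this slide, the Python below substitutes word for word from a tiny English-to-French dictionary and applies one local adjustment (moving an adjective after its noun). The dictionary entries, the `ADJECTIVES` set and the function name `direct_translate` are illustrative assumptions, not taken from any real MT system.

```python
# Toy English-to-French lexicon (illustrative entries only).
LEXICON = {
    "the": "le",
    "black": "noir",
    "cat": "chat",
    "sleeps": "dort",
}

# English adjectives that should follow the noun in the French output.
ADJECTIVES = {"black"}


def direct_translate(sentence: str) -> str:
    """Word-for-word substitution with one local reordering rule."""
    words = sentence.lower().split()

    # Local adjustment: swap an adjective with the word that follows it,
    # approximating French noun-adjective order.
    i = 0
    while i < len(words) - 1:
        if words[i] in ADJECTIVES:
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1

    # Substitution: unknown words are passed through unchanged.
    return " ".join(LEXICON.get(w, w) for w in words)


print(direct_translate("The black cat sleeps"))  # le chat noir dort
```

Anything outside the lexicon or the single reordering rule passes through untouched, which is one way to see why direct MT alone rarely produces fluent text.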

  16. Transfer-Based Machine Translation • Transfer-based MT needs n(n-1) transfer modules for n languages, one for each ordered source-target pair. • If the transfer modules are bidirectional, only n(n-1)/2 are needed.
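
The arithmetic on this slide can be checked with a few lines of Python. The function name `transfer_modules` is an assumption made for this sketch.

```python
def transfer_modules(n: int, bidirectional: bool = False) -> int:
    """Number of transfer modules needed to cover every pair among n languages."""
    pairs = n * (n - 1)  # one module per ordered (source, target) pair
    return pairs // 2 if bidirectional else pairs


for n in (2, 5, 10):
    print(n, transfer_modules(n), transfer_modules(n, bidirectional=True))
```

For 10 languages this gives 90 one-way modules, or 45 if each module works in both directions, so the cost grows quadratically with the number of languages.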

  17. Example of Transfer-Based MT

  18. Interlingua-Based Machine Translation • For n languages, only n language analyzer/generator pairs are needed. • Problem: Different languages "carve the world up" differently.
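
The idea of one analyzer and one generator per language can be sketched as below. This is a toy under stated assumptions: the "interlingua" is just a set of concept labels such as `CAT`, the word tables are invented for illustration, and every word maps one-to-one to a concept, which sidesteps the "carve the world up" problem that real systems face.

```python
# One analyzer table per language: word -> interlingua concept.
# Concepts are plain strings here; a real system would use a much
# richer meaning representation.
ANALYZERS = {
    "en": {"cat": "CAT", "dog": "DOG"},
    "fr": {"chat": "CAT", "chien": "DOG"},
    "de": {"katze": "CAT", "hund": "DOG"},
}

# One generator table per language: the inverse mapping, concept -> word.
GENERATORS = {
    lang: {concept: word for word, concept in table.items()}
    for lang, table in ANALYZERS.items()
}


def translate(word: str, source: str, target: str) -> str:
    """Translate via the interlingua: analyze, then generate."""
    concept = ANALYZERS[source][word.lower()]  # source -> interlingua
    return GENERATORS[target][concept]         # interlingua -> target


print(translate("chat", "fr", "de"))  # katze
print(translate("dog", "en", "fr"))   # chien
```

Adding a fourth language here means adding one more table, after which translation to and from all three existing languages works without any pair-specific module.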

  19. Translation Memory • Translation memory software stores, in a database, matching source- and target-language segments produced by a human translator, for future reuse. • Newly encountered source-language segments are compared to the database content, and the resulting output (exact, fuzzy or no match) is reviewed by the translator.
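
The exact/fuzzy/no-match lookup described above can be sketched in Python with the standard library's `difflib.SequenceMatcher`. The stored segments, the `FUZZY_THRESHOLD` value and the function name `lookup` are assumptions for illustration; commercial translation memory tools use their own similarity measures.

```python
from difflib import SequenceMatcher

# Toy translation memory: previously translated source/target segments.
MEMORY = {
    "Press the start button.": "Appuyez sur le bouton de démarrage.",
    "Close the cover.": "Fermez le couvercle.",
}

FUZZY_THRESHOLD = 0.75  # assumed cut-off for this sketch


def lookup(segment: str):
    """Return (match_type, stored_source, stored_target) for a new segment."""
    if segment in MEMORY:
        return "exact", segment, MEMORY[segment]

    # Score the new segment against every stored source segment.
    best_source = max(
        MEMORY, key=lambda s: SequenceMatcher(None, segment, s).ratio()
    )
    score = SequenceMatcher(None, segment, best_source).ratio()
    if score >= FUZZY_THRESHOLD:
        return "fuzzy", best_source, MEMORY[best_source]
    return "none", None, None


for seg in ("Close the cover.", "Press the stop button.", "Replace the toner cartridge."):
    print(seg, "->", lookup(seg))
```

A fuzzy match returns the stored translation of the closest segment, which the translator then edits; a "none" result means the segment is translated from scratch and can be added to the memory afterwards.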

  20. KANTOO • KANTOO is an interlingua-based MT system. • KANTOO is designed for multilingual document production. • KANTOO includes modules for source language analysis, target language generation, source terminology management, target terminology management, and knowledge source development.

  21. KANTOO Architecture

  22. KANTOO: Some Features • The controlled language checker is used for vocabulary and grammar checking in each document. • The batch translator is an analyzer and a generator, used as standalone batch servers. • The knowledge maintenance tool is a graphical user interface that allows developers to test their knowledge changes in the context of a complete working system. • The knowledge server provides network access to a version-controlled repository.

  23. Performance of an Analyzer in KANTOO • The analyzer performs: • tokenization • morphological processing • lexical lookup • syntactic parsing with a unification grammar • semantic interpretation, yielding one or more interlingua expressions for each valid sentence

  24. Performance of a Generator in KANTOO • The generator performs, for a target language: • lexical selection • structural mapping • syntactic generation • morphological realization

  25. More Features of KANTOO • The lexical maintenance tool is used by domain experts to maintain source terminology. • The language translation database is used by translators to create target translations of new source terminology.

  26. CONCLUSION • The future of MT remains uncertain, but with the growth of international trade and the continuing increase in use of MT technologies on the Web, things are looking up. More MT products are expected to come to market than ever before, covering a larger number of languages.
