
Machine Translation

Research Seminar on Software Business, 21.5.2003, Antti Ilmo


  1. Machine Translation • Research Seminar on Software Business • 21.5.2003 • Antti Ilmo

  2. Outline • Introduction • Translation and Machine Translation Techniques • The Early Machine Translation Systems • Problems of Machine Translation • Proposed Solutions to the Problems • Summary

  3. Introduction • The Internet and globalisation have increased the need for localization of documentation and for interaction between different nationalities • Localization is expensive and time-consuming • Machine Translation is a potential solution • But…

  4. Introduction (2) • MT quality is not good enough • language works on many levels • interpretation • a dictionary may give a meaning, but not how it is interpreted • competence, experience and internal models of language users are important • local usage etc. (Canadian French and French French) • a translation may sound ”wrong” in a dialect • typos • syntactic errors occur

  5. Outline • Introduction • Translation and Machine Translation Techniques • The Early Machine Translation Systems • Problems of Machine Translation • Proposed Solutions to the Problems • Summary

  6. What is translation? • Preservation of the original text • stylistic and semantic characteristics • word-for-word • meaning-for-meaning • Rules of language • e.g. letters ”c”, ”a” and ”t” form a word only in the right order • Translation process (translating) and translation product (translated text) • translation concept consists of both of the above • Translator re-codes the message into a different language

  7. MT Technology • Machine Translation (MT) • machine takes care of translation process • Machine Aided Translation (MAT) • Machine-Assisted Human Translation (MAHT) • humans translate, machine assists • Human-Assisted Machine Translation (HAMT) • machine translates, humans assist • e.g. choosing a correct word from a dictionary • Terminology Databanks (TD) • technical terminology • most commonly used nowadays

  8. Linguistic Techniques • Direct vs. indirect • direct uses word replacement • indirect tries to express a meaning • Interlingua vs. transfer • Interlingua does not take into account variations in target languages • transfer approach uses language-specific meaning • Local vs. global • local scope uses word-level analysis • global scope analyses whole sentences or larger units
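
The direct, local-scope technique named on slide 8 can be illustrated with a minimal Python sketch. The toy English-to-French lexicon and the function name `direct_translate` are assumptions made for this illustration; they do not come from the presentation or from any of the systems it describes.

```python
# Toy English-to-French lexicon; the entries are illustrative assumptions.
TOY_LEXICON = {
    "the": "le",
    "cat": "chat",
    "black": "noir",
    "sleeps": "dort",
}


def direct_translate(sentence, lexicon=TOY_LEXICON):
    """Translate by replacing each word independently (direct, local scope)."""
    translated = []
    for word in sentence.lower().split():
        # Words missing from the lexicon are passed through and flagged with '?'.
        translated.append(lexicon.get(word, word + "?"))
    return " ".join(translated)


print(direct_translate("The black cat sleeps"))  # prints "le noir chat dort"
```

The output keeps the English word order ("le noir chat dort" rather than "le chat noir dort"), which is the kind of error a word-level approach cannot avoid without looking at the wider sentence.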

  9. Outline • Introduction • Translation and Machine Translation Techniques • The Early Machine Translation Systems • Problems of Machine Translation • Proposed Solutions to the Problems • Summary

  10. Early Systems (GAT) • Georgetown Automatic Translation • one of the earliest MT projects • development began in 1952, in use 1964-1979 • physics texts from Russian to English • replacement of words • no real linguistic theory • ”The spirit is willing, but the flesh is weak” translated to Russian and then back to English. The result: ”The wine is agreeable, but the meat has spoiled”

  11. Early Systems (CETA) • Centre d’Etudes pour la Traduction Automatique • launched in 1961 in Grenoble • in use 1967-71 • approximately 400,000 words translated • Russian to French • sentence based analysis • Interlingua and transfer mixed • grammatical level vs. dictionary level • Realization: Interlingua approach not a good one

  12. Early Systems (SYSTRAN) • one of the first systems marketed • installed in 1970 (US Air Force Foreign Technology Division) • used also at NASA and EURATOM • semantic features ad hoc • negative feedback at first • post-editing found to be a good approach • GM of Canada claimed the system sped up the work of human translators three to four times (3000-4000 words a day, approximately the same amount a human translator now translates with the help of translation workbenches)

  13. Early Systems (TAUM-METEO) • TAUM-METEO was the first truly automatic MT system • developed in the 1960s • used by the Canadian Meteorological Center • scanned the network for English weather reports and translated them to French • corrected its own errors without post-editors • forwarded offending content to human translators • 24,000 words/day • problems • communication noise • misspellings • words missing from the dictionary • specialised language made translations possible

  14. Outline • Introduction • Translation and Machine Translation Techniques • The Early Machine Translation Systems • Problems of Machine Translation • Proposed Solutions to the Problems • Summary

  15. Problems • Translation is not straightforward • it is not replacing words with words • word order differs • rewriting of text into another language • choosing the right words • e.g. the imperative mood in English may become the infinitive in French

  16. Problems (2) • Automation of translation is not easy • quality is poor • homographs • ”fan”: a ventilator or an enthusiast • different word classes • e.g. ”love” is both a verb and a noun • ”you” can be both singular and plural • idioms • e.g. ”country music” meaning a type of music • personal pronouns • second person pronouns may vary between familiar and formal situations • also, post-editing can take more time than translating from scratch

  17. Problems (3) • Morphological analysis • e.g. Chinese and Japanese do not separate words with spaces • word boundaries are not marked in the text • Syntactic analysis • modifiers are a problem • ”The boy saw a girl with a telescope” • the girl had a telescope vs. the boy used a telescope to see a girl • Analysis of context • 20-40 words in a sentence • 100 million possible translations • There are always going to be problem cases
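
Slide 17 does not say how the figure of 100 million possible translations was derived. The Python sketch below shows one way to see that a number of this size is plausible. It assumes, purely for illustration, that every word has the same number of candidate translations and that the choices are independent; the function name `count_candidate_translations` is likewise hypothetical.

```python
def count_candidate_translations(num_words, choices_per_word):
    """Number of word-choice combinations if every word has the same
    number of independent candidate translations (a simplifying assumption)."""
    return choices_per_word ** num_words


# With only two candidates per word:
print(count_candidate_translations(20, 2))  # 1048576
print(count_candidate_translations(27, 2))  # 134217728
print(count_candidate_translations(40, 2))  # 1099511627776
```

Under these assumptions, a 27-word sentence with just two candidates per word already exceeds 100 million combinations, which is why exhaustive enumeration is not a practical way to resolve context.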

  18. Outline • Introduction • Translation and Machine Translation Techniques • The Early Machine Translation Systems • Problems of Machine Translation • Proposed Solutions to the Problems • Summary

  19. AI-Based Approach • Raman & Alwar 1990 • Conversations carried out across enquiry counters at railway stations in India • System should understand a text before translating it • analysis of text to understand the meaning and storing it in a language-free semantic map • semantic maps used to generate translations • Analyzer analyses one sentence at a time • unnecessary adjectives not taken into account • morphological analysis first • building of semantic map second • stages work concurrently • large dictionary needed

  20. AI-Based Approach (2) • Natural language generator builds a sentence in target language • analyzer’s result fed into the generator • translate everything vs. leave something out • definition of structure • words in right order and inflected correctly • minimal importance to style • Successful in specific application and a restricted set of sentences

  21. Interactive Approach • Sen, Zhaoxiong and Heyan 1997 • Knowledge in MT systems is incomplete -> incorrect translations • Possibility for an MT system to learn • quality should improve • Interaction starts when a sentence is found that the system cannot analyse properly • message to the user • user responds with a coded message • the system's knowledge base is updated • interaction limited to three stages • lexical analysis • uncertain modifiers • multiple translations
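
The learning loop on slide 21 can be sketched in Python as follows. This is a heavily simplified illustration that covers only the lexical stage: the knowledge base is assumed to be a plain word-to-word dictionary, and the user's "coded message" is reduced to a callback returning a translation. The names `interactive_translate`, `knowledge_base` and `ask_user` are hypothetical and do not come from the cited work.

```python
def interactive_translate(sentence, knowledge_base, ask_user):
    """Translate word by word, asking the user about unknown words
    and storing each answer so the same question is not asked twice."""
    translated = []
    for word in sentence.lower().split():
        if word not in knowledge_base:
            # Interaction starts only when the system cannot handle a word.
            knowledge_base[word] = ask_user(word)
        translated.append(knowledge_base[word])
    return " ".join(translated)


# Usage with a scripted "user" standing in for a real person.
kb = {"hello": "bonjour"}
questions = []


def scripted_user(word):
    questions.append(word)
    return {"world": "monde"}[word]


print(interactive_translate("Hello world", kb, scripted_user))  # bonjour monde
print(interactive_translate("Hello world", kb, scripted_user))  # bonjour monde
print(questions)  # ['world']
```

The second call triggers no new question because the answer from the first interaction is already stored, which is the sense in which quality should improve over time.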

  22. Multiple Translation Engines & Sentence Partitioning • Ren, Shi and Kuroiwa 2000 • Multiple MT systems running in parallel • all use different MT techniques • controller coordinates translating • each engine translates a sentence independently • controller chooses the best translation • if no proper translation is found, the sentence is partitioned • process starts again from the beginning for each part • in the end the partitioned sentence is put back together

  23. Multiple Translation Engines & Sentence Partitioning (2) • Parallel processing should improve success rate • correct translation preserved through procedures • combining the best translations should improve quality • Morphological analysis • analysis gives results that are used as inputs for the engines • engines are then run in parallel • if there is more than one result, the number of engines is increased • if there are no results, the sentence is partitioned • problem of partitioning a sentence, e.g. Chinese & Japanese • In a test situation with four engines the results improved dramatically • consumed time doubled • one MT system translated 45.6 % of sentences correctly; with multiple engines the result was 74.2 % (Japanese to Chinese)
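
The controller logic described on slides 22 and 23 can be sketched in Python. This is a minimal illustration under several assumptions that are not in the source: each engine is a function returning a translation with a confidence score (or `None` on failure), the engines are called one after another rather than in parallel, and partitioning simply splits the sentence in half. The toy phrase tables and the names `make_phrase_engine` and `translate` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Result = Optional[Tuple[str, float]]  # (translation, confidence) or None
Engine = Callable[[str], Result]


def make_phrase_engine(table: Dict[str, str], confidence: float) -> Engine:
    """Build a toy engine that only knows the phrases in its table."""
    def engine(text: str) -> Result:
        translation = table.get(text)
        return (translation, confidence) if translation is not None else None
    return engine


def translate(text: str, engines: List[Engine]) -> str:
    """Controller: pick the best engine result, or partition and retry."""
    results = [r for r in (engine(text) for engine in engines) if r is not None]
    if results:
        # Choose the translation with the highest confidence.
        return max(results, key=lambda r: r[1])[0]
    words = text.split()
    if len(words) <= 1:
        # Nothing left to partition: mark the fragment as untranslated.
        return f"<{text}>"
    mid = len(words) // 2
    left = translate(" ".join(words[:mid]), engines)
    right = translate(" ".join(words[mid:]), engines)
    # Put the partitioned sentence back together.
    return f"{left} {right}"


engine_a = make_phrase_engine({"good morning": "bonjour", "my friend": "mon ami"}, 0.9)
engine_b = make_phrase_engine({"good morning": "bon matin", "friend": "ami"}, 0.5)

print(translate("good morning my friend", [engine_a, engine_b]))  # bonjour mon ami
print(translate("good morning my dog", [engine_a, engine_b]))     # bonjour <my> <dog>
```

Splitting at the midpoint only works for languages that mark word boundaries with spaces; as the slide notes, partitioning is itself a problem for Chinese and Japanese.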

  24. Outline • Introduction • Translation and Machine Translation Techniques • The Early Machine Translation Systems • Problems of Machine Translation • Proposed Solutions to the Problems • Summary

  25. Summary • A definitive solution is still to be found • The biggest problems of MT are linguistic • it is very hard to cover all the rules and adjust them to all possible languages and variations • misspellings cause problems, which means a very good proof-reading function is needed • There is a long way to go before MT systems replace human translators • Machine Translation can be used in applications where the language is very specific
