Chapter 21: Machine Translation

Heshaam Faili (hfaili@ece.ut.ac.ir), University of Tehran. What is MT? Machine Translation (MT) means translation using computers. Related modes: machine-aided human translation (MAHT), human-aided machine translation (HAMT), and fully automated machine translation (FAMT).


Presentation Transcript


  1. Chapter 21: Machine Translation Heshaam Faili hfaili@ece.ut.ac.ir University of Tehran

  2. What is MT? • Machine Translation (MT) means translation using computers. • Machine-aided human translation (MAHT) • Human-aided machine translation (HAMT) • Fully automated machine translation (FAMT) • Fully human translation

  3. Some definitions • “Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another.” EAMT • “…Machine Translation (MT) as it is generally known --- the attempt to automate all, or part of the process of translating from one human language to another.” Arnold D J. MACHINE TRANSLATION: An Introductory Guide • “…presumably means going by algorithm from machine-readable source text to useful target text, without recourse to human translation or editing." ALPAC report, 1966

  4. An Example Translation between Chinese & English

  5. Different tasks with MT • Tasks for which a rough translation is adequate • Tasks where a human post-editor is used • Tasks limited to small sublanguage domains in which fully automatic high quality translation (FAHQT) is still achievable • Tasks with Software Localization …

  6. Machine Translation History • 1946-1954: Optimistic attitude towards the new technologies in MT • 1949: Informal Memorandum • Word-for-word translation, especially Russian-English • 1954: The Georgetown University demonstration • Vocabulary: 250 words, Grammar: 6 rules, Corpus: a few simple Russian sentences

  7. Machine Translation History • 1954-1966: Criticism on the subject of MT • 1966 ALPAC-Report (Automatic Language Processing Advisory Committee) • MT is slower, not very reliable and twice as expensive as human translation

  8. Machine Translation History • 1966-1975: Revision of the aims and goals of MT • Definition of more realistic goals • Limitation of the research to technical languages • Syntactical analysis of the source text • Development of different translation strategies

  9. Machine Translation History • 1975-1989 ±: Increasing interest and promotion for MT • Rapid increase of the demand for translations • Improvements in hardware and software • The use of artificial intelligence methods is now possible

  10. Machine Translation History • 1990-2000 • Development of commercial products based on personal computers • Specialized supplementary information (medicine, law, economics...) • Translation of spoken language (VERBMOBIL)

  11. Machine Translation History • 2000-Now • Statistical Approaches and Hybrid Models • Google Translation Engine ( http://translate.google.com ) • Yearly MT Official Evaluation race ( http://www.nist.gov ) • Automated MT Evaluation (NIST, BLEU)

  12. Machine Translation History

  13. What happened between ALPAC and Now? • Need for MT and other NLP applications confirmed • Change in expectations • Computers have become faster, more powerful • WWW • Political state of the world • Maturation of Linguistics • Development of hybrid statistical/symbolic approaches

  14. Language Similarities or Differences • Universals: aspects that are true of every language • Every language has words referring to people; every language has nouns and verbs • Typology: the study of systematic cross-linguistic similarities and differences • Morphological aspects: • Isolating vs. polysynthetic • Agglutinative vs. fusional • Syntactic aspects: • SVO, SOV, or VSO • Syntactic-morphological aspects: • Head-marking vs. dependent-marking • Specific differences: date formats and standards, verb tense differences • Lexical differences: different scenes

  15. Lexical Differences • English: leg, foot, paw • French: étape, patte, jambe, pied

  16. Different Machine Translation Systems • Rule-based • Statistical Approaches • Hybrid Systems (using a statistical approach in a rule-based architecture or … )

  17. Three MT Approaches: Direct, Transfer, Interlingua

  18. Machine Translation Architectures • Direct architecture • Direct architecture was used for most MT systems of the first generation • there are no intermediate stages in the process of translation

  19. Direct Architecture, 4 Steps

  20. Machine Translation Architectures • Characteristics of direct MT systems: • no complex linguistic theories or parsing strategy • make use of syntactic, semantic and lexical similarities between the source and the target language • based on a single language pair • direct MT systems are 'robust': they even translate sentences with incomplete information • dictionaries are the most important components of direct MT systems
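
The dictionary-centred character of direct MT can be sketched in a few lines of Python. This is only an illustration: the dictionary entries and the function name `direct_translate` are assumptions made for the example, and the sketch covers the lookup step alone, with no morphology or reordering.

```python
# Toy English->Spanish dictionary; the entries are illustrative assumptions,
# not taken from any real MT system.
TOY_DICT = {"the": "la", "green": "verde", "witch": "bruja"}


def direct_translate(sentence, dictionary=TOY_DICT):
    """Translate word for word; unknown words pass through unchanged."""
    translated = []
    for word in sentence.split():
        # Lowercase only for lookup; fall back to the original word.
        translated.append(dictionary.get(word.lower(), word))
    return " ".join(translated)


print(direct_translate("the green witch"))  # la verde bruja
print(direct_translate("the blue witch"))   # la blue bruja
```

The first output keeps English word order ("la verde bruja" rather than "la bruja verde"), and the second still produces output for an unknown word, which is the sense in which such systems are robust but shallow.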

  21. Machine Translation Architectures • Transfer architecture • It consists of three separate stages: • analysis • transfer (syntactic or lexical) • synthesis/generation

  22. Transfer Architecture

  23. Transfer Example: English->Spanish • Mary did not slap the green witch

  24. Transfer: English->Japanese

  25. Some Examples

  26. Persian Example • I ate the apple من سیب را خوردم • VP → V NP ⇒ VP → NP RA V • I asked the man من از سیب خوردم • VP → V NP ⇒ VP → AZ NP V
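
A syntactic transfer rule such as VP → V NP ⇒ VP → NP RA V can be sketched as a tree rewrite. The tuple-based tree encoding, the helper `node_label`, and the function `transfer_vp` are assumptions made for this illustration; a real transfer system would also translate the words and handle many more rules.

```python
def node_label(node):
    """Return the category of a tree node, or None for a leaf string."""
    return node[0] if isinstance(node, tuple) else None


def transfer_vp(tree):
    """Rewrite every (VP V NP) subtree as (VP NP RA V), bottom-up."""
    if isinstance(tree, str):
        return tree  # leaf word, nothing to reorder
    label, *children = tree
    children = [transfer_vp(child) for child in children]
    if (label == "VP" and len(children) == 2
            and node_label(children[0]) == "V"
            and node_label(children[1]) == "NP"):
        verb, obj = children
        # Object first, then the object marker, then the verb.
        return ("VP", obj, ("RA", "ra"), verb)
    return (label, *children)


source = ("S", ("NP", "I"), ("VP", ("V", "ate"), ("NP", "the apple")))
print(transfer_vp(source))
```

Only the structure changes here; lexical transfer (choosing the Persian words) would be a separate step applied to the leaves.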

  27. Machine Translation Architectures • Characteristics of transfer MT systems: • consist of complete linguistic conceptions, not only single grammatical or syntactic rules • the analysis and generation components can be reused for further language pairs, if the components are cleanly separated • the dictionaries of transfer MT systems are also kept separate

  28. Machine Translation Architectures • Interlingua architecture • The interlingua system consists of two stages:  • The source text is analysed into an interlingual representation from which the text of the target language will be directly generated • Semantic Analyzer

  29. Interlingua Architecture

  30. Machine Translation Architectures • Interlingua architecture: • Advantage: • The interlingua representation can be used for any other language • Disadvantage: • It is difficult to create language-independent representations

  31. Statistical Approaches

  32. Statistical Approach • 3 components: • Language model P(E) • Translation model P(F|E) • Decoder
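
How the language model, translation model, and decoder fit together can be sketched as follows: the decoder picks the target sentence E that maximizes P(E) · P(F|E) for a given source sentence F. The probability tables below are made-up numbers for illustration, and `decode` is a hypothetical name; a real decoder searches a very large candidate space rather than a fixed list.

```python
import math

# Toy log-probabilities; every number here is an invented assumption.
LANGUAGE_MODEL = {  # log P(E)
    "the green witch": math.log(0.6),
    "the witch green": math.log(0.1),
}
TRANSLATION_MODEL = {  # log P(F|E), keyed by (F, E)
    ("la bruja verde", "the green witch"): math.log(0.4),
    ("la bruja verde", "the witch green"): math.log(0.5),
}


def decode(foreign, candidates):
    """Return the candidate E with the highest log P(E) + log P(F|E)."""
    def score(english):
        return LANGUAGE_MODEL[english] + TRANSLATION_MODEL[(foreign, english)]
    return max(candidates, key=score)


best = decode("la bruja verde", ["the green witch", "the witch green"])
print(best)  # the green witch
```

In this toy case the translation model slightly prefers the word-for-word order, but the language model outweighs it (0.6 × 0.4 = 0.24 against 0.1 × 0.5 = 0.05), so the fluent candidate wins.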

  33. SYSTRAN • Developed in the late 1950s by Peter Toma • Initial system for Russian-English translations • Later adapted for US Air Force and NASA • Adaptation for other languages • Important because it had a big influence on many Japanese MT systems

  34. SYSTRAN • Rule-based System • Using finite state grammar (ATN) • Using a large knowledge-base • Working on 23 languages, especially EU languages • Customers: AltaVista, Lycos, AOL, Compuserve, Terra, Google, Apple, and others

  35. AppTek TranSphere ® • Rule-based System • Using LFG (Lexical Functional Grammar) • Analyzes the semantic, morphological and syntactic structures in English and produces their equivalents in the target language • Utilizes a general-purpose lexicon in addition to special domain micro-dictionaries • Translates English to Arabic, Korean, Chinese, Turkish, Persian/Dari and Pashto-English • Bidirectional translation: French, German, Italian, Portuguese, Russian, Spanish, Ukrainian, Hebrew and Dutch

  36. MÉTÉO • Development of an English-French translation system by the TAUM Group to cope with the bilingual policy of the Canadian government • 1975 Contract to develop a system to translate public weather forecasts • 1984 Development of Météo 2 • This program proved to be more reliable, faster and more cost-effective • 1989 Development of a French-English version

  37. Sakhr Enterprise Machine Translation • Using transfer Architecture • analysis on all linguistic levels: morphological, lexical, syntactic and semantic • Arabic - English

  38. CiyaTran MT • English to Arabic-script languages: Arabic, Persian, Pashto • Analyzing the semantic, morphological and syntactical structure of input text • Utilizing Fuzzy Logic and Statistical Analysis • Using a general-purpose lexicon, as well as 85 domain-specific databases with over 3,000,000 words and phrases

  39. ARIANE (GETA) • 1960-1970: Development of CETA System for three language pairs • Change of the name to ARIANE (GETA) as the system was changed into a ‘Transfer’ system

  40. EUROTRA • Developed for the translation requirements within the European Community • A system designed to replace the Systran system because of its several limitations • 3 phases in the development of the program • One of the biggest MT projects in terms of expenditure, organizations and people involved

  41. Google Translation • Launched in 2004 • Beta version for English-Arabic and English-Chinese • Fully statistical • Commercial usage: no technical document found • In 2005, became the best translator for these two language pairs: http://www.nist.gov

  42. Shiraz Project • This project involved the creation of an extensible research prototype of a Persian to English machine translation system • Persian to English • Transfer Based Translation • Syntactic Analysis • Unification Based context free grammar • Stopped …

  43. Moses statistical MT • Open source, written in C++ • Allows you to automatically train translation models for any language pair • All you need is a collection of translated texts (parallel corpus) • Beam search • Phrase-based

  44. PSMT (Prolog Statistical Machine Translation) • Used Prolog to Translate simple structures • 3 sections: • Language Model Learner • Dictionary Learner • Search Program

  45. Phramer Statistical Machine Translation • Phrase-based • Open source, written in Java • Using a Bayesian model

  46. EGYPT • Statistical MT • French-English • Academic • Some workshops related to EGYPT established

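  47. MT Challenges: Ambiguity • Syntactic ambiguity: I saw the man with the telescope • Parse 1 (PP attached to the verb phrase): [S [NP I] [VP [VP [V saw] [NP the man]] [PP with the telescope]]] • Parse 2 (PP attached to the noun phrase): [S [NP I] [VP [V saw] [NP [NP the man] [PP with the telescope]]]]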

  48. MT Challenges: Ambiguity • Syntactic ambiguity: I saw the man on the hill with the telescope • Lexical ambiguity • E: book • Semantic ambiguity • Homography: ball (E) = pelota, baile (S) • Polysemy: kill (E) = matar, acabar (S) • Semantic granularity: esperar (S) = wait, expect, hope (E); be (E) = ser, estar (S); fish (E) = pez, pescado (S)

  49. How do we evaluate MT? • Human-based Metrics • Semantic Invariance • Pragmatic Invariance • Lexical Invariance • Structural Invariance • Spatial Invariance • Fluency • Accuracy • “Do you get it?” • Automatic Metrics: Bleu

  50. BiLingual Evaluation Understudy (BLEU; Papineni, 2001) • Automatic technique, but … • Requires the pre-existence of human (reference) translations • Produce a corpus of high-quality human translations • Judge "closeness" numerically (analogous to word-error rate in speech recognition) • Compare n-gram matches between the candidate translation and 1 or more reference translations
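
The n-gram comparison at the core of BLEU can be sketched as a clipped (modified) n-gram precision. This is a minimal sketch of one component only: full BLEU also combines several n-gram orders and applies a brevity penalty, both omitted here. The function names `ngrams` and `modified_precision` are chosen for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against reference translations."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    # A candidate n-gram is credited at most as often as it occurs in a reference.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


reference = "the cat is on the mat".split()
print(modified_precision("the the the the the the the".split(), [reference], 1))
print(modified_precision("the cat sat on the mat".split(), [reference], 2))
```

Clipping is what keeps a degenerate candidate such as "the the the the the the the" from scoring well: "the" occurs only twice in the reference, so only 2 of its 7 unigrams count.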
