Computer communication B

Presentation Transcript


  1. Computer communication B • Automatic (machine) translation

  2. Bibliography • Arnold, D., Balkan, L., Lee Humphreys, R., Meijer, S. & Sadler, L. (1994) Machine translation: an introductory guide. Blackwell, Oxford • Hutchins, W. John. (2000) Early years in machine translation. Benjamins • Any further material you find to go deeper into the topic is welcome

  3. Automatic translation • Automatic translation (AT): • The process of automating all or part of the translation from one human language into another • AT has several aspects • A socio-political aspect • Automating translation can be a necessity for societies that do not want to impose a common language on their members • An economic aspect • Human translators are expensive and human translation can take a lot of time • A scientific aspect • AT sits at the interface between AI, linguistics and computer science

  4. Automatic translation: a brief history • First patent applications for machine translation in the 1930s • First discussions about the possibility of automatic translation around 1946/47 • 1949 (Warren Weaver, letter to the Rockefeller Foundation) • “I have a text in front of me which is written in Russian but I am going to pretend that it is really written in English and that it has been coded in some strange symbols. All I need to do is strip off the code in order to retrieve the information contained in the text” • Around those years, theoretical (and linguistic) doubts arose because of ambiguous sentences • The philosopher Bar-Hillel argued that Fully Automatic High Quality Machine Translation was impossible from both a theoretical and a technical perspective

  5. Automatic translation: a brief history • In the 1950s AT was introduced as an academic topic • 1954 • First public demonstration of an automatic translation system, which translated from Russian to English (the Georgetown-IBM experiment) • 1955 • First AT activities in the Soviet Union • These were the years in which AT gained a lot of popularity and a lot of funding as well • 1964 (report published in 1966 as “Language and machines: computers in translation and linguistics”) • ALPAC (Automatic Language Processing Advisory Committee) report: “There is no immediate or predictable prospect of useful machine translation”

  6. Automatic translation: the dark years (1966-75) • As a consequence of the ALPAC report, funding for AT dropped sharply and, above all, so did motivation • Many research groups closed • Only 3 systems remained active in those years (two in the US and one within the EURATOM project in Ispra, Italy) • AT systems were also developed by some groups within the Mormon Church to produce a translation of the Bible

  7. Automatic translation: The renaissance • Around the 1980s the Commission of the European Communities (CEC) bought the English-French version of Systran • The METAL systems (Siemens) were developed • AT began to be adapted for companies • From the 1980s on there was a large development of AT systems in Japan • EUROTRA project • New flow of money for AT from the 1980s on

  8. Automatic translation: The present times • In the 1990s: AT based on statistics • Verbmobil project in Germany: bidirectional translation of spoken language between German and English and between German and Japanese • Developed between 1993 and 2000 in partnership with Siemens and Philips • Possible applications for small companies • http://verbmobil.dfki.de/overview-us.html • Present times: hybrid AT • The need for AT keeps growing (internet, globalization) • Technical advances: construction of large corpora and development of statistical methods for AT • AT for smaller languages as well

  9. Image of rollercoaster

  10. AT: Evaluation • Positive • It is important to keep a good perspective: • Results do not need to be perfect • Sketch of the scenario for users • Help for translators (who work with the results in any case) • Provide rough translations to check whether something is important • To improve • Input quality: how easy are the texts to translate? • Most texts were not written with translation in mind

  11. AT: a possible stage scenario • For the most part the aim of AT is to produce the first draft of a translation. The scenario is composed of 8 main stages • 1) Documents are in electronic form • 2) Several computers are linked by a network • 3) There is an MT system (called, for example, X) • 4) Parts of the document to be translated are sent to the MT system • 5) The MT system translates the text • 6) The MT system gives an output • 7) Post-editing • 8) Human double check

  12. AT: The stage process • 1) The submitted text should be in a form that helps the MT system • Bad input = bad output • Short sentences • Grammatical • Avoid semantic and syntactic ambiguities • A good input means a better output, so less post-editing time is required • Some MT programs have built-in text-critique systems
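
The input advice above can be sketched as a tiny text-critique check. This is only an illustration, not how any real MT product works: the function name `critique`, the 20-word limit and the naive sentence splitter are all assumptions made for the example.

```python
import re

MAX_WORDS = 20  # assumed threshold; real text-critique tools set their own limits


def critique(text, max_words=MAX_WORDS):
    """Return (sentence, word_count) pairs for sentences longer than max_words."""
    # Naive split after ., ! or ? followed by whitespace; good enough for a sketch.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    flagged = []
    for sentence in sentences:
        count = len(sentence.split())
        if count > max_words:
            flagged.append((sentence, count))
    return flagged
```

A check like this only covers the "short sentences" advice; grammaticality and ambiguity checks would need real linguistic analysis.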

  13. Systran online: an example • Input: “ieri sera sono andata a casa e dopo aver visto che il cartone del latte si era aperto e tutto il latte era sparso sul pavimento mi sono sentita male” • Output: “yesterday evening has gone to house and after to have since the latte cardboard of the latte ones had been opened and all the era scattered on the pavement they are felt to me badly”

  14. Systran online: an example • Input: • Sentences that are too long • Ambiguous words between the two languages: “era” • Output: • Not optimal • Post-editing is needed • A human double check is needed as well

  15. What do a (human) translator and a machine need? • Knowledge of the source language • Knowledge of the target language • Knowledge of the correspondences between L1 and L2 • Cultural knowledge • Common sense • All types of knowledge are needed at several levels: • Lexical • Lexical knowledge is implemented by dictionaries • Phonological • Morphological • Syntactic • Semantic • Pragmatic • Discourse

  17. Formal representations of grammar • A language is not a simple concatenation of words: words are put together in specific groups called constituents, which are in turn combined into phrases (NP, VP, PP, NegP etc.) • More simply, an (English) sentence is usually formed by a subject, a verb and an object, with the possible presence of auxiliaries, modal verbs or wh-elements • In linguistics the structure of a sentence (syntax) is formally represented in many ways. The most widely used representation is the tree structure, typically following the X-bar schema proposed by Jackendoff (1977)

  18. The phrase structure • Diagram: an NP made up of a Specifier, a Head and a Complement (typically another phrase)

  19. Parsing (or analysis) by MT • MT systems use annotated text • Part-of-speech (POS) tagging (whether an element in a sentence is a noun, a verb, etc.) • Syntactic annotation (how phrases are related to each other) • Semantic annotation • MT systems use formal representations of grammar as models, and they parse (derive) the syntactic structure of sentences using more or less complicated algorithms • Usually lists are used instead of the formal linguistic representations
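
As a rough illustration of the two ideas above, the sketch below tags words by dictionary lookup and stores a syntactic analysis as nested lists rather than as a drawn tree. The lexicon, the tag names and the example sentence are invented for this example; real taggers are far more sophisticated.

```python
# Toy lexicon: word -> part-of-speech tag (invented for this example).
LEXICON = {
    "the": "Det",
    "boss": "N",
    "music": "N",
    "listens": "V",
    "to": "P",
}


def pos_tag(sentence):
    """Tag each word by dictionary lookup; unknown words get the tag 'UNK'."""
    return [(word, LEXICON.get(word.lower(), "UNK")) for word in sentence.split()]


# A syntactic analysis stored as nested lists: [label, child, child, ...].
TREE = ["S",
        ["NP", ["Det", "the"], ["N", "boss"]],
        ["VP", ["V", "listens"],
               ["PP", ["P", "to"], ["NP", ["Det", "the"], ["N", "music"]]]]]


def leaves(tree):
    """Collect the words at the bottom of a nested-list tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:  # tree[0] is the category label
        words.extend(leaves(child))
    return words
```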

  20. Automatic parsers • Automatic parsers use a formal grammar as their base • They take an input sentence • They apply that grammar to the sentence • To check whether the sentence is grammatical • And to show how words are combined into phrases • They give an insight into the syntactic structure of a sentence
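
The behaviour described above can be sketched with a very small top-down (recursive-descent) parser. The grammar, the lexicon and the function names are assumptions for illustration; the slides do not say which parsing algorithm a given MT system uses.

```python
# Toy phrase-structure grammar and lexicon (invented for this example).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Name"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "cat": "N",
    "john": "Name",
    "sees": "V", "sleeps": "V",
}


def parse(symbol, words, start):
    """Yield (tree, next_position) for every way `symbol` can match from `start`."""
    if symbol in GRAMMAR:  # phrase category: try each of its rules in turn
        for rule in GRAMMAR[symbol]:
            yield from expand(rule, words, start, [symbol])
    elif start < len(words) and LEXICON.get(words[start]) == symbol:
        yield [symbol, words[start]], start + 1  # word category matches one word


def expand(rule, words, start, tree):
    """Match the symbols of one rule left to right, extending `tree` as we go."""
    if not rule:
        yield tree, start
        return
    for subtree, nxt in parse(rule[0], words, start):
        yield from expand(rule[1:], words, nxt, tree + [subtree])


def analyse(sentence):
    """Return the first tree covering the whole sentence, or None if ungrammatical."""
    words = sentence.lower().split()
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None
```

Under this toy grammar, `analyse("the dog sees a cat")` returns a nested-list tree, while `analyse("dog the sees")` returns `None`, i.e. the sentence is rejected as ungrammatical.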

  21. Translation engines • Translation engines are the part of an MT system that actually performs the automatic translation • They can be classified according to their architecture • Transformer architecture • Linguistic knowledge architectures • Transfer systems • Interlingual systems

  22. Transformer (or direct) engines • Input sentences are transformed into output (target language) sentences by carrying out the simplest possible parse, replacing the source words with their target language equivalents as specified in a bilingual dictionary, and then roughly re-arranging their order to suit the rules of the target language • Stages • Parser (analysis of the source language) • Transformation rules (a bilingual dictionary and some re-ordering rules; a morphological component applies some morphological transformations as well)
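
A minimal sketch of the two stages above, for Italian to English: word-by-word dictionary replacement followed by a single re-ordering rule. The dictionary entries, the word classes and the rule are invented for illustration, and the morphological component is left out entirely.

```python
# Bilingual dictionary (Italian -> English) with a coarse word class per entry.
# All entries are invented for this example.
BILINGUAL = {
    "il": ("the", "Det"),
    "la": ("the", "Det"),
    "capo": ("boss", "N"),
    "casa": ("house", "N"),
    "rossa": ("red", "Adj"),
    "compra": ("buys", "V"),
}


def transform(sentence):
    """Translate word by word, then apply one re-ordering rule."""
    # Step 1: dictionary lookup. Unknown words are passed through unchanged,
    # which is what makes this kind of engine robust.
    tagged = [BILINGUAL.get(w, (w, "UNK")) for w in sentence.lower().split()]

    # Step 2: re-ordering rule: Italian "N Adj" becomes English "Adj N".
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "N" and tagged[i + 1][1] == "Adj":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2
        else:
            i += 1
    return " ".join(word for word, _ in tagged)
```

With these toy entries, `transform("il capo compra la casa rossa")` gives "the boss buys the red house". An unknown word such as "pizza" is simply copied into the output, and nothing in the engine checks whether the result is grammatical English.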

  23. Transformer engines 2 • Transformers do not have much independent knowledge of either the source language or the target language • They rarely recognize ungrammatical input • The output can sometimes be totally ungrammatical, approaching a word salad • + Points • Robust: the engine does not stop when it encounters unknown words • - Points • Not linguistically oriented • Can give ungrammatical output • Difficult to expand into a multilingual system

  24. Linguistic knowledge architectures • High-quality MT needs linguistic knowledge of both the input and the output language (together with knowledge of the differences between them) • Linguistic knowledge architectures perform a deeper syntactic analysis • They have a substantial grammar for both the input and the output language • They have a comparative grammar that relates the input and the output language • The two grammars are developed and represented quite separately

  25. Linguistic knowledge architectures 2 • Analysis • A parser and a grammar are used to analyze the input language • Transfer • The underlying representation of the input language is changed into that of the output language • Synthesis • From the resulting underlying representation the generator creates a sentence in the output language (using the relevant grammar as well) • All these processes can be carried out with the two grammars available • But to resolve some differences between the grammars of the input and the output language, comparative rules are needed

  26. Linguistic knowledge architectures 3 • To resolve some differences between the grammars of the input and the output language, comparative rules are needed • Example • Le mele piacciono a Gianni • Le mele (subj) piacciono (V) a Gianni (object) • Gianni likes apples • Gianni (subj) likes (V) apples (obj) • For every difference in grammar between the two languages, specific comparative grammar rules have to be written • The deeper the level of abstraction of the parser, the fewer comparative grammar rules have to be written
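
The piacere/like example can be sketched as a three-stage pipeline of analysis, transfer and synthesis. Everything here is a simplification made for illustration: the analysis only recognises sentences of the form "X piacciono a Y", the role names `subj`, `iobj` and `obj` are labels chosen for the sketch, and the lexicon has two entries.

```python
def analyse(italian):
    """Analysis: map 'X piacciono a Y' to an Italian underlying representation."""
    words = italian.split()
    verb_at = words.index("piacciono")  # toy grammar: only this verb form is known
    return {
        "pred": "piacere",
        "subj": " ".join(words[:verb_at]),
        "iobj": " ".join(words[verb_at + 2:]),  # skip the preposition "a"
    }


NOUNS = {"le mele": "apples", "gianni": "Gianni"}  # toy bilingual lexicon


def transfer(src):
    """Transfer: comparative rule piacere(subj=X, iobj=Y) -> like(subj=Y, obj=X)."""
    if src["pred"] != "piacere":
        raise ValueError("no comparative rule for " + src["pred"])
    return {
        "pred": "like",
        "subj": NOUNS[src["iobj"].lower()],
        "obj": NOUNS[src["subj"].lower()],
    }


def synthesise(tgt):
    """Synthesis: generate English subject-verb-object order.

    Simplification: the subject is assumed to be third person singular.
    """
    return f'{tgt["subj"]} {tgt["pred"]}s {tgt["obj"]}'


def translate(italian):
    """Run the three stages in sequence."""
    return synthesise(transfer(analyse(italian)))
```

The comparative rule lives only in `transfer`: the Italian subject becomes the English object and vice versa, while `analyse` and `synthesise` each know about a single language.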

  27. Linguistic knowledge architectures 4 • Advantages • The output will always be grammatical • It is theoretically a reversible system (from L1 to L2 and conversely from L2 to L1)

  28. Interlingua • As the need for contrastive grammar decreases (thanks to a deeper level of parsing), what is called an INTERLINGUA arises • Interlingual systems are language independent • Interlingual representations concern meanings • An interlingua tries to capture how the world is made up and how its elements are put together • Diagram labels: Interlingua, Parser depth, Comparative grammar

  29. Interlingua • An interlingua is a meaning representation suitable for all languages • John can not go: obligatory(not(go(john))) • Each translation happens in two steps • The source language is translated into the interlingua • The interlingua is then translated into the output language • Drawbacks • Because each translation takes two steps, the process is longer • Generalization to the worst case: the representation has to capture every distinction that any of the languages makes
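
The two-step route can be sketched by storing the meaning term as nested tuples and looking sentences up on either side of it. The tables below know a single sentence per language, and the Italian sentence is an assumption added for illustration; a real interlingual system would use full analysis and generation components instead of lookup tables.

```python
# Interlingua term as nested tuples: (predicate, argument).
MEANING = ("obligatory", ("not", ("go", "john")))


def show(term):
    """Render a nested term in predicate(argument) notation."""
    if isinstance(term, str):
        return term
    predicate, argument = term
    return f"{predicate}({show(argument)})"


# Toy analysers, one per language: sentence -> interlingua term.
ANALYSERS = {
    "en": {"John can not go": MEANING},
    "it": {"John non può andare": MEANING},  # assumed Italian equivalent
}
# Toy generators, one per language: interlingua term -> sentence.
GENERATORS = {
    lang: {meaning: sentence for sentence, meaning in table.items()}
    for lang, table in ANALYSERS.items()
}


def translate(sentence, source, target):
    """Two steps: source language -> interlingua -> target language."""
    meaning = ANALYSERS[source][sentence]
    return GENERATORS[target][meaning]
```

Adding a further language in this design means adding one analyser and one generator for it, rather than a comparative grammar for every language pair.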

  31. Interlingua Think, by analogy, of individuals living in a series of tall closed towers, all erected over a common foundation. When they try to communicate with one another, they shout back and forth, each from his own closed tower. It is difficult to make the sound penetrate even the nearest towers, and communication proceeds very poorly indeed. But, when an individual goes down his tower, he finds himself in a great open basement, common to all the towers. Here he establishes easy and useful communication with the persons who have also descended from their towers. Thus it may be true that the way to translate from Chinese to Arabic, or from Russian to Portuguese, is not to attempt the direct route, shouting from tower to tower. Perhaps the way is to descend, from each language, down to the common base of human communication -the real but as yet undiscovered universal language- and then re-emerge by whatever particular route is convenient. Warren Weaver

  32. AT: other methods • Example-based methods • Texts in parallel corpora are compared • The process matches the input against stored example translations • It relies on a corpus of bilingual translations, and the goal is to find the best matching translation (using specific algorithms) • The challenge is to be able to draw conclusions about the rules of translation
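
The matching step can be sketched with the standard-library `difflib.SequenceMatcher`, used here as a stand-in for the "specific algorithms" mentioned above. The stored example pairs are invented, and the word-level similarity measure is an assumption rather than what any particular example-based system uses.

```python
from difflib import SequenceMatcher

# Toy store of example translations (Italian -> English); the pairs are invented.
EXAMPLES = [
    ("il capo ascolta la musica", "the boss listens to the music"),
    ("il capo compra una casa", "the boss buys a house"),
    ("la casa è rossa", "the house is red"),
]


def similarity(a, b):
    """Word-level similarity between two sentences, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def best_match(sentence):
    """Return (source, target, score) for the stored example closest to the input."""
    source, target = max(EXAMPLES, key=lambda pair: similarity(sentence, pair[0]))
    return source, target, similarity(sentence, source)
```

For the input "il capo ascolta la radio" the closest stored example is the first one. Its translation still has to be adapted ("music" replaced by a translation of "radio"), which is where the real difficulty of the approach lies.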

  33. AT: other methods • Statistical methods • The words are translated, giving a literal translation in the target language • This translation is then edited to produce a good expression in the target language • They are based on probabilities • Some problems • Not every well-formed expression in the target language is a good translation • Case ambiguities • Den Vorschlag lehnt die Kommission ab • The proposal rejects the commission (word-for-word, incorrect) • The commission rejects the proposal (correct)
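
One common way to combine fluency and faithfulness in statistical MT is the noisy-channel score P(e) · P(f | e), where P(e) is a language-model probability for the English candidate and P(f | e) a translation-model probability for the foreign sentence given that candidate. The slides do not name this formulation, so treat it as an assumption; all numbers below are invented purely to show the arithmetic.

```python
# Candidate English translations of "Den Vorschlag lehnt die Kommission ab".
# "lm" stands for P(e), "tm" for P(f | e); every value is invented.
CANDIDATES = {
    "The proposal rejects the commission": {"lm": 0.012, "tm": 0.02},
    "The commission rejects the proposal": {"lm": 0.010, "tm": 0.30},
    "Proposal the commission the rejects": {"lm": 0.00001, "tm": 0.30},
}


def score(probs):
    """Noisy-channel score: P(e) * P(f | e)."""
    return probs["lm"] * probs["tm"]


def best_translation(candidates):
    """Pick the candidate with the highest combined score."""
    return max(candidates, key=lambda e: score(candidates[e]))


def most_fluent(candidates):
    """Pick the candidate the language model alone likes best."""
    return max(candidates, key=lambda e: candidates[e]["lm"])
```

With these toy numbers the language model alone prefers the fluent but wrong "The proposal rejects the commission", while the combined score selects "The commission rejects the proposal". The point is that a well-formed target sentence is not automatically a good translation.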

  35. AT: some problems • Idiomatic expressions (very difficult) • “Non piangere sul latte versato” • Literally: do not cry over the poured (spilled) milk • Meaning: it is useless to cry over damage that is already done • http://babelfish.altavista.com/tr • http://www.systran.co.uk/ • http://www.google.com/translate_t • Lexical and morphological mistakes

  36. AT: some problems • Semantic ambiguities • Il capo ascolta la musica • The boss listens to the music (“capo” can mean “boss” or “head”) • Morphology • Der Urinstinkt ist noch immer vorhanden • The primitive instinct is still present (“Urinstinkt” must be segmented as Ur+Instinkt, not Urin+stinkt) • Complicated constructions • Sentences that are too long, with too many subordinate clauses • “Se avessi saputo che sarebbe andata a casa l'avrei immediatamente fermata” • “If I would have known that she would have gone home, I would have immediately stopped her”
