Explore the complexities of machine translation, decoding paradigms, ambiguity challenges, and implementing spelling correction algorithms for accurate globalization resources. Understand translation models, interlingua usage, and human involvement in the translation process. Learn about historical notes, Bayesian rules, and the pragmatic approach to multilingual information retrieval. Delve into transfer and inter-lingua translation models to enhance language accuracy for global communication.
Week 9: resources for globalisation • Finish spell checkers • Machine Translation (MT) • The ‘decoding’ paradigm • Ambiguity • Translation models • Interlingua and First Order Predicate Calculus • Human involvement • Historical note
Spelling dictionaries • Implementing spelling identification and correction algorithm • STAGE 1: compare each string in document with a list of legal strings; if no corresponding string in list mark as misspelled • STAGE 2: generate list of candidates • Apply any single transformation to the typo string • Filter the list by checking against a dictionary • STAGE 3: assign probability values to each candidate in the list • STAGE 4: select best candidate
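STAGE 1 and STAGE 2 can be sketched in a few lines of Python. This is only an illustrative sketch: the slide does not say which transformations are allowed, so the four used here (deletion, transposition, substitution, insertion) are an assumption, and `DICTIONARY`, `is_misspelled`, `single_edits` and `candidates` are hypothetical names with a toy word list.

```python
import string

# Hypothetical toy dictionary; a real checker would load a full word list.
DICTIONARY = {"across", "actress", "acres", "access", "caress", "cress"}

def is_misspelled(word, dictionary=DICTIONARY):
    """STAGE 1: flag a string that is not in the list of legal strings."""
    return word.lower() not in dictionary

def single_edits(typo, alphabet=string.ascii_lowercase):
    """Every string reachable from `typo` by exactly one transformation.

    Assumed transformations: delete one letter, swap two adjacent letters,
    replace one letter, insert one letter.
    """
    splits = [(typo[:i], typo[i:]) for i in range(len(typo) + 1)]
    deletions = {a + b[1:] for a, b in splits if b}
    transpositions = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    substitutions = {a + ch + b[1:] for a, b in splits if b for ch in alphabet}
    insertions = {a + ch + b for a, b in splits for ch in alphabet}
    return (deletions | transpositions | substitutions | insertions) - {typo}

def candidates(typo, dictionary=DICTIONARY):
    """STAGE 2: keep only the edited strings that are real words."""
    return sorted(single_edits(typo) & dictionary)

print(is_misspelled("acress"))   # True
print(candidates("acress"))
# ['access', 'acres', 'across', 'actress', 'caress', 'cress']
```

Filtering against the dictionary is what keeps the candidate list short: a six-letter typo yields several hundred edited strings, but only a handful are legal words.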
Spelling dictionaries • STAGE 3 • prior probability • given all the words in English, is this candidate more likely to be what the typist meant than that candidate? • P(c) = count(c)/N, where count(c) is the number of times candidate c occurs in a corpus and N is the number of words in that corpus • likelihood • given the possible errors, or transformations, how likely is it that error y has operated on candidate x to produce the typo? • P(t | c), calculated using a corpus of errors, or transformations • Bayesian rule: • get the product of the prior probability and the likelihood • P(c) × P(t | c)
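Written out in full, the ranking step looks as follows. The likelihood is written as the conditional probability P(t | c); the symbol C for the candidate list from STAGE 2 and the hat notation for the chosen candidate are assumptions added here. P(t) is the same for every candidate, so it can be dropped, which is why only the product of prior and likelihood needs to be computed.

```latex
\hat{c} \;=\; \arg\max_{c \in C} P(c \mid t)
        \;=\; \arg\max_{c \in C} \frac{P(t \mid c)\,P(c)}{P(t)}
        \;=\; \arg\max_{c \in C} P(t \mid c)\,P(c),
\qquad
P(c) \approx \frac{\mathrm{count}(c)}{N}
```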
Spelling dictionaries • non-word errors • Implementing spelling identification and correction algorithm • STAGE 1: identify misspelled words • STAGE 2: generate list of candidates • STAGE 3a: rank candidates for probability • STAGE 3b: select best candidate • Implement: • noisy channel model • Bayesian Rule
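A minimal sketch of STAGE 3a and 3b under the noisy channel model is given below. All numbers are invented for illustration: a real system would estimate the prior from corpus counts and the likelihood from a corpus of errors. The names `CORPUS_SIZE`, `WORD_COUNTS`, `LIKELIHOOD`, `prior`, `score`, `rank` and `best_candidate` are hypothetical.

```python
# Hypothetical figures, invented for illustration only.
CORPUS_SIZE = 1_000_000  # N: number of words in the corpus
WORD_COUNTS = {"across": 120, "actress": 10, "acres": 30}  # count(c)
# Hypothetical likelihoods P(t | c) for the typo t = "acress".
LIKELIHOOD = {"across": 0.00001, "actress": 0.0001, "acres": 0.00003}

def prior(candidate):
    """P(c): relative frequency of the candidate in the corpus."""
    return WORD_COUNTS[candidate] / CORPUS_SIZE

def score(candidate):
    """Bayesian rule: product of prior probability and likelihood."""
    return prior(candidate) * LIKELIHOOD[candidate]

def rank(candidate_list):
    """STAGE 3a: order candidates from most to least probable."""
    return sorted(candidate_list, key=score, reverse=True)

def best_candidate(candidate_list):
    """STAGE 3b: select the highest-scoring candidate."""
    return max(candidate_list, key=score)

print(rank(["acres", "actress", "across"]))  # ['across', 'actress', 'acres']
print(best_candidate(["acres", "actress", "across"]))  # across
```

With these made-up figures "actress" has the highest likelihood, but "across" still wins because its prior is twelve times larger. Neither factor decides the outcome on its own.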
Resources for Globalisation: Machine translation • The ‘decoding’ paradigm • Assumes one-to-one relation between source symbol and target symbol • In practice the relation is often: • one-to-many (homonymy) • one-to-many (hypernym → hyponyms) • many-to-one (hyponyms → hypernym)
Machine translation • The ‘decoding’ paradigm • one-to-many (homonymy) • bank → Ufer, Bank (German)
Machine translation • The ‘decoding’ paradigm • one-to-many (homonymy) • one-to-many (hypernym → hyponyms): • brother → otooto, oniisan (Japanese) • blue → синий, голубой (Russian) • many-to-one (hyponyms → hypernym)
Machine translation • The ‘decoding’ paradigm • one-to-many (homonymy) • one-to-many (hypernym → hyponyms) • many-to-one (hyponyms → hypernym): • hill, mountain → Berg (German) • learn, teach → leren (Dutch)
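The failure of the one-to-one assumption can be shown with a toy word-for-word decoder. This is only a sketch: the lexicon holds just the English-to-German examples from the slides, and `LEXICON`, `decode` and `is_one_to_one` are hypothetical names.

```python
# Toy English-to-German lexicon built from the slide examples.
LEXICON = {
    "bank": ["Ufer", "Bank"],   # one-to-many (homonymy)
    "hill": ["Berg"],           # many-to-one: both map to "Berg"
    "mountain": ["Berg"],
}

def decode(tokens, lexicon=LEXICON):
    """Return, for each source token, every target symbol the lexicon allows.

    Unknown tokens are passed through unchanged.
    """
    return [lexicon.get(tok.lower(), [tok]) for tok in tokens]

def is_one_to_one(lexicon=LEXICON):
    """True only if each source has one target and no target is shared."""
    targets = [t for options in lexicon.values() for t in options]
    one_target_each = all(len(options) == 1 for options in lexicon.values())
    return one_target_each and len(targets) == len(set(targets))

print(decode(["bank"]))              # [['Ufer', 'Bank']]
print(decode(["hill", "mountain"]))  # [['Berg'], ['Berg']]
print(is_one_to_one())               # False
```

The decoder cannot choose between "Ufer" and "Bank" without context, and translating "Berg" back into English cannot recover whether the source said "hill" or "mountain".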
Machine translation and globalisation • Ambiguity • “The possibility of interpreting an expression in two or more distinct ways” (Collins English Dictionary) • Example: ‘I made her duck’
Machine translation • Ambiguity • Challenge of the translation depends on the level of ambiguity that arises • This depends on the closeness of the source and target languages w.r.t. the following: • vocabulary • homonyms • grammar • structural ambiguity • conceptual structure • specificity ambiguity • lexical gaps
Machine translation • Pragmatic approach • aim for a rough translation, ‘gist’ translation • Used for multi-lingual information retrieval • involve human translators in the process: computer-aided translation
Machine translation • Translation models • Transfer model • English: ‘the dog bit my friend’ • Hindi: kutte-ne mere dost ko-kata • Word-for-word gloss of the Hindi: dog my friend bit
Machine translation • Translation models • Transfer model • Alter the grammatical structure of the source language to make it adhere to the grammatical structure of the target language • Use transformation rules • Analysis process (source) • Transfer process (‘bridge’) • Generation process (target) • Problem: each source-target pair will need its own unique set of transformation rules
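The three processes can be sketched as three small functions. This is a deliberately tiny illustration, not a real transfer system: it handles only simple subject-verb-object strings, the single transformation rule (move the verb to the end and drop articles) is an assumption chosen to reproduce the word-for-word gloss ‘dog my friend bit’, and generation stops at an English gloss rather than producing Hindi words. `VERBS`, `ARTICLES`, `analyse`, `transfer` and `generate` are hypothetical names.

```python
VERBS = {"bit"}           # hypothetical toy verb list used by the analysis
ARTICLES = {"the", "a"}   # dropped during transfer to mirror the gloss

def analyse(sentence):
    """Analysis (source): split an English S-V-O string into labelled parts."""
    tokens = sentence.lower().split()
    # Raises StopIteration if no known verb is present; fine for a sketch.
    verb_index = next(i for i, tok in enumerate(tokens) if tok in VERBS)
    return {
        "subject": tokens[:verb_index],
        "verb": [tokens[verb_index]],
        "object": tokens[verb_index + 1:],
    }

def transfer(parts):
    """Transfer ('bridge'): one rule, reorder S-V-O into S-O-V."""
    reordered = parts["subject"] + parts["object"] + parts["verb"]
    return [tok for tok in reordered if tok not in ARTICLES]

def generate(tokens):
    """Generation (target): here only a gloss; a real system would
    substitute target-language words and inflections at this point."""
    return " ".join(tokens)

print(generate(transfer(analyse("the dog bit my friend"))))
# dog my friend bit
```

The rule inside `transfer` is specific to this one language pair and direction, which is the problem noted above: another pair would need a different rule set.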
Machine translation • Translation models • Inter-lingua model • Extract the meaning from the source string • Give it a language independent representation, i.e. an interlingua • Translation process takes the interlingua as its input • Multiple translation processes take the same input for multiple target language outputs
Machine translation • Translation models • What is the inter-lingua? • for words, some sort of semantic analysis, e.g. • (GO, BY-FOOT): Russian идти, English go • (GO, BY-TRANSPORT): Russian ехать, English go
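A sketch of the word-level idea: the interlingua concept is a language-independent key, and each target language has its own generation table that takes that key as input. The tuple encoding and the names `GO_BY_FOOT`, `GO_BY_TRANSPORT`, `GENERATORS`, `generate` and `generate_all` are assumptions made for illustration.

```python
# Interlingua concepts as language-independent tuples.
GO_BY_FOOT = ("GO", "BY-FOOT")
GO_BY_TRANSPORT = ("GO", "BY-TRANSPORT")

# One generation table per target language, all keyed by the same concepts.
GENERATORS = {
    "Russian": {GO_BY_FOOT: "идти", GO_BY_TRANSPORT: "ехать"},
    "English": {GO_BY_FOOT: "go", GO_BY_TRANSPORT: "go"},
}

def generate(concept, language):
    """Produce the target-language word for one interlingua concept."""
    return GENERATORS[language][concept]

def generate_all(concept):
    """Feed the same interlingua input to every target-language generator."""
    return {language: table[concept] for language, table in GENERATORS.items()}

print(generate_all(GO_BY_FOOT))
# {'Russian': 'идти', 'English': 'go'}
print(generate_all(GO_BY_TRANSPORT))
# {'Russian': 'ехать', 'English': 'go'}
```

Adding a further target language means adding one more table, with no change to the analysis side. The cost is that the interlingua must already make every distinction that any target language needs, here foot versus transport.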
Machine translation and globalisation • Translation models • What is the inter-lingua? • for sentences, a logical language e.g. First Order Predicate Calculus
Meaning representation • Goals: 1. The semantic representation must give you a one-to-one mapping to non-linguistic knowledge of the world 2. The representation must be expressive, i.e. handle different types of data
Meaning representation • First Order Predicate Calculus • computationally tractable • objects (terms) • properties of objects • relations amongst objects • Predicate argument structure • large composite representations • logical connectives
Meaning representation • First Order Predicate Calculus • Object: referred to uniquely by a term • constant e.g. SurreyUniversity • function e.g. LocationOf(SurreyUniversity) • variable
Meaning representation • First Order Predicate Calculus • Relations amongst objects • Predicates: “symbols that refer to, or name, the relations that hold among some fixed number of objects” (J & M) • Educates(SurreyUniversity, Citizens) • two-place predicate
Meaning representation • First Order Predicate Calculus • Relations amongst objects • Predicates: • Can specify the category of an object • University(SurreyUniversity) • one-place predicate
Meaning representation • First Order Predicate Calculus • properties / parts of objects • functions: • LocationOf(SurreyUniversity)
Meaning representation • First Order Predicate Calculus • Composite representations through predicates and functions: Near(LocationOf(SurreyUniversity), LocationOf(Cathedral))
Meaning representation • First Order Predicate Calculus • Logical connectives • combine basic representations to form larger more complex representations, e.g. the ∧ operator = ‘and’
Meaning representation • First Order Predicate Calculus • Logical connectives • combine basic representations to form larger more complex representations: Educates(SurreyUniversity, Citizens) ∧ ¬ Remunerates(SurreyUniversity, Staff)
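The building blocks from the last few slides (constants, functions, predicates, connectives) can be modelled as small immutable Python classes. This is one possible encoding, offered as a sketch rather than a standard representation; the class names and the `_call` helper are hypothetical, and variables and quantifiers are left out.

```python
from dataclasses import dataclass
from typing import Tuple

def _call(name, args):
    """Render name(arg1, arg2, ...) in the notation used on the slides."""
    return f"{name}({', '.join(str(a) for a in args)})"

@dataclass(frozen=True)
class Constant:
    """A term that refers to exactly one object."""
    name: str
    def __str__(self):
        return self.name

@dataclass(frozen=True)
class Function:
    """A term built from other terms, e.g. LocationOf(...)."""
    name: str
    args: Tuple[object, ...]
    def __str__(self):
        return _call(self.name, self.args)

@dataclass(frozen=True)
class Predicate:
    """A relation over a fixed number of terms."""
    name: str
    args: Tuple[object, ...]
    @property
    def arity(self):
        return len(self.args)
    def __str__(self):
        return _call(self.name, self.args)

@dataclass(frozen=True)
class Not:
    operand: object
    def __str__(self):
        return f"¬{self.operand}"

@dataclass(frozen=True)
class And:
    left: object
    right: object
    def __str__(self):
        return f"{self.left} ∧ {self.right}"

surrey = Constant("SurreyUniversity")
near = Predicate("Near", (Function("LocationOf", (surrey,)),
                          Function("LocationOf", (Constant("Cathedral"),))))
educates = Predicate("Educates", (surrey, Constant("Citizens")))
pays = Predicate("Remunerates", (surrey, Constant("Staff")))
formula = And(educates, Not(pays))

print(near)            # Near(LocationOf(SurreyUniversity), LocationOf(Cathedral))
print(educates.arity)  # 2
print(formula)
# Educates(SurreyUniversity, Citizens) ∧ ¬Remunerates(SurreyUniversity, Staff)
```

Keeping functions and predicates as separate classes mirrors the distinction drawn on the slides: a function denotes an object and can be nested as an argument, whereas a predicate makes a claim that can be combined with connectives.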
Machine translation and globalisation • Machine translation and globalisation: change of priorities • 1954: IBM and Georgetown University, first MT demo • goal: ‘perfect’ translation • 1966: Automatic Language Processing Advisory Committee (ALPAC) report: damning of this goal • Post ALPAC • Goal: rough translation, involve human element • Current situation: online translation, e.g. Babel Fish, descendant of SYSTRAN whose goal was rough translation • Journal of Machine Translation
Next week • Globalisation as an industry • SDL and the SDLX-TRADOS globalisation application