This presentation outlines machine translation (MT): its development and evaluation, and the architectural differences between rule-based and data-driven methods. It traces MT's history from the mid-1950s to the present, highlighting shifts in research focus and technological advances, including statistical and hybrid approaches, and discusses the general problems MT faces and its significance in natural language processing, a field of both academic and commercial interest.
Machine Translation Dai Xinyu 2006-10-27
Outline • Introduction • Architecture of MT • Rule-Based MT vs. Data-Driven MT • Evaluation of MT • Development of MT • MT problems in general • Some Thoughts on MT from Cognition
"I have a text in front of me which is written in Russian but I am going to pretend that it is really written in English and that it has been coded in some strange symbols. All I need do is strip off the code in order to retrieve the information contained in the text" Introduction • machine translation - the use of computers to translate from one language to another • The classic acid test for natural language processing. • Requires capabilities in both interpretation and generation. • About $10 billion spent annually on human translation. • http://www.google.com/language_tools?hl=en
Introduction - MT past and present • mid-1950s - 1965: Great expectations • 1966 - 1980: The dark ages for MT; academic research projects only • 1980s - 1990s: Successful specialized applications • 1990s: Human-machine cooperative translation • 1990s - now: Statistics-based MT, hybrid-strategy MT • Future prospects: ???
Interest in MT • Commercial interest: • U.S. has invested in MT for intelligence purposes • MT is popular on the web—it is the most used of Google’s special features • EU spends more than $1 billion on translation costs each year. • (Semi-)automated translation could lead to huge savings
Interest in MT • Academic interest: • One of the most challenging problems in NLP research • Requires knowledge from many NLP sub-areas, e.g., lexical semantics, parsing, morphological analysis, statistical modeling,… • Being able to establish links between two languages allows for transferring resources from one language to another
Related Areas of MT • Linguistics • Computer Science • AI • Compiler theory • Formal Semantics • … • Mathematics • Probability • Statistics • … • Informatics • Cognitive Science
Rule-Based MT vs. Data-Driven MT • Rule-Based MT • Data-Driven MT • Example-Based MT • Statistics-Based MT
Rule-Based MT (diagram): experts in linguistics, semantics, cognitive science, and AI write rules → rule base; natural language input → translation system → translation output
Translated documents (the intuition behind learning translations from data): "Hmm, every time he sees 'banco', he either types 'bank' or 'bench'… but if he sees 'banco de…', he always types 'bank', never 'bench'… Man, this is so boring."
Example-Based MT • origins: Nagao (1981) • first motivation: collocations, bilingual differences of syntactic structures • basic idea: • human translators search for analogies (similar phrases) in previous translations • MT should seek matching fragment in bilingual database, extract translations • aim to have less complex dictionaries, grammars, and procedures • improved generation (using actual examples of TL sentences)
EBMT today • Bilingual corpus collection • Storage • Searching and matching (see the sketch below) • …
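A minimal sketch of the "searching and matching" step above, assuming a tiny in-memory example base; the example_base contents, the best_match helper, and the use of difflib's SequenceMatcher are illustrative choices, not part of any specific EBMT system.

```python
# Toy illustration of the EBMT matching step: given a source-language
# fragment, find the most similar stored example and return its translation.
from difflib import SequenceMatcher

# Hypothetical bilingual example base: (source fragment, target fragment) pairs.
example_base = [
    ("he plays the piano", "il joue du piano"),
    ("she plays tennis", "elle joue au tennis"),
    ("he reads the newspaper", "il lit le journal"),
]

def best_match(query, examples):
    """Return the stored (source, target) pair most similar to the query."""
    def similarity(src):
        return SequenceMatcher(None, query.split(), src.split()).ratio()
    return max(examples, key=lambda pair: similarity(pair[0]))

src, tgt = best_match("he plays tennis", example_base)
print(src, "=>", tgt)   # nearest stored example and its translation
```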
Statistical MT Basics • Based on the assumption that translations exhibit statistical regularities • Origins: Warren Weaver (1949) • Shannon's information theory • The core process is a probabilistic 'translation model' taking SL words or phrases as input and producing TL words or phrases as output • A succeeding stage involves a probabilistic 'language model' which assembles the TL words into 'meaningful' TL sentences
Statistical MT (diagram): statistical learning builds the model – natural language input → learning system → probabilistic model; at run time, natural language input → prediction system → prediction
Statistical MT processes • Bilingual corpora: original and translation • Little or no linguistic 'knowledge'; based on word co-occurrences in SL and TL texts (of a corpus), relative positions of words within sentences, length of sentences • Alignment: sentences aligned statistically (according to sentence length and position) • Decoding: compute the probability that a TL string is the translation of a SL string ('translation model'), based on: • frequency of co-occurrence in aligned texts of the corpus • position of SL words in the SL string • Adjustment: compute the probability that a TL string is a valid TL sentence (based on a 'language model' of allowable bigrams and trigrams) • Search for the TL string that maximizes these probabilities: ê = argmax_e P(e|f) = argmax_e P(f|e) · P(e)
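A minimal sketch of the final search step (argmax_e P(f|e) · P(e)), assuming toy hand-filled translation_model and language_model tables and a pre-generated candidate list; real decoders search a vastly larger space with proper alignment models.

```python
import math

# Toy probability tables; in a real system these come from training.
translation_model = {("la", "the"): 0.7, ("maison", "house"): 0.8,
                     ("la", "her"): 0.1, ("maison", "home"): 0.2}
language_model = {("<s>", "the"): 0.4, ("the", "house"): 0.3,
                  ("<s>", "her"): 0.1, ("her", "home"): 0.05}

def score(f_words, e_words):
    """log P(f|e) + log P(e): translation model times bigram language model."""
    logp = 0.0
    for f, e in zip(f_words, e_words):               # naive 1-to-1 alignment
        logp += math.log(translation_model.get((f, e), 1e-6))
    for prev, cur in zip(["<s>"] + e_words[:-1], e_words):
        logp += math.log(language_model.get((prev, cur), 1e-6))
    return logp

source = ["la", "maison"]
candidates = [["the", "house"], ["her", "home"], ["the", "home"]]
best = max(candidates, key=lambda e: score(source, e))   # argmax_e P(f|e)P(e)
print(best)
```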
Language Modeling • Determines the probability of an English word sequence e of length l • P(e) is normally approximated as P(e) ≈ ∏_{i=1..l} P(w_i | w_{i−m}, …, w_{i−1}), where m is the size of the context, i.e. the number of previous words that are considered • m = 1: bigram language model • m = 2: trigram language model
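A minimal sketch of the m = 1 (bigram) approximation above, estimating P(word | previous word) from counts over a toy corpus; the corpus, the add-alpha smoothing, and the vocabulary size are illustrative assumptions.

```python
from collections import Counter

# Toy training corpus; a real language model is trained on millions of sentences.
corpus = [["the", "house", "is", "small"],
          ["the", "house", "is", "big"],
          ["the", "home", "is", "small"]]

bigrams, contexts = Counter(), Counter()
for sent in corpus:
    tokens = ["<s>"] + sent
    contexts.update(tokens[:-1])
    bigrams.update(zip(tokens[:-1], tokens[1:]))

def p_bigram(prev, word, alpha=0.1, vocab=20):
    """P(word | prev) with simple add-alpha smoothing."""
    return (bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * vocab)

def p_sentence(sent):
    """P(e) under the bigram approximation: product of P(w_i | w_{i-1})."""
    p = 1.0
    for prev, word in zip(["<s>"] + sent[:-1], sent):
        p *= p_bigram(prev, word)
    return p

print(p_sentence(["the", "house", "is", "big"]))
```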
Translation Modeling • Determines the probability that the foreign word f is a translation of the English word e • How to compute P(f | e) from a parallel corpus? • Statistical approaches rely on the co-occurrence of e and f in the parallel data: If e and f tend to co-occur in parallel sentence pairs, they are likely to be translations of one another
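One common way to estimate P(f | e) from sentence-aligned data is IBM Model 1, whose EM updates reward word pairs that co-occur consistently across the corpus; the sketch below is a compact toy version (no NULL word, no smoothing), not the full IBM pipeline.

```python
from collections import defaultdict

# Toy sentence-aligned corpus: (foreign sentence, English sentence) pairs.
parallel = [(["la", "maison"], ["the", "house"]),
            (["la", "fleur"], ["the", "flower"]),
            (["maison", "bleue"], ["blue", "house"])]

# Uniform initialization of t(f|e).
f_vocab = {f for fs, _ in parallel for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):                      # a few EM iterations
    count = defaultdict(float)           # expected counts c(f, e)
    total = defaultdict(float)           # expected counts c(e)
    for fs, es in parallel:
        for f in fs:                     # E-step: fractional alignment counts
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                delta = t[(f, e)] / norm
                count[(f, e)] += delta
                total[e] += delta
    for (f, e), c in count.items():      # M-step: re-estimate t(f|e)
        t[(f, e)] = c / total[e]

print(round(t[("maison", "house")], 3))  # converges toward a high probability
```

Raw co-occurrence counts over-credit frequent words such as "the"; the EM re-estimation shares credit across the possible alignments, which is why consistently co-occurring pairs end up with high t(f|e).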
SMT issues • Ignores previous MT research (new start, new 'paradigm') • Basically a 'direct' approach: replaces each SL word by the most probable TL word, then reorders TL words; decoding is effectively a kind of 'back translation' • Originally wholly word-based (IBM 'Candide', 1988); now predominantly phrase-based (i.e. alignment of word groups); some research on syntax-based models • Mathematically simple, but requires a huge amount of training data (large databases) • Problems for SMT: translation is not just selecting the most frequent 'equivalent' (wider context matters); no quality control of corpora; lack of monolingual data for some languages; insufficient bilingual data (the Internet as a resource); lack of structural information about language • Merit of SMT: evaluation as an integral part of system development
Rule-Based MT & SMT • SMT black box: no way of finding how it works in particular cases, why it succeeds sometimes and not others • RBMT: rules and procedures can be examined • RBMT and SMT are apparent polar opposites, but gradually ‘rules’ incorporated in SMT models • first, morphology (even in versions of first IBM model) • then, ‘phrases’ (with some similarity to linguistic phrases) • now also, syntactic parsing
Rule-Based MT & SMT • Comparison from following perspectives: • Theory background • Knowledge expression • Knowledge discovery • Robust • Extension • Development Cycle
Evaluation of MT • Manual evaluation: accuracy / fluency / completeness • 信 达 雅 (faithfulness, expressiveness, elegance) • Automatic evaluation: • BLEU: percentage of word sequences (n-grams) in the output that also occur in reference translations • NIST
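A minimal sketch of the BLEU idea in the bullet above: clipped n-gram precision combined with a brevity penalty, here against a single reference and without the smoothing a production implementation would use.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Counter of all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference: geometric mean of
    clipped n-gram precisions times a brevity penalty."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())   # clipped counts
        log_prec += math.log(max(overlap, 1e-9) / max(sum(cand.values()), 1))
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))  # brevity penalty
    return bp * math.exp(log_prec / max_n)

ref = "the house is small".split()
hyp = "the house is very small".split()
print(round(bleu(hyp, ref), 3))
```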
MT Development - Research (diagram): approaches plotted along two axes. Knowledge Acquisition Strategy: from all manual (hand-built by experts, hand-built by non-experts) to fully automated (learned from annotated data, learned from un-annotated data). Knowledge Representation Strategy: from shallow/simple (word-based only, electronic dictionaries, phrase tables, example-based MT, original statistical MT, original direct approach) to deep/complex (syntactic constituent structure, typical transfer systems, semantic analysis, interlingua, classic interlingual systems). "New research goes here!" marks the region combining automated acquisition with deeper representation.
MT problems in general • Characteristics of language • Ambiguity • Dynamism • Flexibility • Knowledge • How to represent it • How to discover it • How to use it
Some Thoughts on MT from Cognition • The human brain • Memory • Progress through learning • Models • Patterns • Translation by humans… • Translation by machines…
Further Reading • Arturo Trujillo, Translation Engines: Techniques for Machine Translation, Springer-Verlag London Limited, 1999 • P. F. Brown, et al., A Statistical Approach to Machine Translation, Computational Linguistics, 1990, 16(2) • P. F. Brown, et al., The Mathematics of Statistical Machine Translation: Parameter Estimation, Computational Linguistics, 1993, 19(2) • Bonnie J. Dorr, et al., Survey of Current Paradigms in Machine Translation • Makoto Nagao, A Framework of a Mechanical Translation between Japanese and English by Analogy Principle, in A. Elithorn and R. Banerji (Eds.), Artificial and Human Intelligence, NATO Publications, 1984 • W. J. Hutchins, Machine Translation: Past, Present, Future, Chichester: Ellis Horwood, 1986 • Daniel Jurafsky & James H. Martin, Speech and Language Processing, Prentice-Hall, 2000 • Christopher D. Manning & Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999 • James Allen, Natural Language Understanding, The Benjamin/Cummings Publishing Company, Inc., 1987