
Postgraduate Diploma In Translation


Presentation Transcript


  1. Postgraduate Diploma in Translation: Example-Based Machine Translation and Statistical Machine Translation (Machine Translation II.2)

  2. Three ways to lighten the load • Restrict coverage to specialised domains • Exploit existing sources of knowledge (convert machine-readable dictionaries) • Try to manage without explicit representations: Example-Based MT (EBMT) and Statistical MT (SMT)

  3. Today's Lecture • Example-Based MT • Statistical MT

  4. Part I: Example-Based Machine Translation

  5. EBMT • The basic idea is that instead of being based on rules and abstract representations, translation should be based on a database of examples. • Each example is a pairing of a source fragment with a target fragment. • The original intuition came from Nagao, a well-known pioneer in the field of English-Japanese translation.

  6. EBMT (Nagao 1984) Man does translation: • by properly decomposing an input sentence into certain fragmental phrases, then • by translating these phrases into other language phrases, and finally • by properly composing these fragmental translations into one long sentence.

  7. Three-Step Process • Match: identify relevant source language examples in the database. • Align: find the corresponding fragments in the target language. • Recombine: put the target language fragments together to form sentences.

  8. EBMT: Example-Based Machine Translation as used in the Pangloss system at Carnegie Mellon University. Based on notes by Dave Inman.

  9. EBMT Corpus & Index
  Corpus:
  S1: The cat eats a fish. / Le chat mange un poisson.
  S2: A dog eats a cat. / Un chien mange un chat.
  ... S99,999,999
  Index: the: S1 | cat: S1, S2 | eats: S1, S2 | fish: S1 | dog: S2
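
A minimal Python sketch of such an inverted index (the corpus pairs and variable names are illustrative, not Pangloss's actual data structures):

    # Map each source-language word to the IDs of the sentence pairs
    # that contain it (illustrative sketch).
    corpus = {
        "S1": ("the cat eats a fish", "le chat mange un poisson"),
        "S2": ("a dog eats a cat", "un chien mange un chat"),
    }

    index = {}
    for sid, (source, target) in corpus.items():
        for word in source.split():
            index.setdefault(word, set()).add(sid)

    print(index["cat"])   # {'S1', 'S2'}
    print(index["fish"])  # {'S1'}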

  10. EBMT: find chunks • A source language sentence is input: The cat eats a dog. • Chunks of this sentence are matched against the corpus: The cat: S1 | The cat eats: S1 | The cat eats a: S1 | a dog: S2
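
One way to implement the chunk lookup, sketched against the index above. For brevity this checks only that all of a chunk's words co-occur in some corpus sentence, not that they are adjacent there; that is an assumption, not Pangloss's exact matching rule:

    def find_chunks(sentence, index, min_len=2):
        """Return (chunk, matching sentence IDs) for every word sequence of
        at least min_len words whose words all occur in a common sentence."""
        words = sentence.lower().split()
        chunks = []
        for i in range(len(words)):
            for j in range(i + min_len, len(words) + 1):
                chunk = words[i:j]
                # Intersect the posting sets of all words in the chunk.
                sids = set.intersection(*(index.get(w, set()) for w in chunk))
                if sids:
                    chunks.append((" ".join(chunk), sids))
        return chunks

    print(find_chunks("The cat eats a dog", index))
    # includes ('the cat', {'S1'}), ('the cat eats a', {'S1'}), ('a dog', {'S2'})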

  11. Match and Align Chunks • For each chunk, retrieve the target sentence: the cat eats: S1 (The cat eats a fish. / Le chat mange un poisson.) | a dog: S2 (A dog eats a cat. / Un chien mange un chat.) • The chunks are then aligned with the target sentences: The cat eats ↔ Le chat mange (from Le chat mange un poisson). • Alignment is difficult.

  12. Recombination • Chunks are scored to find good matches: The cat eats / Le chat mange: score 78% | The cat eats / Le chat dort: score 43% | a dog / un chien: score 67% | a dog / le chien: score 56% | a dog / un arbre: score 22% • The best-scoring chunk translations are put together to make the final translation: The cat eats / Le chat mange + a dog / un chien
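
In code, this recombination step might look like the following sketch (the scores are the made-up percentages from the slide):

    # Candidate chunk translations with their match scores.
    scored = [
        ("the cat eats", "le chat mange", 0.78),
        ("the cat eats", "le chat dort", 0.43),
        ("a dog", "un chien", 0.67),
        ("a dog", "le chien", 0.56),
        ("a dog", "un arbre", 0.22),
    ]

    # Keep the best-scoring translation for each source chunk.
    best = {}
    for src, tgt, score in scored:
        if score > best.get(src, ("", 0.0))[1]:
            best[src] = (tgt, score)

    # Concatenate the winners in source order; a real system must also
    # handle overlapping chunks and target-language word order.
    print(" ".join(best[c][0] for c in ["the cat eats", "a dog"]))
    # -> le chat mange un chien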

  13. What Data Are Needed? • A bilingual dictionary... but we can induce this from the corpus. • A target language root/synonym list... so we can see the similarity between words and their inflected forms (e.g. verbs). • Classes of words easily translated... such as numbers, towns, weekdays. • A large corpus of parallel sentences... if possible in the same domain as the translations.

  14. How to create a bilingual lexicon • Take each sentence pair in the corpus. • For each word in the source sentence, pair it with each word in the target sentence and increment the frequency count. • Repeat for as many sentences as possible. • Use a threshold to get plausible alternative translations.
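
A minimal sketch of this counting procedure (a toy two-sentence corpus; a real system would run over millions of pairs, and with this few sentences the counts are still dominated by noise):

    from collections import defaultdict

    pairs = [
        ("the cat eats a fish", "le chat mange un poisson"),
        ("a dog eats a cat", "un chien mange un chat"),
    ]

    # counts[s][t] = how often source word s co-occurs with target word t.
    counts = defaultdict(lambda: defaultdict(int))
    for source, target in pairs:
        for s_word in source.split():
            for t_word in target.split():
                counts[s_word][t_word] += 1

    threshold = 2  # illustrative; in practice tuned on the corpus
    for s_word, translations in sorted(counts.items()):
        kept = [t for t, n in translations.items() if n >= threshold]
        if kept:
            print(s_word, "->", kept)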

  15. How to create a lexicon: example pair. The cat eats a fish. / Le chat mange un poisson. (Each English word is paired with every French word in the pair, and the counts incremented.)

  16. After many sentences... candidate translations of "the", by co-occurrence count:
  le 956, la 925, un 235
  ------ threshold ------
  chat 47, mange 33, poisson 28, ..., arbre 18

  17. After many sentences... candidate translations of "cat":
  chat 963
  ------ threshold ------
  le 604, la 485, un 305, mange 33, poisson 28, ..., arbre 47

  18. Indexing the Corpus • For speed, the corpus is indexed on the source language sentences. • Each word in each source language sentence is stored with information about the target sentence. • Words can be added to the corpus and the index easily updated. • Tokens are used for common classes of words (e.g. numbers); this makes matching more effective.

  19. Finding Chunks to Translate • Look up each word of the source sentence in the index. • Look for chunks in the source sentence (at least two adjacent words) which match the corpus. • Select the last few matches against the corpus (translation memory). • Pangloss uses the last 5 matches for any chunk.

  20. Matching a chunk against the target • For each source chunk found previously, retrieve the target sentences from the corpus (using the index). • Try to find the translation of the source chunk within these sentences. • This is the hard bit! • Look for the minimum and maximum segments in the target sentences which could correspond to the source chunk, and score each of these segments.

  21. Scoring a segment • Unmatched words: higher priority is given to sentences containing all the words in an input chunk. • Noise: higher priority is given to corpus sentences which have fewer extra words. • Order: higher priority is given to sentences containing the input words in an order closer to their order in the input chunk. • Morphology: higher priority is given to sentences in which words match exactly rather than as morphological variants.
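
The slide names the criteria but not how they are combined; below is a hedged sketch in which the four terms are computed crudely and mixed with invented weights (Pangloss's actual formula is not given in these notes):

    def score_segment(chunk, segment, exact_matches):
        """chunk, segment: lists of words; exact_matches: how many chunk
        words matched without morphological relaxation. The weights are
        illustrative assumptions, not Pangloss's."""
        c, s = set(chunk), set(segment)
        coverage = len(c & s) / len(c)             # unmatched words
        noise = 1.0 - len(s - c) / max(len(s), 1)  # extra words
        # Order: fraction of adjacent chunk-word pairs still adjacent,
        # in the same order, in the segment.
        adj = list(zip(chunk, chunk[1:]))
        kept = sum(1 for a, b in adj if a in segment and b in segment
                   and segment.index(b) - segment.index(a) == 1)
        order = kept / max(len(adj), 1)
        morphology = exact_matches / len(chunk)
        return 0.4 * coverage + 0.2 * noise + 0.2 * order + 0.2 * morphology

    print(round(score_segment(["le", "chat", "mange"],
                              ["le", "chat", "mange", "un", "poisson"], 3), 2))
    # -> 0.92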

  22. Whole Sentence Match • If we are lucky, the whole sentence will be found in the corpus! • In that case the target sentence is used directly, with no alignment needed. • Useful if a translation memory is available (recently translated sentences are added to the corpus).

  23. Quality of Translation • Pangloss was tested on source sentences from a different domain than the examples in the corpus. • Pangloss "covered" about 70% of the sentences input. • This means a match was found against the corpus... • ...but not necessarily a good match. • Others report that around 60% of the translation can be understood by a native speaker; Systran manages about 70%.

  24. Speed of Translation • Translations are much faster than with Systran. • Simple sentences are translated in seconds. • The corpus (translation memory) can be extended at about 6 MB per minute (Sun SPARCstation). • A 270 MB corpus takes 45 minutes to index.

  25. Positive Points • Fast • Easy to add a new language pair • No need to analyse the languages (much) • Can induce a dictionary from the corpus • Allows easy implementation of translation memory

  26. Negative Points • Quality is second-best at present • Depends on a large corpus of parallel, well-translated sentences • 30% of the source has no coverage (no translation) • Matching of words is brittle: we can see a match that Pangloss cannot • The domain of the corpus should match the domain to be translated, so that chunks match

  27. Conclusions • An alternative to Systran • Faster • Lower quality • Quick to develop for a new language pair, if a corpus exists! • Needs no linguistics • Might improve as bigger corpora become available?

  28. Part II: Statistical Translation

  29. Statistical Translation • Robust • Domain independent • Extensible • Does not require language specialists • Uses the noisy channel model of translation

  30. Noisy Channel Model of Sentence Translation (Brown et al. 1990). [Diagram: the source sentence passes through a noisy channel (translation) to yield the target sentence; the task is to recover the source from the target.]

  31. Basic Principle • John loves Mary (S); Jean aime Marie (T) • Given T, I have to find the S such that Ptrans × Ps is greater than for any other S', where • Ptrans = the probability that T is a translation of S • Ps = the probability of S • In other words: choose S to maximise Ptrans(T|S) × Ps(S).

  32. A Statistical MT System. [Diagram: a Source Language Model assigns Ps to source sentences S; a Translation Model assigns Ptrans to pairs (T, S); given T, the Decoder searches for the S that maximises Ptrans × Ps.]

  33. The Three Components of a Statistical MT Model • A method for computing language model probabilities (Ps) • A method for computing translation probabilities (Ptrans) • A method for searching amongst source sentences for the one that maximises Ptrans × Ps

  34. Simplest Language Model • The probability Ps of any sentence is the product of the probabilities of the words in it. • For example, the probability of John loves Mary = P(John) × P(loves) × P(Mary)
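
A sketch with invented word probabilities (this is a unigram model; real systems use n-gram models, but the product structure is the same):

    # Toy unigram language model: Ps = product of the word probabilities.
    p_word = {"John": 0.001, "loves": 0.002, "Mary": 0.001}  # made-up values

    def p_s(sentence):
        prob = 1.0
        for word in sentence.split():
            prob *= p_word.get(word, 1e-7)  # small floor for unseen words
        return prob

    print(p_s("John loves Mary"))  # ≈ 2e-09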

  35. Simplest Translation Model (1) • Assumption: the target sentence is generated from the source sentence word by word • S: John loves Mary • T: Jean aime Marie

  36. Simplest Translation Model (2) • Ptrans is just the product of the translation probabilities of each of the words. • Ptrans = P(Jean|John) × P(aime|loves) × P(Marie|Mary)
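
And correspondingly for the translation model, with invented lexical probabilities; the decoder's job (next slides) is then to find the S maximising Ptrans × Ps:

    # Toy word-for-word translation model: Ptrans = product of the
    # per-word translation probabilities (values invented for illustration).
    p_lex = {("Jean", "John"): 0.9, ("aime", "loves"): 0.8,
             ("Marie", "Mary"): 0.9}

    def p_trans(target, source):
        prob = 1.0
        for t, s in zip(target.split(), source.split()):
            prob *= p_lex.get((t, s), 1e-7)
        return prob

    print(p_trans("Jean aime Marie", "John loves Mary"))  # ≈ 0.648
    # Noisy channel score: p_trans(T, S) * p_s(S), with p_s from the
    # previous sketch.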

  37. More Realistic Example • The proposal will not now be implemented • Les propositions ne seront pas mises en application maintenant • (Note that words are reordered, and one source word may yield several target words.)

  38. More Realistic Translation Models • Better translation models include other features, such as: • Fertility: the number of words in the target that are paired with each source word (0 to N) • Distortion: the difference in sentence position between a source word and the target word it produces
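
As a rough illustration of how these two features enter the score, a sketch with an invented distortion decay and a fertility table taken from slide 43 (real models, e.g. the IBM models, define both terms much more carefully):

    # Illustrative fertility and distortion terms, not the IBM formulas.
    p_fertility = {("not", 2): 0.758, ("not", 0): 0.133, ("not", 1): 0.106}

    def fertility_prob(word, n):
        return p_fertility.get((word, n), 0.001)  # floor for unseen pairs

    def distortion_prob(src_pos, tgt_pos, alpha=0.5):
        # Penalise target words that land far from their source position.
        return alpha ** abs(src_pos - tgt_pos)

    # "not" producing the two words "ne ... pas" (fertility 2), one of
    # them at the source position and one three positions away:
    print(fertility_prob("not", 2) * distortion_prob(2, 2) * distortion_prob(2, 5))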

  39. Searching • Maintain a list of hypotheses. Initial hypothesis: (Jean aime Marie | *) • Search proceeds iteratively. At each iteration we extend the most promising hypotheses with additional words:
  (Jean aime Marie | John(1) *)
  (Jean aime Marie | * loves(2) *)
  (Jean aime Marie | * Mary(3) *)
  (Jean aime Marie | Jean(1) *)
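
A hedged sketch of such a search in the style of a beam search: hypotheses are partial source sentences, each extension adds one candidate word, and only the highest-scoring hypotheses survive each iteration (the scoring function and beam width here are placeholders, not Brown et al.'s decoder):

    import heapq

    def decode(target, candidates, score, beam=5, steps=3):
        """Grow partial source hypotheses one word at a time, keeping
        the `beam` most promising at each step."""
        hyps = [()]  # start from the empty hypothesis, i.e. "... | *"
        for _ in range(steps):
            extended = [h + (w,) for h in hyps for w in candidates]
            hyps = heapq.nlargest(beam, extended,
                                  key=lambda h: score(h, target))
        return max(hyps, key=lambda h: score(h, target))

    # Usage would pass score = lambda S, T: p_trans(T, S) * p_s(S), built
    # from the earlier sketches.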

  40. Building Models • In general, large quantities of data are needed. • For the language model, we need only source language text. • For the translation model, we need pairs of sentences that are translations of each other. • Use the EM algorithm (Baum 1972) to optimise the model parameters.
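
To make the EM step concrete, here is a minimal IBM-Model-1-style training loop (word-translation probabilities only, no fertility or distortion; a textbook simplification of what Brown et al. actually trained):

    from collections import defaultdict

    pairs = [("the cat".split(), "le chat".split()),
             ("the dog".split(), "le chien".split())]

    # Initialise uniformly: every French word is equally likely for
    # every English word.
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # t[(f, e)] = P(f|e)

    for _ in range(20):  # EM iterations
        count = defaultdict(float)  # expected counts
        total = defaultdict(float)
        for es, fs in pairs:
            for f in fs:  # E-step: share each f's count among the e's
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():  # M-step: renormalise
            t[(f, e)] = c / total[e]

    print(round(t[("le", "the")], 3))    # approaches 1.0
    print(round(t[("chat", "cat")], 3))  # approaches 1.0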

  41. Experiment 1 (Brown et al. 1990) • Hansard corpus: 40,000 pairs of sentences, approx. 800,000 words in each language. • Considered the 9,000 most common words in each language. • Assumptions (initial parameter values): • each of the 9,000 target words is equally likely as a translation of each of the source words • each of the fertilities from 0 to 25 is equally likely for each of the 9,000 source words • each target position is equally likely given each source position and target length

  42. English: the
  French: le .610, la .178, l' .083, les .023, ce .013, il .012, de .009, à .007, que .007
  Fertility: 1 .871, 0 .124, 2 .004

  43. English: not
  French: pas .469, ne .460, non .024, pas du tout .003, faux .003, plus .002, ce .002, que .002, jamais .002
  Fertility: 2 .758, 0 .133, 1 .106

  44. English: hear
  French: bravo .992, entendre .005, entendu .002, entends .001
  Fertility: 0 .584, 1 .416

  45. Experiment 2 • Translation was performed using the 1,000 most frequent words in the English corpus. • The 1,700 French words most frequently used in translations of sentences completely covered by the 1,000-word English vocabulary. • 117,000 pairs of sentences completely covered by both vocabularies. • Parameters of the English language model estimated from 570,000 sentences in the English part.

  46. Experiment 2 (continued) • 73 French sentences from elsewhere in the corpus were tested. Results were classified as: • Exact: same as the actual translation • Alternate: same meaning • Different: a legitimate translation, but with a different meaning • Wrong: could not be interpreted as a translation • Ungrammatical: grammatically deficient • Corrections to the last three categories were made and the keystrokes were counted

  47. Results. [Results table not reproduced in the transcript; the figures are summarised on the next slide.]

  48. Results: Discussion • According to Brown et al., the system performed successfully 48% of the time (the first three categories). • 776 keystrokes were needed to repair the output, against 1,916 keystrokes to generate all 73 translations from scratch. • According to the authors, the system therefore reduces the work by about 60%.

  49. Bibliography • Statistical MT: Brown et al., "A Statistical Approach to Machine Translation", Computational Linguistics 16(2), 1990, pp. 79-85 (available via the ACL Anthology).
