
Extraction of Bilingual Information from Parallel Texts


Presentation Transcript


  1. Extraction of Bilingual Information from Parallel Texts Mike Rosner CSAW 2004

  2. Outline • Machine Translation • Traditional vs. Statistical Architectures • Experimental Results • Conclusions CSAW 2004

  3. Translational Equivalence: many:many relation between SOURCE and TARGET CSAW 2004

  4. Traditional Machine Translation CSAW 2004

  5. Remarks • Character of System • Knowledge-based. • High-quality results if domain is well delimited. • Knowledge takes the form of specialised rules (analysis; synthesis; transfer). • Problems • Limited coverage. • Knowledge acquisition bottleneck. • Extensibility. CSAW 2004

  6. Statistical Translation • Robust • Domain independent • Extensible • Does not require language specialists • Uses noisy channel model of translation CSAW 2004

  7. Noisy Channel Model: Sentence Translation (Brown et al. 1990). Diagram: the source sentence passes through a noisy channel to produce the target sentence. CSAW 2004

  8. The Problem of Translation • Given a sentence T of the target language, seek the sentence S from which a translator produced T, i.e. find S that maximises P(S|T) • By Bayes' theorem P(S|T) = P(S) * P(T|S) / P(T), whose denominator is independent of S. • Hence it suffices to maximise P(S) * P(T|S) CSAW 2004
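
The decision rule can be made concrete with a small Python sketch. The candidate set and all probability values below are invented for illustration; a real system would obtain P(S) from a language model and P(T|S) from a translation model.

    import math

    # Noisy-channel decision: pick the S maximising P(S) * P(T|S).
    # P(T) is ignored because it is the same for every candidate S.
    # All probabilities here are invented for illustration.
    p_s = {                      # language model P(S)
        "John loves Mary": 1e-4,
        "Mary loves John": 1e-5,
    }
    p_t_given_s = {              # translation model P(T|S) for T = "Jean aime Marie"
        "John loves Mary": 1e-2,
        "Mary loves John": 1e-3,
    }

    def score(s):
        # work in log space to avoid numerical underflow
        return math.log(p_s[s]) + math.log(p_t_given_s[s])

    print(max(p_s, key=score))   # -> John loves Mary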

  9. A Statistical MT System. Diagram: a source Language Model supplies P(S) and a Translation Model supplies P(T|S); together P(S) * P(T|S) = P(S,T). A Decoder takes the observed target sentence T and recovers the best source sentence S. CSAW 2004

  10. The Three Components of a Statistical MT model • Method for computing language model probabilities (P(S)) • Method for computing translation probabilities (P(T|S)) • Method for searching amongst source sentences for one that maximises P(S) * P(T|S) CSAW 2004

  11. Probabilistic Language Models • General: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * ... * P(sn|s1 ... s(n-1)) • Trigram: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * P(s3|s1,s2) * ... * P(sn|s(n-2),s(n-1)) • Bigram: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * ... * P(sn|s(n-1)) CSAW 2004
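
To illustrate the bigram case, the sketch below estimates P(w|previous word) by relative frequency from a tiny invented corpus (no smoothing, no sentence start/end markers); a real language model would need far more data and smoothing.

    from collections import Counter

    # Bigram language model estimated by relative frequency from a toy corpus.
    corpus = [["john", "loves", "mary"], ["mary", "loves", "john"]]

    unigrams = Counter(w for sent in corpus for w in sent)
    bigrams = Counter((sent[i], sent[i + 1]) for sent in corpus for i in range(len(sent) - 1))

    def p_bigram(w, prev):
        # P(w | prev) = count(prev, w) / count(prev)
        return bigrams[(prev, w)] / unigrams[prev]

    def p_sentence(sent):
        # P(s1 s2 ... sn) = P(s1) * product of P(si | s(i-1))
        p = unigrams[sent[0]] / sum(unigrams.values())
        for prev, w in zip(sent, sent[1:]):
            p *= p_bigram(w, prev)
        return p

    print(p_sentence(["john", "loves", "mary"]))   # 1/3 * 1/2 * 1/2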

  12. A Simple Alignment-Based Translation Model. Assumption: the target sentence is generated from the source sentence word by word. S: John loves Mary T: Jean aime Marie CSAW 2004

  13. Sentence Translation Probability • According to this model, the translation probability of the sentence is just the product of the translation probabilities of the words. • P(T|S) = P(Jean aime Marie|John loves Mary) = P(Jean|John) * P(aime|loves) * P(Marie|Mary) CSAW 2004
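
In Python this product is a short loop. The word translation probabilities below are invented for illustration, and a fixed one-to-one, position-for-position alignment is assumed as on the previous slide.

    # Sentence translation probability as a product of word translation
    # probabilities (illustrative values, fixed one-to-one alignment).
    t = {
        ("Jean", "John"): 0.9,
        ("aime", "loves"): 0.8,
        ("Marie", "Mary"): 0.9,
    }

    def p_translation(target_words, source_words):
        p = 1.0
        for tw, sw in zip(target_words, source_words):
            p *= t[(tw, sw)]
        return p

    print(p_translation(["Jean", "aime", "Marie"], ["John", "loves", "Mary"]))   # 0.648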

  14. More Realistic Example. English: The proposal will not now be implemented. French: Les propositions ne seront pas mises en application maintenant. CSAW 2004

  15. Some Further Parameters • Word Translation Probability: P(t|s) • Fertility: the number of words in the target that are paired with each source word (0 – N) • Distortion: the difference in sentence position between the source word and the target word, modelled as P(i|j,l), i.e. target position i given source position j and target length l CSAW 2004
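
As a sketch of how these parameter tables might be laid out in code: the word translation and fertility values are taken from the "not" tables on slide 19, while the distortion entry is a made-up placeholder; the dictionary layout itself is illustrative, not from the original system.

    # Parameter tables for the extended model (layout is illustrative).
    word_translation = {("pas", "not"): 0.469, ("ne", "not"): 0.460, ("non", "not"): 0.024}   # P(t|s)
    fertility = {("not", 2): 0.758, ("not", 0): 0.133, ("not", 1): 0.106}                     # P(fertility | s)
    distortion = {(5, 4, 9): 0.1}   # P(i|j,l): target position i, source position j, target length l (placeholder)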

  16. Searching • Maintain a list of hypotheses. Initial hypothesis: (Jean aime Marie | *) • Search proceeds iteratively. At each iteration we extend the most promising hypotheses with additional words: (Jean aime Marie | John(1) *), (Jean aime Marie | * loves(2) *), (Jean aime Marie | * Mary(3) *) CSAW 2004
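
One standard way to organise such a search is a beam search, sketched below. The score and expand functions are assumed to be supplied by the model (e.g. score = log P(S) + log P(T|S) of the partial hypothesis, expand = add one more source word), and the beam width is an arbitrary illustrative choice; this is a generic sketch, not the exact procedure used in the cited work.

    import heapq

    BEAM = 10   # illustrative beam width

    def beam_search(initial_hypothesis, score, expand, max_steps):
        # Hypotheses are partial source sentences; at each step the most
        # promising ones are extended with an additional word.
        beam = [initial_hypothesis]
        for _ in range(max_steps):
            candidates = [h2 for h in beam for h2 in expand(h)]
            if not candidates:
                break
            beam = heapq.nlargest(BEAM, candidates, key=score)   # keep the best few
        return max(beam, key=score)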

  17. Parameter Estimation • In general, large quantities of data are needed. • For the language model, we need only source language text. • For the translation model, we need pairs of sentences that are translations of each other. • Use the EM Algorithm (Baum 1972) to optimise the model parameters. CSAW 2004
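
The sketch below shows EM for word translation probabilities in the style of IBM Model 1 (no fertility or distortion), run on an invented three-sentence bitext; it is meant to illustrate the estimation idea, not to reproduce the full model of Brown et al.

    from collections import defaultdict

    # Toy parallel corpus: (French target sentence, English source sentence).
    bitext = [
        (["Jean", "aime", "Marie"], ["John", "loves", "Mary"]),
        (["Jean", "dort"], ["John", "sleeps"]),
        (["Marie", "dort"], ["Mary", "sleeps"]),
    ]

    # Initialise t(f|e) uniformly over the source vocabulary.
    src_vocab = {e for _, e_sent in bitext for e in e_sent}
    t = defaultdict(lambda: 1.0 / len(src_vocab))

    for _ in range(10):                          # EM iterations
        count = defaultdict(float)               # expected counts c(f, e)
        total = defaultdict(float)               # expected counts c(e)
        for f_sent, e_sent in bitext:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)      # normaliser for this target word
                for e in e_sent:
                    c = t[(f, e)] / z                   # E-step: fractional alignment count
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():                 # M-step: re-estimate t(f|e)
            t[(f, e)] = c / total[e]

    print(round(t[("Jean", "John")], 3))   # rises above the uniform initial value of 0.25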

  18. Experiment (Brown et al. 1990) • Hansard. 40,000 pairs of sentences = approx. 800,000 words in each language. • Considered the 9,000 most common words in each language. • Assumptions (initial parameter values) • each of the 9,000 target words equally likely as a translation of each of the source words • each of the fertilities from 0 to 25 equally likely for each of the 9,000 source words • each target position equally likely given each source position and target length CSAW 2004
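
A sketch of these uniform initial values in code; the function and constant names are hypothetical, and a real implementation would compute such defaults lazily rather than storing 9,000 x 9,000 tables.

    V = 9000              # words considered per language
    MAX_FERTILITY = 25

    def init_word_translation(t_word, s_word):
        # every target word equally likely as a translation of any source word
        return 1.0 / V

    def init_fertility(s_word, phi):
        # fertilities 0..25 equally likely for every source word
        return 1.0 / (MAX_FERTILITY + 1)

    def init_distortion(i, j, target_len):
        # every target position equally likely given source position and target length
        return 1.0 / target_len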

  19. English: not. Translation probabilities (French): pas .469, ne .460, non .024, pas du tout .003, faux .003, plus .002, ce .002, que .002, jamais .002. Fertility probabilities: 2 .758, 0 .133, 1 .106. CSAW 2004

  20. English: hear. Translation probabilities (French): bravo .992, entendre .005, entendu .002, entends .001. Fertility probabilities: 0 .584, 1 .416. CSAW 2004

  21. Bajada 2003/4 • 400 sentence pairs from the Malta/EU accession treaty • Three different types of alignment • Paragraph (precision 97%, recall 97%) • Sentence (precision 91%, recall 95%) • Word: 2 translation models • Model 1: distortion-independent • Model 2: distortion-dependent CSAW 2004

  22. Bajada 2003/4 CSAW 2004

  23. Conclusion/Future Work • Larger data sets • Finer models of word-to-word translation probabilities, taking into account • fertility • morphological variants of the same word • Role of, and tools for, a bilingual informant (not a linguistic specialist) CSAW 2004
