
Semi-Automatic Learning of Transfer Rules for Machine Translation of Low-Density Languages






Presentation Transcript


  1. Semi-Automatic Learning of Transfer Rules for Machine Translation of Low-Density Languages Katharina Probst April 5, 2002

  2. Overview of the talk • Introduction and Motivation • Overview of the AVENUE project • Elicitation of bilingual data • Rule Learning • Seed Generation • Seeded Version Space Learning • Conclusions and Future Work

  3. Overview of the talk • Introduction and Motivation • Overview of the AVENUE project • Elicitation of bilingual data • Rule Learning • Seed Generation • Seeded Version Space Learning • Conclusions and Future Work

  4. Introduction and Motivation • Basic idea: opening up Machine Translation to minority languages • Scarce resources for minority languages: • Bilingual text • Monolingual text • Target language grammar • Because of these scarce resources, statistical and example-based methods will likely not perform as well • Our approach: • A system that elicits the necessary information about the target language from a bilingual informant • The elicited information is used in conjunction with any other available target language information to learn syntactic transfer rules

  5. System overview [Architecture diagram: the Run-Time Module (SL Parser, Transfer Engine, Unifier Module, EBMT Engine, TL Generator) turns SL Input into TL Output using the Transfer Rules; the Learning Module (Elicitation Process with the User, SVS Learning Process) produces those Transfer Rules]

  6. Overview of the talk • Introduction and Motivation • Overview of the AVENUE project • Elicitation of bilingual data • Rule Learning • Seed Generation • Seeded Version Space Learning • Conclusions and Future Work

  7. Elicitation • Elicitation is the process of presenting a bilingual speaker with sets of sentences; the user translates the sentences and specifies how the words align • The elicitation process serves multiple purposes: • Collection of data • Feature detection

  8. Feature Detection • Feature detection is a process by which the learning module answers questions such as “Does the target language mark number on nouns?” • The elicitation corpus is organized in minimal pairs, i.e. pairs of sentences that differ in only one feature. For example: • You (John) are falling. [2nd person m, subj, present tense] • You (Mary) are falling. [2nd person f, subj, present tense] • You (Mary) fell. [2nd person f, subj, past tense] • Sentences 1 and 2 and sentences 2 and 3 are minimal pairs. • By comparing the translations of “you” across such a pair, the system gets an indication of whether the differing feature (here gender or tense) is marked in the target language. • The results of feature detection will be used to guide the system in navigating the elicitation corpus, eliminating parts of it based on Implicational Universals • The results will also be used by the rule learning module
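The comparison described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the AVENUE system's actual code: the feature-dictionary encoding, the function name, and the toy Spanish-like data are all assumptions made for the example.

```python
# Hedged sketch: feature detection from minimal pairs.
# A minimal pair differs in exactly one feature; if the target-language
# translations of the aligned word also differ, that feature is (by this
# evidence) marked in the target language.

def feature_marked(pair, word_index):
    """pair = ((features1, target_words1), (features2, target_words2)).
    Returns (differing_feature, is_marked)."""
    (feats1, words1), (feats2, words2) = pair
    # The single feature on which the two sentences differ.
    diff = [f for f in feats1 if feats1[f] != feats2[f]]
    assert len(diff) == 1, "not a minimal pair"
    return diff[0], words1[word_index] != words2[word_index]

# Toy Spanish-like data: "you (m)" and "you (f)" both surface as "tu",
# so this pair gives no evidence that gender is marked on the pronoun.
pair = (
    ({"person": 2, "gender": "m", "tense": "pres"}, ["tu", "caes"]),
    ({"person": 2, "gender": "f", "tense": "pres"}, ["tu", "caes"]),
)
print(feature_marked(pair, 0))  # ('gender', False)
```

A real system would run this comparison over many pairs and aggregate the evidence, since a single pair can be misleading (e.g. syncretic forms).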

  9. More on the elicitation corpus • Eliciting data from bilingual informants entails a number of challenges: • The bilingual informant him/herself • Morphology and the lexicon • Learning grammatical features • Compositional elicitation • Elicitation of non-compositional data • Verb subcategorization • Alignment issues • Bias towards the source language

  10. Overview of the talk • Introduction and Motivation • Overview of the AVENUE project • Elicitation of bilingual data • Rule Learning • Seed Generation • Seeded Version Space Learning • Conclusions and Future Work

  11. Rule Learning in the AVENUE project - Introduction • The goal is to semi-automatically (i.e. with the help of the user) infer syntactic transfer rules • Rule learning can be divided into two main steps: • Seed Generation: The system produces an initial “guess” at a transfer rule based on only one sentence. The produced rule is quite specific to the input sentence. • Version Space Learning: Here, the system takes the seed rules and generalizes them.

  12. Transfer rule formalism A transfer rule (TR) consists of the following components: • The source language and target language sentences that the TR was produced from • Word alignments • Phrase information such as NP, S, … • Part-of-Speech sequences for source and target language • X-side constraints, i.e. constraints on the source language. These are used for parsing. • Y-side constraints, i.e. constraints on the target language. These are used for generation. • XY-constraints, i.e. constraints that transfer features from the source to the target language. These are used for transfer.
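The components listed above can be collected into a simple data structure. This is a minimal Python sketch with illustrative field names; it is not the AVENUE system's actual formalism, and the constraint encoding (tuples inside sets) is an assumption made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class TransferRule:
    sl_sentence: str      # source-language sentence the rule was produced from
    tl_sentence: str      # target-language sentence
    alignments: list      # word alignments, e.g. [(0, 0), (1, 1)]
    phrase: str           # phrase label such as "NP" or "S"
    x_pos: list           # source-side POS sequence
    y_pos: list           # target-side POS sequence
    x_constraints: set = field(default_factory=set)   # source side, for parsing
    y_constraints: set = field(default_factory=set)   # target side, for generation
    xy_constraints: set = field(default_factory=set)  # feature transfer

rule = TransferRule("the man", "der Mann", [(0, 0), (1, 1)], "NP",
                    ["DET", "N"], ["DET", "N"],
                    x_constraints={("val", "X2", "agr", "*3sg")})
print(rule.phrase, rule.x_pos)  # NP ['DET', 'N']
```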

  13. Seed Generation

  14. A word on compositionality • Basic idea: if you produce a transfer rule for a sentence, and there already exist transfer rules that can translate parts of the sentence, why not use them? • Adjust the alignments, part-of-speech sequences, and the constraints accordingly • The trickiest part is finding new constraints that cannot be stated in the lower-level rule but are necessary for a correct translation in the context of the full sentence

  15. Clustering • Seed rules are “clustered” into groups that warrant an attempt at merging • Clustering criteria: POS sequences, phrase information, alignments • Main reason for clustering: divide the large version space into a number of smaller version spaces and run the algorithm on each version space separately • Possible danger: rules that should be considered together (such as “the man”, “men”) will not be
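The clustering criteria above amount to grouping seed rules by a compound key. A minimal sketch, assuming a plain-dictionary rule representation (the field names are illustrative, not the AVENUE system's):

```python
from collections import defaultdict

# Hedged sketch: cluster seed rules whose phrase label, POS sequences, and
# alignments all match, so each cluster defines its own smaller version space.
def cluster(seed_rules):
    clusters = defaultdict(list)
    for rule in seed_rules:
        key = (rule["phrase"], tuple(rule["x_pos"]),
               tuple(rule["y_pos"]), tuple(rule["alignments"]))
        clusters[key].append(rule)
    return list(clusters.values())

seeds = [
    {"phrase": "NP", "x_pos": ["DET", "N"], "y_pos": ["DET", "N"],
     "alignments": [(0, 0), (1, 1)], "id": "the man"},
    {"phrase": "NP", "x_pos": ["DET", "N"], "y_pos": ["DET", "N"],
     "alignments": [(0, 0), (1, 1)], "id": "the woman"},
    # Different POS sequence: lands in its own cluster, illustrating the
    # danger noted above ("the man" and "men" are never compared).
    {"phrase": "NP", "x_pos": ["N"], "y_pos": ["N"],
     "alignments": [(0, 0)], "id": "men"},
]
print([len(c) for c in cluster(seeds)])  # [2, 1]
```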

  16. The Version Space • A set of seed rules in a cluster defines a version space as follows: the seed rules form the specific boundary (S). A virtual rule with the same POS sequences, alignments, and phrase information, but no constraints, forms the general boundary (G). [Diagram: the G boundary (the virtual rule with no constraints) at the top; generalizations of the seed rules, more specific than the rule in G, in between; the S boundary (the seed rules) at the bottom]

  17. The partial ordering of rules in the version space • A rule TR2 is said to be strictly more general than another rule TR1 if the set of f-structures that satisfy TR2 is a strict superset of the set of f-structures that satisfy TR1. It is said to be equivalent to TR1 if the two sets are the same. • We have defined three operations that move a transfer rule to a strictly more general rule
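Since the f-structure definition above is hard to test directly, one common proxy is to compare constraint sets: a rule with strictly fewer constraints is satisfied by strictly more f-structures. This is only a sketch under that assumption (it does not capture all cases, e.g. Operation 3 below replaces constraints rather than deleting them), and the tuple encoding of constraints is invented for the example:

```python
# Hedged proxy: encode each constraint as a tuple and compare rules by
# constraint-set inclusion. A proper subset of constraints means the rule
# admits a strict superset of f-structures, hence is strictly more general.
def strictly_more_general(tr2, tr1):
    return tr2 < tr1          # proper-subset test on constraint sets

def equivalent(tr2, tr1):
    return tr2 == tr1

tr1 = {("val", "X1", "agr", "*3pl"), ("agr", "X1", "X2", "agr")}
tr2 = {("agr", "X1", "X2", "agr")}
print(strictly_more_general(tr2, tr1))  # True
```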

  18. Generalization operations • Operation 1: delete value constraint, e.g. ((X1 agr) = *3pl) → NULL • Operation 2: delete agreement constraint, e.g. ((X1 agr) = (X2 agr)) → NULL • Operation 3: merge two value constraints to an agreement constraint ((X1 agr) = *3pl) , ((X2 agr) = *3pl) → ((X1 agr) = (X2 agr))
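The three operations can be sketched over an explicit constraint-set encoding. The tuple representation below (("val", index, feature, literal) for value constraints, ("agr", index, index, feature) for agreement constraints) is an assumption made for the example, not the system's actual notation:

```python
# ((X1 agr) = *3pl)        -> ("val", "X1", "agr", "*3pl")
# ((X1 agr) = (X2 agr))    -> ("agr", "X1", "X2", "agr")

def op1_delete_value(constraints, c):
    """Operation 1: drop a value constraint."""
    assert c[0] == "val" and c in constraints
    return constraints - {c}

def op2_delete_agreement(constraints, c):
    """Operation 2: drop an agreement constraint."""
    assert c[0] == "agr" and c in constraints
    return constraints - {c}

def op3_merge_values(constraints, c1, c2):
    """Operation 3: replace two value constraints that share a feature and
    a value with the corresponding agreement constraint."""
    assert c1[0] == c2[0] == "val" and c1[2:] == c2[2:]
    return (constraints - {c1, c2}) | {("agr", c1[1], c2[1], c1[2])}

cs = {("val", "X1", "agr", "*3pl"), ("val", "X2", "agr", "*3pl")}
print(op3_merge_values(cs, ("val", "X1", "agr", "*3pl"),
                           ("val", "X2", "agr", "*3pl")))
# {('agr', 'X1', 'X2', 'agr')}
```

Each operation removes information, so each moves the rule strictly upward in the partial ordering of the previous slide.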

  19. Merging two transfer rules At the heart of the seeded version space learning algorithm is the merging of two transfer rules (TR1 and TR2) into a more general rule (TR3): • All constraints that appear in both TR1 and TR2 are inserted into TR3 and removed from TR1 and TR2. • Perform all instances of Operation 3 on TR1 and TR2 separately. • Repeat from step 1 until nothing changes.
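The two-step loop above can be sketched directly, reusing a tuple encoding of constraints. This is a minimal illustration under assumed representations, not the system's implementation:

```python
# Hedged sketch of the merge: shared constraints move into TR3, then
# Operation 3 (turning a pair of matching value constraints into an
# agreement constraint) runs inside each rule, which may expose new
# shared constraints on the next pass.

def apply_op3(constraints):
    """Replace every pair of value constraints with the same feature and
    value by the corresponding agreement constraint."""
    out = set(constraints)
    vals = sorted(c for c in out if c[0] == "val")
    for i, c1 in enumerate(vals):
        for c2 in vals[i + 1:]:
            if c1[2:] == c2[2:] and {c1, c2} <= out:
                out -= {c1, c2}
                out.add(("agr", c1[1], c2[1], c1[2]))
    return out

def merge(tr1, tr2):
    tr1, tr2, tr3 = set(tr1), set(tr2), set()
    while True:
        shared = tr1 & tr2            # step 1: move shared constraints to TR3
        tr3 |= shared
        tr1 -= shared
        tr2 -= shared
        new1, new2 = apply_op3(tr1), apply_op3(tr2)   # step 2
        if new1 == tr1 and new2 == tr2:               # fixed point reached
            return tr3
        tr1, tr2 = new1, new2

a = {("val", "X1", "agr", "*3sg"), ("val", "X2", "agr", "*3sg")}
b = {("agr", "X1", "X2", "agr")}
print(merge(a, b))  # {('agr', 'X1', 'X2', 'agr')}
```

Note how the two value constraints in TR1, which share nothing with TR2 at first, become a shared agreement constraint after Operation 3 and are then moved into TR3.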

  20. Seeded Version Space Algorithm • Remove duplicate rules from the S boundary • Try to merge each pair of transfer rules • A merge is successful only if the CSet (set of covered sentences, i.e. sentences that are translated correctly) of the merged rule is a superset of the union of the CSets of the two unmerged rules • Pick the successful merge that optimizes an evaluation criterion • Repeat until no more merges are found
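The outer loop described above can be sketched as a greedy search. Everything here is a stub built for illustration: rules are bare frozensets of constraints, the pairwise "merge" is plain intersection, and `cset` is a lookup table standing in for actually running the translator over the elicited sentences.

```python
from itertools import combinations

def svs_learn(rules, cset):
    rules = list(dict.fromkeys(rules))            # remove duplicate rules
    while True:
        candidates = []
        for r1, r2 in combinations(rules, 2):
            merged = r1 & r2                      # stand-in for the real merge
            # A merge succeeds only if it loses no coverage: the merged
            # rule's CSet must cover the union of the parents' CSets.
            if cset(merged) >= cset(r1) | cset(r2):
                candidates.append((r1, r2, merged))
        if not candidates:
            return rules                          # no successful merge left
        r1, r2, merged = candidates[0]            # evaluation-criterion stub
        rules = [r for r in rules if r not in (r1, r2)] + [merged]

# Toy CSet table: each rule covers the listed sentence ids.
coverage = {
    frozenset({"a", "b"}): {1},
    frozenset({"a"}): {1, 2},
    frozenset({"a", "c"}): {2},
}
def cset(rule):
    return coverage.get(rule, set())

print(svs_learn(list(coverage), cset))  # [frozenset({'a'})]
```

In the toy run, all three rules collapse into the single most general rule because its CSet covers every sentence the originals did; with a real evaluation criterion, `candidates[0]` would be replaced by an argmax over candidate merges.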

  21. Evaluating a set of transfer rules • Initial thought: evaluate a merge based on the “goodness” of the new rule, i.e. its CSet, and on the size of the rule set • Goal: maximize coverage and minimize the size of the rule set • Currently: merges are only successful if there is no loss in coverage, so the size of the rule set is the only criterion used • Future (1): coverage should be measured on a test set • Future (2): relax the constraint that a successful merge cannot result in a loss of coverage

  22. Overview of the talk • Introduction and Motivation • Overview of the AVENUE project • Elicitation of bilingual data • Rule Learning • Seed Generation • Seeded Version Space Learning • Conclusions and Future Work

  23. Conclusions and Future Work • Novel approach to data-driven MT: less data, more encoded linguistic knowledge • The system is still in its first stages, under heavy development and subject to major changes • Current work: compositionality • Future work includes: • Expanding coverage • Addressing (much) more complex constructions • Eliminating some assumptions
