
Stochastic Inversion Transduction Grammars Dekai Wu






Presentation Transcript


  1. Stochastic Inversion Transduction Grammars Dekai Wu 11-734 Advanced Machine Translation Seminar Presented by: Sanjika Hewavitharana 04/13/2006

  2. Overview • Simple Transduction Grammars • Inversion Transduction Grammars (ITGs) • Stochastic ITGs • Parsing with SITGs • Applications of SITGs • Main Reading: Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora (1997)

  3. Introduction • Mathematical models of translation • IBM models (Brown et al.): string generates string • Syntax-based (Yamada &amp; Knight): tree generates string • ITG (Wu): two trees are generated simultaneously • ITGs • A formalism for modeling bilingual sentence pairs • Not intended for use as full translation models, but for parallel-corpus analysis • Extract useful structures from the input data • A generative view rather than a translation view • Two output trees are generated simultaneously, one for each language

  4. Transduction Grammars • A simple transduction grammar is a CFG whose terminals are pairs of symbols (or singletons) • Can be used to model the generation of bilingual sentence pairs E: The Financial Secretary and I will be accountable. C:

  5. Transduction Grammar Rules E.g. • Simple Rules: • Inversion Rule:

  6. Transduction Grammars • A simple transduction grammar is a CFG whose terminals are pairs of symbols (or singletons) • Can be used to model the generation of bilingual sentence pairs E C

  7. Transduction Grammars • In general, they are not very useful • They require the two languages to share exactly the same grammatical structure • So some sentence pairs cannot be generated • ITGs remove the rigid parallel-ordering constraint • Constituent order in one language may be the inverse of the other • Order is the same in both languages (square brackets): • Order is inverted in one language (angle brackets):

  8. ITGs • e.g. • With an ITG we can parse the previous sentence pair • Inversion rule: VP → ⟨VV PP⟩
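The straight/inverted distinction can be sketched as a tiny generator: an internal node emits its children in the same order in both languages (square brackets) or reverses them in language 2 (angle brackets). The derivation tree and word pairs below are illustrative assumptions, not an example from the paper.

```python
# Sketch: generating a bilingual sentence pair from a toy ITG derivation.
# Internal nodes are ("[]", left, right) or ("<>", left, right);
# leaves are terminal couples (lang1_word, lang2_word).

def generate(node):
    """Return (lang1_words, lang2_words) for an ITG derivation node."""
    if isinstance(node, tuple) and node[0] in ("[]", "<>"):
        op, left, right = node
        l1a, l2a = generate(left)
        l1b, l2b = generate(right)
        if op == "[]":                     # straight: same order in both
            return l1a + l1b, l2a + l2b
        return l1a + l1b, l2b + l2a        # inverted: reversed in language 2
    e, c = node                            # terminal couple
    return [e], [c]

# an inverted node reorders the second language only
tree = ("[]", ("I", "我"),
              ("<>", ("went", "去"), ("yesterday", "昨天")))
print(generate(tree))
# -> (['I', 'went', 'yesterday'], ['我', '昨天', '去'])
```

A single pair of trees thus yields both sentences, which is the "generative view" of slide 3.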

  9. ITG Parse Tree

  10. Expressiveness of ITGs

  11. Expressiveness of ITGs • Not all matchings are possible with an ITG • e.g. ‘inside-out’ matchings are not allowed • This helps reduce the combinatorial growth of matchings with the number of tokens • The number of eliminated matchings increases rapidly as the number of tokens increases • The author claims this is a benefit
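The excluded matchings can be checked mechanically: a permutation is ITG-compatible exactly when it can be split recursively into two contiguous halves whose images are also contiguous spans (straight or inverted). A minimal sketch of that check, assuming 0-indexed permutations:

```python
# Sketch: which word-order permutations can a binary ITG produce?
from itertools import permutations

def is_range(seq):
    """True if seq's (distinct) values form one contiguous block of integers."""
    return max(seq) - min(seq) == len(seq) - 1

def normalize(seq):
    lo = min(seq)
    return tuple(x - lo for x in seq)

def itg_ok(p):
    """Can permutation p be built from binary straight/inverted splits?"""
    if len(p) <= 1:
        return True
    for k in range(1, len(p)):
        left, right = p[:k], p[k:]
        if (is_range(left) and is_range(right)
                and itg_ok(normalize(left)) and itg_ok(normalize(right))):
            return True
    return False

bad = [p for p in permutations(range(4)) if not itg_ok(p)]
print(bad)  # [(1, 3, 0, 2), (2, 0, 3, 1)] -- the two 'inside-out' orders
```

For four tokens, only the two inside-out permutations (2, 4, 1, 3) and (3, 1, 4, 2) in 1-indexed terms are excluded; for five tokens, 90 of the 120 permutations remain, and the gap widens rapidly as the slide says.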

  12. Expressiveness of ITGs

  13. Normal Form of ITG • For any ITG there exists an equivalent grammar in normal form • The right-hand side of every rule is one of: • a terminal couple • a terminal singleton • a pair of non-terminals with straight orientation • a pair of non-terminals with inverted orientation

  14. Stochastic ITGs • A probability can be assigned to each rewrite rule • The probabilities of all rules with a given left-hand side must sum to 1 • An SITG gives the most probable (maximum-likelihood) parse for a sentence pair • Parsing is similar to Viterbi or CYK (chart) parsing

  15. Parsing with SITGs • Every node (q) in the parse tree is identified by 5 elements: • begin and end indices of the language-1 substring (s, t) • begin and end indices of the language-2 substring (u, v) • a non-terminal category (i) • Each chart cell stores the probability of the most likely parse covering the corresponding pair of substrings, rooted in the corresponding category

  16. Parsing with SITGs - Algorithm • Initialize the cells corresponding to terminals using a translation lexicon • For the other cells, recursively find the most probable way of deriving that non-terminal category • Compute the probability by multiplying the probability of the rule by the probabilities of the two constituents • Store that probability together with the orientation of the rule • Complexity: O(n³m³), where n and m are the lengths of the two sentences
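The steps above can be sketched as a dynamic program over four-dimensional cells (s, t, u, v). For brevity the sketch assumes a single non-terminal, as in the bracketing grammar of slide 20, and a made-up toy lexicon; a full SITG parser would additionally loop over categories and rules, and would store orientations for backtracking.

```python
# Sketch: Viterbi chart parsing with a one-nonterminal SITG.
# Lexicon entries and probabilities are illustrative assumptions.

def parse(e, c, lex, p_straight, p_inverted):
    """delta[(s,t,u,v)] = prob. of the best parse pairing e[s:t] with c[u:v]."""
    n, m = len(e), len(c)
    delta = {}
    # initialization: look up terminal couples in the translation lexicon
    for s in range(n):
        for u in range(m):
            if (e[s], c[u]) in lex:
                delta[(s, s + 1, u, u + 1)] = lex[(e[s], c[u])]
    # recursion: build larger cells from pairs of smaller adjacent cells
    for le in range(2, n + 1):                   # language-1 span length
        for lc in range(2, m + 1):               # language-2 span length
            for s in range(n - le + 1):
                for u in range(m - lc + 1):
                    t, v = s + le, u + lc
                    best = 0.0
                    for S in range(s + 1, t):        # language-1 split point
                        for U in range(u + 1, v):    # language-2 split point
                            # straight rule A -> [A A]
                            p = (p_straight * delta.get((s, S, u, U), 0.0)
                                            * delta.get((S, t, U, v), 0.0))
                            # inverted rule A -> <A A>
                            q = (p_inverted * delta.get((s, S, U, v), 0.0)
                                            * delta.get((S, t, u, U), 0.0))
                            best = max(best, p, q)
                    if best > 0.0:
                        delta[(s, t, u, v)] = best
    return delta.get((0, n, 0, m), 0.0)

lex = {("I", "我"): 0.9, ("went", "去"): 0.8, ("yesterday", "昨天"): 0.9}
prob = parse(["I", "went", "yesterday"], ["我", "昨天", "去"],
             lex, p_straight=0.6, p_inverted=0.4)
print(prob)
```

The six nested loops (le, s, S over language 1; lc, u, U over language 2) make the O(n³m³) complexity of the slide visible directly in the code.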

  17. Applications of SITGs • Segmentation • Bracketing • Alignment • Bilingual Constraint Transfer • Mining parallel sentences from comparable corpora [Wu & Fung 2005]

  18. Applications of SITGs - Segmentation • Word boundaries are not marked in Chinese text • No word chunks available for matching • One option: do word segmentation as preprocessing • This might produce chunks that do not agree bilingually • Solution: extend the algorithm to accommodate segmentation • Allow the initialization step to find strings of any length in the translation lexicon • The recursive step stores the most probable way of creating a constituent, whether it came from the lexicon or from the rules

  19. Applications of SITGs – Bracketing • How do we assign structure to a sentence when no grammar is available? • Especially problematic for minority languages • A solution using ITGs: • Obtain a parallel corpus pairing the language with some other language • Obtain a reasonable translation dictionary • Parse the corpus with a bracketing transduction grammar

  20. Bracketing Transduction Grammar • A minimal ITG • Only one non-terminal: A • Production rules: • The lexical translation probabilities have prominence • Small probability values for the two singleton production rules • Also, a very small value for
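As a sketch of how the bracketing grammar's parameters fit the normalization constraint of slide 14, here is a toy parameterization; the words and probability values are invented for illustration, not taken from the paper:

```python
# Sketch: toy bracketing-transduction-grammar parameters.
# All rules share the single nonterminal A, so their probabilities
# must sum to one; singleton rules get very small mass.
rule_prob = {
    "A -> [A A]":    0.40,    # straight concatenation
    "A -> <A A>":    0.09,    # inverted concatenation
    "A -> I/我":     0.25,    # lexical couple (dominant mass)
    "A -> went/去":  0.25,    # lexical couple
    "A -> the/eps":  0.005,   # language-1 singleton (very small)
    "A -> eps/了":   0.005,   # language-2 singleton (very small)
}
assert abs(sum(rule_prob.values()) - 1.0) < 1e-12
```

Keeping the singleton probabilities tiny makes the parser prefer lexical couples, so unmatched words are attached only when nothing better is available.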

  21. Bracketing with Singletons • Singletons cause bracketing errors • Some refinements: • Depending on the language, bias singleton attachment either to the left or to the right of a constituent • Apply a series of transformations that push the singletons as close as possible to the couples e.g. [ xA B  ] ⇌xA B ⇌ x A  B⇌  [x A ] B • Before: • After:

  22. Bracketing Experiments • Used 2000 Chinese-English sentence pairs from the HKUST corpus • Some filtering: • Removed sentence pairs not adequately covered by the lexicon (&gt;1 unknown word) • Removed sentence pairs with too many unmatched words (&gt;2) • Bracketing precision: • 80% for English • 78% for Chinese • Errors mainly due to lexical imperfections • A statistical lexicon (~6.5k English, ~5.5k Chinese words) • Can be improved with extra information • e.g. POS tags, a grammar-based bracketer

  23. Applications of SITGs - Alignment • Alignments (phrasal or word) are a natural byproduct of bilingual parsing • Unlike ‘parse-parse-match’ methods, this approach • doesn’t require a robust grammar for both languages • guarantees compatibility between the two parses • has a principled way of choosing between possible alignments • provides a more reasonable ‘distortion penalty’ • Recent empirical studies show ITGs produce better alignments in various applications [Wu & Fung 2005]

  24. Bilingual Constraint Transfer • A high-quality parse for one language can be leveraged to obtain structure for the other • Alter the parsing algorithm: • allow only constituents that match the existing parse of the well-studied language • This works for any sort of constraint supplied for the well-studied language

  25. References: • Dekai Wu (1997). Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora. Computational Linguistics, Vol. 23, No. 3, pp. 377-403. • Dekai Wu (1995). Grammarless Extraction of Phrasal Translation Examples from Parallel Texts. 6th Intl. Conf. on Theoretical and Methodological Issues in Machine Translation, Vol. 2, pp. 354-372, Leuven, Belgium. • Dekai Wu and Pascale Fung (2005). Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Comparable Corpora. 2nd Intl. Joint Conf. on Natural Language Processing (IJCNLP-2005), Jeju, Korea, October.
