Competitive Grouping in Integrated Segmentation and Alignment Model

Ying Zhang, Stephan Vogel
Language Technologies Institute, School of Computer Science, Carnegie Mellon University

Integrated Segmentation and Alignment Model
• Phrase alignment models (Och et al., 1999; Marcu and Wong, 2002; Koehn et al., 2003)
• Many of these models rely on a pre-calculated word alignment.
• They use different heuristics to extract phrase pairs from the Viterbi word alignment path.
• Integrated Segmentation and Alignment (ISA) model (Zhang 2003)
• No such word alignment is needed.
• Segments source and target sentences into phrases and aligns them simultaneously.
• Uses χ²(f, e) instead of the conditional probability P(f|e) for word pair associations.
• Greedy search for phrase pairs.
• Key idea: the competitive grouping algorithm
• Inspired by the competitive linking algorithm (Melamed 1997) for word alignment

Competitive Linking Algorithm
• A greedy word alignment algorithm.
• The word pair with the highest likelihood L(f, e) “wins” the competition.
• One-to-one assumption: once a pair {f, e} is “linked”, neither f nor e can be aligned with any other word.
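The greedy procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Melamed's full algorithm: the function name `competitive_linking`, the dictionary input format, and the scores in the usage lines are all made up for the example.

```python
def competitive_linking(scores):
    """Greedy one-to-one word alignment.

    scores: dict mapping a word pair (f, e) to its association score L(f, e).
    Returns the linked pairs in the order they won the competition.
    """
    linked_f, linked_e = set(), set()
    links = []
    # Visit word pairs from the highest score to the lowest.
    for (f, e), _score in sorted(scores.items(), key=lambda kv: -kv[1]):
        # One-to-one assumption: skip pairs whose f or e is already linked.
        if f in linked_f or e in linked_e:
            continue
        links.append((f, e))
        linked_f.add(f)
        linked_e.add(e)
    return links


# Hypothetical scores, chosen only to show the behavior.
example_scores = {
    ("je", "I"): 9.0,
    ("je", "declare"): 2.0,
    ("déclare", "I"): 3.0,
    ("déclare", "declare"): 8.0,
}
print(competitive_linking(example_scores))
```

With these scores, ("je", "I") wins first and removes both words from the competition, so ("déclare", "I") and ("je", "declare") can never be linked afterwards.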

Competitive Grouping Algorithm
• Discards the one-to-one assumption of competitive linking, making the search less greedy.
• When a pair {e, f} wins the competition, it invites the neighboring pairs to join the “winner’s club”.
• Introduces the locality assumption: a source phrase of adjacent words can only be aligned to a target phrase of adjacent words.
• Words inside an aligned phrase pair cannot be aligned to other words.

Expanding the Aligned Phrase Pair
• Two criteria have to be satisfied to expand the seed word pair into a phrase pair.
• Criterion 1: if a new source word f is to be grouped, the best e that f is associated with should not be “blocked” by this expansion; the same holds for grouping a new target word.
• Criterion 2: the highest word pair likelihood value in the expanded area needs to be “similar” to the seed value.
• According to the locality assumption, words in the aligned phrase pairs cannot be aligned with other words again.
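The expansion step can be sketched as follows. This is one possible reading of the two criteria, not the authors' implementation. The assumptions are: a word's best partner counts as "blocked" when it falls outside the phrase span on the other side; "similar" means at least `ratio` times the seed score; and the block grows one adjacent row or column at a time. The name `expand_seed`, the parameter `ratio`, and the score matrix are hypothetical. The sketch handles a single seed and ignores cells already claimed by earlier phrase pairs.

```python
def expand_seed(L, seed, ratio=0.5):
    """Grow a rectangular block around seed=(i, j) in the score matrix L.

    L[i][j] is the association score of source word i and target word j.
    Returns (i0, i1, j0, j1): inclusive source span and target span.
    """
    n_src, n_tgt = len(L), len(L[0])
    i0 = i1 = seed[0]
    j0 = j1 = seed[1]
    seed_score = L[seed[0]][seed[1]]

    def ok_source(i):
        row = L[i]
        best_j = max(range(n_tgt), key=lambda j: row[j])
        # Criterion 1 (assumed reading): the best target word for source
        # word i must lie inside the current target span.
        if not (j0 <= best_j <= j1):
            return False
        # Criterion 2 (assumed reading): the best score among the new
        # cells must be at least ratio * seed score.
        return max(row[j0:j1 + 1]) >= ratio * seed_score

    def ok_target(j):
        col = [L[i][j] for i in range(n_src)]
        best_i = max(range(n_src), key=lambda i: col[i])
        if not (i0 <= best_i <= i1):
            return False
        return max(col[i0:i1 + 1]) >= ratio * seed_score

    changed = True
    while changed:
        changed = False
        # Locality assumption: only adjacent rows and columns may join.
        if i0 > 0 and ok_source(i0 - 1):
            i0 -= 1
            changed = True
        if i1 < n_src - 1 and ok_source(i1 + 1):
            i1 += 1
            changed = True
        if j0 > 0 and ok_target(j0 - 1):
            j0 -= 1
            changed = True
        if j1 < n_tgt - 1 and ok_target(j1 + 1):
            j1 += 1
            changed = True
    return i0, i1, j0, j1


# Made-up scores. Rows: je, déclare, reprise, la, session.
# Columns: I, declare, resumed, the, session.
scores = [
    [9, 1, 0, 0, 0],
    [1, 8, 1, 0, 0],
    [0, 1, 7, 0, 0],
    [0, 0, 0, 5, 6],
    [0, 0, 0, 5, 10],
]
print(expand_seed(scores, (4, 4)))  # -> (3, 4, 3, 4)
```

With these invented scores, the seed {session, session} grows into the phrase pair {la session, the session}, while the seed {je, I} stays a single word pair because the best partners of its neighbors lie outside the block.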

Exploring All Possible Phrase Pairs
• Criterion 2 controls the granularity of the aligned phrase pairs:
• Two short phrase pairs
• Or one long phrase pair
• Short phrases give better coverage of unseen test data.
• Long phrases encapsulate more context, e.g. local reordering and word sense.
• It is hard to decide on the optimal granularity without knowing the test data.
• Solution: for each grouping, try all possible granularities.

Exploring All Possible Phrase Pairs
French: Je déclare reprise la session
English: I declare resumed the session

The Likelihood of Word Associations
• The chi-square statistic is used to measure the likelihood of word association for a pair {e, f}.
• For each word pair {e, f}, the null hypothesis is that e and f are independent of each other.
• χ² is calculated to measure how well this hypothesis holds.
• The contingency table is constructed using the counts from the corpus given the current alignment, e.g. a uniform alignment.
• O11: number of times e and f are aligned
• O12: number of times e is aligned with a word other than f
• O21: number of times f is aligned with a word other than e
• O22: number of times a word other than f is aligned with a word other than e
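For a 2×2 contingency table, the χ² statistic has a standard closed form, which the sketch below computes from the four counts defined on this slide. The function name `chi_square` and the counts in the usage lines are made up for illustration. The slides do not say how the counts are collected or whether any correction is applied.

```python
def chi_square(o11, o12, o21, o22):
    """Pearson chi-square statistic for a 2x2 contingency table.

    o11: e and f aligned;          o12: e aligned with another f
    o21: f aligned with another e; o22: neither e nor f involved
    """
    n = o11 + o12 + o21 + o22
    # Product of the two row totals and the two column totals.
    denom = (o11 + o12) * (o21 + o22) * (o11 + o21) * (o12 + o22)
    if denom == 0:
        # A zero margin means the table carries no evidence either way.
        return 0.0
    return n * (o11 * o22 - o12 * o21) ** 2 / denom


# Hypothetical counts: e and f always co-occur, or look independent.
print(chi_square(10, 0, 0, 10))  # -> 20.0
print(chi_square(5, 5, 5, 5))    # -> 0.0
```

A large value means the observed counts are far from what independence of e and f would predict, so the pair is a strong candidate for alignment.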

In WPT-05
• Submitted results for all four languages
• Training data as provided
• Language model as provided
• Decoder (Pharaoh) as provided

Conclusion
• The competitive grouping algorithm is at the core of the ISA model.
• Simple and efficient model
• Results comparable to other phrase alignment models

The Evolution of ISA

Matrix of the Likelihood

Expanding the Phrase Pairs
