IJCNLP2005-paraphrasing

IJCNLP2005-paraphrasing Weigang LI 2005-10-21

Jeju, The Republic of Korea

Outline • Paraphrasing in Main Conference • Paraphrasing in Workshop • Harvest in IJCNLP • Pities in IJCNLP • Conclusions

Paraphrasing in Main Conference • Aligning Needles in a Haystack: Paraphrase Acquisition Across the Web • Web-Based Unsupervised Learning for Query Formulation in Question Answering • Exploiting Lexical Conceptual Structure for Paraphrase Generation

Aligning Needles in a Haystack: Paraphrase Acquisition Across the Web

Basic Information • Authors: Marius Pasca and Peter Dienes • Affiliation: Google Inc. • Main Idea • If two sentence fragments have common word sequences at both extremities, then the variable word sequences in the middle are potential paraphrases of each other • A significant advantage of this extraction mechanism is that it can acquire paraphrases from sentences whose information content overlaps only partially, as long as the fragments align
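The main idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed anchor length of two words, and the cap on the middle length are all assumptions made for illustration. Every sentence is cut into fragments, each fragment is keyed by its first and last words (the anchors), and middles that share a key are paired as paraphrase candidates.

```python
from collections import defaultdict
from itertools import combinations


def ngrams(tokens, min_len, max_len):
    """Yield (start, end) spans of every n-gram with min_len <= n <= max_len."""
    for start in range(len(tokens)):
        for n in range(min_len, max_len + 1):
            if start + n <= len(tokens):
                yield start, start + n


def paraphrase_candidates(sentences, anchor_len=2, max_middle=3):
    """Pair the middles of fragments that share both a left and a right anchor.

    anchor_len and max_middle are illustrative parameters, not values
    taken from the paper.
    """
    frag_min = 2 * anchor_len + 1          # anchors plus a one-word middle
    frag_max = 2 * anchor_len + max_middle
    by_anchors = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for start, end in ngrams(tokens, frag_min, frag_max):
            left = tuple(tokens[start:start + anchor_len])
            right = tuple(tokens[end - anchor_len:end])
            middle = " ".join(tokens[start + anchor_len:end - anchor_len])
            by_anchors[(left, right)].add(middle)
    pairs = set()
    for middles in by_anchors.values():
        # Distinct middles under the same anchors are candidate paraphrases.
        for a, b in combinations(sorted(middles), 2):
            pairs.add((a, b))
    return pairs
```

For two made-up sentences, "the city was flooded by heavy rain" and "the city was ravaged by heavy rain", the returned set contains the pair ("flooded", "ravaged"), because both middles sit between the anchors "city was" and "by heavy".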

An example

Pre-processing • Filtering out HTML tags, POS tagging • Sentence length: 5 < n < 30 words • At least one verb • At least one noun starting in lowercase • Every word shorter than 30 characters • Fewer than half of the words are numbers • Result: more than one billion sentences
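The sentence filter listed on this slide could be sketched as follows. This is only an illustration under stated assumptions: the input is assumed to be already tokenized and POS-tagged with Penn Treebank-style tags (VB* for verbs, NN* for nouns), and the function name keep_sentence and the simple digit test for "number" are hypothetical choices, not details from the paper.

```python
def keep_sentence(tagged):
    """Return True if a tagged sentence passes the slide's filters.

    tagged: list of (word, pos) pairs; Penn Treebank-style tags are assumed.
    """
    n = len(tagged)
    if not 5 < n < 30:                                   # sentence length bound
        return False
    if not any(pos.startswith("VB") for _, pos in tagged):
        return False                                     # needs at least one verb
    if not any(pos.startswith("NN") and word[:1].islower()
               for word, pos in tagged):
        return False                                     # needs a lowercase noun
    if any(len(word) >= 30 for word, _ in tagged):
        return False                                     # no overly long words
    # Count tokens that are plain numbers such as "1999" or "3,806".
    numbers = sum(
        1 for word, _ in tagged
        if word.replace(",", "").replace(".", "").isdigit()
    )
    return numbers * 2 < n                               # fewer than half are numbers
```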

Algorithm

Problem with this method • An example • “decided to read the government report published last month” • “decided to read the edition published last month” • How to avoid this problem?

Alignment Anchors • Ngram-Only • Ngram-Entity • Uses the preceding and following named entities (here, only nouns are used) • Ngram-Relative • Several lexico-syntactic patterns

Results

Web-Based Unsupervised Learning for Query Formulation in Question Answering

Basic Information • Authors: Yi-Chia Wang, Jian-Cheng Wu, Tyne Liang, and Jason S. Chang • Affiliation: National Chiao Tung University, National Tsing Hua University • Query Formulation

Main Idea • Training data: questions are classified into a set of fine-grained categories of question patterns • Word alignment: the relationships between the question patterns and n-grams in answer passages are discovered with a word alignment technique • Finally, the best query transforms are derived by ranking the n-grams associated with a specific question pattern

Transforming Question to Query • Search the Web for Relevant Answer Passages • Question Pattern Extraction • Some rules are written manually • Learning Best Transforms • Word Alignment Across Q and AP • SMT word alignment is applied to qi and ai (bigrams) • Select the top k bigrams, t1, t2, ..., tk, for every question pattern or keyword q • Distance Constraint and Proximity Ranks (between bigrams and answer) • Combining Alignment and Proximity Ranks
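The slide does not say how the alignment and proximity ranks are combined, so the following is a sketch under an explicit assumption: each bigram gets one rank from its alignment probability and one from its average distance to the answer, and the two ranks are simply averaged. The names rank_positions and top_transforms, and the input dictionaries, are hypothetical.

```python
def rank_positions(scores, reverse):
    """Map each item to its 1-based rank when sorted by score."""
    ordered = sorted(scores, key=scores.get, reverse=reverse)
    return {item: i + 1 for i, item in enumerate(ordered)}


def top_transforms(alignment_prob, avg_distance, k=2):
    """Return the k bigrams with the best averaged rank.

    alignment_prob: bigram -> alignment probability (higher is better)
    avg_distance:   bigram -> mean distance to the answer (lower is better)
    Both dictionaries are assumed to cover the same bigrams.
    Averaging the two ranks is an assumption, not a detail from the slides.
    """
    align_rank = rank_positions(alignment_prob, reverse=True)
    prox_rank = rank_positions(avg_distance, reverse=False)
    combined = {b: (align_rank[b] + prox_rank[b]) / 2 for b in alignment_prob}
    # Ties on the combined rank are broken alphabetically for determinism.
    return sorted(combined, key=lambda b: (combined[b], b))[:k]
```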

Runtime Transformation of Questions • Pre-processing • Classify the question according to the rules • Select the top bigrams according to the training results (joined with OR) • Query conjunction
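The runtime step might look like the sketch below: the learned bigrams are joined with OR, and the group is then conjoined with the question keywords. The function name build_query and the AND/OR/quote syntax are assumptions; the slides do not specify the search engine's query syntax.

```python
def build_query(keywords, transforms):
    """AND the question keywords with an OR-group of quoted bigram transforms.

    The boolean operators and quoting style are illustrative only.
    """
    # Quote multi-word keywords so they are matched as phrases.
    parts = [f'"{kw}"' if " " in kw else kw for kw in keywords]
    if transforms:
        parts.append("(" + " OR ".join(f'"{t}"' for t in transforms) + ")")
    return " AND ".join(parts)
```

With the made-up keyword "Alexander Graham Bell" and the transforms "was born" and "born in", the call returns `"Alexander Graham Bell" AND ("was born" OR "born in")`.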

Experiments • Training corpus • 3806 Q-A pairs • 338 question patterns, 95,926 answer passages • 45 questions as test corpus

Result

Basic information about the workshop • Total: 12 published papers • 3 papers from USA (2 of them from MSR, 1 from New York University) • 5 papers from Japan (3 of them from ATR, 1 from Nagaoka U., 1 from Kyoto U.) • 2 papers from UK (The Open University) • 1 paper from Australia (Macquarie U.) • 1 paper from China (HIT)

3 sessions • Phrase-level • Automatic paraphrase discovery based on context and keywords between NE pairs • Sentence-level • Automatic generation of paraphrases to be used as translation references in objective evaluation measures of machine translation • Discourse-level • Support vector machines for paraphrase identification and corpus construction

Automatic paraphrase discovery based on context and keywords between NE pairs • Author: Satoshi Sekine • Affiliation: New York University • Task: extract the phrases between two NEs as paraphrases

Overview • NE tagger (140 NE categories, rule-based system) • Gather instances with NE pairs • For each C1-C2 domain, find topic keywords using TF/ITF; instances with the same keyword are clustered together • Phrases linking the individual NEs within the same domain are treated as paraphrases
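The clustering step in this overview can be sketched roughly as below. The sketch assumes the NE tagging and TF/ITF keyword selection have already been done, so each instance arrives as an (NE category pair, keyword, phrase) triple; that input format and the name cluster_by_keyword are hypothetical.

```python
from collections import defaultdict


def cluster_by_keyword(instances):
    """Group linking phrases that share an NE category pair and a keyword.

    instances: iterable of (ne_category_pair, keyword, phrase) triples,
    assumed to come from earlier tagging and keyword-scoring steps.
    """
    clusters = defaultdict(set)
    for pair, keyword, phrase in instances:
        clusters[(pair, keyword)].add(phrase)
    # Only clusters with two or more distinct phrases yield paraphrase candidates.
    return {
        key: sorted(phrases)
        for key, phrases in clusters.items()
        if len(phrases) > 1
    }
```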

Experiments • 0.63 million instances with NE pairs • Total: 2,000 NE category pairs, 5184 keywords • 13,976 phrases with keywords

Results

Limitations • Uses just one keyword • Does not use any structural information • Fewer than 5 chunks are allowed between two NEs, so long-distance relations cannot be handled

Automatic generation of paraphrases to be used as translation references in objective evaluation measures of machine translation • Authors: Yves Lepage and Etienne Denoual • Affiliation: ATR - Spoken language communication research labs • Task: produce reference sentences for machine translation evaluation

Algorithm • Detection: find sentences which share the same translation in the multilingual resource • Generation: produce new sentences by exploiting commutations; limit combinatorics by contiguity constraints
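The detection step can be sketched as a simple grouping, assuming the multilingual resource is available as (sentence, translation) pairs. That input format and the name detect_paraphrases are assumptions; the generation step (commutations under contiguity constraints) is not covered here.

```python
from collections import defaultdict


def detect_paraphrases(aligned_pairs):
    """Group sentences that share exactly the same translation.

    aligned_pairs: iterable of (sentence, translation) pairs, an assumed
    view of the multilingual resource mentioned on the slide.
    """
    by_translation = defaultdict(set)
    for sentence, translation in aligned_pairs:
        by_translation[translation].add(sentence)
    # A group needs at least two distinct sentences to be a paraphrase set.
    return [sorted(group) for group in by_translation.values() if len(group) > 1]
```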

Results • The lower the scores, the better the lexical and syntactical variation

Support vector machines for paraphrase identification and corpus construction • Authors: Chris Brockett and William B. Dolan • Affiliation: Natural Language Processing Group, Microsoft Research • Task: Paraphrase Identification and Corpus Construction

Background • Paraphrasing  SMT • Constructing large-scale paraphrase corpora is a very hard task • Annotated datasets • Using SVM to induce larger monolingual paraphrase corpora

Datasets • Randomly select 10,000 sentence pairs • Hand-tagged (1 or 0) • 2968 positive and 7032 negative examples

Features • Total: 264,543 features • After filtering: fewer than 1,000 features • String Similarity Features • Morphological Variants • WordNet Lexical Mappings • Word Association Pairs • Composite Features
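The slide names the feature families without listing the individual features, so the following only illustrates what string similarity features for a sentence pair could look like. The three features chosen here (word-level edit distance, length difference, Jaccard word overlap) and the name string_features are assumptions, not the paper's feature set; a classifier such as an SVM would consume vectors like these.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def string_features(s1, s2):
    """Illustrative string similarity features for a sentence pair."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    shared = set(t1) & set(t2)
    union = set(t1) | set(t2)
    return {
        "word_edit_distance": edit_distance(t1, t2),
        "length_difference": abs(len(t1) - len(t2)),
        "jaccard_overlap": len(shared) / len(union) if union else 0.0,
    }
```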

Results

Outline • Paraphrasing in Main Conference • Paraphrasing in Workshop • Thought on Future Work of Paraphrasing • Harvest in IJCNLP • Pities in IJCNLP • Conclusions

The Biggest Harvest • Met many fellow researchers • Old • Young • Men • Women (few)

Other harvest • Beautiful scenery • Beautiful food • Beautiful show • Beautiful people

Pities in IJCNLP • Poor English listening skills hinder communication • Koreans' English is poorer • So many mosquitoes • Few beautiful girls

Conclusions • Get to know many people • Broaden one’s views • Build one’s self-confidence • Grasp the newest research directions • Enjoy taking part in an international conference!

Thanks!
