This guide explores semantic entailment in natural language processing (NLP), highlighting challenges such as paraphrasing, negation, presuppositions, and the reliance on world knowledge. Practical examples illustrate these concepts, such as how structurally different sentences can convey the same meaning. Tools like MALLET and techniques like word matching are discussed as ways to improve system performance. The aim is to better understand and tackle the complexities of semantic entailment, striving for greater accuracy in NLP applications.
Semantic Entailment
Nathaniel Story, Ginger Buckbee, Greg Lorge, Billy Dean
What is it? Given sentence A, can you infer sentence B?
Challenges
• Paraphrasing
• Negation
• Presuppositions
• World Knowledge
• Juiciness
Paraphrasing Example
• “There is a cat on the table.”
• “A cat is on the table.”
• Structurally different, but they convey the same meaning
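The word-matching idea captures why such pairs are easy: most hypothesis words already appear in the text. A minimal sketch, assuming plain lowercase tokenization (a real system would also stem and strip punctuation):

```python
def word_overlap(text, hypothesis):
    # Fraction of hypothesis ("B") tokens that also appear in the text ("A").
    # Tokenization is a naive lowercase split.
    t = set(text.lower().split())
    h = set(hypothesis.lower().split())
    return len(h & t) / len(h) if h else 0.0

# The paraphrase pair above scores 1.0 despite the different structure.
print(word_overlap("There is a cat on the table.", "A cat is on the table."))
```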
Negation
• “I am lazy” vs. “I am not lazy”
• “I’m not unhappy” (double negation) entails “I’m happy”
• “It’s not unnecessary” entails “It’s necessary”
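One rough way to handle negation, including the cancelling double negations above, is a parity feature: count negation markers in each sentence and check whether the counts agree modulo two. A sketch, where the marker list and the un-/in- prefix heuristic are crude assumptions:

```python
NEGATORS = {"not", "n't", "no", "never"}   # assumed marker list
NEG_PREFIXES = ("un", "in", "non")         # crude morphological heuristic

def negation_count(sentence):
    count = 0
    for tok in sentence.lower().replace("'", " '").split():
        if tok in NEGATORS:
            count += 1
        elif tok.startswith(NEG_PREFIXES) and len(tok) > 4:
            count += 1  # e.g. "unhappy", "unnecessary"
    return count

def same_polarity(a, b):
    # True if the sentences agree in negation parity (even counts cancel).
    return negation_count(a) % 2 == negation_count(b) % 2

print(same_polarity("I'm not unhappy", "I'm happy"))  # True: two negations cancel
print(same_polarity("I am lazy", "I am not lazy"))    # False: parity differs
```

The prefix test overgenerates (e.g. "under", "inside"), so it would serve only as a noisy feature for a classifier, not a hard rule.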
Presuppositions
• “Bob doesn’t think it’s raining”
• “Bob doesn’t know it’s raining” (the factive “know” presupposes that it is in fact raining)
• Conversational pragmatics
• Contextual knowledge
World Knowledge
• “Japan is the only country that currently has an emperor.”
• “Colombia doesn’t have an emperor.”
• The first sentence entails the second, but only if you know that Colombia is a country.
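Some of this world knowledge is recoverable from WordNet’s hypernym hierarchy. A sketch, assuming NLTK with the WordNet corpus downloaded; the `is_a` helper is hypothetical, not part of NLTK:

```python
from nltk.corpus import wordnet as wn

def is_a(term, category):
    # True if any sense of `term` has `category` among the lemma names
    # of some synset in its (instance-)hypernym closure.
    hyper = lambda s: s.hypernyms() + s.instance_hypernyms()
    for synset in wn.synsets(term):
        for ancestor in synset.closure(hyper):
            if category in ancestor.lemma_names():
                return True
    return False

# Should print True with standard WordNet data, where Colombia is an
# instance of "South American country", itself a kind of country.
print(is_a("Colombia", "country"))
```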
Approach
• Tools:
  • Stemmer
  • Parser from Dan Bikel’s site
  • MALLET (MaxEnt classifier)
  • WordNet (synsets)
• Focus on the Comparable Document task
• Start with simple features such as word matching and synonym matching
• Add more complicated features such as phrase-structure comparisons
• Test the system, see how it works, and keep adding features to improve performance (a sketch of the feature-extraction stage follows)
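A sketch of that feature-extraction stage, assuming NLTK’s Porter stemmer and WordNet. Each text/hypothesis pair is reduced to word-match, stem-match, and synonym-match scores that a MaxEnt classifier such as MALLET’s could be trained on; the export to MALLET’s input format and the training run itself are not shown:

```python
import re
from nltk.corpus import wordnet as wn
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def tokens(s):
    return re.findall(r"[a-z']+", s.lower())

def synonyms(word):
    # All WordNet lemma names sharing a synset with `word`, plus the word.
    names = {word}
    for synset in wn.synsets(word):
        names.update(l.lower() for l in synset.lemma_names())
    return names

def features(text, hypothesis):
    t_words = tokens(text)
    h_words = tokens(hypothesis)
    t_stems = {stemmer.stem(w) for w in t_words}
    t_syns = set().union(*(synonyms(w) for w in t_words))
    n = len(h_words) or 1
    return {
        "word_match": sum(w in t_words for w in h_words) / n,
        "stem_match": sum(stemmer.stem(w) in t_stems for w in h_words) / n,
        "syn_match":  sum(w in t_syns for w in h_words) / n,
    }

# "couch" shares a synset with "sofa", so syn_match exceeds word_match.
print(features("A cat sat on the sofa.", "A cat sat on the couch."))
```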
Data
• Recognizing Textual Entailment (RTE) Challenge training data set
• The training set is labeled
• Best available data set: it is the one used in the European competition
Evaluation
• International competition: best systems ≈ 60% accuracy
• Goal: exceed 52% accuracy
• Compare predictions against the annotated test set
• To improve: print out the incorrectly classified pairs, then look for patterns in the mistakes (a sketch follows)
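A minimal sketch of that evaluation loop, assuming gold labels and predictions are parallel lists over the annotated test set; misclassified pairs are printed for the error-analysis pass:

```python
def evaluate(pairs, gold, predicted):
    correct = 0
    for (text, hyp), g, p in zip(pairs, gold, predicted):
        if g == p:
            correct += 1
        else:
            # Print the misses so recurring error patterns stand out.
            print(f"WRONG (gold={g}, predicted={p})\n  T: {text}\n  H: {hyp}")
    accuracy = correct / len(gold)
    print(f"Accuracy: {accuracy:.1%}")  # goal: beat the 52% mark
    return accuracy
```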
The End Questions?