Web Based Probabilistic Textual Entailment • Oren Glickman, Ido Dagan and Moshe Koppel • Bar Ilan Univ.
Classical Entailment Definition • A text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true • i.e., the truth of t implies the truth of h
Probabilistic Entailment • Example 312: • (t) Gandhi can be defeated in the next elections in India if between now and 2009, BJP can make Rural India Shine. • (h) Next elections in India will take place in 2009. • t does not entail h (in the classical sense) • Then why is it annotated as True?!
Rationale • Example 312: • (t) Gandhi can be defeated in the next elections in India if between now and 2009, BJP can make Rural India Shine. • (h) Next elections in India will take place in 2009. • t does add substantial information about the correctness of h • Given that t was stated, we’d expect h to be most likely true
A Probabilistic Space • T: the set of all texts • H: the set of all hypotheses • propositional statements that can be assigned a truth value • w: a possible world • a truth assignment (0 = False, 1 = True) to every hypothesis • W: the set of all possible worlds ($W = 2^H$)
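The following minimal Python sketch makes the space concrete by enumerating $W = 2^H$ for a small, finite set of hypotheses. The hypothesis names are hypothetical. In the actual setting H is the set of all hypotheses, which is far too large to enumerate. The sketch therefore only illustrates a possible world as a truth assignment.

```python
from itertools import product


def possible_worlds(hypotheses):
    """Enumerate W = 2^H: every truth assignment (0 = False, 1 = True) to H."""
    hs = list(hypotheses)
    # product((0, 1), repeat=n) yields all 2^n bit vectors in lexicographic order.
    return [dict(zip(hs, bits)) for bits in product((0, 1), repeat=len(hs))]


worlds = possible_worlds(["h1", "h2", "h3"])
print(len(worlds))  # 8
print(worlds[0])    # {'h1': 0, 'h2': 0, 'h3': 0}
```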
A Generative Model • We assume a probabilistic generative model: • At each generation event, a text is produced along with a (hidden) possible world • based on a probability distribution over $T \times W$
Probabilities • For a given text t and hypothesis h, we consider the following probabilities: • $P(\mathrm{Tr}_h = 1)$: the probability that h is assigned a truth value of 1 • $P(\mathrm{Tr}_h = 1 \mid t)$: the probability that h is assigned a truth value of 1, given that the generated text is t
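The sketch below illustrates these two quantities using a tiny, invented joint distribution over $T \times W$. The texts `t1` and `t2`, the hypothesis `h`, and all the probabilities are hypothetical. It computes $P(\mathrm{Tr}_h = 1)$ by summing over all generation events and $P(\mathrm{Tr}_h = 1 \mid t)$ by conditioning on the generated text.

```python
# Hypothetical toy distribution over T x W (numbers invented for illustration).
# A world is represented by the frozenset of hypotheses that are true in it.
JOINT = {
    ("t1", frozenset({"h"})): 0.30,
    ("t1", frozenset()): 0.10,
    ("t2", frozenset({"h"})): 0.10,
    ("t2", frozenset()): 0.50,
}


def p_true(h, joint):
    """P(Tr_h = 1): mass of all generation events whose world makes h true."""
    return sum(p for (_, world), p in joint.items() if h in world)


def p_true_given(h, t, joint):
    """P(Tr_h = 1 | t) = P(Tr_h = 1, text = t) / P(text = t); assumes P(t) > 0."""
    p_t = sum(p for (text, _), p in joint.items() if text == t)
    p_h_and_t = sum(p for (text, world), p in joint.items() if text == t and h in world)
    return p_h_and_t / p_t
```

With these toy numbers, $P(\mathrm{Tr}_h = 1) = 0.4$, while $P(\mathrm{Tr}_h = 1 \mid t_1) = 0.75$ and $P(\mathrm{Tr}_h = 1 \mid t_2) \approx 0.17$.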
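Textual Entailment Relationship • Definition: t probabilistically entails h if: • $P(\mathrm{Tr}_h = 1 \mid t) > P(\mathrm{Tr}_h = 1)$ (≡ positive PMI, i.e., $\log \frac{P(\mathrm{Tr}_h = 1 \mid t)}{P(\mathrm{Tr}_h = 1)} > 0$) • t increases the likelihood of h being true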
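Once both probabilities are known, the definition reduces to comparing two numbers. Here is a minimal sketch, assuming both probabilities have already been estimated and are strictly positive (so the logarithm is defined). The function names are hypothetical.

```python
import math


def probabilistically_entails(p_h_given_t, p_h):
    """t probabilistically entails h iff P(Tr_h = 1 | t) > P(Tr_h = 1)."""
    return p_h_given_t > p_h


def pmi(p_h_given_t, p_h):
    """Pointwise mutual information log[P(Tr_h = 1 | t) / P(Tr_h = 1)].

    Positive exactly when probabilistically_entails(...) is True.
    Both arguments must be > 0.
    """
    return math.log(p_h_given_t / p_h)
```

For example, take the hypothetical values $P(\mathrm{Tr}_h = 1 \mid t) = 0.75$ and $P(\mathrm{Tr}_h = 1) = 0.4$. Then t probabilistically entails h, and the PMI is positive ($\log 1.875 \approx 0.63$).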
Lexical Entailment • Are the individual terms in h entailed by t? • not necessarily holding the right relations between them • Example #2070: • (t) The Queen of Holland is now owned by Robert Mouawad. • (h) Robert Mouawad is the Queen of Holland. • every content term of h appears in t, yet t does not entail h
A Probabilistic Lexical Model • Goal: capture lexical co-occurrence statistics • Assumption 1: independent lexical truth assignments • Assumption 2: alignment • $I_v$: the event that a generated text contains v
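The slide does not spell out how the per-term probabilities are combined, so this Python sketch makes two assumptions. First, h is taken to be true iff all of its terms are true, so Assumption 1 (independence) yields a product over the hypothesis terms u. Second, the alignment of Assumption 2 matches each u to the text term v with the highest $P(\mathrm{Tr}_u = 1 \mid I_v)$. The function name `lexical_entailment_prob` and the toy probability table are hypothetical.

```python
def lexical_entailment_prob(t_terms, h_terms, p_term):
    """Sketch of P(Tr_h = 1 | t) under the lexical model.

    Assumes (1) h is true iff all its terms are true and term truths are
    independent, giving a product over hypothesis terms u, and (2) each u is
    aligned to the text term v that maximizes p_term(u, v) = P(Tr_u = 1 | I_v).
    """
    prob = 1.0
    for u in h_terms:
        # Best-aligned text term for u; an empty text supports nothing.
        prob *= max((p_term(u, v) for v in t_terms), default=0.0)
    return prob


# Hypothetical term probabilities, invented for illustration.
TOY = {("japanese", "japan"): 0.8}


def toy_p_term(u, v):
    """Identical terms always entail each other; otherwise look up the toy table."""
    return 1.0 if u == v else TOY.get((u, v), 0.0)


print(lexical_entailment_prob(["japan", "vote"], ["japanese", "vote"], toy_p_term))  # 0.8
```

Taking the max is one simple way to realize "alignment". Each hypothesis term is then credited only by its single best supporting text term, rather than by an aggregate over the whole text.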
Estimating Lexical Entailment Probabilities from the Web • Web documents are treated as a sample generated by the source • Problem: • truth assignments are not observed • Assumption 3: • a term is true iff it appears in the document • hence $P(\mathrm{Tr}_u = 1 \mid I_v) = P(I_u \mid I_v)$ • estimated from search-engine co-occurrence counts
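Assumption 3 turns estimation into counting. This sketch replaces the search engine with a hypothetical in-memory collection of documents (the submission used AltaVista, which is not reproduced here). It estimates $P(I_u \mid I_v)$ as a plain ratio of co-occurrence counts, $\mathrm{hits}(u \wedge v) / \mathrm{hits}(v)$. Any smoothing the authors may have applied is not shown, and the helper names are illustrative.

```python
# Hypothetical in-memory "web": each document is the set of terms it contains.
DOCS = [
    {"japan", "japanese"},
    {"japan"},
    {"japanese", "tokyo"},
    {"japan", "japanese", "tokyo"},
]


def hits(terms):
    """Stand-in for a search-engine hit count: #documents containing all terms."""
    return sum(1 for doc in DOCS if terms <= doc)


def p_iu_given_iv(u, v):
    """Estimate P(Tr_u = 1 | I_v) = P(I_u | I_v) as hits(u AND v) / hits(v)."""
    n_v = hits({v})
    return hits({u, v}) / n_v if n_v else 0.0


print(p_iu_given_iv("japanese", "japan"))  # 0.6666666666666666
```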
Challenge Submission • Tokenize the text and remove stop words • Collect counts from AltaVista • Classification: • $p = P(\mathrm{Tr}_h = 1 \mid t)$ • t entails h if $p > \lambda$; conf = p • conf = 1 − p for negative classifications • λ tuned on the development set
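The submission pipeline can be sketched roughly as follows. Several details are assumptions made for illustration: the stop-word list, the function names, and the use of dev-set accuracy as the criterion for tuning λ. The slide only states that λ was tuned on the development set.

```python
import re

# Hypothetical stop-word list; the submission's actual list is not given.
STOP_WORDS = {"a", "an", "the", "of", "is", "in", "by", "to", "and", "now"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def classify(p, lam):
    """Return (entails, confidence): conf = p if p > lam, else 1 - p."""
    return (True, p) if p > lam else (False, 1.0 - p)


def tune_lambda(dev):
    """dev: list of (p, gold) pairs. Return the candidate threshold with the
    best dev-set accuracy (accuracy as the criterion is an assumption)."""
    candidates = sorted({0.0} | {p for p, _ in dev})

    def accuracy(lam):
        # A prediction is correct when (p > lam) matches the gold label.
        return sum((p > lam) == gold for p, gold in dev) / len(dev)

    return max(candidates, key=accuracy)


print(tokenize("Robert Mouawad is the Queen of Holland."))
# ['robert', 'mouawad', 'queen', 'holland']
```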
Resulting Alignments • Some good: (Japan, Japanese), (voter, vote) • Some dubious: (turnout, half), (percent, less)
Precision-Recall • High confidence, low precision!!
Did the probs help? • Baseline: $P(w_1 \mid w_2) = \begin{cases} 1 & \text{if } w_1 = w_2 \\ 0 & \text{otherwise} \end{cases}$
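Plugging the baseline into the lexical model shows what the comparison measures. This sketch assumes a product over hypothesis terms combined with a max over text terms, a combination the slides do not confirm. Under that assumption, the baseline gives 1 iff every hypothesis term literally appears in the text, and 0 otherwise. On Example #2070, tokenized by hand with stop words dropped (also an assumption), it scores 1 even though t does not entail h.

```python
def baseline_p(w1, w2):
    """Baseline: P(w1 | w2) = 1 if w1 == w2, else 0 (exact match only)."""
    return 1.0 if w1 == w2 else 0.0


def baseline_entailment_prob(t_terms, h_terms):
    """Plug the baseline into an assumed product-over-h, max-over-t combination:
    the result is 1 iff every hypothesis term literally appears in t, else 0."""
    prob = 1.0
    for u in h_terms:
        prob *= max((baseline_p(u, v) for v in t_terms), default=0.0)
    return prob


# Example #2070 after stop-word removal (tokenization assumed).
t = ["queen", "holland", "owned", "robert", "mouawad"]
h = ["robert", "mouawad", "queen", "holland"]
print(baseline_entailment_prob(t, h))  # 1.0
```

Under this sketch, the co-occurrence probabilities can only change the score for hypothesis terms that do not appear verbatim in t. Pairs such as (Japan, Japanese) are one example.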
Conclusions • Defined a probabilistic setting, as needed for modeling probabilistic entailment • Proposed: t probabilistically entails h if it increases the likelihood that h is true • A concrete probabilistic model • incorporating word co-occurrence statistics • based on the proposed setting • The simple model performs as well as more complex systems!