
Third Recognizing Textual Entailment Challenge




Presentation Transcript


  1. Third Recognizing Textual Entailment Challenge Potential SNeRG Submission

  2. RTE3 Quick Notes • RTE Web Site: http://www.pascal-network.org/Challenges/RTE3/ • Textual Entailment resource pool: http://aclweb.org/aclwiki/index.php?title=Textual_Entailment_Resource_Pool • New development set released last week to correct errors • Test set released on March 5th • !!! New !!! submission date: March 12th • Report deadline: March 26th

  3. Development set examples
  • Example of a YES result:
    <pair id="5" entailment="YES" task="IE" length="short">
      <t>A bus collision with a truck in Uganda has resulted in at least 30 fatalities and has left a further 21 injured.</t>
      <h>30 die in a bus collision in Uganda.</h>
    </pair>
  • Example of a NO result:
    <pair id="20" entailment="NO" task="IE" length="short">
      <t>Blue Mountain Lumber is a subsidiary of Malaysian forestry transnational corporation, Ernslaw One.</t>
      <h>Blue Mountain Lumber owns Ernslaw One.</h>
    </pair>
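A minimal sketch of loading pairs like these for feature extraction, assuming the development set is a single XML file of <pair> elements as shown above; the file name rte3_dev.xml is a placeholder, not the official distribution name.

```python
# Hedged sketch: load RTE-style <pair> elements from a development-set XML file.
# The file name and exact attribute set are assumptions based on the examples above.
import xml.etree.ElementTree as ET

def load_pairs(path="rte3_dev.xml"):
    """Return a list of dicts with id, entailment label, task, length, text and hypothesis."""
    root = ET.parse(path).getroot()
    pairs = []
    for pair in root.iter("pair"):
        pairs.append({
            "id": pair.get("id"),
            "entailment": pair.get("entailment"),   # "YES" / "NO"
            "task": pair.get("task"),               # IE / IR / QA / SUM
            "length": pair.get("length"),           # "short" / "long"
            "text": pair.findtext("t", default="").strip(),
            "hypothesis": pair.findtext("h", default="").strip(),
        })
    return pairs

if __name__ == "__main__":
    for p in load_pairs()[:2]:
        print(p["id"], p["entailment"], "-", p["hypothesis"])
```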

  4. Development set examples – cont. • 4 different types of entailment tasks: Information Retrieval (IR), Question Answering (QA), Information Extraction (IE), Multi-document summarization (SUM) • The development set consists of 200 samples of each task type • 400 pairs evaluate to “YES” and 400 to “NO” • Another attribute, “length”, has only 134 long samples against 666 short. [Note to self: gather a group of demon hunters to hunt down the short samples; will need volunteers and holy water.]

  5. Evaluation
  • Two submissions per team can be made
  • Program output is a file in the following format:
    Line 1: “ranked:<blank space>yes/no”
    Lines 2..end: “pair_id<blank space>judgment”
    For example:
      ranked: yes
      4 YES
      3 YES
      6 YES
      1 NO
      5 NO
      2 NO
  • Accuracy is the proportion of returned judgments that are correct
  • Average precision is determined by the order and correctness of the returned answers, using the formula:
    AvgPrec = (1/R) * sum over i = 1..n of E(i) * (#correct up to pair i) / i
    where n is the number of pairs in the test set, R is the total number of positive pairs in the test set, E(i) is 1 if the i-th pair is positive and 0 otherwise, and i ranges over the pairs ordered by their ranking.
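A small sketch of both halves of this slide: writing a submission file in the required format, and scoring a ranking with the average-precision formula as I read it. The pair IDs and judgments are illustrative, and this is not the official evaluation script.

```python
# Hedged sketch of the submission format and the average-precision formula above.
# Judgments and ranking are illustrative; the official scorer may differ in details.

def write_submission(path, ranked_judgments):
    """ranked_judgments: list of (pair_id, 'YES'/'NO'), ordered by confidence."""
    with open(path, "w") as f:
        f.write("ranked: yes\n")
        for pair_id, judgment in ranked_judgments:
            f.write(f"{pair_id} {judgment}\n")

def average_precision(ranked_gold):
    """ranked_gold: gold labels (True = positive pair), in our ranked order.
    Computes (1/R) * sum_i E(i) * (#correct up to pair i) / i."""
    R = sum(ranked_gold)
    total, positives_so_far = 0.0, 0
    for i, is_positive in enumerate(ranked_gold, start=1):
        if is_positive:                     # E(i) = 1
            positives_so_far += 1           # number of positives up to pair i
            total += positives_so_far / i
    return total / R if R else 0.0

write_submission("results.txt", [(4, "YES"), (3, "YES"), (6, "YES"), (1, "NO"), (5, "NO"), (2, "NO")])
print(average_precision([True, True, True, False, False, False]))   # perfect ranking -> 1.0
```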

  6. Possible Implementation • Discover features that can be measured as a continuous variable. For example: Wordbag match ratio = # of words matched between text and hypothesis / # of words in the hypothesis • Arrange the feature values in a feature vector x • Apply the general multivariate normal density to the assembled feature vector x
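A rough sketch of that pipeline, assuming (for illustration only) two features per pair, one multivariate normal fitted per class on the development set, and a maximum-density decision; the second feature (length ratio) and the toy training vectors are made up, not from the slides.

```python
# Hedged sketch: small feature vector per pair, one multivariate normal per class,
# classify by the higher class density. Toy data and feature choice are assumptions.
import numpy as np
from scipy.stats import multivariate_normal

def features(text, hypothesis):
    t_words = set(text.lower().split())
    h_words = hypothesis.lower().split()
    wordbag_ratio = sum(w in t_words for w in h_words) / max(len(h_words), 1)
    length_ratio = len(h_words) / max(len(text.split()), 1)   # assumed second feature
    return np.array([wordbag_ratio, length_ratio])

def fit_gaussian(rows):
    X = np.vstack(rows)
    # Diagonal loading keeps the covariance invertible on small samples.
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return multivariate_normal(mean=X.mean(axis=0), cov=cov)

# Toy feature vectors; in practice these come from the 800 development pairs.
yes_rows = [np.array([0.85, 0.40]), np.array([0.70, 0.30]), np.array([0.95, 0.55])]
no_rows  = [np.array([0.50, 0.60]), np.array([0.30, 0.45]), np.array([0.80, 0.70])]
g_yes, g_no = fit_gaussian(yes_rows), fit_gaussian(no_rows)

def classify(text, hypothesis):
    x = features(text, hypothesis)
    return "YES" if g_yes.pdf(x) > g_no.pdf(x) else "NO"
```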

  7. Implementation to Determine Baseline • I implemented a baseline to estimate what we can expect from a full implementation of all syntactic features • First baseline result: used one feature, wordbag count > n, where n is chosen after processing the development set. Success: 509, Fail: 290, Final rate: 63.9% • Second baseline result: used simple preprocessing (removing punctuation, case insensitivity, ignoring simple words) plus the wordbag count. Success: 534, Fail: 265, Final rate: 66.8% • Attempted a little semantic processing, such as increasing the weight of “negative” words when returning negative results, but accuracy did not improve • In the RTE2 competition the highest accuracy was only 70%!
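For reference, a sketch of the kind of preprocessing-plus-threshold baseline described here. The stopword list, tokenisation, and the 0.65 threshold are assumptions; the slide only says the threshold is chosen on the development set.

```python
# Hedged sketch of the wordbag baseline: simple preprocessing, count hypothesis
# words that also appear in the text, answer YES above a tuned threshold.
# Stopword list and threshold are illustrative assumptions.
import re

STOPWORDS = {"a", "an", "the", "of", "in", "to", "and", "is", "has"}

def tokens(s):
    return [w for w in re.findall(r"[a-z0-9]+", s.lower()) if w not in STOPWORDS]

def wordbag_judgment(text, hypothesis, threshold=0.65):
    t = set(tokens(text))
    h = tokens(hypothesis)
    ratio = sum(w in t for w in h) / max(len(h), 1)
    return "YES" if ratio > threshold else "NO"

print(wordbag_judgment(
    "A bus collision with a truck in Uganda has resulted in at least 30 fatalities and has left a further 21 injured.",
    "30 die in a bus collision in Uganda."))   # 4 of 5 content words match -> YES
```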

  8. Potential Features
  • Wordbag ratio = # of matches between text and hypothesis / # of words in hypothesis
    Works for:
      <t>A bus collision with a truck in Uganda has resulted in at least 30 fatalities and has left a further 21 injured.</t>
      <h>30 die in a bus collision in Uganda.</h>
      Wordbag ratio = 6 / 8
    Fails for:
      <t>Blue Mountain Lumber is a subsidiary of Malaysian forestry transnational corporation, Ernslaw One.</t>
      <h>Blue Mountain Lumber owns Ernslaw One.</h>
      Wordbag ratio = 5 / 6
  • A potential solution needs to include semantic knowledge about the relationship between the key words (e.g. “subsidiary of” versus “owns”, highlighted in red on the original slide).
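A short sketch of the wordbag ratio on these two pairs. The exact tokenisation behind the 6/8 and 5/6 figures on the slide isn't spelled out, so the counts below may differ slightly; the point is that both pairs score high, which is exactly why the feature alone fails on the NO example.

```python
# Hedged sketch: raw wordbag ratio for the two slide examples. Tokenisation
# details (case, punctuation, stopwords) are assumptions, so exact counts may
# differ from the slide's 6/8 and 5/6.
import re

def wordbag_ratio(text, hypothesis):
    t = set(re.findall(r"\w+", text.lower()))
    h = re.findall(r"\w+", hypothesis.lower())
    return sum(w in t for w in h) / len(h)

yes_pair = ("A bus collision with a truck in Uganda has resulted in at least 30 "
            "fatalities and has left a further 21 injured.",
            "30 die in a bus collision in Uganda.")
no_pair = ("Blue Mountain Lumber is a subsidiary of Malaysian forestry "
           "transnational corporation, Ernslaw One.",
           "Blue Mountain Lumber owns Ernslaw One.")

print(wordbag_ratio(*yes_pair))   # high ratio, correct YES
print(wordbag_ratio(*no_pair))    # also high, so the feature alone gets this NO pair wrong
```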

  9. Potential Features – cont.
  • Word proximity = average distance between matched words in the text
    For example:
      <t>A bus collision with a truck in Uganda has resulted in at least 30 fatalities and has left a further 21 injured.</t>
      <h>30 die in a bus collision in Uganda.</h>
      Matched words: 30, in, bus, collision, in, Uganda
      Distances: 30: 3, 12, 11, 3, 6; in: 3, 5, 4, 1; bus: 12, 5, 1, 5, 6; collision: etc.
  • May not help much or at all, but by adding additional independent features (assumed Gaussian), we can potentially improve the class posterior P(ωn|x)
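One way to turn this into a single number is to average the gap between consecutive matched words in the text. The slide only lists raw per-word distances, so the exact definition below (average gap between consecutive matches) is my assumption.

```python
# Hedged sketch of the word-proximity idea: find text positions of tokens that
# also occur in the hypothesis, then average the gap between consecutive matches.
# Smaller values suggest the matched words cluster together in the text.
def word_proximity(text, hypothesis):
    t_tokens = text.lower().replace(".", "").replace(",", "").split()
    h_words = set(hypothesis.lower().replace(".", "").split())
    positions = [i for i, w in enumerate(t_tokens) if w in h_words]
    if len(positions) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)

text = ("A bus collision with a truck in Uganda has resulted in at least "
        "30 fatalities and has left a further 21 injured.")
hyp = "30 die in a bus collision in Uganda."
print(word_proximity(text, hyp))
```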

  10. Potential Features – cont.
  • Word grouping = count of matched word groups of length 2 (hypothesis bigrams found in the text) / number of possible combinations
    For example:
      <t>A bus collision with a truck in Uganda has resulted in at least 30 fatalities and has left a further 21 injured.</t>
      <h>30 die in a bus collision in Uganda.</h>
      Matched groups: “bus collision”, “in Uganda”; 7 possible combinations = 2/7
      <t>Blue Mountain Lumber is a subsidiary of Malaysian forestry transnational corporation, Ernslaw One.</t>
      <h>Blue Mountain Lumber owns Ernslaw One.</h>
      Matched groups: “Blue Mountain”, “Mountain Lumber”, “Ernslaw One”; 5 possible combinations = 3/5
  • Once again this may not help much or at all, but it may help us brainstorm a bit
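A small sketch of the word-grouping feature as a bigram-overlap ratio; tokenisation is an assumption, but on the second pair it reproduces the 3-of-5 figure from the slide.

```python
# Hedged sketch of the word-grouping feature: hypothesis bigrams that also
# appear in the text, divided by the number of possible hypothesis bigrams.
def bigrams(words):
    return list(zip(words, words[1:]))

def word_grouping(text, hypothesis):
    t = text.lower().replace(",", "").replace(".", "").split()
    h = hypothesis.lower().replace(",", "").replace(".", "").split()
    t_bigrams = set(bigrams(t))
    h_bigrams = bigrams(h)
    if not h_bigrams:
        return 0.0
    return sum(b in t_bigrams for b in h_bigrams) / len(h_bigrams)

print(word_grouping(
    "Blue Mountain Lumber is a subsidiary of Malaysian forestry transnational corporation, Ernslaw One.",
    "Blue Mountain Lumber owns Ernslaw One."))   # 3 of 5 bigrams match, as on the slide
```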

  11. Potential Features – cont.
  • Quick and easy stats we can generate may include:
    • Stemmers – count matching verbs?
    • Synonyms/antonyms – count matches of both types
    • Parts of speech – brainstorm, anyone?
    • Removal or weighting of names and place-names – collapse a multi-word “match” into a single symbol so as not to give extra weight to names or place-names
    • Matching phrases that appear similar in both the text and the hypothesis
  • Any “count” that can be created from semantic or syntactic processing can be used as a feature (see the sketch below)
  • I am implementing in Matlab, so any Unix program can be used to process a feature – maybe someone knows of an existing feature-extraction Unix command-line program
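As one possible route to a couple of these counts: the slide commits to Matlab plus external tools, so treat this Python/NLTK sketch purely as an illustration; it also assumes the NLTK WordNet data package is installed.

```python
# Hedged sketch using NLTK for two of the ideas above: stem-level matches, and
# hypothesis words that are identical or synonymous (via WordNet) to a text word.
# Requires the NLTK WordNet data; an illustration, not the tooling the slides plan to use.
from nltk.stem import PorterStemmer
from nltk.corpus import wordnet as wn

stemmer = PorterStemmer()

def stem_matches(text_words, hyp_words):
    t_stems = {stemmer.stem(w) for w in text_words}
    return sum(stemmer.stem(w) in t_stems for w in hyp_words)

def synonym_matches(text_words, hyp_words):
    t_lemmas = set()
    for w in text_words:
        for syn in wn.synsets(w):
            t_lemmas.update(l.name().lower() for l in syn.lemmas())
    return sum(w in t_lemmas for w in hyp_words)

text = "a bus collision with a truck in uganda has resulted in at least 30 fatalities".split()
hyp = "30 die in a bus collision in uganda".split()
print(stem_matches(text, hyp), synonym_matches(text, hyp))
```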

  12. RTE3 Important Dates • Test set released on March 5th • Gives us one week before the submission deadline • Last day to submit is March 12th • Submission consists of running the system on the test data yourself and then submitting the result file • A cheater says whaaaa? • Technical report deadline March 26th • I will be working on this on and off until March 6th, then I can devote full time to our submission
