
Tree Mining and Textual Entailment




Presentation Transcript


  1. Sophia Katrenko, AID project, Human Computer Studies Laboratory, IvI, katrenko@science.uva.nl — Tree Mining and Textual Entailment

  2. Outline • Task statement • Tree mining: methods • Why textual entailment? • Experiments • Discussion

  3. Task statement • While learning ontologies, we focus on two subtasks: concept learning (which can be cast as finding a hypothesis h such that, for a given data set X, the classes Y are obtained) and relation learning

  4. Methods to learn instances & relations from text • Make use of context (hand-written patterns, statistically significant co-occurrence of terms, etc.) (Ch. Khoo) • Sometimes use background knowledge (Cl. Nédellec) • Look for certain syntactic relations (e.g., subject-verb-object) (M.-L. Reinberger)
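The pattern-based approach in the first bullet can be sketched with a hand-written Hearst-style pattern; the regular expression, the example sentence, and the `extract_instances` helper below are illustrative assumptions, not material from the slides:

```python
import re

# A minimal sketch of context-based instance extraction with one
# hand-written "X such as Y1, Y2 and Y3" pattern (illustrative only).
PATTERN = re.compile(r"(\w+) such as ((?:\w+(?:, | and ))*\w+)")

def extract_instances(sentence):
    """Return (concept, [instances]) pairs matched by the pattern."""
    results = []
    for match in PATTERN.finditer(sentence):
        concept = match.group(1)
        instances = re.split(r", | and ", match.group(2))
        results.append((concept, instances))
    return results

print(extract_instances(
    "We studied bitter compounds such as caffeine, quinine and naringin."))
```

Real systems use many such patterns plus frequency statistics; a single regular expression is only the simplest instance of the idea.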

  5. From another perspective… • What do these two pictures have in common? (Slide figure: Scottish handwriting, 17th century) Complex structure!

  6. Text as a complex structure • Text can be analyzed on various levels, including semantic, syntactic, morphological etc. • From the syntactic point of view, text can be represented either by constituency structure or by dependency trees. Constituency: [S [John] [bought] [a new car]] Dependency: bought → John, bought → car, car → a, car → new
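The two representations can be sketched in a few lines; the relation names (subj, obj, det, mod) are assumptions for illustration, not parser output:

```python
# Constituency: nested brackets for "John bought a new car"
constituency = ["S", ["NP", "John"], ["VP", "bought", ["NP", "a", "new", "car"]]]

# Dependency: each word points to its head with a grammatical relation
# (relation names are illustrative assumptions)
dependency = {
    "John":   ("bought", "subj"),
    "car":    ("bought", "obj"),
    "a":      ("car", "det"),
    "new":    ("car", "mod"),
    "bought": (None, "root"),
}

def children(head):
    """Collect the dependents of a head word, sorted for stable output."""
    return sorted(w for w, (h, _) in dependency.items() if h == head)

print(children("bought"))  # dependents of the root verb
print(children("car"))
```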

  7. Motivation • Given syntactically analyzed text where each sentence is represented by its dependency tree, the same approach can be applied to both concept learning and relation learning • When working on tree structures, it is necessary to discover which types of subtrees contribute most to the learning task

  8. Mining on trees: example • “…for the presence of the bitter compounds in fractions F15, F17 …” • Frequent subtrees as local context are useful for concept learning • Combinations of frequent subtrees are crucial for relation discovery (Slide figure: the dependency tree of the fragment, with nodes presence, of, compounds, the, bitter, in, fractions, F15, F17)

  9. Step further… Trees: why mining? • Tree mining is an intermediate step which allows for frequent subtree discovery • Idea: trees can be compared in order to find highly similar structures • While looking for the most frequent subtrees, we can relax the restrictions on how similar two subtrees should be (Slide figure: two example trees, Tree 1 and Tree 2, with node labels A, B, C, D, E, F, H, L)
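The comparison idea can be illustrated with a simplified sketch that overlays two rooted ordered trees from their roots and counts matching nodes. The node labels follow the slide's figure; the greedy in-order matching of children is a deliberate simplification of induced-subtree mining, not the actual algorithm:

```python
# Trees are (label, [children]) tuples; comparison is a simplification
# of rooted ordered induced subtree matching, for illustration only.

def common_size(t1, t2):
    """Number of matching nodes when overlaying t1 and t2 from the
    roots, pairing children greedily in order."""
    label1, kids1 = t1
    label2, kids2 = t2
    if label1 != label2:
        return 0
    size = 1
    for k1, k2 in zip(kids1, kids2):
        size += common_size(k1, k2)
    return size

tree1 = ("A", [("C", [("D", []), ("F", [])]), ("B", [])])
tree2 = ("A", [("C", [("D", []), ("E", [])]), ("H", [])])
print(common_size(tree1, tree2))  # shared structure A -> C -> D
```

A full miner would also consider subtrees not rooted at the sentence root and allow gaps between matched children, which is where the relaxed similarity restrictions mentioned above come in.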

  10. Experiments: textual entailment • Textual entailment recognition is the task of deciding, given two text fragments, whether the meaning of one text is entailed (can be inferred) from another text

<pair id="1989" value="FALSE" task="CD">
<t>Intel has decided to push back the launch date for its 4-GHz Pentium 4 desktop processor to the first quarter of 2005.</t>
<h>Intel would raise the clock speed of the company’s flagship Pentium 4 processor to 4 GHz.</h>
</pair>

<pair id="1652" value="TRUE" task="IE">
<t>Rome is in Lazio province and Naples in Campania.</t>
<h>Rome is located in Lazio province.</h>
</pair>
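Pairs in this format can be read with a few lines of standard-library Python; the snippet below wraps the second example pair from the slide in a root element to keep the XML well-formed:

```python
import xml.etree.ElementTree as ET

# One RTE pair from the slide, wrapped in a root element.
XML = """<pairs>
<pair id="1652" value="TRUE" task="IE">
<t>Rome is in Lazio province and Naples in Campania.</t>
<h>Rome is located in Lazio province.</h>
</pair>
</pairs>"""

pairs = []
for pair in ET.fromstring(XML).findall("pair"):
    pairs.append((pair.get("id"),
                  pair.get("value") == "TRUE",   # gold entailment label
                  pair.findtext("t").strip(),    # text
                  pair.findtext("h").strip()))   # hypothesis
print(pairs)
```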

  11. Methodology • Data • Dependency parsing • Depth-first search (DFS, preorder) • Rooted ordered induced tree mining • Setting thresholds • Evaluation

  12. Data • Data sets come from the “Recognizing Textual Entailment” challenge (http://www.pascal-network.org/Challenges/RTE/ ) • The two development sets consist of 279 and 284 sentence pairs, respectively • The test set contains 792 instances

  13. Data preprocessing • Each pair of sentences has been parsed by Minipar (Dekang Lin) • Each dependency tree has been transformed by incorporating edge labels into node labels • Each transformed tree has been represented in preorder (DFS order)
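The last two preprocessing steps can be sketched as follows; the toy tree, the `edge:word` label format, and the relation names are illustrative assumptions, not actual Minipar output:

```python
# Fold each edge label into its child's node label, then list the
# nodes in depth-first preorder. Trees are (word, [(edge, child)]) tuples.

def preorder(node, edge_label="root"):
    """Yield 'edge:word' strings in depth-first preorder."""
    word, children = node
    yield f"{edge_label}:{word}"
    for child_edge, child in children:
        yield from preorder(child, child_edge)

tree = ("bought", [("subj", ("John", [])),
                   ("obj", ("car", [("det", ("a", [])),
                                    ("mod", ("new", []))]))])
print(list(preorder(tree)))
```

Merging edge labels into node labels lets a plain labeled-tree miner see grammatical functions without any changes to the mining algorithm.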

  14. Thresholds • Provided two sentences (trees, consequently) S1 and S2, where n1 = |S1| and n2 = |S2|, let the size of the rooted maximal induced subtree be m. I define the similarity score as a ratio
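The exact normalisation of the ratio is not legible in the transcript; as a sketch, assuming the maximal-subtree size m is divided by the larger of the two tree sizes and compared against an illustrative threshold:

```python
# Assumption: score = m / max(n1, n2); the slide's actual formula and
# threshold value were lost in the transcript.

def similarity(n1, n2, m):
    """Similarity of two trees of sizes n1, n2 whose maximal common
    rooted induced subtree has size m (assumed normalisation)."""
    return m / max(n1, n2)

def entails(n1, n2, m, threshold=0.8):
    """Decide entailment by thresholding the score (value illustrative)."""
    return similarity(n1, n2, m) >= threshold

print(similarity(10, 8, 6))  # 0.6
```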

  15. Does it work? or Distinction between TE and concept/relation learning • TE relies on the matching between two trees, whereas for concept and relation learning it is important to look for frequent tree patterns • It is crucial to have a large collection for the latter. For TE, the size of the collection does not play a significant role

  16. Possible extensions for TE • Use synonyms/antonyms from WordNet • Propose a method of handling situations where there are several maximal subtrees • Discard syntactic functions for the mining. In most cases, combining a word and its syntactic function in one label leads to data sparseness • Combine lexical entailment (e.g., word similarity) with methods using syntactic information

  17. Bedankt! (Thank you!)
