
Tree Mining and Textual Entailment




Presentation Transcript


  1. Sophia Katrenko, AID project, Human Computer Studies Laboratory, IvI, katrenko@science.uva.nl — Tree Mining and Textual Entailment

  2. Outline • Task statement • Tree mining: methods • Why textual entailment? • Experiments • Discussion

  3. Task statement • While learning ontologies, we focus on two subtasks: concept learning (which can be cast as finding a hypothesis h such that, for a given data set X, the classes Y are obtained) and relation learning

  4. Methods to learn instances & relations from text • Make use of context (hand-written patterns, statistically significant co-occurrence of terms, etc.) (Ch. Khoo) • Sometimes use background knowledge (Cl. Nédellec) • Look for certain syntactic relations (e.g., subject-verb-object) (M.-L. Reinberger)
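The pattern-based approach in the first bullet can be sketched with a hand-written Hearst-style pattern; the regular expression, the example sentence, and the `extract_instances` helper below are illustrative assumptions, not material from the slides:

```python
import re

# A minimal sketch of context-based instance extraction with one
# hand-written "X such as Y1, Y2 and Y3" pattern (illustrative only).
PATTERN = re.compile(r"(\w+) such as ((?:\w+(?:, | and ))*\w+)")

def extract_instances(sentence):
    """Return (concept, [instances]) pairs matched by the pattern."""
    results = []
    for match in PATTERN.finditer(sentence):
        concept = match.group(1)
        instances = re.split(r", | and ", match.group(2))
        results.append((concept, instances))
    return results

print(extract_instances(
    "We studied bitter compounds such as caffeine, quinine and naringin."))
```

Real systems use many such patterns plus frequency statistics; a single regular expression is only the simplest instance of the idea.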

  5. From another perspective… • What do these two pictures have in common? (Slide figure: Scottish handwriting, 17th century) Complex structure!

  6. Text as a complex structure • Text can be analyzed on various levels, including semantic, syntactic, morphological etc. • From the syntactic point of view, text can be represented either by constituency structure or by dependency trees. Constituency: [S [John] [bought] [a new car]] Dependency: bought → John, bought → car, car → a, car → new
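The two representations can be sketched in a few lines; the relation names (subj, obj, det, mod) are assumptions for illustration, not parser output:

```python
# Constituency: nested brackets for "John bought a new car"
constituency = ["S", ["NP", "John"], ["VP", "bought", ["NP", "a", "new", "car"]]]

# Dependency: each word points to its head with a grammatical relation
# (relation names are illustrative assumptions)
dependency = {
    "John":   ("bought", "subj"),
    "car":    ("bought", "obj"),
    "a":      ("car", "det"),
    "new":    ("car", "mod"),
    "bought": (None, "root"),
}

def children(head):
    """Collect the dependents of a head word, sorted for stable output."""
    return sorted(w for w, (h, _) in dependency.items() if h == head)

print(children("bought"))  # dependents of the root verb
print(children("car"))
```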

  7. Motivation • Given syntactically analyzed text where each sentence is represented by its dependency tree, the same approach can be applied to both concept learning and relation learning • When working on tree structures, it is necessary to discover which types of subtrees contribute most to the learning task

  8. Mining on trees: example • “…for the presence of the bitter compounds in fractions F15, F17 …” • Frequent subtrees as local context are useful for concept learning • Combinations of frequent subtrees are crucial for relation discovery (Slide figure: the dependency tree of the fragment, with nodes presence, of, compounds, the, bitter, in, fractions, F15, F17)

  9. Step further… Trees: why mining? • Tree mining is an intermediate step which allows for frequent subtree discovery • Idea: trees can be compared in order to find highly similar structures • While looking for the most frequent subtrees, we can relax the restrictions on how similar two subtrees should be (Slide figure: two example trees, Tree 1 and Tree 2, with node labels A, B, C, D, E, F, H, L)
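The comparison idea can be illustrated with a simplified sketch that overlays two rooted ordered trees from their roots and counts matching nodes. The node labels follow the slide's figure; the greedy in-order matching of children is a deliberate simplification of induced-subtree mining, not the actual algorithm:

```python
# Trees are (label, [children]) tuples; comparison is a simplification
# of rooted ordered induced subtree matching, for illustration only.

def common_size(t1, t2):
    """Number of matching nodes when overlaying t1 and t2 from the
    roots, pairing children greedily in order."""
    label1, kids1 = t1
    label2, kids2 = t2
    if label1 != label2:
        return 0
    size = 1
    for k1, k2 in zip(kids1, kids2):
        size += common_size(k1, k2)
    return size

tree1 = ("A", [("C", [("D", []), ("F", [])]), ("B", [])])
tree2 = ("A", [("C", [("D", []), ("E", [])]), ("H", [])])
print(common_size(tree1, tree2))  # shared structure A -> C -> D
```

A full miner would also consider subtrees not rooted at the sentence root and allow gaps between matched children, which is where the relaxed similarity restrictions mentioned above come in.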

  10. Experiments: textual entailment • Textual entailment recognition is the task of deciding, given two text fragments, whether the meaning of one text is entailed (can be inferred) from another text

<pair id="1989" value="FALSE" task="CD">
<t>Intel has decided to push back the launch date for its 4-GHz Pentium 4 desktop processor to the first quarter of 2005.</t>
<h>Intel would raise the clock speed of the company’s flagship Pentium 4 processor to 4 GHz.</h>
</pair>

<pair id="1652" value="TRUE" task="IE">
<t>Rome is in Lazio province and Naples in Campania.</t>
<h>Rome is located in Lazio province.</h>
</pair>
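Pairs in this format can be read with a few lines of standard-library Python; the snippet below wraps the second example pair from the slide in a root element to keep the XML well-formed:

```python
import xml.etree.ElementTree as ET

# One RTE pair from the slide, wrapped in a root element.
XML = """<pairs>
<pair id="1652" value="TRUE" task="IE">
<t>Rome is in Lazio province and Naples in Campania.</t>
<h>Rome is located in Lazio province.</h>
</pair>
</pairs>"""

pairs = []
for pair in ET.fromstring(XML).findall("pair"):
    pairs.append((pair.get("id"),
                  pair.get("value") == "TRUE",   # gold entailment label
                  pair.findtext("t").strip(),    # text
                  pair.findtext("h").strip()))   # hypothesis
print(pairs)
```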

  11. Methodology • Data • Dependency parsing • Depth-first search (DFS, preorder) • Rooted ordered induced tree mining • Setting thresholds • Evaluation

  12. Data • Data sets come from the “Recognizing Textual Entailment” challenge (http://www.pascal-network.org/Challenges/RTE/ ) • The two development sets consist of 279 and 284 sentence pairs, respectively • The test set contains 792 instances

  13. Data preprocessing • Each pair of sentences has been parsed by Minipar (Dekang Lin) • Each dependency tree has been transformed by incorporating edge labels into node labels • Each transformed tree has been represented in preorder (DFS order)
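The last two preprocessing steps can be sketched as follows; the toy tree, the `edge:word` label format, and the relation names are illustrative assumptions, not actual Minipar output:

```python
# Fold each edge label into its child's node label, then list the
# nodes in depth-first preorder. Trees are (word, [(edge, child)]) tuples.

def preorder(node, edge_label="root"):
    """Yield 'edge:word' strings in depth-first preorder."""
    word, children = node
    yield f"{edge_label}:{word}"
    for child_edge, child in children:
        yield from preorder(child, child_edge)

tree = ("bought", [("subj", ("John", [])),
                   ("obj", ("car", [("det", ("a", [])),
                                    ("mod", ("new", []))]))])
print(list(preorder(tree)))
```

Merging edge labels into node labels lets a plain labeled-tree miner see grammatical functions without any changes to the mining algorithm.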

  14. Thresholds • Provided two sentences (trees, consequently) S1 and S2, where n1 = |S1| and n2 = |S2|, let the size of the rooted maximal induced subtree be m. I define the similarity score as a ratio
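The exact normalisation of the ratio is not legible in the transcript; as a sketch, assuming the maximal-subtree size m is divided by the larger of the two tree sizes and compared against an illustrative threshold:

```python
# Assumption: score = m / max(n1, n2); the slide's actual formula and
# threshold value were lost in the transcript.

def similarity(n1, n2, m):
    """Similarity of two trees of sizes n1, n2 whose maximal common
    rooted induced subtree has size m (assumed normalisation)."""
    return m / max(n1, n2)

def entails(n1, n2, m, threshold=0.8):
    """Decide entailment by thresholding the score (value illustrative)."""
    return similarity(n1, n2, m) >= threshold

print(similarity(10, 8, 6))  # 0.6
```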

  15. Does it work? or Distinction between TE and concept/relation learning • TE relies on the matching between two trees, whereas for concept and relation learning it is important to look for frequent tree patterns • It is crucial to have a large collection for the latter. For TE, the size of the collection does not play a significant role

  16. Possible extensions for TE • Use synonyms/antonyms from WordNet • Propose a method of handling situations where there are several maximal subtrees • Discard syntactic functions for the mining. In most cases, combining a word and its syntactic function in one label leads to data sparseness • Combine lexical entailment (e.g., word similarity) with methods using syntactic information

  17. Bedankt! (Thank you!)
