
LEILA – Learning to Extract Information by Linguistic Analysis

Presented at the 2nd Workshop on Ontology Learning and Population (OLP2). Fabian M. Suchanek, Georgiana Ifrim, Gerhard Weikum (Max-Planck Institute for Computer Science, Saarbrücken, Germany).


Presentation Transcript


  1. LEILA – Learning to Extract Information by Linguistic Analysis, presented at the 2nd Workshop on Ontology Learning and Population (OLP2). Fabian M. Suchanek, Georgiana Ifrim, Gerhard Weikum (Max-Planck Institute for Computer Science, Saarbrücken, Germany)

  2. Overview
     - Motivation
     - The LEILA System
     - Plan of Attack
     - System Architecture
     - Experiments
     - Conclusion

  3. Motivation. [Figure: search for "Meat dish" with the buttons "Google Search" and "I'm feeling hungry", followed by a question mark and the page text: "This page has been created to enlighten the public about the Wiener Schnitzel. [...]"]

  4. Motivation. To know that a Schnitzel is a meat dish, we need an ontology.
     - Use hand-crafted ontologies (like WordNet) (but: low coverage, high cost, fast aging)
     - Or: Gather ontological data from Web documents

  5. Goal. Given
     - a binary target relation (e.g. subclassOf)
     - a set of Web documents
     extract all pairs of entities that are in the target relation.

  6. Related Work. Learn text patterns (e.g. Soderland, Chakrabarti, KnowItAll). Pattern: X is a Y. Sentence: "A Schnitzel is a meat dish from Austria."

  7. Related Work. Learn text patterns (e.g. Soderland, Chakrabarti, KnowItAll). Pattern: X is a Y. Sentence: "A Schnitzel, also called Wiener Schnitzel, is a meat dish."

  8. Related Work. Learn text patterns (e.g. Soderland, Chakrabarti, KnowItAll). Sentence: "A Schnitzel, also called Wiener Schnitzel, is a meat dish." [Annotation: a "Subject" bracket and an "Obj" bracket above the sentence.] Idea: Learn linguistic patterns!
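
The contrast between the two example sentences on slides 6 to 8 can be illustrated with a short sketch. The regular expression below is a hypothetical surface pattern for "X is a Y", written only for this illustration; it is not taken from Soderland, Chakrabarti, KnowItAll or LEILA.

```python
import re

# Hypothetical surface pattern "A X is a Y" (an assumption for illustration):
# X is one word, Y is one or two words.
IS_A = re.compile(r"^An? (?P<x>\w+) is an? (?P<y>\w+(?: \w+)?)")

def match_is_a(sentence):
    """Return (X, Y) if the sentence starts with 'A X is a Y', else None."""
    m = IS_A.match(sentence)
    return (m.group("x"), m.group("y")) if m else None

plain = "A Schnitzel is a meat dish from Austria."
apposition = "A Schnitzel, also called Wiener Schnitzel, is a meat dish."

plain_result = match_is_a(plain)            # the surface pattern fires
apposition_result = match_is_a(apposition)  # the inserted clause blocks it
```

The second sentence states the same fact, but the words between subject and verb break the literal pattern, which is the gap that linguistic (parse-based) patterns are meant to close.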

  9. Plan of Attack. [Diagram labels: subclassOf (target relation), Web documents, output pairs]

  10. Preprocessing. "The Schnitzel (0.0314946089 stones) is best enjoyed with Ösibräu."

  11. Preprocessing. "The Schnitzel (200g) is best enjoyed with Ösibräu."

  12. Preprocessing. "The Schnitzel (200g) is best enjoyed with Oesibraeu."

  13. Preprocessing. "The Schnitzel is best enjoyed with Oesibraeu." and "The Schnitzel ( 200 g )"
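
The three rewrites shown on slides 10 to 13 (unit normalization, umlaut transliteration, lifting the parenthetical out of the sentence) might be sketched as follows. This is only an approximation under stated assumptions: the function names, the single supported unit and the naive treatment of parentheses are all invented for the illustration and say nothing about how LEILA implements these steps.

```python
import re

GRAMS_PER_STONE = 6350.29318  # 1 stone = 14 lb = 6.35029318 kg

# Transliteration table for German umlauts (assumed scope: German only).
UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue",
                         "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"})

def normalize_units(text):
    """Rewrite '<number> stones' as '<rounded grams>g'."""
    def to_grams(m):
        return f"{round(float(m.group(1)) * GRAMS_PER_STONE)}g"
    return re.sub(r"(\d+(?:\.\d+)?) stones", to_grams, text)

def transliterate(text):
    """Replace umlauts by their ASCII digraphs."""
    return text.translate(UMLAUTS)

def split_parentheticals(text):
    """Return the sentence without '(...)' groups, plus one fragment per
    group consisting of everything up to and including that group."""
    main = re.sub(r"\s*\([^()]*\)", "", text)
    fragments = [text[:m.end()] for m in re.finditer(r"\([^()]*\)", text)]
    return main, fragments

def preprocess(text):
    return split_parentheticals(transliterate(normalize_units(text)))

main, fragments = preprocess(
    "The Schnitzel (0.0314946089 stones) is best enjoyed with Ösibräu.")
```

Here `main` holds the sentence without the parenthetical and `fragments` holds the lifted part, mirroring the two strings on slide 13 (without the extra tokenization of "( 200 g )").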

  14. Preprocessing. [Parse diagram: links labelled det, subj, adv, participle, mod and comp over "The Schnitzel is best enjoyed with Oesibraeu."; links labelled adj over "The Schnitzel ( 200 g )"]

  15. Preprocessing. [Diagram labels: subclassOf (target relation), Web documents, output pairs]

  16. Algorithm. [Diagram labels: +, -, seed pairs, Web documents, output pairs] Sentence: "A dog is a mammal."

  17. Algorithm. Positive patterns: "A X is a Y." Sentence: "This dog is a nag."

  18. Algorithm. Positive patterns: "A X is a Y." Negative patterns: "This X is a Y."

  19. Algorithm. Sentence: "A Schnitzel is a meat dish." Generalized positive patterns: "A X is a Y."
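
The loop on slides 16 to 19 (seed pairs yield positive and negative patterns, positive patterns yield new pairs) can be sketched on plain strings. This is a deliberate simplification: LEILA works on linguistic structures and a statistical learner, whereas the sketch below uses string patterns and simply drops any positive pattern that also occurs as a negative one. All function names are hypothetical.

```python
import re

# Placeholders match one or more space-separated words.
SLOTS = {"X": r"(?P<x>\w+(?: \w+)*)", "Y": r"(?P<y>\w+(?: \w+)*)"}

def abstract(sentence, x, y):
    """Replace the pair's words by X and Y; None if either word is absent."""
    px = re.compile(rf"\b{re.escape(x)}\b")
    py = re.compile(rf"\b{re.escape(y)}\b")
    if not (px.search(sentence) and py.search(sentence)):
        return None
    return py.sub("Y", px.sub("X", sentence, count=1), count=1)

def match(pattern, sentence):
    """Return the (X, Y) binding if the whole sentence fits the pattern."""
    parts = re.split(r"\b(X|Y)\b", pattern)
    regex = "".join(SLOTS.get(p, re.escape(p)) for p in parts)
    m = re.fullmatch(regex, sentence)
    return (m.group("x"), m.group("y")) if m else None

def learn(sentences, examples, counterexamples):
    """Collect patterns from seed pairs; keep positives never seen as negatives."""
    positive, negative = set(), set()
    for s in sentences:
        for pairs, bucket in ((examples, positive), (counterexamples, negative)):
            for x, y in pairs:
                p = abstract(s, x, y)
                if p:
                    bucket.add(p)
    return positive - negative

def extract(sentences, patterns):
    """Apply every learned pattern to every sentence."""
    hits = (match(p, s) for s in sentences for p in patterns)
    return {pair for pair in hits if pair}

patterns = learn(["A dog is a mammal.", "This dog is a nag."],
                 examples={("dog", "mammal")},
                 counterexamples={("dog", "nag")})
pairs = extract(["A Schnitzel is a meat dish.", "This Schnitzel is a joke."],
                patterns)
```

With the slide's sentences, the only surviving pattern is "A X is a Y.", and it binds X to "Schnitzel" and Y to "meat dish" in the new sentence.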

  20. LEILA: System Architecture. [Diagram components: Web documents; seed pairs; seed pair data sets; preprocessing, stemming; LinkParser (Sleator, CMU); LEILA; learner: kNN, SVMLight (Joachims, Cornell U); output pairs]

  21. Gold Standard for Evaluation. Sentence: "A Schnitzel is practically vitamin-free and thus the meat dish is extremely popular in Europe." Ideal pairs: (Schnitzel, meat dish)
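
Given a set of ideal pairs as the gold standard, the precision and recall figures reported on the following slides can be computed in the usual way. The sketch below assumes the standard set-based definitions; the slides do not spell out the exact evaluation procedure, and the sample output pairs are made up for the illustration.

```python
def precision_recall(output_pairs, ideal_pairs):
    """Standard set-based precision and recall against a gold standard."""
    output_pairs, ideal_pairs = set(output_pairs), set(ideal_pairs)
    correct = output_pairs & ideal_pairs
    precision = len(correct) / len(output_pairs) if output_pairs else 0.0
    recall = len(correct) / len(ideal_pairs) if ideal_pairs else 0.0
    return precision, recall

ideal = {("Schnitzel", "meat dish")}
# Hypothetical system output: one correct pair and one spurious pair.
output = {("Schnitzel", "meat dish"), ("dog", "nag")}

precision, recall = precision_recall(output, ideal)
```

In this toy case half of the output is correct (precision 0.5) and every ideal pair was found (recall 1.0).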

  22. Results with different relations: birthDate. Seed pairs are given by a function that decides whether a word pair is
      - an example (here: list of birth dates from www.famousbirthdays.com)
      - a counterexample (here: can be deduced from examples)
      - a candidate (here: all pairs of a name and a date)
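
A seed function of this kind for birthDate might look like the sketch below. The small dictionary is a hypothetical stand-in for the birth-date list named on the slide, and the counterexample rule (a known person paired with any other date) is one assumed reading of "can be deduced from examples".

```python
from enum import Enum

class Label(Enum):
    EXAMPLE = "example"
    COUNTEREXAMPLE = "counterexample"
    CANDIDATE = "candidate"

# Hypothetical stand-in for the list of known birth dates.
KNOWN_BIRTH_YEARS = {
    "Wolfgang Amadeus Mozart": "1756",
    "Ludwig van Beethoven": "1770",
}

def classify(name, date, known=KNOWN_BIRTH_YEARS):
    """Label a (name, date) pair as example, counterexample or candidate."""
    if name not in known:
        return Label.CANDIDATE        # nothing known: left for the system to judge
    if known[name] == date:
        return Label.EXAMPLE          # matches the known birth date
    return Label.COUNTEREXAMPLE       # known person, different date (assumed rule)

labels = [
    classify("Wolfgang Amadeus Mozart", "1756"),
    classify("Wolfgang Amadeus Mozart", "1791"),
    classify("Franz Schubert", "1797"),
]
```

The three calls cover the three cases from the slide: a listed birth date, a different date for a listed person, and a person not in the list.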

  23. Results with different relations
      | Target relation | Corpus          | Precision | Recall   |
      |-----------------|-----------------|-----------|----------|
      | birthDate       | Wikip composers | 79% ± 8%  | 70% ± 9% |
      Patterns: X (born in Y), X was born in Y, ...
      (see paper for details on the experiments)

  24. Results with different relations
      | Target relation | Corpus          | Precision | Recall   |
      |-----------------|-----------------|-----------|----------|
      | birthDate       | Wikip composers | 79% ± 8%  | 70% ± 9% |
      | synonymy        | Wikip geography | 73% ± 7%  | 64% ± 7% |
      Examples: all WordNet synsets
      Counterexamples: all words that are not in a synset
      Candidates: all pairs of proper names
      Patterns: X or Y, X (or Y), ...

  25. Results with different relations
      | Target relation | Corpus          | Precision | Recall   |
      |-----------------|-----------------|-----------|----------|
      | birthDate       | Wikip composers | 79% ± 8%  | 70% ± 9% |
      | synonymy        | Wikip geography | 73% ± 7%  | 64% ± 7% |
      | instanceOf      | Wikip composers | 58% ± 3%  | 41% ± 3% |
      Examples: all direct WordNet hyponyms
      Counterexamples: all words that are not hyponyms of each other
      Candidates: all pairs of a proper name and a WordNet concept
      Patterns: an X is a Y, X is unusual among the Y, ...

  26. Results with different relations
      | Target relation | Corpus           | Precision | Recall   |
      |-----------------|------------------|-----------|----------|
      | birthDate       | Wikip composers  | 79% ± 8%  | 70% ± 9% |
      | synonymy        | Wikip geography  | 73% ± 7%  | 64% ± 7% |
      | instanceOf      | Wikip composers  | 58% ± 3%  | 41% ± 3% |
      |                 | Wikip random     | 33% ± 3%  | 33% ± 3% |
      |                 | Google composers | 28% ± 3%  | 17% ± 2% |
      (see paper for details on the experiments)

  27. Results with different competitors. [Bar chart: precision and recall in %, LEILA in red, compared with Snowball (headquarters, Snowball's corpus), TextToOnto and Text2Onto (instanceOf, Wikip composers), the CV-System (instanceOf, CV's corpus) and the CV-System (instanceOf, Wikip composers)] (see paper for explanations, conditions and details!)

  28. Conclusion. Our system LEILA
      - can learn arbitrary binary relations from Web documents
      - uses a deep linguistic analysis
      - compares favorably with other systems
      See http://www.mpi-inf.de/~suchanek

  29. Results with different competitors
      | System     | Relation     | Corpus          | Precision | Recall   |
      |------------|--------------|-----------------|-----------|----------|
      | Snowball   | headquarters | Snowball's      | 34% ± 8%  | 30% ± 7% |
      | LEILA      | headquarters | Snowball's      | 90% ± 6%  | 50% ± 7% |
      | TextToOnto | instanceOf   | Wikip composers | 39% ± 9%  | 4% ± 1%  |
      | Text2Onto  | instanceOf   | Wikip composers | 50%       | 2% ± 1%  |
      | LEILA      | instanceOf   | Wikip composers | 58% ± 3%  | 41% ± 3% |
      | CV-System  | instanceOf   | CV's            | 32% ± 5%  | 32% ± 5% |
      | LEILA      | instanceOf   | CV's            | 26% ± 7%  | 15% ± 4% |
      | CV-System  | instanceOf   | Wikip composers | 22%       | 4% ± 2%  |
      | LEILA      | instanceOf   | Wikip composers | 58% ± 3%  | 41% ± 3% |
      (see paper for explanations, conditions and details!)

  30. Pattern Generalization – kNN. [Diagram: the new pattern "A X is a big Y" among the labelled patterns "A X is a Y." (+), "This X is a Y." (-) and "X such as Y" (+)] (See our paper at KDD for details)
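
The idea of labelling a new pattern by its nearest labelled neighbours can be sketched as below. The similarity measure (Jaccard overlap of word sets) and the function names are assumptions chosen for brevity; the slides refer to the KDD paper for the features and distance that LEILA actually uses.

```python
from collections import Counter

def jaccard(a, b):
    """Word-set overlap between two pattern strings (assumed similarity)."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

def knn_label(pattern, labeled, k=1):
    """Majority label among the k labelled patterns most similar to `pattern`."""
    nearest = sorted(labeled,
                     key=lambda item: jaccard(pattern, item[0]),
                     reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Labelled patterns from the slide (trailing periods dropped for tokenization).
labeled = [("A X is a Y", "+"), ("This X is a Y", "-"), ("X such as Y", "+")]

label = knn_label("A X is a big Y", labeled, k=1)
```

Under this similarity, "A X is a big Y" is closest to the positive pattern "A X is a Y" and is therefore treated as positive.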

  31. Pattern Generalization – SVM. [Diagram: the patterns "A X is a big Y", "A X is a Y." (+), "This X is a Y." (-) and "X such as Y" (+), with regions marked + and -] (See our paper at KDD for details)
