Extracting Instances of Relations from Web Documents using Redundancy

Victor de Boer. OLP-AIO’s Workshop, March 16th, 2005.


Presentation Transcript


  1. Extracting Instances of Relations from Web Documents using Redundancy • Victor de Boer • OLP-AIO’s Workshop • March 16th, 2005

  2. Outline • Introduction/Recap • Relation Instantiation Task • My approach: Redundancy-based • Extracting Artists • Extracting Periods • Future Research • Questions / Discussion

  3. Intro and Research Questions • How can we automatically construct, enrich and populate ontologies using heterogeneous sources on the Web?

  4. OLP • Ontology Learning: • Concepts: • NERC, LSI, … • Hierarchical Structure • Hearst Patterns, … • Other relations • Ontology Population • Instances • Relation Instances • Ontology Enrichment • [Diagram: concepts C1 to C4 with instances I1 to I3]

  5. Relation Instantiation • We have: • two Concepts C1 and C2, • a relation R(C1,C2) • and instances I1 of C1 and I2 of C2. • Find for which pairs of instances the relation R holds. • Examples: • <Country, has_city, City> • <Movie, has_director, Director> • <Artstyle, has_artist, Artist> • Information Extraction!
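The task statement on this slide can be pictured as a small data structure. The following is a minimal sketch, not the speaker's implementation: the class name `RelationInstance`, the helper `candidate_instances`, and the example style and artist names are all illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RelationInstance:
    """One candidate <i1, R, i2> triple (hypothetical representation)."""
    subject: str   # an instance of concept C1
    relation: str  # the name of relation R
    obj: str       # an instance of concept C2


def candidate_instances(c1_instances, relation, c2_instances):
    """Enumerate every possible pair; the task is to decide which ones hold."""
    return [
        RelationInstance(i1, relation, i2)
        for i1, i2 in product(c1_instances, c2_instances)
    ]


# Illustrative values only, in the spirit of <Artstyle, has_artist, Artist>.
styles = ["Impressionism", "Expressionism"]
artists = ["Claude Monet", "Edvard Munch"]
candidates = candidate_instances(styles, "has_artist", artists)
```

Relation instantiation then amounts to selecting the subset of `candidates` for which the relation actually holds.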

  6. Approaches • Current approaches are not generic enough • Goal of my approach: • A generic method, applicable to heterogeneous sources. Redundancy of information should do the rest.

  7. Redundancy Method: Outline

  8. Extracting Artists • MultimediaN e-culture Project • Art and Architecture Thesaurus (AAT) • Unified List of Artist Names (ULAN) • Relation: <aat:style, aua:has_artist, ulan:artist> • Find instances of this relation e-culture: Has_artist

  9. Extracting Artists

  10. Extracting Artists 200 docs

  11. Extracting Artists Person Name Extractor (CUTE) Match against ULAN: Artists 200 docs

  12. Extracting Artists Person Name Extractor (CUTE) Match against ULAN: Artists Tuples: <Ulan Artists, Doc> 200 docs

  13. Extracting Artists Person Name Extractor (CUTE) Instance Score Document Score Match against ULAN: Artists Tuples: <Ulan Artists, Doc> 200 docs
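The slides name a document score and an instance score but do not give their formulas. The sketch below shows one way redundancy could drive such a ranking, under stated assumptions: a document is scored by the fraction of its matched artists that are known seeds, and a candidate artist is scored by the sum of the scores of the documents mentioning it, divided by the number of documents. Both formulas and all names (`document_scores`, `instance_scores`, the toy documents) are assumptions for illustration, not the method from the talk.

```python
from collections import defaultdict


def document_scores(doc_to_artists, seeds):
    """Assumed document score: share of a document's artists that are seeds."""
    scores = {}
    for doc, artists in doc_to_artists.items():
        scores[doc] = len(artists & seeds) / len(artists) if artists else 0.0
    return scores


def instance_scores(doc_to_artists, seeds):
    """Assumed instance score: summed document scores, averaged over all docs."""
    doc_score = document_scores(doc_to_artists, seeds)
    totals = defaultdict(float)
    for doc, artists in doc_to_artists.items():
        for artist in artists - seeds:  # seeds are already known instances
            totals[artist] += doc_score[doc]
    n_docs = len(doc_to_artists)
    return {artist: total / n_docs for artist, total in totals.items()}


# Toy <artist, doc> tuples grouped per document; "A" plays the role of a seed.
docs = {
    "doc1": {"A", "B", "C"},
    "doc2": {"A", "C"},
    "doc3": {"D"},
}
seeds = {"A"}
ranked = sorted(
    instance_scores(docs, seeds).items(), key=lambda kv: kv[1], reverse=True
)
```

In this toy data, "C" co-occurs with the seed in two documents and therefore ranks above "B", while "D" never co-occurs with a seed and scores zero.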

  14. Experiments (ESWC 2006 submission) • Two Art Styles • ‘aat:Expressionism’ • ‘aat:Impressionism’ • Evaluation: Gold Standard extracted from 11 encyclopedic webpages • Three chosen as seeds • Resulting Ordered List • Precision/Recall/F-graph

  15. Results • Max value of F is 0.70 • recall=0.56 • precision=0.94 • Threshold=0.0012 aat:Expressionism aat:Impressionism • Max value of F =0.76 • recall = 0.73 • precision = 0.79 • Threshold= 0.0084
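The reported F values can be checked against the reported precision and recall, assuming F here is the balanced F-measure (harmonic mean of precision and recall); the slide does not state the weighting. The function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as listed on the slide, in the order they appear.
f_first = f_measure(precision=0.94, recall=0.56)
f_second = f_measure(precision=0.79, recall=0.73)
```

Rounded to two decimals, both values agree with the maxima on the slide (0.70 and 0.76), which is consistent with the balanced-F assumption.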

  16. Iterative

  17. ECAI ’06 Experiments • 12 Art Styles, only iterative • No Gold Standard: only precision • Indication of Iteration stop: • Percentage of max • Maximum nr of extractions

  18. ECAI ’06 Experiments • 12 Art Styles, only iterative • No Gold Standard: only precision • Indication of Iteration stop: • Percentage of max • Maximum nr of extractions • At 30% and max=20 • Dada: 1.0 • Expr: 0.85 • Impr: 0.75 • Table of values vs average Precision
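The two stop indications can be read as a cut-off on the ranked list. This is a sketch of one possible reading, assuming "percentage of max" means keeping candidates whose score is at least a given fraction of the top score, and "maximum nr of extractions" caps how many are kept; the function `select_extractions` and the toy scores are hypothetical.

```python
def select_extractions(scores, fraction_of_max=0.30, max_extractions=20):
    """Keep candidates scoring >= fraction_of_max * best score, capped in number."""
    if not scores:
        return []
    cutoff = fraction_of_max * max(scores.values())
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    kept = [name for name, score in ranked if score >= cutoff]
    return kept[:max_extractions]


# Toy instance scores; only "a" and "b" reach 30% of the maximum (1.0).
toy_scores = {"a": 1.0, "b": 0.5, "c": 0.29, "d": 0.1}
accepted = select_extractions(toy_scores)
accepted_capped = select_extractions(toy_scores, max_extractions=1)
```

Under this reading, the defaults mirror the "30% and max=20" setting on the slide; in an iterative run the accepted candidates would presumably be added to the seed set before the next round.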

  19. Artstyle-Periods • Same type of approach: • Extract a lot of instances from the WWW and rank them according to some score. • In this case: extract years and do postprocessing to end up with periods • Steps: • Retrieve 1000 pages for an art style (Google) • Extract years (regular expression) • Normalize (Google)
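The extraction and normalization steps on this slide can be sketched as follows. The slide does not say which regular expression was used or how Google was used for normalization, so both are assumptions here: the pattern matches four-digit years from 1000 to 2099, and `baseline` is a hypothetical table of background frequencies per year (for example, general hit counts) by which the raw counts are divided.

```python
import re
from collections import Counter

# Assumed pattern: standalone four-digit years between 1000 and 2099.
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def extract_years(text):
    """Return every year-like number in the text, in order of appearance."""
    return [int(y) for y in YEAR_PATTERN.findall(text)]


def normalized_year_counts(pages, baseline):
    """Count years over all pages and divide by a per-year baseline frequency."""
    counts = Counter(year for page in pages for year in extract_years(page))
    return {
        year: n / baseline[year]
        for year, n in counts.items()
        if baseline.get(year)  # skip years with no (or zero) baseline
    }


# Toy pages and a made-up baseline, for illustration only.
pages = [
    "Exhibition held in 1874 and again in 1876; catalogue no. 12345.",
    "A review from 1874.",
]
baseline = {1874: 4.0, 1876: 1.0}
normalized = normalized_year_counts(pages, baseline)
```

The word boundaries keep longer digit strings such as `12345` from being read as years; normalizing by a baseline is meant to stop years that are frequent everywhere on the Web from dominating.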

  20. The data: Impressionism

  21. Gaussian Mu= 1889.125626 SD= 53.94131969
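Turning a fitted Gaussian into a period requires choosing a width, which the slides leave open. Below is a minimal sketch, assuming the period is taken as the mean plus or minus one standard deviation; the function `fit_period`, the `width_in_sd` parameter, and the toy year list are illustrative assumptions.

```python
from statistics import NormalDist


def fit_period(years, width_in_sd=1.0):
    """Fit a Gaussian to the years and return (mean, sd, (start, end))."""
    dist = NormalDist.from_samples(years)
    half_width = width_in_sd * dist.stdev
    return dist.mean, dist.stdev, (dist.mean - half_width, dist.mean + half_width)


# Toy sample of extracted years.
toy_years = [1870, 1880, 1890, 1900, 1910]
toy_mean, toy_sd, toy_period = fit_period(toy_years)

# The same interval rule applied to the parameters reported on the slide.
impressionism = NormalDist(mu=1889.125626, sigma=53.94131969)
impressionism_period = (
    impressionism.mean - impressionism.stdev,
    impressionism.mean + impressionism.stdev,
)
```

`NormalDist.from_samples` uses the sample standard deviation; whether the slide's SD is a sample or population estimate is not stated.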

  22. Baroque

  23. Gaussian Mu= 1661.79996 SD=66.88810033

  24. More Results

  25. More results

  26. Future Research • Artists • Complete evaluation • Threshold? • Values = Statistics? • More domains • Dates • Improve • Integrate in method • Gauss, Block, Fuzzy? • How does this relate to Ontological Knowledge?

  27. More Future • Integrate knowledge from different, heterogeneous sources. • What is the style of a painting X? • X was painted by Y • Y is associated with art styles A, B, C • A has period I1, B has period I2, C has period I3 • X was painted in year T • T ∈ I2 → <X has_style B> • Generic Method
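The inference chain on this slide can be sketched as a simple interval check. This is an illustration under assumptions, not a described implementation: periods are modeled as closed `(start, end)` year intervals, and the function `infer_styles` and all example values are hypothetical.

```python
def infer_styles(year, artist_styles, style_periods):
    """Return the artist's styles whose period contains the painting's year."""
    matches = []
    for style in artist_styles:
        start, end = style_periods[style]
        if start <= year <= end:
            matches.append(style)
    return matches


# Placeholder styles A, B, C with made-up periods I1, I2, I3.
style_periods = {"A": (1800, 1850), "B": (1860, 1900), "C": (1910, 1950)}
artist_styles = ["A", "B", "C"]  # styles associated with painter Y
painting_year = 1875             # year T in which X was painted

inferred = infer_styles(painting_year, artist_styles, style_periods)
```

With overlapping periods the function returns several styles, which is where the uncertainty and fuzziness raised on the next slide would come in.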

  28. WWW • Ontological knowledge • Statistics, uncertainty, fuzziness • Information integration?

  29. end