
Assigning Global Relevance Scores to DBpedia Facts

Assigning Global Relevance Scores to DBpedia Facts. Philipp Langer, Patrick Schulze, Stefan George, Tobias Metzke, Ziawasch Abedjan, Gjergji Kasneci. DESWeb, 03/31/2014. Structured Data. Advantages of structured data over unstructured data: Search for explicit facts


Presentation Transcript


  1. Assigning Global Relevance Scores to DBpedia Facts • Philipp Langer, Patrick Schulze, Stefan George, Tobias Metzke, Ziawasch Abedjan, Gjergji Kasneci • DESWeb, 03/31/2014

  2. Structured Data • Advantages of structured data over unstructured data: • Search for explicit facts • Summarization of possibly interesting information • Automated knowledge discovery • Google Knowledge Graph • RDF knowledge bases • DBpedia, YAGO/NAGA • A handful of salient facts about the query entity.

  3. Querying YAGO • Asking for classes to which Albert Einstein belongs

  4. Querying DBpedia • Asking for classes to which Albert Einstein belongs

  5. Challenge • SELECT DISTINCT ?p ?o WHERE { dbpedia:Barack_Obama ?p ?o } • Web Documents

  6. Challenges • Big Data: DBpedia 3.8, ClueWeb corpus • Architecture: text extraction, score computation/ranking, query processing • Ranking strategies: improve the ranking results • Evaluation: conducting user studies

  7. Overview • Languages: • Python • Java • SPARQL • JavaScript • Frameworks: • Django • Lucene
     Architecture diagram (components): Web application (Django); Ranking strategies (Intra-DBpedia strategies, Web corpus strategies); User studies; Querying; Web corpus (Lucene index); Application data (Postgres); DBpedia endpoint (Apache Jena)

  8. Ranking Facts • Query types: • Subject queries - return all physicists: SELECT ?s { ?s type Physicist } • Property queries - return all facts related to Einstein: SELECT ?p ?o { Albert_Einstein ?p ?o } • Ranking strategies: • Ranking by frequency and document frequency • Ranking by information diversity • Random walk • Web-based co-occurrence statistics
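The two query types on this slide can be illustrated with a minimal in-memory sketch. The toy triples and the helper names `subject_query` and `property_query` are hypothetical and chosen only for illustration; the system described in the talk sends SPARQL to a DBpedia endpoint instead.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Toy triples for illustration only (not an extract of DBpedia).
TRIPLES: List[Triple] = [
    ("Albert_Einstein", "type", "Physicist"),
    ("Albert_Einstein", "spouse", "Mileva Maric"),
    ("Isaac_Newton", "type", "Physicist"),
    ("Barack_Obama", "type", "Politician"),
]


def subject_query(triples: List[Triple], predicate: str, obj: str) -> List[str]:
    """Subject query: every ?s such that (?s, predicate, obj) is a fact."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def property_query(triples: List[Triple], subject: str) -> List[Tuple[str, str]]:
    """Property query: every (?p, ?o) such that (subject, ?p, ?o) is a fact."""
    return [(p, o) for s, p, o in triples if s == subject]


physicists = subject_query(TRIPLES, "type", "Physicist")
einstein_facts = property_query(TRIPLES, "Albert_Einstein")
print(physicists)
print(einstein_facts)
```

Both functions return results in storage order; the ranking strategies on the following slides are about replacing that arbitrary order with a relevance order.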

  9. Ranking by frequency and document frequency [Shady et al ESWC’11]
     • Subject document of "Albert Einstein":
       <Albert_Einstein> <topic> <Nobel_laureates>; <topic> <Theoretical_physicists>; <topic> <German_physicists>; <topic> <American_inventors>; <type> <Scientist>; <type> <Person>; <type> <Thing>; <residence> "Switzerland"; <residence> "Austria-Hungary"; <residence> "German Empire"; <spouse> "Mileva Maric"; ...
     • Predicate document of "topic":
       <Newton> <topic> <Theoretical_physicists>. <Newton> <topic> <Nobel_laureates>. <Newton> <topic> <Mathematicians>. <Newton> <topic> <Optical_physicists>. <Newton> <topic> <History_of_calculus>. <Newton> <topic> <English_alchemists>. <Einstein> <topic> <Theoretical_physicists>. <Einstein> <topic> <Nobel_laureates>. <Einstein> <topic> <German_physicists>. <Einstein> <topic> <American_inventors>.
     • Object document of "Theoretical physicists":
       <Isaac_Newton> <topic> <Theoretical_physicists>. <Albert_Einstein> <topic> <Theoretical_physicists>. <Bruno_Coppi> <topic> <Theoretical_physicists>. <Ravi_Gomatam> <topic> <Theoretical_physicists>. ...

  10. Ranking by frequency and document frequency • Subject queries: • Global relevance
      • Isaac Newton: academicAdvisor ...; birthDate ...; birthPlace ...; comment ...; ethnicity ...; field ...; influenced ...; influencedBy ...; knownFor ...; label ...; notableStudent ...; subject ...; subject ...; type ...;
      • Ravi Gomatam: subject ...; subject ...; subject ...; subject ...; subject ...;
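The grouping of facts into subject, predicate, and object documents can be sketched as follows. The transcript does not preserve the scoring formula, so ordering candidates by the size of their subject document is an assumption made for illustration, not the authors' method; `build_documents` and `rank_subjects` are hypothetical names, and the triples are a small subset of the slide's examples.

```python
from collections import defaultdict

# Subset of the example facts shown on the slides (illustrative only).
TRIPLES = [
    ("Isaac_Newton", "topic", "Theoretical_physicists"),
    ("Isaac_Newton", "topic", "Mathematicians"),
    ("Isaac_Newton", "topic", "Optical_physicists"),
    ("Albert_Einstein", "topic", "Theoretical_physicists"),
    ("Albert_Einstein", "topic", "Nobel_laureates"),
    ("Albert_Einstein", "type", "Scientist"),
    ("Albert_Einstein", "spouse", "Mileva Maric"),
    ("Ravi_Gomatam", "topic", "Theoretical_physicists"),
]


def build_documents(triples):
    """Group each fact into the document of its subject, predicate, and object."""
    subject_docs = defaultdict(list)
    predicate_docs = defaultdict(list)
    object_docs = defaultdict(list)
    for s, p, o in triples:
        subject_docs[s].append((s, p, o))
        predicate_docs[p].append((s, p, o))
        object_docs[o].append((s, p, o))
    return subject_docs, predicate_docs, object_docs


def rank_subjects(triples, predicate, obj):
    """Answer a subject query and order the answers by subject-document size.

    Assumption: a larger subject document is used as a crude stand-in for
    global relevance. Ties are broken alphabetically.
    """
    subject_docs, _, object_docs = build_documents(triples)
    candidates = {s for s, p, _ in object_docs[obj] if p == predicate}
    return sorted(candidates, key=lambda s: (-len(subject_docs[s]), s))


ranking = rank_subjects(TRIPLES, "topic", "Theoretical_physicists")
print(ranking)
```

With these toy facts, the entity that has only one fact ends up last, which mirrors the Isaac Newton versus Ravi Gomatam contrast on the slide.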

  11. Limitations for Property Queries • Property queries: • Globally relevant, but distinctive to the given subject • type Person vs. type Scientist

  12. Ranking by diversity • Following a probabilistic model • Property queries: • Properties and objects that are as discriminative as possible • Subject queries:
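One way to make "as discriminative as possible" concrete is to score each (property, object) pair by its self-information, log2(N / n), where N is the number of subjects and n the number of subjects carrying that pair. The slide's actual probabilistic model is not preserved in the transcript, so this measure is an assumption swapped in for illustration; the function names and the toy triples are hypothetical.

```python
import math
from collections import Counter

# Toy triples for illustration only.
TRIPLES = [
    ("Albert_Einstein", "type", "Person"),
    ("Albert_Einstein", "type", "Scientist"),
    ("Albert_Einstein", "spouse", "Mileva Maric"),
    ("Isaac_Newton", "type", "Person"),
    ("Isaac_Newton", "type", "Scientist"),
    ("Barack_Obama", "type", "Person"),
    ("Johann_Sebastian_Bach", "type", "Person"),
]


def discriminativeness(triples):
    """Self-information (in bits) of each (property, object) pair.

    Assumption: the probability of a pair is the fraction of subjects
    that carry it, so rare pairs receive high scores.
    """
    unique = set(triples)  # count each subject at most once per pair
    n_subjects = len({s for s, _, _ in unique})
    pair_counts = Counter((p, o) for _, p, o in unique)
    return {pair: math.log2(n_subjects / c) for pair, c in pair_counts.items()}


def rank_properties(triples, subject):
    """Answer a property query, most discriminative facts first."""
    scores = discriminativeness(triples)
    facts = {(p, o) for s, p, o in triples if s == subject}
    return sorted(facts, key=lambda f: (-scores[f], f))


scores = discriminativeness(TRIPLES)
einstein_ranking = rank_properties(TRIPLES, "Albert_Einstein")
print(einstein_ranking)
```

Under this measure, `type Person` scores zero bits because every subject has it, while `type Scientist` scores higher, which is the Person versus Scientist contrast raised on the preceding slide.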

  13. Random Walk Model • Consider the knowledge base as a directed graph • Already applied in [Kasneci CIKM’09] • Problem: literals have no outgoing link • Use Wiki Pagelinks and Infobox Property Mappings • Entities with high indegree, such as countries, are favored • Good for subject queries • Bad for property queries
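A random walk over a directed graph can be sketched with a PageRank-style power iteration. This is a generic sketch, not the model from the talk: the damping factor, the iteration count, and the uniform redistribution of rank from nodes without outgoing links are assumptions (the slide instead mentions Wiki Pagelinks and Infobox Property Mappings to deal with such nodes), and the edge list is a toy example.

```python
def random_walk_scores(edges, damping=0.85, iterations=50):
    """PageRank-style scores for a directed graph given as (source, target) pairs.

    Assumption: nodes without outgoing links spread their rank uniformly
    over all nodes, so the scores always sum to 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank currently held by nodes that have no outgoing link.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Toy graph: three people link to one country, one of them also to another.
EDGES = [
    ("Albert_Einstein", "Germany"),
    ("Max_Planck", "Germany"),
    ("Werner_Heisenberg", "Germany"),
    ("Albert_Einstein", "Switzerland"),
]

scores = random_walk_scores(EDGES)
top_entity = max(scores, key=scores.get)
print(top_entity)
```

In this toy graph the country with the highest indegree receives the highest score, which illustrates why the slide notes that entities such as countries are favored.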

  14. Co-occurrence statistics • Web Documents • Lemur Project ClueWeb09 Category-B web corpus • 50 million web documents (1.5 TB) • Only English-language documents • Includes approx. 2.7 million Wikipedia articles • Create an inverted index • Consider different word distance limits as documents • Rank subject-object pairs • "Albert Einstein" and "Physicist" • Store only pairwise co-occurrence: • Compute frequency of s:
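Counting co-occurrences under a word distance limit can be sketched as below. This sketch scans a handful of toy strings directly rather than using an inverted index over ClueWeb09, and it measures distance between the start positions of the two phrases; both choices, the function names, and the example texts are assumptions for illustration. The formulas that followed "Store only pairwise co-occurrence" and "Compute frequency of s" are not preserved in the transcript and are not reconstructed here.

```python
import re


def phrase_positions(tokens, phrase):
    """Start indices at which the (lowercased) phrase occurs in the token list."""
    words = re.findall(r"\w+", phrase.lower())
    k = len(words)
    return [i for i in range(len(tokens) - k + 1) if tokens[i:i + k] == words]


def cooccurrence_count(documents, subject, obj, max_distance):
    """Number of documents in which subject and obj occur within max_distance words.

    Assumption: distance is measured between phrase start positions.
    """
    count = 0
    for text in documents:
        tokens = re.findall(r"\w+", text.lower())
        s_pos = phrase_positions(tokens, subject)
        o_pos = phrase_positions(tokens, obj)
        if any(abs(i - j) <= max_distance for i in s_pos for j in o_pos):
            count += 1
    return count


# Toy documents for illustration only.
DOCUMENTS = [
    "Albert Einstein was a physicist born in Ulm.",
    "The physicist and violinist Albert Einstein.",
    "Albert Einstein played the violin for many years before any physicist noticed.",
    "Isaac Newton was a physicist.",
]

near = cooccurrence_count(DOCUMENTS, "Albert Einstein", "Physicist", max_distance=5)
far = cooccurrence_count(DOCUMENTS, "Albert Einstein", "Physicist", max_distance=20)
print(near, far)
```

Varying `max_distance` corresponds to the slide's idea of treating different word distance limits as documents: a tight limit only counts close mentions, a loose one approaches plain document-level co-occurrence.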

  15. Evaluation • User study 1: • 8 queries • all results • 12 users • 19 approaches/configurations • ratings from 1 to 4 (irrelevant to highly relevant) • User study 2: • 8+20 queries • top-10 results of best 4 approaches side-by-side • 10 users • Best 3 approaches from user study 1

  16. Top 4 Approaches in User study 1

  17. User study 2

  18. Results Example: Theoretical Physicists • DBpedia • Random Walk Model

  19. Results Example: Albert Einstein • DBpedia • Co-occurrence statistics

  20. Conclusions • Investigated multiple approaches to rank DBpedia facts • Information theory, statistical reasoning, random walk, and co-occurrence statistics in web documents • The DBpedia knowledge base already provides enough information to improve the ranking of results • Improvement of property queries through web-based co-occurrence statistics • We provide the annotated datasets at https://www.hpi.uni-potsdam.de/naumann/sites/dbpedia/
