NAGA: Searching and Ranking Knowledge

NAGA: Searching and Ranking Knowledge. Gjergji Kasneci Joint work with: Fabian M. Suchanek, Georgiana Ifrim, Maya Ramanath, and Gerhard Weikum. Motivation. Example queries Which politicians are also scientists? Which gods do the Maya and the Greeks have in common?

Presentation Transcript


  1. NAGA: Searching and Ranking Knowledge Gjergji Kasneci Joint work with: Fabian M. Suchanek, Georgiana Ifrim, Maya Ramanath, and Gerhard Weikum

  2. Motivation
     • Example queries
       • Which politicians are also scientists?
       • Which gods do the Maya and the Greeks have in common?
     • Keyword queries are too weak to express advanced user intentions such as
       • concepts
       • entity properties
       • relationships between entities

  5. Motivation
     • Keyword queries are too weak to express advanced user intentions such as
       • concepts
       • entity properties
       • relationships between entities
     • Data is not knowledge.
     • Data extraction and organization needed

  6. Query: $x isA Scientist; $x isA Politician
     Results: Benjamin Franklin, Paul Wolfowitz, Angela Merkel, …

  7. Query: [Figure: query graph over the variables $x, $y, $z and the entities Greek god and Mayan god, connected by four type edges]
     Results: Thunder god, Wisdom god, Agricultural god, …

  8. SYSTEMS: Web → Universal knowledge
     [Figure: landscape of related systems. Region labels: Question Answering & Ranking; Question Answering; Information Extraction; Semantic Database (Relational Database, XML (XLinks), RDF); Entity (Keyword) (Proximity) Search; Ranking; Querying]
     • NAGA
     • START (Katz et al. TREC 2005)
     • TextRunner (Banko et al. IJCAI 2007)
     • Ex DBMS (Cafarella et al. CIDR 2007)
     • ALICE (Banko et al. K-CAP 2007)
     • KnowItAll (Etzioni et al. WWW 2004)
     • YAGO (Suchanek et al. WWW 2007)
     • BLINKS (He et al. SIGMOD 2007)
     • Entity Search (Cheng et al. CIDR 2007)
     • Libra (Nie et al. WWW 2007)
     • DISCOVER (Hristidis et al. VLDB 2002)
     • BANKS (Bhalotia et al. ICDE 2002)

  9. Outline
     • Framework
       • Data model
       • Query language
       • Ranking model
     • Evaluation
       • Setting
       • Metrics
       • Results

  10. Framework (Data model)
     • Entity-relationship (ER) graph
       • Node label: entity
       • Edge label: relation
       • Edge weight: relation “strength”
     • Fact
       • Represented by an edge
     • Evidence pages for a fact f
       • Web pages from which f was derived
     • Computation of fact confidence (i.e. edge weights): [formula not captured in the transcript]
     [Figure: excerpt from YAGO (Suchanek et al. WWW 2007), showing the edge Max Planck Institute locatedIn Germany and three type edges]
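
The data model can be sketched in a few lines of Python. This is only an illustration: the transcript does not preserve the confidence formula, so the aggregation rule below (take the best evidence page, scoring each page as extraction accuracy times page trust) is an assumption, and the `Evidence` class, the function name, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One web page from which a fact was derived (hypothetical structure)."""
    extraction_accuracy: float  # certainty of the extraction step, in [0, 1]
    page_trust: float           # trust/authority of the source page, in [0, 1]


def fact_confidence(evidence):
    """ASSUMED rule: confidence of the best single evidence page."""
    return max((e.extraction_accuracy * e.page_trust for e in evidence),
               default=0.0)


# Illustrative evidence pages for one fact; the numbers are made up.
evidence_pages = {
    ("Max Planck Institute", "locatedIn", "Germany"): [
        Evidence(extraction_accuracy=0.9, page_trust=0.8),
        Evidence(extraction_accuracy=0.95, page_trust=1.0),
    ],
}

# ER graph as a mapping: (node, edge label, node) -> edge weight.
graph = {fact: fact_confidence(pages) for fact, pages in evidence_pages.items()}
```

Storing the graph as a dictionary keyed by triples keeps the node labels, edge labels, and edge weights from the slide in one place.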

  15. Framework (Query language)
     • R : set of relationship labels
     • RegEx(R) : set of regular expressions over R-labels
     • E : set of entity labels
     • V : set of variables
     • Definition (fact template): A fact template is a triple <e1 r e2> where e1, e2 ∈ E ∪ V and r ∈ RegEx(R) ∪ V.
     • Examples: <Liu givenNameOf | familyNameOf $x>, <Albert Einstein $x Mileva Maric>
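
A minimal Python sketch of matching one fact template against one fact follows. It is a simplification under stated assumptions: variables are strings starting with `$`, the relation expression is handled with Python's `re` module and only covers a single edge (so alternation works, but path expressions do not), and the names `Fact`, `FactTemplate`, and `match`, the `marriedTo` label, and the entity "Person A" are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Fact(NamedTuple):
    subject: str
    relation: str
    obj: str


class FactTemplate(NamedTuple):
    e1: str  # entity label or variable
    r: str   # regular expression over relation labels, or variable
    e2: str  # entity label or variable


def is_var(term: str) -> bool:
    return term.startswith("$")


def match(template: FactTemplate, fact: Fact) -> Optional[dict]:
    """Return the variable binding if the fact matches, else None."""
    binding = {}

    def unify(term, value):
        if is_var(term):
            # A variable must be bound consistently within one template.
            return binding.setdefault(term, value) == value
        return term == value

    if is_var(template.r):
        relation_ok = unify(template.r, fact.relation)
    else:
        # Single-edge simplification: the whole label must match the regex.
        relation_ok = re.fullmatch(template.r, fact.relation) is not None

    if relation_ok and unify(template.e1, fact.subject) and unify(template.e2, fact.obj):
        return binding
    return None


# Variable in relation position: what relates the two entities?
b1 = match(FactTemplate("Albert Einstein", "$x", "Mileva Maric"),
           Fact("Albert Einstein", "marriedTo", "Mileva Maric"))

# Alternation over relation labels; "Person A" is a placeholder entity.
b2 = match(FactTemplate("Liu", "givenNameOf|familyNameOf", "$x"),
           Fact("Liu", "familyNameOf", "Person A"))
```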

  16. Framework (Query language)
     • Definition (NAGA query): A NAGA query is a connected directed graph in which each edge represents a fact template.
     • Examples
       1) Which physicist was born in the same year as Max Planck?
          Max Planck isA Physicist; $x isA Physicist; Max Planck bornInYear $y; $x bornInYear $y
       2) Which politician is also a scientist?
          $x isA Scientist; $x isA Politician
       3) Which scientists are called Liu?
          Liu givenNameOf | familyNameOf $x; $x isA Scientist
       4) Which mountain is located in Africa?
          $x isA Mountain; $x locatedIn* Africa
       5) What connects Einstein and Bohr?
          Albert Einstein * Niels Bohr

  19. Framework (Query language)
     • Definition (NAGA answer): A NAGA answer is a subgraph of the underlying ER graph that matches the query graph.
     • Examples
       1) Which physicist was born in the same year as Max Planck?
          Query: Max Planck isA Physicist; $x isA Physicist; Max Planck bornInYear $y; $x bornInYear $y
          Answer: Max Planck isA Physicist; Mihajlo Pupin isA Physicist; Max Planck bornInYear 1858; Mihajlo Pupin bornInYear 1858
       2) Which mountain is located in Africa?
          Query: $x isA Mountain; $x locatedIn* Africa
          Answer: Kilimanjaro isA Mountain; Kilimanjaro locatedIn Tanzania; Tanzania locatedIn Africa
       3) What connects Einstein and Bohr?
          Query: Albert Einstein * Niels Bohr
          Answer: Albert Einstein hasWonPrize Nobel Prize; Niels Bohr hasWonPrize Nobel Prize
     [Figure: each answer edge carries a confidence weight between 0.95 and 0.98; the per-edge assignment is not recoverable from the transcript]
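
How a query graph is turned into answer subgraphs can be sketched with a small backtracking matcher in Python. This is not NAGA's actual query processor: the function names are hypothetical, the toy facts are taken from the slide's examples, and a starred label such as `locatedIn*` is treated here as a path of one or more edges with that label, which is an assumption.

```python
FACTS = [
    ("Kilimanjaro", "isA", "Mountain"),
    ("Kilimanjaro", "locatedIn", "Tanzania"),
    ("Tanzania", "locatedIn", "Africa"),
    ("Max Planck", "isA", "Physicist"),
    ("Mihajlo Pupin", "isA", "Physicist"),
    ("Max Planck", "bornInYear", "1858"),
    ("Mihajlo Pupin", "bornInYear", "1858"),
]


def is_var(term):
    return term.startswith("$")


def paths(facts, start, label):
    """Yield (end_node, edges) for paths of one or more `label` edges."""
    stack, seen = [(start, [])], {start}
    while stack:
        node, edges = stack.pop()
        for s, r, o in facts:
            if s == node and r == label and o not in seen:
                seen.add(o)
                new_edges = edges + [(s, r, o)]
                yield o, new_edges
                stack.append((o, new_edges))


def candidates(facts, relation):
    """Yield (start, end, edges) for everything a relation expression matches."""
    if relation.endswith("*"):
        label = relation[:-1]
        for s in sorted({s for s, _, _ in facts}):
            for o, edges in paths(facts, s, label):
                yield s, o, edges
    else:
        for s, r, o in facts:
            if r == relation:
                yield s, o, [(s, r, o)]


def bind(term, value, binding):
    """Extend the binding so that term equals value, or return None."""
    if is_var(term):
        if term in binding:
            return binding if binding[term] == value else None
        return {**binding, term: value}
    return binding if term == value else None


def answers(facts, query, binding=None, edges=()):
    """Yield (binding, answer_edges) for every match of the query graph."""
    binding = {} if binding is None else binding
    if not query:
        yield binding, list(edges)
        return
    (e1, r, e2), rest = query[0], query[1:]
    for s, o, path in candidates(facts, r):
        b = bind(e1, s, binding)
        if b is not None:
            b = bind(e2, o, b)
        if b is not None:
            yield from answers(facts, rest, b, tuple(edges) + tuple(path))


mountain_query = [("$x", "isA", "Mountain"), ("$x", "locatedIn*", "Africa")]
physicist_query = [
    ("Max Planck", "isA", "Physicist"),
    ("$x", "isA", "Physicist"),
    ("Max Planck", "bornInYear", "$y"),
    ("$x", "bornInYear", "$y"),
]
mountain_answers = list(answers(FACTS, mountain_query))
physicist_answers = list(answers(FACTS, physicist_query))
```

Note that the physicist query as written also matches Max Planck himself for `$x`; the sketch does not filter that out.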

  20. Framework (Ranking model)
     • Question: How to rank multiple matches to the same query?
       Example: the query Einstein isA $x matches both “Einstein isA scientist” and “Einstein isA vegetarian”
     • Ranking desiderata
       • Confidence: correct answers
         • Certainty of IE
         • Trust/authority of source
         Examples: “Max Planck born in Kiel” → bornIn (Max_Planck, Kiel) (Source: Wikipedia); “They believe Elvis hides on Mars” → livesIn (Elvis_Presley, Mars) (Source: The One and Only King’s Blog)
       • Informativeness: prominent results preferred
         • Frequency of facts in the corpus
       • Compactness: prefer “tightly” connected answers
         • Size of the answer graph
     [Figure: example graph with the edges Einstein hasWonPrize Nobel Prize, Bohr hasWonPrize Nobel Prize, Einstein isA vegetarian, Tom Cruise isA vegetarian, Tom Cruise bornInYear 1962, Bohr diedInYear 1962]

  25. Framework (Ranking model)
     • Ranking desiderata: confidence, informativeness, compactness
     • NAGA exploits language models for ranking

  26. Framework (Ranking model): Statistical Language Models for Document IR [Maron/Kuhns 1960, Ponte/Croft 1998, Lafferty/Zhai 2001]
     • each doc has an LM: a generative prob. distr. with parameters θ
     • query q viewed as sample
     • estimate likelihood that q is a sample of the LM of doc d
     • rank by descending likelihoods (best “explanation” of q)
     • MLE: sparseness → mixture model with a background model (smoothing)
     [Figure: documents d1 and d2 with language models LM(θ1) and LM(θ2); which model generated q?]
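
The document-retrieval idea on this slide can be illustrated with a short query-likelihood scorer in Python. This is a generic sketch, not code from the talk: it uses linear interpolation with a collection-wide background model as the smoothing method, and the parameter name `lam`, its value 0.5, and the toy documents are assumptions.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """log P[q | d] under a unigram LM smoothed with a background model."""
    doc_tf, col_tf = Counter(doc), Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / len(doc)         # maximum-likelihood estimate
        p_bg = col_tf[term] / len(collection)   # background (collection) model
        p = lam * p_doc + (1 - lam) * p_bg      # mixture model
        if p == 0.0:
            return float("-inf")                # term unseen everywhere
        score += math.log(p)
    return score


# Toy documents as token lists (illustrative only).
d1 = ["max", "planck", "physicist"]
d2 = ["tom", "cruise", "actor"]
collection = d1 + d2
query = ["planck", "physicist"]

# Rank documents by descending likelihood of generating the query.
ranking = sorted([("d1", d1), ("d2", d2)],
                 key=lambda item: query_log_likelihood(query, item[1], collection),
                 reverse=True)
```

Without the background term, `d2` would get probability zero for every query term it lacks; the mixture keeps its score finite, which is the sparseness problem the slide refers to.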

  27. Framework (Ranking model)
     • Scoring answers
       • Query q with templates q1 q2 … qn, e.g. Albert Einstein isA $x
       • Given g with facts g1 g2 … gn, e.g. Albert Einstein isA Physicist
       • We use generative mixture models to compute P[q | g]
     [Formula not captured in the transcript. Annotations on its terms: background model; using generative mixture model; estimated using knowledge base graph structure; based on IE accuracy and authority analysis; estimated by correlation statistics]
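
Since the scoring formula itself is missing from the transcript, the following Python sketch only shows one plausible shape for such a mixture, built from the annotations that survive. Everything specific here is an assumption: the per-fact combination of a confidence term and an informativeness term, the mixing with a background term, the weights `alpha` and `beta`, and the product over facts.

```python
import math


def fact_score(confidence, informativeness, background, alpha=0.2, beta=0.5):
    """ASSUMED per-fact mixture: (confidence, informativeness) vs. background."""
    foreground = beta * confidence + (1 - beta) * informativeness
    return (1 - alpha) * foreground + alpha * background


def answer_score(facts, alpha=0.2, beta=0.5):
    """ASSUMED answer score: product of per-fact scores, standing in for P[q | g].

    `facts` is a list of (confidence, informativeness, background) tuples,
    one per matched fact template, each value in [0, 1].
    """
    return math.prod(fact_score(c, i, b, alpha, beta) for c, i, b in facts)


# Hypothetical numbers: same confidence, different informativeness.
prominent = (0.95, 0.90, 0.10)
obscure = (0.95, 0.05, 0.10)
```

Under this shape, a product over more facts can only shrink the score when each factor is below 1, so smaller answer graphs tend to score higher, which is one way the compactness desideratum could be reflected.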

  28. Framework (Ranking model)
     • Informativeness
       • Consider the query: Albert Einstein isA $x
       • Possible results: Albert Einstein isA Vegetarian; Albert Einstein isA Physicist
     • NAGA Ranking (Informativeness)

  29. Framework (Ranking model)
     • Informativeness
       • Consider the query: Albert Einstein isA $x
       • Possible results: Albert Einstein isA Physicist; Albert Einstein isA Vegetarian
     • BANKS Ranking (Bhalotia et al. ICDE 2002)
       • Relies only on underlying graph structure
       • Importance of an entity is proportional to its degree
       → Vegetarian more important than Physicist
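
The contrast between frequency-based informativeness and degree-based importance can be made concrete with a small Python sketch. The estimator below (how often a fact is observed, relative to all observed facts with the same subject and relation) is an assumed stand-in for the slide's "frequency of facts", and every count and degree is invented for illustration.

```python
# Hypothetical number of corpus observations per fact.
witness_counts = {
    ("Albert Einstein", "isA", "Physicist"): 900,
    ("Albert Einstein", "isA", "Vegetarian"): 30,
}

# Hypothetical node degrees in the ER graph.
degree = {"Physicist": 2000, "Vegetarian": 5000}


def informativeness(fact, counts):
    """ASSUMED estimator: relative frequency among facts sharing subject and relation."""
    subject, relation, _ = fact
    total = sum(c for (s, r, _), c in counts.items() if (s, r) == (subject, relation))
    return counts[fact] / total


def rank_by_informativeness(facts, counts):
    return sorted(facts, key=lambda f: informativeness(f, counts), reverse=True)


def rank_by_degree(facts, node_degree):
    """Degree-based ranking in the spirit of the BANKS description on the slide."""
    return sorted(facts, key=lambda f: node_degree[f[2]], reverse=True)


results = list(witness_counts)
by_informativeness = rank_by_informativeness(results, witness_counts)
by_degree = rank_by_degree(results, degree)
```

With these made-up numbers, the frequency-based ranking puts Physicist first, while the degree-based ranking puts Vegetarian first because that node has more edges.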

  30. Evaluation (Setting)
     • Knowledge graph YAGO (Suchanek et al. WWW 2007)
       • 16 million facts
     • 85 NAGA queries
       • 55 queries from TREC 2005/2006
       • 12 queries from the work on SphereSearch (Graupmann et al. VLDB 2005)
       • 18 regular expression queries provided by us

  32. Evaluation (Setting)
     • The queries were issued to
       • Google
       • Yahoo! Answers
       • START (http://start.csail.mit.edu/)
       • NAGA (BANKS scoring): relies only on the structure of the underlying graph (see Bhalotia et al. ICDE 2002)
       • NAGA (NAGA scoring)
     • Top-10 answers assessed by 20 human judges as relevant (2), less relevant (1), and irrelevant (0).

  33. Evaluation (Metrics & Results)
     • NDCG (normalized discounted cumulative gain)
       • rewards result lists in which relevant results are ranked higher than less relevant ones
       • useful when comparing result lists of different lengths
     • P@1
       • measures how satisfied the user was on average with the first answer of the search engine
     • We report the Wilson confidence intervals at α = 95%
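
The metrics named on this slide can be computed as in the Python sketch below. The exact DCG variant used in the evaluation is not stated, so the common form with a log2(rank + 1) discount is assumed, and the ideal ordering is taken over the same judged list. The Wilson score interval uses z = 1.96, the usual value for a 95% confidence level; the example judgments are invented.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount (assumed variant)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg(gains):
    """DCG normalized by the DCG of the same gains in ideal (descending) order."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


def precision_at_1(top_gains, relevant_threshold=1):
    """Fraction of queries whose first answer reaches the (assumed) threshold."""
    return sum(g >= relevant_threshold for g in top_gains) / len(top_gains)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Judgments on the slide's scale: relevant (2), less relevant (1), irrelevant (0).
good_list = [2, 1, 0]
bad_list = [0, 1, 2]
low, high = wilson_interval(successes=8, n=10)
```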

  38. Summary
     • NAGA is a search engine for advanced querying of information in ER graphs
       • NAGA queries
       • NAGA answers
     • We propose a novel scoring mechanism
       • based on generative language models
       • applied to the specific and unexplored setting of ER graphs
       • incorporating confidence, informativeness, and compactness
     • Viability of the approach demonstrated in comparison to state-of-the-art search engines and QA systems

  39. Thank you
     • NAGA: http://www.mpi-inf.mpg.de/~kasneci/naga
     • YAGO: http://www.mpi-inf.mpg.de/~suchanek/yago
