
gStore: Answering SPARQL Queries Via Subgraph Matching

Lei Zou 1, Jinghui Mo 1, Lei Chen 2, M. Tamer Özsu 3, Dongyan Zhao 1. 1 Peking University, 2 Hong Kong University of Science and Technology, 3 University of Waterloo.


Presentation Transcript


  1. gStore: Answering SPARQL Queries Via Subgraph Matching. Lei Zou¹, Jinghui Mo¹, Lei Chen², M. Tamer Özsu³, Dongyan Zhao¹. ¹Peking University, ²Hong Kong University of Science and Technology, ³University of Waterloo

  2. Outline • Background & Related Work • Overview of gStore • Encoding Technique • VS*-tree & Query Algorithm • Experiments • Conclusions

  3. Outline • Background & Related Work • Overview of gStore • Encoding Technique • VS*-tree & Query Algorithm • Experiments • Conclusions

  4. Semantic Web “Semantic Web Technologies” is a collection of standard technologies to realize a Web of Data.

  5. RDF Data Model URI Literals URI

  6. RDF Graph Literal Vertex Entity Vertex

  7. SPARQL Queries SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> “1809-02-12”. ?m <DiedOnDate> “1865-04-15”. } Query Graph

  8. Subgraph Match vs. SPARQL Queries

  9. Naïve Triple Store SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> “1809-02-12”. ?m <DiedOnDate> “1865-04-15”. } Too many Self-Joins SQL: Select T3.Object From T as T1, T as T2, T as T3 Where T1.Predicate = 'BornOnDate' and T1.Object = '1809-02-12' and T2.Predicate = 'DiedOnDate' and T2.Object = '1865-04-15' and T3.Predicate = 'hasName' and T1.Subject = T2.Subject and T2.Subject = T3.Subject
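
The self-join cost on this slide can be seen in a minimal sketch that runs the query against an in-memory SQLite table. The table name T follows the slide; the column is spelled `Predicate` here, the subject URIs and the second person's rows are illustrative assumptions, and the query selects the object of the hasName triple because that is what ?name binds to.

```python
import sqlite3

# Hypothetical single-table triple store; rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Subject TEXT, Predicate TEXT, Object TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
        ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
        ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
        ("y:Charles_Darwin", "hasName", "Charles Darwin"),
        ("y:Charles_Darwin", "BornOnDate", "1809-02-12"),
        ("y:Charles_Darwin", "DiedOnDate", "1882-04-19"),
    ],
)

# One alias of T per triple pattern: three patterns need three copies of T.
query = """
SELECT T3.Object
FROM T AS T1, T AS T2, T AS T3
WHERE T1.Predicate = 'BornOnDate' AND T1.Object = '1809-02-12'
  AND T2.Predicate = 'DiedOnDate' AND T2.Object = '1865-04-15'
  AND T3.Predicate = 'hasName'
  AND T1.Subject = T2.Subject AND T2.Subject = T3.Subject
"""
names = [row[0] for row in conn.execute(query)]
print(names)
```

Each additional triple pattern in the SPARQL query adds another alias of T and another join condition, which is the scaling problem the slide points at.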

  10. Existing Solutions Three categories of solutions are proposed to speed up query processing: 1. Property Table: Jena [K. Wilkinson et al. SWDB 03], … 2. Vertically Partitioned Solution: SW-store [D. J. Abadi et al. VLDB 07], … 3. Exhaustive-Indexing: RDF-3X [T. Neumann et al. VLDB 08], Hexastore [C. Weiss et al. VLDB 08], …

  11. Existing Solutions-Property Table SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> “1809-02-12”. ?m <DiedOnDate> “1865-04-15”. } Reducing # of join steps SQL: Select People.hasName From People Where People.BornOnDate = '1809-02-12' and People.DiedOnDate = '1865-04-15'

  12. Existing Solutions-Vertically Partitioned Solution Fast Merge Join

  13. Existing Solutions- Exhaustive-Indexing Range query & Merge Join Each SPARQL query statement can be translated into one “range query”. SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> “1809-02-12”. ?m <DiedOnDate> “1865-04-15”. }

  14. Some Limitations • Difficult to handle “wildcard queries”. • Difficult to handle updates.

  15. Outline • Background & Related Work • Overview of gStore • Encoding Technique • VS*-tree & Query Algorithm • Experiments • Conclusions

  16. Intuition of gStore Finding Matches over a Large Graph is not a trivial task.

  17. Preliminaries Literal Vertex Entity Vertex

  18. Preliminaries • RDF graph

  19. Preliminaries • Query Graph

  20. Preliminaries • match

  21. Preliminaries • Problem definition

  22. Storage Schema in gStore Encoding all neighbors into a “bit-string”, called a signature.

  23. Encoding Technique (1) • The edge-label part eSig(e).e is a bit-string of length M, i.e. |eSig(e).e| = M. • We employ m different string hash functions Hi (i = 1, ..., m). • For each hash function Hi, we set the (Hi(eLabel) MOD M)-th bit in eSig(e).e to ‘1’. • Encoding eSig(e).n is the same: |eSig(e).n| = N, using n different hash functions.
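
The bit-setting rule on this slide can be sketched as follows. The slide does not name the hash functions Hi, so a salted SHA-256 stands in for them, and the values of M and m are arbitrary choices for illustration.

```python
import hashlib

M = 16  # assumed length of eSig(e).e in bits
m = 2   # assumed number of string hash functions

def H(i: int, text: str) -> int:
    """Stand-in for the i-th string hash H_i (salted SHA-256, an assumption)."""
    digest = hashlib.sha256(f"{i}:{text}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def edge_label_signature(e_label: str) -> int:
    """Set the (H_i(eLabel) MOD M)-th bit for every hash function H_i."""
    sig = 0
    for i in range(1, m + 1):
        sig |= 1 << (H(i, e_label) % M)
    return sig

sig = edge_label_signature("hasName")
print(format(sig, f"0{M}b"))
```

Two hash functions may hit the same position, so the signature has between 1 and m bits set.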

  24. Encoding Technique (2) [Figure: the neighbor label “Abraham Lincoln” is split into n-grams (“Abr”, “bra”, “rah”, “aha”, …) and hashed into a bit-string; the bit-strings of the edges (hasName, “Abraham Lincoln”), (BornOnDate, “1809-02-12”), (DiedOnDate, “1865-04-15”) and (DiedIn, “y:Washington_D.c”) are combined with bitwise OR into a single signature.]
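
The n-gram and OR steps shown on this slide can be sketched as below. The hash (SHA-256), the bit lengths M and N, the use of one hash per n-gram, and the way the two parts of an edge signature are concatenated are all assumptions made for illustration; the slide only shows the n-grams, the four edges and the OR.

```python
import hashlib

M, N = 16, 32  # assumed bit lengths of eSig(e).e and eSig(e).n

def ngrams(text: str, n: int = 3) -> list:
    """All contiguous substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def hashed_bit(text: str, width: int) -> int:
    """One set bit chosen by a stand-in hash (SHA-256, an assumption)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return 1 << (int.from_bytes(digest[:8], "big") % width)

def edge_signature(e_label: str, neighbor_label: str) -> int:
    label_part = hashed_bit(e_label, M)
    neighbor_part = 0
    for gram in ngrams(neighbor_label):
        neighbor_part |= hashed_bit(gram, N)
    # Assumed layout: label bits placed above the neighbor bits.
    return (label_part << N) | neighbor_part

edges = [
    ("hasName", "Abraham Lincoln"),
    ("BornOnDate", "1809-02-12"),
    ("DiedOnDate", "1865-04-15"),
    ("DiedIn", "y:Washington_D.c"),
]

# The vertex signature is the bitwise OR of all its edge signatures.
vertex_sig = 0
for e_label, neighbor in edges:
    vertex_sig |= edge_signature(e_label, neighbor)

print(ngrams("Abraham Lincoln")[:4])  # ['Abr', 'bra', 'rah', 'aha']
```

Because OR only ever sets bits, every edge signature is contained bit-for-bit in the vertex signature.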

  25. Encoding Technique (3)

  26. Encoding Technique (4)

  27. Encoding Technique (5)

  28. Outline • Background & Related Work • Overview of gStore • Encoding Technique • VS*-tree & Query Algorithm • Experiments • Conclusions

  29. A Straightforward Solution (1) u2 u1 L1 L2

  30. A Straightforward Solution (2) L1 L2 Large Join Space!

  31. VS-tree

  32. VS-Tree query definition

  33. Pruning Technique Reduced Join Space! u2 u1 10010
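
The slide text does not spell out the pruning test. A common way to filter with bit-string signatures, assumed here, is bitwise containment: a data vertex stays a candidate only if every bit set in the query signature is also set in its own signature. The vertex names and data signatures below are made up; only the query signature 10010 is taken from the slide.

```python
def may_match(query_sig: int, data_sig: int) -> bool:
    """True if every bit of the query signature is also set in the data signature."""
    return query_sig & data_sig == query_sig

# Hypothetical data vertices and their signatures.
data_sigs = {"v1": 0b10110, "v2": 0b01010, "v3": 0b11011}
query_sig = 0b10010

candidates = [v for v, sig in data_sigs.items() if may_match(query_sig, sig)]
print(candidates)  # ['v1', 'v3']
```

The filter can keep false positives (hash collisions) but never drops a true match, so the surviving candidates still have to be verified by the join, only over a smaller space.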

  34. Query Algorithm-Top-Down

  35. Optimized method • Too many super edges • Which level to start search • No brute-force enumeration

  36. VS*-Tree Insert • The insertion criterion in the VS-tree depends only on the Hamming distance between the signature of u and that of the node in the VS-tree. • The criterion in the VS∗-tree depends on both the node signatures and G∗’s structure.
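
The plain VS-tree criterion can be sketched as choosing the child whose signature is closest in Hamming distance. This is only the first bullet; the VS∗-tree criterion also uses G∗’s structure, which the slide does not detail, so it is not modeled here. Function names are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")

def choose_child(u_sig: int, child_sigs: list) -> int:
    """Index of the child node whose signature is nearest to u's signature."""
    return min(range(len(child_sigs)), key=lambda i: hamming(u_sig, child_sigs[i]))

# Distances from 1010 are 4, 1 and 2, so the second child is chosen.
best = choose_child(0b1010, [0b0101, 0b1000, 0b1111])
print(best)  # 1
```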

  37. Updates- Insertion in G*

  38. Updates- Insertion in VS*-tree

  39. VS*-Tree split • The B+1 entries of the node are partitioned into two new nodes, where B is the maximal fanout of a node in the VS∗-tree. • 1. We find the two entries with the maximal Hamming distance between them and use them as the two seed nodes. • 2. We associate each remaining entry with the nearest seed node, according to Equation 1.
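
The two split steps can be sketched as follows. Equation 1 is not reproduced in the transcript, so plain Hamming distance is assumed as the nearness measure, entries are represented by their signatures only, and ties go to the first seed.

```python
from itertools import combinations

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def split(entries: list) -> tuple:
    """Partition B+1 entry signatures into two groups around two seeds."""
    # Step 1: the pair with maximal Hamming distance becomes the two seeds.
    i, j = max(
        combinations(range(len(entries)), 2),
        key=lambda p: hamming(entries[p[0]], entries[p[1]]),
    )
    left, right = [entries[i]], [entries[j]]
    # Step 2: every remaining entry joins the nearer seed.
    for k, sig in enumerate(entries):
        if k in (i, j):
            continue
        if hamming(sig, entries[i]) <= hamming(sig, entries[j]):
            left.append(sig)
        else:
            right.append(sig)
    return left, right

# B = 4, so the overflowing node holds B + 1 = 5 entries.
left, right = split([0b0000, 0b0001, 0b1111, 0b1110, 0b0111])
print(left, right)
```

A production split would also have to respect the minimal fanout b mentioned on the deletion slide; this sketch does not enforce it.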

  40. VS*-Tree deletion • Similar to split. • If some node d has fewer than b entries, where b is the minimal fanout of a node in the VS∗-tree, then d is deleted and its entries are reinserted into the VS∗-tree.

  41. Updates- Deletion in VS*-tree To be deleted

  42. Which Level To Begin • A concept, the “pruning power” of GI with regard to Q∗, denoted as P(Q∗, GI)

  43. Estimate P(Q*,GI)

  44. Finding Valid Child States • propose a DFS strategy to find all valid child states of J. • start a DFS over G∗ beginning from some vertex vi

  45. Outline • Background & Related Work • Overview of gStore • Encoding Technique • VS*-tree & Query Algorithm • Experiments • Conclusions

  46. Datasets

  47. Offline Performance

  48. Exact Queries

  49. Wildcard Queries
