
Exploiting Local Similarity for Indexing Paths in Graph-Structured Data

by Raghav Kaushik, Pradeep Shenoy, Philip Bohannon and Ehud Gudes


Presentation Transcript


  1. Exploiting Local Similarity for Indexing Paths in Graph-Structured Data, by Raghav Kaushik, Pradeep Shenoy, Philip Bohannon and Ehud Gudes. Abdullah Mueen

  2. Outline • No Outline • No Confusing Syntax • No Pseudocode • Examples • Results

  3. XML as Data Graph [Figure: an XML document drawn as a data graph, with node annotations such as oid, label(3) and value(13)] • Non-tree edges model IDREF relationships in the document

  4. Some Notations • Node path: 1.2.3.7.14 • Label path: ROOT.metro.cultural.museum.name • 1.2.3.7 matches ROOT.metro.cultural.museum • 2.3.7 does not match metro.cultural.museum.name (three nodes cannot spell four labels) • 7 and 6 both match ROOT.metro.cultural.museum • k-path: a label path of length ≤ k
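To make the notation concrete, here is a minimal Python sketch. It assumes a graph fragment that contains only the node path 1.2.3.7.14 from this slide: the labels of nodes 1, 2, 3 and 7 come from the slide, while the label "name" for node 14 is read off the example label path and is an assumption. The edge set, the function names and counting path length in edges are also assumptions for illustration.

```python
# Hypothetical fragment: only the node path 1.2.3.7.14 from the slide.
# Labels of 1, 2, 3, 7 come from the slide; node 14 = "name" is assumed.
LABELS = {1: "ROOT", 2: "metro", 3: "cultural", 7: "museum", 14: "name"}
EDGES = {(1, 2), (2, 3), (3, 7), (7, 14)}


def is_node_path(nodes):
    """A node path is a sequence of oids linked by edges of the graph."""
    return all((u, v) in EDGES for u, v in zip(nodes, nodes[1:]))


def matches(nodes, label_path):
    """A node path matches a label path if the labels along it spell it exactly."""
    labels = label_path.split(".")
    return (is_node_path(nodes)
            and len(nodes) == len(labels)
            and all(LABELS.get(n) == lab for n, lab in zip(nodes, labels)))


def is_k_path(label_path, k):
    """k-path: a label path of length <= k, counting length in edges (assumed)."""
    return len(label_path.split(".")) - 1 <= k


print(matches([1, 2, 3, 7], "ROOT.metro.cultural.museum"))  # True
print(matches([2, 3, 7], "metro.cultural.museum.name"))     # False
print(is_k_path("cultural.museum", 1))                      # True
```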

  5. Path Expressions • Syntax: "-" matches any label, "|" is alternation, "*" is repetition, "?" marks an optional part, and "." is label sequencing • ROOT.metro.cultural.museum = {6,7} • ROOT.(-.-.-).name = {12,14,16,19,22,24} • ROOT.-*.hotel = all hotel nodes • ROOT.metro.neighborhoods.neighborhood.(-|-.-)?.(hotel|museum).name = {12,14,16,19} • XPath and other query languages use path expressions: • http://saxon.sourceforge.net/saxon6.5.3/expressions.html • http://www.w3.org/1999/09/ql/docs/xquery.html
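One way to read these expressions is as regular expressions over label paths. The Python sketch below, using only the standard `re` module, translates the slide's syntax into a Python regex and tests a label path against it. The `<label>` wrapping, the function names and the assumption that labels are identifiers are ours, not the paper's.

```python
import re

# Tokens of the slide's syntax: a label, '-', '.', '|', '*', '?', '(' and ')'.
# Any other character (e.g. a stray space) is ignored by findall.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[-.|*?()]")


def to_regex(expr):
    """Translate a path expression into a regex over strings of <label> tokens."""
    out = []
    for tok in TOKEN.findall(expr):
        if tok == "-":
            out.append("(?:<[^<>]*>)")        # exactly one label, whatever it is
        elif tok == ".":
            continue                           # sequencing is plain concatenation
        elif tok == "(":
            out.append("(?:")                  # non-capturing group
        elif tok in {"|", "*", "?", ")"}:
            out.append(tok)                    # same meaning as in regexes
        else:
            out.append("(?:<" + re.escape(tok) + ">)")
    return "".join(out)


def label_path_matches(labels, expr):
    """True if the label path (a list of labels) is in the language of expr."""
    text = "".join("<" + lab + ">" for lab in labels)
    return re.fullmatch(to_regex(expr), text) is not None


print(label_path_matches(["ROOT", "metro", "hotel"], "ROOT.-*.hotel"))  # True
```

Wrapping each label in angle brackets keeps `-` from matching part of a label, so every wildcard consumes exactly one whole label.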

  6. The Problem • Given a data graph G and a path expression P, which nodes of G match P? • One possible solution is to evaluate P directly on the data graph. • But the data graph can be too large to fit in main memory, and too large to search exhaustively even when it fits.
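As a baseline, evaluating P directly on the data graph can be sketched as a forward search over (node, position-in-P) pairs. This minimal Python sketch supports only sequences of labels, `-` and `-*` (no `|` or `?`) and runs on a small hypothetical graph, not the slides' graph G; all names are illustrative.

```python
from collections import deque

# Hypothetical toy data graph (not the slides' graph G): oid -> label, oid -> children.
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D", 6: "D", 7: "D"}
CHILDREN = {0: [1, 2], 1: [3], 2: [4], 3: [5, 7], 4: [6]}


def evaluate(children, labels, root, steps):
    """Nodes v such that some path root -> ... -> v spells a label sequence
    matching steps: labels, '-' (exactly one label) or '-*' (any sequence).
    steps[0] is matched against the root itself."""
    n = len(steps)

    def closure(i):
        # '-*' may match the empty sequence, so it can be skipped for free.
        out = {i}
        while i < n and steps[i] == "-*":
            i += 1
            out.add(i)
        return out

    def advance(i, label):
        # Positions reachable after consuming one node carrying this label.
        nxt = set()
        for j in closure(i):
            if j < n:
                if steps[j] == "-*":
                    nxt.add(j)              # stay inside the repetition
                elif steps[j] in ("-", label):
                    nxt.add(j + 1)
        return nxt

    seen = {(root, i) for i in advance(0, labels[root])}
    queue = deque(seen)
    result = set()
    while queue:
        v, i = queue.popleft()
        if n in closure(i):
            result.add(v)                   # the whole expression is consumed
        for w in children.get(v, ()):
            for j in advance(i, labels[w]):
                if (w, j) not in seen:      # visit each (node, position) once
                    seen.add((w, j))
                    queue.append((w, j))
    return result


print(sorted(evaluate(CHILDREN, LABELS, 0, ["R", "A", "-*", "D"])))  # [5, 7]
```

The visited set keeps the search finite even on graphs with IDREF cycles, but on a large data graph it may still touch most of the graph, which is the cost the slide is worried about.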

  7. Indexing Data Graph • No schema • No keys • Only structural information is available, and it can be summarized by a smaller graph I(G), which serves as an index for the whole data graph.

  8. Indexing Data Graph: Example (1) • Precise index, e.g. DataGuide, 1-index [Figure: data graph G and index graph I(G); each index node stores an extent of data nodes, e.g. ext(17) = {5,7} and ext(13) = {2,4}; the extents are {1}, {2,4}, {3}, {5,7}, {6}, {8,9}] • G and I(G) give the same answers: R.A.-*.C = {5,7}, R.-.B = {4,2}

  9. Indexing Data Graph: Example (2) • Safe index [Figure: data graph G and index graph I(G) with extents {1}, {2,4}, {3,5,7}, {6,8,9}] • On I(G): R.A.-*.C = {3,5,7}, R.-.-*.B = {2,4} • On G: R.A.-*.C = {5,7}, R.-.-*.B = {4} • The index returns every true answer, plus false positives (3 and 2)

  10. Indexing Data Graph: Example (3) • Unsafe index [Figure: data graph G and index graph I(G) with extents {1}, {2,4}, {3,5,7}, {6,8,9}] • Query results shown: R.A.-*.C = {5,7}, R.-.-*.B = {2} and R.A.-*.C = {3,5,7}, R.-.-*.B = { } • An unsafe index can miss true answers
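The three cases amount to a comparison between the answer an index returns and the exact answer on the data graph. In this minimal Python sketch the function name is hypothetical; the precise and safe examples reuse the R.A.-*.C answer sets from slides 8 and 9, and the unsafe example is made up.

```python
def classify(index_answer, true_answer):
    """Compare an index's answer with the exact answer on the data graph."""
    index_answer, true_answer = set(index_answer), set(true_answer)
    if index_answer == true_answer:
        return "precise"   # exactly the true answer
    if index_answer >= true_answer:
        return "safe"      # nothing lost; false positives need validation
    return "unsafe"        # at least one true answer is missing


print(classify({5, 7}, {5, 7}))      # precise: R.A.-*.C on slide 8
print(classify({3, 5, 7}, {5, 7}))   # safe: R.A.-*.C on slide 9
print(classify({5}, {5, 7}))         # unsafe: made-up example
```

A precise index is a special case of a safe one; only an unsafe index can silently drop answers.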

  11. Bisimilarity [Figure: data graph G] • Two nodes u and v are bisimilar (u ≈b v) if label(u) = label(v) and every incoming label path from ROOT to u matches at least one incoming label path from ROOT to v, and vice versa • 2,4 are bisimilar • 5,7 are bisimilar • 8,9 are bisimilar • 6,8 are not bisimilar • ≈b is an equivalence relation, so its classes partition the nodes of G • Finding the partition takes O(m log n) time (n nodes, m edges) • On G: R.A.-*.C = {5,7}, R.-.B = {4,2}
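The O(m log n) bound quoted on the slide is the bound of Paige and Tarjan's partition-refinement algorithm. The Python sketch below uses the simpler naive refinement instead: start from the label partition and repeatedly split blocks by the blocks of each node's parents until nothing changes, which can take up to n rounds. It follows the parent-based recursive form of (backward) bisimilarity; on tree-shaped data this agrees with the path-based wording above. The toy graph and function names are hypothetical.

```python
# Hypothetical toy data graph (not the slides' graph G).
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D", 6: "D", 7: "D"}
PARENTS = {0: [], 1: [0], 2: [0], 3: [1], 4: [2], 5: [3], 6: [4], 7: [3]}


def refine(partition):
    """Keep nodes together only if they share a block and their parents
    lie in the same set of blocks."""
    block_of = {v: i for i, block in enumerate(partition) for v in block}
    groups = {}
    for v in sorted(block_of):
        key = (block_of[v], frozenset(block_of[p] for p in PARENTS[v]))
        groups.setdefault(key, []).append(v)
    return sorted(groups.values())


def bisimulation_partition():
    """Naive fixpoint: refine the label partition until it is stable."""
    by_label = {}
    for v in sorted(LABELS):
        by_label.setdefault(LABELS[v], []).append(v)
    partition = sorted(by_label.values())
    while True:
        refined = refine(partition)
        if len(refined) == len(partition):  # refining only splits: no change
            return partition
        partition = refined


print(bisimulation_partition())  # [[0], [1], [2], [3], [4], [5, 7], [6]]
```

Each block of the result is the extent of one 1-index node; in this toy graph nodes 5 and 7 share a node.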

  12. Equivalence Classes of ≈b → The 1-index [Figure: data graph G and its 1-index I(G), with extents {1}, {3}, {2,4}, {6}, {5,7}, {8,9}] • G and I(G) give the same answers: R.A.-*.C = {5,7}, R.-.B = {4,2}

  13. Revisiting Bisimilarity • The size (number of nodes) of the 1-index is bounded above by the size of the data graph • For large real documents it is almost 45% of the size of the data graph • Bisimilarity partitions nodes by considering all incoming paths from ROOT, which is a global comparison between nodes.

  14. k-bisimilarity [Figure: data graph G] • Two nodes u and v are k-bisimilar (u ≈k v) if label(u) = label(v) and every incoming label path of length ≤ k to u matches at least one incoming label path of length ≤ k to v, and vice versa • ≈k is an equivalence relation, so its classes partition the nodes of G • The algorithm for computing k-bisimulation will be shown later • 2,4 are 0-bisimilar • 5,7 are 1-bisimilar • 8,9 are 2-bisimilar • 6,8 are 1-bisimilar
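The definition can be checked literally: collect, for every node, the set of incoming label paths of length ≤ k and group nodes whose sets are equal (equal sets imply equal labels, since the length-0 path is the node's own label). This Python sketch assumes length is counted in edges and uses a small hypothetical toy graph; it enumerates paths explicitly, so it is only meant for small examples. On general graphs this path-based grouping can be coarser than the parent-based refinement used to build A(k); on trees the two coincide.

```python
from collections import defaultdict

# Hypothetical toy data graph (not the slides' graph G).
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D", 6: "D", 7: "D"}
PARENTS = {0: [], 1: [0], 2: [0], 3: [1], 4: [2], 5: [3], 6: [4], 7: [3]}


def incoming_label_paths(v, k):
    """All label paths of length <= k (counted in edges) that end at v."""
    frontier = {((LABELS[v],), v)}        # (label path, node where it starts)
    paths = {(LABELS[v],)}
    for _ in range(k):
        frontier = {((LABELS[p],) + path, p)
                    for path, start in frontier for p in PARENTS[start]}
        paths |= {path for path, _ in frontier}
    return frozenset(paths)


def k_bisimilar_classes(k):
    """Group nodes whose sets of incoming label paths of length <= k agree."""
    groups = defaultdict(list)
    for v in sorted(LABELS):
        groups[incoming_label_paths(v, k)].append(v)
    return sorted(groups.values())


for k in range(3):
    print(k, k_bisimilar_classes(k))
# 0 [[0], [1], [2], [3, 4], [5, 6, 7]]
# 1 [[0], [1], [2], [3], [4], [5, 6, 7]]
# 2 [[0], [1], [2], [3], [4], [5, 7], [6]]
```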

  15. Equivalence Classes of ≈0 → A(0) index [Figure: data graph G and index graph A(0)] • Label grouping / label partition: extents {1}, {2,4}, {3,5,7}, {6,8,9}
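Turning a node partition into an index graph follows the extent picture on the slides: one index node per block, whose extent is the block, and an index edge (X, Y) for every data edge from a node in X to a node in Y. The Python sketch below builds A(0) from the label partition of a small hypothetical graph; the graph and function names are assumptions.

```python
from collections import defaultdict

# Hypothetical toy data graph (not the slides' graph G).
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D", 6: "D", 7: "D"}
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (3, 7), (4, 6)]


def label_partition():
    """A(0): group nodes by label."""
    groups = defaultdict(list)
    for v in sorted(LABELS):
        groups[LABELS[v]].append(v)
    return sorted(groups.values())


def index_graph(partition):
    """One index node per block, plus an index edge for every data edge
    between the blocks of its endpoints."""
    block_of = {v: i for i, block in enumerate(partition) for v in block}
    extents = {i: set(block) for i, block in enumerate(partition)}
    edges = {(block_of[u], block_of[v]) for u, v in EDGES}
    return extents, edges


extents, edges = index_graph(label_partition())
print(extents)        # {0: {0}, 1: {1}, 2: {2}, 3: {3, 4}, 4: {5, 6, 7}}
print(sorted(edges))  # [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
```

Here 8 data nodes and 7 data edges shrink to 5 index nodes and 5 index edges.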

  16. Equivalence Classes of ≈1 → A(1) index [Figure: data graph G and index graph A(1)] • Extents: {1}, {2}, {3}, {4}, {5,7}, {6,8,9}

  17. A(k) index family [Figure: data graph G and index graphs A(0), A(1), A(2) and A(3); A(3) coincides with the 1-index]

  18. Properties of A(k) index [Figure: data graph G and index graph A(1)]

  20. How to compute A(1) index [Figure: data graph G] • Start from the label partition {1} {2,4} {3,5,7} {6,8,9} • Lookup: for each node, look up the blocks that contain its parents • Refining: split a block when its nodes' parents fall into different sets of blocks; the steps pass through {1} {2} {4} {3,5,7} {6,8,9} • Result: the 1-bisimilar partition {1} {2} {4} {3} {5,7} {6,8,9}
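The lookup-and-refine step fits in a few lines of Python: look up the block of every parent, then keep two nodes together only if they were in the same block and their parents' blocks form the same set. Applying the step k times to the label partition gives the k-bisimilar partition behind A(k). This sketch follows the parent-based formulation of k-bisimilarity; the toy graph and function names are hypothetical.

```python
# Hypothetical toy data graph (not the slides' graph G).
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D", 6: "D", 7: "D"}
PARENTS = {0: [], 1: [0], 2: [0], 3: [1], 4: [2], 5: [3], 6: [4], 7: [3]}


def refine(partition):
    """One round: turn the (k-1)-bisimilar partition into the k-bisimilar one."""
    # Lookup: which block does each node currently belong to?
    block_of = {v: i for i, block in enumerate(partition) for v in block}
    # Refining: split by (own block, set of parent blocks).
    groups = {}
    for v in sorted(block_of):
        key = (block_of[v], frozenset(block_of[p] for p in PARENTS[v]))
        groups.setdefault(key, []).append(v)
    return sorted(groups.values())


def a_k_partition(k):
    by_label = {}
    for v in sorted(LABELS):
        by_label.setdefault(LABELS[v], []).append(v)
    partition = sorted(by_label.values())   # A(0): the label partition
    for _ in range(k):
        partition = refine(partition)
    return partition


for k in range(3):
    print(k, a_k_partition(k))
# 0 [[0], [1], [2], [3, 4], [5, 6, 7]]
# 1 [[0], [1], [2], [3], [4], [5, 6, 7]]
# 2 [[0], [1], [2], [3], [4], [5, 7], [6]]
```

Each round is roughly one pass over the nodes and their parent edges. In this toy graph A(2) is already stable, echoing the slides' observation that A(k) reaches the 1-index for large enough k.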

  21. How to compute A(2) index [Figure: data graph G] • Start from the 1-bisimilar partition {1} {2} {4} {3} {5,7} {6,8,9} • Lookup and refine as before; the steps split {5,7} into {5} {7}, then {6,8,9} into {6} {8,9} • Result: the 2-bisimilar partition {1} {2} {4} {3} {5} {7} {6} {8,9}

  22. Query Evaluation: Forward or Backward [Figure: index graph A(1) with extents {1}, {2}, {3}, {4}, {5,7}, {6,8,9}, and an automaton for the query] • R.A.-*.C = {5,7} • Repeated states are prevented • O(|A|·m) • Backward evaluation uses the label groups
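Forward evaluation on an index graph can be sketched as a search over (index node, position-in-query) pairs that remembers visited pairs, so no state is repeated; the answer is the union of the extents of the index nodes reached. In this Python sketch the query syntax is restricted to labels, `-` and `-*`, the toy graph is hypothetical, and reading |A| in the slide's O(|A|·m) as the number of query states (with m index edges) is our interpretation.

```python
from collections import deque

# Hypothetical toy data graph (not the slides' graph G).
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D", 6: "D", 7: "D"}
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (3, 7), (4, 6)]


def build_index(partition):
    """Index node i has extent partition[i]; all its members share one label."""
    block_of = {v: i for i, block in enumerate(partition) for v in block}
    labels = {i: LABELS[block[0]] for i, block in enumerate(partition)}
    children = {i: set() for i in range(len(partition))}
    for u, v in EDGES:
        children[block_of[u]].add(block_of[v])
    return block_of, labels, children


def evaluate(children, labels, root, steps):
    """Forward search; steps holds labels, '-' (one label) or '-*' (any sequence)."""
    n = len(steps)

    def closure(i):
        out = {i}
        while i < n and steps[i] == "-*":   # '-*' may match nothing
            i += 1
            out.add(i)
        return out

    def advance(i, label):
        nxt = set()
        for j in closure(i):
            if j < n:
                if steps[j] == "-*":
                    nxt.add(j)
                elif steps[j] in ("-", label):
                    nxt.add(j + 1)
        return nxt

    seen = {(root, i) for i in advance(0, labels[root])}
    queue = deque(seen)
    hits = set()
    while queue:
        v, i = queue.popleft()
        if n in closure(i):
            hits.add(v)
        for w in children[v]:
            for j in advance(i, labels[w]):
                if (w, j) not in seen:      # repeated states are skipped
                    seen.add((w, j))
                    queue.append((w, j))
    return hits


def query_index(partition, steps, root=0):
    """Evaluate on the index graph and return the union of the hit extents."""
    block_of, labels, children = build_index(partition)
    hits = evaluate(children, labels, block_of[root], steps)
    return set().union(*(partition[i] for i in hits))


A0 = [[0], [1], [2], [3, 4], [5, 6, 7]]        # toy graph's A(0)
A2 = [[0], [1], [2], [3], [4], [5, 7], [6]]    # toy graph's A(2)
print(query_index(A0, ["R", "A", "-*", "D"]))  # {5, 6, 7}: 6 is a false positive
print(query_index(A2, ["R", "A", "-*", "D"]))  # {5, 7}
```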

  23. Query Evaluation: Validation [Figure: index graph A(1) and an automaton for the query] • R.A.B.C.D = {6,8,9} on the index, a candidate set that is checked against the data graph • Repeated states are prevented • O(|A|·m)
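Validation can be sketched as a backward check on the data graph: a candidate v survives if some path from ROOT to v spells the query's label path. This Python sketch handles plain label paths only (no wildcards), memoizes the recursive check with `functools.lru_cache`, and uses a hypothetical toy graph in which the A(1) index node for label D has extent {5, 6, 7}; all names are illustrative.

```python
from functools import lru_cache

# Hypothetical toy data graph (not the slides' graph G).
LABELS = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D", 6: "D", 7: "D"}
PARENTS = {0: [], 1: [0], 2: [0], 3: [1], 4: [2], 5: [3], 6: [4], 7: [3]}
ROOT = 0


def validate(candidates, label_path):
    """Keep the candidates that really match a non-empty label path on the data graph."""
    path = tuple(label_path)

    @lru_cache(maxsize=None)
    def check(u, i):
        # Does some path from ROOT to u spell path[0..i]?
        if LABELS[u] != path[i]:
            return False
        if i == 0:
            return u == ROOT
        return any(check(p, i - 1) for p in PARENTS[u])

    return {v for v in candidates if check(v, len(path) - 1)}


# Candidates as the toy graph's A(1) would return them for R.A.C.D.
print(validate({5, 6, 7}, ["R", "A", "C", "D"]))  # {5, 7}
```

Because the position i decreases on every recursive call, the check terminates even when the data graph has cycles.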

  24. Avoiding Validation [Figure: index graph A(1)] • R.-*.C.D = {6,8,9} • For queries of the form R.-*.p, validation on A(k) can safely be skipped if p is a k-path
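The rule can be written as a small predicate. This Python sketch assumes the query is given as a list of steps, that the root label is written R as on the slides, and that the length of p is counted in edges, which fits the slide's example (C.D has one edge, and R.-*.C.D needs no validation on A(1)); the function name is hypothetical.

```python
WILDCARDS = {"-", "-*"}


def can_skip_validation(steps, k, root_label="R"):
    """True if steps has the form R.-*.p, where p is a plain label path with
    at most k edges, so the answer read off A(k) needs no validation."""
    if len(steps) < 3 or steps[0] != root_label or steps[1] != "-*":
        return False
    p = steps[2:]
    return all(s not in WILDCARDS for s in p) and len(p) - 1 <= k


print(can_skip_validation(["R", "-*", "C", "D"], 1))       # True
print(can_skip_validation(["R", "-*", "A", "C", "D"], 1))  # False: p has 2 edges
print(can_skip_validation(["R", "A", "-*", "C"], 5))       # False: not R.-*.p
```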

  25. Results

  26. Results

  27. Conclusion • The A(k) index is smaller than precise indexes and has its own advantages, such as faster query execution while keeping significant accuracy • Future presentations: • Updating the indexes as the data changes • Incorporating more complex queries
