
Computer Science and Engineering

Computer Science and Engineering. Efficient Subgraph Similarity All-Matching. Gaoping Zhu, Ke Zhu, Wenjie Zhang, Xuemin Lin, Chuan Xiao The University of New South Wales. Outline. Introduction Preliminary Framework Algorithms Experiments Conclusions. Introduction – Graph Data.



Presentation Transcript


  1. Computer Science and Engineering. Efficient Subgraph Similarity All-Matching. Gaoping Zhu, Ke Zhu, Wenjie Zhang, Xuemin Lin, Chuan Xiao. The University of New South Wales.

  2. Outline: Introduction, Preliminary, Framework, Algorithms, Experiments, Conclusions.

  3. Introduction – Graph Data. Chem-informatics: chemical compounds (small size). Bio-informatics: protein interaction networks (medium size). Internet: World Wide Web (large size).

  4. Introduction – Subgraph All-Matching Problem. Subgraph exact all-matching enumerates all exact matches of a query graph q in a large data graph G. Subgraph similarity all-matching enumerates all similarity matches of a query graph q in a large data graph G. Motivations: noisy query graphs due to erroneous user input; noisy data graphs due to imprecise collection.

  5. Preliminaries: Edge Edit Distance. The edge edit distance from a graph g1 to another graph g2 is the minimum number of edge insertions required to transform g1 into g2. Example: GED(p1, q) = 0 and GED(p2, q) = 1. [Figure: query graph q and patterns p1 and p2, with vertices labeled A, B and C.]
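A minimal sketch of the edge edit distance as defined on this slide, assuming both graphs use the same vertex ids and that the pattern's edges are a subset of the query's edges. The function name and the triangle example are illustrative and do not come from the slides.

```python
def edge_edit_distance(p_edges, q_edges):
    """Number of edge insertions needed to turn p into q.

    Assumes p and q share the same vertex ids. Returns None when p has
    an edge that q lacks, because insertions alone cannot fix that.
    """
    p = {frozenset(e) for e in p_edges}
    q = {frozenset(e) for e in q_edges}
    if not p <= q:
        return None
    return len(q - p)


# Hypothetical triangle query over vertices A, B, C.
q = [("A", "B"), ("B", "C"), ("A", "C")]
p1 = [("A", "B"), ("B", "C"), ("A", "C")]  # identical to q
p2 = [("A", "B"), ("B", "C")]              # one edge missing

print(edge_edit_distance(p1, q), edge_edit_distance(p2, q))
```

With these inputs the two distances mirror the slide's GED(p1, q) = 0 and GED(p2, q) = 1.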

  6. Preliminaries: Feasible Pattern. Given a distance threshold δ, p is called a feasible pattern of q if p is a connected subgraph of q with no missing vertex and GED(p, q) ≤ δ. For δ = 1, the feasible patterns of q are p1, p2, p3 and p4. [Figure: q with δ = 1 and its feasible patterns p1 to p4.]
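The definition can be turned into a brute-force enumeration: drop up to δ edges from q and keep every result that is still connected on all of q's vertices. This is only a sketch for small queries; the function names and the example graph (a triangle with one pendant vertex) are assumptions made for illustration.

```python
from itertools import combinations


def is_connected(vertices, edges):
    """True if the graph (vertices, edges) is connected."""
    vertices = set(vertices)
    if not vertices:
        return True
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = set(), [next(iter(vertices))]
    while stack:
        x = stack.pop()
        if x in seen:
            continue
        seen.add(x)
        stack.extend(adj[x] - seen)
    return seen == vertices


def feasible_patterns(vertices, edges, delta):
    """All connected spanning subgraphs of q missing at most delta edges."""
    edges = list(edges)
    patterns = []
    for k in range(delta + 1):
        for removed in combinations(edges, k):
            kept = [e for e in edges if e not in removed]
            if is_connected(vertices, kept):
                patterns.append(kept)
    return patterns


# Hypothetical query: triangle 1-2-3 plus a pendant vertex 4.
vertices = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(len(feasible_patterns(vertices, edges, 1)))
```

For this example the count is 4: q itself plus the three patterns obtained by removing one triangle edge. Removing the pendant edge would isolate vertex 4, so that pattern is rejected.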

  7. Preliminaries: Similarity Matches. A similarity match of q in G is a subgraph-isomorphic mapping from some feasible pattern p of q to G. Every feasible pattern must be considered. Exact matches of q in G are also similarity matches. [Figure: data graph G, query q and patterns p1 and p2, with one exact match and two similarity matches marked in G.]

  8. SAPPER [VLDB’10]. Enumerate phase, search phase, results. [Figure: for δ = 1, the patterns p1, p2, …, p5 of q are enumerated and each is searched in G, giving the results Mp1, Mp2, …, Mp5.]

  9. Motivation I: Effective Search Order. Search Order One {v1, v2, v3, v4, v5, v6, v7, v8}: 47 intermediate matches. Search Order Two {v4, v3, v5, v6, v2, v1, v7, v8}: 350 intermediate matches. [Figure: query q with vertices v1 to v8 and data graph G; the partial patterns along a search order are annotated with 1, 1, 1, 1, 4, 27 and 12 matches.]

  10. Motivation II: Sharing Computation. Query Execution Plan One: search p and p’ separately; no computation is shared. Query Execution Plan Two: search f1, f2 and f’2 and then join; the computation on f1 is shared. [Figure: patterns p and p’ over vertices v1 to v8, with the fragments f1, f2 and f’2.]

  11. Framework - DecQ: Query Decomposition (Phase One). Decompose the query graph q into a set of selective, edge-disjoint sub-queries Q = { f1, …, fn }, called fragments. [Figure: query graph q decomposed into fragments f1, f2, f3 and f4.]

  12. Framework - DecQ: Local Matching (Phase Two). Enumerate all local (feasible) patterns f’ of each fragment f and apply depth-first search to each pattern f’ to obtain its local matches (the exact matches of f’ in G). [Figure: fragment f is enumerated into local patterns f’a, f’b, f’c and f’d; depth-first search yields the local matches Mf’a, Mf’b, Mf’c and Mf’d.]
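The depth-first search step can be sketched as a plain backtracking matcher that extends a partial mapping one pattern vertex at a time, following a given search order. This is a generic illustration and not the authors' implementation; the function name, the dictionary-based graph encoding and the tiny example graph are all assumptions.

```python
def exact_matches(p_labels, p_edges, g_labels, g_adj, order):
    """Enumerate injective, label- and edge-preserving mappings of a
    pattern into a data graph by depth-first search along `order`."""
    p_adj = {v: set() for v in p_labels}
    for u, v in p_edges:
        p_adj[u].add(v)
        p_adj[v].add(u)
    results = []

    def extend(i, mapping):
        if i == len(order):
            results.append(dict(mapping))
            return
        v = order[i]
        used = set(mapping.values())
        for cand in g_labels:
            if cand in used or g_labels[cand] != p_labels[v]:
                continue
            # Every already-mapped pattern neighbour must be adjacent in G.
            if all(mapping[u] in g_adj[cand] for u in p_adj[v] if u in mapping):
                mapping[v] = cand
                extend(i + 1, mapping)
                del mapping[v]

    extend(0, {})
    return results


# Hypothetical data graph: vertex 1 labeled A, vertices 2 and 3 labeled B.
g_labels = {1: "A", 2: "B", 3: "B"}
g_adj = {1: {2, 3}, 2: {1}, 3: {1}}
# Hypothetical local pattern: a single A-B edge.
p_labels = {"a": "A", "b": "B"}
p_edges = [("a", "b")]

matches = exact_matches(p_labels, p_edges, g_labels, g_adj, ["a", "b"])
print(matches)
```

Each partial mapping built inside `extend` corresponds to one intermediate match, which is why the choice of `order` affects the amount of work.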

  13. Framework - DecQ: Global Matching (Phase Three). Enumerate all global (feasible) patterns p and merge the local matches of the local patterns that p decomposes into, to obtain the global matches (the exact matches of p in G). [Figure: the local matches Mf’1, Mf’2, Mf’3 and Mf’4 of local patterns f’1, f’2, f’3 and f’4 are retrieved and merged into the global matches Mp of p.]
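Merging local matches behaves like a join: two local matches combine when they agree on the query vertices they share and the combined mapping stays injective. The sketch below assumes each match is a dictionary from query vertex to data vertex; the function name and sample data are hypothetical.

```python
def merge_matches(left, right):
    """Join two lists of local matches into matches of their union."""
    merged = []
    for m1 in left:
        for m2 in right:
            # Shared query vertices must map to the same data vertex.
            if any(m1[v] != m2[v] for v in m1.keys() & m2.keys()):
                continue
            combined = {**m1, **m2}
            # Distinct query vertices must map to distinct data vertices.
            if len(set(combined.values())) == len(combined):
                merged.append(combined)
    return merged


# Hypothetical local matches of two fragments that share query vertex "b".
left = [{"a": 1, "b": 2}, {"a": 1, "b": 3}]
right = [{"b": 2, "c": 4}, {"b": 3, "c": 1}]
print(merge_matches(left, right))
```

Here only one combination survives: the pair mapping b to 3 is rejected because a and c would both map to data vertex 1. A nested loop is used for clarity; hashing on the shared vertices would be the usual way to speed this up.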

  14. Algorithms: Local Matching. Enumerate all local patterns f’ of each fragment f. Search for all exact matches of each f’ by depth-first search with an effective search order. Effective Search Order: it is NP-complete to find a search order that minimizes the number of intermediate matches produced in the depth-first search.

  15. Algorithms: Estimating Exact Matches of a Graph. Given a graph f’, assume M(v) / M(e) contains all mappings in G of a vertex v / edge e in f’. For each edge (u, v) in f’, given any u’ in M(u) and v’ in M(v), the slide gives the probability that there is an edge (u’, v’) in G, and from it an expression for the estimated number of exact matches of f’ in G. [Equations: shown as images on the slide, not preserved in the transcript.]
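The slide's formulas are not reproduced in the text. One natural reading of the definitions, which is an assumption here and not confirmed by the source, is to take the edge probability as |M(e)| / (|M(u)| · |M(v)|) for e = (u, v), and to estimate the number of matches as the product of |M(v)| over all vertices times the product of these probabilities over all edges. The sketch below also assumes that M(v) is simply the set of data vertices carrying v's label.

```python
from collections import Counter


def estimate_matches(p_labels, p_edges, g_labels, g_edges):
    """Estimate the number of exact matches of a pattern in G under an
    assumed independence model (not taken from the slides)."""
    vertex_count = Counter(g_labels.values())  # |M(v)|, keyed by label
    edge_count = Counter()                     # |M(e)|, keyed by ordered label pair
    for u, v in g_edges:
        edge_count[(g_labels[u], g_labels[v])] += 1
        edge_count[(g_labels[v], g_labels[u])] += 1

    estimate = 1.0
    for v in p_labels:
        estimate *= vertex_count[p_labels[v]]
    for u, v in p_edges:
        lu, lv = p_labels[u], p_labels[v]
        denom = vertex_count[lu] * vertex_count[lv]
        if denom == 0:
            return 0.0
        estimate *= edge_count[(lu, lv)] / denom
    return estimate


# Hypothetical data graph with labels A, B, B, C.
g_labels = {1: "A", 2: "B", 3: "B", 4: "C"}
g_edges = [(1, 2), (1, 3), (2, 4)]
# Hypothetical pattern: one A-B edge.
p_labels = {"a": "A", "b": "B"}
p_edges = [("a", "b")]
print(estimate_matches(p_labels, p_edges, g_labels, g_edges))
```

For a single-edge pattern the estimate equals the true number of matches (2 in this example); for larger patterns it is only an approximation because edges are treated as independent.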

  16. Algorithms: Approximating the Optimal Search Order. A search order grows a local pattern f’ vertex by vertex. Greedy heuristic: at each step, select the vertex v that minimizes the estimated number of exact matches of the current subgraph s of f’. [Figure: f’ grown through the subgraphs s1, s2 and s3.]
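The greedy heuristic can be sketched as follows. The estimator is passed in as a function so the sketch does not depend on any particular cost model; `toy_estimate`, its candidate counts and its fixed edge selectivity are hypothetical values chosen only to make the example runnable. Restricting candidates to vertices adjacent to the current subgraph is also an assumption, made so the grown subgraph stays connected.

```python
def greedy_search_order(p_vertices, p_edges, estimate):
    """Grow a search order vertex by vertex, always adding the vertex
    whose resulting subgraph has the smallest estimated match count."""
    adj = {v: set() for v in p_vertices}
    for u, v in p_edges:
        adj[u].add(v)
        adj[v].add(u)

    order, remaining = [], set(p_vertices)
    while remaining:
        # Prefer vertices adjacent to the current subgraph; fall back to
        # all remaining vertices at the start or for disconnected patterns.
        frontier = {v for v in remaining if adj[v] & set(order)} or remaining

        def cost(v):
            chosen = order + [v]
            induced = [e for e in p_edges if set(e) <= set(chosen)]
            return estimate(chosen, induced)

        best = min(sorted(frontier), key=cost)
        order.append(best)
        remaining.remove(best)
    return order


# Hypothetical estimator: candidate counts per vertex, 0.1 selectivity per edge.
freq = {"a": 1, "b": 50, "c": 5}

def toy_estimate(vertices, edges):
    result = 1.0
    for v in vertices:
        result *= freq[v]
    return result * (0.1 ** len(edges))

order = greedy_search_order(["a", "b", "c"], [("a", "b"), ("a", "c"), ("b", "c")], toy_estimate)
print(order)
```

In this example the rare vertex a is chosen first, then c (estimate 0.5 against 5.0 for b), and b last.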

  17. Algorithms: Global Matching. A global pattern p is either a minimal or a non-minimal pattern. A minimal pattern p has no subgraph p’ that is also a global pattern and is obtained from p by removing one edge. A non-minimal pattern p has at least one subgraph p’ that is also a global pattern and is obtained from p by removing one edge.

  18. Algorithms: Processing Minimal Patterns. For a minimal pattern p, we decompose p into a set of local patterns and merge their local matches to obtain the global matches Mp. [Figure: p is decomposed into f’1, f’2, f’3, f’4 with local matches M’1, M’2, M’3, M’4, and p’ into f’’1, f’’2, f’3, f’4 with local matches M’’1, M’’2, M’3, M’4; the matches of (f’3 ∪ f’4) are stored for one pattern and reused for the other.]

  19. Algorithms: Processing Non-minimal Patterns. For a non-minimal pattern p, we pick the child pattern p’ of p with the smallest match set Mp’. For each exact match of p’ in G, we check whether the missing edge exists in G. If so, the match is validated as an exact match of p in G. [Figure: pattern p and its child pattern p’, which lacks one edge of p.]
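The validation step is a simple filter over the child's matches. The sketch below assumes matches are dictionaries from query vertex to data vertex and that G is given as an adjacency map; the names and sample data are hypothetical.

```python
def validate_child_matches(child_matches, missing_edge, g_adj):
    """Keep the matches of child pattern p' in which the edge that p'
    lacks (relative to p) is actually present in the data graph."""
    u, v = missing_edge
    return [m for m in child_matches if m[v] in g_adj[m[u]]]


# Hypothetical data graph adjacency and matches of a child pattern p'.
g_adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}
child_matches = [{"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 2, "c": 4}]

# p has the extra edge (a, c) that p' is missing.
print(validate_child_matches(child_matches, ("a", "c"), g_adj))
```

Only the first match is kept, because data vertices 1 and 3 are adjacent while 1 and 4 are not. Picking the child with the fewest matches keeps this filter as cheap as possible.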

  20. Algorithms: Decomposition and Query Execution Plan. Each decomposition of a global pattern p corresponds to a query execution plan of p (as in an RDBMS). It is costly to generate a good query execution plan for each global pattern p of q. Recursive Bisection: we use a heuristic to recursively bisect q into a set Q of edge-disjoint fragments. Each step bisects a graph into two subgraphs of balanced size.
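The slides do not specify the bisection heuristic. As a stand-in, the sketch below orders the edges by breadth-first traversal and cuts that list in half, recursing until every fragment has at most `max_edges` edges. The BFS-based split, the `max_edges` stopping rule and the function names are all assumptions; the split guarantees edge-disjoint, size-balanced fragments, but not that every fragment is connected.

```python
from collections import deque


def bfs_edge_order(edges):
    """List each undirected edge once, in breadth-first discovery order."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    ordered, seen_edges, seen_vertices = [], set(), set()
    for start in sorted(adj):
        if start in seen_vertices:
            continue
        seen_vertices.add(start)
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in sorted(adj[x]):
                e = frozenset((x, y))
                if e not in seen_edges:
                    seen_edges.add(e)
                    ordered.append((x, y))
                if y not in seen_vertices:
                    seen_vertices.add(y)
                    queue.append(y)
    return ordered


def recursive_bisect(edges, max_edges):
    """Split an edge list into edge-disjoint fragments of bounded size."""
    if max_edges < 1:
        raise ValueError("max_edges must be at least 1")
    edges = list(edges)
    if len(edges) <= max_edges:
        return [edges]
    ordered = bfs_edge_order(edges)
    half = len(ordered) // 2
    return (recursive_bisect(ordered[:half], max_edges)
            + recursive_bisect(ordered[half:], max_edges))


# Hypothetical query: a 4-cycle, split into fragments of at most 2 edges.
cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]
fragments = recursive_bisect(cycle, 2)
print(fragments)
```

A real decomposition would also weigh how selective each fragment is, which this sketch ignores.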

  21. Experiments: Real Data. Data graph: HPRD (Human Protein Interaction Network; |V(G)| = 9,460, |E(G)| = 37,081; vertices labeled by GO term). Query graphs: subgraphs selected from the HPRD network, with 1-3 inserted “noisy” edges. Synthetic Data. Data graphs: produced by a synthetic graph generator. Query graphs: subgraphs selected from the data graphs, with 1-3 inserted “noisy” edges.

  22. Experiments: Evaluated Algorithms. SAPPER; ROND (Random search Order, No Decomposition); EOND (Effective search Order, No Decomposition); DecQ (Effective Search Order and Decomposition). Default Settings: |E(q)| = 40, avg. deg(q) = 4; |E(G)| = 5k, avg. deg(G) = 12, |ΣL| = 100; δ = 2.

  23. Experiments Varying Error Threshold

  24. Experiments Varying Query Settings

  25. Experiments Varying Data Graph Settings

  26. Experiments Comparing with SAPPER

  27. Conclusions. A novel framework, DecQ, for subgraph similarity all-matching. An effective search order for local matching by depth-first search. An effective query decomposition plan for global matching with computation sharing.

  28. Thank You! Any Questions?
