
On Triangulation-based Dense Neighbourhood Graph Discovery



Presentation Transcript


  1. On Triangulation-based Dense Neighbourhood Graph Discovery School of Computing National University of Singapore

  2. Outline • Motivation • Related Work • Terms Definition • Triangulation based DN-graph mining • Semi-streaming DN-graph model • Experimental Study • Future Work and Conclusion

  3. Motivation • Define dense graph patterns from a perspective that considers both the size of the substructure and the minimum level of interaction between its vertices. • Locate dense patterns in large-scale graphs that are otherwise intractable under restricted resources.

  4. Related Work • Other dense patterns: clique/quasi-clique, high-degree patterns, dense bipartite patterns, heavy patterns • Triangle counting • CSV: density-based closed clique discovery with a linear visualization.

  5. Terms Definition

  6. Terms Definition (cont’d)

  7. DN-graph [Figure: graph G with vertices a and b] • Proof

  8. DN-graph and Other Dense Patterns • Quasi-clique • Closed clique (a maximal clique) • DN-graph

  9. DN-graph and Closed Clique Proof

  10. Computation Bottleneck in DN-graph Mining • Most sub-graphs are not DN-graphs • Most of these operations are redundant

  11. How to tackle the bottleneck? • Reduce the number of joins • Local maximal feature: two DN-graphs share no edge. • The edges that share common vertices and have locally maximal λ values make up the DN-graph. • Locate DN-graphs using the λ(e) value • All edges within a DN-graph have equal λ(e), denoted λmax • All edges connecting to neighboring vertices have smaller λ values: λ(e) = λ(u,v) < λmax where u is not in G’ and v is in G’ • Use approximation methods to compute λ(e) efficiently

  12. [Figure: edge e]

  13. Graph Triangulation • Given a triangle in the graph, the upper bounds of two of its edges can be used to tighten the density estimate of the third edge. • Example: in triangle (u, v, w), λ(u,v) = 5, λ(u,w) = 3, λ(w,v) = 3.
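One way to read the triangle on this slide is sketched below. It assumes that each λ value is an upper bound on an edge's density and that a third vertex w can back edge (u, v) only up to the smaller of the bounds on (u, w) and (w, v). The function names `support_level` and `is_supporting` are hypothetical and not taken from the presentation.

```python
def support_level(lam_uw: int, lam_wv: int) -> int:
    """Highest density level at which w can still back edge (u, v).

    Assumption: w's contribution is limited by the weaker of its two edges.
    """
    return min(lam_uw, lam_wv)


def is_supporting(lam_uv: int, lam_uw: int, lam_wv: int) -> bool:
    """True if w supports edge (u, v) at that edge's current bound."""
    return support_level(lam_uw, lam_wv) >= lam_uv


# Values from the slide: lambda(u,v) = 5, lambda(u,w) = 3, lambda(w,v) = 3.
print(support_level(3, 3))      # 3
print(is_supporting(5, 3, 3))   # False: w cannot back (u, v) at level 5
```

Under this reading, w does not count towards the current bound of 5 on (u, v), which is what allows that bound to be tightened.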

  14. Triangulation Based DN-graph Mining • DN-graph Mining Algorithm • Step One: Sort vertices according to their degrees. • Step Two: Generate triangles in a streaming fashion. • Step Three: Obtain the local density information gradually along the triangle stream. • Initial upper bound: TC(e), the number of triangles that edge e participates in.
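The first two steps and the initial bound TC(e) can be sketched as follows. This is a minimal in-memory illustration, not the authors' implementation: it assumes an undirected graph given as a list of vertex pairs, and the helper names `edge_key` and `triangle_counts` are hypothetical.

```python
from collections import defaultdict


def edge_key(u, v):
    """Canonical key for an undirected edge."""
    return (u, v) if u <= v else (v, u)


def triangle_counts(edges):
    """Return TC(e), the number of triangles each edge participates in."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Step one: sort vertices by degree (ties broken by vertex id).
    order = sorted(adj, key=lambda v: (len(adj[v]), v))
    rank = {v: i for i, v in enumerate(order)}
    # Keep only neighbours of higher rank so each triangle is produced once.
    higher = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in order}

    # Step two: generate triangles one at a time.
    def triangle_stream():
        for u in order:
            for v in higher[u]:
                for w in higher[u] & higher[v]:
                    yield u, v, w

    # Initial upper bound: count triangles per edge along the stream.
    tc = {edge_key(u, v): 0 for u in adj for v in adj[u]}
    for u, v, w in triangle_stream():
        for a, b in ((u, v), (u, w), (v, w)):
            tc[edge_key(a, b)] += 1
    return tc


# A 4-clique on a, b, c, d plus a pendant edge (d, e).
demo = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
        ("b", "d"), ("c", "d"), ("d", "e")]
print(triangle_counts(demo))
```

Orienting every edge from the lower-ranked to the higher-ranked endpoint is a common way to list each triangle exactly once; whether the presentation uses the degree order for this purpose is an assumption.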

  15. Counting of Supporting Nodes [Figure: edge (a, b) and neighbouring vertices n1 to n4 with numeric edge labels; annotation "Not Supporting Node"; result "= 4"]

  16. Convergence [Figure: λ values of the edges among vertices V1 to V6 across four stages: Initialization, First Iteration, Second Iteration, Converge. Annotations: "Two Support Vertices", "One Support Vertex"; λ(V2V3), λ(V3V6) and λ(V2V6) each decrease by one; the local maximal neighborhood size is λ = 2.]
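The iterate-until-stable behaviour on this slide can be sketched as below. The update rule is an assumption inferred from the slide annotations, not a statement of the authors' exact algorithm: each edge starts at its triangle count, a common neighbour w supports edge (u, v) when both λ(u,w) and λ(v,w) are at least λ(u,v), and an edge with fewer supporters than its bound drops by one per iteration. The name `refine_lambda` is hypothetical.

```python
from collections import defaultdict


def edge_key(u, v):
    """Canonical key for an undirected edge."""
    return (u, v) if u <= v else (v, u)


def refine_lambda(edges):
    """Lower per-edge bounds step by step until no bound changes."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Initialization: lambda(e) = number of triangles through e.
    lam = {edge_key(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u]}

    changed = True
    while changed:
        changed = False
        updated = dict(lam)  # all edges are updated from the same snapshot
        for (u, v), bound in lam.items():
            supporters = sum(
                1 for w in adj[u] & adj[v]
                if min(lam[edge_key(u, w)], lam[edge_key(v, w)]) >= bound
            )
            if supporters < bound:
                updated[(u, v)] = bound - 1
                changed = True
        lam = updated
    return lam


# A 4-clique on a, b, c, d plus a vertex e attached to a and b.
demo = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
        ("c", "d"), ("a", "e"), ("b", "e")]
print(refine_lambda(demo))
```

In this example edge (a, b) starts at 3 because it lies in three triangles, but no neighbour supports it at that level, so it drops to 2 and then stabilises. The bounds only decrease and never go below zero, so the loop terminates.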

  17. Semi-Streaming Graph Model • Graph vertices fit into main memory, while edges reside in secondary storage in the form of adjacency lists. • Primary storage (i.e. memory) allows random access; secondary storage allows only sequential access. • A feasible solution for a streaming graph G(V,E) should not exceed log |V| scans of G’s adjacency list.

  18. DN-graph mining in semi-streaming model • Estimate the shared neighbor size using the min-wise independent set property. • Min-wise independent set property: given two sets A and B over a totally ordered universe X and a uniformly chosen permutation π over X, the probability that min(π(A)) = min(π(B)) equals the Jaccard coefficient J(A, B) = |A ∩ B| / |A ∪ B|. • This can be used to estimate the shared neighbor size |A ∩ B|.
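The estimate described above can be sketched as follows. This is an illustration under stated assumptions, not the StreamDN implementation: it draws explicit random permutations of a small universe (a large-scale version would presumably use hash functions instead), reads each adjacency list once, and recovers the intersection size from the estimated Jaccard coefficient J via |A ∩ B| = J · (|A| + |B|) / (1 + J). The function names and the parameters `k` and `seed` are hypothetical.

```python
import random


def minhash_signatures(adjacency, universe, k=400, seed=0):
    """One sequential pass over (vertex, neighbour set) records.

    Keeps k minima per vertex in memory; neighbour sets must be non-empty.
    """
    rng = random.Random(seed)
    perms = []
    for _ in range(k):
        order = list(universe)
        rng.shuffle(order)  # a uniformly chosen permutation of the universe
        perms.append({x: pos for pos, x in enumerate(order)})

    sigs, degree = {}, {}
    for v, nbrs in adjacency:
        degree[v] = len(nbrs)
        sigs[v] = [min(p[w] for w in nbrs) for p in perms]
    return sigs, degree


def estimate_shared(sigs, degree, u, v):
    """Estimate the number of shared neighbours of u and v."""
    matches = sum(a == b for a, b in zip(sigs[u], sigs[v]))
    j = matches / len(sigs[u])  # estimated Jaccard coefficient
    return j * (degree[u] + degree[v]) / (1 + j)


# Two neighbour sets of size 100 that share exactly 50 elements.
adjacency = [("u", set(range(0, 100))), ("v", set(range(50, 150)))]
sigs, degree = minhash_signatures(adjacency, range(150))
print(estimate_shared(sigs, degree, "u", "v"))  # an estimate near the true value 50
```

The conversion from J to an intersection size follows from |A ∪ B| = |A| + |B| − |A ∩ B|. The per-vertex state is k integers plus a degree, which fits the requirement that vertex data stay in memory while edges are only scanned.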

  19. Experimental Setting • Quad-Core AMD Opteron(tm) processor 8356 • 128GB memory • 700 GB hard disk • OS: Windows Server 2003

  20. Experimental Study • Comparison with CSV on Stock Market Dataset

  21. Convergence • Dataset: Flickr graph (1.7 million vertices and 22.6 million edges) • Running time per iteration is between 55 minutes and 1 hour.

  22. StreamDN Performance on Flickr Dataset • Relative to the BiTriDN algorithm’s results, StreamDN over-estimates by 72% during the first 66 scans. • StreamDN can handle the streaming setting with reasonable accuracy.

  23. DN-graph Semantics in Various Domains

  24. Future work and Conclusion • DN-graph • DN-graph Mining Problem • Semi-streaming Approach • Future Work

  25. Thank You & Questions

  26. Reference • [WSTT08] N. Wang, S. Parthasarathy, K.-L. Tan, and A.K.H. Tung. CSV: visualizing and mining cohesive subgraphs. In SIGMOD’08, pages 445–458, 2008. • [WZTT11] N. Wang, J. Zhang, K.-L. Tan, and A.K.H. Tung. On triangulation-based dense neighbourhood graph discovery. In VLDB’11, volume 4, 2011. • [ABC+04] P. Aloy, B. Böttcher, H. Ceulemans, C. Leutwein, C. Mellwig, S. Fischer, and A.C. Gavin. Structure-based assembly of protein complexes in yeast. volume 303, pages 2026–2029, 2004. • [ATH03] A. Inokuchi, T. Washio, and H. Motoda. Complete mining of frequent patterns from graphs: Mining graph data. volume 50, pages 321–354, Hingham, MA, USA, 2003. Kluwer Academic Publishers. • [BBP06] V. Boginski, S. Butenko, and P.M. Pardalos. Mining market data: a network approach. Computers and Operations Research, 33(11):3171–3184, 2006. • [GRT05] D. Gibson, R. Kumar, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In VLDB’05, pages 721–732, Trondheim, Norway, 2005. • [Bla94] R.E. Blake. Partitioning graph matching with constraints. volume 27, pages 439–446, 1994.

  27. Reference (cont.) • [DT99] L. Dehaspe and H. Toivonen. Discovery of frequent datalog patterns. Data Mining and Knowledge Discovery, 3:7–36, 1999. • [HCD94] L. Holder, D. Cook, and S. Djoko. Substructure discovery in the SUBDUE system. In Proceedings of the Workshop on Knowledge Discovery in Databases, pages 169–180, 1994. • [MARW90] E.M. Mitchell, P.J. Artymiuk, D.W. Rice, and P. Willett. Use of techniques derived from graph theory to compare secondary structure motifs in proteins. Journal of Molecular Biology, 212:151–166, 1990. • [MK01] M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM’01, pages 313–320, 2001. • [RRRT99] R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the web for emerging cyber-communities. In Computer Networks, pages 1481–1493, 1999. • [SK98] A. Srivastav and K. Wolf. Finding dense subgraphs with semidefinite programming. In APPROX ’98, pages 181–191, London, UK, 1998. Springer-Verlag. • [ZWZK] Z. Zeng, J. Wang, L. Zhou, and G. Karypis. Coherent closed quasi-clique discovery from large dense graph databases. In KDD’06, Philadelphia, USA.

  28. Proof: A DN-graph is a local maximum graph

  29. Proof: DN-graph and Closed Clique

