
Mining Closed Relational Graphs with Connectivity Constraints

Xifeng Yan, X. Jasmine Zhou and Jiawei Han, SIGKDD '05. Presenter: 蔡明瑾, 2005/12/09.


Presentation Transcript


  1. Mining Closed Relational Graphs with Connectivity Constraints • Xifeng Yan, X. Jasmine Zhou and Jiawei Han • SIGKDD '05 • Presenter: 蔡明瑾 • 2005/12/09

  2. Introduction • Relational graphs • Model large-scale networks • Biological networks • Social networks • Each node represents a distinct object • Genes, enzymes • DBLP: co-author relations, article reference relations • The graph is large • 10K nodes, 1M edges • Goal: mine closed frequent graphs with edge connectivity at least K

  3. Edge Connectivity K(G) • Given a graph G • Edge cut Ec: a set of edges such that E(G) − Ec is disconnected • Min cut: an edge cut with the fewest edges • K(G) = |min cut| • An edge cut Ec separates V(G) into two sets V and V', written V → V' • V ∩ V' = ∅ • V ∪ V' = V(G) • Every edge in Ec connects a vertex in V to a vertex in V'
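The definition above can be checked directly on a small graph. The following is a minimal sketch, not the method used in the paper: it tries every split of V(G) into V and V' and keeps the smallest cut, which is only practical for a handful of vertices. The function name `edge_connectivity` and the example graph (two 4-cliques joined by one edge) are assumptions chosen for illustration; that graph happens to have average degree 3.25 and minimum degree 3.

```python
from itertools import combinations

def edge_connectivity(vertices, edges):
    """Return (K(G), one min cut) by brute force over all splits V -> V'."""
    vertices = list(vertices)
    if len(vertices) < 2:
        return 0, []
    first, rest = vertices[0], vertices[1:]
    best = None
    # Keep `first` on side V so each split is examined once; r stops
    # before len(rest) so that V' is never empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            # An edge is in the cut when its endpoints lie on different sides.
            cut = [(u, v) for u, v in edges if (u in side) != (v in side)]
            if best is None or len(cut) < len(best):
                best = cut
    return len(best), best

# Hypothetical example: two 4-cliques joined by a single edge e1 = (d, e).
left = list(combinations("abcd", 2))
right = list(combinations("efgh", 2))
edges = left + right + [("d", "e")]

k, cut = edge_connectivity("abcdefgh", edges)
print(k, cut)  # 1 [('d', 'e')]
```

A disconnected graph yields an empty cut and therefore K(G) = 0.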

  4. Edge Connectivity K(G) (example) • Minimum cut: {e1} • K(G) = 1 • Average degree: 3.25 • Minimum degree: 3

  5. Condensation • Let G ⊆ G', and let G* be the graph formed from G' by condensing all vertices of G into a single vertex • If K(G) > K(G'), then K(G') = K(G*)

  6. Condensation cont. • Let V → V' be a min cut of G' • Since K(G) > K(G'), V(G) must be a subset of either V or V', so the same cut is also a cut of G* • Figure labels: K = 3, 2, 2

  7. Usage of condensation • Reduces the cost of computing the edge connectivity of a graph when the edge connectivity of one of its subgraphs is known • After condensing all vertices of g into a single vertex of g', we only need to check K(g*) • g* is smaller than g', so the cost is reduced
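The condensation property can be illustrated on a toy graph. This is a sketch under stated assumptions: `condense` and `edge_connectivity` are hypothetical helper names, the brute-force connectivity check only suits tiny graphs, and the example (a 4-clique G inside a graph G' with one extra vertex of degree 2) is invented for illustration. Parallel edges created by condensing are kept, because they count toward the cut size.

```python
from itertools import combinations

def edge_connectivity(vertices, edges):
    """Brute-force K(G); parallel edges in `edges` are counted separately."""
    vertices = list(vertices)
    if len(vertices) < 2:
        return 0
    first, rest = vertices[0], vertices[1:]
    sizes = []
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            sizes.append(sum((u in side) != (v in side) for u, v in edges))
    return min(sizes)

def condense(vertices, edges, group, label="g*"):
    """Merge every vertex in `group` into one vertex named `label`."""
    def rename(v):
        return label if v in group else v
    new_vertices = sorted({rename(v) for v in vertices})
    # Drop self-loops, keep parallel edges.
    new_edges = [(rename(u), rename(v)) for u, v in edges
                 if rename(u) != rename(v)]
    return new_vertices, new_edges

# G: a 4-clique. G': G plus a vertex e attached to c and d.
g_vertices = "abcd"
g_edges = list(combinations(g_vertices, 2))
gp_vertices = "abcde"
gp_edges = g_edges + [("c", "e"), ("d", "e")]

star_vertices, star_edges = condense(gp_vertices, gp_edges, set(g_vertices))

k_g = edge_connectivity(g_vertices, g_edges)
k_gp = edge_connectivity(gp_vertices, gp_edges)
k_star = edge_connectivity(star_vertices, star_edges)
print(k_g, k_gp, k_star)  # 3 2 2
```

Here K(G) = 3 > K(G') = 2, and the condensed graph G*, which has only two vertices, gives the same value as G'.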

  8. Exclusion • Let G ⊆ G', and let Ec be an edge cut of G' with |Ec| < K • If K(G) ≥ K, then Ec ∩ E(G) = ∅ • Example: G1 has the edge cut {e1, e2} • Any subgraph of G1 with edge connectivity at least 3 cannot contain e1 or e2

  9. K-Decomposition • Break a graph into non-overlapping subgraphs whose edge connectivity is at least K • Applied when K(G) of a closed frequent graph G is less than K • Find the subgraphs G' of G with K(G') ≥ K
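One way to picture K-decomposition is as a repeated application of the exclusion property: whenever a connected piece has a min cut smaller than K, no qualifying subgraph can use those cut edges, so they can be removed and the resulting pieces examined again. The code below is a minimal sketch of that idea, not the paper's implementation. It assumes a simple undirected graph, uses a brute-force min cut that only suits tiny inputs, and all function names are hypothetical.

```python
from itertools import combinations

def min_cut(vertices, edges):
    """Brute-force minimum edge cut of a graph with at least two vertices."""
    first, rest = vertices[0], vertices[1:]
    best = None
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            cut = [e for e in edges if (e[0] in side) != (e[1] in side)]
            if best is None or len(cut) < len(best):
                best = cut
    return best

def components(vertices, edges):
    """Connected components, each returned as a sorted vertex list."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, stack = [], [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(sorted(comp))
    return comps

def k_decompose(vertices, edges, k):
    """Vertex sets of the subgraphs whose edge connectivity is at least k."""
    result = []
    for comp in components(list(vertices), edges):
        if len(comp) < 2:
            continue  # a single vertex has no edges to keep
        inside = set(comp)
        comp_edges = [e for e in edges if e[0] in inside and e[1] in inside]
        cut = min_cut(comp, comp_edges)
        if len(cut) >= k:
            result.append(comp)
        else:
            # Exclusion: edges of a cut smaller than k cannot be used.
            kept = [e for e in comp_edges if e not in cut]
            result.extend(k_decompose(comp, kept, k))
    return result

# Hypothetical example: two 4-cliques joined by one edge, plus a pendant x.
edges = (list(combinations("abcd", 2)) + list(combinations("efgh", 2))
         + [("d", "e"), ("a", "x")])
parts = k_decompose("abcdefghx", edges, 3)
print(parts)  # [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']]
```

Each recursive call removes at least one edge from a connected piece, so the procedure terminates.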

  10. CLOSECUT pattern growth

  11. decomposition

  12. Minimum Degree Constraint • For any graph, its edge connectivity ≤ its minimum degree • A graph that satisfies the edge connectivity constraint must therefore also satisfy the minimum degree constraint, so the cheaper degree check can be applied first
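Because edge connectivity never exceeds minimum degree, a degree count works as a cheap first filter before any cut computation. The sketch below shows such a filter; `min_degree` and `may_satisfy` are hypothetical names, and the example graph (two 4-cliques joined by one edge) is an assumption for illustration.

```python
from itertools import combinations

def min_degree(vertices, edges):
    """Smallest vertex degree in an undirected graph."""
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return min(degree.values())

def may_satisfy(vertices, edges, k):
    """Necessary (not sufficient) condition for edge connectivity >= k."""
    return min_degree(vertices, edges) >= k

edges = (list(combinations("abcd", 2)) + list(combinations("efgh", 2))
         + [("d", "e")])

print(min_degree("abcdefgh", edges))      # 3
print(may_satisfy("abcdefgh", edges, 3))  # True
print(may_satisfy("abcdefgh", edges, 4))  # False
```

The filter can only reject: this example passes the check for k = 3 even though its edge connectivity is 1, so a graph that passes still needs the full connectivity test.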

  13. Shadow graph • Let G be a frequent graph and X be a set of edges that can be added to G such that G ∪ {e}, e ∈ X, is connected and frequent • The graph G ∪ X is called the shadow graph of G • The degree of v in the shadow graph of G is written deg(v) • If deg(v) < K, remove all edges of v from X
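The pruning rule on this slide can be sketched as follows. This is an illustrative reading, not the paper's code: `prune_extensions` is a hypothetical name, frequency checking is assumed to have happened already (X is simply given), and repeating the rule until nothing changes is an added assumption, made because removing edges from X can lower the shadow degree of other vertices.

```python
def prune_extensions(graph_edges, candidate_edges, k):
    """Drop edges of X that touch a vertex whose shadow-graph degree is < k."""
    x = list(candidate_edges)
    changed = True
    while changed:
        changed = False
        # Degree of every vertex in the shadow graph G ∪ X.
        deg = {}
        for u, v in list(graph_edges) + x:
            deg[u] = deg.get(u, 0) + 1
            deg[v] = deg.get(v, 0) + 1
        weak = {v for v, d in deg.items() if d < k}
        kept = [(u, v) for u, v in x if u not in weak and v not in weak]
        if len(kept) != len(x):
            x, changed = kept, True
    return x

# Hypothetical example: G is a triangle; X offers edges to new vertices d, e.
g_edges = [("a", "b"), ("b", "c"), ("c", "a")]
x_edges = [("a", "d"), ("b", "d"), ("c", "d"), ("c", "e")]

print(prune_extensions(g_edges, x_edges, 3))
# [('a', 'd'), ('b', 'd'), ('c', 'd')]
```

Vertex e has degree 1 in the shadow graph, so its only candidate edge is dropped; the remaining vertices all keep degree 3.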

  14. SPLAT • Row-enumeration based • Intersects relational graphs and decomposes the results to obtain highly connected graphs

  15. Experiment • Real dataset: 32 microarray experiments • Synthetic dataset • 2.5 GHz Intel Xeon • 3 GB main memory • Red Hat 9.0 • C++ with STL

  16. Synthetic dataset • Density = average degree / |vertices|
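The density measure defined above is a one-line computation. A small worked example, assuming an undirected simple graph and using a hypothetical `density` helper; the example graph (two 4-cliques joined by one edge) is invented for illustration:

```python
from itertools import combinations

def density(num_vertices, edges):
    """Density = average degree / |vertices|, for an undirected graph."""
    average_degree = 2 * len(edges) / num_vertices
    return average_degree / num_vertices

# 8 vertices, 13 edges: average degree 2 * 13 / 8 = 3.25.
edges = (list(combinations("abcd", 2)) + list(combinations("efgh", 2))
         + [("d", "e")])
print(density(8, edges))  # 0.40625
```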

  17. N30O10Ks1kT500I40D0.6d0.005

  18. Scalability seed graph minsup10

  19. Scalability Density

  20. Real dataset • 32 microarray datasets • Nodes: 6,661 objects (yeast genes) • Edges: 600K

  21. Real datasets

  22. Pattern mined under different conn.

  23. Largest pattern size(edges)

  24. Edge conn. = 3, sup > 19 • Genes: COS3, UNKNOWN, COS1, COS5, COS6, COS2, COS7, COS4 • The seven genes other than UNKNOWN belong to one family of proteins • They are located close together on the chromosome

  25. Helicase activity (protein crystal structure)

  26. Ribosomal biogenesis • Transcription • DNA • UNKNOWN: predicted to be involved in RNA processing

  27. rRNA Processing UNKNOWN
