1 / 13

Mining Frequent Subgraphs

Mining Frequent Subgraphs. COMP 790-90 Seminar Spring 2011. FFSM: Fast Frequent Subgraph Mining -- An Overview:. How to solve graph isomorphism problem? A Novel Graph Canonical Form: CAM How to tackle subgraph isomorphism problem (NP-complete)? Incrementally maintained embeddings

oswald
Télécharger la présentation

Mining Frequent Subgraphs

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Mining Frequent Subgraphs COMP 790-90 Seminar Spring 2011

  2. FFSM: Fast Frequent Subgraph Mining -- An Overview: • How to solve graph isomorphism problem? • A Novel Graph Canonical Form: CAM • How to tackle subgraph isomorphism problem (NP-complete)? • Incrementally maintained embeddings • How to enumerate subgraphs: • An Efficient Data Structure: CAM Tree • Two Operations: CAM-join, CAM-extension.

  3. a b y b x b y x b y 0 d 0 y c 0 y 0 c 0 0 0 y 0 d y y 0 0 a M3 M1 p2 p5 a y c b y y b p1 y x b x a y 0 0 y d y d b 0 y 0 c 0 p4 p3 M2 (P) Adjacency Matrix • Every diagonal entry of adjacency matrix M corresponds to a distinct vertex in G and is filled with the label of this vertex. • Every off-diagonal entry in the lower triangle part of M1 corresponds to a pair of vertices in G and is filled with the label of the edge between the two vertices and zero if there is no edge. 1for an undirected graph, the upper triangle is always a mirror of the lower triangle

  4. b x b y 0 d 0 y 0 c y y 0 0 a a y b M3 y x b 0 y c 0 0 0 y 0 d M1 Code • A Code of n  n adjacency matrix M is defined as sequence of lower triangular entries (including the diagonal entries) in the order: M1,1 M2,1 M2,2 … Mn,1 Mn,2 …Mn,n-1 Mn,n Code(M1): aybyxb0y0c00y0d < Code(M2): aybyxb00yd0y00c < Code(M3): bxby0d0y0cyy00a a y b y x b 0 0 y d 0 y 0 c 0 M2 • TheCanonical Adjacency Matrix is the one produces the maximal code, using lexicographic order.

  5. a a M1 a y b y b y x b a a 0 y c y x b y b y b a 0 0 y c 0 0 y 0 d y 0 b y x b y b 0 M5 M2 M3 M4 M6 MP Submatrix • For an m  m matrix A, an n  n matrix B is A’s maximal proper submatrix (MP Submatrix), iff B is obtained by removing the last none-zero entry from A. • We define a CAM is connected iff the corresponding graph is connected. • Theorem I: A CAM’s MP submatrix is CAM • Theorem II: A connected CAM’s MP submatrix is connected

  6. b b a y d x b y b y 0 c a a a y b y b a a a a a a b y b 0 y d y 0 b y y y y b b b b y y b b x b y x b y 0 c 0 y y y x x x 0 b b b b y 0 x 0 b b 0 y 0 d 0 0 0 0 y y y y 0 0 0 0 c d c c 0 0 y y 0 0 d d a a y b y b 0 x b 0 x b p2 p5 y 0 0 y c 0 0 y d c b y p1 a a a x a y y y b b b y 0 0 y x x 0 b b b y d b 0 0 0 y y y 0 0 0 c c d p4 0 0 0 0 0 0 y y y 0 0 0 d c d p3 (P) CAM Tree: Subgraphs b d c a b b y c x b a b a a y y b b y b x b 0 y c y 0 d 0 0 x x b b

  7. a b a b y b x b a a y b y b 0 x b y 0 b a y b y x b p2 p5 s1 q1 y c b y y y b b s2 p1 q2 x a a a x y y y y d b b b p4 s3 q3 p3 (S) (P) (Q) CAM Tree: Frequent Subgraphs = 2/3

  8. How to Enumerate Nodes in a CAM Tree? • Two operations to explore CAM tree: • CAM-Join • CAM-Extension • Augmenting CAM tree with Suboptimal CAMs • Objectives: • no false dismissal • no redundancy • Plus: We want to this efficiently!

  9. a b y b y c y x b j e e j j e e a a b b b b y b x b y b x b x b x b 0 y c 0 y d 0 y d y 0 c y 0 d 0 y c j j e e j j a a a a b b y b y b y b y b x b x b y x b y x b y 0 c y 0 d y x b y x b p2 p5 y 0 0 y c 0 0 y d 0 y 0 d 0 y 0 c 0 y 0 c 0 y 0 d c b y j j p1 a a x a y y b y b y y x b y x b d b 0 y 0 c 0 0 y d p4 p3 0 0 y 0 d 0 0 y 0 c (P) Suboptimal Tree We define a Suboptimal CAM as a matrix that its MP submatrix is a CAM. d b c a b b a y d x b y b

  10. Summary • Theorem: For a graph G, let CK-1 (Ck) be set of the suboptimal CAMs of all size-(K-1) (K) subgraphs of G (K ≥ 2). Every member of set CK can be enumerated unambiguously either by joining two members of set CK-1 or by extending a member in CK-1.

  11. Experimental Study • Predictive Toxicology Evaluation Competition (PTE) • Contains: 337 compounds • Each graph contains 27 nodes and 27 edges on average • NIH DTP Anti-Viral Screen Test (DTP CA/CM) • Chemicals are classified to be Confirmed Active (CA), Confirmed Moderate Active (CM) and Confirmed Inactive (CI). • We formed a dataset contains CA (423) and CM (1083). • Each graph contains 25 nodes and 27 edges on average

  12. Performance (PTE) Support Threshold (%) Support Threshold (%)

  13. Performance (DTP CACM) Support Threshold (%) Support Threshold (%)

More Related