Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism

Jun Huan, Wei Wang, Jan Prins. ICDM 2003.

Presentation Transcript


  1. Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism • Jun Huan, Wei Wang, Jan Prins • ICDM 2003

  2. Outline • Introduction • Canonical Adjacency Matrix • Join, Extension and Suboptimal CAMs • SCAM Tree • Conclusion

  3. Introduction • Mining patterns from graph databases is challenging because graph-related operations, such as subgraph testing, generally have higher time complexity than the corresponding operations on itemsets, sequences, and trees. • The frequent subgraph mining problem is to find all frequent subgraphs in a graph database. • Two challenging problems: • Subgraph isomorphism testing • An efficient scheme to enumerate all frequent subgraphs

  4. Introduction • In this paper, we develop FFSM (Fast Frequent Subgraph Mining), which targets efficient subgraph testing and a better candidate subgraph enumeration scheme. Key features: (1) a novel graph canonical form and two efficient candidate-proposing operations, FFSM-Join and FFSM-Extension; (2) a graph framework (the suboptimal CAM tree) that guarantees all frequent subgraphs are enumerated unambiguously; (3) avoiding subgraph isomorphism testing by maintaining an embedding set for each frequent subgraph.
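
Feature (3) can be illustrated with a minimal sketch. It assumes an embedding is stored as a pair of a database graph id and a tuple of matched node ids; this layout and the names `support` and `is_frequent` are hypothetical and not taken from the paper. The point is only that, once embeddings are kept, the support of a pattern can be read off the set without running a subgraph isomorphism test.

```python
def support(embeddings):
    """Count the distinct database graphs that contain at least one embedding."""
    return len({graph_id for graph_id, _ in embeddings})


def is_frequent(embeddings, min_support):
    """A pattern is frequent if enough distinct graphs contain it."""
    return support(embeddings) >= min_support


# Hypothetical embedding set: (graph id, tuple of matched node ids).
# Graph 0 contains the pattern twice, graph 2 once, graph 1 not at all.
embeddings = [(0, (1, 2)), (0, (2, 3)), (2, (0, 4))]

print(support(embeddings))         # 2
print(is_frequent(embeddings, 2))  # True
```

Multiple embeddings in the same graph count once, which is why the sketch collects graph ids in a set.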

  5. Canonical Adjacency Matrix (CAM) • In FFSM, every graph is represented by an adjacency matrix M: (1) each diagonal entry of M holds the label of the corresponding node; (2) each off-diagonal entry holds the label of the corresponding edge, or zero if there is no edge.
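
As a rough sketch of this representation, the function below builds such a matrix from a list of node labels and a dictionary of labeled undirected edges. The function name, the input layout, and the use of the string "0" for "no edge" are assumptions made for illustration.

```python
def adjacency_matrix(node_labels, edges):
    """Build a labeled adjacency matrix for an undirected graph.

    node_labels: list of node labels; index i is node i (assumed layout).
    edges: dict mapping an (i, j) node pair to its edge label.
    """
    n = len(node_labels)
    # "0" marks the absence of an edge.
    m = [["0"] * n for _ in range(n)]
    for i, label in enumerate(node_labels):
        m[i][i] = label  # diagonal: node label
    for (i, j), label in edges.items():
        m[i][j] = label  # off-diagonal: edge label,
        m[j][i] = label  # mirrored because the graph is undirected
    return m


# Hypothetical 4-node graph: one node labeled "a", three labeled "b".
m = adjacency_matrix(
    ["a", "b", "b", "b"],
    {(0, 1): "x", (0, 2): "x", (1, 2): "y", (1, 3): "y", (2, 3): "y"},
)
for row in m:
    print(" ".join(row))
```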

  6. CAM • Let M be an n x n adjacency matrix of a graph G with n nodes. • The code of M, denoted code(M), is the sequence of lower-triangular entries of M (including the diagonal), read row by row in the order m1,1, m2,1, m2,2, …, mn,1, mn,2, …, mn,n-1, mn,n. • Standard lexicographic order on sequences defines a total order on any two codes. • The canonical form of G is the maximal code among all possible codes of G; the adjacency matrix that produces it is the Canonical Adjacency Matrix (CAM) of G.
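
The definitions on this slide can be sketched in a few lines. Two things here are assumptions rather than the paper's method: the symbol order is Python's default string ordering ("0" < "a" < "b" < "x" < "y"), which may differ from the order used in the paper, and the canonical code is found by brute force over all n! node orderings, which is only workable for very small graphs.

```python
from itertools import permutations


def code(m):
    """Concatenate the lower-triangular entries of m row by row, diagonal included."""
    return "".join(m[i][j] for i in range(len(m)) for j in range(i + 1))


def permute(m, order):
    """Reorder nodes: row/column k of the result is node order[k] of m."""
    return [[m[a][b] for b in order] for a in order]


def canonical_code(m):
    """Maximal code over every node ordering (brute force, small graphs only)."""
    return max(code(permute(m, p)) for p in permutations(range(len(m))))


# Hypothetical 3-node path graph: a -x- b -y- b.
M = [
    ["a", "x", "0"],
    ["x", "b", "y"],
    ["0", "y", "b"],
]

print(code(M))
print(canonical_code(M))
```

Because the maximum is taken over all orderings, any two adjacency matrices of the same graph yield the same canonical code, which is what makes it usable as a graph identifier.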

  7. Figure, top: code(M1) = “axbxyb0yyb” >= code(M2) = “axb0ybxyyb”. Figure, bottom: [not reproduced in the transcript]
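
The comparison on this slide can be checked with a short sketch. The matrix M1 below is reconstructed from the code “axbxyb0yyb”, and the fourth symbol of code(M2) is read as zero. M2 is obtained from M1 by swapping the last two nodes, so both matrices describe the same graph. The comparison uses Python's default string ordering, which is an assumption and may not be the symbol order used in the paper.

```python
def code(m):
    """Lower-triangular entries of m, row by row, diagonal included."""
    return "".join(m[i][j] for i in range(len(m)) for j in range(i + 1))


# M1 reconstructed from its code: node labels on the diagonal, edge labels elsewhere.
M1 = [list("axx0"), list("xbyy"), list("xyby"), list("0yyb")]

# M2: the same graph with the last two nodes listed in the opposite order.
order = (0, 1, 3, 2)
M2 = [[M1[a][b] for b in order] for a in order]

print(code(M1))              # axbxyb0yyb
print(code(M2))              # axb0ybxyyb
print(code(M1) >= code(M2))  # True
```

The two codes first differ at the fourth symbol, where an edge label ("x") is compared against "no edge" ("0").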

  8. Join, Extension and Suboptimal CAMs • Current methods for enumerating all subgraphs fall into two categories: join and extension. • Join: a single join might produce multiple candidates, and a candidate might be redundantly proposed by many join operations. • Extension: the challenge is to restrict the nodes that a newly introduced edge may attach to. • To achieve efficient subgraph enumeration: (1) Can we design a join operation such that every distinct CAM is generated only once? (2) Can we improve the join operation such that only a few (at most two) CAMs are generated from a single join operation? (3) Can we design an extension operation such that every edge might be attached to only one node in a graph represented by its CAM?

  9. Join, Extension and Suboptimal CAMs • To tackle these challenges, we augment the CAM tree with a set of suboptimal CAMs and introduce two new operations: FFSM-Join and FFSM-Extension.

  10. Join, Extension and Suboptimal CAMs • The bottom of Fig. 2 shows a case in which a graph might be redundantly proposed by FSG C(6,2) = 15 times. As shown in the figure, FFSM-Join completely removes the redundancy after “sorting” the subgraphs by their CAMs. • Suboptimal CAM (SCAM), definition: given a graph G and its CAM, a SCAM is a submatrix of the CAM that represents a subgraph of G. Note: a proper SCAM is a SCAM that is not itself a CAM.
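
The count of 15 is the number of unordered pairs among six items, C(6,2). A minimal counting sketch follows. It uses a hypothetical 6-edge cycle rather than the graph in the figure, and it assumes that any two distinct 5-edge subgraphs can be joined back into the original graph, so that each pair corresponds to one redundant proposal.

```python
from itertools import combinations

# Hypothetical 6-edge graph: a cycle on six nodes, chosen so that deleting
# any single edge leaves a connected 5-edge subgraph.
edges = [(i, (i + 1) % 6) for i in range(6)]

# One 5-edge subgraph per removed edge.
five_edge_subgraphs = [
    tuple(e for e in edges if e != removed) for removed in edges
]

# Assumption: every unordered pair of 5-edge subgraphs is one join that
# proposes the same 6-edge candidate.
proposals = list(combinations(five_edge_subgraphs, 2))

print(len(five_edge_subgraphs))  # 6
print(len(proposals))            # 15
```

In this toy setting, the union of the edges in any pair restores all six edges, which is why every pair leads to the same candidate.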

  11. SCAM Tree • All SCAMs of a graph G can be organized as a tree.

  12. SCAM Tree • The SCAM tree is “complete”: all of its nodes can be enumerated by either a join or an extension operation.

  13. Conclusion • This paper presents a new algorithm, FFSM, which introduces two operations and a graph framework for reducing the number of redundant candidates. • Experiments demonstrate that FFSM achieves a performance gain over gSpan. • gSpan builds a new lexicographic order among graphs and maps each graph to a unique minimum DFS code as its canonical label. Based on this lexicographic order, gSpan adopts depth-first search to mine frequent subgraphs efficiently.
