This presentation discusses mining significant graph patterns by leap search under objective functions such as frequency, discriminative measures (information gain, Fisher score), and significance (G-test). It addresses the challenge that these objective functions are not anti-monotonic and proposes a direct mining framework that supplies optimal patterns for graph clustering, classification, and database indexing. It also covers scalability and efficiency, and notes that direct mining can be applied to itemsets, sequences, and trees.
Mining Significant Graph Patterns by Leap Search
Xifeng Yan (IBM T. J. Watson); Hong Cheng, Jiawei Han (UIUC); Philip S. Yu (UIC)
Graph Patterns
• Interestingness measures / objective functions
• Frequency: frequent graph pattern
• Discriminative: information gain, Fisher score
• Significance: G-test
• …
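To make the measures above concrete, here is a minimal sketch of two of them written as functions of how often a pattern occurs in the positive and negative graph sets. The function names, the argument layout, and the clamping constant `eps` are assumptions made for illustration; the G-test is written in one common form, which may differ in detail from the form used by the authors.

```python
import math

def entropy(p):
    """Binary entropy in bits; taken as 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(n_pos, n_neg, pos_hits, neg_hits):
    """Information gain of the binary feature 'the pattern occurs in the graph'.

    n_pos, n_neg: number of positive / negative graphs.
    pos_hits, neg_hits: how many of them contain the pattern.
    """
    n = n_pos + n_neg
    hits = pos_hits + neg_hits
    misses = n - hits
    h_class = entropy(n_pos / n)
    h_hit = entropy(pos_hits / hits) if hits else 0.0
    h_miss = entropy((n_pos - pos_hits) / misses) if misses else 0.0
    return h_class - (hits / n) * h_hit - (misses / n) * h_miss

def g_test(p, q, m, eps=1e-9):
    """G-test statistic for observed frequency p in m graphs vs. expected frequency q.

    Frequencies are clamped away from 0 and 1 (an illustrative choice)
    so that the logarithms stay finite.
    """
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return 2 * m * (p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q)))
```

Both functions depend only on the pair of frequencies, not on the pattern's structure, which is the property the later slides exploit.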
Objective Functions Challenge: Not Anti-Monotonic X
Challenge: Non-Anti-Monotonic. Anti-monotonic: enumerate subgraphs from small size to large size. Non-monotonic: enumerate all subgraphs, then check their scores?
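A toy illustration of the challenge, using itemsets as a stand-in for graphs (the data and the frequency-difference score are invented for this sketch): frequency can only drop when a pattern grows, but a discriminative score can rise, so pruning by the score of a small pattern is not safe.

```python
# Hypothetical toy data: each "graph" is reduced to a set of labels.
pos = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
neg = [{"a"}, {"a", "c"}, {"b"}, {"b", "c"}]

def freq(pattern, db):
    """Fraction of records in db that contain every item of the pattern."""
    return sum(pattern <= record for record in db) / len(db)

def score(pattern):
    """Illustrative discriminative score: gap between positive and negative frequency."""
    return abs(freq(pattern, pos) - freq(pattern, neg))

small = frozenset({"a"})
large = frozenset({"a", "b"})  # a super-pattern of `small`

# Frequency is anti-monotonic: the larger pattern is never more frequent.
print(freq(small, pos + neg), freq(large, pos + neg))
# The score is not: here the larger pattern scores higher than its sub-pattern.
print(score(small), score(large))
```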
Frequent Pattern Based Mining Framework Exploratory task Graph clustering Graph classification Graph index Graph Database Optimal Patterns Frequent Patterns (SIGMOD’04, ’05) (ISMB’05, ’07)
1. Bottleneck: millions, even billions of patterns
2. No guarantee of quality
Direct Pattern Mining Framework Exploratory task Graph clustering Graph classification Direct Graph index Graph Database Optimal Patterns How?
Upper-Bound: Anti-Monotonic (cont.) Rule of thumb: if the difference between a graph pattern's frequency in the positive dataset and its frequency in the negative dataset increases, the pattern becomes more interesting. Existing graph mining algorithms can therefore be recycled to accommodate non-monotonic functions.
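One way to read the rule of thumb as a pruning bound, sketched under the assumption that the score is a function F(p, q) of the positive frequency p and negative frequency q that grows as the two move apart: any super-pattern has frequencies no larger than (p, q), so the best it could reach is one frequency dropping to zero while the other stays put. The helper names below are hypothetical.

```python
def freq_diff(p, q):
    """Illustrative score obeying the rule of thumb: larger gap, higher score."""
    return abs(p - q)

def upper_bound(score, p, q):
    """Bound on the score of any super-pattern of a pattern with frequencies (p, q).

    Assumes super-patterns satisfy p' <= p and q' <= q, and that `score`
    increases as the two frequencies diverge.
    """
    return max(score(p, 0.0), score(0.0, q))

def can_prune(score, p, q, best_so_far):
    """True when no super-pattern can beat the best score found so far."""
    return upper_bound(score, p, q) <= best_so_far
```

Because the bound itself never increases as a pattern grows, it can drive the same small-to-large enumeration that frequent-pattern miners already use.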
Vertical Pruning Large <- small
Structural Proximity: Another Perspective. # of frequent patterns >> # of possible frequency pairs, so many patterns share the same score.
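The idea of leaping over structurally similar branches can be sketched as a test on two sibling patterns: if they occur in almost the same graphs, their descendants are likely to score alike, so one branch may be skipped. The criterion below (relative size of the symmetric difference of supporting graph IDs, compared against a tolerance `sigma`) is an illustrative assumption, not necessarily the exact condition used by the authors.

```python
def relative_difference(support_a, support_b):
    """Relative disagreement between two sets of supporting graph IDs."""
    total = len(support_a) + len(support_b)
    if total == 0:
        return 0.0
    return 2 * len(support_a ^ support_b) / total

def can_leap(pos_a, neg_a, pos_b, neg_b, sigma):
    """True when sibling patterns a and b are supported by nearly the same graphs
    in both the positive and the negative set (hypothetical leap condition)."""
    return (relative_difference(pos_a, pos_b) <= sigma
            and relative_difference(neg_a, neg_b) <= sigma)
```

With `sigma = 0` the search only skips branches whose support sets are identical; a larger `sigma` skips more and trades exactness for speed.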
Frequency Association: significant patterns often fall into the high quantile of frequency, so start with the most frequent patterns.
Descending Leap Mine
1. Structural Leap Search with frequency threshold
2. Support-Descending Mining, until F(g*) converges
3. Structural Leap Search
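The support-descending step can be sketched as a loop that starts with a high frequency threshold and lowers it until the best score stops improving. Everything specific here is an assumption for illustration: itemsets stand in for graphs, `best_pattern_at` is a brute-force placeholder for a real threshold-constrained miner, the threshold is applied to the positive frequency, and halving with a stop-on-no-improvement test is just one simple convergence heuristic.

```python
from itertools import combinations

def best_pattern_at(pos, neg, min_freq, score):
    """Hypothetical stand-in miner: best-scoring itemset whose positive
    frequency is at least min_freq, found by brute-force enumeration."""
    items = sorted(set().union(*pos, *neg))
    best, best_score = None, float("-inf")
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            p = sum(pattern <= t for t in pos) / len(pos)
            q = sum(pattern <= t for t in neg) / len(neg)
            if p < min_freq:
                continue
            s = score(p, q)
            if s > best_score:
                best, best_score = pattern, s
    return best, best_score

def support_descending(pos, neg, score, start=1.0, floor=0.05, tol=1e-9):
    """Halve the threshold until the best score F(g*) stops improving."""
    threshold = start
    best, best_score = best_pattern_at(pos, neg, threshold, score)
    while threshold / 2 >= floor:
        threshold /= 2
        cand, cand_score = best_pattern_at(pos, neg, threshold, score)
        if best is not None and cand_score <= best_score + tol:
            break  # no improvement at the lower threshold: treat as converged
        if cand is not None:
            best, best_score = cand, cand_score
    return best, best_score, threshold
```

In a full pipeline the pattern returned here would seed a final leap search, matching step 3 of the slide.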
Results: NCI Anti-Cancer Screen Datasets. Chemical compounds: anti-cancer or not. # of vertices: 10 ~ 200. Link: http://pubchem.ncbi.nlm.nih.gov
Efficiency Vertical Pruning Horizontal Pruning
Effectiveness (runtime) frequency descending frequency descending + leap mine
Effectiveness (accuracy) slightly different
Graph Classification (6x) (6x) *OA Kernel: Optimal Assignment Kernel LEAP: LEAP search
Scalability Means Something! OA (6x): ~8000 sec; OA: ~200 sec (quadratic). LEAP (6x): ~100 sec; LEAP: ~20 sec (linear).
Direct Pattern Mining Framework Exploratory task Graph clustering Graph classification Direct Graph index Graph Database Optimal Graph Patterns
Beyond Graph Patterns
1. Direct mining can be applied to itemsets, sequences, and trees
Exploratory task Clustering Classification Direct Index itemset/sequence/tree Database Optimal Patterns
• Existing algorithms can be recycled to mine patterns with sophisticated measures.
• Pattern-based methods including indexing and classification are competitive.
Thank you Direct Mining of Discriminative and Essential Graphical and Itemset Features via Model-based Search Tree SIGKDD’08 @ Las Vegas
Graph Classification: Kernel Approach • Kernel-based Graph Classification • Optimal Assignment Kernel (Fröhlich et al. ICML’05)
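For readers unfamiliar with the kernel baseline, the core idea of an optimal assignment kernel can be sketched as follows: match the parts of the smaller object to distinct parts of the larger one so that the total pairwise similarity is maximal. This is a simplified illustration only. The label-equality base kernel and the brute-force search over assignments are assumptions chosen for brevity, and the published kernel uses richer atom and neighborhood similarities.

```python
from itertools import permutations

def label_kernel(u, v):
    """Hypothetical base kernel: 1 when two vertex labels match, else 0."""
    return 1.0 if u == v else 0.0

def optimal_assignment_kernel(atoms_a, atoms_b, base=label_kernel):
    """Maximum total similarity over one-to-one assignments of the smaller
    vertex list onto the larger one (brute force, small inputs only)."""
    small, large = sorted((atoms_a, atoms_b), key=len)
    best = 0.0
    for chosen in permutations(large, len(small)):
        total = sum(base(u, v) for u, v in zip(small, chosen))
        best = max(best, total)
    return best
```

A kernel classifier needs this value for every pair of training graphs, whereas a pattern-based classifier scores each graph against a fixed set of mined patterns.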