
Multi-Label Feature Selection for Graph Classification


Presentation Transcript


  1. Multi-Label Feature Selection for Graph Classification. Xiangnan Kong, Philip S. Yu. Department of Computer Science, University of Illinois at Chicago.

  2. Outline • Introduction • Multi-Label Feature Selection for Graph Classification • Experiments • Conclusion

  3. Introduction: Graph Data • Conventional data mining and machine learning approaches assume data are represented as feature vectors, e.g. (x1, x2, …, xd) → y • In many real applications, data are not directly represented as feature vectors, but as graphs with complex structures, e.g. G(V, E, l) → y • Examples: chemical compounds, program flows, XML documents

  4. Introduction: Graph Classification • Graph classification: construct a classification model for graph data • Example: drug activity prediction • Given a set of chemical compounds labeled with activities against one type of disease or virus • Predict active / inactive for a testing compound [Figure: training graphs labeled + / − and a testing graph marked ?]

  5. Graph Classification using Subgraph Features • Key question: how to find a set of subgraph features in order to effectively perform graph classification? • Each graph object Gi is encoded as a binary feature vector xi over a set of subgraph patterns {g1, g2, g3, …}: entry j is 1 if Gi contains subgraph gj and 0 otherwise (e.g. x1 = (1, 0, 1, …), x2 = (0, 1, 1, …)) • Pipeline: graph objects → feature vectors → classifier [Figure: chemical-compound graphs G1, G2 mapped to feature vectors x1, x2 and fed to a classifier]
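A minimal sketch of this graph-to-vector encoding, assuming graphs and patterns are networkx Graph objects with a 'label' node attribute. The pattern set itself would come from a subgraph miner such as gSpan; the direct isomorphism test here is only illustrative:

```python
# Sketch: encode graphs as binary feature vectors over subgraph patterns.
# Assumes networkx graphs with a 'label' node attribute (an assumption, not
# part of the original slides). Uses node-induced subgraph isomorphism.
import networkx as nx
from networkx.algorithms import isomorphism

def contains_subgraph(G, g):
    """True if pattern g occurs in graph G (label-preserving matching)."""
    matcher = isomorphism.GraphMatcher(
        G, g, node_match=isomorphism.categorical_node_match("label", None))
    return matcher.subgraph_is_isomorphic()

def to_feature_vector(G, patterns):
    """Binary indicator vector: entry j is 1 iff patterns[j] is contained in G."""
    return [1 if contains_subgraph(G, g) else 0 for g in patterns]
```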

  6. Existing Methods for Subgraph Feature Selection • Feature selection for graph classification: find a set of useful subgraph features for classification • Existing methods select discriminative subgraph features • Focused on single-label settings: assume one graph can only have one label [Figure: graphs each carrying a single label, e.g. + Lung Cancer, and the useful subgraphs selected from them]

  7. Multi-Label Graphs • In many real applications, one graph can have multiple labels • Example: anti-cancer drug prediction, where one compound graph carries several labels, e.g. + Breast Cancer, − Lung Cancer, + Melanoma

  8. Multi-Label Graphs • Other applications: • XML document classification (one document → multiple tags) • Program flow error detection (one program → multiple types of errors) • Kinase inhibitor discovery (one chemical → multiple types of kinases) • …

  9. Multi-Label Feature Selection for Graph Classification • Goal: find useful subgraph features for graphs with multiple labels [Figure: pipeline from multi-label graphs to subgraph features, scored by an evaluation criterion F(p) and fed to multi-label classification]

  10. Two Key Questions to Address • Evaluation: How to evaluate a set of subgraph features using multiple labels of the graphs? (effective) • Search Space Pruning: How to prune the subgraph search space using multiple labels of the graphs? (efficient)

  11. What is a good feature? • Dependence maximization: maximize the dependence between the features and the multiple labels of the graphs • Assumption: graphs with similar label sets should have similar features [Figure: two example graphs f1 and f2 with similar substructures]

  12. Dependence Measure • Hilbert-Schmidt Independence Criterion (HSIC) [Gretton et al. 05] • Evaluates the dependence between input features and label vectors in kernel space • The empirical estimate is easy to calculate: HSIC(S) = tr(KS H L H) (up to a constant factor) • KS : kernel matrix for graphs, computed using the common subgraph features in S; KS[i, j] measures the similarity between graph i and graph j on the common subgraph features (in S) they contain • L : kernel matrix for label vectors in {0,1}^Q; L[i, j] measures the similarity between the label sets of graph i and graph j • H = I − 11ᵀ/n : centering matrix
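A minimal sketch of the empirical HSIC estimate, assuming the kernel matrices K and L are already computed (e.g. linear kernels over binary feature vectors and label vectors); constant factors are dropped since they do not affect the maximization:

```python
# Sketch: empirical HSIC estimate HSIC(S) = tr(K H L H), constants dropped.
import numpy as np

def hsic(K, L):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix H = I - 11^T / n
    return float(np.trace(K @ H @ L @ H))

# Usage with linear kernels, assuming X is an n x d binary feature matrix
# and Y an n x Q binary label matrix (hypothetical inputs):
#   score = hsic(X @ X.T, Y @ Y.T)
```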

  13. Optimization → gHSIC Criterion • Objective: maximize dependence (HSIC) over the selected feature set S • With a linear kernel on the subgraph indicator vectors, the HSIC decomposes into a sum over all selected features: HSIC(S) = Σg∈S fgᵀ M fg, where M = H L H and fg ∈ {0,1}^n represents the indicator vector of the subgraph feature g over the n graphs • gHSIC score of a single subgraph feature: q(g) = fgᵀ M fg • Subgraphs with high gHSIC scores are good features; subgraphs with low scores are bad [Figure: example subgraphs marked good / bad by their gHSIC scores]
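The per-feature score then only needs the centered label kernel M = H L H and the binary indicator vector of the feature. A minimal sketch, assuming fg is a length-n 0/1 NumPy array:

```python
# Sketch: gHSIC score of one subgraph feature, q(g) = f_g^T M f_g,
# where M = H L H is the centered label kernel.
import numpy as np

def centered_label_kernel(L):
    n = L.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ L @ H

def ghsic_score(f_g, M):
    return float(f_g @ M @ f_g)
```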

  14. Two Key Questions to Address • How to evaluate a set of subgraph features with multiple labels of the graphs? (effective) • How to prune the subgraph search space using multiple labels of the graphs? (efficient)

  15. Finding a Needle in a Haystack • gSpan [Yan & Han, ICDM'02]: an efficient algorithm to enumerate all frequent subgraph patterns (frequency ≥ min_support) • The pattern search tree starts from the empty pattern ⊥ and grows through 0-edge, 1-edge, 2-edge, … patterns; branches that are not frequent are cut off • Problem: there are too many frequent subgraph patterns; we want to find the most useful one(s) using multiple labels • How to find the best node(s) in this tree without searching all the nodes? (Branch and bound to prune the search space)

  16. gHSIC Upper Bound • gHSIC: q(g) = fgᵀ M fg = Σ M[i, j] over all pairs of graphs (i, j) that both contain g, where fg represents the indicator vector of the subgraph feature g • An upper bound of gHSIC: gHSIC-UB(g) = Σ max(0, M[i, j]) over the same pairs • gHSIC-UB(g) upper-bounds the gHSIC scores of all supergraphs of g, since any supergraph is contained in a subset of the graphs containing g • Anti-monotonic with subgraph frequency → pruning
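A minimal sketch of this bound, assuming M = H L H has been precomputed and the support of g is given as a list of graph indices. Clamping M to its non-negative part makes the sum an upper bound for every supergraph, whose support can only shrink:

```python
# Sketch: upper bound on the gHSIC score of all supergraphs of g.
import numpy as np

def ghsic_upper_bound(support, M):
    """support: indices of graphs containing subgraph g; M = H @ L @ H."""
    M_pos = np.maximum(M, 0.0)             # keep only non-negative entries
    idx = np.asarray(support)
    return float(M_pos[np.ix_(idx, idx)].sum())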

  17. Pruning Principle • Maintain the best subgraph(s) found so far and the best score so far • At the current node of the pattern search tree, compute the current score and the gHSIC upper bound over its sub-tree • If best score ≥ upper bound, we can prune the entire sub-tree [Figure: pattern search tree showing the best subgraph so far, the current node, and its pruned sub-tree]
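A minimal sketch of the branch-and-bound search, reusing ghsic_upper_bound from the previous sketch. The children, support_of, and is_frequent helpers are hypothetical stand-ins for the gSpan pattern-growth machinery:

```python
# Sketch: depth-first search over the pattern tree with gHSIC bound pruning.
import numpy as np

def ghsic_score_from_support(support, M):
    """q(g) = sum of M[i, j] over all pairs of graphs in the support of g."""
    idx = np.asarray(support)
    return float(M[np.ix_(idx, idx)].sum())

def branch_and_bound(root, M, min_support, children, support_of, is_frequent):
    best_score, best_pattern = float("-inf"), None

    def dfs(pattern):
        nonlocal best_score, best_pattern
        if not is_frequent(pattern, min_support):
            return                                  # gSpan frequency pruning
        support = support_of(pattern)
        score = ghsic_score_from_support(support, M)
        if score > best_score:
            best_score, best_pattern = score, pattern
        if best_score >= ghsic_upper_bound(support, M):
            return                                  # bound met: prune sub-tree
        for child in children(pattern):
            dfs(child)

    dfs(root)
    return best_pattern, best_score
```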

  18. Experiment Setup • Four methods are compared: • Multi-label feature selection + multi-label classification: gMLC [This Paper] + BoosTexter [Schapire & Singer 00] • Multi-label feature selection + binary classification: gMLC [This Paper] + BR-SVM [Boutell et al 04] (Binary Relevance) • Single-label feature selection + binary classification: BR (Binary Relevance) + Information Gain + SVM • Top-k frequent subgraphs + multi-label classification: gSpan [Yan & Han 02] + BoosTexter [Schapire & Singer 00]

  19. Data Sets • Three multi-label graph classification tasks: • Anti-cancer activity prediction • Toxicology prediction of chemical compounds • Kinase inhibitor prediction

  20. Evaluation • Multi-label metrics [Elisseeff & Weston, NIPS'02] • Ranking Loss ↓ : average fraction of label pairs that are ranked incorrectly; the smaller the better • Average Precision ↑ : average fraction of correct labels among the top-ranked labels; the larger the better • 10 times 10-fold cross-validation
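A minimal sketch of the two metrics for a single test graph, assuming s is a length-Q vector of real-valued label scores and y the binary relevance vector, following the usual Elisseeff & Weston definitions:

```python
# Sketch: per-instance Ranking Loss and Average Precision.
import numpy as np

def ranking_loss(y, s):
    """Fraction of (relevant, irrelevant) label pairs ranked incorrectly."""
    rel, irr = np.where(y == 1)[0], np.where(y == 0)[0]
    pairs = [(i, j) for i in rel for j in irr]
    bad = sum(s[i] <= s[j] for i, j in pairs)
    return bad / len(pairs) if pairs else 0.0

def average_precision(y, s):
    """For each relevant label, precision among labels ranked at or above it."""
    order = np.argsort(-s)                 # labels sorted by descending score
    ranked = y[order]
    precisions = [ranked[:k + 1].mean()
                  for k in range(len(y)) if ranked[k] == 1]
    return float(np.mean(precisions)) if precisions else 0.0
```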

  21. Experiment Results [Figure: Ranking Loss and 1 − AvePrec on the Anti-Cancer, PTC, and Kinase Inhibition datasets]

  22. Experiment Results: Anti-Cancer Dataset • Our approach with the multi-label classifier performed best on the NCI and PTC datasets [Figure: Ranking Loss (lower is better) vs. number of selected features, comparing Multi-Label FS + multi-label classifier, Multi-Label FS + single-label classifiers, Single-Label FS + single-label classifiers, and Unsupervised FS + multi-label classifier]

  23. Pruning Results • Two measures of pruning effectiveness: running time and number of subgraphs explored

  24. Pruning Results [Figure: running time in seconds (lower is better), with and without gHSIC pruning, on the anti-cancer dataset]

  25. Pruning Results [Figure: number of subgraphs explored (lower is better), with and without gHSIC pruning, on the anti-cancer dataset]

  26. Conclusions • Multi-label feature selection for graph classification • Evaluating subgraph features using multiple labels of the graphs (effective) • Branch-and-bound pruning of the search space using multiple labels of the graphs (efficient) Thank you!
