
Multi-Label Feature Selection for Graph Classification


Presentation Transcript


  1. Multi-Label Feature Selection for Graph Classification. Xiangnan Kong, Philip S. Yu. Department of Computer Science, University of Illinois at Chicago.

  2. Outline • Introduction • Multi-Label Feature Selection for Graph Classification • Experiments • Conclusion

  3. Introduction: Graph Data • Conventional data mining and machine learning approaches assume data are represented as feature vectors, e.g. (x1, x2, …, xd) → y • In many real applications, data are not directly represented as feature vectors, but as graphs with complex structures, e.g. G(V, E, l) → y • Examples: chemical compounds, program flows, XML documents

  4. Introduction: Graph Classification • Graph classification: construct a classification model for graph data • Example: drug activity prediction • Given a set of chemical compounds labeled with activities against one type of disease or virus • Predict active / inactive for a testing compound [Figure: training graphs labeled + / − and a testing graph marked ?]

  5. Graph Classification using Subgraph Features • Key question: how to find a set of subgraph features in order to effectively perform graph classification? • Each graph object Gi is encoded as a binary feature vector xi over a set of subgraph patterns {g1, g2, g3, …}: entry j is 1 if Gi contains subgraph gj and 0 otherwise (e.g. x1 = (1, 0, 1, …), x2 = (0, 1, 1, …)) • Pipeline: graph objects → feature vectors → classifier [Figure: chemical-compound graphs G1, G2 mapped to feature vectors x1, x2 and fed to a classifier]
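A minimal sketch of this graph-to-vector encoding, assuming graphs and patterns are networkx Graph objects with a 'label' node attribute. The pattern set itself would come from a subgraph miner such as gSpan; the direct isomorphism test here is only illustrative:

```python
# Sketch: encode graphs as binary feature vectors over subgraph patterns.
# Assumes networkx graphs with a 'label' node attribute (an assumption, not
# part of the original slides). Uses node-induced subgraph isomorphism.
import networkx as nx
from networkx.algorithms import isomorphism

def contains_subgraph(G, g):
    """True if pattern g occurs in graph G (label-preserving matching)."""
    matcher = isomorphism.GraphMatcher(
        G, g, node_match=isomorphism.categorical_node_match("label", None))
    return matcher.subgraph_is_isomorphic()

def to_feature_vector(G, patterns):
    """Binary indicator vector: entry j is 1 iff patterns[j] is contained in G."""
    return [1 if contains_subgraph(G, g) else 0 for g in patterns]
```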

  6. Existing Methods for Subgraph Feature Selection • Feature selection for graph classification: find a set of useful subgraph features for classification • Existing methods select discriminative subgraph features • Focused on single-label settings: assume one graph can only have one label [Figure: graphs each carrying a single label, e.g. + Lung Cancer, and the useful subgraphs selected from them]

  7. Multi-Label Graphs • In many real applications, one graph can have multiple labels • Example: anti-cancer drug prediction, where one compound graph carries several labels, e.g. + Breast Cancer, − Lung Cancer, + Melanoma

  8. Multi-Label Graphs • Other applications: • XML document classification (one document → multiple tags) • Program flow error detection (one program → multiple types of errors) • Kinase inhibitor discovery (one chemical → multiple types of kinases) • …

  9. Multi-Label Feature Selection for Graph Classification • Goal: find useful subgraph features for graphs with multiple labels [Figure: pipeline from multi-label graphs to subgraph features, scored by an evaluation criterion F(p) and fed to multi-label classification]

  10. Two Key Questions to Address • Evaluation: How to evaluate a set of subgraph features using multiple labels of the graphs? (effective) • Search Space Pruning: How to prune the subgraph search space using multiple labels of the graphs? (efficient)

  11. What is a good feature? • Dependence maximization: maximize the dependence between the features and the multiple labels of the graphs • Assumption: graphs with similar label sets should have similar features [Figure: two example graphs f1 and f2 with similar substructures]

  12. Dependence Measure • Hilbert-Schmidt Independence Criterion (HSIC) [Gretton et al. 05] • Evaluates the dependence between input features and label vectors in kernel space • The empirical estimate is easy to calculate: HSIC(S) = tr(KS H L H) (up to a constant factor) • KS : kernel matrix for graphs, computed using the common subgraph features in S; KS[i, j] measures the similarity between graph i and graph j on the common subgraph features (in S) they contain • L : kernel matrix for label vectors in {0,1}^Q; L[i, j] measures the similarity between the label sets of graph i and graph j • H = I − 11ᵀ/n : centering matrix
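A minimal sketch of the empirical HSIC estimate, assuming the kernel matrices K and L are already computed (e.g. linear kernels over binary feature vectors and label vectors); constant factors are dropped since they do not affect the maximization:

```python
# Sketch: empirical HSIC estimate HSIC(S) = tr(K H L H), constants dropped.
import numpy as np

def hsic(K, L):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix H = I - 11^T / n
    return float(np.trace(K @ H @ L @ H))

# Usage with linear kernels, assuming X is an n x d binary feature matrix
# and Y an n x Q binary label matrix (hypothetical inputs):
#   score = hsic(X @ X.T, Y @ Y.T)
```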

  13. Optimization → gHSIC Criterion • Objective: maximize dependence (HSIC) over the selected feature set S • With a linear kernel on the subgraph indicator vectors, the HSIC decomposes into a sum over all selected features: HSIC(S) = Σg∈S fgᵀ M fg, where M = H L H and fg ∈ {0,1}^n represents the indicator vector of the subgraph feature g over the n graphs • gHSIC score of a single subgraph feature: q(g) = fgᵀ M fg • Subgraphs with high gHSIC scores are good features; subgraphs with low scores are bad [Figure: example subgraphs marked good / bad by their gHSIC scores]
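The per-feature score then only needs the centered label kernel M = H L H and the binary indicator vector of the feature. A minimal sketch, assuming fg is a length-n 0/1 NumPy array:

```python
# Sketch: gHSIC score of one subgraph feature, q(g) = f_g^T M f_g,
# where M = H L H is the centered label kernel.
import numpy as np

def centered_label_kernel(L):
    n = L.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ L @ H

def ghsic_score(f_g, M):
    return float(f_g @ M @ f_g)
```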

  14. Two Key Questions to Address • How to evaluate a set of subgraph features with multiple labels of the graphs? (effective) • How to prune the subgraph search space using multiple labels of the graphs? (efficient)

  15. Finding a Needle in a Haystack • gSpan [Yan & Han, ICDM'02]: an efficient algorithm to enumerate all frequent subgraph patterns (frequency ≥ min_support) • The pattern search tree starts from the empty pattern ⊥ and grows through 0-edge, 1-edge, 2-edge, … patterns; branches that are not frequent are cut off • Problem: there are too many frequent subgraph patterns; we want to find the most useful one(s) using multiple labels • How to find the best node(s) in this tree without searching all the nodes? (Branch and bound to prune the search space)

  16. gHSIC Upper Bound • gHSIC: q(g) = fgᵀ M fg = Σ M[i, j] over all pairs of graphs (i, j) that both contain g, where fg represents the indicator vector of the subgraph feature g • An upper bound of gHSIC: gHSIC-UB(g) = Σ max(0, M[i, j]) over the same pairs • gHSIC-UB(g) upper-bounds the gHSIC scores of all supergraphs of g, since any supergraph is contained in a subset of the graphs containing g • Anti-monotonic with subgraph frequency → pruning
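A minimal sketch of this bound, assuming M = H L H has been precomputed and the support of g is given as a list of graph indices. Clamping M to its non-negative part makes the sum an upper bound for every supergraph, whose support can only shrink:

```python
# Sketch: upper bound on the gHSIC score of all supergraphs of g.
import numpy as np

def ghsic_upper_bound(support, M):
    """support: indices of graphs containing subgraph g; M = H @ L @ H."""
    M_pos = np.maximum(M, 0.0)             # keep only non-negative entries
    idx = np.asarray(support)
    return float(M_pos[np.ix_(idx, idx)].sum())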

  17. Pruning Principle • Maintain the best subgraph(s) found so far and the best score so far • At the current node of the pattern search tree, compute the current score and the gHSIC upper bound over its sub-tree • If best score ≥ upper bound, we can prune the entire sub-tree [Figure: pattern search tree showing the best subgraph so far, the current node, and its pruned sub-tree]
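A minimal sketch of the branch-and-bound search, reusing ghsic_upper_bound from the previous sketch. The children, support_of, and is_frequent helpers are hypothetical stand-ins for the gSpan pattern-growth machinery:

```python
# Sketch: depth-first search over the pattern tree with gHSIC bound pruning.
import numpy as np

def ghsic_score_from_support(support, M):
    """q(g) = sum of M[i, j] over all pairs of graphs in the support of g."""
    idx = np.asarray(support)
    return float(M[np.ix_(idx, idx)].sum())

def branch_and_bound(root, M, min_support, children, support_of, is_frequent):
    best_score, best_pattern = float("-inf"), None

    def dfs(pattern):
        nonlocal best_score, best_pattern
        if not is_frequent(pattern, min_support):
            return                                  # gSpan frequency pruning
        support = support_of(pattern)
        score = ghsic_score_from_support(support, M)
        if score > best_score:
            best_score, best_pattern = score, pattern
        if best_score >= ghsic_upper_bound(support, M):
            return                                  # bound met: prune sub-tree
        for child in children(pattern):
            dfs(child)

    dfs(root)
    return best_pattern, best_score
```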

  18. Experiment Setup • Four methods are compared: • Multi-label feature selection + multi-label classification: gMLC [This Paper] + BoosTexter [Schapire & Singer 00] • Multi-label feature selection + binary classification: gMLC [This Paper] + BR-SVM [Boutell et al 04] (Binary Relevance) • Single-label feature selection + binary classification: BR (Binary Relevance) + Information Gain + SVM • Top-k frequent subgraphs + multi-label classification: gSpan [Yan & Han 02] + BoosTexter [Schapire & Singer 00]

  19. Data Sets • Three multi-label graph classification tasks: • Anti-cancer activity prediction • Toxicology prediction of chemical compounds • Kinase inhibitor prediction

  20. Evaluation • Multi-label metrics [Elisseeff & Weston, NIPS'02] • Ranking Loss ↓ : average fraction of label pairs that are ranked incorrectly; the smaller the better • Average Precision ↑ : average fraction of correct labels among the top-ranked labels; the larger the better • 10 times 10-fold cross-validation
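A minimal sketch of the two metrics for a single test graph, assuming s is a length-Q vector of real-valued label scores and y the binary relevance vector, following the usual Elisseeff & Weston definitions:

```python
# Sketch: per-instance Ranking Loss and Average Precision.
import numpy as np

def ranking_loss(y, s):
    """Fraction of (relevant, irrelevant) label pairs ranked incorrectly."""
    rel, irr = np.where(y == 1)[0], np.where(y == 0)[0]
    pairs = [(i, j) for i in rel for j in irr]
    bad = sum(s[i] <= s[j] for i, j in pairs)
    return bad / len(pairs) if pairs else 0.0

def average_precision(y, s):
    """For each relevant label, precision among labels ranked at or above it."""
    order = np.argsort(-s)                 # labels sorted by descending score
    ranked = y[order]
    precisions = [ranked[:k + 1].mean()
                  for k in range(len(y)) if ranked[k] == 1]
    return float(np.mean(precisions)) if precisions else 0.0
```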

  21. Experiment Results [Figure: Ranking Loss and 1 − AvePrec on the Anti-Cancer, PTC, and Kinase Inhibition datasets]

  22. Experiment Results: Anti-Cancer Dataset • Our approach with the multi-label classifier performed best on the NCI and PTC datasets [Figure: Ranking Loss (lower is better) vs. number of selected features, comparing Multi-Label FS + multi-label classifier, Multi-Label FS + single-label classifiers, Single-Label FS + single-label classifiers, and Unsupervised FS + multi-label classifier]

  23. Pruning Results • Two measures of pruning effectiveness: running time and number of subgraphs explored

  24. Pruning Results [Figure: running time in seconds (lower is better), with and without gHSIC pruning, on the anti-cancer dataset]

  25. Pruning Results [Figure: number of subgraphs explored (lower is better), with and without gHSIC pruning, on the anti-cancer dataset]

  26. Conclusions • Multi-label feature selection for graph classification • Evaluating subgraph features using multiple labels of the graphs (effective) • Branch-and-bound pruning of the search space using multiple labels of the graphs (efficient) Thank you!
