
A Fast Jensen-Shannon Subgraph Kernel


Presentation Transcript


  1. A Fast Jensen-Shannon Subgraph Kernel • Lu Bai, Edwin R. Hancock • Department of Computer Science, University of York, UK

  2. Outline • Literature Review: State of the Art Graph Kernels • Existing graph kernel methods: graph kernels based on comparing all pairs of a) walks, b) paths, and c) restricted subgraph or subtree structures. • A Novel Graph Kernel: A fast Jensen-Shannon subgraph kernel • A Shannon entropy associated with a steady state random walk on a graph • The Jensen-Shannon divergence • A fast Jensen-Shannon diffusion kernel • A fast depth-based representation of a graph • A fast Jensen-Shannon subgraph kernel • Experiments • Conclusion

  3. Literature Review: Graph Kernels • Graph Kernels: Similarity Measures for Pairwise Graphs • Kernels offer an elegant solution to the cost of computation in high-dimensional feature spaces [K. Riesen and H. Bunke, 2009, Pattern Recognition] • Existing graph kernels (i.e., graph kernels from the R-convolution [Haussler, 1999]) generally fall into three classes: • Kernels based on restricted subgraphs or subtrees • The Weisfeiler-Lehman subtree kernel [Shervashidze et al., 2009, NIPS] • Kernels based on random walks • Product graph kernels [Gärtner et al., 2003, COLT] • Marginalized kernels on graphs [Kashima et al., 2003, ICML] • Kernels based on paths • The shortest path kernel [Borgwardt and Kriegel, 2005, ICDM]

  4. Our Motivation • Limitations of existing graph kernels • Existing graph kernels cannot scale up to substructures of large size (e.g., (sub)graphs with hundreds or even thousands of vertices). They therefore compromise by restricting the substructures to limited sizes, which only roughly capture the topological arrangement of a graph. • Unfortunately, even for relatively small subgraphs, most graph kernels still incur significant computational overhead. • Motivation: The aim of this paper is to develop a novel subgraph kernel that can be computed efficiently, even when a pair of full-sized subgraphs is compared.

  5. Our Motivation We investigate how to kernelize depth-based graph representations by measuring the information-content similarity of K-layer subgraphs using the Jensen-Shannon divergence. First, we describe how to compute a fast Jensen-Shannon diffusion kernel for a pair of (sub)graphs. Second, we describe how to compute a fast depth-based representation of a graph. Third, we describe how to compute a fast Jensen-Shannon subgraph kernel by comparing the depth-based representations of graphs using the Jensen-Shannon diffusion kernel.

  6. The Jensen-Shannon Diffusion Kernel • Entropies of graphs • Consider a graph $G(V,E)$; its adjacency matrix $A$ has elements $A(i,j)=1$ if $(v_i,v_j)\in E$ and $A(i,j)=0$ otherwise. • The vertex degree matrix of $G(V,E)$ is the diagonal matrix $D$ with entries $d(v_i)=\sum_{v_j\in V}A(i,j)$. • Shannon entropy: the probability of a steady-state random walk on $G(V,E)$ visiting vertex $v_i$ is $P(v_i)=d(v_i)/\sum_{v_j\in V}d(v_j)$. The Shannon entropy of $G(V,E)$ is $H_S(G)=-\sum_{v_i\in V}P(v_i)\log P(v_i)$. • von Neumann entropy: the (degree-based, approximate) von Neumann entropy of $G(V,E)$ is $H_N(G)=1-\frac{1}{|V|}-\frac{1}{|V|^2}\sum_{(v_i,v_j)\in E}\frac{1}{d(v_i)\,d(v_j)}$. (A minimal Python sketch of both entropies follows.)
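Both entropies depend only on vertex degrees, which is what makes them cheap. Below is a minimal Python sketch using the networkx library; the quadratic form of the von Neumann entropy follows the degree-based approximation the slides allude to, and all function names are ours, not the authors'.

    import math
    import networkx as nx

    def shannon_entropy(G):
        # Steady-state random walk probability: P(v) = d(v) / sum of all degrees.
        total = sum(d for _, d in G.degree())
        if total == 0:
            return 0.0  # edgeless graph: no walk, zero entropy by convention
        probs = [d / total for _, d in G.degree() if d > 0]
        return -sum(p * math.log(p) for p in probs)

    def von_neumann_entropy(G):
        # Quadratic (degree-based) approximation, summed over the edge set.
        n = G.number_of_nodes()
        s = sum(1.0 / (G.degree(u) * G.degree(v)) for u, v in G.edges())
        return 1.0 - 1.0 / n - s / (n * n)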

  7. The Jensen-Shannon Diffusion Kernel • The composite entropy of a pair of graphs • Disjoint union of a pair of graphs: the disjoint union graph of $G_p(V_p,E_p)$ and $G_q(V_q,E_q)$ is $G_{DU}(V_{DU},E_{DU})$, where $V_{DU}=V_p\cup V_q$ and $E_{DU}=E_p\cup E_q$. • The entropy of the disjoint union of a pair of graphs: let $G_p$ and $G_q$ be the connected components of the disjoint union graph $G_{DU}$, and let $p=|V(G_p)|/|V(G_{DU})|$ and $q=|V(G_q)|/|V(G_{DU})|$. The entropy (i.e., the composite entropy) of $G_{DU}$ is $H(G_{DU})=p\,H(G_p)+q\,H(G_q)$. Here the entropy function $H(\cdot)$ can be either the Shannon entropy $H_S(\cdot)$ or the von Neumann entropy $H_N(\cdot)$. (A sketch follows.)
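A sketch of the composite entropy, reusing the entropy functions from the previous sketch. Note that under this weighted decomposition the disjoint union graph never needs to be built explicitly; that reading of the slide's definition is ours.

    def composite_entropy(Gp, Gq, H=shannon_entropy):
        # H(G_DU) = p * H(Gp) + q * H(Gq), with p, q the vertex proportions.
        n_du = Gp.number_of_nodes() + Gq.number_of_nodes()
        p = Gp.number_of_nodes() / n_du
        q = Gq.number_of_nodes() / n_du
        return p * H(Gp) + q * H(Gq)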

  8. The Jensen-Shannon Diffusion Kernel • The Jensen-Shannon diffusion kernel for graphs: • For a pair of graphs $G_p(V_p,E_p)$ and $G_q(V_q,E_q)$, the Jensen-Shannon divergence between the pair of graphs is $D_{JS}(G_p,G_q)=H(G_{DU})-\frac{H(G_p)+H(G_q)}{2}$, where $G_{DU}$ is the composite structure of the pair of (sub)graphs obtained from the disjoint union. • The Jensen-Shannon diffusion kernel $k_{JS}$ for $G_p(V_p,E_p)$ and $G_q(V_q,E_q)$ is $k_{JS}(G_p,G_q)=\exp(-\lambda\,D_{JS}(G_p,G_q))$, where $H(\cdot)$ can be either the Shannon entropy $H_S(\cdot)$ or the von Neumann entropy $H_N(\cdot)$. • The Jensen-Shannon diffusion kernel is positive definite (pd). This follows from the definition in [Kondor and Lafferty, 2002, ICML]: if a dissimilarity measure $s_G(G_p,G_q)$ between a pair of graphs $G_p(V_p,E_p)$ and $G_q(V_q,E_q)$ satisfies symmetry, then the diffusion kernel $k_s=\exp(-\lambda\,s_G(G_p,G_q))$ associated with $s_G$ is pd. • Time complexity: for a pair of graphs $G_p(V_p,E_p)$ and $G_q(V_q,E_q)$ both having $n$ vertices, computing the Jensen-Shannon diffusion kernel $k_{JS}(G_p,G_q)$ requires time $O(n^2)$. (A sketch follows.)
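Putting the pieces together, a sketch of the divergence and the diffusion kernel, reusing composite_entropy and shannon_entropy from above. The decay parameter lam (λ) and its default value are our assumption; the slides leave it unspecified.

    def jensen_shannon_divergence(Gp, Gq, H=shannon_entropy):
        # D_JS = H(G_DU) - (H(Gp) + H(Gq)) / 2
        return composite_entropy(Gp, Gq, H) - 0.5 * (H(Gp) + H(Gq))

    def js_diffusion_kernel(Gp, Gq, H=shannon_entropy, lam=1.0):
        # k_JS = exp(-lambda * D_JS)
        return math.exp(-lam * jensen_shannon_divergence(Gp, Gq, H))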

  9. The Depth-Based Representation of a Graph • Subgraphs from the centroid vertex • For a graph $G(V,E)$, the matrix $S_G$ whose element $S_G(i,j)$ is the shortest path length between vertices $v_i$ and $v_j$ is referred to as the shortest path matrix of $G(V,E)$. The average-shortest-path vector $S_V$ for $G(V,E)$ has elements $S_V(i)=\frac{1}{|V|}\sum_{v_j\in V}S_G(i,j)$, the average shortest path length from vertex $v_i$ to the remaining vertices. We then locate the centroid vertex of $G(V,E)$ as $\hat{v}_C=\arg\min_{v_i\in V}S_V(i)$. • Let $N_K=\{v\in V \mid S_G(\hat{v}_C,v)\le K\}$ be a subset of $V$. For $G(V,E)$ with the centroid vertex $\hat{v}_C$, the K-layer centroid expansion subgraph $\mathcal{G}_K(\mathcal{V}_K,\mathcal{E}_K)$ is the subgraph induced by $N_K$, i.e. $\mathcal{V}_K=N_K$ and $\mathcal{E}_K=\{(u,v)\in E \mid u,v\in N_K\}$. • The depth-based representation of a graph: for a graph $G(V,E)$ we obtain a family of centroid expansion subgraphs $\{\mathcal{G}_1,\dots,\mathcal{G}_K,\dots,\mathcal{G}_L\}$; the depth-based representation of $G(V,E)$ is defined as $D_B(G)=[H(\mathcal{G}_1),\dots,H(\mathcal{G}_K),\dots,H(\mathcal{G}_L)]^{T}$, where $H(\cdot)$ can be either the Shannon entropy $H_S(\cdot)$ or the von Neumann entropy $H_N(\cdot)$. [Bai and Hancock, 2014, Pattern Recognition] (A sketch of the construction follows.)
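A sketch of the expansion-subgraph construction and the resulting depth-based representation. Shortest paths are computed by breadth-first search via networkx; a connected, unweighted graph is assumed, and ties for the centroid are broken arbitrarily.

    def expansion_subgraphs(G):
        # Shortest path lengths from every vertex (BFS on unweighted graphs).
        lengths = dict(nx.all_pairs_shortest_path_length(G))
        # Centroid vertex: minimum average shortest path length to the rest.
        centroid = min(G.nodes(), key=lambda v: sum(lengths[v].values()) / len(G))
        L = max(lengths[centroid].values())  # greatest layer
        # K-layer subgraph: induced by vertices within distance K of the centroid.
        return [G.subgraph([v for v, d in lengths[centroid].items() if d <= K])
                for K in range(1, L + 1)]

    def depth_based_representation(G, H=shannon_entropy):
        # D_B(G) = [H(G_1), ..., H(G_L)]
        return [H(g) for g in expansion_subgraphs(G)]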

  10. The Depth-Based Representation of A Graph • An example of the depth-based representation for a graph from the centroid vertex

  11. The Fast Jensen-Shannon Subgraph Kernel • For a pair of graphs $G_p(V_p,E_p)$ and $G_q(V_q,E_q)$, we develop a similarity measure between their depth-based representations $D_B(G_p)$ and $D_B(G_q)$ as follows: $s(D_B(G_p),D_B(G_q))=\sum_{K=1}^{L}s_H(H(\mathcal{G}_{p;K}),H(\mathcal{G}_{q;K}))$, where $s_H(H(\mathcal{G}_{p;K}),H(\mathcal{G}_{q;K}))$ is an entropy-based similarity measure for the K-layer subgraphs $\mathcal{G}_{p;K}$ and $\mathcal{G}_{q;K}$ of $G_p(V_p,E_p)$ and $G_q(V_q,E_q)$. Using the Jensen-Shannon diffusion kernel $k_{JS}(\cdot,\cdot)$ as the entropy-based similarity measure $s_H(\cdot,\cdot)$, the similarity between the depth-based representations $D_B(G_p)$ and $D_B(G_q)$ is formulated as the sum of the diffusion kernel measures over all pairs of K-layer subgraphs of $G_p(V_p,E_p)$ and $G_q(V_q,E_q)$. As a result, the Jensen-Shannon subgraph kernel is defined as $k_{JSS}(G_p,G_q)=\sum_{K=1}^{L}k_{JS}(\mathcal{G}_{p;K},\mathcal{G}_{q;K})$. • The Jensen-Shannon subgraph kernel is pd, because the proposed subgraph kernel is a sum of positive definite Jensen-Shannon diffusion kernels. (A sketch follows.)
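A sketch of the resulting subgraph kernel, built on the functions above. The slides do not state how K-layer subgraphs are aligned when the two graphs have different expansion depths, so truncating to the smaller depth (via zip) is our assumption.

    def js_subgraph_kernel(Gp, Gq, H=shannon_entropy, lam=1.0):
        # Sum the diffusion kernel over aligned K-layer subgraph pairs.
        return sum(js_diffusion_kernel(sp, sq, H, lam)
                   for sp, sq in zip(expansion_subgraphs(Gp),
                                     expansion_subgraphs(Gq)))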

  12. The Fast Jensen-Shannon Subgraph Kernel • Time complexity: computing the proposed subgraph kernel between a pair of graphs, each of which has $n$ vertices and $m$ edges, requires time $O(n^2L+mn)$, where $L$ is the largest layer of the expansion subgraphs for the pair of graphs. The reason is that computing the depth-based representation of a graph requires time $O(n^2L+mn)$, and the Jensen-Shannon diffusion kernel between each pair of expansion subgraphs requires time $O(n^2)$. As a result, the overall time complexity is $O(n^2L+mn)$.

  13. The Fast Jensen-Shannon Subgraph Kernel • We make five observations regarding the proposed subgraph kernel. • a) The required von Neumann entropy $H_N$ is associated with the degrees of connected vertices. Hence, the proposed subgraph kernel associated with the von Neumann entropy is sensitive to the interconnections between vertex clusters within a graph. • b) For the Shannon entropy $H_S$, vertices with large degrees dominate the entropy value. Hence, the proposed subgraph kernel associated with the Shannon entropy $H_S$ is suited to characterizing graphs possessing a group of highly interconnected vertices, i.e. a dominant cluster. • c) The depth-based representation of a graph $G(V,E)$ captures high-dimensional, depth-based entropy complexity characteristics through the centroid expansion subgraphs. This enables our subgraph kernel to capture richer complexity-based information than applying the Jensen-Shannon diffusion kernel directly to the original graphs. • d) The proposed subgraph kernel only compares pairs of subgraphs with the same layer size $K$. This avoids enumerating all pairs of subgraphs and makes the computation efficient. • e) For a pair of graphs, the proposed subgraph kernel can also efficiently measure the similarity of their $L$-layer subgraphs (i.e. the two graphs themselves). Hence, our subgraph kernel overcomes the subgraph-size restriction that commonly arises in existing graph kernels.

  14. Experiments (New, not in the paper) • We evaluate the classification performance of our kernel using 10-fold cross-validation with a C-Support Vector Machine (Intel i5 3210M, 2.5 GHz). • Classification of graphs abstracted from bioinformatics and computer vision databases. The datasets include: GatorBait (3D shapes), DD, COIL5 (images), CATH1 and CATH2. • Graph kernels used for comparison: a) our kernels: 1) our kernel using the Shannon entropy (JSSS), 2) our kernel using the von Neumann entropy (JSSV); b) the Weisfeiler-Lehman subtree kernel (WL) [Shervashidze et al., 2009, NIPS]; c) the shortest path graph kernel (SPGK) [Borgwardt et al., 2005, ICDM]; d) the graphlet count kernel (GCGK) [Shervashidze and Borgwardt, 2009, AISTATS]. (A minimal sketch of the evaluation protocol follows.)
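A minimal sketch of this evaluation protocol with scikit-learn: precompute the kernel matrix over all graph pairs, then run a 10-fold cross-validated C-SVM on it. Dataset loading is omitted; graphs and labels are placeholders, and js_subgraph_kernel comes from the earlier sketch.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def evaluate(graphs, labels, H=shannon_entropy, lam=1.0):
        # Precompute the (symmetric) kernel matrix over all graph pairs.
        n = len(graphs)
        K = np.zeros((n, n))
        for i in range(n):
            for j in range(i, n):
                K[i, j] = K[j, i] = js_subgraph_kernel(graphs[i], graphs[j], H, lam)
        # C-SVM on the precomputed kernel, 10-fold cross-validation.
        clf = SVC(kernel="precomputed")
        return cross_val_score(clf, K, labels, cv=10).mean()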

  15. Experiments • Details of the datasets

  16. Experiments • Results

  17. Conclusion and Further Work

  18. Acknowledgments Prof. Edwin R. Hancock is supported by a Royal Society Wolfson Research Merit Award. We thank Prof. Karsten Borgwardt and Dr. Nino Shervashidze for providing the Matlab implementations of the various graph kernel methods, and Dr. Geng Li for providing the graph datasets. We thank Dr. Peng Ren for the constructive discussions and suggestions regarding further work.

  19. References • 1. Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press (2002). • 2. Haussler, D.: Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, Santa Cruz, CA, USA (1999). • 3. Kashima, H., Tsuda, K., Inokuchi, A.: Marginalized kernels between labeled graphs. In: Proceedings of the International Conference on Machine Learning (2003) 321–328. • 4. Borgwardt, K.M., Kriegel, H.P.: Shortest-path kernels on graphs. In: Proceedings of the IEEE International Conference on Data Mining (2005) 74–81. • 5. Shervashidze, N., Schweitzer, P., van Leeuwen, E.J., Mehlhorn, K., Borgwardt, K.M.: Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research 12 (2011) 2539–2561. • 6. Lamberti, P., Majtey, A., Borras, A., Casas, M., Plastino, A.: Metric character of the quantum Jensen-Shannon divergence. Physical Review A 77 (2008) 052311. • 7. Bai, L., Hancock, E.R.: Graph kernels from the Jensen-Shannon divergence. Journal of Mathematical Imaging and Vision (to appear). • 8. Majtey, A., Lamberti, P., Prato, D.: Jensen-Shannon divergence as a measure of distinguishability between mixed quantum states. Physical Review A 72 (2005) 052310. • 9. Farhi, E., Gutmann, S.: Quantum computation and decision trees. Physical Review A 58 (1998) 915. • 10. Dirac, P.: The Principles of Quantum Mechanics (4th ed.). Oxford Science Publications (1958). • 11. Kempe, J.: Quantum random walks: an introductory overview. Contemporary Physics 44 (2003) 307–327. • 12. Nielsen, M., Chuang, I.: Quantum Computation and Quantum Information. Cambridge University Press (2010). • 13. Martins, A.F., Smith, N.A., Xing, E.P., Aguiar, P.M., Figueiredo, M.A.: Nonextensive information theoretic kernels on measures. Journal of Machine Learning Research 10 (2009) 935–975. • 14. Kondor, R., Lafferty, J.: Diffusion kernels on graphs and other discrete input spaces. In: Proceedings of the International Conference on Machine Learning (2002) 315–322. • 15. Neuhaus, M., Bunke, H.: Bridging the Gap Between Graph Edit Distance and Kernel Machines. World Scientific (2007). • 16. Escolano, F., Hancock, E.R., Lozano, M.A.: Heat diffusion: Thermodynamic depth complexity of networks. Physical Review E 85 (2012) 036206. • 17. Dehmer, M.: Information processing in complex networks: Graph entropy and information functionals. Applied Mathematics and Computation 201 (2008) 82–94. • 18. Ren, P., Wilson, R.C., Hancock, E.R.: Graph characterization via Ihara coefficients. IEEE Transactions on Neural Networks 22 (2011) 233–245. • 19. Schölkopf, B., Smola, A.J., Müller, K.R.: Kernel principal component analysis. In: Proceedings of the International Conference on Artificial Neural Networks (1997) 583–588. • 20. Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C.J.C., Smola, A.J. (eds.) Advances in Kernel Methods (1999) 185–208. • 21. Witten, I.H., Frank, E., Hall, M.A.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann (2011). • 22. Bai, L., Hancock, E.R., Ren, P.: A Jensen-Shannon kernel for hypergraphs. In: SSPR/SPR (2012) 181–189. • 23. Ren, P., Aleksic, T., Emms, D., Wilson, R., Hancock, E.: Quantum walks, Ihara zeta functions and cospectrality in regular graphs. Quantum Information Processing 10 (2011) 405–417.

  20. Thank you!
