
Marginalized Kernels & Graph Kernels


Presentation Transcript


  1. Marginalized Kernels & Graph Kernels Max Planck Institute for Biological Cybernetics Koji Tsuda

  2. Kernels and Learning • In kernel-based learning algorithms, problem solving is decoupled into: • A general-purpose learning algorithm (e.g. SVM, PCA, …), often a linear algorithm • A problem-specific kernel [Diagram: complex learning task = specific kernel function + simple (linear) learning algorithm]

  3. Current Synthesis • Modularity and re-usability • Same kernel, different learning algorithms • Different kernels, same learning algorithm [Diagram: Data 1 (sequence) → Kernel 1 → Gram matrix (not necessarily stored) → Learning Algo 1; Data 2 (network) → Kernel 2 → Gram matrix → Learning Algo 2]

  4. Lectures so far • Kernel represents the similarity between two objects, defined as the dot product in the feature space • Various string kernels • Importance of positive definiteness

  5. Kernel Methods: the mapping φ [Diagram: φ maps the original space to the feature (vector) space]

  6. Overview of this lecture • Marginalized kernels • General idea about defining kernels using latent variables • An example in string kernel • Marginalized Graph Kernels • Kernel for labeled graphs (~ several hundred nodes) • Similarity for chemical compounds (drug discovery) • Diffusion Kernels • Closeness between nodes of a network • Used for function prediction of proteins based on biological networks (protein-protein interaction nets)

  7. Marginalized kernels K. Tsuda, T. Kin, and K. Asai. Marginalized kernels for biological sequences Bioinformatics, 18(Suppl. 1):S268-S275, 2002.

  8. Biological Sequences:Classification Tasks • DNA sequences (A,C,G,T) • Gene Finding, Splice Sites • RNA sequences (A,C,G,U) • MicroRNA discovery, Classification into Rfam families • Amino Acid Sequences (20 symbols) • Remote Homolog Detection, Fold recognition

  9. Structures hidden in sequences (I) • Exon/intron of DNA (Gene)

  10. Structures hidden in sequences (II) • It is crucial to infer hidden structures and exploit them for classification RNA Secondary Structure Protein 3D Structures

  11. Hidden Markov Models • Visible Variable : Symbol Sequence • Hidden Variable : Context • HMM has parameters • Transition Probability • Emission Probability • HMM models the joint probability

  12. HMM for gene finding • Engineered HMM: some parameters are set to constants a priori to reflect prior knowledge about the sequence

  13. Training Hidden Markov Models • Training examples consist of string-context pairs • E.g., fragments of DNA sequences with known splice sites • Parameters are estimated by maximizing the likelihood

  14. Using trained hidden Markov models to estimate the context • A trained HMM can compute the posterior probability: given the sequence x, what is the probability of the context h? • You can never predict the context perfectly! Example: for x = A C C T G T A A A, the context h = 1 2 1 2 2 2 2 1 1 has probability 0.0003, while h = 2 2 1 1 1 1 2 1 1 has probability 0.0006
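The posterior over contexts mentioned above is what the forward-backward algorithm computes. Below is a minimal sketch for a toy 2-state HMM over {A, C, G, T}; all parameter values and the function name `posteriors` are illustrative stand-ins, not values from the lecture.

```python
# Toy parameters (assumed, for illustration only)
A = [[0.9, 0.1], [0.2, 0.8]]                      # transition P(h'|h)
E = [[0.4, 0.1, 0.1, 0.4], [0.1, 0.4, 0.4, 0.1]]  # emission P(x|h), x in A,C,G,T
PI = [0.5, 0.5]                                   # initial state distribution

def posteriors(obs):
    """gamma[i][k] = P(h_i = k | x), by the forward-backward algorithm."""
    n, K = len(obs), len(PI)
    fwd = [[0.0] * K for _ in range(n)]
    bwd = [[1.0] * K for _ in range(n)]
    for k in range(K):                            # forward initialization
        fwd[0][k] = PI[k] * E[k][obs[0]]
    for i in range(1, n):                         # forward recursion
        for k in range(K):
            fwd[i][k] = sum(fwd[i-1][j] * A[j][k] for j in range(K)) * E[k][obs[i]]
    for i in range(n - 2, -1, -1):                # backward recursion
        for k in range(K):
            bwd[i][k] = sum(A[k][j] * E[j][obs[i+1]] * bwd[i+1][j] for j in range(K))
    gamma = []
    for i in range(n):                            # normalize per position
        row = [fwd[i][k] * bwd[i][k] for k in range(K)]
        z = sum(row)
        gamma.append([r / z for r in row])
    return gamma

g = posteriors([0, 1, 1, 3, 2])   # sequence A C C T G, encoded as indices 0..3
```

Each row of `g` is a distribution over the two contexts at that position; these per-position posteriors are exactly what the marginalized count kernel consumes later.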

  15. Kernels for Sequences • Similarity between sequences of different lengths, e.g., ACGGTTCAA vs. ATATCGCGGGAA • How do you use the trained HMM for computing the kernel?

  16. Count Kernel • Inner product between symbol counts • Extension: Spectrum kernels (Leslie et al., 2002) • Counts the number of k-mers (k-grams) efficiently • Not good for sequences with frequent context change • E.g., coding/non-coding regions in DNA
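The count kernel and its k-mer extension can be sketched as an inner product of count vectors. The function name and the direct dictionary counting below are ours; Leslie et al.'s spectrum kernel computes the same quantity with more efficient data structures.

```python
from collections import Counter

def spectrum_kernel(s, t, k=1):
    """Inner product of k-mer count vectors; k=1 gives the plain count kernel."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(cs[w] * ct[w] for w in cs if w in ct)

k1 = spectrum_kernel("ACGGTTCAA", "ATATCGCGGGAA", k=1)  # symbol counts
k3 = spectrum_kernel("ACGGTTCAA", "ATATCGCGGGAA", k=3)  # 3-mer counts
```

Note that these counts ignore context entirely, which is the weakness the slide points out for sequences with frequent context change.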

  17. Hidden Markov Models for Estimating Context • Visible Variable : Symbol Sequence • Hidden Variable : Context • HMM can estimate the posterior probability of hidden variables from data

  18. Marginalized kernels • Design a joint kernel for the combined variable (x, h) • The hidden variable h is not usually available • Take the expectation with respect to the hidden variable • The result is the marginalized kernel for visible variables
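The marginalization step above can be written out naively: average a joint kernel over the posteriors of the hidden variables, K(x, x') = Σ_h Σ_h' P(h|x) P(h'|x') K_z((x, h), (x', h')). The uniform posterior and the simple joint count kernel below are illustrative stand-ins, and the enumeration is exponential; it exists only to make the definition concrete.

```python
from itertools import product

STATES = (0, 1)

def marginalized_kernel(x, xp, posterior, joint_kernel, states):
    """Brute-force expectation of a joint kernel over hidden-variable posteriors."""
    total = 0.0
    for h in product(states, repeat=len(x)):
        for hp in product(states, repeat=len(xp)):
            total += posterior(h, x) * posterior(hp, xp) * joint_kernel(x, h, xp, hp)
    return total

def uniform_posterior(h, x):
    # toy posterior: uniform over all context assignments
    return 1.0 / (len(STATES) ** len(x))

def joint_count_kernel(x, h, xp, hp):
    # count kernel on combined (symbol, context) pairs
    counts = {}
    for pair in zip(x, h):
        counts[pair] = counts.get(pair, 0) + 1
    return sum(counts.get(pair, 0) for pair in zip(xp, hp))

k = marginalized_kernel("AC", "AC", uniform_posterior, joint_count_kernel, STATES)
```

The point of the following slides is that, for count-style joint kernels, this expectation collapses to a sum over per-position posteriors, so the exponential enumeration is never needed.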

  19. Designing a joint kernel for sequences • Symbols are counted separately in each context • Feature: count of a combined symbol (k, l), i.e., symbol k occurring in context l • Joint kernel: count kernel with context information

  20. Marginalization of the joint kernel • Joint kernel • Marginalized count kernel
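For the count kernel, the marginalized kernel can be computed directly from the per-position posteriors gamma[i][l] = P(h_i = l | x): the feature for a combined symbol (k, l) becomes the expected count of symbol k in context l. A minimal sketch, with made-up two-context posterior values:

```python
def marginalized_counts(x, gamma, states):
    """c[(symbol, context)] = expected count under the posteriors gamma."""
    c = {}
    for i, sym in enumerate(x):
        for l in states:
            c[(sym, l)] = c.get((sym, l), 0.0) + gamma[i][l]
    return c

def mck(x, gx, xp, gxp, states=(0, 1)):
    """Marginalized count kernel: dot product of expected count vectors."""
    cx = marginalized_counts(x, gx, states)
    cxp = marginalized_counts(xp, gxp, states)
    return sum(v * cxp.get(key, 0.0) for key, v in cx.items())

g = [[0.9, 0.1], [0.2, 0.8]]   # illustrative posteriors for the string "AC"
k = mck("AC", g, "AC", g)
```

This runs in time linear in sequence length (times the number of contexts), in contrast to the exponential sum in the definition.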

  21. Computing Marginalized Counts from HMM • The marginalized count is described in terms of the posterior probabilities of the hidden variables • The posterior probability of the i-th hidden variable is efficiently computed by dynamic programming (forward-backward)

  22. 2nd order marginalized count kernel • If adjacent relations between symbols have essential meaning, the count kernel is obviously not sufficient • 2nd order marginalized count kernel: 4 neighboring symbols (2 visible and 2 hidden) are combined and counted
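The 2nd-order feature map can be sketched the same way: count the 4-tuple (x_i, x_{i+1}, h_i, h_{i+1}), weighted by the pairwise posterior xi[i][(l, l')] = P(h_i = l, h_{i+1} = l' | x), which we assume is precomputed (e.g. by forward-backward). Function names and the uniform pairwise posterior are ours.

```python
def mck2_counts(x, xi, states=(0, 1)):
    """Expected counts of combined 4-tuples of adjacent visible/hidden pairs."""
    c = {}
    for i in range(len(x) - 1):
        for l in states:
            for lp in states:
                key = (x[i], x[i + 1], l, lp)
                c[key] = c.get(key, 0.0) + xi[i][(l, lp)]
    return c

def mck2(x, xi_x, xp, xi_xp, states=(0, 1)):
    """2nd-order marginalized count kernel: dot product of 4-tuple counts."""
    cx = mck2_counts(x, xi_x, states)
    cxp = mck2_counts(xp, xi_xp, states)
    return sum(v * cxp.get(key, 0.0) for key, v in cx.items())

# uniform pairwise posterior for a length-2 string (one adjacent pair)
xi = [{(l, lp): 0.25 for l in (0, 1) for lp in (0, 1)}]
k = mck2("AC", xi, "AC", xi)
```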

  23. Protein clustering experiment • 84 proteins from five classes • gyrB proteins from five bacterial species • Clustering method: HMM + {FK, MCK1, MCK2} + k-means • Evaluation: Adjusted Rand Index (ARI)

  24. Kernel Matrices

  25. Clustering Evaluation

  26. Applications since then… • Marginalized graph kernels (Kashima et al., ICML 2003) • Sensor networks (Nguyen et al., ICML 2004) • Labeling of structured data (Kashima et al., ICML 2004) • Robotics (Shimosaka et al., ICRA 2005) • Kernels for promoter regions (Vert et al., NIPS 2005) • Web data (Zhao et al., WWW 2006) • Multiple instance learning (Kwok et al., IJCAI 2007)

  27. Summary (Marginalized Kernels) • General framework for using generative models to define kernels • Fisher kernel as a special case • Broad applications • Combination with CRFs and other advanced models?

  28. 2. Marginalized Graph Kernels H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. ICML 2003, pages 321-328.

  29. Motivations for graph analysis • Existing methods assume "tables" [Example table: Serial Num, Name, Age, Sex, Address, …] • Structured data beyond this framework → new methods for analysis

  30. Graphs..

  31. Graph Structures in Biology • Compounds • DNA sequences • RNA [Diagrams: a chemical compound as a graph of atoms (H, C, O, …); an RNA secondary structure with base pairs (A-U, C-G)]

  32. Marginalized Graph Kernels (Kashima, Tsuda, Inokuchi, ICML 2003) • Going to define the kernel function • Both vertices and edges are labeled

  33. Label path • Sequence of vertex and edge labels • Generated by random walks • Uniform initial, transition, and terminal probabilities
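Sampling one label path from a labeled graph by random walking can be sketched as follows. The graph encoding (dicts of vertex labels and directed edge labels), the function name, and the fixed stopping probability are our assumptions; the slides use uniform initial, transition, and terminal probabilities.

```python
import random

def sample_label_path(vlabels, edges, stop=0.3, rng=random):
    """Walk the graph, alternating vertex and edge labels, until termination.

    vlabels: {vertex: label}; edges: {(u, v): edge_label}, both directions listed.
    """
    adj = {}
    for (u, v), el in edges.items():
        adj.setdefault(u, []).append((v, el))
    v = rng.choice(sorted(vlabels))             # uniform initial vertex
    path = [vlabels[v]]
    while adj.get(v) and rng.random() >= stop:  # terminate with probability `stop`
        v, el = rng.choice(adj[v])              # uniform transition to a neighbor
        path += [el, vlabels[v]]
    return path

verts = {0: "A", 1: "B", 2: "C"}
edges = {(0, 1): "a", (1, 0): "a", (1, 2): "b", (2, 1): "b"}
p = sample_label_path(verts, edges, rng=random.Random(7))
```

A label path therefore always has odd length, alternating vertex labels (even positions) and edge labels (odd positions), matching examples like "A c D b E" on the next slide.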

  34. Path-probability vector

  35. Kernel definition • Kernels for paths [Example label paths: A c D b E, B c D a A] • Take the expectation over all possible paths! • This gives the marginalized kernel for graphs

  36. Computation • Transition probabilities; initial and terminal probabilities omitted for simplicity • Consider the set of paths ending at vertex v • K_V: kernel computed from the paths ending at the vertex pair (v, v') • K_V can be written recursively in terms of neighboring pairs • The kernel is obtained by solving a system of linear equations (polynomial time) [Diagram: recursion between neighboring vertex pairs (v, v')]
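The recursion above can be sketched by fixed-point iteration on the product graph: K_V(v, v') satisfies a linear system, which we iterate to convergence here instead of solving in closed form as the paper does. The graph encoding, the uniform initial distribution, the stop probability `p_stop`, and the iteration count are our assumptions.

```python
def mgk(vl1, adj1, vl2, adj2, p_stop=0.5, iters=50):
    """Marginalized graph kernel sketch via fixed-point iteration.

    vl: list of vertex labels; adj: {vertex: [(neighbor, edge_label), ...]}.
    """
    n1, n2 = len(vl1), len(vl2)
    R = [[0.0] * n2 for _ in range(n1)]          # R[i][j] ~ K_V(i, j)
    for _ in range(iters):
        nxt = [[0.0] * n2 for _ in range(n1)]
        for i in range(n1):
            for j in range(n2):
                if vl1[i] != vl2[j]:
                    continue                     # vertex labels must match
                acc = p_stop                     # both walks terminate here
                for u, e1 in adj1.get(i, []):
                    for v, e2 in adj2.get(j, []):
                        if e1 == e2:             # edge labels must match
                            acc += ((1 - p_stop) / len(adj1[i])) * \
                                   ((1 - p_stop) / len(adj2[j])) * R[u][v]
                nxt[i][j] = acc
        R = nxt
    # uniform initial distribution over vertex pairs
    return sum(R[i][j] for i in range(n1) for j in range(n2)) / (n1 * n2)

verts = ["C", "N"]
bonds = {0: [(1, "s")], 1: [(0, "s")]}           # a single labeled edge
k = mgk(verts, bonds, verts, bonds)
```

The iteration converges because the contraction factor is at most (1 - p_stop)^2 < 1; solving the linear system directly gives the polynomial-time bound cited on the slides.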

  37. Graph Kernel Applications • Chemical Compounds (Mahe et al., 2005) • Protein 3D structures (Borgwardt et al, 2005) • RNA graphs (Karklin et al., 2005) • Pedestrian detection • Signal Processing

  38. Predicting Mutagenicity • MUTAG benchmark dataset • Mutagenicity on Salmonella typhimurium • 125 positive examples (mutagenic) • 63 negative examples (non-mutagenic) Mahe et al., J. Chem. Inf. Model., 2005

  39. Classification of Protein 3D structures • Graphs for protein 3D structures • Node: Secondary structure elements • Edge: Distance of two elements • Calculate the similarity by graph kernels Borgwardt et al. “Protein function prediction via graph kernels”, ISMB2005

  40. Classification of proteins: Accuracy Borgwardt et al. “Protein function prediction via graph kernels”, ISMB2005

  41. Pedestrian detection in images (F. Suard et al., 2005)

  42. Classifying RNA graphs (Y. Karklin et al., 2005)

  43. Strong points of MGK • Polynomial-time computation, O(n^3) • Positive definite kernel, usable with: • Support vector machines • Kernel PCA • Kernel CCA • And so on…

  44. Diffusion Kernels: Biological Network Analysis

  45. Biological Networks • Protein-protein physical interactions • Metabolic networks • Gene regulatory networks • Networks induced from sequence similarity • Thousands of nodes (genes/proteins) • Hundreds of thousands of edges (interactions)

  46. Physical Interaction Network

  47. Physical Interaction Network • Undirected graph of proteins • An edge exists if two proteins physically interact • Docking (lock-and-key binding) • Interacting proteins tend to have the same biological function

  48. Metabolic Network

  49. Metabolic Network • Node: chemical compounds • Edge: enzyme catalyzing the reaction (EC number) • KEGG database (Kyoto University): collection of pathways (subnetworks) • Can be converted into a network of enzymes (proteins) [Example fragment: Oxaloacetate - (1.1.1.37) - (S)-Malate - (4.2.1.2) - Fumarate]

  50. Protein Function Prediction • For some proteins, their functions are known • But the functions of many proteins are still unknown
