Bioinformatics: Spectral Clustering

  1. Bioinformatics: Spectral Clustering • Mentee: Joonoh Lim • Mentor: Sanketh Shetty

  2. Background • Cluster analysis is an unsupervised method of determining groupings (clusters) in data sets. • In general, cluster analysis is a common technique for statistical data analysis, used in many fields: machine learning, data mining, pattern recognition, image analysis, and bioinformatics. • In bioinformatics, it is used to study genes and gene expression.

  3. Types of Clustering Algorithms • Partitional methods: K-means clustering, affinity propagation, spectral clustering, mean-shift clustering, normalized cuts, Gaussian mixture models • Hierarchical methods: single linkage, complete linkage, average linkage

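  4. Advantages of Spectral Clustering • Simple to implement • Can be solved efficiently with standard linear algebra (an eigendecomposition of the Laplacian matrix) • Not restricted to convex, blob-shaped clusters: unlike k-means, it can separate clusters of non-convex shape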

  5. Spectral Clustering vs. K-means • [Figure: k-means result vs. spectral clustering result; source: apo.enseeiht.fr/pub/Jdoc09/sandrine.pdf]
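A minimal NumPy sketch of why k-means struggles with non-convex clusters. The data are hypothetical (two concentric rings of radius 1 and 3, not the data in the slide's figure), and k-means is a hand-written Lloyd's iteration from an arbitrary initialization. With k = 2, the two k-means clusters are always separated by a straight line (the perpendicular bisector of the two centers), and no straight line can separate a ring from a ring that surrounds it, so k-means has to mix the rings.

```python
import numpy as np

# Hypothetical data: 40 points on a ring of radius 1, then 40 on a ring of radius 3.
theta = np.linspace(0, 2 * np.pi, 40, endpoint=False)
circle = np.c_[np.cos(theta), np.sin(theta)]
X = np.vstack([circle, 3 * circle])
ring = np.repeat([0, 1], 40)  # true ring membership


def kmeans(X, centers, n_iter=100):
    """Plain Lloyd's k-means starting from the given initial centers."""
    for _ in range(n_iter):
        # assign each point to its nearest center
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq_dist, axis=1)
        # move each center to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(len(centers))])
    return labels


# Arbitrary initialization: one inner-ring point and one outer-ring point.
labels = kmeans(X, centers=X[[0, 60]].copy())
matches_rings = np.array_equal(labels, ring) or np.array_equal(labels, 1 - ring)
print("k-means recovered the two rings:", matches_rings)
```

Any other initialization gives the same qualitative outcome, since the straight-line argument holds for every pair of centers.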

  6. Overview of the clustering process, from the algorithm by Ng, Jordan, and Weiss • 1. Given n data points, construct the n-by-n distance matrix. • 2. Form the similarity matrix W from the distance matrix. • 3. Form the Laplacian matrix Lsym from the similarity matrix. • 4. Compute the k eigenvectors u1, u2, …, uk of Lsym with the smallest eigenvalues. • 5. Form the n-by-k matrix U containing the vectors u1, u2, …, uk as columns. • 6. Form the matrix Y by normalizing each row of U to unit length. • 7. Treat each row of Y as a point in R^k and cluster these n points into k clusters with the k-means method.
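The seven steps above can be sketched end to end in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the similarity is taken to be $\exp(-\|x_i - x_j\|^2 / (2\sigma^2))$ with the diagonal of W set to zero (as in the Ng, Jordan, and Weiss paper), every degree is assumed positive, and step 7 uses a small hand-written Lloyd's k-means with farthest-point initialization instead of a library call. The usage example runs it on hypothetical data made of two concentric rings.

```python
import numpy as np


def gaussian_similarity(X, sigma):
    """Steps 1-2: pairwise squared distances, then Gaussian similarity W."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # assumption: no self-loops (NJW convention)
    return W


def kmeans(Y, k, n_iter=100, seed=0):
    """Step 7: Lloyd's k-means with farthest-point initialization (assumed)."""
    rng = np.random.default_rng(seed)
    centers = [Y[rng.integers(len(Y))]]
    while len(centers) < k:
        # next center: the point farthest from every center chosen so far
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # keep the old center if a cluster happens to become empty
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(X, k, sigma):
    W = gaussian_similarity(X, sigma)
    d = W.sum(axis=1)  # degrees (assumed > 0)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Step 3: Lsym = I - D^{-1/2} W D^{-1/2}, without forming D explicitly
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Step 4: eigh returns eigenvalues in ascending order
    _, eigvecs = np.linalg.eigh(L_sym)
    U = eigvecs[:, :k]  # step 5: n-by-k matrix U
    Y = U / np.linalg.norm(U, axis=1, keepdims=True)  # step 6: unit-length rows
    return kmeans(Y, k)  # step 7


# Usage on hypothetical data: two concentric rings (radius 1 and radius 3).
theta = np.linspace(0, 2 * np.pi, 40, endpoint=False)
circle = np.c_[np.cos(theta), np.sin(theta)]
X = np.vstack([circle, 3 * circle])
labels = spectral_clustering(X, k=2, sigma=0.3)
```

With σ = 0.3 the similarity between the rings is negligible, so the two smallest eigenvectors essentially indicate ring membership and k-means on the rows of Y separates the rings. A much larger σ would blur this separation.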

  7. How Spectral Clustering Works: Graph-Cut Point of View • Graph: an abstract representation of a set of objects in which some pairs of objects are connected by links (edges). • In spectral clustering, the data points are the vertices, and the edge between points i and j carries their similarity wij as its weight. • We want to find a partition of the graph such that the edges between different groups have very low weight and the edges within a group have high weight. (wikipedia.com)
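A small sketch of the graph-cut view on a hypothetical 4-node similarity matrix. The weight of a cut between groups A and B is taken here as the sum of $w_{ij}$ over all i in A and j in B; `cut` is a hypothetical helper name. A partition that follows the strong links has a small cut, and one that splits them has a large cut.

```python
import numpy as np


def cut(W, A, B):
    """Total weight of the edges with one endpoint in A and the other in B."""
    return W[np.ix_(A, B)].sum()


# Hypothetical graph: nodes 0-1 and 2-3 are strongly linked,
# with only weak links between the two pairs.
W = np.array([
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
])

good = cut(W, [0, 1], [2, 3])  # only the weak edges are cut: 0.1 + 0.1
bad = cut(W, [0, 2], [1, 3])   # the strong edges are cut: 0.9 + 0.8
```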

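  8. Objective of the Spectral Clustering Algorithm • To get distinct clusters, we want to minimize $\frac{1}{2}\sum_{i,j=1}^{n} w_{ij}\left(\frac{f_i}{\sqrt{d_i}} - \frac{f_j}{\sqrt{d_j}}\right)^2$ or, equivalently, $f^T L_{sym} f$, where $d_i = \sum_{j} w_{ij}$ is the degree of point i and f ranges over unit vectors (each new solution orthogonal to the previous ones); the minimizers are the eigenvectors of Lsym with the smallest eigenvalues. • Each term is small when strongly connected points (large $w_{ij}$) receive similar values of f, so f changes value mainly across weak edges; the resulting groups have very low weight on the edges between different groups and high weight on the edges within a group.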
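A quick numerical check of the equivalence stated on this slide, using a random symmetric weight matrix with zero diagonal (hypothetical data) and degrees $d_i = \sum_j w_{ij}$. The quadratic form $f^T L_{sym} f$ should equal $\frac{1}{2}\sum_{i,j} w_{ij}(f_i/\sqrt{d_i} - f_j/\sqrt{d_j})^2$, and by the Rayleigh quotient no vector f gets a smaller value of $f^T L_{sym} f / f^T f$ than the smallest eigenvalue of Lsym.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Hypothetical symmetric, non-negative weights with no self-loops
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

d = W.sum(axis=1)  # degrees d_i = sum_j w_ij
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt

f = rng.standard_normal(n)
quadratic_form = f @ L_sym @ f  # f^T Lsym f
g = f / np.sqrt(d)              # g_i = f_i / sqrt(d_i)
pairwise_sum = 0.5 * np.sum(W * (g[:, None] - g[None, :]) ** 2)

# Rayleigh quotient: no vector does better than the smallest eigenvalue
eigvals, eigvecs = np.linalg.eigh(L_sym)
rayleigh = quadratic_form / (f @ f)
```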

  9. Details of the Clustering Algorithm • 1. Given n data points x1, …, xn, construct the n-by-n distance matrix. • 2. Form the similarity matrix W from the distance matrix by applying the Gaussian similarity function element-wise: $w_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$. • 3. Form the Laplacian matrix from the similarity matrix by calculating $L_{sym} = I - D^{-1/2} W D^{-1/2}$, where D is the degree matrix, a diagonal matrix with entries $d_i = \sum_{j=1}^{n} w_{ij}$. • 4. Compute the first (smallest-eigenvalue) k eigenvectors u1, u2, …, uk of Lsym.
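Steps 1 to 4 can be sketched in NumPy as follows, on a hypothetical five-point data set with two well-separated groups; `laplacian_sym` is a hypothetical helper name. As assumptions not stated in the slides, the diagonal of W is set to zero (as in the Ng, Jordan, and Weiss paper) and every degree is assumed positive. The sketch also lets you check two standard properties of Lsym: its eigenvalues lie in [0, 2], and its smallest eigenvalue is 0 with eigenvector $D^{1/2}\mathbf{1}$.

```python
import numpy as np


def laplacian_sym(X, sigma):
    # Step 1: n-by-n matrix of pairwise Euclidean distances
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Step 2: Gaussian similarity, applied element-wise
    W = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # assumption: no self-loops (NJW convention)
    # Step 3: degrees d_i = sum_j w_ij, then Lsym = I - D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
    return W, d, L_sym


# Hypothetical data: three points near the origin, two near (4, 4)
X = np.array([[0.0, 0.0], [0.5, 0.2], [0.1, 0.6], [4.0, 4.0], [4.3, 3.8]])
W, d, L_sym = laplacian_sym(X, sigma=1.0)

# Step 4: eigh returns eigenvalues in ascending order, so the first k columns
# are the eigenvectors with the smallest eigenvalues.
k = 2
eigvals, eigvecs = np.linalg.eigh(L_sym)
U = eigvecs[:, :k]
```

Because the two groups are far apart relative to σ, the second eigenvalue is also close to 0, which is what makes the first two eigenvectors informative about the groups.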

  10. Details of the Clustering Algorithm • 5. Form the n-by-k matrix U containing the vectors u1, u2, …, uk as columns. • 6. Normalize the rows of U to norm 1, giving the matrix Y with entries $y_{ij} = u_{ij} / \left(\sum_{l=1}^{k} u_{il}^2\right)^{1/2}$. • 7. For i = 1, 2, …, n, let yi be the vector corresponding to the i-th row of Y. • 8. Using k-means, cluster the points yi (i = 1, …, n) into k clusters. * Note that the inputs of this algorithm are k and the similarity matrix (and thus σ). * The outputs are the clusters C1, …, Ck, i.e., each data point paired with the index of its cluster.
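To see why step 6 normalizes the rows, consider a hypothetical ideal case in which W has no edges between two groups. Then the two smallest eigenvalues of Lsym are both 0, and after row normalization every point in a group maps to the same unit vector, with the two groups' vectors orthogonal, so the k-means step becomes trivial. The matrix W below is made up for illustration.

```python
import numpy as np

# Hypothetical similarity matrix: groups {0, 1, 2} and {3, 4}, no edges between them
W = np.array([
    [0.0, 1.0, 0.8, 0.0, 0.0],
    [1.0, 0.0, 0.9, 0.0, 0.0],
    [0.8, 0.9, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.7, 0.0],
])
k = 2
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt

# Step 5: U holds the k eigenvectors with the smallest eigenvalues as columns
_, eigvecs = np.linalg.eigh(L_sym)
U = eigvecs[:, :k]
# Step 6: y_ij = u_ij / sqrt(sum_l u_il^2), i.e. scale each row to unit length
Y = U / np.linalg.norm(U, axis=1, keepdims=True)
# Step 7: y_i is the i-th row of Y; within each group the rows coincide
```

Without the normalization, the rows of U within a group would point in the same direction but have lengths proportional to $\sqrt{d_i}$, so points with very different degrees could end up far apart.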

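  11. Role of Parameter σ (sigma) • [Figure: clustering results for σ = 1, σ = 5, and σ = 10] • Parameter σ sets the scale of the Gaussian similarity: points within roughly a circle of radius σ around each data point receive high weight, and weights decay toward 0 beyond it. • The greater σ is, the more points receive high weight, including points in other clusters, so the clusters become much less distinct.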
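A small sketch of how σ changes the Gaussian similarity weights, assuming the similarity $w = \exp(-\mathrm{dist}^2 / (2\sigma^2))$ and a few hypothetical pairwise distances; `gaussian_weight` is a hypothetical helper. Under this form, the weight at distance σ is always $e^{-1/2} \approx 0.61$, and increasing σ raises every weight toward 1, which is why a large σ blurs the boundary between clusters.

```python
import numpy as np


def gaussian_weight(dist, sigma):
    # w = exp(-dist^2 / (2 sigma^2)), the element-wise similarity of step 2
    return np.exp(-np.asarray(dist, dtype=float) ** 2 / (2.0 * sigma ** 2))


dists = np.array([0.5, 1.0, 3.0, 10.0])  # hypothetical pairwise distances
for sigma in (1.0, 5.0, 10.0):
    print(sigma, np.round(gaussian_weight(dists, sigma), 3))
```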

  12. σ

  13. Examples of Clustering Results

  14. Clustering Results with Different k • [Figure: each color denotes a different cluster] • Note: k is the number of clusters.

  15. Applications • Cluster analysis is used in many fields: • In biology: sequence analysis, gene analysis • In medicine: analysis of PET scans • Also in market research, social network analysis, image segmentation, data mining, crime analysis, and more.

  16. Acknowledgment • Mentor: Sanketh V. Shetty, Graduate Research Assistant, Computer Vision and Robotics Laboratory, Beckman Institute for Advanced Science and Technology

  17. Questions? • Thank you!
