Bioinformatics: Spectral Clustering • Mentee: Joonoh Lim • Mentor: Sanketh Shetty
Background • Cluster analysis is an unsupervised method of determining groupings (clusters) in data sets. • In general, cluster analysis is a common technique for statistical data analysis used in many fields: machine learning, data mining, pattern recognition, image analysis, and bioinformatics. • In bioinformatics, it is used to study genes and gene expression.
Types of Clustering Algorithms • Partitional Methods • K-means Clustering • Affinity Propagation • Spectral Clustering • Mean-shift Clustering • Normalized Cuts • Gaussian Mixture Models • Hierarchical Methods • Single Linkage • Complete Linkage • Average Linkage
Advantages of Spectral Clustering • Very simple to implement • Can be solved efficiently by standard linear algebra • Handles clusters of varied shapes and densities (it does not assume convex clusters)
Spectral Clustering vs. K-means • [Figure: spectral clustering result compared with the k-means result on the same data] • Source: apo.enseeiht.fr/pub/Jdoc09/sandrine.pdf
Overview of the clustering process, from the algorithm by Ng, Jordan, and Weiss • 1. Given n data points, construct an n-by-n distance matrix. • 2. Form the similarity matrix W from the distance matrix. • 3. Form the Laplacian matrix Lsym from the similarity matrix. • 4. Compute the smallest k eigenvectors u1, u2, …, uk of Lsym. • 5. Form an n-by-k matrix U containing the vectors u1, u2, …, uk as columns. • 6. Form a matrix Y by normalizing each of U's rows. • 7. Treat each row of Y as a data point and cluster the rows into k clusters via k-means.
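For quick experimentation, this whole pipeline is also available as a single call; a minimal sketch using scikit-learn's SpectralClustering (the toy data and parameter values below are illustrative assumptions, not from the slides):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

X = np.random.rand(100, 2)   # toy data, stands in for the real data set

# affinity='rbf' applies the Gaussian similarity; gamma plays the role of 1/(2*sigma^2)
model = SpectralClustering(n_clusters=3, affinity="rbf", gamma=0.5,
                           assign_labels="kmeans", random_state=0)
labels = model.fit_predict(X)   # one cluster index per data point
print(labels[:10])
```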
How Spectral Clustering works: the graph-cut point of view • Graph: an abstract representation of a set of objects where some pairs of objects are connected by links (edges). • In spectral clustering, we want to find a partition of the graph such that the edges between different groups have very low weight and the edges within a group have high weight. (Image source: Wikipedia)
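As a toy illustration (the graph below is made up for this example, not from the slides), a good partition is one where the weight crossing the cut is tiny compared with the weight inside the groups:

```python
import numpy as np

# Toy weighted graph: two tight groups {0,1,2} and {3,4,5} joined by one weak edge.
W = np.array([
    [0,   5, 5, 0.1, 0, 0],
    [5,   0, 5, 0,   0, 0],
    [5,   5, 0, 0,   0, 0],
    [0.1, 0, 0, 0,   5, 5],
    [0,   0, 0, 5,   0, 5],
    [0,   0, 0, 5,   5, 0],
], dtype=float)

A = [0, 1, 2]                        # one candidate group
B = [3, 4, 5]                        # the other group
cut = W[np.ix_(A, B)].sum()          # total weight crossing the partition
within = W[np.ix_(A, A)].sum() / 2 + W[np.ix_(B, B)].sum() / 2
print(f"cut weight = {cut}, within-group weight = {within}")
```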
Objective of the Spectral Clustering Algorithm • In order to get distinct clusters, we want to minimize the weight of the edges cut between groups, or, equivalently, f^T Lsym f, where f is an eigenvector of Lsym. • This ensures very low weight for the edges between different groups and high weight for the edges within a group.
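To see why this objective matches the graph-cut goal, one can expand it with the standard identity for the normalized Laplacian (added here as a reasoning step; d_i denotes the degree of node i):

```latex
f^{T} L_{\mathrm{sym}} f
  \;=\; \frac{1}{2} \sum_{i,j} w_{ij}
        \left( \frac{f_i}{\sqrt{d_i}} - \frac{f_j}{\sqrt{d_j}} \right)^{2}
```

The sum is small exactly when strongly connected points (large w_ij) receive nearly equal scaled values of f, so minimizing it keeps heavy edges inside groups and leaves only light edges across the cut.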
Details of the Clustering Algorithm • 1. Given n data points, construct an n-by-n distance matrix. • 2. Form the similarity matrix W from the distance matrix by applying the Gaussian similarity function element-wise: w_ij = exp(−‖x_i − x_j‖² / (2σ²)). • 3. Form the Laplacian matrix from the similarity matrix by calculating Lsym = I − D^(−1/2) W D^(−1/2), where D is the degree matrix, a diagonal matrix with d_i = Σ_j w_ij. • 4. Compute the first (smallest) k eigenvectors u1, u2, …, uk of Lsym.
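A minimal sketch of steps 1–4 in Python (NumPy/SciPy); the toy data, the function name, and the choice to zero the diagonal of W are assumptions made for illustration:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def normalized_laplacian_embedding(X, k, sigma=1.0):
    """Steps 1-4: build W, form L_sym, and return the k smallest eigenvectors."""
    # Step 1: n-by-n pairwise Euclidean distance matrix
    dist = squareform(pdist(X, metric="euclidean"))
    # Step 2: Gaussian similarity, w_ij = exp(-||x_i - x_j||^2 / (2*sigma^2))
    W = np.exp(-dist**2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)                      # no self-similarity edges
    # Step 3: L_sym = I - D^(-1/2) W D^(-1/2), with D the diagonal degree matrix
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
    # Step 4: eigenvectors of the k smallest eigenvalues (eigh sorts them ascending)
    _, vecs = np.linalg.eigh(L_sym)
    return vecs[:, :k]                            # n-by-k matrix U

X = np.random.rand(100, 2)                        # toy data for illustration
U = normalized_laplacian_embedding(X, k=3, sigma=1.0)
print(U.shape)
```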
Details of the Clustering Algorithm • 5. Form an n-by-k matrix U containing the vectors u1, u2, …, uk as columns. • 6. Normalize the rows of U to norm 1 by setting, for each element, y_ij = u_ij / (Σ_k u_ik²)^(1/2). • 7. For i = 1, 2, …, n, let y_i be the vector corresponding to the i-th row of the normalized matrix Y. • 8. Using k-means, cluster the points y_i, i = 1, …, n, into k clusters. • * Note that the inputs of this algorithm are k and the similarity matrix (and thus σ). • * The outputs are the clusters C1, …, Ck, i.e. each data point coupled with its cluster index.
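Continuing the sketch, steps 5–8 take the n-by-k eigenvector matrix U from the previous snippet, row-normalize it, and run k-means; the random placeholder U and the helper name are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def embed_and_cluster(U, k, random_state=0):
    """Steps 5-8: row-normalize the eigenvector matrix U and cluster with k-means."""
    # Step 6: normalize each row of U to unit length (rows = embedded data points)
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    Y = U / np.where(norms == 0, 1.0, norms)      # guard against all-zero rows
    # Steps 7-8: treat each row y_i as a point and cluster the rows into k groups
    labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(Y)
    return labels                                  # cluster index for each data point

# U would normally come from normalized_laplacian_embedding in the previous sketch;
# a random matrix stands in here only so the snippet runs on its own.
U = np.random.rand(100, 3)
print(embed_and_cluster(U, k=3))
```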
Role of the parameter σ (sigma) • [Figures: clustering results for σ = 1, σ = 5, and σ = 10] • The parameter σ assigns high weight to data points located within a radius of roughly σ around each data point. • The greater σ is, the more points are assigned high weight, resulting in less distinct clusters.
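A small sketch (assuming the Gaussian similarity w = exp(−d² / (2σ²)) from the algorithm above) showing how the weight given to a neighbor at a fixed distance grows with σ:

```python
import numpy as np

# How sigma changes the weight assigned to neighbors at a few fixed distances
d = np.array([0.5, 1.0, 2.0, 5.0])
for sigma in (1.0, 5.0, 10.0):
    w = np.exp(-d**2 / (2.0 * sigma**2))
    print(f"sigma={sigma:>4}: weights at distances {d} -> {np.round(w, 3)}")
```

At σ = 10 even fairly distant points receive weight close to 1, which is why large σ values blur the distinction between clusters.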
Clustering results with different k • [Figures: clustering results for several values of k; each color marks a different cluster] • Note that k is the number of clusters.
Applications • Cluster analysis is being used in many fields: • In biology: sequence analysis, gene analysis • In medicine: PET scans • And in market research, social network analysis, image segmentation, data mining, crime analysis, and so on.
Acknowledgment • Mentor: Sanketh V. Shetty, Graduate Research Assistant, Computer Vision and Robotics Laboratory, Beckman Institute for Advanced Science and Technology
Questions? • Thank you!