
Clustering Aggregation



Presentation Transcript


  1. Clustering Aggregation Nir Geffen 021537980 Yotam Margolin 039719729 Supervisor Professor Zeev Volkovitch ORT BRAUDE COLLEGE – SE DEPT. 9.12.2011

  2. Table of Contents • Introduction • Goals • Clustering • Spectral Clustering • Cluster Ensembles. • Consensus • Spectral Clustering Ensembles • Abstract • Steps • Pseudo • Clustering Aggregation via Self Learning Approach - CASLA • Abstract • Steps • Pseudo • SE Documents

  3. Introduction – Goals Our goal is to investigate the results of different clustering ensemble techniques and to demonstrate the distinction between the various cluster ensemble methods and clustering aggregation via a self-learning approach.

  4. Introduction – Clustering • Clustering is a method of unsupervised learning, aimed at partitioning a given data set into subsets called clusters, so that items belonging to the same cluster are similar to each other, while items belonging to different clusters are not.
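The definition above can be made concrete with a minimal sketch of one classic clustering algorithm, k-means (illustrative only; the toy data and the deterministic farthest-point initialization are choices made here, not taken from the slides):

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    """Plain Lloyd's k-means with deterministic farthest-point init:
    alternate nearest-centroid assignment and centroid update."""
    centroids = [X[0]]
    for _ in range(1, k):
        # next seed = the point farthest from all seeds chosen so far
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

# two well-separated groups: k-means recovers them exactly
X = np.vstack([np.zeros((5, 2)), 10 * np.ones((5, 2))])
labels = kmeans(X, 2)
```

On such well-separated data the loop converges immediately; on real data the result depends on k and on the similarity function, which is exactly the pre-configuration burden the next slide raises.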

  5. Introduction – Spectral Clustering • While classic clustering methods give solid results, they also require elaborate similarity functions and pre-configuration. • To make things easier, spectral clustering approaches the clustering problem from a different angle. Instead of clustering the data as-is, we project it onto a space to which most noise will be perpendicular (orthogonal). • Finally, we cluster the projected data using a classic algorithm to achieve the required results.
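The projection step described above can be sketched with the standard normalized-Laplacian construction (an illustrative sketch; the Gaussian affinity, the value of sigma, and the toy data are assumptions, not from the slides):

```python
import numpy as np

def spectral_embed(X, k, sigma=1.0):
    """Project data onto the k smallest eigenvectors of the
    symmetric normalized graph Laplacian L = I - D^-1/2 W D^-1/2."""
    # Gaussian affinity matrix from pairwise squared distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # degree normalization
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
    # eigh returns eigenvalues in ascending order; keep the first k vectors
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    # row-normalize; rows of U are the projected points
    return U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)

X = np.vstack([np.zeros((5, 2)), 10 * np.ones((5, 2))])
U = spectral_embed(X, 2)
```

The rows of U are then handed to a classic algorithm such as k-means, exactly as the final bullet says; points from the same group map to (nearly) identical rows.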

  6. Introduction – Cluster Ensembles • As no clustering algorithm is agreed to be superior for every data set, a common practice is to obtain several cluster partitions of the same data set. • Our next step will be to use a consensus function to combine the resulting partitions into a new one, thereby increasing the robustness of the clustering process.

  7. Introduction - Consensus • There are 3 main algorithms for joining partitions (or clusterings). Due to long computing times, we’ll only use greedy algorithms. • These algorithms, also known as consensus functions, mostly rely on graph theory. • CSPA is considered the brute-force approach, with O(n²) time and space complexity. • HGPA is stable, but not always optimal. • MCLA is a high-end solution that yields solid results, a worthy competitor to HGPA.
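CSPA's core idea, and the source of its quadratic cost, can be sketched via the co-association matrix (the toy partitions are hypothetical; the original CSPA then partitions this similarity graph, e.g. with METIS, which is not shown here):

```python
import numpy as np

def coassociation(partitions):
    """CSPA-style evidence accumulation: S[i, j] is the fraction of
    base partitions that place items i and j in the same cluster.
    Building the n-by-n matrix S costs O(n^2) time and space, which
    is why CSPA is considered the brute-force consensus function."""
    n = len(partitions[0])
    S = np.zeros((n, n))
    for p in partitions:
        p = np.asarray(p)
        # outer comparison: 1 where labels agree, 0 where they differ
        S += (p[:, None] == p[None, :]).astype(float)
    return S / len(partitions)

# three hypothetical base partitions of 4 items
parts = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 0, 1, 1]]
S = coassociation(parts)
```

Items 2 and 3 co-occur in every partition (S[2, 3] = 1), while items 0 and 1 agree in only two of the three; a graph-partitioning step would then cut S into the consensus clusters.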

  8. Spectral Ensembles - Abstract • To make full use of the information included in a dataset, a multiway spectral clustering algorithm with a joint model is applied to image segmentation. • To overcome the sensitivity of the joint-model-based multiway spectral clustering to the kernel parameter, and to produce robust and stable segmentation results, a spectral clustering ensemble algorithm is used.

  9. Spectral Ensembles - Steps • Produce r individual spectral partitions. • Use MCLA to obtain Sc_MCLA(xi). • Use HGPA to obtain Sc_HGPA(xi). • By the ANMI criterion, get the final decision Sc*(xi) from Sc_MCLA(xi) and Sc_HGPA(xi).
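The final step can be sketched directly: score each candidate consensus by its average NMI against every base partition and keep the higher-scoring one (a sketch only; the toy ensemble and the two candidate labelings are hypothetical, and the normalization by sqrt(H(a)·H(b)) follows Strehl & Ghosh):

```python
import numpy as np
from math import log

def nmi(a, b):
    """Normalized mutual information between two labelings."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    mi = 0.0
    for u in np.unique(a):
        for v in np.unique(b):
            n_uv = np.sum((a == u) & (b == v))
            if n_uv:
                mi += n_uv / n * log(n * n_uv / (np.sum(a == u) * np.sum(b == v)))
    # entropy of a labeling from its cluster-size counts
    h = lambda x: -sum(c / n * log(c / n) for c in np.bincount(x) if c)
    return mi / np.sqrt(h(a) * h(b)) if h(a) and h(b) else 0.0

def anmi(candidate, ensemble):
    # average NMI of a candidate consensus against every base partition
    return float(np.mean([nmi(candidate, p) for p in ensemble]))

ensemble = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
good = [0, 0, 1, 1]  # stand-in for the MCLA output
bad = [0, 1, 0, 1]   # stand-in for the HGPA output
# the ANMI criterion keeps the candidate with the higher average score
```

Here `good` matches two base partitions exactly, so its ANMI dominates; in the algorithm above the same comparison decides between Sc_MCLA and Sc_HGPA.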

  10. CASLA - Motivation • As clustering is a central task in many research fields, numerous clustering algorithms have been developed and analyzed. • However, no clustering algorithm is agreed to be superior for every data set. • The performance of a clustering algorithm depends greatly on the characteristics of the given data set and on the parameters used by the algorithm, such as the desired number of clusters in a partition.

  11. CASLA - Abstract • Use various partitions of the same data set in order to define a new metric on the data set. • Using the new metric as an enhanced input for a clustering algorithm will produce better and more robust partitions. • This process can be done repeatedly, where in each step the metric is updated using the original data as well as the new cluster partition.

  12. CASLA – Steps (exterior metric update)
1. Let R be an n × n distance matrix based on X (e.g., R = (XᵀX)^(1/2) for the Euclidean distance).
2. Determine C, the desired number of clusters.
3. Create cluster C-partitions Π1, …, Πm using m clustering methods, with R as the metric.
4. Compute μij and Σij for any cluster πij in any Πi.
5. Recompute A using Equation (8).
6. Set R = XᵀAX.
7. Repeat until R converges:
8. Create a cluster partition Π of X using some clustering method, with R as the metric.
9. Compute μj and Σj for any cluster πj in Π.
10. Recompute A using Equation (8) (for m = 1).
11. Set R = XᵀAX.
12. Output Π.

  13. CASLA – Steps (interior metric update)
1. Let R be an n × n distance matrix based on X.
2. Determine C, the desired number of clusters.
3. Initialize C random clusters.
4. Compute the cluster centroids c1, …, cC.
5. Repeat until R converges:
6. Assign each data element xr to the cluster πj such that ||xr − cj||R is minimized.
7. Compute μj and Σj for any cluster πj in Π.
8. Recompute A using Equation (8) (for m = 1).
9. Set R = XᵀAX.
10. Output Π.
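The assignment step of the loop above (minimizing ||xr − cj|| under the learned metric) can be sketched as follows. This is a sketch only: Equation (8), which re-estimates the metric A between iterations, is not reproduced in the slides, so A is left as an input here; A = I reduces to the plain Euclidean case.

```python
import numpy as np

def assign(X, centroids, A):
    """Metric-based assignment: each point goes to the cluster whose
    centroid is nearest under the norm ||v||_A = sqrt(v^T A v).
    A would be re-estimated by Equation (8) between iterations."""
    labels = []
    for x in X:
        d = [np.sqrt((x - c) @ A @ (x - c)) for c in centroids]
        labels.append(int(np.argmin(d)))
    return labels

# hypothetical data: two tight points near the origin, one far away
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
cents = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = assign(X, cents, np.eye(2))  # identity metric = Euclidean
```

As the metric A is updated from the cluster means and covariances (μj, Σj), the same assignment call reshapes the clusters without changing any other step of the loop.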

  14. Use Case

  15. SE Documents

  16. GUI - TODO • Choose input file • Run the CASLA clustering (Volkovitch) and the spectral ensemble clustering [2] on different threads • Show the ANMI criterion for each; show a colored graph for each • Show statistics: evaluation time per round, ANMI differences between methods, standard deviation of cluster size • History tab for previous results

  17. References [1] Zeev article draft [2] Spectral Clustering Ensemble for Image Segmentation, Xiuli, Wanggen & Licheng [3] Eyal David [4] Dhillon [5] Strehl

  18. THE END!
