Clustering



Presentation Transcript


  1. Clustering 조이현

  2. Overview • What is clustering? • Clustering algorithms

  3. What is clustering? • Clustering • The act of grouping similar objects into sets • Clustering vs. classification • Classification assigns objects to predefined groups • Clustering infers the groups from the objects themselves

  4. Clustering algorithms • Hierarchical • Bottom-up (agglomerative clustering) • Top-down (divisive clustering) • Non-Hierarchical • K-means (can be fuzzy) • Single-pass (incremental)

  5. Hierarchical Clustering • Bottom-up (agglomerative clustering) • Start with each object in its own cluster • Repeatedly merge the two clusters with maximum similarity • Top-down (divisive clustering) • Start with all objects in a single cluster • Divide them into groups • Repeatedly split the least coherent cluster
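
The bottom-up procedure can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the 1-D points, the `similarity` function (negative absolute difference) and the choice of single-link as the cluster similarity are all assumptions made for the example. The returned merge history is the information a dendrogram would draw.

```python
from itertools import combinations

def similarity(a, b):
    """Hypothetical similarity for 1-D points: closer points score higher."""
    return -abs(a - b)

def single_link(c1, c2):
    """Cluster similarity = similarity of the two most similar members."""
    return max(similarity(a, b) for a in c1 for b in c2)

def agglomerative(points, cluster_sim=single_link):
    """Merge the most similar pair of clusters until one cluster remains.

    Returns the merge history as a list of (left, right) cluster pairs.
    """
    clusters = [(p,) for p in points]  # start with each object on its own
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with maximum similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_sim(clusters[ij[0]], clusters[ij[1]]),
        )
        left, right = clusters[i], clusters[j]
        merges.append((left, right))
        # Replace the two clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(left + right)
    return merges

history = agglomerative([1.0, 1.2, 5.0, 5.1, 9.0])
for left, right in history:
    print(left, "+", right)
```

Each pass rescans every pair of clusters, so this version is deliberately naive; it is meant to show the merge loop rather than an efficient algorithm.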

  6. Agglomerative clustering

  7. Clustering result: dendrogram

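  8. Hierarchical clustering variants • Various ways of calculating cluster similarity • Single-link (minimum distance: similarity of the two most similar members) • Complete-link (maximum distance: similarity of the two least similar members) • Group-average (average similarity between members)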
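
The three variants differ only in how the pairwise similarities between two clusters are combined. The sketch below assumes 1-D points and a hypothetical `similarity` function mapping distance into (0, 1]. Group-average is taken here over cross-cluster pairs, which is one common convention; some texts average over all pairs in the merged cluster instead.

```python
from itertools import product

def similarity(a, b):
    """Hypothetical similarity for 1-D points, in (0, 1]."""
    return 1.0 / (1.0 + abs(a - b))

def pairwise(c1, c2):
    """Similarities of every cross-cluster pair of members."""
    return [similarity(a, b) for a, b in product(c1, c2)]

def single_link(c1, c2):
    # Similarity of the two MOST similar members (minimum distance).
    return max(pairwise(c1, c2))

def complete_link(c1, c2):
    # Similarity of the two LEAST similar members (maximum distance).
    return min(pairwise(c1, c2))

def group_average(c1, c2):
    # Average similarity over cross-cluster pairs (assumed convention).
    sims = pairwise(c1, c2)
    return sum(sims) / len(sims)

a, b = (1.0, 2.0), (4.0, 7.0)
print(single_link(a, b), group_average(a, b), complete_link(a, b))
```

For any pair of clusters the group-average value lies between the complete-link and single-link values, which is why it is described as a compromise between the two.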

  9. Single Link • Similarity of the two most similar members • Time complexity • O(n^2) • Locally coherent • Close objects are in the same cluster • Prone to the chaining effect (elongated clusters)

  10. Complete Link • Similarity of the two least similar members • Time complexity • O(n^3) for the straightforward algorithm • Focused on global cluster quality • Avoids elongated clusters

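  11. Group average • Average similarity between members • Time complexity • O(n^2) (achievable when cluster averages can be updated from vector sums, e.g. cosine similarity on length-normalized vectors) • Compromise between single-link and complete-link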
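
One way a cheap group-average computation can be obtained, sketched under an assumption the slide does not state: objects are unit-length vectors and similarity is their dot product (cosine). In that case the average pairwise similarity inside a cluster follows from the sum vector alone, so merging two clusters only requires adding their sums. The function names below are hypothetical.

```python
import math
from itertools import combinations

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def average_pairwise_naive(cluster):
    """Average dot product over all unordered pairs of distinct members."""
    pairs = list(combinations(cluster, 2))
    return sum(dot(u, v) for u, v in pairs) / len(pairs)

def average_pairwise_from_sum(cluster):
    """Same quantity from the sum vector s: (s.s - n) / (n * (n - 1)).

    Valid only because every member has unit length, so v.v == 1.
    """
    n = len(cluster)
    s = tuple(sum(coords) for coords in zip(*cluster))
    return (dot(s, s) - n) / (n * (n - 1))

cluster = [normalize(v) for v in [(1, 0, 2), (2, 1, 1), (0, 3, 1), (1, 1, 1)]]
print(average_pairwise_naive(cluster), average_pairwise_from_sum(cluster))
```

The two functions agree up to floating-point rounding; the second needs only the stored sum vector and the cluster size.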

  12. K-means clustering • Defines clusters by the center of mass of their members • Step 1: select the initial cluster centers at random • Step 2: assign each object to the cluster whose center is nearest • Step 3: recompute the center of each cluster • Repeat from step 2 until the stopping criterion is satisfied
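
The K-means loop described on this slide can be sketched in plain Python. The details below are assumptions for illustration: points are tuples of floats, distance is squared Euclidean, initial centers are sampled from the data with a fixed `seed`, and the stopping criterion is "no assignment changed" (with `max_iter` as a safety limit).

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Center of mass of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers chosen at random
    assignment = None
    for _ in range(max_iter):
        # Assign each object to the cluster whose center is nearest.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:  # stopping criterion: no change
            break
        assignment = new_assignment
        # Recompute the center of each cluster from its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old center
                centers[c] = mean(members)
    return centers, assignment

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]
centers, assignment = k_means(points, k=3)
print(centers)
print(assignment)
```

Because the initial centers are random, the result depends on `seed`; K-means can settle in a poor local optimum, so it is common to run it several times and keep the best result.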

  13. K-means clustering (k=3)

  14. Single-pass threshold
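
The transcript keeps only the title of this slide, so the following is a sketch of one common form of single-pass (incremental) clustering, not a reconstruction of the slide. Assumptions: objects are 1-D points processed in input order, each cluster is represented by its centroid, and a hypothetical `threshold` on distance decides whether an object joins the nearest cluster or starts a new one.

```python
def single_pass(points, threshold):
    """Cluster 1-D points in one pass over the input.

    Each point joins the cluster with the nearest centroid if that
    centroid is within `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of points
    for p in points:
        best, best_dist = None, None
        for cluster in clusters:
            centroid = sum(cluster) / len(cluster)
            d = abs(p - centroid)
            if best_dist is None or d < best_dist:
                best, best_dist = cluster, d
        if best is not None and best_dist <= threshold:
            best.append(p)
        else:
            clusters.append([p])
    return clusters

result = single_pass([1.0, 1.5, 8.0, 1.2, 8.4, 20.0], threshold=2.0)
print(result)  # [[1.0, 1.5, 1.2], [8.0, 8.4], [20.0]]
```

The number of clusters is not fixed in advance; it follows from the threshold, and the result can depend on the order in which objects arrive.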

  15. Properties of hierarchical and flat clustering • Hierarchical clustering • Preferable for detailed data analysis • Provides more information than flat clustering • No single best algorithm (depends on the application) • Less efficient than flat clustering (an N × N similarity matrix is required) • Flat (non-hierarchical) clustering • Preferable if efficiency is a consideration or data sets are very large • K-means is the conceptually simplest method • K-means assumes a simple Euclidean representation space and so can't be used for many data sets
