
Cluster analysis


ryu

Presentation Transcript


  1. Cluster analysis

  2. Partition methods: divide the data into disjoint clusters. • Hierarchical methods: build a hierarchy of the observations and deduce the clusters from it.

  3. K-means

  4. Criteria

  5. Same criteria with multivariate data:

  6. Justifying the criteria • Anova: decomposition of the variance. Univariate: SST = SSW + SSB. Multivariate: minimizing the within-cluster variance is equivalent to maximizing the between-cluster variance (the difference between clusters).

  7. K-means algorithm

  8. Number of clusters

  9. Consequences of standardization

  10. Ruspini example

  11. Problems of k-means • Very sensitive to outliers • Euclidean distances not appropriate for elliptical clusters • It does not give the number of clusters.

  12. Hierarchical Algorithms

  13. Agglomerative algorithms

  14. Nearest neighbour distance

  15. Farthest neighbour distance

  16. Average distance

  17. Centroid method distance

  18. Ward’s method distance

  19. Dendrograms

  20. Example

  21. Problems of hierarchical clustering • Slow if n is large: each step compares up to n(n-1)/2 pairs. • Euclidean distances not always appropriate • If n is large, the dendrogram is difficult to interpret

  22. Clustering by variables

  23. Distances between quantitative variables

  24. Distances between qualitative variables

  25. Similarity between attributes
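
The univariate decomposition on slide 6, SST = SSW + SSB, can be checked numerically. The sketch below is only an illustration: the data values, the two partitions, and the helper name `variance_decomposition` are made up for this example and do not come from the presentation. Because SST does not depend on the partition, the partition with the smaller SSW necessarily has the larger SSB, which is the justification given for the k-means criterion.

```python
# Illustrative check of SST = SSW + SSB for one variable and a fixed partition.
# The data and the cluster labels are invented for this example.

def mean(values):
    return sum(values) / len(values)

def variance_decomposition(x, labels):
    """Return (SST, SSW, SSB) for values x grouped by cluster labels."""
    grand_mean = mean(x)

    # Group the observations by cluster label.
    clusters = {}
    for value, label in zip(x, labels):
        clusters.setdefault(label, []).append(value)

    # Total sum of squares around the grand mean.
    sst = sum((v - grand_mean) ** 2 for v in x)
    # Within-cluster sum of squares around each cluster mean.
    ssw = sum((v - mean(members)) ** 2
              for members in clusters.values() for v in members)
    # Between-cluster sum of squares, weighted by cluster size.
    ssb = sum(len(members) * (mean(members) - grand_mean) ** 2
              for members in clusters.values())
    return sst, ssw, ssb

x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
compact_partition = [0, 0, 0, 1, 1, 1]   # groups nearby values together
mixed_partition = [0, 1, 0, 1, 0, 1]     # mixes low and high values

for name, labels in [("compact", compact_partition), ("mixed", mixed_partition)]:
    sst, ssw, ssb = variance_decomposition(x, labels)
    print(name, "SST:", sst, "SSW:", ssw, "SSB:", ssb)
```

Both partitions give the same SST, so comparing their SSW values alone is enough to rank them.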
