A dissimilarity measure for the K-Modes clustering algorithm

Presenter: Bo-Sheng Wang. Authors: Fuyuan Cao, Jiye Liang, Deyu Li, Liang Bai, Chuangyin Dang. KBS, 2012.

Presentation Transcript


  1. Presenter : Bo-Sheng Wang Authors : Fuyuan Cao, Jiye Liang, Deyu Li, Liang Bai, Chuangyin Dang KBS, 2012 A dissimilarity measure for the K-Modes clustering algorithm

  2. Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

  3. Motivation In this paper, the limitations of the simple matching dissimilarity measure and of Ng’s dissimilarity measure are revealed using some illustrative examples.

  4. Limitations of the simple matching dissimilarity measure • Simple matching is a common approach; the simple matching dissimilarity measure is defined as: $\delta(x, y) = 1$ if $x \neq y$, and $\delta(x, y) = 0$ otherwise. • However, simple matching often results in: • Weak intra-similarity. • Disregard for the similarity hidden between categorical values.
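A minimal sketch of the simple matching measure on slide 4, summed over the attributes of two categorical objects. The function name and the toy objects are illustrative assumptions, not taken from the slides.

```python
def simple_matching(x, y):
    """Count the attributes on which two categorical objects differ."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    # Each attribute contributes 1 if the values differ, 0 otherwise.
    return sum(1 for a, b in zip(x, y) if a != b)


# Hypothetical objects with three categorical attributes each.
x = ("red", "small", "round")
y = ("red", "large", "round")
z = ("blue", "large", "square")

print(simple_matching(x, y))  # 1
print(simple_matching(x, z))  # 3
```

Every mismatch costs exactly 1 regardless of how the values are distributed in the data, which is the sense in which the measure ignores any similarity hidden between categorical values.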

  5. Limitations of Ng’s dissimilarity measure • For the k-Modes algorithm with Ng’s dissimilarity measure, the simple matching dissimilarity measure is still used in the first iteration. • Disregards the similarity hidden between categorical values.

  6. Objectives Based on the idea of biological and genetic taxonomy and the rough membership function, a new dissimilarity measure for the k-Modes algorithm is defined. The dissimilarity measure between a mode of a cluster and an object is obtained by improving Ng’s dissimilarity measure.

  7. Methodology • Review some basic concepts of rough set theory. • Definition 1 Categorical information system • IS = (U,A,V,f) • Definition 2 Binary relation IND(P) • 1. • 2. • Definition 3 The rough membership function $\mu_P^X: U \to [0,1]$
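The formulas for these definitions did not survive in the transcript. As a hedged illustration, the sketch below uses the standard rough set definitions: two objects are related by IND(P) when they agree on every attribute in P, and the rough membership of x in a subset X is |[x]_P ∩ X| / |[x]_P|. The table layout, function names, and data are assumptions made for the example; the paper's exact formulation may differ.

```python
from collections import defaultdict


def equivalence_classes(table, attrs):
    """Partition object ids by their values on attrs (the classes of IND(P))."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[a] for a in attrs)
        classes[key].add(obj_id)
    return list(classes.values())


def rough_membership(table, attrs, target, obj_id):
    """Share of obj_id's equivalence class that falls inside the subset target."""
    block = next(c for c in equivalence_classes(table, attrs) if obj_id in c)
    return len(block & set(target)) / len(block)


# Hypothetical categorical information system with four objects.
table = {
    "x1": {"color": "red", "size": "small"},
    "x2": {"color": "red", "size": "small"},
    "x3": {"color": "red", "size": "large"},
    "x4": {"color": "blue", "size": "large"},
}
X = {"x1", "x3"}

print(rough_membership(table, ["color", "size"], X, "x1"))  # 0.5
print(rough_membership(table, ["color"], X, "x1"))
```

With both attributes, x1 is indiscernible only from x2, and one of those two objects lies in X, so the membership is 0.5. With `color` alone the class grows to three objects, two of which lie in X.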

  8. Methodology-A new dissimilarity measure between two objects • Definition 4 A similarity measure between objects x and y with respect to a

  9. Methodology-A new dissimilarity measure between two objects Definition 5 The dissimilarity measure between x and y with respect to P.

  10. Methodology-A new dissimilarity measure between two objects • Example: A new dissimilarity measure between two objects • Simple matching dissimilarity measure: • New dissimilarity measure:

  11. Methodology-A new dissimilarity measure between a mode and an object Ng’s Dissimilarity Measure
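The formula on this slide is missing from the transcript. The sketch below follows Ng's measure as it is commonly stated in the k-Modes literature: an attribute contributes 1 when the object and the mode disagree, and 1 minus the relative frequency of the shared value inside the cluster when they agree. This is an assumption about the slide's content, and the function name and data are hypothetical.

```python
def ng_dissimilarity(mode, obj, cluster):
    """Ng-style dissimilarity between a mode and an object.

    cluster is the list of objects currently assigned to the mode's cluster.
    """
    if not cluster:
        raise ValueError("the measure needs a non-empty cluster")
    total = 0.0
    for j, (z, x) in enumerate(zip(mode, obj)):
        if z != x:
            # Mismatch: same penalty as simple matching.
            total += 1.0
        else:
            # Match: penalty shrinks as the shared value gets more frequent.
            matches = sum(1 for member in cluster if member[j] == x)
            total += 1.0 - matches / len(cluster)
    return total


# Hypothetical cluster of four objects with two attributes.
cluster = [("a", "p"), ("a", "q"), ("b", "p"), ("a", "p")]
mode = ("a", "p")

print(ng_dissimilarity(mode, ("a", "q"), cluster))  # 1.25
```

The measure depends on cluster membership, which does not exist before the first assignment step. That is consistent with the limitation noted on slide 5: simple matching is still used in the first iteration.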

  12. Methodology-A new dissimilarity measure between a mode and an object Definition 7 The new dissimilarity measure between $x_i$ and $z_l$ with respect to P

  13. Methodology-A new dissimilarity measure between a mode and an object • Example: A new dissimilarity measure between a mode and an object • Ng’s dissimilarity measure • New dissimilarity measure

  14. Methodology-Convergence and complexity analysis The objective of clustering a set of n = |U| objects into k clusters is to find W and Z that minimize:
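The objective function itself is missing from the transcript. A plausible reconstruction, assuming the usual k-Modes formulation with the proposed measure $\mathrm{NDis}_P$ from slide 16 as the distance, W as a hard partition matrix with entries $w_{li}$, and Z as the set of modes $z_l$, is given below; the paper's exact notation may differ.

```latex
F(W, Z) = \sum_{l=1}^{k} \sum_{i=1}^{n} w_{li}\, \mathrm{NDis}_P(z_l, x_i),
\qquad w_{li} \in \{0, 1\}, \quad \sum_{l=1}^{k} w_{li} = 1 \ \text{for each } i.
```

Under these constraints each object $x_i$ belongs to exactly one cluster, and F sums the dissimilarity of every object to the mode of its own cluster.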

  15. Methodology-Convergence and complexity analysis This process can be formulated as the following k-Modes algorithm:
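The algorithm listing is missing from the transcript. The sketch below shows the generic k-Modes loop (assign each object to its nearest mode, then recompute each mode attribute-wise), with simple matching swapped in as the default dissimilarity because the proposed measure's formula is not recoverable here. All names, the seeding strategy, and the data are illustrative assumptions.

```python
import random
from collections import Counter


def simple_matching(x, y):
    """Number of attributes on which x and y differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


def k_modes(objects, k, dissim=simple_matching, max_iter=100, seed=0):
    """Cluster categorical tuples; dissim(mode, obj) can be replaced by another measure."""
    rng = random.Random(seed)
    # Start from k distinct objects so no two initial modes coincide.
    modes = rng.sample(sorted(set(objects)), k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest mode.
        new_labels = [min(range(k), key=lambda l: dissim(modes[l], x)) for x in objects]
        if new_labels == labels:
            break  # No object changed cluster, so the loop has converged.
        labels = new_labels
        # Update step: the new mode takes the most frequent value per attribute.
        for l in range(k):
            members = [x for x, lab in zip(objects, labels) if lab == l]
            if members:
                modes[l] = tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
    return labels, modes


# Hypothetical data: two well-separated groups of identical objects.
objects = [("a", "x", "p")] * 3 + [("b", "y", "q")] * 3
labels, modes = k_modes(objects, k=2)
print(labels, modes)
```

A measure that depends on cluster statistics, such as Ng's or the proposed one, would need the current cluster contents passed into `dissim`; this sketch keeps the simpler two-argument signature.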

  16. Methodology-Convergence and complexity analysis Now we consider the convergence of the k-Modes algorithm with the proposed dissimilarity measure $\mathrm{NDis}_P(z_l, x_i)$.

  17. Methodology-Convergence and complexity analysis Proof. For a given W, we have:

  18. Methodology-Convergence and complexity analysis

  19. Methodology-Convergence and complexity analysis

  20. Experiments Evaluation on scalability

  21. Experiments Evaluation on scalability

  22. Experiments Evaluation on clustering efficiency

  23. Conclusions The new measure unifies the dissimilarity measures between two objects and between an object and a mode. The k-Modes algorithm using the new dissimilarity measure can be used safely and effectively on large data sets. The results of experiments on synthetic data sets and five real data sets from UCI show the effectiveness of the new dissimilarity measure.

  24. Comments • Advantages • The method can save some time. • Applications • Dissimilarity measure
