A dissimilarity measure for the K-Modes clustering algorithm

Presenter: Bo-Sheng Wang. Authors: Fuyuan Cao, Jiye Liang, Deyu Li, Liang Bai, Chuangyin Dang. KBS, 2012.

Outline: Motivation, Objectives, Methodology, Experiments, Conclusions, Comments

Motivation In this paper, the limitations of the simple matching dissimilarity measure and of Ng’s dissimilarity measure are revealed using some illustrative examples.

Limitations of the simple matching dissimilarity measure • Simple matching is a common approach. The simple matching dissimilarity measure is defined as: δ(x, y) = 1 if x ≠ y, and 0 otherwise. • However, simple matching often: • Results in weak intra-cluster similarity. • Disregards the similarity hidden between categorical values.
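A minimal sketch of the simple matching rule above, assuming (as is usual for k-Modes) that the 0/1 comparison is applied to each attribute and the results are summed over all attributes. The function name `simple_matching` and the example objects are hypothetical and do not come from the slides.

```python
from typing import Sequence


def simple_matching(x: Sequence[str], y: Sequence[str]) -> int:
    """Count the attributes on which two categorical objects differ."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    # Each attribute contributes 1 on a mismatch and 0 on a match.
    return sum(1 for a, b in zip(x, y) if a != b)


# Hypothetical objects described by three categorical attributes.
x = ("red", "small", "round")
y = ("red", "large", "round")
z = ("blue", "large", "square")

d_xy = simple_matching(x, y)  # the objects differ on one attribute
d_xz = simple_matching(x, z)  # the objects differ on all three attributes
```

Every mismatch costs exactly 1 regardless of how the two values are distributed in the data, which is the sense in which the measure disregards similarity hidden between categorical values.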

Limitations of Ng’s dissimilarity measure • For the k-Modes algorithm with Ng’s dissimilarity measure, the simple matching dissimilarity measure is still used in the first iteration. • It disregards the similarity hidden between categorical values.

Objectives Based on the idea of biological and genetic taxonomy and the rough membership function, a new dissimilarity measure for the k-Modes algorithm is defined. The dissimilarity measure between a mode of a cluster and an object is obtained by improving Ng’s dissimilarity measure.

Methodology • Review some basic concepts of rough set theory. • Definition 1: Categorical information system IS = (U, A, V, f) • Definition 2: Binary relation IND(P) • Definition 3: The rough membership function µPX: U → [0,1]
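The formulas for these definitions did not survive on the slide. As a hedged illustration, the sketch below uses the textbook rough-set forms: two objects are related by IND(P) when they agree on every attribute in P, and the rough membership of an object x in a set X is |[x]_P ∩ X| / |[x]_P|, where [x]_P is the equivalence class of x. The table, attribute names, and function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def equivalence_classes(table, attrs):
    """Partition object ids into the classes induced by IND(attrs)."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        # Objects with identical values on attrs are indiscernible.
        classes[tuple(row[a] for a in attrs)].add(obj_id)
    return list(classes.values())


def rough_membership(table, attrs, target, obj_id):
    """Return |[x]_P intersect X| / |[x]_P| for x = obj_id and X = target."""
    key = tuple(table[obj_id][a] for a in attrs)
    block = {o for o, row in table.items()
             if tuple(row[a] for a in attrs) == key}
    return Fraction(len(block & target), len(block))


# Hypothetical categorical information system with four objects.
table = {
    "x1": {"color": "red", "size": "small"},
    "x2": {"color": "red", "size": "small"},
    "x3": {"color": "red", "size": "large"},
    "x4": {"color": "blue", "size": "large"},
}
X = {"x1", "x3"}

mu_both = rough_membership(table, ["color", "size"], X, "x1")
mu_color = rough_membership(table, ["color"], X, "x1")
```

With both attributes, the class of `x1` is {x1, x2} and half of it lies in X; with `color` alone the class grows to {x1, x2, x3} and two thirds of it lie in X.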

Methodology-A new dissimilarity measure between two objects • Definition 4 A similarity measure between objects x and y with respect to a

Methodology-A new dissimilarity measure between two objects Definition 5 The dissimilarity measure between x and y with respect to P.

Methodology-A new dissimilarity measure between two objects • Example: A new dissimilarity measure between two objects • Simple Matching Dissimilarity Measure: • New Dissimilarity Measure:

Methodology-A new dissimilarity measure between a mode and an object Ng’s Dissimilarity Measure
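The slide's formula for Ng's measure is not preserved here. The sketch below follows the form in which that measure is commonly stated, which is an assumption rather than a quotation from the slides: a mismatching attribute contributes 1, and a matching attribute contributes 1 minus the fraction of objects in the cluster that share the object's value on that attribute. The function name and data are hypothetical.

```python
def ng_dissimilarity(mode, obj, cluster):
    """Ng-style dissimilarity between a cluster mode and an object.

    Assumed form: mismatch -> 1; match -> 1 - (share of cluster
    members carrying the object's value on that attribute).
    """
    if not cluster:
        raise ValueError("cluster must contain at least one object")
    n = len(cluster)
    total = 0.0
    for j, (z, v) in enumerate(zip(mode, obj)):
        if z != v:
            total += 1.0
        else:
            count = sum(1 for member in cluster if member[j] == v)
            total += 1.0 - count / n
    return total


# Hypothetical cluster of four objects with two attributes.
cluster = [("a", "p"), ("a", "q"), ("b", "p"), ("a", "p")]
mode = ("a", "p")

d_match = ng_dissimilarity(mode, ("a", "p"), cluster)
d_partial = ng_dissimilarity(mode, ("a", "q"), cluster)
```

Because the frequencies are taken from the current cluster, nothing is available before the first assignment, which is consistent with the earlier remark that simple matching is still used in the first iteration.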

Methodology-A new dissimilarity measure between a mode and an object Definition 7 The new dissimilarity measure between xi and zl with respect to P

Methodology-A new dissimilarity measure between a mode and an object • Example: A new dissimilarity measure between a mode and an object • Ng’s dissimilarity measure • New dissimilarity measure

Methodology-Convergence and complexity analysis The objective of clustering a set of n = |U| objects into k clusters is to find W and Z that minimize:
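The objective function itself is missing from the slide text. Assuming the standard k-Modes formulation, with W = [w_{li}] a k × n binary partition matrix, Z = {z_1, ..., z_k} the set of modes, and the proposed measure NDisP as the cost, it would take the following form (the constraints shown are the usual ones and are an assumption here):

```latex
F(W, Z) = \sum_{l=1}^{k} \sum_{i=1}^{n} w_{li}\, \mathrm{NDis}_P(z_l, x_i),
\qquad \text{subject to } w_{li} \in \{0, 1\}, \quad
\sum_{l=1}^{k} w_{li} = 1 \quad (1 \le i \le n).
```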

Methodology-Convergence and complexity analysis This process can be formulated as the following k-Modes algorithm:
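The algorithm listing is not preserved in the slide text. The sketch below shows the generic alternating scheme of k-Modes only: assign each object to its nearest mode, then recompute each mode as the most frequent value per attribute. It uses plain simple matching as a stand-in cost, because the proposed measure's formula is not available here and also depends on cluster membership; the function names, the `initial_modes` parameter, and the data are all hypothetical.

```python
from collections import Counter


def simple_matching(x, y):
    """Stand-in cost: number of attributes on which x and y differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


def k_modes(objects, initial_modes, dissim=simple_matching, max_iter=100):
    """Alternate between assigning objects and updating modes."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Step 1: fix the modes Z and assign each object to the nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda l: dissim(modes[l], x))
            for x in objects
        ]
        if new_labels == labels:
            break  # the partition W no longer changes
        labels = new_labels
        # Step 2: fix the partition W and recompute each mode attribute-wise.
        for l in range(len(modes)):
            members = [x for x, lab in zip(objects, labels) if lab == l]
            if members:  # keep the old mode if a cluster becomes empty
                modes[l] = tuple(
                    Counter(column).most_common(1)[0][0]
                    for column in zip(*members)
                )
    return labels, modes


# Hypothetical data set: six objects with three categorical attributes.
objects = [
    ("a", "p", "u"), ("a", "p", "v"), ("a", "q", "u"),
    ("b", "r", "w"), ("b", "r", "v"), ("c", "r", "w"),
]
labels, modes = k_modes(objects, initial_modes=[objects[0], objects[3]])
```

Passing the initial modes explicitly keeps the example deterministic; in practice they are usually chosen at random or by an initialization heuristic.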

Methodology-Convergence and complexity analysis Now we consider the convergence of the k-Modes algorithm with the proposed dissimilarity measure NDisP(zl, xi).

Methodology-Convergence and complexity analysis Proof. For a given W, we have:

Methodology-Convergence and complexity analysis

Experiments Evaluation on scalability

Experiments Evaluation on clustering efficiency

Conclusions The new measure unifies the dissimilarity measure between two objects and the dissimilarity measure between an object and a mode. The k-Modes algorithm using the new dissimilarity measure can be safely and effectively used on large data sets. Experiments on synthetic data sets and five real data sets from UCI show the effectiveness of the new dissimilarity measure.

Comments • Advantages • The method can save some time. • Applications • Dissimilarity measure
