Presenter: Keng-Yu Lin. Authors: Amir Ahmad, Lipika Dey. PRL, 2011

A k-means type clustering algorithm for subspace clustering of mixed numeric and categorical datasets
Presenter: Keng-Yu Lin. Authors: Amir Ahmad, Lipika Dey. PRL, 2011

Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

Motivation Almost all subspace clustering algorithms proposed so far are designed for numeric datasets.

Objectives • This paper presents a k-means type clustering algorithm that finds clusters in data subspaces of mixed numeric and categorical datasets.

Methodology • k-means clustering algorithm
1. Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the K centroids.
4. Repeat Steps 2 and 3 until the centroids no longer move.
This produces a separation of the objects into groups, from which the metric to be minimized can be calculated.
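The basic k-means loop described on this slide can be sketched as follows. This is a minimal illustration of standard k-means on numeric tuples only, not the subspace algorithm of Ahmad and Dey. The function names `k_means` and `squared_distance`, the random choice of initial centroids from the data, and the `max_iter` and `seed` parameters are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Plain k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Place K initial centroids (assumption: K distinct data points).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Assign each object to the group with the closest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Recalculate each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # An empty group keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids, labels)
```

The metric being minimized here is the sum of squared distances from each object to its assigned centroid. Handling categorical attributes requires a different distance and centroid definition, which this sketch does not attempt.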

Methodology

Experiments • Vote dataset • Error rate: 4.8% • Zaki et al. error rate: 3.8%

Experiments • Mushroom datasets • Error rate: 4.1% • Zaki et al. error rate: 0.3%

Experiments • DNA datasets • Error rate: 17%

Experiments • Australian credit data • Error rate: 13.9% • Huang et al. (2005) error rate: 15%
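The slides report error rates without stating how they are computed. One common convention for clustering, assumed here purely for illustration, maps each cluster to its majority class and counts every other object in that cluster as an error. The function name `clustering_error_rate` is hypothetical.

```python
from collections import Counter


def clustering_error_rate(cluster_labels, true_labels):
    """Fraction of objects outside the majority class of their cluster.

    Assumption: majority-class mapping; the slides do not confirm
    that the reported figures use this definition.
    """
    by_cluster = {}
    for cluster, truth in zip(cluster_labels, true_labels):
        by_cluster.setdefault(cluster, Counter())[truth] += 1
    # Objects that match their cluster's majority class count as correct.
    correct = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return 1 - correct / len(true_labels)


rate = clustering_error_rate([0, 0, 0, 1, 1, 1],
                             ["a", "a", "b", "b", "b", "b"])
print(f"{rate:.1%}")
```

In this toy input one of six objects falls outside its cluster's majority class.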

Conclusions This paper presented an algorithm for subspace clustering of mixed numeric and categorical data.

Comments • Advantage • Applications • Subspace clustering.
