
A modified version of the K-means algorithm with a distance based on cluster symmetry

A modified version of the K-means algorithm with a distance based on cluster symmetry. Advisor: Dr. Hsu. Reporter: Chun Kai Chen. Authors: Mu-Chun Su and Chien-Hsing Chou. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001.


Presentation Transcript


  1. A modified version of the K-means algorithm with a distance based on cluster symmetry • Advisor: Dr. Hsu • Reporter: Chun Kai Chen • Authors: Mu-Chun Su and Chien-Hsing Chou • IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001

  2. Outline • Motivation • Objective • Introduction • The Point Symmetry Distance • Experimental Results • Conclusions • Personal Opinion

  3. Motivation • Because clusters can have arbitrary shapes and sizes, the Minkowski metrics do not seem a good choice when no a priori information about the geometric characteristics of the data set to be clustered is available

  4. Objective • A more flexible measure is therefore needed • Symmetry is one of the basic features of shapes and objects • The paper proposes a nonmetric measure based on the concept of point symmetry

  5. K-means Partitional Clustering • [Figure: scatter plot on axes from 0 to 10 showing the K-means loop: reassign the patterns, update the cluster means, and repeat]

  6. Symmetry-based version of the K-means algorithm • [Figure: scatter plot on axes from 0 to 10 showing two phases, Coarse-Tuning and Fine-Tuning, each alternating between reassigning the patterns and updating the cluster means]

  7. Introduction(1/4) • Most of the conventional clustering methods assume that patterns having similar locations or constant density create a single cluster • Location or density becomes a characteristic property of a cluster

  8. Introduction(2/4) • To identify clusters in a data set mathematically, it is usually necessary to first define a measure of similarity or proximity, which establishes a rule for assigning patterns to the domain of a particular cluster center • The most popular similarity measure is the Euclidean distance

  9. Introduction(3/4) • With the Euclidean distance as the measure of similarity, hyperspherical clusters of equal size are usually detected • The Mahalanobis distance, which can handle hyperellipsoidal clusters, is another popular choice

  10. Introduction(4/4) • The major difficulty with the Mahalanobis distance is that the inverse of the sample covariance matrix has to be recomputed every time a pattern changes its cluster domain, which is computationally expensive • Clustering results are influenced not only by the similarity measure but also by the number of clusters, which cannot always be defined a priori • This paper focuses on the selection of similarity measures
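
To make the contrast between the two measures concrete, the following is a minimal Python sketch, not code from the paper. The sample cluster, the two test points, and the helper names (`mean_and_cov_2d`, `euclidean`, `mahalanobis`) are illustrative assumptions, and the closed-form inverse only covers the 2-D case. Two points at the same Euclidean distance from the center receive very different Mahalanobis distances because the cluster is elongated along the x axis.

```python
import math

def mean_and_cov_2d(points):
    """Sample mean and 2x2 sample covariance of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def euclidean(x, c):
    """Ordinary Euclidean distance between two 2-D points."""
    return math.hypot(x[0] - c[0], x[1] - c[1])

def mahalanobis(x, c, cov):
    """sqrt((x - c)^T cov^{-1} (x - c)) using the closed-form 2x2 inverse."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - c[0], x[1] - c[1]
    # cov^{-1} = (1 / det) * [[d, -b], [-b, a]]
    q = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(q)

# Hypothetical elongated cluster: wide along x, narrow along y.
cluster = [(-4.0, 0.0), (4.0, 0.0), (0.0, -1.0), (0.0, 1.0), (-2.0, 0.5), (2.0, -0.5)]
center, cov = mean_and_cov_2d(cluster)

along, across = (3.0, 0.0), (0.0, 3.0)  # same Euclidean distance from the center
d_along = mahalanobis(along, center, cov)
d_across = mahalanobis(across, center, cov)
print(euclidean(along, center), euclidean(across, center))
print(d_along, d_across)
```

Note that `cov` would have to be re-estimated (and inverted again) whenever a pattern moves into or out of the cluster, which is the cost the slide refers to.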

  11. Symmetry • Symmetry is common both in the abstract and in nature • It is reasonable to assume that some kind of symmetry exists in the structures of clusters • The immediate problem is how to find a metric to measure symmetry

  12. The Point Symmetry Distance • Given N patterns xi, i = 1, …, N, and a reference vector c (e.g., a cluster centroid), the point symmetry distance between a pattern xj and c is defined by Eq. (2) of the paper • The denominator term is used to normalize the distance • If the right-hand side of (2) is minimized when xi = xj*, then the pattern xj* is called the symmetrical pattern relative to xj with respect to c
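
The equation image itself is not part of the transcript. In the form usually quoted for this distance, and using the slide's symbols (patterns x_i, reference vector c), it can be written as follows. This is a reconstruction and should be checked against the paper.

```latex
d_s(\mathbf{x}_j, \mathbf{c}) =
  \min_{\substack{i = 1, \ldots, N \\ i \neq j}}
  \frac{\lVert (\mathbf{x}_j - \mathbf{c}) + (\mathbf{x}_i - \mathbf{c}) \rVert}
       {\lVert \mathbf{x}_j - \mathbf{c} \rVert + \lVert \mathbf{x}_i - \mathbf{c} \rVert}
```

The numerator vanishes when some x_i is the exact mirror image of x_j through c, and by the triangle inequality the ratio never exceeds 1.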

  13. Example of The Point Symmetry Distance
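
The figure for this slide is not preserved, so here is a small worked example in Python instead. It is a sketch under the assumption that the distance of a pattern xj to a reference vector c is the minimum, over all other patterns xi, of ||(xj - c) + (xi - c)|| divided by (||xj - c|| + ||xi - c||). The function name `point_symmetry_distance` and the four sample points are made up for illustration.

```python
import math

def norm(v):
    """Euclidean length of a vector given as a sequence of floats."""
    return math.sqrt(sum(a * a for a in v))

def point_symmetry_distance(j, patterns, c):
    """Point symmetry distance of patterns[j] with respect to c.

    Minimizes |(x_j - c) + (x_i - c)| / (|x_j - c| + |x_i - c|) over i != j.
    """
    vj = [a - b for a, b in zip(patterns[j], c)]
    best = math.inf
    for i, xi in enumerate(patterns):
        if i == j:
            continue
        vi = [a - b for a, b in zip(xi, c)]
        denom = norm(vj) + norm(vi)
        if denom == 0.0:
            continue  # both patterns coincide with c, so the ratio is undefined
        num = norm([p + q for p, q in zip(vj, vi)])
        best = min(best, num / denom)
    return best

# Hypothetical data: (1, 0) and (-1, 0) mirror each other through the origin.
patterns = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
c = (0.0, 0.0)

d_mirrored = point_symmetry_distance(0, patterns, c)    # has an exact mirror image
d_unmatched = point_symmetry_distance(2, patterns, c)   # (0, -2) is not in the data
print(d_mirrored, d_unmatched)
```

A pattern with an exact mirror image through c gets distance 0, while a pattern with no counterpart gets a value closer to 1.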

  14. Symmetry-based version of the K-means algorithm(1/3) • Step 1: Initialization • Randomly choose K data points from the data set to initialize the K cluster centroids c1, c2, …, cK • Step 2: Coarse-Tuning • Use the ordinary K-means algorithm with the Euclidean distance to update the K cluster centroids • Once the K cluster centroids converge or some terminating criterion is satisfied, go to Step 3

  15. Symmetry-based version of the K-means algorithm(2/3) • Step 3: Fine-Tuning • For each pattern x, find the cluster centroid nearest to it in the symmetrical sense: k* = arg min over k of ds(x,ck), where ds(x,ck) is the point symmetry distance • If ds(x,ck*) is smaller than a prespecified parameter θ, assign the data point x to the k*th cluster • Otherwise, assign x to the cluster whose centroid is nearest in the Euclidean sense: k* = arg min over k of d(x,ck), where d(x,ck) is the Euclidean distance

  16. Symmetry-based version of the K-means algorithm(3/3) • Step 4: Updating • Compute the new centroid of each of the K clusters as the mean of its members: ck(t+1) = (1/Nk) Σ xi over all xi in Sk(t), where Sk(t) is the set of patterns assigned to the kth cluster at time t and Nk is the number of elements in Sk(t) • Step 5: Continuation • If no patterns change categories or the number of iterations has reached a prespecified maximum, stop. Otherwise, go to Step 3.
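
Steps 1 to 5 can be sketched in plain Python as below. This is an illustrative reading of the slides, not the authors' implementation. It assumes the point symmetry distance is evaluated against every other pattern in the whole data set, keeps the old centroid when a cluster becomes empty, and uses hypothetical names (`sbkm`, `sym_dist`, `centroids_of`) and an arbitrary `theta` and toy data set.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def euclid(x, c):
    return norm([a - b for a, b in zip(x, c)])

def sym_dist(j, c, data):
    """Point symmetry distance of data[j] w.r.t. c over all other patterns."""
    vj = [a - b for a, b in zip(data[j], c)]
    best = math.inf
    for i, y in enumerate(data):
        if i == j:
            continue
        vi = [a - b for a, b in zip(y, c)]
        denom = norm(vj) + norm(vi)
        if denom > 0.0:
            best = min(best, norm([p + q for p, q in zip(vj, vi)]) / denom)
    return best

def centroids_of(data, labels, old):
    """Step 4: mean of each cluster; an empty cluster keeps its old centroid."""
    new = []
    for k, c in enumerate(old):
        members = [x for x, lab in zip(data, labels) if lab == k]
        if members:
            new.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new.append(c)
    return new

def sbkm(data, K, theta, max_iter=50, seed=0):
    rng = random.Random(seed)
    cents = rng.sample(data, K)                 # Step 1: random initial centroids
    labels = None
    for _ in range(max_iter):                   # Step 2: coarse-tuning (plain K-means)
        new_labels = [min(range(K), key=lambda k: euclid(x, cents[k])) for x in data]
        if new_labels == labels:
            break
        labels = new_labels
        cents = centroids_of(data, labels, cents)
    for _ in range(max_iter):                   # Steps 3 to 5: fine-tuning loop
        new_labels = []
        for j, x in enumerate(data):
            ds = [sym_dist(j, c, data) for c in cents]
            k_star = min(range(K), key=lambda k: ds[k])
            if ds[k_star] >= theta:             # not symmetric enough: use Euclidean
                k_star = min(range(K), key=lambda k: euclid(x, cents[k]))
            new_labels.append(k_star)
        if new_labels == labels:                # Step 5: no pattern changed category
            break
        labels = new_labels
        cents = centroids_of(data, labels, cents)   # Step 4: update centroids
    return labels, cents

# Toy data: two well-separated square blobs (made up for this example).
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
labels, cents = sbkm(data, K=2, theta=0.2)
print(labels, cents)
```

Because every call to `sym_dist` scans the whole data set, one fine-tuning pass in this sketch costs on the order of N² · K distance evaluations, compared with N · K for an ordinary K-means pass.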

  17. Experimental Results • Four examples are used to compare the SBKM algorithm and the SBCL algorithm • One additional example shows how to use the point symmetry distance in face detection

  18. Mixture of spherical and ellipsoidal clusters • [Figure panels: ordinary K-means, SBCL, SBKM]

  19. Ring-shaped clusters • [Figure panels: ordinary K-means, SBCL, SBKM]

  20. Linear structures • [Figure panels: ordinary K-means, SBCL, SBKM]

  21. Combination of ring-shaped, compact, and linear clusters • [Figure panels: ordinary K-means, SBKM, SBCL]

  22. Detecting a face in a complex background

  23. Conclusion • Although both algorithms use the point symmetry distance as the dissimilarity measure, the SBKM algorithm outperformed the SBCL algorithm in many cases • The proposed SBKM algorithm can be used to group a given data set into a set of clusters of different geometrical structures • The point symmetry distance can also be applied to detect human faces, and the experimental results are encouraging

  24. Personal Opinion • Advantage • An innovative idea • Application • Clustering • Future Work • Adopt the symmetry distance in SOM
