
A novel genetic algorithm for automatic clustering


vito

Presentation Transcript


  1. A novel genetic algorithm for automatic clustering

  2. Outline • Motivation • Objective • Introduction • Basic concept of Classical Genetic Algorithm • Clustering with Genetic Algorithm • Experimental results • Discussion and conclusion • Personal opinions • Review

  3. Motivation • Some problems in clustering: • Automatic clustering. • One cluster is confined fully or partly within another cluster. • Clusters are present in noisy data.

  4. Objective • A new genetically guided algorithm for solving the clustering problem, which has a two-phase process: • Cluster Decomposition Algorithm (CDA). • Hierarchical Cluster Merging Algorithm (HCMA). • Adjacent Cluster Checking Algorithm (ACCA), used along with HCMA.

  5. Introduction • Clustering methods can broadly be classified into two categories: • Hierarchical (agglomerative or divisive) • Non-hierarchical (e.g. k-means)

  6. Introduction • Some researchers have used GAs based on the split-and-merge method to define clusters, e.g. Tseng and Yang (2001). • Other algorithms: • DBSCAN • CURE • Chameleon

  7. Introduction • The Genetically based Clustering Algorithm (GCA) is a two-stage split-and-merge algorithm for finding clusters: • Splitting of clusters with CDA. • Cluster merging with HCMA. • Adjacency checking between two fragmented clusters with ACCA.

  8. Basic concept of Classical Genetic Algorithm • [Flowchart: Encoding scheme → Fitness evaluation → Test for the end of the algorithm: YES → Halt; NO → Parent selection → Crossover operators → Mutation operators → back to fitness evaluation.]
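The flowchart describes the generic GA loop. As an illustration only, a minimal Python sketch of that loop might look like the following; the bit-string encoding, binary tournament selection, elitism, parameter defaults, and the toy `sum` (OneMax) fitness are assumptions chosen for the example, not details taken from the slides.

```python
import random

def classical_ga(fitness, n_bits, pop_size=50, p_cross=0.8, p_mut=0.01,
                 max_gen=100, seed=0):
    """Minimal classical GA over bit strings that maximises `fitness`.

    Illustrative sketch: binary encoding, binary tournament selection,
    single-point crossover, bit-flip mutation and elitism are assumptions.
    """
    rng = random.Random(seed)
    # Encoding scheme: each individual is a list of 0/1 genes.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)          # fitness evaluation
    history = [fitness(best)]

    def select():
        # Parent selection: binary tournament on the current population.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(max_gen):              # end test: fixed generation budget
        children = [best[:]]              # elitism: keep the best unchanged
        while len(children) < pop_size:
            pa, pb = select(), select()
            if rng.random() < p_cross:    # single-point crossover
                cut = rng.randint(1, n_bits - 1)
                c1, c2 = pa[:cut] + pb[cut:], pb[:cut] + pa[cut:]
            else:
                c1, c2 = pa[:], pb[:]
            for child in (c1, c2):        # bit-flip mutation
                for i in range(n_bits):
                    if rng.random() < p_mut:
                        child[i] = 1 - child[i]
                children.append(child)
        pop = children[:pop_size]
        best = max(pop, key=fitness)      # fitness evaluation
        history.append(fitness(best))
    return best, history
```

For example, `best, history = classical_ga(sum, 20)` runs the loop on a 20-bit OneMax problem. Elitism is used here only so that the best fitness never decreases between generations.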

  9. Clustering with Genetic Algorithm • n vectors X = {x1, x2, …, xn} are to be clustered into k groups. • The clustering approach has two steps: • Cluster Decomposition Algorithm (CDA). • Hierarchical Cluster Merging Algorithm (HCMA).

  10. Splitting of clusters with CDA • CDA first decomposes the entire data set into m fragmented clusters.

  11. The progress of the CDA process • Step 1. For each object xi, find its nearest neighbor xj. • Step 2. Compute d_av.
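Steps 1 and 2 can be sketched in Python as follows. The slide does not give the formula for d_av, so this sketch assumes it is the mean of the nearest-neighbour distances found in Step 1; the function names are hypothetical.

```python
import math

def nearest_neighbour_distances(X):
    """Step 1: distance from each point x_i to its nearest neighbour x_j (j != i)."""
    n = len(X)
    return [min(math.dist(X[i], X[j]) for j in range(n) if j != i)
            for i in range(n)]

def average_nn_distance(X):
    """Step 2 (assumed definition): d_av = mean nearest-neighbour distance."""
    d = nearest_neighbour_distances(X)
    return sum(d) / len(d)
```

For `X = [(0, 0), (1, 0), (3, 0)]` the nearest-neighbour distances are 1, 1 and 2, so this assumed d_av is 4/3. The brute-force O(n²) search keeps the sketch short; a spatial index would be preferable for large n.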

  12. The progress of the CDA process • Step 3. Consider xi as the center of a circular region with radius r. • Step 4. Set p = 1. • Step 5. Extract Bp and update the data set so that X = X − Bp. • Step 6. Terminate the algorithm if …; otherwise, set p = p + 1 and go to Step 5.
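A possible end-to-end sketch of the CDA decomposition is shown below. Several details are not recoverable from the slides and are assumptions here: the radius is taken as r = u · d_av (suggested only by the parameter u in the experimental settings), the seed xi is the remaining point whose radius-r disc covers the most remaining points, and the missing Step 6 condition is assumed to be "X is empty". The function name `cda` is hypothetical.

```python
import math

def cda(X, u=2.0):
    """Sketch of the Cluster Decomposition Algorithm (CDA).

    Splits the points X into fragmented clusters B_1, ..., B_m and returns
    them as lists of point indices, together with the radius r used.
    Assumptions (not stated on the slides): r = u * d_av with d_av the mean
    nearest-neighbour distance; the seed x_i is the remaining point whose
    radius-r disc covers the most remaining points; the loop stops when X
    is empty.
    """
    n = len(X)
    # Steps 1-2: nearest-neighbour distances and their mean d_av.
    nn = [min(math.dist(X[i], X[j]) for j in range(n) if j != i) for i in range(n)]
    r = u * sum(nn) / n                   # hypothetical choice of radius
    remaining = set(range(n))             # indices of points still in X
    fragments = []                        # p = len(fragments) + 1 at each pass
    while remaining:                      # assumed stop rule: X is empty
        # Step 3: choose a seed x_i (hypothetical rule, ties -> lowest index).
        seed = max(sorted(remaining),
                   key=lambda i: sum(math.dist(X[i], X[j]) <= r for j in remaining))
        # Step 5: B_p = remaining points inside the circle of radius r around x_i.
        B_p = {j for j in remaining if math.dist(X[seed], X[j]) <= r}
        fragments.append(sorted(B_p))
        remaining -= B_p                  # X = X - B_p
    return fragments, r
```

On two well-separated triangles, `cda([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)])` returns the fragments `[[0, 1, 2], [3, 4, 5]]` with r = 2.0. Because each seed always lies in its own disc, every pass removes at least one point, so the loop terminates.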

  13. Cluster merging with HCMA • The second stage merges the fragmented clusters Bi. …

  14. Cluster merging with HCMA • HCMA uses all three phases of the classical GA (CGA): selection, crossover, and mutation. • Parents Pa and Pb are chosen randomly from the pool of individuals. • Crossover probability …, using the single-point crossover operation. • Adaptive mutation probability.
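The HCMA operators can be sketched as below. The chromosome encoding is not given on the slides; this sketch assumes a hypothetical encoding in which gene i holds the index (0 to k−1) of the merged cluster that fragment Bi is assigned to. The slides also do not say how the mutation probability adapts, so the linear decay in `adaptive_mutation_rate` is a hypothetical schedule.

```python
import random

def single_point_crossover(pa, pb, rng):
    """Swap the tails of parents Pa and Pb after a random cut point."""
    cut = rng.randint(1, len(pa) - 1)
    return pa[:cut] + pb[cut:], pb[:cut] + pa[cut:]

def adaptive_mutation_rate(p0, gen, g_max):
    """Hypothetical schedule: decay linearly from p0 at gen 0 to 0 at g_max."""
    return p0 * (1 - gen / g_max)

def mutate(chrom, k, p_mut, rng):
    """Reassign each fragment to a random merged cluster with probability p_mut.

    Assumed encoding: chrom[i] in range(k) is the cluster label of fragment B_i.
    """
    return [rng.randrange(k) if rng.random() < p_mut else g for g in chrom]
```

Parents can be drawn at random from a pool with `pa, pb = rng.sample(pool, 2)` before calling `single_point_crossover(pa, pb, rng)`. Passing an explicit `random.Random` instance keeps runs reproducible.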

  15. Cluster merging with HCMA (example) • [Figure: fragmented clusters B0, B1, … with seeds pi are merged into clusters Ci; merging continues until B0 is null.]

  16. Cluster merging with HCMA • Let the seed of the fragmented cluster Bi be … • The center Sj of each Cj: … • The fitness function: …
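The formulas for the seed, the center Sj, and the fitness function are missing from the transcript. Purely as a hypothetical stand-in, the sketch below takes the seed of Bi to be the mean of its points, Sj to be the mean of the seeds assigned to Cj, and the fitness to be the total seed-to-center distance (lower is better), using the same assumed label encoding as a chromosome `chrom`.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def hcma_fitness(fragments, chrom, k):
    """Hypothetical HCMA objective (lower is better).

    fragments: list of point lists B_i; chrom[i] in range(k) labels B_i.
    Returns the sum of distances from each fragment seed to the centre S_j
    of the merged cluster C_j it is assigned to.
    """
    seeds = [centroid(B) for B in fragments]      # assumed seed of B_i
    total = 0.0
    for j in range(k):
        members = [s for s, g in zip(seeds, chrom) if g == j]
        if not members:                           # empty C_j: reject labelling
            return math.inf
        S_j = centroid(members)                   # assumed centre of C_j
        total += sum(math.dist(s, S_j) for s in members)
    return total
```

With fragments whose seeds are (0, 1), (2, 1) and (10, 1), grouping the two nearby seeds (`chrom = [0, 0, 1]`) scores 2.0, while `[0, 1, 1]` scores 8.0, so a GA minimising this function would prefer the first merge.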

  17. Adjacency checking between two fragmented clusters • ACCA is used along with HCMA if • one cluster is confined fully or partly within another cluster, or • clusters are present in noisy data. • ACCA uses two thresholds to decide whether to merge a pair of clusters: • Tb: the threshold on boundary points. • Td: the threshold on data density difference.

  18. The progress of the ACCA process • Step 1. Choose a suitable value for the radius r'. • Step 2. Select two fragmented clusters … that satisfy the merging condition. • Step 3. Count the number of boundary points of … that reside within radius r'. Let this count be Nb, and let the object density of … be …. • Step 4. If …, then the two clusters are adjacent to each other. • Step 5. Terminate the algorithm.
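An adjacency test in the spirit of ACCA might be sketched as follows. Since the Step 3 and Step 4 formulas are lost, the sketch makes hypothetical choices: Nb counts points of one cluster lying within r' of the other, density is the mean number of same-cluster neighbours within r', and the Step 4 condition is taken to be Nb ≥ Tb together with a relative density difference of at most Td. The defaults Tb = 4 and Td = 0.4 follow the experimental settings.

```python
import math

def density(B, r):
    """Hypothetical object density: mean number of other points of B within r."""
    n = len(B)
    return sum(math.dist(B[i], B[j]) <= r
               for i in range(n) for j in range(n) if i != j) / n

def are_adjacent(Ba, Bb, r_prime, Tb=4, Td=0.4):
    """Sketch of an ACCA-style adjacency test between two fragmented clusters.

    Assumptions: N_b counts points of Ba lying within r_prime of some point
    of Bb, and the (missing) Step 4 condition is taken to be
    N_b >= Tb and relative density difference <= Td.
    """
    # Step 3: boundary points of Ba that reside within radius r' of Bb.
    Nb = sum(any(math.dist(p, q) <= r_prime for q in Bb) for p in Ba)
    da, db = density(Ba, r_prime), density(Bb, r_prime)
    top = max(da, db)
    rel_diff = abs(da - db) / top if top > 0 else 0.0
    # Step 4 (assumed form): enough shared boundary and similar density.
    return Nb >= Tb and rel_diff <= Td
```

Two touching 3 × 5 unit grids pass this test with r' = 1, while a much denser grid placed against the same boundary fails it on the density criterion. That is the kind of case the slides describe, in which one cluster is partly enclosed by another cluster of different density.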

  19. Experimental results • Parameter setting • Population size = 50. • The number of fragmented clusters, m, is inversely proportional to the value of r. • 2 ≤ u ≤ 4. • k is pre-specified by the user. • Crossover probability: … • Initial mutation probability: … • Gmax = 100 generations in each cycle; 30 runs. • Tb = 4; Td = 0.4.

  20. Cluster partitioning in R² feature space

  21. Cluster partitioning in R² feature space

  22. Cluster partitioning in R² feature space • The noise is represented as the third cluster.

  23. Cluster separation in Iris data • 4-D Iris dataset.

  24. Discussion and conclusion • GCA is composed of two algorithms: • CDA • HCMA, which runs for several GA cycles until k clusters are found. • ACCA identifies clusters accurately when they are • either partly or fully enclosed by another cluster, or • present in noisy data.

  25. Personal Opinions • GCA might be applied to automatic clustering on a 2-D SOM map.

  26. Review • Using GCA for automatic clustering. • Split: CDA • Merge: HCMA + ACCA
