Genetic Algorithm Using Iterative Shrinking for Solving Clustering Problems

Pasi Fränti and Olli Virmajoki, University of Joensuu, Department of Computer Science, Finland. To be presented at: Data Mining 2003.


Presentation Transcript


  1. Genetic Algorithm Using Iterative Shrinking for Solving Clustering Problems. Pasi Fränti and Olli Virmajoki, University of Joensuu, Department of Computer Science, Finland. To be presented at: Data Mining 2003.

  2. Problem setup • Given N data vectors X = {x1, x2, …, xN}, partition the data set into M clusters. • (1) Clustering: find the location of the clusters. • (2) Vector quantization: approximate the original data by a set of code vectors.
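The conclusions of the talk name MSE as the quantity being minimized. A minimal sketch of that objective follows, assuming squared Euclidean distance and normalization per data vector (the slides do not state the normalization, and the function names are illustrative only).

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x: Vector, codebook: List[Vector]) -> int:
    """Index of the code vector closest to x."""
    return min(range(len(codebook)), key=lambda j: sq_dist(x, codebook[j]))


def mse(data: List[Vector], codebook: List[Vector]) -> float:
    """Mean squared error when every data vector is replaced by its
    nearest code vector (assumed normalization: per data vector)."""
    total = sum(sq_dist(x, codebook[nearest(x, codebook)]) for x in data)
    return total / len(data)
```

Under this sketch, the clustering view and the vector quantization view share one objective: the code vectors are the cluster locations, and the partition maps each data vector to its nearest code vector.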

  3. Agglomerative clustering • PNN (Pairwise Nearest Neighbor method): merges two clusters; preserves the hierarchy of clusters. • IS (Iterative Shrinking method): removes one cluster; repartitions the data vectors of the removed cluster.
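The difference between the two steps can be sketched as follows. The PNN cost below is the standard increase in squared error caused by merging two clusters. The IS removal cost is a simplifying assumption (centroids of the remaining clusters are held fixed while the vectors move), not necessarily the cost used by the authors. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]
Clusters = Dict[int, List[Vector]]  # cluster id -> member vectors


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(members: List[Vector]) -> Vector:
    n = len(members)
    return tuple(sum(col) / n for col in zip(*members))


def pnn_merge_cost(a: List[Vector], b: List[Vector]) -> float:
    """Increase in total squared error when clusters a and b are merged."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * sq_dist(centroid(a), centroid(b))


def is_removal_cost(clusters: Clusters, victim: int) -> float:
    """Assumed cost: error increase when each vector of `victim` moves to
    its nearest remaining centroid, with those centroids held fixed."""
    own = centroid(clusters[victim])
    others = [centroid(m) for k, m in clusters.items() if k != victim]
    return sum(min(sq_dist(x, c) for c in others) - sq_dist(x, own)
               for x in clusters[victim])


def is_remove(clusters: Clusters, victim: int) -> Clusters:
    """Remove `victim` and repartition its vectors among the other clusters."""
    cents = {k: centroid(m) for k, m in clusters.items() if k != victim}
    result = {k: list(m) for k, m in clusters.items() if k != victim}
    for x in clusters[victim]:
        target = min(cents, key=lambda k: sq_dist(x, cents[k]))
        result[target].append(x)
    return result
```

A PNN merge forces all vectors of both clusters into one, which is why the hierarchy is preserved. An IS removal lets each vector of the removed cluster choose its own destination, so the vectors of one cluster may end up in several different neighbors.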

  4. Iterative Shrinking

  5. Iterative Shrinking algorithm (IS)

  6. Local optimization of the IS • Finding the secondary cluster (formula not preserved in the transcript) • Removal cost of a single vector (formula not preserved in the transcript)

  7. Generalization to the case of unknown number of clusters • Measure the variance-ratio F-test for every intermediate clustering, M = 1..N. • Select the clustering with the minimum F-ratio as the final clustering. • No additional computation is needed beyond calculating the F-ratio.
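The selection rule on this slide can be sketched as below. The slide does not give the F-ratio formula, so the definition used here (number of clusters times within-cluster squared error, divided by between-cluster squared error) is an assumption, and the function names are hypothetical. Under this assumed definition the ratio is undefined for M = 1 (zero between-cluster error) and trivially zero for M = N, so the sketch expects those degenerate ends to be excluded from the candidates.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]
Clustering = List[List[Vector]]  # one list of member vectors per cluster


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(members: List[Vector]) -> Vector:
    n = len(members)
    return tuple(sum(col) / n for col in zip(*members))


def f_ratio(clusters: Clustering) -> float:
    """Assumed variance ratio: M * SSW / SSB (smaller is better)."""
    m = len(clusters)
    global_mean = centroid([x for c in clusters for x in c])
    ssw = sum(sq_dist(x, centroid(c)) for c in clusters for x in c)
    ssb = sum(len(c) * sq_dist(centroid(c), global_mean) for c in clusters)
    return m * ssw / ssb


def select_by_f_ratio(candidates: List[Clustering]) -> Clustering:
    """Pick the intermediate clustering with the smallest ratio."""
    return min(candidates, key=f_ratio)
```

Because an agglomerative run already produces a clustering for every intermediate M, the candidates come for free and only the ratio itself has to be evaluated for each one.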

  8. Example for Data set 3

  9. Example for Data set 4

  10. Genetic algorithm • Generate S initial solutions. • REPEAT T times: select the best solutions to survive; generate new solutions by crossover; fine-tune the solutions. END-REPEAT • Output the best solution found.
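The loop on this slide can be sketched generically as follows. The solution representation, the crossover operator, and the fine-tuning step are passed in as callables because the transcript does not preserve their details. The parameter names and the random choice of parents among the survivors are assumptions made for illustration.

```python
import random
from typing import Callable, List, TypeVar

S = TypeVar("S")  # type of one candidate solution


def genetic_algorithm(
    init: Callable[[random.Random], S],
    crossover: Callable[[S, S, random.Random], S],
    fine_tune: Callable[[S], S],
    cost: Callable[[S], float],
    pop_size: int,       # S on the slide
    iterations: int,     # T on the slide
    survivors: int,      # how many of the best solutions survive (assumed)
    rng: random.Random,
) -> S:
    if not 2 <= survivors < pop_size:
        raise ValueError("need 2 <= survivors < pop_size")
    # Generate the initial solutions, best first.
    population: List[S] = sorted((init(rng) for _ in range(pop_size)), key=cost)
    for _ in range(iterations):
        # Select the best solutions to survive.
        parents = population[:survivors]
        # Generate new solutions by crossover, then fine-tune each one.
        children: List[S] = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(fine_tune(crossover(a, b, rng)))
        population = sorted(parents + children, key=cost)
    # Output the best solution found.
    return population[0]
```

Since the survivors are carried over unchanged, the best cost in the population never gets worse from one iteration to the next in this sketch.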

  11. Illustration of crossover (figure)

  12. GAIS algorithm

  13. Effect of crossover

  14. Convergence of GA with F-ratio

  15. Bridge (256256) d = 16 N = 4096 M = 256 Miss America (360288) d = 16 N = 6480 M = 256 House (256256) d = 3 N = 34112* M =256 Image datasets

  16. Synthetic data sets • Data set S1: d = 2, N = 5000, M = 15 • Data set S2: d = 2, N = 5000, M = 15 • Data set S3: d = 2, N = 5000, M = 15 • Data set S4: d = 2, N = 5000, M = 15

  17. Comparison with image data (table annotations: popular methods; simplest of the good ones; previous GA; NEW!)

  18. Comparison with synthetic data (table annotations: most separable clusters; most overlap between clusters)

  19. What does it cost? Running times for Bridge • Random: ~0 s • K-means: 8 s • SOM: 6 minutes • GA-PNN: 13 minutes • GAIS (short): ~1 hour • GAIS (long): ~3 days

  20. Conclusions • Slower but better clustering algorithm. • BEST known clustering algorithm for minimizing MSE. Thank you!
