
A new data clustering approach-Generalized cellular automata

Presenter: Shao-Wei Cheng. Authors: Dianxun Shuai, Yumin Dong, Qing Shuai. IS 2007.



Presentation Transcript


  1. A new data clustering approach-Generalized cellular automata • Presenter: Shao-Wei Cheng • Authors: Dianxun Shuai, Yumin Dong, Qing Shuai • IS 2007

  2. Outline • Motivation • Objective • Methodology • Experiments • Conclusion • Personal Comments

  3. Motivation • Many clustering methods have the following limitations and shortcomings in enterprise computing: • Run time increases rapidly. • Some pairwise computation or pre-processing is needed. • No guarantee of clustering optimality. • Clustering performance and quality are sensitive to cluster shape and cluster distribution. • Noise effects are not well suppressed. • Poor clustering performance on high-dimensional data. • No learning ability. • Dynamic changes to the clustered data objects are usually not allowed during algorithm execution.

  4. Objectives • This paper presents a novel GCA for self-organizing data clustering in enterprise computing that aims to overcome the limitations and shortcomings above. • GCA stands for Generalized Cellular Automata. • A GCA has the following components and features: • Cells • States • Neighborhood • Rule • Parallel computation • Local • Homogeneous

  5. Methodology • Rule • N × N cellular array • cij(t): cell • sij(t) is the state of cell cij(t); the empty state is denoted by Ø • p = 1, f(∆H), 1 − f(∆H) • ∆H = harmony increment • Γ(t) is a matrix • wij is a weight coefficient • Nij = { ci,j-1 , ci,j+1 , ci-1,j , ci+1,j }
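The neighborhood Nij on slide 5 is the four-cell (von Neumann) neighborhood of cell cij. A minimal Python sketch of it follows. The function name `von_neumann_neighbors`, the 0-based indexing, and the choice to drop neighbors that fall outside the N × N array are assumptions for illustration; the slides do not say how boundary cells are handled.

```python
from typing import List, Tuple


def von_neumann_neighbors(i: int, j: int, n: int) -> List[Tuple[int, int]]:
    """Return the indices of Nij = {c(i,j-1), c(i,j+1), c(i-1,j), c(i+1,j)}.

    Assumption (not stated in the slides): indices are 0-based and
    neighbors outside the n x n array are simply omitted (no wraparound).
    """
    candidates = [(i, j - 1), (i, j + 1), (i - 1, j), (i + 1, j)]
    return [(a, b) for a, b in candidates if 0 <= a < n and 0 <= b < n]


if __name__ == "__main__":
    # Interior cell of a 3 x 3 array has all four neighbors.
    print(von_neumann_neighbors(1, 1, 3))
    # Corner cell keeps only the two neighbors inside the array.
    print(von_neumann_neighbors(0, 0, 3))
```

Because each cell only reads its own four neighbors, the update is local and can be applied to all cells in parallel, which matches the "Local" and "Parallel computation" features listed on slide 4.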

  6. Methodology • d( sij(t), si'j'(t) ) is the similarity between the states of two cells • If sij(t) ≠ Ø and si'j'(t) ≠ Ø, then 0 ≤ d( sij(t), si'j'(t) ) ≤ 1 • Otherwise, d( sij(t), si'j'(t) ) = −1
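The two-case definition of d on slide 6 can be sketched in Python as below. The slides only give the range of d, not its formula, so the measure used here, 1 / (1 + Euclidean distance), is a stand-in chosen because it lies in [0, 1]; it is not the paper's definition. Representing the empty state Ø as `None` and the name `similarity` are also assumptions.

```python
import math
from typing import Optional, Sequence

# Assumption: the empty state Ø is represented by None.
EMPTY = None


def similarity(s: Optional[Sequence[float]], t: Optional[Sequence[float]]) -> float:
    """Return d(s, t): a value in [0, 1] if both states are non-empty, else -1.

    The non-empty case uses 1 / (1 + Euclidean distance) purely as a
    placeholder; the slides do not specify the actual measure.
    """
    if s is EMPTY or t is EMPTY:
        return -1.0
    return 1.0 / (1.0 + math.dist(s, t))


if __name__ == "__main__":
    print(similarity((0.0, 0.0), (0.0, 0.0)))  # identical states
    print(similarity((0.0, 0.0), (3.0, 4.0)))  # states at distance 5
    print(similarity(EMPTY, (1.0, 2.0)))       # one empty cell
```

Returning −1 for any pair involving an empty cell keeps that case distinguishable from every valid similarity value, which is what the "Otherwise" branch on the slide specifies.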

  7. Methodology

  8. Experiments • Number of clusters: 60. • Data set size: 20,000. • t = number of iterations. • Figure panels: t = 0, t = 20, t = 40, t = 60, t = 80, t = 200.

  9. Experiments • Number of clusters: 25. • Average data objects per cluster: 500. • Data set size: 12,500. • Execution times of the GCAA: 1000.

  10. Experiments • PAM, Ex. K-means • CLARANS, Clustering Large Applications based on RANdomized Search • CURE, Clustering Using REpresentatives

  11. Conclusion • The GCA approach has shown many advantages over other widely used clustering algorithms in the following respects: • Faster clustering speed. • The ability to handle and recognize shape-varying and size-varying clusters. • Robustness to outliers. • The ability to learn. • Suitability for high-dimensional data sets.

  12. Personal Comments • Advantage • A novel data clustering approach. • Drawback • … • Application • Clustering in enterprise computing.
