A New Data Clustering Approach for Data Mining in Large Databases

Authors: Cheng-Fa Tsai†, Han-Chang Wu, and Chun-Wei Tsai. Advisor: Dr. Hsu. Graduate: Yan Pin Huang.

Presentation Transcript


  1. A New Data Clustering Approach for Data Mining in Large Databases. Authors: Cheng-Fa Tsai†, Han-Chang Wu, and Chun-Wei Tsai. Advisor: Dr. Hsu. Graduate: Yan Pin Huang. Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'02), IEEE, 2002. IDSL

  2. Outline • Motivation • Objective • Introduction • Definitions for Clustering Problem • Ant Colony Optimization (ACO) • The Proposed Approach • Simulation Results • Conclusions • Opinion

  3. Motivation and Objective • This paper presents a new data clustering method for data mining in large databases. • The proposed clustering method performs better than the Fast SOM combined with K-means approach (FSOM+K-means) and the Genetic K-Means Algorithm (GKA).

  4. Introduction • Traditional clustering algorithms can be classified into two main categories [Jain, 1988]: • hierarchical: priori, CURE • partitional: crisp clustering and fuzzy clustering • Hichem (1999), Wang (2001), Bezdek (1992), Krishnapuram (1992)

  5. Introduction (cont.) • Dorigo (1992) first presented the Ant System (AS): • Feedback • Distributed computation • The use of a constructive greedy heuristic • The method was applied to the classical traveling salesman problem (TSP), the quadratic assignment problem (QAP), and the job-shop scheduling problem. • It has also been applied successfully to other combinatorial optimization problems such as scheduling, partitioning, coloring, telecommunications networks, and vehicle routing.

  6. Definitions for Clustering Problem • T_a represents the time cost for clustering • n_r denotes the number of runs • T_s is the start time of clustering • T_e is the termination time of clustering
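
The formula relating these quantities is not preserved in the transcript. A minimal sketch, assuming the time cost is the mean of (end time minus start time) over the runs; the function name `average_time_cost` and the `cluster_once` callable are hypothetical and used only for illustration:

```python
import time
from typing import Callable


def average_time_cost(cluster_once: Callable[[], object], n_r: int) -> float:
    """Return the mean wall-clock time of n_r clustering runs (assumed T_a)."""
    if n_r <= 0:
        raise ValueError("n_r must be positive")
    total = 0.0
    for _ in range(n_r):
        t_s = time.perf_counter()  # T_s: start time of this run
        cluster_once()             # hypothetical clustering routine
        t_e = time.perf_counter()  # T_e: termination time of this run
        total += t_e - t_s
    return total / n_r
```

Averaging over several runs matters for stochastic methods such as ACO, where a single run's time can vary with the random seed.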

  7. Ant Colony Optimization (ACO) • Let b_i(t) (i = 1, …, n) be the number of ants in city i at time t. • Let τ_ij(t+n) be the intensity of the pheromone trail on connection (i, j) at time t + n. • ρ is a coefficient such that 1 − ρ represents the evaporation of the trail between time t and t + n.
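
The equations on this slide were lost in extraction. For reference, the trail update of the standard Ant System that matches these definitions is sketched below. It is taken from the usual formulation of the algorithm, not recovered from the slide, and m (the total number of ants) is notation assumed here for illustration:

```latex
m = \sum_{i=1}^{n} b_i(t), \qquad
\tau_{ij}(t+n) = \rho\,\tau_{ij}(t) + \Delta\tau_{ij}, \qquad
\Delta\tau_{ij} = \sum_{k=1}^{m} \Delta\tau_{ij}^{k}
```

Here \Delta\tau_{ij}^{k} is the amount of trail laid on connection (i, j) by the kth ant between time t and t + n.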

  8. Ant Colony Optimization (ACO) (cont.) • Q denotes a constant. • L_k represents the tour length found by the kth ant. • For each edge, the intensity of trail at time 0, τ_ij(0), is set to a very small value.
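
A minimal sketch of how such a deposit is typically computed in the Ant System, assuming each ant k adds Q / L_k to every edge of its closed tour and that existing trail persists with factor ρ. The function name `update_pheromone`, the symmetric matrix layout, and the default values are assumptions for illustration, not taken from the slides:

```python
from typing import List, Sequence


def update_pheromone(tau: List[List[float]],
                     tours: Sequence[Sequence[int]],
                     lengths: Sequence[float],
                     rho: float = 0.5,
                     q: float = 100.0) -> List[List[float]]:
    """Return rho * tau plus the deposit Q / L_k along every ant's tour."""
    n = len(tau)
    # Evaporation: only a fraction rho of the old trail survives.
    new_tau = [[rho * tau[i][j] for j in range(n)] for i in range(n)]
    for tour, length in zip(tours, lengths):
        deposit = q / length  # shorter tours deposit more trail
        # Walk the closed tour: the last city connects back to the first.
        for a, b in zip(tour, list(tour[1:]) + [tour[0]]):
            new_tau[a][b] += deposit
            new_tau[b][a] += deposit  # symmetric problem assumed
    return new_tau
```

Initializing every entry of `tau` to the same small positive value corresponds to setting the trail intensity at time 0 to a very small value.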

  9. Ant Colony Optimization (ACO) (cont.) • allowed_k(t) is the set of cities not yet visited by ant k at time t. • η_ij denotes a local heuristic equal to 1/d_ij (called 'visibility'). • The parameters α and β control the relative importance of pheromone trail versus visibility.
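
The transition rule itself is missing from the transcript. A minimal sketch of the usual Ant System rule that fits these definitions, in which the probability of moving from city i to an allowed city j is proportional to τ_ij^α · η_ij^β. Treating d_ij as the distance between cities, the function name `transition_probabilities`, and the default α and β are assumptions for illustration:

```python
from typing import Dict, List, Set


def transition_probabilities(i: int,
                             allowed: Set[int],
                             tau: List[List[float]],
                             dist: List[List[float]],
                             alpha: float = 1.0,
                             beta: float = 2.0) -> Dict[int, float]:
    """Probability of moving from city i to each city j in `allowed`."""
    weights = {
        # Visibility eta_ij = 1 / d_ij; pheromone tau_ij weighted by alpha.
        j: (tau[i][j] ** alpha) * ((1.0 / dist[i][j]) ** beta)
        for j in allowed
    }
    total = sum(weights.values())
    # Normalize over the allowed cities so the probabilities sum to 1.
    return {j: w / total for j, w in weights.items()}
```

With α = 0 the choice depends only on visibility (a greedy nearest-city bias), and with β = 0 only on the pheromone trail.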

  10. Ant Colony Optimization (ACO) (cont.)

  11. The Proposed Approach (a) using ants with different favor to solve the clustering problem (b) adopting the simulated annealing concept so that ants visit a decreasing number of cities to obtain locally optimal solutions (c) using a tournament selection strategy to choose a path.

  12. The Strategy of Using ACO with Different Favor

  13. The Strategy of Using ACO with Different Favor (cont.)

  14. The Strategy of Using Simulated Annealing • n_s(t+1) denotes the current number of nodes visited by the ants. • n_s(t) represents the number of nodes visited by the ants in the previous cycle; T is a constant (T = 0.95). • n_f is the number of nodes visited by the ants during T1 Function. • n_f(t+1) denotes the current number of nodes visited by an ant. • n_f(t) represents the number of nodes visited by an ant in the previous cycle; run = 2, i ∈ {1, 2}.
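
The update rules on this slide are not preserved in the transcript. A minimal sketch, assuming the geometric cooling rule n_s(t+1) = n_s(t) × T that the definitions of n_s and T = 0.95 suggest; the function name `visit_schedule`, the truncation to an integer, and the floor `n_min` are assumptions for illustration, and the n_f rule is left out because its form cannot be inferred:

```python
from typing import List


def visit_schedule(n_start: int, cycles: int, t: float = 0.95,
                   n_min: int = 1) -> List[int]:
    """Number of nodes each ant visits per cycle under the assumed rule."""
    schedule = []
    ns = float(n_start)
    for _ in range(cycles):
        # Truncate to a whole number of nodes, never dropping below n_min.
        schedule.append(max(n_min, int(ns)))
        ns *= t  # assumed rule: n_s(t+1) = n_s(t) * T
    return schedule
```

Under this assumption the ants explore widely in early cycles and visit progressively fewer nodes later, which mirrors the way simulated annealing narrows its search as the temperature falls.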

  15. The Strategy of Using Tournament Selection • In our experience, tournament selection is more powerful than roulette wheel selection.
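
A minimal sketch contrasting the two selection schemes, assuming each candidate path carries a weight where larger is better. The function names, the default tournament size of 2, and the dictionary representation are illustrative choices, not the authors' implementation:

```python
import random
from typing import Dict, Hashable, Optional


def roulette_wheel(weights: Dict[Hashable, float],
                   rng: Optional[random.Random] = None) -> Hashable:
    """Pick a candidate with probability proportional to its weight."""
    if rng is None:
        rng = random.Random()
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]


def tournament(weights: Dict[Hashable, float], size: int = 2,
               rng: Optional[random.Random] = None) -> Hashable:
    """Draw `size` candidates uniformly and return the best-weighted one."""
    if rng is None:
        rng = random.Random()
    keys = list(weights)
    contestants = rng.sample(keys, min(size, len(keys)))
    return max(contestants, key=lambda k: weights[k])
```

Tournament selection depends only on the ranking of the sampled candidates, so it is insensitive to the scale of the weights, and a larger `size` increases the pressure toward the best path.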

  16. The Proposed Algorithm for Clustering

  17. Simulation Results

  18. Simulation Results (cont.)

  19. Conclusions (a) using ACO with different favor to solve the clustering problem (b) adopting the simulated annealing concept so that ants visit a decreasing number of cities to obtain locally optimal solutions (c) using a tournament selection strategy to choose a path.

  20. Opinion • The ACO algorithm is an extremely powerful clustering algorithm. • We can try to develop the ACO algorithm further for clustering problems.
