Ant Inspired Data Mining

Brandon Emerson
April 22, 2013

What is data mining?
• Data mining is any process that analyzes and organizes data into clear and concise formats. It can be particularly powerful for finding relationships between data points.
• It is mainly used by consumer-focused companies, specifically their marketing divisions. Data mining lets them find meaningful relationships between products and consumers.

Applications in Physics
• Efficient data mining techniques can improve data storage and retrieval in experiments that require a great deal of data collection.
• Effective mining can help analysts find relationships between specific data points, and thus between physical phenomena.

Our Goals
• Use basic ideas about ant behavior to develop an effective means of data mining.
• Discuss recent improvements to ant clustering algorithms, and compare data mining techniques using results from simple tests.

A Simple Model of Ants-1
[Figure: an ant and an object]

A Simple Model of Ants-2
[Figure: an ant and an object]
• Probability of picking up: [equation not preserved]
• Probability of placing: [equation not preserved]
• a is a constant
• f is the perceived fraction of objects nearby
• b is a constant
Assuming the ant moves randomly and has enough time to explore the entire area, you would expect all of the objects to end up clustered together.
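For reference, the basic pick-up and drop model of Deneubourg et al. is commonly written in the form below. This is an assumption about what the slide showed: it is consistent with the definitions of a, b, and f given here, but the symbols P_pick and P_place are labels chosen for this sketch.

```latex
P_{\text{pick}} = \left( \frac{a}{a + f} \right)^{2},
\qquad
P_{\text{place}} = \left( \frac{f}{b + f} \right)^{2}
```

Under this form an isolated object (f near 0) is almost always picked up, while an object in a crowded area (f near 1) is rarely picked up and a carried object is likely to be placed there.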

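A Note on Perception
[Figure: objects x and y on the grid]
f is the perceived fraction of objects nearby: [equation not preserved] when f > 0, and [value not preserved] otherwise. The formula uses a dissimilarity function between objects.
f(x) is now a measure of the similarity of object x to the objects y in the area around object x.
• When the objects are the same: [expression not preserved]
• When the objects are different: [expression not preserved]
α is a scale factor for dissimilarity.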
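A minimal sketch of a perception function of this kind, assuming the widely used Lumer and Faieta form f(x) = max(0, (1/n) Σ_y (1 − d(x, y)/α)), where the sum runs over the objects y near x. The function names, the Euclidean dissimilarity, and the `n_sites` normalization are assumptions made for illustration, not taken from the slides.

```python
from typing import Sequence


def dissimilarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance, used here as an assumed dissimilarity measure."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def perceived_similarity(
    x: Sequence[float],
    neighbours: Sequence[Sequence[float]],
    alpha: float,
    n_sites: int,
) -> float:
    """Average similarity of x to nearby objects, clipped at zero.

    alpha scales the dissimilarity; n_sites is the number of sites
    in the neighbourhood (an assumed normalization).
    """
    total = sum(1.0 - dissimilarity(x, y) / alpha for y in neighbours)
    return max(0.0, total / n_sites)
```

With identical neighbours every term contributes 1, so f(x) equals the occupied fraction of the neighbourhood; very dissimilar neighbours drive the sum negative and f(x) is clipped to 0.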

The Basic Algorithm
/* Initialization */
for every object x do
    place x randomly on grid
end for
for all ants do
    place ant at randomly selected site
end for
/* Main loop */
for all ants do
    for t = 1 to (number of iterations) do
        if (ant carries no object) and (site occupied by object) then
            compute f(x) and probability of picking up
            draw random real number R
            if (R ≤ Prob) then
                pick up object
            end if
        else
            if (ant carries object) and (site is empty) then
                compute f(x) and probability of dropping
                draw random real number R
                if (R ≤ Prob) then
                    drop object
                end if
            end if
        end if
        move to randomly selected ant-free adjacent site
    end for
end for
print location of objects
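The basic algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: objects are all identical, so f is simply the occupied fraction of the eight neighbouring sites; the grid wraps around at the edges; the squared pick-up and drop probabilities follow the common Deneubourg form; the constants `a=0.1` and `b=0.3` are hypothetical; and ants are interleaved within each time step.

```python
import random

NEIGHBOURS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def pick_probability(f, a=0.1):
    """Assumed form: high when few objects are nearby."""
    return (a / (a + f)) ** 2


def drop_probability(f, b=0.3):
    """Assumed form: high when many objects are nearby."""
    return (f / (b + f)) ** 2


def adjacent(pos, size):
    """The eight neighbouring sites on a wrap-around grid."""
    r, c = pos
    return [((r + dr) % size, (c + dc) % size) for dr, dc in NEIGHBOURS]


def local_fraction(objects, pos, size):
    """Fraction of neighbouring sites that hold an object."""
    return sum(site in objects for site in adjacent(pos, size)) / 8


def run(size=10, n_objects=20, n_ants=5, steps=2000, seed=0):
    rng = random.Random(seed)
    sites = [(r, c) for r in range(size) for c in range(size)]
    objects = set(rng.sample(sites, n_objects))  # sites holding an object
    ants = [{"pos": p, "loaded": False} for p in rng.sample(sites, n_ants)]
    for _ in range(steps):
        for ant in ants:
            pos = ant["pos"]
            f = local_fraction(objects, pos, size)
            if not ant["loaded"] and pos in objects:
                if rng.random() <= pick_probability(f):
                    objects.remove(pos)
                    ant["loaded"] = True
            elif ant["loaded"] and pos not in objects:
                if rng.random() <= drop_probability(f):
                    objects.add(pos)
                    ant["loaded"] = False
            # Move to a random adjacent site that no ant occupies.
            occupied = {other["pos"] for other in ants}
            free = [p for p in adjacent(pos, size) if p not in occupied]
            if free:
                ant["pos"] = rng.choice(free)
    return objects, ants
```

Calling `run()` returns the final object sites and ant states; printing the object sites corresponds to the last step of the pseudocode.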

Improvements-1
• Granted ants "short-term memory." Each ant stores its last x locations. After picking up a data item, it proceeds to its remembered locations sequentially.
• Normalized the grid to enable efficient mining of data sets of varying size:
  Grid size: [equation not preserved]
  Step size: [equation not preserved]
  Number of iterations: [equation not preserved]
  where N is the maximum number of data items to be mined.
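The short-term memory can be sketched as a fixed-size buffer of recent locations. The class name `AntMemory` and the choice to revisit the most recent location first are assumptions made for illustration; the slides only say that the remembered locations are visited sequentially.

```python
from collections import deque


class AntMemory:
    """Keeps the last `capacity` grid locations an ant has seen."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest entry when full.
        self._sites = deque(maxlen=capacity)

    def remember(self, site):
        self._sites.append(site)

    def revisit_order(self):
        """Locations to try after a pick-up, most recent first (assumed order)."""
        return list(reversed(self._sites))
```

For example, an ant with capacity 3 that has seen four locations will only revisit the last three.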

Improvements-2
α determines the percentage of items that are treated as similar. If α is too small, clusters won't form. If α is too large, the clusters will combine into one super cluster.
Each ant is assigned its own value of α and is allowed to change it in the following way: the ant makes a set number of moves (100), during which it counts how many times F it has failed to drop a data item. The rate of failure is F/100, and α is adapted according to this rate:
• If rate > 0.99: [update not preserved]
• If rate ≤ 0.99: [update not preserved]
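A minimal sketch of the adaptation rule. The 0.99 threshold and the 100-move window come from the slide; the update direction and the step size of 0.01 are assumptions (an ant that almost always fails to drop is taken to be too strict, so its α is raised, otherwise it is lowered). The function name `adapt_alpha` is hypothetical.

```python
def adapt_alpha(alpha, failed_drops, moves=100, threshold=0.99, step=0.01):
    """Return the ant's new alpha after a window of `moves` moves.

    `step` and the direction of the update are assumptions for illustration.
    """
    rate = failed_drops / moves
    if rate > threshold:
        return alpha + step  # too many failed drops: be more tolerant
    return alpha - step      # drops succeed often enough: be stricter
```

Usage: `adapt_alpha(0.5, failed_drops=100)` raises α, while `adapt_alpha(0.5, failed_drops=40)` lowers it.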

The Updated Algorithm
/* Initialization phase */
randomly scatter the objects on the grid
for each ant a do
    random_select_object
    pick_up_object
    place_agent a at randomly selected empty grid location
end for
/* Main loop */
for t = 1 to (number of iterations) do
    random_select_agent
    move_agent to new location
    I = carried_object
    compute f*(x) and prob of drop
    if drop = true then
        while pick = false do
            I = random_select_object
            compute f*(x) and prob of pick
            pick_up_object
        end while
    end if
end for
end

Comparing Techniques
Iris (150 items) is a data set from the UCI Machine Learning Repository. K-means is a standard data mining technique and is used here to benchmark the performance of the Ant Clustering Algorithm (ACA).
Important note: the ACA does not need to be given the correct number of clusters to proceed, whereas K-means does.
[Table: results for K-means and ACA on Iris; annotations: "Maximize these values", "Minimize this value"]
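To illustrate the note about K-means, here is a minimal pure-Python sketch on one-dimensional toy data (not the Iris data, and not the benchmark code from the paper). The function name `kmeans_1d` and the sample values are hypothetical. The number of clusters is fixed by the number of starting centroids the caller supplies, which is exactly the input the ACA does not require.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain K-means on 1-D data; k is len(centroids), chosen by the caller."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
two_centroids, two_clusters = kmeans_1d(data, [0.0, 6.0])
one_centroid, one_cluster = kmeans_1d(data, [0.0])
```

With two starting centroids the data splits into its two visible groups; with one, everything is forced into a single cluster, whatever the data looks like.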

Summary
• Ant simulation offers a unique technique for data mining. This technique was developed using simple ideas about ant behavior.
• Ant Clustering Algorithms could use improvement, but as they stand they are fairly effective.
• As our understanding of ant behavior improves, perhaps ACA could be refined into an even more efficient tool.

Just to be Clear…
None of the information presented, including data tables and code, is my personal work. All of the information was found in the paper below.
Boryczka, Urszula. "Ant Colony Metaphor in a New Clustering Algorithm." Control and Cybernetics 39.2 (2010): 343-57. Print.
