Unsupervised Evolutionary Clustering Algorithm for Mixed Type Data
This paper presents a novel unsupervised clustering algorithm called Evolutionary K-Prototype (EKP), designed to improve the K-Prototype (KP) algorithm for mixed-type data. The EKP approach addresses limitations in initialization sensitivity and local optimum convergence faced by KP by integrating evolutionary strategies. The methodology involves initial setup, crossover, mutation, and a comprehensive search process. Experiments demonstrate that EKP markedly enhances clustering outcomes compared to traditional KP, offering significant potential for mixed data applications. Future studies will explore automatic weight adjustment in EKP.
Presentation Transcript
Unsupervised Evolutionary Clustering Algorithm for Mixed Type Data. Zhi Zheng, Maoguo Gong, Jingjing Ma, Licheng Jiao, Qiaodi Wu. 2010, CEC. Presented by Chien-Hao Kung, 2011/12/1
Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments
Motivation • As a partitional clustering algorithm, the K-prototype (KP) algorithm is well known for handling mixed-type data. • However, it is sensitive to initialization and easily converges to a local optimum.
Objectives In this study, KP is applied as a local search strategy and runs inside a global search, which helps KP overcome its flaws.
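Methodology • K-prototype Algorithm • Step 1. Initialize the cluster prototypes. • Step 2. For each data item, calculate its distance to every prototype and assign it to the nearest one. • Step 3. Retest every data item against the current prototypes. • Step 4. Repeat Step 3 until no item changes its cluster.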
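A minimal sketch of the K-prototype steps above, written in Python. It assumes the usual mixed dissimilarity (squared Euclidean distance on the numeric attributes plus a weight times the number of categorical mismatches), integer-coded categories, and initial prototypes supplied by the caller. The function name, the `weight` argument, and the batch update (all items are reassigned before the prototypes are recomputed) are illustrative choices, not details taken from the slides.

```python
import numpy as np

def k_prototype(X_num, X_cat, proto_num, proto_cat, weight=1.0, max_iter=100):
    """Batch K-prototype sketch for integer-coded categorical attributes."""
    X_num = np.asarray(X_num, dtype=float)
    X_cat = np.asarray(X_cat)
    proto_num = np.array(proto_num, dtype=float)  # copies, so inputs stay untouched
    proto_cat = np.array(proto_cat)
    k = len(proto_num)
    labels = np.full(len(X_num), -1)
    for _ in range(max_iter):
        # Distance of every item to every prototype: numeric part + weighted mismatches.
        d_num = ((X_num[:, None, :] - proto_num[None, :, :]) ** 2).sum(axis=2)
        d_cat = (X_cat[:, None, :] != proto_cat[None, :, :]).sum(axis=2)
        new_labels = (d_num + weight * d_cat).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # no item changed its cluster
        labels = new_labels
        for j in range(k):
            members = labels == j
            if not members.any():
                continue  # keep the old prototype for an empty cluster
            proto_num[j] = X_num[members].mean(axis=0)      # mean for numeric attributes
            for a in range(X_cat.shape[1]):                  # mode for categorical attributes
                values, counts = np.unique(X_cat[members, a], return_counts=True)
                proto_cat[j, a] = values[counts.argmax()]
    return labels, proto_num, proto_cat
```

Because the result depends entirely on the prototypes passed in, this sketch also shows why KP is sensitive to initialization.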
Methodology • Evolutionary k-prototype (EKP) • Step 1. Initialization. • Step 2. Crossover. • Step 3. Mutation. • Step 4. KP Search. • Step 5. Evaluation and Selection. • Step 6. Termination Test.
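The six steps can be read as a generic evolutionary loop with a local search inside it. The Python skeleton below is only a sketch under stated assumptions: the operators are passed in as hypothetical callables, selection keeps the best individuals among parents and offspring, and termination is a fixed number of generations. The slides do not specify the selection or termination rules, so these are placeholders.

```python
import random

def evolve(init_population, crossover, mutate, local_search, cost,
           pc, pm, max_generations, rng=None):
    """Generic evolutionary loop with a local search step (all operators are callables)."""
    rng = rng or random.Random()
    population = list(init_population)                      # Step 1: initialization
    size = len(population)
    for _ in range(max_generations):                        # Step 6: fixed budget (assumed)
        shuffled = population[:]
        rng.shuffle(shuffled)
        offspring = []
        for a, b in zip(shuffled[0::2], shuffled[1::2]):
            if rng.random() < pc:                           # Step 2: crossover
                a, b = crossover(a, b, rng)
            offspring.extend([a, b])
        offspring = [mutate(c, rng) if rng.random() < pm else c
                     for c in offspring]                    # Step 3: mutation
        offspring = [local_search(c) for c in offspring]    # Step 4: KP search
        # Step 5: evaluate and keep the best `size` individuals (assumed elitist rule).
        population = sorted(population + offspring, key=cost)[:size]
    return min(population, key=cost)
```

In an EKP-style setting each individual would be a set of K prototypes, `local_search` would run KP from those prototypes, and `cost` would be the clustering objective.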
Methodology • Initialization • Eight parameters have to be set before evolution: • Cluster number • Weight r, which balances the influence of the numeric and categorical attributes on clustering • Population size • Proportion of initial individuals generated by randomly choosing items from the dataset (IP) • Crossover probability • Mutation probability • Distribution index in simulated binary crossover (SBX) • Distribution index n in polynomial mutation
Methodology • Initialization • Two random initialization schemes are used. • The first randomly chooses K data items as the cluster prototypes. • The second randomly generates K prototypes. • Ex: given numeric ranges [2.23,5.63], [6.56,5.13] and categorical domains {1,2,3,4,5,6}, {2,4} => a generated prototype {3.21,6.23,2,4}
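The two schemes can be sketched in Python as follows. This assumes NumPy arrays with integer-coded categories, and it reads the example as per-attribute numeric ranges and categorical domains; the function names are hypothetical, and the second numeric range is written with its bounds in ascending order.

```python
import numpy as np

def init_from_items(X_num, X_cat, k, rng):
    """Scheme 1: pick k distinct data items and use them as prototypes."""
    idx = rng.choice(len(X_num), size=k, replace=False)
    return X_num[idx].copy(), X_cat[idx].copy()

def init_from_ranges(num_ranges, cat_domains, k, rng):
    """Scheme 2: draw k prototypes from attribute ranges and category domains."""
    lows = np.array([lo for lo, _ in num_ranges], dtype=float)
    highs = np.array([hi for _, hi in num_ranges], dtype=float)
    proto_num = rng.uniform(lows, highs, size=(k, len(num_ranges)))
    proto_cat = np.column_stack(
        [rng.choice(np.asarray(domain), size=k) for domain in cat_domains]
    )
    return proto_num, proto_cat

rng = np.random.default_rng(0)
# Ranges and domains taken from the slide example.
proto_num, proto_cat = init_from_ranges(
    [(2.23, 5.63), (5.13, 6.56)], [[1, 2, 3, 4, 5, 6], [2, 4]], k=3, rng=rng
)
```

Under this reading, the IP parameter would decide what share of the initial population is built with the first scheme and what share with the second.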
Methodology • Crossover • Numerical type: simulated binary crossover (SBX) • Categorical type: single-point crossover
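A sketch of the two crossover operators in Python, using the standard textbook form of SBX with distribution index `eta` and a plain single-point tail swap. How the slides' method lays out the numeric and categorical genes of an individual is not stated, so the flat vectors used here are an assumption.

```python
import numpy as np

def sbx(p1, p2, eta, rng):
    """Simulated binary crossover on numeric genes (standard formulation)."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    u = rng.random(p1.shape)
    # Spread factor beta: values near 1 keep children close to the parents.
    beta = np.where(u <= 0.5,
                    (2.0 * u) ** (1.0 / (eta + 1.0)),
                    (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0)))
    c1 = 0.5 * ((1.0 + beta) * p1 + (1.0 - beta) * p2)
    c2 = 0.5 * ((1.0 - beta) * p1 + (1.0 + beta) * p2)
    return c1, c2

def single_point_crossover(c1, c2, rng):
    """Swap the tails of two categorical gene vectors at one random cut point."""
    c1 = np.asarray(c1)
    c2 = np.asarray(c2)
    n = len(c1)
    if n < 2:
        return c1.copy(), c2.copy()  # nothing to swap
    point = int(rng.integers(1, n))  # cut point in 1..n-1
    child1 = np.concatenate([c1[:point], c2[point:]])
    child2 = np.concatenate([c2[:point], c1[point:]])
    return child1, child2
```

A larger `eta` concentrates the SBX children near their parents, while a smaller one spreads them further apart.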
Methodology Mutation
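The content of this slide did not survive in the transcript. Since the parameter list names polynomial mutation, the Python sketch below shows the basic textbook form of that operator for numeric genes only. It assumes that the parameter written as n on the slides is the distribution index (called `eta` here) and that mutated values are clipped to the attribute bounds; mutation of categorical genes is not covered.

```python
import numpy as np

def polynomial_mutation(x, low, high, eta, rng):
    """Basic polynomial mutation on numeric genes, clipped to [low, high]."""
    x = np.asarray(x, dtype=float)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    u = rng.random(x.shape)
    # Perturbation delta lies in [-1, 1); larger eta gives smaller typical steps.
    delta = np.where(u < 0.5,
                     (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0,
                     1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0)))
    return np.clip(x + delta * (high - low), low, high)
```

In a full algorithm this would be applied to an individual with the mutation probability from the parameter list.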
Methodology • KP Search • Evaluation and Selection • Termination Test
Experiments Parameter setting
Experiments • Dataset
Conclusions • This paper proposes a novel unsupervised clustering algorithm for mixed-type data, named evolutionary k-prototype (EKP). • The experimental results show that the evolutionary framework markedly improves the original algorithm. • A version of EKP that can adjust the weight automatically still needs to be studied.
Comments • Drawback • This method requires too many parameters. • Application • Clustering