This presentation discusses an advanced Prototype-Based Clustering (PBC) algorithm built on an Enhanced Neural Gas (ENG) network framework, addressing several challenges of traditional PBC methods: sensitivity to initialization, sensitivity to input sequence ordering, and the adverse influence of outliers. It explains the methodology, including the use of the Minimum Description Length (MDL) principle as a performance measure, and presents experimental results showing superior performance over nine existing algorithms. The proposed method effectively handles data containing both compact and sparse clusters, showcasing its potential for clustering and classification applications.
Enhanced neural gas network for prototype-based clustering
Presenter: Shao-Wei Cheng
Authors: A.K. Qin, P.N. Suganthan
Pattern Recognition, 2005
Outline • Motivation • Objective • Methodology • Experiments and Results • Conclusion • Personal Comments
Motivation
• Traditional PBC and NG algorithms suffer from several problems:
• Sensitivity to initialization.
• Sensitivity to input sequence ordering.
• Adverse influence from outliers.
Objectives
• Present an improved PBC algorithm based on an enhanced NG network framework, called ENG.
• Tackle the PBC problems listed above.
Methodology – original
• PBC algorithms: k-means, fuzzy k-means.
• NG network algorithm: a single-layered neural network with
• faster convergence to low distortion errors,
• lower distortion error than other methods,
• updates obeying a stochastic gradient descent.
• The original NG algorithm.
• The original NG algorithm with the concept of fuzzy membership.
(Figure: an input v surrounded by prototypes numbered 0–4 by distance rank; image omitted.)
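For reference, the classical NG update rule (standard in the literature; the variable names below are illustrative, not taken from the slide) moves every prototype toward the input, weighted by its distance rank. A minimal sketch:

```python
import numpy as np

def ng_update(prototypes, v, eps, lam):
    """One step of the original Neural Gas update: every prototype
    moves toward the input v, weighted by its distance rank.
    eps is the learning rate, lam the neighborhood range."""
    # Rank prototypes by distance to the input (rank 0 = closest).
    dists = np.linalg.norm(prototypes - v, axis=1)
    ranks = np.argsort(np.argsort(dists))
    # Rank-based neighborhood function h(k) = exp(-k / lambda).
    h = np.exp(-ranks / lam)
    prototypes += eps * h[:, None] * (v - prototypes)
    return prototypes
```

Because the neighborhood weighting depends on distance ranks rather than a fixed lattice (as in SOM), NG adapts all prototypes on every step, which is what yields the faster convergence noted above.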
Methodology – enhanced
• Enhanced NG network framework:
• Eq. (3) models the influence of outliers, extending Eq. (1).
• Eq. (4) is the new update formula, derived from Eq. (3).
• Eqs. (5), (6), (7) define the parameters used in Eq. (4).
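The slide does not reproduce Eqs. (3)–(7). Purely as an illustration of the general idea of outlier suppression, here is a hypothetical weighting that attenuates updates from points far from every prototype; this is a sketch of the concept, not the paper's exact formula:

```python
def eng_update(prototypes, v, eps, lam, beta):
    """Sketch of an outlier-attenuated NG step. The influence factor
    below is hypothetical: updates shrink when v lies far from every
    prototype, so outliers barely drag the prototypes (cf. Eq. (3))."""
    dists = np.linalg.norm(prototypes - v, axis=1)
    ranks = np.argsort(np.argsort(dists))
    h = np.exp(-ranks / lam)
    # Hypothetical outlier factor: decays with distance to the winner.
    influence = np.exp(-dists.min() ** 2 / beta)
    prototypes += eps * influence * h[:, None] * (v - prototypes)
    return prototypes
```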
Methodology – MDL framework
• The MDL principle is employed as the performance measure.
• The original MDL criterion.
• The MDL criterion as adapted in this approach (formula shown as an image on the slide; omitted here).
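For reference, the generic two-part MDL criterion selects the hypothesis H (here, a prototype set) that minimizes the total code length:

```latex
L(H, D) = L(H) + L(D \mid H)
```

where L(H) measures the cost of describing the model and L(D | H) the cost of encoding the data given that model; the paper adapts this trade-off to score prototype placements, which is why it can penalize both overly many prototypes and poorly fitting ones.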
Methodology – processes
• Initialization:
• Randomly initialize the c synaptic weights W = {w1, w2, . . . , wc}.
• βm is the middle value of β, used to control the acceleration of β's change.
• κ and η are the parameters used to calculate the MDL value.
• Initial training epoch number: m = 0; initial iteration step within epoch m: t = 1.
• Total iteration step number: iter = m · N + t.
• The maximum number of training epochs is Max_epoch; dislocated prototypes are relocated at epoch RP_epoch.
• The training dataset is V = {v1, v2, . . . , vN}.
• Training flowchart:
• (a) While m < Max_epoch:
• (b) While the training set is not empty, draw a datum from it and compute the update.
• (c) If the training stage is at RP_epoch:
• (d) For j = 1 to size(V), identify dislocated prototypes.
• (e) For j = 1 to size(Torelocate), tentatively relocate (change) the prototype.
• (f) If the current utifactor value < the previous utifactor value, keep the change; otherwise restore the prototype.
• Increment the training epoch; end when m reaches Max_epoch.
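A minimal skeleton of the flowchart above, using the slide's names (Max_epoch, RP_epoch, iter = m · N + t); update_fn, utifactor_fn, and relocate_fn are placeholders for the paper's equations, not actual functions from the paper:

```python
import numpy as np

def train_eng(V, prototypes, Max_epoch, RP_epoch,
              update_fn, utifactor_fn, relocate_fn):
    """Skeleton of the ENG training process from the flowchart.
    update_fn performs one prototype update; utifactor_fn scores the
    current placement (e.g., via MDL); relocate_fn proposes moving
    dislocated prototypes. All three are hypothetical placeholders."""
    N = len(V)
    for m in range(Max_epoch):                               # (a)
        for t, v in enumerate(np.random.permutation(V), 1):  # (b) draw data
            it = m * N + t                                   # total iteration step
            prototypes = update_fn(prototypes, v, it)
        if m == RP_epoch:                                    # (c) relocation stage
            prev = utifactor_fn(prototypes, V)
            candidate = relocate_fn(prototypes, V)           # (d)-(e) tentative change
            if utifactor_fn(candidate, V) < prev:            # (f) keep improvements
                prototypes = candidate
            # otherwise restore: keep the old prototypes unchanged
    return prototypes
```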
Experiments
• Compared against 9 algorithms: HCM, FCM, NG, FPCM, CFCM-F1, CFCM-F2, HRC-FRC, AHCM, and AFCM.
• Datasets:
• Artificial: D1, D2.
• UCI datasets.
• Each clustering algorithm is run 10 times.
• Parameter settings:
• εi = 0.8, εf = 0.05; λi = 10, λf = 0.01; βi = 50, βm = 10, βf = 0.01
• κ = 2, η = 1e−4
• Max_epoch = 10, RP_epoch = 5.
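The εi/εf and λi/λf pairs above are initial and final values. Standard NG anneals such parameters exponentially over the run, and the paper presumably does something similar; the schedule below is an assumption for illustration, not quoted from the slide:

```python
def anneal(p_init, p_final, t, t_max):
    """Standard NG exponential decay from p_init to p_final over t_max steps."""
    return p_init * (p_final / p_init) ** (t / t_max)

# Example with the slide's settings: eps decays from 0.8 to 0.05.
# eps_t = anneal(0.8, 0.05, t, Max_epoch * N)
```

Note that β is given three values (βi, βm, βf): per the process slide, the middle value βm controls the acceleration of β's change, so β's trajectory is shaped differently from this simple two-point decay.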
Conclusion
• Tackles several problems of PBC:
• sensitivity to initialization,
• sensitivity to input sequence ordering,
• adverse influence from outliers.
• Experimental results show the superior performance of the proposed method over several existing PBC algorithms.
• The MDL framework can handle compact clusters and sparse clusters even when both exist simultaneously.
Personal Comments
• Advantage: a heuristic way to tackle the outlier problem.
• Drawback
• Applications: clustering, classification.