A cluster validity measure with a hybrid parameter search method for the support vector clustering algorithm. Presenter: Lin, Shu-Han. Authors: Jeen-Shing Wang, Jen-Chieh Chiang. PR (2008). Outline: Introduction of SVC, Motivation, Objective, Methodology, Experiments
A cluster validity measure with a hybrid parameter search method for the support vector clustering algorithm • Presenter: Lin, Shu-Han • Authors: Jeen-Shing Wang, Jen-Chieh Chiang • PR (2008)
Outline • Introduction of SVC • Motivation • Objective • Methodology • Experiments • Conclusion • Comments
SVC • SVC is derived from SVMs • SVMs are a supervised classification technique • Fast convergence • Good generalization performance • Robustness to noise • SVC is an unsupervised approach • Data points are mapped to a high-dimensional feature space using a Gaussian kernel. • Look for the smallest sphere that encloses the data. • Map the sphere back to data space to form a set of contours. • Contours are treated as the cluster boundaries.
SVC - Sphere Analysis • Find the minimal enclosing sphere (center a) with a soft margin • Solve this problem through its Lagrangian function
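The equations on this slide did not survive extraction. As a reference, the standard SVC formulation (Ben-Hur et al.) is sketched below. It may differ in notation from what the slide showed. The symbols R (sphere radius), Φ (feature map), ξ_j (slack variables), and β_j, μ_j (Lagrange multipliers) are assumptions taken from that standard formulation, not from the slide text.

```latex
% Soft-margin minimal enclosing sphere (primal problem)
\[
\min_{R,\,a,\,\xi}\; R^2 + C \sum_j \xi_j
\quad \text{s.t.} \quad
\|\Phi(x_j) - a\|^2 \le R^2 + \xi_j, \qquad \xi_j \ge 0 .
\]

% Lagrangian with multipliers \beta_j \ge 0 and \mu_j \ge 0
\[
L = R^2 - \sum_j \bigl(R^2 + \xi_j - \|\Phi(x_j) - a\|^2\bigr)\beta_j
    - \sum_j \xi_j \mu_j + C \sum_j \xi_j .
\]
```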
SVC - Sphere Analysis • Karush-Kuhn-Tucker complementarity conditions
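The conditions themselves are missing from the extracted slide. In the standard SVC formulation they take the form below, where R is the sphere radius, a its center, Φ the feature map, ξ_j the slack variables, and β_j, μ_j the Lagrange multipliers (all notation assumed from that formulation rather than read from the slide).

```latex
% KKT complementarity conditions for the soft-margin sphere problem
\[
\xi_j \, \mu_j = 0,
\qquad
\bigl(R^2 + \xi_j - \|\Phi(x_j) - a\|^2\bigr)\,\beta_j = 0 .
\]
```

Read together, these say that a point with ξ_j > 0 lies outside the sphere and has μ_j = 0, while a point strictly inside the sphere has β_j = 0.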
SVC - Sphere Analysis • Wolfe dual optimization problem for the minimal enclosing sphere with a soft margin • Bounded SVs are the outliers • C: controls whether outliers are allowed
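The dual problem is not legible in the extracted slide. For orientation, the Wolfe dual of the standard SVC formulation is given below, with K(x_i, x_j) = Φ(x_i)·Φ(x_j) and β_j the Lagrange multipliers (notation assumed from the standard formulation).

```latex
% Wolfe dual: maximize over the multipliers \beta_j only
\[
\max_{\beta}\; W = \sum_j \beta_j K(x_j, x_j) - \sum_{i,j} \beta_i \beta_j K(x_i, x_j)
\quad \text{s.t.} \quad
0 \le \beta_j \le C, \qquad \sum_j \beta_j = 1 .
\]
```

In this form, points with 0 < β_j < C are support vectors on the sphere surface, and points with β_j = C are bounded support vectors lying outside it. Because the β_j sum to 1, no bounded support vector can exist when C ≥ 1, which is how C governs whether outliers are allowed.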
SVC - Sphere Analysis • Mercer kernel: Gaussian function • Distance (similarity) between x and the sphere center a • q: governs the number of clusters and the smoothness/tightness of the cluster boundaries
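The kernel and distance formulas are missing from the extracted slide. The sketch below uses the standard SVC definitions, K(x, y) = exp(-q·||x - y||²) and R²(x) = K(x, x) - 2·Σ_j β_j K(x_j, x) + Σ_{i,j} β_i β_j K(x_i, x_j). The function names and the idea of passing the multipliers β in as a plain list are illustrative assumptions, not the authors' code.

```python
import math


def gaussian_kernel(x, y, q):
    """Gaussian kernel K(x, y) = exp(-q * ||x - y||^2); q is the width parameter."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-q * sq_dist)


def squared_distance_to_center(x, points, beta, q):
    """Squared feature-space distance from the image of x to the sphere center.

    points: training points x_j; beta: their multipliers (assumed to sum to 1).
    """
    # Cross term: sum_j beta_j * K(x_j, x)
    cross = sum(b * gaussian_kernel(p, x, q) for p, b in zip(points, beta))
    # Constant term: sum_ij beta_i * beta_j * K(x_i, x_j)
    const = sum(
        bi * bj * gaussian_kernel(pi, pj, q)
        for pi, bi in zip(points, beta)
        for pj, bj in zip(points, beta)
    )
    return gaussian_kernel(x, x, q) - 2.0 * cross + const
```

A larger q makes the kernel decay faster with distance, which is why the contours tighten and split into more clusters as q grows.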
Motivation • Drawbacks of cluster validation • Compactness • Different densities or sizes • As the # of clusters increases, it decreases monotonically • Separation • Irregular cluster structures
Motivation • Their previous study • Can handle • Different sizes • Different densities • Arbitrary shape • But…
Objectives – A cluster validity method and a parameter search algorithm for SVC • Automatically determine the two parameters: • Increasing q leads to an increasing # of clusters • C regulates the existence of outliers and overlapping clusters • To identify the optimal structure
Methodology - Idea • N = 64, max # of clusters = 8 • q is related to the densities of the clusters • Each cluster structure corresponds to an interval of q • Identifying the optimal structure is equivalent to finding the largest interval
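The "largest interval" idea can be illustrated with a small sketch. It assumes q has already been sampled in ascending order and that the number of clusters stands in for the cluster structure at each q. Both are simplifications made here for illustration, and `largest_stable_interval` is a hypothetical helper, not the paper's procedure.

```python
def largest_stable_interval(samples):
    """Find the widest q-interval over which the cluster count stays constant.

    samples: list of (q, n_clusters) pairs sorted by q in ascending order.
    Returns (n_clusters, q_start, q_end), or None for an empty list.
    """
    best = None
    start = 0  # index where the current run of equal cluster counts begins
    for i in range(1, len(samples) + 1):
        run_ended = i == len(samples) or samples[i][1] != samples[start][1]
        if run_ended:
            q_start, q_end = samples[start][0], samples[i - 1][0]
            if best is None or (q_end - q_start) > (best[2] - best[1]):
                best = (samples[start][1], q_start, q_end)
            start = i
    return best


# Toy data: the 2-cluster structure persists over the widest range of q.
toy = [(0.1, 1), (0.2, 1), (0.5, 2), (1.0, 2), (2.0, 2), (2.5, 3), (3.0, 4)]
result = largest_stable_interval(toy)
```

On the toy data, `result` is the run with two clusters spanning q = 0.5 to q = 2.0.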
Methodology - Problem • How to locate the overall search range of q • How to detect outliers/noise • How to identify the largest interval
Methodology – Locate range of q • Lower bound • Upper bound: employ K-Means to get clusters, and get the variance v_i of each cluster • Ascending order: cluster size • n = 3, the variances of the 3 biggest clusters
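The bound formulas are not recoverable from the slide, so the sketch below covers only the bookkeeping it mentions: computing a variance per cluster and picking the variances of the n biggest clusters. It assumes K-Means labels are already available, and it defines variance as the mean squared distance to the cluster centroid. That definition and both function names are assumptions for illustration.

```python
from collections import defaultdict


def cluster_variances(points, labels):
    """Return {label: (size, variance)} with variance = mean squared distance to centroid."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    stats = {}
    for label, members in groups.items():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        variance = sum(
            sum((m[d] - centroid[d]) ** 2 for d in range(dim)) for m in members
        ) / len(members)
        stats[label] = (len(members), variance)
    return stats


def variances_of_largest(stats, n=3):
    """Sort clusters by size in ascending order and keep the variances of the n biggest."""
    ordered = sorted(stats.values(), key=lambda size_var: size_var[0])
    return [variance for _, variance in ordered[-n:]]
```

How these variances are combined into the upper bound of q is left open here, since the slide's formula is missing.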
Methodology – Outlier Detection • Singleton • Outlier • We get C_opt and remove these outliers • Set q = q_max, the tightest value of q
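One plausible reading of the slide is that, at the tightest setting of q, points that end up alone in their own cluster are flagged as outliers. The sketch below implements only that reading. It takes cluster labels as given, and `singleton_outliers` is a hypothetical helper that does not reproduce how the paper derives C_opt.

```python
from collections import Counter


def singleton_outliers(labels):
    """Return the indices of points whose cluster contains no other point."""
    counts = Counter(labels)
    return [i for i, label in enumerate(labels) if counts[label] == 1]


# Points 2 and 5 sit alone in clusters 1 and 3, so they are flagged.
flagged = singleton_outliers([0, 0, 1, 2, 2, 3])
```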
Methodology – the largest interval • Fibonacci search: locate the interval where the cluster structure is the same • Bisection search • n: iteration
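The slide does not spell out either search, so only the bisection step is sketched here, under stated assumptions: `count_clusters` is a hypothetical callback that runs the clustering at a given q and returns the number of clusters, and the structure changes exactly once between `q_lo` and `q_hi`. The argument `n_iter` plays the role of the slide's n.

```python
def bisect_boundary(count_clusters, q_lo, q_hi, n_iter=20):
    """Bracket the q value at which the cluster count first differs from that at q_lo.

    Assumes count_clusters(q_lo) != count_clusters(q_hi) and a single change
    point in [q_lo, q_hi]. Returns (lo, hi) with the change point inside.
    """
    target = count_clusters(q_lo)
    lo, hi = q_lo, q_hi
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if count_clusters(mid) == target:
            lo = mid  # structure unchanged at mid: boundary lies to the right
        else:
            hi = mid  # structure changed at mid: boundary lies to the left
    return lo, hi


# Toy stand-in for a clustering run: one cluster below q = 0.7, two above.
lo, hi = bisect_boundary(lambda q: 1 if q < 0.7 else 2, 0.0, 2.0, n_iter=20)
```

Each iteration halves the bracket, so n iterations shrink an interval of width w to w / 2^n, at the cost of one clustering run per iteration.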
Methodology – Overview • Locate range of q • The largest interval • Outlier detection
Experiments - Outlier • C_opt
Experiments ?
Conclusions • A new measure: • Inspired by the observations of q • Determines the optimal cluster structure with its corresponding range of q and C
Comments • Advantage • Inspired by observation of the parameters • Drawback • … • Application • SVC • DBSCAN: MinPts/Eps