This paper presents a methodology for automatically estimating the number of clusters in unlabeled datasets, focusing on innovative approaches such as reordered dissimilarity images (RDI) and distortion-based extraction (DBE). It describes experiments conducted on synthetic and real datasets to validate the performance of the proposed methods in comparison to existing techniques, highlighting advantages such as ease of parameter setting and robust clustering outcomes. The findings suggest a preference for larger clusters and propose combining cluster analysis with image processing techniques for improved results.
Automatically Determining the Number of Clusters in Unlabeled Data Sets Presenter: Lin, Shu-Han Authors: Liang Wang, Christopher Leckie, Kotagiri Ramamohanarao, and James Bezdek IEEE Transactions on Knowledge and Data Engineering (TKDE), 2009
Outline • Motivation • Objective • Methodology • Experiments • Conclusion • Comments
Motivation • “Reordered dissimilarity image” (RDI) • How to automatically estimate the number of clusters in an unlabeled data set?
Objectives • Extract dark blocks
Methodology – VAT
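The VAT reordering that produces the RDI can be sketched as follows. This is a minimal pure-Python sketch, assuming the usual Prim-like ordering described for VAT; the names `vat_order` and `reorder` and the toy data are illustrative and do not come from the slides.

```python
def vat_order(D):
    """Return a VAT-style ordering of the objects of a square dissimilarity matrix D."""
    n = len(D)
    # Start from an object that takes part in the largest dissimilarity.
    first = max(range(n), key=lambda p: max(D[p]))
    order = [first]
    remaining = set(range(n)) - {first}
    # near[q] = smallest dissimilarity between q and any object already ordered.
    near = {q: D[first][q] for q in remaining}
    while remaining:
        # Append the unordered object closest to the ordered set.
        nxt = min(remaining, key=lambda q: (near[q], q))
        order.append(nxt)
        remaining.remove(nxt)
        for q in remaining:
            near[q] = min(near[q], D[nxt][q])
    return order


def reorder(D, order):
    """Apply the ordering to rows and columns to obtain the reordered matrix."""
    return [[D[a][b] for b in order] for a in order]


# Toy data: two interleaved 1-D clusters, around 0 and around 10.
points = [0.0, 10.0, 0.5, 10.5, 1.0, 11.0]
D = [[abs(p - q) for q in points] for p in points]
order = vat_order(D)
rdi = reorder(D, order)
```

After reordering, objects of the same cluster sit next to each other, so small dissimilarities gather in blocks along the diagonal; displayed as a gray-level image, these are the dark blocks.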
Methodology – DBE • Four steps (1 to 4)
Methodology – DBE 1. Dissimilarity transformation and image segmentation • f(t) • graythresh function (Matlab): σ • Figure: before, after
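Step 1 can be illustrated with a small sketch. The slide names only f(t), σ, and Matlab's graythresh (which implements Otsu's method), so the details here are assumptions: the form f(t) = 1 - exp(-t/σ), taking σ as the Otsu threshold of the scaled dissimilarities, and all function names are chosen for illustration.

```python
import math


def otsu_threshold(values, bins=256):
    """Otsu's threshold for values in [0, 1] (the criterion behind graythresh)."""
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1
    total = len(values)
    sum_all = sum(k * h for k, h in enumerate(hist))
    best_var, best_ks = -1.0, []
    w0 = sum0 = 0
    for k in range(bins - 1):
        w0 += hist[k]
        sum0 += k * hist[k]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        # Between-class variance (up to a constant factor) for a split after bin k.
        between = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if between > best_var:
            best_var, best_ks = between, [k]
        elif between == best_var:
            best_ks.append(k)
    if not best_ks:  # every value fell into a single bin
        return 0.5
    # Tied splits are averaged; the threshold is the upper edge of that bin.
    return (sum(best_ks) / len(best_ks) + 1) / bins


def transform_and_segment(D):
    """Return (sigma, binary); binary[i][j] == 1 marks a dark (low-dissimilarity) pixel."""
    d_max = max(max(row) for row in D) or 1.0
    scaled = [[d / d_max for d in row] for row in D]
    sigma = otsu_threshold([d for row in scaled for d in row])
    # ASSUMPTION: f(t) = 1 - exp(-t / sigma); the slide only writes "f(t)".
    image = [[1.0 - math.exp(-d / sigma) for d in row] for row in scaled]
    thresh = otsu_threshold([d for row in image for d in row])
    binary = [[1 if d < thresh else 0 for d in row] for row in image]
    return sigma, binary


# Toy reordered dissimilarity matrix: two clusters, {0, 1} and {10, 11}.
points = [0.0, 1.0, 10.0, 11.0]
D = [[abs(p - q) for q in points] for p in points]
sigma, binary = transform_and_segment(D)
```

On this toy input the binary image contains two 2x2 blocks of ones on the diagonal, one per cluster.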
Methodology – DBE 2. Directional morphological filtering of the binary image • a = 2%, a = 1% • Symmetric: along horizontal and vertical directions • Linear: along the same direction
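The directional filtering can be sketched as a morphological opening with a linear structuring element, applied first along rows and then along columns. This is an illustrative reading of the slide: the element length `round(a * n)`, the row-then-column order, and the function names are assumptions. It relies on the fact that a 1-D opening with a line of length L keeps exactly the runs of ones that are at least L long.

```python
def open_1d(row, length):
    """1-D opening with a line of `length` pixels: keep runs of ones >= length."""
    out = [0] * len(row)
    run = 0
    for i, v in enumerate(row):
        run = run + 1 if v else 0
        if run >= length:
            # The last `length` pixels belong to a long enough run.
            for k in range(i - length + 1, i + 1):
                out[k] = 1
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def directional_open(img, a):
    """Open a square binary image horizontally, then vertically.

    ASSUMPTION: the structuring element length is a fraction `a` of the image size.
    """
    length = max(1, round(a * len(img)))
    horizontal = [open_1d(row, length) for row in img]
    return transpose([open_1d(col, length) for col in transpose(horizontal)])


# Two 3x3 blocks plus one isolated noise pixel at row 0, column 4.
noisy = [
    [1, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
filtered = directional_open(noisy, a=0.5)  # large a only because the image is tiny
```

The slides use a = 1% or 2% on full-size images; the value 0.5 here only makes the effect visible on a 6x6 example.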
Methodology – DBE 3. Distance transform and diagonal projection of filtered image • Nearest non-zero pixel
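A brute-force sketch of this step is given here. Two assumptions are made: block pixels are encoded as 1 and the distance is measured to the nearest background pixel (equivalent to "nearest non-zero pixel" when dark blocks are stored as 0), and the projection onto the main diagonal is taken as the sum along each anti-diagonal i + j = const. The function names are illustrative.

```python
import math


def distance_transform(mask):
    """Euclidean distance from each block pixel (1) to the nearest background pixel (0).

    Brute force, intended for small illustrative images only.
    """
    n = len(mask)
    background = [(i, j) for i in range(n) for j in range(n) if not mask[i][j]]
    if not background:
        raise ValueError("mask needs at least one background pixel")
    return [
        [
            min(math.hypot(i - p, j - q) for p, q in background) if mask[i][j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def diagonal_projection(img):
    """Sum pixel values along each anti-diagonal (ASSUMED meaning of the projection)."""
    n = len(img)
    proj = [0.0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            proj[i + j] += img[i][j]
    return proj


# Filtered binary image with two 3x3 diagonal blocks.
mask = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
projection = diagonal_projection(distance_transform(mask))
```

Each diagonal block contributes one hump to the projection signal, and the gap between blocks shows up as a valley.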
Methodology – DBE 4. Detection of major peaks and valleys in the projection signal • Smooth (parameter: a) • Major “peaks/valleys” (parameter: a)
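One way to picture this last step is the sketch below. The slide states only that smoothing and the selection of major peaks both depend on the parameter a; the moving-average filter, the minimum-separation rule for "major" peaks, and the window size `round(a * len(signal))` are assumptions made for illustration, as are the function names.

```python
def moving_average(signal, width):
    """Centered moving average; the window shrinks at the borders."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def local_maxima(signal):
    """Indices of local maxima; a flat plateau yields one index at its middle."""
    peaks = []
    n = len(signal)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and signal[j + 1] == signal[i]:
            j += 1
        rises = i == 0 or signal[i - 1] < signal[i]
        falls = j == n - 1 or signal[j + 1] < signal[i]
        if rises and falls:
            peaks.append((i + j) // 2)
        i = j + 1
    return peaks


def major_peaks(signal, min_sep):
    """Keep the highest peaks that are at least `min_sep` samples apart."""
    kept = []
    for i in sorted(local_maxima(signal), key=lambda k: -signal[k]):
        if all(abs(i - j) >= min_sep for j in kept):
            kept.append(i)
    return sorted(kept)


def estimate_num_clusters(projection, a):
    """ASSUMPTION: window and minimum separation are both round(a * len(projection))."""
    width = max(1, round(a * len(projection)))
    smooth = moving_average(projection, width)
    return len(major_peaks(smooth, min_sep=width))


# Projection signal with two humps separated by a valley.
projection = [3.0, 4.0, 4.0, 2.0, 1.0, 0.0, 1.0, 2.0, 4.0, 4.0, 3.0]
k = estimate_num_clusters(projection, a=0.3)
```

The number of major peaks is taken as the estimated number of clusters, which is 2 for this toy signal.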
Experiments – Compare with CCE • Synthetic data sets • Real data sets
Conclusions • Most methods prefer “larger” rather than “smaller” clusters • The DBE • (Nearly) automatically estimates the number of clusters • Has just one easy-to-set parameter: a
Comments • Advantage • A visual assessment of cluster tendency (VAT) • Combines the cluster analysis problem with image processing techniques • Drawback • … • Application • …