This research project explores a novel clustering paradigm to discover various non-redundant clustering solutions in data. The proposed Orthogonal Clustering Framework uses orthogonal subspaces and feature spaces to find multiple meaningful groupings. By combining methods like linear discriminant analysis and singular value decomposition, this approach automates the identification of the number of clusters and ensures non-redundancy in the results. Experimental evaluations on synthetic and real-world datasets demonstrate the effectiveness of the framework in discovering diverse and interesting clustering solutions. The methodology allows for flexibility in applying different clustering and dimensionality reduction algorithms. Overall, this framework offers a valuable tool for data clustering with applications in various domains.
Learning Multiple Nonredundant Clusterings
Presenter: Wei-Hao Huang
Authors: Ying Cui, Xiaoli Z. Fern, Jennifer G. Dy
TKDD, 2010
Outlines • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments
Motivation • Data often contain multiple groupings that are reasonable and interesting from different perspectives. • Traditional clustering is restricted to finding only a single clustering solution.
Objectives • To propose a new clustering paradigm for finding all non-redundant clustering solutions of the data.
Methodology • Orthogonal clustering (cluster space) • Clustering in orthogonal subspaces (feature space) • Automatically finding the number of clusters • Stopping criteria
Orthogonal Clustering Framework (illustrated on the Face dataset; diagram omitted from this summary)
Orthogonal clustering (Method 1) • After clustering X(t), each point is projected onto the subspace orthogonal to its cluster centroid, giving the residue space: x(t+1) = (I − μμᵀ/(μᵀμ)) x(t), where μ is the centroid of the cluster that x(t) belongs to.
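As a concrete illustration, the residue-space step above can be sketched in NumPy. The function name `residue_space` and the per-point rank-1 projection are my own framing of the slide's formula under the stated assumptions, not code from the paper:

```python
import numpy as np

def residue_space(X, labels, centroids):
    """Method 1 sketch (assumed form): project every point onto the
    subspace orthogonal to its own cluster centroid, producing the
    residue data that the next clustering run operates on."""
    X_res = np.array(X, dtype=float)
    for j, mu in enumerate(centroids):
        mask = labels == j
        P = np.outer(mu, mu) / (mu @ mu)  # rank-1 projector onto mu
        X_res[mask] = X_res[mask] - X_res[mask] @ P
    return X_res
```

After this step, every residue point is orthogonal to the centroid of the cluster it came from, so a second clustering run cannot rediscover the same structure.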
Clustering in orthogonal subspaces (Method 2) • Projection into a feature space: Y = AᵀX • The columns of A can come from linear discriminant analysis (LDA) or singular value decomposition (SVD) • LDA vs. SVD: LDA uses the cluster labels to find discriminative directions, whereas SVD captures the directions of largest variance without using labels.
Clustering in orthogonal subspaces (continued) • A(t) = eigenvectors (LDA or SVD directions) describing the clustering found at iteration t • Residue space: X(t+1) = (I − A(t)(A(t)ᵀA(t))⁻¹A(t)ᵀ) X(t).
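The subspace-removal formula can be sketched directly in NumPy. The name `subspace_residue` and the d×n (features-in-rows) data layout are my assumptions, chosen to match the Y = AᵀX convention on the slides:

```python
import numpy as np

def subspace_residue(X, A):
    """Method 2 sketch (assumed form): remove the subspace spanned by
    the columns of A (e.g. the top LDA/SVD directions of the current
    clustering) from the d x n data matrix X, yielding X(t+1)."""
    P = A @ np.linalg.inv(A.T @ A) @ A.T  # projector onto span(A)
    return (np.eye(X.shape[0]) - P) @ X
```

Every column of the result is orthogonal to span(A), so the structure already captured by the current clustering is unavailable to the next iteration.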
Comparing Method 1 and Method 2 • Taking A(t) to be the matrix of cluster centroids, if M′ = M then P1 = P2, i.e., the two residue projections coincide • Method 1 is therefore a special case of Method 2.
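A quick numerical sanity check of the special-case claim, under my assumption that for a single cluster Method 1's projector is μμᵀ/(μᵀμ) and Method 2's is A(AᵀA)⁻¹Aᵀ with A = μ:

```python
import numpy as np

# With A taken as one cluster centroid mu, the general projector
# A (A^T A)^{-1} A^T reduces to the rank-1 projector mu mu^T / (mu^T mu).
mu = np.array([[3.0], [4.0]])                  # centroid as a column vector
denom = (mu.T @ mu).item()
P1 = (mu @ mu.T) / denom                       # Method 1 rank-1 projector
P2 = mu @ np.linalg.inv(mu.T @ mu) @ mu.T      # Method 2 projector with A = mu
assert np.allclose(P1, P2)
```

The trace of either projector is 1, confirming that exactly one direction (the centroid direction) is removed.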
Experiments • PCA is used for dimensionality reduction • Clustering • K-means clustering: keep the solution with the smallest SSE • Gaussian mixture model (GMM) clustering: keep the solution with the largest likelihood • Datasets • Synthetic • Real-world: Face, WebKB text, Vowel phoneme, Digit
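The "keep the smallest-SSE run" selection rule can be sketched with a minimal k-means. Everything here (`kmeans`, `best_kmeans`, the farthest-point initialization) is my own illustrative stand-in, not the experimental code from the paper:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means: random first centroid, then farthest-point
    initialization for the rest, followed by Lloyd iterations."""
    rng = np.random.default_rng(seed)
    C = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in C], axis=0)
        C.append(X[d.argmax()])
    C = np.array(C, dtype=float)
    for _ in range(iters):
        lab = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        C = np.array([X[lab == j].mean(0) if np.any(lab == j) else C[j]
                      for j in range(k)])
    sse = ((X - C[lab]) ** 2).sum()
    return lab, C, sse

def best_kmeans(X, k, restarts=10):
    """Run several restarts and keep the smallest-SSE solution,
    mirroring the model-selection rule described on this slide."""
    return min((kmeans(X, k, seed=s) for s in range(restarts)),
               key=lambda r: r[2])
```

In practice one would use a library implementation (e.g. scikit-learn's `KMeans`, whose `n_init` parameter performs exactly this restart-and-keep-best loop).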
Experiments • Evaluation of the clusterings found on the synthetic data and on the Face, WebKB text, Vowel phoneme, and Digit datasets (result figures omitted from this summary).
Experiments • Finding the number of clusters • K-means: gap statistic (choose the k that maximizes the gap) • GMM: Bayesian information criterion (choose the k that maximizes BIC) • Stopping criteria • Stop when the residue SSE falls below 10% of the SSE at the first iteration • Stop when Kopt = 1 • If Kopt > Kmax, select Kmax
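The stopping logic above can be captured in a few lines. The function `next_step` and its return convention are my own reading of the slide, kept deliberately schematic:

```python
def next_step(k_opt, k_max, sse_t, sse_1):
    """Stopping-rule sketch (assumed reading of the slide): stop when
    the gap statistic / BIC picks a single cluster, or when the
    residue SSE drops below 10% of the first iteration's SSE; if
    k_opt exceeds k_max, continue with k_max clusters instead."""
    if k_opt == 1 or sse_t < 0.10 * sse_1:
        return "stop", k_opt
    return "continue", min(k_opt, k_max)
```

The SSE threshold reflects that once the residue carries little of the original variance, further clusterings would mostly fit noise.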
Experiments • Results of automatically finding the number of clusters on the synthetic, Face, and WebKB datasets (result figures omitted from this summary).
Conclusions • The framework discovers varied, interesting, and meaningful clustering solutions. • Method 2 can be combined with any clustering and dimensionality reduction algorithm.
Comments • Advantages • Finds multiple non-redundant clustering solutions • Applications • Data clustering