Unsupervised Feature Selection for Multi-Cluster Data

Deng Cai, Chiyuan Zhang, Xiaofei He, Zhejiang University


Presentation Transcript


  1. Unsupervised Feature Selection for Multi-Cluster Data Deng Cai, Chiyuan Zhang, Xiaofei He Zhejiang University

  2. Problem: High-dimension Data • Text document • Image • Video • Gene Expression • Financial • Sensor • …

  3. Problem: High-dimension Data • Text document • Image • Video • Gene Expression • Financial • Sensor • …

  4. Solution: Feature Selection Reduce the dimensionality by finding a relevant feature subset

  5. Feature Selection Techniques • Supervised • Fisher score • Information gain • Unsupervised (discussed here) • Max variance • Laplacian Score, NIPS 2005 • Q-alpha, JMLR 2005 • MCFS, KDD 2010 (Our Algorithm) • …
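As a point of reference for the simplest unsupervised baseline listed on this slide, maximum variance ranks each feature on its own and keeps the top d. The following is a minimal sketch; the function name `max_variance_select` and the toy data are illustrative assumptions, not part of the presentation.

```python
import numpy as np

def max_variance_select(X, d):
    """Return the indices of the d columns of X with the largest variance."""
    variances = X.var(axis=0)
    # argsort is ascending, so reverse it to rank features by decreasing variance
    return np.argsort(variances)[::-1][:d]

# Hypothetical toy data: 100 samples, 5 features with very different spreads
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * np.array([0.1, 5.0, 1.0, 10.0, 0.5])
selected = max_variance_select(X, 2)
```

Because each feature is scored independently, a ranking like this cannot tell whether the selected features jointly separate all clusters, which is the limitation the multi-cluster setting is concerned with.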

  6. Outline • Problem setting • Multi-Cluster Feature Selection (MCFS) Algorithm • Experimental Validation • Conclusion

  7. Problem setting • Unsupervised feature selection for multi-cluster/multi-class data • How traditional score-ranking methods fail:

  8. Multi-Cluster Feature Selection (MCFS) Algorithm • Objective • Select those features such that the multi-cluster structure of the data can be well preserved • Implementation • Spectral analysis to explore the intrinsic structure • L1-regularized least squares to select the best features

  9. Spectral Embedding for Cluster Analysis

  10. Spectral Embedding for Cluster Analysis • Laplacian Eigenmaps • Can unfold the data manifold and provide the flat embedding for data points • Can reflect the data distribution on each of the data clusters • Thoroughly studied and well understood
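The spectral embedding step can be sketched with plain NumPy as below. This is a hedged illustration under several assumptions not stated on the slides: a symmetric 0/1 weighting of the p-nearest-neighbor graph, the generalized problem L y = λ D y solved through the symmetrically normalized Laplacian, and the first (trivial) eigenvector skipped. The function names `knn_graph` and `spectral_embedding` are hypothetical.

```python
import numpy as np

def knn_graph(X, p):
    """Symmetric 0/1 adjacency matrix of the p-nearest-neighbor graph (assumed weighting)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                    # a point is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :p]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), p), nbrs.ravel()] = 1.0
    return np.maximum(W, W.T)                         # connect i and j if either picks the other

def spectral_embedding(W, K):
    """Solve L y = lambda D y and return K eigenvectors after the first one, with eigenvalues."""
    deg = W.sum(axis=1)
    s = 1.0 / np.sqrt(deg)
    # Normalized Laplacian D^{-1/2} (D - W) D^{-1/2}; its eigenvectors u map to y = D^{-1/2} u
    L_sym = np.eye(len(deg)) - s[:, None] * W * s[None, :]
    evals, U = np.linalg.eigh(L_sym)                  # eigenvalues in ascending order
    return s[:, None] * U[:, 1:K + 1], evals[1:K + 1]

# Hypothetical toy data
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
W = knn_graph(X, p=5)
Y, evals = spectral_embedding(W, K=2)
```

Each column of `Y` is one embedding coordinate for all data points. If the graph has several connected components, more than one eigenvalue is zero and the choice of which eigenvector to skip becomes arbitrary, so the indexing here should be treated as a simplification.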

  11. Learning Sparse Coefficient Vectors

  12. Feature Selection on Sparse Coefficient Vectors

  13. Algorithm Summary • Construct the p-nearest-neighbor graph W • Solve the generalized eigen-problem to get the K eigenvectors corresponding to the smallest eigenvalues • Solve K L1-regularized regression problems to get K sparse coefficient vectors • Compute the MCFS score for each feature • Select d features according to the MCFS score
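The five steps in the summary can be strung together as in the sketch below. It is an illustration under stated assumptions rather than the authors' implementation: the graph uses 0/1 weights, the first eigenvector is skipped, the L1-regularized regressions are solved by plain cyclic coordinate descent with a fixed penalty (`lam_frac` is a hypothetical heuristic parameter), data and targets are centered, and the score of feature j is taken to be the largest absolute coefficient it receives across the K regressions. The slides do not define the score, so that last choice in particular should be checked against the KDD 2010 paper.

```python
import numpy as np

def spectral_embedding(X, p, K):
    """Steps 1-2: p-NN graph (0/1 weights, assumed) and K generalized eigenvectors."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argsort(dist, axis=1)[:, :p]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), p), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)
    s = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(n) - s[:, None] * W * s[None, :]
    _, U = np.linalg.eigh(L_sym)
    return s[:, None] * U[:, 1:K + 1]   # skip the first eigenvector (assumption)

def lasso_cd(X, y, lam, n_iter=100):
    """Minimize 0.5*||y - X a||^2 + lam*||a||_1 by cyclic coordinate descent."""
    a = np.zeros(X.shape[1])
    col_sq = np.sum(X ** 2, axis=0)
    r = y.copy()                         # residual y - X a (a starts at zero)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * a[j]          # remove feature j's contribution
            rho = X[:, j] @ r
            a[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * a[j]          # add the updated contribution back
    return a

def mcfs(X, d, K, p=5, lam_frac=0.5):
    """Steps 3-5: K sparse regressions, per-feature score, top-d selection."""
    Xc = X - X.mean(axis=0)
    Y = spectral_embedding(X, p, K)
    Y = Y - Y.mean(axis=0)
    coefs = []
    for k in range(K):
        lam = lam_frac * np.max(np.abs(Xc.T @ Y[:, k]))  # heuristic penalty (assumption)
        coefs.append(lasso_cd(Xc, Y[:, k], lam))
    A = np.column_stack(coefs)           # shape: (n_features, K)
    scores = np.abs(A).max(axis=1)       # assumed score: max |coefficient| over K
    return np.argsort(scores)[::-1][:d], scores

# Hypothetical toy data: 3 clusters in features 0 and 1, plus 6 pure-noise features
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
informative = np.repeat(centers, 30, axis=0) + rng.normal(size=(90, 2))
X = np.hstack([informative, rng.normal(size=(90, 6))])
selected, scores = mcfs(X, d=2, K=2, p=5)
```

A fixed penalty controls sparsity only indirectly. A path-based solver such as least-angle regression would allow the number of nonzero coefficients per regression to be set directly, at the cost of a longer example.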

  14. Complexity Analysis

  15. Experiments • Unsupervised feature selection for • Clustering • Nearest neighbor classification • Compared algorithms • MCFS • Q-alpha • Laplacian score • Maximum variance

  16. Experiments (USPS Clustering) • USPS handwritten digits • 9298 samples, 10 classes, each a 16x16 gray-scale image

  17. Experiments (COIL20 Clustering) • COIL20 image dataset • 1440 samples, 20 classes, each a 32x32 gray-scale image

  18. Experiments (ORL Clustering) • ORL face dataset • 400 images of 40 subjects • 32x32 gray-scale images • [Figure panels: 10, 20, 30, and 40 classes]

  19. Experiments (Isolet Clustering) • Isolet spoken letter recognition data • 1560 samples, 26 classes • 617 features per sample

  20. Experiments (Nearest Neighbor Classification)
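One way to score a selected feature subset with nearest neighbor classification is leave-one-out 1-NN accuracy, sketched below. The slide does not state the evaluation protocol, so the leave-one-out setup, the function name `loo_1nn_accuracy`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def loo_1nn_accuracy(X, labels, selected):
    """Leave-one-out 1-NN accuracy using only the columns listed in `selected`."""
    Xs = X[:, selected]
    sq = np.sum(Xs ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * Xs @ Xs.T  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                      # exclude each point from its own search
    nearest = np.argmin(dist, axis=1)
    return float(np.mean(labels[nearest] == labels))

# Hypothetical toy data: feature 0 separates two classes, features 1-3 are noise
rng = np.random.default_rng(0)
labels = np.repeat(np.array([0, 1]), 40)
X = rng.normal(size=(80, 4))
X[:, 0] += 10.0 * labels
acc_informative = loo_1nn_accuracy(X, labels, [0])
acc_noise = loo_1nn_accuracy(X, labels, [1, 2, 3])
```

Labels are used only for evaluation here; the feature selection step itself remains unsupervised.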

  21. Experiments (Parameter Selection) • Number of nearest neighbors p: stable • Number of eigenvectors: best equal to number of classes

  22. Conclusion • MCFS • Handles multi-class data well • Outperforms state-of-the-art algorithms • Performs especially well when the number of selected features is small (< 50)

  23. Questions?
