
Presentation Transcript


1. Learning Sparse SVM for Feature Selection on Very High Dimensional Datasets
Mingkui Tan, Li Wang, Ivor W. Tsang
School of Computer Engineering, Nanyang Technological University (www.ntu.edu.sg)
ICML 2010, June 21-24, Haifa, Israel

Background
In many machine learning applications, sparsity with respect to the input features is highly desirable. In this work, we propose a new Feature Generating Machine (FGM) to learn the sparsity of features.
• We propose a new sparse model and transform it into an MKL problem with exponentially many linear kernels via a mild convex relaxation.
• We propose to solve the relaxed problem with a cutting plane algorithm that incorporates MKL learning.
• We prove that the proposed algorithm converges globally within a finite number of iterations.
• We show that FGM scales linearly in both the number of dimensions and the number of instances.
• Empirically, FGM shows great scalability for non-monotonic feature selection on large-scale, very high dimensional problems.

Model
Introduce a 0-1 vector d to control the status of each feature ("1" means selected, "0" means not selected). Suppose we want to select B features; this yields a new sparse model, which can be convexly relaxed and converted into an MKL problem whose base kernels are the linear kernels induced by the feasible choices of d. In general, the number of feasible choices of d, and hence of kernels, is exponential in the feature dimension. (A reconstruction of these formulas is sketched after the references.)

Methods
We propose to solve the MKL problem with exponentially many linear kernels using a cutting plane algorithm [4]. Each iteration alternates between solving an MKL sub-problem over the kernels generated so far (step 2) and selecting the most violated d, which generates a new kernel (step 3). (A code sketch of this loop also follows the references.)

Convergence Property
Theorem 1: Assume that the MKL sub-problem in step 2 and the most-violated-d selection in step 3 can be solved exactly; then FGM converges globally after a finite number of steps [1].
Theorem 2: FGM scales linearly in computation with respect to n and m; that is, FGM takes O(mn) time [3].

Experimental Results on Huge Dimensional Problems
(1) Toy experiments: The toy features are shown in the first two figures. For non-monotonic feature selection, suppose B = 2; one needs to select f1 and f2 as the most informative features to obtain the best prediction accuracy. We gradually increase the number of noise features. SVM(IDEAL) denotes the results obtained using only f1 and f2. From Fig. 1, FGM-B (FGM) shows the best performance in terms of prediction accuracy, sparsity, and training time.
Fig. 1. Results on a synthetic dataset with a varying number of noise features.
(2) Large-scale real-data experiments: We report experiments on news20.binary (1,355,191 × 9,996), Arxiv astro-ph (99,757 × 62,369), rcv1.binary (47,236 × 20,242), real-sim (20,958 × 32,309), URL0 (3,231,961 × 16,000), and URL1 (3,231,961 × 20,000), where the numbers in brackets are (dimensions × instances). The comparison with SVM-RFE [2] below shows the competitiveness of our method in terms of both prediction accuracy and training time.
[Figure panels, repeated for accuracy and training time: news20.binary, Arxiv astro-ph, rcv1.binary, real-sim, URL0, URL1]

References
[1] Chen, J. and Ye, J. Training SVM with indefinite kernels. In ICML, 2008.
[2] Guyon, I., Weston, J., Barnhill, S., and Vapnik, V. Gene selection for cancer classification using support vector machines. Machine Learning, 46:389-422, 2002.
[3] Hsieh, C.-J., Chang, K.-W., Lin, C.-J., Keerthi, S. S., and Sundararajan, S. A dual coordinate descent method for large-scale linear SVM. In ICML, 2008.
[4] Li, Y.-F., Tsang, I. W., Kwok, J. T., and Zhou, Z.-H. Tighter and convex maximum margin clustering. In AISTATS, 2009.
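The Model equations on the poster were embedded as images and did not survive transcription. The LaTeX block below is a reconstruction sketched from the surrounding prose (a 0-1 selector d, a budget of B features, a hinge-loss SVM objective); the exact loss and notation on the poster may differ, so treat the regularization constant C, the slacks \xi_i, and the dual feasible sets \mathcal{A}, \mathcal{M} as assumptions.

% Sparse model: jointly choose a feature mask d and an SVM (w, xi).
\[
\min_{\mathbf{d}\in\mathcal{D}}\ \min_{\mathbf{w},\,\boldsymbol{\xi}}\
\tfrac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad
y_i\,\mathbf{w}^{\top}(\mathbf{x}_i\odot\mathbf{d})\ \ge\ 1-\xi_i,\qquad \xi_i\ge 0,
\]
\[
\mathcal{D}=\Big\{\mathbf{d}\in\{0,1\}^{m}:\ \textstyle\sum_{j=1}^{m} d_j\le B\Big\}.
\]
% Convex (minimax) relaxation: dualize the inner SVM; each feasible d_t
% induces one linear kernel, giving an MKL problem over all of them.
\[
\min_{\boldsymbol{\mu}\in\mathcal{M}}\ \max_{\boldsymbol{\alpha}\in\mathcal{A}}\
\boldsymbol{\alpha}^{\top}\mathbf{1}
-\tfrac{1}{2}\sum_{t:\,\mathbf{d}_t\in\mathcal{D}}\mu_t\,
(\boldsymbol{\alpha}\odot\mathbf{y})^{\top}K_t\,(\boldsymbol{\alpha}\odot\mathbf{y}),
\qquad
K_t=\big(X\,\mathrm{diag}(\mathbf{d}_t)\big)\big(X\,\mathrm{diag}(\mathbf{d}_t)\big)^{\top}.
\]
% The kernel count |D| = sum_{k<=B} C(m,k) is what "exponentially many
% linear kernels" refers to; the cutting plane method below never
% enumerates them, it only generates the violated ones.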
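To make the Methods loop concrete, here is a minimal Python sketch of the cutting-plane iteration. It is not the authors' implementation: the MKL sub-problem of step 2 is replaced by a plain linear SVM retrained on the union of generated features, the dual variables alpha are approximated by the hinge-loss subgradient, and the helper names (most_violated_d, fgm_sketch) are hypothetical.

import numpy as np
from sklearn.svm import LinearSVC

def most_violated_d(X, y, alpha, B):
    # Step 3 (feature generation): score feature j by (sum_i alpha_i y_i x_ij)^2
    # and keep the B highest-scoring indices; these define the most violated d.
    scores = (X.T @ (alpha * y)) ** 2
    return np.argsort(scores)[-B:]

def fgm_sketch(X, y, B, C=1.0, max_iter=10):
    # Cutting-plane loop: alternate between generating a feature subset
    # (step 3) and retraining on the union of generated features, a
    # simplified stand-in for the MKL sub-problem (step 2).
    n, m = X.shape
    alpha = np.full(n, C)            # start: treat every point as a violator
    selected = set()
    clf = None
    for _ in range(max_iter):
        new = {int(j) for j in most_violated_d(X, y, alpha, B)}
        if new <= selected:          # no new kernel generated: stop
            break
        selected |= new
        cols = sorted(selected)
        clf = LinearSVC(C=C).fit(X[:, cols], y)
        margins = y * clf.decision_function(X[:, cols])
        alpha = C * (margins < 1)    # hinge-subgradient proxy for the duals
    return sorted(selected), clf

# Toy usage in the spirit of the poster's synthetic setup (B = 2, two
# informative features plus Gaussian noise; this data generator is assumed):
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 500))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
feats, model = fgm_sketch(X, y, B=2)
print(feats)  # ideally recovers features 0 and 1

The scoring pass in most_violated_d costs one sweep over the nonzeros of X, i.e. O(mn) per iteration, which is consistent with the linear scaling stated in Theorem 2; only B new features are generated per iteration, so the active set stays small.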
