This paper presents a novel approach that integrates likelihood-based squashing with a probabilistic formulation of Support Vector Machines (SVMs). The proposed squash-SMO method addresses the computational challenges of training SVMs on large datasets by enabling fast training on squashed data. Experiments on synthetic datasets and benchmarks show that squash-SMO and boost-SMO achieve near-optimal performance with significantly reduced time and memory requirements compared to traditional full-SMO. This work highlights the potential for improved model interpretability and efficiency in handling large-scale data.
Towards Scalable Support Vector Machines Using Squashing • Authors: Dmitry Pavlov, Darya Chudova, Padhraic Smyth • Information and Computer Science • University of California • Advisor: Dr. Hsu • Reporter: Hung Ching-Wen
Outline • 1. Motivation • 2. Objective • 3. Introduction • 4. SVM • 5. Squashing for SVM • 6. Experiments • 7. Conclusion
Motivation • SVMs provide classification models with a strong theoretical foundation and excellent empirical performance. • However, a major drawback of SVMs is the necessity to solve a large-scale quadratic programming problem.
Objective • This paper combines likelihood-based squashing with a probabilistic formulation of SVMs, enabling fast training on squashed data sets.
Introduction • The applicability of SVMs to large datasets is limited because of their high computational cost. • Speed-up training algorithms: • Chunking, Osuna's decomposition method, SMO • These methods accelerate training, but they do not scale well with the size of the training data.
Introduction • Reducing the computational cost: • Sampling • Boosting • Squashing (DuMouchel et al., Madigan et al.) • The authors propose squashing-SMO to address the high computational cost of SVMs.
SVM • Training data: D = {(xi, yi): i = 1, …, N} • xi is a feature vector, yi ∈ {+1, −1} • A linear SVM uses the linear separating classifier y = ⟨w, x⟩ + b • w is the normal vector of the hyperplane • b is the intercept of the hyperplane
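As a minimal illustration of the decision rule above (the function name and setup are ours, not from the paper):

```python
import numpy as np

def linear_svm_predict(X, w, b):
    """Apply the linear decision rule y = sign(<w, x> + b) to each row of X."""
    return np.sign(X @ w + b)
```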
Squashing for SVM • (1) Select a probabilistic model P((X, Y) | θ). • (2) The objective is to find the maximum-likelihood estimate θML.
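One common probabilistic reading of the SVM objective treats the hinge loss as a negative log-likelihood, so that maximizing the likelihood corresponds to minimizing the training objective. The sketch below illustrates this idea; it is an assumption for illustration, not necessarily the paper's exact formulation:

```python
import numpy as np

def neg_log_likelihood(w, b, X, y, C=1.0):
    """Hinge loss read as a negative log-likelihood of the labels y
    given inputs X under parameters (w, b)."""
    margins = y * (X @ w + b)
    return C * np.sum(np.maximum(0.0, 1.0 - margins))
```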
Squashing for SVM • (3) The training data D = {(xi, yi): i = 1, …, N} can be grouped into Nc groups. • (Xc, Yc)sq: the squashed data point placed at cluster c • βc: the weight of cluster c
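A simplified sketch of this grouping step (the paper clusters points by their likelihood profiles; plain k-means on the inputs is used here as a stand-in, and all names are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def squash(X, y, n_clusters):
    """Cluster each class separately; replace each cluster by its centroid
    (the squashed point (Xc, Yc)^sq) weighted by its size (beta_c)."""
    Xs, ys, beta = [], [], []
    for label in np.unique(y):
        Xc = X[y == label]
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(Xc)
        for c in range(n_clusters):
            members = Xc[km.labels_ == c]
            Xs.append(members.mean(axis=0))
            ys.append(label)
            beta.append(len(members))
    return np.array(Xs), np.array(ys), np.array(beta)
```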
Squashing for SVM • If the prior of w is taken to be • P(w) ∝ exp(−‖w‖²)
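This prior is a zero-mean Gaussian with variance 1/2 in each coordinate, so sampling from it is straightforward (a minimal sketch; names are ours):

```python
import numpy as np

def sample_w(dim, n_samples, seed=0):
    """Draw samples of w from P(w) ~ exp(-||w||^2), i.e. a zero-mean
    Gaussian with variance 1/2 per coordinate."""
    rng = np.random.default_rng(seed)
    return rng.normal(scale=np.sqrt(0.5), size=(n_samples, dim))
```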
Squashing for SVM • (4) The optimization model for the squashed data:
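The equation on this slide did not survive extraction; a hedged reconstruction of the standard weighted soft-margin objective, consistent with the cluster weights βc defined above, is:

```latex
\min_{w,\,b,\,\xi} \;\; \frac{1}{2}\lVert w\rVert^2
  + C \sum_{c=1}^{N_c} \beta_c \,\xi_c
\quad \text{s.t.} \quad
y_c\bigl(\langle w, x_c\rangle + b\bigr) \ge 1 - \xi_c,
\qquad \xi_c \ge 0 .
```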
Squashing for SVM • Important design issues for the squashing algorithm (see the sketch below): • (1) The choice of the number and location of the squashing points. • (2) Sampling values of w from the prior P(w). • (3) The intercept b can be obtained from the optimization model. • (4) With w and b fixed, evaluate the likelihood of each training point, and repeat the selection procedure L times (L is the profile length).
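A simplified sketch of steps (2)–(4): build each point's likelihood profile over L prior draws, then cluster points in that profile space. All names, the fixed b = 0, and the hinge-based likelihood are our assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

def likelihood_profiles(X, y, L=10, seed=0):
    """For L draws of w from the Gaussian prior (b fixed at 0 here for
    simplicity), evaluate a hinge-based likelihood of every training
    point, giving an (N, L) profile matrix."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(0.5), size=(L, X.shape[1]))
    margins = y[:, None] * (X @ W.T)          # shape (N, L)
    return np.exp(-np.maximum(0.0, 1.0 - margins))

def cluster_profiles(profiles, n_clusters):
    """Group points whose likelihood profiles are similar; each cluster
    then yields one squashed point with weight beta_c."""
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(profiles)
```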
Experiments • Experiment datasets: • Synthetic data • UCI machine learning repository • UCI KDD repository
Experiments • Evaluated methods: • full-SMO, srs-SMO (simple random sampling), squash-SMO, boost-SMO • Runs: averaged over 100 runs • Performance measures: • misclassification rate, learning time, and memory requirements
Experiments (Results on Synthetic Data) • (Wf, bf): estimated by full-SMO • (Ws, bs): estimated from the squashed or sampled data
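One simple way to quantify how close the squashed/sampled solution is to the full-SMO solution (an illustrative measure of our own, not necessarily the one used in the paper):

```python
import numpy as np

def param_distance(wf, bf, ws, bs):
    """Euclidean distance between the full-SMO parameters (wf, bf)
    and the squashed/sampled parameters (ws, bs)."""
    return float(np.sqrt(np.sum((wf - ws) ** 2) + (bf - bs) ** 2))
```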
Conclusion • 1. We describe how the use of squashing makes the training of SVMs applicable to large datasets. • 2. Comparison with full-SMO shows that squash-SMO and boost-SMO achieve near-optimal performance with much lower time and memory requirements. • 3. srs-SMO has a higher misclassification rate. • 4. squash-SMO and boost-SMO allow parameter tuning via cross-validation, which is impractical for full-SMO.
Conclusion • 5. The performance of squash-SMO and boost-SMO is similar on the benchmark problems. • 6. squash-SMO offers better model interpretability and can be expected to run faster than SMO on datasets that do not fit in memory.
Opinion • It is a good idea that the authors describe how the use of squashing makes the training of SVMs applicable to large datasets. • We could change the prior distribution of w according to the characteristics of the data, e.g., to an exponential or log-normal distribution, or use a nonparametric approach.
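As a small illustration of this suggestion (purely our sketch), alternative priors for w are easy to sample:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 5
w_gaussian    = rng.normal(scale=np.sqrt(0.5), size=dim)      # exp(-||w||^2)
w_exponential = rng.exponential(scale=1.0, size=dim)          # exponential prior
w_lognormal   = rng.lognormal(mean=0.0, sigma=1.0, size=dim)  # log-normal prior
```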