


## Mathematical Programming in Support Vector Machines


**Mathematical Programming in Support Vector Machines**
Olvi L. Mangasarian, University of Wisconsin – Madison
High Performance Computation for Engineering Systems Seminar, MIT, October 4, 2000

**What is a Support Vector Machine?**
• An optimally defined surface
• Typically nonlinear in the input space
• Linear in a higher-dimensional space
• Implicitly defined by a kernel function

**What are Support Vector Machines Used For?**
• Classification
• Regression & data fitting
• Supervised & unsupervised learning
(This talk concentrates on classification.)

**Outline of Talk**
• Generalized support vector machines (SVMs): a completely general kernel allows complex classification (no Mercer condition!)
• Smooth support vector machines (SSVM): smooth the SVM problem and solve it by a fast Newton method
• Lagrangian support vector machines (LSVM): a very fast, simple iterative scheme with one matrix inversion; no LP, no QP
• Reduced support vector machines (RSVM): handle large datasets with nonlinear kernels

**Generalized Support Vector Machines: 2-Category Linearly Separable Case**
(Figure: two point sets A+ and A− separated by a plane.)

**Algebra of the 2-Category Linearly Separable Case**
Given m points in n-dimensional space:
• Represented by an m-by-n matrix A
• Membership of each point in class +1 or −1 specified by an m-by-m diagonal matrix D with +1 and −1 entries
• Separate by two bounding planes x'w = γ + 1 and x'w = γ − 1:
  A_i w ≥ γ + 1 for D_ii = +1,  A_i w ≤ γ − 1 for D_ii = −1
• More succinctly: D(Aw − eγ) ≥ e, where e is a vector of ones

**Maximizing the Margin between Bounding Planes**
(Figure: bounding planes for A+ and A−; the margin between them is 2/||w||.)

**The Linear Support Vector Machine Formulation**
Solve the following mathematical program for some ν > 0:
  min over (w, γ, y):  ν e'y + ½ w'w
  subject to:  D(Aw − eγ) + y ≥ e,  y ≥ 0
The nonnegative slack variable y is zero iff:
• The convex hulls of A+ and A− do not intersect
• ν is sufficiently large

**Breast Cancer Diagnosis Application**
97% tenfold cross-validation correctness on 780 samples: 494 benign, 286 malignant

**Another Application: Disputed Federalist Papers (Bosch & Smith, 1998)**
56 Hamilton, 50 Madison, 12 disputed

**Generalized Support Vector Machine Motivation (Nonlinear Kernel Without Mercer Condition)**
• Linear SVM: linear separating surface x'w = γ
• Set w = A'Du
• Resulting linear surface: x'A'Du = γ
• Replace x'A' by an arbitrary nonlinear kernel K(x', A')
• Resulting nonlinear surface: K(x', A')Du = γ

**SSVM: Smooth Support Vector Machine (SVM as Unconstrained Minimization Problem)**
Changing to the 2-norm and measuring the margin in (w, γ) space gives the equivalent unconstrained problem:
  min over (w, γ):  (ν/2) ||(e − D(Aw − eγ))_+||² + ½ (w'w + γ²)
where (·)_+ replaces negative components by zeros.

**SSVM: Smoothing the Plus Function**
• Integrating the sigmoid approximation of the step function gives a smooth, excellent approximation to the plus function:
  p(x, α) = x + (1/α) log(1 + e^(−αx))
• Replacing the plus function in the nonsmooth SVM by this smooth approximation gives the SSVM.

**Newton–Armijo Algorithm for SSVM**
• Newton: minimize a sequence of quadratic approximations to the strongly convex objective function, i.e. solve a sequence of linear equations in n+1 variables (small-dimensional input space).
• Armijo: shorten the distance between successive iterates so as to generate sufficient decrease in the objective function. (In computational practice, not needed!)
• Global quadratic convergence: starting from any point, the iterates are guaranteed to converge to the unique solution at a quadratic rate, i.e. errors get squared.
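The smoothing above is easy to check numerically. The following is a hypothetical NumPy sketch (not from the talk): it compares p(x, α) = x + (1/α)·log(1 + e^(−αx)) against the plus function and shows the gap shrinking as α grows (the maximum gap, attained at x = 0, is log(2)/α).

```python
import numpy as np

def plus(x):
    # the plus function (x)_+ : negative components replaced by zero
    return np.maximum(x, 0.0)

def smooth_plus(x, alpha):
    # integral of the sigmoid 1/(1 + exp(-alpha*x)):
    # p(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha*x))
    return x + np.log1p(np.exp(-alpha * x)) / alpha

x = np.linspace(-2.0, 2.0, 401)
for alpha in (1.0, 5.0, 25.0):
    gap = np.max(smooth_plus(x, alpha) - plus(x))
    print(f"alpha={alpha:5.1f}  max gap={gap:.4f}")
```

Because the approximation is smooth and strictly above the plus function, the SSVM objective stays strongly convex and twice differentiable, which is what makes the Newton iteration applicable.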
(Typically 6 to 8 iterations, without an Armijo step.)

**SSVM with a Nonlinear Kernel: Nonlinear Separating Surface in Input Space**
(Figure: nonlinear separating surface.)

**Examples of Kernels (Generate Nonlinear Separating Surfaces in Input Space)**
• Polynomial kernel
• Gaussian (radial basis) kernel
• Neural network kernel

**LSVM: Lagrangian Support Vector Machine — Dual of SVM**
Taking the dual of the SVM formulation gives the following simple dual problem:
  min over u ≥ 0:  ½ u'Qu − e'u
The variables (w, γ) of SSVM are related to u by:
  w = A'Du,  γ = −e'Du

**LSVM: Dual SVM as Symmetric Linear Complementarity Problem**
Defining the two matrices:
  H = D[A  −e],  Q = I/ν + HH'
reduces the dual SVM to:
  min over u ≥ 0:  ½ u'Qu − e'u
The optimality condition for this dual SVM is the LCP:
  0 ≤ u ⊥ Qu − e ≥ 0
which, by Implicit Lagrangian theory, is equivalent to:
  Qu − e = ((Qu − e) − αu)_+ for any α > 0

**LSVM Algorithm: Simple & Linearly Convergent — One Small Matrix Inversion**
  u^(i+1) = Q⁻¹(e + ((Qu^(i) − e) − αu^(i))_+),  with 0 < α < 2/ν
Key idea: the Sherman–Morrison–Woodbury formula allows the inversion of an extremely large m-by-m matrix Q by merely inverting a much smaller (n+1)-by-(n+1) matrix:
  (I/ν + HH')⁻¹ = ν (I − H (I/ν + H'H)⁻¹ H')

**LSVM Algorithm – Linear Kernel: 11 Lines of MATLAB Code**

```matlab
function [it, opt, w, gamma] = svml(A,D,nu,itmax,tol)
% lsvm with SMW for min 1/2*u'*Q*u-e'*u s.t. u=>0,
% Q=I/nu+H*H', H=D[A -e]
% Input: A, D, nu, itmax, tol; Output: it, opt, w, gamma
% [it, opt, w, gamma] = svml(A,D,nu,itmax,tol);
[m,n]=size(A);alpha=1.9/nu;e=ones(m,1);H=D*[A -e];it=0;
S=H*inv((speye(n+1)/nu+H'*H));
u=nu*(1-S*(H'*e));oldu=u+1;
while it<itmax & norm(oldu-u)>tol
  z=(1+pl(((u/nu+H*(H'*u))-alpha*u)-1));
  oldu=u;
  u=nu*(z-S*(H'*z));
  it=it+1;
end;
opt=norm(u-oldu);w=A'*D*u;gamma=-e'*D*u;
function pl = pl(x); pl = (abs(x)+x)/2;
```

(SVM classified in 178 seconds & 4497 iterations.)

**LSVM Algorithm – Linear Kernel: Computational Results**
• 2 million random points in 10-dimensional space:
  – classified in 6.7 minutes, in 6 iterations, to e-5 accuracy
  – 250 MHz UltraSPARC II with 2 gigabytes of memory
  – CPLEX ran out of memory
• 32562 points in 123-dimensional space (UCI Adult dataset):
  – classified in 141 seconds & 55 iterations to 85% correctness
  – 400 MHz Pentium II with 2 gigabytes of memory

**LSVM – Nonlinear Kernel Formulation**
For a nonlinear kernel K(·,·):
• The separating nonlinear surface is defined by the kernel and the dual solution u
• u solves the same dual problem, with Q redefined as:
  Q = I/ν + DK(G, G')D,  where G = [A  −e]

**LSVM Algorithm – Nonlinear Kernel Application**
100 iterations, 58 seconds on a Pentium II, 95.9% accuracy

**Reduced Support Vector Machines (RSVM): Large Nonlinear Kernel Classification Problems**
• Key idea: use a rectangular kernel K(A, Ā'), where Ā is a small random sample of the rows of A
• Typically Ā has 1% to 10% of the rows of A
• Two important consequences:
  – the nonlinear separator depends only on Ā
  – yet using the small sample Ā by itself in a conventional SVM gives lousy results
• RSVM can solve very large problems

**Conventional SVM Result on Checkerboard Using 50 Random Points Out of 1000**

**RSVM Result on Checkerboard Using the SAME 50 Random Points Out of 1000**
(Figures: separating surfaces on the checkerboard dataset.)

**RSVM on Large Classification Problems**
Standard error over 50 runs = 0.001 to 0.002; RSVM time = 1.24 × (random points time)

**Conclusion**
• Mathematical programming plays an essential role in SVMs
• Theory
  – New formulations: generalized SVMs
  – New algorithm-generating concepts: smoothing (SSVM), implicit Lagrangian (LSVM)
• Algorithms
  – Fast: SSVM
  – Massive: LSVM, RSVM

**Future Research**
• Theory
  – Concave minimization
  – Concurrent feature & data selection
  – Multiple-instance problems
  – SVMs as complementarity problems
  – Kernel methods in nonlinear programming
• Algorithms
  – Chunking for massive classification
  – Multicategory classification algorithms

**Talk & Papers Available on the Web**
www.cs.wisc.edu/~olvi
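As a rough illustration of the LSVM iteration, here is a hypothetical NumPy translation of the 11-line MATLAB code above (linear kernel; variable names follow the MATLAB version; the tiny random two-class problem at the end is invented for demonstration, not from the talk):

```python
import numpy as np

def lsvm(A, d, nu=1.0, itmax=100, tol=1e-5):
    """LSVM with Sherman-Morrison-Woodbury for
    min 1/2 u'Qu - e'u  s.t. u >= 0,  Q = I/nu + HH',  H = D[A -e]."""
    m, n = A.shape
    alpha = 1.9 / nu                       # must satisfy 0 < alpha < 2/nu
    e = np.ones(m)
    H = d[:, None] * np.hstack([A, -np.ones((m, 1))])   # H = D[A -e]
    # SMW: invert a small (n+1)x(n+1) matrix instead of the m x m matrix Q
    S = H @ np.linalg.inv(np.eye(n + 1) / nu + H.T @ H)
    u = nu * (e - S @ (H.T @ e))           # u = Q^{-1} e to start
    oldu = u + 1.0
    it = 0
    while it < itmax and np.linalg.norm(oldu - u) > tol:
        # z = e + ((Qu - e) - alpha*u)_+ , with Qu = u/nu + H(H'u)
        z = e + np.maximum((u / nu + H @ (H.T @ u)) - alpha * u - e, 0.0)
        oldu = u
        u = nu * (z - S @ (H.T @ z))       # u = Q^{-1} z via SMW
        it += 1
    w = A.T @ (d * u)                      # w = A'Du
    gamma = -e @ (d * u)                   # gamma = -e'Du
    return it, np.linalg.norm(u - oldu), w, gamma

# usage on a tiny, well-separated 2-class problem (illustrative only)
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
d = np.concatenate([np.ones(20), -np.ones(20)])
it, opt, w, gamma = lsvm(A, d, nu=1.0)
pred = np.sign(A @ w - gamma)
print(it, np.mean(pred == d))
```

The per-iteration cost is dominated by products with the m-by-(n+1) matrix H, so for n much smaller than m each step is linear in the number of points, which is what makes the massive linear-kernel results above possible.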
