Lagrangian Support Vector Machines

David R. Musicant and O.L. Mangasarian
December 1, 2000
Carleton College

Lagrangian SVM (LSVM)
• Fast algorithm: a simple iterative approach expressible in 11 lines of MATLAB code
• Requires no specialized solvers or software tools, apart from a freely available equation solver
• Inverts a matrix whose order is the number of features plus one (in the linear case)
• Extendable to nonlinear kernels
• Linear convergence

The Discrimination Problem: The Fundamental 2-Category Linearly Separable Case
• Given m points in the n-dimensional space $R^n$
• Represented by an m x n matrix A
• Membership of each point $A_i$ in the classes +1 or -1 is specified by:
  • An m x m diagonal matrix D with +1 or -1 along its diagonal

[Figure: point sets A+ and A- separated by a plane]
Separating surface: $x'w = \gamma$

• Separate by two bounding planes, $x'w = \gamma + 1$ and $x'w = \gamma - 1$, such that:
  $A_i w \ge \gamma + 1$ for $D_{ii} = +1$, and $A_i w \le \gamma - 1$ for $D_{ii} = -1$
• More succinctly: $D(Aw - e\gamma) \ge e$, where e is a vector of ones.
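The succinct condition $D(Aw - e\gamma) \ge e$ can be checked directly for a candidate plane. Below is a minimal NumPy sketch. It assumes the diagonal of D is stored as a label vector `d`, so that `d * v` computes $Dv$. The function name `separates` and the toy data are hypothetical.

```python
import numpy as np

def separates(A, d, w, gamma):
    """Return True if D(Aw - e*gamma) >= e holds for every row of A.

    A: (m, n) matrix whose rows are the points A_i.
    d: (m,) vector of +1/-1 labels (hypothetical storage of diag(D)), so d * v equals D v.
    """
    e = np.ones(A.shape[0])
    return bool(np.all(d * (A @ w - e * gamma) >= e))

# Hypothetical toy data on the real line (n = 1): A+ = {2, 3}, A- = {-2, -3}.
A = np.array([[2.0], [3.0], [-2.0], [-3.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
print(separates(A, d, np.array([1.0]), 0.0))  # True: bounding planes at x = 1 and x = -1
print(separates(A, d, np.array([0.4]), 0.0))  # False: bounding planes at x = 2.5 and x = -2.5
```

Scaling w changes where the bounding planes sit. This is why the condition uses the fixed right-hand side e rather than a free threshold.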

Preliminary Attempt at the (Linear) Support Vector Machine: Robust Linear Programming
• Solve the following mathematical program:
  $\min_{w,\gamma,y} \; e'y$ s.t. $D(Aw - e\gamma) + y \ge e,\; y \ge 0$
  where y = nonnegative error (slack) vector
• Note: y = 0 at a solution if the convex hulls of A+ and A- do not intersect.
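For a fixed plane $(w, \gamma)$, the smallest slack that satisfies the constraints is $y = (e - D(Aw - e\gamma))_+$. This means the objective $e'y$ can be evaluated without a solver. The sketch below only evaluates a given plane and does not solve the linear program over $w$ and $\gamma$, which would need an LP solver. The name `rlp_slack` and the data are hypothetical, and `d` again holds the diagonal of D.

```python
import numpy as np

def rlp_slack(A, d, w, gamma):
    """Smallest y >= 0 with D(Aw - e*gamma) + y >= e for a fixed plane (w, gamma)."""
    e = np.ones(A.shape[0])
    return np.maximum(e - d * (A @ w - e * gamma), 0.0)

# Hypothetical overlapping data: the point 0.5 is labeled -1 but lies among A+.
A = np.array([[2.0], [1.0], [0.5], [-2.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
y = rlp_slack(A, d, np.array([1.0]), 0.0)
print(y.tolist(), y.sum())  # [0.0, 0.0, 1.5, 0.0] 1.5
```

Only the misplaced point pays an error. On separable data, a suitable plane gives y = 0, matching the note above.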

The (Linear) Support Vector Machine: Maximize Margin Between Separating Planes
[Figure: point sets A+ and A- with the two bounding planes and the margin between them]

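The (Linear) Support Vector Machine Formulation
• Solve the following mathematical program:
  $\min_{w,\gamma,y} \; \nu e'y + \frac{1}{2}w'w$ s.t. $D(Aw - e\gamma) + y \ge e,\; y \ge 0$
  where y = nonnegative error (slack) vector
• Minimizing $\frac{1}{2}w'w$ maximizes the margin $\frac{2}{\|w\|_2}$ between the bounding planes
• Note: y = 0 is feasible if the convex hulls of A+ and A- do not intersect.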
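The trade-off in this objective is easiest to see on toy data. The following is a minimal NumPy sketch, assuming the slack is set to its smallest feasible value $(e - D(Aw - e\gamma))_+$ for each candidate plane. The function names and data are hypothetical.

```python
import numpy as np

def svm_objective(A, d, w, gamma, nu):
    """nu * e'y + (1/2) w'w, with y at its smallest feasible value for (w, gamma)."""
    e = np.ones(A.shape[0])
    y = np.maximum(e - d * (A @ w - e * gamma), 0.0)
    return nu * y.sum() + 0.5 * (w @ w)

def margin(w):
    """Euclidean distance between the planes x'w = gamma + 1 and x'w = gamma - 1."""
    return 2.0 / np.linalg.norm(w)

A = np.array([[2.0], [3.0], [-2.0], [-3.0]])   # hypothetical toy data
d = np.array([1.0, 1.0, -1.0, -1.0])
# w = 1:   planes at x = +-1, margin 2.0, objective 0.5
# w = 0.5: planes at x = +-2 (touching the closest points), margin 4.0, objective 0.125
for w in (np.array([1.0]), np.array([0.5])):
    print(margin(w), svm_objective(A, d, w, 0.0, nu=1.0))
```

Both planes separate the data with zero slack. The smaller w has the larger margin and the lower objective.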

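SVM Reformulation
• Standard SVM formulation: $\min_{w,\gamma,y} \; \nu e'y + \frac{1}{2}w'w$ s.t. $D(Aw - e\gamma) + y \ge e,\; y \ge 0$
• Add $\gamma^2$ to the objective function, and use the 2-norm of the slack variable y:
  $\min_{w,\gamma,y} \; \frac{\nu}{2}y'y + \frac{1}{2}(w'w + \gamma^2)$ s.t. $D(Aw - e\gamma) + y \ge e$
  (the constraint $y \ge 0$ is no longer needed: raising a negative component of y to zero keeps the constraints satisfied and decreases $y'y$)
• Experiments show that this does not reduce generalization capability.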
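Because the optimal slack for fixed $(w, \gamma)$ is $y = (e - D(Aw - e\gamma))_+$, the reformulated objective can be written in terms of $(w, \gamma)$ alone. The minimal NumPy sketch below evaluates it that way. The function name and the overlapping toy data are hypothetical.

```python
import numpy as np

def reformulated_objective(A, d, w, gamma, nu):
    """(nu/2)||(e - D(Aw - e*gamma))_+||^2 + (1/2)(w'w + gamma^2).

    The slack y is eliminated: for fixed (w, gamma) its optimal value is the
    plus function of the constraint violation, so no y >= 0 constraint is needed.
    """
    e = np.ones(A.shape[0])
    y = np.maximum(e - d * (A @ w - e * gamma), 0.0)
    return 0.5 * nu * (y @ y) + 0.5 * (w @ w + gamma ** 2)

# Hypothetical overlapping data: the point 0.5 is labeled -1.
A = np.array([[2.0], [1.0], [0.5], [-2.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
print(reformulated_objective(A, d, np.array([1.0]), 0.0, nu=2.0))  # 2.75
```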

Simple Dual Formulation
• Dual of this problem is:
  $\min_{0 \le u \in R^m} \; \frac{1}{2}u'\left(\frac{I}{\nu} + D(AA' + ee')D\right)u - e'u$
• I = identity matrix
• Non-negativity constraints only
• Leads to a very simple algorithm
Formulation ideas explored by Friess, Burges, and others
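One reason the dual is so well behaved is that its matrix is $\frac{I}{\nu}$ plus a positive semidefinite term. Every eigenvalue is therefore at least $\frac{1}{\nu}$, and the problem is strictly convex with nonnegativity constraints only. The NumPy sketch below checks this numerically. The function name `dual_matrix` and the random data are hypothetical.

```python
import numpy as np

def dual_matrix(A, d, nu):
    """Form I/nu + D(AA' + ee')D for the dual problem (d = diagonal of D)."""
    m = A.shape[0]
    ee = np.ones((m, m))                        # ee' is the all-ones m x m matrix
    return np.eye(m) / nu + d[:, None] * (A @ A.T + ee) * d[None, :]

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))                     # hypothetical data: m = 6, n = 2
d = np.where(rng.random(6) < 0.5, -1.0, 1.0)
nu = 10.0
eigs = np.linalg.eigvalsh(dual_matrix(A, d, nu))
# D(AA' + ee')D is positive semidefinite, so every eigenvalue is at least 1/nu;
# here m > n + 1, so its rank is at most 3 and the bound 1/nu = 0.1 is attained.
print(eigs.min())
```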

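Simplified notation
• Make the substitution $H = D[A \;\; -e]$, $Q = \frac{I}{\nu} + HH'$ in the dual problem to simplify it
• Dual problem then becomes: $\min_{0 \le u \in R^m} \; f(u) = \frac{1}{2}u'Qu - e'u$
• When computing $Q^{-1}$, we use the Sherman-Morrison-Woodbury identity:
  $\left(\frac{I}{\nu} + HH'\right)^{-1} = \nu\left(I - H\left(\frac{I}{\nu} + H'H\right)^{-1}H'\right)$
• Only need to invert a matrix of size (n+1) x (n+1)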
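The Sherman-Morrison-Woodbury identity lets $Q^{-1}z$ be computed while only factorizing the small $(n+1) \times (n+1)$ matrix. Below is a minimal NumPy sketch. It assumes `np.linalg.solve` in place of an explicit inverse, and the name `q_inverse_times` and the random data are hypothetical. The direct $m \times m$ solve is included only to verify the result.

```python
import numpy as np

def q_inverse_times(H, nu, z):
    """Return Q^{-1} z for Q = I/nu + HH' via Sherman-Morrison-Woodbury.

    Only the (n+1) x (n+1) matrix I/nu + H'H is factorized; Q itself is never formed.
    """
    k = H.shape[1]
    small = np.eye(k) / nu + H.T @ H
    return nu * (z - H @ np.linalg.solve(small, H.T @ z))

rng = np.random.default_rng(1)
m, n, nu = 200, 3, 0.5                               # hypothetical sizes: m much larger than n
A = rng.normal(size=(m, n))
d = np.where(rng.random(m) < 0.5, -1.0, 1.0)
H = d[:, None] * np.hstack([A, -np.ones((m, 1))])    # H = D[A -e]
z = rng.normal(size=m)

# Check against a direct m x m solve (done here only for verification).
Q = np.eye(m) / nu + H @ H.T
print(np.allclose(q_inverse_times(H, nu, z), np.linalg.solve(Q, z)))  # True
```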

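Deriving the LSVM Algorithm
• Start with the dual formulation: $\min_{0 \le u \in R^m} \; \frac{1}{2}u'Qu - e'u$
• Karush-Kuhn-Tucker necessary and sufficient optimality conditions are: $0 \le u \perp Qu - e \ge 0$
• For any $\alpha > 0$, this is equivalent to the following equation: $Qu - e = ((Qu - e) - \alpha u)_+$, where $(x)_+$ replaces each negative component of x by zero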
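The equivalence holds componentwise. Write $r = Qu - e$. If $r_i > 0$, the equation forces $u_i = 0$. If $r_i = 0$, it forces $u_i \ge 0$. A negative $r_i$ is impossible, because the right-hand side is nonnegative. The small NumPy sketch below checks both forms on hand-picked vectors for several values of α. The function names are hypothetical.

```python
import numpy as np

def complementary(u, r, tol=1e-12):
    """0 <= u, r >= 0 and u'r = 0 (with u, r >= 0 this is componentwise complementarity)."""
    return bool(np.all(u >= -tol) and np.all(r >= -tol) and abs(u @ r) <= tol)

def fixed_point_holds(u, r, alpha, tol=1e-12):
    """r == (r - alpha*u)_+ componentwise, for a given alpha > 0."""
    return bool(np.allclose(r, np.maximum(r - alpha * u, 0.0), atol=tol))

# Hand-picked (u, r) pairs, where r plays the role of Qu - e.
cases = [
    (np.array([0.0, 2.0]), np.array([3.0, 0.0])),   # complementary
    (np.array([0.5, 0.0]), np.array([0.0, 0.0])),   # complementary
    (np.array([1.0, 2.0]), np.array([1.0, 0.0])),   # u_1 r_1 != 0
    (np.array([-1.0, 0.0]), np.array([0.0, 0.0])),  # u has a negative component
    (np.array([0.0, 0.0]), np.array([-1.0, 0.0])),  # r has a negative component
]
for alpha in (0.5, 1.0, 3.0):
    for u, r in cases:
        print(alpha, complementary(u, r), fixed_point_holds(u, r, alpha))
```

The two boolean columns agree on every row, whatever the value of α.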

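LSVM Algorithm
• The last equation generates a fast algorithm if we replace the left-hand-side u by $u^{i+1}$ and the right-hand-side u by $u^i$, as follows:
  $u^{i+1} = Q^{-1}\left(e + ((Qu^i - e) - \alpha u^i)_+\right), \quad i = 0, 1, \ldots$
• Algorithm converges linearly if: $0 < \alpha < \frac{2}{\nu}$
• In practice, we take: $\alpha = \frac{1.9}{\nu}$
• Only one matrix inversion is necessary
• Use the SMW identity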
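The iteration can be sketched for any matrix of the form $Q = \frac{I}{\nu} + HH'$ when Q is small enough to invert directly. The minimal NumPy sketch below does this. Its assumptions are an explicit `np.linalg.inv` (no SMW), a stopping rule on $\|u^{i+1} - u^i\|$, and the starting point $u^0 = Q^{-1}e$ used in the MATLAB code. The name `lsvm_iterate`, its defaults, and the test data are hypothetical.

```python
import numpy as np

def lsvm_iterate(Q, nu, itmax=1000, tol=1e-10):
    """Run u^{i+1} = Q^{-1}(e + ((Q u^i - e) - alpha u^i)_+) with alpha = 1.9/nu.

    Q is inverted explicitly here, so this sketch is only suitable for small m.
    Returns (u, number of iterations performed).
    """
    m = Q.shape[0]
    e = np.ones(m)
    alpha = 1.9 / nu                  # inside the convergent range 0 < alpha < 2/nu
    Q_inv = np.linalg.inv(Q)          # the single matrix inversion
    u = Q_inv @ e                     # starting point u^0 = Q^{-1} e
    for it in range(1, itmax + 1):
        u_new = Q_inv @ (e + np.maximum((Q @ u - e) - alpha * u, 0.0))
        if np.linalg.norm(u_new - u) <= tol:
            return u_new, it
        u = u_new
    return u, itmax

# Hypothetical test problem with Q = I/nu + HH'.
rng = np.random.default_rng(2)
H = rng.normal(size=(15, 3))
nu = 0.1
Q = np.eye(15) / nu + H @ H.T
u, it = lsvm_iterate(Q, nu, itmax=5000)
r = Q @ u - 1.0                       # r = Qu - e
print(it, u.min(), r.min(), u @ r)    # KKT: u >= 0, r >= 0, u'r ~ 0
```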

LSVM Algorithm – Linear Kernel: 11 Lines of MATLAB Code

```matlab
function [it, opt, w, gamma] = svml(A,D,nu,itmax,tol)
% lsvm with SMW for min 1/2*u'*Q*u-e'*u s.t. u=>0,
% Q=I/nu+H*H', H=D[A -e]
% Input: A, D, nu, itmax, tol; Output: it, opt, w, gamma
% [it, opt, w, gamma] = svml(A,D,nu,itmax,tol);
  [m,n]=size(A);alpha=1.9/nu;e=ones(m,1);H=D*[A -e];it=0;
  S=H*inv((speye(n+1)/nu+H'*H));
  u=nu*(1-S*(H'*e));oldu=u+1;
  while it<itmax & norm(oldu-u)>tol
    z=(1+pl(((u/nu+H*(H'*u))-alpha*u)-1));
    oldu=u;
    u=nu*(z-S*(H'*z));
    it=it+1;
  end;
  opt=norm(u-oldu);w=A'*D*u;gamma=-e'*D*u;

function pl = pl(x); pl = (abs(x)+x)/2;
```
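For readers without MATLAB, the following is a line-by-line NumPy translation of `svml`. It is offered as a sketch, not as the authors' code. Assumptions: the diagonal of D is passed as a label vector `d` instead of an m x m matrix, `np.linalg.inv` stands in for MATLAB's `inv`, and the defaults `itmax=1000` and `tol=1e-5` are our own choices. The test data are hypothetical.

```python
import numpy as np

def svml(A, d, nu, itmax=1000, tol=1e-5):
    """NumPy sketch of svml: linear LSVM using the SMW identity.

    d holds the diagonal of D (+1/-1 labels). Returns (it, opt, w, gamma).
    """
    m, n = A.shape
    alpha = 1.9 / nu
    e = np.ones(m)
    H = d[:, None] * np.hstack([A, -e[:, None]])            # H = D[A -e]
    S = H @ np.linalg.inv(np.eye(n + 1) / nu + H.T @ H)    # S = H (I/nu + H'H)^{-1}
    u = nu * (1.0 - S @ (H.T @ e))                          # u = Q^{-1} e via SMW
    oldu = u + 1.0
    it = 0
    while it < itmax and np.linalg.norm(oldu - u) > tol:
        z = 1.0 + np.maximum((u / nu + H @ (H.T @ u)) - alpha * u - 1.0, 0.0)
        oldu = u
        u = nu * (z - S @ (H.T @ z))                        # u = Q^{-1} z via SMW
        it += 1
    opt = np.linalg.norm(u - oldu)
    w = A.T @ (d * u)                                       # w = A'Du
    gamma = -e @ (d * u)                                    # gamma = -e'Du
    return it, opt, w, gamma

# Hypothetical data: two Gaussian clusters centered at (2, 2) and (-2, -2).
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(2.0, 0.5, size=(20, 2)), rng.normal(-2.0, 0.5, size=(20, 2))])
d = np.r_[np.ones(20), -np.ones(20)]
it, opt, w, gamma = svml(A, d, nu=0.1)
acc = np.mean(np.sign(A @ w - gamma) == d)   # classify by the side of x'w = gamma
print(it, opt, w, gamma, acc)
```

As in the MATLAB version, the vector `u` of length m is never multiplied by an m x m matrix. Every product goes through H or the small matrix S.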

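LSVM with Nonlinear Kernel
• Start with the dual problem: $\min_{0 \le u \in R^m} \; \frac{1}{2}u'\left(\frac{I}{\nu} + DGG'D\right)u - e'u$, where $G = [A \;\; -e]$
• Depends only on scalar products of rows of G
• Therefore, substitute a kernel function $K(G, G')$ for $GG'$ to obtain:
  $\min_{0 \le u \in R^m} \; \frac{1}{2}u'\left(\frac{I}{\nu} + DK(G, G')D\right)u - e'u$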
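The substitution only requires a function that maps two matrices to a matrix of pairwise "scalar products". The NumPy sketch below shows the linear kernel, which recovers $DGG'D = HH'$ exactly, and a Gaussian kernel as one common nonlinear choice. The Gaussian kernel, its width parameter `mu`, the function names, and the data are assumptions for illustration. They are not necessarily the kernel used in the experiments.

```python
import numpy as np

def linear_kernel(G1, G2t):
    """K(G1, G2') = G1 G2'; with G1 = G2 = G this recovers GG' from the linear case."""
    return G1 @ G2t

def gaussian_kernel(G1, G2t, mu):
    """Hypothetical Gaussian kernel: K_ij = exp(-mu * ||row_i(G1) - column_j(G2')||^2)."""
    G2 = G2t.T
    sq = (G1 ** 2).sum(axis=1)[:, None] + (G2 ** 2).sum(axis=1)[None, :] - 2.0 * G1 @ G2.T
    return np.exp(-mu * np.maximum(sq, 0.0))    # clip tiny negative round-off

rng = np.random.default_rng(3)
A = rng.normal(size=(8, 2))                     # hypothetical data
d = np.where(rng.random(8) < 0.5, -1.0, 1.0)
G = np.hstack([A, -np.ones((8, 1))])            # G = [A -e]
H = d[:, None] * G                              # H = D G

# The linear kernel reproduces the matrix HH' = DGG'D of the linear dual.
print(np.allclose(d[:, None] * linear_kernel(G, G.T) * d[None, :], H @ H.T))  # True
K = gaussian_kernel(G, G.T, mu=0.5)             # a nonlinear alternative
```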

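Nonlinear kernel algorithm
• Define $Q = \frac{I}{\nu} + DK(G, G')D$
• Then the algorithm is identical to the linear case: $u^{i+1} = Q^{-1}\left(e + ((Qu^i - e) - \alpha u^i)_+\right)$
• One caveat: the SMW identity no longer applies unless an explicit low-rank factorization of $K(G, G')$ is known
• Because the m x m matrix Q must then be inverted, LSVM in its current form is effective on moderately sized nonlinear problems.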
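A small end-to-end sketch of the kernel version follows, in NumPy. Assumptions: a Gaussian kernel with a hypothetical width `mu`, an explicit m x m inverse (no SMW), and the classifier $\operatorname{sign}(K([x' \;\; -1], G')Du)$. That classifier reduces to $\operatorname{sign}(x'w - \gamma)$ with $w = A'Du$ and $\gamma = -e'Du$ when $K(G, G') = GG'$. The function names and the XOR-like data are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, Yt, mu):
    """Hypothetical Gaussian kernel K(X, Y')_ij = exp(-mu * ||X_i - Y_j||^2)."""
    Y = Yt.T
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-mu * np.maximum(sq, 0.0))

def lsvm_kernel(A, d, nu, mu, itmax=2000, tol=1e-6):
    """LSVM iteration with Q = I/nu + D K(G, G') D, G = [A -e]; returns (u, G)."""
    m = A.shape[0]
    e = np.ones(m)
    G = np.hstack([A, -e[:, None]])                                   # G = [A -e]
    Q = np.eye(m) / nu + d[:, None] * gaussian_kernel(G, G.T, mu) * d[None, :]
    Q_inv = np.linalg.inv(Q)                  # m x m: no SMW shortcut for this kernel
    alpha = 1.9 / nu
    u = Q_inv @ e
    for _ in range(itmax):
        u_new = Q_inv @ (e + np.maximum((Q @ u - e) - alpha * u, 0.0))
        done = np.linalg.norm(u_new - u) <= tol
        u = u_new
        if done:
            break
    return u, G

def decision(X, G, d, u, mu):
    """K([x' -1], G') D u for each row x' of X; its sign is the predicted class."""
    Xg = np.hstack([X, -np.ones((X.shape[0], 1))])
    return gaussian_kernel(Xg, G.T, mu) @ (d * u)

# Hypothetical XOR-like data: four clusters, not linearly separable.
rng = np.random.default_rng(4)
centers = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
labels = np.array([1.0, 1.0, -1.0, -1.0])
A = np.vstack([c + 0.2 * rng.normal(size=(10, 2)) for c in centers])
d = np.repeat(labels, 10)
u, G = lsvm_kernel(A, d, nu=1.0, mu=1.0)
acc = np.mean(np.sign(decision(A, G, d, u, 1.0)) == d)
print(acc)
```

The constant column of G cancels inside the Gaussian kernel's distances, so here it only matters for kernels such as the linear one.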

Experiments
• Compared LSVM with a standard SVM (SVM-QP) for generalization accuracy and running time
  • CPLEX 6.5 and SVMlight 3.10b
• A tuning set with tenfold cross-validation was used to find appropriate values of ν
• Demonstrated that LSVM performs well on massive problems
  • Data generated with the NDC data generator
• All experiments run on Locop2
  • 400 MHz Pentium II Xeon, 2 gigabytes of memory
  • Windows NT Server 4.0, Visual C++ 6.0

LSVM on UCI Datasets
LSVM is extremely simple to code and performs well.

LSVM on Massive Data
• NDC (Normally Distributed Clusters) data
• This is all accomplished in core, with MATLAB code
• Method is extendable to out-of-core implementations
LSVM classifies massive datasets quickly.

LSVM with Nonlinear Kernels
Nonlinear kernels improve classification accuracy.

Checkerboard Dataset

k-Nearest Neighbor Algorithm

LSVM on Checkerboard
• Early stopping: 100 iterations
• Finished in 58 seconds

LSVM on Checkerboard
• Stronger termination criterion (100,000 iterations)
• 2.85 hours

Conclusions and Future Work
• Conclusions:
  • LSVM is an extremely simple algorithm, expressible in 11 lines of MATLAB code
  • LSVM performs competitively with other well-known SVM solvers for linear kernels
  • Only a single matrix inversion in n+1 dimensions (where n is usually small) is required
  • LSVM can be extended to nonlinear kernels
• Future work:
  • Out-of-core implementation
  • Parallel processing of data
  • Integrating reduced SVM or other methods for reducing the number of columns in the kernel matrix
