
Support Vector Machines (S.V.M.) Special Session


Presentation Transcript


  1. Support Vector Machines (S.V.M.) Special Session. Bernhard Schölkopf & Stéphane Canu. GMD-FIRST / I.N.S.A. - P.S.I. http://svm.first.gmd.de/ http://psichaud.insa-rouen.fr/~scanu/

  2. radial SVM

  3. Road map: linear discrimination, the separable case; linear discrimination, the non-separable case; quadratic discrimination; the radial SVM principle; 3 regularization hyperparameters; some benchmark results (glass data); SVM for regression.

  4. What's new with SVM? Artificial Neural Networks: from biology to machine learning ("it works!"). Support Vector Machines: from maths to machine learning. Some reasons: formalization of learning (statistical learning theory, learning from data); learning = minimization + constraints; universality (learn everything) via the kernel trick; complexity control (but not anything) via the margin.

  5. The kernel trick: functional space
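As a minimal numerical sketch of the kernel trick (the data and the degree-2 polynomial kernel here are illustrative choices, not from the slides): the kernel k(x, y) = (x·y)² computed in input space agrees with an ordinary inner product after an explicit feature map φ, which is what lets the SVM work in a high-dimensional functional space without ever constructing it.

```python
import numpy as np

def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel on R^2."""
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

def k(x, y):
    """The same inner product computed directly in input space."""
    return np.dot(x, y) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(np.dot(phi(x), phi(y)))  # 16.0: inner product in feature space
print(k(x, y))                 # 16.0: same value, phi never computed
```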

  6. Minimization with constraints. L(x, λ): the Lagrangian (Lagrange, 1788)

  7. Minimization with constraints, dual formulation. Phase 1: minimize the Lagrangian over the primal variables; Phase 2: maximize the resulting dual function over the multipliers. A worked toy example follows below.
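The toy problem below illustrates the two phases only; it is a deliberately simple one-dimensional example, not the SVM problem itself. Minimize f(x) = x² subject to x ≥ 1, with Lagrangian L(x, λ) = x² - λ(x - 1) and λ ≥ 0.

```python
import numpy as np

# Toy constrained problem: minimize x^2 subject to x >= 1.
# Lagrangian: L(x, lam) = x^2 - lam * (x - 1), with lam >= 0.

# Phase 1: minimize L over x at fixed lam.
# dL/dx = 2x - lam = 0  ->  x = lam / 2, giving the dual g(lam) = lam - lam^2 / 4.
def dual(lam):
    return lam - lam ** 2 / 4.0

# Phase 2: maximize the dual over lam >= 0 (coarse grid search for clarity).
lams = np.linspace(0.0, 4.0, 401)
lam_star = lams[np.argmax(dual(lams))]
print("lam* =", lam_star, " x* =", lam_star / 2)  # lam* = 2.0, x* = 1.0
```

The maximizing multiplier λ* = 2 gives back x* = 1: the constraint is active at the optimum, exactly as the margin constraints will be active at the support vectors.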

  8. Linear discrimination, the separable case: classify all examples correctly. [Figure: two classes of points separated by the hyperplane wx + b = 0]

  9. Linear discrimination, the separable case: classify all examples correctly, with the largest margin. [Figure: separating hyperplane wx + b = 0 with the margin on each side]

  10. Linear discrimination, the separable case. [Figure: 1-d example with targets y = +1 and y = -1 along the x axis]

  11. Linear discrimination, the separable case. [Figure: the line y = wx crossing the levels y = +1 and y = -1; the margin appears on either side of the crossing points]

  12. Linear discrimination, the separable case: classify all examples correctly, with the largest margin. [Figure: separating hyperplane wx + b = 0 with the margin on each side]

  13. Linear classification: the separable case
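A runnable sketch of the separable case, assuming scikit-learn is available (the synthetic data and the choice of a very large C to emulate the hard-margin formulation are mine):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two linearly separable clouds in R^2
X = np.vstack([rng.normal(-2.0, 1.0, (20, 2)), rng.normal(2.0, 1.0, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# A very large C approximates the hard-margin (separable) formulation
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

print("w =", w, " b =", b)
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
print("number of support vectors:", len(clf.support_))
```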

  14. Equality constraint integration: the stationarity condition and the equality constraint yᵀα = 0 combine into a single linear system,
[ H   y ] [α]   [c]
[ yᵀ  0 ] [b] = [0]
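A small numpy sketch of that linear system (the toy values of H, c, and y below are arbitrary illustrations, and the sketch assumes the KKT matrix is nonsingular):

```python
import numpy as np

def solve_equality_qp(H, c, y):
    """Solve min 1/2 a'Ha - c'a subject to y'a = 0 through the KKT system
    [[H, y], [y', 0]] [a; b] = [c; 0]."""
    n = len(c)
    K = np.block([[H, y[:, None]], [y[None, :], np.zeros((1, 1))]])
    sol = np.linalg.solve(K, np.concatenate([c, [0.0]]))
    return sol[:n], sol[n]  # a (multipliers) and b (offset)

H = np.array([[2.0, 0.5], [0.5, 1.0]])
c = np.array([1.0, 1.0])
y = np.array([1.0, -1.0])
a, b = solve_equality_qp(H, c, y)
print("a =", a, " y'a =", np.dot(y, a))  # y'a is 0 up to rounding
```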

  15. Inequality constraint integration (QP). While (α, μ) do not verify the optimality conditions: solve α = M⁻¹b and μ = -Hα + c + λy; if some αᵢ < 0, a constraint is blocked (set αᵢ = 0, an active variable is eliminated); else if some μᵢ < 0, a constraint is relaxed.

  16. Linear classification: the non-separable case. Error (slack) variables allow some examples to violate the margin.
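A sketch of how the error variables enter the optimization, under stated simplifications (the bias term is dropped so the equality constraint disappears, and plain projected gradient ascent stands in for a real QP solver): in the dual, the primal slacks survive only as the box bound 0 ≤ αᵢ ≤ C.

```python
import numpy as np

def svm_dual_box(X, y, C=1.0, lr=1e-3, steps=5000):
    """Projected-gradient sketch of the soft-margin dual (no bias term):
    max  sum(a) - 1/2 a'Ha   s.t.  0 <= a_i <= C,  H_ij = y_i y_j x_i.x_j."""
    Z = y[:, None] * X
    H = Z @ Z.T
    a = np.zeros(len(y))
    for _ in range(steps):
        a = np.clip(a + lr * (1.0 - H @ a), 0.0, C)  # ascent step, then box projection
    return a

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.5, 1.0, (30, 2)), rng.normal(1.5, 1.0, (30, 2))])
y = np.array([-1.0] * 30 + [1.0] * 30)
a = svm_dual_box(X, y, C=1.0)
w = ((a * y)[:, None] * X).sum(axis=0)  # primal weights recovered from the dual
print("support vectors:", int(np.sum(a > 1e-6)), " w =", w)
```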

  17. quadratic SVM

  18. Polynomial classification: rank(H) = 5, so the n × n Gram matrix is singular as soon as n > 5, and regularization is needed.
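The rank deficiency is easy to reproduce numerically; a sketch (the 1-d data and the degree-4 kernel, whose feature space {1, x, x², x³, x⁴} is 5-dimensional, are illustrative choices matching the rank-5 claim):

```python
import numpy as np

# For the 1-d polynomial kernel k(x, y) = (1 + x*y)^4 the feature space is
# spanned by {1, x, x^2, x^3, x^4}, so any n x n Gram matrix H has rank <= 5:
# singular for n > 5, hence the need for regularization.
rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 20)
H = (1.0 + np.outer(x, x)) ** 4
print("n =", len(x), " rank(H) =", np.linalg.matrix_rank(H))  # rank(H) = 5
```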

  19. Gaussian Kernel based S.V.M.

  20. A 1-d example. Class 1: a mixture of 2 Gaussians; class 2: a Gaussian. [Figure: training set, output of the SVM for the test set, margin, support vectors]

  21. Three regularization parameters: C, the upper bound on the αᵢ; σ, the kernel bandwidth in K(x, y); λ, the linear-system regularization: Hα = b becomes (H + λI)α = b.
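A sketch of the third parameter at work (the value of λ and the toy targets are illustrative): the singular kernel system Hα = b becomes the well-posed (H + λI)α = b.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, 20)
H = (1.0 + np.outer(x, x)) ** 4      # rank-deficient Gram matrix (rank 5)
b = np.sin(np.pi * x)                # illustrative target values

lam = 1e-3                           # lambda: the linear-system regularizer
alpha = np.linalg.solve(H + lam * np.eye(len(x)), b)
print("||H a - b|| =", np.linalg.norm(H @ alpha - b))  # small, controlled by lam
```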

  22. Small bandwidth and large C

  23. Large bandwidth and large C

  24. Large bandwidth and small C
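Slides 22-24 can be reproduced qualitatively with scikit-learn (the noisy 2-d data are mine; note that in the RBF parameterization gamma = 1/(2σ²), so a small bandwidth σ corresponds to a large gamma): a small bandwidth with large C memorizes the training set, while a large bandwidth with small C underfits.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.uniform(-3.0, 3.0, (200, 2))
y = np.sign(X[:, 0] * X[:, 1] + 0.3 * rng.normal(size=200))  # noisy labels

# gamma = 1/(2*sigma^2): small bandwidth sigma <-> large gamma
for gamma, C in [(10.0, 1e3), (0.1, 1e3), (0.1, 1e-2)]:
    clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)
    print(f"gamma={gamma:5.1f}  C={C:8.2f}  "
          f"support vectors={len(clf.support_):3d}  "
          f"train accuracy={clf.score(X, y):.2f}")
```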

  25. SVM for regression

  26. Example...
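A regression example in the same spirit, assuming scikit-learn's SVR (the sine data and parameter values are illustrative): the ε-insensitive tube means points fitted to within ε cost nothing and do not become support vectors, so widening ε sparsifies the solution.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(5)
X = np.sort(rng.uniform(0.0, 4.0, (40, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=40)

# Points inside the epsilon-tube are ignored by the loss:
# larger epsilon, fewer support vectors.
for eps in (0.01, 0.2):
    svr = SVR(kernel="rbf", C=10.0, epsilon=eps).fit(X, y)
    print(f"epsilon={eps:4.2f}  support vectors={len(svr.support_)}")
```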

  27. ε small, and σ small as well

  28. Geostatistics

  29. Another way to see things (Girosi, 97)

  30. SVM history and trends. The pioneers: Vapnik, V. & Lerner, A., 1963 (statistical learning theory); Mangasarian, O., 1965, 1968 (optimization); Kimeldorf, G. & Wahba, G., 1971 (non-parametric regression: splines). The 2nd start (ANN, learning & computers...): Boser, B., Guyon, I. & Vapnik, V., 1992; Bennett, K. & Mangasarian, O., 1992. Trends...
• Optimization: Vapnik; Osuna, E. & Girosi; John C. Platt; Linda Kaufman; Thorsten Joachims
• Applications: on-line handwritten character recognition, face recognition, text mining, ...
• Learning theory: Cortes, C., 1995: soft margin classifier, effective VC-dimensions, other formalisms, ...

  31. Optimization issues: QP with constraints
• Box constraints
• H is positive semidefinite (beware of commercial solvers)
• Size of H! But a lot of the αᵢ are 0 or C
• Active constraint set, starting with α = 0
• Do not compute (or store) the whole H: chunking
• The multiclass issue!

  32. Optimization issues
• Solve the whole problem: commercial solvers: LOQO (primal-dual approach), MINOS, Matlab!!!; Vapnik: Moré and Toraldo (1991)
• Decompose the problem: chunking (Vapnik, 82, 92); Osuna & Girosi (implemented in SVMlight by Thorsten Joachims, 98); Sequential Minimal Optimization (SMO), John C. Platt, 98; no H: start from 0, active set technique (Linda Kaufman, 98)
• Minimize the cost function: 2nd order (Newton); conjugate gradient; projected conjugate gradient (PCG), Burges, 98
• Select the relevant constraints: interior point methods: Moré, 91, Z. Dostal, 97, and others...

  33. Some benchmark considerations (Platt 98)
• Osuna's decomposition technique permits the solution of SVMs via fixed-size QP subproblems
• Using two-variable QP subproblems (SMO) does not require a QP library
• SMO trades off QP time for kernel evaluation time
• Optimizations can dramatically reduce kernel time: linear SVMs (useful for text categorization), sparse dot products, kernel caching (good for smaller problems, Thorsten Joachims, 98)
• SMO can be much faster than other techniques for some problems
• What about active-set and interior-point techniques?

  34. Open issues
• VC entropy for margin classifiers: learning bounds
• Other margin classifiers: boosting
• Non-L2 (non-quadratic) cost functions: sparse coding (Drezet & Harrison)
• Curse of dimensionality: local vs. global
• Kernel influence (Tsuda)
• Applications: classification (Weston & Watkins), ...to regression (Pontil et al.), face detection (Fernandez & Viennet), algorithms (Cristianini & Campbell)
• Making bridges to other formalisms: Bayesian (Kwok), statistical mechanics (Buhot & Gordon), logic (Sebag), ...

  35. Books in Support Vector Research
• V. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, 1995; Statistical Learning Theory. Wiley, 1998.
• Introductory SVM chapters in: S. Haykin, Neural Networks, a Comprehensive Foundation. Macmillan, New York, NY, 1998 (2nd ed.); V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory, and Methods. Wiley, 1998.
• C.J.C. Burges, 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, Vol. 2, Number 2.
• Schölkopf, B., 1997. Support Vector Learning. PhD thesis. Published by R. Oldenbourg Verlag, Munich, 1997. ISBN 3-486-24632-1.
• Smola, A. J., 1998. Learning with Kernels. PhD thesis. Published by GMD, Birlinghoven, 1999.
• NIPS'97 workshop book: B. Schölkopf, C. Burges, A. Smola. Advances in Kernel Methods: Support Vector Machines, MIT Press, Cambridge, MA, December 1998.
• The NIPS'98 workshop book on large margin classifiers... is coming

  36. Events in Support Vector Research
• ACAI '99 workshop: Support Vector Machine Theory and Applications
• Workshop on Support Vector Machines, IJCAI '99, August 2, 1999, Stockholm, Sweden
• EUROCOLT '99 workshop on Kernel Methods, March 27, 1999, Nordkirchen Castle, Germany

  37. Conclusion
• SVMs select relevant patterns in a robust way: svm.cs.rhbnc.ac.uk
• Matlab code available upon request: scanu@insa-rouen.fr
• Multiclass problems
• Small error
