
Locally Linear Support Vector Machines

Presentation Transcript


  1. Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

  2. Binary classification task Given: { [x_1, y_1], [x_2, y_2], …, [x_n, y_n] }, where x_i is a feature vector and y_i ∈ {−1, 1} is its label. The task is to design y = H(x) that predicts a label y for a new x.

  3. Binary classification task Given: { [x_1, y_1], [x_2, y_2], …, [x_n, y_n] }, where x_i is a feature vector and y_i ∈ {−1, 1} is its label. The task is to design y = H(x) that predicts a label y for a new x. Several approaches have been proposed: Support Vector Machines, Boosting, Random Forests, Neural Networks, Nearest Neighbour, …

  4. Linear vs. Kernel SVMs Linear SVMs: • Fast training and evaluation • Applicable to large-scale data sets • Low discriminative power Kernel SVMs: • Slow training and evaluation • Not feasible for very large data sets • Much better performance on hard problems

  5. Motivation The goal is to design an SVM with: • A good trade-off between performance and speed • Scalability to large-scale data sets (solvable using SGD)

  6. Local codings Points are approximated as a weighted sum of anchor points: x ≈ Σ_v γ_v(x) v, where v are the anchor points and γ_v(x) are the local coordinates.

  7. Local codings Points are approximated as a weighted sum of anchor points: x ≈ Σ_v γ_v(x) v. Coordinates obtained using: • Distance-based methods (van Gemert et al. 2008, Zhou et al. 2009) • Reconstruction methods (Roweis et al. 2000, Yu et al. 2009, Gao et al. 2010)

  8. Local codings For a normalised coding (Σ_v γ_v(x) = 1) and any Lipschitz function f: f(x) ≈ Σ_v γ_v(x) f(v) (Yu et al. 2009), where v are the anchor points and γ_v(x) the local coordinates.
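
A minimal Python sketch of local coding, using the weighted-inverse-distance kNN scheme mentioned on the experiments slide (the k value, the epsilon guard and the test function are illustrative assumptions), followed by a check of the approximation property above:

```python
import numpy as np

def local_coding(x, anchors, k=8):
    """Local coordinates gamma_v(x): approximate x as a weighted sum of its
    k nearest anchor points, with inverse-distance weights normalised so
    that sum_v gamma_v(x) = 1 (the epsilon guard is an illustrative detail)."""
    d = np.linalg.norm(anchors - x, axis=1)      # distance to every anchor
    nn = np.argsort(d)[:k]                       # k nearest anchors
    w = 1.0 / (d[nn] + 1e-12)                    # inverse-distance weights
    gamma = np.zeros(len(anchors))
    gamma[nn] = w / w.sum()                      # normalise the coding
    return gamma

# Check of the approximation property: f(x) ~ sum_v gamma_v(x) f(v)
rng = np.random.default_rng(0)
anchors = rng.uniform(-1, 1, size=(100, 2))      # v - anchor points
x = np.array([0.3, -0.2])
gamma = local_coding(x, anchors)
f = lambda p: np.sin(p[..., 0]) + 0.5 * p[..., 1]  # a smooth (Lipschitz) function
print(f(x), gamma @ f(anchors))                  # the two values should be close
```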

  9. Linear SVMs The classifier takes the form H(x) = w·x + b. Weights w and bias b are obtained as: [w, b] = argmin_{w,b} λ/2 ‖w‖² + 1/S Σ_k max(0, 1 − y_k (w·x_k + b)) (Vapnik & Lerner 1963, Cortes & Vapnik 1995), where [x_k, y_k] are the training samples, λ is the regularisation weight and S is the number of samples.
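
A minimal stochastic-subgradient trainer for this objective (the 1/(λt) step size is a Pegasos-style assumption; the slide only states the objective):

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-4, epochs=10, seed=0):
    """Minimise  lam/2 ||w||^2 + (1/S) sum_k max(0, 1 - y_k (w.x_k + b))
    by stochastic subgradient descent over single samples."""
    rng = np.random.default_rng(seed)
    S, D = X.shape
    w, b, t = np.zeros(D), 0.0, 0
    for _ in range(epochs):
        for k in rng.permutation(S):
            t += 1
            eta = 1.0 / (lam * t)            # assumed Pegasos-style step size
            margin = y[k] * (w @ X[k] + b)
            w *= 1 - eta * lam               # shrinkage from the regulariser
            if margin < 1:                   # hinge loss is active
                w += eta * y[k] * X[k]
                b += eta * y[k]
    return w, b
```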

  10–12. Locally Linear SVMs The decision boundary should be smooth • Approximately linear in a sufficiently small region

  13–14. Locally Linear SVMs The decision boundary should be smooth • Approximately linear in a sufficiently small region • Curvature is bounded • Functions w_i(x) and b(x) are Lipschitz • w_i(x) and b(x) can be approximated using local coding

  15. Locally Linear SVMs The classifier takes the form: H(x) = γ(x)^T W x + γ(x)^T b = Σ_v γ_v(x) (w_v·x + b_v). Weights W and biases b are obtained as: [W, b] = argmin_{W,b} λ/2 ‖W‖² + 1/S Σ_k max(0, 1 − y_k (γ(x_k)^T W x_k + γ(x_k)^T b)), where [x_k, y_k] are the training samples, λ the regularisation weight and S the number of samples.

  16. Locally Linear SVMs Optimised using stochastic gradient descent (Bordes et al. 2005); a sketch of the update loop follows.
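
A sketch of the training loop, assuming the classifier and objective from slide 15; the step-size schedule and zero initialisation are assumptions in the spirit of the SGD solvers cited:

```python
import numpy as np

def train_llsvm(X, y, anchors, coding, lam=1e-4, epochs=10, seed=0):
    """SGD for the locally linear SVM H(x) = gamma(x)^T (W x + b).
    `coding` maps a sample to its local coordinates gamma(x)."""
    rng = np.random.default_rng(seed)
    S, D = X.shape
    W = np.zeros((len(anchors), D))          # one local hyperplane per anchor
    b = np.zeros(len(anchors))
    t = 0
    for _ in range(epochs):
        for k in rng.permutation(S):
            t += 1
            eta = 1.0 / (lam * t)            # assumed step-size schedule
            g = coding(X[k])                 # gamma(x_k)
            margin = y[k] * (g @ (W @ X[k] + b))
            W *= 1 - eta * lam               # shrinkage from the regulariser
            if margin < 1:                   # hinge loss is active
                W += eta * y[k] * np.outer(g, X[k])
                b += eta * y[k] * g
    return W, b

def predict(x, W, b, coding):
    return np.sign(coding(x) @ (W @ x + b))
```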

  17. Relation to other models • Generalisation of the linear SVM on x: representable by W = (w w ..)^T and b = (b′ b′ ..)^T • Generalisation of the linear SVM on γ: representable by W = 0 and b = w • Generalisation of the model-selecting Latent (MI)-SVM: representable by γ = (0, 0, .., 1, .., 0)
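
A quick numerical check of the first reduction: when every row of W equals w and every entry of b equals b′, the LL-SVM score coincides with the linear SVM score for any normalised coding, because Σ_v γ_v(x) = 1:

```python
import numpy as np

rng = np.random.default_rng(1)
D, V = 5, 10
w, b_lin = rng.normal(size=D), 0.7      # an ordinary linear SVM (w, b')
x = rng.normal(size=D)
gamma = rng.random(V)
gamma /= gamma.sum()                    # any normalised coding

W = np.tile(w, (V, 1))                  # W = (w w ..)^T
b = np.full(V, b_lin)                   # b = (b' b' ..)^T
assert np.isclose(gamma @ (W @ x + b), w @ x + b_lin)
```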

  18. Extension to finite kernels The finite kernel classifier takes the form H(x) = γ(x)^T W Φ(x) + γ(x)^T b, where Φ(x) = [K(x, z_1), .., K(x, z_N)] and K is the kernel function. Weights W and b are obtained as in the linear case, with Φ(x_k) in place of x_k.
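
A sketch of the kernel feature map implied here (the landmark points z_j are an assumption of this sketch; the slide does not say how the expansion points are chosen). Training can then reuse the LL-SVM loop above with Φ(x_k) in place of x_k:

```python
import numpy as np

def kernel_features(x, landmarks, K):
    """Finite kernel expansion Phi(x) = [K(x, z_1), .., K(x, z_N)].
    The choice of landmark points z_j is an illustrative assumption."""
    return np.array([K(x, z) for z in landmarks])

# Histogram intersection kernel, as used on the experiments slide:
intersection = lambda a, b: np.minimum(a, b).sum()
```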

  19. Experiments MNIST, LETTER & USPS datasets: • Anchor points obtained using K-means clustering • Coordinates evaluated on kNN (k = 8) (the slow part) • Coordinates obtained using weighted inverse distance • Raw data used CALTECH-101 (15 training samples per class): • Coordinates evaluated on kNN (k = 5) • Approximated intersection kernel used (Vedaldi & Zisserman 2010) • Spatial pyramid of BOW features • Coordinates evaluated based on a histogram over the whole image
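
A hypothetical end-to-end setup mirroring the MNIST/LETTER/USPS protocol, assuming scikit-learn's KMeans for the anchor points and reusing the local_coding and train_llsvm sketches above (the data and labels below are placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(2).normal(size=(1000, 64))  # placeholder features
y = np.sign(X[:, 0])                                  # placeholder labels
anchors = KMeans(n_clusters=100, n_init=10).fit(X).cluster_centers_
coding = lambda x: local_coding(x, anchors, k=8)      # inverse-distance kNN
W, b = train_llsvm(X, y, anchors, coding)
```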

  20. Experiments • MNIST

  21. Experiments • UIUC / LETTER • Caltech-101

  22. Conclusions • We propose a novel Locally Linear SVM formulation with: • a good trade-off between speed and performance • scalability to large-scale data sets • an easy implementation The optimal way of learning anchor points is an open question. Questions?
