
Multiple Kernel Learning




Presentation Transcript


  1. Multiple Kernel Learning Manik Varma Microsoft Research India

  2. A Quick Review of SVMs • Margin = $2/\sqrt{w^T w}$ • Slack $\xi > 1$: misclassified point • $0 < \xi < 1$: support vector violating the margin • $\xi = 0$: support vector on the margin • Decision boundary $w^T x + b = 0$ with margin hyperplanes $w^T x + b = \pm 1$

  3. The C-SVM Primal and Dual • Primal: $P^\ast = \min_{w,\xi,b}\ \tfrac{1}{2} w^T w + C\,\mathbf{1}^T \xi$ • s.t. $Y(X^T w + b\mathbf{1}) \ge \mathbf{1} - \xi$ and $\xi \ge 0$ • Dual: $D^\ast = \max_{\alpha}\ \mathbf{1}^T \alpha - \tfrac{1}{2} \alpha^T Y K Y \alpha$ • s.t. $\mathbf{1}^T Y \alpha = 0$ and $0 \le \alpha \le C$
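A minimal sketch (not from the slides, on synthetic toy data) of training a C-SVM on a precomputed linear Gram matrix with scikit-learn; the recovered dual coefficients respect the box constraint $0 \le \alpha \le C$ from the dual above.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data: two Gaussian blobs, labels in {-1, +1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (20, 2)), rng.normal(+1, 0.5, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

K = X @ X.T                       # linear kernel Gram matrix K = X X^T
svm = SVC(kernel="precomputed", C=1.0)
svm.fit(K, y)

# dual_coef_ holds y_i * alpha_i for the support vectors,
# so 0 <= alpha_i <= C as required by the dual constraints.
alpha = np.abs(svm.dual_coef_).ravel()
print("support vectors:", svm.support_.size)
print("0 <= alpha <= C:", bool(np.all((alpha >= 0) & (alpha <= 1.0 + 1e-9))))
```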

  4. Duality • Primal: $P^\ast = \min_x f_0(x)$ • s.t. $f_i(x) \le 0$ for $1 \le i \le N$ and $h_i(x) = 0$ for $1 \le i \le M$ • Lagrangian: $L(x,\lambda,\nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)$ • Dual: $D^\ast = \max_{\lambda,\nu} \min_x L(x,\lambda,\nu)$ • s.t. $\lambda \ge 0$

  5. Duality • The Lagrange dual is always concave (even if the primal is not convex) and might be an easier problem to optimize • Weak duality: $P^\ast \ge D^\ast$ • Always holds • Strong duality: $P^\ast = D^\ast$ • Does not always hold • Usually holds for convex problems • Holds for the SVM QP
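For concreteness, a one-variable worked example of the construction (chosen here for illustration, not taken from the slides): minimize $x^2$ subject to $x \ge 1$.

```latex
\[
\begin{aligned}
P^\ast &= \min_x\, x^2 \quad \text{s.t. } 1 - x \le 0, \\
L(x,\lambda) &= x^2 + \lambda(1 - x), \\
g(\lambda) &= \min_x L(x,\lambda) = \lambda - \tfrac{\lambda^2}{4}
  \quad (\text{attained at } x = \lambda/2), \\
D^\ast &= \max_{\lambda \ge 0} g(\lambda) = 1 \quad (\text{at } \lambda^\ast = 2),
\end{aligned}
\]
```

So $P^\ast = D^\ast = 1$ and strong duality holds, as expected for this convex problem.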

  6. Karush-Kuhn-Tucker (KKT) Conditions • If strong duality holds, then for $x^\ast$, $\lambda^\ast$ and $\nu^\ast$ to be optimal the following KKT conditions must necessarily hold • Primal feasibility: $f_i(x^\ast) \le 0$ and $h_i(x^\ast) = 0$ for all $i$ • Dual feasibility: $\lambda^\ast \ge 0$ • Stationarity: $\nabla_x L(x^\ast, \lambda^\ast, \nu^\ast) = 0$ • Complementary slackness: $\lambda_i^\ast f_i(x^\ast) = 0$ • Conversely, if a point satisfies the KKT conditions for a convex problem then it is optimal
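The toy problem above makes the KKT conditions easy to verify numerically; this check is illustrative only.

```python
import numpy as np

# Toy problem from the duality example: min x^2  s.t.  1 - x <= 0,
# with the analytic optimum x* = 1 and multiplier lambda* = 2.
x_star, lam_star = 1.0, 2.0

f1 = 1.0 - x_star                     # constraint value f_1(x*)
grad_L = 2.0 * x_star - lam_star      # d/dx [x^2 + lam*(1 - x)]

print("primal feasibility :", f1 <= 0)            # f_1(x*) <= 0
print("dual feasibility   :", lam_star >= 0)      # lambda* >= 0
print("stationarity       :", np.isclose(grad_L, 0.0))
print("compl. slackness   :", np.isclose(lam_star * f1, 0.0))
```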

  7. Some Popular Kernels • Linear: $K(x_i,x_j) = x_i^T \Sigma^{-1} x_j$ • Polynomial: $K(x_i,x_j) = (x_i^T \Sigma^{-1} x_j + c)^d$ • Gaussian (RBF): $K(x_i,x_j) = \exp(-\sum_k \gamma_k (x_{ik} - x_{jk})^2)$ • Chi-squared: $K(x_i,x_j) = \exp(-\gamma\,\chi^2(x_i, x_j))$ • Sigmoid: $K(x_i,x_j) = \tanh(\gamma\,x_i^T x_j - c)$ • $\Sigma$ should be positive definite, $c \ge 0$, $\gamma \ge 0$ and $d$ should be a natural number
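A minimal sketch of these kernels in Python, taking $\Sigma$ to be the identity for the linear and polynomial cases; the inputs are made-up toy vectors.

```python
import numpy as np

def linear(xi, xj):                       # x_i^T x_j  (Sigma = I)
    return xi @ xj

def polynomial(xi, xj, c=1.0, d=2):       # (x_i^T x_j + c)^d
    return (xi @ xj + c) ** d

def gaussian(xi, xj, gamma):              # exp(-sum_k gamma_k (x_ik - x_jk)^2)
    return np.exp(-np.sum(gamma * (xi - xj) ** 2))

def chi_squared(xi, xj, gamma=1.0, eps=1e-12):
    # exp(-gamma * chi^2(x_i, x_j)) for non-negative histogram features
    chi2 = np.sum((xi - xj) ** 2 / (xi + xj + eps))
    return np.exp(-gamma * chi2)

xi, xj = np.array([0.2, 0.8]), np.array([0.5, 0.5])
print(linear(xi, xj), polynomial(xi, xj),
      gaussian(xi, xj, gamma=np.ones(2)), chi_squared(xi, xj))
```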

  8. Advantages of Learning the Kernel • Improve accuracy and generalization • Learn an RBF kernel: $K(x_i,x_j) = \exp(-\gamma \sum_k (x_{ik} - x_{jk})^2)$

  9. Kernel Parameter Setting - Underfitting

  10. Kernel Parameter Setting

  11. Kernel Parameter Setting – Overfitting

  12. Advantages of Learning the Kernel • Improve accuracy and generalization • Learn an RBF kernel: $K(x_i,x_j) = \exp(-\gamma \sum_k (x_{ik} - x_{jk})^2)$ • Test error as a function of $\gamma$
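The underfitting/overfitting behaviour sketched on the preceding slides can be reproduced with a simple sweep over $\gamma$; the dataset and values below are illustrative stand-ins.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Small gamma underfits (decision boundary too smooth); large gamma
# overfits (boundary wraps around individual training points).
for g in [0.01, 0.1, 1, 10, 100, 1000]:
    err = 1 - SVC(kernel="rbf", gamma=g, C=1.0).fit(Xtr, ytr).score(Xte, yte)
    print(f"gamma = {g:<6}  test error = {err:.3f}")
```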

  13. Advantages of Learning the Kernel • Perform non-linear feature selection • Learn an RBF kernel: $K(x_i,x_j) = \exp(-\sum_k \gamma_k (x_{ik} - x_{jk})^2)$ • Perform non-linear dimensionality reduction • Learn $K(Px_i, Px_j)$ where $P$ is a low dimensional projection matrix with learnable parameters • These are optimized for the task at hand, such as classification, regression, ranking, etc.

  14. Advantages of Learning the Kernel • Multiple Kernel Learning • Learn a linear combination of given base kernels • $K(x_i,x_j) = \sum_k d_k K_k(x_i,x_j)$ • Can be used to combine heterogeneous sources of data • Can be used for descriptor (feature) selection
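In code, the combined kernel is just a weighted sum of Gram matrices; a minimal sketch with made-up base kernels (non-negative weights keep the combination positive semidefinite).

```python
import numpy as np

def combine_kernels(kernels, d):
    """K = sum_k d_k K_k for a list of precomputed Gram matrices."""
    d = np.asarray(d, dtype=float)
    assert np.all(d >= 0), "kernel weights are typically constrained to be >= 0"
    return sum(dk * Kk for dk, Kk in zip(d, kernels))

# Two toy base kernels on the same 3 points.
X = np.array([[0.0], [1.0], [2.0]])
K1 = X @ X.T                                  # linear kernel
K2 = np.exp(-(X - X.T) ** 2)                  # RBF kernel, gamma = 1
K = combine_kernels([K1, K2], d=[0.3, 0.7])
print(K)
```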

  15. MKL – Geometric Interpretation • MKL learns a linear combination of base kernels • $K(x_i,x_j) = \sum_k d_k K_k(x_i,x_j)$ • [Figure: the combined feature space formed from the scaled base feature maps $d_1\phi_1$, $d_2\phi_2$ and $d_3\phi_3$]

  16. MKL – Toy Example • Suppose we're given a simplistic 1D shape feature $s$ for a binary classification problem • Define a linear shape kernel: $K_s(s_i,s_j) = s_i s_j$ • The classification accuracy is 100% but the margin is very small

  17. MKL – Toy Example • Suppose we're now given an additional 1D colour feature $c$ • Define a linear colour kernel: $K_c(c_i,c_j) = c_i c_j$ • The classification accuracy is also 100% but the margin remains very small

  18. MKL – Toy Example • MKL learns a combined shape-colour feature space • $K(x_i,x_j) = d\,K_s(x_i,x_j) + (1 - d)\,K_c(x_i,x_j)$ • [Figure: the learnt feature space as $d$ varies from 0 (colour only) to 1 (shape only)]
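A rough reconstruction of the toy experiment, with made-up 1-D shape and colour features; the margin is read off the learned dual coefficients via $\|w\|^2 = \sum_{ij} \alpha_i \alpha_j y_i y_j K(x_i,x_j)$.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical noisy 1-D shape (s) and colour (c) features; each alone
# separates the two classes, but neither gives a large margin.
rng = np.random.default_rng(1)
n = 40
y = np.where(np.arange(n) < n // 2, -1, 1)
s = y + rng.normal(0, 0.45, n)           # noisy shape feature
c = -y + rng.normal(0, 0.45, n)          # noisy colour feature

Ks, Kc = np.outer(s, s), np.outer(c, c)  # linear base kernels
for d in [0.0, 0.25, 0.5, 0.75, 1.0]:
    K = d * Ks + (1 - d) * Kc
    svm = SVC(kernel="precomputed", C=10.0).fit(K, y)
    dc = svm.dual_coef_.ravel()          # y_i * alpha_i on the support vectors
    w2 = dc @ K[np.ix_(svm.support_, svm.support_)] @ dc
    print(f"d = {d:.2f}  margin = {2 / np.sqrt(w2):.3f}")
```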

  19. MKL – Toy Example

  20. MKL – Another Toy Example • MKL learns a combined shape-colour feature space • $K(x_i,x_j) = d\,K_s(x_i,x_j) + (1 - d)\,K_c(x_i,x_j)$ • [Figure: the learnt feature space as $d$ varies from 0 (colour only) to 1 (shape only)]

  21. MKL – Another Toy Example

  22. Object Categorization • [Figure: example images labelled Chair, Schooner, Ketch, Taj and Panda, plus a query image marked "?"]

  23. The Caltech 101 Database • Database collected by Fei-Fei et al. [PAMI 2006]

  24. The Caltech 101 Database – Chairs

  25. The Caltech 101 Database – Bikes

  26. Caltech 101 – Features and Kernels • Features • Geometric Blur [Berg and Malik, CVPR 01] • PHOW Gray & Colour [Lazebnik et al., CVPR 06] • Self Similarity [Shechtman and Irani, CVPR 07] • Kernels • RBF for Geometric Blur • $K(x_i,x_j) = \exp(-\gamma\,\chi^2(x_i,x_j))$ for the rest
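scikit-learn ships the exponentiated $\chi^2$ kernel used here; a quick sketch on random, L1-normalised histogram stand-ins for the actual descriptors.

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel

rng = np.random.default_rng(0)
H = rng.random((5, 300))
H /= H.sum(axis=1, keepdims=True)   # L1-normalised histograms (e.g. PHOW counts)

# K(x_i, x_j) = exp(-gamma * sum_k (x_ik - x_jk)^2 / (x_ik + x_jk))
K = chi2_kernel(H, gamma=1.0)
print(K.shape, K.diagonal())        # the diagonal is exactly 1
```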

  27. Caltech 101 – Experimental Setup • 102 categories, including Background_Google and Faces_easy • Two protocols: 15 training and 15 test images per category, or 30 training and up to 15 test images per category • Results summarized over 3 random train/test splits

  28. Caltech 101 – MKL Results

  29. Caltech 101 – Comparisons

  30. Caltech 101 – Over Fitting?

  31. Caltech 101 – Over Fitting?

  32. Caltech 101 – Over Fitting?

  33. Caltech 101 – Over Fitting?

  34. Caltech 101 – Over Fitting?

  35. Wikipedia MM Subset • Experimental Setup • 33 topics chosen, each with more than 60 images • Ntrain = [10, 15, 20, 25, 30] • The remaining images are used for testing • Features • PHOG 180 & 360 • Self Similarity • PHOW Gray & Colour • Gabor filters • Kernels • Pyramid Match Kernel & Spatial Pyramid Kernel

  36. Wikipedia MM Subset • LMKL [Gonen and Alpaydin, ICML 08] • GS-MKL [Yang et al., ICCV 09]

  37. Feature Selection for Gender Identification • FERET faces [Moghaddam and Yang, PAMI 2002] • [Figure: example male and female face images]

  38. Feature Selection for Gender Identification • Experimental setup • 1053 training and 702 testing images • We define an RBF kernel per pixel (252 kernels) • Results summarized over 3 random train/test splits
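One plausible way to set this up (a sketch, not the authors' exact pipeline): build one single-pixel RBF Gram matrix per input dimension, so that learning the kernel weights amounts to pixel selection.

```python
import numpy as np

def per_pixel_rbf_kernels(X, gamma=1.0):
    """One RBF Gram matrix per input dimension (pixel).

    X has shape (n_images, n_pixels); returns a list of n_pixels
    (n_images, n_images) Gram matrices, one candidate kernel per pixel.
    """
    kernels = []
    for k in range(X.shape[1]):
        diff = X[:, k:k + 1] - X[:, k:k + 1].T   # pairwise pixel differences
        kernels.append(np.exp(-gamma * diff ** 2))
    return kernels

X = np.random.default_rng(0).random((6, 252))    # 6 images, 252 pixels each
Ks = per_pixel_rbf_kernels(X)
print(len(Ks), Ks[0].shape)                      # 252 kernels of shape (6, 6)
```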

  39. Feature Selection Results • Uniform MKL = 92.6 ± 0.9 • Uniform GMKL = 94.3 ± 0.1

  40. Localize a specified object of interest if it exists in a given image Object Detection

  41. The PASCAL VOC Challenge Database

  42. PASCAL VOC Database – Cars

  43. PASCAL VOC Database – Dogs

  44. PASCAL VOC 2009 Database Statistics

  45. Bird Detection By Classification • Detect by classifying every image window at every position, orientation and scale • The number of windows in an image runs into the hundreds of millions • Even if we could classify a window in a second, it would take many days to detect a single object in an image • [Figure: candidate windows labelled "Bird" / "No Bird"]

  46. Fast Detection Via a Cascade • [Figure: pipeline from features (PHOW Gray, PHOW Colour, PHOG, PHOG Sym, Visual Words, Self Similarity) to a feature vector, then a fast linear SVM with jumping windows, a quasi-linear SVM, and a non-linear SVM]

  47. MKL Detection Overview • First stage • Linear SVM • Jumping windows / branch and bound • Time = O(#Windows) • Second stage • Quasi-linear SVM • $\chi^2$ kernel • Time = O(#Windows × #Dims) • Third stage • Non-linear SVM • Exponential $\chi^2$ kernel • Time = O(#Windows × #Dims × #SVs)
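The cascade idea in sketch form, with hypothetical stand-in scorers: each stage is more expensive per window but sees far fewer windows than the one before.

```python
def cascade_detect(windows, stages, thresholds):
    """Pass windows through increasingly expensive scorers.

    `stages` is a list of scoring functions ordered cheap -> expensive
    (e.g. linear SVM, chi^2 SVM, exponential-chi^2 SVM); a window is
    dropped at the first stage whose score falls below the threshold.
    """
    survivors = windows
    for score, thresh in zip(stages, thresholds):
        survivors = [w for w in survivors if score(w) >= thresh]
        if not survivors:
            break
    return survivors

# Toy usage with stand-in scorers over 1-D "windows".
windows = list(range(10))
stages = [lambda w: w, lambda w: w % 3, lambda w: w % 2]
print(cascade_detect(windows, stages, thresholds=[5, 1, 1]))   # -> [5, 7]
```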

  48. PASCAL VOC Evaluation • Predictions are evaluated using precision-recall curves based on bounding box overlap • Area Overlap = $|B_{gt} \cap B_p| \,/\, |B_{gt} \cup B_p|$ • Valid prediction if Area Overlap > ½ • [Figure: ground truth box $B_{gt}$, predicted box $B_p$ and their overlap region]
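The overlap criterion is intersection-over-union; a minimal implementation for boxes given as (x1, y1, x2, y2).

```python
def area_overlap(b_gt, b_p):
    """|B_gt intersect B_p| / |B_gt union B_p| for (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(b_gt[0], b_p[0]), max(b_gt[1], b_p[1])
    ix2, iy2 = min(b_gt[2], b_p[2]), min(b_gt[3], b_p[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(b_gt) + area(b_p) - inter
    return inter / union

# A prediction counts as valid only if the overlap exceeds 0.5:
print(area_overlap((0, 0, 10, 10), (5, 0, 15, 10)) > 0.5)   # 50/150 -> False
```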

  49. Some Examples of MKL Detections

  50. Some Examples of MKL Detections
