
Efficient Large-Scale Structured Learning






Presentation Transcript


  1. Efficient Large-Scale Structured Learning Steve Branson (Caltech), Oscar Beijbom (UC San Diego), Serge Belongie (UC San Diego) CVPR 2013, Portland, Oregon

  2. Overview • Structured prediction • Learning from larger datasets [Figure: example applications — tiny images, deformable part models, object detection, cost-sensitive learning over a mammal taxonomy (primate vs. hoofed mammal; orangutan, gorilla; odd-toed, even-toed)]

  3. Overview • Available tools for structured learning are not as refined as tools for binary classification • 2 sources of speed improvement: • Faster stochastic dual optimization algorithms • Application-specific importance sampling routine

  4. Summary • Usually, train time = 1-10 times test time • Publicly available software package • Fast algorithms for multiclass SVMs, DPMs • API to adapt to new applications • Support for datasets too large to fit in memory • Network interface for online & active learning

  5. Summary Cost-sensitive multiclass SVM • 10-50 times faster than SVMstruct • As fast as 1-vs-all binary SVM Deformable part models • 50-1000 times faster than SVMstruct, mining hard negatives, and SGD-PEGASOS

  6. Binary vs. Structured [Diagram: a structured dataset is reduced to a binary dataset, fed to a binary learner (SVM, boosting, logistic regression, etc.), and mapped back to structured output (object detection, pose registration, attribute prediction, etc.)]

  7. Binary vs. Structured • Pros: the binary classifier is application independent • Cons: what is lost in terms of: • Accuracy at convergence? • Computational efficiency?

  8. Binary vs. Structured [Plot: binary loss and its convex upper bound vs. the structured prediction loss]

  9. Binary vs. Structured [Plot: binary loss, its convex upper bound, and a convex upper bound on the structured prediction loss]

  10. Binary vs. Structured Application-specific optimization algorithms that: • Converge to lower test error than binary solutions • Lower test error for all amounts of train time

  11. Binary vs. Structured Application-specific optimization algorithms that: • Converge to lower test error than binary solutions • Lower test error for all amounts of train time

  12. Structured SVM • SVMs w/ structured output • Max-margin MRF [Taskar et al. NIPS'03] [Tsochantaridis et al. ICML'04]
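The slide's equation did not survive extraction; the margin-rescaled structured SVM objective from the cited works is commonly written in its unconstrained hinge form as (a reconstruction in standard notation, not the slide's exact formula):

```latex
\min_{w} \; \frac{\lambda}{2}\|w\|^2
  + \frac{1}{n}\sum_{i=1}^{n} \max_{\bar{y} \in \mathcal{Y}}
    \Big[ \Delta(y_i, \bar{y})
        + w^\top \Phi(x_i, \bar{y})
        - w^\top \Phi(x_i, y_i) \Big]
```

Here \(\Delta\) is the task loss and \(\Phi\) the joint feature map; the inner maximization is the loss-augmented inference problem that the later slides speed up with importance sampling.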

  13. Binary SVM Solvers Quadratic to linear in trainset size

  14. Binary SVM Solvers Quadratic to linear in trainset size Linear to independent in trainset size

  15. Binary SVM Solvers • Faster on multiple passes • Detect convergence • Less sensitive to regularization/learning rate Quadratic to linear in trainset size Linear to independent in trainset size

  16. Structured SVM Solvers Applied to SSVMs [Ratliff et al. AIStats’07] [Shalev-Shwartz et al. JMLR’13]

  17. Our Approach • Use faster stochastic dual algorithms • Incorporate an application-specific importance sampling routine • Reduce train times when prediction time T is large • Incorporate tricks people use for binary methods [Diagram: pick a random example, importance sample, maximize the dual SSVM objective w.r.t. the samples]

  18. Our Approach For t = 1… do • Choose a random training example (Xi, Yi) • Ȳ1, …, ȲK ← ImportanceSample(Xi, Yi) • Approximately maximize the dual SSVM objective w.r.t. example i end (Provably fast convergence for a simple approximate solver)
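The loop on this slide can be sketched as follows. This is a minimal stand-in, not the paper's solver: the function names (`feature_fn`, `loss_fn`, `importance_sample`) are hypothetical, and the "approximate dual maximization" step is replaced by a single PEGASOS-style subgradient step on the worst loss-augmented candidate.

```python
import numpy as np

def train_ssvm(examples, feature_fn, loss_fn, importance_sample,
               dim, num_epochs=5, lam=0.01, seed=0):
    """Sketch of the slide's training loop (hypothetical API).

    Each step draws a random example, asks the application-specific
    importance sampler for candidate outputs, and takes a subgradient
    step on the most violated (loss-augmented) candidate.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    t = 0
    for _ in range(num_epochs):
        for i in rng.permutation(len(examples)):
            t += 1
            x, y_true = examples[i]
            candidates = importance_sample(w, x, y_true)

            # Loss-augmented margin violation of a candidate output.
            def aug(y):
                return loss_fn(y_true, y) + w @ (feature_fn(x, y)
                                                 - feature_fn(x, y_true))

            y_hat = max(candidates, key=aug)
            violation = aug(y_hat)
            eta = 1.0 / (lam * t)          # PEGASOS-style step size
            w *= 1.0 - eta * lam           # regularization shrinkage
            if violation > 0:
                w -= eta * (feature_fn(x, y_hat) - feature_fn(x, y_true))
    return w
```

For multiclass classification with an importance sampler that returns all classes, this reduces to an ordinary stochastic multiclass SVM.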

  19. Recent Papers w/ Similar Ideas • Augmenting cutting plane SSVM w/ m-best solutions • Applying stochastic dual methods to SSVMs A. Guzman-Rivera, P. Kohli, D. Batra. "DivMCuts…" AISTATS'13. S. Lacoste-Julien, et al. "Block-Coordinate Frank-Wolfe…" JMLR'13.

  20. Applying to New Problems 1. Define a loss function 2. Implement a feature extraction routine 3. Implement an importance sampling routine
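The three items above form the plugin surface of the package's API. A minimal sketch of such an interface, with hypothetical names (the actual C++ API will differ):

```python
from abc import ABC, abstractmethod

class StructuredProblem(ABC):
    """Hypothetical plugin interface mirroring the slide's three items."""

    @abstractmethod
    def loss(self, y_true, y):
        """Task loss Delta(y_true, y) >= 0, zero iff y == y_true."""

    @abstractmethod
    def features(self, x, y):
        """Joint feature vector Phi(x, y)."""

    @abstractmethod
    def importance_sample(self, w, x, y_true):
        """Return a small set of high-loss, high-scoring candidate outputs."""
```

A learner written against this interface never needs to know whether the outputs are class labels, bounding boxes, or part configurations.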

  21. Applying to New Problems 3. Implement an importance sampling routine • It should be fast • It should favor samples with high loss plus high score • It should favor samples with uncorrelated features (small inner product between candidate feature vectors)

  22. Example: Object Detection 1. Loss function 2. Features 3. Importance sampling routine • Add sliding-window scores & loss into a dense score map • Greedy NMS
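The detection sampler above can be sketched in a few lines: given a dense (loss-augmented) sliding-window score map, greedily pick the highest-scoring locations and suppress a neighborhood around each pick so the returned windows are diverse. Function name and parameters are hypothetical.

```python
import numpy as np

def greedy_nms_sample(score_map, num_samples=5, suppress_radius=2):
    """Greedy non-maximum suppression over a 2-D score map.

    Returns up to num_samples (row, col) locations, best first, with a
    square neighborhood of each pick suppressed before the next pick.
    """
    s = score_map.astype(float).copy()
    picks = []
    for _ in range(num_samples):
        r, c = np.unravel_index(np.argmax(s), s.shape)
        if not np.isfinite(s[r, c]):
            break                      # everything left is suppressed
        picks.append((int(r), int(c)))
        r0 = max(0, r - suppress_radius)
        c0 = max(0, c - suppress_radius)
        s[r0:r + suppress_radius + 1, c0:c + suppress_radius + 1] = -np.inf
    return picks
```

In the paper's setting the map would hold loss-augmented scores, so windows that are both high-scoring and high-loss are sampled first.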

  23. Example: Deformable Part Models 1. Loss function: sum of part losses 2. Features 3. Importance sampling routine • Dynamic programming • Modified NMS to return a diverse set of poses
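The dynamic-programming step can be illustrated on a chain of parts (the slide's tree-structured DPM specializes to this when the tree is a chain). This is an illustrative sketch with assumed array shapes, not the paper's implementation.

```python
import numpy as np

def chain_dpm_inference(unary, pairwise):
    """Exact max-sum DP for a chain of parts.

    unary[p]          : appearance score of part p at each candidate location
    pairwise[p][i, j] : deformation score for part p+1 at location j
                        given part p at location i
    Returns (best total score, list of chosen locations per part).
    """
    P = len(unary)
    best = np.asarray(unary[0], dtype=float)
    back = []
    for p in range(1, P):
        # scores[i, j]: best chain ending with part p-1 at i, part p at j
        scores = best[:, None] + pairwise[p - 1] + np.asarray(unary[p])[None, :]
        back.append(np.argmax(scores, axis=0))
        best = np.max(scores, axis=0)
    # Backtrack the highest-scoring placement.
    locs = [int(np.argmax(best))]
    for p in range(P - 2, -1, -1):
        locs.append(int(back[p][locs[-1]]))
    return float(np.max(best)), locs[::-1]
```

A modified NMS over such DP tables (suppressing near-duplicate poses) then yields the diverse candidate set the sampler returns.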

  24. Cost-Sensitive Multiclass SVM 1. Loss function: class confusion cost 2. Features: e.g., bag-of-words 3. Importance sampling routine • Return all classes • Exact solution using 1 dot product per class
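For multiclass, the sampler can be exact: one dot product per class scores every candidate label, loss-augmented with the class-confusion cost, and all classes are returned ranked by that score. A sketch with a hypothetical signature:

```python
import numpy as np

def multiclass_importance_sample(W, x, y_true, cost):
    """Exact importance sampler for a cost-sensitive multiclass SVM.

    W    : (num_classes, dim) per-class weight rows
    cost : cost[y_true, y] = confusion cost of predicting y
    Returns (classes ranked by loss-augmented score, the scores).
    """
    scores = W @ x                  # one dot product per class
    aug = scores + cost[y_true]     # loss-augmented score per class
    order = np.argsort(-aug)        # best candidate first
    return [int(c) for c in order], aug
```

Because every class is enumerated, the inner maximization is solved exactly rather than approximately, which is why this case matches 1-vs-all binary SVM speed.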

  25. Results: CUB-200-2011 • Pose mixture model, 312 part/pose detectors • Occlusion/visibility model • Tree-structured DPM w/ exact inference

  26. Results: CUB-200-2011 [Plots: results with 5794 and with 400 training examples] • ~100X faster than mining hard negatives and SVMstruct • 10-50X faster than stochastic sub-gradient methods • Close to convergence after 1 pass through the training set

  27. Results: ImageNet [Plots: comparison to other fast linear SVM solvers; comparison to other methods for cost-sensitive SVMs] • Faster than LIBLINEAR, PEGASOS • 50X faster than SVMstruct

  28. Conclusion • Orders of magnitude faster than SVMstruct • Publicly available software package • Fast algorithms for multiclass SVMs, DPMs • API to adapt to new applications • Support for datasets too large to fit in memory • Network interface for online & active learning

  29. Thanks!
