Online and Batch Learning of Pseudo-Metrics

Online and Batch Learning of Pseudo-Metrics. Shai Shalev-Shwartz, Hebrew University, Jerusalem. Joint work with Yoram Singer (Google Inc.) and Andrew Y. Ng (Stanford University). The technique maps instances into a space in which distances correspond to labels.


Presentation Transcript


  1. Online and Batch Learning of Pseudo-Metrics • Shai Shalev-Shwartz, Hebrew University, Jerusalem • Joint work with Yoram Singer (Google Inc.) and Andrew Y. Ng (Stanford University)

  2. Motivating Example

  3. Our Technique • Map instances into a space in which distances correspond to labels

  4. Outline • Distance learning setting • Large margin for distances • An online learning algorithm • Online loss analysis • A dual version • Experiments: • Online - document filtering • Batch - handwritten digit recognition

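  5. Problem Setting • Training examples: $(x_\tau, x'_\tau, y_\tau)$ • two instances $x_\tau, x'_\tau \in \mathbb{R}^n$ • similarity label $y_\tau \in \{+1, -1\}$ (+1 if the instances are similar, -1 otherwise) • Hypothesis class: pseudo-metrics $d_A(x, x') = \sqrt{(x - x')^\top A (x - x')}$, where $A$ is a symmetric positive semi-definite matrix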
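
A minimal sketch of the pseudo-metric $d_A(x, x') = \sqrt{(x - x')^\top A (x - x')}$, assuming instances are plain Python lists of floats and $A$ is a symmetric PSD matrix stored as a list of rows. The function names are hypothetical. The singular matrix in the example shows why this is only a pseudo-metric: two distinct points can be at distance zero.

```python
import math

def sq_pseudo_distance(A, x, x_prime):
    """Return (x - x')^T A (x - x') for a symmetric PSD matrix A (list of rows)."""
    u = [a - b for a, b in zip(x, x_prime)]               # difference vector u = x - x'
    Au = [sum(a_ij * u_j for a_ij, u_j in zip(row, u)) for row in A]
    return sum(u_i * Au_i for u_i, Au_i in zip(u, Au))

def pseudo_distance(A, x, x_prime):
    """d_A(x, x'); tiny negative values caused by rounding are clamped to zero."""
    return math.sqrt(max(0.0, sq_pseudo_distance(A, x, x_prime)))

# PSD but singular: this A ignores the second coordinate entirely.
A = [[1.0, 0.0],
     [0.0, 0.0]]
print(pseudo_distance(A, [0.0, 0.0], [3.0, 5.0]))  # 3.0
print(pseudo_distance(A, [1.0, 2.0], [1.0, 7.0]))  # 0.0 although the points differ
```

With $A$ equal to the identity, $d_A$ reduces to the ordinary Euclidean distance.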

  6. Large Margin for Pseudo-Metrics • Sample S is -separated w.r.t. a metric

  7. Batch Formulation s.t. s.t.

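  8. Pseudo-metric Online Learning Algorithm (POLA) • If $y_\tau = +1$ (similar), we want that $d_A(x_\tau, x'_\tau)^2 \le b - 1$ • If $y_\tau = -1$ (dissimilar), we want that $d_A(x_\tau, x'_\tau)^2 \ge b + 1$ • For $\tau = 1, 2, \dots$: • Get two instances $x_\tau, x'_\tau$ • Calculate the distance $d_{A_\tau}(x_\tau, x'_\tau)^2$ • Predict: similar if the distance is below the threshold $b_\tau$, dissimilar otherwise • Get the true label $y_\tau$ and suffer the hinge-loss $\ell_\tau(A_\tau, b_\tau) = \max\{0,\ y_\tau (d_{A_\tau}(x_\tau, x'_\tau)^2 - b_\tau) + 1\}$ • Update the matrix and threshold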
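
The half-space step of the update can be sketched in closed form. This assumes the hinge-loss $\ell = \max\{0,\ y(d_A(x, x')^2 - b) + 1\}$ with $y = +1$ for similar pairs, and that the pair $(A, b)$ is measured in the combined Frobenius/Euclidean norm. Under those assumptions, with $u = x - x'$, the projection onto $\{(A, b) : y(b - u^\top A u) \ge 1\}$ is $\hat{A} = A - y\alpha\, u u^\top$, $\hat{b} = b + \alpha y$ with $\alpha = \ell / (\|u\|^4 + 1)$. The helper names are hypothetical, and the second projection (onto the PSD cone) is not part of this sketch.

```python
def sq_dist(A, u):
    """u^T A u for a matrix A stored as a list of rows."""
    n = len(u)
    return sum(u[i] * A[i][j] * u[j] for i in range(n) for j in range(n))

def hinge_loss(A, b, x, x_prime, y):
    """Assumed loss: max(0, y * (d_A(x, x')^2 - b) + 1), y = +1 similar, -1 dissimilar."""
    u = [p - q for p, q in zip(x, x_prime)]
    return max(0.0, y * (sq_dist(A, u) - b) + 1.0)

def halfspace_step(A, b, x, x_prime, y):
    """Project (A, b) onto the half-space of zero-loss pairs for this example.

    When y = +1 the result may have a negative eigenvalue and would still need
    to be projected onto the PSD cone; that step is omitted here.
    """
    u = [p - q for p, q in zip(x, x_prime)]
    loss = hinge_loss(A, b, x, x_prime, y)
    norm_sq = sum(v * v for v in u)
    alpha = loss / (norm_sq ** 2 + 1.0)            # step size: loss / (||u||^4 + 1)
    n = len(u)
    A_hat = [[A[i][j] - y * alpha * u[i] * u[j] for j in range(n)] for i in range(n)]
    b_hat = b + alpha * y
    return A_hat, b_hat

# A similar pair (y = +1) that is too far apart under A = I, b = 1.
A, b = [[1.0, 0.0], [0.0, 1.0]], 1.0
x, x_prime = [1.0, 0.0], [0.0, 0.0]
print(hinge_loss(A, b, x, x_prime, +1))            # 1.0
A_hat, b_hat = halfspace_step(A, b, x, x_prime, +1)
print(A_hat, b_hat)                                # [[0.5, 0.0], [0.0, 1.0]] 1.5
print(hinge_loss(A_hat, b_hat, x, x_prime, +1))    # 0.0
```

After the step the example satisfies its margin constraint with equality, so its loss drops to zero: the distance shrinks and the threshold grows by the same step $\alpha$.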

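  9. Core Update: Two Projections • Start with $(A_\tau, b_\tau)$ • An example defines a half-space: all matrices and thresholds that suffer zero loss on it • $(\hat{A}_\tau, \hat{b}_\tau)$ is the projection of $(A_\tau, b_\tau)$ onto this half-space • $A_{\tau+1}$ is the projection of $\hat{A}_\tau$ onto the PSD cone • [Figure: the PSD cone and the half-space of all zero-loss matrices]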
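
The second projection, onto the PSD cone, amounts to zeroing out negative eigenvalues: for a symmetric matrix this yields the closest PSD matrix in Frobenius norm. The sketch below uses a cyclic Jacobi eigenvalue routine so that it needs only the standard library; in practice a linear-algebra library would perform the eigendecomposition. The names `jacobi_eigh` and `project_psd` are hypothetical.

```python
import math

def jacobi_eigh(S, max_sweeps=50, tol=1e-20):
    """Eigendecompose a symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, V), where column k of V is the eigenvector
    belonging to eigenvalues[k].
    """
    n = len(S)
    A = [row[:] for row in S]                       # work on a copy
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:                               # numerically diagonal
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):                  # A <- A J (columns p, q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                  # A <- J^T A (rows p, q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                  # V <- V J accumulates eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V

def project_psd(S):
    """Closest PSD matrix to symmetric S in Frobenius norm: clip eigenvalues at 0."""
    eigvals, V = jacobi_eigh(S)
    n = len(S)
    clipped = [max(0.0, lam) for lam in eigvals]
    return [[sum(clipped[k] * V[i][k] * V[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

# [[1, 2], [2, 1]] has eigenvalues 3 and -1; the projection keeps only the 3 part.
print([[round(v, 6) for v in row] for row in project_psd([[1.0, 2.0], [2.0, 1.0]])])
# [[1.5, 1.5], [1.5, 1.5]]
```

After the half-space step the matrix differs from a PSD matrix by a single rank-one term, so at most one eigenvalue can be negative; the general routine above covers that case as well.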

  10. Online Learning • Goal – minimize cumulative loss • Why Online? • Online processing tasks (e.g., text filtering) • Simple to implement • Memory and run-time efficient • Worst-case bounds on the performance • Online-to-batch conversions

  11. Online Loss Bound • Sequence of examples s.t. • Any fixed matrix and threshold • Then, the loss suffered by the online algorithm is bounded in terms of the “complexity” of the fixed matrix and threshold • The loss bound does not depend on the dimension

  12. Incorporating Kernels • Matrix A can be written as , where • Therefore:
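
Because the slide's expression for $A$ did not survive extraction, the sketch below assumes $A = \sum_{i,j} \alpha_{ij} u_i u_j^\top$ with $u_i = \phi(x_i) - \phi(x'_i)$ built from previously seen pairs. Under that assumption, $d_A(x, x')^2 = \sum_{i,j} \alpha_{ij} \langle u_i, w\rangle \langle u_j, w\rangle$ with $w = \phi(x) - \phi(x')$, and each inner product expands into four kernel evaluations, so $\phi$ is never formed explicitly. The kernels, coefficients, and function names are illustrative.

```python
import math

def linear_kernel(a, b):
    """K(a, b) = <a, b>."""
    return sum(p * q for p, q in zip(a, b))

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian kernel K(a, b) = exp(-gamma * ||a - b||^2); gamma is illustrative."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def diff_inner(kernel, pair, x, x_prime):
    """<phi(x_i) - phi(x'_i), phi(x) - phi(x')> from four kernel evaluations."""
    xi, xi_prime = pair
    return (kernel(xi, x) - kernel(xi, x_prime)
            - kernel(xi_prime, x) + kernel(xi_prime, x_prime))

def kernel_sq_distance(alpha, pairs, x, x_prime, kernel):
    """d_A(x, x')^2 for the assumed form A = sum_ij alpha[i][j] u_i u_j^T."""
    g = [diff_inner(kernel, pair, x, x_prime) for pair in pairs]
    m = len(pairs)
    return sum(alpha[i][j] * g[i] * g[j] for i in range(m) for j in range(m))

# u_1 = (1, 0) and u_2 = (0, 1), so with a linear kernel A = diag(2, 3).
pairs = [([1.0, 0.0], [0.0, 0.0]), ([0.0, 2.0], [0.0, 1.0])]
alpha = [[2.0, 0.0], [0.0, 3.0]]
print(kernel_sq_distance(alpha, pairs, [1.0, 1.0], [0.0, 0.0], linear_kernel))  # 5.0
print(kernel_sq_distance(alpha, pairs, [1.0, 1.0], [0.0, 0.0], rbf_kernel))
```

With the linear kernel this reproduces the explicit computation $w^\top A w$ with $A = \mathrm{diag}(2, 3)$ and $w = (1, 1)$; swapping in another kernel changes only the function passed in.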

  13. Online Experiments • Task: Document filtering according to topics • Dataset: Reuters-21578 • 10,000 documents • Documents labeled as relevant or irrelevant • Only a few relevant documents (1% to 10% of the entire set) • Algorithms: • POLA • 1 Nearest Neighbor (1-NN) • Perceptron Algorithm • Perceptron Algorithm with Uneven Margins (PAUM) (Li, Zaragoza, Herbrich, Shawe-Taylor, Kandola)

  14. POLA for Document Filtering • Get a document • Calculate distance to relevant documents observed so far using current matrix • Predict: document is relevant iff the distance to the closest relevant document is smaller than the current threshold • Get true label • Update matrix and threshold
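
The filtering prediction rule can be sketched as follows. It assumes the squared pseudo-distance is compared against the threshold $b$, and that before any relevant document has been seen the filter predicts "irrelevant"; both are illustrative choices. The matrix and threshold are held fixed, so the POLA update step is deliberately left out, and the function names are hypothetical.

```python
def sq_dist(A, x, y):
    """(x - y)^T A (x - y) for a matrix A stored as a list of rows."""
    u = [p - q for p, q in zip(x, y)]
    return sum(u[i] * A[i][j] * u[j] for i in range(len(u)) for j in range(len(u)))

def predict_relevant(A, b, doc, relevant_docs):
    """Relevant iff the closest relevant document seen so far is within threshold b."""
    if not relevant_docs:                   # assumption: nothing to compare against yet
        return False
    closest = min(sq_dist(A, doc, r) for r in relevant_docs)
    return closest < b

def filter_stream(A, b, stream):
    """Predict for each (doc, is_relevant) pair, then reveal the label.

    A and b stay fixed here; in POLA they would be updated after each label.
    """
    relevant_docs, predictions = [], []
    for doc, is_relevant in stream:
        predictions.append(predict_relevant(A, b, doc, relevant_docs))
        if is_relevant:                     # true label arrives after the prediction
            relevant_docs.append(doc)
    return predictions

I2 = [[1.0, 0.0], [0.0, 1.0]]
stream = [([0.0, 0.0], True), ([0.1, 0.0], True), ([5.0, 5.0], False), ([0.0, 0.2], True)]
print(filter_stream(I2, 1.0, stream))  # [False, True, False, True]
```

Only documents confirmed as relevant are stored, so the cost of each prediction grows with the number of relevant documents rather than with the whole stream.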

  15. Document Filtering Results • [Figure: three scatter plots of POLA error (y-axis) against 1-NN error, Perceptron error, and PAUM error (x-axes)] • Each blue point corresponds to one topic • Y-axis designates the error of POLA • Points beneath the black diagonal line mean that POLA wins

  16. Batch Experiments • Task: Handwritten digit recognition • Dataset: MNIST • 45 binary classification problems (all pairs of digits) • 10,000 training examples • 10,000 test examples • Algorithms: k-NN with various metrics: • Pseudo-metric learned by POLA • Euclidean distance • Metric induced by Fisher Discriminant Analysis (FDA) • Metric learned by Relevant Component Analysis (RCA) (Bar-Hillel, Hertz, Shental, and Weinshall)
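
A sketch of the batch evaluation: a k-NN classifier whose neighbours are ranked by $d_A$. The matrix used here is hand-picked for illustration (it ignores a noisy second coordinate) rather than learned by POLA, and the data and function names are hypothetical.

```python
from collections import Counter

def sq_dist(A, x, y):
    """(x - y)^T A (x - y); ranking by this gives the same order as ranking by d_A."""
    u = [p - q for p, q in zip(x, y)]
    return sum(u[i] * A[i][j] * u[j] for i in range(len(u)) for j in range(len(u)))

def knn_predict(A, train, x, k=1):
    """Majority label among the k training points closest to x under d_A."""
    neighbours = sorted(train, key=lambda item: sq_dist(A, x, item[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# The class is determined by the first coordinate; the second is noise.
train = [([0.0, 0.0], "a"), ([0.0, 100.0], "a"),
         ([10.0, 50.0], "b"), ([10.0, 0.0], "b")]
euclidean = [[1.0, 0.0], [0.0, 1.0]]
ignore_noise = [[1.0, 0.0], [0.0, 0.0]]   # hypothetical PSD matrix, not learned
query = [1.0, 50.0]
print(knn_predict(euclidean, train, query))     # b
print(knn_predict(ignore_noise, train, query))  # a
```

The classifier itself is unchanged across the compared methods; only the matrix $A$ that defines the distance differs.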

  17. MNIST Results • [Figure: three scatter plots of POLA error (y-axis) against RCA error, FDA error, and Euclidean distance error (x-axes)] • Each blue point corresponds to one binary classification problem • Y-axis designates the error of POLA • Points beneath the black diagonal line mean that POLA wins • RCA was applied after using PCA as a pre-processing step

  18. Toy Problem • [Figure: a color-coded matrix of Euclidean distances between pairs of images]

  19. Metric found by POLA

  20. Mapping found by POLA • Our pseudo-metrics: $d_A(x, x') = \sqrt{(x - x')^\top A (x - x')} = \|W x - W x'\|_2$, where $A = W^\top W$
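
The mapping view can be checked numerically. Any symmetric PSD matrix factors as $A = W^\top W$ (for example via its eigendecomposition), and then $d_A(x, x') = \|W x - W x'\|_2$, so learning $A$ amounts to learning a linear map $W$ into a space where plain Euclidean distances are compared. The sketch below starts from a hypothetical $W$, forms $A$, and compares the two computations; the function names are illustrative.

```python
def gram(W):
    """A = W^T W for W given as a list of k rows of length n; A is n x n and PSD."""
    n = len(W[0])
    return [[sum(row[i] * row[j] for row in W) for j in range(n)] for i in range(n)]

def mat_vec(W, x):
    """Apply the linear map W to the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sq_dist_A(A, x, y):
    """(x - y)^T A (x - y)."""
    u = [p - q for p, q in zip(x, y)]
    return sum(u[i] * A[i][j] * u[j] for i in range(len(u)) for j in range(len(u)))

def sq_dist_mapped(W, x, y):
    """Squared Euclidean distance between the mapped points W x and W y."""
    return sum((p - q) ** 2 for p, q in zip(mat_vec(W, x), mat_vec(W, y)))

W = [[1.0, 1.0]]          # hypothetical 1 x 2 map, so A = W^T W has rank 1
A = gram(W)
print(A)                                           # [[1.0, 1.0], [1.0, 1.0]]
print(sq_dist_A(A, [2.0, 1.0], [0.0, 0.0]))        # 9.0
print(sq_dist_mapped(W, [2.0, 1.0], [0.0, 0.0]))   # 9.0
print(sq_dist_A(A, [1.0, 0.0], [0.0, 1.0]))        # 0.0
```

A rank-deficient $A$ corresponds to a map into fewer dimensions, which is why distinct instances can collapse to distance zero.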

  21. Mapping found by POLA

  22. Summary and Extensions • An online algorithm for learning pseudo-metrics • Formal properties, good experimental results • Extensions: • Alternative regularization schemes to the Frobenius norm • “Learning to learn”: • Learning a metric from one set of classes and applying it to another set of related classes

  23. Hello  bye  = w · x
