
Visual Element Discovery as Discriminative Mode Seeking


Presentation Transcript


  1. Visual Element Discovery as Discriminative Mode Seeking • Carl Doersch (CMU), Abhinav Gupta (CMU), Alexei A. Efros (UC Berkeley)

  2. The need for mid-level representations • 6 billion images • 70 billion images • 10 billion images • 1 billion images served daily • 60 hours of video uploaded per minute • Almost 90% of web traffic is visual!

  3. Discriminative patches • Visual words are too simple • Objects are too difficult • Something in the middle? (Felzenszwalb et al. 2008) (Singh et al. 2012)

  4. Mid-level “Visual Elements” • Simple enough to be detected easily • Complex enough to be meaningful • “Meaningful” as measured by weak labels (Singh et al. 2012) (Doersch et al. 2012)

  5. Mid-level “Visual Elements” • Doersch et al. 2012 • Singh et al. 2012 • Jain et al. 2013 • Endres et al. 2013 • Juneja et al. 2013 • Li et al. 2013 • Sun et al. 2013 • Wang et al. 2013 • Fouhey et al. 2013 • Lee et al. 2013 (figures: Singh et al. 2012; Doersch et al. 2012)

  6. Our goal • Provide a mathematical optimization for visual elements • Improve performance of mid-level representations.

  7. Elements as Patch Classifiers

  8. What if the labels are weak? • E.g. image has horse/no-horse • (Or even weaker, like Paris/not-Paris) • Idea: Label these all as “horse” • Problem: 10,000 patches per image, most of which are unclassifiable.

  9. The weaker the label, the bigger the problem • Task: learn to classify Paris from Not-Paris [figure: example images labeled “Paris” and “Also Paris”]

  10. Other approaches • Latent SVM: assumes we have one instance per positive image • Multiple instance learning: not clear how to define the bags

  11. What if the labels are weak? • Negatives are negatives, positives might not be positive • Most of our data can be ignored • First: how to cluster without clustering everything (Singh et al. 2012) (Doersch et al. 2012)

  12. Mean shift

  13. Mean shift

  14. Mean shift

  15. Patch distances [figure: input patches with their nearest neighbors; neighbor distances range from 1.22e-4 to 2.59e-4]
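
The patch distances shown here are based on normalized correlation between patch descriptors (made explicit on the later “Distance metric” slide). A minimal sketch of such a distance, where the descriptor type (e.g. HOG) and the helper name patch_distance are illustrative assumptions rather than the authors' code:

```python
import numpy as np

def patch_distance(f1, f2):
    """Distance between two patch descriptors as 1 - normalized correlation.
    The descriptors themselves (e.g. HOG vectors) are computed elsewhere;
    this helper is an illustrative sketch, not the released implementation."""
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    return 1.0 - float(f1 @ f2) / (np.linalg.norm(f1) * np.linalg.norm(f2))
```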

  16. Mean shift
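
For reference, a minimal sketch of the standard mean-shift update that these slides illustrate, using a flat kernel and a fixed bandwidth (function and variable names are illustrative; the method described next replaces this with a discriminative, adaptive-bandwidth objective):

```python
import numpy as np

def mean_shift(x, data, bandwidth, n_iters=50):
    """Standard (non-discriminative) mean shift with a flat kernel:
    repeatedly move x to the mean of the data points within `bandwidth`
    until it stops moving, i.e. until x sits at a local density mode."""
    data = np.asarray(data, dtype=float)
    x = np.asarray(x, dtype=float)
    for _ in range(n_iters):
        dists = np.linalg.norm(data - x, axis=1)
        members = data[dists < bandwidth]
        if len(members) == 0:
            break
        new_x = members.mean(axis=0)
        if np.allclose(new_x, x):
            break
        x = new_x
    return x
```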

  17. Negative Set [figure: Paris vs. Not Paris]

  18. Negative Set [figure: Paris vs. Not Paris]

  19. Density Ratios [figure: Paris vs. Not Paris]

  20. Density Ratios [figure: Paris vs. Not Paris]

  21. Adaptive Bandwidth [figure: positive and negative points, with the bandwidth marked]

  22. Discriminative Mode Seeking • Find local optima of an estimate of the density ratio • Allow an adaptive bandwidth • Be extremely fast • Minimize the number of passes through the data
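
The density ratio in the first bullet can be written, for a triangular kernel with bandwidth $b$ (the symbols $x_i^{+}$ and $x_j^{-}$ for positive and negative patch features are notation assumed here for concreteness):

$$r(w) \;=\; \frac{\sum_i \max\!\bigl(b - d(x_i^{+}, w),\, 0\bigr)}{\sum_j \max\!\bigl(b - d(x_j^{-}, w),\, 0\bigr)}$$

Local maxima of $r(w)$ are the discriminative modes being sought.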

  23. Discriminative Mode Seeking • Mean shift: maximize w.r.t. $w$ the kernel density estimate $\sum_i \max\!\bigl(b - d(x_i, w),\, 0\bigr)$, where $w$ is the cluster centroid, $x_i$ a patch feature, $d(\cdot,\cdot)$ the distance between them, and $b$ the bandwidth.

  24. Discriminative Mode Seeking B(w) is the value of b satisfying:
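
A plausible form of the constraint referred to here, assuming the adaptive bandwidth is chosen so that the density over the negative set equals a fixed constant $\beta$ (the constant $\beta$ and the notation $x_j^{-}$ are assumptions):

$$\sum_j \max\!\bigl(b - d(x_j^{-}, w),\, 0\bigr) \;=\; \beta$$

With $b = B(w)$ set this way, maximizing the positive-set density amounts to maximizing the density ratio.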

  25. Discriminative Mode Seeking • Distance metric: Normalized Correlation [slide shows the objective and its constraint]
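
With features normalized to unit length, normalized correlation reduces to the inner product $w^\top x$, and the optimization on this slide can plausibly be written as (again, $x_i^{+}$, $x_j^{-}$ and $\beta$ are assumed notation):

$$\max_{w,\,b}\; \sum_i \max\!\bigl(w^\top x_i^{+} - b,\, 0\bigr) \quad \text{s.t.} \quad \sum_j \max\!\bigl(w^\top x_j^{-} - b,\, 0\bigr) \;=\; \beta$$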

  26. Discriminative Mode Seeking [figure: positive and negative patches with the optimized $w$]

  27. Optimization • Initialization is straightforward • For each element, just keep around ~500 patches where $w^\top x - b > 0$ • Trivially parallelizable in MapReduce • Optimization is piecewise quadratic
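
A minimal NumPy sketch of evaluating this objective for one element, assuming unit-normalized features and a fixed negative-set constant β; the helper names (adaptive_b, element_score) and the sort-based solve for b are illustrative, not the released implementation:

```python
import numpy as np

def adaptive_b(neg_scores, beta):
    """Solve sum_j max(neg_scores[j] - b, 0) = beta for b.
    The left-hand side is piecewise linear and decreasing in b,
    so we scan the negative scores in descending order."""
    s = np.sort(neg_scores)[::-1]
    csum = np.cumsum(s)
    for k in range(1, len(s) + 1):
        b = (csum[k - 1] - beta) / k          # assumes exactly the top-k scores exceed b
        hi = s[k - 1]
        lo = s[k] if k < len(s) else -np.inf
        if lo <= b <= hi:
            return b
    return (csum[-1] - beta) / len(s)

def element_score(w, pos_feats, neg_feats, beta):
    """Discriminative mode-seeking score of a candidate element w:
    the positive 'density' under the bandwidth b = B(w) that pins
    the negative-set density to beta (illustrative sketch)."""
    b = adaptive_b(neg_feats @ w, beta)
    return np.maximum(pos_feats @ w - b, 0.0).sum(), b

# Toy usage with random unit-normalized features (illustrative only).
rng = np.random.default_rng(0)
pos = rng.normal(size=(500, 64));  pos /= np.linalg.norm(pos, axis=1, keepdims=True)
neg = rng.normal(size=(2000, 64)); neg /= np.linalg.norm(neg, axis=1, keepdims=True)
w = pos[0].copy()                              # initialize from a single patch
score, b = element_score(w, pos, neg, beta=5.0)
```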

  28. Evaluation via Purity-Coverage Plot • Analogous to Precision-Recall Plot

  29. Low Purity [figure: Elements 1–5]

  30. High Purity, Low Coverage [figure: Elements 1–5]

  31. Purity-Coverage Curve [plot: Purity vs. Coverage (×1e4 pixels), Paris vs. Not Paris]

  32. Purity-Coverage Curve [plot: Purity vs. Coverage (×1e4 pixels), Paris vs. Not Paris]

  33. Purity-Coverage Curve • Coverage for multiple elements is simply the union.
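
A small sketch of how these quantities could be tabulated, assuming each detection is pre-sorted by score and carries an is_positive flag plus the set of positive-image pixels it covers (these field names and the helper names are assumptions):

```python
def purity(detections, k):
    """Fraction of an element's top-k detections that fall on
    positive-set images."""
    top = detections[:k]
    return sum(d["is_positive"] for d in top) / len(top)

def coverage(elements, k, total_positive_pixels):
    """Coverage of a set of elements: the union of pixels covered by each
    element's top-k detections, as a fraction of all pixels in the positive
    set (the union means overlapping detections are not double-counted)."""
    covered = set()
    for dets in elements:
        for d in dets[:k]:
            covered |= d["pixels"]        # set of (image_id, x, y) tuples
    return len(covered) / total_positive_pixels
```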

  34. Purity-Coverage [plot: Purity vs. Coverage (fraction of positive dataset); panels for Top 25 Elements and Top 200 Elements; methods compared: This work; This work, no inter-element SVM; Retrained 5x (Doersch et al. 2012); LDA Retrained 5x; LDA Retrained; Exemplar LDA (Hariharan et al. 2012)]

  35. Results on Indoor 67 Scenes [figure: Kitchen, Grocery, Bowling, Bakery, Bathroom, Elevator]

  36. Results on Indoor 67 Scenes

  37. Qualitative Indoor67 Results

  38. Indoor67: Error Analysis [figure: misclassified scenes with predicted label (Guess) vs. Ground Truth (GT); labels include staircase, grocery store, corridor, deli, laundromat, museum, garage, closet]

  39. Thank you! More results at http://graphics.cs.cmu.edu/projects/discriminativeModeSeeking/ • Paris Elements • Indoor 67 Elements • Indoor 67 Heatmaps • Source code (soon)

  40. Some New Paris Elements
