
Context as Supervisory Signal: Discovering Objects with Predictable Context






Presentation Transcript


  1. Context as Supervisory Signal: Discovering Objects with Predictable Context Carl Doersch, Abhinav Gupta, Alexei Efros

  2. Unsupervised Object Discovery • Children learn to see without millions of labels • Is there a cue hidden in the data that we can use to learn better representations? …

  3. How hard is unsupervised object discovery? • In the state-of-the-art literature it is still commonplace to select 20 categories from Caltech 101¹ • Le et al. got into the New York Times for spending 1M+ CPU hours and reporting just 3 objects² ¹Faktor & Irani, “Clustering by Composition” – Unsupervised Discovery of Image Categories, ECCV 2012 ²Le et al., Building High-level Features Using Large Scale Unsupervised Learning, ICML 2012

  4. Object Discovery = Patch Clustering

  5. Object Discovery = Patch Clustering

  6. Object Discovery = Patch Clustering

  7. Clustering algorithms

  8. Clustering algorithms

  9. Clustering algorithms?

  10. Clustering as Prediction [Figure: a held-out point’s Y value is predicted from its cluster; prediction vs. actual]

  11. Clustering as Prediction [Figure: same setup on a second cluster; prediction vs. actual]

  12. What cue is used for clustering here? • The datapoints are redundant! • Within a cluster, knowing one coordinate tells you the other • Images are redundant too! • Contribution #1 (of 3): use patch-context redundancy for clustering.

  13. More rigorously • Unlike standard clustering, we measure the affinity of a whole cluster rather than of pairs • Datapoints x can be divided into x = [p, c] • Affinity is defined by how ‘learnable’ the function c = F(p) is • Or P(c|p) • Measure how well the function has been learned via leave-one-out cross-validation
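The affinity on slide 13 can be sketched in a few lines. This is an illustrative stand-in, not the paper's method: points are 2-D, the learned predictor `c = F(p)` is replaced by a 1-nearest-neighbor transfer in p-space, and `loo_affinity` is a hypothetical name.

```python
import numpy as np

def loo_affinity(cluster):
    """Leave-one-out affinity for a cluster of points x = [p, c].

    For each held-out point, predict its context c from its patch p
    using the remaining members (here: 1-NN in p-space, a stand-in for
    the paper's richer transfer-based predictor), and score the cluster
    by the negative mean squared prediction error."""
    P, C = cluster[:, :1], cluster[:, 1:]        # split x = [p, c]
    errs = []
    for i in range(len(cluster)):
        rest = np.delete(np.arange(len(cluster)), i)
        # nearest neighbour of p_i among the remaining patches
        j = rest[np.argmin(np.abs(P[rest] - P[i]).sum(axis=1))]
        errs.append(((C[j] - C[i]) ** 2).sum())  # how well was c_i predicted?
    return -float(np.mean(errs))

# A redundant cluster (c is a function of p) scores higher than a
# diffuse one where knowing p tells you nothing about c.
rng = np.random.default_rng(0)
p = rng.uniform(0, 1, (20, 1))
redundant = np.hstack([p, 2 * p + 0.01 * rng.standard_normal((20, 1))])
diffuse = np.hstack([p, rng.uniform(0, 2, (20, 1))])
assert loo_affinity(redundant) > loo_affinity(diffuse)
```

This is the "datapoints are redundant" cue from slide 12: within a good cluster, one coordinate predicts the other, so the leave-one-out error is small.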

  14. Previous whispers of this idea • Weakly supervised methods • Probabilistic clustering • Choose clusters s.t. the dataset can be compressed • Autoencoders & RBMs (Olshausen & Field 1997)

  15. Clustering as Prediction [Figure: the held-out point’s Y value again; prediction vs. actual]

  16. When is a prediction “good enough”? • Image distance metrics are unreliable [Figure: an input patch and its nearest neighbors, with the min/max distance values from the figure]

  17. It’s even worse during prediction • Horizontal gratings are very close together in HOG feature space, and are easy to predict • Cats have more complex shape that is farther apart in feature space and harder to predict

  18. Our task is more like this:

  19. Contribution #2 (of 3): Cluster Membership as Hypothesis Testing [Figure: two competing conditional models, P₁(X|Y) and P₂(X|Y), evaluated on the held-out Y value]

  20. What are the hypotheses? • thing hypothesis • The patch is a member of the proposed cluster • Context is best predicted by transferring appearance from other cluster members • stuff hypothesis • Patch is ‘miscellaneous’ • The best we can do is model the context as clutter or texture, via low-level image statistics.
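The membership test on slides 19–20 is a likelihood-ratio comparison between the two hypotheses. A minimal sketch, under heavy assumptions: context features are low-dimensional vectors instead of HOG, both hypotheses are scored with an isotropic Gaussian likelihood, and all names (`keep_in_cluster`, `thing_pred`, `stuff_mean`) are hypothetical.

```python
import numpy as np

def log_gauss(x, mu, var):
    """Log density of an isotropic Gaussian (a stand-in likelihood)."""
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    return float(-0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var)))

def keep_in_cluster(context, thing_pred, stuff_mean, var=1.0):
    """Does the 'thing' hypothesis (context predicted by transferring
    appearance from cluster members) explain the observed context better
    than the generic 'stuff' hypothesis (low-level image statistics)?"""
    ll_thing = log_gauss(context, thing_pred, var)
    ll_stuff = log_gauss(context, stuff_mean, var)
    return ll_thing - ll_stuff > 0   # positive log-ratio: keep the patch

# Context close to the cluster's transferred prediction -> member.
assert keep_in_cluster(context=[1.0, 2.0], thing_pred=[1.1, 2.1],
                       stuff_mean=[0.0, 0.0])
# Context explained just as well by generic stuff -> rejected.
assert not keep_in_cluster(context=[0.1, -0.1], thing_pred=[1.0, 2.0],
                           stuff_mean=[0.0, 0.0])
```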

  21. Our Hypotheses [Figure: a condition region (the patch) and a prediction region (the context); the thing model transfers a predicted correspondence from cluster members, giving high likelihood on the true region, while the stuff model’s low-level prediction gives low likelihood]

  22. Real predictions aren’t so easy!

  23. What to do? • Generate multiple predictions from different images, and take the best • Take bits and pieces from many images • Still not enough? Generate more with warping. • With arbitrarily small pieces and lots of warping, you can match anything!

  24. Generative Image Models • Long history in computer vision • Come up with a distribution P(c|p), without looking at c • Distribution is extremely complicated due to dependencies between features
  Hinton et al. 1995; Weber et al. 2000; Fergus et al. 2003; Fei-Fei et al. 2007; Sudderth et al. 2005; Wang et al. 2006

  25. Generative Image Models [Figure: graphical model with object → part locations → features]

  26. Inference in Generative Models [Figure: inference runs from observed features back up through part locations to the object]

  27. Contribution #3: Likelihood Factorization • Let c be context and p be a patch • Break c into chunks c₁…cₖ (e.g. HOG cells) • Now the problem boils down to estimating P(c|p) = P(c₁|p) · P(c₂|c₁,p) · … · P(cₖ|c₁,…,cₖ₋₁,p)
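The chain-rule factorization turns one hard generative problem into a sequence of ordinary conditional predictions, one per chunk. A sketch under assumptions: chunks are scalars rather than HOG cells, each conditional is a Gaussian whose mean/variance come from a pluggable `cond_model`, and the function name is hypothetical.

```python
import numpy as np

def factorized_log_likelihood(patch, context_chunks, cond_model):
    """Chain-rule factorization of log P(c | p):

        log P(c|p) = sum_i log P(c_i | c_1..c_{i-1}, p)

    The factorization itself is exact; each term is a separate
    conditional prediction problem. `cond_model(patch, prev_chunks)`
    returns a (mean, var) prediction for the next chunk -- a
    placeholder for the paper's stuff/thing models."""
    total, prev = 0.0, []
    for c_i in context_chunks:
        mu, var = cond_model(patch, prev)
        total += -0.5 * ((c_i - mu) ** 2 / var + np.log(2 * np.pi * var))
        prev.append(c_i)          # condition the next term on this chunk
    return total

# Toy conditional model: predict the next chunk as the previous one
# (or the patch value for the first chunk), with unit variance.
toy = lambda p, prev: ((prev[-1] if prev else p), 1.0)
smooth = factorized_log_likelihood(0.5, [0.5, 0.6, 0.6], toy)
jumpy = factorized_log_likelihood(0.5, [0.5, 3.0, -2.0], toy)
assert smooth > jumpy   # coherent context is more likely under the model
```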

  28. Our Hypotheses [Figure: the hypotheses diagram from slide 21, revisited with the factorized likelihood: condition region, prediction region, and the stuff/thing correspondences]

  29. Surprisingly many good properties • This is not an approximation • Broke a hard generative problem into a series of easy discriminative ones • Latent-variable inference can use maximum likelihood: it’s okay to overfit to the condition region • The cᵢ’s do not need to be integrated out • Likelihoods can be compared across models

  30. Recap of Contributions • Clustering as prediction • Statistical hypothesis testing of stuff vs. thing to determine cluster membership • Factorizing the computation of stuff and thing data likelihoods

  31. Stuff Model [Figure: find nearest neighbors for the condition region and transfer their appearance into the prediction region]

  32. Faster Stuff Model [Figure: ~1,000,000 random image patches are summarized into a dictionary (a GMM of patches with ~5,000 components), from which the conditional is computed]

  33. Faster Stuff Model [Figure: from the condition region’s features, estimate cluster membership over the ~5,000 dictionary components, then marginalize over the region specified in the dictionary to produce the prediction for the prediction region]
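The faster stuff model on slides 32–33 conditions a patch dictionary on the observed region: estimate which dictionary components the condition region belongs to, then marginalize over them to predict the rest. A minimal sketch with assumed simplifications: a 2-component "dictionary" of scalar [condition, prediction] features instead of a 5,000-component GMM over HOG, an isotropic variance, and a hypothetical function name.

```python
import numpy as np

# Tiny 2-component "dictionary": each row is a component mean over
# [condition-region feature, prediction-region feature].
means = np.array([[0.0, 0.0],
                  [5.0, 5.0]])

def predict_context(cond, means, var=0.09):
    """Estimate cluster membership from the condition region alone,
    then marginalize to predict the prediction-region feature."""
    # responsibilities computed from the condition dimension only
    ll = -0.5 * (cond - means[:, 0]) ** 2 / var
    w = np.exp(ll - ll.max())
    w /= w.sum()
    # marginalize: responsibility-weighted mixture of component means
    # on the prediction dimension
    return float(w @ means[:, 1])

# A condition feature near the second component pulls the prediction
# toward that component's context mean.
print(predict_context(4.9, means))   # ≈ 5.0
```

The same two steps (membership estimate, then marginalization) are what make the dictionary version fast: the expensive nearest-neighbor search of slide 31 is replaced by a fixed set of component evaluations.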

  34. Thing Model [Figure: correspondences between the query image’s condition and prediction regions and the other cluster images]

  35. Finite and Occluded ‘Things’ • If the thing model believes it can’t predict well, ‘mimic’ the background model • When the thing model believes the prediction region corresponds to regions previously predicted poorly • When the thing model has been predicting badly nearby (handles occlusion!)
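The fallback rule on slide 35 can be sketched as a per-chunk switch. This is a toy rendering, not the paper's implementation: the trigger condition (mean of recent thing log-likelihoods below a fixed threshold) and all names are assumptions.

```python
def robust_thing_loglik(ll_thing, ll_stuff, recent_thing_lls, floor=-5.0):
    """Per-chunk thing-model score with an occlusion fallback.

    If the thing model has been predicting badly nearby (mean recent
    log-likelihood below the assumed threshold `floor`), let it 'mimic'
    the stuff/background model instead of paying a huge penalty --
    this is what lets finite or occluded things stop gracefully."""
    if recent_thing_lls and sum(recent_thing_lls) / len(recent_thing_lls) < floor:
        return ll_stuff            # occlusion: defer to the background model
    return ll_thing

# Bad recent predictions -> mimic the stuff score instead of -20.
assert robust_thing_loglik(-20.0, -3.0, [-10.0, -12.0]) == -3.0
# Good recent predictions -> keep the thing score.
assert robust_thing_loglik(-1.0, -3.0, [-1.0, -2.0]) == -1.0
```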

  36. Clusters discovered on 6 Pascal Categories [Plot: purity vs. cumulative coverage] Visual Words .63 (.37) · Russell et al. .66 (.38) · HOG K-means .70 (.40) · Singh et al. .83 (.47) · Our Approach .83 (.48)

  37. Results: Pascal 2011, unsupervised [Figure: initial vs. final cluster members at ranks 1, 2, 5, 6]

  38. Results: Pascal 2011, unsupervised [Figure: initial vs. final cluster members at ranks 19, 20, 31, 47]

  39. Results: Pascal 2011, unsupervised [Figure: initial vs. final cluster members at ranks 153, 200, 220, 231]

  40. Errors

  41. Unsupervised keypoint transfer [Figure: car keypoints transferred across cluster members — roof corners, mirrors, headlights, taillights, and wheel centers]

  42. Keypoint Precision-Recall [Plot: precision (0–1) vs. recall (0–0.15) for Ours and Exemplar LDA]
