
Finding Things: Image Parsing with Regions and Per-Exemplar Detectors


Presentation Transcript


  1. Finding Things: Image Parsing with Regions and Per-Exemplar Detectors CVPR2013 Oral

  2. Outline • Introduction • Approach • Experiments • Conclusions

  3. Introduction • We address the problem of image parsing, or labeling each pixel in an image with its semantic category. • Our goal is to achieve broad coverage – the ability to recognize the hundreds or thousands of object classes that commonly occur in everyday street scenes and indoor environments.

  4. Introduction • A major challenge in doing this is posed by the non-uniform statistics of these classes in realistic scene images. • Two main categories: • Stuff: a small number of classes – mainly ones associated with large regions or “stuff,” such as road, sky, trees, and buildings. • Things: people, cars, dogs, mailboxes, vases, stop signs – these occupy a small percentage of image pixels and have relatively few instances each.

  5. Introduction • We propose an image parsing system that integrates region-based cues with the promising novel framework of per-exemplar detectors. • It is the first to transfer masks using per-exemplar detectors. • It outputs a dense, many-category labeling.

  6. Approach • Region-Based Parsing • Detector-Based Parsing • SVM Combination and MRF Smoothing

  7. Approach

  8. Parsing pipeline • Obtain a retrieval set of globally similar training images • The region-based data term (ER) is computed using the SuperParsing system • The detector-based data term (ED): • Run per-exemplar detectors for exemplars in the retrieval set • Transfer masks from all detections above a set detection threshold to the test image • The detector data term is computed as the sum of these masks scaled by their detection scores • Combine the two data terms by training an SVM on the concatenation of ED and ER • Smooth the SVM output (ESVM) using an MRF (a sketch of this flow follows)
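
A minimal Python sketch of the data flow described above. The helpers retrieve_similar_images, superparsing_data_term, run_exemplar_detectors, and transfer_mask are hypothetical placeholders standing in for the paper's components; this is an illustration of the pipeline, not the authors' implementation.

```python
import numpy as np

def parse_image(test_image, training_set, svm, mrf_smooth,
                retrieve_similar_images, superparsing_data_term,
                run_exemplar_detectors, transfer_mask):
    """Sketch of the parsing pipeline; all helper callables are hypothetical."""
    # 1. Retrieval set of globally similar training images.
    retrieval_set = retrieve_similar_images(test_image, training_set)

    # 2. Region-based data term E_R from the SuperParsing system, shape (H, W, C).
    E_R = superparsing_data_term(test_image, retrieval_set)

    # 3. Detector-based data term E_D: run per-exemplar detectors for exemplars
    #    in the retrieval set and accumulate their transferred masks.
    E_D = np.zeros_like(E_R)
    for det in run_exemplar_detectors(test_image, retrieval_set):
        if det.score > det.threshold:                 # keep detections above threshold
            mask = transfer_mask(det, test_image)     # (H, W) soft segmentation mask
            E_D[:, :, det.class_id] += det.score * mask

    # 4. Combine the two data terms with an SVM trained on their concatenation.
    features = np.concatenate([E_R, E_D], axis=-1)    # (H, W, 2C)
    E_SVM = svm.decision_function(features.reshape(-1, features.shape[-1]))

    # 5. Smooth the per-pixel SVM scores with an MRF to get the final labeling.
    return mrf_smooth(E_SVM.reshape(E_R.shape))
```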

  9. Region-Based Parsing [27] J. Tighe and S. Lazebnik. SuperParsing: Scalable nonparametric image parsing with superpixels. IJCV, 101(2):329–349, Jan 2013.

  10. Region-Based Parsing • Find a retrieval set of images similar to the query image. • Segment the query image into superpixels and compute feature vectors for each superpixel. • For each superpixel and each feature type, find the nearest-neighbor superpixels in the retrieval set, and compute a likelihood score for each class based on the superpixel matches. • Use the computed likelihoods together with pairwise co-occurrence energies in a Markov Random Field (MRF) framework to compute a global labeling of the image.
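
A toy sketch of the nearest-neighbor likelihood step (the third bullet above), assuming superpixel features have already been computed for the query and the retrieval set; the value of k, the smoothing constant, and the single-feature simplification are illustrative rather than the SuperParsing defaults.

```python
import numpy as np
from scipy.spatial import cKDTree

def class_likelihood_scores(query_feat, retrieval_feats, retrieval_labels,
                            num_classes, k=20, eps=1.0):
    """Toy nonparametric log-likelihood-ratio score for one superpixel feature.

    retrieval_feats:  (N, D) features of superpixels in the retrieval set
    retrieval_labels: (N,)   class index of each retrieval superpixel
    Returns a length-num_classes array of scores (higher = more likely).
    """
    tree = cKDTree(retrieval_feats)
    _, idx = tree.query(query_feat, k=k)          # k nearest retrieval superpixels

    votes = np.bincount(retrieval_labels[idx], minlength=num_classes).astype(float)
    total = np.bincount(retrieval_labels, minlength=num_classes).astype(float)

    # Frequency of class c among the matches vs. among all other classes,
    # each normalized by its overall frequency in the retrieval set.
    p_c     = (votes + eps) / (total + eps)
    p_not_c = (votes.sum() - votes + eps) / (total.sum() - total + eps)
    return np.log(p_c / p_not_c)
```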

  11. Region-Based Parsing • Matches are used to produce a log-likelihood ratio score for label c at region si. • Use this score to define our region-based data term ER for each pixel p and class c:
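
The formula on this slide did not survive transcription; writing $s_p$ for the superpixel (region) containing pixel $p$ and $L(s_p, c)$ for the log-likelihood ratio score above, the data term described here reads, up to notation:

```latex
E_R(p, c) = L(s_p, c)
```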

  12. Detector-Based Parsing [19] T. Malisiewicz, A. Gupta, and A. A. Efros. Ensemble of exemplarSVMs for object detection and beyond. In ICCV, 2011.

  13. Detector-Based Parsing

  14. Detector-Based Parsing

  15. Detector-Based Parsing

  16. Detector-Based Parsing • The detector-based data term ED for a class c and pixel p is simply the sum of all detection masks from that class, weighted by their detection scores:
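
The formula itself is missing from the transcript; writing $D_{p,c}$ for the set of class-$c$ detections whose transferred masks cover pixel $p$ and $w_d$ for the score of detection $d$, the sum described here reads, up to notation:

```latex
E_D(p, c) = \sum_{d \in D_{p,c}} w_d
```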

  17. SVM Combination and MRF Smoothing • For a test image, each pixel p and each class c has two data terms: ER(p, c) and ED(p, c). • Training data for each SVM is generated by running region- and detector-based parsing on the entire training set. • Smooth the resulting SVM labels with an MRF energy function.
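
A hedged scikit-learn sketch of the combination step, assuming the per-pixel data terms have been flattened into (N_pixels, num_classes) arrays; the one-vs-all LinearSVC setup and its parameters are illustrative, not the authors' exact configuration.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_combination_svms(E_R_train, E_D_train, labels_train, num_classes):
    """Train one one-vs-all linear SVM per class on concatenated [E_R, E_D] features.

    E_R_train, E_D_train: (N_pixels, num_classes) data terms from the training set
    labels_train:         (N_pixels,) ground-truth class index per pixel
    """
    X = np.concatenate([E_R_train, E_D_train], axis=1)   # (N, 2 * num_classes)
    svms = []
    for c in range(num_classes):
        clf = LinearSVC(C=1.0)                            # C value is illustrative
        clf.fit(X, (labels_train == c).astype(int))
        svms.append(clf)
    return svms

def svm_data_term(E_R, E_D, svms):
    """Per-pixel score E_SVM(p, c), to be smoothed by the MRF."""
    X = np.concatenate([E_R, E_D], axis=1)
    return np.stack([clf.decision_function(X) for clf in svms], axis=1)
```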

  18. Experiments • Three challenging datasets: • SIFT Flow [18] • LM+SUN [27] • CamVid [5]

  19. Experiments

  20. Experiments

  21. Experiments • SIFT Flow

  22. Experiments • LM+SUN

  23. Experiments • CamVid

  24. Experiments

  25. Experiments

  26. Conclusions • We propose an image parsing system that integrates region-based cues with the promising novel framework of per-exemplar detectors. • Our current system achieves very promising results, but at a considerable computational cost. • Reducing this cost is an important future research direction.
