Probabilistic Models for Parsing Images

Xiaofeng Ren, University of California, Berkeley


Presentation Transcript


  1. Probabilistic Models for Parsing Images. Xiaofeng Ren, University of California, Berkeley

  2. Parsing Images • Scene label: outdoor wildlife • Labels in the example image: Water, Grass, Tiger, Tiger, Sand, back, head, eye, legs, tail, mouse, shadow

  3. A Classical View of Visual Processing • Low-level: Image Processing (Pixels & Pixel Features) • Mid-level: Perceptual Organization (Contours & Regions) • High-level: Recognition (Objects & Scenes) • Example labels: Water, Grass, Tiger, Sand

  4. Models for Parsing Images • A unified framework incorporating all levels of abstraction • Low-level: Image Processing (Pixels) • Mid-level: Perceptual Organization (Contours & Regions) • High-level: Recognition (Objects & Scenes)

  5. Probabilistic Models for Images • Markov Random Fields [Geman & Geman 84], defined over Pixels and Labels • Applications: image restoration, edge detection, texture synthesis, segmentation, super-resolution, contour completion, … • Empirical evidence against pixel-based MRFs [Ren & Malik 02]: very limited representational power

  6. Where is Structure? Our perception of structure is disrupted. We cannot efficiently reason about structure if we cannot represent it.

  7. Outline • Parsing Images • Building a Mid-level Representation • Probabilistic Models for Mid-level Vision • Contour Completion • Figure/Ground Organization • Combining Mid- and High-level Vision • Object Segmentation • Finding People • Conclusion & Future Work

  8. Outline • Parsing Images • Building a Mid-level Representation • Probabilistic Models for Mid-level Vision • Contour Completion • Figure/Ground Organization • Combining Mid- and High-level Vision • Object Segmentation • Finding People • Conclusion & Future Work

  9. Local Edge Detection • Use the Pb (probability of boundary) edge detector: combining local brightness, texture and color contrasts.

  10. Piece-wise Linear Approximation • Recursively split the boundaries (using angles) until each piece is approximately straight
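The recursive splitting step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slide only says the split uses angles, so the specific criterion here (split at the point with the largest turning angle relative to the two end points, stop when it falls below `max_angle`) and the names `turning_angle`, `split_contour` and the 20-degree default are assumptions.

```python
import math

def turning_angle(a, p, b):
    """Absolute angle (radians) between vectors a->p and p->b; 0 means straight."""
    v1 = (p[0] - a[0], p[1] - a[1])
    v2 = (b[0] - p[0], b[1] - p[1])
    if math.hypot(*v1) == 0 or math.hypot(*v2) == 0:
        return 0.0
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return abs(math.atan2(cross, dot))

def split_contour(points, max_angle=math.radians(20)):
    """Return sorted indices of breakpoints (including both end points).

    Assumed criterion: recursively split a piece at the interior point with
    the largest turning angle, until every piece is approximately straight.
    """
    def recurse(lo, hi):
        best_i, best = None, 0.0
        for i in range(lo + 1, hi):
            ang = turning_angle(points[lo], points[i], points[hi])
            if ang > best:
                best_i, best = i, ang
        if best_i is None or best <= max_angle:
            return []  # piece is straight enough; no further split
        return recurse(lo, best_i) + [best_i] + recurse(best_i, hi)

    return [0] + recurse(0, len(points) - 1) + [len(points) - 1]

# An L-shaped contour is split once, at its corner.
corner = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
breaks = split_contour(corner)
```

The resulting straight pieces are the kind of edge set that a constrained triangulation can then be asked to preserve.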

  11. Constrained Delaunay Triangulation (CDT) • A variant of the standard Delaunay Triangulation • Keeps a given set of edges in the triangulation • Widely used in geometric modeling and finite elements.

  12. Scale Invariance of CDT

  13. The CDT Graph: Summary [Ren & Malik; ICCV 2003] [Ren, Fowlkes & Malik; ICCV 2005] • Pixels: millions of pixels • Superpixels: ~1000 edges • fast to compute • scale-invariant • completes gaps • little loss of structure • longer ranges of interaction • Principle of Uniform Connectedness: use homogeneous regions as entry-level units in perceptual organization. [Palmer and Rock 94]

  14. Analogy with Natural Language Parsing • Pixels ↔ Letters • Superpixels ↔ Words • Contours & Regions ↔ Phrases • Objects & Scenes ↔ Sentences & Paragraphs

  15. Outline • Parsing Images • Building a Mid-level Representation • Probabilistic Models for Mid-level Vision • Contour Completion • Figure/Ground Organization • Combining Mid- and High-level Vision • Object Segmentation • Finding People • Conclusion & Future Work

  16. Mid-level Vision • It is not low-level vision (which can be computed independently in a local neighborhood). • It is not high-level vision (which assumes knowledge of particular object categories & scenes). • Problems in mid-level vision: curvilinear grouping, region segmentation, figure/ground organization

  17. Mid-level Vision • Problems in mid-level vision: curvilinear grouping, region segmentation, figure/ground organization

  18. Curvilinear Grouping • Boundaries are smooth in nature! • A number of associated visual phenomena: good continuation, visual completion, illusory contours

  19. Beyond Local Edge Detection • There is psychophysical evidence that we are approaching the limit of local edge detection • Smoothness of boundaries in natural images provides an important contextual cue.

  20. Inference on the CDT Graph • Random field: binary variables {Xe} on the edges of the CDT graph, Xe ∈ {0,1} (1: boundary, 0: non-boundary), which defines a joint probability distribution on all {Xe} • Estimate the marginal P(Xe)

  21. Conditional Random Fields (CRF) [Pietra, Pietra & Lafferty 97] [Lafferty, McCallum & Pereira 01] • Undirected graphical model with potential functions in the exponential family, where X = {X1, X2, …, Xm} • Edge potentials $\exp\left(\sum_i \alpha_i \phi_i\right)$ • Junction potentials $\exp\left(\sum_j \beta_j \psi_j\right)$

  22. Edge Potential: Local Contrast • Edge potentials $\exp\left(\sum_i \alpha_i \phi_i\right)$ • Edge feature $\phi$ = average contrast on each edge e

  23. Junction Potential: Degree • The degree of the junction depends on the assignments of {Xe}: deg=0 (no lines), deg=1 (line ending), deg=2 (continuation), deg=3 (T-junction) • Junction potentials $\exp\left(\sum_j \beta_j \psi_j\right)$ with $\psi_j = \mathbf{1}(\deg = j)$

  24. Junction Potential: Continuity • For deg=2 (continuation): $\psi = g(\theta) \cdot \mathbf{1}(\deg = 2)$, where $\theta$ is the angle at the junction
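To make the edge and junction potentials concrete, here is a minimal sketch that computes the marginals P(Xe) by brute-force enumeration on a toy graph. It is only feasible for a handful of edges and is not the inference procedure used in the talk, which the transcript does not specify. The function name `edge_marginals`, the single contrast feature per edge, and the weight values are all assumptions for illustration; the continuity term is omitted.

```python
import itertools
import math

def edge_marginals(edges, contrast, edge_weight, degree_weights):
    """Brute-force marginals P(Xe = 1) for a tiny edge-labeling CRF.

    edges:          list of (u, v) vertex pairs
    contrast:       one local-contrast feature per edge (assumed given)
    edge_weight:    hypothetical weight on the contrast feature
    degree_weights: hypothetical weights, dict {junction degree: weight}
    """
    vertices = sorted({v for e in edges for v in e})
    totals = [0.0] * len(edges)
    partition = 0.0
    for x in itertools.product((0, 1), repeat=len(edges)):
        # Edge potentials: weighted contrast for every edge turned on.
        score = sum(edge_weight * contrast[i] * x[i] for i in range(len(edges)))
        # Junction potentials: weight chosen by the degree at each vertex.
        for v in vertices:
            deg = sum(x[i] for i, e in enumerate(edges) if v in e)
            score += degree_weights.get(deg, 0.0)
        w = math.exp(score)
        partition += w
        for i, xi in enumerate(x):
            if xi:
                totals[i] += w
    return [t / partition for t in totals]

# Two edges sharing a vertex; line endings (deg=1) are penalized.
marginals = edge_marginals(
    edges=[(0, 1), (1, 2)],
    contrast=[2.0, -2.0],
    edge_weight=1.0,
    degree_weights={1: -1.0},
)
```

With these toy numbers the high-contrast edge receives a larger boundary marginal than the low-contrast one, and the degree penalty lowers the probability of isolated line endings.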

  25. Learning the Parameters • Values shown on the slide: 2.46, 0.87, 1.14, 0.01 • Compare to [Geman and Geman 84]: mid-level representation + probabilistic framework + large annotated datasets

  26. Evaluation: Precision vs Recall • Match detections to groundtruth • Precision = matched pairs / total detections • Recall = matched pairs / total groundtruth • High threshold: few detections • Low threshold: lots of detections
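The two ratios above can be written out directly. This sketch assumes the matching between detections and groundtruth has already been done and only the counts are available; the function name `precision_recall` and the example counts are illustrative.

```python
def precision_recall(matched_pairs, total_detections, total_groundtruth):
    """Precision = matched / detections, Recall = matched / groundtruth."""
    # Guard against empty sets so a threshold with no detections does not crash.
    precision = matched_pairs / total_detections if total_detections else 0.0
    recall = matched_pairs / total_groundtruth if total_groundtruth else 0.0
    return precision, recall

# Example counts at one threshold: 30 matches, 40 detections, 60 groundtruth pixels.
p, r = precision_recall(30, 40, 60)
```

Sweeping the detection threshold and recomputing the pair at each setting traces out the precision-recall curve.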

  27. Curvilinear grouping improves boundary detection, in both the low-recall and high-recall regimes • Horse dataset of [Borenstein and Ullman 02]: 175 images for training, 175 for testing • “Mid-level vision is useful” [Ren, Fowlkes & Malik; ICCV 2005]

  28. Image Pb CRF

  29. Image Pb CRF

  30. Mid-level Vision • Problems in mid-level vision: curvilinear grouping, region segmentation, figure/ground organization

  31. Mid-level Vision • Problems in mid-level vision: curvilinear grouping, region segmentation, figure/ground organization

  32. Figure/Ground Organization • A contour belongs to one of the two (but not both) abutting regions. • Important for the perception of shape • Example: Figure (face) with Ground (shapeless), or Figure (goblet) with Ground (shapeless)

  33. Inference on the CDT Graph • Xe ∈ {-1,1} (1: Left is Figure, -1: Right is Figure) • Local Model: Convexity, Parallelism, … • Global Model: Consistency at T-junctions

  34. Results Using human segmentations [Ren, Fowlkes & Malik; ECCV 2006]

  35. Models for Contour Labeling • Labels {Xe} • Curvilinear Grouping • Figure/Ground Assignment • CRF • Levels shown: Pixels, Superpixels, Contours & Regions, Objects & Scenes (Water, Grass, Tiger, Sand)

  36. Line Labeling [Clowes 1971, Huffman 1971; Waltz 1972] • CSP with labels > (contour direction), + (convex edge), - (concave edge); possible junctions act as constraints • Reviving the old tradition with modern technologies, for more realistic applications

  37. Parsing Images • Add region-based variables and cues • Joint contour and region inference • Add high-level knowledge (objects) • Levels shown: Pixels, Superpixels, Contours & Regions, Objects & Scenes (Water, Grass, Tiger, Sand)

  38. Object Segmentation Object-specific cues: • Shape • Region support • Color/Texture …

  39. Inference on the CDT Graph • Contour variables {Xe} • Region variables {Yt} • Object variable {Z}, encoding location, scale, pose, etc. • Integrating {Xe}, {Yt} and {Z}: low/mid/high-level cues

  40. Grouping Cues • Low-level Cues • L1(Xe|I): edge energy along edge e • L2(Ys,Yt|I): brightness/texture similarity between two regions s and t • Mid-level Cues • M1(XV|I): edge collinearity and junction frequency at vertex V • M2(Xe,Ys,Yt): consistency between edge e and two adjoining regions s and t • High-level Cues • H1(Yt|I): texture similarity of region t to exemplars • H2(Yt,Z|I): compatibility of region support with pose • H3(Xe,Z|I): compatibility of local edge shape with pose

  41. Cue Integration in CRF Estimate the marginal posteriors of X, Y and Z
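One plausible way to write the integrated model, given the seven cue functions listed on the previous slide, is a log-linear combination. This is a sketch under that assumption: the transcript does not preserve the slide's formula, and the weights $\lambda$ below are hypothetical symbols introduced only for illustration.

```latex
P(X, Y, Z \mid I) \;\propto\; \exp\Big(
    \lambda_1 \sum_{e} L_1(X_e \mid I)
  + \lambda_2 \sum_{s,t} L_2(Y_s, Y_t \mid I)
  + \lambda_3 \sum_{V} M_1(X_V \mid I)
  + \lambda_4 \sum_{e,s,t} M_2(X_e, Y_s, Y_t)
  + \lambda_5 \sum_{t} H_1(Y_t \mid I)
  + \lambda_6 \sum_{t} H_2(Y_t, Z \mid I)
  + \lambda_7 \sum_{e} H_3(X_e, Z \mid I)
\Big)
```

Under this form, the marginal posteriors of X, Y and Z are obtained by summing the joint distribution over all other variables.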

  42. Object knowledge helps a lot Mid-level Cues still useful [Ren, Fowlkes & Malik; NIPS 2005]

  43. Input Input Pb Output Contour Output Figure

  44. Input Input Pb Output Contour Output Figure

  45. Finding People The challenges: • Pose articulation + self-occlusion • Clothing • Lighting • Clutter ……

  46. Finding People: Top-Down • Top-down approaches • 3D model-based: fails most of the time • 2D template-based: needs lots of training data • Levels shown: Pixels, Superpixels, Contours & Regions, Objects & Scenes

  47. Finding People: Bottom-Up • Levels shown: Pixels, Superpixels, Contours & Regions, Objects & Scenes

  48. [Ren, Berg & Malik; ICCV 2005]
