
Extracting Simple Verb Frames from Images

Toward Holistic Scene Understanding. Prof. Daphne Koller Research Group, Stanford University. Geremy Heitz. DARPA CLLR Workshop, December 2, 2008.


Presentation Transcript


  1. Extracting Simple Verb Frames from Images: Toward Holistic Scene Understanding. Prof. Daphne Koller Research Group, Stanford University. Geremy Heitz. DARPA CLLR Workshop, December 2, 2008.

  2. Grand Goal: Scene Understanding. Image labels: Cigarette, Backpack, Man, Dog. Captions: “A cow walking through the grass on a pasture by the sea”; “man wearing a backpack, smoking a cigarette, walking a dog on a sidewalk”

  3. Understanding Verb Frames • Primitives • Objects • Parts • Surfaces • Regions • Interactions • Context • Actions. Methods exist to extract these, but we need both to do a better job and to get them all at once. Frame: to walk. “a man is walking on a sidewalk”; “a dog is walking on a sidewalk”. Modeling verb frames requires understanding the interactions between primitives, which fit well into the framework of graphical models. Image labels: Man, Building, Cigarette, Backpack, Dog, Sidewalk

  4. Outline • Extracting the Primitives • Qualitative 3D Scene Layout • Modeling Relationships • Learning Frames • Refined Characterization of Objects

  5. Computer View of a “Scene”. Labels: BUILDING, ROAD, STREET SCENE

  6. Object Detection. Detection window W, kept when Score(W) > 0.5. Legend: Car, Person, Motorcycle, Boat, Sheep, Cow
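The thresholding rule on this slide can be sketched as a sliding-window loop. This is a minimal illustration, not the detector used in the talk: the window enumeration, the `toy_score` function, and all parameter names are assumptions made for the example; only the `Score(W) > 0.5` test comes from the slide.

```python
from typing import Callable, List, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(img_w: int, img_h: int, win_w: int, win_h: int,
                    stride: int) -> List[Window]:
    """Enumerate every fixed-size window that fits inside the image."""
    return [(x, y, win_w, win_h)
            for y in range(0, img_h - win_h + 1, stride)
            for x in range(0, img_w - win_w + 1, stride)]


def detect(windows: List[Window], score: Callable[[Window], float],
           threshold: float = 0.5) -> List[Window]:
    """Keep the windows whose score exceeds the threshold: Score(W) > 0.5."""
    return [w for w in windows if score(w) > threshold]


def toy_score(w: Window) -> float:
    """Hypothetical scorer: pretends an object sits in the window at (8, 0)."""
    return 0.9 if (w[0], w[1]) == (8, 0) else 0.1


windows = sliding_windows(img_w=16, img_h=8, win_w=8, win_h=8, stride=8)
detections = detect(windows, toy_score)
print(detections)
```

In a real system `toy_score` would be replaced by a trained per-class classifier, one per object category in the legend.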

  7. Finding the Primitives Jointly. Labels: SKY, GRASS, SEASIDE PASTURE. Grass = Flat, Sky = Far, FG = Vertical. 40% Grass, 30% Sky… 1 cow, 2 boats… [Heitz et al., NIPS 2008a]

  8. Results – TAS Model. Panels: Contextual Detector, Base Detector [Heitz et al., ECCV 2008]

  9. Qualitative 3D Scene Layout. Primitives imply a certain 3D layout of the scene, though absolute depth may not be preserved. For example: • Sky is a far, vertical plane • Water and road are horizontal planes • Objects “pop up” from the image
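One way to read the examples on this slide is as a lookup from region label to a qualitative geometric class. The sketch below assumes exactly that; the table, the `layout_for` helper, and the `"unknown"` fallback are illustrative names, and only the three label-to-geometry examples come from the slide.

```python
# Qualitative geometry for "stuff" regions, taken from the slide's examples.
QUALITATIVE_LAYOUT = {
    "sky": "far vertical plane",
    "water": "horizontal plane",
    "road": "horizontal plane",
}


def layout_for(label: str, is_object: bool = False) -> str:
    """Return a qualitative 3D class for a labeled image region.

    Detected objects "pop up" from the image; other regions are looked up
    by label. Labels not listed on the slide fall back to "unknown"
    (an assumption of this sketch).
    """
    if is_object:
        return "pop-up"
    return QUALITATIVE_LAYOUT.get(label.lower(), "unknown")


print(layout_for("Sky"))
print(layout_for("cow", is_object=True))
```

No absolute depth is computed here, which matches the slide's point that the layout is qualitative.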

  10. Modeling Relationships • We have explored how to model 2D relationships • We should be able to extend this to 3D relationships [Heitz et al., ECCV 2008] [Gould et al., IJCV 2008]. Example relationships: Beside, In front of, On

  11. Outline • Extracting the Primitives • Qualitative 3D Scene Layout • Modeling Relationships • Learning Frames • Refined Characterization of Objects

  12. Learning Semantics: Verb Frames. Given primitives, rough layout, and relationships, let’s learn subjects, verbs, and objects for frames: The [S] [V] the [O]. [S], [O]: CAR, ROAD, COW, GRASS, PERSON, APPLE, … [V]: WALKS ON, EATS, DRIVES ON, JUMPS OVER, THROWS, …
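The frame template on this slide maps naturally onto a small record type. The sketch below only shows how a filled frame renders into the template sentence; the `VerbFrame` class and its field names are hypothetical, and nothing here represents how the frames are actually learned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbFrame:
    """A filled frame for the template "The [S] [V] the [O]."."""
    subject: str  # [S], e.g. CAR, COW, PERSON
    verb: str     # [V], e.g. DRIVES ON, EATS, THROWS
    obj: str      # [O], e.g. ROAD, GRASS, APPLE

    def sentence(self) -> str:
        """Render the frame using the slide's sentence template."""
        return f"The {self.subject} {self.verb} the {self.obj}."


frame = VerbFrame(subject="CAR", verb="DRIVES ON", obj="ROAD")
print(frame.sentence())
```

Keeping the frame immutable makes it usable as a dictionary key, which would be convenient if frames were later counted or scored.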

  13. The CAR DRIVES ON the ROAD

  14. Refined Characterization. We need to know that the white stick is a cigarette… and where the man’s mouth is… in order to determine that he’s smoking.

  15. Refined Object Characterization • Set of “keypoint” landmarks • Outline shape defined by connecting contour [Heitz et al., NIPS 2008b, IJCV in submission]

  16. Results. Examples: Rhino, Giraffe, Llama

  17. Mammals. Labels: Running, Standing, Eating, Standing [Heitz et al., NIPS 2008b, IJCV in submission]

  18. Activity Recognition. Activities: Drinking, Eating. 1) Localize the landmarks of the cow, including the head. 2) Extract a histogram of “stuff” in a window around the head landmark. 3) Make a decision. Example labels: Grass, Eating, Cow
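Steps 2 and 3 above can be sketched on a toy label map. This assumes step 1 has already produced a head landmark, and it uses a simple dominant-label rule for the decision; that rule, the grass/water mapping, and the names `stuff_histogram` and `decide_activity` are assumptions for illustration rather than the method from the talk.

```python
from collections import Counter
from typing import Dict, List, Tuple


def stuff_histogram(label_map: List[List[str]], center: Tuple[int, int],
                    half_size: int) -> Dict[str, float]:
    """Normalized histogram of region labels in a square window.

    The window is centered on `center` (row, col) and clipped to the image.
    """
    rows, cols = len(label_map), len(label_map[0])
    r0, c0 = center
    counts: Counter = Counter()
    for r in range(max(0, r0 - half_size), min(rows, r0 + half_size + 1)):
        for c in range(max(0, c0 - half_size), min(cols, c0 + half_size + 1)):
            counts[label_map[r][c]] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def decide_activity(hist: Dict[str, float]) -> str:
    """Hypothetical decision rule: the dominant "stuff" picks the activity."""
    dominant = max(hist, key=hist.get)
    return {"grass": "eating", "water": "drinking"}.get(dominant, "unknown")


# Toy per-pixel "stuff" labels; the head landmark is assumed to be known.
label_map = [
    ["sky", "sky", "sky", "sky"],
    ["grass", "grass", "grass", "grass"],
    ["grass", "grass", "grass", "grass"],
    ["grass", "grass", "water", "grass"],
]
head = (2, 1)  # (row, col) of the head landmark

hist = stuff_histogram(label_map, head, half_size=1)
activity = decide_activity(hist)
print(hist, activity)
```

A learned classifier over the histogram would be the natural replacement for `decide_activity` once labeled examples are available.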

  19. Activity Recognition with People. Activities: Running, Walking, Standing, Hitting • The pose of the person is one of the important factors • We also need to recognize the objects the person interacts with

  20. How far can we take this? • Front legs off ground = Jumping • Apple near mouth = Eating • Ball near hands = Throwing
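The three rules on this slide can be written down directly as predicates over landmark and object positions. In the sketch below, the scene dictionary layout, the `near` helper, the 20-pixel radius, and the rule ordering are all assumptions chosen for the example; only the three rules themselves come from the slide.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def near(a: Point, b: Point, radius: float) -> bool:
    """True when two image points lie within `radius` pixels of each other."""
    return math.dist(a, b) <= radius


def infer_action(scene: Dict[str, object], radius: float = 20.0) -> str:
    """Apply the slide's rules in order; return "unknown" if none fires."""
    if scene.get("front_legs_off_ground"):
        return "jumping"
    if "apple" in scene and "mouth" in scene and near(
            scene["apple"], scene["mouth"], radius):
        return "eating"
    if "ball" in scene and "hands" in scene and near(
            scene["ball"], scene["hands"], radius):
        return "throwing"
    return "unknown"


print(infer_action({"apple": (100.0, 50.0), "mouth": (105.0, 55.0)}))
print(infer_action({"ball": (0.0, 0.0), "hands": (3.0, 4.0)}))
```

Hand-written rules like these break down quickly as the number of actions grows, which is the question the slide title raises.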

  21. Does phased learning help? • Cartoon/Caricature: Exaggerates the most salient features of the object class. • Simple BG: Real object with no confusing clutter. • Cluttered BG: Object in standard pose on natural background. • Articulated: Once we have built a strong appearance model, can we learn complicated articulations?

  22. Our Related Papers • G. Elidan, B. Packer, G. Heitz, and D. Koller. Convex Point Estimation using Undirected Bayesian Transfer Hierarchies. UAI, 2008. • S. Gould, J. Rodgers, D. Cohen, G. Elidan, and D. Koller. Multi-Class Segmentation with Relative Location Prior. IJCV, 2008. • S. Gould, P. Baumstarck, M. Quigley, A. Ng, and D. Koller. Integrating Visual and Range Data for Robotic Object Detection. ECCV Workshop M2SFA2, 2008. • G. Heitz and D. Koller. Learning Spatial Context: Using Stuff to Find Things. ECCV, 2008. • G. Heitz, S. Gould, A. Saxena, and D. Koller. Cascaded Classification Models: Combining Models for Holistic Scene Understanding. NIPS, 2008. • G. Heitz, G. Elidan, B. Packer, and D. Koller. Shape-based Object Localization for Descriptive Classification. NIPS, 2008.
