
Semantic Parsing for Priming Object Detection in RGB-D Scenes

Semantic Parsing for Priming Object Detection in RGB-D Scenes. Cesar Cadena and Jana Kosecka. 3rd Workshop on Semantic Perception, Mapping and Exploration (SPME), Karlsruhe, Germany, 2013.


Presentation Transcript


  1. Semantic Parsing for Priming Object Detection in RGB-D Scenes • Cesar Cadena and Jana Kosecka • 3rd Workshop on Semantic Perception, Mapping and Exploration (SPME) • Karlsruhe, Germany, 2013

  2. Motivation • Long-term robotic operation • Semantic information about the surrounding environment is important for high-level robotic tasks. • It is difficult to know a priori all the possible instances or classes of objects that the robot will find in real operation. • Even if we know many of them, it is unreasonable and expensive to run all the specific object detectors at the same time.


  6. Motivation • However: • There are things we can assume to be present (almost) always. • Generic “detachable” objects also share some characteristics. • Urban: Ground, Buildings, Sky, Objects • Indoors: Ground, Walls, Ceiling, Objects • Today: Ground – Structure – Furniture – Props • Efficiently segment RGB+3D scenes into these general classes, to be used as a prior for task-specific detectors.


  10. Our Problem • Efficiently segment RGB+3D scenes into the general classes Ground, Structure, Furniture and Props, to be used as a prior for task-specific detectors.


  12. NYU Depth v2 • N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from RGBD images,” in ECCV, 2012. • 1449 labeled frames. • 26 scene classes. • Labeling spans 894 different classes. • Thanks to N. Silberman for providing the mapping from 894 to 4 classes.

  13. The System • [Diagram labels: Semantic Segmentation; MAP; Marginals]

  14. Different approaches • N. Silberman et al., ECCV 2012 • C. Couprie et al., CoRR 2013 • X. Ren et al., CVPR 2012 • D. Munoz et al., ECCV 2010 • I. Endres and D. Hoiem, ECCV 2010 • Each of them has at least one of: expensive over-segmentation, expensive features, expensive inference.

  15. Our approach • Conditional Random Fields • Components: Preprocessing, Graph Structure, Potentials, Inference • [Diagram labels: Semantic Segmentation; MAP; Marginals]

  16. Outline • (1) Preprocessing • (2) Graph Structure • (3) Potentials • (4) Inference • (5) Results • (6) Conclusions

  17. Preprocessing: Over-segmentation • SLIC superpixels • R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk, “SLIC superpixels compared to state-of-the-art superpixel methods,” PAMI, 2012.

  18. Graph Structure • Classical choice on images

  19. Graph Structure: Our choice • Minimum Spanning Tree over 3D
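
The graph choice above can be sketched in a few lines. This is a minimal illustration, assuming each superpixel is summarised by a 3D centroid and that edge weights are Euclidean distances between centroids; the function name `minimum_spanning_tree` and the all-pairs candidate set are assumptions for the example, not details taken from the slides.

```python
import math
from itertools import combinations


def minimum_spanning_tree(centroids, candidate_edges=None):
    """Kruskal's algorithm over 3D points; returns a list of (i, j) tree edges.

    centroids: list of (x, y, z) tuples, one per superpixel (assumed input).
    candidate_edges: optional iterable of (i, j) pairs; defaults to all pairs.
    """
    n = len(centroids)
    if candidate_edges is None:
        candidate_edges = combinations(range(n), 2)

    # Weight every candidate edge by the 3D distance between its endpoints.
    weighted = sorted(
        (math.dist(centroids[i], centroids[j]), i, j) for i, j in candidate_edges
    )

    parent = list(range(n))  # union-find forest used to detect cycles

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for _, i, j in weighted:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding the edge does not close a cycle
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


# Two groups of superpixels at different depths (z = 1 and z = 3).
centroids = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.2, 0.0, 1.0),
             (0.0, 0.0, 3.0), (0.1, 0.0, 3.0)]
tree = minimum_spanning_tree(centroids)
```

In this toy scene the tree links each depth group internally and crosses the depth gap only once, which is the behaviour that keeps spatially coherent regions connected.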


  21. Potentials: Pairwise CRFs
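
The slide's equations are not preserved in this transcript. As a reminder of the general form only, and not necessarily the authors' notation, a pairwise CRF over a graph with nodes $\mathcal{N}$ and edges $\mathcal{E}$ is commonly written as follows. The symbols $\mathbf{x}$ (labels), $\mathbf{z}$ (observations), $\varphi$ (unary potential), $\psi$ (pairwise potential) and $Z$ (partition function) are assumptions chosen for this sketch.

```latex
P(\mathbf{x} \mid \mathbf{z}) =
  \frac{1}{Z(\mathbf{z})}
  \exp\!\Big(
    -\sum_{i \in \mathcal{N}} \varphi(x_i, \mathbf{z})
    -\sum_{(i,j) \in \mathcal{E}} \psi(x_i, x_j, \mathbf{z})
  \Big)
```

Here each node would be a superpixel, and the edge set would be the one given by the chosen graph structure.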


  24. Potentials: unary • Built from the frequency of label j in a k-NN query and the frequency of label j in the database • J. Tighe and S. Lazebnik, “Superparsing: Scalable nonparametric image parsing with superpixels,” ECCV 2010. • The database is a kd-tree of features from training data.
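
A minimal sketch of how such a k-NN score could be computed. It assumes the two frequencies are combined as a ratio and then normalised, and it takes the negative log as the unary potential; the exact formula on the slide is not preserved, so treat this form as an assumption. A linear scan stands in for the kd-tree, and the names `knn_label_scores`, `unary_potential` and the smoothing constant `eps` are hypothetical.

```python
import math
from collections import Counter


def knn_label_scores(query, features, labels, k=3, eps=1e-6):
    """Score each label by (share among the k nearest neighbours) divided by
    (share in the whole database), then normalise the scores to sum to 1."""
    # Linear scan instead of a kd-tree: fine for a toy database.
    order = sorted(range(len(features)),
                   key=lambda i: math.dist(query, features[i]))
    in_query = Counter(labels[i] for i in order[:k])
    in_database = Counter(labels)

    ratios = {
        label: (in_query[label] / k + eps) / (count / len(labels))
        for label, count in in_database.items()
    }
    total = sum(ratios.values())
    return {label: ratio / total for label, ratio in ratios.items()}


def unary_potential(scores):
    """Negative log of the normalised scores (lower means more likely)."""
    return {label: -math.log(p) for label, p in scores.items()}


# Toy 1D feature database with three of the four general classes.
features = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (9.0,)]
labels = ["Ground", "Ground", "Ground", "Props", "Props", "Structure"]
scores = knn_label_scores((0.05,), features, labels, k=3)
potentials = unary_potential(scores)
```

Dividing by the database frequency keeps common labels from dominating simply because they have more training samples.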

  25. Features (12D) • From image: • mean of Lab color space (3D) • vertical pixel location (1D) • entropy from vanishing points (1D) • From 3D: • height and depth (2D) • mean and std of differences on depth (2D) • local planarity (1D) • neighboring planarity (1D) • vertical orientation (1D)
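
The feature list above adds up to 12 dimensions (3+1+1 from the image, 2+2+1+1+1 from 3D). A small sketch of assembling the per-superpixel descriptor in that order follows; the group names and the `assemble_descriptor` helper are hypothetical, and how each value is computed is outside the scope of this example.

```python
# (name, dimension) pairs in the order listed on the slide; names are illustrative.
FEATURE_LAYOUT = [
    ("lab_mean", 3),
    ("vertical_pixel_location", 1),
    ("vanishing_point_entropy", 1),
    ("height_and_depth", 2),
    ("depth_difference_mean_std", 2),
    ("local_planarity", 1),
    ("neighboring_planarity", 1),
    ("vertical_orientation", 1),
]


def assemble_descriptor(groups):
    """Concatenate named feature groups into one flat vector, checking sizes."""
    vector = []
    for name, dim in FEATURE_LAYOUT:
        values = list(groups[name])
        if len(values) != dim:
            raise ValueError(f"{name}: expected {dim} values, got {len(values)}")
        vector.extend(float(v) for v in values)
    return vector


# Placeholder values only, to show the shape of the descriptor.
example = {name: [0.0] * dim for name, dim in FEATURE_LAYOUT}
descriptor = assemble_descriptor(example)
```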

  26. Features • From image: • entropy from vanishing points

  27. Features • From 3D: • mean and std of differences on depth


  29. Features • From 3D: • mean and std of differences on depth • local planarity • neighboring planarity • vertical orientation

  30. Potentials: pairwise • Lab color
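
The slide names only the cue (Lab color), not the formula. One common way to turn a color difference into a pairwise cost is a contrast-sensitive term, sketched below as an assumption: similar colors make it cheap to share a label and costly to differ, and the reverse for dissimilar colors. The function name `pairwise_potential` and the scale `sigma` are hypothetical.

```python
import math


def pairwise_potential(label_i, label_j, lab_i, lab_j, sigma=10.0):
    """Contrast-sensitive pairwise cost between two connected superpixels.

    lab_i, lab_j: mean Lab colors as (L, a, b) tuples.
    sigma: assumed scale controlling how fast similarity decays with distance.
    """
    similarity = math.exp(-math.dist(lab_i, lab_j) / sigma)  # in (0, 1]
    if label_i == label_j:
        return 1.0 - similarity  # low cost when similar colors agree
    return similarity            # high cost when similar colors disagree


same_color_same_label = pairwise_potential("Props", "Props", (50, 0, 0), (50, 0, 0))
same_color_diff_label = pairwise_potential("Props", "Ground", (50, 0, 0), (50, 0, 0))
far_color_same_label = pairwise_potential("Props", "Props", (20, 10, 10), (80, -30, 40))
far_color_diff_label = pairwise_potential("Props", "Ground", (20, 10, 10), (80, -30, 40))
```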

  31. Inference • We use belief propagation: • Exact results for MAP/marginals • Efficient computation, in […] • Thanks to our graph structure choice!
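
On a tree, belief propagation needs only one leaves-to-root pass and one root-to-leaves pass, and the resulting marginals are exact; for n nodes and k labels the cost is O(n k^2). The sketch below shows the sum-product variant for marginals, under assumed inputs: `unary[i][a]` and `pairwise[(i, j)][a][b]` are strictly positive factors (for example, the exponential of negative potentials). Replacing the sum with a max would give the max-product variant used for MAP.

```python
def tree_belief_propagation(n, edges, unary, pairwise):
    """Sum-product belief propagation on a tree; returns exact node marginals."""
    k = len(unary[0])
    neighbors = {i: [] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Breadth-first order from node 0 (the root): parents precede children.
    parent = {0: None}
    order = [0]
    for node in order:
        for nb in neighbors[node]:
            if nb not in parent:
                parent[nb] = node
                order.append(nb)

    def edge_factor(i, j, a, b):
        # Tables are stored once per edge; handle both orientations.
        if (i, j) in pairwise:
            return pairwise[(i, j)][a][b]
        return pairwise[(j, i)][b][a]

    messages = {}  # messages[(src, dst)][b], normalised for stability

    def send(src, dst):
        incoming = [messages[(nb, src)] for nb in neighbors[src] if nb != dst]
        out = []
        for b in range(k):
            total = 0.0
            for a in range(k):
                term = unary[src][a] * edge_factor(src, dst, a, b)
                for msg in incoming:
                    term *= msg[a]
                total += term
            out.append(total)
        norm = sum(out)
        messages[(src, dst)] = [v / norm for v in out]

    for node in reversed(order[1:]):  # leaves to root
        send(node, parent[node])
    for node in order[1:]:            # root to leaves
        send(parent[node], node)

    marginals = []
    for i in range(n):
        belief = list(unary[i])
        for nb in neighbors[i]:
            belief = [belief[a] * messages[(nb, i)][a] for a in range(k)]
        z = sum(belief)
        marginals.append([v / z for v in belief])
    return marginals


# Toy tree with 4 nodes and 2 labels; the Potts-like table favours agreement.
edges = [(0, 1), (1, 2), (1, 3)]
unary = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.6, 0.4]]
pairwise = {edge: [[2.0, 1.0], [1.0, 2.0]] for edge in edges}
marginals = tree_belief_propagation(4, edges, unary, pairwise)
```

The same two-pass schedule would not be exact on a graph with loops, which is why the tree structure matters here.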

  32. Results: NYU-D v2 Dataset • [Figure panels: GT, MAP]

  33. Results: NYU-D v2 Dataset • Confusion matrix • Comparisons


  35. Results: NYU-D v2 Dataset • Some failures • [Figure panels: GT, MAP]

  36. Results: NYU-D v2 Dataset

  37. Marginal probabilities • Provide very useful information for specific tasks, e.g.: • Specific object detection • Support inference • [Figure panels: P(Ground), P(Structure), P(Furniture), P(Props)]
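
One way the marginals could prime a specific detector is to discard candidate windows that mostly cover superpixels unlikely to be the class of interest. The sketch below is a hypothetical illustration, not the authors' pipeline: the `prime_windows` helper, the window names, the pixel-coverage dictionaries and the 0.5 threshold are all assumptions.

```python
def prime_windows(windows, marginals, target="Props", threshold=0.5):
    """Keep windows whose area-weighted marginal for `target` is high enough.

    windows: {window_name: {superpixel_id: pixels_covered}} (assumed format).
    marginals: {superpixel_id: {class_name: probability}}.
    Returns (window_name, score) pairs sorted by decreasing score.
    """
    kept = []
    for name, coverage in windows.items():
        total_pixels = sum(coverage.values())
        score = sum(
            marginals[sp][target] * pixels for sp, pixels in coverage.items()
        ) / total_pixels
        if score >= threshold:
            kept.append((name, score))
    return sorted(kept, key=lambda item: item[1], reverse=True)


marginals = {
    0: {"Ground": 0.05, "Structure": 0.03, "Furniture": 0.02, "Props": 0.90},
    1: {"Ground": 0.70, "Structure": 0.10, "Furniture": 0.10, "Props": 0.10},
}
windows = {
    "window_a": {0: 80, 1: 20},  # mostly over the likely-Props superpixel
    "window_b": {0: 10, 1: 90},  # mostly over the likely-Ground superpixel
}
candidates = prime_windows(windows, marginals)
```

Only the surviving windows would then be passed to the expensive task-specific detector.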

  38. Conclusions • We have presented a computationally efficient approach to semantic segmentation for priming object detection in indoor scenes. • Our approach effectively uses 3D and image cues. • Depth discontinuities are evidence for occlusions. • The MST over 3D keeps intra-class components coherently connected.

  39. Discussion • Features: Silberman et al. 2012: bunch of engineered features (>1000D); Couprie et al. 2013: learned features (>1000D); Ours: select meaningful features (12D) • Local classifier: Silberman et al. 2012: logistic regression; Couprie et al. 2013: neural networks; Ours: k-NN • Graph structure: Silberman et al. 2012: dense connections on the image; Couprie et al. 2013: none; Ours: MST over 3D

  40. Thanks!! • Cesar Cadena, ccadenal@gmu.edu • Jana Kosecka, kosecka@.cs.gmu.edu • Funded by the US Army Research Office Grant W911NF-1110476.

  41. Working on: • People detection by Shenghui Zhou

  42. Multi-view and video:

