
Depth Estimation via Scene Classification

Vladimir Nedović (vnedovic@science.uva.nl), with Arnold Smeulders & Jan-Mark Geusebroek (UvA) and André Redert (Philips Research). 28-05-2008.


Presentation Transcript


  1. Vladimir Nedović Depth Estimation via Scene Classification vnedovic@science.uva.nl with: Arnold Smeulders & Jan-Mark Geusebroek (UvA) André Redert (Philips Research) 28-05-2008

  2. Order in Pollock's Chaos • Jackson Pollock, Blue Poles: Number 1, 1952 • seems chaotic, but there is structure, same as in natural image statistics (R.P. Taylor, A.P. Micolich and D. Jonas, Fractal Analysis of Pollock's Drip Paintings, Nature, vol. 399, p. 422, 1999) • Pre-perspective (Gothic art, before 1430): Simone Martini (1285-1344) • Post-perspective (Quattrocento, after 1430): Sandro Botticelli, Annunciation, 1489-90 • viewpoint constraints understood, influence on film art • Know any tilted buildings? • 'modal' scene configurations: structures orthogonal to each other (W. Richards, A. Jepson and J. Feldman, Priors, Preferences and Categorical Percepts, in Perception as Bayesian Inference, pp. 80-111, 1996)

  3. Outline • Introduction • Related work • Our approach • Preliminary classification • Conclusions

  4. Introduction The context: fully automatic 2D to 3D conversion of video data for 3DTV • We know about stereo, structure from motion, etc. but can we also derive depth from a single image? • humans can, right? • Can we exploit some constraints? • is the data really chaotic? • what about perceptual limitations of viewers? GOAL: in a fast manner, obtain an approximate, but visually pleasing 3D model from a single image

  5. Related work • Related work (1): Torralba & Oliva • showed that depth can be derived from structure, itself derived from natural image statistics (IEEE PAMI 2001) • Related work (2): Hoiem (Carnegie Mellon Univ.) • obtained 3D orientation of scene surfaces using machine learning (ICCV 2005) • improved object detection (CVPR 2006 best paper) + accounted for occlusions to derive relative ordering of elements (ICCV 2007) • BUT: • outdoor images only + assumes sky & ground are always present • i.e. accounts for less than half of all possibilities • Related work (3): Saxena (Stanford Univ.) • 3D mesh from ML on low-level features (no classes)

  6. Our approach • Our approach: depth estimation via geometric scene classification • i.e. holistic, not pixel-based • Separate a visual scene into its two constituent elements, stage and object: • consider objects separately from the stage on which they act • Determine the 3D stage model first • Stage ≈ first approximation of global depth • reduces subsequent (finer) depth processing tasks • can guide other processes, e.g. object localization & recognition (V. Nedović et al., ICCV 2007)

  7. Our approach: stage models • For the stage, a rough depth model is sufficient • Exploit geometric structure of images, which reduces the number of possible configurations • regularities arise from: • natural image statistics -> texture gradients • viewpoint constraints -> perspective • modal configurations & film rules -> orthogonality • Only a few configurations are prominent => the first step in depth estimation can be stage classification

  8. Our approach: stage hierarchy • Structure of the visual world leads to only 15 geometric scene types • Influence of structure identical indoors & outdoors => such distinction unnecessary • Three-level hierarchy • perform classification in steps: first determine the geometric neighbourhood, then proceed further
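
The stepwise classification on this slide (first the geometric neighbourhood, then the stage within it) could be sketched roughly as below. This is only an illustration: the nearest-centroid classifier is a stand-in chosen for brevity, and the group names, stage names, and centroid values are hypothetical, not taken from the presentation.

```python
import math

def nearest(centroids, x):
    """Return the label whose centroid lies closest to feature vector x."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

def classify_two_step(x, group_centroids, stage_centroids_by_group):
    """Coarse-to-fine: pick the stage group first, then a stage inside that group."""
    group = nearest(group_centroids, x)
    stage = nearest(stage_centroids_by_group[group], x)
    return group, stage

# Hypothetical toy data: two groups, each with two stages (2-D features).
group_centroids = {"group_A": [0.0, 0.0], "group_B": [10.0, 10.0]}
stage_centroids_by_group = {
    "group_A": {"stage_A1": [-1.0, 0.0], "stage_A2": [1.0, 0.0]},
    "group_B": {"stage_B1": [9.0, 10.0], "stage_B2": [11.0, 10.0]},
}

result = classify_two_step([0.8, 0.1], group_centroids, stage_centroids_by_group)
print(result)  # ('group_A', 'stage_A2')
```

The second step only compares against stages of the chosen group, which is what keeps each decision small.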

  9. Our approach: three-level hierarchy • i.e. 2-3 sub-stages per stage, accounting for variability in parameters • geometry at the bottom level is so constrained that pre-defined crude depth maps are already possible • i.e. no parameter estimation needed!
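
One way to picture "pre-defined crude depth maps" is a lookup from a classified stage to a fixed template, with nothing estimated from the image. The sketch below assumes two hypothetical templates (`ground_with_mid_horizon` and `flat_backdrop`) and a depth convention of 1.0 = far, 0.0 = near; neither the template names nor their shapes come from the presentation.

```python
def ramp_depth_map(height, width, horizon_row):
    """Depth is 1.0 (far) at and above the horizon row, then ramps
    linearly down to 0.0 (near) at the bottom row."""
    depth = []
    for r in range(height):
        if r <= horizon_row:
            d = 1.0
        else:
            d = (height - 1 - r) / (height - 1 - horizon_row)
        depth.append([d] * width)
    return depth

# Hypothetical stage names mapped to fixed templates; the horizon position is
# part of the template, so nothing is fitted to the image itself.
STAGE_DEPTH_TEMPLATES = {
    "ground_with_mid_horizon": lambda h, w: ramp_depth_map(h, w, h // 2),
    "flat_backdrop": lambda h, w: [[1.0] * w for _ in range(h)],
}

def depth_for_stage(stage, height, width):
    """Look up the pre-defined depth map for an already classified stage."""
    return STAGE_DEPTH_TEMPLATES[stage](height, width)

depth = depth_for_stage("ground_with_mid_horizon", 8, 4)
```

With this layout the classifier output alone selects the depth map, which matches the slide's point that no per-image parameter estimation is required.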

  10. Preliminary classification (1) • Proof of concept with a single feature type • natural image statistics-based Weibull features (i.e. texture gradients) • Features extracted based on a 4x4 region grid over the image • two features per region => 64 features in total • TRECVID dataset of TV news used for evaluation (A.F. Smeaton et al., "Evaluation campaigns and TRECVid", 8th ACM Int'l Workshop on Multimedia Info. Retrieval, 2006)
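
A rough sketch of grid-based Weibull features follows. The slides do not say how the count of 64 arises from a 4x4 grid; this example reaches it by assuming that the two Weibull parameters (shape and scale) are fitted separately to horizontal and vertical gradient magnitudes in each of the 16 regions. That split, the plain finite-difference gradients, and the least-squares fit on the Weibull probability plot are all assumptions made for illustration, not the authors' stated method.

```python
import math
import random

def weibull_fit(samples):
    """Fit Weibull (shape, scale) by least squares on the Weibull probability plot."""
    xs = sorted(s for s in samples if s > 0)
    n = len(xs)
    if n < 2 or xs[0] == xs[-1]:
        return 0.0, 0.0  # degenerate region: too few distinct positive samples
    # Median-rank CDF estimate F_i; then ln(-ln(1 - F_i)) = k*ln(x) - k*ln(scale).
    lx = [math.log(x) for x in xs]
    ly = [math.log(-math.log(1.0 - (i + 0.7) / (n + 0.4))) for i in range(n)]
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    shape = sxy / sxx
    scale = math.exp(mx - my / shape)
    return shape, scale

def grid_features(img, grid=4):
    """Concatenate Weibull (shape, scale) of |dx| and |dy| for each grid region."""
    h, w = len(img), len(img[0])
    feats = []
    for gy in range(grid):
        r0, r1 = gy * h // grid, (gy + 1) * h // grid
        for gx in range(grid):
            c0, c1 = gx * w // grid, (gx + 1) * w // grid
            dx = [abs(img[r][c + 1] - img[r][c])
                  for r in range(r0, r1) for c in range(c0, c1 - 1)]
            dy = [abs(img[r + 1][c] - img[r][c])
                  for r in range(r0, r1 - 1) for c in range(c0, c1)]
            for grads in (dx, dy):
                feats.extend(weibull_fit(grads))
    return feats

# Synthetic 32x32 grey-level image, used only to exercise the code.
rng = random.Random(0)
image = [[rng.random() for _ in range(32)] for _ in range(32)]
features = grid_features(image)
print(len(features))  # 64 = 16 regions x 2 directions x 2 parameters
```

The resulting vector has a fixed length regardless of image size, which is what lets it feed directly into a classifier.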

  11. Preliminary classification (2) • Support Vector Machines (SVM) classifier based on a 1 vs. 1 multi-class approach • Results reported for stage groups and for individual stages (results of symmetrical variants combined) • two-step classification, average within group (assuming super-stage is known)
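
The 1 vs. 1 multi-class scheme named on this slide trains one binary SVM per pair of classes and lets the pairwise decisions vote. The sketch below is a minimal illustration under assumptions: it uses a linear SVM trained by stochastic subgradient descent on the regularized hinge loss (the slides do not state the kernel or solver), and the class names and 2-D toy data are hypothetical.

```python
import random
from itertools import combinations

def train_binary_svm(X, y, lam=0.01, lr=0.02, epochs=200, seed=0):
    """Linear SVM via stochastic subgradient descent on the regularized
    hinge loss; labels y are +1 or -1."""
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]  # constant feature acts as a bias term
    w = [0.0] * len(Xa[0])
    order = list(range(len(Xa)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            w = [wj * (1.0 - lr * lam) for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w

def train_one_vs_one(X, labels):
    """Train one binary SVM for every unordered pair of classes."""
    models = {}
    for a, b in combinations(sorted(set(labels)), 2):
        pair = [(x, 1 if l == a else -1) for x, l in zip(X, labels) if l in (a, b)]
        models[(a, b)] = train_binary_svm([p[0] for p in pair], [p[1] for p in pair])
    return models

def predict_one_vs_one(models, x):
    """Each pairwise SVM casts one vote; the class with the most votes wins."""
    xa = list(x) + [1.0]
    votes = {}
    for (a, b), w in models.items():
        winner = a if sum(wj * xj for wj, xj in zip(w, xa)) >= 0 else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

# Hypothetical, well-separated toy classes standing in for stage labels.
rng = random.Random(0)
centers = {"stage_a": (0.0, 0.0), "stage_b": (5.0, 5.0), "stage_c": (0.0, 5.0)}
X, labels = [], []
for name, (cx, cy) in centers.items():
    for _ in range(30):
        X.append([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)])
        labels.append(name)
models = train_one_vs_one(X, labels)
```

With K classes this trains K(K-1)/2 small binary problems; calling `predict_one_vs_one(models, [5.0, 5.0])` on the toy data should return the class centred there.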

  12. Conclusions (1) • We need a fast & approximate solution: • do only what is necessary, viewers may not perceive it anyway • generalize where possible, to reduce the problem at every step • Separate a scene into a stage and the objects • Determine the stage 3D model first • rough model is sufficient • plus, structure greatly reduces the number of possible configurations • and, stage will help us to locate and process objects

  13. Conclusions (2) • Due to structure, we can create simple models that fit TV data • 15 stages are sufficient • no need to distinguish between indoor & outdoor • Therefore, we can use scene classification as the first step in depth estimation

  14. Conclusions (3) • Our approach: three-step classification • geometry at the bottom is constrained enough that we can already assign pre-defined depth maps • no parameter estimation necessary • Proof of concept demonstrated with a single feature type • performance much better than chance • but enhancements needed (more features etc.)

  15. Questions?
