Object Recognition Jeremy Wyatt
Plan • David Marr: the model based approach to vision • Model based approaches: Geons, Model Fitting • Appearance based approaches: PCA, SIFT, implicit shape model • Psychological Evidence: View dependent vs. view independent recognition • Summary: who is right?
Model based vision • David Marr was a brilliant young British vision researcher who defined a coherent approach to the study of vision during the 1970s • According to one tradition coming out of Marr’s work: • Vision is the process of reconstructing the 3d scene from 2d information • The vision system has representations of 3d geometric structures • Visual pipeline: intensity image → primal sketch → 2.5d sketch → model selection • So selecting models and recovering their parameters from image data is a key task in vision
Model based vision • There is an infinite variety of objects. How do we represent, store and access models of them efficiently? • One suggestion was the use of a small library of 3d parts from which many complex models can be constructed • There are many schemes: generalised cylinders, Geons, Superquadrics • Vision researchers set about applying them
Models vs Appearances • But they didn’t work very well … • By the early 1990s people were experimenting with statistical techniques, e.g. PCA • These learn a statistical summary of the appearance of each view of an object
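The idea can be illustrated with a short sketch (a simplified illustration, not any of the original systems): learn a PCA subspace (eigenimages) from flattened grayscale images of one object view, then recognise by checking how well a test image is reconstructed from that subspace. The function names, number of components and error threshold below are illustrative assumptions, and scikit-learn's PCA is used for convenience.

    # Sketch of an appearance model learned with PCA (eigenimages).
    # `views` is assumed to be a list of same-sized grayscale images of one
    # object view; names and thresholds here are illustrative only.
    import numpy as np
    from sklearn.decomposition import PCA

    def train_appearance_model(views, n_components=20):
        """Learn a low-dimensional subspace summarising image appearance."""
        X = np.stack([v.ravel().astype(np.float64) for v in views])  # one row per image
        pca = PCA(n_components=n_components)  # needs at least n_components training images
        pca.fit(X)
        return pca

    def recognise(pca, image, threshold=1e4):
        """Project a test image into the subspace and measure reconstruction error."""
        x = image.ravel().astype(np.float64)[None, :]
        reconstruction = pca.inverse_transform(pca.transform(x))
        error = np.linalg.norm(x - reconstruction)
        return error < threshold  # small error -> appearance matches the learned view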
Appearance based recognition: SIFT • These statistical approaches characterise some aspects of the appearance of an object that can be used to recognise it • But this means they are (largely) view dependent: you have to learn a different statistical model for each different view • e.g. SIFT based recognition (David Lowe, UBC), sketched below • Find interest points in scale space • Re-describe the interest points so that they are: • Invariant to image translation, scaling and rotation • Partially invariant to illumination changes, affine and 3d projection changes
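A minimal sketch of SIFT-based matching, assuming OpenCV (cv2.SIFT_create is available in recent opencv-python releases); the file paths, ratio threshold and minimum match count are illustrative, and the ratio test follows Lowe's standard recipe.

    # Sketch of SIFT-based recognition with OpenCV; paths and thresholds are
    # illustrative assumptions, not values from the lecture.
    import cv2

    def match_object(model_path, scene_path, ratio=0.75, min_matches=10):
        # Load a training view of the object and a test scene as grayscale images
        model = cv2.imread(model_path, cv2.IMREAD_GRAYSCALE)
        scene = cv2.imread(scene_path, cv2.IMREAD_GRAYSCALE)

        # Find interest points in scale space and compute SIFT descriptors
        sift = cv2.SIFT_create()
        kp_m, des_m = sift.detectAndCompute(model, None)
        kp_s, des_s = sift.detectAndCompute(scene, None)

        # For each model descriptor, find its two nearest neighbours in the scene
        matcher = cv2.BFMatcher()
        candidates = matcher.knnMatch(des_m, des_s, k=2)

        # Lowe's ratio test: keep matches clearly better than the runner-up
        good = []
        for pair in candidates:
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
                good.append(pair[0])
        return len(good) >= min_matches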
Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts Aleš Leonardis and Sanja Fidler University of Ljubljana Faculty of Computer and Information Science Visual Cognitive Systems Laboratory Reproduced with permission
Framework • Main properties of the framework: • Computational plausibility • Hierarchical representation • Compositionality (parts composed of parts) • Indexing & matching recognition scheme • Statistics driven learning (unsupervised learning) • Fast, incremental (continuous) learning
Recognition: Indexing and matching • Indexing generates category hypotheses (e.g. motorcycle, dog, person, car) from the image; verification then tests them, with learning in the loop • Gradually limiting the search (a toy sketch follows this slide)
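A toy sketch of the indexing-and-matching idea (not the authors' implementation): detected parts vote for category hypotheses, and only the best-supported hypotheses are passed to a verification step, so the search is gradually limited. The part extractor and verifier are hypothetical placeholders.

    # Toy hypothesize-and-verify loop: features index candidate categories,
    # verification runs only on the strongest hypotheses.
    from collections import Counter

    def recognise(image, part_index, extract_parts, verify, top_k=3):
        votes = Counter()
        for part in extract_parts(image):              # parts detected in the image
            for category in part_index.get(part, []):  # categories this part indexes
                votes[category] += 1
        # Verify only the few best-supported hypotheses instead of every category,
        # gradually limiting the search.
        for category, _ in votes.most_common(top_k):
            if verify(image, category):
                return category
        return None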
Overview of the architecture • Starts with simple, local features and learns more and more complex compositions • Learns layer after layer to exploit the regularities in natural images as efficiently and compactly as possible • Builds computationally feasible layers of parts by selecting only the most statistically significant compositions of specific granularity • Learns lower layers in a category independent way (to obtain optimally sharable parts) and category specific higher layers which contain only a small number of highly generalizable parts for each category • New categories can efficiently and continuously be added to the representation without the need to restructure the complete hierarchy • Implements parts in a robust, layered interplay of indexing & matching
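As a rough illustration of the statistics-driven learning of compositions (a simplification, not the authors' algorithm): count spatially close co-occurrences of lower-layer parts across training images and keep only the most frequent pairs as candidate compositions for the next layer. The radius and count threshold are made-up parameters.

    # Simplified sketch of learning one layer of part compositions by keeping only
    # frequently co-occurring pairs of lower-layer parts; an illustration of the
    # statistics-driven idea, not the published learning procedure.
    from collections import Counter

    def learn_next_layer(detections_per_image, radius=5.0, min_count=50):
        """detections_per_image: list of lists of (part_id, x, y) tuples."""
        pair_counts = Counter()
        for detections in detections_per_image:
            for i, (p, xi, yi) in enumerate(detections):
                for q, xj, yj in detections[i + 1:]:
                    # count pairs of parts that occur close together
                    if (xi - xj) ** 2 + (yi - yj) ** 2 <= radius ** 2:
                        pair_counts[tuple(sorted((p, q)))] += 1
        # keep only the statistically significant (frequent) compositions
        return [pair for pair, count in pair_counts.items() if count >= min_count]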
Results • Learned hierarchy for faces and cars (first three layers are the same; links show compositionality for each of the categories; spatial variability of parts is not shown)
Results - Specific categories, faces • Detection of Layer 5 parts
Evidence from biology • Is human object recognition view dependent? • Shepard & Metzler • Tarr & Pinker • There is quite a large body of experimental data that supports the view dependent camp. • Appearance based approaches fit neatly with this camp.
Summary • This is not a resolved debate • There is evidence for both sides • Structural 3d information is almost certainly extracted by the brain too • Model based: how do we extract good enough low level features (e.g. a depth map)? • Appearance based: only seems to be good for recognition, which is a small part of the vision problem.