Object Recognition Jeremy Wyatt
Plan • David Marr: the model based approach to vision • Model based approaches: Geons, Model Fitting • Appearance based approaches: PCA, SIFT, implicit shape model • Psychological Evidence: View dependent vs. view independent recognition • Summary: who is right?
Model based vision • David Marr was a brilliant young British vision researcher who defined a coherent approach to the study of vision during the 1970s • According to one tradition coming out of Marr’s work: • Vision is the process of reconstructing the 3d scene from 2d information • The vision system has representations of 3d geometric structures • Visual pipeline: intensity image → primal sketch → 2.5d sketch → model selection • So selecting models and recovering their parameters from image data is a key task in vision
Model based vision • There is an infinite variety of objects. How do we represent, store and access models of them efficiently? • One suggestion was the use of a small library of 3d parts from which many complex models can be constructed • There are many schemes: generalised cylinders, Geons, Superquadrics • Vision researchers set about applying them
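To make “recovering their parameters” concrete, the sketch below evaluates the standard superquadric inside-outside function: fitting a part to range data means adjusting its size and shape parameters so that this function is close to 1 at the measured surface points. The function and parameter names follow the usual superquadric formulation; they are not the specific scheme used by any one of the systems above.

```python
import numpy as np

def superquadric_F(x, y, z, a1, a2, a3, e1, e2):
    """Inside-outside function of a superquadric in its own coordinate frame.

    F < 1 inside the part, F == 1 on its surface, F > 1 outside.
    a1, a2, a3 are the sizes along each axis; e1, e2 control the shape
    (e.g. e1 = e2 = 1 gives an ellipsoid, small values give box-like parts).
    Fitting a part to range data means finding parameters that make F close
    to 1 at the measured surface points.
    """
    xy = (np.abs(x / a1) ** (2.0 / e2) + np.abs(y / a2) ** (2.0 / e2)) ** (e2 / e1)
    return xy + np.abs(z / a3) ** (2.0 / e1)
```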
Models vs Appearances • But they didn’t work very well … • By the early 1990s people were experimenting with statistical techniques, e.g. PCA • These learn a statistical summary of the appearance of each view of an object (an appearance model)
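A minimal sketch of the PCA idea in NumPy: flatten training images of one view into vectors, learn the mean and the top principal components, and score a new image by how well the model reconstructs it. The function names and the reconstruction-error test are illustrative choices, not the specific formulation used in the slides.

```python
import numpy as np

def learn_appearance_model(images, n_components=10):
    """Learn a PCA appearance model from images of one view of an object.

    images: array of shape (n_images, height, width), assumed grayscale.
    Returns the mean image vector and the top principal components
    (eigen-images) of the centred training data.
    """
    X = images.reshape(len(images), -1).astype(float)   # flatten to vectors
    mean = X.mean(axis=0)
    Xc = X - mean                                        # centre the data
    # SVD of the centred data gives the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:n_components]

def reconstruction_error(image, mean, components):
    """Distance between an image and its projection onto the appearance model.

    A small error suggests the image matches this learned object view."""
    x = image.reshape(-1).astype(float) - mean
    coeffs = components @ x                              # project onto the model
    recon = components.T @ coeffs                        # back-project
    return np.linalg.norm(x - recon)
```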
Appearance based recognition: SIFT • These statistical approaches characterise some aspects of the appearance of an object that can be used to recognise it • But this means they are (largely) view dependent: you have to learn a different statistical model for each different view • e.g. SIFT based recognition (David Lowe, UBC) • Find interest points in scale space • Describe the interest points so that the descriptors are: • Invariant to image translation, scaling and rotation • Partially invariant to illumination changes, affine and 3d projection changes
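A minimal matching sketch using OpenCV’s SIFT implementation (cv2.SIFT_create, available in OpenCV 4.4+). The ratio threshold and the idea of using the match count as a crude recognition score are illustrative; Lowe’s full system additionally clusters matches with a Hough transform and verifies the object pose.

```python
import cv2

def match_object(model_img, scene_img, ratio=0.75):
    """Match SIFT keypoints from one model view against a scene image.

    Both images are assumed to be 8-bit grayscale. Uses Lowe's ratio test to
    keep only distinctive matches; the number of surviving matches can serve
    as a rough recognition score for this view.
    """
    sift = cv2.SIFT_create()                  # OpenCV >= 4.4
    kp1, des1 = sift.detectAndCompute(model_img, None)
    kp2, des2 = sift.detectAndCompute(scene_img, None)
    if des1 is None or des2 is None:          # no keypoints found
        return [], kp1, kp2

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des1, des2, k=2)   # two nearest neighbours per descriptor

    good = [m for m, n in knn if m.distance < ratio * n.distance]
    return good, kp1, kp2
```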
Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts Aleš Leonardis and Sanja Fidler University of Ljubljana Faculty of Computer and Information Science Visual Cognitive Systems Laboratory Reproduced with permission
Framework • Main properties of the framework: • Computational plausibility • Hierarchical representation • Compositionality (parts composed of parts) • Indexing & matching recognition scheme • Statistics driven learning (unsupervised learning) • Fast, incremental (continuous) learning
Recognition: Indexing and matching • [Diagram: an image is indexed against the learned representation to generate category hypotheses (motorcycle, dog, person, car), which are then verified] • Gradually limiting the search
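An illustrative hypothesise-and-verify sketch of the indexing & matching idea: detected parts index into an inverted table of candidate categories, and only those hypotheses are passed to a more expensive verification step. The part names, categories and the verify() placeholder are hypothetical, not the actual representation used in this framework.

```python
from collections import defaultdict

# Inverted index: which categories does each learned part vote for?
part_index = defaultdict(set)
part_index["wheel-like"].update({"motorcycle", "car"})
part_index["leg-like"].update({"dog", "person"})

def recognise(detected_parts, verify):
    """Index into candidate categories, then verify only those hypotheses."""
    hypotheses = set()
    for part in detected_parts:
        hypotheses |= part_index[part]      # indexing: cheap, generates candidates
    # verification: expensive, run only on the (gradually limited) candidate set
    return [c for c in hypotheses if verify(c, detected_parts)]
```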
Overview of the architecture • Starts with simple, local features and learns more and more complex compositions • Learns layer after layer to exploit the regularities in natural images as efficiently and compactly as possible • Builds computationally feasible layers of parts by selecting only the most statistically significant compositions of specific granularity • Learns lower layers in a category independent way (to obtain optimally sharable parts) and category specific higher layers which contain only a small number of highly generalizable parts for each category • New categories can efficiently and continuously be added to the representation without the need to restructure the complete hierarchy • Implements parts in a robust, layered interplay of indexing & matching
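A toy sketch of the statistics-driven idea behind building one layer from the one below: count which current-layer parts co-occur most often and promote the most frequent combinations to next-layer compositions. The real framework also models the relative spatial arrangement of parts and uses a proper significance criterion; both are omitted here.

```python
from collections import Counter
from itertools import combinations

def learn_next_layer(part_detections, keep=50):
    """Toy learning step: promote frequently co-occurring part pairs.

    part_detections: list (one entry per image) of sets of part labels
    detected in that image at the current layer.
    Returns the `keep` most frequent pairs as candidate next-layer parts.
    """
    counts = Counter()
    for parts in part_detections:
        for pair in combinations(sorted(parts), 2):
            counts[pair] += 1
    return [pair for pair, _ in counts.most_common(keep)]
```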
Results • Learned hierarchy for faces and cars (first three layers are the same; links show compositionality for each of the categories; spatial variability of parts is not shown)
Results - Specific categories, faces • Detection of Layer5 parts
Evidence from biology • Is human object recognition view dependent? • Shepherd & Miller • Pinker & Tarr • There is quite a large body of experimental data that supports the view dependent camp. • Appearance based approaches fit neatly with this camp.
Summary • This is not a resolved debate • There is evidence for both sides • Structural 3d information is almost certainly extracted by the brain too • Model based: how do we extract good enough low level features (e.g. a depth map)? • Appearance based: only seems to be good for recognition, which is a small part of the vision problem.