Learning Hierarchical Models of Scenes, Objects and Parts E. Sudderth et al.
Introduction • Generative models for detecting and recognizing objects in natural, cluttered scenes • Exploit similarities between object categories to improve performance • Use contextual knowledge about the objects found in a scene and the common spatial relationships between them • Hierarchy (figure): scene, object, part, features
The Generative Model • Figure labels: multinomial distribution, normal distribution
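A minimal sketch of how features might be sampled from such a model, assuming each feature first picks a part from the object's multinomial part weights, then draws a visual word from that part's multinomial and a 2-D position from that part's Gaussian. The sizes, hyperparameter values, and names (`sample_image`, `part_weights`, `word_dists`) are hypothetical, and the covariances are fixed to the identity rather than drawn from a prior.

```python
import numpy as np

def sample_image(rng, n_features, part_weights, word_dists, part_means, part_covs):
    """Draw (part, word, position) triples for one image of one object."""
    n_parts, n_words = word_dists.shape
    # Each feature picks a part according to the object's part proportions.
    parts = rng.choice(n_parts, size=n_features, p=part_weights)
    # The chosen part emits a visual word (multinomial) ...
    words = np.array([rng.choice(n_words, p=word_dists[k]) for k in parts])
    # ... and a 2-D position (Gaussian).
    positions = np.array(
        [rng.multivariate_normal(part_means[k], part_covs[k]) for k in parts]
    )
    return parts, words, positions

rng = np.random.default_rng(0)
P, W = 4, 10                      # hypothetical numbers of parts and visual words
alpha, beta = 1.0 / P, 0.5        # hypothetical Dirichlet hyperparameters
part_weights = rng.dirichlet(np.full(P, alpha))       # one object's part proportions
word_dists = rng.dirichlet(np.full(W, beta), size=P)  # one word distribution per part
part_means = rng.normal(0.0, 5.0, size=(P, 2))
part_covs = np.stack([np.eye(2)] * P)                 # assumption: fixed covariances

parts, words, positions = sample_image(
    rng, 50, part_weights, word_dists, part_means, part_covs
)
```

Because `word_dists` and the Gaussians belong to the parts rather than to the object, a second object would reuse them and differ only in its own `part_weights`.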
SIFT descriptors and K-means clustering are used to create visual words • Information is shared in two ways: • Parts combine the same features in different spatial configurations • Objects reuse the same parts in different proportions • The multinomial distributions have Dirichlet priors, while the Gaussian distributions have inverse-Wishart priors
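The visual-word step can be sketched as follows: cluster descriptors with K-means, then replace each descriptor by the index of its nearest cluster center. This is an illustrative NumPy-only version. Random 128-D vectors stand in for real SIFT descriptors, and the vocabulary size of 20 is a hypothetical value chosen to keep the example small.

```python
import numpy as np

def kmeans(data, k, n_iters=20, seed=0):
    """Plain Lloyd's algorithm; returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # Squared distance from every point to every center: shape (n, k).
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:          # leave empty clusters where they are
                centers[j] = members.mean(axis=0)
    return centers

def quantize(descriptors, centers):
    """Map each descriptor to the index of its nearest center (its visual word)."""
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

rng = np.random.default_rng(1)
descriptors = rng.random((500, 128))      # stand-in for 128-D SIFT descriptors
vocab = kmeans(descriptors, k=20)
words = quantize(descriptors, vocab)
```

After quantization each image feature is just an integer word index plus its image position, which is the input the multinomial and Gaussian distributions operate on.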
Related Models • Similar to the author-topic model • Differs from previous models by incorporating x (the geometry, or position, of parts), solved using EM • Can be trained from few examples, and sharing parts leads to simple learning algorithms that scale linearly with the number of parts
Learning Objects with Shared Parts 1) 2) Not considering the reference position 3) The total cost of a Gibbs sampling update for M images containing N features and P parts is O(NMP)
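To make the O(NMP) cost concrete, here is a simplified sketch of one Gibbs sweep over part assignments, without a reference position. It is not the slides' exact update: it assumes collapsed Dirichlet-multinomial counts for the part weights and word distributions, and, as a loud simplification, keeps each part's Gaussian mean and covariance fixed instead of integrating over an inverse-Wishart prior. All names and sizes are hypothetical.

```python
import numpy as np

def gibbs_sweep(rng, images, obj_of_image, z, n_op, n_kw, n_k,
                means, covs, alpha, beta):
    """Resample every part assignment once; the work is O(N * M * P)."""
    n_words = n_kw.shape[1]
    inv = np.linalg.inv(covs)               # (P, 2, 2), fixed Gaussians (assumption)
    logdet = np.linalg.slogdet(covs)[1]     # (P,)
    for j, (words, positions) in enumerate(images):        # M images
        o = obj_of_image[j]
        for i in range(len(words)):                        # N features per image
            k_old, w = z[j][i], words[i]
            # Remove this feature from the counts.
            n_op[o, k_old] -= 1
            n_kw[k_old, w] -= 1
            n_k[k_old] -= 1
            # Score all P parts: object usage * word likelihood * position likelihood.
            diffs = positions[i] - means
            maha = np.einsum("pi,pij,pj->p", diffs, inv, diffs)
            log_p = (np.log(n_op[o] + alpha)
                     + np.log(n_kw[:, w] + beta) - np.log(n_k + n_words * beta)
                     - 0.5 * (maha + logdet))
            p = np.exp(log_p - log_p.max())
            k_new = rng.choice(len(p), p=p / p.sum())
            # Add the feature back under its new part.
            z[j][i] = k_new
            n_op[o, k_new] += 1
            n_kw[k_new, w] += 1
            n_k[k_new] += 1

rng = np.random.default_rng(0)
M, N, P, W, O = 3, 20, 4, 10, 2             # hypothetical sizes
images = [(rng.integers(W, size=N), rng.normal(size=(N, 2))) for _ in range(M)]
obj_of_image = [0, 1, 0]
z = [rng.integers(P, size=N) for _ in range(M)]
n_op = np.zeros((O, P), dtype=int)          # part counts per object
n_kw = np.zeros((P, W), dtype=int)          # word counts per part
n_k = np.zeros(P, dtype=int)                # total count per part
for j, (words, _) in enumerate(images):
    for i, w in enumerate(words):
        n_op[obj_of_image[j], z[j][i]] += 1
        n_kw[z[j][i], w] += 1
        n_k[z[j][i]] += 1
means = rng.normal(size=(P, 2))
covs = np.stack([np.eye(2)] * P)
gibbs_sweep(rng, images, obj_of_image, z, n_op, n_kw, n_k, means, covs, 0.25, 0.5)
```

The two explicit loops visit M × N features, and each visit scores P parts, which is where the linear scaling in the number of parts comes from.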
r is unobserved, so EM is used to compute the mode of these parameters 1) E-step: compute the expected rm 2) M-step: compute the maximum likelihood estimates of the parameters • EM updates are applied after every Gibbs sampling operation
Object Detection and Recognition • Compute the likelihood that the features in test image t were generated by object category o • This likelihood is approximated using S samples and the approximate modes of the posterior distribution • Computed without the reference position
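The exact approximation on this slide did not survive extraction, so the following is only one plausible reading: average the test image's likelihood over S parameter estimates per category and pick the category with the highest score. It assumes each feature's likelihood is a mixture over parts of (part weight × word probability × Gaussian position density), with no reference position. The function names and the toy parameters are hypothetical.

```python
import numpy as np

def log_likelihood(words, positions, part_weights, word_dists, means, covs):
    """log p(features | parameters), with each feature a mixture over parts."""
    inv = np.linalg.inv(covs)
    logdet = np.linalg.slogdet(covs)[1]
    diffs = positions[:, None, :] - means[None, :, :]            # (N, P, 2)
    maha = np.einsum("npi,pij,npj->np", diffs, inv, diffs)
    log_gauss = -0.5 * (maha + logdet + 2.0 * np.log(2.0 * np.pi))
    log_terms = (np.log(part_weights)[None, :]
                 + np.log(word_dists[:, words]).T
                 + log_gauss)                                    # (N, P)
    return np.logaddexp.reduce(log_terms, axis=1).sum()

def log_mean_likelihood(words, positions, samples):
    """log of the average likelihood over S parameter samples (stable in log space)."""
    lls = np.array([log_likelihood(words, positions, *s) for s in samples])
    return np.logaddexp.reduce(lls) - np.log(len(samples))

# Toy setup: two categories whose parts sit in different image regions.
P, W = 3, 5
pi = np.full(P, 1.0 / P)
eta = np.full((P, W), 1.0 / W)
covs = np.stack([np.eye(2)] * P)
base = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
samples = {
    "a": [(pi, eta, base + d, covs) for d in (0.0, 0.1)],           # S = 2
    "b": [(pi, eta, base + 10.0 + d, covs) for d in (0.0, 0.1)],
}
test_words = np.array([0, 2, 4, 1])
test_positions = np.array([[0.1, 0.0], [0.9, 0.2], [0.0, 1.1], [0.5, 0.5]])
scores = {o: log_mean_likelihood(test_words, test_positions, s)
          for o, s in samples.items()}
predicted = max(scores, key=scores.get)
```

The same comparison of likelihoods can separate object from background if one of the "categories" is a background model.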
Object Categorization Experiments • 16 categories: 7 animal faces, 5 animals, and 4 vehicle types • 30 training examples from each of the 16 categories, with 300 visual words and 32 shared parts • The images were already aligned, so there was no need to infer the reference position • The learning procedure was found to be not very sensitive to hyperparameter values
Detection and Recognition • 100 training images were used to learn the background parts; the likelihood is then used to classify a test image as background or object • Compared the shared model with unshared models • Also compared the model with a bag-of-words-only model • α affects the level of sharing • A large α increases sharing and hence improves detection, while a small α reduces sharing but slightly improves recognition • A good value is α = 1/P, where P is the number of parts
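One way to see why α controls sharing, assuming α is the symmetric Dirichlet parameter on each object's part weights: draw part weights for several values of α and measure how many parts each object effectively uses. The inverse Simpson index used as "effective number of parts" is a choice made for this illustration, not a measure taken from the slides.

```python
import numpy as np

def effective_parts(alpha, n_parts, n_draws=2000, seed=0):
    """Average effective number of parts, 1 / sum(pi^2), under Dirichlet(alpha)."""
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(np.full(n_parts, alpha), size=n_draws)
    return float(np.mean(1.0 / (pi ** 2).sum(axis=1)))

P = 32                                   # number of shared parts, as in the experiments
small = effective_parts(1.0 / P, P)      # alpha = 1/P
large = effective_parts(10.0, P)         # a hypothetical "large" alpha
print(f"alpha = 1/P: about {small:.1f} parts per object")
print(f"alpha = 10 : about {large:.1f} parts per object")
```

With a small α each object concentrates on a handful of parts, so fewer parts overlap between objects. With a large α every object spreads its weight over most of the P parts, so the parts are heavily shared.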
Single car Multiple cars problem • 72 images: 26 fully labeled, the remaining images labeled for cars • 40 images used for training, 6 shared parts Lighting problem
Conclusion • They described a hierarchical model for scenes, objects and parts • Showed importance of spatial information • Showed that sharing parts helps in learning from few examples and has performance benefits
References • Learning Hierarchical Models of Scenes, Objects, and Parts http://ssg.mit.edu/~esuddert/papers/iccv05.pdf • Slides of Fei-Fei