1 / 1

Discovering Objects and their Location in Images

0. 1. 0. 2. Discovering Objects and their Location in Images. Josef Sivic 1 , Bryan C. Russell 2 , Alexei A. Efros 3 , Andrew Zisserman 1 and William T. Freeman 2. CMU. CMU. MIT. MIT. 1 Oxford University 2 MIT 3 Carnegie Mellon University. Introduction.

Télécharger la présentation

Discovering Objects and their Location in Images

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 0 1 0 2 ... Discovering Objects and their Location in Images Josef Sivic1, Bryan C. Russell2, Alexei A. Efros3, Andrew Zisserman1 and William T. Freeman2 CMU CMU MIT MIT 1Oxford University 2MIT 3Carnegie Mellon University Introduction The topic discovery models Improving localization using doublets Form a new vocabulary from pairs oflocally co-occurring regions Goal:Discover visual object categories and their segmentation given a collection ofunlabelled images Probabilistic Latent Semantic Analysis (pLSA) [Hofmann’99] w … visual words d … documents (images) z … topics (‘objects’) Doublet formation Doublet example I Doublet examle II pLSA graphical model P(z|d) and P(w|z) are multinomial distributions pLSA Model fitting: Find topic vectors P(w|z) common to all documents and mixture coefficients P(z|d)specific to each document.Fit model by maximizing likelihood of data using EM. All detected visual words Singlet segmentation Singlet segmentation Doublet segmentation Bounding box overlap score for faces, singlets: 0.49, doublets: 0.61 Latent Dirichlet Allocation (LDA) [Blei et al.’03] (see paper for more details) Approach:1) Represent an image as a collection of visual words 2) Apply topic discovery models from statistical text analysis Treat multinomial weights over topics as random variables. Fit model using Gibbs sampling [Griffiths and Steyvers’04]. Experiment II: MIT dataset • 2873 images, learn 10 topics Image representation 4 of the 10 learned topics shown by the 5 most probable images for each topic Represent an image as a histogram of “visual words” LDA graphical model Results “Buildings” “Trees / Grass” Results shown only for pLSA. LDA had very similar performance. Experiment I: Caltech Dataset Histogram of visual words Four object categories: faces, motorbikes, airplanes and cars rear (total of 3,190images) and 900 background images “Computers” “Bookshelves” • Detect affine covariant regions • Represent each region by a SIFT descriptor • Build visual vocabulary by k-means clustering (K~1,000) • Assign each region to the nearest cluster centre Image Classification Example Images with multiple objects Mikolajczyk and Schmid’02, Schaffalitzky and Zisserman’02, Matas et al. ’02, Lowe’99, Sivic and Zisserman’03 Assign each image to a topic with the highest P(z|d) Learn K = (5,6,7) topics Background is better modelled by multiple topics Pre-learning background topics on a separate bg dataset improves results Performance on novel images is comparable with weakly supervised method of [Fergus et al.’03] Examples of visual words Confusion tables (K=5,6,7) learned topics Experiment III: Application to image retrieval Segmentation Learn topic vectors on Caltech database Represent new query image in terms of learned topic vectors Five samples from a ‘motorbike’ visual word Visual Polysemy. Single visual word occurring on different (but locally similar) parts on different object categories. For a given word wi in document dj examine posterior probability over topics. pLSA Retrieve images within Caltech database Visual words colour coded according to the topic with the highest probability Raw word histograms Faces Five samples from an ‘airplane’visual word Visual Synonyms. Two different visual words representing a similar part of an object (wheel of a motorbike). Query image Motorbikes Airplanes Precision – Recall plot Cars Retrieve images in movie Pretty Woman Overview Background I Retrieved images using pLSA ‘object’ coefficients P(z|d) Background II Background III Pretty Woman (6,641 keyframes) Find visual words Form histograms Example face segmentation Retrieved images using visual word histograms Represent each keyframe using topic vectors learned on Caltech database Discover topics Example airplane segmentation Example motorbike segmentation

More Related