
Simultaneous Image Classification and Annotation

(Final Version). Simultaneous Image Classification and Annotation. Chong Wang, David Blei, Li Fei-Fei, Computer Science Department, Princeton University. Published in CVPR 2009. Presented by Eric Wang, 7-3-09.


Presentation Transcript


  1. (Final Version) Simultaneous Image Classification and Annotation. Chong Wang, David Blei, Li Fei-Fei, Computer Science Department, Princeton University. Published in CVPR 2009. Presented by Eric Wang, 7-3-09.

  2. Outline • Introduction • Review of sLDA and Corr-LDA • Model description • Model Inference and Parameter Estimation • Empirical Results • Conclusion

  3. Introduction • Image classification refers to assigning each image a class label that describes the image globally. • Image annotation refers to assigning words that describe individual regions of the image. • The images considered in this paper are both classified with a class label and annotated with free text. • This paper combines the basic framework of Corr-LDA with a highly modified version of supervised LDA (sLDA) to yield a model that simultaneously classifies images and annotates their individual regions.

  4. Review of sLDA • The generative process is defined for each document. • In this model, the response variable y is a continuous random variable. • The model parameters α, β_{1:K}, η, and σ² are treated as unknown constants to be estimated, rather than as random variables. Source: D. M. Blei and J. D. McAuliffe. Supervised topic models. In NIPS, 2007.
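The per-document generative process that this slide refers to can be written out as below. This is a sketch in the notation of Blei and McAuliffe (2007), where \bar{z} denotes the empirical topic frequencies of the document; it is not copied from the slide itself.

```latex
\begin{align*}
\theta &\sim \mathrm{Dir}(\alpha) \\
z_n \mid \theta &\sim \mathrm{Mult}(\theta), \qquad n = 1, \dots, N \\
w_n \mid z_n, \beta_{1:K} &\sim \mathrm{Mult}(\beta_{z_n}) \\
y \mid z_{1:N}, \eta, \sigma^2 &\sim \mathcal{N}\!\left(\eta^\top \bar{z},\ \sigma^2\right),
\qquad \bar{z} = \frac{1}{N} \sum_{n=1}^{N} z_n
\end{align*}
```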

  5. Review of sLDA • One application of sLDA considered by Blei and McAuliffe was predicting the number of stars given to a movie from the text of its review. Source: D. M. Blei and J. D. McAuliffe. Supervised topic models. In NIPS, 2007.

  6. Review of Corr-LDA • The generative process is defined for each document. • Corr-LDA is a simple extension of LDA that annotates the regions of an image. Source: D. M. Blei and M. I. Jordan. Modeling annotated data. In SIGIR, 2003.
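For reference, the Corr-LDA generative process for one image with N regions and M caption words can be sketched as follows. The region likelihood is left generic here (the symbol \Omega for its parameters is an assumption of this sketch); the key point is that each caption word picks a region uniformly at random and is then drawn from that region's topic.

```latex
\begin{align*}
\theta &\sim \mathrm{Dir}(\alpha) \\
z_n \mid \theta &\sim \mathrm{Mult}(\theta), \qquad n = 1, \dots, N \\
r_n \mid z_n &\sim p(r \mid z_n, \Omega) \\
y_m &\sim \mathrm{Unif}(1, \dots, N), \qquad m = 1, \dots, M \\
w_m \mid y_m, z_{1:N}, \beta_{1:K} &\sim \mathrm{Mult}(\beta_{z_{y_m}})
\end{align*}
```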

  7. Annotated sLDA Model • This step is identical to LDA: the topic proportions are drawn once per document. • In this paper, the Dirichlet parameter α is not optimized.

  8. Annotated sLDA Model • A region is characterized by one of 240 codewords (quantized from 128-dimensional SIFT features). • Regions are found by segmenting images using the N-cuts algorithm. • Each topic is parameterized as a multinomial distribution over the quantized codewords.
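As an illustration of the quantization step, the following minimal sketch assigns each 128-dimensional descriptor to the nearest entry of a 240-word codebook. The codebook and descriptors are random placeholders, the function name `quantize_descriptors` is hypothetical, and the slides do not say how the codebook was built or how several descriptors in one region are reduced to a single codeword.

```python
import numpy as np

def quantize_descriptors(descriptors, codebook):
    """Return, for each descriptor, the index of its nearest codeword (Euclidean)."""
    # Squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    d2 = (
        (descriptors ** 2).sum(axis=1, keepdims=True)
        - 2.0 * descriptors @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(240, 128))     # placeholder codebook (assumption)
descriptors = rng.normal(size=(50, 128))   # placeholder SIFT-like descriptors
codeword_ids = quantize_descriptors(descriptors, codebook)
```

Each entry of `codeword_ids` is an integer in the range 0 to 239 and plays the role of the region observation in the model.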

  9. Annotated sLDA Model • The class label c depends only on the topic indicators z_{1:N}, through a softmax over their empirical frequencies, using a modified sLDA framework. • The total number of classes is known a priori, and the class indicators are treated separately from the annotations. This is a simpler approach than the one taken by L.J. Li, R. Socher and L. Fei-Fei in that there is no “switch” variable which determines whether a word is an annotation or a label. • The softmax function is well studied and is also known as “multinomial logistic regression.”
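The softmax class distribution can be sketched numerically as below. The code assumes one coefficient vector per class (stacked into an array called `eta` here) applied to the empirical topic frequencies of the regions; the function name and the toy values are illustrative only.

```python
import numpy as np

def class_probabilities(z, eta):
    """Softmax over classes of eta[c] . z_bar, where z_bar is the topic histogram."""
    n_topics = eta.shape[1]
    z_bar = np.bincount(z, minlength=n_topics) / len(z)  # empirical topic frequencies
    scores = eta @ z_bar
    scores = scores - scores.max()                       # stabilize the exponentials
    p = np.exp(scores)
    return p / p.sum()

z = np.array([0, 2, 2, 1, 2])            # topic indicator of each of 5 regions
eta = np.array([[1.0, 0.0, 0.0],         # class 0 favors topic 0
                [0.0, 0.0, 3.0]])        # class 1 favors topic 2
probs = class_probabilities(z, eta)
```

Because most regions in this toy image are assigned to topic 2, class 1 receives the larger probability.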

  10. Annotated sLDA Model • The annotations are assigned to specific regions in the same manner as in Corr-LDA. • This will, for example, encourage words such as “blue” and “white” to be associated with regions (and thus, codewords) which capture sky. • Though not explicitly shown in the graphical model, the codeword topics and the annotation topics have symmetric Dirichlet priors.
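Putting slides 7 to 10 together, the generative process for a single image can be sketched as a sampler. The names `pi` (codeword topics), `beta` (annotation topics) and `eta` (class coefficients), along with all dimensions and the function name `sample_image`, are assumptions made for this illustration rather than values taken from the slides.

```python
import numpy as np

def sample_image(alpha, pi, beta, eta, n_regions, n_words, rng):
    """Draw codewords r, class label c, region links y and annotation words w."""
    n_topics = pi.shape[0]
    theta = rng.dirichlet(alpha)                         # topic proportions, once per image
    z = rng.choice(n_topics, size=n_regions, p=theta)    # one topic per region
    r = np.array([rng.choice(pi.shape[1], p=pi[k]) for k in z])   # codeword per region
    z_bar = np.bincount(z, minlength=n_topics) / n_regions
    scores = eta @ z_bar
    p_c = np.exp(scores - scores.max())
    p_c /= p_c.sum()
    c = rng.choice(eta.shape[0], p=p_c)                  # class label via softmax
    y = rng.integers(n_regions, size=n_words)            # region chosen uniformly per word
    w = np.array([rng.choice(beta.shape[1], p=beta[z[m]]) for m in y])  # word from that region's topic
    return r, c, y, w

rng = np.random.default_rng(0)
K, n_codewords, n_terms, n_classes = 4, 240, 30, 8
pi = rng.dirichlet(np.ones(n_codewords), size=K)
beta = rng.dirichlet(np.ones(n_terms), size=K)
eta = rng.normal(size=(n_classes, K))
r, c, y, w = sample_image(np.full(K, 0.5), pi, beta, eta, n_regions=12, n_words=5, rng=rng)
```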

  11. Inference of Latent Variables • These updates are identical to those used in Corr-LDA. • Let φ parameterize a multinomial over the K topics. • Let γ parameterize a Dirichlet over topic proportions. • Let λ parameterize a multinomial over image regions. • These updates are local to each document (thus the omission of d).
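A minimal sketch of two Corr-LDA-style coordinate updates is given below, assuming a variational Dirichlet parameter (called `gamma` here), per-region topic multinomials `phi` of shape (N, K), and per-word region multinomials `lam` of shape (M, N). The forms used are the standard mean-field updates for Corr-LDA with multinomial annotation topics; the function names are hypothetical and the slide's own equations are not reproduced.

```python
import numpy as np

def update_gamma(alpha, phi):
    """gamma_i = alpha_i + sum_n phi[n, i]."""
    return alpha + phi.sum(axis=0)

def update_lambda(phi, log_beta, words):
    """lam[m, n] proportional to exp(sum_i phi[n, i] * log beta[i, w_m])."""
    log_lam = (phi @ log_beta[:, words]).T                 # shape (M, N)
    log_lam = log_lam - log_lam.max(axis=1, keepdims=True) # stabilize before exp
    lam = np.exp(log_lam)
    return lam / lam.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, K, V = 6, 3, 10
phi = rng.dirichlet(np.ones(K), size=N)    # placeholder region-topic posteriors
beta = rng.dirichlet(np.ones(V), size=K)   # placeholder annotation topics
words = np.array([1, 4, 4, 7])             # annotation term ids of one image
gamma = update_gamma(np.full(K, 0.1), phi)
lam = update_lambda(phi, np.log(beta), words)
```

Each row of `lam` is a distribution over the N regions, so a word is softly linked to the regions whose topics explain it best.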

  12. Inference of Latent Variables • This equation updates the posterior distribution over topics. • Note that this update depends on both the class label c and the annotation information.

  13. Inference of Latent Variables: Parameter Estimation • Update for codebook word f in codebook topic i (given up to a normalizing constant). • The class coefficients η have no closed-form solution and are optimized via conjugate gradient.

  14. Inference of Latent Variables: Parameter Estimation • Update for annotation word w in annotation topic i (given up to a normalizing constant). • The class coefficients η have no closed-form solution and are optimized via conjugate gradient.
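The two topic updates named on the parameter-estimation slides have the usual expected-count form for Corr-LDA-style models. The sketch below writes them with assumed symbols (\pi for codebook topics, \beta for annotation topics, \phi and \lambda for the variational multinomials) and ignores any smoothing contributed by the Dirichlet priors, so it should be read as an approximation of the slide's equations rather than a copy of them.

```latex
\begin{align*}
\pi_{if} &\propto \sum_{d} \sum_{n=1}^{N_d} \mathbf{1}[r_{dn} = f]\, \phi_{dni} \\
\beta_{iw} &\propto \sum_{d} \sum_{m=1}^{M_d} \mathbf{1}[w_{dm} = w] \sum_{n=1}^{N_d} \phi_{dni}\, \lambda_{dmn}
\end{align*}
```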

  15. Empirical Results • LabelMe dataset • 8 classes: “highway,” “inside city,” “tall building,” “street,” “forest,” “coast,” “mountain,” and “open country.” • 200 256x256 training images per class. • UIUC-Sport dataset • 8 types of sports: “badminton,” “bocce,” “croquet,” “polo,” “rockclimbing,” “rowing,” “sailing” and “snowboarding.” • 1792 256x256 training images. • 240-codeword dictionary. • Annotations which appeared fewer than 3 times were removed.
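The annotation-pruning step can be sketched as follows. The slide does not say whether occurrences are counted per image or over the whole corpus; this sketch assumes total corpus occurrences, and the function name and toy data are illustrative.

```python
from collections import Counter

def drop_rare_annotations(annotations, min_count=3):
    """Remove annotation terms that occur fewer than min_count times in the corpus."""
    counts = Counter(word for words in annotations for word in words)
    return [[word for word in words if counts[word] >= min_count]
            for words in annotations]

corpus = [["sky", "tree", "car"], ["sky", "tree"], ["sky", "tree", "boat"]]
pruned = drop_rare_annotations(corpus)
```

In this toy corpus "car" and "boat" each occur once and are dropped, while "sky" and "tree" occur three times and are kept.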

  16. Empirical Results: Classification • The black line represents the performance of Bosch et al. 2006, which employs a non-annotated LDA on the image regions and a KNN to classify the images. • The blue line is the performance of Fei-Fei and Perona 2005, which uses unannotated labeled images. • The models presented in this paper are much more resistant to overfitting than the models of Bosch et al. and Fei-Fei and Perona.

  17. Empirical Results: Classification • Confusion matrices comparing the performance of multi-class sLDA with annotations and multi-class sLDA, using 100-topic models. • Annotations seem to improve performance slightly, although, as the previous slide shows, the main benefit is more consistent performance as a function of the number of topics.

  18. Empirical Results: Annotation • The F-measure is used as a score. • Results are given over all numbers of topics considered above. • LabelMe: • 38.2% (Corr-LDA) • 38.7% (multi-class sLDA with annotations) • UIUC-Sport: • 34.7% (Corr-LDA) • 35.0% (multi-class sLDA with annotations).
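For readers unfamiliar with the score, the F-measure is the harmonic mean of precision and recall. The sketch below computes it for the predicted and ground-truth annotation sets of one image; how the paper aggregates the score across images is not stated on the slide, so that part is left out, and the function name is hypothetical.

```python
def f_measure(predicted, truth):
    """Harmonic mean of precision and recall between two sets of annotation terms."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)

score = f_measure(["sky", "tree", "car"], ["sky", "tree", "road", "building"])
```

Here precision is 2/3 and recall is 1/2, which gives an F-measure of 4/7.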

  19. Conclusion • Combining image annotation with classification provides state-of-the-art image classification performance. • However, adding the classification framework provides only a small improvement in annotation performance. • The authors’ primary contribution is showing that image classification and annotation are related and can be conducted simultaneously in the same framework. • Inference was done in a variational EM framework.
