
CS598: Visual Information Retrieval



Presentation Transcript


  1. CS598: Visual Information Retrieval Lecture IV: Image Representation: Feature Coding and Pooling

  2. Recap of Lecture III • Blob detection • Brief review of the Gaussian filter • Scale selection • Laplacian of Gaussian (LoG) detector • Difference of Gaussian (DoG) detector • Affine co-variant regions • Learning local image descriptors (optional reading)

  3. Outline • Histogram of local features • Bag of words model • Soft quantization and sparse coding • Supervector with Gaussian mixture model

  4. Lecture IV: Part I

  5. Bag-of-features models

  6. Origin 1: Texture recognition • Texture is characterized by the repetition of basic elements or textons • For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

  7. Origin 1: Texture recognition histogram Universal texton dictionary Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

  8. Origin 2: Bag-of-words models • Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983)

  9. US Presidential Speeches Tag Cloudhttp://chir.ag/phernalia/preztags/ Origin 2: Bag-of-words models • Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983)

  10. US Presidential Speeches Tag Cloudhttp://chir.ag/phernalia/preztags/ Origin 2: Bag-of-words models • Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983)

  11. US Presidential Speeches Tag Cloudhttp://chir.ag/phernalia/preztags/ Origin 2: Bag-of-words models • Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983)

  12. Bag-of-features steps • Extract features • Learn “visual vocabulary” • Quantize features using visual vocabulary • Represent images by frequencies of “visual words”

  13. 1. Feature extraction • Regular grid or interest regions

  14. 1. Feature extraction: detect patches → normalize patches → compute descriptors Slide credit: Josef Sivic

  15. 1. Feature extraction Slide credit: Josef Sivic

  16. 2. Learning the visual vocabulary Slide credit: Josef Sivic

  17. 2. Learning the visual vocabulary Clustering Slide credit: Josef Sivic

  18. 2. Learning the visual vocabulary Visual vocabulary Clustering Slide credit: Josef Sivic

  19. K-means clustering • Want to minimize sum of squared Euclidean distances between points xi and their nearest cluster centers mk • Algorithm: • Randomly initialize K cluster centers • Iterate until convergence: • Assign each data point to the nearest center • Recompute each cluster center as the mean of all points assigned to it
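A minimal NumPy sketch of the k-means loop described on this slide (not from the original lecture; the init-by-sampling scheme, iteration count, and convergence test are illustrative choices):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal k-means: X is an (N, D) array of descriptors, K the number of clusters."""
    rng = np.random.default_rng(seed)
    # Randomly initialize K cluster centers by sampling K distinct data points
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Assignment step: index of the nearest center for each point
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each center as the mean of its assigned points
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break  # converged
        centers = new_centers
    return centers, labels
```

For example, `centers, labels = kmeans(sift_descriptors, K=200)` would produce a 200-word visual vocabulary from a pool of 128-D SIFT descriptors.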

  20. Clustering and vector quantization • Clustering is a common method for learning a visual vocabulary or codebook • Unsupervised learning process • Each cluster center produced by k-means becomes a codevector • Codebook can be learned on separate training set • Provided the training set is sufficiently representative, the codebook will be “universal” • The codebook is used for quantizing features • A vector quantizer takes a feature vector and maps it to the index of the nearest codevector in a codebook • Codebook = visual vocabulary • Codevector = visual word
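Given a learned codebook, the vector quantization and bag-of-words histogram steps can be sketched as follows (hypothetical helpers, not from the lecture; they assume `descriptors` and `codebook` are NumPy arrays of shape (N, D) and (K, D)):

```python
import numpy as np

def quantize(descriptors, codebook):
    """Map each local descriptor to the index of its nearest codevector (visual word)."""
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def bow_histogram(descriptors, codebook):
    """Bag-of-words representation: normalized counts of visual words in one image."""
    words = quantize(descriptors, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)
```

One such histogram per image is the bag-of-features representation used in the rest of the lecture.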

  21. Example codebook Appearance codebook Source: B. Leibe

  22. Another codebook (appearance codebook) Source: B. Leibe

  23. Yet another codebook Fei-Fei et al. 2005

  24. Visual vocabularies: Issues • How to choose vocabulary size? • Too small: visual words not representative of all patches • Too large: quantization artifacts, overfitting • Computational efficiency • Vocabulary trees (Nister & Stewenius, 2006)

  25. Spatial pyramid representation (level 0) • Extension of a bag of features • Locally orderless representation at several levels of resolution Lazebnik, Schmid & Ponce (CVPR 2006)

  26. Spatial pyramid representation (levels 0–1) • Extension of a bag of features • Locally orderless representation at several levels of resolution Lazebnik, Schmid & Ponce (CVPR 2006)

  27. Spatial pyramid representation (levels 0–2) • Extension of a bag of features • Locally orderless representation at several levels of resolution Lazebnik, Schmid & Ponce (CVPR 2006)
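A rough sketch of assembling the concatenated spatial pyramid histogram (illustrative only; it assumes keypoint coordinates are pre-scaled to [0, 1) and omits the per-level weights 1/4, 1/4, 1/2 used in the pyramid match kernel of Lazebnik et al.):

```python
import numpy as np

def spatial_pyramid(words, xy, vocab_size, levels=(0, 1, 2)):
    """Concatenate per-cell visual-word histograms over 1x1, 2x2 and 4x4 grids.
    words: (N,) visual-word indices; xy: (N, 2) keypoint coordinates in [0, 1)."""
    feats = []
    for level in levels:
        cells = 2 ** level                                  # cells per side: 1, 2, 4
        col = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
        row = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
        cell_id = row * cells + col
        for c in range(cells * cells):
            in_cell = words[cell_id == c]                   # words falling in this cell
            feats.append(np.bincount(in_cell, minlength=vocab_size).astype(float))
    feat = np.concatenate(feats)
    return feat / max(feat.sum(), 1.0)
```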

  28. Scene category dataset: multi-class classification results (100 training images per class)

  29. Multi-class classification results (30 training images per class) Caltech101 dataset http://www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html

  30. Bags of features for action recognition Space-time interest points Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, IJCV 2008.

  31. Bags of features for action recognition Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, IJCV 2008.

  32. Image classification • Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
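One common answer is a support vector machine trained on the normalized histograms; the original papers often use histogram-intersection or chi-square kernels, but a plain linear SVM via scikit-learn is sketched below on stand-in data (the array contents are random placeholders, not real features):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# X: (n_images, vocab_size) bag-of-features histograms, y: (n_images,) class labels.
# Random stand-in data is used here purely to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.random((200, 500))
y = rng.integers(0, 5, size=200)

clf = make_pipeline(Normalizer(norm="l1"), LinearSVC())
clf.fit(X, y)
print(clf.predict(X[:3]))
```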

  33. Lecture IV: Part II

  34. Outline • Histogram of local features • Bag of words model • Soft quantization and sparse coding • Supervector with Gaussian mixture model

  35. Hard quantization Slide credit: Cao & Feris

  36. Soft quantization based on uncertainty • Quantize each local feature into multiple codewords that are closest to it, splitting its weight appropriately among them van Gemert et al., Visual Word Ambiguity, PAMI 2009
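A sketch of one common soft-assignment variant, a Gaussian kernel over descriptor-to-codeword distances (the bandwidth `sigma` and the helper names are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def soft_assign(descriptor, codebook, sigma=100.0):
    """Soft quantization: spread one descriptor's unit weight over the codewords
    with a Gaussian kernel of the distance (sigma is a tuning assumption)."""
    d2 = ((codebook - descriptor) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum()

def soft_bow_histogram(descriptors, codebook, sigma=100.0):
    """Accumulate the soft assignments of all descriptors in one image."""
    hist = np.zeros(len(codebook))
    for x in descriptors:
        hist += soft_assign(x, codebook, sigma)
    return hist / max(hist.sum(), 1.0)
```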

  37. Soft quantization: illustration comparing hard quantization and soft quantization

  38. Some experiments on soft quantization • Improvement in classification rate from soft quantization

  39. Sparse coding • Hard quantization is an "extremely sparse" representation: each local feature $x$ puts all of its weight on a single codeword, i.e. $\min_{c} \|x - Bc\|^2$ s.t. $\|c\|_0 = 1,\; c \ge 0,\; \sum_k c_k = 1$ • Generalizing from it, we may consider relaxing the cardinality constraint to obtain a soft quantization, but the resulting $\ell_0$-constrained problem is hard to solve • In practice, we instead solve the sparse coding problem $\min_{c} \|x - Bc\|^2 + \lambda\|c\|_1$ as the quantization step
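A self-contained sketch of solving the $\ell_1$ sparse coding problem above with ISTA (proximal gradient); the regularization weight and iteration count are illustrative, and faster solvers such as feature-sign search are typically used in practice:

```python
import numpy as np

def sparse_code(x, B, lam=0.15, n_iters=200):
    """ISTA for min_c ||x - B c||^2 + lam * ||c||_1.
    x: (D,) descriptor; B: (D, K) codebook with codevectors as columns."""
    L = 2.0 * np.linalg.norm(B, 2) ** 2          # Lipschitz constant of the gradient
    c = np.zeros(B.shape[1])
    for _ in range(n_iters):
        grad = 2.0 * B.T @ (B @ c - x)           # gradient of the reconstruction term
        z = c - grad / L
        c = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding (prox of L1)
    return c
```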

  40. Soft quantization and pooling [Yang et al., 2009]: Linear Spatial Pyramid Matching using Sparse Coding for Image Classification
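In [Yang et al., 2009] the sparse codes within each spatial cell are aggregated by max pooling rather than averaging; a one-line sketch, assuming the hypothetical `sparse_code` helper above has been applied to every descriptor in the cell:

```python
import numpy as np

def max_pool(codes):
    """codes: (N, K) sparse codes of the descriptors in one spatial-pyramid cell.
    The pooled feature keeps, per codeword, the strongest response in the cell."""
    return np.abs(codes).max(axis=0)
```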

  41. Outline • Histogram of local features • Bag of words model • Soft quantization and sparse coding • Supervector with Gaussian mixture model

  42. Quiz • What is the potential shortcoming of a bag-of-features representation based on quantization and pooling?

  43. Modeling the feature distribution • The bag-of-features histogram represents the distribution of a set of local features • The quantization step introduces information loss! • How about modeling the distribution without quantization? • Gaussian mixture model
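As a concrete illustration, a GMM can be fit to a pool of local descriptors with scikit-learn; the component count and diagonal covariances below are arbitrary choices for this sketch, and the descriptors are random stand-ins:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# descriptors: (N, D) local features pooled from many training images (random stand-in here)
descriptors = np.random.default_rng(0).normal(size=(5000, 64))

# Fit a K-component GMM; EM (the maximum-likelihood machinery of the next slides) runs under the hood.
gmm = GaussianMixture(n_components=16, covariance_type="diag", max_iter=100)
gmm.fit(descriptors)
print(gmm.weights_.shape, gmm.means_.shape, gmm.covariances_.shape)
```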

  44. Quiz • Given a set of features, how can we fit a GMM to their distribution?

  45. The Gaussian Distribution • Multivariate Gaussian, parameterized by its mean and covariance • Define the precision to be the inverse of the covariance • In 1 dimension
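The formulas on the original slide did not survive extraction; the standard forms (notation as in Bishop, Pattern Recognition and Machine Learning) are:

```latex
\mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu},\boldsymbol{\Sigma})
  = \frac{1}{(2\pi)^{D/2}\,|\boldsymbol{\Sigma}|^{1/2}}
    \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right),
\qquad
\boldsymbol{\Lambda} \equiv \boldsymbol{\Sigma}^{-1},
\qquad
\mathcal{N}(x\mid\mu,\sigma^{2})
  = \frac{1}{\sqrt{2\pi\sigma^{2}}}\,
    \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)
```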

  46. Likelihood Function • Data set $\mathbf{X} = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ • Assume the observed data points are generated independently • Viewed as a function of the parameters, this is known as the likelihood function
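The likelihood of the data set under a single Gaussian (the equation dropped from this slide) takes the standard form:

```latex
p(\mathbf{X}\mid\boldsymbol{\mu},\boldsymbol{\Sigma})
  = \prod_{n=1}^{N} \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu},\boldsymbol{\Sigma})
```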

  47. Maximum Likelihood • Set the parameters by maximizing the likelihood function • Equivalently maximize the log likelihood
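The log-likelihood being maximized (reconstructed in its standard form, since the slide's equation was lost):

```latex
\ln p(\mathbf{X}\mid\boldsymbol{\mu},\boldsymbol{\Sigma})
  = -\frac{ND}{2}\ln(2\pi)
    -\frac{N}{2}\ln|\boldsymbol{\Sigma}|
    -\frac{1}{2}\sum_{n=1}^{N}(\mathbf{x}_n-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{x}_n-\boldsymbol{\mu})
```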

  48. Maximum Likelihood Solution • Maximizing w.r.t. the mean gives the sample mean • Maximizing w.r.t. the covariance gives the sample covariance
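The closed-form maximum likelihood estimates referred to here are the sample mean and sample covariance:

```latex
\boldsymbol{\mu}_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_n,
\qquad
\boldsymbol{\Sigma}_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}
  (\mathbf{x}_n-\boldsymbol{\mu}_{\mathrm{ML}})(\mathbf{x}_n-\boldsymbol{\mu}_{\mathrm{ML}})^{\top}
```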

  49. Bias of Maximum Likelihood • Consider the expectations of the maximum likelihood estimates under the Gaussian distribution • The maximum likelihood solution systematically under-estimates the covariance • This is an example of over-fitting
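The expectations behind this statement are standard results; dividing by N-1 instead of N gives an unbiased covariance estimate:

```latex
\mathbb{E}[\boldsymbol{\mu}_{\mathrm{ML}}] = \boldsymbol{\mu},
\qquad
\mathbb{E}[\boldsymbol{\Sigma}_{\mathrm{ML}}] = \frac{N-1}{N}\boldsymbol{\Sigma},
\qquad
\widetilde{\boldsymbol{\Sigma}} = \frac{1}{N-1}\sum_{n=1}^{N}
  (\mathbf{x}_n-\boldsymbol{\mu}_{\mathrm{ML}})(\mathbf{x}_n-\boldsymbol{\mu}_{\mathrm{ML}})^{\top}
  \ \text{is unbiased}
```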

  50. Intuitive Explanation of Over-fitting
