
Class 3: Feature Coding and Pooling



  1. Class 3: Feature Coding and Pooling Liangliang Cao, Feb 7, 2013 EECS 6890 - Topics in Information Processing Spring 2013, Columbia University http://rogerioferis.com/VisualRecognitionAndSearch

  2. Problem The Myth of Human Vision Can you understand the following? 婴儿 बचचा 아가 bebé People may have difficulty understanding text in different languages, but NOT images. Photo courtesy of luster. Visual Recognition And Search, Columbia University, Spring 2013

  3. Problem Can a computer vision system recognize objects or scenes the way humans do?

  4. Why Is This Problem Important? Mobile apps. Traditional companies. Search engines.

  5. Problem Examples of Object Recognition Datasets http://www.vision.caltech.edu

  6. Problem http://groups.csail.mit.edu/vision/SUN/

  7. Overview of Classification Model [Table: four approaches and their coding/pooling steps]
     - Histogram of SIFT: coding = vector quantization; pooling = histogram aggregation
     - Uncertainty-based: coding = soft quantization; pooling = histogram aggregation
     - Sparse coding: coding = soft quantization; pooling = max pooling
     - Fisher vector / Supervector: coding = GMM probability estimation; pooling = GMM adaptation
     Coding: to map local features into a compact representation. Pooling: to aggregate these compact representations together.
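
The coding/pooling split above can be sketched in a few lines. This is an illustrative toy (the `pool` helper and the toy codes are mine, not from the slides): each local descriptor is first coded into a K-dimensional vector, then the per-descriptor codes are aggregated into one K-dimensional image representation.

```python
import numpy as np

def pool(codes, method="avg"):
    """Aggregate per-descriptor codes (N x K) into one K-dim image vector."""
    if method == "avg":      # histogram-style aggregation
        return codes.mean(axis=0)
    elif method == "max":    # max pooling, common with sparse codes
        return codes.max(axis=0)
    raise ValueError(method)

# Toy example: 3 descriptors coded against a 4-word codebook
codes = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
avg = pool(codes, "avg")   # average pooling -> normalized histogram
mx = pool(codes, "max")    # max pooling -> presence of each codeword
```

Average pooling of hard-assignment codes recovers exactly the bag-of-words histogram; max pooling only records whether a codeword fired at all.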

  8. Outline • Histogram of local features • Bag of words model • Soft quantization and sparse coding • Fisher vector and supervector

  9. Histogram of Local Features

  10. Bag of Words Models Recall of Last Class • Powerful local features - DoG - Hessian, Harris - Dense sampling. The number of local regions per image is not fixed!

  11. Bag of Words Models Recall of Last Class (2) • Histograms can provide a fixed-size representation of images • Spatial pyramid/gridding can enhance the histogram representation with spatial information

  12. Bag of Words Models Histogram of Local Features [Figure: histogram over the codewords] dim = #codewords

  13. Bag of Words Models Histogram of Local Features (2) dim = #codewords × #grids
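
The dim = #codewords × #grids construction can be sketched as follows. This is a minimal toy (the `spatial_histogram` helper and its normalized-coordinate convention are my assumptions, not the lecture's code): one codeword histogram per spatial cell, concatenated.

```python
import numpy as np

def spatial_histogram(assignments, xy, n_words, grid=(2, 2)):
    """Per-cell codeword histograms, concatenated -> n_words * n_cells dims.
    assignments: (N,) codeword index per descriptor
    xy: (N, 2) keypoint coordinates normalized to [0, 1)
    """
    gx, gy = grid
    cells = (np.floor(xy[:, 0] * gx).astype(int) * gy
             + np.floor(xy[:, 1] * gy).astype(int))
    hist = np.zeros((gx * gy, n_words))
    for c, w in zip(cells, assignments):
        hist[c, w] += 1
    return hist.ravel()        # length = n_words * gx * gy

# Three descriptors: word 0 in the top-left cell, word 2 twice bottom-right
a = np.array([0, 2, 2])
xy = np.array([[0.1, 0.1], [0.6, 0.6], [0.7, 0.8]])
h = spatial_histogram(a, xy, n_words=3, grid=(2, 2))  # 3 * 4 = 12 dims
```

A spatial pyramid simply stacks such vectors for several grid sizes (1×1, 2×2, 4×4, …).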

  14. Bag of Words Models Local Feature Quantization. Slide courtesy of Fei-Fei Li

  15. Bag of Words Models Local Feature Quantization

  16. Bag of Words Models Local Feature Quantization - Vector quantization - Dictionary learning
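
The two steps named above, dictionary learning and vector quantization, can be sketched with plain k-means. This is a simplified, deterministic toy (my `kmeans_codebook` helper with an evenly spaced initialization, not a production clusterer):

```python
import numpy as np

def kmeans_codebook(X, k, iters=20):
    """Plain k-means for learning a visual codebook from descriptors X (N x d).
    Simple deterministic init: k evenly spaced rows of X."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # assign each descriptor to its nearest codeword (vector quantization)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for j in range(k):           # move each codeword to its cluster mean
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# Toy descriptors forming two clusters -> a 2-word dictionary
X = np.array([[0., 0.], [0., 1.], [1., 0.],
              [10., 10.], [10., 11.], [11., 10.]])
centers, labels = kmeans_codebook(X, k=2)
hist = np.bincount(labels, minlength=len(centers))  # bag-of-words histogram
```

Real systems use many thousands of codewords and approximate nearest-neighbor assignment, but the structure is the same.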

  17. Histogram of Local Features Dictionary for Codewords

  18. Bag of Words Models Most slides in this section are courtesy of Fei-Fei Li

  19. Object Bag of ‘words’

  20. Bag of Words Models Underlying Assumptions - Text [Figure: two example documents reduced to bags of words. A passage on human vision reduces to {sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel}; a passage on China's trade surplus reduces to {China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase}]

  21. Bag of Words Models Underlying Assumptions - Image

  22. [Diagram: the bag-of-words pipeline. Learning: feature detection & representation → image representation (via K-means) → category models (and/or) classifiers. Recognition: the same representation is matched against the models/classifiers to reach a category decision]

  23. Bag of Words Models Borrowing Techniques from Text Classification • pLSA • Naïve Bayesian model • wn: each patch in an image - wn = [0,0,…,1,…,0,0]^T • w: a collection of all N patches in an image - w = [w1,w2,…,wN] • dj: the j-th image in an image collection • c: category of the image • z: theme or topic of the patch

  24. Bag of Words Models Probabilistic Latent Semantic Analysis (pLSA) [plate diagram: d → z → w; plates N, D] Hofmann, 2001. Latent Dirichlet Allocation (LDA) [plate diagram: c → z → w; plates N, D] Blei et al., 2001

  25. Bag of Words Models Probabilistic Latent Semantic Analysis (pLSA) [plate diagram: d → z → w; plates N, D] “face” Sivic et al., ICCV 2005

  26. Bag of Words Models p(wi | dj) = Σk=1..K p(wi | zk) p(zk | dj): the observed codeword distribution of an image is a mixture of the codeword distributions per theme (topic), weighted by that image's theme distribution. Parameters are estimated by EM or Gibbs sampling. Slide credit: Josef Sivic
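
The EM fit for the mixture decomposition above can be sketched roughly as follows. This is a simplified dense implementation for illustration only (function and variable names are mine; real pLSA code uses sparse counts):

```python
import numpy as np

def plsa(counts, K, iters=200, seed=0):
    """EM for pLSA.  counts: (W, D) codeword-by-image count matrix.
    Returns p(w|z) of shape (W, K) and p(z|d) of shape (K, D)."""
    rng = np.random.default_rng(seed)
    W, D = counts.shape
    p_w_z = rng.random((W, K)); p_w_z /= p_w_z.sum(0)
    p_z_d = rng.random((K, D)); p_z_d /= p_z_d.sum(0)
    for _ in range(iters):
        # E-step: posterior p(z | w, d), shape (W, D, K)
        joint = p_w_z[:, None, :] * p_z_d.T[None, :, :]
        post = joint / (joint.sum(-1, keepdims=True) + 1e-12)
        # M-step: re-estimate both conditionals from expected counts n(w,d,z)
        n = counts[:, :, None] * post
        p_w_z = n.sum(1); p_w_z /= p_w_z.sum(0) + 1e-12
        p_z_d = n.sum(0).T; p_z_d /= p_z_d.sum(0) + 1e-12
    return p_w_z, p_z_d

# Two images, four codewords, perfectly separable into two topics
counts = np.array([[4, 0], [4, 0], [0, 4], [0, 4]])
p_w_z, p_z_d = plsa(counts, K=2)
recon = p_w_z @ p_z_d      # the model's p(w|d), should match the data
```

On this separable toy, the reconstructed p(w|d) converges to the empirical per-image codeword frequencies.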

  27. Bag of Words Models Recognition using pLSA: z* = arg maxz p(z | d). Slide credit: Josef Sivic

  28. Bag of Words Models Scene Recognition using LDA “beach” [plate diagram: c → z → w; plates N, D] Fei-Fei et al., ICCV 2005

  29. Bag of Words Models Spatial-Coherent Latent Topic Model Cao and Fei-Fei, ICCV 2007

  30. Bag of Words Models Simultaneous Segmentation and Recognition

  31. Bag of Words Models Pros and Cons Images differ from texts! Bag of words models are good at - Modeling prior knowledge - Providing intuitive interpretation But these models suffer from - Loss of information in the quantization of “visual words” - Loss of spatial information → Better coding, better pooling

  32. Soft Quantization and Sparse Coding

  33. Soft Quantization Hard Quantization

  34. Soft Quantization Uncertainty-Based Quantization: model the uncertainty across multiple codewords. Gemert et al., Visual Word Ambiguity, PAMI 2009

  35. Soft Quantization Intuition of UNC: hard quantization vs. soft quantization based on uncertainty. Gemert et al., Visual Word Ambiguity, PAMI 2009
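
The kernel-codebook idea behind UNC can be sketched as Gaussian distance weighting over all codewords. The σ value and the toy codebook below are illustrative choices of mine, not values from the paper:

```python
import numpy as np

def soft_assign(x, codebook, sigma=1.0):
    """Kernel-codebook soft assignment: a descriptor spreads its mass over
    all codewords, weighted by a Gaussian of its distance to each."""
    d2 = ((codebook - x) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return w / w.sum()

codebook = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
x = np.array([2.0, 0.0])            # equidistant from words 0 and 1
w = soft_assign(x, codebook, sigma=2.0)
# w[0] == w[1] > w[2]; a hard quantizer would give all mass to one word
```

Summing these soft assignments over all descriptors yields the soft histogram, in place of the hard-count histogram.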

  36. Soft Quantization Improvement of UNC

  37. Soft Quantization Sparse Coding-Based Quantization Hard quantization can be viewed as an “extremely sparse representation”: min_a ||x − Ba||^2 s.t. a ∈ {0,1}^K, ||a||_0 = 1. A more general but hard-to-solve representation: min_a ||x − Ba||^2 s.t. ||a||_0 ≤ T. In practice we consider sparse coding: min_a ||x − Ba||^2 + λ||a||_1
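
The l1-penalized problem can be solved with ISTA (iterative shrinkage-thresholding). This is a minimal sketch of mine, not necessarily the optimizer used in the papers the slides cite; the dictionary `B`, λ, and the toy input are illustrative:

```python
import numpy as np

def sparse_code(x, B, lam=0.1, iters=500):
    """ISTA for  min_a 0.5*||x - B a||^2 + lam*||a||_1
    (codewords are the columns of the dictionary B)."""
    L = np.linalg.norm(B, 2) ** 2        # Lipschitz constant of the gradient
    a = np.zeros(B.shape[1])
    for _ in range(iters):
        g = B.T @ (B @ a - x)            # gradient of the smooth term
        z = a - g / L                    # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return a

# A descriptor equal to the second codeword should get a one-hot-like code
B = np.eye(4)                            # toy 4-word dictionary
x = np.array([0.0, 1.0, 0.0, 0.0])
a = sparse_code(x, B, lam=0.1)
# a is ~[0, 0.9, 0, 0]: one active codeword, shrunk by lam
```

Note how the l1 penalty keeps only one nonzero coefficient here, recovering hard quantization as the extreme case.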


  39. Soft Quantization Yang et al. obtain good recognition accuracy by combining sparse coding with spatial pyramid and dictionary training. More details will be in the group presentation. Yang et al., Linear Spatial Pyramid Matching using Sparse Coding for Image Classification, CVPR 2009

  40. Fisher Vector and Supervector One of the most powerful image/video classification techniques. Thanks to Zhen Li and Qiang Chen for constructive suggestions on this section.

  41. Fisher Vector and Supervector Winning Systems [Figure: winning systems of the classification task (2009–2012) and the Large Scale Visual Recognition Challenge (2010–2011)]

  42. Fisher Vector and Supervector Literature Supervector: [1] Yan et al., Regression from patch-kernel. CVPR 2008 [2] Zhou et al., SIFT-Bag kernel for video event analysis. ACM Multimedia 2008 [3] Zhou et al., Hierarchical Gaussianization for image classification. ICCV 2009 [4] Zhou et al., Image classification using super-vector coding of local image descriptors. ECCV 2010 Fisher vector: [5] Perronnin et al., Improving the Fisher kernel for large-scale image classification. ECCV 2010 [6] Jégou et al., Aggregating local image descriptors into compact codes. PAMI 2011. These papers are not very easy to read. Let me take a simplified perspective via the coding & pooling framework.

  43. Fisher Vector and Supervector Coding without Information Loss • Coding with hard assignment • Coding with soft assignment • How to keep all the information?


  45. Fisher Vector and Supervector An Intuitive Illustration: Coding

  46. Fisher Vector and Supervector An Intuitive Illustration: Coding Component 1 Component 2 Component 3

  47. Fisher Vector and Supervector An Intuitive Illustration: Pooling [Figure: descriptors assigned to components 1–3 are pooled per component]

  48. Fisher Vector and Supervector Implementation of Supervector In speech (speaker identification), the supervector refers to the stacked means of an adapted GMM: supervector = [m1^T, m2^T, …, mK^T]^T
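
The stacked-mean construction can be sketched as relevance-MAP adaptation of a toy universal background model. This is a simplified sketch in the style of the speaker-ID literature the slides reference; the function and variable names, the relevance factor value, and the toy data are mine:

```python
import numpy as np

def supervector(X, means, covs, weights, r=16.0):
    """Stacked means of a MAP-adapted diagonal-covariance GMM.
    X: (N, d) local descriptors of one image; means/covs: (K, d); weights: (K,).
    r is the relevance factor controlling how strongly means move to the data."""
    K, d = means.shape
    # responsibilities p(k|x) under the universal background GMM
    logp = np.stack([-0.5 * (((X - means[k]) ** 2) / covs[k]).sum(1)
                     - 0.5 * np.log(covs[k]).sum() + np.log(weights[k])
                     for k in range(K)], axis=1)
    post = np.exp(logp - logp.max(1, keepdims=True))
    post /= post.sum(1, keepdims=True)              # (N, K)
    n_k = post.sum(0)                               # soft counts per component
    ex_k = post.T @ X / (n_k[:, None] + 1e-12)      # per-component data mean
    alpha = (n_k / (n_k + r))[:, None]              # MAP interpolation factor
    adapted = alpha * ex_k + (1 - alpha) * means    # adapted means
    return adapted.ravel()                          # K*d supervector

# Toy UBM with two components; the image's descriptors all sit near component 0
means = np.array([[0., 0.], [10., 10.]])
covs = np.ones((2, 2))
weights = np.array([0.5, 0.5])
X = np.ones((10, 2))                                # ten descriptors at (1, 1)
sv = supervector(X, means, covs, weights)
```

Components with little supporting data keep their original means (small alpha), so the supervector encodes only the deviations the image actually supports.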

  49. Fisher Vector and Supervector Interpretation with Supervector: original distribution → new (adapted) distribution. Picture from Reynolds, Quatieri, and Dunn, DSP, 2001

  50. Fisher Vector and Supervector Normalization of Supervector In practice, a normalization step using the covariance matrix often improves performance. Moreover, we can subtract the original mean vector for the ease of normalization. The representation is also called Hierarchical Gaussianization (HG).
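
One common form of this normalization, assuming diagonal covariances, can be sketched as below. The exact scaling used in HG may differ; the function name and the per-component sqrt(weight)/std scaling here are my assumptions:

```python
import numpy as np

def normalize_supervector(adapted_means, ubm_means, ubm_covs, weights):
    """Subtract the original (UBM) means, then scale each component's
    deviation by sqrt(weight) / std, dimension by dimension."""
    diff = adapted_means - ubm_means                      # (K, d)
    return (np.sqrt(weights)[:, None] * diff / np.sqrt(ubm_covs)).ravel()

nsv = normalize_supervector(np.array([[1.0, 1.0]]),      # adapted mean
                            np.array([[0.0, 0.0]]),      # UBM mean
                            np.array([[4.0, 4.0]]),      # diagonal covariance
                            np.array([1.0]))
# each dimension becomes sqrt(1) * (1 - 0) / sqrt(4) = 0.5
```

Subtracting the UBM mean centers the representation, and dividing by the standard deviation puts all components on a comparable scale before a linear classifier.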
