
Visual Object Recognition


Presentation Transcript


  1. Visual Object Recognition Bastian Leibe, Computer Vision Laboratory, ETH Zurich & Kristen Grauman, Department of Computer Sciences, University of Texas at Austin Chicago, 14.07.2008

  2. Outline • Detection with Global Appearance & Sliding Windows • Local Invariant Features: Detection & Description • Specific Object Recognition with Local Features • ― Coffee Break ― • Visual Words: Indexing, Bags of Words Categorization • Matching Local Feature Sets • Part-Based Models for Categorization • Current Challenges and Research Directions K. Grauman, B. Leibe

  3. Global representations: limitations • Success may rely on alignment -> sensitive to viewpoint • All parts of the image or window impact the description -> sensitive to occlusion, clutter K. Grauman, B. Leibe

  4. Local representations • Describe component regions or patches separately. • Many options for detection & description: SIFT [Lowe 99], Spin images [Johnson 99], Shape context [Belongie 02], Maximally Stable Extremal Regions [Matas 02], Salient regions [Kadir 01], Harris-Affine [Mikolajczyk 04], Geometric Blur [Berg 05], Superpixels [Ren et al.] K. Grauman, B. Leibe

  5. Recall: Invariant local features • Subset of local feature types designed to be invariant to scale, translation, rotation, affine transformations, and illumination • Detect interest points, then extract descriptors [Figure: descriptor vectors x = (x1, …, xd), y = (y1, …, yd)] [Mikolajczyk01, Matas02, Tuytelaars04, Lowe99, Kadir01, …] K. Grauman, B. Leibe

  6. Recognition with local feature sets [Image: Aachen Cathedral] • Previously, we saw how to use local invariant features + a global spatial model to recognize specific objects, using a planar object assumption. • Now, we’ll use local features for: • Indexing-based recognition • Bags of words representations • Correspondence / matching kernels K. Grauman, B. Leibe

  7. Basic flow: Detect or sample features (list of positions, scales, orientations) → Describe features (associated list of d-dimensional descriptors) → Index each one into a pool of descriptors from previously seen images K. Grauman, B. Leibe
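A minimal sketch of this detect-and-describe step, assuming an OpenCV build with SIFT available (cv2.SIFT_create); the image path is a placeholder, and any detector/descriptor pair from the previous slide could be substituted.

```python
# Sketch: detect interest points and extract descriptors for one image.
# Assumes an OpenCV build that exposes SIFT; the file path is a placeholder.
import cv2

def detect_and_describe(image_path):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    # Keypoints carry position, scale, orientation; descriptors are 128-d SIFT vectors.
    keypoints, descriptors = sift.detectAndCompute(img, None)
    return keypoints, descriptors  # descriptors: (num_keypoints, 128) float32 array
```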

  8. Indexing local features • Each patch / region has a descriptor, which is a point in some high-dimensional feature space (e.g., SIFT) K. Grauman, B. Leibe

  9. Indexing local features • When we see close points in feature space, we have similar descriptors, which indicates similar local content. Figure credit: A. Zisserman K. Grauman, B. Leibe

  10. Indexing local features • We saw in the previous section how to use voting and pose clustering to identify objects using local features Figure credit: David Lowe K. Grauman, B. Leibe

  11. Indexing local features • With potentially thousands of features per image, and hundreds to millions of images to search, how can we efficiently find those that are relevant to a new image? • Low-dimensional descriptors: can use standard efficient data structures for nearest neighbor search • High-dimensional descriptors: approximate nearest neighbor search methods are more practical • Inverted file indexing schemes K. Grauman, B. Leibe

  12. Indexing local features: approximate nearest neighbor search • Best-Bin First (BBF): a variant of k-d trees that uses a priority queue to examine the most promising branches first [Beis & Lowe, CVPR 1997] • Locality-Sensitive Hashing (LSH): a randomized hashing technique whose hash functions map similar points to the same bin with high probability [Indyk & Motwani, 1998] K. Grauman, B. Leibe
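As a rough illustration of the nearest-neighbor step (a plain k-d tree via SciPy, not the exact Best-Bin-First variant above), with random placeholder arrays standing in for real descriptors:

```python
# Sketch: nearest-neighbor lookup of query descriptors in a database pool
# using a k-d tree; eps > 0 trades accuracy for speed, in the spirit of BBF.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
database_descriptors = rng.random((10000, 128)).astype(np.float32)  # placeholder pool
query_descriptors = rng.random((500, 128)).astype(np.float32)       # placeholder query image

tree = cKDTree(database_descriptors)
dists, idxs = tree.query(query_descriptors, k=1, eps=0.1)
# idxs[i] is the (approximate) nearest database descriptor for query feature i.
```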

  13. Indexing local features: inverted file index • For text documents, an efficient way to find all pages on which a word occurs is to use an index… • We want to find all images in which a feature occurs. • To use this idea, we’ll need to map our features to “visual words”. K. Grauman, B. Leibe

  14. Visual words: main idea • Extract some local features from a number of images … e.g., SIFT descriptor space: each point is 128-dimensional Slide credit: D. Nister K. Grauman, B. Leibe

  15. Visual words: main idea Slide credit: D. Nister K. Grauman, B. Leibe

  16. Visual words: main idea Slide credit: D. Nister K. Grauman, B. Leibe

  17. Visual words: main idea Slide credit: D. Nister K. Grauman, B. Leibe

  18. Slide credit: D. Nister K. Grauman, B. Leibe

  19. Slide credit: D. Nister K. Grauman, B. Leibe

  20. Visual words: main idea • Map high-dimensional descriptors to tokens/words by quantizing the feature space • Quantize via clustering, let cluster centers be the prototype “words” [Figure: descriptor space] K. Grauman, B. Leibe

  21. Visual words: main idea • Map high-dimensional descriptors to tokens/words by quantizing the feature space • Determine which word to assign to each new image region by finding the closest cluster center. [Figure: descriptor space] K. Grauman, B. Leibe
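A minimal quantization sketch (scikit-learn k-means here; the descriptor arrays are random placeholders): cluster the training descriptors to obtain the prototype words, then assign each new descriptor to its closest cluster center.

```python
# Sketch: build a visual vocabulary by k-means clustering of training
# descriptors, then assign new descriptors to their nearest cluster center.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
training_descriptors = rng.random((20000, 128)).astype(np.float32)  # placeholder descriptor pool
new_image_descriptors = rng.random((800, 128)).astype(np.float32)   # placeholder query image

num_words = 500
kmeans = KMeans(n_clusters=num_words, n_init=4, random_state=0).fit(training_descriptors)

# Each cluster center is one visual word; assignment = nearest center.
word_ids = kmeans.predict(new_image_descriptors)   # values in [0, num_words)
```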

  22. Visual words • Example: each group of patches belongs to the same visual word Figure from Sivic & Zisserman, ICCV 2003 K. Grauman, B. Leibe

  23. Visual words • First explored for texture and material representations • Texton = cluster center of filter responses over a collection of images • Describe textures and materials based on the distribution of prototypical texture elements. Leung & Malik 1999; Varma & Zisserman, 2002; Lazebnik, Schmid & Ponce, 2003

  24. Visual words • More recently used for describing scenes and objects for the sake of indexing or classification. Sivic & Zisserman 2003; Csurka, Bray, Dance, & Fan 2004; many others. K. Grauman, B. Leibe

  25. Inverted file index for images composed of visual words [Figure: each word number maps to a list of image numbers] K. Grauman, B. Leibe Image credit: A. Zisserman
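In code the inverted file is simply a map from word id to the images containing that word; a toy sketch, assuming per-image word-id arrays produced by a quantizer like the one above:

```python
# Sketch: inverted file index mapping each visual word to the images in which
# it occurs. word_ids_per_image is a placeholder for quantized database images.
from collections import defaultdict
import numpy as np

word_ids_per_image = {
    "img_001": np.array([3, 17, 42, 42, 99]),
    "img_002": np.array([5, 17, 63]),
}

inverted_index = defaultdict(set)
for image_id, word_ids in word_ids_per_image.items():
    for w in np.unique(word_ids):
        inverted_index[int(w)].add(image_id)

# Candidate images for a query = union of the postings of the query's words.
query_words = [17, 42]
candidates = set().union(*(inverted_index[w] for w in query_words))
```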

  26. Bags of visual words • Summarize entire image based on its distribution (histogram) of word occurrences. • Analogous to bag of words representation commonly used for documents. Image credit: Fei-Fei Li K. Grauman, B. Leibe
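Computing that histogram from the quantized word ids of one image is a one-liner with numpy (a sketch; the normalization choice is a matter of taste):

```python
# Sketch: summarize one image as a normalized histogram of visual-word counts.
import numpy as np

num_words = 500
word_ids = np.array([3, 17, 42, 42, 99, 17])                      # placeholder quantized image
bow = np.bincount(word_ids, minlength=num_words).astype(np.float32)
bow /= bow.sum()   # L1-normalize so images with different feature counts are comparable
```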

  27. Video Google System (Sivic & Zisserman, ICCV 2003) • Collect all words within query region • Inverted file index to find relevant frames • Compare word counts • Spatial verification • Demo online at: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html [Figure: query region and retrieved frames] K. Grauman, B. Leibe
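A rough sketch of the scoring step in the spirit of that pipeline (tf-idf weighted word vectors compared by cosine similarity; the inverted-file candidate selection and spatial verification are omitted, and all arrays are placeholders):

```python
# Sketch: tf-idf weighted bag-of-words retrieval; real systems restrict the
# comparison to candidates from the inverted file and re-rank with spatial
# verification.
import numpy as np

rng = np.random.default_rng(0)
bow_db = rng.integers(0, 5, size=(100, 500)).astype(np.float32)   # per-image word counts
bow_query = rng.integers(0, 5, size=500).astype(np.float32)       # query word counts

doc_freq = np.maximum((bow_db > 0).sum(axis=0), 1)
idf = np.log(bow_db.shape[0] / doc_freq)          # rare words weigh more

def tfidf(v):
    v = v * idf
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

db = np.array([tfidf(v) for v in bow_db])
scores = db @ tfidf(bow_query)                    # cosine similarity
ranked = np.argsort(-scores)                      # best-matching frames first
```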

  28. Basic flow … … Index each one into pool of descriptors from previously seen images … or … Describe features Detect or sample features Quantize to form bag of words vector for the image List of positions, scales, orientations Associated list of d-dimensional descriptors K. Grauman, B. Leibe

  29. Visual vocabulary formation Issues: • Sampling strategy • Clustering / quantization algorithm • Unsupervised vs. supervised • What corpus provides features (universal vocabulary?) • Vocabulary size, number of words K. Grauman, B. Leibe

  30. Sampling strategies [Figure: sparse at interest points; dense, uniformly; randomly; multiple interest operators] • To find specific, textured objects, sparse sampling from interest points is often more reliable. • Multiple complementary interest operators offer more image coverage. • For object categorization, dense sampling offers better coverage. • [See Nowak, Jurie & Triggs, ECCV 2006] Image credits: F-F. Li, E. Nowak, J. Sivic K. Grauman, B. Leibe
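A sketch contrasting the two strategies (OpenCV with SIFT assumed; the synthetic image is a placeholder for a real grayscale frame): sparse keypoints from a detector versus a dense grid of fixed-scale keypoints.

```python
# Sketch: sparse (interest-point) vs. dense (uniform grid) sampling.
import cv2
import numpy as np

img = np.random.default_rng(0).integers(0, 256, (480, 640)).astype(np.uint8)  # placeholder image
sift = cv2.SIFT_create()

# Sparse: let the detector pick keypoints (often best for specific, textured objects).
sparse_kps = sift.detect(img, None)

# Dense: one keypoint every `step` pixels at a fixed scale (better coverage for categories).
step, size = 8, 16
dense_kps = [cv2.KeyPoint(float(x), float(y), float(size))
             for y in range(0, img.shape[0], step)
             for x in range(0, img.shape[1], step)]

_, sparse_desc = sift.compute(img, sparse_kps)
_, dense_desc = sift.compute(img, dense_kps)
```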

  31. Clustering / quantization methods • k-means (typical choice), agglomerative clustering, mean-shift,… • Hierarchical clustering: allows faster insertion / word assignment while still allowing large vocabularies • Vocabulary tree [Nister & Stewenius, CVPR 2006] K. Grauman, B. Leibe
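A toy sketch of the hierarchical k-means idea behind the vocabulary tree (not the Nister & Stewenius implementation): recursively cluster the descriptors with a small branching factor, then assign a new descriptor by greedily descending the tree, which costs only branching-factor-times-depth comparisons even for very large vocabularies.

```python
# Toy sketch of a vocabulary tree: hierarchical k-means with branching factor k
# and fixed depth; word assignment descends the tree greedily.
import numpy as np
from sklearn.cluster import KMeans

def build_tree(descriptors, k=5, depth=2):
    if depth == 0 or len(descriptors) < k:
        return None
    km = KMeans(n_clusters=k, n_init=2, random_state=0).fit(descriptors)
    children = [build_tree(descriptors[km.labels_ == i], k, depth - 1) for i in range(k)]
    return {"centers": km.cluster_centers_, "children": children}

def assign_word(node, desc, path=0):
    if node is None:
        return path                                # leaf reached: path encodes the word id
    i = int(np.argmin(np.linalg.norm(node["centers"] - desc, axis=1)))
    k = len(node["centers"])
    return assign_word(node["children"][i], desc, path * k + i + 1)

rng = np.random.default_rng(0)
pool = rng.random((5000, 128)).astype(np.float32)  # placeholder descriptor pool
tree = build_tree(pool)
word_id = assign_word(tree, pool[0])
```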

  32. Example: Recognition with Vocabulary Tree • Tree construction: [Nister & Stewenius, CVPR’06] K. Grauman, B. Leibe Slide credit: David Nister

  33. Vocabulary Tree • Training: Filling the tree [Nister & Stewenius, CVPR’06] K. Grauman, B. Leibe Slide credit: David Nister

  34. Vocabulary Tree • Training: Filling the tree [Nister & Stewenius, CVPR’06] K. Grauman, B. Leibe Slide credit: David Nister

  35. Vocabulary Tree • Training: Filling the tree [Nister & Stewenius, CVPR’06] K. Grauman, B. Leibe Slide credit: David Nister

  36. Vocabulary Tree • Training: Filling the tree [Nister & Stewenius, CVPR’06] K. Grauman, B. Leibe Slide credit: David Nister

  37. Vocabulary Tree • Training: Filling the tree [Nister & Stewenius, CVPR’06] K. Grauman, B. Leibe Slide credit: David Nister

  38. Vocabulary Tree • Recognition: RANSAC verification [Nister & Stewenius, CVPR’06] K. Grauman, B. Leibe Slide credit: David Nister

  39. Vocabulary Tree: Performance • Evaluated on large databases • Indexing with up to 1M images • Online recognition for a database of 50,000 CD covers • Retrieval in ~1s • Found experimentally that large vocabularies can be beneficial for recognition [Nister & Stewenius, CVPR’06] K. Grauman, B. Leibe

  40. Vocabulary formation • Ensembles of trees provide additional robustness Moosmann, Jurie, & Triggs 2006; Yeh, Lee, & Darrell 2007; Bosch, Zisserman, & Munoz 2007; … Figure credit: F. Jurie K. Grauman, B. Leibe

  41. Supervised vocabulary formation • Recent work considers how to leverage labeled images when constructing the vocabulary Perronnin, Dance, Csurka, & Bressan, Adapted Vocabularies for Generic Visual Categorization, ECCV 2006. K. Grauman, B. Leibe

  42. Supervised vocabulary formation • Merge words that don’t aid in discriminability Winn, Criminisi, & Minka, Object Categorization by Learned Universal Visual Dictionary, ICCV 2005

  43. Supervised vocabulary formation • Consider vocabulary and classifier construction jointly. Yang, Jin, Sukthankar, & Jurie, Discriminative Visual Codebook Generation with Classifier Training for Object Category Recognition, CVPR 2008. K. Grauman, B. Leibe

  44. Learning and recognition with bag of words histograms • The bag of words representation makes it possible to describe the unordered point set with a single vector (of fixed dimension across image examples) • Provides an easy way to use the distribution of feature types with various learning algorithms requiring vector input. K. Grauman, B. Leibe
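Since every image is now a fixed-length vector, any off-the-shelf classifier applies; a sketch with a linear SVM on bag-of-words histograms (the histograms and labels are random placeholders standing in for real data):

```python
# Sketch: category recognition from bag-of-words histograms with a linear SVM.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
histograms = rng.random((400, 500)).astype(np.float32)   # (num_images, vocab_size)
labels = rng.integers(0, 5, size=400)                    # 5 placeholder categories

X_train, X_test, y_train, y_test = train_test_split(histograms, labels, random_state=0)
clf = LinearSVC(C=1.0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```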

  45. Learning and recognition with bag of words histograms • …including unsupervised topic models designed for documents. • Hierarchical Bayesian text models (pLSA and LDA) • Hofmann 2001; Blei, Ng & Jordan, 2004 • For object and scene categorization: Sivic et al. 2005, Sudderth et al. 2005, Quelhas et al. 2005, Fei-Fei et al. 2005 K. Grauman, B. Leibe Figure credit: Fei-Fei Li

  46. Learning and recognition with bag of words histograms • …including unsupervised topic models designed for documents. • Probabilistic Latent Semantic Analysis (pLSA), Sivic et al. ICCV 2005 [Plate diagram: document d → topic z → word w, with N words per image and D images; example topic “face”] [pLSA code available at: http://www.robots.ox.ac.uk/~vgg/software/] K. Grauman, B. Leibe Figure credit: Fei-Fei Li
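The authors' pLSA code is linked above; as a rough stand-in, scikit-learn's LDA (the closely related hierarchical Bayesian model) runs directly on the bag-of-words count matrix. The counts below are random placeholders:

```python
# Sketch: unsupervised topic discovery on bag-of-visual-words counts, using
# LDA as a stand-in for pLSA.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
bow_counts = rng.integers(0, 5, size=(200, 500))     # (num_images, vocab_size) word counts

lda = LatentDirichletAllocation(n_components=6, random_state=0)
theta = lda.fit_transform(bow_counts)     # per-image topic mixture (a topic might emerge as "face")
# lda.components_ holds the per-topic word weights.
```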

  47. Bags of words: pros and cons + flexible to geometry / deformations / viewpoint + compact summary of image content + provides vector representation for sets + has yielded good recognition results in practice – basic model ignores geometry: must verify afterwards, or encode via features – background and foreground mixed when bag covers whole image – interest points or sampling: no guarantee to capture object-level parts – optimal vocabulary formation remains unclear K. Grauman, B. Leibe

  48. Outline • Detection with Global Appearance & Sliding Windows • Local Invariant Features: Detection & Description • Specific Object Recognition with Local Features • ― Coffee Break ― • Visual Words: Indexing, Bags of Words Categorization • Matching Local Feature Sets • Part-Based Models for Categorization • Current Challenges and Research Directions K. Grauman, B. Leibe
