
Image Context, Efficient Indexing, and Sense-Specific Category Models


Presentation Transcript


  1. Image Context, Efficient Indexing, and Sense-Specific Category Models Trevor Darrell Kristen Grauman(*) Tom Yeh Kate Saenko MIT CSAIL  UC Berkeley EECS & ICSI (*) UT Austin CS

  2. Outline • Photo-based Question Answering • Tom Yeh • Efficient indexing with local image features • Kristen Grauman • Multimodal Sense Disambiguation for Visual Category Models • Kate Saenko

  3. Photo-based Question Answering Tom Yeh John Lee Trevor Darrell MIT CSAIL UC Berkeley EECS & ICSI

  4. Text-based QA systems, e.g., Yahoo! Answers

  5. Text-based versus Photo-based QA

  6. Difficult photo-based QA can be handled by the community

  7. An easier example

  8. An easier example Current image matching and question matching technologies enable us to handle simpler photo-based QA automatically.

  9. System architecture A photo question passes through three answering layers: • Layer 1: template-based QA over structured sources (Books, Buildings, WWW), e.g., “Who is the architect?” → Frank Gehry • Layer 2: IR-based QA over resolved questions, e.g., “How many floors?” matched to “How many stories?” → 9 floors • Layer 3: human-based QA via the community, e.g., “Is there any problem?” → “People are getting lost a lot.”; “What labs are here?” → CSAIL

  10. Prototype 1: Adding photos to a text-based QA system

  11. Prototype 2: Adding QA to a photo-album system

  12. Prototype 3: Applying photo-based QA to mobile devices

  13. Our pilot multimedia dataset

  14. Sample questions

  15. Sample match results

  16. Outline • Photo-based Question Answering • Tom Yeh • Efficient indexing with local image features • Kristen Grauman • Multimodal Sense Disambiguation for Visual Category Models • Kate Saenko

  17. Efficient Image Indexing Methods for Scene and Object Recognition Trevor Darrell UC-Berkeley EECS & ICSI Kristen Grauman University of Texas at Austin Dept. of Computer Sciences

  18. Fast image indexing Goal: to recognize locations and objects by matching queries by image content.

  19. Fast image indexing Large and evolving image repository • Key technical challenges: • Robustness to variable viewing conditions • Queries are time-sensitive, but database is huge • Approach: develop sub-linear time search methods for “good” image representations and metrics.

  20. Local Features • Local features provide invariance to geometric and photometric variation • Want fast correspondence-based search with local features

  21. Local image features must handle variation in intra-class appearance, illumination, object pose, clutter, occlusions, and viewpoint

  22. Local image features Describe component regions or patches separately. Examples: SIFT [Lowe], Shape context [Belongie et al.], Maximally Stable Extremal Regions [Matas et al.], Superpixels [Ren et al.], Spin images [Johnson and Hebert], Geometric Blur [Berg et al.], Salient regions [Kadir et al.], Harris-Affine [Schmid et al.]

  23. Partially matching sets of features The optimal match maximizes the total similarity of matched points. Optimal match: O(m³) Greedy match: O(m² log m) Pyramid match: O(m) Approximation makes large sets of features practical (m = number of points). [Grauman & Darrell, ICCV 2005]
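The greedy O(m² log m) strategy above can be sketched in a few lines. This toy version matches 1-D values by absolute distance; the function name and interface are illustrative only, not the authors' implementation:

```python
def greedy_partial_match(X, Y):
    """Approximate partial matching: sort all candidate pairs by
    distance once (O(m^2 log m) for |X| ~ |Y| ~ m), then greedily
    accept each pair whose endpoints are still unmatched."""
    pairs = sorted(
        ((abs(x - y), i, j)
         for i, x in enumerate(X)
         for j, y in enumerate(Y)),
        key=lambda t: t[0])
    used_x, used_y, matches = set(), set(), []
    for _, i, j in pairs:
        if i not in used_x and j not in used_y:
            used_x.add(i)
            used_y.add(j)
            matches.append((i, j))
    return matches
```

The optimal assignment would require a cubic-time Hungarian-style solver; the greedy pass trades a little match quality for a much cheaper sort-and-scan.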

  24. Counting matches with histogram intersection
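A minimal sketch of the intersection count (bin-wise minimum) used to count implicit matches at each pyramid level:

```python
def histogram_intersection(h1, h2):
    """Number of points that land in the same bin in both
    histograms: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```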

  25. Example pyramid match: counting the number of “new” matches at each level


  28. Example pyramid match: pyramid match vs. optimal match

  29. How to index efficiently over correspondences? Given a query image and a large database of N images, approximate matching should retrieve the most similar images according to local feature correspondences

  30. Image search with matching-sensitive hash functions • Main idea: • Map point sets to a vector space in such a way that a dot product reflects partial match similarity (normalized pyramid match value). • Exploit random hyperplane properties to construct matching-sensitive hash functions. • Perform approximate similarity search on hashed examples. [Grauman & Darrell, CVPR 2007]

  31. Locality Sensitive Hashing (LSH) Hash functions h, parameterized by random vectors r1…rk, map each database item Xi and the query Q to short binary keys (e.g., 110101); only items whose key collides with the query’s (<< N of them) are examined. Guarantees “approximate” nearest neighbors in sub-linear time, given appropriate hash functions. [Indyk and Motwani 1998, Charikar 2002]

  32. LSH functions for dot products The probability that a random hyperplane separates two unit vectors is related to the angle between them: a high dot product (small angle) makes a split unlikely, a lower dot product makes a split likely. [Goemans and Williamson 1995, Charikar 2004]
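The hyperplane property is easy to check empirically. The sketch below draws random Gaussian hyperplanes and estimates the collision probability, which the cited theory says equals 1 − θ/π for unit vectors at angle θ; the trial count and seed are arbitrary choices for illustration:

```python
import random

def hyperplane_bit(x, r):
    # One hash bit: which side of the random hyperplane r does x fall on?
    return sum(a * b for a, b in zip(x, r)) >= 0

def collision_rate(x, y, trials=20000, seed=0):
    """Empirical Pr[h(x) == h(y)] over random Gaussian hyperplanes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        r = [rng.gauss(0, 1) for _ in x]
        hits += hyperplane_bit(x, r) == hyperplane_bit(y, r)
    return hits / trials
```

For orthogonal vectors (θ = π/2) the estimate should hover near 1 − (π/2)/π = 0.5; identical vectors always collide.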

  33. A useful property of intersection histograms With a padded unary encoding, histogram intersection becomes a dot product: [1, 3, 5] = [1 0 0 0 0 | 1 1 1 0 0 | 1 1 1 1 1] [2, 0, 3] = [1 1 0 0 0 | 0 0 0 0 0 | 1 1 1 0 0] Dot product = 1+0+0+0+0 + 0+0+0+0+0 + 1+1+1+0+0 = 4, which equals the sum of the bin-wise minima [1, 0, 3].
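The encoding trick can be verified directly; `unary_encode` and the padding length of 5 are illustrative names and choices, not from any released code:

```python
def unary_encode(hist, max_count):
    """Pad each bin count to a fixed-length unary code,
    e.g. 3 -> [1, 1, 1, 0, 0] when max_count = 5."""
    v = []
    for c in hist:
        v.extend([1] * c + [0] * (max_count - c))
    return v

x, y = [1, 3, 5], [2, 0, 3]
ux, uy = unary_encode(x, 5), unary_encode(y, 5)
dot = sum(a * b for a, b in zip(ux, uy))
intersection = sum(min(a, b) for a, b in zip(x, y))
assert dot == intersection == 4
```

Because intersection turns into a dot product, the random-hyperplane hash functions for dot products apply directly to the encoded histograms.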

  34. Pyramid match definition The intersection difference between successive pyramid levels equals the number of new matches formed at that level; the (un-normalized) pyramid match is thus expressed as a sum of weighted intersections.
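Putting the pieces together, here is a toy 1-D version of the un-normalized pyramid match: bin widths double at each level and new matches are weighted by 1/2^level. The `extent` and level count are arbitrary assumptions for the sketch, not parameters from the paper:

```python
def histogram(points, bin_width, num_bins):
    h = [0] * num_bins
    for p in points:
        h[min(int(p // bin_width), num_bins - 1)] += 1
    return h

def pyramid_match(X, Y, levels=4, extent=16.0):
    """Toy un-normalized pyramid match for 1-D point sets in
    [0, extent): new matches at each level are weighted by the
    inverse bin width, 1 / 2^level."""
    score, prev = 0.0, 0
    for lvl in range(levels):
        width = 2.0 ** lvl
        nbins = max(1, int(extent / width))
        inter = sum(min(a, b) for a, b in
                    zip(histogram(X, width, nbins),
                        histogram(Y, width, nbins)))
        score += (inter - prev) / (2 ** lvl)  # weight new matches only
        prev = inter
    return score
```

Identical sets match fully at the finest level and score highest; points that only co-occur in coarse bins contribute with smaller weight.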

  35. Vector encoding of pyramids A point set becomes a multi-resolution histogram, i.e., a sparse count vector; applying the level weights w0-w1, w1-w2, w2-w3, w3 gives a weighted sparse count vector with an implicit unary encoding (e.g., [11110, …, 11000, …, 11111]).

  36. Vector encoding of pyramids The dot product between embedded point sets yields the pyramid match kernel value; the length of an embedded point set is equivalent to its self-similarity.

  37. Matching-sensitive hash functions The probability of collision (hash bits equal) corresponds to the normalized pyramid match kernel value, i.e., the normalized partial match similarity.

  38. Pyramid match hashing Embed point sets as pyramids and hash them with randomized hash functions so that the probability of collision equals the normalized partial match similarity; this guarantees retrieval of ε-approximate nearest neighbors in sub-linear time.

  39. Indexing object images • Caltech-101 data set: 101 categories, 40–800 images per class • Features: densely sampled, SIFT descriptor + spatial; average m = 1140 per set • Data provided by Fei-Fei, Fergus, and Perona

  40. Results: indexing object images • Query time is controlled by the required accuracy • e.g., searching less than 2% of database examples gives accuracy close to a linear scan • (Plot: k-NN error rate vs. epsilon ε; smaller ε means slower search, larger ε faster search)

  41. Summary • Content-based queries for location recognition demand fast search algorithms for useful image metrics. • Contributions: • Scalable matching for local representations • Sub-linear time search with matching • Recently extended to semi-supervised hash functions for learned metrics • (See Jain, Kulis, & Grauman, CVPR 2008)

  42. Trevor Darrell trevor@eecs.berkeley.edu Kristen Grauman grauman@cs.utexas.edu • Relevant papers: • P. Jain, B. Kulis, and K. Grauman. Fast Image Search for Learned Metrics. To appear, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, Alaska, June 2008. • K. Grauman and T. Darrell. Pyramid Match Hashing: Sub-Linear Time Indexing Over Partial Correspondences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, June 2007. • K. Grauman and T. Darrell.  The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Beijing, China, October 2005.

  43. Outline • Photo-based Question Answering • Tom Yeh • Efficient indexing with local image features • Kristen Grauman • Multimodal Sense Disambiguation for Visual Category Models • Kate Saenko

  44. Multimodal Sense Disambiguation for Semi-Supervised Learning of Object Categories from the Web Kate Saenko Trevor Darrell MIT CSAIL UC Berkeley EECS & ICSI

  45. Clutter and sense ambiguity • Tag-based retrieval returns a lot of clutter • One approach: bootstrap from a seed image set • E.g., Fei-Fei et al., OPTIMOL • But how to get unusual appearances of a category?

  46. Topic models for image clustering • Latent Dirichlet Allocation • Unsupervised learning of a latent topic space • Distance in topic space groups together similar images
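As a rough sketch of this idea (not the authors' system), LDA can be fit to bag-of-visual-words count vectors with scikit-learn; the synthetic counts, topic count, and seeds below are invented for illustration:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Toy bag-of-words counts for 6 "images" over a 10-word vocabulary:
# 3 images dominated by the first half of the vocabulary, 3 by the
# second half, i.e., two latent topics.
rng = np.random.default_rng(0)
docs = np.vstack(
    [np.hstack([rng.integers(5, 15, 5), rng.integers(0, 2, 5)])
     for _ in range(3)] +
    [np.hstack([rng.integers(0, 2, 5), rng.integers(5, 15, 5)])
     for _ in range(3)])

lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(docs)  # per-image topic proportions

# Each row is a distribution over topics; distances between rows
# group images with similar content.
assert theta.shape == (6, 2)
assert np.allclose(theta.sum(axis=1), 1.0)
```

Replacing the word counts with quantized local descriptors (and, in the fused model, text words) gives the multimodal topic space used for similarity.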

  47. Mouse? A multimodal similarity measure can discover unusual appearances

  48. Fused LDA

  49. Multiple senses • Bass: fish? musical instrument? • Mouse: computer? animal? • A topic model allows segregation of distinct senses: use seed data to identify inlier multimodal topics; then either 1) select the single best inlier topic or 2) threshold to multiple topics; finally, compute distance based on the selected latent dimensions
