Image Context, Efficient Indexing, and Sense-Specific Category Models. Trevor Darrell, Kristen Grauman (*), Tom Yeh, Kate Saenko. MIT CSAIL; UC Berkeley EECS & ICSI; (*) UT Austin CS
Outline • Photo-based Question Answering (Tom Yeh) • Efficient indexing with local image features (Kristen Grauman) • Multimodal Sense Disambiguation for Visual Category Models (Kate Saenko)
Photo-based Question Answering. Tom Yeh, John Lee, Trevor Darrell. MIT CSAIL; UC Berkeley EECS & ICSI
Text-based QA systems, e.g., Yahoo! Answers
An easier example: current image-matching and question-matching technologies let us answer simpler photo-based questions automatically.
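System architecture (figure): the system answers questions in three layers.
• Layer 1, Template-based QA: answers from structured sources (Books, Buildings, WWW), e.g., “Who is the architect?” → Frank Gehry.
• Layer 2, IR-based QA: matches the question against resolved questions, e.g., “How many floors?” → resolved question “How many stories?” → 9 floors.
• Layer 3, Human-based QA: passes the question to the community, e.g., “Is there any problem?” → “People are getting lost a lot.”
• Also shown: “What labs are here?” → CSAIL.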
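The three layers can be read as a fallback cascade. The following is a minimal sketch that assumes each layer is tried in order until one returns an answer. The layer functions, the fact tables, the photo id `building_1`, and the single synonym rewrite are all hypothetical toy stand-ins. They are not the prototype's actual components.

```python
from typing import Callable, List, Optional

# Hypothetical layer signature: (photo_id, question) -> answer or None.
Layer = Callable[[str, str], Optional[str]]


def template_qa(photo_id: str, question: str) -> Optional[str]:
    # Layer 1 (toy): look up a fact for a known question template about the
    # recognized object; the table stands in for books/buildings/web sources.
    facts = {("building_1", "who is the architect?"): "Frank Gehry"}
    return facts.get((photo_id, question.lower()))


def ir_qa(photo_id: str, question: str) -> Optional[str]:
    # Layer 2 (toy): retrieve a previously resolved question about the same
    # object; one synonym rewrite ("floors" -> "stories") stands in for
    # real text-similarity matching.
    resolved = {("building_1", "how many stories?"): "9 floors"}
    normalized = question.lower().replace("floors", "stories")
    return resolved.get((photo_id, normalized))


def community_qa(photo_id: str, question: str) -> Optional[str]:
    # Layer 3 (toy): hand the question to people; always reports forwarding.
    return f"[forwarded to community] {question}"


LAYERS: List[Layer] = [template_qa, ir_qa, community_qa]


def answer(photo_id: str, question: str, layers: List[Layer] = LAYERS) -> str:
    # Try each layer in order; the first non-None result wins.
    for layer in layers:
        result = layer(photo_id, question)
        if result is not None:
            return result
    return "no answer"


print(answer("building_1", "Who is the architect?"))  # Frank Gehry
print(answer("building_1", "How many floors?"))       # 9 floors
print(answer("building_1", "Is there any problem?"))  # [forwarded to community] Is there any problem?
```

Ordering the layers from cheapest and most automatic to most expensive means a person is asked only when the automatic layers fail.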
Prototype 1: Adding photos to a text-based QA system.
Prototype 3: Applying photo-based QA to mobile devices.
Efficient Image Indexing Methods for Scene and Object Recognition. Trevor Darrell, UC Berkeley EECS & ICSI; Kristen Grauman, University of Texas at Austin, Dept. of Computer Sciences
Fast image indexing. Goal: recognize locations and objects by matching queries against the database by image content.
Fast image indexing • Setting: a large and evolving image repository. • Key technical challenges: robustness to variable viewing conditions; queries are time-sensitive, but the database is huge. • Approach: develop sub-linear time search methods for “good” image representations and metrics.
Local features • Local features provide invariance to geometric and photometric variation. • Goal: fast correspondence-based search with local features.
(Figure: sources of variation that local image features must cope with: intra-class appearance, illumination, object pose, clutter, occlusions, and viewpoint.)
Local image features: describe component regions or patches separately. Examples: SIFT [Lowe]; Shape context [Belongie et al.]; Maximally Stable Extremal Regions [Matas et al.]; Superpixels [Ren et al.]; Spin images [Johnson and Hebert]; Geometric Blur [Berg et al.]; Salient regions [Kadir et al.]; Harris-Affine [Schmid et al.].
Partially matching sets of features • Optimal match: O(m^3) • Greedy match: O(m^2 log m) • Pyramid match: O(m), where m is the number of points per set. • The optimal match maximizes the total similarity of matched points; the pyramid match approximation makes large sets of features practical. [Grauman & Darrell, ICCV 2005]
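As a point of comparison for the O(m^2 log m) row, here is a minimal sketch of one plausible greedy matcher. It assumes Euclidean distance as the (inverse) similarity and repeatedly takes the closest still-unmatched pair. This sketch is an illustration only. It is not the pyramid match, and it is not necessarily the greedy variant the authors benchmarked.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def greedy_match(xs: List[Point], ys: List[Point]) -> Tuple[List[Tuple[int, int]], float]:
    """Greedy partial matching between two point sets.

    Repeatedly takes the closest still-unmatched pair (smallest distance,
    i.e. highest similarity). Sorting all m*n pairwise distances dominates
    the cost: O(m^2 log m) when both sets have m points.
    """
    pairs = sorted(
        (math.dist(x, y), i, j)
        for i, x in enumerate(xs)
        for j, y in enumerate(ys)
    )
    used_x, used_y = set(), set()
    matches: List[Tuple[int, int]] = []
    total_distance = 0.0
    target = min(len(xs), len(ys))  # every point of the smaller set gets a partner
    for d, i, j in pairs:
        if len(matches) == target:
            break
        if i not in used_x and j not in used_y:
            used_x.add(i)
            used_y.add(j)
            matches.append((i, j))
            total_distance += d
    return matches, total_distance


matches, cost = greedy_match([(0, 0), (10, 0)], [(1, 0), (10, 1), (50, 50)])
print(matches, cost)  # [(0, 0), (1, 1)] 2.0
```

The optimal match would instead solve an assignment problem, for example with the Hungarian algorithm at O(m^3), which is why neither exact nor greedy matching scales to large feature sets.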
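Counting matches with histogram intersection: I(H(X), H(Y)) = \sum_j \min(H(X)_j, H(Y)_j), the number of points from X and Y that can be paired within the same histogram bin.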
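To make the counting concrete, here is a minimal sketch of histogram intersection on 1-D points. The bin width and the sample points are made up for illustration, and the sparse `Counter` representation is a choice of this sketch.

```python
from collections import Counter
from typing import Sequence


def histogram(points: Sequence[float], bin_width: float) -> Counter:
    # Sparse 1-D histogram: bin index -> number of points in that bin.
    return Counter(int(p // bin_width) for p in points)


def histogram_intersection(h_x: Counter, h_y: Counter) -> int:
    # I(H(X), H(Y)) = sum_j min(H(X)_j, H(Y)_j); bins missing from h_y count as 0.
    return sum(min(count, h_y[b]) for b, count in h_x.items())


h_x = histogram([0.5, 1.2, 1.7, 2.1, 2.3], bin_width=1.0)  # bins {0: 1, 1: 2, 2: 2}
h_y = histogram([0.2, 0.9, 2.8], bin_width=1.0)            # bins {0: 2, 2: 1}
print(histogram_intersection(h_x, h_y))  # 2
```

At this bin width, only two points of Y can be paired with a point of X in the same bin: one in bin 0 and one in bin 2.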
Example pyramid match (figure): the number of “new” matches found at each level.
Example pyramid match (figure): pyramid match versus optimal match.
How can we index efficiently over correspondences? (Figure: a query image is compared against a large database of N images, and the most similar images according to local feature correspondences are returned, ranked 1, 2, 3, …, using approximate matching.)
Image search with matching-sensitive hash functions • Main idea: • Map point sets to a vector space in such a way that a dot product reflects partial match similarity (normalized pyramid match value). • Exploit random hyperplane properties to construct matching-sensitive hash functions. • Perform approximate similarity search on hashed examples. [Grauman & Darrell, CVPR 2007]
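Locality Sensitive Hashing (LSH) [Indyk and Motwani 1998, Charikar 2002] • Guarantees “approximate” nearest neighbors in sub-linear time, given appropriate hash functions. (Figure: the N database items X_i and the query Q are mapped by hash functions h_{r1…rk} to k-bit keys such as 110101, 110111, and 111101; only the << N items whose keys collide with the query's are searched.)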
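The bucketing step in the figure can be sketched as a single hash table keyed by k concatenated hash bits. This sketch assumes random-hyperplane bits as the hash family and uses only one table. Practical LSH usually builds several tables and probes them all, and the class and function names here are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
HashFn = Callable[[Vector], int]


def make_hyperplane_hash(dim: int, rng: random.Random) -> HashFn:
    # Assumed hash family for this sketch: one random-hyperplane bit.
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return lambda x: 1 if sum(ri * xi for ri, xi in zip(r, x)) >= 0 else 0


class LSHTable:
    """Single hash table keyed by the concatenation of k hash bits."""

    def __init__(self, hash_fns: List[HashFn]) -> None:
        self.hash_fns = hash_fns
        self.buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
        self.items: List[Vector] = []

    def key(self, x: Vector) -> Tuple[int, ...]:
        return tuple(h(x) for h in self.hash_fns)

    def insert(self, x: Vector) -> None:
        self.buckets[self.key(x)].append(len(self.items))
        self.items.append(x)

    def candidates(self, q: Vector) -> List[int]:
        # Only items whose key equals the query's key are returned; these are
        # then ranked by exact similarity instead of scanning all N items.
        return list(self.buckets.get(self.key(q), []))


rng = random.Random(0)
table = LSHTable([make_hyperplane_hash(4, rng) for _ in range(8)])
for _ in range(200):
    table.insert([rng.gauss(0.0, 1.0) for _ in range(4)])
print(len(table.candidates(table.items[3])))  # typically far fewer than 200
```

Increasing k shrinks the buckets and speeds up search, but it also raises the chance that a true near neighbor lands in a different bucket.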
LSH functions for dot products • The probability that a random hyperplane separates two unit vectors depends on the angle θ between them: with h_r(x) = 1 if r · x ≥ 0 and 0 otherwise, for r drawn from a zero-mean Gaussian, Pr[h_r(x) = h_r(y)] = 1 − θ(x, y)/π. • High dot product: unlikely to split. • Lower dot product: likely to split. [Goemans and Williamson 1995, Charikar 2004]
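The angle relationship can be checked numerically. The following is a minimal sketch that draws Gaussian random hyperplanes and compares the empirical rate at which two vectors fall on the same side with the predicted value 1 − θ/π. The vectors, trial count, and seed are arbitrary choices for illustration.

```python
import math
import random
from typing import Sequence


def hyperplane_bit(r: Sequence[float], x: Sequence[float]) -> int:
    # h_r(x) = 1 if r . x >= 0 else 0
    return 1 if sum(ri * xi for ri, xi in zip(r, x)) >= 0 else 0


def estimate_collision_rate(x: Sequence[float], y: Sequence[float],
                            trials: int = 20000, seed: int = 0) -> float:
    # Fraction of random hyperplanes that do NOT separate x and y.
    rng = random.Random(seed)
    same = 0
    for _ in range(trials):
        r = [rng.gauss(0.0, 1.0) for _ in range(len(x))]
        same += hyperplane_bit(r, x) == hyperplane_bit(r, y)
    return same / trials


def predicted_collision_rate(x: Sequence[float], y: Sequence[float]) -> float:
    # 1 - theta / pi, with theta the angle between x and y.
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    theta = math.acos(max(-1.0, min(1.0, dot / norms)))
    return 1.0 - theta / math.pi


x, y = (1.0, 0.0), (1.0, 1.0)  # 45 degrees apart
print(predicted_collision_rate(x, y))  # 0.75
print(estimate_collision_rate(x, y))   # close to 0.75
```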
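A useful property of histogram intersection • H(X) = [1, 3, 5], padded unary encoding [1 0 0 0 0 | 1 1 1 0 0 | 1 1 1 1 1] • H(Y) = [2, 0, 3], padded unary encoding [1 1 0 0 0 | 0 0 0 0 0 | 1 1 1 0 0] • Element-wise minimum: [1, 0, 3], so the intersection is 1 + 0 + 3 = 4. • Dot product of the unary encodings: 1+0+0+0+0+0+0+0+0+0+1+1+1+0+0 = 4.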
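The identity on this slide can be verified directly. This sketch encodes each bin count c as c ones followed by (pad − c) zeros, using the slide's example histograms [1, 3, 5] and [2, 0, 3] with a padding length of 5. The dot product of the two encodings matches the histogram intersection.

```python
from typing import List, Sequence


def padded_unary(hist: Sequence[int], pad: int) -> List[int]:
    """Encode each bin count c as c ones followed by (pad - c) zeros."""
    out: List[int] = []
    for c in hist:
        if not 0 <= c <= pad:
            raise ValueError("bin count exceeds padding length")
        out.extend([1] * c + [0] * (pad - c))
    return out


def dot(u: Sequence[int], v: Sequence[int]) -> int:
    return sum(a * b for a, b in zip(u, v))


h_x, h_y = [1, 3, 5], [2, 0, 3]
u_x, u_y = padded_unary(h_x, 5), padded_unary(h_y, 5)
print(u_x)  # [1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1]
print(u_y)  # [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
print(dot(u_x, u_y), sum(min(a, b) for a, b in zip(h_x, h_y)))  # 4 4
```

Within each bin's block, the overlapping leading ones number exactly min(count_x, count_y). This is why intersection becomes a dot product, which the hashing step relies on.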
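Pyramid match definition • The intersection difference I_i − I_{i−1} is the number of new matches formed at level i, where I_i = I(H_i(X), H_i(Y)) and I_{−1} = 0. • The un-normalized pyramid match, expressed as a sum of weighted intersections: \tilde{P}(X, Y) = \sum_{i=0}^{L} w_i (I_i - I_{i-1}) = \sum_{i=0}^{L-1} (w_i - w_{i+1}) I_i + w_L I_L.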
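Both forms of the pyramid match can be computed side by side for a toy case. This minimal sketch assumes 1-D integer points in [0, 2^L), bins of side 2^i at level i (level 0 finest), and weights w_i = 1/2^i. The weights and sample sets are illustrative choices, not values from the slides.

```python
from collections import Counter
from typing import List, Sequence


def level_histogram(points: Sequence[int], level: int) -> Counter:
    # Bins of side 2**level (level 0 = finest); p >> level is the bin index.
    return Counter(p >> level for p in points)


def intersection(h_x: Counter, h_y: Counter) -> int:
    return sum(min(c, h_y[b]) for b, c in h_x.items())


def intersections(xs: Sequence[int], ys: Sequence[int], num_levels: int) -> List[int]:
    # I_0 ... I_L for integer points in [0, 2**num_levels).
    return [intersection(level_histogram(xs, i), level_histogram(ys, i))
            for i in range(num_levels + 1)]


def pyramid_match_new_matches(xs, ys, num_levels: int) -> float:
    # sum_i w_i * (I_i - I_{i-1}) with I_{-1} = 0 and w_i = 1 / 2**i.
    total, prev = 0.0, 0
    for i, inter in enumerate(intersections(xs, ys, num_levels)):
        total += (inter - prev) / 2 ** i
        prev = inter
    return total


def pyramid_match_weighted(xs, ys, num_levels: int) -> float:
    # sum_{i<L} (w_i - w_{i+1}) * I_i + w_L * I_L
    w = [1 / 2 ** i for i in range(num_levels + 1)]
    inters = intersections(xs, ys, num_levels)
    total = w[num_levels] * inters[num_levels]
    for i in range(num_levels):
        total += (w[i] - w[i + 1]) * inters[i]
    return total


xs, ys = [1, 4, 9], [1, 5, 14]
print(intersections(xs, ys, 4))              # [1, 2, 2, 3, 3]
print(pyramid_match_new_matches(xs, ys, 4))  # 1.625
print(pyramid_match_weighted(xs, ys, 4))     # 1.625
```

Each level costs one pass over the points, so the total work is linear in m for a fixed number of levels, in line with the O(m) entry on the complexity slide.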
Vector encoding of pyramids (figure): point set → multi-resolution histogram → sparse count vector → implicit unary encoding → weighted sparse count vector, with the level blocks weighted by w0−w1, w1−w2, w2−w3, and w3 (coarsest level).
Vector encoding of pyramids (continued; level blocks weighted by w0−w1, w1−w2, w2−w3, w3) • The dot product between embedded point sets yields the pyramid match kernel value. • The length of an embedded point set is equivalent to its self-similarity.
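The embedding can be sketched with a dense stand-in for the implicit unary encoding. This sketch carries the same toy assumptions as a 1-D pyramid: integer points in [0, 2^L) and w_i = 1/2^i. It also assumes each level's unary block is scaled by sqrt(w_i − w_{i+1}) (sqrt(w_L) at the top). That scaling is a construction chosen here so that a plain dot product reproduces the weighted sum of intersections. It is not necessarily how the original implementation stores the weights, and a real implementation would keep the vector sparse.

```python
import math
from collections import Counter
from typing import List, Sequence


def embed(points: Sequence[int], num_levels: int, pad: int) -> List[float]:
    """Dense stand-in for the implicit, weighted unary encoding of a pyramid."""
    if len(points) > pad:
        raise ValueError("pad must be at least the number of points")
    w = [1 / 2 ** i for i in range(num_levels + 1)]
    vec: List[float] = []
    for i in range(num_levels + 1):
        scale = math.sqrt(w[i] - w[i + 1] if i < num_levels else w[i])
        hist = Counter(p >> i for p in points)
        for b in range(2 ** (num_levels - i)):  # every bin at level i
            vec.extend([scale] * hist[b] + [0.0] * (pad - hist[b]))
    return vec


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def normalized_pyramid_match(xs, ys, num_levels: int, pad: int) -> float:
    fx, fy = embed(xs, num_levels, pad), embed(ys, num_levels, pad)
    # Cosine of the embedded vectors = PM(X, Y) / sqrt(PM(X, X) * PM(Y, Y)).
    return dot(fx, fy) / math.sqrt(dot(fx, fx) * dot(fy, fy))


fx, fy = embed([1, 4, 9], 4, 3), embed([1, 5, 14], 4, 3)
print(round(dot(fx, fy), 6), round(dot(fx, fx), 6))  # 1.625 3.0
```

The cosine form is what allows the random-hyperplane hash functions to be applied directly to the embedded vectors.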
Matching-sensitive hash functions (plot: probability of collision, i.e., hash bits equal, versus the normalized pyramid match kernel value, i.e., normalized partial match similarity)
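Pyramid match hashing • Embed point sets as pyramids. • Apply randomized hash functions h_{r1…rk} to the embeddings. • The probability of collision is a function of the normalized partial match similarity. • Only the << N database items X_i whose keys collide with the query Q's key (e.g., 110101, 110111 vs. 111101) are searched. • Guaranteed retrieval of (1+ε)-approximate nearest neighbors in O(N^{1/(1+ε)}) time.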
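Putting the pieces together, here is a minimal sketch of pyramid match hashing on toy 1-D point sets. It reuses the assumed dense embedding (integer points in [0, 2^L), w_i = 1/2^i, sqrt-scaled level blocks) and k random-hyperplane bits per key. The database, the names, and k = 8 are hypothetical. The per-bit collision probability is written as 1 − arccos(s)/π, where s is the normalized pyramid match value, so collisions become more likely as partial match similarity grows.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def embed(points: Sequence[int], num_levels: int, pad: int) -> List[float]:
    # Dense weighted unary encoding of a toy 1-D pyramid (see assumptions above).
    w = [1 / 2 ** i for i in range(num_levels + 1)]
    vec: List[float] = []
    for i in range(num_levels + 1):
        scale = math.sqrt(w[i] - w[i + 1] if i < num_levels else w[i])
        hist = Counter(p >> i for p in points)
        for b in range(2 ** (num_levels - i)):
            vec.extend([scale] * hist[b] + [0.0] * (pad - hist[b]))
    return vec


def make_key_fn(dim: int, k: int, seed: int = 0) -> Callable[[Sequence[float]], Tuple[int, ...]]:
    # k random hyperplanes -> k-bit key; identical embeddings always collide.
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]

    def key(vec: Sequence[float]) -> Tuple[int, ...]:
        return tuple(1 if sum(r * v for r, v in zip(plane, vec)) >= 0 else 0
                     for plane in planes)
    return key


def bit_collision_probability(normalized_pm: float) -> float:
    # Per-bit collision probability 1 - theta / pi, with cos(theta) equal to
    # the normalized pyramid match value of the two embedded sets.
    return 1.0 - math.acos(max(-1.0, min(1.0, normalized_pm))) / math.pi


L, PAD = 4, 3
key = make_key_fn(len(embed([0], L, PAD)), k=8)
database = {"A": [1, 4, 9], "B": [1, 5, 14], "C": [0, 2, 15]}
buckets: dict = {}
for name, pts in database.items():
    buckets.setdefault(key(embed(pts, L, PAD)), []).append(name)
query = [9, 1, 4]  # same set as "A", listed in a different order
print(buckets[key(embed(query, L, PAD))])  # bucket contains "A"
```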
Indexing object images • Caltech101 data set: 101 categories, 40-800 images per class (data provided by Fei-Fei, Fergus, and Perona). • Features: densely sampled; SIFT descriptor + spatial; on average m = 1140 features per set. (Figure: example query object.)
Results: indexing object images • Query time is controlled by the required accuracy. • E.g., searching less than 2% of the database examples gives accuracy close to that of a linear scan. (Plot: k-NN error rate versus epsilon (ε); the ε axis runs from slower search to faster search.)
Summary • Content-based queries for location recognition demand fast search algorithms for useful image metrics. • Contributions: • Scalable matching for local representations. • Sub-linear time search over partial-match correspondences. • Recently extended to semi-supervised hash functions for learned metrics (see Jain, Kulis, & Grauman, CVPR 2008).
Contact: Trevor Darrell (trevor@eecs.berkeley.edu), Kristen Grauman (grauman@cs.utexas.edu) • Relevant papers: • P. Jain, B. Kulis, and K. Grauman. Fast Image Search for Learned Metrics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, Alaska, June 2008 (to appear). • K. Grauman and T. Darrell. Pyramid Match Hashing: Sub-Linear Time Indexing Over Partial Correspondences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, June 2007. • K. Grauman and T. Darrell. The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Beijing, China, October 2005.
Multimodal Sense Disambiguation for Semi-Supervised Learning of Object Categories from the Web. Kate Saenko, Trevor Darrell. MIT CSAIL; UC Berkeley EECS & ICSI
Clutter and sense ambiguity • Tag-based retrieval returns a lot of clutter. • One approach: bootstrap from a seed image set, e.g., Fei-Fei et al., OPTIMOL. • But how can we capture unusual appearances of a category?
Topic models for image clustering • Latent Dirichlet Allocation (LDA) • Unsupervised learning of a latent topic space • Distance in topic space groups similar images together.
“Mouse?” example: a multimodal similarity measure can discover unusual appearances of a category.
Multiple senses • Bass: fish? Musical instrument? • Mouse: computer? Animal? • A topic model allows segregation of distinct senses: • Use seed data to identify inlier multimodal topics. • Two possible approaches: (1) select the single best inlier topic, or (2) threshold to select multiple topics. • Compute distances using only the selected latent dimensions.
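The two topic-selection approaches can be sketched as follows, assuming a topic model has already produced a topic-proportion vector for each image. Several choices here are assumptions for illustration rather than the authors' method: a topic's inlier score is its average proportion over the seed images, the distance is Euclidean over the selected topics only, and candidates are ranked by their distance to the seed average. All function names and the sample vectors are hypothetical.

```python
import math
from typing import List, Optional, Sequence

TopicDist = Sequence[float]  # per-image topic proportions from a fitted topic model


def topic_scores(seed: Sequence[TopicDist]) -> List[float]:
    # Average topic proportions over the seed images (assumed inlier score).
    k = len(seed[0])
    return [sum(d[t] for d in seed) / len(seed) for t in range(k)]


def select_topics(scores: Sequence[float], threshold: Optional[float] = None) -> List[int]:
    if threshold is None:
        # Approach 1: keep only the single best inlier topic.
        return [max(range(len(scores)), key=lambda t: scores[t])]
    # Approach 2: keep every topic whose score passes the threshold.
    return [t for t, s in enumerate(scores) if s >= threshold]


def restricted_distance(a: TopicDist, b: TopicDist, topics: Sequence[int]) -> float:
    # Euclidean distance using only the selected latent dimensions.
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in topics))


def rank_candidates(candidates: Sequence[TopicDist], seed: Sequence[TopicDist],
                    threshold: Optional[float] = None) -> List[int]:
    # Order candidate images by distance to the seed average on selected topics.
    scores = topic_scores(seed)
    topics = select_topics(scores, threshold)
    return sorted(range(len(candidates)),
                  key=lambda i: restricted_distance(candidates[i], scores, topics))


seed = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
candidates = [[0.1, 0.1, 0.8], [0.6, 0.3, 0.1], [0.3, 0.6, 0.1]]
print(select_topics(topic_scores(seed)))       # [0]
print(select_topics(topic_scores(seed), 0.2))  # [0, 1]
print(rank_candidates(candidates, seed))       # [1, 2, 0]
```

Approach 1 keeps the single dominant sense. Approach 2 can keep several topics that together cover unusual appearances of that sense, at the risk of admitting a topic from another sense if the threshold is too low.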