
Object retrieval with large vocabularies and fast spatial matching

Presentation Transcript


  1. Object retrieval with large vocabularies and fast spatial matching
  James Philbin1, Ondrej Chum1, Michael Isard2, Josef Sivic1, and Andrew Zisserman1
  1Department of Engineering Science, University of Oxford; 2Microsoft Research, Silicon Valley
  CVPR 2007

  2. Overview
  • Problem
    • Input: a user-selected region of a query image
    • Return: a ranked list of images containing the same object, retrieved from a large corpus
  • Objective
    • A promising step towards “web-scale” image corpora
  • Improvements
    • Improving the visual vocabulary
    • Incorporating spatial information into the ranking
  • Examples

  3. Datasets
  • Source: Flickr
  • Oxford 5K dataset
    • Crawled by searching for particular Oxford landmarks (“Oxford Christ Church”, “Oxford Radcliffe Camera”, …) together with “Oxford” itself
    • 5,062 images (1,024×768)
  • 100K dataset
    • 145 most popular Flickr tags
    • 99,782 images (1,024×768)
  • 1M dataset
    • 450 most popular Flickr tags
    • 1,040,801 images (500×333)

  4. Indexing the dataset
  • Image description
    • Affine-invariant Hessian regions: about 3,300 regions on a 1,024×768 image
    • SIFT descriptor: 128-D, a 4×4 spatial grid of 8-direction gradient histograms
  • Model
    • Bag of visual words: quantize the visual descriptors to index the image
  • Search engine
    • L2 distance between tf-idf vectors as the similarity
    • tf-idf weighting scheme: more commonly occurring = less discriminative = smaller weight (see the sketch below)
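
To make the scoring step concrete, here is a minimal sketch of tf-idf weighted bag-of-visual-words retrieval with L2 distance, assuming the region descriptors have already been quantized to visual-word ids. The function names and the dense-vector representation are illustrative only; an engine at the paper's scale would use an inverted file rather than dense vectors.

```python
import numpy as np

def tfidf_vectors(word_ids_per_image, vocab_size):
    """word_ids_per_image: list of 1-D int arrays of visual-word ids per image."""
    n_images = len(word_ids_per_image)
    counts = np.zeros((n_images, vocab_size))
    for i, ids in enumerate(word_ids_per_image):
        np.add.at(counts[i], ids, 1)                      # term frequencies
    df = (counts > 0).sum(axis=0)                         # document frequency per word
    idf = np.log(n_images / np.maximum(df, 1))            # rarer word -> larger weight
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    vecs = tf * idf
    # L2-normalise so that L2 distance gives a sensible ranking
    vecs /= np.maximum(np.linalg.norm(vecs, axis=1, keepdims=True), 1e-12)
    return vecs, idf

def rank_by_l2(query_vec, db_vecs):
    """Return database image indices ordered from most to least similar."""
    return np.argsort(np.linalg.norm(db_vecs - query_vec, axis=1))
```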

  5. Training the dictionary: k-means, approximate k-means (AKM), hierarchical k-means (HKM)

  6. AKM vs. HKM (slide figure: a 2-D k-d tree)
  • Traditional k-means
    • A single iteration costs O(NK) for N points and K cluster centres
  • Strategy: reduce the number of candidate nearest cluster centres
  • AKM
    • Approximate nearest neighbours: replace the exact nearest-neighbour computation with a search over 8 randomized k-d trees built on the cluster centres (see the sketch below)
    • Less than 1% of points are assigned differently from exact k-means for moderate values of K
  • HKM
    • “Vocabulary tree”: a small number (K = 10) of cluster centres at each level, giving K^n clusters at the n-th level
  • Quantization effects
    • AKM: the conjunction of trees gives an overlapping partition
    • HKM: points can additionally be assigned to some internal nodes
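
The AKM assignment step can be sketched as follows. This is a simplified stand-in for the paper's method, which searches 8 randomized k-d trees through a single shared priority queue; here each tree is instead built on a randomly rotated copy of the cluster centres and queried approximately (via cKDTree's eps parameter), and the best candidate across the trees is kept. All names are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_forest(centres, n_trees=8, seed=0):
    """One approximate-NN tree per random rotation of the cluster centres."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        # a random orthonormal rotation de-correlates the axis-aligned splits
        rot, _ = np.linalg.qr(rng.standard_normal((centres.shape[1],) * 2))
        forest.append((rot, cKDTree(centres @ rot)))
    return forest

def assign(points, forest):
    """Approximate nearest cluster centre for each point."""
    best_idx = np.zeros(len(points), dtype=int)
    best_d = np.full(len(points), np.inf)
    for rot, tree in forest:
        # eps > 0 makes each query approximate, so the trees genuinely differ
        d, idx = tree.query(points @ rot, k=1, eps=1.0)
        better = d < best_d
        best_d[better], best_idx[better] = d[better], idx[better]
    return best_idx
```

In a full AKM loop this assignment would alternate with recomputing each centre as the mean of its assigned points, exactly as in standard k-means.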

  7. Comparing vocabularies: k-means vs. AKM, HKM vs. AKM, and scaling up with AKM

  8. Ground truth
  • Dataset: the 5K dataset
  • Searching: manual, over the entire dataset, for 11 landmarks
  • Labels
    • Positive
      • Good: a nice, clear picture of the object
      • OK: more than 25% of the object is clearly visible
    • Null
      • Junk: less than 25% of the object is visible
    • Negative
      • Absent: the object is not present

  9. 5 queries for each landmark

  10. Evaluation
  • Precision: # of retrieved positive images / # of total retrieved images
  • Recall: # of retrieved positive images / # of total positive images
  • Average precision (AP): the area under the precision-recall curve for a query
  • Mean average precision (mAP): the average AP over the 5 queries for a landmark; the final mAP is the average of the per-landmark mAPs (see the sketch below)
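
As an illustration of this protocol, here is a minimal sketch of the AP/mAP computation. It uses one common formulation of AP (precision averaged at the ranks of the positives), which approximates the area under the precision-recall curve used in the paper; Junk images are assumed to have been removed from the ranked list beforehand. Names are illustrative.

```python
import numpy as np

def average_precision(ranked_ids, positive_ids):
    """AP for one query; positive_ids are the ground-truth Good/OK images."""
    positives = set(positive_ids)
    hits, precision_sum = 0, 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in positives:
            hits += 1
            precision_sum += hits / rank         # precision at this recall point
    return precision_sum / max(len(positives), 1)

def mean_average_precision(aps_per_landmark):
    """aps_per_landmark: dict mapping landmark -> list of APs (one per query)."""
    return float(np.mean([np.mean(aps) for aps in aps_per_landmark.values()]))
```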

  11. k-means vs. AKM

  12. HKM vs. AKM

  13. Recognition Benchmark D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 2161-2168, June 2006.

  14. Scaling up with AKM

  15. Spatial re-ranking

  16. Using spatial information
  • Usage: re-rank the top-ranked results
  • Procedure
    • Estimate a transformation between the query region and each target image
    • Refine the estimates to reduce the errors due to outliers
    • LO-RANSAC: RANdom SAmple Consensus with an additional local-optimization step
    • Re-rank the target images: score each target image by the sum of the idf values of its inlier visual words, and place verified images above unverified ones (see the sketch below)
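
The re-ranking step might look roughly like the sketch below. It assumes a verify function that returns the inlier visual-word ids for a given target image (the paper obtains these from a LO-RANSAC-style fit of a restricted transform; a hypothesis generator in that spirit is sketched after the next slide). The min_inliers threshold and all names are illustrative.

```python
def rerank(top_ranked, idf, verify, min_inliers=4):
    """top_ranked: target image ids in their initial tf-idf order.
    verify(image_id) -> list of inlier visual-word ids (empty if no fit found)."""
    rescored = []
    for initial_rank, image_id in enumerate(top_ranked):
        inliers = verify(image_id)
        verified = len(inliers) >= min_inliers
        score = sum(idf[w] for w in inliers)     # sum of idf over inlier words
        # verified images always sort above unverified ones; ties keep the
        # original tf-idf order
        rescored.append((not verified, -score, initial_rank, image_id))
    rescored.sort()
    return [image_id for *_, image_id in rescored]
```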

  17. Restricted transformations
  • Degrees of freedom
    • 3 dof: translation + isotropic scale, covering changes in zoom or distance
    • 4 dof: translation + anisotropic scale, covering foreshortening (perspective), either horizontal or vertical
    • 5 dof: translation + anisotropic scale + vertical shear
    • NOT covered: in-plane rotation
  • Hypotheses are generated from single region correspondences using the regions' shapes (see the sketch below)
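
Below is a sketch of the simplest, 3-dof case: a translation + isotropic-scale hypothesis generated from a single correspondence, using only each region's centre and characteristic scale, followed by a simple inlier count. The 4- and 5-dof cases would additionally use the regions' full affine shapes. The tolerance and all names are illustrative.

```python
import numpy as np

def hypothesis_3dof(q_centre, q_scale, t_centre, t_scale):
    """Translation + isotropic scale mapping the query region onto the target
    region, derived from a single region correspondence."""
    s = t_scale / q_scale
    t = np.asarray(t_centre, float) - s * np.asarray(q_centre, float)
    return s, t                                  # x_target ~ s * x_query + t

def count_inliers(correspondences, s, t, tol=10.0):
    """correspondences: iterable of (query_xy, target_xy) pixel coordinates."""
    n = 0
    for q_xy, t_xy in correspondences:
        pred = s * np.asarray(q_xy, float) + t
        if np.linalg.norm(pred - np.asarray(t_xy, float)) <= tol:
            n += 1
    return n
```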

  18. Comparing spatial rankings: different transformation types, large datasets, examples, and examples of errors

  19. Different transformation types

  20. Large datasets

  21. Examples

  22. Examples of errors

  23. Conclusion
  • Conclusion
    • A scalable visual object-retrieval system
  • Future work
    • More evaluation at larger scale
    • Including spatial information in the index
    • Moving some of the burden of spatial matching to the first ranking stage

  24. RANSAC http://en.wikipedia.org/wiki/RANSAC

  25. RANSAC example
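
For completeness, here is a generic RANSAC sketch in the spirit of the Wikipedia example referenced above: robustly fitting a line y = a·x + b to points contaminated with outliers, with a least-squares refit on the winning inlier set standing in for the local-optimization step. Sample count, threshold, and names are illustrative.

```python
import numpy as np

def ransac_line(points, n_iters=200, tol=1.0, seed=0):
    """points: (N, 2) array. Returns (a, b) for y = a*x + b with most inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if np.isclose(x1, x2):
            continue                              # degenerate (vertical) sample
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = np.abs(points[:, 1] - (a * points[:, 0] + b)) <= tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refit on the winning inlier set (a crude local-optimization step)
    a, b = np.polyfit(points[best_inliers, 0], points[best_inliers, 1], 1)
    return a, b
```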
