Advanced Image Retrieval through Bundling Features for Duplicate Web Image Search

Bundling Features for Large Scale Partial-Duplicate Web Image Search
Zhong Wu, Qifa Ke, Michael Isard, and Jian Sun
Microsoft Research

Outline
• Introduction
• Bundled Features
• Image Retrieval using Bundled Features
• Experimental Results

Introduction
Our goal, given a query image, is to locate its near- and partial-duplicate images in a large corpus of web images.

State-of-the-art large scale image retrieval systems have relied on quantizing local SIFT descriptors into visual words, and then applying scalable textual indexing and retrieval schemes. Bag-of-words representations, however:
1. reduce the discriminative power of image features due to feature quantization;
2. ignore geometric relationships among visual words.

Geometric verification therefore becomes an important post-processing step for reaching reasonable retrieval precision, especially for low-resolution images. But full geometric verification is computationally expensive, so in practice it is applied only to a subset of the top-ranked candidate images. For web image retrieval, the number of near or partial duplicates can be large, and applying full geometric verification to only these top-ranked images may not be sufficient for good recall.

Bundled Features
• SIFT: The discriminative power of the quantized SIFT feature decreases rapidly, resulting in many false positive matches between individual features.
• MSER (Maximally Stable Extremal Regions): False positive matches remain an issue for large image databases.
• Bundled Feature = SIFT + MSER

Image Retrieval using Bundled Features
• Feature quantization
Use hierarchical k-means to obtain a vocabulary of one million visual words from a training set of 50 thousand images. Use a k-d tree to organize these visual words for nearest-neighbor search during quantization. To reduce quantization error, we use a soft quantization scheme [13], mapping a descriptor to its n nearest visual words in the k-d tree.
[13] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In CVPR, 2008.
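The soft quantization step can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it scans a tiny hypothetical vocabulary by brute force instead of querying a k-d tree over one million words, and the names `soft_quantize`, `vocabulary`, and `n` are invented for the example.

```python
import heapq
import math


def soft_quantize(descriptor, vocabulary, n=2):
    """Return the IDs of the n visual words nearest to the descriptor.

    vocabulary is a list of word centers; a word's ID is its list index.
    Brute force is used here for clarity; at a realistic vocabulary size
    a k-d tree (or similar index) would replace this linear scan.
    """
    distances = (
        (math.dist(descriptor, center), word_id)
        for word_id, center in enumerate(vocabulary)
    )
    # nsmallest returns (distance, word_id) pairs in ascending distance order.
    return [word_id for _, word_id in heapq.nsmallest(n, distances)]


# Toy 2-D "descriptors" (assumption); real SIFT descriptors have 128 dimensions.
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
words = soft_quantize((0.9, 0.2), vocabulary, n=2)
```

With n = 1 this reduces to ordinary hard quantization; larger n trades index size for tolerance to descriptors that fall near a boundary between words.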

K-D Tree
pointList = [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]
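To make the k-d tree slide concrete, here is a minimal sketch that builds a tree from the same six points by median splitting and then runs a nearest-neighbor query. The slide shows only the point list, so the construction rule (alternate axes, split at the median) and the names `Node`, `build_kdtree`, and `nearest` are assumptions chosen for illustration.

```python
from collections import namedtuple

Node = namedtuple("Node", ["point", "left", "right"])


def build_kdtree(points, depth=0):
    """Build a k-d tree by splitting at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    median = len(ordered) // 2
    return Node(
        point=ordered[median],
        left=build_kdtree(ordered[:median], depth + 1),
        right=build_kdtree(ordered[median + 1:], depth + 1),
    )


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, target, depth=0, best=None):
    """Return the stored point closest to target (Euclidean distance)."""
    if node is None:
        return best
    if best is None or squared_distance(node.point, target) < squared_distance(best, target):
        best = node.point
    axis = depth % len(target)
    diff = target[axis] - node.point[axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, depth + 1, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < squared_distance(best, target):
        best = nearest(far, target, depth + 1, best)
    return best


point_list = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(point_list)
closest = nearest(tree, (9, 2))  # (8, 1) for this point list
```

For this point list the root is (7, 2), with (5, 4) and (9, 6) as its children. In the retrieval setting the stored points would be visual-word centers and the target a SIFT descriptor.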

Matching bundled features

Membership term
Geometric term
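The slides name the two terms but their formulas did not survive in this transcript, so the following is only one plausible reading, with every detail an assumption: the membership term counts the visual words two bundles share, the geometric term is the negated number of adjacent order flips among the shared words (taking the smaller of the x and y counts), and the two are combined with a weight `lam` that stands in for λ. The dictionary representation, the function names, and the default weight are hypothetical.

```python
def order_inconsistency(query_coords, candidate_coords):
    """Count adjacent pairs, taken in query order, whose candidate order is flipped."""
    by_query = sorted(range(len(query_coords)), key=lambda i: query_coords[i])
    ranked = [candidate_coords[i] for i in by_query]
    return sum(1 for a, b in zip(ranked, ranked[1:]) if a > b)


def bundle_score(query, candidate, lam=1.0):
    """Score two bundles, each a dict of visual word ID -> (x, y).

    Assumes each visual word occurs at most once per bundle.
    lam is an arbitrary illustrative weight, not a value from the slides.
    """
    common = sorted(set(query) & set(candidate))
    membership = len(common)  # assumed membership term: shared visual words
    flips_x = order_inconsistency(
        [query[w][0] for w in common], [candidate[w][0] for w in common]
    )
    flips_y = order_inconsistency(
        [query[w][1] for w in common], [candidate[w][1] for w in common]
    )
    geometric = -min(flips_x, flips_y)  # assumed geometric term: order penalty
    return membership + lam * geometric


query = {10: (1, 1), 20: (2, 2), 30: (3, 3)}
consistent = {10: (5, 5), 20: (6, 6), 30: (7, 7), 99: (0, 0)}
scrambled = {10: (7, 7), 20: (6, 6), 30: (5, 5)}

score_consistent = bundle_score(query, consistent)
score_scrambled = bundle_score(query, scrambled)
```

Both candidates share three words with the query, but the one whose features appear in reversed order is penalized, which is the kind of weak geometric constraint the bundle level makes cheap to enforce.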

Inverted index • Indexing and retrieval

Voting TF-IDF
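A minimal sketch of inverted-index retrieval with TF-IDF voting is given below. The slides do not show how the bundle matching score enters the vote, so this example uses plain bag-of-words TF-IDF weights; the function names, the tiny corpus, and the exact weighting formula are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def build_index(images):
    """Map each visual word to {image_id: term frequency}.

    images: dict of image_id -> list of visual word IDs.
    """
    index = defaultdict(dict)
    for image_id, words in images.items():
        for word, count in Counter(words).items():
            index[word][image_id] = count / len(words)
    return index


def search(index, num_images, query_words):
    """Rank images by accumulated TF-IDF votes from the query's visual words."""
    votes = defaultdict(float)
    for word, q_count in Counter(query_words).items():
        postings = index.get(word)
        if not postings:
            continue  # word never seen in the corpus
        idf = math.log(num_images / len(postings))
        q_tf = q_count / len(query_words)
        for image_id, tf in postings.items():
            votes[image_id] += (q_tf * idf) * (tf * idf)
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical corpus: three images described by visual word IDs.
images = {"a": [1, 2, 3], "b": [1, 4, 5], "c": [6, 7, 8]}
index = build_index(images)
ranked = search(index, len(images), [2, 3, 1])
```

Only images that share at least one word with the query are touched, which is what keeps the lookup scalable: image "c" never receives a vote and does not appear in the ranking.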

Experimental Results

Re-ranking

Impact of λ

Runtime

Conclusion
• Bundled features are more discriminative than individual SIFT features.
• They allow us to enforce simple and robust geometric constraints at the bundle level.
