Learning to Match Images in Large-Scale Collections. Song Cao and Noah Snavely Cornell University. Workshop on Web-scale Vision and Social Media, ECCV 2012. A key problem in Web-scale vision is to discover visual connectivity among a large set of images. Trafalgar Dataset: 6981 images.
 
                
Learning to Match Images in Large-Scale Collections • Song Cao and Noah Snavely • Cornell University • Workshop on Web-scale Vision and Social Media, ECCV 2012
A key problem in Web-scale vision is to discover visual connectivity among a large set of images • Trafalgar Dataset: 6,981 images
An example connectivity graph: http://landmark.cs.cornell.edu/Landmarks3/0001/3951.0/graphview.html
Background • This task requires determining whether any two images overlap, i.e. image matching
Background • Image matching: • Extract SIFT features, find nearest-neighbor features, and apply RANSAC methods for all pairs of images • High accuracy, but high computational cost • Brute-force O(n^2) approach (20 pairs / sec): • 250,000 images ~ 31 billion image pairs; 1 year on 50 machines • 1,000,000 images ~ 500 billion pairs; 15 years on 50 machines • However, only a small fraction of all possible image pairs actually match (e.g. < 0.1% for city-sized datasets)
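The brute-force estimates above follow from counting unordered pairs, n(n-1)/2, and dividing by the aggregate matching rate. A minimal sketch of that arithmetic, assuming the slide's 20 pairs / sec is a per-machine rate and that work divides evenly across 50 machines (the function name is illustrative):

```python
def brute_force_cost(num_images, pairs_per_sec=20, machines=50):
    """Return (number of unordered image pairs, wall-clock years)."""
    pairs = num_images * (num_images - 1) // 2
    seconds = pairs / (pairs_per_sec * machines)
    years = seconds / (365 * 24 * 3600)
    return pairs, years

for n in (250_000, 1_000_000):
    pairs, years = brute_force_cost(n)
    print(f"{n:>9,} images: {pairs / 1e9:6.1f} billion pairs, ~{years:.1f} years")
```

Under these assumptions the two cases come out close to the slide's figures of roughly 1 year and 15 years.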
Goal • How can we classify image pairs into matching and non-matching both quickly and accurately? • [Figure: example matching and non-matching image pairs]
Bag-of-words Model • Widely used in image retrieval, serving as an approximate image similarity measure • Efficient and scalable in retrieval thanks to quantization and inverted files • Useful in choosing promising (similar) image candidates before matching to increase efficiency [1][2] • Usually uses tf-idf weighting as in text retrieval • Inverse Document Frequency (IDF) of word j = $\log(N / n_j)$ in its standard form, where $N$ is the number of images and $n_j$ is the number of images containing word j • [1] Agarwal, S., Snavely, N., Simon, I., Seitz, S., Szeliski, R.: Building Rome in a day. In: ICCV. (2009) • [2] Philbin, J., Sivic, J., Zisserman, A.: Geometric latent dirichlet allocation on a matching graph for large-scale image datasets. IJCV (2010)
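To make the tf-idf similarity concrete, here is a small sketch that scores image pairs from visual-word histograms. It assumes the standard idf form log(N / n_j) and L2-normalized vectors. The histograms are toy data, the function names are illustrative, and a real system would use inverted files rather than comparing vectors directly.

```python
import math
from collections import Counter

def idf_weights(histograms, vocab_size):
    """idf_j = log(N / n_j); words that occur in no image get weight 0."""
    n_images = len(histograms)
    doc_freq = Counter(word for h in histograms for word in h)
    return [math.log(n_images / doc_freq[j]) if doc_freq[j] else 0.0
            for j in range(vocab_size)]

def tfidf_vector(hist, idf):
    """Sparse, L2-normalized tf-idf vector as {word_id: weight}."""
    total = sum(hist.values())
    vec = {j: (count / total) * idf[j] for j, count in hist.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {j: v / norm for j, v in vec.items()} if norm > 0 else vec

def similarity(a, b):
    """Dot product of two sparse vectors (cosine similarity once normalized)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[j] for j, v in a.items() if j in b)

# Toy histograms {visual_word_id: count}: images 0 and 1 share words,
# image 2 shares none with them.
hists = [{0: 3, 1: 1}, {0: 2, 1: 2}, {2: 4, 3: 1}]
idf = idf_weights(hists, vocab_size=4)
vecs = [tfidf_vector(h, idf) for h in hists]
print(similarity(vecs[0], vecs[1]), similarity(vecs[0], vecs[2]))
```

Pairs with high scores are the candidates worth passing to the expensive matcher.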
Bag-of-words Model • However, the BoW similarity measure can be noisy • Example: TateModern dataset • ~120K randomly chosen testing image pairs • Average Precision: 0.458
Main Idea • Some visual words are more reliable than others for a given dataset • Better weights on visual words may increase prediction accuracy • Our approach: • Apply discriminative learning techniques to improve prediction accuracy and hence matching efficiency • Training data comes from matching itself (iterative learning and matching)
Weighting in BoW Model • Unsupervised approaches • tf-idf weighting • Burstiness [1] • Co-occurring set (“co-ocset”) [2] • Supervised approaches • Learning a fine vocabulary [3] • Selecting important features by matching [4] [1] Jegou, H., Douze, M., Schmid, C.: On the burstiness of visual elements. In: CVPR. (2009) [2] Chum, O., Matas, J.: Unsupervised discovery of co-occurrence in sparse high dimensional data. In: CVPR. (2010) [3] Mikulik, A., Perdoch, M., Chum, O., Matas, J.: Learning a fine vocabulary. In: ECCV. (2010) [4] Turcot, P., Lowe, D.: Better matching with fewer features: The selection of useful features in large database recognition problems. In: Workshop on Emergent Issues in Large Amounts of Visual Data, ICCV. (2009)
Our Approach Learn an SVM classifier for weighting with positive (matching) and negative (non-matching) image pairs
Our Approach Where does training data come from?
Iterative Learning & Matching • Given a collection of images (represented as BoW histogram vectors): • 1. Find a number of image pairs with high similarities (tf-idf similarities in the initial round; learned similarities in later rounds) • 2. Apply image matching on them • 3. Perform learning using the matching results and obtain a new similarity measure • 4. Repeat from 1 until done
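The four steps above can be sketched as a loop. This is only an illustration under several assumptions: `verify_pair` is a hypothetical stand-in for SIFT matching plus RANSAC, `toy_learn_weights` is a placeholder heuristic rather than the SVM used in the talk, and candidate pairs are enumerated by brute force, which is only sensible for toy data.

```python
from itertools import combinations

def iterative_matching(vectors, verify_pair, learn_weights, init_weights,
                       rounds=3, pairs_per_round=4):
    """Alternate between matching top-ranked pairs and re-learning weights.

    verify_pair:   hypothetical callable (i, j) -> bool, standing in for
                   SIFT matching + RANSAC
    learn_weights: hypothetical callable (labeled, vectors) -> weights
    """
    weights = list(init_weights)
    labeled = {}                                  # (i, j) -> True / False

    def score(pair):
        a, b = vectors[pair[0]], vectors[pair[1]]
        return sum(w * x * y for w, x, y in zip(weights, a, b))

    for _ in range(rounds):
        # Step 1: rank untested pairs by the current weighted similarity.
        # (Brute-force enumeration; acceptable only for toy data.)
        candidates = [p for p in combinations(range(len(vectors)), 2)
                      if p not in labeled]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        # Step 2: run the expensive matcher on the most promising pairs only.
        for pair in candidates[:pairs_per_round]:
            labeled[pair] = verify_pair(*pair)
        # Step 3: update the similarity measure from the matching results.
        weights = learn_weights(labeled, vectors)
    return labeled, weights

def toy_learn_weights(labeled, vectors):
    """Placeholder learner (not an SVM): reward words shared by true matches,
    penalize words shared by failed candidates, and clip at zero."""
    weights = [1.0] * len(vectors[0])
    for (i, j), is_match in labeled.items():
        sign = 1.0 if is_match else -1.0
        for k, (x, y) in enumerate(zip(vectors[i], vectors[j])):
            weights[k] += sign * x * y
    return [max(0.0, w) for w in weights]

# Toy data: words 0 and 1 identify the two scenes; words 2 and 3 are confusers.
vectors = [[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1]]
scene = [0, 0, 1, 1]
labeled, weights = iterative_matching(
    vectors, lambda i, j: scene[i] == scene[j], toy_learn_weights,
    init_weights=[1.0] * 4, rounds=2, pairs_per_round=2)
print(labeled, weights)
```

In this toy run the confuser words end up with zero weight, which is the behavior the iterative scheme aims for.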
Our Approach • [Figure: image pairs labeled non-matching and matching]
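Learning Formulation • For a pair of images (a,b), define their similarity as $\mathrm{sim}_W(a,b) = a^T W b$ (W is a diagonal matrix) • Goal: learn a weighting W that best separates matching pairs from non-matching pairs • Label $y_{ab} = +1$ for matching pairs (a,b); $y_{a'b'} = -1$ for non-matching pairs (a', b') • Feature vector: for all (a,b), $x_{ab}(i) = a(i)\,b(i)$, so that $\mathrm{sim}_W(a,b) = w^T x_{ab}$ with $w$ the diagonal of W • S: Set of training pairs (a,b) • We use L2 regularized L2-loss SVMs, which optimize $\min_w \frac{1}{2}\|w\|^2 + C \sum_{(a,b) \in S} \max(0,\, 1 - y_{ab}\, w^T x_{ab})^2$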
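As an illustration of this formulation, the sketch below builds the element-wise product feature for each pair and minimizes the standard L2-regularized squared-hinge objective by plain gradient descent. The solver, the omission of a bias term, the step size, and the toy data are all assumptions for the example, not the setup used in the talk.

```python
def pair_features(a, b):
    """x_ab = element-wise product, so a^T W b = w . x_ab for diagonal W."""
    return [x * y for x, y in zip(a, b)]

def train_l2_svm(features, labels, C=1.0, lr=0.05, epochs=500):
    """Minimize 0.5*||w||^2 + C * sum(max(0, 1 - y * w.x)^2) by gradient descent."""
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        grad = list(w)                              # gradient of the regularizer
        for x, y in zip(features, labels):
            margin = 1.0 - y * sum(wi * xi for wi, xi in zip(w, x))
            if margin > 0:                          # squared hinge is active
                for k, xk in enumerate(x):
                    grad[k] -= 2.0 * C * margin * y * xk
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Toy BoW vectors: words 0 and 1 co-occur in matching pairs,
# words 2 and 3 co-occur only in non-matching pairs.
imgs = [[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1]]
train_pairs = [((0, 1), +1), ((2, 3), +1), ((0, 2), -1), ((1, 3), -1)]
X = [pair_features(imgs[i], imgs[j]) for (i, j), _ in train_pairs]
y = [label for _, label in train_pairs]
w = train_l2_svm(X, y)

def learned_similarity(a, b):
    return sum(wi * xi for wi, xi in zip(w, pair_features(a, b)))

print(w, learned_similarity(imgs[0], imgs[1]), learned_similarity(imgs[0], imgs[2]))
```

The learned score is used as a ranking function, so only the relative ordering of pairs matters, not the sign threshold of the classifier.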
Learning Formulation • We learn a linear classifier, but interpret its score as a similarity measure • Score histograms of matching vs. non-matching pairs are better separated (e.g. TateModern Dataset) • Our model is learned with ~100K example pairs; ~120K randomly chosen testing image pairs • [Figure: score histograms; reported APs: 0.458, 0.704, 0.966]
Two Extensions • Insufficient training data (during the early stage) might cause over-fitting; we propose two extensions: • 1. Negative examples from other datasets help • 2. A modified regularization for SVMs that uses the tf-idf weights as a prior • Intuition: regularize s.t. weights should be close to a set of “safe” prior weights (e.g. tf-idf) • Recall the SVM formulation • Substitute the regularizer $\|w\|^2$ with $\|w - w_0\|^2$, where $w$ is the learned weight vector and $w_0$ denotes a prior weight vector such as tf-idf weights
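A small sketch of the modified regularization, assuming the squared-hinge objective with the regularizer replaced by 0.5*||w - w_0||^2 and solved by gradient descent. The prior here is a vector of ones standing in for tf-idf weights, and the single training pair is toy data. With very little data, words that never appear in training stay at their prior weight instead of collapsing to zero.

```python
def train_with_prior(features, labels, prior, C=1.0, lr=0.05, epochs=500):
    """Minimize 0.5*||w - prior||^2 + C * sum(max(0, 1 - y * w.x)^2)."""
    w = list(prior)                                   # start at the prior weights
    for _ in range(epochs):
        grad = [wi - pi for wi, pi in zip(w, prior)]  # pulls w back toward prior
        for x, y in zip(features, labels):
            margin = 1.0 - y * sum(wi * xi for wi, xi in zip(w, x))
            if margin > 0:                            # squared hinge is active
                for k, xk in enumerate(x):
                    grad[k] -= 2.0 * C * margin * y * xk
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Early-stage scenario: a single non-matching pair that shares only word 2.
prior = [1.0, 1.0, 1.0, 1.0]          # stand-in for tf-idf weights
X = [[0.0, 0.0, 1.0, 0.0]]
y = [-1]
w_prior = train_with_prior(X, y, prior)
print(w_prior)
```

Only the weight of the word seen in the negative example moves away from the prior; the others are left untouched.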
Datasets • 5 Flickr image sets (several thousand images each) + Oxford5K and Paris • Trafalgar: 6,981 images • LondonEye: 7,047 images • TateModern: 4,813 images • SanMarco: 7,792 images • TimeSquare: 6,426 images
Experiments • Experiment 1: test how well similarity learning works, measured by mAP scores of ranking other images in the set. • 50 test images from each dataset • Test images don’t appear in training image pairs
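For reference, mAP over a set of test images can be computed as below. This is a minimal sketch assuming each test image ranks all other images in its set, so every true match appears somewhere in the ranking. The relevance lists are toy data.

```python
def average_precision(ranked_relevance):
    """AP for one query: mean of precision@k over the ranks k of true matches."""
    hits, precisions = 0, []
    for rank, is_match in enumerate(ranked_relevance, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(queries):
    """Mean of per-query AP values."""
    return sum(average_precision(q) for q in queries) / len(queries)

# Each list is one test image's ranking, best-scoring image first;
# True marks an image that actually matches the test image.
rankings = [[True, False, True, False], [False, True]]
print(mean_average_precision(rankings))
```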
Experiments Experiment 1: test how well similarity learning works, measured by mAP scores of ranking other images in the set. Example: SanMarco dataset
Experiments • Experiment 1: test how well similarity learning works, measured by mAP scores of ranking other images in the set. • ~ 50 test images for each dataset • Test images don’t appear in training image pairs • Oxford5K and Paris each encompass several disparate landmarks, so they require more training data; hence modified regularization is essential
Experiments • Experiment 2 (system evaluation): test how much matching efficiency improves • Efficiency is measured by match success rate: the percentage of the image pairs we try to match that turn out to be true matches • [Figure: match success rate (%) vs. iteration number]
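The match success rate is simple to track per iteration. A short sketch, with made-up outcomes for two rounds of attempted matches:

```python
def match_success_rate(attempts):
    """Percentage of attempted pairs that were verified as true matches."""
    if not attempts:
        return 0.0
    return 100.0 * sum(1 for ok in attempts if ok) / len(attempts)

# One list of verification outcomes per iteration (toy data).
rounds = [[True, False, False, False], [True, True, False, False]]
rates = [match_success_rate(r) for r in rounds]
print(rates)
```

A rising rate across iterations means fewer expensive matching attempts are wasted on pairs that do not overlap.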
Experiments • Number of true matches found as a function of time
Conclusions • Even with small amounts of training data, our approach can predict matching and non-matching image pairs significantly better than tf-idf and co-ocset methods • Overall matching efficiency improved by more than a factor of two • Positive examples are quite specific to different datasets; negative examples could be shared across datasets
Limitations • Good classification for canonical images in a dataset, but worse results for rarer ones (due to uneven amounts of training data for different images)
Thank you! Questions? http://www.cs.cornell.edu/projects/matchlearn/