Learning to Match Images in Large-Scale Collections

Learning to Match Images in Large-Scale Collections. Song Cao and Noah Snavely Cornell University. Workshop on Web-scale Vision and Social Media, ECCV 2012. A key problem in Web-scale vision is to discover visual connectivity among a large set of images. Trafalgar Dataset: 6981 images.

Presentation Transcript


  1. Learning to Match Images in Large-Scale Collections • Song Cao and Noah Snavely • Cornell University • Workshop on Web-scale Vision and Social Media, ECCV 2012

  2. A key problem in Web-scale vision is to discover visual connectivity among a large set of images • Trafalgar dataset: 6,981 images

  6. An example connectivity graph: http://landmark.cs.cornell.edu/Landmarks3/0001/3951.0/graphview.html

  7. Background • This task requires determining whether any two images overlap, i.e., image matching

  8. Background • Image matching: • Extract SIFT features, find nearest-neighbor features, and apply RANSAC-based verification for all pairs of images • High accuracy, but high computational cost • Brute-force $O(n^2)$ approach (at 20 pairs/sec): • 250,000 images ~ 31 billion image pairs; 1 year on 50 machines • 1,000,000 images ~ 500 billion pairs; 15 years on 50 machines • However, only a small fraction of all possible image pairs actually match (e.g. < 0.1% for city-sized datasets)
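
The brute-force figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes the slide's numbers (20 pairs per second per machine, 50 machines) and a 365-day year; the function name is made up for illustration.

```python
from math import comb

PAIRS_PER_SEC = 20                  # matching throughput per machine (from the slide)
MACHINES = 50                       # cluster size (from the slide)
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year

def brute_force_cost(n_images):
    """Return (number of unordered image pairs, wall-clock years on the cluster)."""
    pairs = comb(n_images, 2)       # n * (n - 1) / 2
    seconds = pairs / (PAIRS_PER_SEC * MACHINES)
    return pairs, seconds / SECONDS_PER_YEAR

for n in (250_000, 1_000_000):
    pairs, years = brute_force_cost(n)
    print(f"{n:>9,} images: {pairs / 1e9:6.1f} billion pairs, {years:4.1f} years")
```

Both results land close to the rounded values quoted on the slide (about 1 year and about 15 years).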

  9. Goal • How can we classify image pairs into matching and non-matching both quickly and accurately? • [Figure labels: Matching, Non-matching]

  10. Bag-of-words Model • Widely used in image retrieval, serving as an approximate image similarity measure • Efficient and scalable in retrieval thanks to quantization and inverted files • Useful for choosing promising (similar) image candidates before matching to increase efficiency [1][2] • Usually uses tf-idf weighting as in text retrieval • Inverse Document Frequency (IDF) of word j: $\mathrm{idf}_j = \log(N / n_j)$, where $N$ is the number of images and $n_j$ is the number of images containing word $j$ • References: [1] Agarwal, S., Snavely, N., Simon, I., Seitz, S., Szeliski, R.: Building Rome in a day. In: ICCV. (2009) [2] Philbin, J., Sivic, J., Zisserman, A.: Geometric latent dirichlet allocation on a matching graph for large-scale image datasets. IJCV (2010)
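
A minimal sketch of tf-idf weighted bag-of-words similarity follows. It uses the standard IDF definition (log of the number of images over the number of images containing the word). The term-frequency normalization, the L2 normalization, and all function names are assumptions for illustration, not details taken from the talk.

```python
import math
from collections import Counter

def idf_weights(images):
    """images: list of lists of visual-word ids. Returns {word: log(N / n_j)}."""
    n_images = len(images)
    doc_freq = Counter(word for img in images for word in set(img))
    return {word: math.log(n_images / df) for word, df in doc_freq.items()}

def tfidf_vector(image, idf):
    """Sparse, L2-normalized tf-idf vector for one image."""
    counts = Counter(image)
    total = sum(counts.values())
    vec = {w: (c / total) * idf.get(w, 0.0) for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm > 0 else vec

def similarity(u, v):
    """Dot product of two sparse vectors (cosine similarity when both are normalized)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(val * v.get(w, 0.0) for w, val in u.items())

# Toy collection: word 0 occurs in every image, so its IDF is zero.
images = [[0, 1, 1, 2], [0, 1, 2, 2], [0, 3, 4], [0, 4, 4, 5]]
idf = idf_weights(images)
vectors = [tfidf_vector(img, idf) for img in images]
print(similarity(vectors[0], vectors[1]), similarity(vectors[0], vectors[2]))
```

In the toy data, the first two images share two informative words and score high, while the first and third share only the ubiquitous word 0, which IDF weights down to nothing.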

  11. Bag-of-words Model • However, the BoW similarity measure can be noisy • Example: TateModern dataset, ~120K randomly chosen test image pairs • Average precision: 0.458

  12. Main Idea • Some visual words are more reliable than others for a given dataset • Better weights on visual words may increase prediction accuracy • Our approach: • Apply discriminative learning techniques to improve prediction accuracy and hence matching efficiency • Training data comes from matching itself (iterative learning and matching)

  13. Weighting in BoW Model • Unsupervised approaches • tf-idf weighting • Burstiness [1] • Co-occurring set (“co-ocset”) [2] • Supervised approaches • Learning a fine vocabulary [3] • Selecting important features by matching [4] [1] Jegou, H., Douze, M., Schmid, C.: On the burstiness of visual elements. In: CVPR. (2009) [2] Chum, O., Matas, J.: Unsupervised discovery of co-occurrence in sparse high dimensional data. In: CVPR. (2010) [3] Mikulik, A., Perdoch, M., Chum, O., Matas, J.: Learning a fine vocabulary. In: ECCV. (2010) [4] Turcot, P., Lowe, D.: Better matching with fewer features: The selection of useful features in large database recognition problems. In: Workshop on Emergent Issues in Large Amounts of Visual Data, ICCV. (2009)

  14. Our Approach Learn an SVM classifier for weighting with positive (matching) and negative (non-matching) image pairs

  15. Our Approach Where does training data come from?

  16. Iterative Learning & Matching • Given a collection of images (represented as BoW histogram vectors): • 1. Find a number of image pairs with high similarities (tf-idf similarities in the initial round; learned similarities in later rounds) • 2. Apply image matching to them • 3. Perform learning using the matching results and obtain a new similarity measure • 4. Repeat from step 1 until done
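
The four steps above can be sketched as a loop. This is an illustrative skeleton only: `initial_sim`, `verify_pair`, and `learn_sim` are hypothetical callables supplied by the caller, and the fixed number of rounds stands in for the unspecified "until done" criterion. For simplicity it scores every untested pair each round, which a real system would avoid (for example with inverted files).

```python
from itertools import combinations

def iterative_matching(images, initial_sim, verify_pair, learn_sim,
                       pairs_per_round, rounds):
    """Alternate between matching top-ranked pairs and re-learning similarity.

    initial_sim(a, b) -> float     : e.g. tf-idf similarity (first round)
    verify_pair(a, b) -> bool      : expensive matcher (SIFT + RANSAC in the talk)
    learn_sim(labeled) -> callable : fits a new similarity from labeled pairs
    """
    sim = initial_sim
    labeled = {}                                # (i, j) -> True / False
    for _ in range(rounds):
        # Step 1: rank untested pairs by the current similarity measure.
        candidates = [p for p in combinations(range(len(images)), 2)
                      if p not in labeled]
        if not candidates:
            break
        candidates.sort(key=lambda p: sim(images[p[0]], images[p[1]]),
                        reverse=True)
        # Step 2: run the expensive matcher on the most promising pairs.
        for i, j in candidates[:pairs_per_round]:
            labeled[(i, j)] = verify_pair(images[i], images[j])
        # Step 3: learn a new similarity measure from all results so far.
        sim = learn_sim(labeled)
    return labeled

# Toy usage: "images" are sets of visual words; two images truly match
# when both contain the made-up landmark word "L".
images = [{"L", 1, 2}, {"L", 2, 3}, {4, 5}, {"L", 5}, {1, 4}]
overlap = lambda a, b: len(a & b)
verify = lambda a, b: "L" in a and "L" in b
relearn = lambda labeled: overlap   # placeholder: keeps the same measure
result = iterative_matching(images, overlap, verify, relearn,
                            pairs_per_round=3, rounds=2)
print(sum(result.values()), "true matches among", len(result), "tested pairs")
```

The placeholder `relearn` returns the same measure every round; in the approach described in the talk, this is where the SVM-learned weighting would replace tf-idf.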

  17. Our Approach • [Figure labels: Non-matching, Matching]

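  18. Learning Formulation • For a pair of images (a, b), define their similarity as $\mathrm{sim}(a, b) = a^\top W b = \sum_j w_j a_j b_j$ ($W$ is a diagonal matrix with diagonal entries $w_j$) • Goal: learn a weighting $W$ that best separates matching pairs from non-matching pairs • Label $y_{ab} = +1$ for matching pairs (a, b); $y_{a'b'} = -1$ for non-matching pairs (a', b') • Feature vector: for all (a, b), $x_{ab}$ with entries $x_{ab,j} = a_j b_j$, so that $\mathrm{sim}(a, b) = w^\top x_{ab}$ • $S$: set of training pairs (a, b) • We use L2-regularized L2-loss SVMs, which optimize $\min_w \frac{1}{2} w^\top w + C \sum_{(a,b) \in S} \max(0,\ 1 - y_{ab}\, w^\top x_{ab})^2$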
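
To make the formulation concrete, here is a small sketch of an L2-regularized, L2-loss (squared hinge) linear SVM trained on element-wise product features. Plain gradient descent is used only to keep the example dependency-free; it is not the solver used in the talk. The toy data, the absence of a bias term, and the hyperparameters are assumptions.

```python
def pair_features(a, b):
    """x_ab[j] = a[j] * b[j], so w . x_ab equals a^T W b for W = diag(w)."""
    return [aj * bj for aj, bj in zip(a, b)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def train(X, y, C=1.0, lr=0.05, steps=500):
    """Minimize 0.5 * w.w + C * sum(max(0, 1 - y_i * w.x_i)^2) by gradient descent."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = list(w)                       # gradient of the regularizer
        for x, yi in zip(X, y):
            margin = 1.0 - yi * dot(w, x)
            if margin > 0:                   # only violated margins contribute
                for j, xj in enumerate(x):
                    grad[j] -= 2.0 * C * margin * yi * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Toy histograms over 3 visual words: matching pairs share word 0,
# non-matching pairs share word 2 (made-up data).
pairs = [([1, 0, 1], [1, 1, 0], +1), ([2, 1, 0], [1, 0, 1], +1),
         ([0, 1, 1], [1, 0, 1], -1), ([0, 1, 2], [0, 0, 1], -1)]
X = [pair_features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]
w = train(X, y)
print([round(wj, 3) for wj in w])
```

The learned weight is positive for the word shared by matching pairs and negative for the word shared by non-matching pairs, so the score `dot(w, x)` ranks matching pairs above non-matching ones.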

  19. Learning Formulation • We learn a linear classifier, but interpret its score as a similarity measure • Score histograms of matching vs. non-matching pairs are better separated (e.g. TateModern dataset) • Our model is learned with ~100K example pairs and evaluated on ~120K randomly chosen test image pairs • [Figure labels: AP: 0.458, AP: 0.704, AP: 0.966]
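
For readers unfamiliar with the AP figures quoted here, the sketch below computes non-interpolated average precision for a ranked list of pairs. The talk does not say which AP variant was used, so treat this as one common definition rather than the authors' exact evaluation code.

```python
def average_precision(scores, labels):
    """Mean of precision@k over the ranks k at which a true match appears."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, precisions = 0, []
    for k, (_, is_match) in enumerate(ranked, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Four pairs ranked by similarity score; 1 = true match, 0 = non-match.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(average_precision(scores, labels))
```

A similarity measure that ranks every true match above every non-match scores 1.0, which is why better-separated score histograms translate into higher AP.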

  20. Two Extensions • Insufficient training data (during early stages) might cause over-fitting; we propose two extensions: • 1. Negative examples from other datasets help • 2. A modified regularization for SVMs that uses the tf-idf weights as a prior • Intuition: regularize such that the weights stay close to a set of “safe” prior weights (e.g. tf-idf) • Recall the SVM formulation: L2 regularization on $w$ plus an L2 (squared hinge) loss over the training pairs • Substitute the regularizer $\frac{1}{2} w^\top w$ with $\frac{1}{2} \lVert w - w_0 \rVert^2$, where $w_0$ denotes a prior weight vector such as tf-idf weights
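
The modified regularizer can be illustrated with a small sketch: the only change from a standard squared-hinge SVM is that the penalty pulls the weights toward a prior vector instead of toward zero. Gradient descent, the toy data, and the prior values below are assumptions for illustration; the prior is a stand-in for real tf-idf weights.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def train_with_prior(X, y, w0, C=1.0, lr=0.05, steps=500):
    """Minimize 0.5 * ||w - w0||^2 + C * sum(max(0, 1 - y_i * w.x_i)^2)."""
    w = list(w0)                                   # start at the prior
    for _ in range(steps):
        grad = [wj - pj for wj, pj in zip(w, w0)]  # pulls w back toward w0
        for x, yi in zip(X, y):
            margin = 1.0 - yi * dot(w, x)
            if margin > 0:                         # only violated margins contribute
                for j, xj in enumerate(x):
                    grad[j] -= 2.0 * C * margin * yi * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Hypothetical prior weights and a single non-matching training pair
# whose only shared visual word is word 2.
w0 = [1.0, 0.5, 0.8]
X, y = [[0.0, 0.0, 1.0]], [-1]
w = train_with_prior(X, y, w0)
print([round(wj, 3) for wj in w])
```

Words with no training evidence keep their prior weights, while the weight of the word seen in the non-matching pair is pushed down. With no training data at all, the result is exactly the prior, which is the "safe" fallback the slide describes.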

  21. Datasets • 5 Flickr image sets (several thousand images each) + Oxford5K and Paris • Trafalgar: 6,981 images • LondonEye: 7,047 images • TateModern: 4,813 images • SanMarco: 7,792 images • TimeSquare: 6,426 images

  22. Experiments • Experiment 1: test how well similarity learning works, measured by mAP scores of ranking other images in the set • 50 test images from each dataset • Test images don’t appear in training image pairs

  23. Experiments • Experiment 1: test how well similarity learning works, measured by mAP scores of ranking other images in the set • Example: SanMarco dataset

  28. Experiments • Experiment 1: test how well similarity learning works, measured by mAP scores of ranking other images in the set • ~50 test images for each dataset • Test images don’t appear in training image pairs • Oxford5K and Paris each encompass several disparate landmarks, so they require more training data; hence the modified regularization is essential

  29. Experiments • Experiment 2: test how much matching efficiency improves • Efficiency is measured by match success rate: the percentage of the image pairs we try to match that turn out to be true matches • [Plot: match success rate (%) vs. iteration number]

  30. Experiments • Experiment 2 (system evaluation): test how much matching efficiency improves • Efficiency is measured by match success rate: the percentage of the image pairs we try to match that turn out to be true matches

  31. Experiments • Number of true matches found as a function of time

  33. Conclusions • Even with small amounts of training data, our approach can predict matching and non-matching image pairs significantly better than tf-idf and co-ocset methods • Overall matching efficiency improved by more than a factor of two • Positive examples are quite specific to different datasets; negative examples could be shared across datasets

  34. Limitations • Good classification for canonical images in a dataset, but worse results for rarer ones (due to uneven amounts of training data for different images)

  35. Thank you! Questions? http://www.cs.cornell.edu/projects/matchlearn/
