Large-Scale Content-Based Image Retrieval Project Presentation CMPT 880: Large Scale Multimedia Systems and Cloud Computing Under the supervision of Dr. Mohamed Hefeeda By: Ahmed Abdelsadek (aabdelsa@sfu.ca)
Outline • Introduction • Project Scope • Work Flow • Image Features • Indexing and Retrieval • Matching • Evaluation • Conclusion
Introduction • Current image search engines rely heavily on text to retrieve images • The user provides keywords, and images with that keyword in the filename or in nearby HTML are candidates for retrieval. • In this project we explore content-based retrieval techniques where the query itself is an image.
Project Scope • Similarity using local features. • Extracting features from the reference images. • Indexing these features in an efficient data structure in a scalable, large-scale environment. • Processing query images. • Searching and matching (a sketch of this workflow follows below). • This project is NOT • Recognition, Classification, Categorization
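To make this workflow concrete, here is a minimal Java sketch of the offline (index reference images) and online (process query) phases. All class and method names (FeatureExtractor, FeatureIndex, RetrievalPipeline) are hypothetical, chosen for illustration rather than taken from the project.

```java
import java.util.List;

interface FeatureExtractor {
    // Extract local descriptors (e.g., SIFT) from one image.
    List<float[]> extract(byte[] imageBytes);
}

interface FeatureIndex {
    // Add a reference image's descriptors under its image id.
    void add(int imageId, List<float[]> descriptors);
    // Return the image ids of the k nearest indexed descriptors.
    int[] kNearest(float[] queryDescriptor, int k);
}

final class RetrievalPipeline {
    private final FeatureExtractor extractor;
    private final FeatureIndex index;

    RetrievalPipeline(FeatureExtractor extractor, FeatureIndex index) {
        this.extractor = extractor;
        this.index = index;
    }

    // Offline phase: extract and index the features of a reference image.
    void indexReferenceImage(int imageId, byte[] imageBytes) {
        index.add(imageId, extractor.extract(imageBytes));
    }

    // Online phase: extract query features and search the index for each one.
    int[][] processQuery(byte[] queryBytes, int k) {
        List<float[]> features = extractor.extract(queryBytes);
        int[][] neighbours = new int[features.size()][];
        for (int i = 0; i < features.size(); i++) {
            neighbours[i] = index.kNearest(features.get(i), k);
        }
        return neighbours; // fed to the matching/voting stage
    }
}
```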
Image Features • Using SIFT (Scale-Invariant Feature Transform) features. • A SIFT feature is a selected image region (also called a keypoint) with an associated descriptor. • A SIFT descriptor is a histogram of the image gradients surrounding a keypoint. • Using PCA for dimension reduction (see the sketch below).
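A minimal sketch of PCA-based dimension reduction for 128-d SIFT descriptors follows. It assumes the mean vector and the top-k principal components were computed offline from a training sample of descriptors; the class and field names are illustrative.

```java
final class PcaProjector {
    private final float[] mean;         // 128-d mean of the training descriptors
    private final float[][] components; // k x 128 matrix of principal components

    PcaProjector(float[] mean, float[][] components) {
        this.mean = mean;
        this.components = components;
    }

    // Project a 128-d SIFT descriptor onto the top-k principal components:
    // centre the descriptor, then take its dot product with each component.
    float[] project(float[] descriptor) {
        float[] reduced = new float[components.length];
        for (int i = 0; i < components.length; i++) {
            float dot = 0f;
            for (int j = 0; j < descriptor.length; j++) {
                dot += components[i][j] * (descriptor[j] - mean[j]);
            }
            reduced[i] = dot;
        }
        return reduced;
    }
}
```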
KD-Tree • Using KD-trees to index the features • Each tree level represents one dimension of the feature space • Searching the index for the K nearest neighbours (sketched below)
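Here is a minimal KD-tree sketch for K-nearest-neighbour search: the split dimension cycles with tree depth, matching the "each level represents a dimension" idea above. It is an illustration of the technique, not the project's actual index; candidates are held as {squared distance, image id} pairs in a max-heap for simplicity.

```java
import java.util.PriorityQueue;

final class KdTree {
    private static final class Node {
        final float[] point;
        final int imageId;
        Node left, right;
        Node(float[] p, int id) { point = p; imageId = id; }
    }

    private Node root;
    private final int dims;

    KdTree(int dims) { this.dims = dims; }

    void insert(float[] point, int imageId) {
        root = insert(root, point, imageId, 0);
    }

    private Node insert(Node n, float[] p, int id, int depth) {
        if (n == null) return new Node(p, id);
        int axis = depth % dims; // split dimension cycles with depth
        if (p[axis] < n.point[axis]) n.left = insert(n.left, p, id, depth + 1);
        else                         n.right = insert(n.right, p, id, depth + 1);
        return n;
    }

    // Return image ids of the k nearest stored points, nearest first.
    int[] kNearest(float[] query, int k) {
        // Max-heap keyed on squared distance, so the worst candidate is evicted first.
        PriorityQueue<float[]> heap =
            new PriorityQueue<>((a, b) -> Float.compare(b[0], a[0]));
        search(root, query, k, 0, heap);
        int[] ids = new int[heap.size()];
        for (int i = ids.length - 1; i >= 0; i--) ids[i] = (int) heap.poll()[1];
        return ids;
    }

    private void search(Node n, float[] q, int k, int depth, PriorityQueue<float[]> heap) {
        if (n == null) return;
        float d = distSq(q, n.point);
        if (heap.size() < k) heap.add(new float[]{d, n.imageId});
        else if (d < heap.peek()[0]) { heap.poll(); heap.add(new float[]{d, n.imageId}); }

        int axis = depth % dims;
        float diff = q[axis] - n.point[axis];
        Node near = diff < 0 ? n.left : n.right;
        Node far  = diff < 0 ? n.right : n.left;
        search(near, q, k, depth + 1, heap);
        // Descend the far branch only if the splitting plane could hide a closer point.
        if (heap.size() < k || diff * diff < heap.peek()[0]) {
            search(far, q, k, depth + 1, heap);
        }
    }

    private float distSq(float[] a, float[] b) {
        float s = 0f;
        for (int i = 0; i < a.length; i++) { float d = a[i] - b[i]; s += d * d; }
        return s;
    }
}
```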
Matching • For each query image we extract the features and then search the index for the K-NN of each feature. • For each query feature, each of its neighbouring features votes for the image it belongs to, with a score based on its rank (sketched below). • The top 10 images in the voting array are reported as the most similar images.
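A minimal sketch of the rank-based voting follows: each query feature's k nearest neighbours vote for their source images, with closer neighbours casting heavier votes. The (k - rank) weighting is an illustrative choice, not necessarily the scheme the project uses.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class Voter {
    // neighbourImageIds[f] holds, nearest first, the image ids of the
    // k nearest indexed features for query feature f.
    static List<Integer> topImages(int[][] neighbourImageIds, int k, int topN) {
        Map<Integer, Integer> votes = new HashMap<>();
        for (int[] neighbours : neighbourImageIds) {
            for (int rank = 0; rank < neighbours.length; rank++) {
                // Higher-ranked (closer) neighbours contribute larger scores.
                votes.merge(neighbours[rank], k - rank, Integer::sum);
            }
        }
        // Sort images by accumulated score and keep the topN.
        return votes.entrySet().stream()
                .sorted(Map.Entry.<Integer, Integer>comparingByValue().reversed())
                .limit(topN)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

With the neighbour lists produced by the K-NN search, `Voter.topImages(neighbours, k, 10)` would return the 10 highest-voted images as the retrieval result.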
Evaluation • Core KNN • Experiments on a local machine. • Our results vs. brute-force search • Image retrieval • Caltech and TRECVID datasets • On the Amazon AWS cloud • We use 8 machines • Dual core • 4 GB RAM
Implementation Details • The system is implemented in Java • We use Hadoop 1.0.3 • We run cloud experiments on AWS services • S3 • EMR • We use some open-source libraries • For image preprocessing we use FFmpeg • For extracting SIFT features we use VLFeat
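As a rough illustration of how the index could be built with Hadoop MapReduce, here is a minimal sketch using the org.apache.hadoop.mapreduce API available in Hadoop 1.0.3. The input format and partitioning rule are assumptions: each input line is taken to be one descriptor record, hashed across a fixed number of partitions, and the reducer body is a placeholder for building and persisting one per-partition index.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class IndexBuilder {
    static final int NUM_PARTITIONS = 64; // illustrative

    public static class FeatureMapper
            extends Mapper<LongWritable, Text, IntWritable, Text> {
        @Override
        protected void map(LongWritable key, Text line, Context ctx)
                throws IOException, InterruptedException {
            // Route each descriptor record to one partition; hashing the
            // record is one simple way to balance load across reducers.
            int partition = Math.abs(line.toString().hashCode()) % NUM_PARTITIONS;
            ctx.write(new IntWritable(partition), line);
        }
    }

    public static class IndexReducer
            extends Reducer<IntWritable, Text, IntWritable, Text> {
        @Override
        protected void reduce(IntWritable partition, Iterable<Text> records, Context ctx)
                throws IOException, InterruptedException {
            // Each reducer would build one KD-tree from its partition's
            // descriptors and persist it (e.g., to S3); here we only
            // count the descriptors as a placeholder.
            int count = 0;
            for (Text ignored : records) count++;
            ctx.write(partition, new Text("descriptors=" + count));
        }
    }
}
```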
Conclusion • We implement a full pipeline for the image retrieval problem. • The framework can easily support different types of features and different indexing methods. • We show how to build a big cloud system from small components.
Conclusion • Intersection with my research • Contributions • Feature Selection and Extraction • Implementing Dimension Reduction • Designing and Implementing the Map/Reduce Index • Implementing Image Matching and Ranking