Large-Scale Content-Based Image Retrieval Project Presentation CMPT 880: Large Scale Multimedia Systems and Cloud Computing Under the supervision of Dr. Mohamed Hefeeda By: Ahmed Abdelsadek (aabdelsa@sfu.ca)
Outline • Introduction • Project Scope • Work Flow • Image Features • Indexing and Retrieval • Matching • Evaluation • Conclusion
Introduction • Current image search engines rely heavily on text to retrieve images • The user provides keywords, and images with that keyword in the filename or in nearby HTML are candidates for retrieval. • In this project we explore content-based retrieval techniques where the query itself is an image.
Project Scope • Similarity using local features. • Extracting features from the reference images. • Indexing these features in an efficient data structure in a scalable, large-scale environment. • Processing query images. • Searching and matching (a sketch of this workflow follows below). • This project is NOT • Recognition, Classification, Categorization
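To make this workflow concrete, here is a minimal Java sketch of the offline (index reference images) and online (process query) phases. All class and method names (FeatureExtractor, FeatureIndex, RetrievalPipeline) are hypothetical, chosen for illustration rather than taken from the project.

```java
import java.util.List;

interface FeatureExtractor {
    // Extract local descriptors (e.g., SIFT) from one image.
    List<float[]> extract(byte[] imageBytes);
}

interface FeatureIndex {
    // Add a reference image's descriptors under its image id.
    void add(int imageId, List<float[]> descriptors);
    // Return the image ids of the k nearest indexed descriptors.
    int[] kNearest(float[] queryDescriptor, int k);
}

final class RetrievalPipeline {
    private final FeatureExtractor extractor;
    private final FeatureIndex index;

    RetrievalPipeline(FeatureExtractor extractor, FeatureIndex index) {
        this.extractor = extractor;
        this.index = index;
    }

    // Offline phase: extract and index the features of a reference image.
    void indexReferenceImage(int imageId, byte[] imageBytes) {
        index.add(imageId, extractor.extract(imageBytes));
    }

    // Online phase: extract query features and search the index for each one.
    int[][] processQuery(byte[] queryBytes, int k) {
        List<float[]> features = extractor.extract(queryBytes);
        int[][] neighbours = new int[features.size()][];
        for (int i = 0; i < features.size(); i++) {
            neighbours[i] = index.kNearest(features.get(i), k);
        }
        return neighbours; // fed to the matching/voting stage
    }
}
```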
Image Features • Using SIFT (Scale-Invariant Feature Transform) features. • A SIFT feature is a selected image region (also called a keypoint) with an associated descriptor. • A SIFT descriptor is a histogram of the image gradients surrounding a keypoint. • Using PCA for dimension reduction (see the sketch below).
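A minimal sketch of PCA-based dimension reduction for 128-d SIFT descriptors follows. It assumes the mean vector and the top-k principal components were computed offline from a training sample of descriptors; the class and field names are illustrative.

```java
final class PcaProjector {
    private final float[] mean;         // 128-d mean of the training descriptors
    private final float[][] components; // k x 128 matrix of principal components

    PcaProjector(float[] mean, float[][] components) {
        this.mean = mean;
        this.components = components;
    }

    // Project a 128-d SIFT descriptor onto the top-k principal components:
    // centre the descriptor, then take its dot product with each component.
    float[] project(float[] descriptor) {
        float[] reduced = new float[components.length];
        for (int i = 0; i < components.length; i++) {
            float dot = 0f;
            for (int j = 0; j < descriptor.length; j++) {
                dot += components[i][j] * (descriptor[j] - mean[j]);
            }
            reduced[i] = dot;
        }
        return reduced;
    }
}
```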
KD-Tree • Using KD-trees to index the features • Each tree level represents one dimension of the feature space • Searching the index for the K nearest neighbours (sketched below)
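Here is a minimal KD-tree sketch for K-nearest-neighbour search: the split dimension cycles with tree depth, matching the "each level represents a dimension" idea above. It is an illustration of the technique, not the project's actual index; candidates are held as {squared distance, image id} pairs in a max-heap for simplicity.

```java
import java.util.PriorityQueue;

final class KdTree {
    private static final class Node {
        final float[] point;
        final int imageId;
        Node left, right;
        Node(float[] p, int id) { point = p; imageId = id; }
    }

    private Node root;
    private final int dims;

    KdTree(int dims) { this.dims = dims; }

    void insert(float[] point, int imageId) {
        root = insert(root, point, imageId, 0);
    }

    private Node insert(Node n, float[] p, int id, int depth) {
        if (n == null) return new Node(p, id);
        int axis = depth % dims; // split dimension cycles with depth
        if (p[axis] < n.point[axis]) n.left = insert(n.left, p, id, depth + 1);
        else                         n.right = insert(n.right, p, id, depth + 1);
        return n;
    }

    // Return image ids of the k nearest stored points, nearest first.
    int[] kNearest(float[] query, int k) {
        // Max-heap keyed on squared distance, so the worst candidate is evicted first.
        PriorityQueue<float[]> heap =
            new PriorityQueue<>((a, b) -> Float.compare(b[0], a[0]));
        search(root, query, k, 0, heap);
        int[] ids = new int[heap.size()];
        for (int i = ids.length - 1; i >= 0; i--) ids[i] = (int) heap.poll()[1];
        return ids;
    }

    private void search(Node n, float[] q, int k, int depth, PriorityQueue<float[]> heap) {
        if (n == null) return;
        float d = distSq(q, n.point);
        if (heap.size() < k) heap.add(new float[]{d, n.imageId});
        else if (d < heap.peek()[0]) { heap.poll(); heap.add(new float[]{d, n.imageId}); }

        int axis = depth % dims;
        float diff = q[axis] - n.point[axis];
        Node near = diff < 0 ? n.left : n.right;
        Node far  = diff < 0 ? n.right : n.left;
        search(near, q, k, depth + 1, heap);
        // Descend the far branch only if the splitting plane could hide a closer point.
        if (heap.size() < k || diff * diff < heap.peek()[0]) {
            search(far, q, k, depth + 1, heap);
        }
    }

    private float distSq(float[] a, float[] b) {
        float s = 0f;
        for (int i = 0; i < a.length; i++) { float d = a[i] - b[i]; s += d * d; }
        return s;
    }
}
```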
Matching • For each query image we extract the features and then search the index for the K-NN of each feature. • For each query feature, each of its neighbouring features votes for the image it belongs to, with a score based on its rank (sketched below). • The top 10 images in the voting array are reported as the most similar images.
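A minimal sketch of the rank-based voting follows: each query feature's k nearest neighbours vote for their source images, with closer neighbours casting heavier votes. The (k - rank) weighting is an illustrative choice, not necessarily the scheme the project uses.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class Voter {
    // neighbourImageIds[f] holds, nearest first, the image ids of the
    // k nearest indexed features for query feature f.
    static List<Integer> topImages(int[][] neighbourImageIds, int k, int topN) {
        Map<Integer, Integer> votes = new HashMap<>();
        for (int[] neighbours : neighbourImageIds) {
            for (int rank = 0; rank < neighbours.length; rank++) {
                // Higher-ranked (closer) neighbours contribute larger scores.
                votes.merge(neighbours[rank], k - rank, Integer::sum);
            }
        }
        // Sort images by accumulated score and keep the topN.
        return votes.entrySet().stream()
                .sorted(Map.Entry.<Integer, Integer>comparingByValue().reversed())
                .limit(topN)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

With the neighbour lists produced by the K-NN search, `Voter.topImages(neighbours, k, 10)` would return the 10 highest-voted images as the retrieval result.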
Evaluation • Core KNN • Experiments on a local machine. • Our results vs. brute-force search • Image retrieval • Caltech and TRECVID datasets • On the Amazon AWS cloud • We use 8 machines • Dual core • 4 GB RAM
Implementation Details • The system is implemented in Java • We use Hadoop 1.0.3 • We run cloud experiments on AWS services • S3 • EMR • We use some open-source libraries • For image preprocessing we use FFmpeg • For extracting SIFT features we use VLFeat
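As a rough illustration of how the index could be built with Hadoop MapReduce, here is a minimal sketch using the org.apache.hadoop.mapreduce API available in Hadoop 1.0.3. The input format and partitioning rule are assumptions: each input line is taken to be one descriptor record, hashed across a fixed number of partitions, and the reducer body is a placeholder for building and persisting one per-partition index.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class IndexBuilder {
    static final int NUM_PARTITIONS = 64; // illustrative

    public static class FeatureMapper
            extends Mapper<LongWritable, Text, IntWritable, Text> {
        @Override
        protected void map(LongWritable key, Text line, Context ctx)
                throws IOException, InterruptedException {
            // Route each descriptor record to one partition; hashing the
            // record is one simple way to balance load across reducers.
            int partition = Math.abs(line.toString().hashCode()) % NUM_PARTITIONS;
            ctx.write(new IntWritable(partition), line);
        }
    }

    public static class IndexReducer
            extends Reducer<IntWritable, Text, IntWritable, Text> {
        @Override
        protected void reduce(IntWritable partition, Iterable<Text> records, Context ctx)
                throws IOException, InterruptedException {
            // Each reducer would build one KD-tree from its partition's
            // descriptors and persist it (e.g., to S3); here we only
            // count the descriptors as a placeholder.
            int count = 0;
            for (Text ignored : records) count++;
            ctx.write(partition, new Text("descriptors=" + count));
        }
    }
}
```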
Conclusion • We implement a full pipeline for the image retrieval problem. • The framework can easily support different types of features and different indexing methods. • We show how to build a big cloud system from small components.
Conclusion • Intersection with my research • Contributions • Feature Selection and Extraction • Implementing Dimension Reduction • Designing and Implementing the Map/Reduce Index • Implementing Image Matching and Ranking