This research presents a novel Contextual Object Retrieval (COR) model aimed at improving visual query retrieval accuracy. Current visual search techniques often struggle with user inaccuracy and image quality issues. The COR model integrates contextual data to refine image search processes by estimating search intent scores through spatial and appearance propagation methods. Experimental evaluations with three datasets (Oxford5K, ImageNet, and Web1M) demonstrate significant performance improvements over existing methods. The findings highlight the importance of incorporating context, providing a pathway for future advancements in multimedia retrieval.
Object Retrieval Using Visual Query Context
Linjun Yang, Bo Geng, Yang Cai, Alan Hanjalic, Xian-Sheng Hua
Presented by: Shimon Berger
What is a Visual Query? • TinEye • Google Image Search • Google Goggles
Current Shortcomings • Bounding box • Complex shapes • User inaccuracy • Issues with the image itself • Too small • Lacks texture
How Can We Improve a Visual Query? Objects in real life aren't bound by a box
Proposal • Introduce a contextual object retrieval (COR) model • Evaluate experimentally using 3 image datasets • Demonstrate the benefit of introducing contextual data into the query
Existing Methods • Relevance feedback • “Bag of visual words” • Scale-invariant feature transform (SIFT) • Cosine retrieval model • Language modeling
Proposed COR Model • Based on the Kullback-Leibler retrieval model • Detect interest points • Extract SIFT descriptors • Convert into visual words • Match words to documents in a database • Uses the Jelinek-Mercer smoothing method • Captures important patterns while removing noise
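The retrieval step above can be sketched in a few lines. This is a minimal, illustrative query-likelihood scorer with Jelinek-Mercer smoothing (rank-equivalent to the KL-divergence retrieval model when the query model is unsmoothed); the function name, toy "visual word" vocabulary, and the λ value are assumptions, not the paper's implementation.

```python
import math
from collections import Counter

def jm_score(query_words, doc_words, collection_counts, collection_size, lam=0.5):
    """Score one document for a query under a Jelinek-Mercer smoothed
    unigram language model: p(w|d) is interpolated with the collection
    model, which suppresses noise from rare/unseen visual words."""
    doc_counts = Counter(doc_words)
    doc_len = len(doc_words)
    score = 0.0
    for w in query_words:
        p_doc = doc_counts[w] / doc_len if doc_len else 0.0
        p_coll = collection_counts.get(w, 0) / collection_size
        p = (1.0 - lam) * p_doc + lam * p_coll   # JM interpolation
        if p > 0:
            score += math.log(p)
    return score

# Toy example: two "documents" of visual words, query word "a"
coll = {"a": 2, "b": 2, "c": 2}          # collection term counts
s1 = jm_score(["a"], ["a", "b", "a"], coll, 6)
s2 = jm_score(["a"], ["c", "c", "b"], coll, 6)
```

Ranking by this score prefers documents whose visual-word histograms match the query, while the collection model keeps documents that miss a query word from scoring minus infinity.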
COR Model • Begins with contrast-based saliency detection • Produces a saliency score, used as a control variable • Estimates a search intent score for each visual word • Indicates the probability that a given visual word reflects the user's search intent
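For intuition, a contrast-based saliency score can be sketched as each pixel's color distance from the image's mean color (a common global-contrast formulation); the paper's actual detector may differ, so treat this as an illustrative stand-in.

```python
import math

def contrast_saliency(pixels):
    """Toy contrast-based saliency: a pixel's saliency is the Euclidean
    distance of its RGB color from the image's mean color, normalised
    to [0, 1]. Pixels that stand out from the image get high scores."""
    n = len(pixels)
    mean = [sum(p[c] for p in pixels) / n for c in range(3)]
    sal = [math.sqrt(sum((p[c] - mean[c]) ** 2 for c in range(3)))
           for p in pixels]
    m = max(sal) or 1.0          # avoid division by zero on flat images
    return [s / m for s in sal]

# A lone red pixel among black ones is the most salient
scores = contrast_saliency([(255, 0, 0)] + [(0, 0, 0)] * 9)
```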
COR Search Intent Score • The standard LM approach uses a binary search intent score • Two algorithms are proposed to compute the search intent score from the bounding box and its context: • Based on pixel distance from the bounding box (spatial propagation) • Based on color coherence of the pixels (appearance propagation)
Spatial Propagation (CORa) • The bounding box is usually rough and inaccurate • Lack of user effort • Limiting rectangular shape • Use a smoothed approximation of the bounding box • Dual-sigmoid function • A parameter of the function serves as a control variable
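A dual-sigmoid soft box can be sketched as a product of sigmoids per axis: roughly 1 deep inside the box, decaying smoothly across its borders instead of cutting off hard. The slope `k` stands in for the reliability control parameter; the exact parameterisation here is assumed for illustration, not taken from the paper.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def spatial_intent(x, y, box, k=0.1):
    """Search-intent score for pixel (x, y) under a dual-sigmoid
    smoothing of the bounding box (x0, y0, x1, y1). A larger k means
    the box is trusted more (sharper falloff at its edges)."""
    x0, y0, x1, y1 = box
    sx = sigmoid(k * (x - x0)) * sigmoid(k * (x1 - x))
    sy = sigmoid(k * (y - y0)) * sigmoid(k * (y1 - y))
    return sx * sy

# Inside the box the score is near 1; well outside it is near 0
inside = spatial_intent(50, 50, (0, 0, 100, 100))
outside = spatial_intent(150, 150, (0, 0, 100, 100))
```

The smooth falloff is what lets context pixels just outside an inaccurate box still contribute, with reduced weight.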
Appearance Propagation (CORm) • Assign high scores to the object of interest, normally in the foreground • Assign low scores to background objects, or objects of no interest • Similar to image matting • Separate foreground and background using alpha values • Separate relevant objects from irrelevant ones in the bounding box
Appearance Propagation (CORm) Three-step approach: • Estimate foreground and background models, guided by the bounding box, using the GrabCut algorithm • Use the models to select pseudo-foreground and pseudo-background pixels, which account for spatial smoothness: the top 10% of foreground pixels from inside the box and the top 20% of background pixels from outside it • Estimate the search intent score from the selected pixel information
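The three steps above can be sketched end to end. The paper seeds its models with GrabCut; for a dependency-free illustration this sketch replaces GrabCut with simple mean-color models, keeping the pseudo-pixel selection (top 10% inside, top 20% outside) and the final relative-closeness score. All function names and the scoring formula are assumptions.

```python
def color_dist2(p, q):
    """Squared Euclidean distance between two RGB colors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def appearance_intent(pixels, inside_mask, fg_frac=0.10, bg_frac=0.20):
    """Sketch of appearance propagation:
    1. estimate fg/bg color models (here: mean colors in/outside the box,
       standing in for the paper's GrabCut-derived models);
    2. select pseudo-foreground (top 10% most fg-like inside) and
       pseudo-background (top 20% most bg-like outside) pixels;
    3. score every pixel by its relative closeness to the fg model."""
    inside = [p for p, m in zip(pixels, inside_mask) if m]
    outside = [p for p, m in zip(pixels, inside_mask) if not m]
    fg_mean = [sum(p[c] for p in inside) / len(inside) for c in range(3)]
    bg_mean = [sum(p[c] for p in outside) / len(outside) for c in range(3)]
    pseudo_fg = sorted(inside, key=lambda p: color_dist2(p, fg_mean))
    pseudo_fg = pseudo_fg[:max(1, int(fg_frac * len(inside)))]
    pseudo_bg = sorted(outside, key=lambda p: color_dist2(p, bg_mean))
    pseudo_bg = pseudo_bg[:max(1, int(bg_frac * len(outside)))]
    fg = [sum(p[c] for p in pseudo_fg) / len(pseudo_fg) for c in range(3)]
    bg = [sum(p[c] for p in pseudo_bg) / len(pseudo_bg) for c in range(3)]
    scores = []
    for p in pixels:
        dfg, dbg = color_dist2(p, fg), color_dist2(p, bg)
        scores.append(dbg / (dfg + dbg) if dfg + dbg else 0.5)
    return scores

# Red object inside the box, blue background outside
px = [(255, 0, 0)] * 4 + [(0, 0, 255)] * 4
mask = [True] * 4 + [False] * 4
s = appearance_intent(px, mask)
```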
CORm in Experiments • CORm is broken down into 2 variations: • CORg • Only uses the GrabCut algorithm, not all 3 steps • CORw • Uses alpha values based on weighted foreground probability
Experiments • Experiments performed using 3 image datasets: • Oxford5K • Oxford5K+ImageNet500K • Web1M • Datasets 1 and 2 use 11 landmarks (55 total images) as queries • Dataset 3 adds an additional 45 images • Randomly selected • Various categories
Experiments • COR models compared to 2 baseline retrieval models: • Cosine • General language modeling (context-unaware) • Baseline models only use visual words from inside the bounding box • All models evaluated in terms of average precision (AP) • AP over all queries is averaged to obtain mean average precision (MAP)
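The evaluation metric can be sketched directly. This uses one common AP definition (precision averaged at each relevant hit in the ranking, normalised by the relevant items retrieved); the paper's exact evaluation protocol may differ in its normalisation.

```python
def average_precision(ranked_relevance):
    """AP for one query. ranked_relevance is a 0/1 list over the ranked
    results: 1 where the result is relevant. Precision is recorded at
    each relevant hit and then averaged."""
    hits, precisions = 0, []
    for i, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(per_query):
    """MAP: the mean of AP over all queries."""
    return sum(average_precision(r) for r in per_query) / len(per_query)

ap = average_precision([1, 0, 1])          # (1/1 + 2/3) / 2
m = mean_average_precision([[1], [0, 1]])  # (1.0 + 0.5) / 2
```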
AP for different landmarks on Oxford5K+ImageNet500K dataset.
Web1M Dataset • Best performance enhancement on landmarks
Control Parameters • One parameter controls the weight of saliency • Another controls the reliability of the bounding box
Future Work • Context-aware multimedia retrieval • Using the contextual information shown here • Text surrounding query image • User logs and history