Similarity searching in image retrieval and annotation

Similarity searching in image retrieval and annotation Petra Budíková

Outline • Motivation • Image search applications • General image retrieval • Text-based approach • Content-based approach • Challenges and open problems • Multi-modal image retrieval • Comparative study of approaches • Our contributions • Current results, research directions • Automatic image annotation • Naive solution • Demo • Better solution (work in progress)

Motivation • Explosion of digital data • Size - world’s digital data production: • 5 billion GB (2003) vs. 1 800 billion GB (2011) • Data is growing by a factor of 10 every five years. • Diversity • Availability of technologies => multimedia data • Images • Personal images • Flickr: 4 billion (2009), 6 billion (2011) • Facebook: 60 billion (2010), 140 billion (2011) • Scientific data • Surveillance data • …

Motivation II • Applications of image searching • Collection browsing • Collection organization • Targeted search • Data annotation and categorization • Authentication • … [Slide illustrations: general image collections, large-scale searching; a personal collection "Summer holiday 2011", 1552 photos; an authentication check "Checking… OK"; a flower photo with the prompts "What is this?" and "Get maximum information about this:", similar images captioned "Cornflower", "Photo from my trip to highlands", "Iris in the botanical garden", "Unknown violet flower", and keywords purple • blue • color • plant • beauty • nature • garden • petals • hydrangea • weed]

General image retrieval • Basic approaches: • Attribute-based searching • Text-based searching • Content-based searching • Attribute-based searching • Size, type, category, price, location, … • Relational databases • Text-based searching • Image title, text of surrounding web page, Flickr tags, … • Mature text retrieval technologies • Basic assumption: additional metadata are available • Human participation mostly needed • Not realistic for many applications

General image retrieval II • Content-based searching • Query by example • Similarity measure (distance function) • Optimal measure unknown • Subjective, context-dependent • Should reflect semantics as well as visual features • State-of-the-art image representations • Reflect low-level visual features • Global image descriptors: MPEG7 colors, shapes • Local image descriptors: SIFT, SURF • Semantic gap problem • In general, it is very difficult to extract semantics • Possible only in specialized applications, e.g. face search • The more sophisticated the image representation, the more costly the evaluation of distances
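As a rough illustration of query by example with a distance function, the sketch below ranks images by the Euclidean distance between descriptor vectors. The three-dimensional vectors, the image ids and the function names are hypothetical; real global descriptors such as the MPEG7 ones are much larger and use their own distance functions.

```python
import math

def euclidean(a, b):
    """L2 distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_query(query_vec, dataset, k, distance=euclidean):
    """Return the k (image_id, distance) pairs closest to query_vec."""
    scored = [(image_id, distance(query_vec, vec))
              for image_id, vec in dataset.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]

# Hypothetical toy descriptors, for illustration only.
dataset = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.1, 0.8, 0.1],
    "img_c": [0.8, 0.2, 0.0],
}

result = knn_query([1.0, 0.0, 0.0], dataset, k=2)
```

This linear scan evaluates the distance to every object, which is exactly the cost that grows with more sophisticated representations; a large-scale system would use an index instead.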

General image retrieval III • Summary Observations: • Simple image descriptors -> semantic gap, not distinctive enough • Complex image descriptors -> extraction and evaluation not feasible • A single ideal descriptor does not exist • Current direction in image retrieval: multimodal searching • Several similarity measures combined in an efficient way

Multi-modal searching • Modalities: projections of data into search spaces • Global visual descriptors, local descriptors, text, category, … • Typical combinations • Text + local visual descriptor (Google, Bing) • Text + global visual descriptor (MUFIN) • Different global visual descriptors (MUFIN) • Visual descriptors + GPS • … • Advantages of multi-modal searching • More distinctive than a single modality • Simple text search vs. Google text search with PageRank • Allows flexible balancing of modalities • Better approximation of human understanding of similarity • Allows efficient implementation • Parallel processing of modalities • Iterative filtering of candidates
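One simple way to picture the flexible balancing of modalities is a weighted sum of per-modality distances. The sketch below is only an assumption about how such balancing could look, not the aggregation used by any of the systems named above; the distances are assumed to be already normalised to [0, 1], and all names and values are hypothetical.

```python
def combined_distance(distances, weights):
    """Weighted average of per-modality distances (each assumed in [0, 1])."""
    total_weight = sum(weights.values())
    return sum(weights[m] * distances[m] for m in weights) / total_weight

def rank(candidates, weights):
    """Order candidate ids by combined distance, closest first."""
    return sorted(candidates,
                  key=lambda image_id: combined_distance(candidates[image_id], weights))

# Hypothetical per-modality distances of two candidates to the query.
candidates = {
    "img_a": {"text": 0.2, "visual": 0.7},
    "img_b": {"text": 0.5, "visual": 0.1},
}

text_heavy = rank(candidates, {"text": 0.8, "visual": 0.2})
visual_heavy = rank(candidates, {"text": 0.2, "visual": 0.8})
```

Shifting the weights from text to visual reverses the order of the two candidates, which is the kind of balancing a user or an automatic method would control.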

Multi-modal searching II • Challenges • Selection of suitable modalities • Availability • Suitability for given dataset and application • Balancing of importance of individual modalities • Automatic • User-defined • Cross-modality information mining • Automatic • User-assisted • Efficient implementation of multi-modal retrieval

Multi-modal searching III

Multi-modal searching IV • Our focus • Let us suppose two modalities – text and global visual features • Frequently used • Available in web search applications • Only consider two-phase searching • Basic search over whole database • Postprocessing of basic search results • Categorize possible solutions • Implement & evaluate • Large-scale data processing • Analyze results

Text-and-visual basic search • Single modality basic search • Text: Lucene search engine • Visual: MESSIF content-based retrieval • Multi-modal basic search [Diagram: query, basic search over the dataset, candidate objects, results postprocessing, result]

Postprocessing • Types of ranking functions: • Orthogonal modality ranking • Rank by modality other than the one(s) used for basic search • Fusion ranking • Merge multiple results of basic search • Differs from late fusion in the size of the merged sets • Pseudo-RF ranking • Some additional knowledge about query object or similarity function is mined from the results of basic search • Interactive ranking • User provides additional information • Not considered in experiments [Diagram: query, basic search over the dataset, candidate objects, results postprocessing, result]
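The two-phase scheme with orthogonal modality ranking can be sketched as a text-based basic search followed by visual re-ranking of the candidates. This is a minimal toy version under stated assumptions: keyword overlap stands in for a real text engine, an L1 distance over two-dimensional vectors stands in for real visual descriptors, and none of the function names come from Lucene or MESSIF.

```python
def text_search(query_terms, dataset, limit):
    """Basic search: rank images by the number of query terms in their keywords."""
    scored = []
    for image_id, item in dataset.items():
        overlap = len(set(query_terms) & set(item["keywords"]))
        if overlap:
            scored.append((overlap, image_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best overlap first
    return [image_id for _, image_id in scored[:limit]]

def visual_rerank(query_vec, candidate_ids, dataset):
    """Postprocessing: re-order candidates by L1 distance of visual descriptors."""
    def dist(image_id):
        return sum(abs(a - b) for a, b in zip(query_vec, dataset[image_id]["visual"]))
    return sorted(candidate_ids, key=dist)

# Hypothetical toy dataset.
dataset = {
    "bird1": {"keywords": ["bird", "sky"], "visual": [0.1, 0.9]},
    "bird2": {"keywords": ["bird", "tree"], "visual": [0.8, 0.2]},
    "car1": {"keywords": ["car"], "visual": [0.8, 0.2]},
}

candidates = text_search(["bird"], dataset, limit=10)
final = visual_rerank([0.9, 0.1], candidates, dataset)
```

Only the candidate set from the basic search is touched in the second phase, so the expensive visual distance is evaluated on a small number of objects rather than on the whole database.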

Evaluation • Experiments • 6 basic search methods x 7 ranking methods x parameters • About 90 solutions for two-modal search :) • 100 query objects • Top-30 query for each method and query object • 2 datasets • Profimedia: 20M high-quality images with rich and precise annotations • Flickr: 20M images with user descriptions • Human evaluation of results relevance • Highly relevant / partially relevant / irrelevant [Example query images: two coins, smiling face, zebra, cornfield, handwriting]

Preliminary results • Profimedia dataset results only • Best method: Text-based search + visual ranking • Google solution • Better than more complex fusion solutions • Collection with high-quality text • Semantics very important in queries • Ranking adds about 20% relevance • Text search vs. text search + visual ranking • Content-based search vs. content-based + text ranking

Preliminary results II • Limitations of text-based approaches • Not enough relevant images with relevant keywords • Too broad semantic concept • Visual component crucial [Plot: NDCG as a function of k; query text: bird]
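For readers unfamiliar with the NDCG measure on the plot axis, the following sketch computes NDCG at k from graded relevance judgements. The mapping of the three judgement levels to gains 2/1/0, the log2 discount, and the use of the re-sorted result list as the ideal ranking are common conventions assumed here; the slides do not state which variant was used.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg_at_k(gains, k):
    """DCG of the top k results divided by the DCG of the ideal ordering."""
    ideal = sorted(gains, reverse=True)  # assumption: ideal = same results, best first
    ideal_dcg = dcg(ideal[:k])
    return dcg(gains[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Assumed gain values for the three judgement levels.
GAIN = {"highly": 2, "partially": 1, "irrelevant": 0}

# Hypothetical judgements for one ranked result list.
judgements = ["partially", "highly", "irrelevant", "highly"]
gains = [GAIN[j] for j in judgements]
score = ndcg_at_k(gains, k=3)
```

A perfectly ordered list scores 1.0, and the score drops as relevant results appear lower in the ranking.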

Multi-modal search – future work • Text-and-visual search • Complete analysis of results • Determine conditions which influence usability of individual methods • Dataset properties • Query properties • Automatic recommendation of query processing • Multi-modal search in general • Combination of more than two modalities

Annotation • Task • For a given image, retrieve relevant text information • Easier: relevant keywords • More difficult: relevant text (Wiki page, …) • Applications • Recommendation of tags in social networks • Classification • Method • Only the image is available – searching by visual features is the only possibility • Exploit a dataset of images with textual information • Obtain a set of results; what can we do with these? • Simple solution: analyze keywords related to images in the similarity search result, return the most frequent ones • Advanced solution: analyze relationships between keywords
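The simple solution, returning the most frequent keywords among visually similar images, could look roughly like the sketch below. The keyword lists are made up, and counting each keyword once per image with lower-casing is an assumption, not a detail taken from the slides.

```python
from collections import Counter

def annotate(similar_images, top_n):
    """Return the top_n keywords that occur in the most similar images."""
    counts = Counter()
    for keywords in similar_images:
        # Count each keyword at most once per image, ignoring case.
        counts.update({k.lower() for k in keywords})
    # Sort by frequency, then alphabetically so that ties are deterministic.
    ordered = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in ordered[:top_n]]

# Hypothetical keywords of the images returned by a similarity search.
similar_images = [
    ["flower", "blue", "garden"],
    ["flower", "purple", "nature"],
    ["Flower", "blue", "plant"],
]

keywords = annotate(similar_images, top_n=2)
```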

Annotation – simple solution • MUFIN Image Annotation plugin for Firefox [Screenshot: similar images captioned "Cornflower", "Photo from my trip to highlands", "Iris in the botanical garden", "Unknown violet flower"; suggested keywords purple • blue • color • plant • beauty • nature • garden • petals • hydrangea • weed]

Annotation – simple solution II • Limitations • Relevance of results found by content-based retrieval • Semantic gap • Quality of source data • Spelling mistakes, different languages, names, stopwords, … • Natural language features • Synonyms • Hypernyms, homonyms • Noun vs. verb • … • Possible solutions • Consistency checking over the results • Source text cleaning • Advanced text processing

Annotation – advanced solution • Employ a knowledge base to learn about semantics • WordNet: lexical database of English • Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. • Synsets are interlinked by means of conceptual-semantic and lexical relations • Dataset preprocessing • Determine the correct synsets for keywords in the dataset • Analysis of keywords related to the same image • The correct synsets should be “near” in the WordNet relationships graph • Annotation process (work in progress) • Retrieve similar objects • Analyze relationships between synsets • Synsets found: beagle, dog, terrier -> there’s a dog in the image
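The beagle, dog, terrier example can be sketched as a search for the most specific concept shared by all keywords in a hypernym graph. The graph below is a small hand-written toy fragment, not real WordNet data, and the function names are hypothetical; an actual implementation would query WordNet synsets and would also have to handle keywords with several senses.

```python
# Toy hypernym links (child -> parent), hand-written for illustration only.
HYPERNYM = {
    "beagle": "hound",
    "hound": "hunting dog",
    "terrier": "hunting dog",
    "hunting dog": "dog",
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "animal",
    "tulip": "flower",
    "flower": "plant",
}

def hypernym_chain(concept):
    """Return the concept followed by all of its ancestors, most specific first."""
    chain = [concept]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def most_specific_common_concept(concepts):
    """Return the first concept on the first chain that lies on every other chain."""
    chains = [hypernym_chain(c) for c in concepts]
    for candidate in chains[0]:
        if all(candidate in chain for chain in chains[1:]):
            return candidate
    return None  # the keywords share no concept in this graph

label = most_specific_common_concept(["beagle", "dog", "terrier"])
unrelated = most_specific_common_concept(["beagle", "tulip"])
```

Keywords that share no concept, such as "beagle" and "tulip" in this toy graph, are the kind of inconsistency that checking over the results could flag.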

For more information… • Visit mufin.fi.muni.cz • http://mufin.fi.muni.cz/profimedia : collection browsing and targeted search in 20M image collection • http://mufin.fi.muni.cz/annotation : info about annotation, demo, plugin download
