Similarity searching in image retrieval and annotation. Petra Budíková. Outline. Motivation Image search applications General image retrieval Text-based approach Content-based approach Challenges and open problems Multi-modal image retrieval Comparative study of approaches
Similarity searching in image retrieval and annotation • Petra Budíková
Outline • Motivation • Image search applications • General image retrieval • Text-based approach • Content-based approach • Challenges and open problems • Multi-modal image retrieval • Comparative study of approaches • Our contributions • Current results, research directions • Automatic image annotation • Naive solution • Demo • Better solution (work in progress)
Motivation • Explosion of digital data • Size: world’s digital data production • 5 billion GB (2003) vs. 1 800 billion GB (2011) • Data is growing by a factor of 10 every five years • Diversity • Availability of technologies => multimedia data • Images • Personal images • Flickr: 4 billion (2009), 6 billion (2011) • Facebook: 60 billion (2010), 140 billion (2011) • Scientific data • Surveillance data • …
Motivation II • Applications of image searching • Collection browsing • Collection organization • Targeted search • Data annotation and categorization • Authentication • … [Figure: illustrations of the applications. Labels: "General image collections", "Large-scale searching"; "Summer holiday 2011, 1552 photos"; "What is this?"; "Get maximum information about this:"; "Checking… OK"; image titles "Cornflower", "Photo from my trip to highlands", "Iris in the botanical garden", "Unknown violet flower"; keywords purple, blue, color, plant, beauty, nature, garden, petals, hydrangea, weed]
General image retrieval • Basic approaches: • Attribute-based searching • Text-based searching • Content-based searching • Attribute-based searching • Size, type, category, price, location, … • Relational databases • Text-based searching • Image title, text of surrounding web page, Flickr tags, … • Mature text retrieval technologies • Basic assumption: additional metadata are available • Human participation mostly needed • Not realistic for many applications
General image retrieval II • Content-based searching • Query by example • Similarity measure (distance function) • Optimal unknown • Subjective, context-dependent • Should reflect semantics as well as visual features • State-of-the-art image representations • Reflect low-level visual features • Global image descriptors: MPEG7 colors, shapes • Local image descriptors: SIFT, SURF • Semantic gap problem • In general, it is very difficult to extract semantics • Possible only in specialized applications, e.g. face search • The more sophisticated the image representation, the more costly the evaluation of distances
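Query by example with a distance function can be sketched as a k-nearest-neighbour scan. This is a minimal illustration only: the three-bin colour histograms, the file names and the choice of Euclidean distance are assumptions made for the example, not the MPEG7 descriptors or the indexing used in the talk.

```python
import math

def euclidean(a, b):
    """Distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_query(query_desc, collection, k, distance=euclidean):
    """Return the k (image_id, distance) pairs closest to the query descriptor."""
    scored = [(image_id, distance(query_desc, desc))
              for image_id, desc in collection.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Hypothetical 3-bin colour histograms standing in for real global descriptors.
collection = {
    "sunset.jpg": [0.8, 0.1, 0.1],
    "forest.jpg": [0.1, 0.8, 0.1],
    "ocean.jpg": [0.1, 0.1, 0.8],
}

# The query is itself a descriptor extracted from the example image.
result = knn_query([0.7, 0.2, 0.1], collection, k=2)
for image_id, dist in result:
    print(image_id, round(dist, 3))
```

A linear scan like this evaluates the distance for every object, which is why costly descriptors become a problem at large scale.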
General image retrieval III • Summary Observations: • Simple image descriptors -> semantic gap, not distinctive enough • Complex image descriptors -> extraction and evaluation not feasible • A single ideal descriptor does not exist • Current direction in image retrieval: multimodal searching • Several similarity measures combined in an efficient way
Multi-modal searching • Modalities: projections of data into search spaces • Global visual descriptors, local descriptors, text, category, … • Typical combinations • Text + local visual descriptor (Google, Bing) • Text + global visual descriptor (MUFIN) • Different global visual descriptors (MUFIN) • Visual descriptors + GPS • … • Advantages of multi-modal searching • More distinctive than single modality • Simple text search vs. Google text search with PageRank • Allows flexible balancing of modalities • Better approximation of human understanding of similarity • Allows efficient implementation • Parallel processing of modalities • Iterative filtering of candidates
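The flexible balancing of modalities can be illustrated with a weighted sum of per-modality distances. The sketch below is an assumption-laden example: the Jaccard distance on tags, the halved L1 distance on normalized histograms, the `text_weight` parameter and the sample objects are all hypothetical and are not taken from MUFIN.

```python
def jaccard_distance(tags_a, tags_b):
    """Text modality: 1 minus the overlap ratio of two tag sets, in [0, 1]."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def visual_distance(hist_a, hist_b):
    """Visual modality: halved L1 distance, in [0, 1] for histograms summing to 1."""
    return 0.5 * sum(abs(x - y) for x, y in zip(hist_a, hist_b))

def combined_distance(query, obj, text_weight=0.5):
    """Weighted sum of both modalities; text_weight shifts the balance."""
    text = jaccard_distance(query["tags"], obj["tags"])
    visual = visual_distance(query["visual"], obj["visual"])
    return text_weight * text + (1.0 - text_weight) * visual

# Hypothetical objects: `a` matches the query better by text, `b` by appearance.
query = {"tags": ["flower", "blue"], "visual": [0.2, 0.2, 0.6]}
a = {"tags": ["flower", "garden"], "visual": [0.6, 0.2, 0.2]}
b = {"tags": ["car", "blue", "sky"], "visual": [0.2, 0.2, 0.6]}

for weight in (0.5, 0.9):
    closer = "a" if combined_distance(query, a, weight) < combined_distance(query, b, weight) else "b"
    print(f"text_weight={weight}: object {closer} is closer")
```

With equal weights the visually identical object wins; with a heavy text weight the ranking flips, which is the kind of balancing decision listed among the challenges.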
Multi-modal searching II • Challenges • Selection of suitable modalities • Availability • Suitability for given dataset and application • Balancing of importance of individual modalities • Automatic • User-defined • Cross-modality information mining • Automatic • User-assisted • Efficient implementation of multi-modal retrieval
Multi-modal searching IV • Our focus • Let us suppose two modalities – text and global visual features • Frequently used • Available in web search applications • Only consider two-phase searching • Basic search over whole database • Postprocessing of basic search results • Categorize possible solutions • Implement & evaluate • Large-scale data processing • Analyze results
Text-and-visual basic search • Single modality basic search • Text: Lucene search engine • Visual: MESSIF content-based retrieval • Multi-modal basic search [Figure: search pipeline with labels Query, Dataset, Basic search, Candidate objects, Results postprocessing, Result]
Postprocessing • Types of ranking functions: • Orthogonal modality ranking • Rank by modality other than the one(s) used for basic search • Fusion ranking • Merge multiple results of basic search • Differs from late fusion in the size of the merged sets • Pseudo-RF ranking • Some additional knowledge about query object or similarity function is mined from the results of basic search • Interactive ranking • User provides additional information • Not considered in experiments [Figure: search pipeline with labels Query, Dataset, Basic search, Candidate objects, Results postprocessing, Result]
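Two-phase searching with orthogonal modality ranking might look roughly like the following: a text-based basic search selects candidates, and a visual distance reorders them. The keyword-overlap scoring, the L1 distance, the function names and the toy dataset are assumptions for illustration; the talk itself uses Lucene and MESSIF for the basic search, whose APIs are not shown here.

```python
def text_basic_search(query_words, dataset, candidate_count):
    """Phase 1: score the whole dataset by keyword overlap, keep the best candidates."""
    wanted = set(query_words)
    hits = []
    for obj_id, obj in dataset.items():
        score = len(wanted & set(obj["keywords"]))
        if score > 0:
            hits.append((score, obj_id))
    # Highest overlap first; ties are broken by id to keep the order deterministic.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [obj_id for _, obj_id in hits[:candidate_count]]

def visual_rerank(query_desc, candidates, dataset, k):
    """Phase 2: reorder the candidates by a modality not used in phase 1."""
    def dist(obj_id):
        return sum(abs(x - y) for x, y in zip(query_desc, dataset[obj_id]["visual"]))
    return sorted(candidates, key=dist)[:k]

# Hypothetical dataset with keywords and 2-bin visual descriptors.
dataset = {
    "img1": {"keywords": ["bird", "sky"], "visual": [0.1, 0.9]},
    "img2": {"keywords": ["bird", "cage"], "visual": [0.8, 0.2]},
    "img3": {"keywords": ["plane", "sky"], "visual": [0.1, 0.9]},
    "img4": {"keywords": ["bird"], "visual": [0.25, 0.75]},
}

candidates = text_basic_search(["bird"], dataset, candidate_count=3)
result = visual_rerank([0.2, 0.8], candidates, dataset, k=2)
print(candidates, result)
```

Note that `img3` never reaches the second phase even though it is visually close to the query: postprocessing can only reorder what the basic search returned.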
Evaluation • Experiments • 6 basic search methods x 7 ranking methods x parameters • About 90 solutions for two-modal search :) • 100 query objects • Top-30 query for each method and query object • 2 datasets • Profimedia: 20M high-quality images with rich and precise annotations • Flickr: 20M images with user descriptions • Human evaluation of result relevance • Highly relevant / partially relevant / irrelevant [Figure: example query objects: two coins, smiling face, zebra, cornfield, handwriting]
Preliminary results • Profimedia dataset results only • Best method: Text-based search + visual ranking • Google solution • Better than more complex fusion solutions • Collection with high-quality text • Semantics very important in queries • Ranking adds about 20% relevance • Text search vs. text search + visual ranking • Content-based search vs. content-based + text ranking
Preliminary results II • Limitations of text-based approaches • Not enough relevant images with relevant keywords • Too broad semantic concept • Visual component crucial [Figure: NDCG plotted against k; query text: bird]
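The NDCG measure on the plot can be computed from graded relevance judgments along the following lines. This is a sketch under stated assumptions: the numeric grades (highly relevant = 2, partially relevant = 1, irrelevant = 0), the linear gain, and the ideal ordering taken from the judged list itself are choices made for this example and may differ from the evaluation setup used in the experiments.

```python
import math

# Assumed mapping of the three human relevance levels to numeric gains.
GRADES = {"highly relevant": 2, "partially relevant": 1, "irrelevant": 0}

def dcg(gains, k):
    """Discounted cumulative gain of the first k results (rank 1 has discount log2(2) = 1)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))

def ndcg(gains, k):
    """DCG normalized by the DCG of the same judgments in the best possible order."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0

# Hypothetical judgments for the top 5 results of one query.
judged = ["partially relevant", "highly relevant", "irrelevant",
          "highly relevant", "irrelevant"]
gains = [GRADES[label] for label in judged]

for k in (1, 3, 5):
    print(k, round(ndcg(gains, k), 3))
```

Because the discount grows with rank, a method that places highly relevant images first scores higher than one that returns the same images in a worse order.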
Multi-modal search – future work • Text-and-visual search • Complete analysis of results • Determine conditions which influence usability of individual methods • Dataset properties • Query properties • Automatic recommendation of query processing • Multi-modal search in general • Combination of more than two modalities
Annotation • Task • For a given image, retrieve relevant text information • Easier: relevant keywords • More difficult: relevant text (Wiki page, …) • Applications • Recommendation of tags in social networks • Classification • Method • Only image available: search by visual features is the only possibility • Exploit dataset of images with textual information • Obtain a set of results; what can we do with these? • Simple solution: analyze keywords related to images in similarity search result, return the most frequent ones • Advanced solution: analyze relationships between keywords
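The simple solution, returning the most frequent keywords among the similar images, can be sketched as follows. The function name `annotate` and the keyword lists are hypothetical; the similarity search that would supply the neighbours is assumed to have already run.

```python
from collections import Counter

def annotate(similar_images, top_n):
    """Return the top_n keywords that occur in the most similar images."""
    counts = Counter()
    for keywords in similar_images:
        # Count each keyword once per image, ignoring case.
        counts.update({word.lower() for word in keywords})
    # Most frequent first; alphabetical order breaks ties deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]

# Hypothetical keyword lists of the images returned by a visual similarity search.
similar_images = [
    ["purple", "flower", "garden"],
    ["blue", "Flower", "plant"],
    ["flower", "garden", "petals"],
    ["plant", "garden", "nature"],
]

suggested = annotate(similar_images, top_n=3)
print(suggested)
```

Frequency counting treats every keyword as an independent string, so synonyms, misspellings and words in other languages split the votes.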
Annotation – simple solution • MUFIN Image Annotation plugin for Firefox [Figure: annotation example. Image titles: "Cornflower", "Photo from my trip to highlands", "Iris in the botanical garden", "Unknown violet flower"; keywords: purple, blue, color, plant, beauty, nature, garden, petals, hydrangea, weed]
Annotation – simple solution II • Limitations • Relevance of results found by content-based retrieval • Semantic gap • Quality of source data • Spelling mistakes, different languages, names, stopwords, … • Natural language features • Synonyms • Hypernyms, homonyms • Noun vs. verb • … • Possible solutions • Consistency checking over the results • Source text cleaning • Advanced text processing
Annotation – advanced solution • Employ knowledge-base to learn about semantics • WordNet: lexical database of English • Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. • Synsets are interlinked by means of conceptual-semantic and lexical relations • Dataset preprocessing • Determine the correct synsets for keywords in the dataset • Analysis of keywords related to the same image • The correct synsets should be “near” in the WordNet relationships graph • Annotation process (work in progress) • Retrieve similar objects • Analyze relationships between synsets • Synsets found: beagle, dog, terrier -> there’s a dog in the image
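The beagle, dog, terrier example can be mimicked with a toy hypernym graph: walk each concept up its chain of more general concepts and pick the most specific one they all share. Everything here is a simplifying assumption: the `HYPERNYM` table is hand-written rather than read from WordNet, each concept has a single parent, and this is not the annotation method under development, only an illustration of why related synsets point to one concept.

```python
# Hypothetical hand-written hypernym links (child -> more general concept).
# A real system would take these relations from WordNet instead.
HYPERNYM = {
    "beagle": "hound",
    "hound": "dog",
    "terrier": "dog",
    "dog": "canine",
    "canine": "animal",
    "tulip": "flower",
    "flower": "plant",
}

def ancestors(concept):
    """Return the concept followed by its chain of hypernyms, specific to general."""
    chain = [concept]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def most_specific_common_concept(concepts):
    """Return the most specific concept covering every input, or None if there is none."""
    if not concepts:
        return None
    chains = [ancestors(concept) for concept in concepts]
    shared = set(chains[0]).intersection(*chains[1:])
    for candidate in chains[0]:  # ordered from specific to general
        if candidate in shared:
            return candidate
    return None

print(most_specific_common_concept(["beagle", "dog", "terrier"]))
print(most_specific_common_concept(["beagle", "tulip"]))
```

In this toy graph the three dog-related keywords meet at `dog`, while `beagle` and `tulip` share no concept at all, which hints at how "nearness" in the relationship graph could separate consistent keywords from noise.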
For more information… • … visit mufin.fi.muni.cz • http://mufin.fi.muni.cz/profimedia: collection browsing and targeted search in 20M image collection • http://mufin.fi.muni.cz/annotation: info about annotation, demo, plugin download