Flickr Distance

ACM Multimedia 2008
Flickr Distance
Lei Wu, Xian-Sheng Hua, Nenghai Yu, Wei-Ying Ma, Shipeng Li
Microsoft Research Asia; University of Science and Technology of China
October 28, 2008

Multimedia Information Retrieval: Indexing, Ranking, Clustering, Recommendation, Annotation, ……

Multimedia Information Retrieval: Indexing, Ranking, Annotation, Clustering, Recommendation, ……
• Image Similarity/Distance
• Concept Similarity/Distance

• Image Similarity/Distance
• Concept Similarity/Distance

• Image Similarity/Distance: Numerous efforts have been made.
• Concept Similarity/Distance

• Image Similarity/Distance: Numerous efforts have been made.
• Concept Similarity/Distance: More and more used, but not well studied.
  • Example concepts: Olympic, Cat, Sports, Paw, Tiger

WordNet Distance
• WordNet
  • 150,000 words
• WordNet Distance
  • Quite a few methods to get it in WordNet
  • Basic idea is to measure the length of the path between two words
• Pros and Cons
  • Pros: Built by human experts, so close to human perception
  • Cons: Coverage is limited and difficult to extend

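Google Distance
• Normalized Google Distance (NGD)
  • Reflects the co-occurrence of two words in Web documents
  • Defined as NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}), where f(x) and f(y) are the numbers of pages containing x and y, f(x, y) is the number of pages containing both, and N is the total number of indexed pages
• Pros and Cons
  • Pros: Easy to get and huge coverage
  • Cons: Only reflects co-occurrence in textual documents; not really concept distance (semantic relationship)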
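The NGD definition can be sketched in a few lines. This is a minimal illustration of the standard formula (Cilibrasi and Vitányi), not code from the talk; the function name and the page counts in the usage lines are made up for the example.

```python
import math

def normalized_google_distance(fx, fy, fxy, n):
    """NGD from page counts.

    fx, fy: number of pages containing each word
    fxy:    number of pages containing both words
    n:      total number of indexed pages
    """
    if fxy == 0:
        return math.inf  # the two words never co-occur
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))

# Hypothetical counts, for illustration only.
always_together = normalized_google_distance(100, 100, 100, 10_000)
rarely_together = normalized_google_distance(1000, 1000, 10, 10**6)
```

Words that always appear together get distance 0; the distance grows as the joint count shrinks relative to the individual counts.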

Tag Concurrence Distance
• Image Tag Concurrence Distance (Qi, Hua, et al. ACMMM07)
  • Reflects how frequently two tags occur in the same images
  • Based on the same idea as NGD
  • The similarity matrix is mostly sparse (> 95% of its entries are zero)
• Pros and Cons
  • Pros: Images are taken into account
  • Cons: Tags are sparse, so visual co-occurrence is not well reflected; training data is difficult to get
[Figures: similarity matrix for 500 tags; similarity matrix for 50 tags]
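A tag co-occurrence distance "based on the same idea as NGD" might look like the sketch below, which applies the NGD formula to image counts instead of page counts. The exact formulation in Qi, Hua, et al. is not given in the slides, so the function and its tie handling are assumptions; the three example tag sets are the ones shown on the "Some Facts" slide.

```python
import math
from collections import Counter
from itertools import combinations

def tag_cooccurrence_distances(image_tags):
    """image_tags: list of tag collections, one per image.

    Returns {(tag_a, tag_b): distance} with tag_a < tag_b, using an
    NGD-style formula over image counts (an assumption, not the
    published definition).
    """
    n = len(image_tags)
    single, pair = Counter(), Counter()
    for tags in image_tags:
        tags = sorted(set(tags))
        single.update(tags)
        pair.update(combinations(tags, 2))
    distances = {}
    for a, b in combinations(sorted(single), 2):
        fab = pair[(a, b)]
        if fab == 0:
            distances[(a, b)] = math.inf  # tags never share an image
            continue
        la, lb = math.log(single[a]), math.log(single[b])
        denom = math.log(n) - min(la, lb)
        distances[(a, b)] = (max(la, lb) - math.log(fab)) / denom if denom > 0 else 0.0
    return distances

images = [
    {"bear", "fur", "grass", "tree"},
    {"polar bear", "water", "sea"},
    {"polar bear", "fighting", "usa"},
]
distances = tag_cooccurrence_distances(images)
never_cooccur = sum(1 for d in distances.values() if math.isinf(d)) / len(distances)
```

Even on this tiny collection most tag pairs never share an image, which is the sparsity problem the slide points out.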

Different Concept Relationships
• Synonymy: table tennis / ping-pong (different words but the same meaning)
• Visually Similar: horse / donkey (similar things or things of same type)
• Meronymy: car / wheel (part and the whole)
• Concurrency: airplane / airport (exist at the same scene/place)

Concept Distance
• Mine from ontology: WordNet distance is good, but coverage is too low
• Mine from text documents: Google distance’s coverage is very high, but it is for text domain
• Mine from image tags: Image tag concurrence distance implicitly uses image information, but tags are too sparse

Can we mine concept distance from image content?

Some Facts
• Semantic concept distance is based on human cognition
• 80% of human cognition comes from visual information
• There are around 2.8 billion photos on Flickr (as of Sep 08)
• On average, each Flickr image has around 8 tags
Goal: To mine concept distance from a large tagged image collection based on image content
Example image tags: "bear, fur, grass, tree"; "polar bear, water, sea"; "polar bear, fighting, usa"

Overview of Flickr Distance
• Concept A: Airplane → Concept Model A
• Concept B: Airport → Concept Model B
• Concept Model A, Concept Model B → Flickr Distance (A, B)

Flickr Distance is able to cover the four different semantic relationships: Synonymy, Visually Similar, Meronymy, and Concurrency

What We Need • R1: A Good Image Collection • Large • High coverage, especially on daily life • With tags

What We Need
• R2: A Good Concept Representation or Model
  • Based on image content
  • Can cover wider concept relationships
  • Can handle large-concept set
Concept Models (diagram labels: Discriminative, Generative; Global Feature, Local Feature)
• SVM, Boosting, …
• w/o Spatial Relation: Bag-of-Words (pLSA, LDA), …
• w/ Spatial Relation: 2D HMM, MRF, …

What We Need
• VLM – Visual Language Model
  • Spatial-relation sensitive
  • Efficient
  • Can handle object variations
Concept Models (diagram labels: Discriminative, Generative; Global Feature, Local Feature)
• SVM, Boosting, …
• w/o Spatial Relation: Bag-of-Words, …
• w/ Spatial Relation: 2D HMM, MRF, …

Statistical Language Model
Example sentence: "I am talking about statistical language model."
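As a rough illustration of what a statistical language model estimates, the sketch below trains a maximum-likelihood trigram model on the example sentence. The choice of trigrams, the padding tokens, and the function name are assumptions made for this example.

```python
from collections import Counter

def train_trigram_model(sentences):
    """Maximum-likelihood trigram model:
    P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2)."""
    trigrams, contexts = Counter(), Counter()
    for sentence in sentences:
        # Pad with start/end markers so the first words also have a context.
        words = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            trigrams[(w1, w2, w3)] += 1
            contexts[(w1, w2)] += 1
    return {tri: count / contexts[tri[:2]] for tri, count in trigrams.items()}

model = train_trigram_model(["I am talking about statistical language model"])
p_talking = model[("i", "am", "talking")]
```

With a single training sentence every observed trigram has probability 1; a real model would be trained on a large corpus and smoothed.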

Visual Language Model (VLM)
Visual Word Generation: Image → Patch → Gradient Texture Histogram → Hashing → Visual Word

Performance of VLM • Comparison on Image Categorization • Caltech 8 categories / 5097 images

Latent-Topic VLM (1) • Why Latent-Topic • Latent-Topic VLM • Visual variations of concept are taken as latent topics

Latent-Topic VLM (2)
• Latent-Topic VLM Training
  • Solved by the EM algorithm
  • The objective function is to maximize the joint distribution of the concept and its visual word arrangement Aw
  • E-step: Estimate the posteriors of the hidden topics
  • M-step: Maximize the likelihood of the visual arrangement
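The E-step/M-step alternation can be illustrated with a much simpler latent-topic model than LT-VLM. The sketch below fits a mixture of multinomials over visual-word counts, so it ignores the spatial (trigram) conditioning that a visual language model uses; it is a stand-in for the idea, not the training procedure from the talk. All names, the smoothing constant, and the toy count vectors are assumptions.

```python
import math
import random

def em_topic_mixture(docs, vocab_size, num_topics, iterations=20, seed=0):
    """docs: list of count vectors (one per image) over visual words.
    Fits P(doc) = sum_z P(z) * prod_w P(w|z)^count by EM."""
    rng = random.Random(seed)
    prior = [1.0 / num_topics] * num_topics
    word_given_topic = []
    for _ in range(num_topics):
        row = [rng.random() + 0.1 for _ in range(vocab_size)]
        total = sum(row)
        word_given_topic.append([v / total for v in row])
    history = []  # log-likelihood before each update
    for _ in range(iterations):
        # E-step: posterior of each hidden topic given each image.
        posteriors, log_likelihood = [], 0.0
        for counts in docs:
            log_joint = [
                math.log(prior[z])
                + sum(c * math.log(word_given_topic[z][w])
                      for w, c in enumerate(counts) if c)
                for z in range(num_topics)
            ]
            peak = max(log_joint)
            weights = [math.exp(v - peak) for v in log_joint]
            norm = sum(weights)
            posteriors.append([v / norm for v in weights])
            log_likelihood += peak + math.log(norm)
        history.append(log_likelihood)
        # M-step: re-estimate parameters from the soft assignments.
        eps = 1e-6  # tiny smoothing keeps every probability positive
        for z in range(num_topics):
            resp = [p[z] for p in posteriors]
            prior[z] = (sum(resp) + eps) / (len(docs) + eps * num_topics)
            row = [eps] * vocab_size
            for r, counts in zip(resp, docs):
                for w, c in enumerate(counts):
                    row[w] += r * c
            total = sum(row)
            word_given_topic[z] = [v / total for v in row]
    return prior, word_given_topic, history

# Toy data: two images dominated by words 0-1, two by words 2-3.
toy_docs = [[5, 1, 0, 0], [4, 2, 0, 0], [0, 0, 3, 3], [0, 1, 2, 4]]
prior, word_given_topic, history = em_topic_mixture(toy_docs, 4, 2)
```

Each topic's word distribution plays the role that a per-topic visual language model plays in LT-VLM: one distribution per visual variation of the concept.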

Performance of LT-VLM • Comparison on Image Categorization • Caltech 8 categories / 5097 images

Flickr Distance
• Kullback-Leibler (KL) divergence
  • Good, but not symmetric
• Jensen-Shannon (JS) divergence
  • Better, as it is symmetric
  • The square root of JS divergence is a metric, and so is Flickr Distance
[Formulas: topic distance (KL), topic distance (JS), concept distance]
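The two divergences are standard and can be sketched directly. The aggregation from topic distances to a concept distance is not recoverable from this slide, so `concept_distance` below simply averages the topic distance over all topic pairs; that choice, like the function names and toy distributions, is an assumption for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; q must be positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded by log 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def topic_distance(p, q):
    """Square root of the JS divergence, which satisfies the metric axioms."""
    return math.sqrt(max(0.0, js_divergence(p, q)))

def concept_distance(topics_a, topics_b):
    """Assumed aggregation: mean topic distance over all topic pairs."""
    pairs = [(p, q) for p in topics_a for q in topics_b]
    return sum(topic_distance(p, q) for p, q in pairs) / len(pairs)

# Toy distributions over three visual-word patterns.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
asymmetry = abs(kl_divergence(p, q) - kl_divergence(q, p))
symmetry_gap = abs(js_divergence(p, q) - js_divergence(q, p))
```

On these toy distributions the KL divergence changes when the arguments are swapped, while the JS divergence does not.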

Procedure of Flickr Distance
Concept A: Airplane, Concept B: Airport → Tag search in Flickr → LT-VLM → Concept Model A, Concept Model B → Jensen-Shannon Divergence → Flickr Distance (A, B)

Experiments • Evaluation • Objective evaluation • Subjective evaluation • Applications • Concept clustering • Image annotation • Tag recommendation

Experiments - Configurations • Images • 6,400,000 from Flickr • Concepts • 130,000,000 different tags • 10,000,000 filtered tags • 1,000 randomly-selected tags • Comparison • Normalized Google Distance (NGD) • Tag Concurrence Distance (TCD) • Flickr Distance (FD)

Eva1: Subjective Evaluation
• Ground-Truth
  • 12 people were asked to score the semantic correlation of each concept pair
  • Average scores are taken as ground-truth
• Evaluate Accuracy of “Relative Distance Pairs”
  • Step 1: Find all distance pairs D(a,b) and D(c,d)
  • Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth

Eva2: Objective Evaluation
• Ground-Truth
  • WordNet Distance
  • Only 497 concepts (overlap of WordNet and the 1000 concepts)
• Evaluate Accuracy of “Relative Distance Pairs”
  • Step 1: Find all distance pairs D(a,b) and D(c,d)
  • Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth
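The two-step "Relative Distance Pairs" check used in both evaluations could be computed along the following lines. The sketch assumes the ground truth is already expressed as a distance (human correlation scores would first need to be inverted) and that tied ground-truth pairs are skipped; neither detail is stated in the slides, and the concept pairs and numbers are invented.

```python
from itertools import combinations

def relative_pair_accuracy(distance, ground_truth):
    """distance, ground_truth: dicts mapping a concept pair to a distance.

    Returns the fraction of pairs of concept pairs whose ordering under
    `distance` agrees with the ordering under `ground_truth`.
    """
    pairs = sorted(set(distance) & set(ground_truth))
    agree = total = 0
    for p, q in combinations(pairs, 2):       # Step 1: D(a,b) and D(c,d)
        truth = ground_truth[p] - ground_truth[q]
        if truth == 0:
            continue                          # ties are skipped (assumption)
        total += 1
        predicted = distance[p] - distance[q]
        if predicted * truth > 0:             # Step 2: same ordering?
            agree += 1
    return agree / total if total else 0.0

# Hypothetical values, for illustration only.
truth = {("cat", "tiger"): 1.0, ("cat", "paw"): 2.0, ("cat", "olympic"): 3.0}
measured = {("cat", "tiger"): 0.1, ("cat", "paw"): 0.9, ("cat", "olympic"): 0.5}
accuracy = relative_pair_accuracy(measured, truth)
```

Because only the ordering matters, the measure is insensitive to the scale of each distance, which allows NGD, TCD, and FD to be compared directly.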

App1: Concept Clustering • Concept Clustering • 23 concepts; • 3 groups – (1) outer space, (2) animal and (3) sports

App2: Image Annotation
• Based on an approach using concept relation
  • Dual Cross-Media Relevance Model (DCMRM, J. Liu et al. ACMMM 2007)
• On 79 concepts / 79,000 images
[Figure: the number of correctly annotated keywords at the first N words]

App3: Tag Recommendation
• To Improve Tagging Quality
  • Eliminating tag incompleteness, noise, and ambiguity
• 500 images / 10 recommended tags per image
[Figure: Precision @ 10]

Discussion
• Why can VLM divergence estimate concept distance?
• Why does FD work well even when tags are not complete?
VLM: distribution of trigrams
• Computer: room patterns, computer patterns, other patterns
• TV: room patterns, TV patterns, other patterns
• Office: room patterns, screen patterns, other patterns

If we find similar patterns in the images associated with different concepts, the corresponding concept relationships can be discovered (e.g., Computer and Office).

Summary
Flickr Distance
• A novel approach to discover semantic relationships from image content
  • based on real-life images from the Web
  • based on collective intelligence from grassroots
• A distance more consistent with human perception
• A measurement more effective in many applications

Future Work Flickr Distance as a Service.

Thank You

Backup

TagNet • TagNet – Visual Concept Net • Can be used in many applications • Knowledge representation • Concept learning • Multimedia retrieval • ...

TagNet • Visualization • The bigger the distance, the longer the edge • Using a tool called NetDraw provided by International Network for Social Network Analysis

Outline • Motivation • Overview • Visual Language Model • Flickr Distance Calculation • Evaluations and Applications

Semantic Relationship Is Important • Many efforts on using semantic relationships • GJ Qi et al. Correlative Multi-Label Video Annotation. ACM MM 2007. • R. Datta et al. Image Retrieval: Ideas, Influences and the Trends of the New Age. ACM Computing Surveys, 2008. • L. Leslie et al. Annotation of Paintings with High-Level Semantic Concepts Using Transductive Inference and Ontology-based Concept Disambiguation. ACM MM 2007. • J. Yu et al. Semantic Subspace Projection and Its Application in Image Retrieval. IEEE T CSVT 2008. • Applications of semantic relationships • Natural language processing • Object detection • Concept detection • Multimedia retrieval


Text vs. Image
• Text: Word, Grammar, 1-dim dependence, Statistical Language Model
• Image: Visual word, Visual grammar, 2-dim dependence, Visual Language Model

Visual Word Generation
• Typical methods
  • SIFT + Clustering/PCA
• Our method
  • Patch + Texture Direction Histogram + Hashing
  • Efficient, low-dimension, and rotation-invariant
  • Needs only 1/20 of the computation of the SIFT feature
Image → Patch → Gradient Texture Histogram → Hashing → Visual Word
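The slides do not give the details of the histogram or the hash, so the following is only a guess at the general shape of a "patch, texture direction histogram, hashing" pipeline. The bin count, the quantization levels, the dominant-bin alignment used for rotation invariance, and the function name are all hypothetical.

```python
import math

def visual_word(patch, bins=8, levels=4):
    """patch: 2D list of grayscale intensities.
    Returns an integer visual-word id (hypothetical scheme)."""
    hist = [0.0] * bins
    rows, cols = len(patch), len(patch[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]   # horizontal gradient
            gy = patch[r + 1][c] - patch[r - 1][c]   # vertical gradient
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    total = sum(hist)
    if total == 0:
        return 0  # flat patch: no texture direction at all
    # Start the histogram at its dominant direction, so rotating the patch
    # by a multiple of the bin width leaves the code unchanged.
    start = hist.index(max(hist))
    hist = hist[start:] + hist[:start]
    # Hash: quantize each bin's share and pack the digits into one integer.
    code = 0
    for value in hist:
        code = code * levels + min(levels - 1, int(value / total * levels))
    return code

horizontal_ramp = [[c for c in range(5)] for _ in range(5)]
vertical_ramp = [[r for _ in range(5)] for r in range(5)]
same_word = visual_word(horizontal_ramp) == visual_word(vertical_ramp)
```

The two ramps differ only by a 90-degree rotation and map to the same word, which is the kind of rotation invariance the slide claims for the authors' method; invariance here is exact only for rotations that are multiples of the bin width.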

Performance of VLM • Comparison on Image Categorization • Caltech 8 categories / 5097 images (L. Wu, et al. MIR 2007/T-MM 2008)
