
Flickr Distance

ACM Multimedia 2008. Flickr Distance. Lei Wu, Xian-Sheng Hua, Nenghai Yu, Wei-Ying Ma, Shipeng Li Microsoft Research Asia University of Science and Technology of China October 28, 2008. Multimedia Information Retrieval. Indexing. Ranking. Clustering. ……. Recommendation. Annotation.


Presentation Transcript


  1. ACM Multimedia 2008 • Flickr Distance • Lei Wu, Xian-Sheng Hua, Nenghai Yu, Wei-Ying Ma, Shipeng Li • Microsoft Research Asia; University of Science and Technology of China • October 28, 2008

  2. Multimedia Information Retrieval • Indexing • Ranking • Clustering • …… • Recommendation • Annotation

  3. Multimedia Information Retrieval • Indexing • Ranking • Annotation • Clustering • Recommendation • …… • Image Similarity/Distance • Concept Similarity/Distance

  4. Image Similarity/Distance • Concept Similarity/Distance

  5. Image Similarity/Distance: Numerous efforts have been made. • Concept Similarity/Distance

  6. Image Similarity/Distance: Numerous efforts have been made. • Concept Similarity/Distance (e.g., Olympic, Cat, Sports, Paw, Tiger): More and more used, but not well studied.

  7. WordNet Distance • WordNet • 150,000 words • WordNet Distance • Quite a few methods to get it in WordNet • Basic idea is to measure the length of the path between two words • Pros and Cons • Pros: Built by human experts, so close to human perception • Cons: Coverage is limited and difficult to extend

  8. Google Distance • Normalized Google Distance (NGD) • Reflects the concurrency (co-occurrence) of two words in Web documents • Defined as: (formula not preserved in the transcript) • Pros and Cons • Pros: Easy to get and huge coverage • Cons: Only reflects concurrency in textual documents. Not really concept distance (semantic relationship)
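The definition on this slide did not survive in the transcript. For reference, the commonly cited form of the Normalized Google Distance is sketched below. The symbols are notation assumed here, not taken from the slides: f(x) is the number of Web pages containing x, f(x, y) the number containing both x and y, and N the total number of indexed pages.

```latex
\mathrm{NGD}(x, y) =
  \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}
       {\log N - \min\{\log f(x), \log f(y)\}}
```

Two words that always appear together get a distance near 0; words that rarely share a page get a large distance.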

  9. Tag Concurrence Distance • Image Tag Concurrence Distance (Qi, Hua, et al. ACMMM07) • Reflects how frequently two tags occur in the same images • Based on the same idea as NGD • Mostly sparse (> 95% of the similarity matrix entries are zero) • Pros and Cons • Pros: Images are taken into account • Cons: Tags are sparse, so visual concurrency is not well reflected; training data is difficult to get • Figures: similarity matrix for 500 tags; similarity matrix for 50 tags
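The tag concurrence distance above can be illustrated with a minimal sketch. It assumes the NGD-style formula is applied directly to tag counts; the function name `tag_concurrence_distance` and the toy data are hypothetical, and the cited paper may normalize differently.

```python
import math

def tag_concurrence_distance(images, tag_a, tag_b):
    """NGD-style distance computed from tag co-occurrence counts.

    images: list of tag sets, one per image (hypothetical input format).
    """
    n = len(images)
    count_a = sum(1 for tags in images if tag_a in tags)
    count_b = sum(1 for tags in images if tag_b in tags)
    count_ab = sum(1 for tags in images if tag_a in tags and tag_b in tags)
    if count_a == 0 or count_b == 0 or count_ab == 0:
        # The tags never co-occur: this is the sparse case the slide mentions.
        return math.inf
    log_a, log_b, log_ab = math.log(count_a), math.log(count_b), math.log(count_ab)
    denominator = math.log(n) - min(log_a, log_b)
    if denominator == 0:
        # Both tags appear on every image, so they are indistinguishable.
        return 0.0
    return (max(log_a, log_b) - log_ab) / denominator

# Toy collection of three tagged images.
images = [
    {"bear", "fur", "grass", "tree"},
    {"polar bear", "water", "sea"},
    {"polar bear", "fighting", "usa"},
]
print(tag_concurrence_distance(images, "polar bear", "water"))
print(tag_concurrence_distance(images, "bear", "water"))
```

Most tag pairs in a real collection fall into the "never co-occur" branch, which is why the similarity matrix is mostly empty.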

  10. Different Concept Relationships • Synonymy (different words but the same meaning): table tennis / ping-pong • Visually Similar (similar things or things of same type): horse / donkey • Meronymy (part and the whole): car / wheel • Concurrency (exist at the same scene/place): airplane / airport

  11. Concept Distance • Mine from ontology: WordNet distance is good, but coverage is too low • Mine from text documents: Google distance's coverage is very high, but it is for the text domain • Mine from image tags: Image tag concurrence distance implicitly uses image information, but tags are too sparse

  12. Can we mine concept distance from image content?

  13. Some Facts • Semantic concept distance is based on human cognition • 80% of human cognition comes from visual information • There are around 2.8 billion photos on Flickr (by Sep 08) • On average, each Flickr image has around 8 tags • Goal: To mine concept distance from a large tagged image collection based on image content • Example image tags: (bear, fur, grass, tree); (polar bear, water, sea); (polar bear, fighting, usa)

  14. Overview of Flickr Distance • Concept A: Airplane → Concept Model A • Concept B: Airport → Concept Model B • Concept Model A, Concept Model B → Flickr Distance (A, B)

  15. Flickr Distance is able to cover the four different semantic relationships: Synonymy, Visually Similar, Meronymy, and Concurrency

  16. What We Need • R1: A Good Image Collection • Large • High coverage, especially on daily life • With tags

  17. What We Need • R2: A Good Concept Representation or Model • Based on image content • Can cover wider concept relationships • Can handle large-concept set • Concept Models: Discriminative, Generative • SVM, Boosting, … • Global Feature • Local Feature • w/o Spatial Relation: Bag-of-Words (pLSA, LDA), … • w/ Spatial Relation: 2D HMM, MRF, …

  18. What We Need • VLM – Visual Language Model • Spatial-relation sensitive • Efficient • Can handle object variations • Concept Models: Discriminative, Generative • SVM, Boosting, … • Global Feature • Local Feature • w/o Spatial Relation: Bag-of-Words, … • w/ Spatial Relation: 2D HMM, MRF, …

  19. Statistical Language Model • Example sentence: "I am talking about statistical language model."
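As background for the statistical language model this slide introduces, here is a minimal sketch of a trigram model estimated by simple counting. The function name `train_trigram_model`, the padding tokens, and the second toy sentence are assumptions made for illustration; no smoothing is applied.

```python
from collections import Counter, defaultdict

def train_trigram_model(sentences):
    """Estimate P(w3 | w1, w2) by maximum likelihood from raw counts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad so the first real word also has a two-word context.
        words = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            counts[(w1, w2)][w3] += 1
    return {
        context: {word: c / sum(following.values()) for word, c in following.items()}
        for context, following in counts.items()
    }

model = train_trigram_model([
    "I am talking about statistical language model",
    "I am talking about images",
])
print(model[("talking", "about")])
```

The model is just a table of conditional distributions, one per two-word context; a visual language model replaces words with visual words and the 1-D context with neighboring patches.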

  20. Visual Language Model (VLM) • Visual Word Generation • Image → Patch → Gradient Texture Histogram → Hashing → Visual Word
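The pipeline on this slide (image, patch, gradient texture histogram, hashing, visual word) might look roughly like the sketch below. The slides do not specify the histogram or the hash function, so everything here is an assumption: forward-difference gradients, an 8-bin orientation histogram, and a one-bit-per-bin threshold hash. It does not attempt to reproduce the efficiency or invariance of the authors' method.

```python
import math

def patch_histogram(image, top, left, size, bins=8):
    """Gradient-orientation histogram of one square patch, weighted by magnitude."""
    hist = [0.0] * bins
    for y in range(top, top + size):
        for x in range(left, left + size):
            # Forward differences, clamped at the patch border.
            x_next = min(x + 1, left + size - 1)
            y_next = min(y + 1, top + size - 1)
            gx = image[y][x_next] - image[y][x]
            gy = image[y_next][x] - image[y][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    return hist

def hash_histogram(hist):
    """Assumed hash: one bit per bin, set when the bin exceeds the mean."""
    mean = sum(hist) / len(hist)
    code = 0
    for value in hist:
        code = (code << 1) | (1 if value > mean else 0)
    return code

def image_to_visual_words(image, size=4, bins=8):
    """Map a grayscale image (list of rows) to a 2-D grid of visual word ids."""
    rows, cols = len(image) // size, len(image[0]) // size
    return [
        [hash_histogram(patch_histogram(image, r * size, c * size, size, bins))
         for c in range(cols)]
        for r in range(rows)
    ]

# An 8x8 image whose brightness increases from left to right.
gradient_image = [[x * 10 for x in range(8)] for _ in range(8)]
print(image_to_visual_words(gradient_image))
```

The output keeps the 2-D layout of the patches, which is what lets a visual language model condition each visual word on its spatial neighbors.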

  21. Performance of VLM • Comparison on Image Categorization • Caltech 8 categories / 5097 images

  22. Latent-Topic VLM (1) • Why Latent-Topic • Latent-Topic VLM • Visual variations of concept are taken as latent topics

  23. Latent-Topic VLM (2) • Latent-Topic VLM Training • Solved by the EM algorithm • The objective is to maximize the joint distribution of a concept and its visual word arrangement Aw • E-step: Estimate the posteriors of the hidden topics • M-step: Maximize the likelihood of the visual arrangement
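To make the two alternating EM steps concrete, here is a generic sketch of EM for a mixture of multinomials over visual-word counts. This is a stand-in chosen for brevity, not the authors' LT-VLM objective: it ignores spatial arrangement, and the function name, the deterministic initialization, and the small smoothing constants are all assumptions.

```python
import math

def em_multinomial_mixture(docs, vocab_size, topics=2, iters=20):
    """docs: list of count vectors (one per image). Returns weights, word_probs, history."""
    weights = [1.0 / topics] * topics
    word_probs = []
    for k in range(topics):
        # Deterministic, asymmetric start so the topics can separate.
        row = [1.0 + ((k + w) % topics) for w in range(vocab_size)]
        total = sum(row)
        word_probs.append([v / total for v in row])
    history = []
    for _ in range(iters):
        # E-step: posterior of each hidden topic given each image's counts.
        posteriors, log_likelihood = [], 0.0
        for counts in docs:
            log_joint = [
                math.log(weights[k])
                + sum(c * math.log(word_probs[k][w]) for w, c in enumerate(counts))
                for k in range(topics)
            ]
            peak = max(log_joint)
            norm = peak + math.log(sum(math.exp(v - peak) for v in log_joint))
            log_likelihood += norm
            posteriors.append([math.exp(v - norm) for v in log_joint])
        history.append(log_likelihood)
        # M-step: re-estimate parameters from posterior-weighted counts.
        for k in range(topics):
            responsibility = sum(p[k] for p in posteriors)
            weights[k] = (responsibility + 1e-6) / (len(docs) + topics * 1e-6)
            row = [
                1e-6 + sum(p[k] * counts[w] for p, counts in zip(posteriors, docs))
                for w in range(vocab_size)
            ]
            total = sum(row)
            word_probs[k] = [v / total for v in row]
    return weights, word_probs, history

# Four toy images over a vocabulary of four visual words, in two visual styles.
docs = [[4, 0, 4, 0], [5, 0, 5, 0], [0, 4, 0, 4], [0, 5, 0, 5]]
weights, word_probs, history = em_multinomial_mixture(docs, vocab_size=4)
print(weights)
print(word_probs)
```

Each latent topic ends up modeling one visual variation of the concept, which is the role the slides assign to latent topics.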

  24. Performance of LT-VLM • Comparison on Image Categorization • Caltech 8 categories / 5097 images

  25. Flickr Distance • Kullback–Leibler (KL) divergence • Good, but not symmetric • Jensen–Shannon (JS) divergence • Better, as it is symmetric • And the square root of JS divergence is a metric, so is Flickr Distance • Formulas shown on the slide: topic distance (KL), topic distance (JS), concept distance
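The divergences named on this slide can be sketched as follows for two discrete distributions given as lists of probabilities. The function names are illustrative, and the way the authors aggregate topic distances into a concept distance is not recoverable from the transcript, so it is not shown.

```python
import math

def kl_divergence(p, q):
    """KL(p || q); infinite when q assigns zero mass where p does not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and always finite."""
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)

def js_distance(p, q):
    """Square root of the JS divergence, which satisfies the metric axioms."""
    return math.sqrt(max(0.0, js_divergence(p, q)))

p = [0.9, 0.1]
q = [0.5, 0.5]
print(kl_divergence(p, q), kl_divergence(q, p))  # the two directions differ
print(js_divergence(p, q), js_divergence(q, p))  # the two directions agree
print(js_distance(p, q))
```

Symmetry matters here because a distance between concept A and concept B should not depend on which one is listed first.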

  26. Procedure of Flickr Distance • Concept A: Airplane, Concept B: Airport → Tag search in Flickr → LT-VLM → Concept Model A, Concept Model B → Jensen-Shannon Divergence → Flickr Distance (A, B)

  27. Experiments • Evaluation • Objective evaluation • Subjective evaluation • Applications • Concept clustering • Image annotation • Tag recommendation

  28. Experiments - Configurations • Images • 6,400,000 from Flickr • Concepts • 130,000,000 different tags • 10,000,000 filtered tags • 1,000 randomly-selected tags • Comparison • Normalized Google Distance (NGD) • Tag Concurrence Distance (TCD) • Flickr Distance (FD)

  29. Eva1: Subjective Evaluation • Ground-Truth • 12 persons are asked to score semantic correlation of each concept pair • Average scores are taken as ground-truth • Evaluate Accuracy of “Relative Distance Pairs” • Step 1: Find all distance pairs D(a,b) and D(c,d) • Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth

  30. Eva2: Objective Evaluation • Ground-Truth • WordNet Distance • Only 497 concepts (overlap of WordNet and the 1000 concepts) • Evaluate Accuracy of “Relative Distance Pairs” • Step 1: Find all distance pairs D(a,b) and D(c,d) • Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth
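The "relative distance pairs" accuracy used in both evaluations can be sketched as below. The sketch assumes that both the measured values and the ground truth are distances keyed by concept pair (a similarity-style ground truth would need its order flipped first) and that ties in the ground truth are skipped. The function name and the numbers are hypothetical.

```python
from itertools import combinations

def relative_pair_accuracy(predicted, ground_truth):
    """Fraction of distance pairs D(a,b), D(c,d) ordered the same way as the ground truth.

    predicted, ground_truth: dicts mapping a concept pair to a distance.
    """
    pairs = sorted(set(predicted) & set(ground_truth))
    consistent = total = 0
    # Step 1: enumerate all pairs of distances D(a,b) and D(c,d).
    for first, second in combinations(pairs, 2):
        truth_gap = ground_truth[first] - ground_truth[second]
        if truth_gap == 0:
            continue  # a tie in the ground truth defines no order
        # Step 2: check whether the predicted order matches the ground truth.
        predicted_gap = predicted[first] - predicted[second]
        total += 1
        if predicted_gap * truth_gap > 0:
            consistent += 1
    return consistent / total if total else 0.0

ground_truth = {("airplane", "airport"): 1.0, ("car", "wheel"): 2.0, ("horse", "tiger"): 3.0}
predicted = {("airplane", "airport"): 0.1, ("car", "wheel"): 0.5, ("horse", "tiger"): 0.3}
print(relative_pair_accuracy(predicted, ground_truth))
```

Comparing orderings rather than raw values lets distances on different scales (NGD, TCD, FD) be judged against the same ground truth.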

  31. App1: Concept Clustering • Concept Clustering • 23 concepts; • 3 groups – (1) outer space, (2) animal and (3) sports

  32. App2: Image Annotation • Based on an approach using concept relation • Dual Cross-Media Relevance Model (DCMRM, J. Liu et al. ACMMM 2007) • On 79 concepts / 79,000 images • Metric: the number of correctly annotated keywords among the first N words

  33. App3: Tag Recommendation • To Improve Tagging Quality • Eliminating tag incompleteness, noise, and ambiguity • 500 images / 10 recommended tags per image • Metric: Precision @ 10
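Precision @ 10, the metric named on this slide, is the fraction of the top 10 recommended tags that are judged relevant. A minimal sketch follows; the function name and the example tags are made up for illustration and are not results from the talk.

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the first k recommended tags that appear in the relevant set."""
    top = recommended[:k]
    if not top:
        return 0.0
    hits = sum(1 for tag in top if tag in relevant)
    return hits / k

recommended = ["polar bear", "water", "sea", "ice", "snow",
               "usa", "fur", "grass", "tree", "zoo"]
relevant = {"polar bear", "sea", "ice", "snow"}
print(precision_at_k(recommended, relevant))
```

With 10 recommended tags per image, the per-image scores would then be averaged over the 500 test images.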

  34. Discussion • Why can VLM divergence estimate concept distance? • Why does FD work well even when tags are not complete? • VLM: distribution of trigrams • Computer: room patterns, computer patterns, other patterns • TV: room patterns, TV patterns, other patterns • Office: room patterns, screen patterns, other patterns

  35. If we find similar patterns in the images associated with different concepts, the corresponding concept relationships can be discovered. • Example: Computer and Office

  36. Summary • Flickr Distance • A novel approach to discover semantic relationships from image content • Based on real-life images from the Web • Based on collective intelligence from grassroots • A distance more consistent with human perception • A measurement more effective in many applications

  37. Future Work Flickr Distance as a Service.

  38. Thank You

  39. Backup

  40. TagNet • TagNet – Visual Concept Net • Can be used in many applications • Knowledge representation • Concept learning • Multimedia retrieval • ...

  41. TagNet • Visualization • The bigger the distance, the longer the edge • Using a tool called NetDraw provided by International Network for Social Network Analysis

  42. Outline • Motivation • Overview • Visual Language Model • Flickr Distance Calculation • Evaluations and Applications

  43. Semantic Relationship Is Important • Many efforts on using semantic relationships • GJ Qi et al. Correlative Multi-Label Video Annotation. ACM MM 2007. • R. Datta et al. Image Retrieval: Ideas, Influences and the Trends of the New Age. ACM Computing Surveys, 2008. • L. Leslie et al. Annotation of Paintings with High-Level Semantic Concepts Using Transductive Inference and Ontology-based Concept Disambiguation. ACM MM 2007. • J. Yu et al. Semantic Subspace Projection and Its Application in Image Retrieval. IEEE T CSVT 2008. • Applications of semantic relationships • Natural language processing • Object detection • Concept detection • Multimedia retrieval

  44. Discussion • Why can VLM divergence estimate concept distance? • Why does FD work well even when tags are not complete? • Flickr Distance is able to cover the four different semantic relationships: Synonymy, Visually Similar, Meronymy, and Concurrency • VLM: distribution of trigrams • Computer: room patterns, computer patterns, other patterns • TV: room patterns, TV patterns, other patterns • Office: room patterns, screen patterns, other patterns

  45. Text vs. Image • Text: Word • Grammar • 1-dim dependence • Statistical Language Model • Image: Visual word • Visual grammar • 2-dim dependence • Visual Language Model

  46. Visual Word Generation • Typical methods • SIFT + Clustering/PCA • Our method • Patch + Texture Direction Histogram + Hashing • Efficient, low-dimensional, and rotation-invariant • Needs only 1/20 of the computation of the SIFT feature • Image → Patch → Gradient Texture Histogram → Hashing → Visual Word

  47. Performance of VLM • Comparison on Image Categorization • Caltech 8 categories / 5097 images (L. Wu, et al. MIR 2007/T-MM 2008)
