
Combining Text and Image Queries at ImageCLEF2005: A Corpus-Based Relevance-Feedback Approach



  1. Combining Text and Image Queries at ImageCLEF2005: A Corpus-Based Relevance-Feedback Approach
Hsin-Hsi Chen, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Wen-Cheng Lin, Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan
Yih-Cheng Chang, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
ImageCLEF 2005

  2. Why Combine Text and Image Queries in Cross-Language Image Retrieval?
• Text-based image retrieval
  – Translation errors in cross-language image retrieval
  – Annotation errors in automatic annotation
  – Easy to capture semantic meanings
  – Easy to construct a textual query
• Content-based image retrieval (CBIR)
  – Semantic meanings are hard to represent
  – Example images have to be found or drawn
  – Avoids translation in cross-language image retrieval
  – Annotation is not necessary

  3. How to Combine Text and Image Features in Cross-Language Image Retrieval?
• Parallel approach: conduct text-based and content-based retrieval separately and merge the retrieval results (a minimal merging sketch follows this list)
• Pipeline approach: use textual or visual information to perform an initial retrieval, then employ the other feature to filter out irrelevant images
• Transformation-based approach: mine the relations between images and text, and employ the mined relations to transform textual information into visual information, and vice versa
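To make the parallel approach concrete, the sketch below fuses the two ranked lists with a weighted sum of min-max normalized scores. The normalization step and the weight `alpha` are our illustrative assumptions, not the merging formula actually used in the runs.

```python
def merge_results(text_scores, visual_scores, alpha=0.5):
    """Fuse text-based and content-based retrieval scores (parallel approach).

    text_scores / visual_scores: dicts mapping image id -> retrieval score.
    alpha weights the text run; both the min-max normalization and the
    linear combination are illustrative assumptions.
    """
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero on constant scores
        return {doc: (s - lo) / span for doc, s in scores.items()}

    t, v = normalize(text_scores), normalize(visual_scores)
    fused = {doc: alpha * t.get(doc, 0.0) + (1 - alpha) * v.get(doc, 0.0)
             for doc in set(t) | set(v)}
    # Return image ids ranked by fused score, best first.
    return sorted(fused, key=fused.get, reverse=True)
```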

  4. Approach at ImageCLEF 2004
• Automatically transform textual queries into visual representations (a transformation sketch follows this list)
• Mine the relationships between text and images
  – Divide an image into several smaller parts
  – Link the words in the caption to the corresponding parts
  – Analogous to word alignment in a sentence-aligned parallel corpus
• Build a transmedia dictionary
• Transform a textual query into a visual one using the transmedia dictionary
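The transformation step can be pictured as a dictionary lookup followed by pooling. In the hypothetical sketch below, `transmedia_dict` maps a word to the visual feature vector of its aligned image blocks, and mean pooling builds the visual query; both the data layout and the pooling choice are our assumptions, since the slide only names the transmedia dictionary.

```python
import numpy as np

def text_to_visual_query(query_terms, transmedia_dict):
    """Transform a textual query into a visual one via a transmedia dictionary.

    transmedia_dict: word -> visual feature vector (hypothetical layout);
    mean pooling of the looked-up vectors is an assumed design choice.
    """
    vectors = [transmedia_dict[w] for w in query_terms if w in transmedia_dict]
    if not vectors:
        return None  # no query word has a learned visual counterpart
    return np.mean(vectors, axis=0)
```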

  5. System at ImageCLEF 2004
[System architecture diagram: text-image correlation learning on the training collection (images and image captions) produces a transmedia dictionary; the source-language textual query undergoes query translation (using language resources) into a target-language textual query, and query transformation (using the transmedia dictionary) into a visual query; text-based image retrieval runs against the textual index and content-based image retrieval against the visual index of the target collection; result merging yields the retrieved images.]

  6. Learning Correlation
[Example figure: the image captioned "Mare and foal in field, slopes of Clatto Hill, Fife" is segmented into blocks B01–B04, and the caption words (hill, mare, foal, field, slope) are linked to the corresponding blocks.]

  7. Text-Based Image Retrieval at ImageCLEF 2004
• Using similarity-based backward transliteration improves performance (69.71%)
[Results table omitted.]

  8. Cross-Language Experiments at ImageCLEF 2004
[Results table omitted. Annotations on the table: +0.46%, an insignificant performance increase; the visual run was poor.]

  9. Analyses of These Approaches
• Parallel approach and pipeline approach
  – Simple and useful
  – Do not exploit the relations between visual and textual features
• Transformation-based approach
  – Textual and visual queries can be translated into each other using the relations between visual and textual features
  – It is hard to learn all relations between all visual and textual features
  – The degree of ambiguity of the relations is usually high

  10. Our Approach at ImageCLEF 2005: A Corpus-Based Relevance-Feedback Method
• Initiate a content-based retrieval
• Treat the retrieved images and their text descriptions as aligned documents
• Adopt a corpus-based method to select key terms from the text descriptions and generate a new query (see the term-selection sketch after this list)
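A minimal sketch of the feedback cycle, assuming tf-idf term selection; the slide only says a corpus-based method selects key terms, so the scoring function here is our guess.

```python
import math
from collections import Counter

def feedback_text_query(cbir_ranking, descriptions, doc_freq, n_docs,
                        top_k=20, n_terms=10):
    """Select key terms from the descriptions of the top-k CBIR results.

    cbir_ranking: image ids from the initial content-based retrieval.
    descriptions: image id -> English text description.
    doc_freq / n_docs: document frequencies over the target collection.
    """
    tf = Counter()
    for image_id in cbir_ranking[:top_k]:
        tf.update(descriptions[image_id].lower().split())

    def tfidf(term):
        return tf[term] * math.log(n_docs / (1 + doc_freq.get(term, 0)))

    # The highest-scoring terms form the new textual query.
    return sorted(tf, key=tfidf, reverse=True)[:n_terms]
```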

  11. Fundamental Concepts of a Corpus-Based Relevance-Feedback Approach

  12. [Figure: the topic "Aircraft on the ground" and retrieval results from the VIPER system.]

  13. Bilingual Ad hoc Retrieval Task
• 28,133 photographs from St. Andrews University Library's photographic collection
• The collection is in English, and the queries are in different languages; in our experiments, the queries are in Chinese
• All images are accompanied by a textual description written in English by librarians working at St. Andrews Library
• The test set contains 28 topics; each topic has a text description and an example image

  14. An Example – An Image and Its Description

  15. An Example – A Topic in Chinese
[Figure: the topic shown with its English title and its Chinese title.]

  16. Some Models in Formal Runs

  17. Experiment Results at ImageCLEF 2005
[Results table omitted; annotated gains: +15.78%, +25.96%, +11.01%.]
• Performance of EE+EX > CE+EX → EE > EX > CE > visual run

  18. Lessons Learned
• Combining textual and visual information can improve performance
• Compared to the initial visual retrieval, average precision increases from 8.29% to 34.25% after the feedback cycle

  19. Example: Aircraft on the Ground ( )
• Text only (monolingual)
• Text only (cross-lingual)
• The top 2 images in the cross-lingual run are non-relevant because of query translation problems: clear ( ), above ( ), floor ( )

  20. Example: Aircraft on the Ground (after integration)
• Text (monolingual) + Visual
• The Text+Visual run is better than the monolingual run because it expands some useful words, e.g., aeroplane, military air base, airfield

  21. ImageCLEF 2004 vs. ImageCLEF 2005
• Text-based IR (monolingual case): 0.6304 (2004) vs. 0.3952 (2005)
  – This year's topics are a little harder
• Text+Image IR (monolingual case): 0.6591 (2004) vs. 0.5053 (2005)
• Text+Image IR (cross-lingual case): 0.4441 (2004) vs. 0.3977 (2005)
  – As a fraction of monolingual text-based IR: 70.45% (2004) vs. 100.63% (2005)

  22. Automatic Annotation Task
• The automatic annotation task at ImageCLEF 2005 can be seen as a classification task, since each image is annotated with exactly one word (i.e., a category)
• We propose several methods to measure the similarity between a test image and a category; a test image is classified into the most similar category
• The proposed methods use the same image features but different classification approaches

  23. Image Feature Extraction
• Resize each image to 256 x 256 pixels
• Segment each image into a 32 x 32 grid of blocks (each block is 8 x 8 pixels)
• Compute the average gray value of each block to construct a vector with 1,024 elements
• The similarity between two images is measured by the cosine formula (a code sketch follows this list)
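These steps translate almost line for line into code. A minimal sketch using Pillow and NumPy (the library choice is ours; the slide only specifies the arithmetic):

```python
import numpy as np
from PIL import Image

def gray_block_vector(path):
    """256x256 grayscale image -> 1,024-dim vector of 8x8 block means."""
    img = Image.open(path).convert("L").resize((256, 256))
    pixels = np.asarray(img, dtype=np.float64)
    # Reshape into a 32x32 grid of 8x8 blocks and average each block.
    blocks = pixels.reshape(32, 8, 32, 8).mean(axis=(1, 3))
    return blocks.ravel()  # shape (1024,)

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
```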

  24. Some Models and Experimental Results
• NTU-annotate05-1NN: the baseline model; it uses the 1-NN method to classify each image (see the sketch after this list)
• NTU-annotate05-Top2: computes the similarity between a test image and a category using the top 2 nearest images in that category, and classifies the test image into the most similar category
• NTU-annotate05-SC: the training data is clustered using the k-means algorithm (k = 1000); we compute the centroid of each category in each cluster and classify a test image into the category of the nearest centroid
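Sketches of the first two classifiers, reusing the `gray_block_vector` features and `cosine` helper from the previous block; how Top2 aggregates its two similarities (here, their mean) is an assumption the slide leaves open.

```python
import numpy as np

def classify_1nn(test_vec, train_vecs, train_labels):
    """NTU-annotate05-1NN: label of the single most similar training image."""
    sims = [cosine(test_vec, v) for v in train_vecs]
    return train_labels[int(np.argmax(sims))]

def classify_top2(test_vec, train_vecs, train_labels):
    """NTU-annotate05-Top2: score each category by its 2 nearest images."""
    per_category = {}
    for vec, label in zip(train_vecs, train_labels):
        per_category.setdefault(label, []).append(cosine(test_vec, vec))
    scores = {label: float(np.mean(sorted(sims, reverse=True)[:2]))
              for label, sims in per_category.items()}
    # Classify into the most similar category.
    return max(scores, key=scores.get)
```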

  25. Conclusion: Bilingual Ad hoc Retrieval Task
• An approach combining textual and image features is proposed for Chinese-English image retrieval → a corpus-based feedback cycle starting from CBIR
• Compared with the performance of monolingual IR (0.3952), integrating visual and textual queries achieves better performance in cross-language image retrieval (0.3977) → resolves part of the translation errors
• The integration of visual and textual queries also improves monolingual IR from 0.3952 to 0.5053 → provides more information
• The improvement is the best among all groups → 78.2% of the best monolingual text retrieval

  26. Conclusion: Automatic Annotation Task
• A feature extraction algorithm is proposed, and several classification approaches are explored on the same image features
• The 1-NN and top-2 approaches, both with an error rate of 21.7%, outperform the centroid-based approach (error rate 22.5%)
• Our method is 9% worse than the best-performing group (error rate 12.6%) but better than most of the groups in this task

  27. Thank You and Comments
