

  1. CSM06 Information Retrieval Lecture 7: Image Retrieval Dr Andrew Salway a.salway@surrey.ac.uk

  2. Recap… • So far we have concentrated on text analysis techniques and indexing-retrieval of written documents • The indexing-retrieval of visual information (image and video data) presents a new set of challenges – especially for understanding the content of images and videos…

  3. Lecture 7: OVERVIEW • Different kinds of metadata for indexing-retrieving images (these also apply to videos) • The “sensory gap” and the “semantic gap”, and why these pose problems for image/video indexing-retrieval • Three approaches to the indexing-retrieval of images: • Manual indexing, e.g. CORBIS, Tate • Content-based Image Retrieval (visual similarity; query-by-example), e.g. QBIC and BlobWorld • Automatic selection of keywords from text related to images, e.g. WebSEEK, Google, AltaVista

  4. Different kinds of images • Photographs: holiday albums, news archives, criminal investigations • Fine art and museum artefacts • Medical images: x-rays, scans • Meteorological / Satellite Images As with written documents, each image in an image collection needs to be indexed before it can be retrieved...

  5. Image Description Exercise Imagine you are the indexer of an image collection… 1) List all the words you can think of that describe the following image, so that it could be retrieved by as many users as possible who might be interested in it. Your words do NOT need to be factually correct, but they should show the range of things that could be said about the image 2) Put your words into groups so that each group of words says the same sort of thing about the image 3) Which words (metadata) do you think a machine could extract from the image automatically?

  6. Words to index the image…

  7. Metadata for Images • “A picture is worth a thousand words…” • The words that can be used to index an image relate to different aspects of it • We need to label different kinds of metadata for images • to structure how we store / process metadata • some kinds of metadata will require more human input than others

  8. Metadata for Images • Del Bimbo (1999): • content-independent; • content-dependent; • content-descriptive. • Shatford (1986): (in effect refines ‘content descriptive’) • pre-iconographic; • iconographic; • iconological.

  9. Metadata for Images (Del Bimbo 1999) • Content-independent: data which is not directly concerned with image content, and could not necessarily be extracted from it, e.g. artist name, date, ownership • Content-dependent: perceptual facts to do with colour, texture, shape; can be automatically (and therefore objectively) extracted from image data • Content-descriptive: entities, actions, relationships between them as well as meanings conveyed by the image; more subjective and much harder to extract automatically

  10. Three levels of visual content • Based on Panofsky (1939); adapted by Shatford (1986) for indexing visual information. In effect refines ‘content descriptive’. • Pre-iconographic: generic who, what, where, when • Iconographic: specific who, what, where, when • Iconological: abstract “aboutness”
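
To make these distinctions concrete, here is a minimal sketch in Python (not from the lecture; the field names and example values are purely illustrative) of how Del Bimbo’s three kinds of metadata and Shatford’s three levels might be stored for a single image:

```python
# A hypothetical metadata record combining Del Bimbo's three kinds of image
# metadata with Shatford's three levels of content description.
image_metadata = {
    # Content-independent: facts about the image, not derivable from its pixels
    "content_independent": {
        "artist": "Leonardo da Vinci",
        "date": "c. 1503",
        "owner": "Musee du Louvre",
    },
    # Content-dependent: perceptual features computable from the pixel data
    "content_dependent": {
        "dominant_colour": (96, 72, 40),        # an RGB triple
        "texture": "smooth",
        "shape_descriptor": [0.82, 0.11, 0.07],
    },
    # Content-descriptive: entities, actions and meanings; refined by
    # Shatford's three levels
    "content_descriptive": {
        "pre_iconographic": ["woman", "smiling", "landscape"],  # generic who/what/where
        "iconographic": ["Mona Lisa", "Lisa del Giocondo"],     # specific who/what/where
        "iconological": ["mystery", "serenity"],                # abstract "aboutness"
    },
}
```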

  11. The Sensory Gap “The sensory gap is the gap between the object in the world and the information in a (computational) description derived from a recording of that scene” (Smeulders et al 2000)

  12. The Semantic Gap “The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation” (Smeulders et al 2000)

  13. What visual properties do these images of tomatoes all have in common? (See the Set Reading for Lecture 7)

  14. What is this? A tomato? A setting sun? A clown’s nose? …

  15. “democracy” SEMANTIC GAP

  16. DISCUSSION • What is the impact of the sensory gap, and the semantic gap, on image retrieval systems?

  17. Three Approaches to Image Indexing-Retrieval • Index by manually attaching keywords to images – query by keywords • Index by automatically extracting visual features from images – query by visual example • Index by automatically extracting keywords from text already connected to images – query by keywords

  18. 1. Manual Image Indexing • Images can be manually annotated with rich keyword-based descriptions of their content • May use a controlled vocabulary and consensus decisions to minimise subjectivity and ambiguity • Cost can be prohibitive

  19. Example Systems • Examples of manually annotated image libraries: http://www.tate.org.uk/servlet/SubjectSearch (Art gallery) www.corbis.com (Commercial) • Examples of controlled indexing schemes, see: • www.iconclass.nl (Iconclass developed as an extensive decimal classification scheme for the content of paintings) • http://www.getty.edu/research/conducting_research/vocabularies/aat/ (Art and Architecture Thesaurus) • http://www.sti.nasa.gov/products.html#pubtools (NASA thesaurus for space / science images)

  20. 2. Indexing-Retrieval based on Visual Features • Also known as “Content-based Image Retrieval”; cf. del Bimbo’s content-dependent metadata • To query: • draw coloured regions (sketch-based query); • or choose an example image (query by example) • Images with similar visual features are retrieved (not necessarily similar ‘semantic content’)

  21. Indexing-Retrieval based on Visual Features • Visual Features • Colour • Texture • Shape • Spatial Relations • These features can be computed directly from image data – they characterise the pixel distribution in different ways • Different features may help retrieve different kinds of images
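
As a rough illustration of how one such feature supports query-by-example (a sketch only; production systems such as QBIC use far richer features and indexing structures), the following Python computes a coarse RGB colour histogram per image and ranks a collection by histogram intersection with a query image. The Pillow library and the file names are assumptions for the example:

```python
from PIL import Image  # assumes the Pillow library is installed

def colour_histogram(path, bins=4):
    """Coarse RGB histogram: each channel quantised into `bins` buckets."""
    img = Image.open(path).convert("RGB").resize((64, 64))
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in img.getdata():
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
    total = sum(hist)
    return [h / total for h in hist]  # normalise so any two images compare

def intersection(h1, h2):
    """Histogram intersection: 1.0 means identical colour distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Query by example: rank the collection by colour similarity to the query.
collection = ["sunset.jpg", "tomato.jpg", "clown.jpg"]  # hypothetical files
query = colour_histogram("query.jpg")
ranked = sorted(collection, key=lambda p: -intersection(query, colour_histogram(p)))
```

Note that the top-ranked images merely share a pixel colour distribution with the query, which is exactly why a tomato, a setting sun and a clown’s nose may all be returned together: visual similarity, not semantic similarity.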

  22. What images would this query return?

  23. Example Systems • QBIC (Query By Image Content), developed by IBM and used by, among others, the Hermitage Art Museum http://wwwqbic.almaden.ibm.com/ • Blobworld - developed by researchers at the University of California http://elib.cs.berkeley.edu/photos/blobworld/start.html

  24. 3. Extracting keywords from text already associated with images… “One way to resolve the semantic gap comes from sources outside the image by integrating other sources of information about the image in the query. Information about an image can come from a number of different sources: the image content, labels attached to the image, images embedded in a text, and so on.” (Smeulders et al 2000).

  25. Extracting keywords from text already associated with images… • Images are often accompanied by, or associated with, collateral text, e.g. the caption of a photograph in a newspaper, the caption of a painting in an art gallery… • And, on the Web, the text of the hyperlink (HREF) that links to the image • Keywords can be extracted from the collateral text and used to index the image

  26. WebSEEK System • The WebSEEK system processes HTML tags linking to image data files in order to index visual information on the Web • NB. Current web search engines, like Google and AltaVista, appear to be doing something similar

  27. WebSEEK System (Smith and Chang 1997) • Keyword indexing and subject-based classification for WWW-based image retrieval: user can query or browse hierarchy • System trawls Web to find HTML pages with links to images • The HTML text in which the link to an image is embedded is used for indexing and classifying the image or video • >500,000 images and videos indexed with 11,500 terms; 2,128 classes manually created

  28. WebSEEK System (Smith and Chang 1997) • The WebSEEK system processes HTML tags linking to image and video data files in order to index visual information on the Web • The success of this kind of approach depends on how well the keywords in the collateral text relate to the image • Keywords are mapped automatically to subject categories; the categories are created beforehand with human input

  29. WebSEEK System (Smith and Chang 1997) • Term Extraction: terms extracted from URLs, alt tags and hyperlink text, e.g. http://www.mynet.net/animals/domestic-beasts/dog37.jpg • “animals”, “domestic”, “beasts”, “dog” • Terms used to make an inverted index for keyword-based retrieval • Directory names also extracted, e.g. “animals/domestic-beasts”
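
A minimal sketch of this term-extraction step (illustrative only; Smith and Chang 1997 describe the actual WebSEEK pipeline): split the URL path on non-alphabetic characters, drop file-type tokens, and post the surviving terms into an inverted index.

```python
import re
from urllib.parse import urlparse
from collections import defaultdict

IMAGE_EXTENSIONS = {"jpg", "jpeg", "gif", "png"}

def extract_terms(url):
    """Split a URL path into candidate index terms, e.g. the path of
    http://www.mynet.net/animals/domestic-beasts/dog37.jpg yields
    ['animals', 'domestic', 'beasts', 'dog']."""
    path = urlparse(url).path.lower()
    tokens = re.split(r"[^a-z]+", path)
    return [t for t in tokens if t and t not in IMAGE_EXTENSIONS]

# Inverted index for keyword-based retrieval: term -> set of image URLs.
index = defaultdict(set)

def add_image(url):
    for term in extract_terms(url):
        index[term].add(url)

add_image("http://www.mynet.net/animals/domestic-beasts/dog37.jpg")
print(sorted(index))   # ['animals', 'beasts', 'dog', 'domestic']
print(index["dog"])    # the matching image URL(s)
```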

  30. WebSEEK System (Smith and Chang 1997) • Subject Taxonomy: manually created ‘is-a’ hierarchy with key-term mappings to map key-terms automatically to subject classes • Facilitates browsing of the image collection
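
One way such key-term mappings could be represented (a hypothetical sketch, not WebSEEK’s actual data structures): a table from key-terms to subject-class paths, where the path encodes the position of the class in the manually built is-a hierarchy.

```python
# Hypothetical key-term -> subject-class mapping; each class path encodes
# the manually built is-a hierarchy (e.g. dogs ARE pets ARE animals).
TERM_TO_CLASS = {
    "dog":    "animals/pets/dogs",
    "cat":    "animals/pets/cats",
    "beasts": "animals",
}

def classify(terms):
    """Map extracted key-terms to the subject classes they fall under."""
    return {TERM_TO_CLASS[t] for t in terms if t in TERM_TO_CLASS}

print(sorted(classify(["animals", "domestic", "beasts", "dog"])))
# ['animals', 'animals/pets/dogs']
```

An image indexed with the term “dog” is thus filed automatically under animals/pets/dogs, so a user browsing the “animals” node of the hierarchy can drill down to it.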

  31. WebSEEK System (Smith and Chang 1997) • The success of this kind of approach depends on how well the keywords in the collateral text relate to the image • URLs, alt tags and hyperlink text may or may not be informative about the image content; even if informative they tend to be brief – perhaps further kinds of collateral text could be exploited

  32. Image Retrieval in Google • Rather like WebSEEK, Google appears to match keywords in file names and in the ‘alt’ attribute text, e.g. <img src="/images/020900.jpg" width=150 height=180 alt="David Beckham tussles with Emmanuel Petit">
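
A short sketch of how alt-text keywords might be harvested from a page (illustrative only; not how Google actually implements its image index), using the HTML parser in Python’s standard library:

```python
from html.parser import HTMLParser

class AltTextIndexer(HTMLParser):
    """Collect (image src, alt-text keywords) pairs from <img> tags."""
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            if a.get("src") and a.get("alt"):
                self.images.append((a["src"], a["alt"].lower().split()))

parser = AltTextIndexer()
parser.feed('<img src="/images/020900.jpg" width=150 height=180 '
            'alt="David Beckham tussles with Emmanuel Petit">')
print(parser.images)
# [('/images/020900.jpg', ['david', 'beckham', 'tussles', 'with', 'emmanuel', 'petit'])]
```

Feeding it the example tag above yields the keywords david, beckham, tussles, with, emmanuel, petit for the image /images/020900.jpg, which can then be posted to the same kind of inverted index as the URL terms.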

  33. Essential Exercise Image Retrieval Exercise: “The aim of this exercise is for you to understand more about the approaches used by different kinds of systems to index and retrieve digital images.” **DOWNLOAD from module webpage**

  34. Further Reading • A paper about the WebSEEK system: Smith and Chang (1997), “Visually Searching the Web for Content”, IEEE Multimedia, July-September 1997, pp. 12-20. **Available via the library’s eJournal service.** • Different kinds of metadata for images, and an overview of content-based image retrieval: excerpts from del Bimbo (1999), Visual Information Retrieval. **Available in library short-term loan articles.** • For a comprehensive review of CBIR, and discussions of the sensory gap and the semantic gap: Smeulders, A.W.M.; Worring, M.; Santini, S.; Gupta, A.; Jain, R. (2000), “Content-based image retrieval at the end of the early years”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 12, pp. 1349-1380. **Available online through the library’s eJournals.** • Eakins (2002), “Towards Intelligent Image Retrieval”, Pattern Recognition 35, pp. 3-14. • Enser (2000), “Visual Image Retrieval: seeking the alliance of concept-based and content-based paradigms”, Journal of Information Science 26(4), pp. 199-210.

  35. Lecture 7: LEARNING OUTCOMES You should be able to: • Define and give examples of different kinds of metadata for images. • Discuss how different kinds of image metadata are appropriate for different users of image retrieval systems • Explain what is meant by the sensory gap and semantic gap, and discuss how they impact on image retrieval systems • Describe, critique and compare three different approaches to indexing-retrieving images with reference to example systems

  36. Reading ahead for LECTURE 8 If you want to prepare for next week’s lecture then take a look at… • Informedia research project: http://www.informedia.cs.cmu.edu/ • Yanai (2003), “Generic Image Classification Using Visual Knowledge on the Web”, Proceedings of ACM Multimedia 2003. ***Only Section 1 and Section 5 are essential.***
