Image Retrieval

Image Retrieval. CS 663, Ajit Rajwade. Many slides refer to the paper “The Earth Mover’s Distance as a metric for image retrieval”, IJCV 2000, by Rubner, Tomasi and Guibas.

What is image retrieval? • Consider a database of images, each labeled by a set of textual descriptors (e.g., village scene, urban scene, forest, cricket match). • Given a textual query, retrieve all images from the database that have the given label, or a label similar to it (e.g., “cricket” is similar to “test match”).

What is image retrieval? • Now suppose the labels are absent from the database. • Consider applications like the following: given an arbitrary image, return all images from the database that are similar to it. • Do not use textual information, tags, or textual content from the web pages where the image was found (if this is a web application). This is an open problem as of today!

Results of a query on the word “Taj Mahal” in images.google.com. When you click one of the images, you get a grid of some k images that are most similar to the one you selected.

Useful in recommender systems • Given a selected video, return a list of other videos from the web that are similar to it (K nearest neighbors)! • Variant (open research problem!): given a selected video, return a list of videos that are similar to it but still dissimilar from each other (K most-diverse nearest neighbors). • All based on visual content (derived from the full video, or from some selected key-frames).

Image similarity? • Direct distance measures between images (e.g., pixel-wise differences) are not appropriate: they are sensitive to changes in viewpoint, illumination, scale and image dimensions, and to noise. • Various image descriptors are possible, e.g., statistics of color values or statistics of texture descriptors. Compare these descriptors instead!

Color statistics • First choose a color space – RGB, HSV, CIE Lab, etc. Generally, the latter two are preferred! • Statistics: • Moments – mean, variance/covariance, etc. • Histograms of color values (in RGB or Lab space) • Cluster the color values (say, using k-means or mean-shift), and store some K cluster centers. The set of cluster centers is called a signature.

Color histogram • It is a mapping {h_i}, where i is the center of a bin (an interval of feature vectors or color vectors, e.g., 3D vectors in RGB or Lab space). • h_i = number of pixels whose feature vector lies inside the bin centered at i. • Often normalized so that all the h_i sum to 1. • Bins need not have equal widths.

Histogram – 1D: bins represent intervals of the form $[a_i, a_{i+1}]$; the function is of the form $p(x)$.

Histogram – 2D: bins represent intervals of the form $[a_i, a_{i+1}] \times [b_j, b_{j+1}]$; the function is of the form $p(x, y)$.

Histogram in 3D: bins represent intervals of the form $[a_i, a_{i+1}] \times [b_j, b_{j+1}] \times [c_k, c_{k+1}]$; the function is of the form $p(x, y, z)$. Visualizing a 3D histogram is a bit more challenging, since each bin is itself a 3D entity. To visualize it, draw a bubble in each bin. If we are coding colors (e.g., RGB), color the bubble according to its bin: for example, if the bin is (0,100,100), color it cyan. The size of the bubble gives the frequency value for that bin, so bins with higher frequency have larger bubbles! http://rsbweb.nih.gov/ij/plugins/images/3d-inspector.jpg
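A minimal NumPy sketch of a normalized 3D color histogram, assuming the pixels come as an (N, 3) array of RGB values in [0, 255]; the function name `color_histogram` is hypothetical. `numpy.histogramdd` also accepts explicit (possibly unequal) bin edges per axis, which covers the case of bins with different widths.

```python
import numpy as np

def color_histogram(pixels, bins=(8, 8, 8), value_range=((0, 256), (0, 256), (0, 256))):
    """Normalized 3D color histogram {h_i} of an (N, 3) array of color vectors.

    `bins` may also be a list of three edge arrays, so bins need not
    have equal widths (value_range is then not used).
    """
    counts, edges = np.histogramdd(np.asarray(pixels, dtype=float),
                                   bins=bins, range=value_range)
    return counts / counts.sum(), edges

# Four example pixels, 2 bins per channel: edges are [0, 128, 256] on each axis
pixels = np.array([[0, 0, 0], [10, 10, 10], [255, 255, 255], [200, 30, 30]])
h, edges = color_histogram(pixels, bins=(2, 2, 2))
print(h[0, 0, 0], h[1, 1, 1], h[1, 0, 0])  # 0.5 0.25 0.25
```

Each entry of `h` is the fraction of pixels in one 3D bin, i.e. the bubble size in the visualization described above.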

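Color histogram comparison (h and k are the two histograms, with bins indexed by i) • Minkowski distance, a generalized form that subsumes L1, L2, L∞, etc.: $d_{L_r}(h,k) = \left(\sum_i |h_i - k_i|^r\right)^{1/r}$ • Histogram intersection: $d_\cap(h,k) = 1 - \frac{\sum_i \min(h_i, k_i)}{\sum_i k_i}$ • Kullback-Leibler divergence, the average number of extra bits required to code samples from distribution h using an encoder based on k: $d_{KL}(h,k) = \sum_i h_i \log_2 \frac{h_i}{k_i}$ • Jeffrey’s divergence: $d_J(h,k) = \sum_i \left( h_i \log \frac{h_i}{m_i} + k_i \log \frac{k_i}{m_i} \right)$, where $m_i = (h_i + k_i)/2$, i.e., m is the average of the histograms h and k • Chi-squared distance: $d_{\chi^2}(h,k) = \sum_i \frac{(h_i - m_i)^2}{m_i}$, where m is again the average of the histograms h and k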
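The bin-by-bin measures above can be sketched in NumPy as follows, assuming h and k are normalized 1D arrays over the same bins. Terms whose leading factor is zero are treated as contributing 0, KL additionally assumes k_i > 0 wherever h_i > 0, and base-2 logarithms are used so that KL is in bits. All function names are hypothetical.

```python
import numpy as np

def minkowski(h, k, r=1):
    """L_r distance; r=1 gives L1, r=2 gives L2."""
    return float(np.sum(np.abs(h - k) ** r) ** (1.0 / r))

def intersection(h, k):
    """1 - (overlap of h and k) / (total mass of k)."""
    return float(1.0 - np.minimum(h, k).sum() / k.sum())

def _plogq(p, q):
    # sum_i p_i * log2(p_i / q_i), over the bins where p_i > 0
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def kl_divergence(h, k):
    return _plogq(h, k)  # assumes k_i > 0 wherever h_i > 0

def jeffrey(h, k):
    m = (h + k) / 2      # m_i > 0 wherever h_i > 0 or k_i > 0
    return _plogq(h, m) + _plogq(k, m)

def chi_squared(h, k):
    m = (h + k) / 2
    mask = m > 0
    return float(np.sum((h[mask] - m[mask]) ** 2 / m[mask]))

h = np.array([0.5, 0.5, 0.0])
k = np.array([0.25, 0.25, 0.5])
print(minkowski(h, k), intersection(h, k), kl_divergence(h, k), chi_squared(h, k))
# ≈ 1.0 0.5 1.0 0.333
```

Note that KL is not symmetric (here $d_{KL}(k,h)$ would be infinite because $h_3 = 0$), which is one motivation for the symmetric Jeffrey divergence.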

Problems! • If the bin width is too large, the histogram contains insufficient information. If it is too small, the histogram is very sensitive to noise, and entries move in and out of bins arbitrarily. • Bin-by-bin distance measures compare only those entries indexed by the same bin. Cross-bin relationships are ignored. • Such distances are often not intuitive! (Figure: two pairs of histograms, with L1 dist. = 2 and L1 dist. = 1.)
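A tiny illustration of the cross-bin problem, using made-up one-hot histograms: moving all the mass by one bin or by four bins gives exactly the same L1 distance, although the first shift is perceptually much smaller.

```python
import numpy as np

def l1(a, b):
    """Bin-by-bin L1 distance: only entries with the same index are compared."""
    return float(np.abs(a - b).sum())

h = np.array([1.0, 0, 0, 0, 0])
near = np.roll(h, 1)  # all mass moved to the neighbouring bin
far = np.roll(h, 4)   # all mass moved to the farthest bin
print(l1(h, near), l1(h, far))  # 2.0 2.0
```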

There are lies, then there are damned lies, and then there are statistics – Mark Twain.

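Cross-bin distance: Quadratic-form distance $d_A(h,k) = \sqrt{(h-k)^T A (h-k)}$, where the entry $a_{ij}$ of the matrix A is the similarity between bins i and j (in Rubner et al., $a_{ij} = 1 - d_{ij}/d_{\max}$, with $d_{ij}$ the ground distance between the bins). It over-emphasizes the similarity between histograms that do not have a sharp peak. Also note its similarity to the L2 distance, to which it reduces when A is the identity matrix. (Figure: histograms h1, h2, h3; CB dist. = 0.143 and CB dist. = 0.089.)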
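A sketch of the quadratic-form distance, assuming bins are represented by their center coordinates and that the similarity matrix is built as a_ij = 1 − d_ij/d_max with Euclidean d_ij; the helper names are hypothetical. Unlike a bin-by-bin distance, it ranks a one-bin shift as closer than a two-bin shift, and with A = I it reduces to the L2 distance.

```python
import numpy as np

def similarity_matrix(centers):
    """a_ij = 1 - d_ij / d_max, with d_ij the Euclidean distance between bin centers."""
    c = np.asarray(centers, dtype=float).reshape(len(centers), -1)
    d = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)
    return 1.0 - d / d.max()

def quadratic_form_distance(h, k, A):
    diff = np.asarray(h, dtype=float) - np.asarray(k, dtype=float)
    # Guard against negative values in case A is not positive semi-definite
    return float(np.sqrt(max(diff @ A @ diff, 0.0)))

A = similarity_matrix([0.0, 1.0, 2.0])  # three 1D bins
h = np.array([1.0, 0.0, 0.0])
print(quadratic_form_distance(h, [0, 1, 0], A))          # 1.0 (one-bin shift)
print(quadratic_form_distance(h, [0, 0, 1], A))          # ≈ 1.414 (two-bin shift)
print(quadratic_form_distance(h, [0, 1, 0], np.eye(3)))  # ≈ 1.414, the L2 distance
```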

Earth Mover’s Distance • Consider one of the distributions as a range of hills, the other as a group of valleys. • EMD = minimal amount of work that needs to be done to move the “earth” from the hills into the valleys. (Figure: distributions h and k.)

EMD • Amount of work in converting one distribution p into q = (amount of earth $f_{ij}$ moved from bin i of p to bin j of q) × (ground distance $d_{ij}$ between the bins), summed over i and j: $\text{WORK}(F) = \sum_i \sum_j f_{ij} d_{ij}$. • There are many ways of moving the earth. We are interested in the minimal amount of work required, subject to some constraints. • Distributions between which less work needs to be done are more similar. • The number of histogram bins need not even be equal!

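EMD: step 1. Find the flow $F = [f_{ij}]$ that minimizes $\sum_i \sum_j f_{ij} d_{ij}$, where $p_i$ and $q_j$ are the masses in bin i of p and bin j of q, subject to: • $f_{ij} \ge 0$: move a non-negative amount of earth from p to q. • $\sum_j f_{ij} \le p_i$: p cannot send more earth than there is! • $\sum_i f_{ij} \le q_j$: q cannot receive more earth than its capacity! • $\sum_i \sum_j f_{ij} = \min\left(\sum_i p_i, \sum_j q_j\right)$: move the maximum amount of earth possible.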

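EMD: step 2. Given the optimal flow $f^*_{ij}$ (the amount of earth moved is non-negative, p cannot send more earth than there is, q cannot receive more earth than its capacity, and the maximum amount of earth possible is moved), the EMD is the minimal work normalized by the total flow: $\text{EMD}(p,q) = \frac{\sum_i \sum_j f^*_{ij} d_{ij}}{\sum_i \sum_j f^*_{ij}}$.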

Source: slides by Prof. Sharat Chandran

x and y are two distributions with 2 and 3 masses, respectively. Ground distances: d(1,1) = 155.7, d(1,2) = 277.0, d(1,3) = 252.3, d(2,1) = 155.7, d(2,2) = 316.3, d(2,3) = 198.2. One possible set of values in the flow matrix gives total work = 0.23*155.7 + 0.51*252.3 + 0.26*316.3 = 246.7. Not optimal! http://robotics.stanford.edu/~scohen/research/emdg/emdg.html#flow_eqw_notopt

For the same x, y and ground distances, another set of values in the flow matrix gives total work = 0.23*155.7 + 0.26*277.0 + 0.25*252.3 + 0.26*198.2 = 222.4. http://robotics.stanford.edu/~scohen/research/emdg/emdg.html#flow_eqw_notopt
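The two flow matrices in this example can be checked numerically. The masses of x (0.74, 0.26) and y (0.23, 0.26, 0.51) are not stated on the slides; they are an assumption inferred from the row and column sums of the two flows. `work` and `is_feasible` are hypothetical helper names.

```python
import numpy as np

# Ground distances d(i, j): rows index the masses of x, columns those of y
D = np.array([[155.7, 277.0, 252.3],
              [155.7, 316.3, 198.2]])
x_w = np.array([0.74, 0.26])        # assumed: row sums of the flows
y_w = np.array([0.23, 0.26, 0.51])  # assumed: column sums of the flows

F1 = np.array([[0.23, 0.00, 0.51],  # first flow (work 246.7)
               [0.00, 0.26, 0.00]])
F2 = np.array([[0.23, 0.26, 0.25],  # second flow (work 222.4)
               [0.00, 0.00, 0.26]])

def work(F, D):
    """Total work: sum over i, j of f_ij * d_ij."""
    return float((F * D).sum())

def is_feasible(F, x_w, y_w):
    """Non-negative flow that ships all of x's mass and fills all of y's capacity."""
    return bool((F >= 0).all()
                and np.allclose(F.sum(axis=1), x_w)
                and np.allclose(F.sum(axis=0), y_w))

for F in (F1, F2):
    print(is_feasible(F, x_w, y_w), round(work(F, D), 3))
# True 246.722
# True 222.438
```

With these distances and assumed masses, the second flow is in fact a minimum: the second mass of x saves the most (54.1 per unit, compared with the first mass of x) by going to the third mass of y, and every other destination is served at least as cheaply from the first mass of x. So the EMD here is 222.4 (the total flow is 1).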

Computation of EMD • Linear objective function (i.e., a linear combination of the unknown flows $f_{ij}$). • Linear constraints. • This is a linear programming (LP) problem: it can be solved with one of the many existing LP solvers (e.g., MATLAB: linprog).
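A minimal sketch of the LP using SciPy's `scipy.optimize.linprog` (HiGHS solver) in place of MATLAB's linprog, following the constraints and the normalization by total flow from the EMD slides; `emd` is a hypothetical name. Because the total flow is min(Σp_i, Σq_j), the same code also handles unequal total masses, e.g. a query that specifies only part of the color distribution. The usage line reuses the masses assumed for the x/y example.

```python
import numpy as np
from scipy.optimize import linprog

def emd(p_w, q_w, D):
    """EMD between masses p_w (m,) and q_w (n,) with ground distances D (m, n).

    Returns the normalized EMD and the optimal flow matrix F (m, n).
    """
    p_w, q_w, D = (np.asarray(a, dtype=float) for a in (p_w, q_w, D))
    m, n = D.shape
    c = D.ravel()                          # variable f_ij sits at index i*n + j
    A_ub = np.zeros((m + n, m * n))
    for i in range(m):
        A_ub[i, i * n:(i + 1) * n] = 1.0   # sum_j f_ij <= p_i
    for j in range(n):
        A_ub[m + j, j::n] = 1.0            # sum_i f_ij <= q_j
    b_ub = np.concatenate([p_w, q_w])
    total = min(p_w.sum(), q_w.sum())      # move the maximum amount of earth possible
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  A_eq=np.ones((1, m * n)), b_eq=[total],
                  bounds=(0, None), method="highs")  # f_ij >= 0
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun / total, res.x.reshape(m, n)

D = np.array([[155.7, 277.0, 252.3],
              [155.7, 316.3, 198.2]])
dist, F = emd([0.74, 0.26], [0.23, 0.26, 0.51], D)
print(round(dist, 1))  # 222.4
```

For large histograms a dedicated transportation-problem solver is usually much faster than a generic LP, but the LP form maps one-to-one onto the constraints listed on the slides.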

Examples (code: http://www.seas.upenn.edu/~ofirpele/FastEMD/code/): • Pair 1: L1 dist. = 1, EMD = 0.5. • Pair 2: L1 dist. = 2, EMD = 0.2653. • Three histograms H1, H2, H3: D(H1,H2): L1 dist. = 2, EMD = 0.4898; D(H1,H3): L1 dist. = 1.333, EMD = 0.7007; D(H2,H3): L1 dist. = 1.333, EMD = 0.5986.
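For 1D histograms with equal total mass and ground distance |i − j| between bin positions, the EMD equals the 1D Wasserstein-1 distance, which SciPy exposes as `scipy.stats.wasserstein_distance`. A sketch with made-up one-hot histograms (not the histograms in the figure above): L1 stays at 2 as soon as the supports stop overlapping, while the EMD keeps growing with the size of the shift.

```python
import numpy as np
from scipy.stats import wasserstein_distance

positions = np.arange(5)  # bin centers 0..4, ground distance |i - j|
h = np.array([1.0, 0, 0, 0, 0])

results = []
for shift in range(1, 5):
    k = np.roll(h, shift)  # all mass moved by `shift` bins
    l1 = float(np.abs(h - k).sum())
    emd = float(wasserstein_distance(positions, positions, u_weights=h, v_weights=k))
    results.append((shift, l1, emd))
    print(shift, l1, emd)  # L1 is 2 every time; EMD equals the shift
```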

Histograms to signatures • Binning is computationally expensive if the dimensionality of the feature vector is high (even 3 or more). • It also results in a histogram in which most bins are empty (e.g., most bins of an RGB histogram of a beach image will be empty; only the light blue and light brown bins will be full, representing the sky and the sand respectively). • In such cases, the feature vectors from all the pixels are clustered using a method such as K-means or mean-shift. • The K cluster centers and their probability values are stored together as the signature of the image: $s = \{(m_j, w_j)\}_{j=1}^{K}$, where each cluster center $m_j$ is analogous to a histogram bin and $w_j$ is the fraction of pixels assigned to it. • EMD is applicable to signatures as well! Simple bin-to-bin distances are not!
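A sketch of computing a signature {(m_j, w_j)} with a plain K-means loop in NumPy. Assumptions: the features are an (N, d) array (e.g. one color vector per pixel), initialization is deterministic farthest-point seeding rather than random, and `signature` is a hypothetical name. The example mimics the beach image with made-up colors: 70 sky pixels and 30 sand pixels collapse to two (center, weight) pairs instead of a mostly empty 3D histogram.

```python
import numpy as np

def signature(features, K, n_iter=20):
    """Return K cluster centers m_j and their weights w_j (fractions of pixels)."""
    X = np.asarray(features, dtype=float)
    # Farthest-point seeding: start from the first point, then repeatedly
    # add the point farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, K):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)

    def assign(C):
        # Index of the nearest center for every feature vector
        return np.argmin(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), axis=1)

    for _ in range(n_iter):              # Lloyd iterations
        labels = assign(centers)
        for j in range(K):
            if np.any(labels == j):      # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    labels = assign(centers)
    weights = np.bincount(labels, minlength=K) / len(X)
    return centers, weights

sky, sand = [0.5, 0.7, 0.9], [0.8, 0.7, 0.5]  # made-up RGB values in [0, 1]
pixels = np.array([sky] * 70 + [sand] * 30)
m, w = signature(pixels, K=2)
print(m.round(2), w)  # centers ≈ sky and sand colors, weights 0.7 and 0.3
```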

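EMD on signatures: the same formulation applies, with flows between the clusters of p and q and ground distances between their cluster centers. • Move a non-negative amount of earth from p to q. • p cannot send more earth than there is! • q cannot receive more earth than its capacity! • Move the maximum amount of earth possible.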

Retrieval Experiment 1 • Given an image, retrieve all images from the database with similar color histograms (i.e., with histogram distance less than some threshold). • Precision = number of relevant images among the retrieved images / total number of retrieved images. • Recall = number of relevant images among the retrieved images / total number of relevant images. Let’s say 1000 images were retrieved, and 900 of them were relevant: Precision = 900/1000. If some 300 other relevant images were not retrieved, there are 1200 relevant images in total: Recall = 900/1200.
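The two definitions can be sketched directly with Python sets of image IDs; `precision_recall` is a hypothetical name and the IDs are made up to reproduce the counts in the example (1000 retrieved, 1200 relevant, 900 in common).

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieval result, given collections of image IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant images that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = range(0, 1000)   # 1000 retrieved images
relevant = range(100, 1300)  # 1200 relevant images, 900 of them retrieved
print(precision_recall(retrieved, relevant))  # (0.9, 0.75)
```

Raising the distance threshold typically retrieves more images, which tends to increase recall at the cost of precision; sweeping the threshold traces a precision-recall curve.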

In this figure the relevant items are to the left of the straight line while the retrieved items are within the oval. The red regions represent errors. On the left these are the relevant items not retrieved (false negatives), while on the right they are the retrieved items that are not relevant (false positives). http://en.wikipedia.org/wiki/Precision_and_recall

Retrieval Experiment 1 • Dissimilarity measures: L1, quadratic form, Jeffrey divergence, chi-squared distance, EMD on histograms, EMD on signatures. • Color space: CIE-Lab, number of bins = 4 x 8 x 8 (along the L, a, b axes), i.e., 256 bins in total. • Database: 20,000 images. • The ground distance for EMD is the Euclidean distance between colors, i.e., $d_{ij} = \sqrt{(L_i - L_j)^2 + (a_i - a_j)^2 + (b_i - b_j)^2}$.
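A sketch of the 4 x 8 x 8 Lab binning and the Euclidean ground distance between bin centers. The channel ranges (L in [0, 100], a and b in [-128, 128)) are an assumption, not taken from the paper; `bin_centers` and `ground_distance` are hypothetical names. The resulting 256 x 256 matrix is what an EMD solver would use as d_ij.

```python
import numpy as np

# Assumed channel ranges; 4 bins along L, 8 along a and 8 along b
EDGES = [np.linspace(0, 100, 5), np.linspace(-128, 128, 9), np.linspace(-128, 128, 9)]

def bin_centers(edges=EDGES):
    """(256, 3) Lab bin centers, in the same C order as np.histogramdd(...)[0].ravel()."""
    mids = [(e[:-1] + e[1:]) / 2 for e in edges]
    L, a, b = np.meshgrid(*mids, indexing="ij")
    return np.stack([L.ravel(), a.ravel(), b.ravel()], axis=1)

def ground_distance(centers):
    """d_ij = Euclidean distance between the Lab colors of bins i and j."""
    diff = centers[:, None, :] - centers[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

C = bin_centers()
D = ground_distance(C)
print(C.shape, D.shape)   # (256, 3) (256, 256)
print(D[0, 1], D[0, 64])  # 32.0 25.0: neighbouring bins along b, and along L
```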

Retrieval Experiment 1(a) • 75 images of red cars were chosen. • 10 of them were taken out as query images. • For each query image, the 8 most similar images were retrieved (using each of the dissimilarity measures).

Query images. SOURCE: Paper “The Earth Mover’s Distance as a metric for image retrieval”, IJCV 2000, by Rubner, Tomasi and Guibas.

SOURCE: Paper “The Earth Mover’s Distance as a metric for image retrieval”, IJCV 2000, by Rubner, Tomasi and Guibas.

Retrieval Experiment 1(b) • 10 images of red cars were taken out as query images. • For each query image, all similar images were retrieved (using each of the dissimilarity measures).

SOURCE: Paper “The Earth Mover’s Distance as a metric for image retrieval”, IJCV 2000, by Rubner, Tomasi and Guibas.

Retrieval Experiment 1(c) • Find all images with a% pink color and b% green color, and don’t care about the remaining (100-a-b)%.

Source: Rubner’s PhD thesis

Caution! Histograms are sometimes misleading. But we are doing histogram-based retrieval. Source: Rubner’s PhD thesis

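Joint spatio-intensity histograms • Intensity statistics alone are not enough. • Spatial information is often also useful. Example: blue sky (above) and green forest (below), versus an animal swimming in a lake (below) bordered by trees (above). • EMD with spatio-intensity histograms (in the 5D space of (L, a, b, x, y)) uses a ground distance given by: $d_{ij} = \sqrt{(\Delta L)^2 + (\Delta a)^2 + (\Delta b)^2 + \lambda\left((\Delta x)^2 + (\Delta y)^2\right)}$, where λ controls the importance of position relative to color.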
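A sketch of such a ground distance on 5D features (L, a, b, x, y), assuming the form d = sqrt(ΔL² + Δa² + Δb² + λ(Δx² + Δy²)) with image coordinates normalized to [0, 1]; the weight λ is called `lam` here, and both it and the function name are hypothetical. The feature values are made up: the same blue at the top and at the bottom of an image is identical in color but far apart in position.

```python
import numpy as np

def color_position_distance(f1, f2, lam=1.0):
    """Ground distance between two (L, a, b, x, y) feature vectors."""
    f1, f2 = np.asarray(f1, dtype=float), np.asarray(f2, dtype=float)
    dc = f1[:3] - f2[:3]  # color difference (L, a, b)
    dp = f1[3:] - f2[3:]  # position difference (x, y)
    return float(np.sqrt(dc @ dc + lam * (dp @ dp)))

blue_top = (60, -5, -40, 0.5, 0.2)     # blue region near the top
blue_bottom = (60, -5, -40, 0.5, 0.8)  # same color near the bottom
print(color_position_distance(blue_top, blue_bottom, lam=0.0))    # 0.0: color only
print(color_position_distance(blue_top, blue_bottom, lam=100.0))  # ≈ 6.0: position now counts
```

With λ = 0 the measure ignores layout entirely, so "sky above forest" and "lake below trees" with similar colors would match; larger λ makes the spatial arrangement matter.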

Intensity/Color isn’t everything

How do you retrieve images such as these?

Notion of texture • Texture is a set of self-similar, repeating patterns in various parts of the image. http://homepages.inf.ed.ac.uk/s0346435/projects/mrf/mrf_texture_project.htm

Characterization of textures: (1) Spatial domain • Textures can be represented as a set of small image patches (say of size 3 x 3, 5 x 5 or 7 x 7). • Since a texture is a repeating pattern, similar patches will occur with high density. • Clustering in this space of patches gives a good indication of the presence of textural patterns.
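A small NumPy illustration of the patch view, using a made-up binary texture: a pattern that repeats every 3 pixels yields at most 9 distinct 3 x 3 patches (one per phase), whereas random noise of the same size yields many. Counting distinct patches stands in for the clustering step here; on real images one would cluster the flattened patch vectors (e.g. with K-means) instead. `patch_vectors` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def patch_vectors(img, size=3):
    """All size x size patches of a 2D image, flattened to one row per patch."""
    return sliding_window_view(img, (size, size)).reshape(-1, size * size)

tile = np.array([[0, 0, 1],
                 [0, 1, 1],
                 [1, 1, 1]])
texture = np.tile(tile, (10, 10))                          # 30 x 30 repeating pattern
noise = np.random.default_rng(0).integers(0, 2, (30, 30))  # 30 x 30 random pattern

n_texture = len(np.unique(patch_vectors(texture), axis=0))
n_noise = len(np.unique(patch_vectors(noise), axis=0))
print(n_texture, n_noise)  # few distinct patches for the texture, many for the noise
```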

Characterization of textures: (2) Frequency domain • Textures are commonly modeled as patterns containing a large concentration of a small range of spatial frequencies. • Different textures = different dominant spatial frequencies. • Best characterized by a group of band-pass filters in the frequency domain (this group of filters is called a filter bank).
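A rough sketch of a frequency-domain filter bank, assuming ideal (binary) band-pass filters that split the Fourier plane into radial frequency bands and orientation sectors; practical systems often use smooth filters such as Gabor filters instead. `band_energies` is a hypothetical name. For a vertical sinusoidal grating, almost all the energy falls into a single band, which is the kind of signature that distinguishes one texture from another.

```python
import numpy as np

def band_energies(img, n_radial=4, n_orient=4):
    """Fraction of spectral energy in each (radial band, orientation sector) cell."""
    F = np.fft.fftshift(np.fft.fft2(img - img.mean()))  # remove DC, center spectrum
    power = np.abs(F) ** 2
    h, w = img.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]    # cycles per pixel
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.hypot(fx, fy)
    angle = np.mod(np.arctan2(fy, fx), np.pi)           # orientation in [0, pi)
    r_edges = np.linspace(0, 0.5, n_radial + 1)         # corners beyond 0.5 are ignored
    a_edges = np.linspace(0, np.pi, n_orient + 1)
    E = np.zeros((n_radial, n_orient))
    for i in range(n_radial):
        for j in range(n_orient):
            band = ((radius >= r_edges[i]) & (radius < r_edges[i + 1])
                    & (angle >= a_edges[j]) & (angle < a_edges[j + 1]))
            E[i, j] = power[band].sum()                 # energy passed by this filter
    return E / E.sum()

x = np.arange(64)
grating = np.tile(np.sin(2 * np.pi * 12 * x / 64), (64, 1))  # varies along x only
E = band_energies(grating)
print(E.round(3))  # energy concentrated in radial band 1, orientation sector 0
```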
