Visual Attention

Visual Attention Jeremy Wyatt

Where to look? • Many visual processes are expensive • Humans don’t process the whole visual field • How do we decide what to process? • How can we use insights about this to make machine vision more efficient?

Visual salience • Salience ~ visual prominence • Must be cheap to calculate • Related to features that we collect from very early stages of visual processing • Colour, orientation, intensity change and motion are all important indicators of salience

On/Off cells • Recall centre surround cells [Figure: ON cell and OFF cell receptive fields, each with an ON area and an OFF area, and their responses over time to a light spot]

Colour sensitive On/Off cells • Recall that some ganglion ON cells are sensitive to the outputs of cones [Figure: ON and OFF areas of a colour sensitive cell]

An intensity change map • I = (r+g+b)/3 gives I, the intensity map • The intensity change map is formed from a grid of on/off cells (they overlap) • There are several maps, each from cells with receptive fields at a different scale • Each cell responds to intensity changes within its own receptive field
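The intensity formula above can be sketched in a few lines. This is a minimal illustration, assuming the image is an H x W x 3 array of r, g, b values; the function name `intensity_map` is hypothetical and not from the slides.

```python
import numpy as np

def intensity_map(rgb):
    """Return I = (r + g + b) / 3 for an H x W x 3 image (assumed layout)."""
    rgb = np.asarray(rgb, dtype=float)
    # Averaging over the last axis is the same as (r + g + b) / 3 per pixel.
    return rgb.mean(axis=2)

# Tiny 1 x 2 image: one pure red pixel and one mid-grey pixel.
image = np.array([[[255, 0, 0], [90, 90, 90]]])
I = intensity_map(image)
```

Here the red pixel gets intensity 85 and the grey pixel 90, so intensity alone does not separate a saturated colour from a grey of similar brightness; that is one reason separate colour maps are also used.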

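How do we calculate the maps? • We can create each on cell using a pair of Gaussians: a thin Gaussian minus a fat Gaussian gives a cell with an ON area in the centre and an OFF area around it [Figure: thin Gaussian - fat Gaussian = ON cell responding to a light spot]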

How do we calculate the maps? • Imagine grids of fat and thin Gaussians • We calculate the value of each Gaussian in each grid and then subtract one grid (here with 16 elements) from the other • This implements our grid of on cells
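The two-grid subtraction described above might be sketched as follows. This is only an illustration under stated assumptions: the kernel size, the two sigmas, the grid spacing `step` and edge padding are arbitrary choices of mine, and the function names are hypothetical. A 16 x 16 image with `step=4` gives grids with 16 elements, as on the slide.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Square 2-D Gaussian of odd side `size`, normalised to sum to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_grid(intensity, sigma, size=9, step=4):
    """Value of one Gaussian at every grid position (one value per cell)."""
    kernel = gaussian_kernel(size, sigma)
    # Pad so that cells near the border still see a full neighbourhood.
    padded = np.pad(intensity, size // 2, mode="edge")
    rows = range(0, intensity.shape[0], step)
    cols = range(0, intensity.shape[1], step)
    grid = np.empty((len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            grid[i, j] = np.sum(padded[r:r + size, c:c + size] * kernel)
    return grid

def on_cell_grid(intensity, sigma_thin=1.0, sigma_fat=3.0, size=9, step=4):
    """Grid of ON cells: thin-Gaussian grid minus fat-Gaussian grid."""
    thin = gaussian_grid(intensity, sigma_thin, size, step)
    fat = gaussian_grid(intensity, sigma_fat, size, step)
    return thin - fat

# A dark 16 x 16 image with a single light spot at (8, 8).
spot = np.zeros((16, 16))
spot[8, 8] = 1.0
on_grid = on_cell_grid(spot)   # 4 x 4 = 16 ON cells
off_grid = -on_grid            # OFF cells: subtract the other way round
```

The cell centred on the spot responds positively, its neighbours (which see the spot only in their OFF area) respond negatively, and a uniformly lit image gives a response of zero everywhere because both kernels sum to 1.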

Calculating the intensity change map • We do this for a mix of scales • We have to interpolate the values of some maps to match the outputs of others (this corresponds to cells that have overlapping receptive fields) • By aligning and then combining the maps at different scales we have implemented a grid of on cells, or a grid of off cells
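One way to align and combine maps from different scales is sketched below. The slides do not say which interpolation is used or how the aligned maps are combined; here I assume separable linear interpolation up to the finest map's size and a plain average. The names `resize_linear` and `combine_scales` are hypothetical.

```python
import numpy as np

def resize_linear(m, shape):
    """Resize a 2-D map to `shape` by linear interpolation along each axis."""
    m = np.asarray(m, dtype=float)
    rows = np.linspace(0, m.shape[0] - 1, shape[0])
    cols = np.linspace(0, m.shape[1] - 1, shape[1])
    # Interpolate along the columns of every row, then along the rows.
    tmp = np.array([np.interp(cols, np.arange(m.shape[1]), row) for row in m])
    out = np.array([np.interp(rows, np.arange(m.shape[0]), col) for col in tmp.T])
    return out.T

def combine_scales(maps):
    """Bring every map to the size of the largest one, then average them."""
    target = max((np.shape(m) for m in maps), key=lambda s: s[0] * s[1])
    return sum(resize_linear(m, target) for m in maps) / len(maps)

fine = np.ones((4, 4))            # map from cells with small receptive fields
coarse = np.full((2, 2), 3.0)     # map from cells with large receptive fields
combined = combine_scales([fine, coarse])
ramp = resize_linear([[0.0, 2.0], [0.0, 2.0]], (2, 3))
```

Averaging is only one possible combination rule; a sum or a weighted sum would fit the description on the slide equally well.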

Other maps • We can now do this for red, green, yellow and blue • We also do this for intensity changes of a certain orientation, which gives a further set of maps
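The slides do not define the red, green, yellow and blue channels. As an assumption, the sketch below uses the broadly tuned colour channels from the Itti, Koch and Niebur saliency model, with negative values set to zero; the function name `colour_channels` is hypothetical. Each channel could then be passed through the same grid of on/off cells as the intensity map.

```python
import numpy as np

def colour_channels(rgb):
    """Split an H x W x 3 image into broadly tuned R, G, B, Y channels.

    Assumed definitions (not taken from the slides):
      R = r - (g + b) / 2      G = g - (r + b) / 2
      B = b - (r + g) / 2      Y = (r + g) / 2 - |r - g| / 2 - b
    """
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    R = np.maximum(r - (g + b) / 2, 0)
    G = np.maximum(g - (r + b) / 2, 0)
    B = np.maximum(b - (r + g) / 2, 0)
    Y = np.maximum((r + g) / 2 - np.abs(r - g) / 2 - b, 0)
    return R, G, B, Y

# 1 x 2 image: a pure red pixel and a pure yellow pixel (values in 0..1).
pixels = np.array([[[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]])
R, G, B, Y = colour_channels(pixels)
```

With these definitions the pure red pixel activates only R, while the pure yellow pixel activates Y fully and R and G by half.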

Combining maps to calculate saliency • We now add the maps to obtain the saliency of each group of pixels in the scene • We normalise each map to the same range before adding • We weight each map before combining it • We attend to the most active point in the saliency map [Figure: saliency map]
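The normalise, weight, add and attend steps above can be sketched as follows. The target range [0, 1], the example weights and the function names are assumptions for illustration; the slides do not give specific values.

```python
import numpy as np

def normalise(m):
    """Rescale a map to the range [0, 1]; a flat map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    if span == 0:
        return np.zeros_like(m)
    return (m - m.min()) / span

def saliency_map(maps, weights):
    """Weighted sum of the normalised feature maps (all the same shape)."""
    return sum(w * normalise(m) for m, w in zip(maps, weights))

def most_salient_point(saliency):
    """(row, col) of the most active point in the saliency map."""
    return np.unravel_index(np.argmax(saliency), saliency.shape)

# Two 2 x 3 feature maps on very different scales.
intensity_change = np.zeros((2, 3))
intensity_change[1, 2] = 10.0
colour_change = np.zeros((2, 3))
colour_change[0, 0] = 200.0

S = saliency_map([intensity_change, colour_change], weights=[1.0, 0.5])
attended = most_salient_point(S)
```

Without the normalisation the colour map would dominate simply because its raw values are larger; after normalising, the weights alone decide that the intensity peak at (1, 2) wins.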

Attending to areas of the scene • We use the salience model I have described to attend to certain areas of the scene • We can now use this salience model to make other visual processes more efficient (e.g. object recognition)
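As a rough illustration of how attention can save work, the sketch below crops a window around the attended point so that a later process such as object recognition only sees that region. The window size and the function name `attended_window` are assumptions; the slides do not describe how the attended area is passed on.

```python
import numpy as np

def attended_window(image, point, half_size):
    """Crop a square window around `point`, clipped at the image border."""
    r, c = point
    r0, r1 = max(r - half_size, 0), min(r + half_size + 1, image.shape[0])
    c0, c1 = max(c - half_size, 0), min(c + half_size + 1, image.shape[1])
    return image[r0:r1, c0:c1]

scene = np.arange(100).reshape(10, 10)
window = attended_window(scene, (5, 5), half_size=2)   # 5 x 5 patch
corner = attended_window(scene, (0, 0), half_size=2)   # clipped to 3 x 3
```

In this example a recogniser run on `window` processes 25 pixels rather than the full 100.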

Learning names and appearances of objects

Salience can be modulated by language

Modulating visual salience by language: results

Summary • Visual attention is guided by many features • A good model of attention involves parts of early visual processing we have already seen • We can use this to make object learning in robots more efficient
