Distinctive Image Features from Scale-Invariant Keypoints

Distinctive Image Featuresfrom Scale-Invariant Keypoints David G. Lowe International Journal of Computer Vision(IJCV), 2004

Extracting distinctive invariant features • Points are individually ambiguous • More unique matches are possible with small regions of images http://www.csie.ntu.edu.tw/~cyy/courses/vfx/05spring/lectures/handouts/lec04_feature.pdf

Desired properties for features • Invariant: invariant to scale, rotation, affine, illumination and noise for robust matching across a substantial range of affine distortion, viewpoint change and so on. • Distinctive: a single feature can be correctly matched with high probability

Moravec corner detector (1980) • We should easily recognize the point by looking through a small window • Shifting a window in anydirection should give a large change in intensity

Moravec corner detector flat

Moravec corner detector flat edge

Moravec corner detector corner isolated point flat edge

Window function Shifted intensity Intensity Moravec corner detector Change of intensity for the shift [u,v]: Four shifts: (u,v) = (1,0), (1,1), (0,1), (-1, 1) Problem: responds too strong for edges because only minimum of E is taken into account

Harris corner detector [1992] • Consider all small shifts by Taylor’s expansion W(x, y): Gaussian function => M: 2x2 Hessian matrix, 1, 2 – eigenvalues of M

Harris corner detector 2 Measure of corner response: edge 2 >> 1 Corner 1 and 2 are large,1 ~ 2;E increases in all directions edge 1 >> 2 Classification of image points using eigenvalues of M: flat 1

Harris Detector: Problem • non-invariant to image scale! All points will be classified as edges Corner !

Scale-invariant feature transform (SIFT) • Scale-invariant feature transform (or SIFT) is an algorithm to detect and describe local features in images. • Distinctive features • Invariant to image scale, rotation and affine distortion • Applied locally on key-points • Based upon the image gradients in a local neighborhood

descriptor detector local descriptor SIFT stages: • Scale-space extrema detection • Keypoint localization • Orientation assignment • Keypoint descriptor

Convolution with a variable-scale Gaussian 1. Detection of scale-space extrema Difference-of-Gaussian (DoG) filter

Scale space  doubles for the next octave 2 K=2(1/s), s+3 images for each octave k∙ 

DoG • Efficient function to compute • A close approximation to the scale-normalized Laplacian of Gaussian

2. Keypoint localization X is selected if it is larger or smaller than all 26 neighbors

Decide scale sampling frequency

Pre-smoothing  =1.6, plus a double expansion

If has offset larger than 0.5, sample point is changed. If is less than 0.03 (low contrast), it is discarded. 2. Accurate keypoint localization Reject points with low contrast and poorly localized along an edge

Eliminating edge responses Let Keep the points with r=10

3. Orientation assignment • By assigning a consistent orientation, the keypoint descriptor can be orientation invariant. • For a keypoint, L is the image with the closest scale • 36-bin orientation histogram over 360° • weighted by m • Peak is the dominant orientation • Local peak within 80% creates multiple orientations • About 15% has multiple orientations

4. Local image descriptor • Image gradients are sampled over 16x16 array of locations in scale space • Create array of orientation histograms • 8 orientations x 4x4 histogram array = 128 dimensions

Recognition examples

Distinctive Image Features from Scale-Invariant Keypoints