Advanced Computer Vision

Advanced Computer Vision Lecture 04

Today’s Topics
• Schedule
• Histogram Intersection
• Moments
• Opponent Color Channels
• SIFT Scene Classification Outline
• Classifiers

Region Representation
• Color histogram: used to represent a region of an image
  – No spatial information
  – Robust to changes in pose and shape
• Template: a window of pixel intensities
  – Template matching between images
  – Assumes the spatial arrangement does not change

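Histograms
• An image is a two-dimensional mapping $I : \mathbf{x} \rightarrow v$ from pixel coordinates $\mathbf{x} = (x, y)^\top$ to a value $v$
• Histogram: $h_I^{(0)}(b) = n_b, \quad b = 1, 2, \dots, B$
  – $B$ is the number of bins in the histogram
  – $n_b$ is the number of pixels in the $b$th bin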
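The histogram definition can be sketched in Python. The slides use MATLAB, but the idea carries over directly. This minimal sketch assumes 8-bit integer intensities (0 to 255) and equal-width bins. The name `gray_histogram` is illustrative, and bins are numbered from 0 rather than 1.

```python
def gray_histogram(image, num_bins=256):
    """Return [n_0, ..., n_{B-1}]: how many pixels fall in each bin.

    `image` is a 2-D list of integer intensities in 0..255 (an assumption).
    Pixel positions are ignored; only the values are counted.
    """
    counts = [0] * num_bins
    for row in image:
        for value in row:
            # Equal-width bins: each covers 256 / num_bins intensity levels.
            counts[value * num_bins // 256] += 1
    return counts


image = [[0, 0, 255],
         [128, 128, 128]]
print(gray_histogram(image, num_bins=4))  # [2, 0, 3, 1]
```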

Similarity between Histograms
• Similarity between corresponding histogram bins: $\rho_n(n_b, n'_b) = \min(n_b, n'_b) \,/\, \sum_{j=1}^{B} n_j$
• Sum the bin similarities over all bins: $\rho = \sum_{b=1}^{B} \rho_n(n_b, n'_b)$, assuming both histograms contain the same total number of pixels, $\sum_{j=1}^{B} n_j$
M. Swain and D. Ballard, “Color indexing,” International Journal of Computer Vision, 7(1):11–32, 1991.
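The bin-by-bin form of this similarity can be written out literally. The following minimal Python sketch assumes that both histograms have the same number of bins and the same pixel total, as the formula requires. `bin_similarities` is an illustrative name.

```python
def bin_similarities(h, g):
    """rho_n(n_b, n'_b) = min(n_b, n'_b) / sum_j n_j for every bin b."""
    if len(h) != len(g):
        raise ValueError("histograms must have the same number of bins")
    total = sum(h)
    if total != sum(g):
        raise ValueError("the formula assumes both histograms hold the same number of pixels")
    return [min(a, b) / total for a, b in zip(h, g)]


h = [3, 5, 2]
g = [4, 4, 2]
rho = bin_similarities(h, g)
similarity = sum(rho)  # sum of the bin similarities; 1.0 means identical histograms
print(rho, similarity)
```

The summed value is the fraction of pixels the two histograms have in common, so it lies between 0 and 1.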

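Histogram Intersection
• A simple example with five histogram bins:
  g = [17, 23, 45, 61, 15];
  h = [15, 21, 42, 51, 17];
  in = sum(min(h,g)) / min(sum(h), sum(g))
  in = 0.9863
• The summed bin minima are 144 and the smaller total is sum(h) = 146, so in = 144/146. Dividing by the smaller total lets the measure compare histograms with different pixel counts.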

If the Histograms Are Identical
  g = 15 21 42 51 17
  h = 15 21 42 51 17
  >> in = sum(min(h,g)) / min(sum(h), sum(g))
  in = 1

Different Histograms
  h = 15 21 42 51 17
  g = 57 83 15 11 1
  >> in = sum(min(h,g)) / min(sum(h), sum(g))
  in = 0.4315
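The MATLAB one-liner used in these examples translates directly to Python. This sketch uses the illustrative name `histogram_intersection`. It divides by the smaller of the two totals, so it also accepts histograms with different pixel counts. The printed values reproduce the three examples.

```python
def histogram_intersection(h, g):
    """in = sum(min(h, g)) / min(sum(h), sum(g)), as in the MATLAB examples."""
    overlap = sum(min(a, b) for a, b in zip(h, g))
    return overlap / min(sum(h), sum(g))


g = [17, 23, 45, 61, 15]
h = [15, 21, 42, 51, 17]
print(round(histogram_intersection(h, g), 4))                    # 0.9863
print(histogram_intersection(h, h))                              # 1.0
print(round(histogram_intersection(h, [57, 83, 15, 11, 1]), 4))  # 0.4315
```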

Use a Grayscale Image for the Example

Region and Histogram
Similarity of a region with itself:
  >> h = hist(q(:),256);
  >> g = h;
  >> in = sum(min(h,g)) / min(sum(h), sum(g))
  in = 1

Comparing with the top-left 236 × 236 region of the image:
  >> r = 236; c = 236;
  >> g = im(1:r,1:c);
  >> g = hist(g(:),256);
  >> in = sum(min(h,g)) / min(sum(h), sum(g))
  in = 0.5474

Partial Matches
  >> g = hist(g(:),256);
  >> in = sum(min(h,g)) / min(sum(h), sum(g))
  in = 0.8014
  >> in = sum(min(h,g)) / min(sum(h), sum(g))
  in = 0.8566

Lack of Spatial Information
• Different patches may have similar, or even identical, histograms:
  >> in = sum(min(h,g)) / min(sum(h), sum(g))
  in = 1
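The following Python sketch shows why a histogram carries no spatial information. It builds hypothetical 8 × 8 patches from the same pixel values arranged differently: a left/right split, a top/bottom split, and a random scatter. All pairs score a perfect intersection. The helper names are illustrative.

```python
import random


def gray_histogram(pixels, num_bins=256):
    """Count 8-bit pixel values into equal-width bins; positions are ignored."""
    counts = [0] * num_bins
    for value in pixels:
        counts[value * num_bins // 256] += 1
    return counts


def histogram_intersection(h, g):
    return sum(min(a, b) for a, b in zip(h, g)) / min(sum(h), sum(g))


def flatten(patch):
    return [value for row in patch for value in row]


# Two hypothetical 8 x 8 patches built from the same pixel values.
left_right = [[200] * 4 + [30] * 4 for _ in range(8)]                      # bright left half
top_bottom = [[200] * 8 for _ in range(4)] + [[30] * 8 for _ in range(4)]  # bright top half

# A third arrangement: the same pixels scattered at random.
scattered = flatten(left_right)
random.Random(0).shuffle(scattered)

h = gray_histogram(flatten(left_right))
print(histogram_intersection(h, gray_histogram(flatten(top_bottom))))  # 1.0
print(histogram_intersection(h, gray_histogram(scattered)))            # 1.0
```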

Note
• The examples extend easily to color images (RGB, HSV, etc.)
• 256 bins were used in the histograms
• Fewer bins allow better matches between similar, but not identical, patches
• Too many bins result in poor performance: too many mismatches
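This Python sketch illustrates the effect of bin count. It uses a hypothetical patch and a copy that is one gray level brighter. With 256 bins, every pixel lands in a different bin and the intersection is 0. With 32 bins, each 8 levels wide, the two histograms coincide. The helper names are illustrative.

```python
def quantized_histogram(pixels, num_bins):
    """Equal-width histogram of 8-bit values with num_bins bins."""
    counts = [0] * num_bins
    for value in pixels:
        counts[value * num_bins // 256] += 1
    return counts


def histogram_intersection(h, g):
    return sum(min(a, b) for a, b in zip(h, g)) / min(sum(h), sum(g))


patch = [8 * i for i in range(32)]     # values 0, 8, ..., 248
brighter = [v + 1 for v in patch]      # values 1, 9, ..., 249

for bins in (256, 32):
    h = quantized_histogram(patch, bins)
    g = quantized_histogram(brighter, bins)
    print(bins, histogram_intersection(h, g))
# 256 0.0
# 32 1.0
```

This sketch does not show the flip side: very coarse bins also merge values that should be told apart. The bin count is therefore a trade-off.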

Attempt to Include Spatial Information
• Stanley T. Birchfield and Sriram Rangarajan, “Spatiograms Versus Histograms for Region-Based Tracking”

Moments
• Moments, central moments and invariant moments

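Moments
• 2D moment of order $(p+q)$ of image $f(x,y)$:
  $m_{pq} = \sum_x \sum_y x^p y^q f(x,y), \quad p, q = 0, 1, 2, \dots$
• Central moment:
  $\mu_{pq} = \sum_x \sum_y (x - \bar{x})^p (y - \bar{y})^q f(x,y)$, where $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$
• Normalized central moment:
  $\eta_{pq} = \mu_{pq} / \mu_{00}^{\gamma}$, where $\gamma = \frac{p+q}{2} + 1$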
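The three definitions can be computed directly. This Python sketch assumes the image is a 2-D list indexed `f[y][x]`, with rows as y and columns as x. It pastes a small hypothetical blob at two positions: the raw moment $m_{10}$ changes with position, while the central moments do not. The helper `place` is hypothetical.

```python
def raw_moment(f, p, q):
    """m_pq = sum_x sum_y x^p y^q f(x, y); f is a 2-D list indexed f[y][x]."""
    return sum(x ** p * y ** q * v
               for y, row in enumerate(f) for x, v in enumerate(row))


def central_moment(f, p, q):
    """mu_pq: moments taken about the centroid (x_bar, y_bar)."""
    m00 = raw_moment(f, 0, 0)
    x_bar = raw_moment(f, 1, 0) / m00
    y_bar = raw_moment(f, 0, 1) / m00
    return sum((x - x_bar) ** p * (y - y_bar) ** q * v
               for y, row in enumerate(f) for x, v in enumerate(row))


def normalized_central_moment(f, p, q):
    """eta_pq = mu_pq / mu_00 ** gamma, with gamma = (p + q) / 2 + 1."""
    gamma = (p + q) / 2 + 1
    return central_moment(f, p, q) / central_moment(f, 0, 0) ** gamma


def place(blob, top, left, size=6):
    """Hypothetical helper: paste a small blob into an empty size x size image."""
    image = [[0] * size for _ in range(size)]
    for dy, row in enumerate(blob):
        for dx, v in enumerate(row):
            image[top + dy][left + dx] = v
    return image


blob = [[1, 2],
        [3, 4]]
a = place(blob, 0, 0)
b = place(blob, 3, 2)
print(raw_moment(a, 1, 0), raw_moment(b, 1, 0))          # 6 26: raw moments depend on position
print(central_moment(a, 1, 1), central_moment(b, 1, 1))  # equal up to rounding
```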

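The seven invariant moments:
$M_1 = \eta_{20} + \eta_{02}$
$M_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$
$M_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2$
$M_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2$
$M_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]$
$M_6 = (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})$
$M_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]$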
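The seven formulas can be checked numerically. This Python sketch computes the normalized central moments of a small hypothetical image and then M1 to M7, using illustrative function names. It compares the results against a 90° rotation and a left-right mirror image of the same image. M1 to M6 should be unchanged by both transforms. M7 should keep its magnitude but change sign under mirroring, which is what lets it tell mirror images apart.

```python
import math


def eta_table(f):
    """Normalized central moments eta_pq for 2 <= p + q <= 3; f[y][x] holds intensities."""
    pts = [(x, y, v) for y, row in enumerate(f) for x, v in enumerate(row)]
    m00 = sum(v for _, _, v in pts)  # equals mu_00
    x_bar = sum(x * v for x, _, v in pts) / m00
    y_bar = sum(y * v for _, y, v in pts) / m00
    eta = {}
    for p in range(4):
        for q in range(4):
            if 2 <= p + q <= 3:
                mu = sum((x - x_bar) ** p * (y - y_bar) ** q * v for x, y, v in pts)
                eta[(p, q)] = mu / m00 ** ((p + q) / 2 + 1)
    return eta


def invariant_moments(f):
    """M1..M7 exactly as written in the formulas above."""
    e = eta_table(f)
    n20, n02, n11 = e[(2, 0)], e[(0, 2)], e[(1, 1)]
    n30, n03, n21, n12 = e[(3, 0)], e[(0, 3)], e[(2, 1)], e[(1, 2)]
    a, b = n30 + n12, n21 + n03  # recurring sums
    m1 = n20 + n02
    m2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    m3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    m4 = a ** 2 + b ** 2
    m5 = ((n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
          + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2))
    m6 = (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b
    m7 = ((3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
          - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2))
    return [m1, m2, m3, m4, m5, m6, m7]


img = [[0, 0, 0, 0, 0],
       [0, 5, 1, 0, 0],
       [0, 2, 7, 3, 0],
       [0, 0, 4, 0, 1],
       [0, 0, 0, 0, 0]]
rotated = [list(r) for r in zip(*img[::-1])]  # 90 degrees clockwise
mirrored = [row[::-1] for row in img]         # like MATLAB fliplr

for name, image in (("original", img), ("rotated", rotated), ("mirrored", mirrored)):
    print(name, ["%.6g" % m for m in invariant_moments(image)])
```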

Images
• Mean and variance. Consider:
  >> A = rand(20);
  >> B = rand(20);
  >> C1 = [A;B];
  >> C2 = [B;A];
  >> figure, imshow(C1), colormap(hot), title('C1')
  >> figure, imshow(C2), colormap(hot), title('C2')

Mean? Variance?

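  >> mean(C2(:))
  ans = 0.5064
  >> var(C2(:))
  ans = 0.0800
  >> mean(C1(:))
  ans = 0.5064
  >> var(C1(:))
  ans = 0.0800
C1 and C2 contain exactly the same pixel values in a different arrangement, so their mean and variance are identical. Mean and variance CANNOT be used for recognition.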
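The same experiment can be run in Python with a seeded random generator standing in for MATLAB's `rand(20)`. The seed and helper names are illustrative. Like MATLAB's `var`, `statistics.variance` normalizes by N − 1. The mean and variance of C1 and C2 agree. A first-order moment, the intensity-weighted mean row $\bar{y} = m_{01}/m_{00}$, does not agree, because it depends on where the pixels are.

```python
import random
import statistics

# Seeded stand-ins for MATLAB's rand(20): two 20 x 20 arrays of values in [0, 1).
rng = random.Random(0)
A = [[rng.random() for _ in range(20)] for _ in range(20)]
B = [[rng.random() for _ in range(20)] for _ in range(20)]

C1 = A + B  # like [A;B]: A stacked above B (40 x 20)
C2 = B + A  # like [B;A]: same pixels, halves swapped


def flatten(image):
    return [v for row in image for v in row]


def row_centroid(image):
    """y_bar = m01 / m00: the intensity-weighted mean row index."""
    m00 = sum(flatten(image))
    m01 = sum(y * v for y, row in enumerate(image) for v in row)
    return m01 / m00


for name, image in (("C1", C1), ("C2", C2)):
    pixels = flatten(image)
    print(name, statistics.mean(pixels), statistics.variance(pixels), row_centroid(image))
```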

Moment Invariants
  >> invMomentsC1 = abs(log(invmoments(C1)))
  invMomentsC1 = 0.8741 2.7730 11.1135 12.6103 24.4960 15.2538 26.1970
  >> invMomentsC2 = abs(log(invmoments(C2)))
  invMomentsC2 = 0.8889 2.8229 10.2520 11.6699 23.2758 13.3174 22.7919
Unlike the mean and variance, the seven invariant moments give different values for C1 and C2.

Image Creation
  im1 = imread('CornellClock.jpg');
  im1rgb = double(rgb2gray(im1));                         % grayscale image, as double
  figure, imshow(im1rgb, [ ]), title('Cornell Clock')
  im1rgbCrop1 = im1rgb(450:549,825:924);                  % 100 x 100 region
  figure, imshow(im1rgbCrop1, [ ]), title('Cornell Clock Cropped')
  im1rgbCrop1Sm = im1rgbCrop1(1:2:end, 1:2:end);          % half size: every other row and column
  im1rgbCrop1r45 = imrotate(im1rgbCrop1,45,'bilinear');   % 45 degree rotation
  im1rgbCrop1flr = fliplr(im1rgbCrop1);                   % left-right mirror image
  figure, imshow(im1rgbCrop1r45, [ ]), title('45 Degree Rotation')
  figure, imshow(im1rgbCrop1flr, [ ]), title('Clock Flipped')
  figure, imshow(im1rgbCrop1Sm, [ ]), title('Small Clock')

Invariant Moment Summary

                        phi 1    phi 2    phi 3    phi 4    phi 5    phi 6    phi 7
CORNELL CLOCK REGIONS
  Region 1             6.3014  17.8100  23.9168  23.0700  47.7443  32.4888  46.6251
                       6.3016  17.8126  23.9169  23.0717  47.7114  32.4889  46.6324
                       6.3014  17.8100  23.9168  23.0700  47.7443  32.4888  46.7308
                       6.2892  17.8693  23.8802  23.0783  47.6431  32.5994  46.6333
  Region 2             6.6675  19.2909  26.4725  25.9716  52.4851  35.7716  52.7540
                       6.6677  19.2901  26.4667  25.9741  52.4872  35.7738  52.7522
                       6.6675  19.2909  26.4725  25.9716  52.4851  35.7716  52.8475
                       6.6682  19.2976  26.3441  25.9768  52.3908  35.7817  52.7870
PEOPLE
  Person 1             6.5112  14.7902  23.4190  26.5393  51.6422  34.5953  52.2770
                       6.5112  14.7902  23.4191  26.5393  51.6424  34.5953  52.2770
                       6.5113  14.7897  23.4136  26.5266  51.6467  34.5529  52.1718
  Person 2             6.7102  16.2675  23.5469  26.5158  52.3896  35.0203  51.7451
                       6.7101  16.2675  23.5469  26.5158  52.3898  35.0204  51.7452
                       6.7107  16.2645  23.5353  26.4955  52.3316  34.9976  51.7141
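This illustrative check, which is not part of the lecture, uses the moment vectors from the summary table. It runs a leave-one-out nearest-neighbour test with Euclidean distance in Python. For every row, the closest other row belongs to the same region or person. Rows of the same object stay within a few tenths of each other, while different objects differ by more than 1 in phi 2.

```python
import math

# Invariant-moment vectors copied from the summary table (phi 1 .. phi 7).
table = {
    "Region 1": [[6.3014, 17.8100, 23.9168, 23.0700, 47.7443, 32.4888, 46.6251],
                 [6.3016, 17.8126, 23.9169, 23.0717, 47.7114, 32.4889, 46.6324],
                 [6.3014, 17.8100, 23.9168, 23.0700, 47.7443, 32.4888, 46.7308],
                 [6.2892, 17.8693, 23.8802, 23.0783, 47.6431, 32.5994, 46.6333]],
    "Region 2": [[6.6675, 19.2909, 26.4725, 25.9716, 52.4851, 35.7716, 52.7540],
                 [6.6677, 19.2901, 26.4667, 25.9741, 52.4872, 35.7738, 52.7522],
                 [6.6675, 19.2909, 26.4725, 25.9716, 52.4851, 35.7716, 52.8475],
                 [6.6682, 19.2976, 26.3441, 25.9768, 52.3908, 35.7817, 52.7870]],
    "Person 1": [[6.5112, 14.7902, 23.4190, 26.5393, 51.6422, 34.5953, 52.2770],
                 [6.5112, 14.7902, 23.4191, 26.5393, 51.6424, 34.5953, 52.2770],
                 [6.5113, 14.7897, 23.4136, 26.5266, 51.6467, 34.5529, 52.1718]],
    "Person 2": [[6.7102, 16.2675, 23.5469, 26.5158, 52.3896, 35.0203, 51.7451],
                 [6.7101, 16.2675, 23.5469, 26.5158, 52.3898, 35.0204, 51.7452],
                 [6.7107, 16.2645, 23.5353, 26.4955, 52.3316, 34.9976, 51.7141]],
}
samples = [(label, vec) for label, vecs in table.items() for vec in vecs]


def nearest_label(i):
    """Label of the closest *other* sample (leave-one-out, Euclidean distance)."""
    query = samples[i][1]
    best = min((j for j in range(len(samples)) if j != i),
               key=lambda j: math.dist(query, samples[j][1]))
    return samples[best][0]


correct = sum(nearest_label(i) == samples[i][0] for i in range(len(samples)))
print(f"{correct}/{len(samples)} rows matched to their own region or person")  # 14/14 ...
```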

Useful Papers
• David G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”
• Navneet Dalal and Bill Triggs, “Histograms of Oriented Gradients for Human Detection”
• Navneet Dalal, “Finding People in Images and Videos” (PhD thesis)
• António M. G. Pinheiro, “Image Description using Scale-Space Edge Pixel Directions Histogram”

MPEG-7 Texture: Edge Histogram Descriptor
• http://makarandtapaswi.wordpress.com/2010/07/15/mpeg-7-texture-edge-histogram-descriptor/
• Useful for scene classification
• Histogram of edge orientations defined in local sub-images
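The following simplified Python sketch illustrates the idea of a histogram of edge orientations per sub-image. It is not the normative MPEG-7 descriptor: the standard works on image blocks with its own filters and adds a non-directional edge class. Treat it only as an illustration. The assumptions are a 4 × 4 grid of sub-images, central-difference gradients, four orientation bins centred on 0°, 45°, 90° and 135°, and a hypothetical strength threshold of 10. The function name is illustrative.

```python
import math


def edge_orientation_histogram(image, grid=4, num_orientations=4, threshold=10.0):
    """Simplified edge-orientation histogram over a grid x grid set of sub-images.

    image: 2-D list of intensities, indexed image[y][x].
    Returns grid*grid*num_orientations counts, one block of orientations per sub-image.
    """
    height, width = len(image), len(image[0])
    bin_width = 180.0 / num_orientations
    hist = [[0] * num_orientations for _ in range(grid * grid)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # horizontal central difference
            gy = image[y + 1][x] - image[y - 1][x]  # vertical central difference
            if math.hypot(gx, gy) < threshold:
                continue  # too weak to count as an edge
            # The edge runs perpendicular to the gradient; fold its angle into [0, 180).
            angle = (math.degrees(math.atan2(gy, gx)) + 90.0) % 180.0
            # Bin 0: horizontal edges, bin 2: vertical edges, bins 1 and 3: diagonals.
            b = int((angle + bin_width / 2) // bin_width) % num_orientations
            cell = (y * grid // height) * grid + (x * grid // width)
            hist[cell][b] += 1
    return [count for cell in hist for count in cell]


# Hypothetical 16 x 16 test image: dark left half, bright right half (a vertical edge).
step = [[0 if x < 8 else 100 for x in range(16)] for y in range(16)]
desc = edge_orientation_histogram(step)
print(len(desc), sum(desc))  # 64 28
```

Concatenating the per-sub-image blocks keeps coarse layout information that a single global histogram would lose.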

Homogeneous Texture Descriptor
• Texture features
• http://www.cs.auckland.ac.nz/compsci708s1c/lectures/Glect-html/topic4c708FSC.htm

Color
• Expand SIFT to color images
• Could simply use the RGB color space
• The opponent color space has demonstrated better performance:
  – Channel 1 = (R − G)/√2
  – Channel 2 = (R + G − 2B)/√6
  – Channel 3 = (R + G + B)/√3

Channels
• Channel 1 = (R − G)/√2: color along the red–green axis
• Channel 2 = (R + G − 2B)/√6: color along the yellow–blue axis
• Channel 3 = (R + G + B)/√3: simply the intensity information
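The three channel formulas translate directly to code. This Python sketch converts single RGB pixels and whole images, stored as 2-D lists of (R, G, B) tuples. The function names are illustrative. A gray pixel has zero color channels. Yellow is positive and blue is negative on channel 2.

```python
import math


def rgb_to_opponent(r, g, b):
    """Opponent color channels as defined on the slide."""
    o1 = (r - g) / math.sqrt(2)          # red-green axis
    o2 = (r + g - 2 * b) / math.sqrt(6)  # yellow-blue axis
    o3 = (r + g + b) / math.sqrt(3)      # intensity
    return o1, o2, o3


def image_to_opponent(image):
    """Split a 2-D list of (R, G, B) pixels into three opponent-channel planes."""
    planes = ([], [], [])
    for row in image:
        converted = [rgb_to_opponent(*pixel) for pixel in row]
        for k in range(3):
            planes[k].append([c[k] for c in converted])
    return planes


print(rgb_to_opponent(100, 100, 100))  # gray: only the intensity channel is non-zero
print(rgb_to_opponent(255, 255, 0))    # yellow: positive channel 2
print(rgb_to_opponent(0, 0, 255))      # blue: negative channel 2
```

The three weight vectors are orthonormal, so the transform is an orthonormal change of basis of RGB space and preserves distances between colors. For color SIFT, each resulting plane would be described separately.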

Color SIFT
• Select keypoints (a dense grid?)
• Calculate a 128-value feature vector for each color plane
• Each keypoint then has 384 (3 × 128) features

Visual Dictionary
• For a known scene class, extract SIFT feature vectors (384 values each in this example)
• We don’t want tens of thousands of separate feature vectors for each class
• Cluster the features; there are many clustering algorithms in addition to k-means
• You need to estimate k, the number of clusters
• Each cluster is a ‘visual word’ in the dictionary
• How many visual words? 500, 1000, 5000?
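The clustering step can be sketched with plain k-means (Lloyd's algorithm) in Python. This minimal sketch rests on several assumptions. Descriptors are short lists of numbers, whereas real color SIFT vectors have 384 entries. Centres start from a deterministic farthest-point initialization (k-means++ is a common randomized alternative). The iteration count is fixed. `build_dictionary` is an illustrative name, and each returned centre plays the role of one visual word.

```python
import math


def build_dictionary(descriptors, k, iterations=20):
    """k-means (Lloyd's algorithm): return k cluster centres, the 'visual words'."""
    # Farthest-point initialisation: start from the first descriptor, then repeatedly
    # add the descriptor farthest from every centre chosen so far.
    centres = [list(descriptors[0])]
    while len(centres) < k:
        farthest = max(descriptors, key=lambda d: min(math.dist(d, c) for c in centres))
        centres.append(list(farthest))
    for _ in range(iterations):
        # Assignment step: each descriptor joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            nearest = min(range(k), key=lambda c: math.dist(d, centres[c]))
            clusters[nearest].append(d)
        # Update step: move each centre to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(values) / len(members) for values in zip(*members)]
    return centres


# Hypothetical 2-D "descriptors" forming two obvious groups.
descriptors = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (11, 11)]
print(sorted(build_dictionary(descriptors, k=2)))  # [[0.5, 0.5], [10.5, 10.5]]
```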

Key Point Locations
• Key points should be informative, not all clustered in one region
• An advantage of grid sampling is that points are sampled uniformly across the image
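Dense grid sampling is straightforward to generate. This Python sketch returns keypoint centres on a regular grid. It keeps a margin from the border so a descriptor window around each point stays inside the image. The step of 10 and margin of 8 pixels are hypothetical choices, and `dense_grid` is an illustrative name.

```python
def dense_grid(height, width, step, margin):
    """(x, y) keypoint centres on a regular grid, at least `margin` pixels from every border."""
    return [(x, y)
            for y in range(margin, height - margin, step)
            for x in range(margin, width - margin, step)]


keypoints = dense_grid(100, 100, step=10, margin=8)
print(len(keypoints), keypoints[:3])  # 81 [(8, 8), (18, 8), (28, 8)]
```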

Matching Key Points
• How do you match key point features from an unknown image to visual words in the dictionary?
• Euclidean distance? Other metrics?
• What if a key point from the unknown image matches several visual words? Discard it and try another key point from the unknown image.
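One way to detect a key point that "matches several visual words" is to compare the distances to the nearest and second-nearest words. This Python sketch uses Euclidean distance and discards a match when the nearest distance exceeds 0.8 times the second-nearest. The 0.8 threshold is an assumption, borrowed from the ratio test Lowe uses for matching SIFT features between images. `match_to_word` is an illustrative name.

```python
import math


def match_to_word(descriptor, words, ratio=0.8):
    """Index of the nearest visual word, or None if the match is ambiguous.

    A match is ambiguous when the second-nearest word is almost as close as the nearest:
    the nearest distance exceeds `ratio` times the second-nearest distance.
    """
    if len(words) < 2:
        raise ValueError("need at least two visual words")
    ranked = sorted((math.dist(descriptor, w), i) for i, w in enumerate(words))
    (d1, best), (d2, _) = ranked[0], ranked[1]
    if d1 > ratio * d2:
        return None  # several words match about equally well: discard this key point
    return best


words = [(0, 0), (10, 0)]
print(match_to_word((1, 0), words))  # 0
print(match_to_word((5, 0), words))  # None (equidistant)
```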

Histogram of Matching Visual Words
• Histogram intersection
• Other methods
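The final step can be sketched in Python by counting how often each visual word is matched in the unknown image, then picking the scene class whose word histogram has the largest intersection with it. The class names, vocabulary size and counts are hypothetical. Unmatched key points, represented as `None`, are skipped. The function names are illustrative.

```python
def word_histogram(word_indices, vocabulary_size):
    """Count matched visual words; None marks a key point discarded as ambiguous."""
    counts = [0] * vocabulary_size
    for w in word_indices:
        if w is not None:
            counts[w] += 1
    return counts


def histogram_intersection(h, g):
    return sum(min(a, b) for a, b in zip(h, g)) / min(sum(h), sum(g))


def classify(query_hist, class_hists):
    """Scene class whose visual-word histogram intersects most with the query."""
    return max(class_hists, key=lambda c: histogram_intersection(query_hist, class_hists[c]))


# Hypothetical 4-word vocabulary and per-class histograms from training images.
class_hists = {"beach": [8, 6, 0, 1], "forest": [0, 1, 7, 9]}
query = word_histogram([0, 0, 1, None, 3, 0], vocabulary_size=4)
print(query, classify(query, class_hists))  # [3, 1, 0, 1] beach
```

Both histograms need at least one count, otherwise the normalization divides by zero.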

Classifiers
• Neural networks: Help > Product Help
• MATLAB command line: >> demo
• Neural Network Toolbox > Getting Started

Classifiers
• Crab classification: male (M) or female (F)
• Training data: 6 measured features
• Input matrix x, size 6 × 200
  – 6 features per crab (each row is a feature)
  – 200 examples (each column is an example)
• Target matrix t, size 2 × 200
  – 2 classes (M = [0;1], F = [1;0])
  – 200 examples

NN Architecture
• 6 inputs (number of features)
• 2 outputs (number of output classes)
• Fully connected: every neuron is connected to every neuron in the next column
[Figure: input neurons, hidden neurons and output neurons]
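This Python sketch shows the forward pass of the fully connected architecture: 6 inputs, 10 hidden neurons (matching `patternnet(10)`), and 2 outputs. Several parts are assumptions rather than a description of what `patternnet` does internally: the tanh hidden activation, the softmax output, and uniform random initial weights. The crab measurement is hypothetical. The network is untrained, so its prediction is meaningless until training adjusts the weights.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Fully connected layer: a weight from every input to every neuron, plus one bias per neuron."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, layer, activation):
    weights, biases = layer
    return activation([sum(w * xi for w, xi in zip(row, x)) + b
                       for row, b in zip(weights, biases)])


def tanh_all(z):
    return [math.tanh(v) for v in z]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


rng = random.Random(0)
hidden = init_layer(6, 10, rng)  # 6 input features -> 10 hidden neurons
output = init_layer(10, 2, rng)  # 10 hidden neurons -> 2 output classes
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in (hidden, output))

classes = ("F", "M")  # slide's target encoding: F = [1;0], M = [0;1]
crab = [0.3, 1.2, 0.8, 2.0, 1.5, 0.9]  # hypothetical 6-feature measurement (one column of x)
scores = forward(forward(crab, hidden, tanh_all), output, softmax)
print(n_params, scores, classes[scores.index(max(scores))])
```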

  >> net = patternnet(10);   % pattern-recognition network with 10 hidden neurons
  >> view(net)
MATLAB automatically divides the data into training, validation and testing sets.
  >> [net,tr] = train(net,x,t);
  >> nntraintool
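MATLAB performs this division automatically. The following minimal Python sketch shows a random three-way split of the 200 crab examples. The 70% / 15% / 15% ratios are an assumption meant to mirror the toolbox's usual default, so check `net.divideParam` in your release. `divide_random` is an illustrative name. The training set fits the weights, the validation set is used to stop training early, and the test set is held out for a final, independent check.

```python
import random


def divide_random(n_examples, train_ratio=0.70, val_ratio=0.15, seed=0):
    """Shuffle example indices and cut them into training, validation and test sets."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    n_train = round(train_ratio * n_examples)
    n_val = round(val_ratio * n_examples)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


train, val, test = divide_random(200)
print(len(train), len(val), len(test))  # 140 30 30
```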

Training

Classifiers
• MATLAB Help pages
• Classify using a support vector machine (SVM)
• svmclassify
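MATLAB's `svmclassify` applies an SVM model trained separately with `svmtrain`. Newer releases replace both with `fitcsvm` and `predict`. The following Python sketch is not the solver MATLAB uses. It trains a minimal linear SVM by subgradient descent on the regularized hinge loss, and the learning rate, regularization weight and epoch count are hypothetical. Classification is simply the side of the learned hyperplane $w \cdot x + b = 0$ on which a point falls.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y_i (w . x_i + b))) by subgradient descent.

    X: list of feature vectors; y: labels in {-1, +1}.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_classify(w, b, x):
    """Side of the hyperplane w . x + b = 0 on which x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([svm_classify(w, b, x) for x in X])
```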
