Introduction to Computer Vision Lecture 6 Dr. Roger S. Gaborski
Intro to CV Graduate Projects • Correlation/Convolution • David Rubel’s Master’s Project (slides included at end of this lecture)
How can we average the pixel values in an image? • The average depends on the number of pixels involved (a pixel and its neighbors) • Neighborhood operation • The more neighbors, the more smoothing (averaging)
Smoothing Example • Done on white board • How do we handle pixels along the edges?
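The whiteboard example can be sketched in code. This is a NumPy stand-in for the lecture's MATLAB (the image values below are illustrative, not from the lecture): each interior pixel is replaced by the mean of its 3x3 neighborhood, and the edge pixels are simply skipped, which is exactly the problem the padding methods on the next slides solve.

```python
import numpy as np

# Toy 4x4 grayscale image (illustrative values).
img = np.array([[10, 10, 10, 10],
                [10, 50, 50, 10],
                [10, 50, 50, 10],
                [10, 10, 10, 10]], dtype=float)

# Average each interior pixel with its 8 neighbors (3x3 neighborhood).
# Edge pixels are left untouched here -- padding is needed to handle them.
out = img.copy()
for r in range(1, img.shape[0] - 1):
    for c in range(1, img.shape[1] - 1):
        out[r, c] = img[r-1:r+2, c-1:c+2].mean()

print(out)
```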
Padding -- padarray • fp = padarray(f, [r c], method, direction) • f is input image • fp is padded image • [r c] is number of rows and columns to pad f • method and direction – next slide
Chapter 3 www.prenhall.com/gonzalezwoodseddins
padarray Example
>> f = [1 2; 3 4]
f =
     1     2
     3     4
>> fp = padarray(f, [3 2], 'replicate', 'post')
fp =
     1     2     2     2
     3     4     4     4
     3     4     4     4
     3     4     4     4
     3     4     4     4
Post – pad after the last element in both directions
[3 2] – pad 3 rows and 2 columns
>> fp = padarray(f, [2 1], 'replicate', 'post')
fp =
     1     2     2
     3     4     4
     3     4     4
     3     4     4
Post – pad after the last element in both directions
[2 1] – pad 2 rows and 1 column
>> f = [1 2 3; 1 2 3; 1 2 3]
f =
     1     2     3
     1     2     3
     1     2     3
>> fp = padarray(f, [2 2], 'symmetric', 'both')
fp = ??????
>> f = [1 2 3; 1 2 3; 1 2 3]
f =
     1     2     3
     1     2     3
     1     2     3
>> fp = padarray(f, [2 2], 'symmetric', 'both')
fp =
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
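The same padding can be reproduced outside MATLAB. As a sketch, NumPy's `np.pad` covers both cases above: its `'edge'` mode corresponds to padarray's `'replicate'`, its `'symmetric'` mode matches padarray's `'symmetric'`, and the `'post'`/`'both'` direction is expressed through per-side pad widths.

```python
import numpy as np

f = np.array([[1, 2],
              [3, 4]])

# MATLAB: padarray(f, [3 2], 'replicate', 'post')
# (before, after) widths per axis; 'post' means pad after only.
fp_post = np.pad(f, ((0, 3), (0, 2)), mode='edge')
print(fp_post)

g = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 3]])

# MATLAB: padarray(g, [2 2], 'symmetric', 'both')
fp_sym = np.pad(g, ((2, 2), (2, 2)), mode='symmetric')
print(fp_sym)
```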
Spatial Filtering • Neighborhood processing • Define center point (x,y) • Perform operations involving only pixels in the neighborhood • Result of operation is the response of the process at that point • Moving the center point results in a new neighborhood • Repeat the process for every point in the image
Linear and Nonlinear Spatial Filtering • Linear operation • Multiply each pixel in the neighborhood by the corresponding coefficient and sum the results to get the response for each point (x,y) • If the neighborhood is m x n, then mn coefficients are required • Coefficients are arranged in a matrix, called a • Filter • Filter mask • Kernel • Template • Mask sizes are typically odd (3x3, 5x5, etc.) • The larger the mask, the greater the compute time
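The linear operation above can be written out directly. This is a minimal NumPy sketch (not the lecture's MATLAB code): for every center point, multiply the m x n mask coefficients by the corresponding neighborhood pixels and sum, with replicate padding so the edge pixels also get a full neighborhood.

```python
import numpy as np

def spatial_filter(f, w):
    """Linear spatial filtering: correlate image f with mask w.
    Edge pixels are handled by replicate padding."""
    m, n = w.shape
    pr, pc = m // 2, n // 2
    fp = np.pad(f, ((pr, pr), (pc, pc)), mode='edge')
    g = np.zeros(f.shape, dtype=float)
    for r in range(f.shape[0]):
        for c in range(f.shape[1]):
            # Sum of products of mask coefficients and neighborhood pixels
            g[r, c] = np.sum(w * fp[r:r+m, c:c+n])
    return g

f = np.array([[0, 0, 0],
              [0, 9, 0],
              [0, 0, 0]], dtype=float)
w = np.ones((3, 3)) / 9.0       # 3x3 averaging mask
g = spatial_filter(f, w)
print(g)                        # the impulse is smoothed out
```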
Correlation -- Convolution • Correlation • Place mask w on the image array f as previously described • Convolution • First rotate mask w by 180 degrees • Place rotated mask on image as described previously
Example - Correlation • Assume w and f are one dimensional • Origin of f is its left most point • Place w so that its right most point coincides with the origin of f • Pad f with 0s so that there are corresponding f points for each w point (also pad the end with 0s) • Multiply corresponding points and sum • In this case (example on next page) the result is zero • Move w to the right one value, repeat the process • Continue the process for the whole length of f
'full' is the result we obtain from the operations on the previous slide. If, instead of aligning the left most element of f with the right most element of w, we align the center element of w with the left most value of f, we obtain the 'same' result, where 'same' indicates the output is the same length as the original f
‘Full’ correlation
‘Same’ correlation etc.
Example - Convolution • Convolution is the same procedure, but the filter is first rotated 180 degrees. • If the filter is symmetric, correlation and convolution results are the same
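A short NumPy sketch (an assumption: the lecture uses MATLAB's imfilter for this) makes the two points above concrete. Correlating and convolving a discrete impulse with an asymmetric mask shows both the 'full'/'same' output sizes and the 180-degree rotation that distinguishes convolution from correlation: convolution copies the mask at the impulse, correlation copies it reversed.

```python
import numpy as np

f = np.array([0, 0, 1, 0, 0])    # discrete impulse
w = np.array([1, 2, 3])          # asymmetric mask

conv_full = np.convolve(f, w, mode='full')    # convolution flips w first
conv_same = np.convolve(f, w, mode='same')    # 'same': output length == len(f)
corr_full = np.correlate(f, w, mode='full')   # correlation: no flip

print(conv_full)   # impulse copies w
print(conv_same)
print(corr_full)   # impulse copies w reversed
```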
The same operations extend directly to images
SCENE CLASSIFICATION USING PLSA AND SPATIAL INFORMATION MS Project by David Rubel
OUTLINE • Problem • Previous Work • Datasets • Key Concepts • Implementation • Results • Questions
PROBLEM • What is scene classification? • Assigning a scene label to arbitrary images • Potential uses • Content-based image retrieval • Web accessibility • Object detection/localization
PREVIOUS WORK • Holistic Methods • Oliva and Torralba (2001) • Defined a spatial envelope for each image • Consists of naturalness, openness, roughness, expansion and ruggedness • Trained Discriminant Spectral Templates (DSTs) to process novel images • Used K-Nearest Neighbors for classification • Produced an excellent dataset
PREVIOUS WORK (Figure: example material labels: water, rock, grass) • Semantic Methods • Vogel and Schiele (2004) • Divide each image into a 10x10 grid and label each material using SVMs • Create three histograms of materials (COVs) • Classify the image using these COVs • Created another interesting dataset
PREVIOUS WORK • Bag-of-Words Methods • Fei-Fei and Perona (2005) • Search images for textons • Group textons into visual words using k-means clustering • Group visual words together using Bayesian statistics • Label images using a Bayesian classifier • Bosch, Zisserman and Muñoz (2008) • Use SIFT features instead of textons • Use pLSA to group words into topics • Classify images with SVM
DATASET • Oliva and Torralba (OT) • 1472 natural images
DATASET • Oliva and Torralba (OT) • 1216 man-made images
DATASET • Vogel and Schiele (VS) • 700 natural images
KEY CONCEPTS (SIFT) • Scale-Invariant Feature Transform (SIFT) • Interest point detector introduced by David G. Lowe • Points are invariant to scale and rotation • Partially invariant to affine warp and lighting • Four stage process • Scale-space extrema detection • Keypoint localization • Orientation assignment • Keypoint descriptors
KEY CONCEPTS (SIFT) • Scale-space extrema detection (Figures a, b)
KEY CONCEPTS (SIFT) • Keypoint localization • Keypoints are refined to subpixel accuracy • Keypoints along edges are removed • Keypoints in areas of low contrast are removed • Orientation assignment • Gradient direction and magnitude are computed for the area surrounding the keypoint • The keypoint is assigned the orientation most represented in the pixel neighborhood • Uses a 36-bin directional histogram with Gaussian weight
KEY CONCEPTS (SIFT) • Keypoint descriptors • 4x4x8 bin histogram of gradient magnitudes • Normalized for some lighting invariance
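The 4x4x8 descriptor layout can be sketched in a few lines of NumPy. This is a toy version only (an assumption, not Lowe's implementation: real SIFT adds Gaussian weighting, trilinear interpolation, and rotation to the dominant orientation): gradient orientations over a 16x16 patch are binned into 8 directions per 4x4 spatial cell, giving the 128-dimensional vector, which is then normalized for lighting invariance.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Toy 4x4x8 gradient-orientation histogram with the layout of a
    SIFT descriptor, for a 16x16 patch."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                            # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)       # direction in [0, 2pi)
    bins = (ang / (2 * np.pi) * 8).astype(int) % 8    # 8 orientation bins
    desc = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):                           # 4x4 spatial cells
            desc[i // 4, j // 4, bins[i, j]] += mag[i, j]
    desc = desc.ravel()                               # 4*4*8 = 128 values
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc          # lighting normalization

rng = np.random.default_rng(0)
d = sift_like_descriptor(rng.random((16, 16)))
print(d.shape)
```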
KEY CONCEPTS (PLSA) • Probabilistic Latent Semantic Analysis (pLSA) • Factor analysis presented by Thomas Hofmann • Originally used in the text processing field • Set of words W = {w1, …, wM} • Set of documents D = {d1, …, dN} • Describe each document as a histogram of word counts n(wi, dj)
KEY CONCEPTS (PLSA) • Compare documents by their distribution of words P(wi | dj) • Not an ideal solution • Synonyms & polysemes • Dense descriptors • pLSA: Add a latent variable Z = {z1, …, zK} and factor the word-document distribution as P(wi | dj) = Σk P(wi | zk) P(zk | dj)
KEY CONCEPTS (PLSA) • Compute matrices with Expectation Maximization • Expectation Step – computes posterior probabilities • Maximization Step – computes other probabilities • Continue running until perplexity stops decreasing on hold-out data
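The EM procedure above can be sketched compactly. This is a minimal illustration, not the project's implementation (it omits the hold-out perplexity stopping rule and simply runs a fixed number of iterations): the E-step computes the posterior P(z | d, w) from the current factors, and the M-step re-estimates P(w | z) and P(z | d) from the expected counts.

```python
import numpy as np

def plsa_em(n, K, iters=50, seed=0):
    """Minimal pLSA EM on a document-word count matrix n (N x M).
    Returns P(z|d) of shape (N, K) and P(w|z) of shape (K, M)."""
    rng = np.random.default_rng(seed)
    N, M = n.shape
    Pz_d = rng.random((N, K)); Pz_d /= Pz_d.sum(1, keepdims=True)
    Pw_z = rng.random((K, M)); Pw_z /= Pw_z.sum(1, keepdims=True)
    for _ in range(iters):
        # E-step: posterior P(z | d, w) for every (d, w) pair
        post = Pz_d[:, :, None] * Pw_z[None, :, :]          # N x K x M
        post /= post.sum(1, keepdims=True) + 1e-12
        # M-step: re-estimate both factors from expected counts
        nw = n[:, None, :] * post                           # N x K x M
        Pw_z = nw.sum(0); Pw_z /= Pw_z.sum(1, keepdims=True) + 1e-12
        Pz_d = nw.sum(2); Pz_d /= Pz_d.sum(1, keepdims=True) + 1e-12
    return Pz_d, Pw_z

# Two obvious "topics": docs 0-1 use words 0-1, docs 2-3 use words 2-3.
n = np.array([[5, 5, 0, 0], [4, 6, 0, 0],
              [0, 0, 5, 5], [0, 0, 6, 4]], dtype=float)
Pz_d, Pw_z = plsa_em(n, K=2)
print(np.round(Pz_d, 2))
```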
KEY CONCEPTS (SVMS) • Support Vector Machines (SVMs) • Binary classification tool which finds separating hyperplanes (Figure: best separator between the convex hulls of the two classes)
KEY CONCEPTS (SVMS) • Not all problems are linearly separable • Find a best-fit separator • Use a kernel to map data to a higher dimension (Figures: best-fit separator; use of an RBF kernel)
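The kernel mapping can be made concrete with the RBF (Gaussian) kernel used later in the project. A sketch of just the kernel matrix (the data and gamma value below are illustrative): K(x, y) = exp(-gamma * ||x - y||^2), which an RBF-kernel SVM uses in place of dot products, implicitly mapping points to a higher-dimensional space.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """RBF kernel matrix: K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
K = rbf_kernel(X, X)
print(np.round(K, 3))   # nearby points get values near 1, far points near 0
```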
IMPLEMENTATION DETAILS • Building visual words • Find SIFT features in image dataset • Color SIFT features, HSV color space • Dense SIFT detector for better results (M = 8) • Scale-invariance via 4 concentric circles (r = 4, 8, 12, 16) • 64-bit floats -> 16-bit unsigned integers • Cluster features to create visual words • SIFT features alone are too varied • Improved k-means clustering by Charles Elkan • K = 1,500 • 200,000 features • Quantize features • Build histograms
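The last two steps above (quantize features, build histograms) can be sketched as follows. This is an illustration with toy data, not the project's code (the project uses K = 1,500 centers from Elkan's k-means over SIFT descriptors; the dimensions and counts below are made up): each descriptor is assigned to its nearest cluster center, i.e. its visual word, and the per-image word counts form the histogram fed to pLSA or the SVMs.

```python
import numpy as np

def bag_of_words_histogram(features, centers):
    """Quantize each descriptor to its nearest cluster center (visual
    word) and build a word-count histogram for the image."""
    # Squared distance from every feature to every center
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(1)                    # nearest visual word per feature
    return np.bincount(words, minlength=len(centers))

rng = np.random.default_rng(1)
centers = rng.random((5, 8))   # toy vocabulary: K = 5 words, 8-D features
feats = rng.random((40, 8))    # 40 descriptors from one image
h = bag_of_words_histogram(feats, centers)
print(h, h.sum())
```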
IMPLEMENTATION DETAILS • Testing the classification system • Divide the images into training & testing sets • Run pLSA if requested • Use standard pLSA for training data • Use fold-in heuristic for testing data • Z = 25 • Train SVMs • LIBSVM with MATLAB wrapper • Use one-versus-all method • RBF kernel
RESULTS Test the grouping of pLSA topics
RESULTS Test the discriminative power of pLSA topics
RESULTS KNN: pLSA outperforms BOW (74.6% vs. 65.0%)
RESULTS • Tried incorporating spatial information • Divide the image into a grid • Train SVMs for each section • Sum results for each class over all sections
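The final combination step can be sketched numerically. The scores below are hypothetical, not from the project's experiments: with one SVM decision value per class per grid section, the per-class values are summed over all sections and the class with the largest total wins.

```python
import numpy as np

# Hypothetical per-section SVM decision values for a 2x2 grid (4 sections)
# and 4 classes: scores[s, c] = class c's SVM output on section s.
scores = np.array([[ 0.9, -0.2,  0.1, -0.5],
                   [ 0.4,  0.3, -0.1,  0.0],
                   [ 0.7, -0.4,  0.2, -0.3],
                   [ 0.2,  0.1,  0.6, -0.2]])

# Sum the results for each class over all sections, then pick the winner.
totals = scores.sum(axis=0)
label = totals.argmax()
print(totals, label)
```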
RESULTS Four-Class OT (Natural Images)
RESULTS Ambiguous images from the OT dataset
RESULTS Four-Class OT (Man-Made Images)
RESULTS Eight-Class OT (Both)