Introduction to Computer Vision Lecture 6 Dr. Roger S. Gaborski
Intro to CV Graduate Projects • Correlation/Convolution • David Rubel’s Master’s Project (slides included at end of this lecture)
How can we average the pixel values in an image? • The average depends on the number of pixels involved (a pixel and its neighbors) • Neighborhood operation • The more neighbors, the more smoothing (averaging)
Smoothing Example • Done on white board • How do we handle pixels along the edges?
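The whiteboard example can be sketched in code. This is a NumPy stand-in for the lecture's MATLAB (the image values below are illustrative, not from the lecture): each interior pixel is replaced by the mean of its 3x3 neighborhood, and the edge pixels are simply skipped, which is exactly the problem the padding methods on the next slides solve.

```python
import numpy as np

# Toy 4x4 grayscale image (illustrative values).
img = np.array([[10, 10, 10, 10],
                [10, 50, 50, 10],
                [10, 50, 50, 10],
                [10, 10, 10, 10]], dtype=float)

# Average each interior pixel with its 8 neighbors (3x3 neighborhood).
# Edge pixels are left untouched here -- padding is needed to handle them.
out = img.copy()
for r in range(1, img.shape[0] - 1):
    for c in range(1, img.shape[1] - 1):
        out[r, c] = img[r-1:r+2, c-1:c+2].mean()

print(out)
```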
Padding -- padarray • fp = padarray(f, [r c], method, direction) • f is input image • fp is padded image • [r c] is number of rows and columns to pad f • method and direction – next slide
Chapter 3 www.prenhall.com/gonzalezwoodseddins
padarray Example
>> f = [1 2; 3 4]
f =
     1     2
     3     4
>> fp = padarray(f, [3 2], 'replicate', 'post')
fp =
     1     2     2     2
     3     4     4     4
     3     4     4     4
     3     4     4     4
     3     4     4     4
Post – pad after the last element in both directions
[3 2] – pad 3 rows and 2 columns
>> fp = padarray(f, [2 1], 'replicate', 'post')
fp =
     1     2     2
     3     4     4
     3     4     4
     3     4     4
Post – pad after the last element in both directions
[2 1] – pad 2 rows and 1 column
>> f = [1 2 3; 1 2 3; 1 2 3]
f =
     1     2     3
     1     2     3
     1     2     3
>> fp = padarray(f, [2 2], 'symmetric', 'both')
fp = ??????
>> f = [1 2 3; 1 2 3; 1 2 3]
f =
     1     2     3
     1     2     3
     1     2     3
>> fp = padarray(f, [2 2], 'symmetric', 'both')
fp =
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
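The same padding can be reproduced outside MATLAB. As a sketch, NumPy's `np.pad` covers both cases above: its `'edge'` mode corresponds to padarray's `'replicate'`, its `'symmetric'` mode matches padarray's `'symmetric'`, and the `'post'`/`'both'` direction is expressed through per-side pad widths.

```python
import numpy as np

f = np.array([[1, 2],
              [3, 4]])

# MATLAB: padarray(f, [3 2], 'replicate', 'post')
# (before, after) widths per axis; 'post' means pad after only.
fp_post = np.pad(f, ((0, 3), (0, 2)), mode='edge')
print(fp_post)

g = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 3]])

# MATLAB: padarray(g, [2 2], 'symmetric', 'both')
fp_sym = np.pad(g, ((2, 2), (2, 2)), mode='symmetric')
print(fp_sym)
```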
Spatial Filtering • Neighborhood processing • Define center point (x,y) • Perform operations involving only pixels in the neighborhood • Result of operation is the response of the process at that point • Moving the center point results in a new neighborhood • Repeat the process for every point in the image
Linear and Nonlinear Spatial Filtering • Linear operation • Multiply each pixel in the neighborhood by the corresponding coefficient and sum the results to get the response for each point (x,y) • If the neighborhood is m x n, then mn coefficients are required • Coefficients are arranged in a matrix, called a • Filter • Filter mask • Kernel • Template • Mask sizes are typically odd (3x3, 5x5, etc.) • The larger the mask, the greater the compute time
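The linear operation above can be written out directly. This is a minimal NumPy sketch (not the lecture's MATLAB code): for every center point, multiply the m x n mask coefficients by the corresponding neighborhood pixels and sum, with replicate padding so the edge pixels also get a full neighborhood.

```python
import numpy as np

def spatial_filter(f, w):
    """Linear spatial filtering: correlate image f with mask w.
    Edge pixels are handled by replicate padding."""
    m, n = w.shape
    pr, pc = m // 2, n // 2
    fp = np.pad(f, ((pr, pr), (pc, pc)), mode='edge')
    g = np.zeros(f.shape, dtype=float)
    for r in range(f.shape[0]):
        for c in range(f.shape[1]):
            # Sum of products of mask coefficients and neighborhood pixels
            g[r, c] = np.sum(w * fp[r:r+m, c:c+n])
    return g

f = np.array([[0, 0, 0],
              [0, 9, 0],
              [0, 0, 0]], dtype=float)
w = np.ones((3, 3)) / 9.0       # 3x3 averaging mask
g = spatial_filter(f, w)
print(g)                        # the impulse is smoothed out
```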
Correlation -- Convolution • Correlation • Place mask w on the image array f as previously described • Convolution • First rotate mask w by 180 degrees • Place rotated mask on image as described previously
Example - Correlation • Assume w and f are one dimensional • Origin of f is its left most point • Place w so that its right most point coincides with the origin of f • Pad f with 0s so that there are corresponding f points for each w point (also pad the end with 0s) • Multiply corresponding points and sum • In this case (example on next page) the result is zero • Move w to the right one value, repeat the process • Continue the process for the whole length of f
'full' is the result we obtain from the operations on the previous slide. If, instead of aligning the left most element of f with the right most element of w, we align the center element of w with the left most value of f, we obtain the 'same' result, where 'same' indicates the output is the same length as the original f
‘Full’ correlation
‘Same’ correlation etc.
Example - Convolution • Convolution is the same procedure, but the filter is first rotated 180 degrees. • If the filter is symmetric, correlation and convolution results are the same
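A short NumPy sketch (an assumption: the lecture uses MATLAB's imfilter for this) makes the two points above concrete. Correlating and convolving a discrete impulse with an asymmetric mask shows both the 'full'/'same' output sizes and the 180-degree rotation that distinguishes convolution from correlation: convolution copies the mask at the impulse, correlation copies it reversed.

```python
import numpy as np

f = np.array([0, 0, 1, 0, 0])    # discrete impulse
w = np.array([1, 2, 3])          # asymmetric mask

conv_full = np.convolve(f, w, mode='full')    # convolution flips w first
conv_same = np.convolve(f, w, mode='same')    # 'same': output length == len(f)
corr_full = np.correlate(f, w, mode='full')   # correlation: no flip

print(conv_full)   # impulse copies w
print(conv_same)
print(corr_full)   # impulse copies w reversed
```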
The same operations extend directly to images
SCENE CLASSIFICATION USING PLSA AND SPATIAL INFORMATION MS Project by David Rubel
OUTLINE • Problem • Previous Work • Datasets • Key Concepts • Implementation • Results • Questions
PROBLEM • What is scene classification? • Assigning a scene label to arbitrary images • Potential uses • Content-based image retrieval • Web accessibility • Object detection/localization
PREVIOUS WORK • Holistic Methods • Oliva and Torralba (2001) • Defined a spatial envelope for each image • Consists of naturalness, openness, roughness, expansion and ruggedness • Trained Discriminant Spectral Templates (DSTs) to process novel images • Used K-Nearest Neighbors for classification • Produced an excellent dataset
PREVIOUS WORK (Figure: example material labels: water, rock, grass) • Semantic Methods • Vogel and Schiele (2004) • Divide each image into a 10x10 grid and label each material using SVMs • Create three histograms of materials (COVs) • Classify the image using these COVs • Created another interesting dataset
PREVIOUS WORK • Bag-of-Words Methods • Fei-Fei and Perona (2005) • Search images for textons • Group textons into visual words using k-means clustering • Group visual words together using Bayesian statistics • Label images using a Bayesian classifier • Bosch, Zisserman and Muñoz (2008) • Use SIFT features instead of textons • Use pLSA to group words into topics • Classify images with SVM
DATASET • Oliva and Torralba (OT) • 1472 natural images
DATASET • Oliva and Torralba (OT) • 1216 man-made images
DATASET • Vogel and Schiele (VS) • 700 natural images
KEY CONCEPTS (SIFT) • Scale-Invariant Feature Transform (SIFT) • Interest point detector introduced by David G. Lowe • Points are invariant to scale and rotation • Partially invariant to affine warp and lighting • Four stage process • Scale-space extrema detection • Keypoint localization • Orientation assignment • Keypoint descriptors
KEY CONCEPTS (SIFT) • Scale-space extrema detection (Figures a, b)
KEY CONCEPTS (SIFT) • Keypoint localization • Keypoints are refined to subpixel accuracy • Keypoints along edges are removed • Keypoints in areas of low contrast are removed • Orientation assignment • Gradient direction and magnitude are computed for the area surrounding the keypoint • The keypoint is assigned the orientation most represented in the pixel neighborhood • Uses a 36-bin directional histogram with Gaussian weight
KEY CONCEPTS (SIFT) • Keypoint descriptors • 4x4x8 bin histogram of gradient magnitudes • Normalized for some lighting invariance
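The 4x4x8 descriptor layout can be sketched in a few lines of NumPy. This is a toy version only (an assumption, not Lowe's implementation: real SIFT adds Gaussian weighting, trilinear interpolation, and rotation to the dominant orientation): gradient orientations over a 16x16 patch are binned into 8 directions per 4x4 spatial cell, giving the 128-dimensional vector, which is then normalized for lighting invariance.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Toy 4x4x8 gradient-orientation histogram with the layout of a
    SIFT descriptor, for a 16x16 patch."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                            # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)       # direction in [0, 2pi)
    bins = (ang / (2 * np.pi) * 8).astype(int) % 8    # 8 orientation bins
    desc = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):                           # 4x4 spatial cells
            desc[i // 4, j // 4, bins[i, j]] += mag[i, j]
    desc = desc.ravel()                               # 4*4*8 = 128 values
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc          # lighting normalization

rng = np.random.default_rng(0)
d = sift_like_descriptor(rng.random((16, 16)))
print(d.shape)
```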
KEY CONCEPTS (PLSA) • Probabilistic Latent Semantic Analysis (pLSA) • Factor analysis presented by Thomas Hofmann • Originally used in the text processing field • Set of words W = {w1, …, wM} • Set of documents D = {d1, …, dN} • Describe each document as a histogram of word counts n(wi, dj)
KEY CONCEPTS (PLSA) • Compare documents by their distribution of words P(wi | dj) • Not an ideal solution • Synonyms & polysemes • Dense descriptors • pLSA: Add a latent variable Z = {z1, …, zK} and factor the word-document distribution as P(wi | dj) = Σk P(wi | zk) P(zk | dj)
KEY CONCEPTS (PLSA) • Compute matrices with Expectation Maximization • Expectation Step – computes posterior probabilities • Maximization Step – computes other probabilities • Continue running until perplexity stops decreasing on hold-out data
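The EM procedure above can be sketched compactly. This is a minimal illustration, not the project's implementation (it omits the hold-out perplexity stopping rule and simply runs a fixed number of iterations): the E-step computes the posterior P(z | d, w) from the current factors, and the M-step re-estimates P(w | z) and P(z | d) from the expected counts.

```python
import numpy as np

def plsa_em(n, K, iters=50, seed=0):
    """Minimal pLSA EM on a document-word count matrix n (N x M).
    Returns P(z|d) of shape (N, K) and P(w|z) of shape (K, M)."""
    rng = np.random.default_rng(seed)
    N, M = n.shape
    Pz_d = rng.random((N, K)); Pz_d /= Pz_d.sum(1, keepdims=True)
    Pw_z = rng.random((K, M)); Pw_z /= Pw_z.sum(1, keepdims=True)
    for _ in range(iters):
        # E-step: posterior P(z | d, w) for every (d, w) pair
        post = Pz_d[:, :, None] * Pw_z[None, :, :]          # N x K x M
        post /= post.sum(1, keepdims=True) + 1e-12
        # M-step: re-estimate both factors from expected counts
        nw = n[:, None, :] * post                           # N x K x M
        Pw_z = nw.sum(0); Pw_z /= Pw_z.sum(1, keepdims=True) + 1e-12
        Pz_d = nw.sum(2); Pz_d /= Pz_d.sum(1, keepdims=True) + 1e-12
    return Pz_d, Pw_z

# Two obvious "topics": docs 0-1 use words 0-1, docs 2-3 use words 2-3.
n = np.array([[5, 5, 0, 0], [4, 6, 0, 0],
              [0, 0, 5, 5], [0, 0, 6, 4]], dtype=float)
Pz_d, Pw_z = plsa_em(n, K=2)
print(np.round(Pz_d, 2))
```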
KEY CONCEPTS (SVMS) • Support Vector Machines (SVMs) • Binary classification tool which finds separating hyperplanes (Figure: best separator between the convex hulls of the two classes)
KEY CONCEPTS (SVMS) • Not all problems are linearly separable • Find a best-fit separator • Use a kernel to map data to a higher dimension (Figures: best-fit separator; use of an RBF kernel)
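The kernel mapping can be made concrete with the RBF (Gaussian) kernel used later in the project. A sketch of just the kernel matrix (the data and gamma value below are illustrative): K(x, y) = exp(-gamma * ||x - y||^2), which an RBF-kernel SVM uses in place of dot products, implicitly mapping points to a higher-dimensional space.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """RBF kernel matrix: K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
K = rbf_kernel(X, X)
print(np.round(K, 3))   # nearby points get values near 1, far points near 0
```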
IMPLEMENTATION DETAILS • Building visual words • Find SIFT features in image dataset • Color SIFT features, HSV color space • Dense SIFT detector for better results (M = 8) • Scale-invariance via 4 concentric circles (r = 4, 8, 12, 16) • 64-bit floats -> 16-bit unsigned integers • Cluster features to create visual words • SIFT features alone are too varied • Improved k-means clustering by Charles Elkan • K = 1,500 • 200,000 features • Quantize features • Build histograms
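The last two steps above (quantize features, build histograms) can be sketched as follows. This is an illustration with toy data, not the project's code (the project uses K = 1,500 centers from Elkan's k-means over SIFT descriptors; the dimensions and counts below are made up): each descriptor is assigned to its nearest cluster center, i.e. its visual word, and the per-image word counts form the histogram fed to pLSA or the SVMs.

```python
import numpy as np

def bag_of_words_histogram(features, centers):
    """Quantize each descriptor to its nearest cluster center (visual
    word) and build a word-count histogram for the image."""
    # Squared distance from every feature to every center
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(1)                    # nearest visual word per feature
    return np.bincount(words, minlength=len(centers))

rng = np.random.default_rng(1)
centers = rng.random((5, 8))   # toy vocabulary: K = 5 words, 8-D features
feats = rng.random((40, 8))    # 40 descriptors from one image
h = bag_of_words_histogram(feats, centers)
print(h, h.sum())
```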
IMPLEMENTATION DETAILS • Testing the classification system • Divide the images into training & testing sets • Run pLSA if requested • Use standard pLSA for training data • Use fold-in heuristic for testing data • Z = 25 • Train SVMs • LIBSVM with MATLAB wrapper • Use one-versus-all method • RBF kernel
RESULTS Test the grouping of pLSA topics
RESULTS Test the discriminative power of pLSA topics
RESULTS KNN: pLSA outperforms BOW (74.6% vs. 65.0%)
RESULTS • Tried incorporating spatial information • Divide the image into a grid • Train SVMs for each section • Sum results for each class over all sections
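The final combination step can be sketched numerically. The scores below are hypothetical, not from the project's experiments: with one SVM decision value per class per grid section, the per-class values are summed over all sections and the class with the largest total wins.

```python
import numpy as np

# Hypothetical per-section SVM decision values for a 2x2 grid (4 sections)
# and 4 classes: scores[s, c] = class c's SVM output on section s.
scores = np.array([[ 0.9, -0.2,  0.1, -0.5],
                   [ 0.4,  0.3, -0.1,  0.0],
                   [ 0.7, -0.4,  0.2, -0.3],
                   [ 0.2,  0.1,  0.6, -0.2]])

# Sum the results for each class over all sections, then pick the winner.
totals = scores.sum(axis=0)
label = totals.argmax()
print(totals, label)
```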
RESULTS Four-Class OT (Natural Images)
RESULTS Ambiguous images from the OT dataset
RESULTS Four-Class OT (Man-Made Images)
RESULTS Eight-Class OT (Both)