
Introduction to Computer Vision

Lecture 6, Dr. Roger S. Gaborski. Topics: Intro to CV graduate projects, correlation/convolution, and David Rubel’s Master’s project (slides included at the end of this lecture). Opening question: how can we average the pixel values in an image?


Presentation Transcript


  1. Introduction to Computer Vision Lecture 6 Dr. Roger S. Gaborski

  2. Intro to CV Graduate Projects • Correlation/Convolution • David Rubel’s Master’s Project (slides included at end of this lecture)

  3. How can we average the pixel values in an image? • The average depends on a number of pixels (a pixel and its neighbors) • Neighborhood Operation • The more neighbors, the more smoothing (averaging)
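
The effect of neighborhood size can be checked numerically. The following is a minimal Python sketch, not code from the lecture; the function name `neighborhood_mean`, the `radius` parameter, and the sample image are assumptions made for illustration, and pixels along the edges are deliberately not handled.

```python
def neighborhood_mean(image, row, col, radius=1):
    """Average the pixel at (row, col) with its neighbors in a square window.

    Assumes the whole window lies inside the image (no edge handling).
    """
    values = [
        image[r][c]
        for r in range(row - radius, row + radius + 1)
        for c in range(col - radius, col + radius + 1)
    ]
    return sum(values) / len(values)


# Hypothetical 5x5 image: flat background of 10 with one bright pixel of 100.
image = [
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 100, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
]

small = neighborhood_mean(image, 2, 2, radius=1)  # 3x3 window, 9 pixels
large = neighborhood_mean(image, 2, 2, radius=2)  # 5x5 window, 25 pixels
print(small, large)
```

With the larger window the bright pixel is pulled closer to the background value, which is the "more neighbors, more smoothing" point made on the slide.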

  4. Smoothing Example • Done on white board • How do we handle pixels along the edges?

  5. Padding -- padarray • fp = padarray(f, [r c], method, direction) • f is input image • fp is padded image • [r c] is number of rows and columns to pad f • method and direction – next slide

  6. Chapter 3 www.prenhall.com/gonzalezwoodseddins

  7. padarray Example
       >> f = [1 2; 3 4]
       f =
            1  2
            3  4
       >> fp = padarray(f, [3 2], 'replicate', 'post')
       fp =
            1  2  2  2
            3  4  4  4
            3  4  4  4
            3  4  4  4
            3  4  4  4
     • 'post' – pad after the last element in both directions • [3 2] – pad 3 rows and 2 columns

  8.   >> fp = padarray(f, [2 1], 'replicate', 'post')
       fp =
            1  2  2
            3  4  4
            3  4  4
            3  4  4
     • 'post' – pad after the last element in both directions • [2 1] – pad 2 rows and 1 column

  9.   >> f = [1 2 3; 1 2 3; 1 2 3]
       f =
            1  2  3
            1  2  3
            1  2  3
       >> fp = padarray(f, [2 2], 'symmetric', 'both')
       fp = ?

  10.  >> f = [1 2 3; 1 2 3; 1 2 3]
       f =
            1  2  3
            1  2  3
            1  2  3
       >> fp = padarray(f, [2 2], 'symmetric', 'both')
       fp =
            2  1  1  2  3  3  2
            2  1  1  2  3  3  2
            2  1  1  2  3  3  2
            2  1  1  2  3  3  2
            2  1  1  2  3  3  2
            2  1  1  2  3  3  2
            2  1  1  2  3  3  2
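
To make the padding rules in the padarray examples concrete, here is a small pure-Python sketch that reproduces the 'replicate' and 'symmetric' results shown above. It is not MATLAB's implementation; the function names `pad_axis_indices` and `pad` are hypothetical, and only the two methods used on these slides are covered.

```python
def pad_axis_indices(n, before, after, method):
    """Return, for one axis of length n, the source index of each padded position."""
    indices = []
    for i in range(-before, n + after):
        if method == "replicate":
            # Clamp to the nearest valid index: the border value is repeated.
            j = min(max(i, 0), n - 1)
        elif method == "symmetric":
            # Mirror about the array border, including the border element.
            period = 2 * n
            j = i % period
            if j >= n:
                j = period - 1 - j
        else:
            raise ValueError("unsupported method: " + method)
        indices.append(j)
    return indices


def pad(f, size, method="replicate", direction="both"):
    """Pad the 2-D list f by size = (rows, cols) before, after, or on both sides."""
    rows, cols = size
    pre_r, pre_c = (rows, cols) if direction in ("pre", "both") else (0, 0)
    post_r, post_c = (rows, cols) if direction in ("post", "both") else (0, 0)
    row_idx = pad_axis_indices(len(f), pre_r, post_r, method)
    col_idx = pad_axis_indices(len(f[0]), pre_c, post_c, method)
    return [[f[r][c] for c in col_idx] for r in row_idx]


replicated = pad([[1, 2], [3, 4]], (3, 2), "replicate", "post")
mirrored = pad([[1, 2, 3]] * 3, (2, 2), "symmetric", "both")
for row in mirrored:
    print(row)
```

Working on index lists per axis keeps the row and column logic identical, so each padding method only has to be defined once.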

  11. Spatial Filtering • Neighborhood processing • Define center point (x,y) • Perform operations involving only pixels in the neighborhood • Result of operation is response to process at that point • Moving the pixel results in a new neighborhood • Repeat process for every point in the image

  12. Linear and Nonlinear Spatial Filtering • Linear operation • Multiply each pixel in the neighborhood by the corresponding coefficient and sum the results to get the response at each point (x,y) • If the neighborhood is m x n, then mn coefficients are required • The coefficients are arranged in a matrix, called a filter, filter mask, kernel, or template • Mask sizes are typically odd (3x3, 5x5, etc.) • The larger the mask, the greater the compute time

  13. Chapter 3 www.prenhall.com/gonzalezwoodseddins

  14. Correlation -- Convolution • Correlation • Place mask w on the image array f as previously described • Convolution • First rotate mask w by 180 degrees • Place rotated mask on image as described previously
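
The difference between the two operations is easiest to see on an impulse image. The sketch below is an illustrative pure-Python version, not the toolbox routine; the function names, the zero-padded borders, and the 'same'-size output are assumptions made to keep the example short.

```python
def correlate2d(f, w):
    """Correlate image f with an odd-sized mask w (zero padding, output same size as f)."""
    m, n = len(w), len(w[0])
    a, b = m // 2, n // 2
    rows, cols = len(f), len(f[0])
    out = [[0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            total = 0
            for s in range(m):
                for t in range(n):
                    r, c = x + s - a, y + t - b
                    # Pixels outside the image are treated as zeros.
                    if 0 <= r < rows and 0 <= c < cols:
                        total += w[s][t] * f[r][c]
            out[x][y] = total
    return out


def rotate180(w):
    """Rotate the mask by 180 degrees (reverse the rows, then each row)."""
    return [row[::-1] for row in w[::-1]]


def convolve2d(f, w):
    """Convolution is correlation with the mask rotated by 180 degrees first."""
    return correlate2d(f, rotate180(w))


impulse = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
w = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(correlate2d(impulse, w))  # a rotated copy of w
print(convolve2d(impulse, w))   # a copy of w itself
```

Correlating with an impulse returns the mask rotated by 180 degrees, while convolving returns the mask unchanged, which is why convolution needs the pre-rotation step.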

  15. Example - Correlation • Assume w and f are one dimensional • The origin of f is its leftmost point • Place w so that its rightmost point coincides with the origin of f • Pad f with 0s so that there is a corresponding f point for each w point (also pad the end with 0s) • Multiply corresponding points and sum • In this case (example on next page) the result is zero • Move w to the right by one position and repeat the process • Continue for the whole length of f

  16. ‘Full’ is the result we obtain from the operations on the previous slide. If, instead of aligning the rightmost element of w with the leftmost element of f, we align the center element of w with the leftmost element of f, we obtain the ‘same’ result, so called because it has the same length as the original f. (Figure: Chapter 3, www.prenhall.com/gonzalezwoodseddins)

  17. ‘Full’ correlation

  18. ‘Same’ correlation etc.
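
The ‘full’ and ‘same’ procedures can be traced in a few lines of Python. This is a sketch under stated assumptions: the signal `f`, the mask `w`, and the function names are chosen for illustration, and `w` is assumed to have odd length so that it has a center element.

```python
def correlate_full(f, w):
    """Slide w across zero-padded f, starting with w's rightmost point on f's origin."""
    pad = len(w) - 1
    padded = [0] * pad + list(f) + [0] * pad
    return [
        sum(w[k] * padded[shift + k] for k in range(len(w)))
        for shift in range(len(f) + pad)
    ]


def correlate_same(f, w):
    """Keep only the positions where the center of w sits on an element of f."""
    start = len(w) // 2
    return correlate_full(f, w)[start:start + len(f)]


f = [0, 0, 0, 1, 0, 0, 0, 0]   # illustrative signal: a single impulse
w = [1, 2, 3, 2, 8]            # illustrative mask of odd length

full = correlate_full(f, w)
same = correlate_same(f, w)
print(full)
print(same)
```

The ‘full’ result has len(f) + len(w) - 1 entries and its first entry is zero, matching the first step described on the correlation slide; the ‘same’ result is the middle len(f) entries.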

  19. Example - Convolution • Convolution is the same procedure, but the filter is first rotated 180 degrees. • If the filter is symmetric, correlation and convolution results are the same
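
Both statements on this slide can be checked with a short Python sketch. The function names, the zero padding, and the sample values are assumptions for illustration; the masks are assumed to have odd length.

```python
def correlate_same(f, w):
    """'Same'-size 1-D correlation with zero padding; w must have odd length."""
    half = len(w) // 2
    out = []
    for i in range(len(f)):
        total = 0
        for k in range(len(w)):
            j = i + k - half
            if 0 <= j < len(f):  # positions outside f count as zeros
                total += w[k] * f[j]
        out.append(total)
    return out


def convolve_same(f, w):
    """Convolution: reverse w (a 180 degree rotation in 1-D), then correlate."""
    return correlate_same(f, w[::-1])


impulse = [0, 0, 0, 1, 0, 0, 0, 0]
asymmetric = [1, 2, 3, 2, 8]
symmetric = [1, 2, 1]
signal = [1, 4, 2, 7, 3]

print(correlate_same(impulse, asymmetric))  # mask appears reversed
print(convolve_same(impulse, asymmetric))   # mask appears in its original order
print(correlate_same(signal, symmetric) == convolve_same(signal, symmetric))
```

For the asymmetric mask the two results differ, while for the symmetric mask the reversal changes nothing and the results coincide.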

  20. Can simply extend to images (Figure: Chapter 3, www.prenhall.com/gonzalezwoodseddins)

  21. SCENE CLASSIFICATION USING PLSA AND SPATIAL INFORMATION MS Project by David Rubel

  22. OUTLINE Problem Previous Work Datasets Key Concepts Implementation Results Questions

  23. PROBLEM • What is scene classification? • Assigning a scene label to arbitrary images • Potential uses • Content-based image retrieval • Web accessibility • Object detection/localization

  24. PREVIOUS WORK • Holistic Methods • Oliva and Torralba (2001) • Defined a spatial envelope for each image • Consists of naturalness, openness, roughness, expansion and ruggedness • Trained Discriminant Spectral Templates (DSTs) to process novel images • Used K-Nearest Neighbors for classification • Produced an excellent dataset

  25. PREVIOUS WORK • Semantic Methods • Vogel and Schiele (2004) • Divide each image into a 10x10 grid and label each material using SVMs • Create three histograms of materials (COVs) • Classify the image using these COVs • Created another interesting dataset (Figure labels: Water, Rock, Grass)

  26. PREVIOUS WORK • Bag-of-Words Methods • Fei-Fei and Perona (2005) • Search images for textons • Group textons into visual words using k-means clustering • Group visual words together using Bayesian statistics • Label images using a Bayesian classifier • Bosch, Zisserman and Muñoz (2008) • Use SIFT features instead of textons • Use pLSA to group words into topics • Classify images with SVM

  27. DATASET • Oliva and Torralba (OT) • 1472 natural images

  28. DATASET • Oliva and Torralba (OT) • 1216 man-made images

  29. DATASET • Vogel and Schiele (VS) • 700 natural images

  30. KEY CONCEPTS (SIFT) • Scale-Invariant Feature Transform (SIFT) • Interest point detector introduced by David G. Lowe • Points are invariant to scale and rotation • Partially invariant to affine warp and lighting • Four stage process • Scale-space extrema detection • Keypoint localization • Orientation assignment • Keypoint descriptors

  31. KEY CONCEPTS (SIFT) • Scale-space extrema detection (Figure: panels a and b)

  32. KEY CONCEPTS (SIFT) • Keypoint localization • Keypoints are refined to subpixel accuracy • Keypoints along edges are removed • Keypoints in areas of low contrast are removed • Orientation assignment • Gradient direction and magnitude are computed for the area surrounding the keypoint • The keypoint is assigned the orientation most represented in the pixel neighborhood • Uses a 36-bin directional histogram with Gaussian weight
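
The orientation-assignment step can be sketched in plain Python. This is a simplified illustration, not Lowe's reference implementation: the central-difference gradients, the `sigma` value, the square patch, and the function names are all assumptions, and refinements such as peak interpolation or multiple dominant orientations are omitted.

```python
import math


def orientation_histogram(patch, sigma=1.5, bins=36):
    """Accumulate Gaussian-weighted gradient magnitudes into direction bins."""
    size = len(patch)
    center = (size - 1) / 2
    hist = [0.0] * bins
    for r in range(1, size - 1):
        for c in range(1, size - 1):
            # Central differences approximate the image gradient.
            dx = patch[r][c + 1] - patch[r][c - 1]
            dy = patch[r + 1][c] - patch[r - 1][c]
            magnitude = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            # Pixels near the keypoint (patch center) count more.
            distance_sq = (r - center) ** 2 + (c - center) ** 2
            weight = math.exp(-distance_sq / (2 * sigma ** 2))
            hist[int(angle // (360.0 / bins)) % bins] += weight * magnitude
    return hist


def dominant_orientation(hist):
    """Return the lower edge, in degrees, of the most represented bin."""
    best = max(range(len(hist)), key=lambda b: hist[b])
    return best * (360.0 / len(hist))


# Hypothetical 5x5 patches with a known gradient direction.
diagonal = [[r + c for c in range(5)] for r in range(5)]   # gradient at 45 degrees
horizontal = [[c for c in range(5)] for r in range(5)]     # gradient at 0 degrees
print(dominant_orientation(orientation_histogram(diagonal)))
print(dominant_orientation(orientation_histogram(horizontal)))
```

With 36 bins each bin spans 10 degrees, so the diagonal patch falls in the 40 to 50 degree bin and the horizontal ramp in the first bin.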

  33. KEY CONCEPTS (SIFT) • Keypoint descriptors • 4x4x8 bin histogram of gradient magnitudes • Normalized for some lighting invariance

  34. KEY CONCEPTS (PLSA) • Probabilistic Latent Semantic Analysis (pLSA) • Factor analysis presented by Thomas Hofmann • Originally used in text processing field • Set of words W = {w1, …, wM} • Set of documents D = {d1, …, dN} • Describe each document as a histogram of words n(wi, dj) (Figure: W x D word-document matrix)

  35. KEY CONCEPTS (PLSA) • Compare documents by their distribution of words • Not an ideal solution • Synonyms & polysemes • Dense descriptors • pLSA: Add a latent variable (Z = {z1, …, zK}) • P(wi | dj) = Σk P(wi | zk) P(zk | dj), i.e. the W x D matrix P(wi | dj) is factored as the W x Z matrix P(wi | zk) times the Z x D matrix P(zk | dj)

  36. KEY CONCEPTS (PLSA) • Compute matrices with Expectation Maximization • Expectation Step – computes the posterior probabilities P(zk | wi, dj) • Maximization Step – re-estimates P(wi | zk) and P(zk | dj) from those posteriors • Continue running until perplexity stops decreasing on hold-out data
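
The EM loop can be written compactly in pure Python. The sketch below is illustrative only and is not the project's code: the function name `plsa`, the random initialization, the toy count matrix, and the fixed iteration count are assumptions (the slide's stopping rule, perplexity on hold-out data, is replaced by a fixed number of iterations to keep the example short).

```python
import random


def plsa(counts, num_topics, iterations=50, seed=0):
    """Fit pLSA with EM. counts[i][j] = n(w_i, d_j).

    Returns (p_w_z, p_z_d) with p_w_z[k][i] = P(w_i | z_k)
    and p_z_d[j][k] = P(z_k | d_j).
    """
    rng = random.Random(seed)
    M, N, K = len(counts), len(counts[0]), num_topics

    def normalized(values):
        total = sum(values) or 1.0  # guard against an all-zero row
        return [v / total for v in values]

    p_w_z = [normalized([rng.random() + 1e-3 for _ in range(M)]) for _ in range(K)]
    p_z_d = [normalized([rng.random() + 1e-3 for _ in range(K)]) for _ in range(N)]

    for _ in range(iterations):
        new_w_z = [[0.0] * M for _ in range(K)]
        new_z_d = [[0.0] * K for _ in range(N)]
        for i in range(M):
            for j in range(N):
                if counts[i][j] == 0:
                    continue
                # E-step: posterior P(z_k | w_i, d_j) for this word-document pair.
                joint = [p_w_z[k][i] * p_z_d[j][k] for k in range(K)]
                total = sum(joint)
                if total == 0:
                    continue
                for k in range(K):
                    expected = counts[i][j] * joint[k] / total
                    new_w_z[k][i] += expected
                    new_z_d[j][k] += expected
        # M-step: renormalize the expected counts into probabilities.
        p_w_z = [normalized(row) for row in new_w_z]
        p_z_d = [normalized(row) for row in new_z_d]
    return p_w_z, p_z_d


# Hypothetical 4-word, 4-document count matrix with two obvious word groups.
counts = [[4, 3, 0, 0], [3, 5, 0, 0], [0, 0, 6, 2], [0, 0, 2, 5]]
p_w_z, p_z_d = plsa(counts, num_topics=2)
for j, row in enumerate(p_z_d):
    print("document", j, [round(p, 3) for p in row])
```

Each row of `p_z_d` is the topic mixture of one document; these low-dimensional vectors are what a classifier would consume in place of raw word histograms.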

  37. KEY CONCEPTS (SVMS) • Support Vector Machines (SVMs) • Binary classification tool which finds separating hyperplanes (Figure labels: Best separator, Convex Hulls)

  38. KEY CONCEPTS (SVMS) • Not all problems are linearly separable • Find a best-fit separator • Use a kernel to map data to a higher dimension (Figure labels: Best-fit separator, Use of RBF kernel)
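
For reference, the RBF kernel itself is a one-line function. The sketch below uses the common form exp(-gamma * ||x - y||^2); the `gamma` value and the four XOR-style sample points are assumptions for illustration, and no SVM training is performed here.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: similarity decays with squared Euclidean distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# XOR-style points, a classic case that no single line can separate.
points = [(0, 0), (1, 1), (0, 1), (1, 0)]
gram = [[rbf_kernel(p, q) for q in points] for p in points]
for row in gram:
    print([round(value, 3) for value in row])
```

An SVM with this kernel works on the matrix of pairwise similarities rather than on the raw coordinates, which is what lets it draw a non-linear boundary in the original space.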

  39. IMPLEMENTATION DETAILS • Building visual words • Find SIFT features in image dataset • Color SIFT features, HSV color space • Dense SIFT detector for better results (M = 8) • Scale-invariance via 4 concentric circles (r = 4, 8, 12, 16) • 64-bit floats -> 16-bit unsigned integers • Cluster features to create visual words • SIFT features alone are too varied • Improved k-means clustering by Charles Elkan • K = 1,500 • 200,000 features • Quantize features • Build histograms
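
The last two steps, quantizing features and building histograms, can be sketched as follows. This assumes the cluster centers (the visual words) have already been found by k-means; the 2-D toy features, the three-word codebook, and the function names are hypothetical, whereas the project used 1,500 words over SIFT descriptors.

```python
def nearest_word(feature, codebook):
    """Return the index of the visual word (cluster center) closest to the feature."""
    def squared_distance(k):
        return sum((a - b) ** 2 for a, b in zip(feature, codebook[k]))
    return min(range(len(codebook)), key=squared_distance)


def bag_of_words(features, codebook):
    """Quantize every feature and count how often each visual word occurs."""
    histogram = [0] * len(codebook)
    for feature in features:
        histogram[nearest_word(feature, codebook)] += 1
    return histogram


codebook = [(0, 0), (10, 10), (0, 10)]                    # toy visual words
features = [(1, 1), (9, 9), (0, 2), (1, 9), (11, 10)]      # toy image features
print(bag_of_words(features, codebook))
```

The resulting histogram is the image's word-count column n(wi, dj), which is the input either to pLSA or directly to a classifier.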

  40. IMPLEMENTATION DETAILS • Testing the classification system • Divide the images into training & testing sets • Run pLSA if requested • Use standard pLSA for training data • Use fold-in heuristic for testing data • Z = 25 • Train SVMs • LIBSVM with MATLAB wrapper • Use one-versus-all method • RBF kernel

  41. RESULTS Test the grouping of pLSA topics

  42. RESULTS Test the discriminative power of pLSA topics

  43. RESULTS KNN: pLSA outperforms BOW (74.6% to 65.0%)

  44. RESULTS • Tried incorporating spatial information • Divide the image into a grid • Train SVMs for each section • Sum results for each class over all sections
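
The final voting step, summing per-class results over all grid sections, might look like the following sketch. The per-section scores, the class names, and the function name are hypothetical; in the project the scores would come from the SVMs trained for each section.

```python
def classify_by_sections(section_scores):
    """Sum each class's score over all grid sections and pick the best class.

    section_scores: one dict per grid section, mapping class label to score.
    """
    totals = {}
    for scores in section_scores:
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
    best = max(totals, key=totals.get)
    return best, totals


# Hypothetical scores for a 2x2 grid and two illustrative classes.
section_scores = [
    {"coast": 0.75, "forest": 0.25},
    {"coast": 0.25, "forest": 0.75},
    {"coast": 0.75, "forest": 0.25},
    {"coast": 0.5, "forest": 0.5},
]
label, totals = classify_by_sections(section_scores)
print(label, totals)
```

Summing scores lets a confident section outweigh an uncertain one, which a simple majority vote over sections would not do.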

  45. RESULTS Four-Class OT (Natural Images)

  46. RESULTS Ambiguous images from the OT dataset

  47. RESULTS Four-Class OT (Man-Made Images)

  48. RESULTS Eight-Class OT (Both)
