Introduction to Computer Vision

Olac Fuentes
Computer Science Department
University of Texas at El Paso
El Paso, TX, U.S.A.

What is Computer Vision?
Computer vision is the process of extracting knowledge about the world from one or more digital images.

Digital Images are 2D arrays (matrices) of numbers:

Digital Images
Color images are formed with three 2D arrays, representing the red, green, and blue components of the image.
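As a minimal sketch of the two representations above, the following uses plain nested lists. The pixel values and the helper names `pixel_rgb` and `to_gray` are made up for illustration, and the simple channel average is only one possible way to reduce color to intensity.

```python
# A tiny 2x3 grayscale image: one intensity value (0-255) per pixel.
gray = [
    [0, 128, 255],
    [64, 192, 32],
]

# A color image of the same size: three 2D arrays, one per channel.
red = [[255, 0, 0], [10, 20, 30]]
green = [[0, 255, 0], [10, 20, 30]]
blue = [[0, 0, 255], [10, 20, 30]]


def pixel_rgb(r, g, b, row, col):
    """Return the (R, G, B) triple stored at one pixel location."""
    return (r[row][col], g[row][col], b[row][col])


def to_gray(r, g, b):
    """Collapse three channels into one 2D array by integer averaging."""
    rows, cols = len(r), len(r[0])
    return [
        [(r[i][j] + g[i][j] + b[i][j]) // 3 for j in range(cols)]
        for i in range(rows)
    ]
```

Here `pixel_rgb(red, green, blue, 0, 0)` reads the top-left pixel, which is pure red in this toy image.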

Computer Vision – Main Tasks
• Model generation
• Object recognition
• Object detection
• Tracking

Computer Vision – Object Detection: Detecting Faces

Computer Vision – Object Detection: Detecting Pedestrians

Computer Vision – Object Detection: Detecting Cars

Computer Vision – Object Detection: How to do it?
Idea: Use machine learning
Training:
• Training set:
• Positive examples are images of objects that belong to the class of interest
• Negative examples are images of objects that don't belong to that class
• Train a classifier using the training set
Detection:
• Given an image to analyze, apply the classifier to every subimage (there are lots of them, so a low false-positive rate is important!)
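The detection step can be sketched as a sliding window over the image. This is an illustrative sketch only: `bright_patch` is a hypothetical stand-in for a trained classifier, and the window size and step are arbitrary choices.

```python
def sliding_windows(image, win_h, win_w, step=1):
    """Yield (row, col, subimage) for every win_h x win_w window."""
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - win_h + 1, step):
        for c in range(0, cols - win_w + 1, step):
            window = [row[c:c + win_w] for row in image[r:r + win_h]]
            yield r, c, window


def detect(image, classifier, win_h, win_w, step=1):
    """Return the top-left corners of all windows the classifier accepts."""
    return [
        (r, c)
        for r, c, window in sliding_windows(image, win_h, win_w, step)
        if classifier(window)
    ]


def bright_patch(window):
    """Hypothetical classifier: accept windows whose mean intensity is high."""
    n = len(window) * len(window[0])
    return sum(sum(row) for row in window) / n > 200


image = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]
hits = detect(image, bright_patch, 2, 2)
```

Even this 4x4 image produces nine 2x2 windows, which illustrates why the number of subimages grows quickly and why false positives add up.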

Face Detection – Training Images

Efficient Object Detection (Viola & Jones, 2005)
Idea #1: Classifier Structure
Build a cascade of classifiers:
• Stage i is simpler (and faster) than stage i+1
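A minimal sketch of the cascade idea, assuming each stage is a function that returns True (keep going) or False (reject). The two stages shown are hypothetical placeholders, not the trained stages of the original detector.

```python
def cascade_classify(window, stages):
    """Run the stages in order; reject as soon as any stage says no."""
    for stage in stages:
        if not stage(window):
            return False  # early exit: later, costlier stages never run
    return True


# Hypothetical stages, ordered from cheapest to most expensive.
def stage_1(window):
    """Cheap test: is there at least one bright pixel?"""
    return max(max(row) for row in window) > 100


def stage_2(window):
    """Costlier test: is the mean intensity high?"""
    n = len(window) * len(window[0])
    return sum(sum(row) for row in window) / n > 100


STAGES = [stage_1, stage_2]
```

Because most subimages are rejected by the first, cheapest stage, the average cost per subimage stays low even when later stages are expensive.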

Efficient Object Detection (Viola & Jones, 2005)
Idea #2: Features
Use a large number of very simple features:

Efficient Object Detection (Viola & Jones, 2005)
Idea #3: Feature Computation
Compute the features very efficiently using the integral image:
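The slide's figure is not available, so the following is a sketch under assumptions: it builds an integral image with an extra row and column of zeros, then uses it to get any rectangle sum from four lookups. The two-rectangle feature (left half minus right half) is one illustrative example of a simple rectangle feature; the function names are hypothetical.

```python
def integral_image(image):
    """Return ii where ii[r][c] is the sum of image[0..r-1][0..c-1]."""
    rows, cols = len(image), len(image[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += image[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Left-half sum minus right-half sum (width is assumed to be even)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum
```

The integral image is computed once per image in a single pass; after that, the cost of a rectangle sum does not depend on the rectangle's size.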

Efficient Object Detection (Viola & Jones, 2005)
Idea #4: Dealing with multiple scales
• Obvious solution: Build a detector for each possible scale
• Better idea: Build a detector for a single scale; during detection, scale the image
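The "scale the image" idea can be sketched as an image pyramid. This is a sketch under assumptions: nearest-neighbor resampling and the names `downscale` and `pyramid` are illustrative choices, not part of the original method.

```python
def downscale(image, factor):
    """Shrink an image by `factor` (> 1) using nearest-neighbor sampling."""
    rows, cols = len(image), len(image[0])
    new_rows, new_cols = int(rows / factor), int(cols / factor)
    return [
        [image[int(r * factor)][int(c * factor)] for c in range(new_cols)]
        for r in range(new_rows)
    ]


def pyramid(image, factor, min_size):
    """Yield (scale, image) pairs until the image is smaller than min_size."""
    scale = 1.0
    current = image
    while len(current) >= min_size and len(current[0]) >= min_size:
        yield scale, current
        scale *= factor
        current = downscale(image, scale)
```

A fixed-size detector is then run on every level; a hit at (row, col) on a level with scale s corresponds roughly to (row * s, col * s) in the original image, with the window size multiplied by s.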

Efficient Object Detection: The Modified Census Transform (Froba and Ernst, 2004)
• Used local intensity descriptors as features
• Used simple voting classifiers and AdaBoost to build a cascade of classifiers
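The slide does not define the descriptor, so the following sketch assumes the usual formulation of the modified census transform: each pixel in a 3x3 neighborhood (center included) is compared against the neighborhood mean, giving a 9-bit code. The function name `mct_code` and the bit ordering are illustrative choices.

```python
def mct_code(image, r, c):
    """9-bit code for the 3x3 neighborhood centered at (r, c).

    A bit is 1 where the pixel is brighter than the neighborhood mean.
    Bits are packed in row-major order, first pixel as the highest bit.
    """
    pixels = [image[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
    total = sum(pixels)
    code = 0
    for p in pixels:
        # p > mean, written as p * 9 > total to stay in integer arithmetic
        code = (code << 1) | (1 if p * 9 > total else 0)
    return code
```

Because each pixel is compared with a local mean, adding a constant brightness offset to the whole neighborhood leaves the code unchanged.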

Efficient Object Detection: Histograms of Gradients (Dalal, 2005)
• Used histograms of oriented gradients as features
• Used a Support Vector Machine as the classifier
• Best results to date
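A minimal sketch of the core feature: a magnitude-weighted histogram of gradient orientations over one image patch. The choices here are assumptions for illustration: central differences for the gradient, unsigned orientations in [0, 180) degrees, and 9 bins. A full descriptor would also divide the window into cells and normalize over blocks, which is omitted.

```python
import math


def gradient_histogram(image, bins=9):
    """Orientation histogram over the interior pixels of a patch.

    Each pixel votes for the bin of its gradient orientation
    (unsigned, 0-180 degrees) with a weight equal to its gradient magnitude.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            gx = image[r][c + 1] - image[r][c - 1]
            gy = image[r + 1][c] - image[r - 1][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle // bin_width), bins - 1)
            hist[index] += magnitude
    return hist
```

For a patch containing a single vertical edge, all of the weight lands in the first bin, since the gradient points horizontally.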

Object Recognition
[Figure: training images labeled Owl, Duck, Toucan, and Egret; testing images labeled ??]

Object Recognition – Face Recognition
Eigenfaces are a set of "standardized face ingredients", derived from statistical analysis of many pictures of faces.
[Figure: first four eigenfaces from the AT&T database]

Eigenfaces
• One person's face might be made up of 10% from face 1, 24% from face 2, and so on.
• Very few eigenvector terms are needed to give a fair likeness of most people's faces.
• Eigenfaces provide a means of applying data compression to faces for identification purposes.

Eigenfaces
• Let E1,...,En be the eigenfaces obtained from a face database.
• Let F1,...,Fm be the images in our training/testing sets. (For the training images we also know the person's identity.)
• The attributes of Fi are given by the sums of the pixel-by-pixel products of Fi with each of E1,...,En; that is, Fi is represented by n numbers: [Fi·E1, Fi·E2, ..., Fi·En]
• Using the attribute vectors and the class information, we can now construct a classifier.
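The attribute computation above can be sketched as follows, with each image flattened into a list of pixel values. The slide leaves the classifier open; nearest neighbor is used here only as one simple assumption, and the names `attributes` and `nearest_identity` are hypothetical. Mean-face subtraction, which is common in practice, is not mentioned on the slide and is omitted.

```python
def dot(a, b):
    """Sum of pixel-by-pixel products of two flattened images."""
    return sum(x * y for x, y in zip(a, b))


def attributes(face, eigenfaces):
    """Represent a face by n numbers: [F.E1, F.E2, ..., F.En]."""
    return [dot(face, e) for e in eigenfaces]


def nearest_identity(query, training, eigenfaces):
    """Return the identity of the training face closest in attribute space.

    `training` is a list of (flattened_image, identity) pairs.
    """
    q = attributes(query, eigenfaces)

    def squared_distance(item):
        a = attributes(item[0], eigenfaces)
        return sum((x - y) ** 2 for x, y in zip(q, a))

    return min(training, key=squared_distance)[1]
```

With n eigenfaces, every image is reduced from one number per pixel to n numbers, so the classifier works in a much smaller space.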

Tracking
Continuous detection of objects of interest in video streams

Reconstruction
• Build 3D models of the world given 2D images
• Most common approach: Stereo vision
• Inspired by human 3D perception
• Use two cameras of known geometry
• Take images
• Find correspondences
• Reconstruct using correspondences and known geometry
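The last step can be sketched for the simplest camera setup. This assumes two identical, parallel (rectified) cameras, which the slide does not specify; under that assumption, the depth of a matched point is focal length times baseline divided by disparity. The function and parameter names are hypothetical.

```python
def depth_from_disparity(x_left, x_right, focal_length_px, baseline_m):
    """Depth (in meters) of a point matched in both images.

    x_left and x_right are the horizontal pixel coordinates of the same
    scene point in the left and right images of a rectified stereo pair.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity
```

For example, with a 500-pixel focal length and a 0.1 m baseline, a correspondence with a disparity of 20 pixels gives a depth of 2.5 m. Nearby points have large disparities and distant points have small ones.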

Reconstruction

Reconstruction
Problems with Stereo Vision:
• Finding matches reliably is difficult
• Calibration is difficult
• It is hard to deal with featureless areas
• Computationally expensive

Reconstruction Microsoft to the rescue! Seriously!

Reconstruction: Microsoft Kinect
Reconstruction using active illumination:
• Project a known pattern of light at an invisible wavelength
• Learn the appearance of that pattern at different distances
• Fast and easy

Reconstruction Microsoft Kinect
