Real-Time Face Detection and Tracking Using Multiple Cameras
RIT Computer Engineering Senior Design Project
Jared Holsopple, John Ruppert, Justin Hnatow

This project detects and tracks human faces in real time. It uses two cameras with different zoom levels, one viewing the entire scene and one zoomed in on a human face, and it is able to work through partial occlusion and slight illumination changes. Because of the color space that was used, the system can track people of all races. Using multiple cameras in conjunction makes detection and tracking more robust, at the cost of a more complex design. The key elements of the design are the graphical user interface, the communications algorithm, the face detection algorithm, the tracking algorithm, and the camera view correspondence.

Hardware Configuration
From the camera to the PC, the hardware used was:
• Sony EVI D100 color video cameras
    • SVC: Scene View Camera with wide-angle view
    • OVC: Object View Camera with narrow-angle view
• 2 PCs with:
    • Osprey 200 frame grabber cards
    • 2 GB RAM
    • Dual 2.8 GHz Intel Xeons

[Figure: System Setup]

Face Detection
Face detection was done using a Support Vector Machine (SVM), a learning machine that can classify complicated information such as faces. The classifier was first trained on approximately 150 images, each 20x20 pixels, of both faces and non-faces. Before classification and training, each image was converted to grayscale and resized to 20x20 pixels. The histogram of the image was then equalized to normalize brightness and increase contrast. Finally, the image was masked at the edges to reduce background noise.

[Figure panels: Capture, Segment, Original, Grayscale, Resized, Equalized, Masked]

Face Tracking
To perform object tracking in real time, an algorithm that was not computationally expensive was desired. The algorithm chosen for this task was the Continuously Adaptive Mean Shift (CAMSHIFT) tracking algorithm.
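As background for the tracker named above, here is a minimal sketch of the mean shift step that CAMSHIFT builds on: a search window is repeatedly re-centered on the centroid of the probability mass beneath it until it stops moving. This is an illustration in plain Python, not the project's OpenCV code. The function name, the list-of-lists probability image, and the iteration cap are assumptions made for the example. CAMSHIFT additionally adapts the window size from the mass it finds, which is omitted here.

```python
import math


def mean_shift(prob, window, max_iter=20):
    """Re-center a fixed-size window on the probability mass beneath it.

    prob   : 2-D list of non-negative weights (all rows the same length)
    window : (x, y, w, h), top-left corner at column x, row y
    Returns the converged (x, y, w, h). The size is not adapted here.
    """
    rows, cols = len(prob), len(prob[0])
    x, y, w, h = window
    for _ in range(max_iter):
        # Zeroth and first moments of the weights inside the window.
        m00 = m10 = m01 = 0.0
        for r in range(max(0, y), min(rows, y + h)):
            for c in range(max(0, x), min(cols, x + w)):
                p = prob[r][c]
                m00 += p
                m10 += p * c
                m01 += p * r
        if m00 == 0:
            break  # no mass under the window, nothing to follow
        cx, cy = m10 / m00, m01 / m00
        # Place the window so that its center sits on the centroid.
        new_x = math.floor(cx - (w - 1) / 2 + 0.5)
        new_y = math.floor(cy - (h - 1) / 2 + 0.5)
        # Keep the window inside the image.
        new_x = min(max(new_x, 0), cols - w)
        new_y = min(max(new_y, 0), rows - h)
        if (new_x, new_y) == (x, y):
            break  # converged
        x, y = new_x, new_y
    return x, y, w, h


# Hypothetical 10x10 probability image with a 3x3 blob of skin-like pixels.
prob = [[1.0 if 5 <= r <= 7 and 6 <= c <= 8 else 0.0 for c in range(10)]
        for r in range(10)]
tracked = mean_shift(prob, (3, 3, 4, 4))
```

Starting from a window that only overlaps a corner of the blob, the window moves over a few iterations until the blob is fully covered. In a tracker, the converged window from one frame becomes the starting window for the next.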
CAMSHIFT is a modified version of the mean shift algorithm. It uses a 3-dimensional histogram built from the hue, saturation, and intensity values of a training set. Based on the histogram, a grayscale image (the backprojection) is generated in which each pixel represents the probability that the pixel contains skin. This image is used to find and resize a search window that tracks the object of interest through successive frames.

[Figure panels: Original, Backprojection, Face Detect]

Software Configuration
Each PC ran Gentoo Linux, and the Intel OpenCV libraries were installed on both PCs. A GUI was created using OpenCV. It displays the current image with detection and tracking information, along with a command window listing the available user interrupts.

For face detection, the image is first captured from the camera by the frame grabber card. Using OpenCV functions, the image is then converted from its native RGB color space to the HSI color space. This is done because the hue of human skin, for all people with skin pigment, falls within a well-defined range.

After being captured and converted, the image goes through skin color filtering with the skin segmentation algorithm. Skin segmentation uses a 2-dimensional histogram of hue and saturation values generated from a sample set of skin images. After everything but skin tones is filtered from the image, a scaled black-and-white image is created. This density map is 4x smaller than the original image in each dimension: each pixel is set to black or white depending on the percentage of skin pixels in the corresponding 4x4 region. A connected components algorithm is then run on the density map to bound the skin regions.

Once the connected components algorithm has been run, rectangular regions of skin tones are generated. Each of these is run through a face detection algorithm to determine whether it is a region of interest.
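The density map and connected components steps described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the project's implementation: the source does not give the skin percentage used per 4x4 region, so the 50% threshold is a placeholder, and 4-connectivity and both function names are likewise choices made for the example.

```python
from collections import deque


def density_map(skin, block=4, threshold=0.5):
    """Downscale a binary skin mask: one output pixel per block x block tile.

    A tile becomes 1 (white) when at least `threshold` of its pixels are
    skin. The threshold value is an assumption, not taken from the source.
    """
    rows = len(skin) // block
    cols = len(skin[0]) // block
    out = []
    for br in range(rows):
        out_row = []
        for bc in range(cols):
            count = sum(
                skin[br * block + i][bc * block + j]
                for i in range(block)
                for j in range(block)
            )
            out_row.append(1 if count >= threshold * block * block else 0)
        out.append(out_row)
    return out


def bounding_boxes(binary):
    """Label 4-connected regions of 1s and return one box per region.

    Each box is (x_min, y_min, x_max, y_max) in density-map coordinates,
    listed in row-major order of each region's first pixel.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not binary[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill from an unvisited white pixel.
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            x_min = x_max = c0
            y_min = y_max = r0
            while queue:
                r, c = queue.popleft()
                x_min, x_max = min(x_min, c), max(x_max, c)
                y_min, y_max = min(y_min, r), max(y_max, r)
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and binary[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes
```

Multiplying the box coordinates by the block size maps each rectangle back onto the original image, where it can be cropped and handed to the face classifier.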
The face detection algorithm then classifies each region as either a face or a non-face.

Camera View Correspondence
The region that the face detection algorithm most confidently classifies is passed to the tracking algorithm on the SVC. The SVC tracks the face until it leaves the scene or becomes occluded.

The cameras are modeled using the pinhole camera model. A coordinate system is introduced for translating region-of-interest information between cameras. 3D depth information is extracted from the 2D images based on the relationship between average face size and distance from the camera. The parameters for driving the pan and tilt angles of the OVC to the pixel center of the SVC's region of interest are computed using geometric transformations and pixel-to-millimeter mapping information extracted from test images.

After the SVC begins to track the face, it transmits the coordinates of the face to the OVC. The OVC converts the coordinates to its own coordinate system, moves to find the face, and begins tracking it at a higher zoom level. If the OVC loses the face, it notifies the SVC and waits for a packet containing the latest coordinates of the face. It then re-centers its view on the face and begins tracking again.

[Figure panels: SVC View, Track with SVC, Track with OVC, OVC View]

Contributors and Resources
Contributors:
• Dr. Czernikowski: Thank you for your advice.
• Dr. Savakis: Thank you for the project idea, equipment, and advice.
• Paul Mezzanini: Thank you for administering our computers.
• Yuriy Luzanov: Thank you for your guidance.
• All the people who allowed us to take their pictures.
Resources:
• Intel OpenCV Library: http://www.intel.com/research/mrl/research/opencv/
• SVM Light: http://svmlight.joachims.org/
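To make the geometry behind the camera view correspondence concrete, the sketch below shows how a pinhole model can turn a face's pixel width into a depth estimate and a pixel position into pan and tilt angles. It is not the authors' method, which relied on pixel-to-millimeter mapping from test images and a conversion between the two cameras' coordinate systems. The function names, the 150 mm average face width, the focal length in pixels, and the sign conventions are all hypothetical values chosen for illustration.

```python
import math


def depth_from_face_width(face_width_px, focal_px, real_face_width_mm=150.0):
    """Estimate distance to a face from its apparent width (pinhole model).

    Similar triangles give depth = focal_length * real_width / pixel_width.
    The 150 mm default is an assumed average face width, not a source value.
    """
    return focal_px * real_face_width_mm / face_width_px


def pan_tilt_to_pixel(u, v, cx, cy, focal_px):
    """Angles in degrees that point the optical axis at pixel (u, v).

    (cx, cy) is the principal point, taken here as the image center.
    Positive pan is to the right and positive tilt is up (an assumed
    convention; image rows grow downward, hence cy - v).
    """
    pan = math.degrees(math.atan2(u - cx, focal_px))
    tilt = math.degrees(math.atan2(cy - v, focal_px))
    return pan, tilt


# Hypothetical camera: 640x480 image, focal length of 800 pixels.
depth_mm = depth_from_face_width(face_width_px=100, focal_px=800)
pan_deg, tilt_deg = pan_tilt_to_pixel(u=480, v=120, cx=320, cy=240, focal_px=800)
```

These angles are relative to the camera that captured the image. Driving a second camera such as the OVC would also require accounting for the offset between the two cameras, which is where the estimated depth comes in.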