J. Wu, J. M. Pedersen, D. Putthividhya, D. Norgaard, T. Moeslund and M. M. Trivedi

A Two-level Pose Estimation Framework Using Majority Voting of Gabor Wavelets and Bunch Graph Analysis
J. Wu, J. M. Pedersen, D. Putthividhya, D. Norgaard, T. Moeslund and M. M. Trivedi
Computer Vision and Robotics Lab, University of California, San Diego, La Jolla, CA, U.S.A.

INTRODUCTION
• Applications:
  • Intelligent meeting room
  • Driver’s focus analysis
• Problem setup:
  • Pose is determined by both the pan angle (β) and the tilt angle (α)
  • For the attention-focus problem, both angles need to be determined at a fine scale

PROBLEM DESCRIPTION
• 93 poses in total
• Pan angle ranges from -90° to +90° in discrete intervals of 15°
• Tilt angle ranges from -90° to +90° in steps of either 15° or 30°
• For every pose, 15 images from 15 subjects are used as training samples, while another 15 images from the same 15 subjects are used as testing samples

POSE ESTIMATION FRAMEWORK
• Two-level structure, coarse to fine:
  • First level: pose estimation is accurate up to a 3x3 neighborhood
  • Second level: the accurate pose is determined within the 3x3 neighborhood
[Figure: first level output and second level output]

FIRST LEVEL: Multi-resolution subspace classification by majority voting
• Motivation:
  • Features from a single resolution are not sufficient
    • Finer resolution: salient features are less emphasized
    • Coarser resolution: loss of information
  • Features from different resolutions emphasize different sets of salient features
  • They are equally important for classification
• Algorithm details:
  • Multi-resolution feature extraction
    • Gabor wavelets: multi-scale and multi-orientation analysis
  • Subspace feature extraction
    • PCA subspace feature extraction
    • KDA subspace feature extraction
  • Nearest prototype classification at every resolution, followed by majority voting over the classification results from the different resolutions
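The voting step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each resolution already provides a subspace feature vector and a dictionary of per-pose prototypes (for example class means, which is an assumption), that poses are labeled as hypothetical (pan, tilt) tuples, and that ties go to the label voted first.

```python
import math
from collections import Counter


def nearest_prototype(feature, prototypes):
    """Return the pose label whose prototype is closest in Euclidean distance.

    `prototypes` maps a pose label to a prototype vector (assumed here to be
    something like the mean of that pose's training features).
    """
    return min(prototypes, key=lambda label: math.dist(feature, prototypes[label]))


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    counts = Counter(labels)
    return max(counts, key=counts.get)


def classify(features_per_resolution, prototypes_per_resolution):
    """Classify once per resolution, then combine the results by voting."""
    votes = [
        nearest_prototype(feature, prototypes)
        for feature, prototypes in zip(features_per_resolution,
                                       prototypes_per_resolution)
    ]
    return majority_vote(votes), votes


# Toy example: three resolutions sharing the same 2-D prototypes.
toy_prototypes = {(0, 0): [0.0, 0.0], (15, 0): [1.0, 0.0], (0, 15): [0.0, 1.0]}
toy_features = [[0.9, 0.1], [0.8, 0.0], [0.1, 0.2]]
pose, votes = classify(toy_features, [toy_prototypes] * 3)
```

In the toy example two of the three resolutions vote for the same pose, so that pose wins even though the third resolution disagrees.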

GABOR WAVELETS ANALYSIS
• Why Gabor wavelets:
  • A joint spatial-frequency representation
  • Extract the position and orientation of both global and local features while preserving frequency information
• What are Gabor wavelets:
  • A Gabor wavelet transform is a convolution of the image with a family of Gabor kernels
  • All Gabor kernels are generated from a mother wavelet by dilation and rotation
  • Mother wavelet: a plane wave generated from a complex exponential and restricted by a Gaussian envelope
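As a rough illustration of the kernel family described above, the sketch below builds complex Gabor kernels as a plane wave under a Gaussian envelope, with scale and orientation indices controlling dilation and rotation. The specific form and the parameter values (k_max = π/2, spacing factor √2, σ = 2π, 5 scales, 8 orientations, 15x15 support) are common choices from the Gabor-jet literature and are assumptions here; the slides do not state them.

```python
import cmath
import math


def gabor_kernel(scale, orientation, size=15, k_max=math.pi / 2,
                 f=math.sqrt(2), sigma=2 * math.pi, n_orient=8):
    """Complex Gabor kernel as a size x size list of lists.

    All default parameter values are assumed, not taken from the slides.
    """
    k = k_max / f ** scale                     # dilation: wave-vector length
    phi = math.pi * orientation / n_orient     # rotation: wave-vector angle
    kx, ky = k * math.cos(phi), k * math.sin(phi)
    dc = math.exp(-sigma ** 2 / 2)             # term that reduces the DC response
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            envelope = (k * k / sigma ** 2) * math.exp(
                -k * k * (x * x + y * y) / (2 * sigma ** 2))
            wave = cmath.exp(1j * (kx * x + ky * y)) - dc
            row.append(envelope * wave)
        kernel.append(row)
    return kernel


def gabor_bank(n_scales=5, n_orient=8):
    """Family of kernels over all scale and orientation indices."""
    return [gabor_kernel(s, o, n_orient=n_orient)
            for s in range(n_scales) for o in range(n_orient)]


def gabor_response(image, kernel, row, col):
    """Magnitude of the convolution at one pixel, with zero padding."""
    half = len(kernel) // 2
    total = 0j
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            r, c = row + dy, col + dx
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                # Convolution flips the kernel relative to the image offset.
                total += image[r][c] * kernel[half - dy][half - dx]
    return abs(total)
```

Collecting `gabor_response` over all kernels in the bank gives one multi-scale, multi-orientation feature vector per pixel; a practical implementation would use an FFT-based convolution rather than these explicit loops.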

PCA VS. KDA
• Why subspace analysis:
  • Extract the most discriminating information
  • Reduce the dimensionality
• PCA:
  • Linear transformation
  • Uses the first M eigenvectors of the samples’ covariance matrix
  • Selects the directions that have the most variance
• Why not PCA:
  • Not capable of extracting non-linear structure
  • Does not necessarily give the best discriminating features for classification
• KDA:
  • Non-linear variant of LDA
  • Finds the projection according to Fisher’s criterion, which maximizes the Rayleigh coefficient
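The PCA bullets above correspond to the standard textbook construction sketched below. The symbols (N samples x_i, mean μ, covariance C, eigenpairs λ_j and u_j, projected feature y) are notation assumed for this sketch and do not come from the slides.

```latex
\begin{align}
  \mu &= \frac{1}{N}\sum_{i=1}^{N} x_i, &
  C &= \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^{\top}, \\
  C\,u_j &= \lambda_j u_j, &
  \lambda_1 &\ge \lambda_2 \ge \dots \ge \lambda_M, \\
  y &= U_M^{\top}(x - \mu), &
  U_M &= [\,u_1, \dots, u_M\,].
\end{align}
```

Because the projection y is linear in x and is chosen by variance alone, it ignores class labels, which is the limitation the "Why not PCA" bullets point to.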

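PCA VS. KDA (Contd.)
• Rayleigh coefficient: [equation not preserved in the transcript]
• Introduce kernel: [equation not preserved in the transcript]; the function introduced here is called the kernel (a Gaussian kernel is used here)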
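The equations on this slide did not survive extraction. For orientation only, the usual kernel Fisher discriminant formulation is sketched below; it may differ in notation or detail from what the slide showed. The symbols w, S_B^Φ, S_W^Φ, Φ, c_i, k and the kernel width s are assumptions taken from that standard formulation.

```latex
\begin{align}
  J(w) &= \frac{w^{\top} S_B^{\Phi}\, w}{w^{\top} S_W^{\Phi}\, w}
  && \text{(Rayleigh coefficient in feature space)} \\
  w &= \sum_{i=1}^{N} c_i\, \Phi(x_i)
  && \text{(expansion over mapped training samples)} \\
  k(x, y) &= \Phi(x) \cdot \Phi(y)
  && \text{(kernel replaces the inner product)} \\
  k(x, y) &= \exp\!\left(-\frac{\lVert x - y \rVert^{2}}{2 s^{2}}\right)
  && \text{(Gaussian kernel)}
\end{align}
```

Here S_B^Φ and S_W^Φ denote the between-class and within-class scatter of the mapped samples Φ(x_i); with the kernel substitution, the maximization can be carried out over the coefficients c_i without computing Φ explicitly.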

SECOND LEVEL: Structural landmark analysis by bunch graph template matching
• Motivation:
  • To refine the estimate from the first level
  • Geometric structure can capture the small differences between neighboring poses
• Bunch graph:
  • The geometric relationship between salient facial points is used
  • For each pose, a model bunch graph is constructed
    • Nodes: salient facial points
    • Edges: distance information between nodes
  • The bunch graph for the testing image is compared with a subset of the model bunch graphs
  • The model template that yields the highest similarity score determines the final pose estimate
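The template-matching step above might look roughly like the sketch below. It is heavily simplified and makes several assumptions that are not in the slides: each pose has a single geometric template rather than a full bunch of per-subject node features, the graph is fully connected, the similarity score is a hypothetical function of edge-length differences, and the candidate subset is the 3x3 neighborhood of the first-level estimate on a uniform 15° grid.

```python
import math
from itertools import combinations


def edge_lengths(points):
    """Distances between every pair of landmark points (graph edges)."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def similarity(test_points, model_points):
    """Hypothetical score in (0, 1]; 1.0 means identical edge lengths."""
    test_edges = edge_lengths(test_points)
    model_edges = edge_lengths(model_points)
    mean_diff = sum(abs(a - b) for a, b in zip(test_edges, model_edges)) / len(test_edges)
    return 1.0 / (1.0 + mean_diff)


def neighborhood(pose, step=15):
    """3x3 neighborhood of a (pan, tilt) pose; the uniform step is assumed."""
    pan, tilt = pose
    return [(pan + dp, tilt + dt)
            for dp in (-step, 0, step) for dt in (-step, 0, step)]


def refine_pose(test_points, first_level_pose, model_graphs):
    """Pick the neighboring model template with the highest similarity."""
    candidates = [p for p in neighborhood(first_level_pose) if p in model_graphs]
    return max(candidates, key=lambda p: similarity(test_points, model_graphs[p]))


# Toy templates with three landmarks each (made-up coordinates).
toy_models = {
    (0, 0): [(0, 0), (4, 0), (2, 3)],
    (15, 0): [(0, 0), (3, 0), (2, 3)],
    (0, 15): [(0, 0), (4, 0), (2, 2)],
}
toy_test = [(0, 0), (3.1, 0), (2, 3)]
refined = refine_pose(toy_test, (0, 0), toy_models)
```

The raw edge lengths used here are sensitive to face scale, so a practical version would likely normalize them (for instance by the mean edge length) before comparison.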

EXPERIMENTAL EVALUATION: PCA Subspace
[Figure panels: 3X3, 5X5]

EXPERIMENTAL EVALUATION: KDA Subspace
[Figure panels: 3X3, 5X5]
PCA: 85.16%, PCA: 97.71%

SECOND LEVEL EVALUATION
• Accuracy after second-level refinement: 58.02%

CONCLUSION AND DISCUSSION
• Visual cues characterizing facial pose have unique multi-resolution spatial-frequency and structural signatures
• In the first level, statistical multi-resolution subspace analysis gives the pose estimate with an uncertainty of ±15 degrees; 90.32% accuracy is achieved
• In the second level, structural details are exploited to eliminate the uncertainty; 58.02% accuracy is achieved
• In the first level, face registration is done manually; automatic face registration by a facial landmark detection algorithm is under investigation, and some promising preliminary results have been obtained

THANK YOU!
