
Estimating 3D Facial Pose in Video with Just Three Points


Presentation Transcript


  1. Estimating 3D Facial Pose in Video with Just Three Points. Ginés García Mateos, Alberto Ruiz García (Dept. de Informática y Sistemas); P.E. López-de-Teruel, A.L. Rodriguez, L. Fernández (Dept. Ingeniería y Tecnología de Computadores). University of Murcia, SPAIN

  2. Introduction (1/3) • Main objective: to develop a new method to estimate the 3D pose of the head of a human user: • Estimation through a video sequence. • Working with the minimum necessary information: the 2D location of the face. • A very simple method, without training, running in real time: fast processing. • Under realistic conditions: robust to facial expressions, lighting and movements. • Robustness is preferred over accuracy.

  3. Introduction (2/3) • 3D pose estimation using 3D tracking… [Figure: examples of related approaches] • 3D morphable meshes and Active Appearance Models (http://cvlab.epfl.ch/research/body, http://www.lysator.liu.se/~eru/research/) • Cylindrical models and shape & texture models (http://www.merl.com/projects/3Dfacerec/, www.cs.bu.edu/groups/ivc/html/research_list.php)

  4. Introduction (3/3) • In short, we want to obtain something like this: • The result is the 3D location (x, y, z) and the 3D orientation (roll, pitch, yaw): 6 D.O.F.

  5. Index of the presentation • Overview of the proposed method • 2D facial detection and location • 2D face tracking • 3D Facial pose estimation • 3D Position • 3D Orientation • Experimental results • Conclusions

  6. Overview of the Proposed Method • The key idea: separate the problems of 2D tracking and 3D pose estimation. [Diagram: 2D face detection → 2D face tracking → 3D pose estimation] The proposed 3D pose estimator could use any 2D facial tracker. • Introducing some assumptions and simplifications, the pose is extracted with very little information.

  7. 2D Face Detection, Location and Tracking Using I.P. • We use a method based on integral projections (I.P.), which is simple and fast. • Definition of I.P.: the average of the gray levels of an image i(x, y) along its rows and columns: PV_i : [y_min, ..., y_max] → R, with PV_i(y) := average of i(·, y); and PH_i : [x_min, ..., x_max] → R, with PH_i(x) := average of i(x, ·). [Figure: a face image i(x, y) with its horizontal projection PH(x) and vertical projection PV(y)]
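As a rough illustration (not from the original slides), this definition maps directly onto two numpy reductions; gray is assumed to be a 2D grayscale array:

```python
import numpy as np

def integral_projections(gray):
    """Vertical and horizontal integral projections of a grayscale image:
    the average gray level along each row (PV) and each column (PH)."""
    pv = gray.mean(axis=1)  # PV(y): one value per image row
    ph = gray.mean(axis=0)  # PH(x): one value per image column
    return pv, ph
```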

  8. 2D Face Detection with I.P. Global view of the I.P. face detector: Step 1. Vertical projections by strips. Step 2. Horizontal projection of the candidates. Step 3. Grouping of the candidates. [Figure: input image, the PVface and PHeyes projections, and the final result]

  9. 2D Face Detection with I.P. • To improve the results, we combine two face detectors into a combined detector: Face Detector 1 (Haar + AdaBoost [Viola and Jones, 2001]) looks for candidates, and Face Detector 2 (integral projections [Garcia et al, 2007]) verifies the face candidates to produce the final detection result (a minimal sketch follows).
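A minimal sketch of this two-stage combination in Python with OpenCV; the Haar stage uses a stock cascade, while the projection-based verification shown here (expecting a dark band at eye level) is a simplified stand-in for the actual test of [Garcia et al, 2007]:

```python
import cv2

# Stock OpenCV frontal-face cascade; any equivalent cascade file works.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_and_verify(gray):
    """Step 1: Haar + AdaBoost proposes face candidates.
    Step 2: keep only candidates whose vertical projection shows a
    darker band in the rough eye region (simplified verification)."""
    candidates = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
    faces = []
    for (x, y, w, h) in candidates:
        pv = gray[y:y + h, x:x + w].mean(axis=1)  # vertical projection of the candidate
        eye_band = pv[h // 5: 2 * h // 5].mean()  # approximate eye-level rows
        if eye_band < pv.mean():                  # eyes darker than the face average
            faces.append((x, y, w, h))
    return faces
```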

  10. 2D Face Detection with I.P. [Garcia et al, 2007] ROC curves on UMU FaceDB (737 images / 853 faces). [Figure: ROC curves, % detected faces vs. % false positives, for each detector]

Detector                IntProj   Haar     NeuralNet   TemMatch   Cont     IP+Haar   Haar+IP
Det. ratio (F.P.=0.5)   84.2%     91.8%    88.6%       39.0%      24.8%    88.6%     96.1%
Time (PIV 2.6 GHz)      85 ms     293 ms   2338 ms     389 ms     120 ms   97 ms     296 ms

  11. 2D Face Location with I.P. Global view of the 2D face locator: Step 1. Orientation estimation. Step 2. Vertical alignment. Step 3. Horizontal alignment. [Figure: input image and detected face; the projections PHeyes(x), PVface(y) and PVeyes(y) before and after alignment; and the final result]

  12. 2D Face Location with I.P. Location accuracy of the 2D face locator:

Method                  IntProj   NeuralNet   EigenFeat
Av. time (PIV 2.6 GHz)  1.7 ms    323.6 ms    20.5 ms

  13. 2D Face Tracking with I.P. FACE TRACKING loop: initial face detection & location → prediction of the new position in frame t+1 → alignment → either correct tracking (motion model update) or lost face (face relocation). The per-frame alignment follows the same steps as the locator: Step 0. Prediction. Step 1. Orientation estimation. Step 2. Vertical alignment. Step 3. Horizontal alignment. [Figure: the projections PHeyes(x), PVface(y) and PVeyes(y) before and after alignment] Each alignment registers the current projection against a stored model projection, as in the sketch below.
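Each alignment step amounts to registering two 1D signals. A minimal sketch, assuming both projections are equal-length numpy arrays; the exhaustive SSD search over shifts is an illustrative choice, not necessarily the exact procedure of the paper:

```python
import numpy as np

def best_shift(model_proj, current_proj, max_shift=10):
    """Return the integer shift that best aligns current_proj with
    model_proj, minimizing the mean squared difference over the
    overlapping samples."""
    n = len(model_proj)
    best, best_err = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        lo, hi = max(0, s), min(n, n + s)  # overlap range in model coordinates
        err = np.mean((model_proj[lo:hi] - current_proj[lo - s:hi - s]) ** 2)
        if err < best_err:
            best, best_err = s, err
    return best
```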

  14. 2D Face Tracking with I.P. • Sample result of the proposed tracker. (e1x, e1y) = location of left eye; (e2x, e2y) = right eye; (mx, my) = location of the mouth 320x240 pixels, 312 frames at 25fps, laptop webcam

  15. 3D Facial Pose Estimation • In theory, 3 points should be enough to solve for the 6 degrees of freedom (if the focal length and the face geometry are known). • But… • Location errors are high at the mouth for non-frontal faces. • Some assumptions are introduced to avoid the effect of this error.

  16. 3D Facial Pose Estimation • Fixed-body assumption: the user's body remains fixed while the head moves → the 3D position is estimated in the first frame, and the 3D orientation in the following frames. • A simple perspective projection model is used to estimate the 3D position.

  17. 3D Position Estimation • f: focal length (known). • (cx, cy): tracked center of the face, with cx = (e1x + e2x + mx)/3 and cy = (e1y + e2y + my)/3. • p = (px, py, pz): 3D position of the face, with the camera at the origin (0, 0, 0). [Figure: perspective projection of the face center onto the image plane]

  18. 3D Position Estimation • We have: cx/f = px/pz and cy/f = py/pz • Where: cx = (e1x + e2x + mx)/3 and cy = (e1y + e2y + my)/3 • So: px = ((e1x + e2x + mx)/3) · pz/f and py = ((e1y + e2y + my)/3) · pz/f • The depth of the face, pz, is computed as pz = f·t/r, where r is the apparent face size* and t is the real size. * For more information, see the paper.
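A minimal Python sketch of these equations; e1, e2 and m are the tracked (x, y) image points, and t and r are passed in as the real and apparent face sizes:

```python
def estimate_3d_position(e1, e2, m, f, t, r):
    """3D face position under the slide's perspective model.
    f: focal length (pixels); t: real face size; r: apparent face size.
    Depth follows from pz = f*t/r, then px, py from the pinhole
    relations cx/f = px/pz and cy/f = py/pz."""
    cx = (e1[0] + e2[0] + m[0]) / 3.0   # tracked face center, x
    cy = (e1[1] + e2[1] + m[1]) / 3.0   # tracked face center, y
    pz = f * t / r                      # depth of the face
    px = cx * pz / f
    py = cy * pz / f
    return px, py, pz
```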

  19. Estimation of Roll Angle • The roll angle can be approximately associated with the 2D rotation of the face in the image: roll = arctan((e2y − e1y) / (e2x − e1x)) [Figure: sample faces with estimated roll = -43.7º, -2.8º, 15.9º and 34.6º] • This equation is valid in most practical situations, but it is not precise in all cases.
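As a small sketch, the same formula written with atan2, which also behaves well when the eye line is vertical:

```python
import math

def estimate_roll(e1, e2):
    """Roll as the 2D in-image rotation of the line joining the eyes."""
    return math.atan2(e2[1] - e1[1], e2[0] - e1[0])
```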

  20. Estimation of Pitch and Yaw • The head-neck system can be modeled as a robotic arm with 3 rotational DOF (roll, pitch and yaw). [Figure: top, orthographic and front views of the head model, with offsets a, b and c between the rotation axes] • In this model, any point of the head lies on a sphere → its projection is related to pitch and yaw.

  21. Estimation of Pitch and Yaw • rw: radius of the sphere on which the center of the eyes lies → rw = sqrt(a² + c²). • ri: radius of the circle onto which that sphere is projected → ri = rw·f/pz. • (dx0, dy0): initial center of the eyes; (dxt, dyt): current center of the eyes, computed as ((e1x + e2x)/2, (e1y + e2y)/2). [Figure: the projected circle of radius ri, with the eye center at (dx0, dy0) in the initial frame (pitch = 0, yaw = 0) and at (dx1, dy1) and (dx2, dy2) at instants t = 1 and t = 2]

  22. Estimation of Pitch and Yaw • In essence, we have the problem of computing the altitude and latitude of a given point on that circle. • The center of the circle is: (dx0, dy0 − a·f/pz) • So we have: pitch = arcsin( (dyt − (dy0 − a·f/pz)) / ri ) − arcsin(a/c) • And: yaw = arcsin( (dxt − dx0) / (ri · cos(pitch + arcsin(a/c))) )
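A direct transcription of these two formulas into Python; a, c, f and pz are assumed known from the head model and the position estimate, and the arcsin(a/c) term is kept exactly as written on the slide:

```python
import math

def estimate_pitch_yaw(dx0, dy0, dxt, dyt, a, c, f, pz):
    """Pitch and yaw from the spherical head model of the slides.
    (dx0, dy0): eye center in the initial frame (pitch = yaw = 0);
    (dxt, dyt): eye center in the current frame."""
    rw = math.sqrt(a * a + c * c)   # sphere radius of the eye center
    ri = rw * f / pz                # radius of its projected circle
    cy = dy0 - a * f / pz           # y coordinate of the circle's center
    phi = math.asin(a / c)          # constant offset term; assumes a < c
    pitch = math.asin((dyt - cy) / ri) - phi
    yaw = math.asin((dxt - dx0) / (ri * math.cos(pitch + phi)))
    return pitch, yaw
```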

  23. Experimental Results (1/7) • Experiments carried out with: • Off-the-shelf webcams. • Different individuals. • Variations in facial expressions and facial elements (glasses). • Studies of robustness and efficiency, and a comparison with a projection-based 3D estimation algorithm. • On a Pentium IV at 2.6 GHz: ~5 ms file reading, ~3 ms tracking, ~0.006 ms pose estimation.

  24. Experimental Results (2/7) • Sample input video: bego.a.avi 320x240 pixels, 312 frames at 25fps, laptop webcam

  25. Experimental Results (3/7) • 3D pose estimation results 320x240 pixels, 312 frames at 25fps, laptop webcam

  26. Experimental Results (4/7) [Figure: estimated pitch over the sequence, proposed method vs. projection-based method]

  27. Experimental Results (5/7) • Range of working angles… • Approx. ±20º in pitch and ±40º in yaw. • The 2D tracker is not explicitly prepared for profile faces!

  28. Experimental Results (6/7) • With glasses and without glasses

  29. Experimental Results (7/7) • When the fixed-body assumption does not hold. • Body/shoulder tracking could be used to compensate for body movement.

  30. Conclusions (1/3) • Our purpose was to design a fast, robust, generic and approximate 3D pose estimation method: • Separation of 2D tracking and 3D pose estimation. • Fixed-body assumption. • Robotic head model. • The 3D position is computed in the first frame. • The 3D orientation is estimated in the remaining frames. • The estimation process is very simple, and it avoids inaccuracies of the 2D tracker.

  31. Conclusions (2/3) • Future work: using the 3D pose estimator in a perceptual interface.

  32. Conclusions (3/3) • The simplifications introduced lead to several limitations of our system, but in general… • Human anatomy of the head/neck system could be used in 3D face trackers. • The human head cannot move independently of the body! • Taking advantage of these anatomical limitations could simplify and improve current trackers.

  33. Last • This work has been supported by the projects Consolider Ingenio-2010 CSD2006-00046 and TIN2006-15516-C04-03. • Sample videos: http://dis.um.es/~ginesgm/fip • Grupo PARP web page: http://perception.inf.um.es/ Thank you very much
