3D Human Face Reconstruction and Expression Modelling Alexander Woodward
Outline • Aim • System overview • Related work • 3D face reconstruction • Expression modelling • Contributions and future work
Overview • Aim: Integrated system for 3D face reconstruction and expression modelling • Vision based not graphics based • Low cost and self-contained • Results can be applied to: • Biometrics and security • Biomedical visualisation • Computer and video games • Film • Teleconferencing • Human computer interaction
System overview • 3D reconstruction: static (active & passive binocular stereo, active structured lighting, active photometric stereo) and dynamic (3D video scanner) • 3D data • Expression modelling: video based (sequences from 3D video scanner) and marker based motion capture (muscle inverse kinematics)
Related work • Complete systems for face reconstruction and animation are uncommon • High hardware requirements • Data acquisition, motion capture and animation systems are often provided as disparate packages or only as a service, cf. a stand-alone solution • At least 9 prominent projects aimed toward complete systems • Excluding in-house solutions • Large body of work in 3D face research • 3D reconstruction, expressions, motion capture Department of Computer Science
Related work • Borshukov et al. (2003–2007): Playable Universal Capture approach • 3D scanner, marker based tracking, optical flow, video texture • Ma et al. (2007, 2008): capture of face reflectance • 3D scanner, photometric stereo, motion capture • Light stage – 156 LED lights over an icosahedron • Image Metrics Inc. & University of Southern California Graphics Lab (2008): Digital Emily project • Light stage captures geometry and reflectance • 33 expressions captured; creates an animation rig • Performance data mapped to the 3D face
3D reconstruction
3D reconstruction requirements • Off-the-shelf hardware, no special properties • Cameras, PC, projector • Low acquisition time – faces move, esp. children • Controlled lighting • Vision based • New algorithms • Useful for any type of object
Static 3D reconstruction • Evaluated approaches: 1. Active & passive binocular stereo 2. Active structured lighting 3. Active photometric stereo • Ground truth data: 3D scanner • Evaluate effectiveness: accuracy, time complexity • Determine best approach for dynamic 3D reconstruction system • 12 algorithms • Database of 15 faces • Alternative test set • Focus on stereo algorithms • Compared to Middlebury, algorithms rank differently for faces • Projected patterns improve and level out performance
Active binocular stereo • Strip colour pattern: much higher accuracy • SAD correlation algorithm: 80% without pattern, 92% with strip pattern • Pattern colour should contrast strongly on skin
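The SAD correlation approach named above can be sketched as a winner-take-all block matcher. This is a minimal illustration, assuming a rectified grey-scale pair with the left image as reference; the function names, window radius and disparity range are hypothetical choices, not the configuration evaluated in the slides.

```python
import numpy as np

def box_sum(img, radius):
    """Sum of img over a (2*radius+1)^2 window, edges padded by replication."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    # Integral image with a leading row and column of zeros.
    ii = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]

def sad_disparity(left, right, max_disp, radius=2):
    """Winner-take-all SAD disparity map for a rectified stereo pair."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape
    costs = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        # Left pixel (y, x) is compared with right pixel (y, x - d).
        diff = np.abs(left[:, d:] - right[:, : w - d])
        costs[d, :, d:] = box_sum(diff, radius)
    # Pick the disparity with the lowest aggregated absolute difference.
    return costs.argmin(axis=0)
```

A projected pattern helps a matcher like this because it adds texture to otherwise uniform skin regions, so the cost minimum becomes less ambiguous.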
Statistical results
Algorithm    No pattern   + Grad.   + Strip
BP           73%          77%       89%
CM           88%          89%       92%
DPM          79%          84%       92%
GC           77%          83%       92%
SAD          80%          85%       92%
SDPS         89%          90%       93%

Algorithm    Result
FCV          69%
Shapelet     54%
Four Path    71%
Gray code    97%

Ground truth shown for comparison.
Dynamic 3D reconstruction • Reconstruction at video rates → 3D video! • Best results from static reconstruction: ‘one shot’ active illumination + Symmetric Dynamic Programming Stereo (SDPS) • Project pattern every other frame to get a clean texture • Monochrome stereo pair of video cameras + 3rd colour web camera obtains colour texture
Colour texture generation • Low resolution colour information combined with high resolution luminance information • Colour image (reprojected into same reference frame) + monochrome image → final texture • Next step: colour video cameras
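One way to combine low resolution colour with high resolution luminance is sketched below. It assumes the colour image has already been reprojected into the monochrome camera's reference frame, that both images are floats in [0, 1], and that Rec. 601 luma weights are acceptable; the slides do not state which colour space was used, so `fuse_texture` is a hypothetical helper rather than the system's actual method.

```python
import numpy as np

def fuse_texture(colour_lowres, mono_highres):
    """Combine low-res RGB colour with a high-res monochrome image."""
    H, W = mono_highres.shape
    h, w, _ = colour_lowres.shape
    # Nearest-neighbour upsampling of the colour image to the mono resolution.
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    rgb = colour_lowres[rows][:, cols]
    # Rec. 601 luma of the upsampled colour image (weights sum to 1).
    luma = rgb @ np.array([0.299, 0.587, 0.114])
    # Keep the per-channel colour offsets, replace luma with the mono image.
    chroma = rgb - luma[..., None]
    return np.clip(chroma + mono_highres[..., None], 0.0, 1.0)
```

Because the luma weights sum to one, the luma of the fused texture equals the monochrome image wherever no clipping occurs, so fine detail comes from the high resolution camera and only the colour offsets come from the web camera.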
Marker based expression modelling • Data driven: • Stereo web-cameras, face markers. • Head motion - rigid • Expressions - non-rigid • Tracked 3D points • Unique 3D face model mapping • Virtual muscle animation • 17 active muscles • Muscle inverse kinematics (IK) – Jacobian Transpose
Example videos Happiness – easiest to reproduce Surprise
Anger – needs teeth! Disgust – pursing of mouth & closing of eyes not represented
Video based expression modelling
3D video based expression modelling • Image blending • Novel face expressions from multiple video sequences • Interactive • Low preparation • Not data driven • Dense depth data – cf. marker system • Video based → realistic 3D movement and texture • Reconstruction data directly used for expression modelling • Sub-region masks • 11 control points
Synthetic expression results • Sadness: lower face region, anger: right eye region, surprise: left eye region • Happiness: lower face region, surprise: left and right eye regions
Synthetic expression results • Fear: lower face region, happiness: right eye region, anger: left eye region • Disgust: lower face region, anger: left and right eye regions
Contributions • 3D face reconstruction and expression modelling system: unique tool-set; low-cost, off-the-shelf; vision based • To 3D face reconstruction: extensive reconstruction comparison; face database; dynamic reconstruction system for 3D video (SDPS + pattern) • To expression modelling: marker based performance capture system; muscle based IK animation system with a unique mapping approach; video based expression system – realistic, less flexible
Future work and perspective • Many areas for future research • Refine hardware – better reconstructions (low-cost?) • Markerless motion capture – face (feature) tracking • Statistical analysis on video data • Active appearance model (AAM) • New animation system (out of scope) • Full body → complete character • Synergy of computer vision and computer graphics! • Physical models for animation • Computer vision tools • Especially 3D video & markerless motion capture
Universal expressions (Ekman, 1987) • Sadness • Anger • Happiness • Fear • Disgust • Surprise • Recognisable in every culture! • Used as exemplar expressions to judge my results
Types of binocular stereo algorithm • Local vs global optimisation • WTA (winner-take-all): SAD, SSD • Chen-Medioni: local method with explicit surface constraints; seed propagation approach • Dynamic programming – 1D optimisation: SDPS (Markov chain), DPM • Cubic algorithms – 2D optimisation: Markov random field, energy minimisation; Graph-cut (KZ1, RoyCox), Belief Propagation
Types of photometric stereo algorithm • Experiment focused on integration methods • Assumes C² continuity, i.e. a continuous second derivative • Local optimisation – based on curve integrals: Four path integration • Shapelet: explicit summation of basis functions • Global optimisation: FCV – Frankot Chellappa Variant
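As an illustration of global integration, the sketch below implements the standard Frankot-Chellappa method, which projects a gradient field onto the nearest integrable surface in the Fourier domain. It is not the specific variant (FCV) evaluated in the slides, and it assumes periodic boundaries and unit pixel spacing; `p` and `q` are hypothetical names for the surface gradients along x and y.

```python
import numpy as np

def frankot_chellappa(p, q):
    """Integrate gradients p = dz/dx, q = dz/dy into a zero-mean surface z."""
    h, w = p.shape
    # Angular frequencies per pixel along x (columns) and y (rows).
    wx = 2 * np.pi * np.fft.fftfreq(w)
    wy = 2 * np.pi * np.fft.fftfreq(h)
    WX, WY = np.meshgrid(wx, wy)
    P = np.fft.fft2(p)
    Q = np.fft.fft2(q)
    denom = WX**2 + WY**2
    denom[0, 0] = 1.0  # avoid 0/0 at the DC term
    # Least-squares solution of i*WX*Z = P and i*WY*Z = Q.
    Z = (-1j * WX * P - 1j * WY * Q) / denom
    Z[0, 0] = 0.0  # the mean height cannot be recovered from gradients
    return np.real(np.fft.ifft2(Z))
```

The recovered surface is defined only up to an additive constant, which is why the DC term is set to zero.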
Body modelling and animation • Body: generic skinned animation • Skeletal hierarchy, fully articulated • The bones of the hand • Each bone of the skeleton has a region of influence, denoted in green • Movement of the forearm • The body model with underlying skeleton
Interactive personalised avatar creator • Input photograph • RBF mapping
Results
3D video based expression system overview • Acquire sequences of individual expressions using the dynamic 3D face reconstruction system • Expression sequences start from a neutral state • Test subject’s head remains in the same position for every sequence • A reference texture and depth map are taken from the neutral expression and used as the base for all image regions • 11 control points are manually annotated on video sequences; future work to automate this process • Six sub-regions manually defined on the face • A sub-region’s texture and depth are updated by dragging a control point residing in it, using the expression sequence currently chosen for that sub-region
System conclusions • Sinusoidal interpolation is used instead of a linear one; this roughly models the biphasic nature of skin • Realistic animations are created as motion is derived from 3D video sequences of real-life test subjects • A user can create unnatural but interesting looking expressions that can convey a comical feel • Texture maps sourced from video sequences solve the loss of detail in the marker based approach • However, apart from the control points that were manually specified, no points on the face surface are tracked • Results could be refined by improving the quality of 3D video reconstruction
21 October 2014
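The difference between sinusoidal and linear interpolation can be illustrated with a short sketch. The slides do not give the exact formula, so the half-cosine weight below is an assumption, and `sinusoidal_weight` and `blend` are hypothetical names; `neutral` and `expression` stand for any pair of arrays such as depth maps or textures.

```python
import numpy as np

def sinusoidal_weight(t):
    """Ease-in/ease-out weight: 0 at t=0, 1 at t=1, zero slope at both ends."""
    t = np.clip(t, 0.0, 1.0)
    return 0.5 * (1.0 - np.cos(np.pi * t))

def blend(neutral, expression, t):
    """Blend from the neutral state (t=0) to the full expression (t=1)."""
    w = sinusoidal_weight(t)
    return (1.0 - w) * neutral + w * expression
```

Compared with a linear weight, the motion starts and ends slowly and is fastest in the middle, which avoids abrupt onsets when a control point is dragged.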
Test subject placement • Subject can be placed with knowledge of required view area, sensor size, and camera lens:
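The placement relation itself is not reproduced in the slide text. As a rough guide, under a pinhole camera assumption (an assumption made here, not necessarily the relation shown on the slide), the working distance follows from similar triangles: distance = focal length × required view width / sensor width. The function name and parameter names are hypothetical.

```python
def subject_distance(view_width_mm, sensor_width_mm, focal_length_mm):
    """Pinhole estimate of the camera-to-subject distance in millimetres."""
    # Similar triangles: view_width / distance = sensor_width / focal_length.
    return focal_length_mm * view_width_mm / sensor_width_mm
```

For example, covering a 300 mm wide view area with a 4.8 mm wide sensor and an 8 mm lens gives a distance of roughly half a metre under this model.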
RBF mapping approach • Radial Basis Functions • User specified point correspondences on generic model and 3D face data • Specify divergences between data • For each dimension (in 3D) • Find RBF approximation of (1D) displacements within the 3D space of specified points. • Using this RBF approximation all 3D points from the generic model can be mapped to the 3D face data proportions
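The mapping steps above can be sketched as follows. This is a minimal example, assuming the kernel phi(r) = r and exact interpolation at the specified correspondences; the slides do not name the kernel or say whether the fit is regularised, so both are assumptions, and `fit_rbf_mapping` is a hypothetical helper. It needs at least two distinct correspondence points.

```python
import numpy as np

def fit_rbf_mapping(src_pts, dst_pts):
    """Fit an RBF warp from generic-model landmarks to face-data landmarks."""
    src = np.asarray(src_pts, dtype=float)
    # Displacements between corresponding points, one column per dimension.
    disp = np.asarray(dst_pts, dtype=float) - src
    # Kernel matrix phi(r) = r between all pairs of landmarks.
    A = np.linalg.norm(src[:, None] - src[None], axis=-1)
    # Solving once with three right-hand sides fits x, y and z separately.
    weights = np.linalg.solve(A, disp)

    def warp(points):
        pts = np.asarray(points, dtype=float)
        B = np.linalg.norm(pts[:, None] - src[None], axis=-1)
        return pts + B @ weights

    return warp
```

Calling `warp` on every vertex of the generic model moves the landmarks exactly onto their targets and moves the remaining vertices by a smoothly interpolated displacement.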
Rigid and non-rigid motion Anchor markers: Rigid orientation: Remove rigid motion by using transpose of orientation and centre of gravity of anchors
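Removing rigid head motion with the transpose of the orientation and the anchors' centre of gravity can be sketched as below. The slide text does not say how the orientation is estimated, so this example assumes a least-squares fit (the Kabsch algorithm) between reference and current anchor positions; `remove_rigid_motion` and its arguments are hypothetical names.

```python
import numpy as np

def remove_rigid_motion(anchors_ref, anchors_cur, markers_cur):
    """Map current markers back into the reference head pose."""
    a0 = np.asarray(anchors_ref, dtype=float)
    a1 = np.asarray(anchors_cur, dtype=float)
    # Centres of gravity of the anchor markers.
    c0, c1 = a0.mean(axis=0), a1.mean(axis=0)
    # Kabsch: rotation R that best maps centred reference anchors to current.
    H = (a0 - c0).T @ (a1 - c1)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Row-vector form of R^T (m - c1) + c0: undo rotation and translation.
    return (np.asarray(markers_cur, dtype=float) - c1) @ R + c0
```

What remains after this step is the non-rigid part of the marker motion, which is the part attributed to the expression.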
Muscle inverse kinematics • Forward kinematics is the calculation of a new position g of an end effector by specifying updates to parameters of a kinematic chain • Inverse kinematics is the calculation of parameters for a kinematic chain to meet a desired goal position g, when starting from an initial position e • Kinematic chain consists of joints • Each joint has DOFs – its animatable parameters • E.g. 3-DOF for position, 1-DOF for orientation around one axis (position of joint implied through kinematic chain transformation)
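Jacobian Transpose approach • e = current end effector position • g = goal end effector position • d = change in end effector position, $d = g - e$ • FK: $e = f(\theta)$, with $\theta$ the parameters of the kinematic chain • IK: $\theta = f^{-1}(g)$ • First order estimate in positional change: $\Delta e \approx J \, \Delta\theta$, where $J = \partial e / \partial \theta$ • Change in parameters: $\Delta\theta = J^{-1} d$ • Jacobian Transpose estimate: $\Delta\theta = \alpha J^{T} d$ for a small step size $\alpha > 0$ • Estimate assured to move closer to the goal g: $\langle J J^{T} d, d \rangle = \lVert J^{T} d \rVert^{2} \ge 0$, so whenever $J^{T} d \neq 0$ the end effector moves in a direction less than 90 degrees from d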
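The Jacobian Transpose iteration can be sketched on a simple planar chain of revolute joints. This stands in for the muscle model, whose parameterisation is not given in the slides; the chain, the fixed step size `alpha` and all function names are assumptions for illustration only.

```python
import numpy as np

def forward(theta, lengths):
    """Forward kinematics: end effector position e of a planar chain."""
    angles = np.cumsum(theta)
    return np.array([np.sum(lengths * np.cos(angles)),
                     np.sum(lengths * np.sin(angles))])

def jacobian(theta, lengths):
    """2 x n matrix of partial derivatives of e with respect to each joint."""
    angles = np.cumsum(theta)
    J = np.zeros((2, len(theta)))
    for j in range(len(theta)):
        # Rotating joint j moves every link from j onward.
        J[0, j] = -np.sum(lengths[j:] * np.sin(angles[j:]))
        J[1, j] = np.sum(lengths[j:] * np.cos(angles[j:]))
    return J

def solve_ik(theta, lengths, goal, alpha=0.1, iters=1000, tol=1e-6):
    """Iterate theta += alpha * J^T d until the end effector reaches the goal."""
    theta = np.array(theta, dtype=float)
    for _ in range(iters):
        d = goal - forward(theta, lengths)  # desired change in position
        if np.linalg.norm(d) < tol:
            break
        theta += alpha * jacobian(theta, lengths).T @ d
    return theta
```

The method avoids inverting the Jacobian, which keeps each step cheap and well defined near singular configurations, at the cost of needing more iterations than inverse-based updates.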