Image Compression and Graphics: More Than a Sum of Parts?

Bernd Girod
Collaborators: Peter Eisert, Marcus Magnor, Prashant Ramanathan, Eckehard Steinbach (all Stanford), Thomas Wiegand (HHI)
Image, Video, and Multimedia Systems Group
Information Systems Laboratory
Stanford University

Can 3-D Geometry Help to Compress Images?
[Figure: mesh with 2048 triangles]
Conjecture: 3-D geometry models help compression if a single 3-D model captures the dependencies between many views (or frames of a sequence).

Outline of this Talk
• Compression of many simultaneous views (e.g., light fields)
  • Encoding view-dependent texture maps with 4-D wavelets
  • Hierarchical image-domain light-field coder
  • Why image-domain encoding is (usually) superior to texture-map encoding
• Model-based compression of talking-head sequences
  • Modeling and estimation of facial expressions
  • Avatars
  • Incorporating synthetic video into motion-compensated hybrid coding

Multi-View Image Capture
Coding schemes suitable for:
• 2-plane parametrization
• Hemispherical image arrangement
• (Arbitrary recording positions)

Align Views by Mapping onto Object Surface
• Camera views: no correlation between corresponding pixels
• View-dependent texture map: strong correlation between corresponding texels

3-D Reconstruction from Many Views
Volumetric Reconstruction
• Processes all views simultaneously
• Exploits texture and silhouette information
• Yields a solid 3-D voxel model

• Subdivide the object’s bounding box into voxels
• Generate multiple hypotheses for each voxel
• Eliminate hypotheses by projecting visible voxels into the light-field images
• Iterate over all voxels until the remaining hypotheses are “photo-consistent”
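The hypothesis-elimination step above can be illustrated with a minimal sketch. It assumes that each voxel's color in every view where it is currently visible has already been found by projection, and that each observed color also serves as one color hypothesis; both are simplifications made here, not details taken from the slide. A hypothesis is dropped as soon as one view disagrees with it by more than a hypothetical color threshold, and a voxel with no surviving hypothesis is carved away. Only a single pass is shown; in the full scheme, carving changes visibility, which is why the process iterates.

```python
import numpy as np

def surviving_hypotheses(observed_colors, threshold):
    """Return the color hypotheses of one voxel that agree with every view.

    observed_colors: the voxel's color in each view in which it is currently
    visible, shape (V, 3). Each observation doubles as one hypothesis
    (an assumption of this sketch).
    """
    colors = np.asarray(observed_colors, dtype=float).reshape(-1, 3)
    # dist[h, v]: Euclidean color distance between hypothesis h and view v.
    dist = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=-1)
    # A hypothesis is eliminated as soon as one view contradicts it.
    keep = np.all(dist <= threshold, axis=1)
    return colors[keep]

def carve(voxel_observations, threshold):
    """One elimination pass: keep voxels with at least one surviving hypothesis.

    voxel_observations: dict voxel_id -> list of RGB colors from visible views.
    Voxels seen by no view cannot be tested and are kept.
    """
    kept = []
    for voxel_id, observed in voxel_observations.items():
        if len(observed) == 0 or len(surviving_hypotheses(observed, threshold)) > 0:
            kept.append(voxel_id)
    return kept

obs = {"a": [[100, 100, 100], [102, 99, 101]],
       "b": [[200, 0, 0], [0, 0, 200]],
       "c": []}
print(carve(obs, threshold=10.0))  # ['a', 'c']
```

Voxel "b" is removed because its two views report very different colors, so no single color can explain both.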

Surface Representation
[Figure: voxel model and surface meshes with 128, 512, 2048, and 8192 triangles]
• Initial octahedral geometry
• Geometry refinement
  • Determine vertex normals
  • Move vertices to model surface
  • Subdivide triangles
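The triangle counts in the figure follow from starting with an octahedron (8 triangles) and splitting every triangle into four per refinement step: 8 → 32 → 128 → 512 → 2048 → 8192. The sketch below shows that subdivision with shared edge midpoints. As a stand-in for "move vertices to model surface" it simply projects vertices onto a unit sphere; the actual step moves them onto the reconstructed voxel surface, which is not modeled here.

```python
import numpy as np

def octahedron():
    verts = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    faces = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
             (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]
    return np.array(verts, dtype=float), faces

def subdivide(verts, faces):
    """Split every triangle into four by inserting (shared) edge midpoints."""
    verts = [tuple(v) for v in verts]
    cache = {}  # edge (i, j) with i < j -> index of its midpoint vertex

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in cache:
            verts.append(tuple((np.array(verts[i]) + np.array(verts[j])) / 2.0))
            cache[key] = len(verts) - 1
        return cache[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return np.array(verts), new_faces

def refine(levels):
    verts, faces = octahedron()
    for _ in range(levels):
        verts, faces = subdivide(verts, faces)
        # Stand-in for "move vertices to model surface": project onto unit sphere.
        verts = verts / np.linalg.norm(verts, axis=1, keepdims=True)
    return verts, faces

for level in range(2, 6):
    print(len(refine(level)[1]))  # 128, 512, 2048, 8192
```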

Texture Map Encoding with 4-D Wavelets
• Warp each image into a view-dependent texture map
• Texture map correlated in 4-D
• Interpolate missing texels
• 4-D Haar wavelet transform
• Arrange images into a 2-D array
• Embedded encoding of wavelet coefficients (4D-SPIHT)
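A separable 4-D Haar transform applies the 1-D Haar step along each of the four axes in turn. The sketch below does one decomposition level and assumes an array layout of (view row, view column, texel row, texel column) with even sizes; that layout, the single level, and the omission of texel interpolation and SPIHT coding are simplifications. It shows the property the slide relies on: when the texture map is highly correlated across views, energy collects in the low-pass band.

```python
import numpy as np

def haar_1d(x, axis):
    """One level of the orthonormal Haar transform along `axis` (even length)."""
    x = np.moveaxis(x, axis, 0)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)
    high = (even - odd) / np.sqrt(2.0)
    return np.moveaxis(np.concatenate([low, high], axis=0), 0, axis)

def inverse_haar_1d(y, axis):
    y = np.moveaxis(y, axis, 0)
    n = y.shape[0] // 2
    low, high = y[:n], y[n:]
    out = np.empty_like(y)
    out[0::2] = (low + high) / np.sqrt(2.0)
    out[1::2] = (low - high) / np.sqrt(2.0)
    return np.moveaxis(out, 0, axis)

def haar_4d(t):
    """Separable one-level Haar transform over all four axes."""
    t = np.asarray(t, dtype=float)
    for axis in range(4):
        t = haar_1d(t, axis)
    return t

def inverse_haar_4d(c):
    for axis in reversed(range(4)):
        c = inverse_haar_1d(c, axis)
    return c

# A perfectly correlated (constant) 4-D texture map: all energy lands in the
# 2x2x4x4 low-pass band, every other coefficient is zero.
coeffs = haar_4d(np.ones((4, 4, 8, 8)))
print(np.count_nonzero(np.abs(coeffs) > 1e-12))  # 64
```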

Results: Wavelet Texture Map Encoder
[Figure: reconstruction quality in luminance PSNR (dB)]

Results: Wavelet Texture Map Encoder

Progressive Decoding
[Figure: decoded views at 0.0076 bpp (26.3 Kbytes), 28.6 dB, and at 0.213 bpp (736 Kbytes), 36.6 dB]

Align Views by Model-aided Prediction
Given: geometry model, reference images
• Render geometry for reference images and prediction image
• For each pixel: determine triangle and coordinates
• Find corresponding pixels in reference images
• Copy & average visible pixels
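The last step, "copy & average visible pixels", can be sketched as follows. It assumes the rendering steps have already produced, for each reference image, a version resampled onto the prediction image's pixel grid, along with a visibility mask. Pixels that no reference sees are set to 0 here; that fallback is a hypothetical choice, not something the slide specifies.

```python
import numpy as np

def average_visible(warped_refs, visibility):
    """Predict an image by averaging the reference pixels that are visible.

    warped_refs: (R, H, W) reference images already resampled so that pixel
        (y, x) of each one holds the surface point seen at (y, x) in the target.
    visibility:  (R, H, W) booleans; False where that surface point is occluded
        or falls outside the reference image.
    """
    refs = np.asarray(warped_refs, dtype=float)
    vis = np.asarray(visibility, dtype=bool)
    count = vis.sum(axis=0)                         # visible references per pixel
    total = np.where(vis, refs, 0.0).sum(axis=0)    # sum of visible pixel values
    # Average where at least one reference sees the point; 0 elsewhere.
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)

refs = [[[10, 20, 5]], [[30, 40, 7]]]
vis = [[[True, True, False]], [[True, False, False]]]
print(average_visible(refs, vis))  # [[20. 20.  0.]]
```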

Hierarchical Image Coding Order
• Project camera positions onto hemisphere
• Subdivide into 4 quadrants
• INTRA-encode corner images
• Encode center image
  • Image prediction
  • Residual error coding
• Encode mid-side images
• Subdivide into sub-quadrants
• Encode center and mid-side images
• Subdivide repeatedly
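The coding order above can be sketched as a quadtree traversal. This sketch assumes that the projected camera positions form a regular (2^n + 1) × (2^n + 1) grid, and that at each level all centers are coded before all mid-sides; both are assumptions made for illustration. It only produces the order and mode. Which already-coded images serve as prediction references is not modeled. Mid-side positions shared by neighboring quadrants are coded only once.

```python
def hierarchical_order(levels):
    """Coding order for a (2**levels + 1)^2 grid of camera positions.

    Returns a list of ((row, col), mode), mode being "intra" or "predicted".
    """
    size = 2 ** levels
    order, coded = [], set()

    def emit(pos, mode):
        # Mid-side images are shared by neighbouring quadrants; code each once.
        if pos not in coded:
            coded.add(pos)
            order.append((pos, mode))

    for corner in [(0, 0), (0, size), (size, 0), (size, size)]:
        emit(corner, "intra")

    squares = [(0, 0, size)]  # (top row, left col, side length)
    while squares[0][2] >= 2:
        half = squares[0][2] // 2
        # Centers of all squares at this level first ...
        for r, c, _ in squares:
            emit((r + half, c + half), "predicted")
        # ... then their mid-side images, then subdivide into sub-quadrants.
        next_squares = []
        for r, c, s in squares:
            for pos in [(r, c + half), (r + half, c), (r + half, c + s), (r + s, c + half)]:
                emit(pos, "predicted")
            next_squares += [(r, c, half), (r, c + half, half),
                             (r + half, c, half), (r + half, c + half, half)]
        squares = next_squares
    return order

print(hierarchical_order(1))
```

For a 3 × 3 grid this codes the four corners as INTRA, then the center (1, 1), then the four mid-sides.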

Model-aided Image-Domain Light-Field Coder
[Figure: block diagram with light-field image I[u,v], multiframe disparity compensation, disparity map generation, residual-error DCT coder (DCT coefficients), residual-error decoder, image buffer, 3-D geometry reconstruction, geometry coder, compressed geometry model, and geometry decoder]

Picture Quality
Mouse light field
• Original: 257 RGB images, 384x288 pixels, 81.3 Mbytes
• Compressed ~300:1: 0.077 bpp (267 Kbytes), 37.9 dB PSNR
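The figures above can be checked against each other, assuming 8-bit RGB originals (24 bits per pixel) and binary units (1 Kbyte = 1024 bytes, 1 Mbyte = 1024² bytes). The slide does not state these conventions; they are the ones under which its numbers agree.

```python
NUM_IMAGES, WIDTH, HEIGHT = 257, 384, 288
BYTES_PER_PIXEL = 3                  # assumption: 8-bit RGB
COMPRESSED_BYTES = 267 * 1024        # assumption: 1 Kbyte = 1024 bytes

pixels = NUM_IMAGES * WIDTH * HEIGHT          # total pixels in the light field
raw_bytes = pixels * BYTES_PER_PIXEL
raw_mbytes = raw_bytes / 2**20
bpp = COMPRESSED_BYTES * 8 / pixels           # bits per (RGB) pixel
ratio = raw_bytes / COMPRESSED_BYTES

print(f"{raw_mbytes:.1f} Mbytes, {bpp:.3f} bpp, {ratio:.0f}:1")
# 81.3 Mbytes, 0.077 bpp, 312:1
```

Under these assumptions the exact ratio is about 312:1, which the slide rounds to 300:1.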

Model-aided vs. Texture Coding
[Figure: texture coding vs. model-aided coding]

Natural vs. Synthetic Image Set
[Figure: comparison annotated 40 % and 70 %]

Inaccurate Geometry
[Figure: comparison annotated 9 dB, 7 dB, 2 dB, 2 dB]

Model-based Videophone

Modeling of Facial Expressions
• Head geometry composed of 101 triangular B-spline patches
• Facial expressions by superposition of 66 FAPs (Facial Animation Parameters) according to the MPEG-4 standard
• FAPs act on control points of triangular B-spline patches
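The superposition idea can be sketched with a hypothetical linear model. Each FAP moves the control points along a fixed displacement field scaled by its amplitude, and a surface point is then a weighted sum of nearby control points. The displacement fields and the basis weights below are made-up inputs. In the real model, the B-spline basis supplies the weights. The FAP-to-control-point mapping is model-specific and need not be linear for every FAP, for example for rotations.

```python
import numpy as np

def deform_control_points(neutral, fap_directions, faps):
    """Superimpose FAP-driven displacements on the neutral control points.

    neutral:        (P, 3) control points of the neutral face.
    fap_directions: (K, P, 3) control-point displacement per unit of each FAP
                    (hypothetical linear model).
    faps:           (K,) FAP amplitudes.
    """
    return (np.asarray(neutral, dtype=float)
            + np.tensordot(np.asarray(faps, dtype=float),
                           np.asarray(fap_directions, dtype=float), axes=1))

def surface_point(control_points, indices, weights):
    """Evaluate one surface point as a weighted sum of control points.

    indices/weights stand in for the triangular B-spline basis at that point
    (hypothetical inputs; the weights should sum to 1).
    """
    cps = np.asarray(control_points, dtype=float)
    return np.asarray(weights, dtype=float) @ cps[list(indices)]

neutral = np.zeros((3, 3))
dirs = np.zeros((2, 3, 3))
dirs[0, 0, 0] = 1.0   # FAP 0 moves control point 0 along x
dirs[1, 1, 1] = 1.0   # FAP 1 moves control point 1 along y
cp = deform_control_points(neutral, dirs, [2.0, 0.5])
print(surface_point(cp, [0, 1], [0.5, 0.5]))  # [1.   0.25 0.  ]
```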

Estimation of Facial Expressions
• Displacement field constrained by FAPs
• Linearize for small FAPs
• Optical flow constraint equation
• Solve overdetermined system by linear regression
• Apply iteratively in analysis-synthesis loop
• Incorporate spatial resolution pyramid
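One way these steps fit together is sketched below under stated assumptions. At each pixel n, the optical flow constraint reads gx·dx + gy·dy + it = 0. If the displacement is linearized in the FAP vector p as dx = Jx·p and dy = Jy·p, stacking all pixels gives an overdetermined linear system A·p = −it, which linear least squares solves. The per-pixel Jacobians Jx, Jy are hypothetical inputs here; in the actual system they would follow from the B-spline head model and the camera. The iterative analysis-synthesis loop and the resolution pyramid are not shown.

```python
import numpy as np

def estimate_faps(gx, gy, it, jac_x, jac_y):
    """Least-squares FAP estimate from the linearized optical-flow constraint.

    Per pixel n:  gx[n]*dx[n] + gy[n]*dy[n] + it[n] = 0
    with          dx[n] = jac_x[n] @ p,  dy[n] = jac_y[n] @ p   (small FAPs).
    gx, gy: spatial gradients (N,); it: temporal difference (N,);
    jac_x, jac_y: (N, K) displacement Jacobians (hypothetical inputs).
    """
    A = gx[:, None] * jac_x + gy[:, None] * jac_y   # (N, K) system matrix
    p, *_ = np.linalg.lstsq(A, -it, rcond=None)     # linear regression
    return p

# Synthetic, noise-free check: generate `it` from known FAPs and recover them.
rng = np.random.default_rng(0)
N, K = 500, 4
gx, gy = rng.normal(size=N), rng.normal(size=N)
jx, jy = rng.normal(size=(N, K)), rng.normal(size=(N, K))
p_true = np.array([0.1, -0.2, 0.05, 0.3])
it = -(gx * (jx @ p_true) + gy * (jy @ p_true))
print(np.round(estimate_faps(gx, gy, it, jx, jy), 3))
```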

Results: Peter
[Video: original and synthesized]
Sequence: Peter, 230 frames, CIF resolution, 25 fps
Compressed 25,000:1: 1.2 kbps, 32.8 dB PSNR
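The 25,000:1 figure is consistent with the stated bit rate if the uncompressed reference is 8-bit YUV 4:2:0 CIF video (12 bits per pixel). The slide does not state this convention, so it is an assumption. The same arithmetic applies to the 1.1 kbps Eckehard result.

```python
CIF_WIDTH, CIF_HEIGHT = 352, 288
BITS_PER_PIXEL = 12   # assumption: 8-bit YUV 4:2:0
FPS = 25

raw_bps = CIF_WIDTH * CIF_HEIGHT * BITS_PER_PIXEL * FPS   # uncompressed bit/s

def compression_ratio(coded_kbps):
    """Ratio of uncompressed to coded bit rate."""
    return raw_bps / (coded_kbps * 1000)

print(raw_bps)                          # 30412800
print(round(compression_ratio(1.2)))    # 25344 (Peter)
print(round(compression_ratio(1.1)))    # 27648 (Eckehard)
```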

Results: Eckehard
[Video: original and synthesized]
Sequence: Eckehard, CIF resolution, 25 fps
1.1 kbps, 32.6 dB PSNR

Results: Peter as Eckehard
[Video: original and synthesized]
Sequence: Peter, 230 frames, CIF resolution, 25 fps

Results: Eckehard as Peter
[Video: original and synthesized]
Sequence: Eckehard, CIF resolution, 25 fps

Results: Peter as Akiyo
[Video: original and synthesized]
Sequence: Peter, 230 frames, CIF resolution, 25 fps

. . . But What About Unknown Objects?
[Video: original and synthesized]
Sequence: Clap, 1.2 kbps

Model-Aided Coding: Incorporating Synthetic Video into MC Hybrid Coding
[Figure: block diagram with input video, coder control, control data (incl. motion vectors), intraframe DCT coder, DCT coefficients, intraframe decoder, multiframe motion compensation, decoder, model-based coder and model-based decoder exchanging FAPs]

R-D-Optimal Mode Decision
[Figure: previous decoded frame, synthesized frame, selection mask, and resulting predicted frame]
• Selection mask minimizing D + λR
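The selection mask can be illustrated with a per-block Lagrangian decision. For each block, the coder picks the reference with the smaller cost J = D + λR, using SSD as D. Several parts of this sketch are simplifications: the 16×16 block size, the per-reference rate estimates passed in by the caller, and zero motion within each reference. A real coder would run motion search in both the previous decoded frame and the synthesized frame and estimate R from the actual bits needed for motion, mode, and residual.

```python
import numpy as np

def block_cost(block, prediction, rate_bits, lam):
    """Lagrangian cost J = D + lambda * R with SSD as distortion D."""
    d = float(np.sum((block - prediction) ** 2))
    return d + lam * rate_bits

def selection_mask(frame, prev_decoded, synthesized, rate_prev, rate_synth, lam, bs=16):
    """Choose, block by block, the reference that minimizes D + lambda * R.

    Mask value 0 = previous decoded frame, 1 = synthesized frame.
    Rates are per-block bit estimates supplied by the caller (hypothetical);
    motion search within each reference is omitted (zero motion).
    """
    frame = np.asarray(frame, dtype=float)
    refs = [np.asarray(prev_decoded, dtype=float), np.asarray(synthesized, dtype=float)]
    rates = [rate_prev, rate_synth]
    rows, cols = frame.shape[0] // bs, frame.shape[1] // bs
    mask = np.zeros((rows, cols), dtype=int)
    predicted = np.zeros_like(frame)
    for by in range(rows):
        for bx in range(cols):
            win = (slice(by * bs, (by + 1) * bs), slice(bx * bs, (bx + 1) * bs))
            costs = [block_cost(frame[win], ref[win], r, lam) for ref, r in zip(refs, rates)]
            best = int(np.argmin(costs))
            mask[by, bx] = best
            predicted[win] = refs[best][win]
    return mask, predicted

# Toy case: the synthesized frame matches the left half, the previous frame the right.
rng = np.random.default_rng(0)
frame = rng.random((32, 32)) * 255
prev = frame.copy(); prev[:, :16] += 10
synth = frame.copy(); synth[:, 16:] += 10
mask, pred = selection_mask(frame, prev, synth, rate_prev=10, rate_synth=10, lam=1.0)
print(mask)  # [[1 0]
             #  [1 0]]
```

Raising λ or the rate of one mode shifts the decision toward the cheaper mode, which is the rate-distortion trade-off the slide refers to.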

Results: Peter
H.263 (TMN-10) @ 12 kbps vs. Model-Aided Coder @ 12 kbps
Sequence: Clap, 8.33 fps, CIF resolution

Results: Akiyo
H.263 (TMN-10) @ 10 kbps vs. Model-Aided Coder @ 10 kbps
Sequence: Akiyo, 10 fps, CIF resolution

R-D Performance of Model-Aided Coder
[Figure: R-D curves for sequences Peter and Akiyo, annotated ~35% and ~40%]

Conclusion
Can 3-D geometry help to compress images? YES . . .
. . . IF many views of the same 3-D object/scene are to be compressed.
• Applications in
  • Multiview image coding (light-field compression)
  • Compression of video sequences
• Very high compression ratios (100:1 . . . 25,000:1)
• Requires accurate vision algorithms for 3-D reconstruction
• Image-domain compression is more resilient against inaccurate geometry and hence more practical than texture-map encoding

. . . THE END
