Efficient Prediction Structure for Multi-view Video Coding

Efficient Prediction Structure for Multi-view Video Coding Philipp Merkle, Aljoscha Smolic, Karsten Müller, Thomas Wiegand CSVT 2007

Outline • Multi-view video coding (MVC) introduction • Requirements and test conditions for MVC • Prediction structures • Experimental results • Conclusion

MVC Introduction • MVC: Multi-view Video Coding • Multi-view video (MVV): video captured by a system that uses multiple camera views of the same scene. • Usage: 3DTV, free viewpoint video (FVV), etc.

Requirements for MVC • Temporal random access • View random access • Scalability • Backward compatibility • Quality consistency • Parallel processing

Temporal and inter-view correlation • [Figure: temporal, inter-view, and temporal/inter-view mixed prediction modes]

Temporal and inter-view correlation analysis • H.264/AVC encoder was used with the following settings: • Motion compensation block size of 16×16 • Search range of ±32 pixels • Lagrange parameter (λ) of 29.5 • denotes the decrease of the average in comparison to temporal prediction only.

Temporal and inter-view correlation analysis (cont’d) • Simply including temporal and inter-view prediction modes

Lagrangian cost function • Lagrangian cost function: J = D + λR • D denotes distortion. • R denotes the number of bits needed to transmit all components of the motion vector. • For each block in a picture, the algorithm chooses the MV within a search range that minimizes J. • The distortion in the subject macroblock B is calculated by: (1) (2) (3)
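The block-wise search described on this slide can be sketched as follows. This is a minimal illustration, not the authors' encoder: it assumes the cost has the usual form J = D + λR, uses the sum of squared differences as the distortion D, and uses a hypothetical rate model (`mv_bits`, signed Exp-Golomb code lengths for two vector components) for R. The function names and the two-component motion vector are assumptions made for illustration; the slide's motion vector may carry further components.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    d = a.astype(np.int64) - b.astype(np.int64)
    return int(np.sum(d * d))

def mv_bits(mv):
    """Hypothetical rate model: signed Exp-Golomb code length per component."""
    def se_len(v):
        k = 2 * abs(v) - (1 if v > 0 else 0)      # signed -> unsigned code number
        return 2 * ((k + 1).bit_length() - 1) + 1  # unsigned Exp-Golomb length
    return se_len(mv[0]) + se_len(mv[1])

def best_mv(cur, ref, top, left, block=16, search=32, lam=29.5):
    """Full search: return (J, (dy, dx)) minimizing J = D + lam * R for one block."""
    target = cur[top:top + block, left:left + block]
    h, w = ref.shape
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference picture.
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue
            cost = ssd(target, ref[y:y + block, x:x + block]) + lam * mv_bits((dy, dx))
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best
```

The same routine applies whether `ref` is a temporally neighbouring picture of the same view or a picture of a neighbouring view at the same time instance; comparing the minimum costs of the two cases is one way to see which kind of reference a block prefers.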

Test data and test conditions • 1D camera arrangement: Ballroom, Exit, Rena, Race1, Uli (line); Breakdancers (arched) • 2D camera arrangement: Flamenco2 (cross), AkkoKayo (array) • Use 5 to 16 camera views • Target high-quality TV-type video (640×480 or 1024×768) rather than limited-channel communication-type video.

Knowledge – hierarchical B picture, QP cascading • Hierarchical B picture, key picture, non-key picture • QP cascading [1] • [Figure: hierarchical B pictures between two key pictures] [1] “Analysis of hierarchical B pictures and MCTF”, IEEE International Conference on Multimedia and Expo (ICME 2006), Toronto, Ontario, Canada, July 2006
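A small sketch of how temporal levels and cascaded QPs could be assigned in one group of pictures (GOP). It assumes a dyadic hierarchy (GOP size is a power of two) and the cascading rule QP_k = QP_{k-1} + (4 if k = 1, else 1), which is how the rule in [1] is commonly stated; treat both as assumptions rather than as this deck's exact configuration. All function names are hypothetical.

```python
def temporal_level(poc, gop_size):
    """Temporal level in a dyadic hierarchy; key pictures are level 0."""
    pos = poc % gop_size
    if pos == 0:
        return 0
    level = gop_size.bit_length() - 1  # log2(gop_size) for a power of two
    while pos % 2 == 0:
        pos //= 2
        level -= 1
    return level

def cascaded_qp(base_qp, level):
    """Assumed rule: QP_0 = base_qp, QP_1 = QP_0 + 4, QP_k = QP_{k-1} + 1 for k > 1."""
    return base_qp if level == 0 else base_qp + 4 + (level - 1)

def coding_order(gop_size):
    """One valid coding order: the key picture first, then level by level."""
    return sorted(range(1, gop_size + 1),
                  key=lambda p: (temporal_level(p, gop_size), p))

if __name__ == "__main__":
    gop = 8
    for poc in range(gop + 1):
        lvl = temporal_level(poc, gop)
        print(poc, lvl, cascaded_qp(26, lvl))
    print(coding_order(gop))
```

With a GOP size of 8, pictures 0 and 8 are key pictures, picture 4 sits at level 1, pictures 2 and 6 at level 2, and the odd pictures at level 3, so the coarsest quantization falls on the pictures that no other picture references.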

Knowledge – DPB size • Decoded Picture Buffer (DPB) size is increased to: [2] • Memory-efficient reordering of multi-view input for compression [2] “Efficient Compression of Multi-view Video Exploiting Inter-view Dependencies Based on H.264/AVC”, IEEE International Conference on Multimedia and Expo (ICME 2006), Toronto, Ontario, Canada, July 2006

Two tasks • To adapt the multi-view prediction schemes to the specific camera arrangements of the test data sets. • To adapt the prediction structures to the random access specification.

Prediction structure • Simulcast coding structure • To allow synchronization and random access, all key pictures are coded in intra mode.

Prediction structure (cont’d) • The first view is called the base view; its key pictures remain intra-coded (I frames).

Prediction structure (cont’d) • Alternative inter-view prediction structures for key pictures: KS_IPP, KS_PIP, KS_IBP • [Figure: each structure shown for a linear camera arrangement and for a 2D camera array]
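The three key-picture structures can be illustrated for a linear camera arrangement with a short sketch. The assignment below is a reading of the structure names (IPP: intra base view followed by a chain of P views; PIP: intra center view with P views on both sides; IBP: alternating B and P views after the intra base view) and is an assumption, not a transcription of the figure. The function names are hypothetical, and the 2D array case is not covered.

```python
def key_picture_types(structure, num_views):
    """Picture type of each view's key picture, for a linear arrangement."""
    if structure == "KS_IPP":
        return ["I"] + ["P"] * (num_views - 1)
    if structure == "KS_PIP":
        center = num_views // 2
        return ["I" if v == center else "P" for v in range(num_views)]
    if structure == "KS_IBP":
        types = []
        for v in range(num_views):
            if v == 0:
                types.append("I")
            elif v % 2 == 0 or v == num_views - 1:
                types.append("P")   # last view has no right neighbour for B
            else:
                types.append("B")
        return types
    raise ValueError(f"unknown structure: {structure}")

def key_picture_references(structure, num_views):
    """Views used as inter-view references by each view's key picture."""
    types = key_picture_types(structure, num_views)
    center = num_views // 2
    refs = {}
    for v, t in enumerate(types):
        if t == "I":
            refs[v] = []
        elif t == "B":
            refs[v] = [v - 1, v + 1]            # bi-predicted from both neighbours
        elif structure == "KS_PIP":
            refs[v] = [v + 1 if v < center else v - 1]  # predict toward the center
        elif structure == "KS_IBP" and types[v - 1] == "B":
            refs[v] = [v - 2]                   # skip the B view in between
        else:
            refs[v] = [v - 1]
    return refs
```

For example, `key_picture_types("KS_IBP", 5)` gives the view sequence I, B, P, B, P, and `key_picture_references("KS_PIP", 5)` has every non-center view predicted from its neighbour on the side of the center view.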

Prediction structure (cont’d) • Inter-view prediction for key and non-key pictures (AS_IPP mode)

Experimental results – objective evaluation • [Figure: average coding gains compared with anchor coding] • [Figure: Ballroom test result]

Experimental results – subjective evaluation • Different bit-rates were selected for the different data sets. • [Figure: Ballroom test result] • [Figure: Race1 test result]

Experimental results – subjective evaluation • AS_IBP outperforms the anchors significantly. • The gain decreases slightly with higher bit-rates. • [Figure: average results over all test sequences]

Influence of camera density • Uses the Rena sequence, which consists of 16 linearly arranged cameras with a distance of 5 cm between adjacent cameras • Repeated for each shifted set of 9 adjacent cameras • The structures are applied to every time instance of the MVV sequence without temporal prediction.
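The "shifted sets" in this setup can be enumerated with a one-line helper. This sketch only lists the camera indices of each window; the zero-based indexing and the function name are assumptions for illustration.

```python
def shifted_camera_sets(num_cameras=16, set_size=9):
    """All windows of `set_size` adjacent cameras out of `num_cameras` (0-based)."""
    return [list(range(start, start + set_size))
            for start in range(num_cameras - set_size + 1)]
```

With 16 cameras and windows of 9, this yields 8 sets, from cameras 0 to 8 up to cameras 7 to 15, so the experiment is repeated 8 times.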

Results of experiments on camera density • Coding gain increases with decreasing camera distance and decreasing reconstruction quality.

Results of experiments on camera density (cont’d) • Results of average per-camera rate relative to the one-camera case • A larger QP value leads to a larger coding gain

Conclusion • The resulting multi-view prediction structures achieve significant coding gains and are highly flexible. • Parallel processing is supported by the presented sequential processing approach. • Problems: • Large disparities between the different views of multi-view video sequences • Illumination and color inconsistencies across views
