A Pragmatic Spatial-Random-Access-Enabled Video Coding Scheme

A Pragmatic Spatial-Random-Access-Enabled Video Coding Scheme Piyush Agrawal EE398A – Project Presentation

High resolution video - challenges • Rise of high resolution videos • Better digital imaging sensors • Increasing storage capacity • Algorithms and systems for stitching ultra-high resolution videos using multiple cameras • Challenges in using such videos • Lack of network bandwidth • Lack of high resolution display screens • Solution: Interactive Region-of-Interest video streaming

Agenda • Spatial-random-access enabled video coding • Related work • Proposed schemes • Experimental results • Discussion on pros and cons of different schemes

Spatial-random-access enabled video coding – why? • One way of providing interactive region-of-interest streaming • Decode entire high resolution video on the fly, for each user • Crop relevant part of the high resolution frame • Encode the relevant part again and transmit • Drawbacks • Multiple encodings required • Not scalable with increasing no. of simultaneous viewers • Required • A scheme which performs encoding only once and then can serve any user, any no. of times

Related work – EUSIPCO’07 Source: Aditya Mavlankar, Pierpaolo Baccichet, David Varodayan, and Bernd Girod. Optimal slice size for streaming regions of high resolution video with virtual pan/tilt/zoom functionality. In Proc. 15th European Signal Processing Conference (EUSIPCO07), pages 1275–1279, 2007.

Related work – PCS’09 Source: Aditya Mavlankar, Peer-to-Peer video streaming with interactive region-of-interest, PhD Dissertation (unpublished) • 85% reduction in storage requirements, compared to EUSIPCO’07 scheme

Drawbacks • Not compliant with any video coding standard • Require a custom encoder and decoder • Decoder complexity • Cross-resolution-layer dependencies • Scaling operation for rendering each frame • Difficult to implement a multi-thread/process/CPU parallel encoder for real-time encoding • Entire frame processed as a whole

Proposed scheme - ViewXtreme

Parallel encoder

Optimization to ViewXtreme scheme using Adaptive Skip Mode

How to detect static segments? • Consecutive frame differencing • Calculate mean pixel difference value • If below a fixed threshold, declare as static • Smoothing • Video shot in bad lighting conditions – too much noise • Leads to high frame difference even with no “actual” motion • Apply Gaussian smoothing filter to each tested frame • How to find the fixed threshold? • MSE of 1 gives PSNR ≈ 48 dB (for 8-bit video) • Consider two consecutive frames as original and reconstructed signal respectively • PSNR of 48 dB means the two signals look alike, i.e., no motion between the two frames • Other ideas • Structural Similarity Index Measure (SSIM)
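The detection steps above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes 8-bit grayscale frames stored as lists of rows, uses a 3x3 binomial kernel as a stand-in for the Gaussian filter, and takes MSE as the difference measure so that the threshold of 1 lines up with the 48 dB PSNR figure. All function names are hypothetical.

```python
import math

def gaussian_blur3(frame):
    """Smooth a grayscale frame (list of rows) with a 3x3 binomial kernel."""
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # weights sum to 16
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp coordinates so border pixels reuse the nearest edge value.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * frame[yy][xx]
            out[y][x] = acc / 16.0
    return out

def mse(a, b):
    """Mean squared error between two frames of equal size."""
    h, w = len(a), len(a[0])
    total = sum((a[y][x] - b[y][x]) ** 2 for y in range(h) for x in range(w))
    return total / (h * w)

def psnr_db(mse_value, peak=255.0):
    """PSNR in dB for a given MSE; peak=255 assumes 8-bit samples."""
    if mse_value == 0:
        return float("inf")
    return 10.0 * math.log10(peak * peak / mse_value)

def is_static(prev_frame, curr_frame, mse_threshold=1.0):
    """Declare a frame pair static if the MSE of the smoothed frames is below the threshold."""
    return mse(gaussian_blur3(prev_frame), gaussian_blur3(curr_frame)) < mse_threshold
```

With `peak=255`, `psnr_db(1.0)` evaluates to roughly 48.13 dB, which matches the threshold reasoning on the slide.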

Experimental setup • Compare 4 schemes • ViewXtreme • ViewXtreme Adaptive Skip • UpwardPredictionOnly (EUSIPCO’07) • BE-LTMMCP (PCS’09) • Test video: 600 frames, classroom scene • Results only for highest resolution layer (1920x1080) • Slice size: 480x270 pixels • QP for base layer = 27 • Affects performance of BE-LTMMCP and UpwardPredictionOnly schemes • GOP size = 30 frames • Affects performance of ViewXtreme and ViewXtreme Adaptive Skip schemes • Encoded video: 30 frames per second
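To make the slice geometry concrete: a 1920x1080 layer divided into 480x270 slices gives a 4x4 grid of 16 slices. The sketch below shows one way a client could work out which slices overlap a requested region of interest. It assumes a regular, non-overlapping grid with (row, col) indexing; the function name and parameters are hypothetical and are not taken from the presentation.

```python
def slices_for_roi(x, y, w, h, frame_w=1920, frame_h=1080, slice_w=480, slice_h=270):
    """Return (row, col) indices of the slices that overlap an ROI rectangle.

    (x, y) is the top-left corner of the ROI in pixels; w and h are its size.
    """
    # Clip the ROI to the frame boundaries.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, frame_w), min(y + h, frame_h)
    if x0 >= x1 or y0 >= y1:
        return []  # ROI lies entirely outside the frame
    first_col, last_col = x0 // slice_w, (x1 - 1) // slice_w
    first_row, last_row = y0 // slice_h, (y1 - 1) // slice_h
    return [(r, c)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]
```

An ROI aligned to the grid touches a single slice, while a 480x270 ROI placed at an arbitrary offset can touch up to four.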

Coding efficiency

Coding efficiency – benefits of skip mode

Encoding speed • Encoding done on a quad-core machine with 4 GB RAM

Pros and cons of proposed schemes • Pros • Standard-compliant encoder and decoder • Simplified decoder • Highly parallel encoder possible using off-the-shelf encoding tools • Significantly better (~66%) coding efficiency, so less network bandwidth is required • Cons • Expected to provide a lower degree of spatial-temporal random access compared to other schemes • Use of motion-compensated prediction coding • Can we confirm this?

A deeper dive into random access • Logical operations performed to render a random frame • (1) Download bits required to decode the single random frame • (2) Decode bits and create the reconstructed frame in memory • (3) Render reconstructed frame on the client’s display • Rendering of reconstructed frame (step 3) – independent of coding scheme – can be ignored • ViewXtreme and BE-LTMMCP schemes differ in steps 1 and 2

Differences • BE-LTMMCP • Each frame independent of other frames (on the same resolution layer) • Only the encoded bits of the single random frame need to be downloaded • Decoding of a single frame needed • Mean size of a random frame can be estimated from bits per pixel for different quality levels • ViewXtreme • No. of required bits (to be downloaded) depends on GOP structure

ViewXtreme – dependence on GOP • Frames to be downloaded • Frame 1: 1 I-frame • Frame 4: 1 I-frame + 2 P-frames + 1 B-frame • Frame 6: 1 I-frame + 4 P-frames + 1 B-frame • All frames of a GOP equally likely to be requested • Estimate no. of bits to be downloaded if median frame is requested • No B-frames used in experiments, GOP size = 30 frames • For 15th frame: 1 I-frame + 14 P-frames required • Mean size of a single I-frame and P-frame measured in experiments
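For the no-B-frame case used in the experiments, the download cost of a random frame follows directly from its position in the GOP. The sketch below is illustrative only: it assumes an I-P-P-... structure, and the I-frame and P-frame sizes are caller-supplied placeholders, not the measured values from the experiments.

```python
def frames_to_download(k):
    """Frames needed to decode the k-th frame (1-based) of an I-P-P-... GOP without B-frames."""
    if k < 1:
        raise ValueError("frame position k must be >= 1")
    # The I-frame plus every P-frame up to and including frame k.
    return {"I": 1, "P": k - 1}

def bits_to_download(k, i_frame_bits, p_frame_bits):
    """Bits to fetch for frame k, given mean I-frame and P-frame sizes in bits."""
    need = frames_to_download(k)
    return need["I"] * i_frame_bits + need["P"] * p_frame_bits

def mean_bits_over_gop(gop_size, i_frame_bits, p_frame_bits):
    """Average download cost when every frame in the GOP is equally likely to be requested."""
    total = sum(bits_to_download(k, i_frame_bits, p_frame_bits)
                for k in range(1, gop_size + 1))
    return total / gop_size
```

For example, `frames_to_download(15)` returns one I-frame and 14 P-frames, matching the median-frame estimate on the slide.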

Data needed to render single random frame

Effect of decoding multiple frames • For a random frame to be displayed • On average, 15 frames to be decoded (GOP size = 30) • Benchmark on a single-core (2.4 GHz) client • Decoding rate up to 500 fps for a 480x270 pixel video encoded using H.264 • Time needed to decode 15 frames = 15 × 1/500 s = 30 ms • Less than inter-frame interval of 33 ms (for playing video at 30 fps) • Decoding time negligible compared to data download time • Conclusion: ViewXtreme scheme provides a higher degree of spatial-temporal random access
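The decoding-time estimate above is simple arithmetic and can be reproduced as below. The function names are hypothetical; the 500 fps decoding rate and 30 fps playback rate are the figures quoted on the slide.

```python
def decode_time_ms(frames_to_decode, decode_fps):
    """Time in milliseconds to decode a number of frames at a given decoding rate."""
    return 1000.0 * frames_to_decode / decode_fps

def frame_interval_ms(playback_fps):
    """Inter-frame interval in milliseconds at a given playback rate."""
    return 1000.0 / playback_fps

def fits_in_frame_interval(frames_to_decode, decode_fps, playback_fps):
    """True if decoding finishes within one playback frame interval."""
    return decode_time_ms(frames_to_decode, decode_fps) < frame_interval_ms(playback_fps)
```

Calling `decode_time_ms(15, 500)` gives 30 ms against an interval of about 33.3 ms at 30 fps. The same functions can be evaluated for other GOP positions or decoding rates.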

Conclusions • Proposed 2 new coding schemes for spatial-random-access • Compared with two state-of-the-art schemes • Showed that the proposed schemes outperform other schemes in terms of • Coding efficiency • Standard compliance • Encoder and decoder complexity • Degree of spatial-temporal random access • Future work • Better ways of detecting static segments • Better architectural designs for encoders running on commodity machines

Acknowledgements • Prof. Bernd Girod • Mina Makar • Aditya Mavlankar • Derek Pang Thank You!
