Bi-level Video: Video Communication at Very Low Bit Rates

Jiang Li; Gang Chen; Jizheng Xu; Yong Wang; Hanning Zhou; Keman Yu; King To Ng; Heung-Yeung Shum

OUTLINE • INTRODUCTION (Wireless Video) • Scalable Video Coding • Introduction (Bi-level Video) • Approaches • Experimental Result • Conclusion

INTRODUCTION • QoS requirements for delivery of real-time video • Bandwidth: minimum bandwidth requirements (e.g. 28 kb/s) • Delay constraints (e.g. 1 s) • Upper limits on bit error rate (e.g. 1%)

INTRODUCTION (cont.) • Wireless Channel • Unreliability • Noise, multipath, and shadowing make the bit error rate (BER) high • Bandwidth Fluctuation • Mobile terminal moves between different networks • Multipath fading, co-channel interference, noise disturbances… • Heterogeneity • Receivers may differ in latency requirements, visual quality requirements, processing capabilities, power limitations, and bandwidth limitations…

INTRODUCTION (cont.) • Heterogeneity (video conferencing) • [Figure: unicast video distribution using multiple point-to-point connections; a sender, links link1 and link2, and several receivers.]

INTRODUCTION (cont.) • [Figure: multicast video distribution using point-to-multipoint transmission; a sender, links link1 and link2, and several receivers.] • Lack of flexibility in multicast: receivers may differ in latency requirements, visual quality requirements, processing capabilities, power limitations, and bandwidth limitations…

INTRODUCTION (cont.) • Solutions • Scalable Video Coding • Network-Aware Adaptation of End Systems • Knowing the current status of network resources (e.g. bit error condition, available bandwidth) • Adapting video streams based on network status • Adaptive QoS Support from Networks • Adapting video streams during periods of QoS fluctuations and handoffs

Scalable video coding • [Figure: layers labeled basic, Enhancement 1, and Enhancement 2.]

Scalable video coding (cont.) • An example for multicast • [Figure: layers labeled basic, Enhancement 1, and Enhancement 2.]

Scalable video coding (cont.) • Scalable video coding mechanisms • SNR Scalability • Degraded quality • Spatial Scalability • Smaller image size • Temporal Scalability • Lower frame rate

Introduction for Bi-level Video • Problems on Wireless Networks at Very Low Bit Rates (mobile phone, palm-size PC…) • Low bandwidth (e.g. 9.6 kb/s) • Weak computational power • Short battery lifetime • Limited display capability (support for only a few colors or a black/white display)

Introduction for Bi-level Video (cont.) • Using MPEG-1/2/4 or H.261/H.263 at these bit rates • Motion is not smooth • Images look like a collection of color blocks (only basic colors are preserved)

Introduction for Bi-level Video (cont.) • Outline features of scenes are more important than basic colors of blocks • Bi-level image • [Figure: a gray-scale image and a bi-level image.]

Approaches • We generate a bi-level image sequence from a gray-scale image sequence by thresholding • Key problem • Noise and flicker in the bi-level image sequence must be filtered out (because they consume too many bits) • How to convert a gray-scale image sequence into a bi-level one and process it so that it is easy to compress and still keeps acceptable perceptual quality
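A minimal Python sketch of why plain thresholding produces flicker, assuming a frame is a list of rows of 8-bit gray values. The function name `to_bilevel`, the fixed threshold of 128, and the sample pixel values are illustrative assumptions, not taken from the slides.

```python
def to_bilevel(gray, threshold):
    """Map each gray-scale pixel to 1 (white) if above the threshold, else 0 (black)."""
    return [[1 if p > threshold else 0 for p in row] for row in gray]

# Two successive frames of the same scene; the second has small sensor noise.
frame_prev = [[20, 127, 240]]
frame_curr = [[22, 129, 238]]

b_prev = to_bilevel(frame_prev, 128)
b_curr = to_bilevel(frame_curr, 128)

# Count pixels whose bi-level value changed although the scene did not.
flipped = sum(a != b for ra, rb in zip(b_prev, b_curr) for a, b in zip(ra, rb))
print(b_prev, b_curr, flipped)
```

The middle pixel sits next to the threshold, so two gray levels of noise are enough to flip it from black to white. Each such flip costs bits in the coded stream, which is the flicker the slide says must be filtered out.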

Approaches (cont.) • [Figure: processing flow with four steps, labeled step1 to step4.]

Approaches (cont.) • Static Region Detection and Duplication • Detecting static regions in the current frame and duplicating their pixels from the previous frame • Filtering out noise and flicker • Bit-rate control

Approaches (cont.) • Dissimilarity threshold • The threshold on the difference between corresponding pixels in two successive frames • The higher the dissimilarity threshold, the more pixels are viewed as similar to the corresponding pixels in the previous frame, and the lower the bit rate of the generated bit stream • [Figure: example pixel values (2 and 5) in frames j-1 and j, compared under Thresh = 2 and Thresh = 6.]
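The effect of the dissimilarity threshold can be sketched in Python as follows. This is a simplified stand-in that compares raw pixel differences; the function name `duplicate_static` and the sample frames are hypothetical.

```python
def duplicate_static(prev, curr, td):
    """Copy pixels from prev wherever |curr - prev| <= td.

    Returns (output frame, number of pixels copied from prev).
    """
    out, copied = [], 0
    for row_prev, row_curr in zip(prev, curr):
        out_row = []
        for a, b in zip(row_prev, row_curr):
            if abs(b - a) <= td:
                out_row.append(a)   # treated as static: reuse previous value
                copied += 1
            else:
                out_row.append(b)   # treated as changed: keep current value
        out.append(out_row)
    return out, copied

prev = [[2, 10, 40], [7, 7, 7]]
curr = [[5, 11, 90], [7, 9, 1]]
for td in (2, 6):
    out, copied = duplicate_static(prev, curr, td)
    print(td, copied, out)
```

With these sample frames, raising the threshold from 2 to 6 increases the number of duplicated pixels from 3 to 5, so fewer pixels differ from the previous frame and fewer bits are needed to code the frame.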

Approaches (cont.) • Calculating the difference between the j & j-1 frames • Laplacian of a pixel • The second derivative of intensity at that pixel • Lj(x,y) = [equation not preserved]

Approaches (cont.) • Calculating the difference between the j & j-1 frames • If the Laplacian of a region remains unchanged, the region is most likely static • Sum of absolute differences (SAD) of Laplacian of pixels in a square surrounding the target pixel (x,y)

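Approaches (cont.) • Calculating the difference between the j & j-1 frames • $SAD_j(x,y) = \sum_{(u,v) \in W(x,y)} \lvert L_j(u,v) - L_{j-1}(u,v) \rvert$, where $W(x,y)$ is the square window surrounding the target pixel (x,y)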

Approaches (cont.) • Calculating the difference between the j & j-1 frames • If SADj(x,y) ≤ td (dissimilarity threshold), the pixel is marked as static • [Figure: static (white) and motion (black) regions, annotated "To avoid misidentified static regions".]
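A minimal Python sketch of the static-pixel test, under stated assumptions: the Laplacian is approximated with the common 4-neighbour discrete kernel, border pixels get a Laplacian of 0, and the window is a 3×3 square (`radius=1`). All function names, the kernel, and the window size are assumptions.

```python
def laplacian(img):
    """4-neighbour discrete Laplacian; border pixels are set to 0 (assumption)."""
    h, w = len(img), len(img[0])
    lap = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap[y][x] = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1]
                         + img[y][x + 1] - 4 * img[y][x])
    return lap

def static_mask(prev, curr, td, radius=1):
    """True where the windowed SAD of Laplacian differences is <= td."""
    lp, lc = laplacian(prev), laplacian(curr)
    h, w = len(curr), len(curr[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sad = 0
            # Sum |L_j - L_{j-1}| over the square window, clipped at the borders.
            for v in range(max(0, y - radius), min(h, y + radius + 1)):
                for u in range(max(0, x - radius), min(w, x + radius + 1)):
                    sad += abs(lc[v][u] - lp[v][u])
            mask[y][x] = sad <= td
    return mask

prev = [[10] * 5 for _ in range(5)]
curr = [row[:] for row in prev]
curr[1][1] = 50                      # one pixel changes between frames
mask = static_mask(prev, curr, td=20)
for row in mask:
    print("".join("S" if m else "M" for m in row))
```

In this toy case a single changed pixel marks a whole neighbourhood as motion (`M`), because the window spreads each Laplacian change over nearby pixels. Pixels far from the change stay static (`S`).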

Approaches (cont.) • Adaptive Thresholding • Used to convert a gray-scale image to a bi-level image • Using Ridler's Iterative Selection method • e.g. sequence {1,3,5,6,7,9,12,13}
t1 = (1+3+5+6+7+9+12+13)/8 = 7
tb = (1+3+5+6)/4 = 3.75, to = (9+12+13)/3 = 11.33, t2 = (3.75+11.33)/2 = 7.54
tb = (1+3+5+6+7)/5 = 4.4, to = (9+12+13)/3 = 11.33, t3 = (4.4+11.33)/2 ≈ 7.87
tb = (1+3+5+6+7)/5 = 4.4, to = (9+12+13)/3 = 11.33, t4 = (4.4+11.33)/2 ≈ 7.87
∴ t ≈ 7.87
• Take threshold = t + tc
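The iteration can be sketched in Python as below. The function name `ridler_threshold`, the convergence tolerance `eps`, and the iteration cap are assumptions. Values exactly equal to the current threshold are left out of both groups, mirroring the first step of the example.

```python
def ridler_threshold(values, eps=1e-6, max_iter=100):
    """Iterative selection: repeat t = (mean below t + mean above t) / 2."""
    t = sum(values) / len(values)            # initial guess: overall mean
    for _ in range(max_iter):
        below = [v for v in values if v < t]
        above = [v for v in values if v > t]
        if not below or not above:           # cannot split further
            break
        t_new = (sum(below) / len(below) + sum(above) / len(above)) / 2
        if abs(t_new - t) < eps:             # converged
            return t_new
        t = t_new
    return t

print(round(ridler_threshold([1, 3, 5, 6, 7, 9, 12, 13]), 2))  # 7.87
```

For the example sequence the threshold moves from 7 to 7.54 and then settles at about 7.87, after which the two groups no longer change.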

Approaches (cont.) • Adaptive Context-based Arithmetic Encoding • Confidence level • The difference between the gray-scale value of a pixel and the threshold • Pixels whose gray-scale value is near the threshold can be coded as either black or white • For pixels whose absolute confidence level is less than the half-width of the threshold band, the bi-level value is assigned according to the indexed probability in the probability table • The whole frame is coded at once rather than as many separate blocks
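A rough Python sketch of the threshold-band rule, under loud assumptions: `p_white` is a hypothetical parameter standing for the probability of white that the coder's table holds for the pixel's context, and a pixel in the band simply takes the more probable value. The function name and sample numbers are illustrative.

```python
def assign_pixel(gray, threshold, half_width, p_white):
    """Return 1 (white) or 0 (black) for one pixel.

    p_white is a hypothetical stand-in for the table probability of white
    in this pixel's context.
    """
    confidence = gray - threshold
    if abs(confidence) >= half_width:
        # Outside the threshold band: the gray value decides.
        return 1 if confidence > 0 else 0
    # Inside the band: either value is acceptable, so take the likelier one.
    return 1 if p_white >= 0.5 else 0

print(assign_pixel(200, 128, 5, 0.1))  # 1: clearly white, probability ignored
print(assign_pixel(126, 128, 5, 0.9))  # 1: in the band, white is more probable
print(assign_pixel(130, 128, 5, 0.1))  # 0: in the band, black is more probable
```

Choosing the more probable symbol for uncertain pixels keeps the arithmetic coder's output short, and a wider band trades some fidelity near the threshold for a lower bit rate.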

Approaches (cont.) • Rate Control • Buffer size B = Imax + 4r/n • Imax: the maximum number of bits per frame that is allowed to be sent to the buffer • r: maximum video bit-rate • n: specified frame rate • Every group of frames covering time t (one I-frame followed by tn-1 P-frames) is assigned rt bits • If the I-frame takes bi bits, each P-frame is assigned bp = (rt-bi)/(tn-1) bits • [Figure: timeline of I- and P-frames with their bit budgets and the buffer size Imax+4r/n.]
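The two rate-control formulas can be checked with a short Python sketch. The function names and all numeric values (a 9600 b/s channel, 5 frames/s, a 2 s group, the I-frame sizes) are assumptions chosen only to make the arithmetic easy to follow.

```python
def buffer_size(i_max, r, n):
    """B = Imax + 4r/n, in bits."""
    return i_max + 4 * r / n

def p_frame_budget(r, t, n, b_i):
    """bp = (rt - bi) / (tn - 1): bits for each P-frame in a group."""
    return (r * t - b_i) / (t * n - 1)

r = 9600      # maximum video bit-rate in bits per second (assumed)
n = 5         # frame rate in frames per second (assumed)
t = 2         # group duration in seconds (assumed)
i_max = 2400  # maximum bits per frame sent to the buffer (assumed)
b_i = 1200    # bits used by the group's I-frame (assumed)

print(buffer_size(i_max, r, n))      # 10080.0
print(p_frame_budget(r, t, n, b_i))  # 2000.0
```

Here the group gets rt = 19200 bits. After the I-frame takes 1200 of them, the remaining 18000 bits are split evenly over the 9 P-frames.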

Approaches (cont.) • Rate Control • [Figure: rate-control chart with two 15% margins and the actions "Decrease f by 1", "Increase f by 1", and "Overflow, increase f to 9".]

Experimental Results • MPEG-4 video clips: (a) Akiyo, little head motion; (b) Grandma, large head motion; (c) Salesman, complex background • Ordinary clips captured from real scenes using PC digital cameras: (d) Chen, large head motion; (e) Wang, little head motion

Experimental Results (cont.) • [Figure: results labeled "Complex background" and "Large head motion".]

Experimental Results (cont.) • H.263+ vs. bi-level video

Experimental Results (cont.) • [Figure: (a) Akiyo and (c) Salesman.] • Complex but stable background: H.263+ consumes few bits since the difference between two successive frames is almost negligible

Conclusion • An I-frame in bi-level video is much smaller than in conventional DCT-based video • Short start-up time for streaming video • More I-frames can be inserted • Transmission errors can be recovered from quickly • Provides a smooth perception of motion even at low frame rates

Conclusion (cont.) • Clearer shape, smoother motion, shorter initial latency, cheaper computation cost • The number of bits per pixel is reduced to 1 • The coder need not estimate and store motion vectors, which reduces computational cost and coding bit-rate • Static region detection reduces flicker effects and therefore improves coding efficiency • The coding system works well on handheld PCs, palm-size PCs, and mobile phones that have only a small display screen and very limited computation capability and transmission bandwidth

Conclusion (cont.) • Microsoft Portrait • Very low bit-rate video conferencing software • http://research.microsoft.com/~jiangli/portrait/
