
Concepts of Multimedia Processing and Transmission



Presentation Transcript


  1. Concepts of Multimedia Processing and Transmission IT 481, Lecture #7 Dennis McCaughey, Ph.D. 19 March, 2007

  2. Direct Video Broadcast (DVB) Systems Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  3. Processing of The Streams in The Set-Top Box (STB) Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  4. Multimedia Communications Standards and Applications Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  5. Video Coding Standards • ITU H.261 for Video Teleconference (VTC) • ITU H.263 for VTC over POTS • ITU H.262 for VTC over ATM/broadband and digital TV networks • ISO MPEG-1 for movies on CD-ROM (VCD) • 1.2 Mbps for video coding and 256 Kbps for audio coding • ISO MPEG-2 for broadcast-quality video on DVD • 2-15 Mbps allocated for audio and video coding • Low-bit-rate telephony over POTS • 10 Kbps for video and 5.3 Kbps for audio • Internet and mobile communication: MPEG-4 • Very Low Bit Rate (VLBR) coding, compatible with H.263 • Multimedia content description interface: MPEG-7 • Description schemes and a description definition language for integrated multimedia search engines Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  6. History • H.261: • First video coding standard, targeted at video conferencing over ISDN. Uses a block-based hybrid coding framework with integer-pixel MC • H.263: • Improved quality at lower bit rates, to enable video conferencing/telephony below 54 kbps (modems, desktop conferencing) • Half-pixel MC and other improvements • MPEG-1 video • Video on CD and video on the Internet (good quality at 1.5 Mbps) • Half-pixel MC and bidirectional MC • MPEG-2 video • SDTV/HDTV/DVD (4-15 Mbps) • Extended from MPEG-1 to handle interlaced video Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  7. H.261 Video Coding Standard • For video conferencing/video phone • The video coding standard in H.320 (VTC over the switched phone network), which is an umbrella recommendation • Low delay (real-time, interactive) • Slow motion in general • For transmission over ISDN • Fixed bandwidth: p×64 Kbps, p = 1, 2, …, 30 • Video formats: • CIF (352x288, above 128 Kbps) - Common Intermediate Format • QCIF (176x144, 64-128 Kbps) - Quarter CIF • 4:2:0 color format, progressive scan • Published in 1990 • Each macroblock can be coded in intra- or inter-mode • Periodic insertion of intra-mode coding to eliminate error propagation due to network impairments • Integer-pixel-accuracy motion estimation in inter-mode Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  8. H.261 Encoder • F: Loop filter; P: motion estimation and compensation • Loop filter: applies a low-pass filter to smooth the quantization noise in previously reconstructed frames before motion estimation and compensation Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  9. Picture Frames - Overview • Three frame types: I-Picture (intra-frame picture), P-Picture (inter-frame predicted picture) and B-Picture (bi-directionally predicted/interpolated picture) • An I-Picture is coded by intra-frame coding: only the spatial redundancy within the picture is reduced, without referencing other pictures. The coding process is very similar to the JPEG standard, so encoding an I-Picture is less complex than encoding P-frames and B-frames. The basic coding unit is an 8x8 block. A macroblock consists of six blocks: four luminance (Y) blocks, one Cb chrominance block, and one Cr chrominance block Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  10. Frame Types • Intracoded Frames -> I-Frames • Level of compression is relatively small: 10:1 to 20:1 • Present at regular intervals to limit the extent of errors • The group of frames between successive I-frames is known as a Group of Pictures (GOP) • Intercoded Frames • Predicted Frames -> P-Frames • Significant compression achieved here • Errors are propagated • 20:1 to 30:1 compression ratio • Bidirectional Frames -> B-Frames • Highest levels of compression achieved • B-frames are not used for prediction, thus errors are not propagated • 30:1 to 50:1 compression ratio IT 481, Fall 2006

  11. Macro Blocks & Color Sub-sampling Schemes A macroblock covers a 16x16-pixel area: four 8x8 luminance blocks, plus chrominance blocks according to the sub-sampling scheme Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  12. Sub-sampling of Chrominance Information • Transforming (R,G,B) -> (Y,Cb,Cr) provides two advantages: • 1) The human visual system (HVS) is more sensitive to the Y component than to the Cb or Cr components. • 2) Cb and Cr are far less correlated with Y than R, G, and B are with one another, which reduces TV transmission bandwidth. • Cb and Cr both require far less bandwidth and can be sampled more coarsely (Shannon). • By doing so we can reduce the data rate without a perceptible loss of visual quality. Slide: Courtesy, Hung Nguyen IT 481, Fall 2006
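The coarser chroma sampling described above can be sketched in a few lines. This is an illustrative 4:2:0-style downsampling by 2x2 averaging; the averaging filter and the function name are illustrative choices, not taken from the lecture:

```python
import numpy as np

def subsample_420(chroma):
    # Halve a chroma plane in both directions by averaging each 2x2
    # neighborhood (4:2:0-style sub-sampling); dimensions must be even.
    h, w = chroma.shape
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

plane = np.arange(16, dtype=float).reshape(4, 4)
small = subsample_420(plane)   # 2x2 result: one chroma sample per 2x2 block
```

After this step, each macroblock carries a quarter as many Cb and Cr samples as Y samples, which is where the bandwidth saving comes from.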

  13. Color Space Conversion • In general, each pixel in a picture consists of three components: R (Red), G (Green), B (Blue). (R,G,B) must be converted to (Y,Cb,Cr) in MPEG-1 before processing • We can view the color value of each pixel in either the RGB color space or the YCbCr color space • Because the (Y,Cb,Cr) components are less correlated than (R,G,B), coding with (Y,Cb,Cr) is more efficient. • (Y,U,V) is also sometimes used to denote (Y,Cb,Cr); strictly, however, it refers to the analog TV equivalent Slide: Courtesy, Hung Nguyen IT 481, Fall 2006
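A common concrete form of this conversion is the BT.601 matrix used with JPEG/MPEG-1-era material; the coefficients below are the standard full-range BT.601 values, shown as a sketch rather than as the lecture's own equations:

```python
# BT.601 full-range RGB -> YCbCr (the matrix commonly used for
# JPEG/MPEG-1-era material); assumes R, G, B in [0, 255].
def rgb_to_ycbcr(r, g, b):
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr

# A pure gray has no chroma: Cb and Cr sit at the midpoint 128.
y, cb, cr = rgb_to_ycbcr(200.0, 200.0, 200.0)
```

Note how the chroma rows sum to zero: equal R, G, B (a gray) produces neutral Cb = Cr = 128, which is one way to see that the components are decorrelated.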

  14. RGB Image IT 481, Fall 2006

  15. Compressed Image (QSF=24) IT 481, Fall 2006

  16. Luminance Plane (Y) IT 481, Fall 2006

  17. Blue Chrominance Plane (Cb) IT 481, Fall 2006

  18. Red Chrominance Plane (Cr) IT 481, Fall 2006

  19. Red IT 481, Fall 2006

  20. Green IT 481, Fall 2006

  21. Blue IT 481, Fall 2006

  22. DCT (Discrete Cosine Transform) • The DCT converts data from the spatial domain to the frequency domain. The higher-frequency coefficients can be more coarsely quantized without a perceived loss of image quality, because the HVS is less sensitive to the higher frequencies and they contain less energy. • The DCT coefficient at location (0,0) is called the DC coefficient; the remaining values are called AC coefficients. In general, larger quantization steps are used for the higher-frequency AC coefficients. Higher precision is required for the DC term in order to avoid blocking artifacts in the reconstructed image. • MPEG-1 uses an 8x8 DCT, which converts an 8x8 pixel block into another 8x8 block of coefficients. In general most of the energy is concentrated in the top-left corner. • After quantization, most entries of the transformed matrix may be zero, so a zig-zag scan followed by run-length coding can achieve a high compression ratio. Slide: Courtesy, Hung Nguyen IT 481, Fall 2006
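The energy-compaction claim can be checked numerically with a small self-contained 8x8 DCT. This is a naive orthonormal DCT-II built from its cosine basis matrix (function names are illustrative, not from the lecture):

```python
import numpy as np

# Naive orthonormal 8x8 DCT-II, built from its cosine basis matrix:
# B[u, x] = C(u)/2 * cos((2x+1) u pi / 16), C(0) = 1/sqrt(2), else 1.
N = 8
k = np.arange(N)
C = np.where(k == 0, 1.0 / np.sqrt(2.0), 1.0)
B = (C[:, None] / 2.0) * np.cos((2 * k[None, :] + 1) * k[:, None] * np.pi / (2 * N))

def dct2(block):    # forward 2-D transform: F = B f B^T
    return B @ block @ B.T

def idct2(coeffs):  # inverse 2-D transform: f = B^T F B
    return B.T @ coeffs @ B

# A block varying only left-to-right: all of its energy lands in the
# top row of coefficients (no vertical frequency content), illustrating
# how smooth content concentrates near the top-left corner.
f = np.tile(np.arange(8.0), (8, 1))
F = dct2(f)
```

Because the basis matrix is orthonormal, `idct2(dct2(f))` recovers `f` exactly (up to floating-point error), so all information loss in the codec comes from quantization, not from the transform itself.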

  23. Transform Coding (TC) • Pack the signal energy into as few transform coefficients as possible • The DCT yields nearly optimal energy compaction • A 2-dimensional DCT with a block size of 8x8 pixels is commonly used in today's image coders • The transform is followed by quantization and entropy coding Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  24. 2D DCT and IDCT
Forward DCT: F(u,v) = \frac{C(u)\,C(v)}{4} \sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\cos\frac{(2x+1)u\pi}{16}\cos\frac{(2y+1)v\pi}{16}
Inverse DCT: f(x,y) = \frac{1}{4} \sum_{u=0}^{7}\sum_{v=0}^{7} C(u)\,C(v)\,F(u,v)\cos\frac{(2x+1)u\pi}{16}\cos\frac{(2y+1)v\pi}{16}
where C(k) = \frac{1}{\sqrt{2}} for k = 0, C(k) = 1 otherwise, and u, v, x, y = 0, 1, 2, …, 7. Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  25. DCT Scan Modes • The zigzag scan used in MPEG-1 is suitable for progressive images, where frequency components have equal importance in the horizontal and vertical directions. (Frame pictures only) • In MPEG-2, an alternate scan is introduced because interlaced images tend to have higher-frequency components in the vertical direction; the scanning order therefore weights the higher vertical frequencies more than the corresponding horizontal frequencies. Selection between the two scan orders can be made on a picture-by-picture basis. (Frame and field pictures allowed) Slide: Courtesy, Hung Nguyen IT 481, Fall 2006
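The MPEG-1 zigzag order itself is easy to generate: walk the anti-diagonals of the block, alternating direction, so the low-frequency coefficients near the top-left come first. A sketch (the generator reproduces the standard order; the function name is mine):

```python
# Generate the 8x8 zigzag scan order: visit coefficients along
# anti-diagonals (constant row + col), alternating direction, so the
# low-frequency coefficients near (0, 0) come first.
def zigzag_order(n=8):
    order = []
    for s in range(2 * n - 1):                    # s = row + col
        diag = [(u, s - u) for u in range(n) if 0 <= s - u < n]
        if s % 2:                                 # odd diagonal: top-right -> bottom-left
            order.extend(diag)
        else:                                     # even diagonal: bottom-left -> top-right
            order.extend(reversed(diag))
    return order

scan = zigzag_order()                             # 64 (row, col) pairs
```

Reading coefficients in this order turns the typical 8x8 block into a short burst of significant values followed by a long tail of zeros, which is exactly what run-length coding exploits.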

  26. Motion Compensation • Try to match each block in the current picture to content in the previous picture. Matching is done by shifting each 8x8 block of the two successive pictures pixel by pixel in each direction -> motion vector • Subtract the two blocks -> difference block • Transmit the motion vector and the difference block Slide: Courtesy, Hung Nguyen IT 481, Fall 2006
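The matching step above amounts to a full-search block-matching loop. A toy sketch with an illustrative block size and search range (real MPEG encoders search over 16x16 macroblocks; the tiny sizes here are only for demonstration):

```python
import numpy as np

# Toy full-search block matching: slide the block over the previous
# picture within a small search window and keep the displacement with
# the minimum sum of absolute differences (SAD).
def best_match(prev, block, top, left, search=2):
    n = block.shape[0]
    best_sad, best_mv = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= prev.shape[0] - n and 0 <= x <= prev.shape[1] - n:
                sad = np.abs(prev[y:y + n, x:x + n] - block).sum()
                if best_sad is None or sad < best_sad:
                    best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad

prev = np.zeros((8, 8))
prev[2:4, 2:4] = 9.0                  # a bright 2x2 patch at row 2, col 2
cur_block = np.full((2, 2), 9.0)      # same content, taken at row 3, col 3 in the current frame
mv, sad = best_match(prev, cur_block, top=3, left=3)
```

A zero SAD means the difference block is all zeros, so only the motion vector needs to be sent; otherwise the (usually small) difference block is transformed and coded like any other block.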

  27. Quantization • In MPEG-1, a matrix called the quantizer (Q[i,j]) defines the quantization step. If X[i,j] is the DCT matrix of the same size as Q[i,j], X[i,j] is divided by Q[i,j]*QSF to obtain the quantized value matrix Xq[i,j], where QSF is the Quantization Scale Factor • Quantization equation: • Xq[i,j] = Round( X[i,j]/(Q[i,j]*QSF) ) • Inverse quantization (dequantization) reconstructs an approximation of the original value. • Inverse quantization equation: • X'[i,j] = QSF*Xq[i,j]*Q[i,j] • The difference between the actual value and the value reconstructed from the quantized value is called the quantization error. In general, if Q[i,j] is carefully designed, visual quality will not be affected. Slide: Courtesy, Hung Nguyen IT 481, Fall 2006
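The two equations on this slide translate directly into code. A minimal sketch using the slide's Xq and X' definitions (the matrix values are made up for illustration):

```python
import numpy as np

# The slide's elementwise quantization and inverse quantization:
#   Xq[i,j] = Round( X[i,j] / (Q[i,j] * QSF) )
#   X'[i,j] = QSF * Xq[i,j] * Q[i,j]
def quantize(X, Q, qsf):
    return np.round(X / (Q * qsf)).astype(int)

def dequantize(Xq, Q, qsf):
    return qsf * Xq * Q

# Made-up coefficient and quantizer values, for illustration only.
X  = np.array([[125.0, 33.0], [-47.0, 5.0]])
Q  = np.array([[8.0, 16.0], [16.0, 32.0]])
Xq = quantize(X, Q, qsf=2)
err = X - dequantize(Xq, Q, qsf=2)    # the quantization error
```

Raising QSF coarsens every step at once, which is how an encoder trades quality for bit rate without redesigning Q[i,j].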

  28. Quantization (cont’d) Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  29. Average Distribution of AC Coefficients IT 481, Fall 2006

  30. MPEG (Moving Picture Experts Group) • Established in January 1988 • Operates in the framework of the Joint ISO/IEC Technical Committee • ISO: International Organization for Standardization • IEC: International Electrotechnical Commission • First meeting was in May 1988, with 25 experts participating • Has grown to 350 experts from 200 companies in some 20 countries • As a rule, MPEG meets in March, July and November, and can meet more often as needed Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  31. MPEG-1 – Coding of Moving Pictures and Associated Audio • Request for Proposal (RFP) July 1989 • Adopted in 1993 • Coding of the audiovisual signal at 1.5 Mbps • Audio coding (separate from speech coding) at 256 Kbps per channel PCM • Five parts: systems, video, audio, conformance testing and software simulation Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  32. MPEG-1 Overview • In MPEG-1, video is represented as a sequence of pictures, and each picture is treated as a two-dimensional array of pixels • The color of each pixel consists of three components: Y (luminance), Cb and Cr (two chrominance components) • Composite video, aka baseband video or RCA video, is the analog waveform that conveys the image data in a conventional National Television System Committee (NTSC) television signal • Composite video contains chrominance (hue and saturation) and luminance (brightness) information, along with synchronization and blanking pulses • In order to achieve a high compression ratio, MPEG-1 must use hybrid coding techniques to reduce both spatial redundancy and temporal redundancy Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  33. MPEG-1 Overview • Audio/video on CD-ROM (1.5 Mbps, SIF: 352x240) • Maximum: 1.856 Mbps, 768x576 pixels • Started late 1988, tests in 10/89, Committee Draft 9/90 • ISO/IEC 11172-1~5 (systems, video, audio, compliance, software). • Prompted an explosion of digital video applications: MPEG-1 Video CD and downloadable video over the Internet • Software-only decoding, made possible by the introduction of Pentium chips, was key to its success in the commercial market • MPEG-1 Audio • Offers 3 coding options (3 layers); higher layers have higher coding efficiency with more computation • MP3 = MPEG-1 Layer 3 audio Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  34. MPEG-2 vs. MPEG-1 • MPEG-2 is a superset of MPEG-1. • Generally, MPEG-1 is used for CD-ROM or Video CD (VCD) and MPEG-2 is used for broadcast or DVD. • One current difference between MPEG-1 and MPEG-2 is that MPEG-2 has implemented variable bit rate. • MPEG-2 also is what’s known as a closed format, meaning that a license fee must be paid to use the decoding algorithms, where MPEG-1 can be implemented free of charge. Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  35. MPEG2 vs. MPEG1 (cont’d) • MPEG1 only handles progressive sequences specified by Source Input Format (SIF). • MPEG2 is targeted primarily at interlaced, as opposed to progressive for MPEG-1, sequences and at higher resolution. • Different DCT modes and scanning methods are developed for interlaced sequences. • More sophisticated motion estimation methods (frame/field prediction mode) are developed to improve estimation accuracy for interlaced sequences. • MPEG2 has various scalability modes. • MPEG2 has various profiles and levels, each combination targeted for a different application Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  36. MPEG Encoding • Display order example: I1 B1 B2 B3 P1 B4 B5 B6 P2 B7 B8 B9 I2 • Frame types: • I (Intra): the complete image is encoded, similar to JPEG • P (Forward Predicted): motion coded relative to the previous I or P frame • B (Bidirectionally Predicted): motion coded relative to the previous and future I and P frames Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  37. Frame Reconstruction (I & P Frames Only) • Reconstruction sequence: I1, I1+P1, I1+P1+P2, then I2 • An I frame is a complete image • P frames (P1, P2) provide a series of updates to the most recent I frame Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  38. Using Forward-Backward Prediction • If only forward prediction is used, there are uncovered areas (such as the block behind the car in Frame N) for which we may not be able to find a good match in the previous reference picture (Frame N-1). • On the other hand, backward prediction can properly predict these uncovered areas, since they are available in the future reference picture, i.e. Frame N+1 in this example. • New objects, such as an airplane moving into the picture, cannot be predicted from the previous picture, but can be predicted from the future picture. Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  39. Frame Reconstruction (cont’d) • Anchor sequence: I1, I1+P1, I1+P1+P2, I2 • B frames (B1-B9) interpolate between the frames represented by the I’s and P’s Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  40. Transmission Order of the Frames Slide: Courtesy, Hung Nguyen IT 481, Fall 2006
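Although the slide's diagram is not reproduced here, the reordering rule it illustrates is simple: every run of B-frames is transmitted after the I- or P-frame that closes it, because the decoder needs both reference frames before it can decode any B-frame in between. A sketch of that rule, using the GOP from the earlier MPEG Encoding slide (function name is mine):

```python
# Reorder a display-order GOP into transmission (coded) order: each run
# of B-frames is held back until the anchor (I or P) that closes it has
# been sent, since the decoder needs both references before any B.
def transmission_order(display):
    out, held_b = [], []
    for frame in display:
        if frame.startswith("B"):
            held_b.append(frame)      # wait for the next anchor
        else:
            out.append(frame)         # the anchor goes first...
            out.extend(held_b)        # ...then the B's that straddle it
            held_b = []
    return out + held_b

display = ["I1", "B1", "B2", "B3", "P1", "B4", "B5", "B6",
           "P2", "B7", "B8", "B9", "I2"]
coded = transmission_order(display)
```

This reordering is also why B-frames add latency: the encoder must buffer each B-run until its future reference has been coded.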

  41. Intra-frame Encoding Process • Decompose the image into three components in RGB space • Convert RGB to YCbCr • Divide the image into macroblocks (each macroblock has 6 blocks: 4 for Y, 1 for Cb, 1 for Cr) • Apply the DCT to each block • After the DCT, quantize each coefficient • Then use a zig-zag scan to gather the AC values • Use DPCM to encode the DC value, then use VLC to encode it • Use RLE to encode the AC values, then use VLC to encode them IT 481, Fall 2006
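The RLE step in the list above is conventionally done as (zero-run, level) pairs over the zigzag-scanned AC coefficients, terminated by an end-of-block marker. A sketch of that convention (the VLC table lookup that follows is omitted, and the names are illustrative):

```python
# Run-length encode zigzag-scanned, quantized AC coefficients as
# (zero-run, level) pairs with an end-of-block marker.
EOB = "EOB"

def rle_ac(ac):
    pairs, run = [], 0
    for level in ac:
        if level == 0:
            run += 1                  # extend the current run of zeros
        else:
            pairs.append((run, level))
            run = 0
    pairs.append(EOB)                 # any trailing zeros collapse into EOB
    return pairs

pairs = rle_ac([5, 0, 0, -2, 0, 0, 0, 1, 0, 0, 0, 0])
```

Because quantization zeroes most high-frequency coefficients and the zigzag scan groups them at the end, a 63-entry AC list typically shrinks to a handful of pairs plus EOB.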

  42. I-Picture Encoding Flow Chart Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  43. IT 481, Fall 2006

  44. Inter-frame Coding • The picture types that use inter-frame coding are P pictures and B pictures • Coding of P pictures is more complex than of I pictures, since motion-compensated macroblocks may be constructed • The difference between the motion-compensated macroblock and the current macroblock is transformed with a 2-dimensional DCT, giving an array of 8 by 8 transform coefficients. • The coefficients are quantized to produce a set of quantized coefficients, which are then encoded using a run-length coding technique. Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  45. Inter-frame Encoding Process • Decompose the image into three components in RGB space • Convert RGB to YCbCr • Perform motion estimation to record the difference between the frame being encoded and the reference frame stored in the frame buffer • Divide the image into macroblocks (each macroblock has 6 blocks: 4 for Y, 1 for Cb, 1 for Cr) • Apply the DCT to each block • Quantize each coefficient • Use a zig-zag scan to gather the AC values • Reconstruct the frame and store it in the frame buffer if necessary • Apply DPCM to encode the DC value, then use VLC to encode it • Use RLE to encode the AC values, then use VLC to encode them Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  46. Predictive Coding • Predictive coding is a technique for reducing statistical redundancy: the current value is used to predict the next value, and their difference (called the prediction error) is coded. The more precisely the next value is predicted, the smaller the prediction error will be. • We can therefore encode the prediction error with fewer bits than the actual value. MPEG-1 uses DPCM (Differential Pulse Code Modulation), a form of predictive coding, and applies it only to the DC coefficients Slide: Courtesy, Hung Nguyen IT 481, Fall 2006
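DPCM on the DC coefficients can be sketched directly: each block's DC value is transmitted as its difference from the previous block's DC value, and the decoder accumulates the differences. A minimal illustration (function names are mine):

```python
# DPCM over the DC coefficients of successive blocks: transmit each DC
# value as its difference from the previous one (the prediction error).
def dpcm_encode(dc_values, predictor=0):
    diffs = []
    for dc in dc_values:
        diffs.append(dc - predictor)
        predictor = dc                # the prediction for the next block
    return diffs

def dpcm_decode(diffs, predictor=0):
    values = []
    for d in diffs:
        predictor += d                # accumulate the prediction errors
        values.append(predictor)
    return values

diffs = dpcm_encode([120, 122, 119, 119])
```

Neighboring blocks usually have similar average brightness, so after the first value the differences are small numbers that the subsequent VLC step can code in very few bits.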

  47. Motion Compensation (MC) And Motion Estimation (ME) • Motion estimation predicts the values of a block of pixels in the next picture using a block in the current picture. The location difference between these blocks is called the motion vector, and the difference between the two blocks is called the prediction error. • In MPEG-1, the encoder must calculate the motion vector and the prediction error. When the decoder obtains this information, it can use it together with the current picture to reconstruct the next picture. • This process is usually called motion compensation. In general, motion compensation is the inverse process of motion estimation Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  48. Motion Estimation (ME) Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  49. Motion Compensation (MC) Slide: Courtesy, Hung Nguyen IT 481, Fall 2006

  50. P-Frame Encoding: Macroblock Structure IT 481, Fall 2006
