Multimedia Systems Lecture 8 – MPEG Video and Audio

Multimedia Systems Lecture 8 – MPEG Video and Audio

MPEG General Information • Goal: data compression 1.2 Mbps • MPEG defines video, audio coding and system data streams with synchronization • MPEG information • Aspect ratios: 1:1 (CRT), 4:3 (NTSC), 16:9 (HDTV) • Refresh frequencies: (23.975, 24, 25, 29.97, 30, 50, 59.94, 60) Hz

1- MPEG Image Preparation (Resolution and Dimension) • MPEG defines exactly format: • Image consists of three components: Luminance and two chrominance components (2:1:1) • Resolution of luminance comp: X1 ≤ 768; Y1 ≤ 576 pixels • Pixel precision is 8 bits for each component

MPEG Image Preparation - Blocks • Each image is divided into macro-blocks. • Macro-block : 16x16 pixels for luminance; 8x8 for each chrominance component. • Macro-blocks are useful for Motion Estimation. • No MCUs which implies sequential non-interleaving order of pixels values.

2- MPEG Video Processing • Intra frames (I frames) • are coded without using information about other frames. • is treated as still image. • Predictive frames (P frames) • encode using information from previous I and/or P reference frame. • Bi-directional frames (B frames) • encode from previous and following I and/or P frames.

Selecting I, P, or B Frames • Heuristics • change of scenes should generate I frame • limit B and P frames between I frames • B frames are computationally intense

MPEG Video I-Frames I-frames – points of random access in MPEG stream I-frames use 8x8 blocks defined within Macro-block as in JPEG No quantization table for all DCT coefficients, only quantization factor

MPEG Video P-Frames Motion Estimation Method Predictive coded frames require information of previous I and/ or P frame for encoding/decoding For Temporary Redundancy we determine last P or I frame that is most similar to the block under consideration

Motion Computation for P-Frames • Predictive search • Look for match window within a given search window • Match window – macro-block • Search window – arbitrary window size depending how far away are we willing to look • Displacement of two match windows is expressed by motion vector

Matching Methods • SSD metric: • Minimum error represents best match • must be below a specified threshold • error and perceptual similarity not always correlated

Example of Finding Minimal SSD

Syntax of P-Frame Addr: address the syntax of P frame Type: INTRA block is specified if no good match was found Quant: quantization value per macro-block (vary quantization to fine-tune compression) Motion Vector: a 2D vector used for motion compensation provides offset from coordinate position in target image to coordinates in reference image CBP(Coded Block Pattern): bit mask indicates which blocks are present

MPEG Video B-Frames Bi-directionally Predictive-coded frames

3- MPEG Video Quantization • AC coefficients of B/P frames are usually large values, I frames have smaller values • Thus, MPEG adjust quantization accordingly. • If data rate increases over threshold, then quantization enlarges step size (increase quantization factor Q). • If data rate decreases below threshold, then • quantization decreases Q and is performed with • finer granularity.

MPEG Audio Encoding • MPEG audio coding is compatible with the coding of audio data used for Compact Disc Digital Audio (CD-DA) and Digital Audio Tape (DAT). • Characteristics: • Precision 16 bits per sample value. • Sampling frequency: 32KHz, 44.1 KHz, 48 KHz. • Three quality levels (layers) are defined with different encoding and decoding complexity.

MPEG Audio Encoding Steps

MPEG Audio Filter Bank • Filter bank divides input into multiple sub-bands • (32 equal frequency sub-bands).

MPEG Audio Psycho-acoustic Model • MPEG audio compresses by removing acoustically irrelevant parts of audio signals. • Takes advantage of human auditory systems inability to hear quantization noise under auditory masking. • Auditory masking: occurs whenever the presence of a strong audio signal makes a temporal or spectral neighborhood of weaker audio signals imperceptible.

MPEG/audio divides audio signal into frequency sub-bands that approximate critical bands. Then we quantize each sub-band according to the audibility of quantization noise within the band

MPEG Audio Bit Allocation • This process determines number of code bits allocated to each sub-band based on information from the psycho-acoustic model • Algorithm: • Compute mask-to-noise ratio: MNR=SNR-SMR • Standard provides tables that give estimates for SNR resulting from quantizing to a given number of quantizer levels • Get MNR for each sub-band • Search for sub-band with the lowest MNR • Allocate code bits to this sub-band. • If sub-band gets allocated more code bits than appropriate, look up new estimate of SNR and repeat step 1

MPEG Audio Comments • Precision of 16 bits per sample is needed to get good SNR ratio. • Noise we are getting is quantization noise from the digitization process. • For each added bit, we get 6 dB better SNR ratio. • Masking effect means that we can raise the noise floor around a strong sound because the noise will be masked away. • Raising noise floor is the same as using fewer bits and using less bits is the same as compression.

MPEG-4 Video

Original MPEG-4 • Conceived in 1992 to address very low bit rate audio and video (64 Kbps). • Required quantum leaps in compression • beyond statistical- and DCT- based techniques • committee felt it was possible within 5 years • Quantum leap did not happen

The “New” MPEG-4 • Support object-based features for content. • Enable dynamic rendering of content • defer composition until decoding • Support convergence among digital video, synthetic environments, and the Internet.

MPEG-4 VS. MPEG-1 & MPEG-2 • MPEG-4 has audiovisual objects (AVOs), previous don’t. • MPEG-4 defines coded representation of objects, such as text, • graphics, animated human bodies, previous don’t. • MPEG-4 supports hierarchical combination of AVOs to make • audiovisual scenes; support of dynamic tree-like structure of • AVOs that can be changed through user interaction. • MPEG-4 supports scene description.

MPEG-4 VS. MPEG-1 & MPEG-2 • MPEG-4 supports fundamental video coding extensions: • Object-based scene layering and separate coding and decoding of layers; • Shape-adaptive DCT coding; • Object-based tool box for motion prediction; • MPEG-4 standard specifies how AVOs are bundled into one or • more elementary streams, characterized by QoS for • transmission. • Flexible multiplexing of elementary streams. • Definition of access units • Symmetric error codes

MPEG-4 Components • Systems – defines architecture, multiplexing structure, syntax. • Video – defines video coding algorithms for animation of synthetic and natural hybrid video (Synthetic/Natural Hybrid Coding). • Audio – defines audio/speech coding, Synthetic/Natural Hybrid Coding such as MIDI and text-to-speech synthesis integration.

MPEG-4 Components • Conformance Testing – defines compliance requirements for MPEG-4 bit stream and decoders. • Technical report. • DSM-CC Multimedia Integration Framework – defines Digital Storage Media . • Command & Control Multimedia Integration Format – specifies merging of broadcast, interactive and conversational multimedia for set-top-boxes and mobile stations.

MPEG-4 Example

Media Objects • An object is called a media object • real and synthetic images; analog and synthetic audio; animated faces; interaction. • Compose media objects into a hierarchical representation • form compound, dynamic scenes.

Composition Scene character furniture sprit voice desk globe

Composition (cont.) • Encode objects in separate channels • encode using most efficient mechanism • transmit each object in a separate stream • Composition takes place at the decoder,rather than at the encoder • requires a binary scene description (BIFS) • BIFS is low-level language for describing: • hierarchical, spatial, and temporal relations

MPEG-4 Rendering

Interaction as Objects • Change colors of objects. • Toggle visibility of objects. • Navigate to different content sections. • Select from multiple camera views • change current camera angle • Standardizes content and interaction • e.g., broadcast HDTV and stored DVD

Hierarchical Model • Each MP4 movie composed of tracks • each track composed of media elements (one reserved for BIFS information); • each media element is an object; • each object is a audio, video, sprite, etc. • Each object specifies its: • spatial information relative to a parent; • temporal information relative to global timeline.

Synchronization • Global timeline (high-resolution units) • e.g., 600 units/sec • Each continuous track specifies relation • e.g., if a video is 30 fps, then a frame should be displayed every 20 units • Others specify start/end time

MPEG-4 Audio • Bit-rate 2-320kbps • Sampling up to 96Khz • Scalable for variable rates • MPEG-4 defines set of coders • Parametric Coding Techniques: low bit-rate 2-6kbps, 8kHz sampling frequency • Code Excited Linear Prediction: medium bit-rates 6-24 kbps, 8 and 16 kHz sampling rate • Time Frequency Techniques: high quality audio 16 kbps and higher bit-rates, sampling rate > 7 kHz

The End

Multimedia Systems Lecture 8 – MPEG Video and Audio