1 / 39

Multimedia Systems Lecture 8 – MPEG Video and Audio

Multimedia Systems Lecture 8 – MPEG Video and Audio. MPEG General Information. Goal: data compression 1.2 Mbps MPEG defines video, audio coding and system data streams with synchronization MPEG information Aspect ratios: 1:1 (CRT), 4:3 (NTSC), 16:9 (HDTV)

ahmed-haney
Télécharger la présentation

Multimedia Systems Lecture 8 – MPEG Video and Audio

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multimedia Systems Lecture 8 – MPEG Video and Audio

  2. MPEG General Information • Goal: data compression 1.2 Mbps • MPEG defines video, audio coding and system data streams with synchronization • MPEG information • Aspect ratios: 1:1 (CRT), 4:3 (NTSC), 16:9 (HDTV) • Refresh frequencies: (23.975, 24, 25, 29.97, 30, 50, 59.94, 60) Hz

  3. 1- MPEG Image Preparation (Resolution and Dimension) • MPEG defines exactly format: • Image consists of three components: Luminance and two chrominance components (2:1:1) • Resolution of luminance comp: X1 ≤ 768; Y1 ≤ 576 pixels • Pixel precision is 8 bits for each component

  4. MPEG Image Preparation - Blocks • Each image is divided into macro-blocks. • Macro-block : 16x16 pixels for luminance; 8x8 for each chrominance component. • Macro-blocks are useful for Motion Estimation. • No MCUs which implies sequential non-interleaving order of pixels values.

  5. 2- MPEG Video Processing • Intra frames (I frames) • are coded without using information about other frames. • is treated as still image. • Predictive frames (P frames) • encode using information from previous I and/or P reference frame. • Bi-directional frames (B frames) • encode from previous and following I and/or P frames.

  6. Selecting I, P, or B Frames • Heuristics • change of scenes should generate I frame • limit B and P frames between I frames • B frames are computationally intense

  7. MPEG Video I-Frames I-frames – points of random access in MPEG stream I-frames use 8x8 blocks defined within Macro-block as in JPEG No quantization table for all DCT coefficients, only quantization factor

  8. MPEG Video P-Frames Motion Estimation Method Predictive coded frames require information of previous I and/ or P frame for encoding/decoding For Temporary Redundancy we determine last P or I frame that is most similar to the block under consideration

  9. Motion Computation for P-Frames • Predictive search • Look for match window within a given search window • Match window – macro-block • Search window – arbitrary window size depending how far away are we willing to look • Displacement of two match windows is expressed by motion vector

  10. Matching Methods • SSD metric: • Minimum error represents best match • must be below a specified threshold • error and perceptual similarity not always correlated

  11. Example of Finding Minimal SSD

  12. Syntax of P-Frame Addr: address the syntax of P frame Type: INTRA block is specified if no good match was found Quant: quantization value per macro-block (vary quantization to fine-tune compression) Motion Vector: a 2D vector used for motion compensation provides offset from coordinate position in target image to coordinates in reference image CBP(Coded Block Pattern): bit mask indicates which blocks are present

  13. MPEG Video B-Frames Bi-directionally Predictive-coded frames

  14. 3- MPEG Video Quantization • AC coefficients of B/P frames are usually large values, I frames have smaller values • Thus, MPEG adjust quantization accordingly. • If data rate increases over threshold, then quantization enlarges step size (increase quantization factor Q). • If data rate decreases below threshold, then • quantization decreases Q and is performed with • finer granularity.

  15. MPEG Audio Encoding • MPEG audio coding is compatible with the coding of audio data used for Compact Disc Digital Audio (CD-DA) and Digital Audio Tape (DAT). • Characteristics: • Precision 16 bits per sample value. • Sampling frequency: 32KHz, 44.1 KHz, 48 KHz. • Three quality levels (layers) are defined with different encoding and decoding complexity.

  16. MPEG Audio Encoding Steps

  17. MPEG Audio Filter Bank • Filter bank divides input into multiple sub-bands • (32 equal frequency sub-bands).

  18. MPEG Audio Psycho-acoustic Model • MPEG audio compresses by removing acoustically irrelevant parts of audio signals. • Takes advantage of human auditory systems inability to hear quantization noise under auditory masking. • Auditory masking: occurs whenever the presence of a strong audio signal makes a temporal or spectral neighborhood of weaker audio signals imperceptible.

  19. MPEG/audio divides audio signal into frequency sub-bands that approximate critical bands. Then we quantize each sub-band according to the audibility of quantization noise within the band

  20. MPEG Audio Bit Allocation • This process determines number of code bits allocated to each sub-band based on information from the psycho-acoustic model • Algorithm: • Compute mask-to-noise ratio: MNR=SNR-SMR • Standard provides tables that give estimates for SNR resulting from quantizing to a given number of quantizer levels • Get MNR for each sub-band • Search for sub-band with the lowest MNR • Allocate code bits to this sub-band. • If sub-band gets allocated more code bits than appropriate, look up new estimate of SNR and repeat step 1

  21. MPEG Audio Comments • Precision of 16 bits per sample is needed to get good SNR ratio. • Noise we are getting is quantization noise from the digitization process. • For each added bit, we get 6 dB better SNR ratio. • Masking effect means that we can raise the noise floor around a strong sound because the noise will be masked away. • Raising noise floor is the same as using fewer bits and using less bits is the same as compression.

  22. MPEG-4 Video

  23. Original MPEG-4 • Conceived in 1992 to address very low bit rate audio and video (64 Kbps). • Required quantum leaps in compression • beyond statistical- and DCT- based techniques • committee felt it was possible within 5 years • Quantum leap did not happen

  24. The “New” MPEG-4 • Support object-based features for content. • Enable dynamic rendering of content • defer composition until decoding • Support convergence among digital video, synthetic environments, and the Internet.

  25. MPEG-4 VS. MPEG-1 & MPEG-2 • MPEG-4 has audiovisual objects (AVOs), previous don’t. • MPEG-4 defines coded representation of objects, such as text, • graphics, animated human bodies, previous don’t. • MPEG-4 supports hierarchical combination of AVOs to make • audiovisual scenes; support of dynamic tree-like structure of • AVOs that can be changed through user interaction. • MPEG-4 supports scene description.

  26. MPEG-4 VS. MPEG-1 & MPEG-2 • MPEG-4 supports fundamental video coding extensions: • Object-based scene layering and separate coding and decoding of layers; • Shape-adaptive DCT coding; • Object-based tool box for motion prediction; • MPEG-4 standard specifies how AVOs are bundled into one or • more elementary streams, characterized by QoS for • transmission. • Flexible multiplexing of elementary streams. • Definition of access units • Symmetric error codes

  27. MPEG-4 Components • Systems – defines architecture, multiplexing structure, syntax. • Video – defines video coding algorithms for animation of synthetic and natural hybrid video (Synthetic/Natural Hybrid Coding). • Audio – defines audio/speech coding, Synthetic/Natural Hybrid Coding such as MIDI and text-to-speech synthesis integration.

  28. MPEG-4 Components • Conformance Testing – defines compliance requirements for MPEG-4 bit stream and decoders. • Technical report. • DSM-CC Multimedia Integration Framework – defines Digital Storage Media . • Command & Control Multimedia Integration Format – specifies merging of broadcast, interactive and conversational multimedia for set-top-boxes and mobile stations.

  29. MPEG-4 Example

  30. MPEG-4 Example

  31. Media Objects • An object is called a media object • real and synthetic images; analog and synthetic audio; animated faces; interaction. • Compose media objects into a hierarchical representation • form compound, dynamic scenes.

  32. Composition Scene character furniture sprit voice desk globe

  33. Composition (cont.) • Encode objects in separate channels • encode using most efficient mechanism • transmit each object in a separate stream • Composition takes place at the decoder,rather than at the encoder • requires a binary scene description (BIFS) • BIFS is low-level language for describing: • hierarchical, spatial, and temporal relations

  34. MPEG-4 Rendering

  35. Interaction as Objects • Change colors of objects. • Toggle visibility of objects. • Navigate to different content sections. • Select from multiple camera views • change current camera angle • Standardizes content and interaction • e.g., broadcast HDTV and stored DVD

  36. Hierarchical Model • Each MP4 movie composed of tracks • each track composed of media elements (one reserved for BIFS information); • each media element is an object; • each object is a audio, video, sprite, etc. • Each object specifies its: • spatial information relative to a parent; • temporal information relative to global timeline.

  37. Synchronization • Global timeline (high-resolution units) • e.g., 600 units/sec • Each continuous track specifies relation • e.g., if a video is 30 fps, then a frame should be displayed every 20 units • Others specify start/end time

  38. MPEG-4 Audio • Bit-rate 2-320kbps • Sampling up to 96Khz • Scalable for variable rates • MPEG-4 defines set of coders • Parametric Coding Techniques: low bit-rate 2-6kbps, 8kHz sampling frequency • Code Excited Linear Prediction: medium bit-rates 6-24 kbps, 8 and 16 kHz sampling rate • Time Frequency Techniques: high quality audio 16 kbps and higher bit-rates, sampling rate > 7 kHz

  39. The End

More Related