Unraveling Codecs & Formats for Multimedia Systems

INF SERV – Media Storage and Distribution Systems: Data Formats and Codecs 1/9 – 2003

Why codecs and formats? • Codecs (coders/decoders) • Determine how information is represented • Important for servers and distribution systems • Required sending speed • Amount of loss allowed • Buffers required • … • Formats • Determine how data is stored • Important for servers and distribution systems • Where is the data? • Where is the data about the data?

Media data

Medium: "thing in the middle"; here: a means to distribute and present information • Media affect human-computer interaction • The mantra of multimedia users: • Speaking is faster than writing • Listening is easier than reading • Showing is easier than describing

Dependence of Media • Time-independent (discrete) media: text, graphics • Time-dependent (continuous) media: audio, video, animation • Interdependent media: multimedia • "Continuous" refers to the user's impression of the data, not necessarily to its representation • Combined video and audio is multimedia; the relations between them must be specified

Properties of a Multimedia System • Flexibility • Provide mechanisms to handle all kinds of media, in particular discrete and continuous media • A VCR and a desktop publishing system for text and graphics are not multimedia systems • An editor with voice annotation is a multimedia system • Integration • Independent media storage • Computer-controlled media combination • Definition: A multimedia system is characterized by the integrated, computer-controlled handling of independent discrete and continuous media

Multimedia: Not Your Ordinary Data • Multimedia is different from traditional digital data: • High data volume • Continuous streaming • Several related streams • Quality of service

High Data Volume • Throughput: • Higher volume than for traditional data • Longer transactions than for traditional data • Requires • Performance and bandwidth • Resource management techniques • Compression • Typical values • Uncompressed video: 140 – 216 Mbit/s • Uncompressed audio (CD): 1.4 Mbit/s • Uncompressed speech: 64 Kbit/s • Compressed audio & video (VoD): down to 1.2 – 4 Mbit/s • Compressed audio & video (Conf.): down to 128 Kbit/s • Compressed speech: down to 6.2 Kbit/s
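The uncompressed audio figures follow directly from the PCM sampling parameters. A minimal sketch, assuming the standard CD parameters (44.1 kHz, 16 bit, stereo) and telephone-quality speech (8 kHz, 8 bit, mono); neither set of parameters is spelled out on the slide, and `pcm_bitrate` is a hypothetical helper:

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio in bit/s."""
    return sample_rate_hz * bits_per_sample * channels

cd_audio = pcm_bitrate(44_100, 16, 2)   # assumed CD parameters: 44.1 kHz, 16 bit, stereo
speech = pcm_bitrate(8_000, 8, 1)       # assumed telephone speech: 8 kHz, 8 bit, mono

print(f"CD audio: {cd_audio / 1e6:.2f} Mbit/s")   # 1.41 Mbit/s
print(f"Speech:   {speech / 1e3:.0f} kbit/s")      # 64 kbit/s
```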

Coding for distribution

Compression – Necessity • E.g., a video sequence • 25 images/s • PAL standard • 3 byte/pixel • YUV (luminance + 2 chrominance values) • RGB (red, green, blue values) • Image resolution 640 * 480 pixels • Data rate = 640 * 480 * 3 byte * 25/s = 23,040,000 byte/s ≈ 22 MByte/s (≈ 184 Mbit/s) • Only approx. 1/100 of the stream fits over ADSL • Approx. 1/16 over Ethernet • Approx. 1/2 over Fast Ethernet • Compression is necessary
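The arithmetic above can be checked in a few lines. A minimal sketch; the 2 Mbit/s ADSL downstream rate is an assumption (ADSL rates vary widely), while 10 and 100 Mbit/s are the nominal Ethernet and Fast Ethernet rates:

```python
WIDTH, HEIGHT = 640, 480
BYTES_PER_PIXEL = 3       # YUV or RGB, 8 bit per component
FRAMES_PER_SEC = 25       # PAL frame rate

rate_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL * FRAMES_PER_SEC
rate_bits = rate_bytes * 8

# Nominal link capacities in bit/s; the ADSL value is an assumed downstream rate.
links = {"ADSL (assumed 2 Mbit/s)": 2e6, "Ethernet": 10e6, "Fast Ethernet": 100e6}

print(f"{rate_bytes} byte/s = {rate_bytes / 2**20:.1f} MByte/s = {rate_bits / 1e6:.1f} Mbit/s")
for name, capacity in links.items():
    # fraction of the uncompressed stream that fits on the link
    print(f"{name}: about 1/{rate_bits / capacity:.0f} of the stream")
```

With these assumptions the ratios come out at roughly 1/92, 1/18 and 1/2, in line with the approximate figures on the slide.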

Compression – General Requirements • Dependence on application type: • Dialogue mode • Retrieval mode

Compression – Mode Dependent Requirements • Dialogue and retrieval mode requirements: • Synchronization of audio, video, and other media • Dialogue mode requirements: • End-to-end delay < 150ms • Compression and decompression in real-time • Symmetric • Retrieval mode requirements: • Fast forward and backward data retrieval • Random access within 1/2 s • Asymmetric • We look mainly at retrieval mode!

Compression Categories

Basic Encoding Steps

Run-Length Coding • Assumption • Long sequences of identical symbols • Example
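The slide's example did not survive in this transcript. The following is a minimal sketch on hypothetical data that encodes each run as a (symbol, run length) pair. Real formats usually add an escape or flag mechanism so that isolated symbols are not expanded, and that part is omitted here.

```python
from itertools import groupby

def rle_encode(symbols):
    """Run-length encode a sequence into (symbol, run length) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(symbols)]

def rle_decode(pairs):
    """Expand (symbol, run length) pairs back into the original sequence."""
    return [sym for sym, n in pairs for _ in range(n)]

data = list("AAAAABBBCCCCCCCCD")   # hypothetical input with long runs
pairs = rle_encode(data)
print(pairs)   # [('A', 5), ('B', 3), ('C', 8), ('D', 1)]
```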

Bit-Plane Coding • Assumption • Even longer sequences of identical bits • Example:
10,0,6,0,0,3,0,2,2,0,0,2,0,0,1,0, … ,0,0 (absolute values)
0,x,1,x,x,1,x,0,0,x,x,1,x,x,0,x, … ,x,x (sign bits; x = no sign coded for zero values)
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, … ,0,0 (MSB) → (0,1)
0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0, … ,0,0 (MSB-1) → (2,1)
1,0,1,0,0,1,0,1,1,0,0,1,0,0,0,0, … ,0,0 (MSB-2) → (0,0)(1,0)(2,0)(1,0)(0,0)(2,1)
0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0, … ,0,0 (MSB-3) → (5,0)(8,1)
• Each bit plane is coded as (RUN, EOP) pairs: RUN = number of 0s before the next 1, EOP = 1 if that 1 is the last one in the plane • Up to 20% savings over run-length coding can be achieved
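The example can be reproduced mechanically. This is a minimal sketch that takes the signs from the sign-bit row (0 = positive, 1 = negative) and stops after the first 16 values, because the trailing zeros produce no further symbols. An all-zero plane just yields an empty list here, whereas real codecs signal it with a dedicated code. The helper names are hypothetical.

```python
def bit_planes(values, num_planes):
    """Split absolute values into bit planes, most significant plane first."""
    mags = [abs(v) for v in values]
    return [[(m >> b) & 1 for m in mags] for b in range(num_planes - 1, -1, -1)]

def run_eop_symbols(plane):
    """Code one bit plane as (RUN, EOP) pairs: RUN = zeros before the next 1,
    EOP = 1 if that 1 is the last 1 in the plane."""
    ones = [i for i, bit in enumerate(plane) if bit]
    symbols, prev = [], -1
    for k, pos in enumerate(ones):
        symbols.append((pos - prev - 1, int(k == len(ones) - 1)))
        prev = pos
    return symbols

coeffs = [10, 0, -6, 0, 0, -3, 0, 2, 2, 0, 0, -2, 0, 0, 1, 0]
signs = ["x" if v == 0 else int(v < 0) for v in coeffs]   # sign-bit row of the example
for plane in bit_planes(coeffs, 4):
    print(run_eop_symbols(plane))
# [(0, 1)]
# [(2, 1)]
# [(0, 0), (1, 0), (2, 0), (1, 0), (0, 0), (2, 1)]
# [(5, 0), (8, 1)]
```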

Huffman Coding • Assumption • Some symbols occur more often than others • E.g., character frequencies of the English language • Fundamental principle • Frequently occurring symbols are coded with shorter bit strings

Huffman Coding • Example • Characters to be encoded: • A, B, C, D, E • Probability to occur: • p(A)=0.3, p(B)=0.3, p(C)=0.1, p(D)=0.15, p(E)=0.15

Huffman • Table and example of application to data stream
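The code table itself is lost in this transcript. Below is a minimal sketch that builds a Huffman code for the probabilities above using a priority queue. Ties between equal probabilities can be broken either way, so the bit strings may not match the original slide. Every valid Huffman code for this distribution still averages 2.25 bit/symbol, compared with 3 bit/symbol for a fixed-length code over five symbols. The helper names are hypothetical.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a prefix-free code table {symbol: bit string} from symbol probabilities."""
    tiebreak = count()  # unique counter so heapq never has to compare dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)   # least probable subtree
        p2, _, codes2 = heapq.heappop(heap)   # second least probable subtree
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

def encode(text, table):
    return "".join(table[ch] for ch in text)

def decode(bits, table):
    inverse = {code: sym for sym, code in table.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:          # prefix-free: the first match is the symbol
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

probs = {"A": 0.3, "B": 0.3, "C": 0.1, "D": 0.15, "E": 0.15}
table = huffman_code(probs)
avg_len = sum(p * len(table[s]) for s, p in probs.items())
print(table)
print(f"average code length: {avg_len:.2f} bit/symbol")   # 2.25
```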

JPEG • “JPEG”: Joint Photographic Experts Group • International standard • For digital compression and coding of continuous-tone still images: • Gray-scale • Color • Since 1992 • Joint effort of: • ISO/IEC JTC1/SC2/WG10 • Commission Q.16 of CCITT SGVIII • A compression ratio of about 10:1 yields reasonable results

JPEG • Very general compression scheme • Independence of • Image resolution • Image and pixel aspect ratio • Color representation • Image complexity and statistical characteristics • Well-defined interchange format of encoded data • Implementation in • Software only • Software and hardware • “Motion JPEG” for video compression • Sequence of JPEG-encoded images

JPEG • Sequence of compression steps • Different resolutions possible • Lossy or lossless mode • Lossless compression factor ~1.6:1 • Symmetric codec

JPEG – Baseline Mode: Quantization • Use of quantization tables for the DCT coefficients • Map an interval of real numbers to one integer number • Allows a different granularity for each coefficient
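The following is a minimal sketch of baseline-style DCT and quantization for one 8x8 block. It uses the orthonormal 2-D DCT-II, which scales the same way as the JPEG forward DCT, and the same level shift by 128 that JPEG applies to 8-bit samples. The quantization table is hypothetical: the step size simply grows with frequency, and it is not the example table from the JPEG standard. Python's `round()` rounds exact halves to even, which is acceptable for a sketch.

```python
import math

N = 8  # JPEG baseline works on 8x8 blocks

def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block (same scaling as the JPEG FDCT)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, qtable):
    """Map each real coefficient to an integer level: round(F / Q)."""
    return [[round(coeffs[u][v] / qtable[u][v]) for v in range(N)] for u in range(N)]

def dequantize(levels, qtable):
    """Decoder side: reconstruct approximate coefficients as level * Q."""
    return [[levels[u][v] * qtable[u][v] for v in range(N)] for u in range(N)]

# Hypothetical table: coarser steps for higher frequencies (not the JPEG example table).
qtable = [[1 + 2 * (u + v) for v in range(N)] for u in range(N)]

# A flat 8-bit block with value 136, level-shifted by 128 as in JPEG.
block = [[136 - 128] * N for _ in range(N)]
levels = quantize(dct2(block), qtable)
print(levels[0][0])                                       # 64: all energy in the DC coefficient
print(sum(lvl != 0 for row in levels for lvl in row))     # 1 non-zero level
```

Because every coefficient is divided by its own step size, the table controls how much precision each frequency keeps. The per-coefficient reconstruction error is at most half the step size.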

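JPEG – 4 Modes of Compression • Sequential DCT-based mode (including baseline) • Progressive DCT-based mode • Lossless mode • Hierarchical mode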

Motion JPEG • Use series of JPEG frames to encode video • Pro • Lossless mode – editing advantage • Frame-accurate seeking – editing advantage • Arbitrary frame rates – playback advantage • Arbitrary frame skipping – playback advantage • Scaling through progressive mode – distribution advantage • Min transmission delay = 1/framerate – conferencing advantage • Supported by popular frame grabbers • Contra • Series of JPEG-compressed images • No standard, no specification • Worse, several competing quasi-standards • No relation to audio • No inter-frame compression

H.261 (p x 64) • International standard • Video codec for video conferences at p x 64 kbit/s (ISDN) • Real-time encoding/decoding, max. signal delay of 150 ms • Constant data rate • Intraframe coding • DCT as in JPEG baseline mode • Interframe coding, motion estimation • Search for a similar macroblock in the previous image and compare • The position of this macroblock defines the motion vector • Code the difference between the similar macroblocks
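The following is a minimal sketch of the block-matching search described above. It assumes a full search over a ±2 pixel window and uses the sum of absolute differences (SAD) as the similarity measure. H.261 works on 16x16 macroblocks, but the toy frames here use 2x2 blocks to keep the example small. `sad` and `motion_vector` are hypothetical helper names.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))

def motion_vector(cur, ref, bx, by, size=2, search=2):
    """Full search: return ((dx, dy), cost) minimising SAD within +/-search pixels."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # skip candidate blocks that fall outside the reference image
            if not (0 <= by + dy and by + dy + size <= h and 0 <= bx + dx and bx + dx + size <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best

W = H = 6
ref = [[0] * W for _ in range(H)]
cur = [[0] * W for _ in range(H)]
patch = [[9, 8], [7, 6]]
for j in range(2):
    for i in range(2):
        ref[1 + j][1 + i] = patch[j][i]   # object at (x=1, y=1) in the previous image
        cur[2 + j][2 + i] = patch[j][i]   # moved to (x=2, y=2) in the current image

print(motion_vector(cur, ref, bx=2, by=2))   # ((-1, -1), 0)
```

The encoder then transmits the motion vector plus the (here all-zero) difference block instead of the block itself.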

MPEG (Moving Picture Experts Group) • International standard: compression of audio and video for playback (1.5 Mbit/s) • Real-time decoding • Sequence of I-, P-, and B-frames • Random access • at I-frames: directly • at P-frames: decode the previous I-frame (and any P-frames in between) first • at B-frames: decode the surrounding I- and P-frames first
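The random-access rules follow directly from the prediction structure. The sketch below assumes a frame sequence that starts with an I-frame, with P-frames predicted from the previous I- or P-frame and B-frames predicted from the surrounding I- or P-frames. The helper names are hypothetical. The sketch also shows why MPEG transmits frames out of display order.

```python
def decode_order(pattern: str) -> list[int]:
    """Reorder display-order frame indices into decoding (transmission) order.
    Each anchor (I or P) is sent before the B-frames that precede it in display
    order, because those B-frames are predicted from it."""
    order, pending_b = [], []
    for i, ftype in enumerate(pattern):
        if ftype == "B":
            pending_b.append(i)            # must wait for the next anchor
        else:
            order.append(i)
            order.extend(pending_b)
            pending_b = []
    if pending_b:
        raise ValueError("trailing B-frames need a following I- or P-frame")
    return order

def frames_to_decode(pattern: str, k: int) -> list[int]:
    """Display indices that must be decoded to show frame k (random access)."""
    last_i = max(i for i in range(k + 1) if pattern[i] == "I")
    # chain of anchors from the last I-frame up to frame k
    needed = {i for i in range(last_i, k + 1) if pattern[i] != "B"}
    if pattern[k] == "B":
        nxt = next(i for i in range(k + 1, len(pattern)) if pattern[i] != "B")
        needed |= {k, nxt}
    return sorted(needed)

seq = "IBBPBBPBBI"
print(decode_order(seq))          # [0, 3, 1, 2, 6, 4, 5, 9, 7, 8]
print(frames_to_decode(seq, 6))   # [0, 3, 6]     P-frame: I and earlier P first
print(frames_to_decode(seq, 5))   # [0, 3, 5, 6]  B-frame: both neighbouring anchors
```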

MPEG-2 • From MPEG-1 to MPEG-2 • Improvement in quality • From VCR to TV to HDTV • No CD-ROM-based constraints • Higher data rates • MPEG-1: about 1.5 Mbit/s • MPEG-2: 2-100 Mbit/s • Evolution • 1994: International Standard • Also published by ITU-T as H.262 • Prominent role for digital TV in DVB (Digital Video Broadcasting) and DVD (Digital Versatile Disc) • Commercial MPEG-2 realizations available

MPEG-2 • Beyond MPEG-1: • Higher quality encoding • Higher data rates • Interlaced modes • Use cases • Broadcast quality production • DVB-T: Terrestrial • DVB-S: Satellite • DVB-C: Cable • Program Stream • for post-processing, storage, and DVD distribution • Transport Stream • for broadcasting, error resilience • Scaling: • Signal-to-noise ratio (SNR) scaling: progressive compression, error-correcting codes • Spatial scaling: several pixel resolutions • Temporal scaling: frame dropping

MPEG-4 • MPEG-4 (ISO/IEC 14496) originally • Targeted at systems with very scarce resources • To support applications like • Mobile communication • Videophone and e-mail • Max. data rates and dimensions (roughly) • Between 4800 and 64000 bit/s • 176 columns x 144 lines x 10 frames/s • Further demand • To provide enhanced functionality that allows analysis and manipulation of image contents

MPEG-4 • Hence: find standardized ways to • Represent units of aural, visual or audiovisual content • "audio/visual objects" or AVOs • object coding independent of other objects, surroundings and background • natural and synthetic objects • Compose these objects together • i.e. creation of compound objects that form audiovisual scenes • Multiplex and synchronize the data associated with AVOs • for transportation over network channels providing a QoS (Quality-of-Service) • Interact with the audiovisual scene generated at the decoder’s site

MPEG-4: Scope • Definition of • "System Decoder Model" • specification for decoder implementations • Description language • binary syntax of an AV object’s bitstream representation • scene description information • Corresponding concepts, tools and algorithms, especially for • content-based compression of simple and compound audiovisual objects • manipulation of objects • transmission of objects • random access to objects • animation • scaling • error robustness

MPEG-4: Scope • Targeted bit rates for video and audio: • VLBV core • "Very Low Bit-rate Video" • 5 - 64 Kbit/s • image sequences with CIF resolution and up to 15 frames/s • Higher-quality video • 64 Kbit/s - 4 Mbit/s • quality like digital TV • Natural audio coding • 2 - 64 Kbit/s

MPEG-4: Video and Image Encoding • Encoding / decoding of • Rectangular images and video • coding similar to MPEG-1/2 • motion prediction • texture coding • Images and video of arbitrary shape • as done in conventional approach • 8x8 DCT or shape-adaptive DCT • plus coding of shape and transparency information • Encoder • Must generate timing information • speed of the encoder clock = time base • desired decoding times and/or expiration times • by using time stamps attached to the stream • Can specify the minimum buffer resources needed for decoding

MPEG-4: Composition of Scenes • Scene description includes: • Tree to define hierarchical relationships between objects • Objects’ positions in space and time • by converting the objects’ local coordinate system into a global coordinate system • Attribute value selection • e.g. pitch of sound, color, texture, animation parameters • Description based on some VRML concepts • VRML = "Virtual Reality Modeling Language" • Interaction with scenes • e.g. change viewing point, drag object, start/stop streams, select language
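The following is a minimal sketch of how a scene tree turns local object positions into global scene coordinates. It only covers 2-D translation and uses a hypothetical `Node` class. Real MPEG-4 scene descriptions also support rotation, scaling and timing, and those are not modelled here.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical scene-graph node: an object placed relative to its parent."""
    name: str
    offset: tuple[float, float]            # position in the parent's local coordinates
    children: list["Node"] = field(default_factory=list)

def global_positions(node, origin=(0.0, 0.0), out=None):
    """Walk the tree and convert each local offset into global scene coordinates."""
    if out is None:
        out = {}
    pos = (origin[0] + node.offset[0], origin[1] + node.offset[1])
    out[node.name] = pos
    for child in node.children:
        global_positions(child, pos, out)   # children are placed relative to this node
    return out

scene = Node("scene", (0, 0), [
    Node("background", (0, 0)),
    Node("speaker", (100, 50), [Node("face", (10, 5))]),   # face moves with the speaker
])
print(global_positions(scene))
# {'scene': (0.0, 0.0), 'background': (0.0, 0.0), 'speaker': (100.0, 50.0), 'face': (110.0, 55.0)}
```

Moving the "speaker" node moves every object below it, which is the point of the hierarchical description.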

MPEG-4: Example of a Composition

MPEG-4: Synthetic Objects • Visual objects: • Virtual parts of scenes • e.g. virtual background • Animation • e.g. animated faces • Audio objects: • "Text-to-speech" • speech generation from given text and prosodic parameters • face animation control • "Score-driven synthesis" • music generation from a score • more general than MIDI • Special effects

MPEG-4: Error Handling • Mobile communication: • Low bit-rate (< 64 Kbps) • Error-prone • MPEG-4 concepts for error handling: • Resynchronization • enables receiver to "tune in" again • based on markers within bitstream • Data recovery • enables receiver to reconstruct lost data • encode data in an error-resilient manner • Error concealment • enables receiver to bridge gaps in data • e.g. by repeating parts of old frames
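Concealment by repeating parts of old frames can be as simple as copying co-located blocks. The following minimal sketch assumes the receiver already knows which blocks were lost, for example after resynchronizing at the next marker. `conceal` is a hypothetical helper, and real decoders often use motion-compensated concealment instead.

```python
def conceal(frame, prev_frame, lost_blocks, size):
    """Fill each lost block (top-left corner (x, y), size x size pixels) with the
    co-located block of the previous frame. Returns a new frame."""
    out = [row[:] for row in frame]
    for bx, by in lost_blocks:
        for j in range(size):
            for i in range(size):
                out[by + j][bx + i] = prev_frame[by + j][bx + i]
    return out

prev = [[5] * 4 for _ in range(4)]
cur = [[7] * 4 for _ in range(4)]
for j in range(2):
    for i in range(2, 4):
        cur[j][i] = None          # block at (x=2, y=0) lost in transmission
fixed = conceal(cur, prev, lost_blocks=[(2, 0)], size=2)
print(fixed[0], fixed[2])         # [7, 7, 5, 5] [7, 7, 7, 7]
```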

Network-aware coding

Network-aware coding • Adapt to reality of the Internet • Content • Is created once, off-line • Is sent many times, under different circumstances • No guarantees concerning • Throughput • Jitter • Packet loss • Sending rate • Must adhere to rules • Often: don’t send more than TCP would • Can’t send at the best available encoding rate
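One common way to read "don't send more than TCP would" is TCP-friendly rate control. Under that approach, the sending rate is capped at the throughput a TCP flow would reach with the same round-trip time and loss rate. The sketch below assumes the simplified square-root TCP throughput model (Mathis et al.). Deployed schemes such as TFRC (RFC 5348) use a more detailed equation that also accounts for retransmission timeouts. `tcp_friendly_rate` is a hypothetical helper.

```python
import math

def tcp_friendly_rate(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state TCP throughput in byte/s using the simplified
    model rate = MSS / RTT * sqrt(3 / (2 p)); ignores timeouts and window limits."""
    return mss_bytes / rtt_s * math.sqrt(3 / (2 * loss_rate))

rate = tcp_friendly_rate(1460, 0.1, 0.01)   # 1460-byte segments, 100 ms RTT, 1 % loss
print(f"{rate * 8 / 1e6:.2f} Mbit/s")        # about 1.43 Mbit/s
```

A sender that follows this cap may have to stay well below the best available encoding rate, which is exactly the situation the approaches below are designed for.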

Approaches • Simulcast • Scalable coding • SNR Scalability • Temporal Scalability • Spatial Scalability • Fine Grained Scalability • Multiple Description Coding

Simulcast • Choose a set of sending rates • During content creation • Encode the content in the best possible quality below each sending rate • During transmission • Choose the version with the best admissible quality
(Figure: quality vs. sending rate for 3 simulcast rates, compared with a single-rate codec and with the best possible quality at each possible sending rate)
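The transmission-time decision for simulcast reduces to choosing the highest pre-encoded version that still fits. The following is a minimal sketch with hypothetical rates and a hypothetical helper name:

```python
def pick_simulcast_version(encoded_rates_kbps, available_kbps):
    """Return the highest pre-encoded rate that fits the currently allowed sending
    rate, or None if even the lowest version does not fit."""
    fitting = [r for r in encoded_rates_kbps if r <= available_kbps]
    return max(fitting) if fitting else None

versions = [256, 768, 1500]   # hypothetical: 3 simulcast rates chosen at content creation
print(pick_simulcast_version(versions, 1000))   # 768
print(pick_simulcast_version(versions, 200))    # None
```

When 1000 kbit/s are available, the 232 kbit/s gap to the chosen 768 kbit/s version is capacity that simulcast cannot use. This is the step-shaped curve in the figure.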

Scalable coding • Typically used as layered coding • A base layer • Provides basic quality • Must always be transferred • One or more enhancement layers • Improve quality • Transferred if possible
(Figure: quality vs. sending rate for base layer and enhancement layer, compared with the best possible quality at each possible sending rate)
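With layered coding, the sender always transmits the base layer and then adds enhancement layers in order for as long as they fit. An enhancement layer is useless without all the layers below it. The following is a minimal sketch with hypothetical layer rates and a hypothetical helper name:

```python
def layers_to_send(layer_rates_kbps, available_kbps):
    """Return (number of layers, total rate): the base layer (index 0) plus as many
    enhancement layers, in order, as fit into the available rate."""
    if layer_rates_kbps[0] > available_kbps:
        raise ValueError("base layer does not fit; it must always be transferred")
    used, count = 0, 0
    for rate in layer_rates_kbps:
        if used + rate > available_kbps:
            break                 # higher layers depend on this one, so stop here
        used += rate
        count += 1
    return count, used

layers = [256, 256, 512]   # hypothetical: base + 2 enhancement layers (kbit/s)
print(layers_to_send(layers, 1000))   # (2, 512): base + first enhancement layer
```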

Temporal Scalability • Frames can be dropped • In a controlled manner • Frame dropping does not violate dependencies • Low-gain example: B-frame dropping in MPEG-1
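In MPEG-1 no frame is predicted from a B-frame, so every B-frame can be removed without breaking a dependency. The following is a minimal sketch with a hypothetical frame pattern:

```python
def drop_b_frames(frames):
    """Keep only I- and P-frames as (display index, type) pairs. No other frame
    references a B-frame, so removing them never breaks a dependency."""
    return [(i, t) for i, t in enumerate(frames) if t != "B"]

seq = "IBBPBBPBBPBB"   # hypothetical frame pattern
kept = drop_b_frames(seq)
print(kept)                                     # [(0, 'I'), (3, 'P'), (6, 'P'), (9, 'P')]
print(len(kept), "of", len(seq), "frames kept")   # 4 of 12 frames kept
```

The frame rate drops to one third here. B-frames are usually the most strongly compressed frames, though, so the saving in data is smaller than the frame count suggests. That is why the gain is low.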

SNR Scalability • SNR – signal-to-noise ratio • Idea • Base layer • Is regularly DCT encoded • A lot of data is removed using quantization • Enhancement layer is regularly DCT encoded • Run Inverse DCT on quantized base layer • Subtract from original • DCT encode the result • If enhancement layer arrives at client • Add base and enhancement layer before running Inverse DCT
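The following is a minimal sketch of the two-layer SNR scheme with hypothetical coefficients and step sizes. Because the DCT is linear, the pixel-domain steps give the same residual as subtracting the dequantized base coefficients directly, up to rounding. Those steps are inverse DCT of the base layer, subtraction from the original, and DCT of the difference. The sketch therefore works on coefficients only, and the helper names are hypothetical.

```python
def snr_layers(coeffs, q_base, q_enh):
    """Split DCT coefficients into a coarsely quantized base layer and a finely
    quantized enhancement layer that carries the base layer's residual."""
    base = [round(c / q_base) for c in coeffs]
    residual = [c - b * q_base for c, b in zip(coeffs, base)]
    enh = [round(r / q_enh) for r in residual]
    return base, enh

def reconstruct(base, q_base, enh=None, q_enh=None):
    """Decoder: dequantize the base layer and, if it arrived, add the enhancement
    layer; the inverse DCT would then be applied to the result."""
    coeffs = [b * q_base for b in base]
    if enh is not None:
        coeffs = [c + e * q_enh for c, e in zip(coeffs, enh)]
    return coeffs

coeffs = [231.0, -47.5, 18.2, -6.9, 3.1, 0.8]   # hypothetical DCT coefficients
base, enh = snr_layers(coeffs, q_base=16, q_enh=2)

def max_error(rec):
    return max(abs(a - b) for a, b in zip(coeffs, rec))

print(max_error(reconstruct(base, 16)))           # 7.0  (base layer only)
print(max_error(reconstruct(base, 16, enh, 2)))   # 1.0  (base + enhancement)
```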

Spatial Scalability • Idea • Base layer • Downsample the original image (code only 1 pixel instead of 4) • Send it like a lower-resolution version • Enhancement layer • Subtract the (upsampled) base layer pixels from all pixels • Send it like a normal-resolution version • If the enhancement layer arrives at the client • Decode both layers • Add the layers
(Figure: example pixel values. Base layer: less data to code. Enhancement layer: better compression due to low values.)
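The following is a minimal sketch of the two spatial layers. It assumes the base layer keeps the top-left pixel of every 2x2 block, as in "1 pixel instead of 4", and that the decoder upsamples by pixel repetition. The image values and helper names are hypothetical.

```python
def downsample(img):
    """Base layer: keep the top-left pixel of every 2x2 block."""
    return [row[::2] for row in img[::2]]

def upsample(base):
    """Repeat each base pixel over its 2x2 block."""
    out = []
    for row in base:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(wide[:])
    return out

def enhancement(img, base):
    """Enhancement layer: original minus the upsampled base layer (mostly small values)."""
    return [[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(img, upsample(base))]

def reconstruct_spatial(base, enh):
    """Client with both layers: upsampled base layer plus enhancement layer."""
    return [[p + e for p, e in zip(r1, r2)] for r1, r2 in zip(upsample(base), enh)]

img = [[73, 72, 90, 91],
       [61, 75, 89, 92],
       [50, 52, 30, 31],
       [51, 53, 29, 33]]   # hypothetical 4x4 image
base = downsample(img)
enh = enhancement(img, base)
print(base)     # [[73, 90], [50, 30]]
print(enh[1])   # [-12, 2, -1, 2]
```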

Fine Grained Scalability • Idea • Cut off the compressed tail bits of samples • Base layer • As in SNR coding • Enhancement layer • Use bit-plane coding for the enhancement layer instead of run-level coding • Cut tail bits off until the data rate is reached

Fine Grained Scalability • Goal of FGS: best possible quality at any possible sending rate
(Figure: quality vs. sending rate; enhancement-layer bit-plane codes (0,1) (2,1) (0,0)(1,0)(2,0)(1,0)(0,0)(2,1) (5,0)(8,1))
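Cutting tail bits can be sketched at bit-plane granularity. Assume the signed values from the bit-plane example above form the enhancement-layer residual (a hypothetical choice). Keeping only the k most significant planes then gives a progressively coarser approximation. Real FGS streams can be truncated at any point inside a plane, which this sketch does not model. `truncate_planes` is a hypothetical helper.

```python
def truncate_planes(values, num_planes, keep):
    """Reconstruct values from only the `keep` most significant bit planes,
    i.e. after cutting the tail planes of the enhancement layer. Signs are kept."""
    mask = ((1 << keep) - 1) << (num_planes - keep)   # ones in the kept planes
    return [(-1 if v < 0 else 1) * (abs(v) & mask) for v in values]

residual = [10, 0, -6, 0, 0, -3, 0, 2, 2, 0, 0, -2, 0, 0, 1, 0]
for keep in range(5):
    print(keep, truncate_planes(residual, 4, keep))
# keep=2 -> 10 becomes 8, -6 becomes -4, all smaller values become 0
# keep=4 -> the original residual
```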

Multiple Description Coding • Idea • Encode data in two streams • Each stream has acceptable quality • Both streams combined have good quality • The redundancy between both streams is low • Problem • The same relevant information must exist in both streams • Old problem: started for audio coding in telephony • Currently a hot topic
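One classic way to obtain two descriptions is to split a sampled signal into even and odd samples. It is sketched here as an illustration, not as the scheme used on the slides. Either description alone gives a usable, interpolated signal, and both together give the original. The sample values and helper names are hypothetical.

```python
def split_descriptions(samples):
    """Two descriptions: even-indexed and odd-indexed samples."""
    return samples[0::2], samples[1::2]

def merge(even, odd, n):
    """Rebuild n samples. If one description is lost (None), estimate each missing
    sample as the average of its received neighbours."""
    if even is None and odd is None:
        raise ValueError("both descriptions lost")
    out = [None] * n
    if even is not None:
        out[0::2] = even
    if odd is not None:
        out[1::2] = odd
    for i in range(n):
        if out[i] is None:
            nbrs = [out[j] for j in (i - 1, i + 1) if 0 <= j < n and out[j] is not None]
            out[i] = sum(nbrs) / len(nbrs)
    return out

samples = [10, 12, 14, 16, 18, 20]
even, odd = split_descriptions(samples)
print(merge(even, odd, 6))    # [10, 12, 14, 16, 18, 20]  both descriptions
print(merge(even, None, 6))   # [10, 12.0, 14, 16.0, 18, 18.0]  one description lost
```

This split adds almost no redundancy. The single-description quality depends entirely on how strongly neighbouring samples are correlated, which illustrates the trade-off named on the slide.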

Multimedia File Formats
