
Data Formats and Codecs








  1. INF 5070 – Media Storage and Distribution Systems Data Formats and Codecs 30/8 – 2004

  2. Why codecs and formats? • Codecs (coders/decoders) • Determine how information is represented • Important for servers and distribution systems • Required sending speed • Amount of loss allowed • Buffers required • … • Formats • Determine how data is stored • Important for servers and distribution systems • Where is the data? • Where is the data about the data?

  3. Media data

  4. Media Data • Medium: "thing in the middle" • Here: a means to distribute and present information • Media affect human-computer interaction • The mantra of multimedia users: • Speaking is faster than writing • Listening is easier than reading • Showing is easier than describing

  5. Dependence of Media • Time-independent media (discrete media) • Text • Graphics • Time-dependent media (continuous media) • Audio • Video • Interdependent media (multimedia) • "Continuous" refers to the user's impression of the data, not necessarily to its representation • Combined video and audio is multimedia - relations must be specified

  6. Dependence of Media • Defined by the presentation of the data, not its representation • Discrete media • Text • Graphics • Video stills (image displayed by pausing a video stream) • Continuous media • Audio • Video • Animation • Ticker news (continuously scrolling text) • Multimedia • Multiplexed audio and video

  7. Properties of a Multimedia System • Flexibility • Provide mechanisms to handle all kinds of media, in particular discrete and continuous media • A VCR and a desktop publishing system for text and graphics are not multimedia systems • An editor with voice annotation is a multimedia system • Integration • Independent media storage • Computer-controlled media combination • Definition: A multimedia system is characterized by the integrated computer-controlled handling of independent discrete and continuous media

  8. Coding for distribution

  9. Compression - Necessity • E.g., video sequence • 25 images/sec. • PAL standard • 3 byte/pixel • YUV (luminance + 2 chrominance values) • RGB (red-green-blue values) • Image resolution 640 * 480 pixel • Data rate = 640 * 480 * 3 Byte * 25/s = 23040000 byte/s ~ 22 MByte/s • Approx. 1/16 stream over Ethernet • Approx. 1/2 stream over Fast Ethernet • Compression is necessary
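As a quick sanity check of these numbers (not part of the original slides), a few lines of Python reproduce the calculation:

```python
# Uncompressed data rate for the PAL example on the slide.
width, height = 640, 480           # pixels
bytes_per_pixel = 3                # one byte each for Y, U, V (or R, G, B)
fps = 25                           # PAL frame rate

rate = width * height * bytes_per_pixel * fps   # bytes per second
print(rate)                        # 23040000 B/s
print(rate / 2**20)                # ~21.97 MiB/s, i.e. the slide's ~22 MByte/s
print(10_000_000 / (rate * 8))     # ~0.05: only a small fraction of the stream fits on 10 Mbit/s Ethernet
print(100_000_000 / (rate * 8))    # ~0.5: roughly half the stream fits on Fast Ethernet
```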

  10. Compression – General Requirements • Dependence on application type: • Dialogue mode • Retrieval mode

  11. Compression – Mode Dependent Requirements • Dialogue and retrieval mode requirements: • Synchronization of audio, video, and other media • Dialogue mode requirements: • End-to-end delay < 150ms • Compression and decompression in real-time • Symmetric • Retrieval mode requirements: • Fast forward and backward data retrieval • Random access within 1/2 s • Asymmetric • We look mainly at retrieval mode!

  12. Compression Categories

  13. Basic Encoding Steps

  14. Run-Length Coding • Assumption • Long sequences of identical symbols • Example
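The example on the slide is not included in this transcript; as an illustration only, here is a minimal run-length coder sketch in Python:

```python
# Minimal run-length coding sketch: replace runs of identical symbols by (symbol, count) pairs.
def rle_encode(symbols):
    runs = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((s, 1))               # start a new run
    return runs

def rle_decode(runs):
    return "".join(s * n for s, n in runs)

print(rle_encode("AAAABBBCC"))                 # [('A', 4), ('B', 3), ('C', 2)]
assert rle_decode(rle_encode("AAAABBBCC")) == "AAAABBBCC"
```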

  15. Bit-Plane Coding • Assumption • Even longer sequences of identical bits • Example • Absolute values: 10,0,6,0,0,3,0,2,2,0,0,2,0,0,1,0, … ,0,0 • Sign bits: 0,x,1,x,x,1,x,0,0,x,x,1,x,x,0,x, … ,x,x • MSB plane: 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, … ,0,0 • MSB-1 plane: 0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0, … ,0,0 • MSB-2 plane: 1,0,1,0,0,1,0,1,1,0,0,1,0,0,0,0, … ,0,0 • MSB-3 plane: 0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0, … ,0,0 • Coded planes: (0,1) | (2,1) | (0,0)(1,0)(2,0)(1,0)(0,0)(2,1) | (5,0)(8,1) - each pair is (number of 0s before the next 1, end-of-plane flag), so (0,1) means "no 0s before a 1" and a set flag marks the last 1 in the plane • Up to 20% savings over run-length coding can be achieved
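A small sketch reproduces the example above, assuming each coded pair means (number of 0s before the next 1, end-of-plane flag); trailing zeros behind the last 1 of a plane are implied:

```python
# Bit-plane decomposition of the absolute values, followed by (run, end-of-plane) coding.
values = [10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1, 0]   # trailing 0s omitted

def bit_plane(values, plane):                 # plane 3 = MSB for 4-bit magnitudes
    return [(v >> plane) & 1 for v in values]

def code_plane(bits):
    ones = [i for i, b in enumerate(bits) if b]
    pairs, prev = [], -1
    for k, pos in enumerate(ones):
        pairs.append((pos - prev - 1, 1 if k == len(ones) - 1 else 0))
        prev = pos
    return pairs

for p in range(3, -1, -1):
    print(code_plane(bit_plane(values, p)))
# [(0, 1)]
# [(2, 1)]
# [(0, 0), (1, 0), (2, 0), (1, 0), (0, 0), (2, 1)]
# [(5, 0), (8, 1)]
```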

  16. Huffman Coding • Assumption • Some symbols occur more often than others • E.g., character frequencies of the English language • Fundamental principle • Frequently occurring symbols are coded with shorter bit strings

  17. Huffman Coding • Example • Characters to be encoded: • A, B, C, D, E • Probability to occur: • p(A)=0.3, p(B)=0.3, p(C)=0.1, p(D)=0.15, p(E)=0.15

  18. Huffman • Table and example of application to data stream
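The table itself is not included in the transcript; as a sketch, the tree construction for the slide-17 probabilities can be written as below. The exact code words depend on tie-breaking, but the code lengths do not:

```python
# Huffman code construction: repeatedly merge the two least probable subtrees.
import heapq
from itertools import count

def huffman(probabilities):
    tick = count()                             # unique tie-breaker for the heap
    heap = [(p, next(tick), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)        # least probable subtree
        p2, _, c2 = heapq.heappop(heap)        # second least probable subtree
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tick), merged))
    return heap[0][2]

codes = huffman({"A": 0.3, "B": 0.3, "C": 0.1, "D": 0.15, "E": 0.15})
print(codes)   # e.g. {'A': '10', 'B': '11', 'C': '010', 'D': '011', 'E': '00'}
               # frequent symbols get 2 bits, rare ones 3 bits (average 2.25 bit/symbol)
```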

  19. JPEG • “JPEG”: Joint Photographic Expert Group • International Standard: • For digital compression and coding of continuous-tone still images: • Gray-scale • Color • Since 1992 • Joint effort of: • ISO/IEC JTC1/SC2/WG10 • Commission Q.16 of CCITT SGVIII • Compression rate of 1:10 yields reasonable results

  20. JPEG • Very general compression scheme • Independence of • Image resolution • Image and pixel aspect ratio • Color representation • Image complexity and statistical characteristics • Well-defined interchange format of encoded data • Implementation in • Software only • Software and hardware

  21. JPEG • Sequence of compression steps • Different resolutions possible • Lossy or lossless mode • Lossless compression factor ~1.6:1 • Symmetrical codec

  22. JPEG – Baseline Mode: Quantization • Use of quantization tables for the DCT coefficients • Maps an interval of real numbers to one integer number • Allows a different granularity for each coefficient
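A minimal sketch of this step, with made-up coefficient and table values (not JPEG's standard tables):

```python
# Quantization of DCT coefficients: divide by the table entry and round, so a whole
# interval of real values maps to one integer; dequantization multiplies back (lossy).
coeffs = [[-415.4, -30.2, -61.2], [4.5, -21.9, -60.8], [-46.8, 7.4, 77.1]]
qtable = [[16, 11, 10], [12, 12, 14], [14, 13, 16]]      # larger entry = coarser granularity

quantized = [[round(c / q) for c, q in zip(crow, qrow)]
             for crow, qrow in zip(coeffs, qtable)]
dequantized = [[c * q for c, q in zip(crow, qrow)]
               for crow, qrow in zip(quantized, qtable)]
print(quantized)      # small integers, cheap to entropy-code
print(dequantized)    # only an approximation of the original coefficients
```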

  23. JPEG – 4 Modes of Compression

  24. Motion JPEG • Use series of JPEG frames to encode video • Pro • Lossless mode – editing advantage • Frame-accurate seeking – editing advantage • Arbitrary frame rates – playback advantage • Arbitrary frame skipping – playback advantage • Scaling through progressive mode – distribution advantage • Min transmission delay = 1/framerate – conferencing advantage • Supported by popular frame grabbers • Contra • Series of JPEG-compressed images • No standard, no specification • Worse, several competing quasi-standards • No relation to audio • No inter-frame compression

  25. H.261 (px64) • International Standard • Video codec for video conferences at p x 64 kbit/s (ISDN): • Real-time encoding/decoding, max. signal delay of 150 ms • Constant data rate • Intraframe coding • DCT as in JPEG baseline mode • Interframe coding, motion estimation • Search for a similar macroblock in the previous image and compare • The position of this macroblock defines the motion vector • The difference between the similar macroblocks is coded
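A sketch of the motion-estimation idea, assuming a full (exhaustive) search over a small window; real H.261 encoders may search differently, so this is illustrative only:

```python
# Block matching: find the offset (dx, dy) of the 16x16 macroblock in the previous
# frame that best matches the current macroblock (minimal sum of absolute differences).
def sad(prev, cur, px, py, cx, cy, n=16):
    return sum(abs(prev[py + j][px + i] - cur[cy + j][cx + i])
               for j in range(n) for i in range(n))

def motion_vector(prev, cur, cx, cy, search=7, n=16):
    best, best_sad = (0, 0), sad(prev, cur, cx, cy, cx, cy, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = cx + dx, cy + dy
            if 0 <= px <= len(cur[0]) - n and 0 <= py <= len(cur) - n:
                s = sad(prev, cur, px, py, cx, cy, n)
                if s < best_sad:
                    best, best_sad = (dx, dy), s
    return best   # motion vector; the remaining block difference is coded separately
```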

  26. MPEG (Moving Pictures Expert Group) • International Standard: • Compression of audio and video for playback (1.5 Mbit/s): • Real-time decoding • Sequence of I-, P-, and B-frames: • Random access at I-frames • At P-frames: decode the previous I-frame first • At B-frames: decode the referenced I- and P-frames first

  27. MPEG-2 • From MPEG-1 to MPEG-2 • Improvement in quality • From VCR to TV to HDTV • No CD-ROM based constraints • Higher data rates • MPEG-1: about 1.5 MBit/s • MPEG-2: 2-100 MBit/s • Evolution • 1994: International Standard • Also later known as H.262 • Prominent role for digital TV in DVB (digital video broadcasting) and DVD (digital video disk) • Commercial MPEG-2 realizations available

  28. MPEG-2 • Beyond MPEG-1: • Higher quality encoding • Higher data rates • Interlaced modes • Use cases • Broadcast quality production • DVB-T: Terrestrial • DVB-S: Satellite • DVB-C: Cable • Program Stream • for post-processing, storage, and DVD distribution • Transport Stream • for broadcasting, error resilience • Scaling: • Signal-to-Noise Ratio (SNR) scaling - progressive compression, error correcting codes • Spatial scaling - several pixel resolutions • Temporal scaling - frame dropping

  29. MPEG-4 • MPEG-4 (ISO 14496) originally • Targeted at systems with very scarce resources • To support applications like • Mobile communication • Videophone and E-mail • Max. data rates and dimensions (roughly) • Between 4800 and 64000 bits/s • 176 columns x 144 lines x 10 frames/s • Further demand • To provide enhanced functionality to allow for analysis and manipulation of image contents

  30. MPEG-4 • Hence: find standardized ways to • Represent units of aural, visual or audiovisual content • "audio/visual objects" or AVOs • object coding independent of other objects, surroundings and background • natural and synthetic objects • Compose these objects together • i.e. creation of compound objects that form audiovisual scenes • Multiplex and synchronize the data associated with AVOs • for transportation over network channels providing a QoS (Quality-of-Service) • Interact with the audiovisual scene generated at the decoder’s site

  31. MPEG-4: Scope • Definition of • „System Decoder Model“ • specification for decoder implementations • Description language • binary syntax of an AV object’s bitstream representation • scene description information • Corresponding concepts, tools and algorithms, especially for • content-based compression of simple and compound audiovisual objects • manipulation of objects • transmission of objects • random access to objects • animation • scaling • error robustness

  32. MPEG-4: Scope • Targeted bit rates for video and audio: • VLBV core • „Very Low Bit-rate Video“ • 5 - 64 Kbit/s • image sequences with CIF resolution and up to 15 frames/s • Higher-quality video • 64 Kbit/s - 4 Mbit/s • quality like digital TV • Natural audio coding • 2 - 64 Kbit/s

  33. MPEG-4: Video and Image Encoding • Encoding / decoding of • Rectangular images and video • coding similar to MPEG-1/2 • motion prediction • texture coding • Images and video of arbitrary shape • as done in conventional approach • 8x8 DCT or shape-adaptive DCT • plus coding of shape and transparency information • Encoder • Must generate timing information • speed of the encoder clock = time base • desired decoding times and/or expiration times • by using time stamps attached to the stream • Can specify the minimum buffer resources needed for decoding

  34. MPEG-4: Composition of Scenes • Scene description includes: • Tree to define hierarchical relationships between objects • Objects’ positions in space and time • by converting the objects’ local coordinate system into a global coordinate system • Attribute value selection • e.g. pitch of sound, color, texture, animation parameters • Description based on some VRML concepts • VRML = „Virtual Reality Modeling Language“ • Interaction with scenes • e.g. change viewing point, drag object, start/stop streams, select language

  35. MPEG-4: Example of a Composition

  36. MPEG-4: Synthetic Objects • Visual objects: • Virtual parts of scenes • e.g. virtual background • Animation • e.g. animated faces • Audio objects: • „Text-to-speech“ • speech generation from given text and prosodic parameters • face animation control • „Score driven synthesis“ • music generation from a score • more general than MIDI • Special effects

  37. MPEG-4: Error Handling • Mobile communication: • Low bit-rate (< 64 Kbps) • Error-prone • MPEG-4 concepts for error handling: • Resynchronization • enables receiver to „tune in“ again • based on markers within bitstream • Data recovery • enables receiver to reconstruct lost data • encode data in an error-resilient manner • Error concealment • enables receiver to bridge gaps in data • e.g. by repeating parts of old frames

  38. Network-aware coding

  39. Network-aware coding • Adapt to reality of the Internet • Content • Is created once, off-line • Is sent many times, under different circumstances • No guarantees concerning • Throughput • Jitter • Packet loss • Sending rate • Must adhere to rules • Often: don’t send more than TCP would • Can’t send at the best available encoding rate

  40. Approaches • Simulcast • Scalable coding • SNR Scalability • Temporal Scalability • Spatial Scalability • Fine Grained Scalability • Multiple Description Coding

  41. Simulcast • Choose a set of sending rates • During content creation • Encode content in the best possible quality below each sending rate • During transmission • Choose the version with the best admissible quality • [Figure: quality vs. sending rate - a single-rate codec compared with 3 simulcast rates, each approximating the best possible quality at its sending rate]
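A tiny sketch of the transmission-time choice, with hypothetical rates:

```python
# Pick the pre-encoded simulcast version with the highest rate that still fits
# the estimated available bandwidth (all rates in kbit/s; values are made up).
simulcast_rates = [200, 800, 2500]

def pick_version(available_kbps, rates=simulcast_rates):
    fitting = [r for r in rates if r <= available_kbps]
    return max(fitting) if fitting else min(rates)   # fall back to the lowest version

print(pick_version(1200))   # -> 800
```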

  42. Scalable coding • Typically used as layered coding • A base layer • Provides basic quality • Must always be transferred • One or more enhancement layers • Improve quality • Transferred if possible • [Figure: quality vs. sending rate - base layer plus enhancement layers stepping toward the best possible quality at each sending rate]

  43. Temporal Scalability • Frames can be dropped • In a controlled manner • Frame dropping does not violate dependencies • Low gain example: B-frame dropping in MPEG-1
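A minimal sketch of the B-frame-dropping example (frame types only, no real bitstream handling):

```python
# B-frames in MPEG-1 are not used as reference frames, so dropping them breaks no dependencies.
gop = ["I", "B", "B", "P", "B", "B", "P", "B", "B", "I"]

def drop_b_frames(frames):
    return [f for f in frames if f != "B"]

print(drop_b_frames(gop))   # ['I', 'P', 'P', 'I'] - lower frame rate, still decodable
```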

  44. Spatial Scalability • Idea • Base layer • Downsample the original image (code only 1 pixel instead of 4) • Send like a lower-resolution version • Less data to code • Enhancement layer • Subtract base-layer pixels from all pixels • Send like a normal-resolution version • Better compression due to low values • If the enhancement layer arrives at the client • Decode both layers • Add layers • [Figure: a 2x2 block with pixels 72, 75, 61, 83 is represented by one base-layer pixel 73; the enhancement layer carries the differences -1, 2, -12, 10]
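A sketch of the idea on the single 2x2 block from the figure, assuming the base-layer pixel is the rounded block average:

```python
# Base layer: one pixel per 2x2 block; enhancement layer: the per-pixel differences.
block = [[72, 75], [61, 83]]

base = round(sum(sum(row) for row in block) / 4)       # one pixel instead of four -> 73
enhancement = [[p - base for p in row] for row in block]
print(base)           # 73
print(enhancement)    # [[-1, 2], [-12, 10]] - small values compress well

# A client that received both layers reconstructs full resolution by adding them.
reconstructed = [[base + e for e in row] for row in enhancement]
assert reconstructed == block
```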

  45. Spatial Scalability • [Block diagram: raw video is downsampled (DS) and coded with DCT, Q, and VLC into the base layer; the differences against the reconstructed lower layer feed further DCT/Q/VLC branches that produce the enhancement layer and enhancement layer 2] • DS - downsampling • DCT - discrete cosine transformation • Q - quantization • VLC - variable length coding

  46. SNR Scalability • SNR – signal-to-noise ratio • Idea • Base layer • Is regularly DCT encoded • A lot of data is removed using quantization • Enhancement layer is regularly DCT encoded • Run Inverse DCT on quantized base layer • Subtract from original • DCT encode the result • If enhancement layer arrives at client • Add base and enhancement layer before running Inverse DCT
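A sketch of the idea on a single DCT coefficient, with made-up quantizer step sizes; the enhancement layer codes the error left by the coarse base-layer quantizer:

```python
# SNR scalability on one coefficient: coarse base-layer quantizer, finer quantizer
# for the remaining error; the decoder adds both before the inverse DCT.
coefficient = 137.0
q_base, q_enh = 32, 4                          # coarse and fine step sizes (made up)

base_level = round(coefficient / q_base)       # 4, sent in the base layer
residual = coefficient - base_level * q_base   # 137 - 128 = 9
enh_level = round(residual / q_enh)            # 2, sent in the enhancement layer

print(base_level * q_base)                     # 128: base layer only
print(base_level * q_base + enh_level * q_enh) # 136: base + enhancement, closer to 137
```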

  47. SNR Scalability • [Block diagram: raw video → DCT → Q → VLC produces the base layer; the quantized coefficients are inverse quantized (IQ) and subtracted from the original DCT coefficients, and the difference is quantized and VLC coded into the enhancement layer] • DCT - discrete cosine transformation • Q - quantization • IQ - inverse quantization • VLC - variable length coding

  48. Fine Grained Scalability • Idea • Cut off the tail bits of compressed samples • Base layer • As in SNR coding • Enhancement layer • Use bit-plane coding for the enhancement layer instead of run-level coding • Cut tail bits off until the target data rate is reached
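A sketch of the truncation idea on a few enhancement-layer samples; the decoder simply uses whichever bit planes arrived:

```python
# Send bit planes MSB-first and truncate when the rate budget is used up.
values = [10, 0, 6, 0, 0, 3, 0, 2]             # 4-bit enhancement-layer magnitudes

def planes(values, nbits=4):
    return [[(v >> p) & 1 for v in values] for p in range(nbits - 1, -1, -1)]

def reconstruct(received, nbits=4):
    vals = [0] * len(received[0])
    for i, plane in enumerate(received):
        shift = nbits - 1 - i
        vals = [v | (b << shift) for v, b in zip(vals, plane)]
    return vals

all_planes = planes(values)
print(reconstruct(all_planes[:2]))   # only MSB and MSB-1 arrived: [8, 0, 4, 0, 0, 0, 0, 0]
print(reconstruct(all_planes))       # all planes arrived: the original values
```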

  49. Fine Grained Scalability • [Figure: quality vs. sending rate - the goal of FGS is to track the best possible quality at every sending rate] • Bit-plane coded enhancement layer (as in the earlier example): MSB (0,1) • MSB-1 (2,1) • MSB-2 (0,0)(1,0)(2,0)(1,0)(0,0)(2,1) • MSB-3 (5,0)(8,1) • …

  50. Fine Grained Scalability • [Block diagram: as for SNR scalability, the base layer is produced by DCT, Q, and VLC; the residual after inverse quantization (IQ) is coded into the enhancement layer with bit-plane coding (BC) instead of VLC] • DCT - discrete cosine transformation • Q - quantization • IQ - inverse quantization • VLC - variable length coding • BC - bitplane coding
