
CS598kn Basic Concepts of Audio, Video and Compression


Presentation Transcript


  1. CS598kn Basic Concepts of Audio, Video and Compression Klara Nahrstedt 01/21/2005

  2. Content • Introduction on Multimedia • Audio encoding • Video encoding • Compression CS 598kn - Advanced Topics in Multimedia Systems

  3. Video on Demand Video On Demand: (a) ADSL vs. (b) cable

  4. Multimedia Files A movie may consist of several files

  5. Multimedia Issues • Analog-to-digital conversion • Problem: the result must still be acceptable to human ears and eyes • Jitter • High data rates required • Large storage • Compression • Real-time playback required • Scheduling • Quality of service • Resource reservation

  6. Audio • Sound is a continuous wave that travels through the air. • The wave is made up of pressure differences.

  7. How do we hear sound? • Sound is detected by measuring the pressure level at a point • When an acoustic signal reaches the outer ear (pinna), the generated wave is transformed into energy and filtered through the middle ear. The inner ear (cochlea) transforms the energy into nerve activity. • In a similar way, when an acoustic wave strikes a microphone, the microphone generates an electrical signal representing the sound amplitude as a function of time.

  8. Basic Sound Concepts • Frequency represents the number of periods in a second (measured in hertz, cycles/second) • Human hearing frequency range: 20 Hz - 20 kHz (audio), voice is about 500 Hz to 2 kHz. • Amplitude of a sound is the measure of displacement of the air pressure wave from its mean.

  9. Computer Representation of Audio • Speech is analog in nature and it is converted to digital form by an analog-to-digital converter (ADC). • A transducer converts pressure to voltage levels. • Convert analog signal into a digital stream by discrete sampling • Discretization both in time and amplitude (quantization)

  10. Audio Encoding (1) Audio Waves Converted to Digital • Input: an electrical voltage • Sample the voltage levels at regular intervals to get a vector of values: (0, 0.2, 0.5, 1.1, 1.5, 2.3, 2.5, 3.1, 3.0, 2.4, ...) • A computer measures the amplitude of the waveform at regular time intervals to produce a series of numbers (samples). • The ADC process is governed by four factors: sampling rate, quantization, linearity, and conversion speed. • Output: a binary number per sample

  11. Audio Encoding (2) • Sampling Rate: rate at which a continuous wave is sampled (measured in Hertz) • Examples: CD standard – 44100 Hz, telephone quality – 8000 Hz • The audio industry uses 5.5125 kHz, 11.025 kHz, 22.05 kHz, and 44.1 kHz as standard sampling frequencies. These frequencies are supported by most sound cards. • Question: How often do you need to sample a signal to avoid losing information?

  12. Audio Encoding (3) • Answer: It depends on how fast the signal is changing. Real answer: twice per cycle (this follows from the Nyquist sampling theorem) • Nyquist Sampling Theorem: If a signal f(t) is sampled at regular intervals of time and at a rate higher than twice the highest significant signal frequency, then the samples contain all the information of the original signal. • Example: CD audio must capture frequencies up to the ~20 kHz limit of human hearing; by the Nyquist theorem the sampling rate must exceed twice the highest signal frequency, so CDs sample at 44100 Hz, which can represent frequencies up to 22050 Hz.
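
As a quick numeric illustration of the theorem, the apparent frequency of an ideally sampled pure tone can be computed by folding it around the sampling rate (the function name and folding shortcut are ours, not from the slides):

```python
def alias_frequency(f_signal, f_sample):
    """Apparent frequency (Hz) of a pure tone after ideal sampling:
    frequencies above f_sample / 2 fold back down (aliasing)."""
    f = f_signal % f_sample
    return min(f, f_sample - f)

# A 3 kHz tone sampled above the Nyquist rate (2 x 3 kHz) survives intact:
print(alias_frequency(3000, 8000))  # 3000
# Sampled below the Nyquist rate, it masquerades as a 1 kHz tone:
print(alias_frequency(3000, 4000))  # 1000
```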

  13. Audio Encoding (4) • The best-known technique for voice digitization is Pulse-Code Modulation (PCM). • PCM is based on the sampling theorem. • If voice data are band-limited to 4000 Hz, then sampling at 8000 samples/second is sufficient to capture the input voice signal. • PCM produces analog samples which must be converted to a digital representation: each analog sample is assigned a binary code, i.e., it is quantized as explained above.

  14. Audio Encoding (5) • Quantization (sample precision): the resolution of a sample value. • Samples are typically stored as raw numbers (linear PCM format) or as logarithms (u-law or A-law) • Quantization depends on the number of bits used to measure the height of the waveform • Example: 16-bit CD-quality quantization yields 65536 (2^16) distinct values • Audio formats are described by the sample rate and quantization • Voice quality: 8-bit quantization, 8000 Hz u-law mono (8 kBytes/s) • 22 kHz 8-bit linear mono (22 kBytes/s) and stereo (44 kBytes/s) • CD quality: 16-bit quantization, 44100 Hz linear stereo (176.4 kBytes/s = 44100 samples/s x 16 bits/sample x 2 channels / 8000 bits per kByte)
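
The byte-rate arithmetic on this slide can be checked with a one-line helper (a minimal sketch; the function name is ours):

```python
def pcm_rate_bytes_per_sec(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate: samples/s x bits/sample x channels / 8."""
    return sample_rate_hz * bits_per_sample * channels // 8

print(pcm_rate_bytes_per_sec(8000, 8, 1))    # 8000   -> 8 kBytes/s voice quality
print(pcm_rate_bytes_per_sec(44100, 16, 2))  # 176400 -> 176.4 kBytes/s CD quality
```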

  15. Audio Formats • Audio formats are characterized by four parameters: • Sample Rate: sampling frequency per second • Encoding: Audio data representation • U-law – CCITT G.711 standard for voice data in telephone companies (USA, Canada, Japan) • A-law – CCITT G.711 standard for voice data in telephony elsewhere (Europe,…) • A-law and u-law are sampled at 8000 samples/second with precision of 12 bits and compressed to 8 bit samples. • Linear PCM – uncompressed audio where samples are proportional to audio signal voltage • Precision: number of bits used to store each audio sample. • Channel: Multiple channels of audio may be interleaved at sample boundaries
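
The shape of the logarithmic u-law encoding mentioned above can be sketched with the continuous mu-law companding curve (mu = 255, as in G.711); the real codec additionally quantizes the companded value into an 8-bit sample, which this sketch omits:

```python
import math

MU = 255  # u-law companding parameter used by G.711

def mu_law_compress(x):
    """Map a normalized sample x in [-1, 1] logarithmically, giving
    quiet samples proportionally more of the output range."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

# A quiet sample at 1% of full scale uses ~23% of the companded range:
print(round(mu_law_compress(0.01), 3))  # 0.228
print(mu_law_compress(1.0))             # 1.0
```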

  16. Basic Concepts of Video • Visual Representation – Video Encoding • Objective is to offer the viewer a sense of presence in the scene and of participation in the events portrayed • Transmission • Video signals are transmitted to a receiver through a single television channel • Digitization • Analog-to-digital conversion, sampling of gray/color levels – Quantization

  17. Visual Representation (1) • Video signals are generated at the output of a camera by scanning a two-dimensional moving scene and converting it into a one-dimensional electric signal • A moving scene is a collection of individual images, where each scanned picture generates a frame of the picture • Scanning starts at the top-left corner of the picture and ends at the bottom-right. • Aspect Ratio – ratio of picture width to height • Pixel – discrete picture element – digitized light point in a frame • Vertical Frame Resolution – number of pixels in the picture height • Horizontal Frame Resolution – number of pixels in the picture width • Spatial Resolution – Vertical x Horizontal Resolution • Temporal Resolution – rapid succession of different frames

  18. Visual Representation (2) • Continuity of Motion • Minimum 15 frames per second • NTSC – 29.97 Hz repetition rate, ~30 frames/sec • PAL – 25 Hz, 25 frames/sec • HDTV – 59.94 Hz, 60 frames/sec • Flicker Effect • is a periodic fluctuation of brightness perception. To avoid this effect, we need at least 50 refresh cycles per second. Display devices achieve this using a display refresh buffer • Picture Scanning • Progressive scanning – single scanning of a picture • Interlaced scanning – the frame is formed by scanning two pictures (fields) at different times, with the lines interleaved, such that two consecutive lines of a frame belong to alternate fields (odd and even lines are scanned separately) • NTSC TV uses interlaced scanning to trade off vertical resolution for temporal resolution • HDTV and computer displays need high spatio-temporal resolution and use progressive scanning.

  19. Video Color Encoding (3) • During scanning, a camera creates three signals: RGB (red, green, and blue). • For compatibility with black-and-white video, and because the three color signals are highly correlated, a new set of signals in a different color space is generated. • The new color systems correspond to standards such as NTSC, PAL, and SECAM. • For transmission of the visual signal we use three signals: 1 luminance (brightness – basic signal) and 2 chrominance (color) signals. • YUV Signal (Y = 0.3R + 0.59G + 0.11B, U = (B-Y) x 0.493, V = (R-Y) x 0.877) • Coding ratio between components Y, U, V is 4:2:2 • In the NTSC signal the luminance and chrominance signals are interleaved • The goal at the receiver is: (1) separate luminance from chrominance components, and (2) avoid interference between them (cross-color, cross-luminance)
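
The YUV formulas above translate directly into code (a minimal sketch; RGB values assumed normalized to [0, 1]):

```python
def rgb_to_yuv(r, g, b):
    """Convert RGB to YUV using the weights from the slide:
    Y is luminance; U and V are scaled color-difference signals."""
    y = 0.3 * r + 0.59 * g + 0.11 * b
    u = (b - y) * 0.493
    v = (r - y) * 0.877
    return y, u, v

# Pure white carries full luminance and (near-)zero chrominance:
y, u, v = rgb_to_yuv(1.0, 1.0, 1.0)
print(round(y, 6), round(u, 6), round(v, 6))  # 1.0 0.0 0.0
```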

  20. Basic Concepts of Image Formats • Important Parameters for Captured Image Formats: • Spatial Resolution (pixels x pixels) • Color encoding (quantization level of a pixel: e.g., 8-bit, 24-bit) • Example: the `SunVideo' video digitizer board captures pictures of 320 by 240 pixels with 8-bit gray-scale or color resolution. • For a practical demonstration of these basic image concepts, try the program xv, which displays images and allows one to show, edit, and manipulate image characteristics. • Important Parameters for Stored Image Formats: • Images are stored as a 2D array of values where each value represents the data associated with a pixel in the image (a bitmap or a color image). • Stored images can use flexible formats such as RIFF (Resource Interchange File Format). RIFF includes formats such as bitmaps, vector representations, animations, audio, and video. • Currently, the most widely used image storage formats are GIF (Graphics Interchange Format), XBM (X11 Bitmap), PostScript, JPEG (see the compression chapter), TIFF (Tagged Image File Format), PBM (Portable Bitmap), and BMP (Bitmap).

  21. Digital Video • The process of digitizing analog video involves filtering, sampling, and quantization • Filtering is employed to avoid aliasing artifacts in the follow-up sampling process • The filtered luminance and chrominance signals are sampled to generate a discrete-time signal • Digitization means sampling the gray/color levels in the frame at an MxN array of points • The minimum rate at which each component (YUV) can be sampled is the Nyquist rate, which corresponds to twice the signal bandwidth • Once the points are sampled, they are quantized into pixels, i.e., each sampled value is mapped to an integer. The quantization level depends on how many bits we allocate to represent the resulting integer (e.g., 8 bits per pixel, or 24 bits per pixel)

  22. Digital Transmission Bandwidth • Bandwidth requirements for Images: • Raw Image Transmission Bandwidth := size of the image := spatial resolution x pixel resolution • Compressed Image Transmission Bandwidth := depends on the compression scheme (e.g., JPEG) and the content of the image • Symbolic Image Transmission Bandwidth := size of the instructions and variables carrying graphics primitives and attributes • Bandwidth Requirements for Video: • Uncompressed Video Bandwidth := image size x frame rate • Compressed Video Bandwidth := depends on the compression scheme (e.g., Motion JPEG, MPEG) and the content of the video (scene changes) • Example: Assume the following video characteristics – 720,000 pixels per image (frame), 8 bits per pixel quantization, and a 60 frames per second frame rate. Video Bandwidth := 720,000 pixels per frame x 8 bits per pixel x 60 fps, which results in an HDTV data rate of 43,200,000 bytes per second = 345.6 Mbps. With MPEG compression, the bandwidth goes down to about 34 Mbps with some loss in image/video quality.
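
The bandwidth example above works out as follows (a one-function sketch; the helper name is ours):

```python
def video_bandwidth_mbps(pixels_per_frame, bits_per_pixel, fps):
    """Uncompressed video bandwidth = image size x frame rate, in Mbps."""
    return pixels_per_frame * bits_per_pixel * fps / 1_000_000

# The slide's HDTV example: 720,000 pixels, 8 bits/pixel, 60 frames/s
print(video_bandwidth_mbps(720_000, 8, 60))  # 345.6
```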

  23. Compression Classification • Compression is important due to limited bandwidth • All compression systems require two algorithms: • Encoding at the source • Decoding at the destination • Entropy Coding • Lossless encoding • Used regardless of the media's specific characteristics • Data taken as a simple digital sequence • The decompression process regenerates the data completely • Examples: Run-length coding, Huffman coding, Arithmetic coding • Source Coding • Lossy encoding • Takes into account the semantics of the data • Degree of compression depends on data content • Examples: DPCM, Delta Modulation • Hybrid Coding • Combines entropy coding with source coding • Examples: JPEG, MPEG, H.263, …

  24. Compression (1) [Diagram: Uncompressed Picture → Picture Preparation → Picture Processing → Quantization → Entropy Encoding, with adaptive feedback from later stages back to earlier ones]

  25. Compression (2) • Picture Preparation • Analog-to-digital conversion • Generation of appropriate digital representation • Image division into 8x8 blocks • Fix the number of bits per pixel • Picture Processing (Compression Algorithm) • Transformation from the spatial to the frequency domain (e.g., Discrete Cosine Transformation – DCT) • Motion vector computation for motion video • Quantization • Mapping real numbers to integers (reduction in precision) • Entropy Coding • Compress a sequential digital stream without loss

  26. Compression (3) (Entropy Encoding) • A simple lossless compression algorithm is Run-length Coding, where a run of repeated bytes is grouped together as Number-of-Occurrences, Special-Character, Compressed-Byte. For example, 'AAAAAABBBBBDDDDDAAAAAAAA' can be encoded as `6!A5!B5!D8!A', where `!' is the special character. The compression ratio is 50% (12/24 x 100%). • Fixed-length Coding – each symbol is allocated the same number of bits independent of its frequency (L ≥ log2(N), N – number of symbols) • Statistical Encoding – each symbol has a probability of occurrence (e.g., P(A) = 0.16, P(B) = 0.51, P(C) = 0.33) • The theoretical minimum average number of bits per codeword is known as the Entropy (H). According to Shannon, H = −Σ Pi log2(Pi) bits per codeword
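
Both the run-length scheme and the entropy bound above are easy to check in code (a minimal sketch of the slide's count-marker-byte format; function names are ours):

```python
import math

def run_length_encode(data, marker='!'):
    """Encode each run of repeated bytes as <count><marker><byte>."""
    out, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        out.append(f"{j - i}{marker}{data[i]}")
        i = j
    return ''.join(out)

def entropy(probabilities):
    """Shannon entropy H = -sum(P_i log2 P_i): the theoretical minimum
    average number of bits per codeword."""
    return -sum(p * math.log2(p) for p in probabilities)

print(run_length_encode('AAAAAABBBBBDDDDDAAAAAAAA'))  # 6!A5!B5!D8!A
print(round(entropy([0.16, 0.51, 0.33]), 3))          # 1.446
```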

  27. Huffman Coding [Tree: starting from P(A) = 0.16, P(C) = 0.33, P(B) = 0.51, first merge A and C into a subtree with P(AC) = 0.49, then merge that subtree with B to obtain the root P(ACB) = 1. Labeling each left branch 0 and each right branch 1 yields the codes: A = 00, C = 01, B = 1]
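
The code table on this slide can be reproduced with a short greedy construction over a priority queue (standard-library heapq; the tie-break counter only keeps heap entries comparable):

```python
import heapq

def huffman_codes(probs):
    """Greedy Huffman construction: repeatedly merge the two least
    probable subtrees, prefixing '0' to the less probable branch."""
    # Heap entries: (probability, tie-break counter, {symbol: code-so-far})
    heap = [(p, n, {sym: ''}) for n, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in c0.items()}
        merged.update({s: '1' + c for s, c in c1.items()})
        heapq.heappush(heap, (p0 + p1, n, merged))
        n += 1
    return heap[0][2]

# Probabilities from the slide:
codes = huffman_codes({'A': 0.16, 'B': 0.51, 'C': 0.33})
print(codes)  # {'A': '00', 'C': '01', 'B': '1'}
```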

  28. JPEG: Joint Photographic Experts Group • 6 major steps to compress an image: (1) block preparation, (2) DCT (Discrete Cosine Transform) transformation, (3) quantization, (4) further compression via differential compression, (5) zig-zag scanning and run-length coding compression, (6) Huffman coding compression • The quantization step is the lossy step, where we lose data in a non-invertible fashion. • Differential compression means that we consider similar blocks in the image and encode only the first block in full; for the rest of the similar blocks we encode only the differences between the previous block and the current block. The hope is that the difference is a much smaller value, hence we need fewer bits to represent it. The differences also often end up close to 0 and can be compressed very well by the next step – run-length coding. • Huffman compression is a lossless statistical encoding algorithm which takes into account frequency of occurrence (not every byte has the same weight)

  29. JPEG: Block Preparation • RGB input data and block preparation • The eye responds to luminance (Y) more than to chrominance (I and Q)

  30. Image Processing • After image preparation we have • Uncompressed image samples grouped into data units of 8x8 pixels • Precision 8 bits/pixel • Values in the range [0, 255] • Steps in image processing • Pixel values are shifted into the range [-128, 127] with center 0 • DCT maps values from the spatial domain to the frequency domain • S(u,v) = ¼ C(u) C(v) Σx Σy s(x,y) cos((2x+1)uπ/16) cos((2y+1)vπ/16), where C(0) = 1/√2 and C(k) = 1 for k > 0 • S(0,0) – lowest frequency in both directions – DC coefficient – determines the fundamental color of the block • S(0,1), …, S(7,7) – AC coefficients
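
The DCT formula above, with the JPEG normalization C(0) = 1/√2 and C(k) = 1 otherwise, can be implemented directly (a brute-force sketch for clarity; real codecs use fast factored transforms):

```python
import math

def dct_8x8(block):
    """2D DCT-II for an 8x8 block, per the slide's formula:
    S(u,v) = 1/4 C(u) C(v) sum_x sum_y s(x,y)
             * cos((2x+1)u pi/16) cos((2y+1)v pi/16)."""
    C = lambda k: 1 / math.sqrt(2) if k == 0 else 1.0
    S = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            acc = sum(block[x][y]
                      * math.cos((2 * x + 1) * u * math.pi / 16)
                      * math.cos((2 * y + 1) * v * math.pi / 16)
                      for x in range(8) for y in range(8))
            S[u][v] = 0.25 * C(u) * C(v) * acc
    return S

# A flat block puts all its energy into the DC coefficient S(0,0):
flat = [[8] * 8 for _ in range(8)]
coeffs = dct_8x8(flat)
print(round(coeffs[0][0], 3))  # 64.0
```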

  31. Quantization • The goal of quantization is to throw away bits • Consider an example: 101101₂ = 45 (6 bits); we can truncate this string to 4 bits: 1011₂ = 11, or to 3 bits: 101₂ = 5 (reconstructed value 101000₂ = 40) or 110₂ = 6 (reconstructed value 110000₂ = 48) • Uniform quantization is achieved by dividing the DCT coefficient value S(u,v) by N and rounding the result • JPEG uses quantization tables
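
The divide-and-round step can be sketched in two small functions; the reconstruction error is exactly the information thrown away (function names are ours):

```python
def quantize(coeff, step):
    """Uniform quantization: divide by the step size N and round."""
    return round(coeff / step)

def dequantize(level, step):
    """Reconstruction: the decoder can only recover a multiple of the step."""
    return level * step

level = quantize(45, 8)
print(level, dequantize(level, 8))  # 6 48  (45 is not recoverable: error of 3)
```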

  32. Entropy Encoding • After image processing we have quantized DC and AC coefficients • The initial step of entropy encoding is to map the 8x8 plane into a 64-element vector using the zig-zag scan • DC coefficient processing – use difference coding • AC coefficient processing – apply run-length coding • Apply Huffman coding to the DC and AC coefficients
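
One way to generate the zig-zag traversal order is to walk the anti-diagonals of the 8x8 plane, alternating direction (a sketch; the function name is ours):

```python
def zigzag_indices(n=8):
    """Order the n x n coefficient positions along anti-diagonals,
    alternating direction, so low frequencies come first."""
    order = []
    for d in range(2 * n - 1):
        cells = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        order.extend(cells if d % 2 else cells[::-1])
    return order

idx = zigzag_indices()
print(idx[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```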

  33. MPEG: Motion Picture Experts Group • MPEG-1 was designed for video-recorder-quality output (320x240 for NTSC) using a bit rate of 1.2 Mbps. • MPEG-2 is for broadcast-quality video at 4-6 Mbps (it fits into an NTSC or PAL broadcast channel) • MPEG takes advantage of temporal and spatial redundancy. Temporal redundancy means that two neighboring frames are similar, almost identical. • MPEG-2 output consists of three different kinds of frames that have to be processed: • I (Intracoded) frames – self-contained JPEG-encoded still pictures • P (Predictive) frames – block-by-block difference with the last frame • B (Bidirectional) frames – differences with the last and next frames

  34. The MPEG Standard • I frames – self-contained, hence they are used for fast-forward and rewind operations in VOD applications • P frames code interframe differences. The algorithm searches for similar macroblocks in the current and previous frames, and if they are only slightly different, it encodes only the difference and the motion vector needed to find the position of the macroblock during decoding. • B frames – encoded when three frames are available at once: the past one, the current one, and the future one. Similar to P frames, the algorithm takes a macroblock in the current frame and looks for similar macroblocks in the past and future frames. • MPEG is suitable for stored video because it is an asymmetric lossy compression: the encoding takes a long time, but the decoding is very fast. • The frames are delivered to the receiver in dependency order rather than display order, hence we need buffering to reorder the frames.

  35. MPEG/Video I-Frames • I frames (intra-coded images) • MPEG uses the JPEG compression algorithm for I-frame encoding • I-frames use 8x8 blocks defined within a macroblock, and DCT is performed on these blocks. Quantization uses a single constant value for all DCT coefficients; i.e., there are no quantization tables as in JPEG

  36. MPEG/Video P-Frames • P-frames (predictive coded frames) require the previous I-frame and/or previous P-frame for encoding and decoding • Use a motion estimation method at the encoder • Define a match window within a given search window. The match window corresponds to a macroblock; the search window is an arbitrary window size depending on how far away we are willing to look. • Matching methods: • SSD correlation uses SSD = Σi (xi − yi)² • SAD correlation uses SAD = Σi |xi − yi|
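
The two matching criteria translate directly into code (a sketch over small integer blocks; real encoders evaluate them over 16x16 macroblocks at every candidate offset in the search window):

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-size blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def ssd(block_a, block_b):
    """Sum of squared differences between two equal-size blocks."""
    return sum((a - b) ** 2 for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

a = [[10, 10], [10, 10]]
b = [[12, 9], [10, 10]]
print(sad(a, b), ssd(a, b))  # 3 5
```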

  37. MPEG/Video B-Frame • B-frames (bi-directionally predictive-coded frames) require information from the previous and following I- and/or P-frames • [Diagram: the current macroblock minus ½ x (past reference macroblock + future reference macroblock) is passed through DCT, quantization, and run-length encoding; the result and the two motion vectors are then Huffman coded]

  38. MPEG/Audio Encoding • Precision is 16 bits • Sampling frequency is 32 kHz, 44.1 kHz, or 48 kHz • 3 compression methods exist: Layer 1, Layer 2, Layer 3 (MP3) • Layer 1 – 32 kbps-448 kbps, target 192 kbps • Layer 2 – 32 kbps-384 kbps, target 128 kbps; its decoder also accepts Layer 1 • Layer 3 – 32 kbps-320 kbps, target 64 kbps; its decoder also accepts Layers 1 and 2

  39. MPEG/System Data Stream • Video is interleaved with audio. • Audio consists of three layers • Video consists of 6 layers • (1) sequence layer • (2) group of pictures layer (Video Param, Bitstream Param, …) • (3) picture layer (Time code, GOP Param, …) • (4) slice layer (Type, Buffer Param, Encode Param, …) • (5) macro-block layer (Qscale, …) • (6) block layer (Type, Motion Vector, Qscale, …)
