
Chapter 4: Compression (Part 2)


Presentation Transcript


  1. Chapter 4: Compression (Part 2) Image Compression

  2. Acknowledgement • Some figures and pictures are taken from: The Scientist and Engineer's Guide to Digital Signal Processing by Steven W. Smith

  3. Lossy compression • Motivations: • Uncompressed images, video and audio data are huge, e.g., in HDTV, the bit rate easily exceeds 1 Gbps. • Lossless methods (Huffman, Arithmetic, LZW) are inadequate for images and video because the spatial and/or temporal redundancy of pixel values is not exploited. • Special characteristics of human perception (e.g., greater sensitivity to low spatial frequencies) should be taken advantage of to achieve a higher compression ratio.

  4. Spatial sensitivity • A higher spatial frequency requires a larger contrast to be perceived.

  5. Vector quantization (VQ) • A general lossy compression technique • Scalar quantization: 3,200,134 ~ 3M • VQ: a generalization of scalar quantization: subjects to be quantized are vectors. • VQ can be viewed as a form of pattern recognition where an input pattern (a vector) is approximated by one of a predetermined set of standard patterns. “Doesn’t quantization mean round the figure? So how can people get slim with it?” Benny.

  6. Vector quantization (Def’n) • A vector quantizer Q of dimension k and size N is a mapping from a vector in a k-dimensional Euclidean space into a finite set C containing N output or reproduction points, called code vectors. • C: the codebook (with N vectors).

  7. Vector quantization (Def’n) • The rate of Q is r = (log2N)/k = number of bits per vector component used to represent the input vector. • Two issues: • how to match a vector to a code vector (pattern recognition), • how to set the codebook.
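An illustrative computation of the rate (my own numbers, not from the slides): with 4-dimensional vectors (k = 4) and a codebook of N = 256 code vectors, each input vector is coded by an 8-bit index, so r = (log2 256) / 4 = 2 bits per vector component.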

  8. Searching the codebook • Given a vector, we need to search the codebook (finding an index) for a code vector that gives the minimum distortion. • Squared error distortion: d(x, y) = Σi (xi − yi)^2, where x is the vector to be coded, y is a code vector, and xi is the i-th component of x.
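A minimal sketch of this codebook search (illustrative Python/NumPy; the function names and the sample codebook are my own, taken from the illustration slides below):

```python
import numpy as np

def squared_error(x, y):
    """Squared error distortion d(x, y) = sum_i (x_i - y_i)^2."""
    return float(np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2))

def nearest_code_vector(x, codebook):
    """Return (index, distortion) of the code vector closest to x."""
    distortions = [squared_error(x, y) for y in codebook]
    i = int(np.argmin(distortions))
    return i, distortions[i]

# Example using the vectors from the illustration slides:
codebook = [[25, 33, 40], [13, 53, 61], [20, 88, 30], [21, 10, 24]]
print(nearest_code_vector([25, 10, 24], codebook))   # -> (3, 16.0)
```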

  9. Codebook training • Get a large sample of data (the training set). • Pick an initial set of code vectors. • Partition the training set into cells. • Use the cells to tune the codebook (find the centroid of each cell). • Repeat.

  10. Codebook training • Step 1 • Given a training set, X, with M vectors • Let d = the mean square distortion measure • Let the iteration index be j and set j=1 • Select an initial codebook C0 • Set initial distortion d0 = infinity • Pick a convergence threshold E

  11. Codebook training • Step 2 • Optimally encode every vector x in X using Cj-1 • Assign x to cell Pi,j-1 if x is quantized as yi,j-1, where yi,j-1 is the i-th code vector in Cj-1 • Compute dj = the sum of all vector distortions • If (dj-1 − dj) / dj < E, then quit with codebook = Cj-1; otherwise go to Step 3.

  12. Codebook training • Step 3 • Update the code vectors as yi,j = the average of all the vectors assigned to cell Pi,j-1 (i.e., the centroid). • j++; go to Step 2.
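The three steps above amount to a k-means-style iteration (the generalized Lloyd / LBG algorithm). A minimal sketch in Python/NumPy, assuming the training set is an array of row vectors; the function name and the randomly chosen initial codebook are choices made for this example:

```python
import numpy as np

def train_codebook(training, N, eps=1e-3, seed=0):
    """Iteratively refine a codebook of N code vectors (Steps 1-3 above)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(training, dtype=float)                       # M training vectors
    codebook = X[rng.choice(len(X), N, replace=False)].copy()   # initial codebook C0
    d_prev = np.inf                                             # initial distortion d0
    while True:
        # Step 2: optimally encode every training vector with the current codebook.
        dists = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)                 # cell index for each vector
        d = dists[np.arange(len(X)), assign].sum()    # total distortion d_j
        if d == 0 or (d_prev - d) / d < eps:
            return codebook                           # converged
        d_prev = d
        # Step 3: move each code vector to the centroid of its cell.
        for i in range(N):
            members = X[assign == i]
            if len(members):
                codebook[i] = members.mean(axis=0)
```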

  13. Codebook training (illustration) [figure: codebook and training vectors]

  14. Codebook training (illustration) • Distortion of the training vector [25,10,24] against each code vector: d([25,10,24],[25,33,40]) = 785, d([25,10,24],[13,53,61]) = 3362, d([25,10,24],[20,88,30]) = 6145, d([25,10,24],[21,10,24]) = 16

  15. Codebook training (illustration) • The minimum distortion is d([25,10,24],[21,10,24]) = 16, so the training vector [25,10,24] is assigned to the cell of code vector [21,10,24].

  16. Codebook training (illustration) [figure: codebook and training vectors]

  17. Codebook training (illustration) • Centroid update from the 3 training vectors in a cell: ([30,30,30] + [28,28,29] + [28,29,28]) / 3 ≈ [28,29,29]

  18. Codebook training (illustration) [figure: updated codebook and training vectors]

  19. Codebook training (illustration)

  20. VQ and image compression • A simple way of applying VQ to image compression is to decompose an image into a number of (say) 2 × 2 blocks. Each block then yields a 4-element vector. • Instead of encoding the pixel values of a block, one trains a code book and encodes a block by an index into the code book. • To train a code book, a number of images of similar nature are used • e.g., facial images are used to train a code book for compressing facial images
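An illustrative sketch of this block-based encoding (Python/NumPy; the helper names are my own, and train_codebook refers to the routine sketched after slide 12):

```python
import numpy as np

def image_to_vectors(img):
    """Split a grayscale image (H x W, both even) into 2 x 2 blocks,
    each flattened to a 4-element vector."""
    H, W = img.shape
    blocks = img.reshape(H // 2, 2, W // 2, 2).swapaxes(1, 2)
    return blocks.reshape(-1, 4).astype(float)

def vq_encode(img, codebook):
    """Replace every 2 x 2 block by the index of its nearest code vector."""
    vecs = image_to_vectors(img)
    dists = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)          # one small index per block

def vq_decode(indices, codebook, shape):
    """Rebuild an approximation of the image from the block indices."""
    H, W = shape
    blocks = codebook[indices].reshape(H // 2, W // 2, 2, 2)
    return blocks.swapaxes(1, 2).reshape(H, W)
```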

  21. Image & video compression • JPEG: spatial redundancy removal in intra-frame coding. • H.261 and MPEG: both spatial and temporal redundancy removal in intra-frame and inter-frame coding.

  22. Sub-sampling techniques • Sub-sample to compress. Interpolation techniques are used upon reconstruction of the original data. • Sub-sampling results in information loss. However, the loss is acceptable by virtue of the physiological characteristics of human eyes. • Chromatic sub-sampling: The human eye is more sensitive to changes in brightness than to color changes. Very often, RGB values are transformed to Y’CBCR values. The chroma components are then sub-sampled to reduce the data requirement.

  23. Chromatic sub-sampling • 4:2:2 sub-sample color signals horizontally by a factor of 2 (CCIR 601 standard). • 4:1:1 sub-sample horizontally by a factor of 4. • 4:2:0 sub-sample in both dimensions by a factor of 2. • 4:2:0 is often used in JPEG and MPEG.
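A minimal sketch of 4:2:0 sub-sampling (illustrative Python/NumPy, assuming the chroma plane has already been separated out; averaging each 2 × 2 neighbourhood is one common choice, simple decimation is another):

```python
import numpy as np

def subsample_420(chroma):
    """Reduce a chroma plane (H x W, both even) by a factor of 2 in each
    dimension by averaging every 2 x 2 neighbourhood."""
    H, W = chroma.shape
    blocks = chroma.reshape(H // 2, 2, W // 2, 2).astype(float)
    return blocks.mean(axis=(1, 3))          # (H/2) x (W/2) plane

def upsample_420(chroma_small):
    """Nearest-neighbour interpolation back to full resolution."""
    return np.repeat(np.repeat(chroma_small, 2, axis=0), 2, axis=1)
```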

  24. Chromatic sub-sampling (notation) • In the notation (e.g., 4:2:2), the first digit is the luma horizontal sampling reference, the second digit is the chroma horizontal sampling relative to it, and the third digit is either the same as the second digit, or 0, indicating that CB and CR are also vertically sub-sampled by a factor of 2.

  25. Example: a frame with pixel dimensions of 720 × 480:
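A worked version of this example (my own arithmetic, following the definitions above): the luma plane has 720 × 480 = 345,600 samples. With 4:2:0, each chroma plane is sub-sampled to 360 × 240 = 86,400 samples, so the frame carries 345,600 + 2 × 86,400 = 518,400 samples instead of the 3 × 345,600 = 1,036,800 samples of an unsubsampled (4:4:4) frame, i.e., half the data.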

  26. JPEG compression • JPEG stands for “Joint Photographic Experts Group”. • JPEG is commonly used to refer to a standard for compressing and encoding continuous-tone still images. • adjustable compression/quality • 4 modes of operation: • Sequential (line-by-line) (baseline implementation) • Progressive (blur-to-clear) • Lossless (pixel-for-pixel) • Hierarchical (multiple resolutions)

  27. JPEG (steps) • Pipeline: uncompressed picture → picture preparation → picture processing → quantization → entropy encoding → compressed picture. 1. Preparation • includes analog-to-digital conversion. The image can be separated into Y’CBCR components to facilitate sub-sampling of the chrominance components. The image is segmented into 8 × 8 blocks. 2. Processing • sophisticated algorithms, such as the transformation from the spatial domain to the frequency domain using the DCT.

  28. [figure]

  29. JPEG (steps) 3. Quantization • map real-number values from the previous step to integers. This process results in loss of precision, but achieves data compression. • A quantization table specifies the granularity of the mapping, allowing control of the precision carried in the compressed data. • Different levels of quantization are applied to the luminance and chrominance components, exploiting the sensitivity of human perception.
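A minimal sketch of this step (illustrative Python/NumPy; the function names are my own, and the simple ramp-shaped table below is only a toy stand-in for the quality-scaled tables JPEG encoders actually use):

```python
import numpy as np

# Toy quantization table: coarser steps for higher spatial frequencies.
u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
QTABLE = 16 + 4 * (u + v)      # 16 at DC, up to 72 at the highest frequency

def quantize(dct_block, qtable=QTABLE):
    """Map real-valued DCT coefficients to small integers (lossy)."""
    return np.rint(dct_block / qtable).astype(int)

def dequantize(q_block, qtable=QTABLE):
    """Approximate the original coefficients from the quantized integers."""
    return q_block * qtable
```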

  30. JPEG (steps) 4. Entropy encoding • It compresses the resulting data stream without loss. A zigzag scan linearizes the quantized coefficients of each block; predictive encoding is used for the DC components and RLE for the AC components; finally, a Huffman scheme encodes the data.
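A sketch of the zigzag linearization (illustrative Python; the ordering rule is the usual one of walking the anti-diagonals of the 8 × 8 block in alternating directions):

```python
def zigzag_order(n=8):
    """Return the (row, col) visiting order of an n x n block in zigzag scan."""
    order = []
    for s in range(2 * n - 1):                    # s = row + col on each anti-diagonal
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        order.extend(cells if s % 2 else cells[::-1])
    return order

def zigzag(block):
    """Linearize a block: DC coefficient first, then ACs of increasing frequency."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```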

  31. JPEG (schematic diagram) [figure: the Y’, CB and CR components passing through the JPEG pipeline]

  32. Image preparation • Each image consists of a number of components (e.g., RGB, Y’CBCR). • Divide each component into 8 × 8 blocks. • Each block is a “data unit” subject to DCT transformation. • The values in a block are shifted from unsigned integers with range [0, 2^p − 1] to signed integers with range [−2^(p−1), 2^(p−1) − 1]. • e.g., in 8-bit mode (p = 8), the range [0, 255] is shifted to [−128, 127].

  33. DCT (Discrete Cosine Transform) • An 8 × 8 image block is a 2D function f(x,y) (0 ≤ x, y ≤ 7) in the spatial domain. [figure: the block plotted over the x and y axes, each running 0–7]

  34. DCT (Discrete Cosine Transform) • We define 64 basis functions for frequency variables u, v (0 ≤ u, v ≤ 7) in a 2-dimensional space: • e.g., the (u,v)-th basis function is cos[(2x+1)uπ/16] · cos[(2y+1)vπ/16].
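A small sketch that builds these 64 basis functions (illustrative Python/NumPy; the array name is my own):

```python
import numpy as np

x = np.arange(8)
# basis[u, v] is the 8 x 8 wave cos((2x+1)u*pi/16) * cos((2y+1)v*pi/16)
basis = np.zeros((8, 8, 8, 8))
for u in range(8):
    for v in range(8):
        cu = np.cos((2 * x + 1) * u * np.pi / 16)   # variation along x
        cv = np.cos((2 * x + 1) * v * np.pi / 16)   # variation along y
        basis[u, v] = np.outer(cu, cv)
# basis[0, 0] is the flat wave; higher (u, v) oscillate more and more rapidly.
```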

  35. DCT (Discrete Cosine Transform) • These are wave functions of successively increasing frequencies. (Imagine them as undulating surfaces of increasingly frequent ups and downs.) • Given a 2D function (imagine it as a 2D surface), one can decompose it into a linear combination of these wave functions. • So, DCT is a frequency (uv coordinates) representation of a spatial (xy coordinates) function.

  36. A 1-D example

  37. Some 2-D basis functions [figure: basis functions for (u,v) = (0,0), (1,0), (0,1), (1,1), (2,2), (5,1), (6,3)]

  38. Some 2-D basis functions with quantized values [figure]

  39. DCT • The 64 (8 × 8) DCT basis functions (top view) are: [figure]

  40. DCT coefficients (example) [figure: an 8 × 8 block in x,y co-ordinates and its DCT coefficients after transformation in u,v co-ordinates]

  41. DCT • From the original spatial function f(x,y), the frequency components are extracted by multiplying f(x,y) with these basis functions and summing over x and y.

  42. DCT • The result is a function F(u,v) in frequency domain, 64 (8  8) coefficients representing the 64 frequency components of the original image function. • Of the 64 coefficients, F(0,0) is due to the basis function of u,v = 0, a flat wave function. F(0,0) is also known as the DC-coefficient. • The other coefficients are called the AC-coefficients.
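A from-the-definition sketch of the forward transform (illustrative Python/NumPy; it expects an 8 × 8 block of level-shifted pixel values as in slide 32, and C(u) is the usual normalization factor of the 8 × 8 DCT):

```python
import numpy as np

def dct2_8x8(block):
    """Compute F(u, v) for an 8 x 8 block of level-shifted pixel values."""
    f = np.asarray(block, dtype=float)
    x = np.arange(8)
    C = lambda w: 1 / np.sqrt(2) if w == 0 else 1.0
    F = np.zeros((8, 8))
    for u in range(8):
        for v in range(8):
            wave = np.outer(np.cos((2 * x + 1) * u * np.pi / 16),
                            np.cos((2 * x + 1) * v * np.pi / 16))
            F[u, v] = 0.25 * C(u) * C(v) * np.sum(f * wave)
    return F   # F[0, 0] is the DC coefficient; the rest are AC coefficients

# For a constant block, all the energy ends up in the DC coefficient:
print(np.round(dct2_8x8(np.full((8, 8), 10)), 2))   # only F[0, 0] is non-zero
```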

  43. DCT • The DC component determines the fundamental gray (color) intensity of the 8 × 8 pixels. The AC components add the intensity variation to the pixel values to give the original image function. • A typical image consists of large regions of a single intensity and color. DCT thus concentrates most of the signal in the lower spatial frequencies, and many of the high-frequency coefficients have very low values. Entropy encoding applied to the DCT coefficients therefore normally achieves high data reduction.

  44. IDCT • The inverse of DCT (IDCT) takes the 64 DCT coefficients and reconstructs a 64-point output image by summing the basis signals. • The result is a summation of all the frequency components, yielding a reconstruction of the original image. (Imagine adding up the respective undulating surfaces to yield the original surfaces.)

  45. [figure: for the “eye” block]

  46. DCT • A 1-D example to illustrate the decomposition and reconstruction: original samples 8 16 24 32 40 48 56 64 → DCT → coefficients 100 −52 0 −5 0 −2 0 0.4 → truncate → 100 −52 0 −5 0 −2 0 0 → IDCT → reconstructed samples 8 15 24 32 40 48 57 63
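A sketch of the same experiment (illustrative Python using SciPy; note that SciPy's orthonormal DCT scales the coefficients differently from the figures on the slide, but the effect of dropping the small coefficients is the same):

```python
import numpy as np
from scipy.fft import dct, idct

samples = np.array([8, 16, 24, 32, 40, 48, 56, 64], dtype=float)

coeffs = dct(samples, norm="ortho")          # decompose into frequency components
coeffs[np.abs(coeffs) < 3.0] = 0             # truncate the small coefficients
reconstructed = idct(coeffs, norm="ortho")   # rebuild the signal

print(np.round(reconstructed, 1))            # close to, but not exactly, the input
```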
