4C8 Dr. David Corrigan JPEG and the DCT
2D DCT DCT Basis Functions
Each band is the same size and there are 64 bands in total so the entropy is
Slow DCT • Sledgehammer implementation for the 8 point DCT • Each row multiply requires 8 multiplications and 7 additions • So all 8 rows require 64 multiplications and 56 additions. • The full 2D transform (8 row transforms followed by 8 column transforms) requires 1024 mults and 896 adds per 8x8 block!
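The operation counts above can be checked with a minimal sketch of the direct matrix method. It assumes the orthonormal DCT-II scaling for T, which the slides do not state explicitly; the names dct_matrix, slow_dct_1d and slow_dct_2d are illustrative only.

```python
import math

def dct_matrix(n=8):
    """DCT-II matrix T with orthonormal scaling (an assumption), so y = T x."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def slow_dct_1d(x, t):
    """Direct matrix-vector product. Returns (y, mults, adds)."""
    y, mults, adds = [], 0, 0
    for row in t:
        acc = row[0] * x[0]
        mults += 1
        for coeff, sample in zip(row[1:], x[1:]):
            acc += coeff * sample
            mults += 1
            adds += 1
        y.append(acc)
    return y, mults, adds

def slow_dct_2d(block, t):
    """Transform every row, then every column: 16 one-dimensional DCTs."""
    mults = adds = 0
    rows = []
    for r in block:
        y, m, a = slow_dct_1d(list(r), t)
        rows.append(y)
        mults += m
        adds += a
    cols = []
    for c in zip(*rows):
        y, m, a = slow_dct_1d(list(c), t)
        cols.append(y)
        mults += m
        adds += a
    # cols holds the transformed columns; transpose back to row-major order
    out = [list(r) for r in zip(*cols)]
    return out, mults, adds
```

Running slow_dct_2d on any 8x8 block reports 1024 multiplications and 896 additions, which is the cost the fast methods aim to reduce.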
Fast DCT • Exploit Symmetry • So split Matrix T into two parts...
Fast DCT • split Matrix T into two parts, change y...
Fast DCT • 8 adds/subtractions to calculate the RHS vectors, plus 2 x 16 multiplications and 2 x 12 additions for the matrix multiplications. • Total: 32 adds and 32 multiplications. • Compare with 56 adds and 64 mults from before.
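A minimal sketch of this first symmetry split, assuming the orthonormal DCT-II matrix: even-numbered rows of T are symmetric about the centre and odd-numbered rows are antisymmetric, so each output needs only the left half of its row applied to a 4-element sum or difference vector. The function names are illustrative, not taken from the slides.

```python
import math

def dct_matrix(n=8):
    """DCT-II matrix with orthonormal scaling (an assumed normalisation)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def direct_dct(x, t):
    """Reference result: full 8x8 matrix-vector product."""
    return [sum(c * s for c, s in zip(row, x)) for row in t]

def split_dct(x, t):
    """8-point DCT using the even/odd row symmetry of T."""
    s = [x[i] + x[7 - i] for i in range(4)]  # 4 additions
    d = [x[i] - x[7 - i] for i in range(4)]  # 4 subtractions
    y = []
    for k in range(8):
        v = s if k % 2 == 0 else d           # even rows use sums, odd rows differences
        # one row of a 4x4 product: 4 mults and 3 adds in the slide's counting
        y.append(sum(t[k][i] * v[i] for i in range(4)))
    return y
```

The two 4x4 products cost 2 x 16 mults and 2 x 12 adds, and the sum and difference vectors cost 8 adds/subtractions, matching the totals above.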
Fast DCT This sub-matrix can be simplified with symmetry again!
Fast DCT • So the 16 mults and 12 adds for the 4 x 4 matrix multiplication can be replaced with 4 adds/subs to calculate the RHS vectors and 2 x 4 mults and 2 x 2 adds to do the matrix multiplications. • In total that requires 8 mults and 8 adds for this operation, and a further 8 adds/subs and 16 mults and 12 adds from before. • That is 28 adds and 24 mults in total.
Fast DCT • We can rewrite this operation so that the 4 mults and 2 adds are replaced with 2 mults and 2 adds. • So in total we have 2 adds and 2 mults, plus 2 adds and 4 mults for the non-symmetric 2 x 2 matrix mult, plus a further 4 adds, plus 12 adds and 16 mults for the non-symmetric 4 x 4 matrix mult, plus a further 8 adds. • That is 28 adds and 22 mults in total.
JPEG and Colour Images • JPEG uses the YCbCr colour space. • The chrominance channels are usually downsampled. • There are 3 commonly used modes • 4:4:4 – no chrominance subsampling • 4:2:2 – Every 2nd column in the chrominance channels is dropped. • 4:2:0 – Every 2nd column and every 2nd row are dropped. • The DCT is applied separately on each channel.
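The three modes can be sketched as plain sample dropping on a chrominance channel stored as a list of rows. This is a simplification: practical encoders often low-pass filter or average before dropping samples, and the function name subsample_chroma is hypothetical.

```python
def subsample_chroma(channel, mode):
    """Drop chrominance samples according to the named mode.

    channel is a list of equal-length rows; no filtering is applied.
    """
    if mode == "4:4:4":
        return [row[:] for row in channel]         # keep everything
    if mode == "4:2:2":
        return [row[::2] for row in channel]       # drop every 2nd column
    if mode == "4:2:0":
        return [row[::2] for row in channel[::2]]  # drop every 2nd column and row
    raise ValueError("unknown subsampling mode: " + mode)
```

For a 4x4 channel, 4:2:2 leaves 4 rows of 2 samples and 4:2:0 leaves 2 rows of 2 samples, so the two chrominance channels together shrink to one half and one quarter of their original size respectively.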
Subjectively Weighted Quantisation • In JPEG it is standard to apply different thresholds to different bands
Subjectively Weighted Quantisation • These values are obtained by perceptual tests. • A user is asked to view an image of a particular size at a specified distance from the screen. • The distance is usually expressed as a proportion of the screen height. • The user is presented with an image and is asked to increase the gain of a given band until he/she just notices a difference in the image. • Note: typically a flat grey image is used to avoid masking effects caused by edges and texture. • The set of these just-noticeable thresholds forms the quantisation matrix.
Subjectively Weighted Quantisation • Lower Frequency Bands are assigned lower step sizes. • There is a slight drop off in step size from the DC coefficient to low frequency coefficients. • The step sizes for the chrominance channels increase faster than for luminance.
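As an illustration of band-dependent step sizes, the sketch below uses the example luminance table from Annex K of the JPEG standard. That table is an assumption here: it may not be the exact Qlum used on these slides. The helper names quantise and dequantise are illustrative.

```python
import math

# Example luminance quantisation table from Annex K of the JPEG standard
# (assumed; the Qlum used in the lecture may differ).
Q_LUM = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantise(coeffs, q=Q_LUM):
    """Divide each DCT coefficient by its band's step size and round to nearest."""
    return [[int(math.floor(c / s + 0.5)) for c, s in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, q)]

def dequantise(levels, q=Q_LUM):
    """Decoder side: multiply each level by its band's step size."""
    return [[l * s for l, s in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, q)]
```

In this table the step size drops slightly from the DC band (16) to its nearest neighbours (10 to 12) and then grows towards the high-frequency corner, so small high-frequency coefficients are quantised to zero.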
Comparing Different Quantisations [Figure: uncompressed image beside JPEG with Qstep = Qlum]
Comparing Different Quantisations [Figure: Qstep = Qlum, PSNR = 32.9 dB]
Comparing Different Quantisations [Figure: uncompressed image beside JPEG with Qstep = 2 * Qlum, PSNR = 30.6 dB]
Comparing Different Quantisations [Figure: Qstep = 15, PSNR = 37.6 dB, beside Qstep = Qlum]
Comparing Different Quantisations [Figure: Qstep = 30, PSNR = 33.4 dB, beside Qstep = Qlum]
Comparing Different Quantisations • PSNR indicates better quality for Qstep = 30 (PSNR = 33.4 dB) than for Qstep = Qlum (PSNR = 32.9 dB), but this is clearly not true from a subjective analysis.
Comparing Different Quantisations • Using the subjectively weighted quantisation instead of a fixed quantisation step size achieves higher levels of compression for equivalent levels of quality.
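For reference, the PSNR values quoted in these comparisons are presumably computed along the following lines. This sketch assumes 8-bit images (peak value 255) stored as lists of rows; the function name psnr is illustrative.

```python
import math

def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    sq_errors = [(a - b) ** 2
                 for row_a, row_b in zip(original, decoded)
                 for a, b in zip(row_a, row_b)]
    mse = sum(sq_errors) / len(sq_errors)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

Because PSNR weights every pixel error equally, it takes no account of which frequency bands the error falls in, which is why it can rank a fixed step size above the subjectively weighted matrix.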
Challenges of JPEG Coding • Minimise average codeword length • RLC to encode the zeros. • We must take advantage of spatial and inter-band correlations. • We need to consider how we order the data • Minimise the coding overhead • Minimise the size of the Huffman code table • We need to reduce the number of symbols we encode • This can affect optimality • Correct for Synchronisation Errors
JPEG Coding • The most obvious way might seem to be to code each band separately • i.e. Huffman with RLC like we suggested with the Haar Transform. • We could get close to the entropy • This is not the way it is coded because • It would require 64 different codes. High cost in computation and storage of codebooks. • It ignores the fact that the zero coefficients occur at the same positions in multiple bands.
JPEG Coding • Instead we code each block separately • A block contains 64 coefficients, one from each band. • Each block contains 1 DC coefficient (from the top left band) and 63 AC coefficients • Two codebooks are used in total for all the blocks, one for the DC coefficients and the other for the AC coefficients. • At the end of each Block we insert an End Of Block (EOB) symbol in the datastream
Data Ordering • Each block is an 8x8 grid of coeffs • A Zig-Zag scan converts them into a 1D stream. • As most non-zero values occur in the top left corner, using a Zig-Zag scan maximises the lengths of the zero runs and so improves the efficiency of RLC
Zig-Zag Scan Example • Non-zero values are at the top left corner of the block • The Zig-Zag scan concentrates the non-zero coefficients at the start of the stream • -13, -3, 6, 0, 0, 2, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 36 more zeros, the end [Figure: Typical DCT Block Coefficients]
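The scan order can be generated by walking the anti-diagonals of the block and alternating direction, as in this minimal sketch. The function names are illustrative; the order produced is intended to match the standard JPEG zig-zag pattern starting (0,0), (0,1), (1,0), (2,0), (1,1), (0,2).

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # diag runs from top-right to bottom-left; even diagonals are
        # traversed the other way, from bottom-left to top-right
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order

def zigzag_scan(block):
    """Flatten a square block of coefficients into a 1D zig-zag stream."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

Applied to a quantised block whose non-zero values sit near the top left corner, zigzag_scan places them at the start of the stream and leaves the long zero runs at the end.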
Coding the DC Coefficients Differential Coding
Coding the DC Coefficients • The DC value that is coded is actually the difference between the DC coefficients of the current and previous blocks [Figure: Typical DCT Block Coefficients]
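Differential coding of the DC coefficients can be sketched as below. The predictor for the first block is assumed to be 0, and the function names are illustrative.

```python
def dc_differences(dc_values, initial=0):
    """Replace each block's DC coefficient by its difference from the previous one."""
    diffs = []
    prev = initial  # assumed predictor for the first block
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def dc_reconstruct(diffs, initial=0):
    """Decoder side: accumulate the differences to recover the DC values."""
    values = []
    prev = initial
    for d in diffs:
        prev += d
        values.append(prev)
    return values
```

For example, DC values 50, 37, 40 in consecutive blocks become 50, -13, 3. Neighbouring blocks tend to have similar mean levels, so the differences are usually much smaller than the DC values themselves.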
Coding DC Coefficients • There is potentially a large number of levels to encode. • Up to 4096 depending on the quantisation step size. • We break down the symbol value into a size/index pair
Coding DC Coefficients • So if the DC value is -13 • The size is 4 • The index is 0010 • In JPEG only the size is encoded using Huffman • The index is uncoded; efficiency is not dramatically affected. • Only 12 codes are required in the Huffman table • Table size is 16 + 12 = 28 bytes
If value > 0, index = value, expressed as a binary number. If value < 0, index = value + 2^size - 1, expressed as a binary number. So if value = -27 then size = 5 and index = -27 + 31 = 4, which is 00100. If value = 32 then size = 6 and index = 100000. The number of bits in the index value is always equal to the value of size.
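The size/index split can be sketched as follows. It assumes the usual JPEG convention: size is the number of bits needed for the magnitude, and a negative value is offset by 2^size - 1 before being written in binary. The function name size_and_index is illustrative.

```python
def size_and_index(value):
    """Split a coefficient into (size, index bits as a string)."""
    if value == 0:
        return 0, ""                       # zero needs no index bits
    size = abs(value).bit_length()         # bits needed for the magnitude
    if value > 0:
        index = value
    else:
        index = value + (1 << size) - 1    # offset used for negative values
    return size, format(index, "0{}b".format(size))
```

With this convention size_and_index(-13) gives (4, "0010") and size_and_index(32) gives (6, "100000"). A leading 1 in the index marks a positive value and a leading 0 a negative one, so the sign needs no separate bit.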
Coding the AC Coefficients • 40010, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end • The leading 40010 is the size/index pair for the DC coefficient • The length of the run and the value of the coeff after it are strongly correlated • The block usually ends with a long run of zeros [Figure: Typical DCT Block Coefficients]
Coding the AC Coefficients • Run/Size Correlations • High coeffs follow short runs and low coeffs follow long runs • Final run of zeros • These don’t need to be coded • Just tell the decoder that there are no more non-zero coefficients and move onto the next block.
Symbols • Run/Coefficient Symbols • e.g. 0, 0, 9 is a run of 2 zeros followed by a 9 • However, we represent 9 using the size/index format from the DC coeffs • 9 has a size of 4 and an index of 1001 • So we code the run/size pair (2,4) and the index 1001 is appended to the stream
Symbols • Run/Size Symbols • All possible combinations of runs from 0->15 and size from 1->10 • 160 total symbols • Huffman Codes are used for each symbol • Index values are not coded further
Special Symbols • ZRL • Used to represent a run of 16 zeros • Used when the run of zeros is greater than 15 • E.g. 17 zeros followed by 14 is coded as (ZRL) (1,4) 1110 • EOB • Inserted when a block ends with a run of zeros • In total there are 160 run/size symbols and 2 special symbols • 162 symbols to encode • Code table is 16 + 162 = 178 bytes
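Putting the run/size symbols and the two special symbols together, the AC part of one block could be turned into symbols roughly as follows. This is a sketch only: the Huffman coding of the symbols is omitted, the negative-value index convention is the assumed JPEG one, and the names ac_symbols and size_and_index are illustrative.

```python
def size_and_index(value):
    """Split a non-zero coefficient into (size, index bits as a string)."""
    size = abs(value).bit_length()
    index = value if value > 0 else value + (1 << size) - 1
    return size, format(index, "0{}b".format(size))

def ac_symbols(ac_coeffs):
    """Convert zig-zag ordered AC coefficients into (symbol, index bits) pairs.

    A symbol is a (run, size) tuple, "ZRL" (16 zeros) or "EOB".
    """
    out = []
    run = 0
    for v in ac_coeffs:
        if v == 0:
            run += 1
            continue
        while run > 15:              # runs longer than 15 need ZRL symbols
            out.append(("ZRL", ""))
            run -= 16
        size, index = size_and_index(v)
        out.append(((run, size), index))
        run = 0
    if run > 0:                      # block ends with zeros: EOB replaces them
        out.append(("EOB", ""))
    return out
```

For the sequence of 17 zeros followed by 14 this yields ZRL, then (1,4) with index 1110, as in the example above. Only the symbols would be Huffman coded; the index bits are appended to the stream unchanged.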
Coding Example -13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end DC Coefficient is -13. The size is 4 and the index is 0010 Typical DCT Block Coefficients Current Stream State: 40010
Coding Example -13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end The first AC value is -3. That is a run of 0 zeros followed by -3. -3 has size 2 and index 00. Therefore the run/size pair is (0,2) Current Stream State: 40010 (0,2) 00
Coding Example -13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end The next ac value is 6. That is a run of 0 zeros followed by 6. 6 has size 3 and index 110 Therefore the run/size pair is (0,3) Current Stream State: 40010 (0,2) 00 (0,3) 110
Coding Example -13, -3, 6, 2 zeros, 2, 3 zeros, -1, 17 zeros, 1, 36 more zeros, the end The next AC value to encode is a run of 2 zeros followed by an AC coefficient of 2. 2 has size 2 and index 10. Therefore the run/size pair is (2,2) Current Stream State: 40010 (0,2) 00 (0,3) 110 (2,2) 10