
Presentation Transcript


  1. exercise in the previous class
  • binary Huffman code?
  • average codeword length?  0.363×1 + 0.174×3 + ... + 0.021×5 = 2.660

  symbol  prob.   codeword
  A       0.363   0
  B       0.174   100
  C       0.143   110
  D       0.098   1010
  E       0.087   1011
  F       0.069   1110
  G       0.045   11110
  H       0.021   11111
  (internal-node probabilities of the code tree: 0.066, 0.135, 0.185, 0.278, 0.359, 0.637, 1.000)
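A minimal Python sketch of the binary Huffman construction used above (the function name and structure are mine, not from the lecture): it repeatedly joins the two least-probable trees and prepends one bit to every codeword inside each joined tree.

```python
import heapq

def huffman_code(prob):
    """Binary Huffman code for a dict {symbol: probability}: repeatedly join the two
    least-probable trees, prepending one bit to every codeword inside each tree."""
    heap = [(p, [s], {s: ""}) for s, p in prob.items()]   # (tree prob, symbols, partial codewords)
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, syms0, code0 = heapq.heappop(heap)
        p1, syms1, code1 = heapq.heappop(heap)
        merged = {s: "0" + code0[s] for s in syms0}
        merged.update({s: "1" + code1[s] for s in syms1})
        heapq.heappush(heap, (p0 + p1, syms0 + syms1, merged))
    return heap[0][2]

prob = {"A": 0.363, "B": 0.174, "C": 0.143, "D": 0.098,
        "E": 0.087, "F": 0.069, "G": 0.045, "H": 0.021}
code = huffman_code(prob)
acl = sum(prob[s] * len(code[s]) for s in prob)
print(code)          # codeword lengths 1, 3, 3, 4, 4, 4, 5, 5 as in the table above (bit labels may differ)
print("ACL =", acl)  # ≈ 2.660
```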

  2. exercise in the previous class
  • 4-ary Huffman code?
  [basic idea] join four trees
  • we may have #trees < 4 in the final round.
  • with one "join", 4 – 1 = 3 trees disappear.
  • add dummy nodes, and start with 3k + 1 nodes.

  symbol  prob.   codeword (code symbols a, b, c, d)
  A       0.363   a
  B       0.174   b
  C       0.143   c
  D       0.098   da
  E       0.087   db
  F       0.069   dc
  G       0.045   dda
  H       0.021   ddb
  (two dummy symbols of probability 0 complete the first join; internal-node probabilities 0.066, 0.320, 1.000)
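For the dummy-node trick, a tiny sketch (hypothetical helper, not from the lecture) computing how many probability-0 dummy symbols are needed so that an r-ary construction ends with exactly one tree:

```python
def num_dummy_symbols(num_symbols, arity):
    """Each join removes (arity - 1) trees, so the padded symbol count must be
    congruent to 1 modulo (arity - 1)."""
    remainder = (num_symbols - 1) % (arity - 1)
    return 0 if remainder == 0 else (arity - 1) - remainder

print(num_dummy_symbols(8, 4))   # 2 -> start with 10 = 3*3 + 1 nodes, as on this slide
print(num_dummy_symbols(7, 4))   # 0 -> 7 = 3*2 + 1 already works
```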

  3. today's class
  • basic properties needed for source coding
    • uniquely decodable
    • immediately decodable
  • Huffman code
    • construction of Huffman code
    • extensions of Huffman code  ← today
  • theoretical limit of the "compression"  ← today
  • related topics  ← today

  4. today's class (detail)
  • Huffman codes are good, but how good are they?
    • Huffman codes for extended information sources ... possible means (手段) to improve the efficiency
    • Shannon's source coding theorem ... the theoretical limit of efficiency
  • some more variations of Huffman codes
    • blocks of symbols with variable block length

  5. how should we evaluate Huffman codes?
  good code
  • immediately decodable ... "use code trees"
  • small average codeword length (ACL)
  It seems that Huffman's algorithm gives a good solution.
  To see that Huffman codes are really good, we discuss a mathematical limit of the ACL
  • ... under a certain assumption (up to slide 11)
  • ... in the general case (Shannon's theorem)

  6. theoretical limit under an assumption
  assumption
  • the encoding is done in a symbol-by-symbol manner
  • define one codeword for each symbol of the source S
  • S produces M symbols with probabilities p1, ..., pM
  Lemma (restricted Shannon's theorem):
  • for any code, the ACL ≥ H1(S)
  • a code with ACL ≤ H1(S) + 1 is constructible
  H1(S) is the borderline between "possible" and "impossible".

  7. Shannon's lemma (bad naming...)
  To prove the restricted Shannon's theorem, a small technical lemma (Shannon's lemma) is needed.
  Shannon's lemma (シャノンの補助定理):
  For any non-negative numbers q1, ..., qM with q1 + ... + qM ≤ 1,
      –Σi pi log2 pi ≤ –Σi pi log2 qi,
  with equality if and only if pi = qi for all i.
  reminder: p1, ..., pM are the symbol probabilities, so p1 + ... + pM = 1.

  8. proof (sketch)
  left-hand side – right-hand side
      = –Σi pi log2 pi + Σi pi log2 qi
      = (1 / loge 2) Σi pi loge(qi / pi)
      ≤ (1 / loge 2) Σi pi (qi / pi – 1)      (since loge x ≤ x – 1, i.e. –loge x ≥ 1 – x)
      = (1 / loge 2) (Σi qi – Σi pi) ≤ 0
  (figure: the curves y = –loge x and y = 1 – x touch only at x = 1)
  the equality holds iff qi / pi = 1 for every i

  9. proof of the restricted Shannon's theorem: 1
  for any code, the average codeword length ≥ H1(S)
  Let l1, ..., lM be the lengths of the codewords, and define qi = 2^(–li).
  • Kraft: q1 + ... + qM = Σi 2^(–li) ≤ 1
  • Shannon's Lemma: H1(S) = –Σi pi log2 pi ≤ –Σi pi log2 qi = Σi pi li
  • the ACL: L = Σi pi li ≥ H1(S)
  • We have shown that L ≥ H1(S).

  10. proof of the restricted Shannon's theorem: 2
  a code with average codeword length ≤ H1(S) + 1 is constructible
  Choose integers l1, ..., lM so that –log2 pi ≤ li < –log2 pi + 1, i.e. li = ⌈–log2 pi⌉.
  • The choice makes 2^(–li) ≤ pi, and Σi 2^(–li) ≤ Σi pi = 1 ... Kraft's inequality
  • We can construct a code with codeword lengths l1, ..., lM, whose ACL is
    L = Σi pi li < Σi pi (–log2 pi + 1) = H1(S) + 1.
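A small sketch of this construction (the helper name is mine): pick li = ⌈–log2 pi⌉, check Kraft's inequality, and compare the ACL with H1(S) and H1(S) + 1, here on the slide-1 probabilities.

```python
import math

def shannon_code_lengths(probs):
    """Codeword lengths l_i = ceil(-log2 p_i), as in the constructive proof."""
    return [math.ceil(-math.log2(p)) for p in probs]

probs = [0.363, 0.174, 0.143, 0.098, 0.087, 0.069, 0.045, 0.021]
lengths = shannon_code_lengths(probs)

kraft = sum(2.0 ** -l for l in lengths)           # must be <= 1 for a prefix code to exist
acl = sum(p * l for p, l in zip(probs, lengths))  # average codeword length
h1 = -sum(p * math.log2(p) for p in probs)        # first-order entropy H1(S)

print("lengths:", lengths)
print("Kraft sum:", kraft)   # <= 1
print("H1(S) =", h1)
print("ACL  =", acl)         # H1(S) <= ACL < H1(S) + 1
assert kraft <= 1 and h1 <= acl < h1 + 1
```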

  11. the lemma and the Huffman code
  Lemma (restricted Shannon's theorem):
  • for any code, the ACL ≥ H1(S)
  • a code with ACL ≤ H1(S) + 1 is constructible
  We can show that, for a Huffman code with ACL L,
  • L ≤ H1(S) + 1
  • there is no symbol-by-symbol code whose ACL is smaller than L.
  proof ... by induction on the size of code trees
  • A Huffman code is said to be a compact code.

  12. coding for extended information sources
  symbol   prob.   C1   C2
  A        0.8     0    1
  B        0.2     1    0
  average codeword length: C1: 1.0, C2: 1.0
  The Huffman code is the best symbol-by-symbol code, but...
  • the ACL ≥ 1
  • not good for encoding binary information sources
  If we encode several symbols in a block, then...
  • the ACL per symbol can be < 1
  • good for binary sources also

  13. block Huffman coding
  message ABCBCBBCAA...  →  "block" operation  →  blocked message  →  Huffman encoding  →  codewords 01 10 001 1101...
  kinds of "block" operation:
  • fixed-length (equal-, constant-length blocks)
  • variable-length (unequal-length blocks)
    • block partition
    • run-length

  14. fixed-length block Huffman coding
  symbol  prob.  codeword
  A       0.6    0
  B       0.3    10
  C       0.1    11
  • ACL: 0.6×1 + 0.3×2 + 0.1×2 = 1.4 bit for one symbol

  blocks with two symbols
  block  prob.  codeword
  AA     0.36   0
  AB     0.18   100
  AC     0.06   1100
  BA     0.18   101
  BB     0.09   1110
  BC     0.03   11110
  CA     0.06   1101
  CB     0.03   111110
  CC     0.01   111111
  • ACL: 0.36×1 + ... + 0.01×6 = 2.67 bit, but this is for two symbols
  • 2.67 / 2 = 1.335 bit for one symbol ... improved!

  15. block coding for binary sources
  symbol  prob.  codeword
  A       0.8    0
  B       0.2    1
  • ACL: 0.8×1 + 0.2×1 = 1.0 bit for one symbol

  blocks with two symbols
  block  prob.  codeword
  AA     0.64   0
  AB     0.16   10
  BA     0.16   110
  BB     0.04   111
  • ACL: 0.64×1 + ... + 0.04×3 = 1.56 bit for two symbols
  • 1.56 / 2 = 0.78 bit for one symbol ... improved!

  16. blocks with three symbols
  block  prob.   codeword
  AAA    0.512   0
  AAB    0.128   100
  ABA    0.128   101
  ABB    0.032   11100
  BAA    0.128   110
  BAB    0.032   11101
  BBA    0.032   11110
  BBB    0.008   11111
  • ACL: 0.512×1 + ... + 0.008×5 = 2.184 bit for three symbols
  • 2.184 / 3 = 0.728 bit for one symbol

  block size       1     2     3     ...
  ACL per symbol   1.0   0.78  0.728 ...
  larger block size → more compact
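A runnable sketch of this block-size experiment (function names are mine, not from the lecture): build the n-th order extension of the source P(A) = 0.8, P(B) = 0.2, run the Huffman construction on the block probabilities, and print the ACL per source symbol.

```python
import heapq, itertools, math

def huffman_lengths(probs):
    """Binary Huffman codeword lengths: repeatedly join the two least-probable trees."""
    heap = [(q, [i]) for i, q in enumerate(probs)]   # (tree probability, leaves in the tree)
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        q1, leaves1 = heapq.heappop(heap)
        q2, leaves2 = heapq.heappop(heap)
        for i in leaves1 + leaves2:                  # every leaf of the joined tree gets one bit deeper
            lengths[i] += 1
        heapq.heappush(heap, (q1 + q2, leaves1 + leaves2))
    return lengths

p = {"A": 0.8, "B": 0.2}                             # the binary source of slides 15 and 16
for n in range(1, 6):                                # block size = order of the extension S^n
    blocks = ["".join(t) for t in itertools.product("AB", repeat=n)]
    probs = [math.prod(p[s] for s in b) for b in blocks]
    acl = sum(q * l for q, l in zip(probs, huffman_lengths(probs)))
    print(f"block size {n}: ACL per symbol = {acl / n:.3f}")
# prints 1.0, 0.78, 0.728, ... and approaches H(S) = 0.8*log2(1/0.8) + 0.2*log2(1/0.2) ≈ 0.722
```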

  17. block code and extension of information source
  What happens if we increase the block length further? Observe that...
  • a block code defines a codeword for each block pattern.
  • one block = a sequence of n symbols of S = one symbol of Sn, the n-th order extension of S
  ⇒ the restricted Shannon's theorem is applicable: H1(Sn) ≤ Ln < H1(Sn) + 1
  • Ln = the ACL for n symbols
  • for one symbol of S:  H1(Sn)/n ≤ Ln/n < H1(Sn)/n + 1/n

  18. Shannon's source coding theorem
  • H1(Sn) / n ... the n-th order entropy of S (→ Apr. 12)
  • If n goes to infinity, H1(Sn)/n converges to the entropy H(S) of S, and so does Ln/n.
  Shannon's source coding theorem:
  • for any code, the ACL ≥ H(S)
  • for any ε > 0, a code with ACL ≤ H(S) + ε is constructible
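In formulas, the limiting argument sketched above (notation as on slides 17 and 18):

```latex
\frac{H_1(S^n)}{n} \;\le\; \frac{L_n}{n} \;<\; \frac{H_1(S^n)}{n} + \frac{1}{n},
\qquad
\lim_{n\to\infty}\frac{H_1(S^n)}{n} = H(S)
\quad\Longrightarrow\quad
\lim_{n\to\infty}\frac{L_n}{n} = H(S).
```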

  19. what the theorem means
  Shannon's source coding theorem:
  • for any code, the ACL ≥ H(S)
  • a code with ACL ≤ H(S) + ε is constructible
  Use block Huffman codes, and you can approach the limit.
  You can never beat the limit, however.

  example: P(A) = 0.8, P(B) = 0.2, so H(S) ≈ 0.722
  block size       1     2     3     ...
  ACL per symbol   1.0   0.78  0.728 ...  → 0.722 + ε

  20. remark 1
  Why do block codes give a smaller ACL?
  • fact 1: the ACL is minimized by a real-number solution
    • if P(A) = 0.8, P(B) = 0.2, then we want l1 and l2 minimizing 0.8·l1 + 0.2·l2
      s.t. 2^(–l1) + 2^(–l2) ≤ 1
    • the real-number optimum is l1 = –log2 0.8 ≈ 0.32 and l2 = –log2 0.2 ≈ 2.32
  • fact 2: the length of a codeword must be an integer
    • with l1 and l2 integers, the best we can do is l1 = 1 (... loss!) and l2 = 1 (... gain!)
  frequent loss, seldom gain...

  21. remark 1 (cnt'd)
  • the gap between the ideal and the real codeword lengths: li – (–log2 pi)
    ... li is an integer approximation of the ideal length –log2 pi
  • the gap is weighted by the probability... the weighted gap is pi·(li + log2 pi)
  long block → many symbols → small probabilities → small weighted gaps → close to the ideal ACL
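A small numerical check of this remark (the helper name is mine; the codeword lengths are the ones quoted on slide 15):

```python
import math

def weighted_gaps(probs, lengths):
    """Per-pattern weighted gap p_i * (l_i - ideal_i), with ideal_i = -log2 p_i,
    and the total gap, which equals ACL - H1 of the block distribution."""
    gaps = [p * (l + math.log2(p)) for p, l in zip(probs, lengths)]
    return gaps, sum(gaps)

# block size 1 (slide 15): A -> 0, B -> 1
g1, total1 = weighted_gaps([0.8, 0.2], [1, 1])
# block size 2 (slide 15): AA -> 0, AB -> 10, BA -> 110, BB -> 111
g2, total2 = weighted_gaps([0.64, 0.16, 0.16, 0.04], [1, 2, 3, 3])

print("block size 1: weighted gaps =", g1, " total =", total1)  # total ≈ 0.278 = ACL - H1(S)
print("block size 2: weighted gaps =", g2, " total =", total2)  # total ≈ 0.116 = ACL - H1(S^2)
print("gap per source symbol:", total1 / 1, "vs", total2 / 2)   # the gap shrinks as blocks grow
```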

  22. today's class (detail)
  • Huffman codes are good, but how good are they?
    • Huffman codes for extended information sources ... possible means (手段) to improve the efficiency
    • Shannon's source coding theorem ... the theoretical limit of efficiency
  • some more variations of Huffman codes
    • blocks of symbols with variable block length

  23. practical issues (問題) of block coding
  • Theoretically speaking, block Huffman codes are the best.
  • From a practical viewpoint, there are several problems:
    • We need to know the probability distribution in advance. (this will be discussed in the next class)
    • We need a large table for encoding/decoding.
      • if one byte is needed to record one entry of the table...
      • 256-byte table, if block length = 8
      • 64-Kbyte table, if block length = 16
      • 4-Gbyte table, if block length = 32

  24. use blocks with variable length
  If we define blocks so that they all have the same length, then ...
  • some blocks have small probabilities
  • those blocks also need codewords
    block     AAA    AAB    ABA    ABB     BAA    BAB     BBA     BBB
    prob.     0.512  0.128  0.128  0.032   0.128  0.032   0.032   0.008
    codeword  0      100    101    11100   110    11101   11110   11111
  If we define blocks so that they have similar probabilities, then ...
  • the length differs from block to block
  • the table has few useless blocks
    block     AAA    AAB    AB     B
    prob.     0.512  0.128  0.16   0.2
    codeword  0      100    101    11

  25. definition of block patterns
  Block patterns must be defined so that the patterns can represent (almost) all symbol sequences.
  • bad example: block patterns = {AAA, AAB, AB}
    • AABABAAB → AAB AB AAB ... OK
    • AABBBAAB → AAB ? ... the next "B" cannot start any pattern
  two different approaches are well known:
  • block partition approach
  • run-length approach

  26. define patterns with the block partition approach
  1. prepare all blocks of length one
  2. partition the block with the largest probability by appending one more symbol
  3. go to 2
  Example: P(A) = 0.8, P(B) = 0.2
    {A: 0.8, B: 0.2} → split A → {AA: 0.64, AB: 0.16, B: 0.2} → split AA →
    block     AAA    AAB    AB     B
    prob.     0.512  0.128  0.16   0.2
    codeword  0      100    101    11

  27. how good is this?
  To determine the average codeword length, assume that n blocks are produced from S:
    block     AAA    AAB    AB     B
    prob.     0.512  0.128  0.16   0.2
    codeword  0      100    101    11
    S → AAA AB AAA B AB ...  → encode →  0 101 0 11 101 ...
  • 0.512n×1 + 0.128n×3 + ... = 1.776n bits
  • 0.512n×3 + 0.128n×3 + ... = 2.44n symbols
  2.44n symbols are encoded into 1.776n bits
  ⇒ the average codeword length is 1.776n / 2.44n = 0.728 bit per symbol
  (almost the same as the fixed-length block code of slide 16, but with a much smaller table)
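A sketch combining the construction of slide 26 with the evaluation of slide 27 (function names are mine; the Huffman-length helper is repeated so the sketch stays self-contained):

```python
import heapq

def huffman_lengths(probs):
    """Binary Huffman codeword lengths (join the two least-probable trees repeatedly)."""
    heap = [(q, [i]) for i, q in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        q1, a = heapq.heappop(heap)
        q2, b = heapq.heappop(heap)
        for i in a + b:
            lengths[i] += 1
        heapq.heappush(heap, (q1 + q2, a + b))
    return lengths

def block_partition_patterns(p, num_patterns):
    """Slide 26: start from the length-one blocks and repeatedly split the most
    probable pattern by appending every source symbol to it."""
    patterns = dict(p)
    while len(patterns) < num_patterns:
        best = max(patterns, key=patterns.get)   # most probable pattern
        q = patterns.pop(best)
        for s, ps in p.items():                  # split it into |alphabet| longer patterns
            patterns[best + s] = q * ps
    return patterns

p = {"A": 0.8, "B": 0.2}
patterns = block_partition_patterns(p, 4)        # patterns AAA, AAB, AB, B with probs 0.512, 0.128, 0.16, 0.2
blocks = list(patterns)
probs = [patterns[b] for b in blocks]
lengths = huffman_lengths(probs)

bits_per_block = sum(q * l for q, l in zip(probs, lengths))
symbols_per_block = sum(q * len(b) for q, b in zip(probs, blocks))
print(blocks, lengths)
print("ACL per symbol:", bits_per_block / symbols_per_block)   # ≈ 0.728, as on slide 27
```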

  28. define patterns with the run-length approach
  run = a sequence of consecutive (連続の) identical symbols
  Example: divide a message into runs of "A" (each run terminated by a "B"):
    A B | B | A A A A A B | A A A B
    runs of length 1, 0, 5, 3
  The message is reconstructible if the lengths of the runs are given.
  ⇒ define blocks as runs of various lengths

  29. upper-bound the run-length
  small problem? ... there can be a very long run
  ⇒ put an upper-bound limit: run-length limited (RLL) coding
  upper-bound = 3
  • ABBAAAAABAAAB is represented as
    • one "A" followed by B
    • zero "A"s followed by B
    • three or more "A"s
    • two "A"s followed by B
    • three or more "A"s
    • zero "A"s followed by B
  run length      0  1  2  3    4    5    6      7      ...
  representation  0  1  2  3+0  3+1  3+2  3+3+0  3+3+1  ...

  30. run-length Huffman code
  • a Huffman code defined to encode the lengths of runs
  • effective when there is a strong bias in the symbol probabilities
  p(A) = 0.9, p(B) = 0.1
  run length     0    1     2      3 or more
  block pattern  B    AB    AAB    AAA
  prob.          0.1  0.09  0.081  0.729
  codeword       10   110   111    0
  • ABBAAAAABAAAB:  1, 0, 3+, 2, 3+, 0  ⇒  110 10 0 111 0 10
  • AAAABAAAAABAAB: 3+, 1, 3+, 2, 2     ⇒  0 110 0 111 111
  • AAABAAAAAAAAB:  3+, 0, 3+, 3+, 2    ⇒  0 10 0 0 111
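A small sketch of the parsing and encoding (helper names are mine): runs of "A" are cut at the upper bound, and each block is encoded with the table above; it reproduces the three examples on this slide.

```python
def rll_blocks(message, bound=3):
    """Split a message over {A, B} into run-length blocks: a run of k < bound "A"s
    ending in "B" gives run length k; `bound` consecutive "A"s give the
    "bound or more" block without consuming the following symbol."""
    blocks, run = [], 0
    for s in message:
        if s == "A":
            run += 1
            if run == bound:        # "three or more": emit the AAA block, keep scanning
                blocks.append(bound)
                run = 0
        else:                        # a "B" closes the current (possibly empty) run
            blocks.append(run)
            run = 0
    return blocks                    # a trailing partial run is ignored in this sketch

code = {0: "10", 1: "110", 2: "111", 3: "0"}   # run-length Huffman code of slide 30

for msg in ["ABBAAAAABAAAB", "AAAABAAAAABAAB", "AAABAAAAAAAAB"]:
    blocks = rll_blocks(msg)
    print(msg, "->", blocks, "->", " ".join(code[b] for b in blocks))
# ABBAAAAABAAAB  -> [1, 0, 3, 2, 3, 0] -> 110 10 0 111 0 10
# AAAABAAAAABAAB -> [3, 1, 3, 2, 2]    -> 0 110 0 111 111
# AAABAAAAAAAAB  -> [3, 0, 3, 3, 2]    -> 0 10 0 0 111
```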

  31. example of various block coding
  • S: memoryless & stationary, P(A) = 0.9, P(B) = 0.1
  • the entropy of S is H(S) = –0.9 log2 0.9 – 0.1 log2 0.1 = 0.469 bit
  • code 1: a naive Huffman code ... average codeword length = 1
    symbol    A    B
    prob.     0.9  0.1
    codeword  0    1
  • code 2: fixed-length blocks of 3 symbols ... average codeword length = 1.661 / 3 symbols = 0.55 bit/symbol
    block     AAA    AAB    ABA    ABB    BAA    BAB    BBA     BBB
    prob.     0.729  0.081  0.081  0.009  0.081  0.009  0.009   0.001
    codeword  0      100    110    1010   1110   1011   11110   11111

  32. example of various block coding (cnt'd)
  • code 3: run-length Huffman (upper-bound = 7)
    run length  0    1     2      3      4      5      6      7+
    prob.       0.1  0.09  0.081  0.073  0.066  0.059  0.053  0.478
    codeword    110  1000  1001   1010   1011   1110   1111   0
  with n blocks...
  • 0.1n×1 + ... + 0.478n×7 = 5.215n symbols
  • 0.1n×3 + ... + 0.478n×1 = 2.466n bits
  the average codeword length per symbol = 2.466 / 5.215 = 0.47
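A sketch of the arithmetic behind these numbers (codeword lengths taken from the table above; exact powers of 0.9 are used, so the totals differ slightly from the slide's rounded figures):

```python
# run-length Huffman code of slide 32 for P(A) = 0.9, P(B) = 0.1, runs cut at 7
pA, bound = 0.9, 7
codeword_lengths = {0: 3, 1: 4, 2: 4, 3: 4, 4: 4, 5: 4, 6: 4, bound: 1}  # |110|=3, |1000|=4, ..., |0|=1

symbols = bits = 0.0
for k, codelen in codeword_lengths.items():
    if k < bound:
        prob, block_len = (pA ** k) * (1 - pA), k + 1   # k "A"s and the closing "B"
    else:
        prob, block_len = pA ** bound, bound            # "AAAAAAA", no "B" consumed
    symbols += prob * block_len
    bits += prob * codelen

print("symbols per block:", symbols)         # ≈ 5.22  (slide: 5.215n symbols)
print("bits per block:   ", bits)            # ≈ 2.47  (slide: 2.466n bits)
print("bits per symbol:  ", bits / symbols)  # ≈ 0.47, close to H(S) = 0.469
```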

  33. summary of today's class
  • Huffman codes are good, but how good are they?
    • Huffman codes for extended information sources ... possible means (手段) to improve the efficiency
    • Shannon's source coding theorem ... the theoretical limit of efficiency
  • some more variations of Huffman codes
    • blocks of symbols with variable block length

  34. exercise
  • Write a computer program that constructs a Huffman code for a given probability distribution.
  • Modify the program so that it can handle fixed-length block coding.
  • Fix a distribution, vary the block length, and observe how the average codeword length changes.
