
Presentation Transcript


  1. Chapter 4: Variable-Length Huffman Codes

  2. Unique Decodability. We must always be able to determine where one code word ends and the next one begins. Counterexample: suppose s1 = 0, s2 = 1, s3 = 11, s4 = 00. Then 0011 = s4s3 or s1s1s3. Unique decodability means that any two distinct sequences of source symbols (of possibly differing lengths) encode to distinct coded messages. 4.1, 4.2
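
A quick way to see the ambiguity is to enumerate every parse by brute force. A minimal sketch, not part of the slides; `parses` is a hypothetical helper:

```python
# Brute-force every way to split a string into code words, to exhibit
# the ambiguity from slide 2 (illustrative sketch, not from the slides).
def parses(msg, code, prefix=()):
    """Yield every decomposition of msg into words of the given code."""
    if not msg:
        yield prefix
        return
    for symbol, word in code.items():
        if msg.startswith(word):
            yield from parses(msg[len(word):], code, prefix + (symbol,))

code = {"s1": "0", "s2": "1", "s3": "11", "s4": "00"}
for p in parses("0011", code):
    print(p)
# Several parses appear, among them ('s4', 's3') and ('s1', 's1', 's3'),
# so the code is not uniquely decodable.
```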

  3. Instantaneous Codes. s1 = 0, s2 = 10, s3 = 110, s4 = 111. No code word is the prefix of another. By reading a continuous sequence of code words, one can instantaneously determine the end of each code word. [Figure: binary decoding tree with edges labeled 0/1 and leaves s1, s2, s3, s4.] Consider the reverse: s1 = 0, s2 = 01, s3 = 011, s4 = 111. Then 0111……111 is uniquely decodable, but the first symbol cannot be decoded without reading all the way to the end. 4.3
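
The prefix condition is easy to test mechanically, and a prefix-free code can be decoded in one pass with a small buffer. A minimal sketch (my illustration; `is_prefix_free` and `decode` are hypothetical helpers, not from the slides):

```python
# Test the prefix condition and decode a bit stream one word at a time.
def is_prefix_free(words):
    """True iff no code word is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in words for b in words)

def decode(bits, code):
    """Instantaneous decoding: emit a symbol as soon as a word matches."""
    inverse = {w: s for s, w in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:        # prefix-free => the first match is the word
            out.append(inverse[buf])
            buf = ""
    return out

code = {"s1": "0", "s2": "10", "s3": "110", "s4": "111"}
assert is_prefix_free(code.values())
print(decode("010110111", code))  # ['s1', 's2', 's3', 's4']
```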

  4. Constructing Instantaneous Codes. Comma code: s1 = 0, s2 = 10, s3 = 110, s4 = 1110, s5 = 1111. Modification: s1 = 00, s2 = 01, s3 = 10, s4 = 110, s5 = 111. [Figure: decoding tree for the modified code, edges labeled 0/1.] Notice that every code word is located on a leaf. 4.4

  5. Kraft Inequality. Theorem: There exists an instantaneous code for S, where each symbol s ∈ S is encoded in radix r with length |s|, if and only if $\sum_{s \in S} r^{-|s|} \le 1$. Proof (⇒): By induction on the maximal length (depth) of the decoding tree, n = max{|s| : s ∈ S}. For simplicity, pick r = 2 (the binary case). Basis n = 1: the tree is a single edge 0, or the pair of edges 0, 1, so the sum is ½ or ½ + ½ = 1. Induction n > 1: the root splits the tree into subtrees T0 and T1 of depth < n, and by the induction hypothesis the leaves of T0 and T1 satisfy the Kraft inequality. Prefixing one symbol at the top of a subtree increases all its lengths by one, so $\sum_{s \in S} 2^{-|s|} = \tfrac{1}{2}\sum_{s \in T_0} 2^{-(|s|-1)} + \tfrac{1}{2}\sum_{s \in T_1} 2^{-(|s|-1)} \le \tfrac{1}{2} + \tfrac{1}{2} = 1$. 4.5

  6. The same argument works in radix r. Basis n = 1: at most r edges 0, …, r−1, so the sum is at most r · (1/r) = 1. Induction n > 1: the root has at most r subtrees T0, …, Tr−1, each satisfying the Kraft inequality by the induction hypothesis; prefixing one symbol contributes a factor 1/r per subtree, so adding at most r of these together gives $\sum_{s} r^{-|s|} \le r \cdot \tfrac{1}{r} = 1$. Strict inequality (sum < 1) in the binary case implies that not all internal nodes have degree 2, but if a node has degree 1, then clearly that edge can be removed by contraction. 4.5

  7. Kraft Inequality (cont). (⇐) Construct a code via decoding trees. Number the symbols s1, …, sq so that l1 ≤ … ≤ lq and assume the Kraft sum satisfies $\sum_{i=1}^{q} r^{-l_i} \le 1$. Assign leaves to code words in order of increasing length, proceeding left to right, systematically via the greedy method, and use the remaining nodes as roots of further sub-trees. The only way this method could fail is if it runs out of nodes, but that would mean the Kraft sum exceeded 1. Examples (r = 2): lengths 1, 3, 3, 3 give ½ + ⅛ + ⅛ + ⅛ = ⅞ ≤ 1 (one leaf not used); lengths 1, 2, 3, 3 give ½ + ¼ + ⅛ + ⅛ = 1 (the tree is filled exactly); lengths 1, 2, 2, 3 give ½ + ¼ + ¼ + ⅛ = 9⁄8 > 1 (no instantaneous code exists). 4.5
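
The greedy construction specializes neatly to radix 2. A minimal sketch under that assumption (`code_from_lengths` is my name, not the slides'):

```python
# Greedy construction of a binary instantaneous code from sorted lengths,
# per the (<=) direction above (illustrative sketch, not from the slides).
from fractions import Fraction

def code_from_lengths(lengths):
    """Return binary code words with the given non-decreasing lengths."""
    assert lengths == sorted(lengths)
    if sum(Fraction(1, 2**l) for l in lengths) > 1:
        raise ValueError("Kraft sum exceeds 1: no instantaneous code")
    words, node, prev = [], 0, 0
    for l in lengths:
        node <<= (l - prev)              # leftmost unused node at depth l
        words.append(format(node, f"0{l}b"))
        node, prev = node + 1, l         # advance to the next free node
    return words

print(code_from_lengths([1, 3, 3, 3]))   # ['0', '100', '101', '110']
print(code_from_lengths([1, 2, 3, 3]))   # ['0', '10', '110', '111']
# code_from_lengths([1, 2, 2, 3]) raises: the Kraft sum is 9/8 > 1.
```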

  8. Shortened Block Codes. With exactly 2^m symbols, we can form a set of code words each of length m: b1 … bm, bi ∈ {0, 1}. This is a complete binary decoding tree of depth m. With fewer than 2^m symbols, we can chop off branches to get modified (shortened) block codes. [Figures: two example trees for five symbols s1, …, s5, each obtained by pruning a depth-3 block tree.] 4.6
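
For instance, reusing the `code_from_lengths` sketch from slide 7 (this snippet assumes that definition is in scope):

```python
# A shortened block code for five symbols: a depth-3 block tree with
# branches chopped off (assumes code_from_lengths from the slide-7 sketch).
print(code_from_lengths([2, 2, 2, 3, 3]))  # ['00', '01', '10', '110', '111']
```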

  9. McMillan Inequality. Theorem: Suppose we have a uniquely decodable code in radix r with lengths l1 ≤ … ≤ lq. Then $\sum_{i=1}^{q} r^{-l_i} \le 1$. Idea: show that uniquely decodable codes satisfy the same bound as instantaneous codes. Let $K = \sum_{i=1}^{q} r^{-l_i}$ and use a multinomial expansion to see that $K^n = \sum_{k = n l_1}^{n l_q} N_k\, r^{-k}$, where Nk = the number of ways n symbols can form a coded message of length k. Unique decodability forces Nk ≤ r^k (the number of messages must be at most the number of strings of length k), so $K^n \le n l_q - n l_1 + 1 \le n\, l_q$. If K > 1, this linear bound is violated as n approaches infinity, since K^n grows exponentially while lq is fixed; hence K ≤ 1. Conclusion: we do not lose generality by considering instantaneous codes. 4.7
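
The key step, Nk ≤ r^k, can be spot-checked by brute force on the non-instantaneous code from slide 3. A small illustration (my sketch, not from the slides):

```python
# Count N_k for the uniquely decodable (but not instantaneous) code of
# slide 3: distinct parses give distinct strings, so N_k <= 2**k.
from itertools import product

code = ["0", "01", "011", "111"]
k = 6
parses = ["".join(c) for n in range(1, k + 1)
          for c in product(code, repeat=n) if len("".join(c)) == k]
assert len(parses) == len(set(parses))   # unique decodability: no collisions
print(len(parses), "<=", 2**k)           # N_k stays below r**k
```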

  10. Huffman Codes. Goal: minimize the average coded message length $L_{avg} = \sum_{i=1}^{q} p_i l_i$, given the probabilities of the various symbols. WLOG p1 ≥ … ≥ pq, and we may assume l1 ≤ … ≤ lq: for if pm < pn with lm < ln, then interchanging the encodings for sm and sn (and renaming subscripts to stay in order) changes the average length by $(p_m - p_n)(l_n - l_m) < 0$, so Lnew < Lold. 4.8
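
For a concrete check (an illustrative example, not from the slides): if pm = 0.1 with lm = 1 and pn = 0.4 with ln = 3, the two symbols contribute 0.1·1 + 0.4·3 = 1.3 before the swap and 0.1·3 + 0.4·1 = 0.7 after, a saving of (0.4 − 0.1)(3 − 1) = 0.6.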

  11. Huffman algorithm. Start with the source alphabet S = {s1, …, sq}, and consider B = {0, 1} as our code alphabet (binary). First, observe that we may take lq−1 = lq: since the code is instantaneous, the code word for sq−1 cannot be a prefix of that for sq, so dropping the last symbol from sq (if lq > lq−1) won't hurt. So we can "combine" sq−1 and sq into a single symbol (sq−1+sq) with probability (pq−1+pq) and get a code for the reduced alphabet. Recursively: for q = 1, assign s1 = ε; for q > 1, let sq−1 = (sq−1+sq) 0 and sq = (sq−1+sq) 1. [Figure: worked example of the reduction.] N.B. the case q = 1 does not by itself produce a valid code (ε is the empty word); it only anchors the recursion. 4.8
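
Bottom-up, the same recursion amounts to repeatedly merging the two least probable symbols and prepending a distinguishing bit. A minimal sketch of the method described above (`huffman_code` is a name I chose, not the slides'):

```python
# Binary Huffman coding by repeatedly merging the two least probable
# symbols, as described on this slide (illustrative sketch).
import heapq
from itertools import count

def huffman_code(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> code word."""
    tie = count()                                 # tie-breaker for equal probs
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)           # least probable group
        p1, _, c1 = heapq.heappop(heap)           # next least probable group
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

print(huffman_code({"s1": 0.5, "s2": 0.25, "s3": 0.125, "s4": 0.125}))
# Up to tie-breaking: {'s1': '0', 's2': '10', 's3': '110', 's4': '111'}
```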

  12. Argument that Huffman always produces a code of shortest average length (Huffman → Lavg, alternative → L; trying to show Lavg ≤ L). Assume p1 ≥ … ≥ pq are given. By induction on q. Basis: if q = 2, the code is s1 = 0, s2 = 1, and obviously no shorter code exists. Induction step (q > 2): given any code for s1, …, sq with minimal average length, we know that l1 ≤ … ≤ lq−1 = lq (shown on slide 11), and lq = lq−1,q + 1, where lq−1,q is the length of the combined symbol sq−1+sq in the reduced code: the combined symbol sits one level above the two deepest leaves, at total height lq. 4.8

  13. But by the induction hypothesis L′avg ≤ L′, and, more importantly, the Huffman code also satisfies the equation L′avg + (pq−1 + pq) = Lavg, because it shares the same properties: the two least probable symbols end as siblings …0 and …1 under the combined symbol sq−1+sq. The reduced alternative code likewise satisfies L′ + (pq−1 + pq) = L, so Lavg = L′avg + (pq−1 + pq) ≤ L′ + (pq−1 + pq) = L. Example: p1 = 0.7, p2 = p3 = p4 = 0.1. See how Lavg = 1.5 compares to log2 q = 2. 4.8
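
The example checks out numerically; here is a run using the `huffman_code` sketch from slide 11 (so it assumes that definition is in scope):

```python
# Slide 13's example, computed with the huffman_code sketch from slide 11.
probs = {"s1": 0.7, "s2": 0.1, "s3": 0.1, "s4": 0.1}
code = huffman_code(probs)   # lengths 1, 2, 3, 3 up to tie-breaking
L_avg = sum(p * len(code[s]) for s, p in probs.items())
print(code, L_avg)           # L_avg = 1.5 (up to float rounding) vs log2 q = 2
```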

  14. Code Extensions. Take p1 = ⅔, p2 = ⅓. The Huffman code gives s1 = 0, s2 = 1, with Lavg = 1. Square the symbol alphabet to get S²: s1,1 = 00, s1,2 = 01, s2,1 = 10, s2,2 = 11, with p1,1 = 4⁄9, p1,2 = 2⁄9, p2,1 = 2⁄9, p2,2 = 1⁄9. Apply Huffman to S²: s1,1 = 1; s1,2 = 01; s2,1 = 000; s2,2 = 001. But we are sending two symbols at a time! The average is 4⁄9·1 + 2⁄9·2 + 2⁄9·3 + 1⁄9·3 = 17⁄9 bits per pair, i.e. 17⁄18 ≈ 0.944 bits per original symbol, beating Lavg = 1. 4.10
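
The same figure falls out of the `huffman_code` sketch from slide 11 (assumed in scope; exact word assignments may differ by tie-breaking, but the lengths and cost agree):

```python
# The extension example, computed exactly with Fractions (assumes the
# huffman_code sketch from slide 11).
from fractions import Fraction

p1, p2 = Fraction(2, 3), Fraction(1, 3)
pairs = {(i, j): pi * pj for i, pi in [(1, p1), (2, p2)]
                         for j, pj in [(1, p1), (2, p2)]}
code = huffman_code(pairs)
L_pair = sum(p * len(code[s]) for s, p in pairs.items())
print(L_pair, L_pair / 2)    # 17/9 bits per pair, 17/18 per source symbol
```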

  15. Huffman Codes in Radix r. At each stage down, we merge the last (least probable) r states into one, reducing the number of states by r − 1. Since we end with one state, we must begin with k·(r − 1) + 1 states for some integer k. We pad out with states of probability 0 to get this. [Figure: example with r = 4 and k = 3, showing the zero-probability pads.] 4.11
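
A minimal radix-r sketch following the padding rule above (`huffman_radix` is my name; an illustration, not the slides' code):

```python
# Radix-r Huffman: pad with zero-probability states until the count is
# k*(r-1) + 1, then repeatedly merge the r least probable states.
import heapq
from itertools import count

def huffman_radix(probs, r):
    """probs: dict symbol -> probability. Returns dict symbol -> code word."""
    tie = count()
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    while (len(heap) - 1) % (r - 1) != 0:       # pad to k*(r-1) + 1 states
        heap.append((0, next(tie), {}))         # zero-probability pad state
    heapq.heapify(heap)
    while len(heap) > 1:
        merged, total = {}, 0
        for digit in range(r):                  # take the r least probable
            p, _, c = heapq.heappop(heap)
            total += p
            merged.update({s: str(digit) + w for s, w in c.items()})
        heapq.heappush(heap, (total, next(tie), merged))
    return heap[0][2]

probs = dict(zip("abcdefgh", [.3, .2, .15, .1, .1, .05, .05, .05]))
print(huffman_radix(probs, 4))   # 8 symbols need 2 pads: 10 = 3*(4-1) + 1
```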
