560 likes | 1.1k Vues
LZW Compression. Heejin Park College of Information and Communications Hanyang University. Overview. Popular text compressors such as zip and Unix’s compress are based on the LZW (Lempel-Ziv-Welch) method.
E N D
LZW Compression Heejin Park College of Information and Communications Hanyang University
Overview • Popular text compressors such as zip and Unix’s compress are based on the LZW (Lempel-Ziv-Welch) method. • Character sequences in the original text are replaced by codes that are dynamically determined.
code 0 1 key a b Compression • Assume the letters in the text are limited to {a, b}. • In practice, the alphabet may be the 256 character ASCII set. • The characters in the alphabet are assigned code numbers beginning at 0. • The initial code table is:
code 0 1 key a b Compression • Original text = abababbabaabbabbaabba • Compress by scanning the original text from left to right. • Find longest prefix p for which there is a code in the code table. • Represent p by its code pCode and assign the next available code number to pc, where c is the next character in the text that is to be compressed.
code 0 1 2 key a b ab Compression • Original text = abababbabaabbabbaabba • Compressed text = • Findp = a,pCode = 0. • Take c = b • Representaby0and enterabinto the code table. • Compressed text = 0 (ahas been compressed)
3 ba code 0 1 2 key a b ab Compression • Original text =abababbabaabbabbaabba • Compressed text = 0 • Findp = b,pCode = 1. • Take c = a • Representbby1and enterbainto the code table. • Compressed text = 01
3 4 ba aba code 0 1 2 key a b ab Compression • Original text =abababbabaabbabbaabba • Compressed text = 01 • Findp = ab,pCode = 2. • Take c = a • Representabby2and enterabainto the code table. • Compressed text = 012
3 4 5 ba aba abb code 0 1 2 key a b ab Compression • Original text =abababbabaabbabbaabba • Compressed text = 012 • Findp = ab,pCode = 2. • Take c = b • Representabby2and enterabbinto the code table. • Compressed text = 0122
6 bab 3 4 5 ba aba abb code 0 1 2 key a b ab Compression • Original text =abababbabaabbabbaabba • Compressed text = 0122 • Findp = ba,pCode = 3. • Take c = b • Representbaby3and enterbabinto the code table. • Compressed text = 01223
6 7 baa bab 3 4 5 ba aba abb code 0 1 2 key a b ab Compression • Original text =abababbabaabbabbaabba • Compressed text = 01223 • Findp = ba,pCode = 3. • Take c = a • Representbaby3and enterbaainto the code table. • Compressed text = 012233
7 6 8 abba baa bab 3 4 5 ba aba abb code 0 1 2 key a b ab Compression • Original text =abababbabaabbabbaabba • Compressed text = 012233 • Findp = abb,pCode = 5. • Take c = a • Representabbby5and enterabbainto the code table. • Compressed text = 0122335
8 7 6 9 abbaa abba baa bab 3 4 5 ba aba abb code 0 1 2 key a b ab Compression • Original text =abababbabaabbabbaabba • Compressed text = 0122335 • Findp = abba,pCode = 8. • Take c = a • Representabbaby8and enterabbaainto the code table. • Compressed text = 01223358
8 7 6 9 abbaa abba baa bab 3 4 5 ba aba abb code 0 1 2 key a b ab Compression • Original text =abababbabaabbabbaabba • Compressed text = 01223358 • Findp = abba,pCode = 8. • Take c = null • Representabbaby8 • Compressed text = 012233588
9 7 6 8 abba baa bab abbaa 3 4 5 ba aba abb code 0 1 2 key a b ab Compression Code Table Representation • Dictionary. • Pairs are (key, element)=(key,code). • Operations are :get(key)andput(key, code) • Limit number of codes to 212. • Use a hash table. • Convert variable length keys into fixed length keys. • Each key has the form pc, where the string p is a key that is already in the table. • Replace pc with (pCode)c.
8 7 6 9 9 8 7 6 8a bab baa abba 3a 5a abbaa 3b 3 4 5 ba aba abb 3 4 5 1a 2a 2b code code 0 0 1 1 2 2 key key a a b b ab 0b Compression Code Table Representation Modified LZW compression dictionary
Decompression code 0 1 • Original text =abababbabaabbabbaabba • Compressed text =012233588 • Convert codes to text from left to right. • 0representsa. • Decompressed text =a • pCode=0andp = a. • p= afollowed by next text character(c) is entered into the code table. key a b
code 0 1 2 key a b ab Decompression • Original text = abababbabaabbabbaabba • Compressed text =012233588 • 1representsb. • Decompressed text =ab • pCode =1 and p = b. • lastP = afollowed by first character of p is entered into the code table.
code 0 1 2 key a b ab Decompression 3 • Original text = abababbabaabbabbaabba • Compressed text =012233588 • 2representsab. • Decompressed text =abab • pCode =2 and p = ab. • lastP = bfollowed by first character of p is entered into the code table. ba
code 0 1 2 key a b ab Decompression 3 4 • Original text = abababbabaabbabbaabba • Compressed text =012233588 • 2representsab. • Decompressed text =ababab • pCode =2 and p = ab. • lastP = abfollowed by first character of p is entered into the code table. ba aba
code code 0 0 1 1 2 2 key key a a b b ab ab Decompression 3 3 4 5 • Original text = abababbabaabbabbaabba • Compressed text =012233588 • 3representsba. • Decompressed text =abababba • pCode =3 and p = ba. • lastP = abfollowed by first character of p is entered into the code table. ba ba aba abb
6 bab code code code 0 0 0 1 1 1 2 2 2 key key key a a a b b b ab ab ab Decompression 3 3 3 4 4 5 • Original text = abababbabaabbabbaabba • Compressed text =012233588 • 3representsba. • Decompressed text =abababbaba • pCode =3 and p = ba. • lastP = bafollowed by first character of p is entered into the code table. ba ba ba aba aba abb
7 6 baa bab code code code code 0 0 0 0 1 1 1 1 2 2 2 2 key key key key a a a a b b b b ab ab ab ab Decompression 3 3 3 3 4 4 4 5 5 • Original text = abababbabaabbabbaabba • Compressed text =012233588 • 5 representsabb. • Decompressed text =abababbabaabb • pCode =5 and p = abb. • lastP = bafollowed by first character of p is entered into the code table. ba ba ba ba aba aba aba abb abb
6 8 6 7 bab bab abba baa code code code code code 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 key key key key key a a a a a b b b b b ab ab ab ab ab Decompression 3 3 3 3 3 4 4 4 4 5 5 5 • Original text = abababbabaabbabbaabba • Compressed text =012233588 • 8 represents???. • When a code is not in the table, its key is lastPfollowed by first character oflastP. • lastP = abb • So 8 representsabba. ba ba ba ba ba aba aba aba aba abb abb abb
6 9 6 7 7 6 8 bab abba abbaa baa bab baa bab code code code code code code 0 0 0 0 0 0 1 1 1 1 1 1 2 2 2 2 2 2 key key key key key key a a a a a a b b b b b b ab ab ab ab ab ab Decompression 3 3 3 3 3 3 4 4 4 4 4 5 5 5 5 • Original text = abababbabaabbabbaabba • Compressed text =012233588 • 8 representsabba. • Decompressed text =abababbabaabbabbaabba • pCode =8 and p = abba. • lastP = abbafollowed by first character of p is entered into the code table. ba ba ba ba ba ba aba aba aba aba aba abb abb abb abb
8 7 6 9 abbaa abba baa bab 3 4 5 ba aba abb code 0 1 2 key a b ab Decompression Code Table Representation • Dictionary. • Pairs are (key, element) = (code, what the code represents) = (code, codeKey). • Operations are :get(key)andput(key, code) • Keys are integers 0, 1, 2, … • Use a 1D array codeTable[code]= codeKey • Each code key has the form pc, where the string p is a code key that is already in the table. • Replace pc with (pCode)c.
Performance Evaluation • Time Complexity • Compression. • O(n) expected time, where n is the length of the text. • Decompression. • O(n) time, where n is the length of the decompressed text.
conclusion • Character sequences in the original text are replaced by codes that are dynamically determined. • The code table is not encoded into the compressed text, because it may be reconstructed from the compresses text during decompression.