Lempel-Ziv Compression Techniques

keelia

Presentation Transcript


  1. Lempel-Ziv Compression Techniques • Introduction to Run-Length Encoding • Introduction to Lempel-Ziv Encoding: LZ77 & LZ78 • Example 1: Encoding using LZ78 • Example 2: Decoding using LZ78

  2. Introduction to Run-Length Encoding • A run is a sequence of identical characters. Run-length encoding assumes the data contains many runs. • Each character that repeats three or more times in a row is replaced by its run length followed by the single character. • The data is scanned to find the runs before it is transmitted. • For example, the message AAAABBBAABBBBBCCCCCCCCDABCBAAABBBBCCCD is replaced by 4A3BAA5B8CDABCB3A4B3CD. • The saving can be dramatic when long runs are present.
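
The rule on this slide can be sketched in a few lines of Python. This is only an illustration, not part of the original presentation: the function name `rle_encode` and the `min_run` parameter are assumptions, and the threshold of three comes from the slide.

```python
from itertools import groupby


def rle_encode(message: str, min_run: int = 3) -> str:
    """Replace each run of at least `min_run` identical characters
    by its length followed by the character; copy shorter runs as is."""
    pieces = []
    for char, group in groupby(message):
        run_length = len(list(group))
        if run_length >= min_run:
            pieces.append(f"{run_length}{char}")
        else:
            pieces.append(char * run_length)
    return "".join(pieces)


print(rle_encode("AAAABBBAABBBBBCCCCCCCCDABCBAAABBBBCCCD"))
# prints 4A3BAA5B8CDABCB3A4B3CD
```

The 38-character message shrinks to 22 characters; a message without runs of three or more would be copied unchanged.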

  3. Run-Length Encoding: Possible Problems • A problem arises if one of the characters being transmitted is a digit, as in 11111111111544444, which is represented as 1111554 (eleven 1s, one 5, five 4s): counts and data digits can no longer be told apart. • A solution is to replace the number n with the character whose ASCII value is n. • For example, a run of 43 consecutive letters “c” is represented as +c (“+” has ASCII code 43), and a run of 49 1s is coded as 11 (“1” has ASCII code 49). • This method is only modestly efficient for text files but is widely used for encoding fax images. • The main advantage of this algorithm is its simplicity and speed.
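
As a rough sketch of the character-as-count idea, the helpers below encode and decode a single run. The names `encode_run` and `decode_run` and the one-byte limit of 255 are assumptions made for this illustration; the slide does not say how longer runs are handled.

```python
def encode_run(char: str, run_length: int) -> str:
    """Encode one run as <count character><data character>,
    where the count character has code point `run_length`."""
    if not 1 <= run_length <= 255:
        # Assumption: the count must fit in a single byte.
        raise ValueError("run length must be between 1 and 255")
    return chr(run_length) + char


def decode_run(pair: str) -> str:
    """Expand a two-character <count><char> pair back into the run."""
    count_char, char = pair
    return char * ord(count_char)


print(encode_run("c", 43))  # prints +c
print(encode_run("1", 49))  # prints 11
```

Because the count is always exactly one character, the decoder reads the stream in fixed pairs and never confuses a count with a data digit.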

  4. Introduction to Lempel-Ziv Encoding • Until the late 1970s, data compression research was mainly directed towards creating better methodologies for Huffman coding. • An innovative, radically different method was introduced in 1977 by Abraham Lempel and Jacob Ziv. • This technique (called Lempel-Ziv) actually consists of two considerably different algorithms, LZ77 and LZ78. • Due to patents, LZ77 and LZ78 led to many variants: • zip and unzip use the LZH technique, while UNIX's compress methods belong to the LZW and LZC classes.

  5. LZ78 Compression Algorithm
     Start with empty dictionary
     P = empty
     WHILE (there are more characters in the char stream)
         C = next character in the char stream
         IF (the string P+C is in the dictionary)
             P = P+C
         ELSE
             // if P is empty, zero is its code word
             output the code word corresponding to P
             output C
             add the string P+C to the dictionary
             P = empty
     IF (P is not empty)
         output the code word corresponding to P
     END
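
The pseudocode above could be turned into Python roughly as follows. This is a minimal sketch, not the presenter's implementation: the name `lz78_encode`, the use of a `dict` keyed by string, and the choice to emit a leftover prefix as a pair with an empty character are all assumptions.

```python
def lz78_encode(message: str) -> list[tuple[int, str]]:
    """Encode `message` as a list of (W, C) pairs, where W is the
    dictionary index of the longest known prefix (0 if empty) and
    C is the character that follows it."""
    dictionary: dict[str, int] = {}  # string -> index, starting at 1
    output: list[tuple[int, str]] = []
    prefix = ""  # P in the pseudocode
    for char in message:  # C in the pseudocode
        candidate = prefix + char
        if candidate in dictionary:
            prefix = candidate
        else:
            output.append((dictionary.get(prefix, 0), char))
            dictionary[candidate] = len(dictionary) + 1
            prefix = ""
    if prefix:
        # Assumption: a leftover prefix is emitted with an empty character.
        output.append((dictionary[prefix], ""))
    return output


print(lz78_encode("ABBCBCABABCAABCAAB"))
# prints [(0, 'A'), (0, 'B'), (2, 'C'), (3, 'A'), (2, 'A'), (4, 'A'), (6, 'B')]
```

Looking up whole strings in a hash table keeps the sketch short; a production encoder would more likely walk a trie so that each input character costs one step.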

  6. Example 1: LZ78 Compression • Use the LZ78 algorithm to encode the message ABBCBCABABCAABCAAB. • Solution: The encoding process is presented below, in which: • The column STEP indicates the number of the encoding step. • The column POS indicates the current position in the input data. • The column DICTIONARY shows the string that has been added to the dictionary. The index of the string is equal to the step number. • The column OUTPUT presents the output in the form (W,C), where W is the dictionary index of the prefix and C is the character that follows it. • The output of each step decodes to the string that has been added to the dictionary.

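  7. Example 1: LZ78 Compression Step 1 • Input: ABBCBCABABCAABCAAB • POS = 1, P = empty, C = A • DICTIONARY: 1 = A • OUTPUT: (0,A)

  8. Example 1: LZ78 Compression Step 2 • Input: ABBCBCABABCAABCAAB • POS = 2, P = empty, C = B • DICTIONARY: 2 = B • OUTPUT: (0,B)

  9. Example 1: LZ78 Compression Step 3 • Input: ABBCBCABABCAABCAAB • POS = 3, P = B, C = C • DICTIONARY: 3 = BC • OUTPUT: (2,C)

  10. Example 1: LZ78 Compression Step 4 • Input: ABBCBCABABCAABCAAB • POS = 5, P = BC, C = A • DICTIONARY: 4 = BCA • OUTPUT: (3,A)

  11. Example 1: LZ78 Compression Step 5 • Input: ABBCBCABABCAABCAAB • POS = 8, P = B, C = A • DICTIONARY: 5 = BA • OUTPUT: (2,A)

  12. Example 1: LZ78 Compression Step 6 • Input: ABBCBCABABCAABCAAB • POS = 10, P = BCA, C = A • DICTIONARY: 6 = BCAA • OUTPUT: (4,A)

  13. Example 1: LZ78 Compression Step 7 • Input: ABBCBCABABCAABCAAB • POS = 14, P = BCAA, C = B • DICTIONARY: 7 = BCAAB • OUTPUT: (6,B)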

  14. Example 1: Coded Information in Bits • Now we calculate the number of bits needed to represent the coded information <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>. • Each character takes 8 bits. • The number of bits needed to represent the integer in the code word at index n is at most the number of bits needed to represent n-1, the largest index it can refer to. • For example, the integer 3 in the code word at index 4 needs 2 bits, because it takes two bits to express 3 (that is, 4-1) in binary. • Thus the number of bits required when the string ABBCBCABABCAABCAAB is compressed is (1+8)+(1+8)+(2+8)+(2+8)+(2+8)+(3+8)+(3+8) = 70 bits.
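
The sum on this slide can be checked with a short script. The sketch assumes, as the terms of the sum suggest, that each index is written with the fewest binary digits that can hold it (one bit for 0) and each character with 8 bits; the names `bits_for_index` and `coded_bits` are made up for this example.

```python
CODEWORDS = [(0, "A"), (0, "B"), (2, "C"), (3, "A"), (2, "A"), (4, "A"), (6, "B")]


def bits_for_index(index: int) -> int:
    """Fewest binary digits that can represent `index`; 0 still needs one bit."""
    return max(1, index.bit_length())


def coded_bits(codewords: list[tuple[int, str]], char_bits: int = 8) -> int:
    """Total bits for a list of (W, C) pairs under the assumptions above."""
    return sum(bits_for_index(index) + char_bits for index, _ in codewords)


print(coded_bits(CODEWORDS))            # prints 70
print(8 * len("ABBCBCABABCAABCAAB"))    # prints 144
```

Under these assumptions the 18-character message drops from 144 bits to 70 bits.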

  15. LZ78 Decompression Algorithm
      At the start the dictionary is empty
      WHILE (there are more code words in the code stream)
          W = code word
          C = character following it
          // the string of W is empty when W is zero
          output the string of W + C
          add the string of W + C to the dictionary
      END
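
A possible Python rendering of the decompression loop is shown below. It is a sketch under assumptions: the name `lz78_decode` is invented, code words are passed as (W, C) tuples, and the dictionary is a list whose entry 0 is the empty string so that W = 0 needs no special case.

```python
def lz78_decode(codewords: list[tuple[int, str]]) -> str:
    """Rebuild the original message from LZ78 (W, C) pairs."""
    dictionary = [""]  # index 0 stands for the empty prefix
    pieces = []
    for index, char in codewords:
        entry = dictionary[index] + char  # the string of W, plus C
        pieces.append(entry)
        dictionary.append(entry)  # receives the next free index
    return "".join(pieces)


codewords = [(0, "A"), (0, "B"), (2, "C"), (3, "A"), (2, "A"), (4, "A"), (6, "B")]
print(lz78_decode(codewords))
# prints ABBCBCABABCAABCAAB
```

The decoder rebuilds exactly the dictionary the encoder built, which is why the dictionary itself never has to be transmitted.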

  16. Example 2: LZ78 Decompression Example 2: Use the above algorithm to decompress the compressed output sequence of Example 1, namely, <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>

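  17. Example 2: LZ78 Decompression Step 1 • Code stream: <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> • W = 0, C = A • OUTPUT: A • DICTIONARY: 1 = A

  18. Example 2: LZ78 Decompression Step 2 • Code stream: <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> • W = 0, C = B • OUTPUT: B • DICTIONARY: 2 = B

  19. Example 2: LZ78 Decompression Step 3 • Code stream: <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> • W = 2, C = C • OUTPUT: BC • DICTIONARY: 3 = BC

  20. Example 2: LZ78 Decompression Step 4 • Code stream: <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> • W = 3, C = A • OUTPUT: BCA • DICTIONARY: 4 = BCA

  21. Example 2: LZ78 Decompression Step 5 • Code stream: <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> • W = 2, C = A • OUTPUT: BA • DICTIONARY: 5 = BA

  22. Example 2: LZ78 Decompression Step 6 • Code stream: <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> • W = 4, C = A • OUTPUT: BCAA • DICTIONARY: 6 = BCAA

  23. Example 2: LZ78 Decompression Step 7 • Code stream: <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> • W = 6, C = B • OUTPUT: BCAAB • DICTIONARY: 7 = BCAAB • Decoded message: ABBCBCABABCAABCAAB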

  24. LZ78: Concluding Remarks • LZ77 is based on a sliding window. Its output is very similar to that of LZ78. • The compression ratio of LZ77 is similar to that of LZ78. • The biggest advantage of LZ78 over LZ77 is the reduced number of string comparisons in each encoding step. • Application areas of LZ77 and LZ78: • LZ77: gzip, Squeeze, LHA, PKZIP, ZOO • LZ78: compress, GIF, CCITT (modems), ARC, PAK

  25. Exercises • How many bits are required to represent the codeword (2,C) generated in Step 3 of Example 1? How many bits are required to represent the associated string without compression? • Use LZ78 to trace the encoding of the string SATATASACITASA. • Write a Java program that encodes a given string using LZ78. • Write a Java program that decodes a given sequence of code words using LZ78.
