
Squishin’ Stuff




Presentation Transcript


  1. Squishin’ Stuff Huffman Compression

  2. Data Compression
  • Begin with a computer file (text, picture, movie, sound, executable, etc.)
  • Most files contain extra information or redundancy
  • Goal: reorganize the file to remove the excess information and redundancy
  • Lossless compression: compress the file in such a way that none of the information is lost (good for text files and executables)
  • Lossy compression: allow some information to be thrown away in order to get a better level of compression (good for pictures, movies, or sounds)
  • Many, many, many algorithms exist to compress files
  • Different types of files work best with different algorithms (need to consider the structure of the file and how things are connected)
  • We’re going to focus on Huffman compression, which is used in many compression programs, most notably WinZip
  • We’re just going to play with text files
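The lossless idea above is easy to see in practice with Python's standard-library zlib module (DEFLATE, which uses Huffman coding as one of its stages). A minimal demo, with my own made-up sample data:

```python
# A tiny lossless-compression demo using Python's built-in zlib module
# (DEFLATE, which itself uses Huffman coding as one stage).
import zlib

redundant = b"abcabcabc" * 100           # highly repetitive input
compressed = zlib.compress(redundant)

print(len(redundant), len(compressed))   # the compressed copy is far smaller
assert zlib.decompress(compressed) == redundant   # lossless: nothing lost
```

The round-trip assertion is the whole point of "lossless": decompressing gives back every byte of the original.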

  3. Text Files
  • Each character is represented by one byte; each byte is a sequence of 8 bits (1s and 0s): its ASCII code
  • ASCII is an international standard for how a character is represented:
  • A 01000001
  • B 01000010
  • ~ 01111110
  • 3 00110011
  • Most text files use fewer than 128 characters; this code has room for 256. Extra information!
  • Goal: use shorter codes to represent more frequent characters
  • You have seen this before…
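The first step toward "shorter codes for more frequent characters" is simply counting frequencies. A quick sketch (the sample string is my own, not from the slides):

```python
# Count character frequencies to see which characters deserve short codes,
# and compare against the fixed 8-bits-per-character ASCII cost.
from collections import Counter

text = "this is an example of a text file"
freq = Counter(text)

fixed_bits = 8 * len(text)        # ASCII cost: 8 bits per character
print(fixed_bits)                 # what a variable-length code hopes to beat
print(freq.most_common(3))        # most frequent characters -> shortest codes
```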

  4. Morse Code

  5. Example

  6. Example

  7. RAWA AWIS RINBABBE
  • That didn’t work.
  • If we do this, we need a way to know when a letter stops.
  • Huffman coding provides this, though we’ll lose some compression.
  • Huffman Coding
  • Named after David Huffman (1952).
  • Use a tree to construct the code, and then use the tree to interpret the code.
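The tree construction can be sketched in a few lines: repeatedly merge the two least frequent subtrees, then read each character's code off its root-to-leaf path (names like `huffman_codes` are my own, not from the slides):

```python
# A minimal sketch of Huffman's algorithm using a priority queue.
import heapq
from collections import Counter

def huffman_codes(text):
    # Heap entries are (frequency, tiebreaker, tree); a tree is either a
    # character (leaf) or a (left, right) pair (internal node). The integer
    # tiebreaker keeps heapq from ever comparing trees directly.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):       # internal node: recurse both ways
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                             # leaf: record this character's code
            codes[tree] = prefix or "0"   # lone-character edge case
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("mississippi")
encoded = "".join(codes[c] for c in "mississippi")
```

Because no code is a prefix of another, the bit stream decodes unambiguously; that is exactly the "knowing when a letter stops" property the example above was missing.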

  8. Huffman Chart

  9. Issues and Problems

  10. Issues and Problems

  11. What’s the best you can do?
  • Obviously, there is a limit to how far down you can compress a file.
  • Assume your file has n different characters in it, say a1…an, each with probability p1…pn (so p1 + p2 + … + pn = 1).
  • The entropy of the file is defined to be −(p1·log2(p1) + p2·log2(p2) + … + pn·log2(pn)).
  • It measures the least number of bits, on average, needed to represent a character.
  • For my name, the entropy is 3.12 (it takes at least 3.12 bits per character to represent my name); Huffman gave an average of 3.19 bits per character.
  • Huffman compression will always give an average that is within one bit of the entropy.
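The entropy formula above translates directly into code. A sketch (the sample string is my own; I can't reproduce the slide's 3.12 figure without knowing the author's name):

```python
# Shannon entropy of a string, in bits per character:
# -(p1*log2(p1) + ... + pn*log2(pn)), with p taken as each
# character's empirical frequency.
import math
from collections import Counter

def entropy(text):
    n = len(text)
    return -sum((f / n) * math.log2(f / n) for f in Counter(text).values())

h = entropy("mississippi")   # roughly 1.82 bits per character
```

A string of one repeated character has entropy 0: it is perfectly predictable, so ideally it costs nothing per character beyond its length.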
