File Compression

davidbarker

Presentation Transcript


  1. File Compression
     • Even though disks have gotten bigger, we are still running short on disk space
     • A common technique is to compress files so that they take up less space on the disk
     • We can save space by taking advantage of the fact that most files have a relatively low “information content”
     Compression

  2. Run Length Encoding
     • The simplest type of redundancy in a file is long runs of repeated characters, for example
       AAAABBBAABBBBBCCCCCCCC
     • This string can be represented more compactly by replacing each run with a single occurrence of the character, preceded by the length of the run
       4A3B2A5B8C
     • For binary files, a refined version of this method can yield dramatic savings
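A minimal sketch of the count-then-character scheme from the slide. The function names `rle_encode` and `rle_decode` are illustrative, not from the presentation. The sketch assumes the input contains no digits, because the digits are used for the run lengths.

```python
import re
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with its length followed by the character.

    Assumes text contains no digits, since digits are reserved for the counts.
    """
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Expand every (count, character) pair back into a run of that character."""
    return "".join(char * int(n) for n, char in re.findall(r"(\d+)(\D)", encoded))


packed = rle_encode("AAAABBBAABBBBBCCCCCCCC")
print(packed)  # 4A3B2A5B8C
print(rle_decode(packed) == "AAAABBBAABBBBBCCCCCCCC")  # True
```

In this version every run costs at least two symbols, so a run of length 1 gets longer. That is one reason the savings depend heavily on how repetitive the data is.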

  3. Variable Length Encoding
     • Suppose we wish to encode
       ABRACADABRA
     • Instead of using the standard 8 (or 16) bits to represent these letters, why not use 3?
       A = 000
       B = 001
       C = 010
       D = 011
       R = 100
     • ABRACADABRA then becomes 000001100000010000011000001100000 (11 letters × 3 bits = 33 bits, instead of 88 with 8-bit characters)
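A small sketch of the 3-bit table above. The helper names are hypothetical. Because every codeword has the same width, decoding only needs to cut the bit string into 3-bit chunks.

```python
# 3-bit fixed-width code table from the slide.
FIXED_CODE = {"A": "000", "B": "001", "C": "010", "D": "011", "R": "100"}


def encode_fixed(text: str, code: dict[str, str]) -> str:
    """Concatenate the codeword of every character."""
    return "".join(code[ch] for ch in text)


def decode_fixed(bits: str, code: dict[str, str], width: int = 3) -> str:
    """Cut the bit string into width-sized chunks and look each chunk up."""
    reverse = {word: letter for letter, word in code.items()}
    return "".join(reverse[bits[i:i + width]] for i in range(0, len(bits), width))


bits = encode_fixed("ABRACADABRA", FIXED_CODE)
print(bits, len(bits))  # 000001100000010000011000001100000 33
print(decode_fixed(bits, FIXED_CODE))  # ABRACADABRA
```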

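  4. We can do better!!
     • Why use the same number of bits for each letter?
       A = 0
       B = 1
       C = 01
       D = 10
       R = 11
     • ABRACADABRA then becomes 0 1 11 0 01 0 10 0 1 11 0
     • This is not really a code because it depends on the blanks. Without them the message is
       011100101001110
       and it can be split into codewords in more than one way (for example, 01 could be C or AB)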
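To see why the blanks matter, the sketch below lists every way a bit string can be split into codewords from the table above. The function name `all_decodings` is hypothetical. It uses plain recursion, which is fine for short messages but grows exponentially with longer ones.

```python
# Variable-length code table from the slide. It is not prefix-free:
# "0" (A) is a prefix of "01" (C), and "1" (B) is a prefix of "10" (D) and "11" (R).
VARIABLE_CODE = {"A": "0", "B": "1", "C": "01", "D": "10", "R": "11"}


def all_decodings(bits: str, code: dict[str, str]) -> list[str]:
    """Return every way of splitting bits into a sequence of codewords."""
    if not bits:
        return [""]
    results = []
    for letter, word in code.items():
        if bits.startswith(word):
            # Take this codeword first, then decode the remainder in every possible way.
            for rest in all_decodings(bits[len(word):], code):
                results.append(letter + rest)
    return results


print(all_decodings("01", VARIABLE_CODE))  # ['AB', 'C']
parses = all_decodings("011100101001110", VARIABLE_CODE)
print(len(parses), "ABRACADABRA" in parses)
```

ABRACADABRA is among the parses, but it is only one of many, so a receiver cannot tell which message was sent.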

  5. Consider this Tree
     [Figure: a binary code tree whose leaves are the letters B, D, A, C and R]
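The exact shape of the slide's tree cannot be recovered from the transcript. The sketch below therefore uses a hypothetical tree, `("A", (("C", "D"), ("B", "R")))`, that has the same five letters as leaves. It shows the general idea: when every letter sits at a leaf, no codeword is a prefix of another, so you can decode by walking down from the root without any blanks.

```python
# Hypothetical code tree (an assumption, not the slide's figure):
# a leaf is a letter, an internal node is a (left, right) pair; 0 = go left, 1 = go right.
TREE = ("A", (("C", "D"), ("B", "R")))


def decode_with_tree(bits: str, tree) -> str:
    """Follow bits from the root; each time a leaf is reached, emit it and restart."""
    out = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):  # reached a leaf, so one letter is complete
            out.append(node)
            node = tree  # the next letter starts again at the root
    return "".join(out)  # an incomplete path left at the end is ignored


# With this tree: A = 0, C = 100, D = 101, B = 110, R = 111.
print(decode_with_tree("01101110100010101101110", TREE))  # ABRACADABRA
```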

  6. More Formally
     • Start with a frequency table
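The transcript does not reproduce the table itself. Assuming it counts the letters of ABRACADABRA from the earlier slides, a sketch using the standard library's `collections.Counter` would look like this (the name `frequency_table` is illustrative):

```python
from collections import Counter


def frequency_table(text: str) -> Counter:
    """Count how often each character occurs in text."""
    return Counter(text)


table = frequency_table("ABRACADABRA")
print(table.most_common())  # [('A', 5), ('B', 2), ('R', 2), ('C', 1), ('D', 1)]
```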

  7. More Formally
     • Create a binary tree out of the two elements with the lowest frequencies
     • New frequency is the sum of the frequencies
     • Add new node to the frequency table
     [Figure: a new node of frequency 2 with children C, 1 and D, 1]
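One merge step can be sketched with a priority queue. Using `heapq` here is a choice made for this sketch; the slides only speak of a frequency table. The starting frequencies are assumed from ABRACADABRA. Each entry carries a sequence number so that ties in frequency never force Python to compare a letter with a subtree.

```python
import heapq
from itertools import count


def merge_lowest(heap, tiebreak):
    """One step: replace the two lowest-frequency entries with a single parent node."""
    f1, _, left = heapq.heappop(heap)
    f2, _, right = heapq.heappop(heap)
    # The new node's frequency is the sum of its children's frequencies.
    heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))  # add it back to the table
    return f1 + f2, (left, right)


tiebreak = count()  # sequence numbers break frequency ties
heap = [(f, next(tiebreak), s) for s, f in [("A", 5), ("B", 2), ("R", 2), ("C", 1), ("D", 1)]]
heapq.heapify(heap)
print(merge_lowest(heap, tiebreak))  # (2, ('C', 'D'))
print(len(heap))  # 4
```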

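  8. More Formally
     • Repeat until only one element is left in the table
     [Figure: the finished tree. The root, frequency 11, has children A, 5 and a node of frequency 6; node 6 has children of frequency 2 and 4; node 2 has children C, 1 and D, 1; node 4 has children B, 2 and R, 2]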
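Repeating the merge until one node remains, and then reading each letter's code off its path from the root, gives a complete sketch. Using `heapq` and the 0 = left, 1 = right labelling are choices made for this sketch. Ties may be broken differently from the slide, so individual codes can differ, but the total length of the encoded message is the same.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(freq):
    """Merge the two lowest-frequency nodes until a single tree remains.

    Leaves are symbols; internal nodes are (left, right) tuples.
    Assumes freq contains at least one symbol.
    """
    tiebreak = count()  # unique sequence numbers so the heap never compares nodes
    heap = [(f, next(tiebreak), symbol) for symbol, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    return heap[0][2]


def assign_codes(tree, prefix="", codes=None):
    """Label left edges 0 and right edges 1; a leaf's code is its path from the root."""
    if codes is None:
        codes = {}
    if isinstance(tree, tuple):
        left, right = tree
        assign_codes(left, prefix + "0", codes)
        assign_codes(right, prefix + "1", codes)
    else:
        codes[tree] = prefix or "0"  # a one-symbol input still needs one bit
    return codes


message = "ABRACADABRA"
codes = assign_codes(build_huffman_tree(Counter(message)))
bits = "".join(codes[ch] for ch in message)
print(codes)
print(len(bits), "bits")  # 23 bits
```

For ABRACADABRA this gives 23 bits, compared with 33 bits for the 3-bit code and 88 bits for 8-bit characters.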

  9. Huffman Coding
     • The general method for finding this code was developed by D. Huffman in 1952
