90 likes | 100 Vues
File Compression. Even though disks have gotten bigger, we are still running short on disk space A common technique is to compress files so that they take up less space on the disk We can save space by taking advantage of the fact that most files have a relatively low “information content”.
E N D
File Compression • Even though disks have gotten bigger, we are still running short on disk space • A common technique is to compress files so that they take up less space on the disk • We can save space by taking advantage of the fact that most files have a relatively low “information content” Compression
Run Length Encoding • The simplest type of redundancy in a file is long runs of repeated characters • AAAABBBAABBBBBCCCCCCCC • This string can be represented more compactly by replacing each repeated string with a single occurrence of the character and a count • 4A3B2A5B8C • For binary files a refined version of this method can yield dramatic savings Compression
Variable Length Encoding • Suppose we wish to encode • ABRACADABRA • Instead of using the standard 8 (or 16) bits to represent these letters, why not use 3? • A = 000 000001100000010000011000001100000 • B = 001 • C = 010 • D = 011 • R = 100 Compression
We can do better!! • Why use the same number of bits for each letter • A = 0 0 1 11 0 01 0 10 0 1 11 0 • B = 1 • C = 01 • D = 10 • R = 11 • This is not really a code because it depends on the blanks • 011100101001110 Compression
Consider this Tree B D A C R Compression
More Formally • Start with a frequency table Compression
More Formally • Create a binary tree out of the two elements with the lowest frequencies • New frequency is the sum of the frequencies • Add new node to the frequency table 2 C, 1 D, 1 Compression
More Formally • Repeat until only one element is left in the table 11 6 A,5 2 4 C, 1 D, 1 B,2 R, 2 Compression
Huffman Coding • The general method for finding this code was developed by D. Huffman in 1952 Compression