
UTILITIES

UTILITIES. Group 3 Xin Li Soma Reddy. Data Compression. To reduce the size of files stored on disk and to increase the effective rate of transmission by modems. A Standard coding scheme. File Compression. Compression Reducing the number of bits required for data representation.


Presentation Transcript


  1. UTILITIES Group 3 Xin Li Soma Reddy

  2. Data Compression
     Reduces the size of files stored on disk and increases the effective rate of transmission by modems.

  3. A Standard coding scheme

  4. File Compression
     • Compression: reducing the number of bits required for data representation.
     • Two phases
       • The encoding phase (compressing)
       • The decoding phase (uncompressing)
     • Strategy: ensure that the most frequent characters have the shortest representation.
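To make the strategy concrete, here is a minimal sketch that counts the bits needed to encode a short string under a fixed-length code and under a variable-length code that gives the most frequent character the shortest code word. The class name, the helper `table`, the sample text, and both code tables are assumptions made for illustration; none of them come from the slides.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; class and method names are not from the slides.
final class CodeCostSketch {
    /** Total number of bits needed to encode text with the given per-character code words. */
    static int encodedBits(String text, Map<Character, String> code) {
        int bits = 0;
        for (char ch : text.toCharArray()) {
            String word = code.get(ch);
            if (word == null) {
                throw new IllegalArgumentException("no code for character: " + ch);
            }
            bits += word.length();
        }
        return bits;
    }

    /** Builds a code table from alternating character / code-word pairs. */
    static Map<Character, String> table(String... pairs) {
        Map<Character, String> code = new HashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            code.put(pairs[i].charAt(0), pairs[i + 1]);
        }
        return code;
    }

    public static void main(String[] args) {
        String text = "aaaaabbcd"; // 'a' is by far the most frequent character
        Map<Character, String> fixed = table("a", "00", "b", "01", "c", "10", "d", "11");
        Map<Character, String> variable = table("a", "0", "b", "10", "c", "110", "d", "111");
        System.out.println("fixed-length:    " + encodedBits(text, fixed) + " bits");
        System.out.println("variable-length: " + encodedBits(text, variable) + " bits");
    }
}
```

For this sample the fixed-length table needs 9 × 2 = 18 bits, while the variable-length table needs 5·1 + 2·2 + 1·3 + 1·3 = 15 bits.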

  5. A Binary Trie
     A left branch represents 0 and a right branch represents 1. The path to a node indicates its representation.
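The left-is-0, right-is-1 convention can be sketched as a decoder that walks a trie one bit at a time and emits a character whenever it reaches a leaf. The class `TrieSketch`, its `Node` type, and the three-character example tree (a = 0, b = 10, c = 11) are hypothetical and are not the tree drawn on the slides.

```java
// Hypothetical sketch; names and the example tree are illustrative, not from the slides.
final class TrieSketch {
    /** A trie node: leaves carry a symbol, internal nodes have two children. */
    static final class Node {
        final char symbol;
        final Node left, right;

        Node(char symbol) { this(symbol, null, null); }
        Node(Node left, Node right) { this('\0', left, right); }
        private Node(char symbol, Node left, Node right) {
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }
        boolean isLeaf() { return left == null && right == null; }
    }

    /** Walks the trie bit by bit: '0' goes left, '1' goes right; emits a symbol at each leaf. */
    static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node cur = root;
        for (int i = 0; i < bits.length(); i++) {
            cur = (bits.charAt(i) == '0') ? cur.left : cur.right;
            if (cur == null) {
                throw new IllegalArgumentException("invalid code at bit " + i);
            }
            if (cur.isLeaf()) {
                out.append(cur.symbol);
                cur = root; // restart at the root for the next code word
            }
        }
        if (cur != root) {
            throw new IllegalArgumentException("input ends in the middle of a code word");
        }
        return out.toString();
    }

    /** Example tree: a = 0, b = 10, c = 11. */
    static Node exampleTree() {
        return new Node(new Node('a'), new Node(new Node('b'), new Node('c')));
    }

    public static void main(String[] args) {
        System.out.println(decode(exampleTree(), "010110")); // prints "abca"
    }
}
```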

  6. Representation of the original code by a tree

  7. A Slightly Better Tree

  8. A Full Tree
     All nodes either are leaves or have two children.

  9. A Prefix Code
     • No character code is a prefix of another character code.
     • Guaranteed if the characters are only in leaves.
     • Can be decoded unambiguously.
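The prefix property can be checked directly on a list of code words. The following is a small sketch under the assumption that code words are given as strings of '0' and '1'; the class and method names are hypothetical. It uses a simple quadratic comparison for clarity rather than speed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names are illustrative, not from the slides.
final class PrefixCodeCheck {
    /** Returns true if no code word is a prefix of a different entry in the list. */
    static boolean isPrefixFree(List<String> codes) {
        for (int i = 0; i < codes.size(); i++) {
            for (int j = 0; j < codes.size(); j++) {
                // Duplicate code words also fail: two characters cannot share one code.
                if (i != j && codes.get(j).startsWith(codes.get(i))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPrefixFree(Arrays.asList("0", "10", "11"))); // prints "true"
        System.out.println(isPrefixFree(Arrays.asList("0", "01", "11"))); // prints "false"
    }
}
```

In the second list, "0" is a prefix of "01", so a decoder that has read a 0 cannot tell whether a code word has ended.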

  10. An Optimal Prefix Code Tree

  11. Optimal Prefix Code

  12. Huffman’s Algorithm
      • Constructs an optimal prefix code.
      • The weight of a tree is the sum of the frequencies of its leaves.
      • Works by repeatedly merging the two minimum-weight trees.
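The merging loop can be sketched with a priority queue ordered by tree weight. This is a minimal illustration, not the HuffmanTree class listed in the implementation slides: the class `HuffmanSketch`, its `Node` type, and the sample frequencies are assumptions, and ties between equal weights are broken arbitrarily, so the exact code words may differ between runs of other implementations even though the code lengths stay optimal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch; this is not the HuffmanTree class named in the slides.
final class HuffmanSketch {
    static final class Node implements Comparable<Node> {
        final int weight;        // sum of the frequencies of the leaves below this node
        final char symbol;       // meaningful only for leaves
        final Node left, right;  // both null for a leaf

        Node(char symbol, int weight) {
            this.symbol = symbol; this.weight = weight;
            this.left = null; this.right = null;
        }
        Node(Node left, Node right) {
            this.symbol = '\0'; this.weight = left.weight + right.weight;
            this.left = left; this.right = right;
        }
        boolean isLeaf() { return left == null; }
        @Override public int compareTo(Node other) { return Integer.compare(weight, other.weight); }
    }

    /** Repeatedly merges the two minimum-weight trees until one tree remains. */
    static Node buildTree(Map<Character, Integer> freq) {
        if (freq.isEmpty()) throw new IllegalArgumentException("no symbols");
        PriorityQueue<Node> forest = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            forest.add(new Node(e.getKey(), e.getValue()));
        }
        while (forest.size() > 1) {
            Node a = forest.poll();
            Node b = forest.poll();
            forest.add(new Node(a, b));
        }
        return forest.poll();
    }

    /** Reads the code words off the tree: left edge = 0, right edge = 1. */
    static Map<Character, String> codes(Map<Character, Integer> freq) {
        Map<Character, String> out = new HashMap<>();
        collect(buildTree(freq), "", out);
        return out;
    }

    private static void collect(Node n, String path, Map<Character, String> out) {
        if (n.isLeaf()) {
            out.put(n.symbol, path.isEmpty() ? "0" : path); // single-symbol input still gets one bit
            return;
        }
        collect(n.left, path + "0", out);
        collect(n.right, path + "1", out);
    }

    public static void main(String[] args) {
        Map<Character, Integer> freq = new HashMap<>();
        freq.put('a', 5); freq.put('b', 2); freq.put('c', 1); freq.put('d', 1);
        System.out.println(codes(freq));
    }
}
```

With the sample frequencies, the two weight-1 leaves merge first, then the two weight-2 trees, then the result merges with 'a'. The code lengths are therefore 1, 2, 3, and 3 bits for a, b, c, and d.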

  13. Initial Stage of Huffman’s Algorithm

  14. Huffman’s Algorithm After the First Merge

  15. Huffman’s Algorithm After the Second Merge

  16. Huffman’s Algorithm After the Third Merge

  17. Huffman’s Algorithm After the Fourth Merge

  18. Huffman’s Algorithm After the Fifth Merge

  19. Huffman’s Algorithm After the Final Merge

  20. Implementation
      • BitInputStream Class
      • BitOutputStream Class
      • CharCounter Class
      • HuffmanTree Class
      • Hzip Class
      • HZIPInputStream Class
      • HZIPOutputStream Class

  21. BitInputStream Class
      • Wraps an InputStream and provides bit-at-a-time input
      • Main methods:
        • readBit: reads one bit as a 0 or 1
        • getBit: gets an individual bit in an 8-bit byte
        • close: closes the underlying stream
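A bit-at-a-time reader of this kind might look like the sketch below. It is not the slides' BitInputStream: the class name `BitReaderSketch`, the choice to read the least significant bit first, and the convention of returning -1 at end of input are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; bit order (least significant bit first) is an assumption.
final class BitReaderSketch {
    private static final int BITS_PER_BYTE = 8;

    private final InputStream in;
    private int buffer;                     // the byte currently being consumed
    private int bufferPos = BITS_PER_BYTE;  // index of the next bit; 8 means the buffer is used up

    BitReaderSketch(InputStream in) { this.in = in; }

    /** Returns the next bit (0 or 1), or -1 at end of input. */
    int readBit() throws IOException {
        if (bufferPos == BITS_PER_BYTE) {
            buffer = in.read();
            if (buffer == -1) {
                return -1;
            }
            bufferPos = 0;
        }
        return getBit(buffer, bufferPos++);
    }

    /** Extracts bit number pos (0 = least significant) from an 8-bit value. */
    static int getBit(int pack, int pos) {
        return (pack >> pos) & 1;
    }

    void close() throws IOException { in.close(); }

    public static void main(String[] args) throws IOException {
        BitReaderSketch reader = new BitReaderSketch(new ByteArrayInputStream(new byte[] { 5 }));
        int bit;
        while ((bit = reader.readBit()) != -1) {
            System.out.print(bit); // the byte 5 (binary 00000101) is read as 10100000
        }
        System.out.println();
        reader.close();
    }
}
```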

  22. BitOutputStream Class
      • Wraps an OutputStream and provides bit-at-a-time output
      • Main methods:
        • writeBit: writes one bit (0 or 1)
        • writeBits: writes an array of bits
        • setBit: sets an individual bit in an 8-bit byte
        • flush: flushes buffered bits
        • close: closes the underlying stream
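The output side can be sketched in the same way: bits accumulate in a one-byte buffer that is written out when full, and again on flush or close so that a partial final byte is not lost. The class name `BitWriterSketch`, the least-significant-bit-first packing, and the zero padding of the final byte are assumptions, not details taken from the slides.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch; bit order and zero padding of the last byte are assumptions.
final class BitWriterSketch {
    private static final int BITS_PER_BYTE = 8;

    private final OutputStream out;
    private int buffer = 0;     // bits collected so far
    private int bufferPos = 0;  // number of bits currently in the buffer

    BitWriterSketch(OutputStream out) { this.out = out; }

    void writeBit(int bit) throws IOException {
        buffer = setBit(buffer, bufferPos++, bit);
        if (bufferPos == BITS_PER_BYTE) {
            flush();
        }
    }

    void writeBits(int[] bits) throws IOException {
        for (int bit : bits) {
            writeBit(bit);
        }
    }

    /** Writes any buffered bits as one byte; unused high bits stay 0. */
    void flush() throws IOException {
        if (bufferPos == 0) {
            return;
        }
        out.write(buffer);
        buffer = 0;
        bufferPos = 0;
        out.flush();
    }

    void close() throws IOException {
        flush();
        out.close();
    }

    /** Returns pack with bit number pos (0 = least significant) set to val. */
    static int setBit(int pack, int pos, int val) {
        return val == 1 ? pack | (1 << pos) : pack & ~(1 << pos);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BitWriterSketch writer = new BitWriterSketch(sink);
        writer.writeBits(new int[] { 1, 0, 1 });
        writer.close();
        System.out.println(sink.toByteArray()[0]); // prints 5
    }
}
```

Because the last byte is padded, a decoder needs some other way to know where the real data ends, such as a stored length or an end-of-file symbol.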

  23. CharCounter Class
      • Maintains character counts
      • Main methods:
        • getCount: returns the number of occurrences of a character
        • setCount: sets the number of occurrences of a character
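A character counter along these lines can be as small as an array of 256 integers indexed by byte value. The sketch below is hypothetical: the class name `CharCounterSketch` and the stream-reading constructor are assumptions, although the Part 1 code later in the deck does construct a CharCounter from a stream and compares counts for every byte value up to BitUtils.DIFF_BYTES.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the CharCounter class from the slides.
final class CharCounterSketch {
    private static final int DIFF_BYTES = 256; // one counter per possible byte value

    private final int[] counts = new int[DIFF_BYTES];

    CharCounterSketch() { }

    /** Counts every byte read from the stream until end of input. */
    CharCounterSketch(InputStream in) throws IOException {
        int ch;
        while ((ch = in.read()) != -1) {
            counts[ch]++;
        }
    }

    int getCount(int ch) { return counts[ch & 0xff]; }

    void setCount(int ch, int count) { counts[ch & 0xff] = count; }

    public static void main(String[] args) throws IOException {
        byte[] data = "abracadabra".getBytes(StandardCharsets.US_ASCII);
        CharCounterSketch counter = new CharCounterSketch(new ByteArrayInputStream(data));
        System.out.println("a: " + counter.getCount('a')); // prints "a: 5"
        System.out.println("b: " + counter.getCount('b')); // prints "b: 2"
    }
}
```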

  24. HuffmanTree Class
      • Manipulates Huffman coding trees
      • Main methods:
        • getCode: obtains the code of a given character
        • getChar: obtains the character for a given code
        • createTree: constructs the Huffman coding tree

  25. HuffmanTree Class (cont)
      • Main methods:
        • writeEncodingTable: writes an encoding table to an output stream
        • readEncodingTable: reads the encoding table from an input stream

  26. Hzip Class
      • Main methods:
        • compress: compresses a file; the output name adds “.huf” to the filename
        • uncompress: uncompresses a file; the output name adds “.uc” to the filename
        • main

  27. HZIPInputStream Class
      • Contains an uncompression wrapper
      • Main method: read, which returns an uncompressed byte from the wrapped input stream

  28. HZIPOutputStream Class
      • Contains a compression wrapper
      • Writes to an HZIPOutputStream are compressed and sent to the wrapped output stream. No writing is actually done until close.
      • Main method: close
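Deferring all output until close fits Huffman coding, because the code tree depends on the character counts of the entire input. The sketch below shows only that buffering pattern, not the slides' HZIPOutputStream: the class name is hypothetical, and the length header followed by the raw bytes is a placeholder standing in for the encoding table and the compressed bit stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of the "buffer everything, write on close" pattern.
final class DeferredOutputStreamSketch extends OutputStream {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final DataOutputStream out;
    private boolean closed = false;

    DeferredOutputStreamSketch(OutputStream out) {
        this.out = new DataOutputStream(out);
    }

    /** Only remembers the byte; nothing reaches the wrapped stream yet. */
    @Override
    public void write(int b) throws IOException {
        if (closed) {
            throw new IOException("stream closed");
        }
        pending.write(b);
    }

    /** All real output happens here, once the complete input is known. */
    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        byte[] data = pending.toByteArray();
        // A real compressor would count characters, build the code tree, and
        // write the encoding table plus the coded bits at this point.
        out.writeInt(data.length); // placeholder header (assumption)
        out.write(data);           // placeholder body: the data, uncompressed
        out.close();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DeferredOutputStreamSketch stream = new DeferredOutputStreamSketch(sink);
        stream.write('h');
        stream.write('i');
        System.out.println("before close: " + sink.size() + " bytes");
        stream.close();
        System.out.println("after close:  " + sink.size() + " bytes");
    }
}
```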

  29. Programming Project Part 1
      Storing the character counts in the encoding table gives the uncompression algorithm the ability to perform extra consistency checks. Code is added to verify that the result of the uncompression has the same character counts as the encoding table claimed.

  30. Part 1 Implementation
      • Add several public methods.
      In the HZIPInputStream class:
          public HuffmanTree getTree() { return codeTree; }
      In the HuffmanTree class:
          public CharCounter getCharCounter() { return theCounts; }

  31. Part 1 Implementation (cont)
      In the Hzip class, uncompress method:
          // Counts that the encoding table of the compressed file claims
          HuffmanTree tree = hzin.getTree();
          CharCounter newcc1 = tree.getCharCounter();
          // Counts recomputed from the stream "in" (presumably the uncompressed result)
          CharCounter newcc2 = new CharCounter(in);
          for (int i = 0; i < BitUtils.DIFF_BYTES; i++) {
              if (newcc2.getCount(i) != newcc1.getCount(i)) {
                  System.out.println(
                      " There is an error in the uncompressing process.");
                  File file1 = new File(inFile);
                  file1.delete();
                  break; // one mismatch is enough; avoid repeating the message
              }
          }

  32. Part 2
      Check the size of the resulting compressed file and abort if the size is larger than or equal to the original.

  33. Part 2 Implementation
      In the Hzip class, compress method:
          File originFile = new File(inFile);
          File compreFile = new File(compressedFile);
          if (originFile.length() < compreFile.length()) {
              System.out.println(
                  "The size of the resulting compressed file is larger than the original.");
              compreFile.delete();
              return;
          } else if (originFile.length() == compreFile.length()) {
              System.out.println(
                  "The size of the resulting compressed file is equal to the original.");
              compreFile.delete();
              return;
          }

  34. Run Example
      To compress a text file whose size is six bytes:
          C:\>set path=c:/j2sdk1.4.1_01/bin
          C:\>javac Hzip.java
          C:\>javac HZIPInputStream.java
          C:\>javac HZIPOutputStream.java
          C:\>java Hzip -c file1.txt
          The size of the resulting compressed file is larger than the original.
          C:\>

  35. Conclusion
      • Text compression is an important technique that allows us to increase both effective disk capacity and effective modem speed. It is an area of active research.
      • Huffman’s algorithm typically achieves compression of 25% on text files.
