M5 research group, University of Central Florida

StarNT: Dictionary-based Fast Transform. Weifeng Sun (wsun@cs.ucf.edu), School of Electrical Engineering and Computer Science, University of Central Florida. 25 April 2003.

Presentation Transcript


  1. StarNT: Dictionary-based Fast Transform
     Weifeng Sun (wsun@cs.ucf.edu)
     School of Electrical Engineering and Computer Science, University of Central Florida
     M5 research group, University of Central Florida
     25 April 2003

  2. Table of Contents
     • Preprocessing/Postprocessing Model
     • Star Transform
     • StarNT Transform
     • StarZip
     • Domain Specific Text Compression Tool
     • Review

  3. Current Text Compression Model
     • First-order Entropy Coder
       • Huffman (word, canonical)
       • Arithmetic: arbitrary precision
     • Statistical Models
       • PPM(BWT): prediction by context
       • DMC
     • Dictionary Models
       • LZ-family: good compression, fast

  4. Preprocessing/Postprocessing Model
     Compression: Text File → Preprocessor → Compression Algorithm → Compressed File
     Decompression: Compressed File → Decompression Algorithm → Postprocessor → Text File

  5. Goal of Preprocessor
     • Accelerate the backend compression algorithm
       • The shorter, the faster
     • Backend compressor oriented
       • More “delicious” input
       • Preserve some original context
       • Provide some “artificial” context
     • Universal
       • Text transform

  6. StarNT: Transform paradigm
     Compression: Text File → Transform Encoding (with Dictionary) → Transformed File → Compression Algorithm → Compressed File
     Decompression: Compressed File → Decompression Algorithm → Transformed File → Transform Decoding (with Dictionary) → Text File
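The two-stage paradigm on slides 4 and 6 can be sketched as a reversible text transform wrapped around an off-the-shelf backend compressor. This is a minimal illustration, not the StarNT implementation: the word list `WORDS`, the one-byte codes, and the function names are all hypothetical, and the input is assumed to be plain ASCII so that byte values from 0x80 upward are free to serve as codewords.

```python
import bz2

# Hypothetical shared dictionary; encoder and decoder must use the same list.
WORDS = ["the", "of", "and", "to", "text"]
INDEX = {word: i for i, word in enumerate(WORDS)}


def preprocess(text: str) -> bytes:
    """Replace dictionary words with single bytes >= 0x80 (assumes ASCII input)."""
    out = bytearray()
    for i, token in enumerate(text.split(" ")):
        if i:
            out.append(0x20)  # restore the space removed by split
        if token in INDEX:
            out.append(0x80 + INDEX[token])
        else:
            out += token.encode("ascii")  # unknown tokens pass through unchanged
    return bytes(out)


def postprocess(data: bytes) -> str:
    """Invert preprocess: expand code bytes back into dictionary words."""
    return "".join(WORDS[b - 0x80] if b >= 0x80 else chr(b) for b in data)


def compress(text: str) -> bytes:
    """Preprocessor followed by the backend compressor (bzip2 at level 9)."""
    return bz2.compress(preprocess(text), compresslevel=9)


def decompress(blob: bytes) -> str:
    """Backend decompressor followed by the postprocessor."""
    return postprocess(bz2.decompress(blob))
```

Any lossless backend could take the place of `bz2` here; the only requirement the sketch places on the transform is that `postprocess(preprocess(text)) == text`.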

  7. Table of Contents
     • Preprocessing/Postprocessing Model
     • Star Transform
     • StarNT Transform
     • StarZip
     • Domain Specific Text Compression Tool
     • Review

  8. Example: Star-encoding
     Input text: This is a long example to demonstrate the “substitution” method.
     Transform dictionary:
       a            *
       is           **
       to           *a
       the          ***
       long         ****
       this         ***a
       test         ***b
       method       ******
       example      *******
       demonstrate  ***********
     Transformed text: ***a^ ** * **** ******* *a *********** *** “substitution” ******.
     Bit stream: 100111001100000101011010011100
     Lots of compression gain!
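The substitution on slide 8 can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the dictionary is copied from the slide, but the reading of `^` as an "initial capital" flag is inferred from the single example `This → ***a^`, and the function names are hypothetical. Words that are not in the dictionary pass through unchanged, as “substitution” does on the slide. The sketch also assumes the input text contains no literal `*` characters.

```python
import re

# Dictionary copied from the slide: each codeword has the same length as its word.
STAR_DICT = {
    "a": "*", "is": "**", "to": "*a", "the": "***", "long": "****",
    "this": "***a", "test": "***b",
    "method": "*" * 6, "example": "*" * 7, "demonstrate": "*" * 11,
}
INVERSE = {code: word for word, code in STAR_DICT.items()}


def star_encode(text):
    """Replace dictionary words with star codewords."""
    def repl(match):
        word = match.group(0)
        code = STAR_DICT.get(word.lower())
        if code is None:
            return word                  # not in the dictionary: keep as is
        if word.islower():
            return code
        if word == word.lower().capitalize():
            return code + "^"            # assumed meaning: first letter was upper case
        return word                      # other capitalisation patterns are not handled
    return re.sub(r"[A-Za-z]+", repl, text)


def star_decode(text):
    """Invert star_encode for text produced by it."""
    def repl(match):
        word = INVERSE.get(match.group(1))
        if word is None:
            return match.group(0)
        return word.capitalize() if match.group(2) else word
    return re.sub(r"(\*+[A-Za-z]*)(\^?)", repl, text)
```

Applied to the slide's input sentence, `star_encode` yields the transformed text shown on the slide, and `star_decode` restores the original.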

  9. Example: LIPT-transform
     Input text: This is a long example to demonstrate the “substitution” method.
     Transform dictionary:
       a            *a
       is           *bq
       to           *be
       the          *cd
       long         *dfa
       this         *dr
       test         *dB
       method       *fb
       example      *gY
       demonstrate  *key
     Transformed text: *dr^ *bq *a *dfa *gY *be *key *cd “substitution” *fb.
     Bit stream: 1001110011000001010110
     MORE gain!
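Reading the LIPT table on slide 9, the letter directly after `*` appears to encode the word's length (a = 1, b = 2, ..., k = 11), with any remaining letters distinguishing words of the same length. That reading is an inference from the ten entries on the slide, not a statement of the full LIPT codeword layout. The sketch below only checks the inference against the slide's own data; the helper names are hypothetical.

```python
import string

# Assumed alphabet for the length letter: a..z then A..Z.
LETTERS = string.ascii_lowercase + string.ascii_uppercase

# Dictionary copied from the slide.
LIPT_DICT = {
    "a": "*a", "is": "*bq", "to": "*be", "the": "*cd", "long": "*dfa",
    "this": "*dr", "test": "*dB", "method": "*fb", "example": "*gY",
    "demonstrate": "*key",
}


def length_from_codeword(code):
    """Read the word length from the letter after '*' (a=1, b=2, ...)."""
    if len(code) < 2 or code[0] != "*":
        raise ValueError(f"not a LIPT-style codeword: {code!r}")
    return LETTERS.index(code[1]) + 1


def table_is_consistent(table):
    """True if every codeword's length letter matches its word's length."""
    return all(length_from_codeword(code) == len(word)
               for word, code in table.items())
```

All ten slide entries pass this check, and every codeword of three or more letters is shorter than the word it replaces, unlike the length-preserving star codes on slide 8.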

  10. Table of Contents
      • Preprocessing/Postprocessing Model
      • Star Transform
      • StarNT Transform
      • StarZip
      • Domain Specific Text Compression Tool
      • Review

  11. StarNT Transform
      • Fast Transform Encoding/Decoding
        • Ternary search tree
      • Fast Backend Compression/Decompression
        • Shorter transform output
      • Higher Compression Ratio
        • More efficient transform
      • StarZip: Multi-corpus Compression Tool

  12. Example: Ternary Search Tree
      • Hash table
      • Binary tree
      • Digital search tries
      • Ternary search trees
      Searching for a string of length k in a ternary search tree with n strings requires at most O(log n + k) character comparisons.
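A ternary search tree stores one character per node with three children: lower, equal, and higher. The sketch below is a generic textbook version, not the StarNT code; the class and method names are hypothetical. It keeps a value (for example a codeword) at the node that ends each word, and `search` also reports how many character comparisons it made. No rebalancing is done, so the O(log n + k) bound quoted on the slide applies to a balanced tree, not to every insertion order.

```python
class Node:
    __slots__ = ("ch", "lo", "eq", "hi", "value")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.value = None  # set only on the node that ends a stored word


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, word, value):
        if not word:
            raise ValueError("empty keys are not supported")
        self.root = self._insert(self.root, word, 0, value)

    def _insert(self, node, word, i, value):
        ch = word[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i, value)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i, value)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1, value)  # next character
        else:
            node.value = value
        return node

    def search(self, word):
        """Return (value, comparisons); value is None if the word is absent."""
        node, i, comparisons = self.root, 0, 0
        if not word:
            return None, 0
        while node is not None:
            comparisons += 1
            ch = word[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(word):
                node, i = node.eq, i + 1
            else:
                return node.value, comparisons
        return None, comparisons
```

Loading the tree with a transform dictionary (word as key, codeword as value) gives the encoder a lookup that never hashes or copies the word, which is one plausible reason to prefer it over the other structures listed on the slide.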

  13. StarNT: Efficient Transform
      • Maintain some original context, provide new “artificial” context
        • Preserve word frequency information
        • Use word length information
      • Index encoding
        • Codeword denotes the index of the word in the dictionary
      • Lightning transform decoding
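The "index encoding" idea can be illustrated as follows: if a codeword is just the word's position in the dictionary written in a fixed alphabet, decoding reduces to converting the codeword back to a number and indexing an array. The slide does not give the codeword alphabet or layout, so the 52-letter alphabet (a..z, A..Z), the numbering scheme, and the function names below are assumptions made for illustration only.

```python
import string

# Assumed codeword alphabet: 52 letters. The slide does not specify this.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def index_to_codeword(index):
    """Map 0 -> 'a', 51 -> 'Z', 52 -> 'aa', ... (bijective base-52)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    n = index + 1
    chars = []
    while n > 0:
        n, rem = divmod(n - 1, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def codeword_to_index(code):
    """Inverse of index_to_codeword."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch) + 1
    return n - 1


def decode_word(code, dictionary):
    """Decoding is a single list lookup once the index is known."""
    return dictionary[codeword_to_index(code)]
```

With a dictionary ordered so that frequent words come first, the most common words receive the shortest codewords, which matches the slide's point about preserving word frequency information.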

  14. StarNT: Fast Backend Compression/Decompression
      • Shorter intermediate (transformed) file
      • The meaning of the symbol ‘*’ changed!

  15. StarNT: Compression Performance
      Bzip2 -9 + StarNT      11.2%
      Gzip -9 + StarNT       16.4%
      PPMD (k=5) + StarNT    10.2%
      • StarNT is better than LIPT
      • bzip2+StarNT is better than PPMD in
        • time complexity
        • compression performance

  16. StarNT: Timing Performance -- Compared with LIPT
      • Encoding: 76.3%
      • Decoding: 84.9%

  17. StarNT: Timing Performance -- Compared with Backend Compressor
                   Bzip2 -9    Gzip -9          PPMD (k=5)
      Encoding     28.1%       50.4%            21.2%
      Decoding     18.6%       Some increase    Negligible

  18. Table of Contents
      • Preprocessing/Postprocessing Model
      • Star Transform
      • StarNT Transform
      • StarZip
      • Domain Specific Text Compression Tool
      • Review

  19. StarZip: Domain Specific Dictionary
      • Five corpora used (from ibiblio.com)

  20. StarZip: Preliminary Result -- Compression Performance
      Bzip2 -9 + StarZip      13%
      Gzip -9 + StarZip       19%
      PPMD (k=5) + StarZip    10%

  21. Table of Contents
      • Preprocessing/Postprocessing Model
      • Star Transform
      • StarNT Transform
      • StarZip
      • Domain Specific Text Compression Tool
      • Review

  22. Review: Philosophy of Preprocessing/Postprocessing
      • Transfom th txt into som intermdiate form whic can b compresed with betr eficency.
      • Xploit th natral redndancy of the laguage in makng this tranformaton.
