An Overview of Different Compression Algorithms

An overview of different compression algorithms and their application to compressing inverted files. Alternative compression algorithms: arithmetic coding, Huffman coding, character-based, word-based, and dictionary-based coding (the Ziv-Lempel family of coding). Pros and cons of different algorithms.

albert

Presentation Transcript


  1. An Overview of Different Compression Algorithms: Their Application to Compressing Inverted Files

  2. Alternative Compression Algorithms • Arithmetic coding • Huffman coding • Character-based • Word-based • Dictionary-based coding – Ziv-Lempel family of coding

  3. Pros and Cons of Different Algorithms

  4. Choosing a Compression Algorithm for Inverted Files • Factors that need to be considered • Compression ratio • Speed • Random access • In modern IR systems, word-based Huffman coding is commonly used • There is a lot of research on Ziv-Lempel family coding to see whether it can be applied to index compression
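
As an illustration of the word-based Huffman coding mentioned on this slide, here is a minimal Python sketch. It assumes words are separated by whitespace and that the code table is shared with the decoder; the function names (`build_word_huffman`, `encode`, `decode`) are hypothetical and do not come from the presentation or from any particular IR system.

```python
import heapq
from collections import Counter
from itertools import count


def build_word_huffman(text):
    """Return a {word: bitstring} table built from word frequencies."""
    freqs = Counter(text.split())
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single distinct word still needs a one-bit code.
        return {next(iter(freqs)): "0"}
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {w: ""}) for w, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(text, table):
    """Concatenate the codes of the whitespace-separated words."""
    return "".join(table[w] for w in text.split())


def decode(bits, table):
    """Decode a bitstring; works because Huffman codes are prefix-free."""
    inverse = {c: w for w, c in table.items()}
    words, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            words.append(inverse[current])
            current = ""
    return " ".join(words)
```

Frequent words receive codes no longer than rare ones, which is where the compression comes from. Because the symbols are whole words, a decoder can start at any codeword boundary, which is one reason this family suits the random-access factor listed above.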

  5. An Improved Sliding-window Ziv-Lempel Algorithm • Conventional LZ family compression algorithms use a sliding-window approach. • They are based on the longest matching length (m-length). • An improved sliding-window LZ algorithm was proposed by Bender and Wolf. • Instead of the m-length, the improved algorithm is based on the offset of the length (o-length) and the differential of the length (-length)
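
To make the conventional sliding-window step concrete, the sketch below finds the longest match (the m-length) for the data at a given position by brute force. It illustrates only the conventional approach, not the Bender and Wolf improvement; the function name `longest_match` and the `window` and `max_len` defaults are assumptions chosen for the example.

```python
def longest_match(data, pos, window=4096, max_len=18):
    """Return (offset, length) of the longest earlier match for data[pos:].

    offset is the distance back from pos; (0, 0) means no match was found.
    """
    best_off, best_len = 0, 0
    start = max(0, pos - window)  # left edge of the sliding window
    for cand in range(start, pos):
        length = 0
        # The match may run past pos (an overlapping copy), which is allowed.
        while (length < max_len and pos + length < len(data)
               and data[cand + length] == data[pos + length]):
            length += 1
        if length > best_len:
            best_off, best_len = pos - cand, length
    return best_off, best_len
```

This naive search costs time proportional to the window size at every position; practical implementations use hashing or tree structures to find candidates faster.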

  6. Benefits of the Improved Algorithm • A better compression ratio in the experiments • Compression and searching are still linear: O(n). • It does not really provide an LZ algorithm that supports random access.

  7. Another Modified LZ Algorithm • Proposed by Williams • Uses literal and copy items • At each step, it transmits the original data if the item is a literal item, or a pointer if it is a copy item • Aimed at faster compression speed and a smaller memory footprint • Better suited to embedded systems where real-time compression is required • Inappropriate for index compression
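
The literal/copy idea can be sketched as follows. This is a simplified illustration, not Williams's actual format: items are plain Python tuples rather than packed bits, a dict keyed on three-byte prefixes stands in for a hash table, and the names `compress` and `decompress` and the `min_len`, `max_len`, and `window` values are assumptions.

```python
def compress(data, min_len=3, max_len=18, window=4095):
    """Greedily split data into ("lit", byte) and ("copy", offset, length) items."""
    table = {}  # 3-byte prefix -> most recent position where it was seen
    items, pos = [], 0
    while pos < len(data):
        key = data[pos:pos + min_len]
        cand = None
        if len(key) == min_len:
            cand = table.get(key)
            table[key] = pos
        length = 0
        if cand is not None and pos - cand <= window:
            # Extend the single candidate match as far as it goes.
            while (length < max_len and pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
        if length >= min_len:
            items.append(("copy", pos - cand, length))  # pointer into history
            pos += length
        else:
            items.append(("lit", data[pos]))  # original byte, sent as-is
            pos += 1
    return items


def decompress(items):
    """Rebuild the original bytes from literal and copy items."""
    out = bytearray()
    for item in items:
        if item[0] == "lit":
            out.append(item[1])
        else:
            _, offset, length = item
            for _ in range(length):
                out.append(out[-offset])  # byte by byte, so overlaps work
    return bytes(out)
```

Checking only one candidate per position trades compression ratio for speed and a small, fixed amount of memory, which matches the goals stated on the slide.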

  8. Conclusion • To date, the best practical compression algorithm for indexes is still word-based Huffman coding. • There are theoretical studies of Ziv-Lempel family coding. None of them is practically applicable to our problem, but they can be used in other areas.

  9. References • An Improved Data Compression Algorithm Based on Ziv-Lempel Data Compression Algorithm, Paul Edward Bender and Jack Keil Wolf • An Extremely Fast Ziv-Lempel Data Compression Algorithm, Ross N. Williams • Modern Information Retrieval, Ricardo Baeza-Yates and Berthier Ribeiro-Neto
