A Simpler Analysis of Burrows-Wheeler Based Compression

1. A Simpler Analysis of Burrows-Wheeler Based Compression. Haim Kaplan, Shir Landau, Elad Verbin

2. Our Results. Improved bounds for one of the main BWT-based compression algorithms. A new technique for worst-case analysis of BWT-based compression algorithms, using the Local Entropy. Interesting results concerning compression of integer strings.

3. The Burrows-Wheeler Transform (1994). Given a string S, the Burrows-Wheeler Transform creates a permutation of S that is locally homogeneous.
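As an illustration of the transform, here is a minimal sketch using the textbook sorted-rotations construction. The end-of-string sentinel "$" and the function names are assumptions made for this example and do not come from the slides; the quadratic-space construction is for clarity only.

```python
def bwt(s: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all rotations of s + sentinel, take the last column."""
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(b: str, sentinel: str = "$") -> str:
    """Naive inverse: repeatedly prepend the BWT column and re-sort."""
    table = [""] * len(b)
    for _ in range(len(b)):
        table = sorted(b[i] + table[i] for i in range(len(b)))
    # The original string is the row that ends with the sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


print(bwt("banana"))               # annb$aa
print(inverse_bwt(bwt("banana")))  # banana
```

The output groups equal characters together (the two n's, the two trailing a's), which is the local homogeneity the slide refers to, and the inverse shows that the permutation loses no information.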

4. Empirical Entropy - Intuition. H0(s): Maximum compression we can get without context information, where a fixed codeword is assigned to each alphabet character (e.g., Huffman code). Hk(s): Lower bound for compression with order-k contexts, where the codeword representing each symbol depends on the k symbols preceding it. Traditionally, the compression ratio of compression algorithms is measured using Hk(s).
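The two statistics can be computed directly from symbol counts. The sketch below assumes the standard definitions, which the transcript does not spell out: H0(s) is the entropy of the symbol frequencies of s, and Hk(s) is the length-weighted average of H0 over the strings of symbols that follow each length-k context. The function names are illustrative.

```python
from collections import Counter, defaultdict
from math import log2


def h0(s) -> float:
    """Order-0 empirical entropy, in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    return -sum(c / n * log2(c / n) for c in Counter(s).values())


def hk(s: str, k: int) -> float:
    """Order-k empirical entropy: H0 of each context's followers, weighted by length."""
    if k == 0:
        return h0(s)
    followers = defaultdict(list)
    for i in range(k, len(s)):
        followers[s[i - k:i]].append(s[i])  # symbol s[i] follows context s[i-k:i]
    return sum(len(f) * h0(f) for f in followers.values()) / len(s)


print(h0("abababab"))     # 1.0: two equally frequent symbols
print(hk("abababab", 1))  # zero: each symbol is determined by its predecessor
```

The example shows why Hk is the more demanding yardstick: a string can need one bit per symbol without context and almost nothing once one preceding symbol is known.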

5. History. The Main Burrows-Wheeler Compression Algorithm (Burrows, Wheeler 1994):

6. MTF
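A minimal sketch of move-to-front (MTF) coding follows. It assumes 0-based ranks and an initial list given by the sorted alphabet; both choices are assumptions of this example, since the slide body is not preserved in the transcript.

```python
def mtf_encode(s: str, alphabet=None) -> list:
    """Replace each symbol by its current rank in a recency list, then move it to the front."""
    symbols = sorted(set(s)) if alphabet is None else list(alphabet)
    ranks = []
    for ch in s:
        i = symbols.index(ch)
        ranks.append(i)
        symbols.insert(0, symbols.pop(i))
    return ranks


def mtf_decode(ranks, alphabet) -> str:
    """Invert mtf_encode, given the same initial alphabet order."""
    symbols = list(alphabet)
    out = []
    for i in ranks:
        ch = symbols.pop(i)
        out.append(ch)
        symbols.insert(0, ch)
    return "".join(out)


print(mtf_encode("aaabbb", "ab"))      # [0, 0, 0, 1, 0, 0]
print(mtf_encode("annb$aa", "$abn"))   # [1, 3, 0, 3, 3, 3, 0]
```

Runs of equal symbols become runs of zeros, so a locally homogeneous input yields mostly small integers.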

7. Main Bounds (Manzini 1999). gk is a constant dependent on the context length k and the size of the alphabet. These are worst-case bounds.

8. Now we are ready to begin...

9. Some Intuition. The more similar the contexts are in the original string, the more its BWT will exhibit local similarity. The more local similarity found in the BWT of the string, the smaller the numbers we get in MTF. → We want a statistic that measures local similarity in a string, and specifically in the BWT of the string. → The solution: Local Entropy.

10. The Local Entropy - Definition. We define: given a string s = "s1s2...sn", the local entropy of s: (Bentley, Sleator, Tarjan, Wei, 86)

11. The Local Entropy - Definition. Note: LE(s) = number of bits needed to write the MTF sequence in binary. Example: MTF(s) = 3 1 1 → LE(s) = 4 → MTF(s) in binary = 11 1 1 (4 bits in total).
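The example can be checked numerically. The sketch below assumes LE(s) is the sum of log2(r + 1) over the 0-based MTF ranks r; this form is inferred from the slide's example and is not written out in the transcript. The function names are illustrative.

```python
from math import log2


def local_entropy(ranks) -> float:
    """Assumed form of LE: sum of log2(r + 1) over 0-based MTF ranks r."""
    return sum(log2(r + 1) for r in ranks)


def binary_length(ranks) -> int:
    """Total number of bits when each rank is written in plain binary (rank 0 takes no bits)."""
    return sum(r.bit_length() for r in ranks)


ranks = [3, 1, 1]            # the MTF sequence from the slide
print(local_entropy(ranks))  # 4.0
print(binary_length(ranks))  # 4  (11, 1, 1)
```

In general r.bit_length() equals the ceiling of log2(r + 1), so under this assumption LE never exceeds the plain binary length, and the two coincide when every r + 1 is a power of two, as in the slide's example.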

12. The Local Entropy - Properties. We use two properties of LE: the entropy hierarchy and convexity.

13. The Local Entropy - Property 1. The entropy hierarchy: We prove: For each k: LE(BWT(s)) ≤ nHk(s) + O(1) → Any upper bound that we get for BWT with LE holds for Hk(s) as well.

14. The Local Entropy - Property 2. Convexity: → This means that a partition of a string s does not improve the Local Entropy of s.

15. Convexity. Cutting the input string into parts doesn't influence much: Only positions per part

16. Convexity - Why do we need it? Ferragina, Giancarlo, Manzini and Sciortino, JACM 2005:

17. Using LE and its properties we get our bounds Theorem: For every where

18. Our bounds We get an improvement of the known bounds: As opposed to the known bounds (Manzini, 1999):

19. Our Test Results

20. How is LE related to compression of integer sequences? We mentioned "dream world" but what about reality? How close can we come to ? Problem: Compress an integer sequence S close to its sum of logs: Notice for any s:

21. Compressing Integer Sequences. Universal Encodings of Integers: prefix-free encodings for integers (e.g., Fibonacci encoding, Elias encoding). Doing some math, it turns out that order-0 encoding is good. Not only good: it is best!
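As one concrete universal encoding, here is a minimal sketch of the Elias gamma code, a member of the Elias family mentioned above. The slides do not say which Elias code is meant, so the choice of gamma and the function names are assumptions of this example.

```python
def elias_gamma(x: int) -> str:
    """Elias gamma code: (bit length - 1) zeros, then x in binary."""
    if x < 1:
        raise ValueError("Elias gamma encodes positive integers only")
    b = bin(x)[2:]
    return "0" * (len(b) - 1) + b


def decode_gamma_stream(bits: str) -> list:
    """Decode a concatenation of gamma codewords; prefix-freeness makes this unambiguous."""
    out, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count leading zeros to learn the payload length
            zeros += 1
            i += 1
        out.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return out


stream = "".join(elias_gamma(x) for x in [4, 2, 2, 1])
print(stream)                       # 001000100101
print(decode_gamma_stream(stream))  # [4, 2, 2, 1]
```

The codeword for x takes 2*floor(log2 x) + 1 bits, roughly twice the log2 x that the sum of logs charges for it, which gives a feel for the gap between a universal code and the sum-of-logs target.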

22. The order-0 math. Theorem: For any string s of length n over the integer alphabet {1, 2, ..., h} and for any , Strange conclusion: we get an upper bound on the order-0 algorithm with a term dependent on the values of the integers. This is true for all strings but is especially interesting for strings with smaller integers.

23. A lower bound for SL. Theorem: For any algorithm A, for any μ > 1, and for any C such that C < log(ζ(μ)), there exists a string S of length n for which: |A(S)| > μ·SL(S) + C·n

24. Our Results - Summary. New improved bounds for BWMTF. Local Entropy (LE). New bounds for compression of integer strings.

25. Open Issues We question the effectiveness of . Is there a better statistic?

26. Any Questions?

29. Example

30. Example, cont.

31. Example, cont. Assign 0 to left branches, 1 to right branches. Each encoding is a path from the root.
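The labelling rule can be sketched on a Huffman tree, the order-0 code named earlier in the talk. The example slides themselves are not preserved in the transcript, so treating the tree as a Huffman tree, the input string, and the function name are all assumptions of this example.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(s: str) -> dict:
    """Build a Huffman tree, then read each codeword as a root-to-leaf path."""
    freq = Counter(s)
    if not freq:
        return {}
    tiebreak = count()  # unique counter so the heap never compares tree nodes
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}

    def walk(node, path):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], path + "0")    # 0 on the left branch
            walk(node[1], path + "1")    # 1 on the right branch
        else:
            codes[node] = path           # leaf: the path is the codeword

    walk(heap[0][2], "")
    return codes


print(huffman_codes("aaaabbc"))
```

Because every symbol sits at a leaf, no codeword is a prefix of another, and more frequent symbols end up closer to the root with shorter codewords.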

32. The Burrows-Wheeler Transform (1994)

33. Suffix Arrays and the BWT
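The standard connection between the two structures can be sketched as follows: once the suffixes of the sentinel-terminated string are sorted, each BWT character is the one that precedes the corresponding suffix. The naive suffix-array construction, the "$" sentinel, and the function names are assumptions of this example, not taken from the slide.

```python
def suffix_array(t: str) -> list:
    """Naive suffix array: starting positions of the suffixes of t in sorted order."""
    return sorted(range(len(t)), key=lambda i: t[i:])


def bwt_from_suffix_array(s: str, sentinel: str = "$") -> str:
    """BWT[i] is the character just before the i-th smallest suffix (wrapping around)."""
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    t = s + sentinel
    return "".join(t[i - 1] for i in suffix_array(t))


print(suffix_array("banana$"))          # [6, 5, 3, 1, 0, 4, 2]
print(bwt_from_suffix_array("banana"))  # annb$aa
```

With a unique smallest sentinel, sorting suffixes gives the same order as sorting rotations, so any efficient suffix-array construction can replace the naive sort here.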
