Computing Reversed Lempel-Ziv Factorization Online

Shiho Sugimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda. Kyushu University, Japan.

Presentation Transcript


  1. Computing Reversed Lempel-Ziv Factorization Online. Shiho Sugimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda. Kyushu University, Japan

  2. Outline
     • Reversed LZ factorization without self-references (RLZ)
       • Online RLZ algorithm by Kolpakov and Kucherov
       • New online RLZ algorithm using O(n log σ) bits of space
     • Reversed LZ factorization with self-references (RLZS)
       • New online RLZS algorithm using O(n log n) bits of space
       • New online RLZS algorithm using O(n log σ) bits of space
     n: the length of the input string; σ: the alphabet size

  3. Background
     • LZ factorization was proposed in 1977 [Ziv & Lempel, 1977].
       • Applications: data compression, etc.
     • Reversed LZ factorization (RLZ for short) was proposed in 2009 [Kolpakov & Kucherov, 2009].
       • Applications: finding gapped palindromes, etc.

  4. LZ factorization without self-references [Ziv & Lempel, 1977]
     The LZ factorization without self-references of a string w of length n is a factorization s_1, s_2, ..., s_m such that
     • w = s_1 s_2 … s_m,
     • s_i is the longest non-empty prefix of w[|s_1…s_{i−1}|+1..n] that is also a substring of w[1..|s_1…s_{i−1}|], if such a prefix exists, and
     • s_i = w[|s_1…s_{i−1}|+1] otherwise.

  11. LZ factorization without self-references [Ziv & Lempel, 1977]
      Ex) w = abbaaaabbbac has nine factors s_1, ..., s_9: a | b | b | a | a | aa | bb | ba | c
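
The definition can be checked directly with a brute-force sketch. This is only an illustration of the definition, not an efficient algorithm; the function name `lz_factorize` is an assumption made for this example.

```python
def lz_factorize(w: str) -> list[str]:
    """Brute-force LZ factorization without self-references."""
    factors = []
    k = 0  # number of characters already covered by earlier factors
    while k < len(w):
        length = 0
        # Grow the candidate while it still occurs inside the prefix w[:k].
        while k + length < len(w) and w[k:k + length + 1] in w[:k]:
            length += 1
        length = max(length, 1)  # no match: the factor is one fresh character
        factors.append(w[k:k + length])
        k += length
    return factors


print(lz_factorize("abbaaaabbbac"))
# ['a', 'b', 'b', 'a', 'a', 'aa', 'bb', 'ba', 'c']
```

Growing the candidate one character at a time is valid here because every prefix of a substring of w[:k] is itself a substring of w[:k].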

  12. Reversed LZ factorization without self-references (RLZ) [Kolpakov & Kucherov, 2009]
      The RLZ without self-references of a string w of length n is a factorization f_1, f_2, ..., f_m such that
      • w = f_1 f_2 … f_m,
      • f_i is the longest non-empty prefix of w[|f_1…f_{i−1}|+1..n] that is also a substring of (w[1..|f_1…f_{i−1}|])^R, if such a prefix exists, and
      • f_i = w[|f_1…f_{i−1}|+1] otherwise.
      Here x^R denotes the reversed string of x.

  17. Reversed LZ factorization without self-references (RLZ) [Kolpakov & Kucherov, 2009]
      Ex) w = abbaaaabbbac has seven factors f_1, ..., f_7: a | b | ba | a | aabb | ba | c
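
A brute-force sketch of the RLZ definition differs from plain LZ only in that the candidate is searched in the reversed prefix. The name `rlz_factorize` is an assumption made for this example, and the code is meant to illustrate the definition rather than any of the algorithms discussed in the talk.

```python
def rlz_factorize(w: str) -> list[str]:
    """Brute-force reversed LZ factorization without self-references."""
    factors = []
    k = 0  # number of characters already covered by earlier factors
    while k < len(w):
        reversed_prefix = w[:k][::-1]  # (w[1..k])^R in the slide notation
        length = 0
        # Grow the candidate while it still occurs in the reversed prefix.
        while k + length < len(w) and w[k:k + length + 1] in reversed_prefix:
            length += 1
        length = max(length, 1)  # no match: the factor is one fresh character
        factors.append(w[k:k + length])
        k += length
    return factors


print(rlz_factorize("abbaaaabbbac"))
# ['a', 'b', 'ba', 'a', 'aabb', 'ba', 'c']
```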

  18. KK algorithm [Kolpakov & Kucherov, 2009]
      • Computes RLZ in an online manner.
      • Works in O(n log n) bits of space and O(n log σ) time (on the word RAM model).
      • Constructs the suffix tree of the reversed prefixes online.
      • Computes the RLZ factors from the suffix tree.
      • Blumer et al.'s version of Weiner's algorithm achieves the above complexity [Blumer et al., 1985] [Weiner, 1973].
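
The online flavor of this procedure can be mimicked with a small generator that reads one character at a time and emits each factor as soon as it is determined. This sketch deliberately swaps the suffix tree for Python's built-in substring search, so it does not achieve the stated time or space bounds; the name `online_rlz` is an assumption made for this example.

```python
from collections.abc import Iterable, Iterator


def online_rlz(stream: Iterable[str]) -> Iterator[str]:
    """Emit RLZ factors (without self-references) while reading characters."""
    text = ""   # characters read so far
    start = 0   # start index of the factor currently being extended
    for c in stream:
        text += c
        candidate = text[start:]
        # Stand-in for a suffix tree query on the reversed prefix.
        if candidate in text[:start][::-1]:
            continue  # still a match: wait for more input
        if len(candidate) > 1:
            # The new character broke the match: emit the factor without it.
            yield candidate[:-1]
            start = len(text) - 1
            # A single character matches iff it occurred before
            # (reversal does not matter for one character).
            if c in text[:start]:
                continue
        # c never occurred before: it forms a factor of length 1.
        yield c
        start = len(text)
    if start < len(text):
        yield text[start:]  # flush the last, still-matching factor


print(list(online_rlz("abbaaaabbbac")))
# ['a', 'b', 'ba', 'a', 'aabb', 'ba', 'c']
```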

  19. KK algorithm [Kolpakov & Kucherov, 2009] Ex) w = abbaaaabbbac. Factors marked: f_1. Tree shown: Stree(ε). [Figure: suffix tree]

  20. KK algorithm [Kolpakov & Kucherov, 2009] Ex) w = abbaaaabbbac. Factors marked: f_1, f_2. Tree shown: Stree(a^R). [Figure: suffix tree]

  21. KK algorithm [Kolpakov & Kucherov, 2009] Ex) w = abbaaaabbbac. Factors marked: f_1, f_2. Tree shown: Stree((ab)^R). [Figure: suffix tree]

  22. KK algorithm [Kolpakov & Kucherov, 2009] Ex) w = abbaaaabbbac. Factors marked: f_1, f_2, f_3. Tree shown: Stree((ab)^R). [Figure: suffix tree]

  23. KK algorithm [Kolpakov & Kucherov, 2009] Ex) w = abbaaaabbbac. Factors marked: f_1, ..., f_4. Tree shown: Stree((abba)^R). [Figure: suffix tree]

  24. KK algorithm [Kolpakov & Kucherov, 2009] Ex) w = abbaaaabbbac. Factors marked: f_1, ..., f_5. Tree shown: Stree((abbaa)^R). [Figure: suffix tree]

  25. KK algorithm [Kolpakov & Kucherov, 2009] Ex) w = abbaaaabbbac. Factors marked: f_1, ..., f_5. Tree shown: Stree((abbaa)^R). [Figure: suffix tree]
      This suffix tree requires O(n log n) bits of space.
      We propose a new online RLZ algorithm which uses only O(n log σ) bits of space (σ ≤ n is the alphabet size).

  26. For O(n log σ) bits of space
      • We utilize the idea of Starikovskaya's algorithm.
        • It computes the LZ factorization online in O(n log σ) bits of space and O(n log^2 n) time [Starikovskaya, 2012].
      • We divide the input string into blocks of length r = O(log_σ n).
      • Each block is replaced by a meta-character.

  28. For O(n log σ) bits of space
      Ex) w = abbaaaabbbac…, r = 3: the blocks abb | aaa | abb | bac | … are replaced by the meta-characters B A B C …
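
The block decomposition in this example can be sketched as follows. Naming the meta-characters by the lexicographic rank of the distinct blocks is an assumption made here because it reproduces the "B A B C" labels; the talk does not say how meta-characters are assigned, and an online algorithm could not sort blocks it has not yet read. The function name `to_meta_characters` is likewise hypothetical.

```python
from string import ascii_uppercase


def to_meta_characters(w: str, r: int) -> tuple[list[str], str]:
    """Split w into full blocks of length r and label each distinct block."""
    # A trailing partial block (fewer than r characters) is dropped.
    blocks = [w[i:i + r] for i in range(0, len(w) - len(w) % r, r)]
    # Assumed labeling: rank of the block among the distinct blocks.
    # Works for at most 26 distinct blocks in this toy version.
    rank = {block: i for i, block in enumerate(sorted(set(blocks)))}
    return blocks, "".join(ascii_uppercase[rank[block]] for block in blocks)


blocks, meta = to_meta_characters("abbaaaabbbac", 3)
print(blocks)  # ['abb', 'aaa', 'abb', 'bac']
print(meta)    # BABC
```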

  29. Our online RLZ algorithm
      • For f_i of length shorter than r, we use a suffix trie of the reversed subwords of length 2r.
        • We can find f_i in o(n) bits of space and O(|f_i| log σ) time.
      • For f_i of length at least r, we use a suffix tree of the reversed blocks (meta-characters).
        • We can find f_i in O(n log σ) bits of space and O(|f_i| log^2 n) time.

  30. Our online RLZ algorithm
      Theorem: We can compute RLZ without self-references online in O(n log σ) bits of space and O(n log^2 n) time.
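
The O(n log σ) space bound for the suffix tree of blocks can be traced to a short calculation. It assumes r = Θ(log_σ n) (the slides only state r = O(log_σ n)) and a pointer-based suffix tree whose nodes each store a constant number of O(log n)-bit pointers; both are assumptions of this sketch rather than statements from the talk. The string of meta-characters has n/r symbols, so the tree has O(n/r) nodes:

```latex
\[
O\!\left(\frac{n}{r}\right) \cdot O(\log n)
  = O\!\left(\frac{n \log n}{\log_\sigma n}\right)
  = O\!\left(\frac{n \log n}{\log n / \log \sigma}\right)
  = O(n \log \sigma) \ \text{bits}.
\]
```

This matches the n⌈log σ⌉ bits needed to store the input string itself.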

  32. LZ factorization with self-references [Ziv & Lempel, 1977]
      The LZ factorization with self-references of a string w of length n is a factorization t_1, t_2, ..., t_m such that
      • w = t_1 t_2 … t_m,
      • t_i is the longest non-empty prefix of w[|t_1…t_{i−1}|+1..n] that is also a substring of w[1..|t_1…t_i|−1], if such a prefix exists, and
      • t_i = w[|t_1…t_{i−1}|+1] otherwise.
      The range w[1..|t_1…t_i|−1] overlaps t_i itself, which is what allows self-references.

  35. LZ factorization with self-references [Ziv & Lempel, 1977]
      Ex) w = abbaaaabbbac has eight factors t_1, ..., t_8: a | b | b | a | aaa | bb | ba | c
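
With self-references, a candidate of length L only has to occur inside w[1..k+L−1], where k is the length of the part already factorized, so its earlier occurrence may overlap the factor itself. A brute-force sketch under that reading (the name `lz_self_factorize` is an assumption made for this example):

```python
def lz_self_factorize(w: str) -> list[str]:
    """Brute-force LZ factorization with self-references."""
    factors = []
    k = 0  # number of characters already covered by earlier factors
    while k < len(w):
        length = 0
        # A candidate of length `length + 1` must occur in w[:k + length],
        # i.e. in everything up to (but excluding) its own last character.
        while k + length < len(w) and w[k:k + length + 1] in w[:k + length]:
            length += 1
        length = max(length, 1)  # no match: the factor is one fresh character
        factors.append(w[k:k + length])
        k += length
    return factors


print(lz_self_factorize("abbaaaabbbac"))
# ['a', 'b', 'b', 'a', 'aaa', 'bb', 'ba', 'c']
```

The factor aaa is the self-referencing one: it matches an occurrence that starts one position earlier and overlaps it.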

  36. Reversed LZ factorization with self-references
      The RLZ with self-references (RLZS) of a string w of length n is a factorization g_1, g_2, ..., g_m such that
      • w = g_1 g_2 … g_m,
      • g_i is the longest non-empty prefix of w[|g_1…g_{i−1}|+1..n] that is also a substring of (w[1..|g_1…g_i|−1])^R, if such a prefix exists, and
      • g_i = w[|g_1…g_{i−1}|+1] otherwise.
      The range w[1..|g_1…g_i|−1] overlaps g_i itself, which is what allows self-references.

  39. Reversed LZ factorization with self-references
      Ex) w = abbaaaabbbac has five factors g_1, ..., g_5: a | bba | aaabb | ba | c
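
A brute-force sketch of the RLZS definition has to be more careful than the previous variants: a candidate can fail at a short length and succeed at a longer one (for the second factor of the example, b and bb do not match but bba does), so every length is tried and the longest match is kept. The name `rlzs_factorize` is an assumption made for this example.

```python
def rlzs_factorize(w: str) -> list[str]:
    """Brute-force reversed LZ factorization with self-references (RLZS)."""
    n = len(w)
    factors = []
    k = 0  # number of characters already covered by earlier factors
    while k < n:
        best = 0
        for length in range(1, n - k + 1):
            # The candidate must occur in the reversal of w[:k + length - 1],
            # which overlaps all but the last character of the candidate.
            if w[k:k + length] in w[:k + length - 1][::-1]:
                best = length  # keep the longest length that matches
        best = max(best, 1)  # no match: the factor is one fresh character
        factors.append(w[k:k + best])
        k += best
    return factors


print(rlzs_factorize("abbaaaabbbac"))
# ['a', 'bba', 'aaabb', 'ba', 'c']
```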

  45. Online computation of RLZS
      Ex) w = abbaaaabbbac, read one character at a time:
      w[1..1] = a
      w[1..2] = ab
      w[1..3] = abb
      w[1..4] = abba
      w[1..5] = abbaa
      w[1..6] = abbaaa
      w[1..7] = abbaaaa
      w[1..8] = abbaaaab
      w[1..9] = abbaaaabb
      w[1..10] = abbaaaabbb
      w[1..11] = abbaaaabbba
      w[1..12] = abbaaaabbbac

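  46. Reversed LZ factorization with self-references
      Every self-referencing factor is a suffix of a palindrome.
      Ex) w = abbaaaabbbac: the factor bba is a suffix of the palindrome abba = w[1..4], and the factor aaabb is a suffix of the palindrome bbaaaabb = w[2..9]. [Figure: factors g_1, ..., g_5 with a palindrome highlighted]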

  48. Online RLZS in O(n log n) bits of space
      We can compute each RLZS factor g_i by
      • using the KK algorithm, in a total of O(n log n) bits of space and O(n log σ) time, and
      • computing the longest palindrome that ends at each position online, in a total of O(n log n) bits of space and O(n) time, by modifying Manacher's algorithm [Manacher, 1975].
      Theorem: We can compute RLZS online in O(n log n) bits of space and O(n log σ) time.
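
The quantity used in the second step, the longest palindrome ending at each position, can be illustrated with a naive quadratic-or-worse sketch. This is not Manacher's algorithm and does not reach the O(n) bound; it only shows what is being computed. The name `longest_suffix_palindromes` is an assumption made for this example.

```python
def longest_suffix_palindromes(w: str) -> list[int]:
    """For each prefix w[:end], the length of its longest palindromic suffix."""
    result = []
    for end in range(1, len(w) + 1):
        # Try the longest suffix first; the first palindrome found wins.
        for length in range(end, 0, -1):
            suffix = w[end - length:end]
            if suffix == suffix[::-1]:
                result.append(length)
                break
    return result


print(longest_suffix_palindromes("abbaaaabbbac"))
# [1, 1, 2, 4, 2, 3, 4, 6, 8, 3, 5, 1]
```

In this output, the values 4 and 8 at positions 4 and 9 correspond to the palindromes abba and bbaaaabb.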

  50. Suffix palindromes
      • The lengths of all suffix palindromes of a string of length n can be represented by O(log n) arithmetic progressions [Apostolico, 1995].
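
To make this representation concrete, the sketch below lists the suffix palindrome lengths of a string by brute force and greedily groups them into runs with a common difference, stored as (first length, difference, count). The greedy grouping is an assumption made for illustration; it shows the compact form but does not by itself prove the O(log n) bound. Both function names are hypothetical.

```python
def suffix_palindrome_lengths(w: str) -> list[int]:
    """Lengths of all palindromic suffixes of w, in increasing order."""
    return [l for l in range(1, len(w) + 1) if w[-l:] == w[-l:][::-1]]


def group_into_progressions(lengths: list[int]) -> list[tuple[int, int, int]]:
    """Greedily group sorted lengths into (first, difference, count) runs."""
    groups = []
    i = 0
    while i < len(lengths):
        if i + 1 == len(lengths):
            groups.append((lengths[i], 0, 1))  # a single leftover length
            break
        diff = lengths[i + 1] - lengths[i]
        count = 2
        # Extend the run while consecutive gaps equal the first gap.
        while (i + count < len(lengths)
               and lengths[i + count] - lengths[i + count - 1] == diff):
            count += 1
        groups.append((lengths[i], diff, count))
        i += count
    return groups


print(group_into_progressions(suffix_palindrome_lengths("aaaa")))
# [(1, 1, 4)]
print(group_into_progressions(suffix_palindrome_lengths("abbaaaabb")))
# [(1, 1, 2), (8, 0, 1)]
```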
