An Optimal Algorithm for Online Square Detection


Presentation Transcript


  1. An Optimal Algorithm for Online Square Detection. Gen-Huey Chen, Jin-Ju Hong, Hsueh-I Lu, National Taiwan University. CPM 2005

  2. Outline • The definitions of the square detection problem and the online square detection problem • The techniques of the algorithm in [Cro86] for the square detection problem • Our algorithm for the online square detection problem • Conclusion

  3. Square Detection Problem • Square: a nonempty string of the form XX • E.g., "a b c a b c" is a square; "a b c a b c a" is not a square. • Input: a string S • Square detection problem: Is there a square in S?
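
To make the definition concrete, here is a minimal brute-force sketch. The function names `is_square` and `contains_square` are illustrative and do not come from the slides; the quadratic search is only a baseline, not the [Cro86] algorithm.

```python
def is_square(s: str) -> bool:
    """Return True if s is a nonempty string of the form XX."""
    half, rem = divmod(len(s), 2)
    return len(s) > 0 and rem == 0 and s[:half] == s[half:]


def contains_square(s: str) -> bool:
    """Brute force: test every even-length substring of s."""
    n = len(s)
    return any(
        is_square(s[start:start + length])
        for length in range(2, n + 1, 2)
        for start in range(n - length + 1)
    )
```

Note that "abcabca" is not itself a square, yet it contains one ("abcabc"), so the two functions answer different questions.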

  4. Online Square Detection Problem • Leung, Peng, and Ting, COCOON'04 [LPT04] • Input: a string S • Let m be the unknown smallest integer such that S[1..m] contains a square. • Online square detection problem: Determine m as soon as S[m] is read. • An O(m log^2 m)-time algorithm [LPT04] • An O(m log β)-time algorithm in our paper, where β is the number of distinct characters in S[1..m]

  5. Algorithm in [Cro86] for Square Detection Problem • S is partitioned into blocks B_1 B_2 B_3 B_4 ... B_p.
       for k = 1 to p    // p: number of blocks
       {
         if a square ends in B_k then return YES;
       }
       return NO;

  6. f-factorization • Let d_k denote the starting position of the k-th block B_k. • B_k is S[d_k] if S[d_k] does not occur before d_k, or otherwise the longest prefix of S[d_k..n] that occurs before d_k. • E.g., S = a a a b b a b a b a a … (positions 1 to 11 shown) is factored into blocks B_1 B_2 B_3 B_4 B_5 B_6 …
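
The block rule above can be sketched naively as follows. This is an illustration only: the name `f_factorization` is hypothetical, the code takes quadratic or worse time, and it assumes that "occurs before d_k" means an earlier occurrence that starts before d_k and may overlap the block itself. The slide does not settle that point.

```python
def f_factorization(s: str) -> list[str]:
    """Naive f-factorization of s into blocks (0-based indices internally).

    Assumption: an earlier occurrence must START before the block start d,
    but is allowed to overlap the block.
    """
    blocks = []
    n = len(s)
    d = 0  # start of the next block
    while d < n:
        length = 0
        # Extend the block while s[d:d+length+1] has an occurrence that
        # starts at some position < d (so it lies inside s[0:d+length]).
        while d + length < n and s.find(s[d:d + length + 1], 0, d + length) != -1:
            length += 1
        length = max(length, 1)  # a new character forms a block by itself
        blocks.append(s[d:d + length])
        d += length
    return blocks
```

For example, under this assumption "abab" splits into the blocks "a", "b", "ab".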

  7. f-factorization (cont.) • A square ending in B_k is centered either in B_{k-1} or in B_k.

  8. Square Ending in the k-th Block • Case 1. The square is entirely in the k-th block. • Case 2. The square begins in the (k-1)-st block. • Case 2.1. The square is centered in the (k-1)-st block. • Case 2.2. The square is centered in the k-th block. • Case 3. The square begins before the (k-1)-st block and is centered in the (k-1)-st or k-th block.

  9. Our Algorithm for Online Square Detection Problem
       for i = 1 to n    // n = |S|
       {
         compute the f-factorization of S[1..i];
         if a square ends at S[i] then return i;
       }
       return NO-SQUARE;
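
The loop above can be mimicked with a brute-force "square ends at S[i]" test. This sketch skips the f-factorization entirely and runs in quadratic or worse time, so it only illustrates the expected output of the online problem, not the slides' method. The name `first_square_end` is an assumption.

```python
def first_square_end(s: str):
    """Return the smallest 1-based m such that s[:m] contains a square, else None."""
    for i in range(1, len(s) + 1):      # i is the position just read
        # A square ending at position i has half-length h with 2*h <= i.
        for h in range(1, i // 2 + 1):
            if s[i - 2 * h:i - h] == s[i - h:i]:
                return i                # report as soon as S[i] is read
    return None                         # corresponds to NO-SQUARE
```

Because positions are scanned left to right, the first hit is the smallest m.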

  10. Square Ending at S[i] in B_k • Case 1. The square is entirely in the k-th block. • Case 2. The square begins in the (k-1)-st block. • Case 2.1. The square is centered in the (k-1)-st block. • Case 2.2. The square is centered in the k-th block. • Case 3. The square begins before the (k-1)-st block and is centered in the (k-1)-st or k-th block.

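  11. L(i_1, i_2, i)-square: a square in S that begins at j, is centered at c, and ends at i, with i_1 ≤ j < i_2 and i_1 ≤ c < i_2 • R(i_1, i_2, i)-square: a square in S that begins at j, is centered at c, and ends at i, with i_1 ≤ j < i_2 and i_2 ≤ c < i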

  12. Square Ending at S[i] in B_k • Case 1. The square is entirely in the k-th block. • Case 2. The square begins in the (k-1)-st block. • Case 2.1. The square is centered in the (k-1)-st block: an L(d_{k-1}, d_k, i)-square. • Case 2.2. The square is centered in the k-th block: an R(d_{k-1}, d_k, i)-square. • Case 3. The square begins before the (k-1)-st block and is centered in the (k-1)-st or k-th block: an R(1, d_{k-1}, i)-square.

  13. Our Algorithm for Online Square Detection Problem
       for i = 1 to n    // n = |S|
       {
         compute the f-factorization of S[1..i];
         let S[i] belong to B_k;
         if an L(d_{k-1}, d_k, i)-square is detected then return i;
         if an R(d_{k-1}, d_k, i)-square is detected then return i;
         if an R(1, d_{k-1}, i)-square is detected then return i;
       }
       return NO-SQUARE;
      Each iteration: amortized O(log β) time.

  14. Longest Common Extensions • For positions i_1 ≤ i_2 ≤ i_3 in S • XR(i_1, i_2, i_3): longest common right extension of positions i_1 and i_2 with boundary i_3 • XL(i_2, i_3, i_1): longest common left extension of positions i_2 and i_3 with boundary i_1 • E.g., for S = a b a b b a b a b a (positions 1 to 10): XR(3, 8, 10) = 2 and XL(4, 9, 2) = 3
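
A direct character-by-character sketch of the two extension functions may help in reading the later slides. The names `xr` and `xl` and the 1-based calling convention are chosen here to mirror the slide notation. Each call takes time linear in its result, so this only illustrates the definitions and has none of the constant-time machinery used later.

```python
def xr(s: str, i1: int, i2: int, i3: int) -> int:
    """Longest common right extension of 1-based positions i1 <= i2,
    never reading past the boundary i3."""
    t = 0
    while i2 + t <= i3 and s[i1 + t - 1] == s[i2 + t - 1]:
        t += 1
    return t


def xl(s: str, i2: int, i3: int, i1: int) -> int:
    """Longest common left extension of 1-based positions i2 <= i3,
    never reading before the boundary i1."""
    t = 0
    while i2 - t >= i1 and s[i2 - t - 1] == s[i3 - t - 1]:
        t += 1
    return t
```

With S = "ababbababa", `xr(S, 3, 8, 10)` and `xl(S, 4, 9, 2)` reproduce the two values from the slide.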

  15. Head Extension Function: XR(1, j, i) • If the string S is read character by character, then in the i-th iteration, for all j ≤ i, XR(1, j, i) can be computed in O(1) time after O(i) total preprocessing time. • We call XR(1, j, i) the head extension function. • E.g., for S = a b a b b a b a b a:
       j            1  2  3  4  5  6  7  8  9  10
       S[j]         a  b  a  b  b  a  b  a  b  a
       XR(1,j,10)   10 0  2  0  0  4  0  3  0  1
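
For a fixed boundary i = |S|, the head extension values coincide with the classical Z-array (with the first entry set to |S|). The sketch below uses the standard offline Z-algorithm, which is a swapped-in technique: the slide claims an online, character-by-character variant whose details are not given here. The name `head_extension` is an assumption.

```python
def head_extension(s: str) -> list[int]:
    """Return [XR(1, j, n) for j = 1..n] using the offline Z-algorithm."""
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n
    left = right = 0  # s[left:right] is the rightmost segment matching a prefix
    for j in range(1, n):
        if j < right:
            # Reuse the value computed for the mirrored position.
            z[j] = min(right - j, z[j - left])
        while j + z[j] < n and s[z[j]] == s[j + z[j]]:
            z[j] += 1
        if j + z[j] > right:
            left, right = j, j + z[j]
    return z
```

Calling `head_extension("ababbababa")` gives `[10, 0, 2, 0, 0, 4, 0, 3, 0, 1]`, matching the table on the slide.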

  16. L(i_1, i_2, i)-square (figure: S containing a square Y Z Y Z that ends at position i, with positions i_1, j, i_2, i marked from left to right)

  17. L(i_1, i_2, i)-square • [ML84] S has an L(i_1, i_2, i)-square if and only if there is an index j with i_1 ≤ j < i_2 such that XR(j, i_2, i) = |S[i_2..i]| and XL(j-1, i_2-1, i_1) + XR(j, i_2, i) ≥ |S[j..i_2-1]|. • When S[1..i-1] contains no square, the inequality holds with equality.
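
The [ML84] criterion can be checked literally with naive extension functions. This is a sketch under assumptions: `xr`, `xl`, and `has_l_square` are illustrative names, positions are 1-based as on the slides, and the loop over j costs linear time per candidate rather than the bounds claimed in the talk.

```python
def xr(s, i1, i2, i3):
    """Naive longest common right extension (1-based, boundary i3)."""
    t = 0
    while i2 + t <= i3 and s[i1 + t - 1] == s[i2 + t - 1]:
        t += 1
    return t


def xl(s, i2, i3, i1):
    """Naive longest common left extension (1-based, boundary i1)."""
    t = 0
    while i2 - t >= i1 and s[i2 - t - 1] == s[i3 - t - 1]:
        t += 1
    return t


def has_l_square(s, i1, i2, i):
    """Test the L(i1, i2, i) criterion for every j with i1 <= j < i2."""
    tail = i - i2 + 1                      # |S[i2..i]|
    for j in range(i1, i2):
        right = xr(s, j, i2, i)
        if right == tail and xl(s, j - 1, i2 - 1, i1) + right >= i2 - j:
            return True
    return False
```

For S = "abcabc" with i_1 = 1, i_2 = 5, i = 6, the index j = 2 satisfies both conditions (Y = "a", Z = "bc"); replacing the last character with "d" makes the test fail.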

  18. Detecting L(d_{k-1}, d_k, i)-squares • Let z(j) = |S[j..d_k - 1]| - XL(j-1, d_k - 1, d_{k-1}) for all j in B_{k-1}. • In the i-th iteration: is there an index j in B_{k-1} such that XR(j, d_k, i) = z(j)? (figure: Y Z Y in S followed by a candidate second Z; positions d_{k-1}, j, d_k, i and the length z(j) marked)

  19. In the d_k-th iteration (preprocessing) • Compute z(j) for all j in B_{k-1}. • Build the suffix tree of B_{k-1}$. • For all nodes u, compute min{ z(j) | j corresponds to a leaf in u's subtree }. • Preprocessing takes O(|B_{k-1}| log β) time.

  20. In the i-th iteration • Let u be the suffix tree node reached by S[d_k..i]. If |S[d_k..i]| equals the value stored in u, then a square ends at position i.

  21. R(i_1, i_2, i)-square (figure: S containing a square Y Z Y Z that ends at position i, with positions i_1, i_2, j, i marked from left to right)

  22. R(i_1, i_2, i)-square • [ML84] S has an R(i_1, i_2, i)-square if and only if there is an index j with i_2 < j < i such that XR(i_2, j+1, i) = |S[j+1..i]| and XL(i_2-1, j, i_1) + XR(i_2, j+1, i) ≥ |S[i_2..j]|. • When S[1..i-1] contains no square, the inequality holds with equality.
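
The R-square criterion can be checked the same way with naive extension functions. This sketch assumes that both conditions use XR(i_2, j+1, i), by analogy with the L-square criterion and with the test XR(d_k, j+1, i) = z(j) used later in the talk. The names `xr`, `xl`, and `has_r_square` are illustrative, and positions are 1-based.

```python
def xr(s, i1, i2, i3):
    """Naive longest common right extension (1-based, boundary i3)."""
    t = 0
    while i2 + t <= i3 and s[i1 + t - 1] == s[i2 + t - 1]:
        t += 1
    return t


def xl(s, i2, i3, i1):
    """Naive longest common left extension (1-based, boundary i1)."""
    t = 0
    while i2 - t >= i1 and s[i2 - t - 1] == s[i3 - t - 1]:
        t += 1
    return t


def has_r_square(s, i1, i2, i):
    """Test the R(i1, i2, i) criterion for every j with i2 < j < i.

    Assumption: XR(i2, j+1, i) is used in both conditions.
    """
    for j in range(i2 + 1, i):
        right = xr(s, i2, j + 1, i)        # must cover all of S[j+1..i]
        if right == i - j and xl(s, i2 - 1, j, i1) + right >= j - i2 + 1:
            return True
    return False
```

For S = "abcabc" with i_1 = 1, i_2 = 2, i = 6, the index j = 4 passes (Y = "a", Z = "bc"), while "abcabd" yields no such j.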

  23. Detecting R(d_{k-1}, d_k, i)-square • Let z(j) = |S[d_k..j]| - XL(d_k - 1, j, d_{k-1}) for all j in B_k. • Insert the position j into the set indexed by j + z(j). • In the i-th iteration, for all j in the set indexed by i: is XR(d_k, j+1, i) = z(j)? • Amortized O(log β) time. (figure: Y Z Y in S followed by a candidate second Z; positions d_{k-1}, d_k, j, i and the length z(j) marked)

  24. Computing XL(d_k - 1, j, d_{k-1}) • Let g be the position with |S[g..d_k - 1]| = min( |S[d_{k-1}..d_k - 1]|, |S[d_k..j]| ). • For all v with g ≤ v < d_k, XL(v, d_k - 1, g) can be computed in O(1) time using the technique for computing the head extension function.

  25. Computing XL(d_k - 1, j, d_{k-1}) (cont.) • Let F(j) denote the longest suffix of S[d_k..j] that is also a substring of S[g..d_k - 1]. • XL(d_k - 1, j, d_{k-1}) = |F(j)| if y = d_k - 1, and min( |F(j)|, XL(y, d_k - 1, g) ) otherwise. (figure: S with Y Z Y and an occurrence of F(j); positions g, d_{k-1}, y, d_k, j, i marked)

  26. Time Complexity
       for i = 1 to n    // n = |S|
       {
         compute the f-factorization of S[1..i];
         let S[i] belong to B_k;
         if an L(d_{k-1}, d_k, i)-square is detected then return i;
         if an R(d_{k-1}, d_k, i)-square is detected then return i;
         if an R(1, d_{k-1}, i)-square is detected then return i;
       }
       return NO-SQUARE;
      Each iteration: amortized O(log β) time.

  27. Conclusion • Each of the O(log β) terms comes from traversing a suffix tree of a string with O(β) distinct characters. • Expected time: O(m) • Is it possible to reduce the running time to worst-case O(m) for a general alphabet?