String Matching of Bit Parallel Suffix Automata


shelagh

Presentation Transcript


  1. String Matching of Bit Parallel Suffix Automata

  2. Suffix Automata • Based on a Deterministic Acyclic Word Graph (DAWG) • Facilitates comparing equivalent suffix strings • Nondeterministic suffix automata • Deterministic suffix automata, obtained by subset construction
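
The subset construction mentioned on this slide can be sketched as follows. This is a minimal illustration, not the presenter's code: the function name `suffix_dfa` and the numbering of nondeterministic states as 0..m (state i meaning "i pattern characters consumed") are assumptions made for the example.

```python
def suffix_dfa(pattern):
    """Determinize the nondeterministic suffix automaton of `pattern`.

    Assumed NFA: states 0..m, every state is initial (a suffix may start
    anywhere), and state i moves to i + 1 on character pattern[i].
    Each DFA state is a frozenset of NFA states.
    """
    m = len(pattern)
    start = frozenset(range(m + 1))
    transitions = {}
    seen = {start}
    todo = [start]
    while todo:
        state = todo.pop()
        for c in set(pattern):
            target = frozenset(i + 1 for i in state
                               if i < m and pattern[i] == c)
            if not target:
                continue  # no transition: state + c is not a factor
            transitions[(state, c)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return start, transitions, seen


start, transitions, states = suffix_dfa("baabbaa")
print(sorted(transitions[(start, "a")]))  # NFA states reached after reading "a"
print(len(states))                        # number of reachable DFA states
```

A DFA state is accepting (the string read is a suffix of the pattern) when it contains the NFA state m.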

  3. Suffix Automata Search • Also called Backward DAWG Matching (BDM) • x is a factor (substring) of the pattern p • endpos(x) is the set of all pattern positions where an occurrence of x ends • Ex: Pattern = baabbaa, endpos(aa) = {3,7} • Search the text left to right with a window whose size equals the pattern length • On failing to match a factor, shift the window • The shift is safe if there is no equivalent suffix in the pattern
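
The endpos definition above can be checked directly with a brute-force helper. This is only a sketch for illustration; the function name `endpos` mirrors the slide's notation, and positions are 1-based as in the slide's example.

```python
def endpos(pattern, x):
    """Return the 1-based positions in `pattern` where an occurrence of `x` ends."""
    return {i + len(x)
            for i in range(len(pattern) - len(x) + 1)
            if pattern.startswith(x, i)}


print(sorted(endpos("baabbaa", "aa")))   # the slide's example
print(sorted(endpos("baabbaa", "baa")))  # same set, so "aa" and "baa" are equivalent
```

Two factors with the same endpos set are equivalent and share one state in the deterministic suffix automaton.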

  4. BDM Algorithm • Build automata • Reached the final state

  5. Suffix Automata Search Example 1. Build the reverse deterministic suffix automaton 2. Use endpos(x) to find a factor 3. On failing to find a factor, do a safe shift

  6. Suffix Automata Search Example 1. T = [abbaba a]bbaab • a is a factor of p^r and a reverse prefix of p. last = 6 • (Figure: deterministic suffix automaton of p^r.)

  7. Suffix Automata Search Example 2. T = [abbab aa]bbaab • aa is a factor of p^r and a reverse prefix of p. last = 5

  8. Suffix Automata Search Example 3. T = [abba baa]bbaab • aab is a factor of p^r

  9. Suffix Automata Search Example 4. T = [abb abaa]bbaab • We fail to recognize the next a, so we shift the window to last. We search again at position T = abbab[aabbaab]. last = 7

  10. Suffix Automata Search Example 5. T = abbab[aabbaa b] • b is a factor of p^r

  11. Suffix Automata Search Example 6. T = abbab[aabba ab] • ba is a factor of p^r

  12. Suffix Automata Search Example 7. T = abbab[aabb aab] • baa is a factor of p^r and a reverse prefix of p. last = 4

  13. Suffix Automata Search Example 8. T = abbab[aab baab] • baab is a factor of p^r

  14. Suffix Automata Search Example 9. T = abbab[aa bbaab] • baabb is a factor of p^r

  15. Suffix Automata Search Example 10. T = abbab[a abbaab] • baabba is a factor of p^r

  16. Suffix Automata Search Example 11. T = abbab[aabbaab] • We recognize the word aabbaab and report an occurrence.
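
The walkthrough above (p = aabbaab, T = abbabaabbaab) can be reproduced with a short BDM-style scan. This is a sketch under a stated simplification: instead of the deterministic suffix automaton, it uses a plain set of all factors of p^r as the factor test, which gives the same shifts but is not how a real BDM implementation stores the automaton. The name `bdm_search` is hypothetical.

```python
def bdm_search(pattern, text):
    """Return 0-based start positions of `pattern` in `text`, scanning BDM-style."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    rev = pattern[::-1]
    # Stand-in for the suffix automaton of p^r: every factor of p^r.
    factors = {rev[i:j] for i in range(m) for j in range(i + 1, m + 1)}
    # Reversed prefixes of p, i.e. the suffixes of p^r (final states).
    rev_prefixes = {pattern[:k][::-1] for k in range(1, m + 1)}
    hits = []
    pos = 0
    while pos <= n - m:
        j = m       # window characters not yet read
        last = m    # safe shift if nothing better is found
        read = ""   # characters read so far, in right-to-left reading order
        while j > 0 and read + text[pos + j - 1] in factors:
            read += text[pos + j - 1]
            j -= 1
            if read in rev_prefixes:
                if j > 0:
                    last = j          # a prefix of p starts at window offset j
                else:
                    hits.append(pos)  # the whole window was recognized
        pos += last
    return hits


print(bdm_search("aabbaab", "abbabaabbaab"))
```

On this input, `last` takes the values 6, 5 in the first window and 4 in the second, matching the slides.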

  17. BNDM Algorithm • Backward Nondeterministic DAWG Matching (BNDM) • Handles character classes, multiple patterns, and errors • Uses bit parallelism, combining Shift-Or and BDM • 20% to 25% faster than BDM, 10% to 40% faster than BM • Update Function
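
Since the update function itself did not survive in the transcript, the following is a minimal sketch of BNDM as it is commonly presented: the state D is AND-ed with the mask B of the character just read, then shifted left by one bit. The function name `bndm_search` and the explicit `mask` (needed because Python integers are unbounded) are choices made for this example.

```python
def bndm_search(pattern, text):
    """Return 0-based start positions of `pattern` in `text` using BNDM."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1   # keeps D within m bits
    high = 1 << (m - 1)   # set when the text read so far is a prefix of the pattern
    # B[c]: bit (m - 1 - i) is set when pattern[i] == c.
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << (m - 1 - i))
    hits = []
    pos = 0
    while pos <= n - m:
        j, last = m, m
        D = mask                              # every factor start is active
        while D != 0:
            D &= B.get(text[pos + j - 1], 0)  # read one character backward
            j -= 1
            if D & high:
                if j > 0:
                    last = j                  # remember the prefix for the shift
                else:
                    hits.append(pos)          # whole window matched
            D = (D << 1) & mask
        pos += last
    return hits


print(bndm_search("aabbaab", "abbabaabbaab"))
```

The pattern length must not exceed the machine word size in a fixed-width implementation; Python's arbitrary-precision integers hide that limit here.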

  18. BNDM Algorithm

  19. BNDM Example

  20. BNDM Example

  21. BNDM Further Improvement • Handle long patterns: partition pattern p into subpatterns p_i; build an array of D and B and process each part with the basic algorithm; if p_i is found, then process p_(i+1), and so on • Handle classes: only the B table is modified; B[c] has the i-th bit set for all characters c belonging to the i-th position of the pattern • Multiple patterns, two methods: (1) interleave the patterns and shift r bits on each D update; (2) concatenate the patterns and shift 1 bit, but with the modified update D = (D << 1) & (1^(m-1) 0)^r, where r is the number of patterns and exponents denote bit repetition • Approximate matching: use Wu's method
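
The class-handling point can be illustrated with a small variant in which only the construction of B differs from plain BNDM. This is a sketch, not the authors' code: representing the pattern as a list of character sets and the name `bndm_search_classes` are assumptions made for the example.

```python
def bndm_search_classes(classes, text):
    """BNDM where position i of the pattern accepts any character in classes[i].

    `classes` is a list of sets of characters (a hypothetical representation).
    Returns 0-based start positions of matches in `text`.
    """
    m, n = len(classes), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1
    high = 1 << (m - 1)
    # Only the B table changes: set the bit of position i for every
    # character that belongs to the class at position i.
    B = {}
    for i, chars in enumerate(classes):
        for c in chars:
            B[c] = B.get(c, 0) | (1 << (m - 1 - i))
    hits = []
    pos = 0
    while pos <= n - m:
        j, last = m, m
        D = mask
        while D != 0:
            D &= B.get(text[pos + j - 1], 0)
            j -= 1
            if D & high:
                if j > 0:
                    last = j
                else:
                    hits.append(pos)
            D = (D << 1) & mask
        pos += last
    return hits


# Pattern a[bc]a: the middle position accepts either b or c.
print(bndm_search_classes([{"a"}, {"b", "c"}, {"a"}], "xabaxacay"))
```

The search loop is unchanged from the basic algorithm, which is why the slide says only the B table needs modification.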

  22. Performance Comparison • Times in 1/100 of a second per megabyte

  23. Reference • Gonzalo Navarro and Mathieu Raffinot. A Bit-parallel Approach to Suffix Automata: Fast Extended String Matching. In M. Farach (editor), Proc. CPM'98, LNCS 1448, pages 14-33, 1998. • Gonzalo Navarro and Mathieu Raffinot. Fast and Flexible String Matching by Combining Bit-parallelism and Suffix Automata. 1998.

  24. Reverse Pattern?
