
Advisor: Prof. R. C. T. Lee Speaker: G. W. Cheng

Two exact string matching algorithms using suffix to prefix rule. Advisor: Prof. R. C. T. Lee. Speaker: G. W. Cheng. Speeding up two string-matching algorithms. Algorithmica, Vol. 12, 1994, pp. 247-267.


Presentation Transcript


  1. Two exact string matching algorithms using suffix to prefix rule Advisor: Prof. R. C. T. Lee Speaker: G. W. Cheng

  2. Speeding up two string-matching algorithms. Algorithmica, Vol. 12, 1994, pp. 247-267. CROCHEMORE, M., CZUMAJ, A., GASIENIEC, L., JAROMINEK, S., LECROQ, T., PLANDOWSKI, W. and RYTTER, W.

  3. Problem Definition: We are given a text string T and a pattern string P, and we want to find all occurrences of P in T.
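As a baseline for the problem just stated, here is a minimal brute-force sketch in Python. It is not one of the two algorithms of the talk; the function name `find_all` is made up for illustration, and the strings are the T and P used in the worked example later in the deck.

```python
def find_all(text: str, pattern: str) -> list[int]:
    """Return every start index at which pattern occurs in text (overlaps included)."""
    m = len(pattern)
    # Try every possible window position and compare the whole window with the pattern.
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


T = "GCATCGCAGAGAGTATACAGTACG"
P = "GCAGAGAG"
print(find_all(T, P))  # prints [5]
```

Every window position is tested here, so no shift larger than 1 is ever taken; the rules on the following slides are about skipping positions safely.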

  4. Consider the following example: there are two occurrences of P in T. [Figure: T with the two occurrences of P marked.]

  5. Rule 1: The Suffix to Prefix Rule • For a window to have any chance of matching the pattern, some suffix of the window must be equal to a prefix of the pattern. [Figure: T and P.]

  6. Basic Ideas • Open a window W of size |P| in the text T. • Find the longest suffix of W that is also a prefix of the pattern. Case 1: This suffix is W itself. Match! [Figure: window W of length |P| in T, aligned with P.]

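  7. Case 2: The longest such suffix is shorter than |P|. We move W so that this suffix is aligned with the matching prefix of P. Case 3: If there is no such suffix, we move W by length |P|. [Figure: window W of length |P| in T before and after each move.]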
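The three cases can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `longest_suffix_prefix` and `next_shift` are hypothetical, the suffix is found by direct comparison rather than with the suffix automaton used in the talk, and after a match the shift is taken from the longest proper suffix of the window that is a prefix of P.

```python
def longest_suffix_prefix(window: str, pattern: str, proper: bool = False) -> int:
    """Length of the longest suffix of window that is also a prefix of pattern.

    With proper=True the full length of the pattern is excluded.
    """
    top = min(len(window), len(pattern)) - (1 if proper else 0)
    for k in range(top, 0, -1):          # try the longest candidate first
        if window[-k:] == pattern[:k]:
            return k
    return 0


def next_shift(window: str, pattern: str) -> tuple[int, int]:
    """Return (case number, shift) for one window of length |P|."""
    m = len(pattern)
    k = longest_suffix_prefix(window, pattern)
    if k == m:
        # Case 1: the window matches; shift by the longest proper suffix/prefix overlap.
        return 1, m - longest_suffix_prefix(window, pattern, proper=True)
    if k > 0:
        # Case 2: align the suffix of the window with the equal prefix of P.
        return 2, m - k
    # Case 3: no suffix of the window is a prefix of P; skip the whole window.
    return 3, m


print(next_shift("GCATCGCA", "GCAGAGAG"))  # prints (2, 5)
print(next_shift("GCAGAGAG", "GCAGAGAG"))  # prints (1, 7)
print(next_shift("TATATATA", "GCAGAGAG"))  # prints (3, 8)
```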

  8. Preprocessing phase • T=GCATCGCAGAGAGTATACAGTACG • P=GCAGAGAG We construct the suffix automaton of P. [Figure: suffix automaton with states 0 to 8 and transitions labeled A, C, and G.]

  9. Preprocessing: Construct a suffix tree of PR, the reversal of the pattern P. [Figure: suffix tree with nodes numbered 1 to 8.]

  10. When there is a match, how do we move the window? [Figure: T and P.]

  12. Find the longest proper suffix of W that is also a prefix of the pattern. [Figure: T and P.]

  14. A Whole Example • T=GCATCGCAGAGAGTATACAGTACG • P=GCAGAGAG • First attempt: the window GCATCGCA ends with GCA, a prefix of P of length 3. Shift by: 5 (8 - 3)

  15. Second attempt: the window GCAGAGAG matches P. Its longest proper suffix that is also a prefix of P is G, of length 1. Shift by: 7 (8 - 1)

  16. Third attempt: the window GTATACAG ends with G, a prefix of P of length 1. Shift by: 7 (8 - 1)

  17. Third attempt: [Figure: T and P.]
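The whole example can be replayed with a short Python sketch. It is an assumption-laden stand-in, not the implementation from the paper: plain sets of substrings and prefixes of P replace the suffix automaton, and the name `rf_search` is made up here. Each window is read from right to left for as long as the part read is a substring of P, and the longest proper suffix that is a prefix of P determines the shift.

```python
def rf_search(text: str, pattern: str) -> tuple[list[int], list[int]]:
    """Return (match positions, shifts taken) using the suffix to prefix rule."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], []
    # Brute-force substitutes for the suffix automaton of the talk.
    factors = {pattern[i:j] for i in range(m) for j in range(i + 1, m + 1)}
    prefixes = {pattern[:k] for k in range(1, m + 1)}
    matches, shifts = [], []
    pos = 0
    while pos + m <= n:
        window = text[pos:pos + m]
        best = 0  # longest proper suffix of the window that is a prefix of P
        k = 1
        # Extend the suffix to the left while it is still a substring of P.
        while k <= m and window[m - k:] in factors:
            if window[m - k:] in prefixes:
                if k == m:
                    matches.append(pos)   # the whole window equals P
                else:
                    best = k
            k += 1
        shifts.append(m - best)
        pos += m - best
    return matches, shifts


print(rf_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG"))
# prints ([5], [5, 7, 7])
```

The three shifts 5, 7 and 7 agree with the three attempts on the slides, and the single match is found at the second attempt.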

  18. Conclusion • The preprocessing phase takes O(m) time, where m = |P|. • The searching phase takes O(mn) time in the worst case, where n = |T|.



  21. A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching. In Proceedings of the 9th Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science 1448, Springer-Verlag, Berlin, pp. 14-31, 1998. NAVARRO, G. and RAFFINOT, M.

  22. Problem Definition: We are given a text string T and a pattern string P, and we want to find all occurrences of P in T.

  23. This algorithm compares the pattern P with the text T inside a sliding window, which slides from left to right. Example: Text: ABDAACDGAEEGGGGJJ Pattern: ACDAAC [Figure: the sliding window marked on the text.]

  26. Basic idea • In this algorithm, we want to find the longest prefix of the pattern that is equal to a suffix of the window.

  27. Example: Text: ABDDCACDADEGGGGJJ Pattern: ACDADCEAD We want to find the longest suffix of “BDDCACDAD” that is a prefix of the pattern.

  28. Example: Text: ABDDCACDADEGGGGJJ Pattern: ACDADCEAD We find all occurrences of “D” in the pattern.

  29. Example: Text: ABDDCACDADEGGGGJJ Pattern: ACDADCEAD This amounts to aligning each “D” of the pattern with the last character of the window. [Figure: three shifted copies of ACDADCEAD, one per alignment.]

  30. Example: Text: ABDDCACDADEGGGGJJ Pattern: ACDADCEAD Next, we try to find all occurrences of “AD” in the pattern. [Figure: the alignment that cannot be extended is marked as a mismatch.]

  31. Example: Text: ABDDCACDADEGGGGJJ Pattern: ACDADCEAD We succeed in finding all occurrences of “AD” in the pattern.

  32. Example: Text: ABDDCACDADEGGGGJJ Pattern: ACDADCEAD We try to find all occurrences of “DAD” in the pattern. [Figure: the alignment that cannot be extended is marked as a mismatch.]

  33. Example: Text: ABDDCACDADEGGGGJJ Pattern: ACDADCEAD We find all occurrences of “DAD” in the pattern.

  34. Example: Text: ABDDCACDADEGGGGJJ Pattern: ACDADCEAD We try to find all occurrences of “CDAD” in the pattern.

  35. Example: Text: ABDDCACDADEGGGGJJ Pattern: ACDADCEAD We try to find all occurrences of “ACDAD” in the pattern.

  36. Example: Text: ABDDCACDADEGGGGJJ Pattern: ACDADCEAD We can now align the pattern with the text so that the longest prefix of the pattern found, “ACDAD”, lines up with the matching suffix of the window.
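The walk through this example can be reproduced with a small Python sketch. The names `occurrences` and `trace_suffixes` are made up for illustration, and the search for occurrences is done naively here; the bit-parallel version is the subject of the later slides. The window is assumed to have the same length as the pattern.

```python
def occurrences(pattern: str, piece: str) -> list[int]:
    """All start positions (0-based) at which piece occurs in pattern."""
    return [i for i in range(len(pattern) - len(piece) + 1)
            if pattern.startswith(piece, i)]


def trace_suffixes(window: str, pattern: str) -> tuple[list[tuple[str, list[int]]], int]:
    """Grow the suffix of the window one character at a time, right to left.

    Returns the suffixes that occur in the pattern together with their
    positions, and the length of the longest one that is a prefix.
    """
    m = len(pattern)
    steps = []
    longest_prefix = 0
    for k in range(1, m + 1):
        suffix = window[m - k:]
        found = occurrences(pattern, suffix)
        if not found:
            break                      # this suffix is not a substring of the pattern
        steps.append((suffix, found))
        if found[0] == 0:
            longest_prefix = k         # the suffix is also a prefix of the pattern
    return steps, longest_prefix


steps, k = trace_suffixes("BDDCACDAD", "ACDADCEAD")
for suffix, found in steps:
    print(suffix, found)
print("longest prefix length:", k, "shift:", len("ACDADCEAD") - k)
```

For this window the suffixes D, AD, DAD, CDAD and ACDAD are found at positions [2, 4, 8], [3, 7], [2], [1] and [0]; the next suffix, CACDAD, does not occur in the pattern, so the scan stops with a prefix of length 5 and a shift of 4. A window that matches the whole pattern would need separate handling, which this sketch leaves out.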

  37. Why do we want to find the longest suffix of the window that is also a prefix of the pattern? The following cases explain why.

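  38. Case 1: Let u be the longest suffix of the window that is also a substring of P. u is not a prefix of P, and no prefix of P is equal to a suffix of the window. [Figure: u at the end of the window in T and inside P.]

  39. So, we can shift the pattern entirely past the window. [Figure: P shifted past u in T.]

  40. Example: Text: ABDDCCDDADEGGGGJJ Pattern: ACDADCEAD P must be shifted far enough that no part of P is compared with “DDAD”.

  41. Example: Text: ABDDCCDDADEGGGGJJ Pattern: ACDADCEAD So, we can shift the pattern past “DDAD”. [Figure: the shifted pattern.]

  42. Case 2: u is not a prefix of P. [Figure: u at the end of the window in T and inside P.]

  43. But a suffix v of the window of T may be a prefix of P. [Figure: v as a suffix of u in T and as a prefix of P.]

  44. So, we can shift the pattern so that its prefix v is aligned with the suffix v of the window. [Figure: P shifted to align v.]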

  45. Example: Text: ABCABCABA Pattern: CABBCAD “BCA” is the longest suffix of “ABCABCA” that is also a substring of the pattern. “CA” is a suffix of “BCA” that is a prefix of the pattern.

  46. Example: Text: ABCABCABA Pattern: CABBCAD So we can shift the pattern so that its prefix “CA” is aligned with the “CA” at the end of the window.
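Both cases can be checked with a short Python sketch. The function name `u_and_v` is hypothetical, and the substring tests are done with Python's `in` operator rather than with the bit-parallel method of the talk. For the Case 1 example, the window is assumed to be the nine characters of the text that end with “DDAD”.

```python
def u_and_v(window: str, pattern: str) -> tuple[str, str, int]:
    """Return (u, v, shift) for one window.

    u: longest suffix of the window that is a substring of the pattern.
    v: longest proper suffix of the window that is a prefix of the pattern
       (always a suffix of u, possibly empty).
    """
    m = len(pattern)
    u = ""
    for k in range(1, len(window) + 1):
        if window[-k:] in pattern:
            u = window[-k:]
        else:
            break
    v = ""
    for k in range(1, min(len(u), m - 1) + 1):
        if pattern.startswith(u[-k:]):
            v = u[-k:]
    return u, v, m - len(v)


print(u_and_v("ABCABCA", "CABBCAD"))      # Case 2: prints ('BCA', 'CA', 5)
print(u_and_v("BDDCCDDAD", "ACDADCEAD"))  # Case 1: prints ('DAD', '', 9)
```

In Case 1 the empty v gives a shift of the full pattern length, so no part of P is compared with “DDAD” again.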

  47. The idea explained above is the main idea of this algorithm. We will use a bit-parallel method to implement it.

  48. Here, we explain how to use bit-parallelism to find the substrings of the pattern that are equal to a suffix of the window. Example: Text: ABCABCCBA, ∑={A,B,C} Pattern: ACBCCBB

  49. Example: Text: ABCABCCBA Pattern: ACBCCBB For every character of the alphabet, we build a bit mask with a 1 at each position where the character occurs in the pattern: Pattern: ACBCCBB A: 1000000 B: 0010011 C: 0101100 others: 0000000
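The masks on this slide can be computed with a few lines of Python. This is a minimal sketch; `build_masks` and `as_bits` are names chosen here, and the leftmost bit is assumed to stand for the first character of the pattern, as in the table above.

```python
def build_masks(pattern: str, alphabet: str) -> dict[str, int]:
    """One integer per character; bit (m - 1 - i) is set when pattern[i] is that character."""
    m = len(pattern)
    masks = {c: 0 for c in alphabet}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))
    return masks


def as_bits(value: int, width: int) -> str:
    """Render a mask as a fixed-width binary string, leftmost bit first."""
    return format(value, f"0{width}b")


masks = build_masks("ACBCCBB", "ABC")
for c in "ABC":
    print(c, as_bits(masks[c], 7))
# prints:
# A 1000000
# B 0010011
# C 0101100
```

Any character that does not occur in the pattern keeps the all-zero mask, which corresponds to the "others" row.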

  50. Example: Text: ABCABCCBA Pattern: ACBCCBB Pattern: ACBCCBB A: 1000000 B: 0010011 C: 0101100 other: 0000000 D: 1111111 We use a mask D, initially all ones, to record which positions of the pattern can still match the part of the window read so far.
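How D might be updated is sketched below in Python. The transcript stops before the update rule is shown, so this follows the usual bit-parallel scheme for this kind of algorithm as an assumption, not as the deck's own code: D is ANDed with the mask of each character read from right to left, then shifted left by one bit. The name `scan_window` is made up, and the window is assumed to be the first seven characters of the text.

```python
def scan_window(window: str, pattern: str) -> tuple[list[str], int, bool]:
    """Scan one window right to left; return (D after each AND, shift, matched)."""
    m = len(pattern)
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))
    full = (1 << m) - 1          # D: 1111111 on the slide
    top = 1 << (m - 1)           # bit of pattern position 0
    d, last, matched = full, m, False
    states = []
    i = m - 1
    while i >= 0 and d:
        # A set bit now marks where the suffix read so far starts in the pattern.
        d &= masks.get(window[i], 0)
        states.append(format(d, f"0{m}b"))
        i -= 1
        if d & top:              # the suffix read so far is a prefix of the pattern
            if i >= 0:
                last = i + 1     # remember where that prefix starts in the window
            else:
                matched = True   # the whole window equals the pattern
        d = (d << 1) & full      # prepare for the next character to the left
    return states, last, matched


print(scan_window("ABCABCC", "ACBCCBB"))
# prints (['0101100', '0001000', '0010000', '0000000'], 7, False)
```

Here “C”, “CC” and “BCC” occur in the pattern but “ABCC” does not, so D becomes zero after four characters. No suffix read was a prefix of the pattern, so the window can be shifted by the full length 7.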
