
Reverse Colussi algorithm

Reverse Colussi algorithm. Fastest pattern matching in strings, Colussi, L. Journal of Algorithms, Vol. 16 , No. 2, 1994, pp.163-189 Advisor: Prof. R. C. T. Lee Speaker: Y. K. Shie.


Presentation Transcript


  1. Reverse Colussi algorithm Fastest pattern matching in strings, Colussi, L. Journal of Algorithms, Vol. 16 , No. 2, 1994, pp.163-189 Advisor: Prof. R. C. T. Lee Speaker: Y. K. Shie

  2. The Reverse Colussi Algorithm solves the string matching problem in the spirit of the original Colussi Algorithm.

  3. The Main Points of the Reverse Colussi Algorithm: 1. It changes the bad character rule from matching one character to matching a pair of characters. 2. It divides the positions of the pattern into special positions and non-special positions. Special positions allow a smaller shift. 3. It processes the special positions first.

  4. Note that the Colussi Algorithm does not consider all of the positions where the prefix function takes the value -1. This is justified by the following fact: a position where the prefix function takes the value -1 allows the largest shift. Thus the Colussi Algorithm examines all positions which allow a smaller shift, which is a safe action.

  5. In the Reverse Colussi Algorithm, we define some positions as special and the others as non-special. Special positions allow a smaller shift than non-special positions. Thus, in the Reverse Colussi Algorithm, we examine the special positions first. We shall make this clear later.

  6. Ti is the ith character in T (1≦i≦n). Pj is the jth character in P (1≦j≦m). The bad character rule is like Rule 2-1, the Character Matching Rule.

  7. Rule 2-1: Character Matching Rule (A Special Version of Rule 2) • For any character x in T, find the nearest x in P which is to the left of x in T.

  8. Implication of Rule 2-1 • Case 1: If there is an x in P to the left of x in T, move P so that the two x's match.

  9. Case 2: If no such x exists in P, consider the partial window defined by x in T and the string to the left of it.
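A minimal sketch of Rule 2-1, under the assumption that x is the text character aligned with the last position of P, and with Case 2 simplified to moving P completely past x. The function name `character_matching_shift` is hypothetical, and positions are 0-based.

```python
def character_matching_shift(p, x):
    """Shift suggested by Rule 2-1 when x is aligned with the last position of p."""
    m = len(p)
    # Nearest occurrence of x strictly to the left of the last position of p.
    nearest = p.rfind(x, 0, m - 1)
    if nearest == -1:
        # Case 2 (simplified assumption): no such x, so move p past x entirely.
        return m
    # Case 1: move p so that this x lines up with the x in the text.
    return (m - 1) - nearest
```

For the pattern GCAGAGAG used later in the slides, this gives a shift of 1 for A and 6 for C.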

  10. rcBc table: Consider the case where the last character X of the window of T does not match the last character of P.

  11. rcBc table: Suppose we successfully find an X in P.

  12. rcBc table: Then we can move P so that this X in P is aligned with the X in T.

  13. rcBc table: Suppose the last character Y of the new window of T does not match the last character of P.

  14. rcBc table: Then we try to find a pair of X and Y in P such that, after we move P, these X and Y in P match the X and Y in T.

  15. Thus, the Reverse Colussi Algorithm uses a very special version of Rule 2: a pair of characters.

  16. How do we find this pair of characters in P? We use the rcBc Table.

  17. rcBc table: Y is the last character of the window of T, s is the length of the shift in the previous step, and k is an integer. Case 1: If we can find P_{m-k-1} = Y and P_{m-k-s-1} = P_{m-s-1}, we fill the minimal such k into rcBc[Y, s]. Case 2: If we can find P_{m-k-1} = Y and k > m-s-1, we fill the minimal such k into rcBc[Y, s]. Case 3: Otherwise, we fill m into rcBc[Y, s].

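  18. ex: Y = A, s = 1: X = A. The pair XY = AA does not exist in P, so rcBc[Y, 1] = 8.

  19. ex: Y = A, s = 2: X = G. A pair with X = G two positions to the left of Y = A exists in P (k = 5), so rcBc[Y, 2] = 5.

  20. ex: Y = A, s = 3: X = A. The A found with k = 5 qualifies because k > m-s-1 = 4, so rcBc[Y, 3] = 5.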

  21. ex:
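The three cases of the rcBc definition can be checked with a brute-force sketch. It assumes 0-based positions, k ranging over 1..m-1 before falling back to m, and the pattern GCAGAGAG from the later rcGs example. The function name `rc_bc_table` and the `alphabet` parameter are hypothetical, and this is not the linear-time preprocessing of the paper.

```python
def rc_bc_table(p, alphabet):
    """Brute-force rcBc[Y, s] for every character Y and previous shift s."""
    m = len(p)
    table = {}
    for y in alphabet:
        for s in range(1, m + 1):
            shift = m  # case 3: no suitable k exists
            for k in range(1, m):
                if p[m - k - 1] != y:
                    continue
                # case 2: the paired character would fall off the left end of p
                # case 1: the paired character also matches inside p
                if k > m - s - 1 or p[m - k - s - 1] == p[m - s - 1]:
                    shift = k
                    break
            table[y, s] = shift
    return table


table = rc_bc_table("GCAGAGAG", "ACGT")
print(table["A", 1], table["A", 2], table["A", 3])
```

With this pattern the sketch reproduces the values worked out above: rcBc[A, 1] = 8, rcBc[A, 2] = 5 and rcBc[A, 3] = 5.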

  22. rcGs table: We build the rcGs table, which corresponds to the good suffix rules of the Boyer-Moore algorithm. The good suffix rules are like Rule 1 (the Suffix to Prefix Rule) and Rule 2 (the Substring Matching Rule).

  23. Rule 1: The Suffix to Prefix Rule • For a window to have any chance to match a pattern, there must be a suffix of the window which is equal to a prefix of the pattern.

  24. Rule 2: The Substring Matching Rule • For any substring u in T, find the nearest u in P which is to the left of it. If such a u in P exists, move P such that the two u's match; otherwise, we may define a new partial window.

  25. A repeating suffix of a string S is a suffix which appears somewhere else in S. For instance, ABA is a repeating suffix of CABAGTABA. BA is also a repeating suffix.

  26. Let x be the character to the left of a repeating suffix. A repeating suffix u of S is a maximal repeating suffix if xu does not appear elsewhere in S. For instance, in CABAGTABA, ABA is a maximal repeating suffix because TABA does not appear anywhere else in S, while BA is not because ABA appears somewhere else in S.
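The two definitions can be tried out on the CABAGTABA example with a small sketch. It follows the wording of these two slides literally (a suffix is maximal when xu occurs only as a suffix); the function names are hypothetical.

```python
def repeating_suffixes(s):
    """Proper suffixes of s that also occur somewhere earlier in s, shortest first."""
    # s.find returns the first occurrence; for a suffix starting at i,
    # any other occurrence must start before i.
    return [s[i:] for i in range(len(s) - 1, 0, -1) if s.find(s[i:]) < i]


def maximal_repeating_suffixes(s):
    """Repeating suffixes u such that xu (u plus the character before it) does not repeat."""
    result = []
    for u in repeating_suffixes(s):
        i = len(s) - len(u)      # start of the suffix u
        xu = s[i - 1:]           # u extended by the character x to its left
        if s.find(xu) == i - 1:  # xu occurs only as a suffix
            result.append(u)
    return result


print(repeating_suffixes("CABAGTABA"))
print(maximal_repeating_suffixes("CABAGTABA"))
```

For CABAGTABA the repeating suffixes are A, BA and ABA, and only ABA is maximal under this reading.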

  27. Given a pattern P, we call all positions immediately to the left of maximal repeating suffixes of P special positions. The Reverse Colussi Algorithm considers these special positions first. In this case, we can see that the following suffixes are all maximal repeating suffixes: G (corresponding substring: G), AG (corresponding substring: CAG), AGAG (corresponding substring: CAGAG).

  28. For this pattern, the special positions are i = 3, 5 and 6.

  29. For each maximal repeating suffix u, let the last position of its corresponding substring be p. Then, if a mismatch occurs at the special position associated with u, we may move P by m-p-1 steps, where m is the length of P (Rule 2). ex: p = 5, m = 8.

  30. So we can move P by 8 - 5 - 1 = 2 steps. The number of steps moved for each special position is stored in a table called hmin.

  31. For the special position i = 3, the length of the move is 2 (8-5-1), so we record hmin[2] = 3.

  32. For the special position i = 5, the length of the move is 4 (8-3-1), so we record hmin[4] = 5.

  33. For the special position i = 6, the length of the move is 7 (8-0-1), so we record hmin[7] = 6.
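The special positions and their moves can be reproduced with a brute-force sketch. It assumes 0-based positions and the following reading of the examples: position i is special when the suffix to its right occurs again further left, either at the very start of P or preceded by a character different from P[i]. The function name `special_position_shifts` is hypothetical, and this is not the preprocessing from the paper.

```python
def special_position_shifts(p):
    """Map each special position i to the smallest move that realigns its suffix."""
    m = len(p)
    shifts = {}
    for i in range(m - 1):                # i is the position just left of the suffix
        u = p[i + 1:]
        for k in range(1, i + 2):         # candidate move; the copy starts at i + 1 - k
            start = i + 1 - k
            if p[start:start + len(u)] != u:
                continue
            # The copy must start the pattern or follow a different character.
            if start == 0 or p[start - 1] != p[i]:
                shifts[i] = k             # k equals m - p_end - 1 for the copy's last position
                break
    return shifts


print(special_position_shifts("GCAGAGAG"))
```

For GCAGAGAG this yields moves of 2, 4 and 7 for positions 3, 5 and 6, which agrees with hmin[2] = 3, hmin[4] = 5 and hmin[7] = 6.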

  34. Note that for special positions, Rule 2 (substring matching rule) can be used. For non-special positions, Rule 1 (suffix to prefix rule) can be used.

  35. The basic idea of the Reverse Colussi Algorithm is as follows: 1. We consider special positions first and non-special positions next. 2. We use Rule 2 (substring matching rule) when we consider special positions. 3. We use Rule 1 (suffix to prefix rule) when we consider non-special positions.

  36. After we compare the special positions, we must compare the remaining positions, called non-special positions. We compare those non-special positions from left to right. The number of steps moved for each non-special position is stored in a table called rmin. The value of rmin can be found by Rule 1 (the suffix to prefix rule).

  37. If a suffix S of P which lies to the right of a non-special position i is equal to a prefix of P, then rmin(i) = m-|S|, where |S| is the length of S and S is the longest such suffix. If no such S exists, rmin(i) = m.

  38. ex1: A suffix S which lies to the right of some non-special positions is equal to a prefix, so the value of rmin for these non-special positions is m-|S| = 8-1 = 7.

  39. ex2: A suffix S which lies to the right of some non-special positions is equal to a prefix, so the value of rmin for these non-special positions is m-|S| = 11-5 = 6.

  40. ex2: We find a shorter suffix which lies to the right of some further non-special positions and is equal to a prefix, so the value of rmin for these non-special positions is m-|S| = 11-3 = 8.

  41. ex2: We find an even shorter suffix which lies to the right of some further non-special positions and is equal to a prefix, so the value of rmin for these non-special positions is m-|S| = 11-1 = 10.

  42. ex3: No suffix is equal to any prefix, so the values of all non-special positions in rmin are m.
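The rmin rule can be sketched by brute force as follows. It assumes 0-based positions and that the longest suitable suffix is used. For simplicity it computes a value for every position rather than only the non-special ones. The function name `rmin_table` is hypothetical, and the 11-character pattern ABABACABABA is an illustration chosen to have suffix/prefix overlaps of lengths 5, 3 and 1; it is not the pattern from the slides.

```python
def rmin_table(p):
    """rmin[i] = m - |S| for the longest suffix S right of i that equals a prefix, else m."""
    m = len(p)
    # Lengths of proper suffixes of p that are also prefixes of p.
    borders = [l for l in range(1, m) if p[:l] == p[m - l:]]
    table = []
    for i in range(m):
        # A suffix of length l starts at m - l; it must start to the right of i.
        fits = [l for l in borders if m - l > i]
        table.append(m - max(fits) if fits else m)
    return table


print(rmin_table("GCAGAGAG"))
print(rmin_table("ABABACABABA"))
```

For GCAGAGAG the non-special positions 0, 1, 2 and 4 all get 7, as in ex1. The illustrative 11-character pattern gives the values 6, 8 and 10 in the same way as ex2, and a pattern such as ABCD with no such suffix gives m everywhere, as in ex3.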

  43. rcGs table: After we build those tables, we can use them to build the rcGs table. ex: GCAGAGAG

  44. rcGs table: First, for each nonempty entry of hmin, we fill its index (the move recorded for that special position) into the rcGs table.

  45. rcGs table: Second, we fill the nonempty rmin values into the rcGs table.

  46. rcGs table: If P matches the window of T exactly, we can move P by Rule 1. Therefore, we fill rcGs[8] = m-|S| = 8-1 = 7.
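The filling steps can be sketched as below, assuming the special positions are taken in order of increasing move, the non-special positions from left to right, and that the last position of P is compared first (through rcBc) and therefore occupies slot 0. The function name `build_rc_gs` and its parameters are hypothetical. The input values are the ones given in the slides for GCAGAGAG.

```python
def build_rc_gs(m, special_shift, rmin, full_match_shift):
    """Return (h, rc_gs): comparison order and the move to make at each step."""
    # Special positions first, smallest move first.
    special = sorted(special_shift, key=special_shift.get)
    # Then the non-special positions from left to right (the last position is excluded).
    others = [i for i in range(m - 1) if i not in special_shift]
    h = [m - 1] + special + others
    rc_gs = ([0]
             + [special_shift[i] for i in special]
             + [rmin[i] for i in others]
             + [full_match_shift])
    return h, rc_gs


h, rc_gs = build_rc_gs(8, {3: 2, 5: 4, 6: 7}, {0: 7, 1: 7, 2: 7, 4: 7}, 7)
print(h)
print(rc_gs)
```

With these inputs, rc_gs[1] is 2 and rc_gs[8] is 7, and the comparison order after the last position is 3, 5, 6, 0, 1, 2, 4.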

  47. ex: T= P= s = m = 8

  48. ex: Shift by 1 (rcBc[A][s], s = 8), and change s = 1

  49. ex: Shift by 2 (rcGs[1]), and change s = 2

  50. ex: Shift by 2 (rcGs[1]), and change s = 2
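The search steps can be sketched as follows. The tables `h` and `rc_gs` for GCAGAGAG are written out by hand from the hmin and rmin values above (an assumption about how they combine), rcBc is evaluated by brute force on demand, and the text is a hypothetical one chosen for illustration because the T of the example was not preserved. The function name `rc_search` is also hypothetical.

```python
def rc_search(p, t, h, rc_gs):
    """Return (match positions, list of moves) for pattern p in text t."""
    m, n = len(p), len(t)

    def rc_bc(y, s):
        # Brute-force rcBc[y, s]: the smallest k satisfying case 1 or case 2, else m.
        for k in range(1, m):
            if p[m - k - 1] == y and (k > m - s - 1 or p[m - k - s - 1] == p[m - s - 1]):
                return k
        return m

    hits, moves = [], []
    j, s = 0, m
    while j <= n - m:
        last = t[j + m - 1]
        if p[m - 1] != last:
            s = rc_bc(last, s)          # mismatch on the last character
        else:
            i = 1
            while i < m and p[h[i]] == t[j + h[i]]:
                i += 1                  # special positions first, then the rest
            if i == m:
                hits.append(j)
            s = rc_gs[i]
        moves.append(s)
        j += s
    return hits, moves


h = [7, 3, 5, 6, 0, 1, 2, 4]
rc_gs = [0, 2, 4, 7, 7, 7, 7, 7, 7]
hits, moves = rc_search("GCAGAGAG", "GCATCGCAGAGAGTATACAGTACG", h, rc_gs)
print(hits, moves[:3])
```

On this illustrative text the first three moves are 1 (from rcBc[A][8]), 2 and 2 (both from rcGs[1]), after which the pattern is found at position 5.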
