Reverse Colussi algorithm

Fastest pattern matching in strings, Colussi, L., Journal of Algorithms, Vol. 16, No. 2, 1994, pp. 163-189
Advisor: Prof. R. C. T. Lee
Speaker: Y. K. Shie

The Reverse Colussi Algorithm solves the string matching problem in the spirit of the original Colussi Algorithm.

The Main Points of the Reverse Colussi Algorithm
1. It changes the bad character rule from matching one character to matching a pair of characters.
2. It divides the positions into special positions and non-special positions. Special positions allow a smaller shift.
3. It processes the special positions first.

Note that the Colussi Algorithm does not consider all of the positions where the prefix function assumes the value -1. This is justified by the following fact: a position where the prefix function assumes -1 allows the largest shift. Thus the Colussi Algorithm examines all positions that allow a smaller shift, which is a safe action.

In the Reverse Colussi Algorithm, we define some positions as special and the rest as non-special. Special positions allow smaller shifts than non-special positions. Thus, in the Reverse Colussi Algorithm, we examine the special positions first. We shall make this clear later.

Ti is the ith character in T (1≦i≦n). Pj is the jth character in P (1≦j≦m). The bad character rule resembles Rule 2-1, the Character Matching Rule.

Rule 2-1: Character Matching Rule (A Special Version of Rule 2)
For any character x in T, find the nearest x in P which is to the left of x in T.

Implication of Rule 2-1
Case 1: If there is an x in P to the left of the x in T, move P so that the two x's match.
Case 2: If no such x exists in P, consider the partial window defined by x in T and the string to the left of it.
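As a rough illustration of Rule 2-1, the sketch below computes the shift for the situation used in the rest of these slides, where x is the text character aligned with the last position of P. The function name `char_shift`, the 0-based indexing, and the choice to shift by the full length m in Case 2 are assumptions made for this sketch, not details taken from the slides.

```python
def char_shift(P, x):
    """Shift for a mismatch at the last window position (hypothetical helper).

    x is the text character aligned with the last character of P.
    """
    m = len(P)
    # Case 1: look for the nearest x in P to the left of the last position.
    for k in range(1, m):
        if P[m - 1 - k] == x:
            return k          # moving P by k aligns the two x's
    # Case 2 (assumed here): no such x in P, so move P past x entirely.
    return m
```

For example, with P = "GCAGAGAG" the nearest A to the left of the last position is one step away, so `char_shift("GCAGAGAG", "A")` is 1.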

Consider the following case, where the last character X of the window of T does not match the last character of P.

Suppose we successfully find an X in P.

Then we can move P so that the two X's match.

Suppose the last character Y of the new window of T does not match the last character of P.

Then we try to find a pair of X and Y in P such that, after we move P, this X and Y in P match the X and Y in T.

Thus, the Reverse Colussi Algorithm uses a very special version of Rule 2: a pair of characters.

How do we find this pair of characters in P? We use the rcBc Table.

rcBc table
Y is the last character of the window of T, s is the length of the shift in the last step, and k is an integer.
Case 1: If we can find P_{m-k-1} = Y and P_{m-k-s-1} = P_{m-s-1}, we fill the minimal k into rcBc[Y, s].
Case 2: If we can find P_{m-k-1} = Y and k > m-s-1, we fill the minimal k into rcBc[Y, s].
Case 3: Otherwise, we fill m into rcBc[Y, s].

Example (Y = A, s = 1): X = A. XY = AA does not exist in P, so rcBc[Y, 1] = 8.

Example (Y = A, s = 2): X = G. The pair we are looking for exists, so rcBc[Y, 2] = 5.

Example (Y = A, s = 3): X = A. The pair we are looking for qualifies, so rcBc[Y, 3] = 5.
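The three cases can be checked with a brute-force sketch that simply tries k = 1, 2, ..., m and keeps the first k that fits. It assumes that the subscripts in the cases are 0-based (P_0 is the first character) and that the examples use the pattern GCAGAGAG that appears later in these slides; under those assumptions it reproduces the three example values. The function name `rcbc_table` and the dictionary layout are choices made for this sketch, and the paper's own construction is more efficient than this.

```python
def rcbc_table(P):
    """Brute-force rcBc: maps (Y, s) to the minimal admissible k (a sketch)."""
    m = len(P)
    table = {}
    for y in set(P):
        for s in range(1, m + 1):
            for k in range(1, m + 1):
                # Y must line up with an occurrence in P (or P moves past it).
                y_ok = (k == m) or P[m - k - 1] == y
                # Case 2: k > m-s-1; Case 1: the character s places to the left matches.
                x_ok = (k > m - s - 1) or P[m - k - s - 1] == P[m - s - 1]
                if y_ok and x_ok:
                    table[(y, s)] = k
                    break  # k = m always qualifies, which is Case 3
    return table
```

Characters that do not occur in P are left out of the table; for them only k = m qualifies, so a lookup can fall back to m.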


rcGs table
We build the rcGs table, which corresponds to the good suffix rules of the Boyer-Moore algorithm. The good suffix rules resemble Rule 1 (the Suffix to Prefix Rule) and Rule 2 (the Substring Matching Rule).

Rule 1: The Suffix to Prefix Rule
For a window to have any chance to match a pattern, there must be a suffix of the window which is equal to a prefix of the pattern.

Rule 2: The Substring Matching Rule
For any substring u in T, find the nearest u in P which is to the left of it. If such a u in P exists, move P so that the two u's match; otherwise, we may define a new partial window.

A repeating suffix of a string S is a suffix which appears somewhere else in S. For instance, ABA is a repeating suffix of CABAGTABA. BA is also a repeating suffix.

Let x be the character to the left of a repeating suffix. A repeating suffix u of S is a maximal repeating suffix if xu does not appear elsewhere in S. For instance, in CABAGTABA, ABA is a maximal repeating suffix because TABA does not appear anywhere else in S, while BA is not because ABA appears somewhere else in S.

Given a pattern P, denote all positions to the left of maximal repeating suffixes of P as special positions. The Reverse Colussi Algorithm considers these special positions first. In this case, we can see that the following suffixes are all maximal repeating suffixes:
G (corresponding substring: G)
AG (corresponding substring: CAG)
AGAG (corresponding substring: CAGAG)

The special positions are the positions immediately to the left of these suffixes.

For each maximal repeating suffix u, let the last position of the corresponding substring be located at p. Then, if a mismatch occurs at the special position associated with u, we may move P by m - p - 1 steps, where m is the length of P (Rule 2). In the example, p = 5 and m = 8, so we can move P by 8 - 5 - 1 = 2 steps.

The number of steps moved for each special position is stored in a table called hmin. For the special position i = 3, we record its shift of 2 (8 - 5 - 1) as hmin[2] = 3.
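One way to reproduce these values is a brute-force scan: for each shift k, compare P with a copy of itself moved k places to the left, starting at the right end, and stop at the first position that fails to match or runs off the left end. The suffix to the right of the stopping position reoccurs k places to the left, so the stopping position is treated as a special position with shift k, and the smallest k is kept for each position. This reading, the 0-based indexing, and the name `special_positions` are assumptions of the sketch, not the construction given in the paper.

```python
def special_positions(P):
    """Map each special position to its shift (brute-force sketch)."""
    m = len(P)
    shift_of = {}
    for k in range(1, m):
        i = m - 1
        # Walk left while the suffix of P still matches its copy k places left.
        while i - k >= 0 and P[i - k] == P[i]:
            i -= 1
        # i == m - 1 means no character matched, so there is no repeating suffix.
        if i < m - 1 and i not in shift_of:
            shift_of[i] = k   # keep the smallest shift for this position
    return shift_of
```

For P = "GCAGAGAG" this gives `{3: 2, 5: 4, 6: 7}`: position 3 (left of AGAG) with shift 2, position 5 (left of AG) with shift 4, and position 6 (left of G) with shift 7.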

Note that for special positions, Rule 2 (substring matching rule) can be used. For non-special positions, Rule 1 (suffix to prefix rule) can be used.

The basic idea of the Reverse Colussi Algorithm is as follows:
1. We consider special positions first and non-special positions next.
2. We use Rule 2 (substring matching rule) when we consider special positions.
3. We use Rule 1 (suffix to prefix rule) when we consider non-special positions.

After we compare the special positions, we must compare the remaining positions, called non-special positions. We compare the non-special positions from left to right. The number of steps moved for each non-special position is stored in a table called rmin. The value of rmin can be found by Rule 1 (the suffix to prefix rule).

If a suffix S of P that lies to the right of a non-special position i is equal to a prefix of P, then rmin(i) = m - |S|, where |S| is the length of S and S is the longest such suffix. If no such S exists, rmin(i) = m.

ex1: A suffix S to the right of some non-special positions is equal to a prefix, so the rmin values of these non-special positions are m - |S| (8 - 1).

ex2: A suffix S to the right of some non-special positions is equal to a prefix, so the rmin values of these non-special positions are m - |S| (11 - 5).

ex2 (continued): We find a shorter suffix to the right of some non-special positions which is equal to a prefix, so the rmin values of these non-special positions are m - |S| (11 - 3).

ex2 (continued): We find an even shorter suffix to the right of some non-special positions which is equal to a prefix, so the rmin values of these non-special positions are m - |S| (11 - 1).

ex3: No suffix is equal to any prefix, so the rmin values of all non-special positions are m.
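The rmin rule can be sketched by listing every shift r = m - |S| for which the suffix S equals a prefix, and then taking, for each position i, the smallest such r that is greater than i (so that S starts to the right of i). The function name `rmin_table`, the 0-based positions, and the 11-character pattern in the usage note are assumptions made for illustration; the original pattern of ex2 is not recoverable from the slides.

```python
def rmin_table(P):
    """rmin for every position of P (brute-force sketch).

    Values are computed for all positions; only the entries of
    non-special positions are used by the algorithm.
    """
    m = len(P)
    # r = m - |S| for each suffix S equal to a prefix; r = m is the empty S.
    shifts = [r for r in range(1, m + 1) if P[:m - r] == P[r:]]
    # The longest S lying to the right of position i gives the smallest r > i.
    return [min(r for r in shifts if r > i) for i in range(m)]
```

For "GCAGAGAG" the only nonempty suffix equal to a prefix is G, so every position except the last gets 8 - 1 = 7. A made-up pattern such as "ABABACABABA" has matching suffixes of lengths 5, 3 and 1, which gives the values 11 - 5, 11 - 3 and 11 - 1 for successive groups of positions.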

rcGs table
After we build those tables, we can use them to build the rcGs table. Example: GCAGAGAG.

First, for each special position, we fill the index of its nonempty hmin entry into the rcGs table.

Second, we fill the nonempty rmin values into the rcGs table.

If P exactly matches the window of T, we can move P by Rule 1. Therefore, we fill rcGs[8] = m - |S| = 8 - 1 = 7.
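The filling steps can be sketched as follows. The function takes the special positions with their shifts and the rmin values as inputs and returns two lists: the order h in which positions are compared, and rcGs, where rcGs[i] is the shift used when the i-th comparison fails. Ordering the special positions by increasing shift, the list h, and the name `build_rcgs` are assumptions of this sketch.

```python
def build_rcgs(m, special_shift, rmin):
    """Build the comparison order h and the rcGs table (a sketch).

    special_shift: dict mapping special position -> shift
    rmin: list of rmin values, one per position of P
    """
    h = [m - 1]      # the last position of P is always compared first
    rcgs = [0]       # rcgs[0] is unused: a mismatch there is handled by rcBc
    # First: special positions, assumed to be taken in order of increasing shift.
    for pos, k in sorted(special_shift.items(), key=lambda item: item[1]):
        h.append(pos)
        rcgs.append(k)
    # Second: non-special positions from left to right, with their rmin values.
    for pos in range(m - 1):
        if pos not in special_shift:
            h.append(pos)
            rcgs.append(rmin[pos])
    # Last entry: shift after a complete match (Rule 1).
    rcgs.append(rmin[0])
    return h, rcgs
```

With m = 8, special positions `{3: 2, 5: 4, 6: 7}` and rmin values of 7 for positions 0 to 6, this yields rcGs[1] = 2 and rcGs[8] = 7, in line with the values quoted in the slides.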

Example: initially s = m = 8.

Shift by 1 (rcBc[A][s] with s = 8), and set s = 1.

Shift by 2 (rcGs[1]), and set s = 2.
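The search phase that produces these shifts can be sketched as a loop over windows of T, assuming the tables are already built. The function name `rc_search`, the 0-based indexing, the comparison order list `h`, and the sample text are all assumptions; the text used in the slides did not survive, so the one below is only a stand-in. The table values in the demo were worked out by hand for GCAGAGAG under the same assumptions.

```python
def rc_search(T, P, rcbc, h, rcgs):
    """Return (match positions, shifts taken) using precomputed tables (a sketch)."""
    n, m = len(T), len(P)
    matches, shifts = [], []
    j, s = 0, m                  # j: start of the window, s: previous shift
    while j <= n - m:
        last = T[j + m - 1]
        if last != P[m - 1]:
            # Mismatch on the last character: use the pair rule rcBc[last, s].
            # Characters missing from the table are assumed absent from P.
            s = rcbc.get((last, s), m)
        else:
            # Compare the remaining positions in the order given by h.
            i = 1
            while i < m and P[h[i]] == T[j + h[i]]:
                i += 1
            if i == m:
                matches.append(j)
            s = rcgs[i]
        shifts.append(s)
        j += s
    return matches, shifts


# Demo with hand-computed tables for P = "GCAGAGAG" (assumed values).
# Rows for G are omitted: rcBc is only consulted when the last character differs from G.
rows = {"A": [8, 5, 5, 3, 3, 3, 1, 1], "C": [8, 6, 6, 6, 6, 6, 6, 6]}
demo_rcbc = {(c, s): v for c, vals in rows.items() for s, v in enumerate(vals, start=1)}
demo_h = [7, 3, 5, 6, 0, 1, 2, 4]
demo_rcgs = [0, 2, 4, 7, 7, 7, 7, 7, 7]
demo_matches, demo_shifts = rc_search(
    "GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG", demo_rcbc, demo_h, demo_rcgs
)
```

On this stand-in text the first two shifts are 1 (from rcBc[A][8]) and 2 (from rcGs[1]), and the pattern is found at position 5.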
