Average Case Analysis of an Exact String Matching Algorithm

Advisor: Professor R. C. T. Lee
Speaker: S. C. Chen

Problem Definition

We are given a text T = t1t2…tn of length n and a pattern P = p1p2…pm of length m, and we are asked to find all occurrences of P in T.

[Figure: an example text T with two occurrences of P marked.]
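To make the problem statement concrete, here is a minimal brute-force sketch. The function name `find_all_occurrences` and the sample strings are illustrative assumptions, not taken from the slides; positions are reported 1-based to match the t1t2…tn notation.

```python
def find_all_occurrences(text: str, pattern: str) -> list[int]:
    """Return the 1-based starting positions of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    positions = []
    for start in range(n - m + 1):
        # Compare the length-m window beginning at this start with the pattern.
        if text[start:start + m] == pattern:
            positions.append(start + 1)  # convert the 0-based index to 1-based
    return positions


# Hypothetical example: the pattern occurs twice in the text.
print(find_all_occurrences("abcabcab", "bca"))
```

This baseline takes O(nm) time in the worst case; the algorithm discussed in these slides aims to do much better on average.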

Exact string matching algorithms rely on a number of rules, for example the Suffix to Prefix Rule and the Substring Matching Rule.

This algorithm uses the idea behind the Substring Matching Rule.

The Substring Matching Rule

For any substring S in T, find the nearest occurrence of S in P that lies to the left of it. If such an S exists in P, move P so that the two S's match; otherwise, we may define a new partial window.

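In this algorithm, we first check whether S = T[i-r+1…i], the last r characters of the current window T[i-m+1…i], is a substring of P. If S does not occur in P, no occurrence of P can contain T[i-r+1…i], so we shift P to the right by m-r steps.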

If S occurs in P, then according to the Substring Matching Rule we should slide P so that the two occurrences of S match.

Our algorithm is not that smart, however. Instead of sliding P so that the two occurrences of S match, we simply examine the entire window T[i-m+1…i+m-r] to see whether P occurs in it.

Note that this less clever algorithm still covers the case of sliding P so that the two occurrences of S match: every placement of P that contains T[i-r+1…i] lies inside the examined window.

Algorithm

Algorithm fast-on-average;
i = m;
while i ≤ n do
begin
    if T[i-r+1…i] is a substring of P then
        compute all occurrences of P whose starting positions are in T[i-m+1…i-r+1], applying the KMP algorithm
    else { P does not start in T[i-m+1…i-r+1] };
    i = i+m-r
end
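The pseudocode above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, r is supplied by the caller, and a plain Python set of the length-r substrings of P stands in for the suffix tree used in the analysis. The KMP scan covers the window of length 2m-r that ends at position i+m-r, clipped to the end of the text.

```python
def kmp_failure(pattern: str) -> list[int]:
    """fail[j] = length of the longest proper border of pattern[:j + 1]."""
    fail = [0] * len(pattern)
    k = 0
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = fail[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        fail[j] = k
    return fail


def kmp_search(text: str, pattern: str, fail: list[int], lo: int, hi: int) -> list[int]:
    """Return the 0-based starts of pattern lying entirely inside text[lo:hi]."""
    m = len(pattern)
    found = []
    k = 0
    for j in range(lo, hi):
        while k > 0 and text[j] != pattern[k]:
            k = fail[k - 1]
        if text[j] == pattern[k]:
            k += 1
        if k == m:
            found.append(j - m + 1)
            k = fail[k - 1]
    return found


def fast_on_average(text: str, pattern: str, r: int) -> list[int]:
    """Sketch of algorithm fast-on-average; returns 1-based starting positions."""
    n, m = len(text), len(pattern)
    if not 1 <= r < m:
        raise ValueError("need 1 <= r < m so that the shift m - r is positive")
    # Assumption: a set of all length-r substrings of P replaces the suffix tree.
    factors = {pattern[k:k + r] for k in range(m - r + 1)}
    fail = kmp_failure(pattern)
    result = set()  # a set, because consecutive steps share one boundary start
    i = m  # 1-based index of the last character of the current window
    while i <= n:
        s = text[i - r:i]  # T[i-r+1 .. i] in the slides' 1-based notation
        if s in factors:
            # Scan T[i-m+1 .. i+m-r], i.e. the 0-based slice [i-m, i+m-r).
            lo, hi = i - m, min(n, i + m - r)
            result.update(p + 1 for p in kmp_search(text, pattern, fail, lo, hi))
        i += m - r
    return sorted(result)


print(fast_on_average("abcabcab", "bca", 2))
```

Collecting results in a set is a design choice of this sketch: the ranges of starting positions examined by consecutive steps overlap in one position, so the same occurrence can be found twice.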

Analysis

First of all, note that the algorithm has to determine whether the suffix S of the current window occurs in P. This is itself an exact string matching problem. Assume that a preprocessing step constructs a suffix tree of P. Whether S occurs in P can then be determined by feeding S into the suffix tree of P. Because the length of S is r, this check takes O(r) time.
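The O(r) substring check can be illustrated with a short sketch. Note the swap: instead of the suffix tree assumed in the slides, the code below builds a suffix automaton, a different index that also recognizes exactly the substrings of P and answers a query by following one transition per character. The class and method names are hypothetical.

```python
class SuffixAutomaton:
    """Suffix automaton of a pattern; it accepts exactly the substrings of the pattern."""

    def __init__(self, pattern: str) -> None:
        self.next = [{}]    # outgoing transitions of each state
        self.link = [-1]    # suffix link of each state
        self.length = [0]   # length of the longest string reaching each state
        last = 0
        for ch in pattern:
            cur = self._new_state(self.length[last] + 1)
            p = last
            while p != -1 and ch not in self.next[p]:
                self.next[p][ch] = cur
                p = self.link[p]
            if p == -1:
                self.link[cur] = 0
            else:
                q = self.next[p][ch]
                if self.length[p] + 1 == self.length[q]:
                    self.link[cur] = q
                else:
                    # Split state q: the clone takes over its shorter strings.
                    clone = self._new_state(self.length[p] + 1)
                    self.next[clone] = dict(self.next[q])
                    self.link[clone] = self.link[q]
                    while p != -1 and self.next[p].get(ch) == q:
                        self.next[p][ch] = clone
                        p = self.link[p]
                    self.link[q] = clone
                    self.link[cur] = clone
            last = cur

    def _new_state(self, length: int) -> int:
        self.next.append({})
        self.link.append(-1)
        self.length.append(length)
        return len(self.length) - 1

    def contains(self, s: str) -> bool:
        """Follow one transition per character of s, so the check costs O(len(s))."""
        state = 0
        for ch in s:
            state = self.next[state].get(ch)
            if state is None:
                return False
        return True


automaton = SuffixAutomaton("abcbc")
print(automaton.contains("bcb"), automaton.contains("cc"))
```

The automaton is built once per pattern; each call to `contains` then touches at most r transitions, with dictionary lookups taking expected constant time.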

For reasons which will become clear later, we assume that $r = 2\log_\alpha m$ (the choice made in [CR2002]), where α denotes the size of the alphabet.

We assume that the text is a random string and that the size of the alphabet is α.

There are $\alpha^r$ possible strings of length r over an alphabet of α distinct characters. P, whose length is m, contains at most m-r+1 substrings of length r. Thus, the probability that S is a substring of P is not greater than $(m-r+1)/\alpha^r \le m/\alpha^r$.
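The counting argument above can be checked on a small case. The sketch below computes the exact probability that a uniformly random string of length r occurs in a given pattern and compares it with the counting bound; the function name, the sample pattern, and the alphabet size of 26 are assumptions made for illustration.

```python
def hit_probability(pattern: str, r: int, alpha: int) -> float:
    """Exact probability that a uniformly random length-r string occurs in pattern."""
    m = len(pattern)
    # Distinct length-r substrings of the pattern; there are at most m - r + 1.
    distinct = {pattern[k:k + r] for k in range(m - r + 1)}
    return len(distinct) / alpha ** r


pattern = "abracadabra"  # hypothetical pattern with m = 11
alpha = 26               # assumed alphabet: lowercase letters
for r in (1, 2, 3):
    bound = (len(pattern) - r + 1) / alpha ** r
    print(r, hit_probability(pattern, r, alpha), bound)
```

The exact probability can be smaller than the bound because repeated substrings of the pattern are counted only once.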

If S is a substring of P, we find all occurrences of P in T[i-m+1…i+m-r] using the KMP algorithm.

Because the length of T[i-m+1…i+m-r] is 2m-r, the time complexity of Step i using the KMP algorithm is O(m).

(1) The probability that S occurs in P is at most $m/\alpha^r$, which equals $1/m$ when $r = 2\log_\alpha m$.
(2) When S occurs in P, the time complexity of using the KMP algorithm to find all occurrences of P in T[i-m+1…i+m-r] is O(m).

Combining (1) and (2), the average time complexity of applying the KMP algorithm in one step is at most $(1/m)\cdot O(m) = O(1)$. In addition, checking whether S occurs in P takes O(r). Thus, the average time complexity of one step, the check plus the occasional KMP run, is O(r) + O(1) = O(r).

Thus, if S does not occur in P, the time complexity of Step i is only the checking cost, which is O(r). If it does, Step i costs O(r) + O(m), but because this happens with low probability, the time complexity of Step i on average is still O(r).

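Because the window advances by m-r positions per step, there are about $n/(m-r)$ windows of length m in T, so the time complexity of this algorithm on average is $O\!\left(\frac{nr}{m-r}\right)$. For $r = 2\log_\alpha m$ and m large enough that $r \le m/2$, this is $O\!\left(\frac{n\log_\alpha m}{m}\right)$.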
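The final count can be written out step by step. This is a sketch under two assumptions that are labeled here rather than taken verbatim from the slides: $r = 2\log_\alpha m$ (the value used in [CR2002]) and $r \le m/2$. The symbol k for the number of steps and the constants $c_1$, $c_2$ are introduced only for this illustration.

```latex
\begin{align*}
k &= \left\lfloor \frac{n-m}{m-r} \right\rfloor + 1 \;\le\; \frac{n-r}{m-r} \;\le\; \frac{n}{m-r}
  && \text{(steps: } i = m,\ m+(m-r),\ \dots \le n\text{)} \\
\Pr[S \text{ occurs in } P] &\le \frac{m}{\alpha^{r}} = \frac{m}{\alpha^{2\log_\alpha m}} = \frac{m}{m^{2}} = \frac{1}{m} \\
\mathbb{E}[\text{cost of one step}] &\le c_1 r + \frac{1}{m}\cdot c_2 m = c_1 r + c_2 = O(r) \\
\mathbb{E}[\text{total cost}] &\le \frac{n}{m-r}\cdot O(r) \;\le\; \frac{2n}{m}\cdot O(\log_\alpha m)
  = O\!\left(\frac{n \log_\alpha m}{m}\right)
  && \text{(using } m-r \ge m/2\text{)}
\end{align*}
```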

Reference

• [KMP77] D. E. Knuth, J. H. Morris and V. R. Pratt, Fast Pattern Matching in Strings, SIAM Journal on Computing 6 (2), 1977, pp. 323–350.
• [CR2002] M. Crochemore and W. Rytter, Section 2.2: Boyer-Moore algorithm and its variations, Jewels of Stringology, 2002, pp. 30–31.

Thank you
