Average Case Analysis of an Exact String Matching Algorithm

Advisor: Professor R. C. T. Lee. Speaker: S. C. Chen.


Presentation Transcript


  1. Average Case Analysis of an Exact String Matching Algorithm Advisor: Professor R. C. T. Lee Speaker: S. C. Chen

  2. Problem Definition: We are given a text T = t1t2…tn of length n and a pattern P = p1p2…pm of length m, and we are asked to find all occurrences of P in T. [Figure: an example text T containing two occurrences of P.]
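As a baseline for the problem just defined, here is a minimal brute-force sketch. The function name `find_all_occurrences` and the sample strings are illustrative assumptions, not taken from the slides; positions are reported 1-based to match the t1…tn notation.

```python
from typing import List


def find_all_occurrences(text: str, pattern: str) -> List[int]:
    """Return the 1-based start positions of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    # Try every possible start position and compare the m characters there.
    return [s + 1 for s in range(n - m + 1) if text[s:s + m] == pattern]


# Hypothetical example: "bca" occurs twice in "abcabcab".
print(find_all_occurrences("abcabcab", "bca"))  # [2, 5]
```

This takes O(nm) time in the worst case, which is what the algorithm analyzed in the following slides improves on for random texts.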

  3. Exact string matching algorithms rely on a number of rules, for example the Suffix to Prefix Rule and the Substring Matching Rule.

  4. The algorithm presented here is based on the Substring Matching Rule.

  5. The Substring Matching Rule: For any substring S in T, find the nearest occurrence of S in P that lies to the left of it. If such an S in P exists, move P so that the two S's match; otherwise, we may define a new partial window.

  6. In this algorithm, we first check whether S = T[i-r+1…i] is a substring of P. If S does not occur in P, we shift P to the right by m-r steps.

  7. If S occurs in P, then according to the Substring Matching Rule we should slide P so that the two occurrences of S are aligned.

  8. Our algorithm is not that smart. Instead of sliding P so that the two occurrences of S are aligned, we simply examine the entire window T[i-m+1…i+m-r] to see whether P occurs in it.

  9. Note that this less clever approach still covers the case of sliding P to align the two occurrences of S.

  10. Algorithm
      Algorithm fast-on-average;
      i = m;
      while i ≤ n do begin
          if T[i-r+1…i] is a substring of P then
              compute all occurrences of P whose starting positions are in T[i-m+1…i-r+1], applying the KMP algorithm
          else { P does not start in T[i-m+1…i-r+1] };
          i = i + m - r
      end
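The pseudocode above can be sketched in Python as follows. This is an illustration under stated assumptions, not the speaker's implementation: the names `kmp_search` and `fast_on_average` are hypothetical, the substring test uses Python's `in` operator rather than a suffix tree, and a duplicate check is added because the start position i-r+1 belongs to two consecutive windows.

```python
from typing import List


def kmp_search(text: str, pattern: str) -> List[int]:
    """Return 0-based start positions of pattern in text (Knuth-Morris-Pratt)."""
    m = len(pattern)
    if m == 0:
        return []
    # fail[j] = length of the longest proper border of pattern[:j + 1].
    fail = [0] * m
    k = 0
    for j in range(1, m):
        while k > 0 and pattern[j] != pattern[k]:
            k = fail[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        fail[j] = k
    hits, k = [], 0
    for pos, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == m:
            hits.append(pos - m + 1)
            k = fail[k - 1]
    return hits


def fast_on_average(text: str, pattern: str, r: int) -> List[int]:
    """Return 1-based start positions of pattern in text, window by window."""
    n, m = len(text), len(pattern)
    if not 1 <= r < m:
        raise ValueError("r must satisfy 1 <= r < m so that each shift m - r is positive")
    result: List[int] = []
    i = m  # 1-based index of the last character of the current window
    while i <= n:
        s = text[i - r:i]  # S = T[i-r+1 … i]
        if s in pattern:  # assumption: stands in for the suffix-tree lookup
            base = i - m  # 0-based offset of T[i-m+1]
            # Search T[i-m+1 … i+m-r]; slicing clips the region at the end of the text.
            for hit in kmp_search(text[base:i + m - r], pattern):
                start = base + hit + 1
                if not result or start > result[-1]:  # skip the shared boundary position
                    result.append(start)
        i += m - r
    return result


print(fast_on_average("abcabcabxabcab", "abcab", 2))  # [1, 4, 10]
```

Every occurrence that starts in T[i-m+1…i-r+1] contains T[i-r+1…i] in full, so skipping the KMP search when S is not a substring of P cannot miss a match.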

  11. Analysis: First of all, note that the algorithm above has to determine whether the suffix S of the current window occurs in P. This is itself an exact string matching problem. Assume that a suffix tree of P is constructed in a preprocessing phase. Whether S occurs in P can then be determined by feeding S into the suffix tree of P. Because the length of S is r, this takes O(r) time.
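A suffix tree is too long to show here, so the sketch below swaps in a simpler stand-in: a hash set of all length-r substrings of P. This is not the structure the slides assume. It answers the same membership question in expected O(r) time per query (the cost of hashing S), at the price of O(mr) preprocessing time and space. The name `build_r_gram_index` is hypothetical.

```python
from typing import FrozenSet


def build_r_gram_index(pattern: str, r: int) -> FrozenSet[str]:
    """Collect every substring of length r that occurs in pattern."""
    return frozenset(pattern[j:j + r] for j in range(len(pattern) - r + 1))


# Hypothetical example: the length-2 substrings of "abcab" are ab, bc, ca.
index = build_r_gram_index("abcab", 2)
print("ca" in index, "ba" in index)  # True False
```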

  12. For reasons which will become clear later, we assume that r = 2 log_α m, following [CR2002], and that r < m.

  13. We assume that the text is a random string over an alphabet of size α.

  14. There are α^r possible strings of length r over α distinct characters. P, whose length is m, contains at most m-r+1 distinct substrings of length r. Thus, the probability that S is a substring of P is no greater than (m-r+1)/α^r ≤ m/α^r, which equals 1/m when r = 2 log_α m.
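The bound can be checked numerically with a small sketch. It assumes that r is rounded up to the smallest integer with α^r ≥ m² (the integer version of r = 2 log_α m); that rounding and the helper names `choose_r` and `substring_probability_bound` are assumptions made for illustration.

```python
def choose_r(m: int, alpha: int) -> int:
    """Smallest integer r with alpha**r >= m*m, i.e. r = ceil(2 * log_alpha(m))."""
    r = 1
    while alpha ** r < m * m:
        r += 1
    return r


def substring_probability_bound(m: int, alpha: int) -> float:
    """Upper bound (m - r + 1) / alpha**r on the probability that S occurs in P."""
    r = choose_r(m, alpha)
    return (m - r + 1) / alpha ** r


# For m = 16 and a binary alphabet: r = 8 and the bound is 9/256, below 1/m = 1/16.
print(choose_r(16, 2), substring_probability_bound(16, 2))
```

Integer arithmetic is used for `choose_r` so that floating-point rounding in a logarithm cannot produce an r that is one too small.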

  15. If S is a substring of P, we find all occurrences of P in T[i-m+1…i+m-r] using the KMP algorithm.

  16. Because the length of T[i-m+1…i+m-r] is 2m-r, the time complexity of Step i using the KMP algorithm is O(m).

  17. (1) The probability that S occurs in P is at most m/α^r, which is 1/m for r = 2 log_α m. (2) When S occurs in P, the time complexity of using the KMP algorithm to find all occurrences of P in T[i-m+1…i+m-r] is O(m). Combining (1) and (2), the average time complexity of applying the KMP algorithm is (1/m)·O(m) = O(1). In addition, the time complexity of checking whether S occurs in P is O(r). Thus, the average time complexity of one step, including the check, is O(r).

  18. Thus, if S does not occur in P, the time complexity of Step i is only the checking cost, which is O(r). If it does, the KMP search adds O(m), but this happens with probability at most 1/m, so the average time complexity of Step i is still O(r).

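  19. Because i advances by m-r at each step, there are O(n/(m-r)) windows of length m in T. Each window costs O(r) on average, so the average time complexity of this algorithm is O(nr/(m-r)), which is O((n log_α m)/m) for r = 2 log_α m.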
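The number of windows follows directly from the loop in the algorithm, where i starts at m and advances by m-r while i ≤ n. A minimal sketch, with the hypothetical helper name `count_windows`:

```python
def count_windows(n: int, m: int, r: int) -> int:
    """Number of iterations of `while i <= n` when i starts at m and steps by m - r."""
    if n < m:
        return 0
    return (n - m) // (m - r) + 1


# For n = 14, m = 5, r = 2 the loop visits i = 5, 8, 11, 14.
print(count_windows(14, 5, 2))  # 4
```

Multiplying this count by the O(r) average cost per window gives the overall average cost. For example, with n = 1,000,000, m = 1024 and r = 10 there are 986 windows, so on the order of 10,000 character inspections are expected instead of a million.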

  20. References
      • [KMP77] D. E. Knuth, J. H. Morris, V. R. Pratt, Fast Pattern Matching in Strings, SIAM Journal on Computing 6(2), 1977, pp. 323–350.
      • [CR2002] M. Crochemore, W. Rytter, Section 2.2: Boyer-Moore algorithm and its variations, Jewels of Stringology, 2002, pp. 30–31.

  21. Thank you
