
A Fast String Matching Algorithm


mauli




Presentation Transcript


  1. A Fast String Matching Algorithm: The Boyer-Moore Algorithm

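  2. The obvious search algorithm • Considers each character position of str and determines whether the successive patlen characters of str match pat. • In the worst case, the number of comparisons is on the order of i*patlen, where i is the position of the first occurrence of pat in str. Ex. pat: aab; str: ..aaaaac (at every alignment, aa matches before the third comparison fails, so each position of str costs about patlen comparisons).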
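To make the i*patlen cost concrete, here is a minimal C sketch of the obvious search. The function name naive_search and the comparison counter are illustrative additions, not taken from the slides; positions are 1-based, as on the slides, and 0 means "not found".

```c
#include <string.h>

/* Obvious (naive) search: try each alignment of pat in str from left to
   right, comparing chars left to right. Returns the 1-based position of the
   first occurrence, or 0 if there is none. If comparisons is non-NULL it
   receives the number of char comparisons made. */
long naive_search(const char *str, const char *pat, long *comparisons)
{
    size_t n = strlen(str), m = strlen(pat);
    long count = 0, result = 0;

    for (size_t s = 0; s + m <= n; s++) {
        size_t j = 0;
        while (j < m) {
            count++;
            if (str[s + j] != pat[j])
                break;
            j++;
        }
        if (j == m) {               /* all patlen chars matched */
            result = (long) s + 1;
            break;
        }
    }
    if (comparisons)
        *comparisons = count;
    return result;
}
```

For pat = aab and str = aaaaaab, every alignment costs 3 comparisons, so finding the match at position 5 takes 5 × 3 = 15 comparisons.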

  3. Knuth-Morris-Pratt Algorithm • A linear-time search algorithm. • Preprocesses pat in time linear in patlen and searches str in time linear in i+patlen. Ex. pat: EXAMPLE; str: HERE IS A SIMPLE EXAMPLE
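For comparison, a compact C sketch of Knuth-Morris-Pratt. kmp_search is a hypothetical name, and this is the textbook failure-function formulation rather than code from the slides. The failure table is built in one pass over pat, and the scan over str never backs up, which is where the linear bounds come from.

```c
#include <stdlib.h>
#include <string.h>

/* Knuth-Morris-Pratt search (a minimal sketch). Returns the 1-based position
   of the first occurrence of pat in str, or 0 if there is none (0 is also
   returned if the failure table cannot be allocated). */
long kmp_search(const char *str, const char *pat)
{
    size_t n = strlen(str), m = strlen(pat);
    size_t *fail;
    long result = 0;

    if (m == 0)
        return 1;
    fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return 0;

    /* Preprocessing, linear in patlen: fail[q] is the length of the longest
       proper prefix of pat[0..q] that is also a suffix of it. */
    fail[0] = 0;
    for (size_t q = 1, k = 0; q < m; q++) {
        while (k > 0 && pat[q] != pat[k])
            k = fail[k - 1];
        if (pat[q] == pat[k])
            k++;
        fail[q] = k;
    }

    /* Search: the index into str only moves forward; q counts how many
       chars of pat are currently matched. */
    for (size_t i = 0, q = 0; i < n; i++) {
        while (q > 0 && str[i] != pat[q])
            q = fail[q - 1];
        if (str[i] == pat[q])
            q++;
        if (q == m) {
            result = (long) (i + 2 - m);   /* 1-based start of the match */
            break;
        }
    }
    free(fail);
    return result;
}
```

With the slide's example, kmp_search("HERE IS A SIMPLE EXAMPLE", "EXAMPLE") returns 18.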

  4. Characteristics of the Boyer-Moore Algorithm • Basic idea: compare pat with str starting from the right end of pat rather than from the left. • Expected number of character comparisons: c*(i + patlen), with c < 1. • Preprocess pat to compute two tables, delta1 and delta2, which determine how far to shift pat and the pointer into str. • Ex. pat: AT-THAT; str: …WHICH-FINALLY-HALTS.--AT-THAT-POINT

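  5. Informal Description: Place pat at the start of str and compare the last char of pat with the patlen-th char of str. For pat = AT-THAT and str = WHICH-FINALLY-HALTS.--AT-THAT-POINT, the last T of pat is compared with F, the 7th char of str. Observation 1: if char does not occur in pat, skip patlen (= delta1(F) = 7) chars of str. Here F does not occur in AT-THAT, so pat slides 7 places and its last char now lines up with the - after FINALLY.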

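  6. Informal Description: Observation 2: if char occurs in pat, slide pat down delta1(char) positions so that char is aligned with the rightmost occurrence of char in pat. delta1(char) = patlen if char does not occur in pat; otherwise patlen - j, where j is the maximum integer such that pat(j) = char. • In the example, char is -, which occurs at j = 3 in AT-THAT, so delta1(-) = 7 - 3 = 4: pat slides 4 places, its - lines up with the - after FINALLY and its last T with the T of HALTS.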
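The delta1 definition translates directly into a table fill. This sketch assumes 8-bit chars (a 256-entry table indexed through unsigned char); make_delta1 is an illustrative name, not from the slides.

```c
#include <string.h>

#define ALPHABET 256   /* assumption: 8-bit chars */

/* delta1(char) = patlen if char does not occur in pat, else patlen - j,
   where j is the rightmost 1-based position with pat(j) = char. */
void make_delta1(const char *pat, int delta1[ALPHABET])
{
    int patlen = (int) strlen(pat);

    for (int c = 0; c < ALPHABET; c++)
        delta1[c] = patlen;
    /* Left to right, so a later (rightmost) occurrence overwrites earlier ones. */
    for (int j = 1; j <= patlen; j++)
        delta1[(unsigned char) pat[j - 1]] = patlen - j;
}
```

For AT-THAT this yields delta1(A) = 1, delta1(T) = 0, delta1(-) = 4, delta1(H) = 2, and 7 for any other char such as F or L, matching the values used in the walkthrough.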

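  7. Informal Description: Observation 3a: str matches the last m chars of pat, and then a mismatch occurs at some new char (here T matches, m = 1, and the L of HALTS fails against A). Move strptr by delta1(L) = 7; pat itself shifts by delta1(L) - m = 6, so its last T lines up with the T of AT in …FINALLY-HALTS.--AT-THAT-POINT.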

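  8. Informal Description: Observation 3b: if the final m chars of pat (a subpattern, subpat) have been matched, find the rightmost plausible reoccurrence of subpat in pat and align it with the m matched chars of str; strptr then moves delta2(j) positions, where j is the position in pat of the mismatch. In the example, AT matches (m = 2) and the - before it fails against H (j = 5); the rightmost plausible reoccurrence of AT is at the start of pat, so strptr moves delta2(5) = 7 and pat lines up with the AT-THAT in …FINALLY-HALTS.--AT-THAT-POINT, where it matches completely.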

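  9. The delta1 & delta2 tables • The delta1 table has one entry for every char in the alphabet. Ex. pat: a b c d e → delta1: 4 3 2 1 0, all other chars 5; pat: a t - t h a t → delta1 (of the char at each position): 1 0 4 0 2 1 0, all other chars 7. • The delta2 table has one entry for every position j in pat: delta2(j) = (j + 1 - rpr(j)) + (patlen - j) = patlen + 1 - rpr(j), where rpr(j) is the position at which the rightmost plausible reoccurrence of the terminal segment pat(j+1..patlen) starts (plausible: the char before it, if any, differs from pat(j); rpr(j) may be 0 or negative, with positions before the start of pat matching any char). The first term is how far pat slides; the second moves the pointer back to the end of pat. Ex. pat: a b c d e → delta2: 9 8 7 6 1; pat: a t - t h a t → delta2: 11 10 9 8 7 4 1.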
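delta2 can also be computed straight from its definition by searching for rpr(j). The brute-force sketch below does exactly that (rpr and make_delta2 are illustrative names); it is roughly cubic in patlen in the worst case, which is fine for checking small tables but is not how an efficient implementation would build them. One entry worth checking by hand: for AT-THAT and j = 6 the segment is T, and the T at position 4 is preceded by -, not A, so it is a plausible reoccurrence, rpr(6) = 4 and delta2(6) = 4.

```c
#include <stdlib.h>
#include <string.h>

/* rpr(j), 1-based: the largest k <= j such that pat(k..k+patlen-j-1) agrees
   with the terminal segment pat(j+1..patlen), where positions below 1 match
   any char, and such that k <= 1 or pat(k-1) != pat(j) ("plausible"). */
static int rpr(const char *pat, int patlen, int j)
{
    for (int k = j; ; k--) {   /* always stops: k = 1 - (patlen - j) qualifies */
        int ok = 1;
        for (int t = 0; ok && t < patlen - j; t++) {
            int pos = k + t;   /* compared with pat(j + 1 + t) */
            if (pos >= 1 && pat[pos - 1] != pat[j + t])
                ok = 0;
        }
        if (ok && (k <= 1 || pat[k - 2] != pat[j - 1]))
            return k;
    }
}

/* delta2(j) = patlen + 1 - rpr(j), j = 1..patlen. Returns a malloc'd array
   with delta2(j) stored at index j - 1 (caller frees), or NULL. */
int *make_delta2(const char *pat)
{
    int patlen = (int) strlen(pat);
    int *delta2 = malloc((patlen > 0 ? patlen : 1) * sizeof *delta2);

    if (delta2 == NULL)
        return NULL;
    for (int j = 1; j <= patlen; j++)
        delta2[j - 1] = patlen + 1 - rpr(pat, patlen, j);
    return delta2;
}
```

For abcde this gives 9 8 7 6 1, and for edbcabc it gives delta2(5) = 5.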

  10. The algorithm
      stringlen ← length of string.
      i ← patlen.
      top: if i > stringlen then return false.
      j ← patlen.
      loop: if j = 0 then return i + 1.
      if string(i) = pat(j) then
          j ← j - 1.
          i ← i - 1.
          goto loop.
      close;
      i ← i + max(delta1(string(i)), delta2(j)).
      goto top.
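Putting the two tables together, the pseudocode of slide 10 maps almost line for line onto C. This self-contained sketch rebuilds delta1 and delta2 from their definitions (assuming 8-bit chars), keeps the slide's 1-based i and j, and returns 0 for false. The function name bm_search and the alignments/comparisons counters are illustrative additions used to trace a run.

```c
#include <stdlib.h>
#include <string.h>

/* delta1(char) = patlen if char is not in pat, else patlen - j, where j is
   the rightmost 1-based position with pat(j) = char. */
static void make_delta1(const char *pat, long patlen, long delta1[256])
{
    for (int c = 0; c < 256; c++)
        delta1[c] = patlen;
    for (long j = 1; j <= patlen; j++)
        delta1[(unsigned char) pat[j - 1]] = patlen - j;
}

/* rpr(j): start of the rightmost plausible reoccurrence of pat(j+1..patlen);
   positions below 1 match any char, and pat(k-1) must differ from pat(j). */
static long rpr(const char *pat, long patlen, long j)
{
    for (long k = j; ; k--) {
        int ok = 1;
        for (long t = 0; ok && t < patlen - j; t++)
            if (k + t >= 1 && pat[k + t - 1] != pat[j + t])
                ok = 0;
        if (ok && (k <= 1 || pat[k - 2] != pat[j - 1]))
            return k;
    }
}

/* Boyer-Moore search following the slide's pseudocode (1-based i and j).
   Returns the 1-based position of the first occurrence, or 0 ("false").
   Optionally reports the number of alignments tried ("top") and the
   number of char comparisons made. */
long bm_search(const char *string, const char *pat,
               int *alignments, int *comparisons)
{
    long stringlen = (long) strlen(string), patlen = (long) strlen(pat);
    long delta1[256], result = 0, i, j;
    long *delta2 = malloc((patlen > 0 ? patlen : 1) * sizeof *delta2);
    int tops = 0, cmps = 0;

    if (delta2 == NULL)
        return 0;
    make_delta1(pat, patlen, delta1);
    for (j = 1; j <= patlen; j++)
        delta2[j - 1] = patlen + 1 - rpr(pat, patlen, j);

    i = patlen;
    while (i <= stringlen) {                        /* top */
        tops++;
        j = patlen;
        while (j > 0) {                             /* loop */
            cmps++;
            if (string[i - 1] != pat[j - 1])
                break;
            j--;
            i--;
        }
        if (j == 0) {
            result = i + 1;
            break;
        }
        /* close: i <- i + max(delta1(string(i)), delta2(j)) */
        long d1 = delta1[(unsigned char) string[i - 1]], d2 = delta2[j - 1];
        i += d1 > d2 ? d1 : d2;
    }
    free(delta2);
    if (alignments)
        *alignments = tops;
    if (comparisons)
        *comparisons = cmps;
    return result;
}
```

On the AT-THAT example, the pattern is tried with i = 7, 14, 18, 24 and 29 (five alignments, 14 char comparisons in total) and the function returns 23, the position of AT-THAT in str.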

  11. Performance (empirical evidence)

  12. The Implementation in mstring.c • Function: make_skip(char *, int) • Purpose: create the skip (delta1) table • Function inputs: char *ptrn, int plen • Local variables: int *skip, *sptr • Return: int *skip • Function: make_shift(char *, int) • Purpose: create the shift (delta2) table • Function inputs: char *ptrn, int plen • Local variables: int *shift, *sptr; char *pptr, c • Return: int *shift

  13. Flowchart of make_skip(): allocate memory to skip → set every entry to plen+1 → plen == 0? If false: skip[*ptrn++] = plen--, then test again. If true: return skip.

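  14. make_skip()
      #include <stdlib.h>

      int *make_skip(char *ptrn, int plen)
      {
          int *skip = (int *) malloc(256 * sizeof(int));
          int *sptr;

          if (skip == NULL)
              FatalPrintError("malloc");

          /* chars that do not occur in ptrn: plen + 1 */
          sptr = &skip[256];
          while (sptr-- != skip)
              *sptr = plen + 1;

          /* chars in ptrn: plen - j + 1, j = rightmost 1-based position */
          while (plen != 0)
              skip[(unsigned char) *ptrn++] = plen--;

          return skip;
      }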
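Compiled on its own, make_skip() shows what the skip table actually holds. The sketch below copies the slide's function and adds a stand-in for FatalPrintError, which is hypothetical here: the real error handler belongs to the surrounding code base and is not shown on the slides.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the code base's error handler. */
static void FatalPrintError(const char *msg)
{
    perror(msg);
    exit(EXIT_FAILURE);
}

/* make_skip() as on the slide, with the NULL check done before the
   table pointer is formed. */
int *make_skip(char *ptrn, int plen)
{
    int *skip = (int *) malloc(256 * sizeof(int));
    int *sptr;

    if (skip == NULL)
        FatalPrintError("malloc");

    /* chars that do not occur in ptrn: plen + 1 */
    sptr = &skip[256];
    while (sptr-- != skip)
        *sptr = plen + 1;

    /* chars in ptrn: plen - j + 1, j = rightmost 1-based position */
    while (plen != 0)
        skip[(unsigned char) *ptrn++] = plen--;

    return skip;
}
```

For make_skip("AT-THAT", 7), skip['A'] = 2, skip['T'] = 1, skip['-'] = 5, skip['H'] = 3, and every other entry is 8: each value is delta1 + 1 compared with slide 9. Presumably the search loop in mstring.c, which the slides do not show, offsets its string pointer by one to compensate; that is an assumption.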

  15. Procedures of make_shift(): allocate memory to shift → c = ptrn[plen-1] → look for the rpr of c → look for two identical subpats → assign values to shift → return shift

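  16. make_shift()
      int *shift = (int *) malloc(plen * sizeof(int));
      int *sptr = shift + plen - 1;   /* shift[] is filled from the right */
      char *pptr = ptrn + plen - 1;   /* start of the terminal segment being handled */
      char c;

      if (shift == NULL)
          FatalPrintError("malloc");

      c = ptrn[plen - 1];             /* last char of the pattern */
      *sptr = 1;                      /* entry for the last position */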

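  17. make_shift()
      while (sptr-- != shift) {       /* remaining entries, right to left */
          char *p1 = ptrn + plen - 2, *p2, *p3;

          do {
              /* p1 scans left for an earlier occurrence of c */
              while (p1 >= ptrn && *p1-- != c);

              /* compare leftward from just before that c (p3)
                 and from just before the last char (p2) */
              p2 = ptrn + plen - 2;
              p3 = p1;
              while (p3 >= ptrn && *p3-- == *p2-- && p2 >= pptr);
          } while (p3 >= ptrn && p2 >= pptr);   // p2>=j,p3>=1

          *sptr = shift + plen - sptr + p2 - p3;
          pptr--;
      }
      return shift;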

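  18. Ex: j = 5, pat = edbcabc (positions j = 1 2 3 4 5 6 7). Step 1: p1 scans left for an earlier occurrence of c, the last char of pat, and finds it at position 4. Step 2: p3 and p2 compare leftward from positions 3 and 6 (b = b), covering the matched segment bc. Step 3: p3 and p2 stop at positions 2 and 5. ∴ delta2(j) = (p2 - p3) + (plen - j) = 3 + 2 = 5.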
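The same value can be cross-checked against the rpr formula from slide 9. For j = 5 the terminal segment is bc, and its rightmost plausible reoccurrence starts at position 3 (it is preceded by d, not a). Both routes agree, with p2 and p3 taken at the 1-based positions where they stop:

```latex
\begin{aligned}
&\text{pat} = e\,d\,b\,c\,a\,b\,c,\quad \text{patlen} = 7,\quad j = 5,\quad \text{pat}(6..7) = bc \\
&\text{rpr}(5) = 3 \quad (\text{pat}(3..4) = bc,\ \text{pat}(2) = d \neq \text{pat}(5) = a) \\
&\text{delta2}(5) = \text{patlen} + 1 - \text{rpr}(5) = 7 + 1 - 3 = 5 \\
&\text{delta2}(5) = (p_2 - p_3) + (\text{patlen} - j) = (5 - 2) + (7 - 5) = 5
\end{aligned}
```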
