1 / 20

KMP algorithm

KMP algorithm. KNUTH D.E., MORRIS (Jr) J.H., PRATT V.R.,, Fast pattern matching in strings, SIAM Journal on Computing 6(1), 1977, pp.323-350. . Advisor: Prof. R. C. T. Lee Reporter: Z. H. Pan. b. a. b. c. a. b. c. d. a. b. c. d. d. a. c. d. a. a. c. a. b. c. d. b. 15.

maja
Télécharger la présentation

KMP algorithm

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. KMP algorithm KNUTH D.E., MORRIS (Jr) J.H., PRATT V.R.,, Fast pattern matching in strings, SIAM Journal on Computing 6(1), 1977, pp.323-350. Advisor: Prof. R. C. T. Lee Reporter: Z. H. Pan

  2. b a b c a b c d a b c d d a c d a a c a b c d b 15 a c 0 c d a b c d 1 2 3 7 5 14 4 16 9 6 13 12 8 10 19 18 17 11 Definition • String Matching Problem: Input: A text string T with length n and a pattern string P with length m. Output: Find all of the positions that occurrence of P in T. Example T: P: The occurrences of P in T start at: T4 ,T10 ,T16

  3. Knuth-Morris-Pratt algorithm • The design of the Knuth-Morris-Pratt algorithm follows a tight analysis of the Morris and Pratt algorithm. The KMP algorithm just improves MP algorithm. • KMP algorithm performs the comparisons from left to right.

  4. Ti: the character of the ith position of text T. Pj : the character of the jth position of pattern P. n :The length of T. m : The length of P. Ti,j: the string Ti Ti+1…Tj , 0≦i ≦j ≦n-1. Pi,j: the string Pi Pi+1…Pj , 0≦i ≦j ≦m-1. Example: We suppose that P0 is aligned to Ts . s=5 X ← mismatch Shift According the above example: T11 = ‘t’ P6 = ‘g’ T5,8 = ‘atca’ P1,3= ‘tca’ n=20 m = 12 T5,10=P0,5 T11≠P6 Ts =T5=‘a’

  5. a b u u Let a,b and c be variables. When the mismatch occurrs, we use ‘u’ to denote the portion of the pattern, that is P0,i-1, which is front of the mismatch character, that is Pi. There is a substringwhich is equal to u of pattern P in the text T. There are some the prefixes of u which is equal to the suffixes of u. We use vp to denote the longest prefix of u which is equal some of suffix of u, that is vs. We can say that vp is the border of u. Consider an attempt at a left position j on T, that is when the window is positioned on the substring of the text Tj,j+m-1. Assume that the first mismatch occurs between Pi and Ti+j (a≠b) with 0 ≦ i ≦ m-1. Then, P0,i-1 = Tj,i+j-1 = u and b = Pi ≠ Ti+j=a. j j+m-1 i+j n-1 0 T: X P: m-1 0 i 0 x i m-1 vp vs P: u border i-1-x i-1

  6. b b b a b b b b b b b a vp vp vp vp vp vs vs vs vs vs Difference between MP and KMP algorithm In the same situation, the difference between MP algorithm and KMP algorithm is KMP algorithm more than one step in the shifting. MP algorithm: Example T: u X P: KMP algorithm: T: u X P:

  7. A: P(0) = P(i) B: P(0, j) is a suffix of P(0, i-1) C: P(j+1) = P(i) P(i) = 1 P(i) ≠-1

  8. A case where prefix(i) = 0 0 i P b v c b v c There is a suffix of P(0, i) equal to a prefix of P and Pi≠ P0 0 Prefix function of P In this case, we move P i steps.

  9. A full example special

  10. a a b b c c d a a b b c c a d b d a b a a a b b b c c a b c c b c d a a b c c a b c a b d a a c c b b d a a c c b b d b 7 3 c 1 2 a 0 5 6 7 a 4 3 3 d 5 -1 0 2 -1 c 0 0 4 0 1 1 b 4 0 Position : Pattern : kmpNext[i]: Shift : Example Ⅰ. T: P: Ⅱ. T: P:

  11. a c d a b b b a c d a c a c b c b b a c a b b d c a a b b c d c b a a c a d c c d a b c a c d a b c a a c b b a b a a b b c d a a b b c c d c b b 6 c 7 a 7 5 3 3 2 1 0 b 0 4 0 -1 4 -1 d 2 c b 0 0 1 3 5 4 1 a 0 Position : Pattern : kmpNext[i]: Shift : Ⅲ. T: P: exact match Ⅳ. T: P: exact match

  12. a b c d a a b b c b c a b c a a a a b d a a b b d a a b c c d b a c b b c c d c b a b c b b c c d a a b b c c d a b c a a a a b c c b a d b c d a a b c b c d a a b b c c d c c 7 a 3 7 6 1 4 3 2 0 b 0 5 1 5 0 -1 0 1 -1 d c 0 0 3 4 4 2 a b Position : Pattern : kmpNext[i]: Shift : Ⅴ. T: P: exact match Ⅵ. T: P: exact match

  13. a b a b b c c a a b b b d d a b a b a c a c d c c c d a a b c c d a a b b b c c b b a a c c d 3 4 a 2 3 c b 6 7 a 5 1 7 0 b 0 4 d -1 0 3 -1 0 0 0 1 5 4 c 2 1 Position : Pattern : kmpNext[i]: Shift : Ⅶ. T: P: exact match exact match

  14. Time Complexity Preprocessing phase in O(m) space and time complexity. Searching phase in O(n+m) time complexity.

  15. The prefix function used in KMP method can be obtained by modifying the prefix function algorithm used in the MP method by paying attention to the conditions where prefix(i)=0 and -1.

  16. References • AHO, A.V., 1990, Algorithms for finding patterns in strings. in Handbook of Theoretical Computer Science, Volume A, Algorithms and complexity, J. van Leeuwen ed., Chapter 5, pp 255-300, Elsevier, Amsterdam. • AOE, J.-I., 1994, Computer algorithms: string pattern matching strategies, IEEE Computer Society Press. • BAASE, S., VAN GELDER, A., 1999, Computer Algorithms: Introduction to Design and Analysis, 3rd Edition, Chapter 11, pp. ??-??, Addison-Wesley Publishing Company. • BAEZA-YATES R., NAVARRO G., RIBEIRO-NETO B., 1999, Indexing and Searching, in Modern Information Retrieval, Chapter 8, pp 191-228, Addison-Wesley. • BEAUQUIER, D., BERSTEL, J., CHRÉTIENNE, P., 1992, Éléments d'algorithmique, Chapter 10, pp 337-377, Masson, Paris. • CORMEN, T.H., LEISERSON, C.E., RIVEST, R.L., 1990. Introduction to Algorithms, Chapter 34, pp 853-885, MIT Press. • CROCHEMORE, M., 1997. Off-line serial exact string searching, in Pattern Matching Algorithms, ed. A. Apostolico and Z. Galil, Chapter 1, pp 1-53, Oxford University Press. • CROCHEMORE, M., HANCART, C., 1999, Pattern Matching in Strings, in Algorithms and Theory of Computation Handbook, M.J. Atallah ed., Chapter 11, pp 11-1--11-28, CRC Press Inc., Boca Raton, FL. • CROCHEMORE, M., LECROQ, T., 1996, Pattern matching and text compression algorithms, in CRC Computer Science and Engineering Handbook, A. Tucker ed., Chapter 8, pp 162-202, CRC Press Inc., Boca Raton, FL. • CROCHEMORE, M., RYTTER, W., 1994, Text Algorithms, Oxford University Press. • GONNET, G.H., BAEZA-YATES, R.A., 1991. Handbook of Algorithms and Data Structures in Pascal and C, 2nd Edition, Chapter 7, pp. 251-288, Addison-Wesley Publishing Company.

  17. References • GOODRICH, M.T., TAMASSIA, R., 1998, Data Structures and Algorithms in JAVA, Chapter 11, pp 441-467, John Wiley & Sons. • GUSFIELD, D., 1997, Algorithms on strings, trees, and sequences: Computer Science and Computational Biology, Cambridge University Press. • HANCART, C., 1992, Une analyse en moyenne de l'algorithme de Morris et Pratt et de ses raffinements, in Théorie des Automates et Applications, Actes des 2e Journées Franco-Belges, D. Krob ed., Rouen, France, 1991, PUR 176, Rouen, France, 99-110. • HANCART, C., 1993. Analyse exacte et en moyenne d'algorithmes de recherche d'un motif dans un texte, Ph. D. Thesis, University Paris 7, France. • KNUTH D.E., MORRIS (Jr) J.H., PRATT V.R., 1977, Fast pattern matching in strings, SIAM Journal on Computing 6(1):323-350. • SEDGEWICK, R., 1988, Algorithms, Chapter 19, pp. 277-292, Addison-Wesley Publishing Company. • SEDGEWICK, R., 1988, Algorithms in C, Chapter 19, Addison-Wesley Publishing Company. • SEDGEWICK, R., FLAJOLET, P., 1996, An Introduction to the Analysis of Algorithms, Chapter ?, pp. ??-??, Addison-Wesley Publishing Company. • STEPHEN, G.A., 1994, String Searching Algorithms, World Scientific. • WATSON, B.W., 1995, Taxonomies and Toolkits of Regular Language Algorithms, Ph. D. Thesis, Eindhoven University of Technology, The Netherlands. • WIRTH, N., 1986, Algorithms & Data Structures, Chapter 1, pp. 17-72, Prentice-Hall.

  18. Thank You!

  19. To implement the KMP Algorithm, we only have to modify the prefix function of the MP function. That is, if the following conditions is satisfied, Prefix (i) =-1. 0 i P b b Before b, there is no suffix of P(0, i-1) equal to any prefix of P and Pi=P0. Whenever this occurs, we move P i-(-1) = i+1 steps (the largest step). -1

  20. Another case where prefix(i) = -1 0 i P b v b b v b There is a suffix of P(0, i) equal to a prefix of P and Pi = P0 -1 Prefix function of P In this case, we again move P i+1 steps.

More Related