Knuth-Morris-Pratt String matching algorithm Ivaylo Kenov Telerik Corporation http://telerikacademy.com Telerik Academy Student
Table of Contents • Background and idea • The “naive” approach • Basic definitions • Preprocessing • Search algorithm • Complexity • Additional information
Background and idea What is the problem?
Background and idea • The problem of string matching. • We have a string text and a pattern word. • Check whether word occurs in text. • If so, return the position (index) in text where word starts. • If not, return -1.
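The same contract is already implemented by .NET's built-in string.IndexOf(string, StringComparison) overload, which returns the zero-based index of the first occurrence or -1. A thin wrapper can serve as a reference oracle when testing hand-written searches. This is a sketch only; the class StringMatchingContract and method FirstOccurrence are hypothetical names.

```csharp
using System;

// Hypothetical helper: a reference answer for the string matching problem.
static class StringMatchingContract
{
    // Index of the first occurrence of word in text, or -1 if there is none.
    // Ordinal comparison compares characters exactly, as the searches here do.
    public static int FirstOccurrence(string text, string word)
    {
        return text.IndexOf(word, StringComparison.Ordinal);
    }
}
```

For example, FirstOccurrence("abcabd", "abd") gives 3, and FirstOccurrence("abc", "x") gives -1.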
The “naive” approach New to string searching
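The naive approach (1) • Very obvious solution: compare the word with the text element by element, starting at every possible position in the text. • O(m*n) complexity in the worst case, where m is the length of word and n is the length of text: not good! • Example: [Figure: string Text and pattern Word]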
The naive approach (2) • Step 1: compare word[0] with text[0] • Step 2: compare word[1] with text[1] [Figure: Text and Word after each step]
The naive approach (3) • Step 3: compare word[2] with text[2] • Mismatch found: shift word one index to the right and repeat, starting again from word[0]! [Figure: Text and Word before and after the shift]
The naive approach (4) • In this example, a match is found after shifting the word three positions to the right! • Problem with the “naive” approach: too many comparisons of the same characters of the text! [Figure: Text and Word at the matching position]
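The naive approach can be sketched as follows. This is a minimal illustration, not the code from the live demo; the class NaiveSearch and method Find are hypothetical names.

```csharp
// Hypothetical sketch of the naive (brute-force) string search.
static class NaiveSearch
{
    // Tries every alignment of word against text and compares character by
    // character. Worst case O(m * n), with m = word.Length, n = text.Length.
    public static int Find(string text, string word)
    {
        for (int shift = 0; shift + word.Length <= text.Length; shift++)
        {
            int matched = 0;
            // Compare word[matched] with text[shift + matched] until a mismatch
            while (matched < word.Length && word[matched] == text[shift + matched])
            {
                matched++;
            }
            if (matched == word.Length)
            {
                return shift; // the whole word matched at this alignment
            }
            // Mismatch: shift the word one index to the right and start over
        }
        return -1;
    }
}
```

Every shift restarts the comparison at word[0], so characters of the text that were already matched are read again; this is exactly the waste Knuth-Morris-Pratt avoids.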
The “naive” approach Live demo
Knuth-Morris-Pratt Without repeating!
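Knuth-Morris-Pratt • Linear time algorithm for string matching. • O(n + m) complexity, where n is the length of the text and m is the length of the pattern. • Backtracking never occurs: the index in the text never moves backwards. • Already matched characters are not re-examined; after a mismatch, the algorithm knows how much of the match it can keep! • Especially useful with binary data and small-alphabet strings, where partial matches are frequent.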
Basic definitions Easy theory!
Basic definitions (1) • Prefix – a substring with which our string starts. • Example: “abcdef” starts with “abc”. • Suffix – a substring with which our string ends. • Example: “abcdef” ends with “def”. • Proper prefix and proper suffix – a prefix or suffix that is shorter than the string itself.
Basic definitions (2) • Border: a substring that is both a proper prefix and a proper suffix of the string. • Example: “ab” is a border of “abcab”. • Width of a border: the length of the border. • The empty string “” is a proper prefix, a proper suffix and therefore a border of any non-empty string!
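Basic definitions (3) • How far does the algorithm shift the pattern? • The shift distance is determined by the widest border of the part of word that has already matched (the matching prefix). • Distance = length of the matching prefix - width of its widest border. • Example: after matching “ababa” (length 5, widest border “aba” of width 3), the word is shifted by 5 - 3 = 2 positions.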
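To make the definitions concrete, here is a brute-force sketch that finds the widest border by trying every proper length, then applies the shift formula. It runs in quadratic time and is only meant for checking small examples by hand. The class Borders and the methods WidestBorder and ShiftDistance are hypothetical names, and giving the empty string width -1 is an assumption borrowed from the table convention (zero element is -1), so that a mismatch on the very first character shifts by 1.

```csharp
// Hypothetical brute-force helpers for borders and shift distances.
static class Borders
{
    // Width of the widest border of s (both a proper prefix and a proper
    // suffix). Tries proper lengths from longest to shortest, O(|s|^2).
    public static int WidestBorder(string s)
    {
        if (s.Length == 0)
        {
            return -1; // assumed convention: the empty prefix has no proper border
        }
        for (int width = s.Length - 1; width > 0; width--)
        {
            // Is the prefix of length width equal to the suffix of length width?
            if (string.CompareOrdinal(s, 0, s, s.Length - width, width) == 0)
            {
                return width;
            }
        }
        return 0; // only the empty border remains
    }

    // Shift after the first `matched` characters of word matched:
    // length of the matching prefix minus the width of its widest border.
    public static int ShiftDistance(string word, int matched)
    {
        return matched - WidestBorder(word.Substring(0, matched));
    }
}
```

For instance, WidestBorder("abcab") is 2 and ShiftDistance("ababaa", 5) is 2, matching the example on the slide.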
Preprocessing Building every border!
Preprocessing (1) • If r and s are borders of a string x and r is shorter than s, then r is a border of s. • A border r of x can be extended by a character a if ra is a border of xa, i.e. if the character right after r in x is a.
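Preprocessing (2) • We build an array table, where table[i] is the width of the widest border of the prefix of word of length i. • When computing a value, we already know all previous ones and try to extend the known borders. • The widest border of word[0..i-1] has width j = table[i]; it can be extended by word[i] if word[j] == word[i], and then table[i + 1] = j + 1. • If not, the next border to check is the next widest one, of width table[j] = table[table[i]].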
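Preprocessing (3) • Algorithm for building the table (failureTable[i] is the width of the widest border of the first i characters of word):
static int[] failureTable;

static void FailFunction(string word)
{
    // One extra slot: index runs from 0 to word.Length
    failureTable = new int[word.Length + 1];
    int index = 0;
    int borderWidth = -1;
    failureTable[index] = borderWidth; // zero element is always -1
    while (index < word.Length)
    {
        // Fall back to narrower borders until word[index] extends one
        while (borderWidth >= 0 && word[index] != word[borderWidth])
        {
            borderWidth = failureTable[borderWidth];
        }
        index++;
        borderWidth++;
        failureTable[index] = borderWidth;
    }
}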
Preprocessing (4) • Example for table: • For the pattern “ababaa” the border widths stored in table have the following values:
i:         0   1   2   3   4   5   6
word[i]:   a   b   a   b   a   a
table[i]: -1   0   0   1   2   3   1
• For instance, table[5] = 3, since the prefix “ababa” of length 5 has a widest border (“aba”) of width 3. • Note: the zero element is always -1.
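The preprocessing step can be tried on its own with a sketch that uses the same loop as FailFunction but returns the array instead of writing to a shared field. KmpTable and Build are hypothetical names.

```csharp
// Hypothetical stand-alone version of the table-building step.
static class KmpTable
{
    // table[i] = width of the widest border of word[0..i-1]; table[0] = -1.
    public static int[] Build(string word)
    {
        int[] table = new int[word.Length + 1];
        int index = 0;
        int borderWidth = -1;
        table[index] = borderWidth;
        while (index < word.Length)
        {
            // Shrink to narrower borders until word[index] extends one
            while (borderWidth >= 0 && word[index] != word[borderWidth])
            {
                borderWidth = table[borderWidth];
            }
            index++;
            borderWidth++;
            table[index] = borderWidth;
        }
        return table;
    }
}
```

string.Join(" ", KmpTable.Build("ababaa")) gives "-1 0 0 1 2 3 1", the row shown in the example above.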
Preprocessing Live demo
Search algorithm Finding the word!
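Search algorithm (1) • The search algorithm is similar. It returns the start index of the position-th occurrence of word in text (position counts from 1), or -1. failureTable must already be built with FailFunction(word):
static int KMPSearch(string text, string word, int position)
{
    int index = 0;           // current index in text
    int borderWidth = 0;     // number of characters of word matched so far
    int currentPosition = 1; // which occurrence the next full match will be
    while (index < text.Length)
    {
        // Mismatch: fall back to the widest border of the matched part
        while (borderWidth >= 0 && text[index] != word[borderWidth])
        {
            borderWidth = failureTable[borderWidth];
        }
        index++;
        borderWidth++;
        // continues…
Search algorithm (2) • Algorithm continues:
        if (borderWidth == word.Length)
        {
            // The whole word matched, ending just before index
            if (position == currentPosition)
            {
                return index - borderWidth;
            }
            currentPosition++;
            // Keep searching; overlapping occurrences are counted too
            borderWidth = failureTable[borderWidth];
        }
    }
    return -1;
}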
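For experimenting, a self-contained sketch that builds its own table and collects every occurrence may be handier than the position-based version. The names Kmp, BuildTable and FindAll are hypothetical, and returning no matches for an empty word is an assumption (the slide version would index word[0] and fail).

```csharp
using System.Collections.Generic;

// Hypothetical self-contained KMP sketch that reports all occurrences.
static class Kmp
{
    // table[i] = width of the widest border of word[0..i-1]; table[0] = -1.
    static int[] BuildTable(string word)
    {
        int[] table = new int[word.Length + 1];
        int index = 0, borderWidth = -1;
        table[0] = borderWidth;
        while (index < word.Length)
        {
            while (borderWidth >= 0 && word[index] != word[borderWidth])
                borderWidth = table[borderWidth];
            index++;
            borderWidth++;
            table[index] = borderWidth;
        }
        return table;
    }

    // Start indices of all (possibly overlapping) occurrences of word in text,
    // found in a single left-to-right pass over text.
    public static List<int> FindAll(string text, string word)
    {
        var result = new List<int>();
        if (word.Length == 0) return result; // assumption: empty word has no matches
        int[] table = BuildTable(word);
        int index = 0, borderWidth = 0;
        while (index < text.Length)
        {
            // Mismatch: fall back to the widest border of the matched part
            while (borderWidth >= 0 && text[index] != word[borderWidth])
                borderWidth = table[borderWidth];
            index++;
            borderWidth++;
            if (borderWidth == word.Length)
            {
                result.Add(index - borderWidth);  // full match ends before index
                borderWidth = table[borderWidth]; // keep the border, allow overlaps
            }
        }
        return result;
    }
}
```

FindAll("abababa", "aba") returns 0, 2 and 4, so overlapping matches are kept; when it exists, the position-th occurrence returned by KMPSearch corresponds to FindAll(text, word)[position - 1].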
Search algorithm (3) • How it works: • Example:
Search algorithm Live demo
Complexity Linear time algorithm!
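Complexity • The table-building algorithm is O(m), where m is the length of the pattern. • The search algorithm is O(n), where n is the length of the text: each character comparison either advances the text index (match) or shrinks the current border (mismatch), and the border grows by at most one per text character, so there are at most 2n comparisons. • Overall complexity therefore is O(n + m), which is O(n) whenever the pattern is not longer than the text.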
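The 2n bound can be observed empirically. The sketch below counts character comparisons in the KMP search phase and in the naive approach. KmpComplexity, KmpComparisons and NaiveComparisons are hypothetical names; both counters scan the whole text (all alignments, all occurrences) instead of stopping at the first match, and a non-empty word is assumed.

```csharp
// Hypothetical instrumentation comparing KMP with the naive approach.
static class KmpComplexity
{
    // Border table as in the preprocessing slides: table[0] = -1.
    static int[] BuildTable(string word)
    {
        int[] table = new int[word.Length + 1];
        int index = 0, borderWidth = -1;
        table[0] = borderWidth;
        while (index < word.Length)
        {
            while (borderWidth >= 0 && word[index] != word[borderWidth])
                borderWidth = table[borderWidth];
            index++;
            borderWidth++;
            table[index] = borderWidth;
        }
        return table;
    }

    // Comparisons made by the KMP search phase (table building not counted).
    // Assumes word is non-empty.
    public static long KmpComparisons(string text, string word)
    {
        int[] table = BuildTable(word);
        long count = 0;
        int index = 0, borderWidth = 0;
        while (index < text.Length)
        {
            while (borderWidth >= 0)
            {
                count++; // compare text[index] with word[borderWidth]
                if (text[index] == word[borderWidth]) break;
                borderWidth = table[borderWidth]; // mismatch: border shrinks
            }
            index++;
            borderWidth++;
            if (borderWidth == word.Length)
                borderWidth = table[borderWidth]; // full match: keep scanning
        }
        return count;
    }

    // Comparisons made by the naive approach over every alignment.
    public static long NaiveComparisons(string text, string word)
    {
        long count = 0;
        for (int shift = 0; shift + word.Length <= text.Length; shift++)
        {
            for (int i = 0; i < word.Length; i++)
            {
                count++; // compare word[i] with text[shift + i]
                if (word[i] != text[shift + i]) break;
            }
        }
        return count;
    }
}
```

With text made of 1000 'a' characters and word made of 9 'a' characters followed by 'b', the naive count is (1000 - 10 + 1) * 10 = 9910 comparisons, while the KMP search phase stays within 2 * 1000.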
Additional information • Wikipedia: http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm#Worked_example_of_the_table-building_algorithm • Knuth-Morris-Pratt explained: http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm • Examples and concept: http://wcipeg.com/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
Free Trainings @ Telerik Academy • “C# Programming @ Telerik Academy” • csharpfundamentals.telerik.com • Telerik Software Academy • academy.telerik.com • Telerik Academy @ Facebook • facebook.com/TelerikAcademy • Telerik Software Academy Forums • forums.academy.telerik.com