1 / 21

RAIK 283: Data Structures & Algorithms

RAIK 283: Data Structures & Algorithms. Space-Time Tradeoffs: String Matching Algorithms* Dr. Ying Lu ylu@cse.unl.edu. *slides referred to http://www.aw-bc.com/info/levitin. Motivation example.

ocean
Télécharger la présentation

RAIK 283: Data Structures & Algorithms

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. RAIK 283: Data Structures & Algorithms Space-Time Tradeoffs: String Matching Algorithms* Dr. Ying Lu ylu@cse.unl.edu *slides referred to http://www.aw-bc.com/info/levitin Design and Analysis of Algorithms – Chapter 7

  2. Motivation example • How to sort array a[1], a[2], …., a[n], whose values are known to come from a set, e.g., {a, b, c, d, e, f, …. x, y, z}? Design and Analysis of Algorithms – Chapter 7

  3. Space-time tradeoffs For many problems some extra space really pays off: • Extra space in tables • hashing • non comparison-based sorting (e.g., sorting by counting the array a[1], a[2], …., a[n], whose values are known to come from a set, e.g., {a, b, c, d, e, f, …. x, y, z}) • Input enhancement • auxiliary tables (shift tables for pattern matching) • indexing schemes (e.g., B-trees) • Tables of solutions for overlapping subproblems • dynamic programming Design and Analysis of Algorithms – Chapter 7

  4. String matching • pattern: a string of m characters to search for • text: a (long) string of n characters to search in • Brute force algorithm? Design and Analysis of Algorithms – Chapter 7

  5. String matching • pattern: a string of m characters to search for • text: a (long) string of n characters to search in • Brute force algorithm: • Align pattern at beginning of text • moving from left to right, compare each character of pattern to the corresponding character in text until • all characters are found to match (successful search); or • a mismatch is detected • while pattern is not found and the text is not yet exhausted, realign pattern one position to the right and repeat step 2. • Time efficiency: Design and Analysis of Algorithms – Chapter 7

  6. String matching • pattern: a string of m characters to search for • text: a (long) string of n characters to search in • Brute force algorithm: • Align pattern at beginning of text • moving from left to right, compare each character of pattern to the corresponding character in text until • all characters are found to match (successful search); or • a mismatch is detected • while pattern is not found and the text is not yet exhausted, realign pattern one position to the right and repeat step 2. • Time efficiency: (nm) Design and Analysis of Algorithms – Chapter 7

  7. Horspool’s Algorithm • A simplified version of Boyer-Moore algorithm that retains key insights: • compare pattern characters to text from right to left • given a pattern, create a shift table that determines how much to shift the pattern when a mismatch occurs (input enhancement) Design and Analysis of Algorithms – Chapter 7

  8. How far to shift when a mismatch occurs ? Look at first (rightmost) character in text that was compared. Three cases: • The character is not in the pattern .....c...................... (c not in pattern) BAOBAB • The character is in the pattern (but not at rightmost position) .....O...................... (O occurs once in pattern) BAOBAB .....A...................... (A occurs twice in pattern) BAOBAB • The rightmost characters produced a match .....B...................... BAOBAB .....B...................... AAOAAB • Shift Table: Stores number of characters to shift depending on first character compared Design and Analysis of Algorithms – Chapter 7

  9. Shift table • Indexed by text and pattern alphabet • If a character c is not among the first m-1 characters of the pattern, then t(c) = the pattern’s length m • Otherwise, t(c) = the distance from the rightmost c among the first m-1 characters of the pattern to its last character Design and Analysis of Algorithms – Chapter 7

  10. A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 Shift table • Constructed by scanning pattern before search begins • Indexed by text and pattern alphabet • All entries are initialized to length of pattern. E.g., BAOBAB: • For c occurring in the first m-1 characters of the pattern, update table entry to distance of rightmost c among the first m-1 characters of the pattern to its last character Design and Analysis of Algorithms – Chapter 7

  11. A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 Shift table • Constructed by scanning pattern before search begins • Indexed by text and pattern alphabet • All entries are initialized to length of pattern. E.g., BAOBAB: • For c occurring in the first m-1 characters of the pattern, update table entry to distance of rightmost c among the first m-1 characters of the pattern to its last character Design and Analysis of Algorithms – Chapter 7

  12. Shift table • We can create the shift table by processing pattern from L→R: Algorithm ShiftTable(P[0…m-1]) { initialize all the elements of Table with m for j = 0 to m-2 do Table[P[j]] = m-1-j return Table } Design and Analysis of Algorithms – Chapter 7

  13. A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Example BARD LOVED BANANAS BAOBAB Design and Analysis of Algorithms – Chapter 7

  14. In-class exercises • P267: 7.2.1--- Apply Horspool’s algorithm to search for the pattern BAOBAB in the text BESS_KNEW_ABOUT_BAOBABS • P268: 7.2.3--- How many character comparisons will be made by Horspool’s algorithm in searching for each of the following patterns in the binary text of 1000 zeros? • 00001 • 10000 Design and Analysis of Algorithms – Chapter 7

  15. In-class exercises • P264: 7.2.1--- Apply Horspool’s algorithm to search for the pattern BAOBAB in the text BESS_KNEW_ABOUT_BAOBABS • P264: 7.2.3--- How many character comparisons will be made by Horspool’s algorithm in searching for each of the following patterns in the binary text of 1000 zeros? • 00001 (1*996 = 996 comparisons) • 10000 (5*996 = 4980 comparisons) Design and Analysis of Algorithms – Chapter 7

  16. Time Efficiency • Horspool’s algorithm • (nm) in the worst-case. • (n) for random texts Design and Analysis of Algorithms – Chapter 7

  17. In-Class Exercises • You have an array of positive integers like ar[]={1,3,2,4,5,4,2}. You need to create another array ar_low[] such that ar_low[i] = number of elements lower than or equal to ar[i]. So the output of above should be {1,4,3,6,7,6,3}. • Algorithm time complexity should be O(n) and use of extra space is allowed.

  18. Boyer-Moore algorithm • Based on same two ideas: • compare pattern characters to text from right to left • given a pattern, create a shift table that determines how much to shift the pattern when a mismatch occurs (input enhancement) • Uses additional shift table with same idea applied to the number of matched characters Design and Analysis of Algorithms – Chapter 7

  19. In-class exercises • P264: 7.2.7 How many character comparisons will Boyer-Moore algorithm make in searching for each of the following patterns in the binary text of 1000 zeros? • 00001 • 10000 Design and Analysis of Algorithms – Chapter 7

  20. In-class exercises • P264: 7.2.7 How many character comparisons will Boyer-Moore algorithm make in searching for each of the following patterns in the binary text of 1000 zeros? • 00001 (1*996 = 996 comparisons) • 10000 (5*200 = 1000 comparisons) Design and Analysis of Algorithms – Chapter 7

  21. In-class exercises • P264: 7.2.3--- How many character comparisons will be made by Horspool’s algorithm in searching for each of the following patterns in the binary text of 1000 zeros? • 01010 • P264: 7.2.7 How many character comparisons will Boyer-Moore algorithm make in searching for each of the following patterns in the binary text of 1000 zeros? • 01010 Design and Analysis of Algorithms – Chapter 7

More Related