
The Rabin-Karp Algorithm

The Rabin-Karp Algorithm. String Matching. Jonathan M. Elchison, 19 November 2004, CS-3410 Algorithms, Dr. Shomper. Background: string matching. Naïve method: n ≡ size of input string, m ≡ size of pattern to be matched; O( (n-m+1)m ), which is Θ( n² ) if m = floor( n/2 ). We can do better.


Presentation Transcript


  1. The Rabin-Karp Algorithm String Matching Jonathan M. Elchison 19 November 2004 CS-3410 Algorithms Dr. Shomper

  2. Background • String matching • Naïve method • n ≡ size of input string • m ≡ size of pattern to be matched • O( (n-m+1)m ) • Θ( n² ) if m = floor( n/2 ) • We can do better
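The naïve method above can be sketched as follows. This is a minimal illustration, not code from the slides; the function name `naive_match` and the 0-based shifts are assumptions made for the example.

```python
def naive_match(text: str, pattern: str) -> list[int]:
    """Return every 0-based shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):          # n - m + 1 candidate shifts
        if text[s:s + m] == pattern:    # up to m character comparisons each
            shifts.append(s)
    return shifts
```

Each of the n-m+1 shifts can cost up to m comparisons, which is where the O( (n-m+1)m ) bound comes from.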

  3. How it works • Consider a hashing scheme • Each symbol in alphabet Σ can be represented by an ordinal value { 0, 1, 2, ..., d-1 } • |Σ| = d • “Radix-d digits”

  4. How it works • Hash pattern P into a numeric value • For illustration, let a string be represented by the sum of these digits (the full algorithm on slide 8 instead weights each digit by a power of d, evaluated with Horner’s rule, § 30.1) • Example • { A, B, C, ..., Z } → { 0, 1, 2, ..., 25 } • BAN → 1 + 0 + 13 = 14 • CARD → 2 + 0 + 17 + 3 = 22
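The two ways of turning a string into a number can be compared in a short sketch. The helper names `ordinal`, `additive_hash`, and `horner_value` are hypothetical, and the mapping A → 0 ... Z → 25 with d = 26 is assumed from the example on this slide.

```python
def ordinal(ch: str) -> int:
    """Map 'A'..'Z' to 0..25 (assumed alphabet for this example)."""
    return ord(ch) - ord("A")

def additive_hash(s: str) -> int:
    """Sum of the digits, as in the slide's simplified example."""
    return sum(ordinal(ch) for ch in s)

def horner_value(s: str, d: int = 26) -> int:
    """Radix-d value of s, evaluated left to right with Horner's rule."""
    value = 0
    for ch in s:
        value = d * value + ordinal(ch)
    return value

print(additive_hash("BAN"))   # 1 + 0 + 13 = 14
print(additive_hash("CARD"))  # 2 + 0 + 17 + 3 = 22
print(horner_value("BAN"))    # (1*26 + 0)*26 + 13 = 689
```

The sum ignores the order of the symbols, while the radix-d value does not; that difference matters for the spurious hits discussed on slide 7.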

  5. Upper limits • Problem • For long patterns, or for large alphabets, the number representing a given string may be too large to be practical • Solution • Use the MOD operation • After reducing mod q, every value is < q • Example • BAN = 1 + 0 + 13 = 14 • 14 mod 13 = 1 • BAN → 1 • CARD = 2 + 0 + 17 + 3 = 22 • 22 mod 13 = 9 • CARD → 9
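A minimal sketch of the reduction, assuming the same A → 0 ... Z → 25 mapping and q = 13 as in the example; the name `additive_hash_mod` is hypothetical. Reducing after every addition keeps the running total small instead of reducing one large number at the end.

```python
def additive_hash_mod(s: str, q: int = 13) -> int:
    """Sum of digit values, reduced mod q after every step."""
    total = 0
    for ch in s:
        total = (total + (ord(ch) - ord("A"))) % q
    return total

print(additive_hash_mod("BAN"))   # 14 mod 13 = 1
print(additive_hash_mod("CARD"))  # 22 mod 13 = 9
```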

  6. Searching

  7. Spurious Hits • Question • Does a hash value match mean that the patterns match? • Answer • No – these are called “spurious hits” • Possible cases • MOD operation interfered with uniqueness of hash values • 14 mod 13 = 1 • 27 mod 13 = 1 • MOD value q is usually chosen as a prime such that dq just fits within one computer word (10q for decimal digits, d = 10) • Information is lost in generalization (addition) • BAN → 1 + 0 + 13 = 14 • CAM → 2 + 0 + 12 = 14

  8. Code
     RABIN-KARP-MATCHER( T, P, d, q )
         n ← length[ T ]
         m ← length[ P ]
         h ← d^(m-1) mod q
         p ← 0
         t_0 ← 0
         for i ← 1 to m                          ► Preprocessing
             do p ← ( d*p + P[ i ] ) mod q
                t_0 ← ( d*t_0 + T[ i ] ) mod q
         for s ← 0 to n - m                      ► Matching
             do if p = t_s
                   then if P[ 1..m ] = T[ s+1 .. s+m ]
                           then print “Pattern occurs with shift” s
                if s < n - m
                   then t_(s+1) ← ( d * ( t_s - T[ s + 1 ] * h ) + T[ s + m + 1 ] ) mod q
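The pseudocode translates fairly directly into Python. The version below is a sketch under a few assumptions that are not in the slides: 0-based indexing, character codes from `ord` as the digit values, and the defaults d = 256 and q = 101; it returns the shifts as a list instead of printing them.

```python
def rabin_karp(text: str, pattern: str, d: int = 256, q: int = 101) -> list[int]:
    """Return every 0-based shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)        # d^(m-1) mod q, weight of the leading digit
    p = 0                       # hash of the pattern
    t = 0                       # hash of the current text window
    for i in range(m):          # preprocessing, Horner's rule mod q
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    shifts = []
    for s in range(n - m + 1):  # matching
        # Verify on a hash match to rule out spurious hits.
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Drop the leading digit, shift left, append the next digit.
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts

print(rabin_karp("BANANA", "ANA"))  # [1, 3]
```

Python's `%` returns a non-negative result for a positive modulus, so the subtraction in the rolling update needs no extra correction; in languages where the remainder can be negative, adding q before reducing is a common fix.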

  9. Performance • Preprocessing (computing the pattern hash and the first window hash) • Θ( m ) • Worst case running time • Θ( (n-m+1)m ) • No better than naïve method • Expected case • If we assume the number of hits (valid plus spurious) is constant with respect to n, we expect O( n + m ), which is O( n ) since m ≤ n • Only “hits” are verified character by character – not all shifts
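The gap between the worst case and the expected case comes down to how many shifts trigger a character-by-character check. The sketch below counts them; the name `count_hits` and the defaults d = 256, q = 101 are assumptions for illustration only.

```python
def count_hits(text: str, pattern: str, d: int = 256, q: int = 101) -> tuple[int, int]:
    """Return (hash hits, true matches) over all shifts."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return (0, 0)
    h = pow(d, m - 1, q)
    p = t = 0
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    hits = matches = 0
    for s in range(n - m + 1):
        if p == t:
            hits += 1                       # costs an O(m) verification
            if text[s:s + m] == pattern:
                matches += 1
        if s < n - m:
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return hits, matches

# Every shift is a real match: all n - m + 1 shifts are verified.
print(count_hits("A" * 8, "AAA"))           # (6, 6)
# Degenerate modulus q = 1: every shift is a hit, only one is real.
print(count_hits("BANANA", "NAN", q=1))     # (4, 1)
```

Both calls show the Θ( (n-m+1)m ) behaviour: the first because the text really is full of matches, the second because a poor modulus makes every shift a spurious hit.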

  10. Demonstration • http://www-igm.univ-mlv.fr/~lecroq/string/node5.html

  11. Sources: • Cormen, Thomas H., et al. Introduction to Algorithms. 2nd ed. Cambridge: MIT Press, 2001. • Karp-Rabin algorithm. 15 Jan 1997. <http://www-igm.univ-mlv.fr/~lecroq/string/node5.html>. • Shomper, Keith. “Rabin-Karp Animation.” E-mail to Jonathan Elchison. 12 Nov 2004.
