
Hashing



Presentation Transcript


  1. Hashing Jeff Chastine

  2. Hashing
    • Many applications require INSERT, SEARCH and DELETE operations
    • Hashing can do all of these in O(1) time on average
    • Operations are based on keys
    • Falls under two general categories:
      • Direct-address tables
      • Hash tables

  3. Direct-Addressing
    • Good when the universe U of keys is small
      • U = {0, 1, …, m − 1}, where m is not large
    • No two elements have the same key
    • Table T[0..m − 1], where each slot corresponds to one key in U
    • All operations take only O(1) time, even in the worst case

  4. Direct Implementation
    [Figure: direct-address table T with slots 0 through 9; each key in K (actual keys), a subset of U (universe of keys), is stored with its satellite data in the slot whose index equals the key.]

  5. Direct-Addressing Operations
    DIRECT-ADDRESS-SEARCH(T, k)
        return T[k]
    DIRECT-ADDRESS-INSERT(T, x)
        T[key[x]] ← x
    DIRECT-ADDRESS-DELETE(T, x)
        T[key[x]] ← NIL
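Below is a minimal Python sketch of the three direct-address operations. It assumes integer keys in {0, 1, …, m − 1} and uses None in place of NIL. The names `Element` and `DirectAddressTable` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Element:
    key: int        # assumed to lie in the universe U = {0, 1, ..., m - 1}
    satellite: Any  # satellite data carried along with the key


class DirectAddressTable:
    """Table T[0..m-1]: slot k holds the element whose key is k, or None (NIL)."""

    def __init__(self, m: int) -> None:
        self.slots: list[Optional[Element]] = [None] * m

    def search(self, k: int) -> Optional[Element]:
        return self.slots[k]        # DIRECT-ADDRESS-SEARCH: return T[k]

    def insert(self, x: Element) -> None:
        self.slots[x.key] = x       # DIRECT-ADDRESS-INSERT: T[key[x]] <- x

    def delete(self, x: Element) -> None:
        self.slots[x.key] = None    # DIRECT-ADDRESS-DELETE: T[key[x]] <- NIL


# Usage: each operation touches exactly one slot, so each is O(1).
T = DirectAddressTable(10)
five = Element(5, "five")
T.insert(five)
print(T.search(5).satellite)  # five
T.delete(five)
print(T.search(5))            # None
```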

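  6. Hash Tables
    • What are potential problems with direct addressing?
      • |U| may be too large to allocate a table with one slot per possible key
      • The set K of keys actually stored may be small relative to U, so most slots go unused
      • Example: SSNs have 9 digits, so there are 10^9 possible keys, but an application stores far fewer
    • Here, hash tables require much less storage
    • Only catch: O(1) is the average-case time, not the worst-case time!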

  7. How it works
    • With direct addressing, an element with key k is stored in slot k
    • With hashing, it is stored in slot h(k), where h is a hash function
    • Hash functions try to "randomize" where keys land
    • The hash function maps U to the slots of T[0..m − 1]: h : U → {0, 1, …, m − 1}
    • Instead of |U| slots, the table needs only m slots
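Here is a small Python sketch of the mapping h : U → {0, 1, …, m − 1}. It assumes 9-digit integer keys (so |U| = 10^9) and, purely as a placeholder, uses h(k) = k mod m with m = 13. The names `h`, `T` and `UNIVERSE_SIZE` are illustrative. Collisions are ignored here.

```python
UNIVERSE_SIZE = 10**9  # assumed: 9-digit keys, e.g. SSN-like integers
m = 13                 # number of slots actually allocated


def h(k: int) -> int:
    """Placeholder hash function U -> {0, 1, ..., m - 1} (here simply k mod m)."""
    return k % m


# Direct addressing would put key k in slot k, which needs UNIVERSE_SIZE slots.
# Hashing puts it in slot h(k) of a table with only m slots.
T = [None] * m
keys = [123456789, 987654321, 555000111]
for k in keys:
    T[h(k)] = k  # collisions are not handled in this sketch

for k in keys:
    print(k, "->", h(k))  # 123456789 -> 1, 987654321 -> 4, 555000111 -> 3
```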

  8. Hash Implementation
    [Figure: a hash function h maps the actual keys K = {k1, k2, k3, k4, k5}, drawn from U (universe of keys), into slots 0 through m − 1 of table T; k2 and k5 collide because h(k2) = h(k5).]

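  9. Collisions
    • A collision occurs when two keys hash to the same slot
    • Because |U| > m, the pigeonhole principle guarantees that some slot is shared by at least two keys of U
      • Therefore, collisions cannot be avoided entirely
    • We often talk of the load factor α = n/m, where n is the number of stored elements and m is the number of slots
    • Pick a good hash function
      • Near random, yet deterministic: the same key always hashes to the same slot
    • Can chain colliding elements together in a list
      • This is where the worst case comes from: if all n keys land in one chain, a search takes Θ(n) time
    • Can use open addressing instead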
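The pigeonhole argument can be checked directly: hashing m + 1 distinct keys into m slots must produce at least one collision. This Python sketch assumes h(k) = k mod m with m = 10. It uses a hypothetical helper, `find_collision`, and also computes the load factor α = n/m.

```python
def find_collision(keys, h):
    """Return the first pair of keys (k1, k2) with h(k1) == h(k2), or None.

    Assumes the keys in `keys` are distinct.
    """
    first_key_in_slot = {}  # slot -> first key that hashed there
    for k in keys:
        slot = h(k)
        if slot in first_key_in_slot:
            return first_key_in_slot[slot], k
        first_key_in_slot[slot] = k
    return None


m = 10


def h(k: int) -> int:
    # Assumed hash function for this sketch: k mod m.
    return k % m


keys = [3, 14, 25, 7, 42, 58, 61, 99, 100, 123, 256]  # n = m + 1 = 11 distinct keys
print(find_collision(keys, h))  # (3, 123): both hash to slot 3
alpha = len(keys) / m           # load factor alpha = n / m
print(alpha)                    # 1.1
```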

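  10. Chaining
    [Figure: collision resolution by chaining; keys in K (actual keys), drawn from U (universe of keys), are stored in linked lists attached to the slots of table T, and keys that hash to the same slot share one list.]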
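The following is a minimal chained hash table in Python. It assumes integer keys and uses h(k) = k mod m as the hash function. Python lists stand in for the linked lists in the figure, and the class name `ChainedHashTable` is hypothetical. Searching scans a single chain, so its cost grows with that chain's length. The usage deliberately sends every key to one slot, which is the worst-case pattern.

```python
class ChainedHashTable:
    """Collision resolution by chaining: slot j holds a list of every key k with h(k) == j."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.table = [[] for _ in range(m)]  # one (initially empty) chain per slot

    def h(self, k: int) -> int:
        # Assumed hash function for this sketch: division method.
        return k % self.m

    def insert(self, k: int) -> None:
        # Append to the chain for slot h(k); assumes k is not already present.
        self.table[self.h(k)].append(k)

    def search(self, k: int) -> bool:
        # Only the chain for slot h(k) is scanned: O(1 + length of that chain).
        return k in self.table[self.h(k)]

    def delete(self, k: int) -> None:
        chain = self.table[self.h(k)]
        if k in chain:
            chain.remove(k)

    def load_factor(self) -> float:
        # alpha = n / m, which is also the average chain length.
        return sum(len(chain) for chain in self.table) / self.m


# Usage: with m = 7, the keys 10, 17 and 3 all hash to slot 3,
# so they end up in a single chain.
t = ChainedHashTable(7)
for key in (10, 17, 3):
    t.insert(key)
print(t.table[3])    # [10, 17, 3]
print(t.search(17))  # True
t.delete(17)
print(t.search(17))  # False
```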

  11. Hash Functions
    • What makes a good hash function?
      • Each key is equally likely to hash to any of the m slots
      • If keys are random real numbers uniformly distributed in [0, 1), then h(k) = ⌊km⌋ works
      • For strings, convert the characters to their ASCII codes and combine them into a natural number to hash
      • Most practical hash functions involve mod
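Two of these ideas can be sketched in Python. The first is h(k) = ⌊km⌋ for real keys assumed uniform in [0, 1). The second turns an ASCII string into a natural number by reading its character codes as radix-128 digits, which is one common choice and is assumed here, and then takes that number mod m. The function names are hypothetical.

```python
import math


def hash_uniform_real(k: float, m: int) -> int:
    """h(k) = floor(k * m), for keys assumed uniformly distributed in [0, 1)."""
    if not 0.0 <= k < 1.0:
        raise ValueError("key must lie in [0, 1)")
    # min() guards against k * m rounding up to m in floating point.
    return min(math.floor(k * m), m - 1)


def string_to_natural(s: str) -> int:
    """Read an ASCII string as a radix-128 number (an assumed encoding)."""
    n = 0
    for ch in s:
        code = ord(ch)
        if code > 127:
            raise ValueError("non-ASCII character: " + repr(ch))
        n = n * 128 + code
    return n


def hash_string(s: str, m: int) -> int:
    """Hash a string by converting it to a natural number, then taking it mod m."""
    return string_to_natural(s) % m


print(hash_uniform_real(0.73, 10))  # 7
print(string_to_natural("pt"))      # 112 * 128 + 116 = 14452
print(hash_string("pt", 701))       # 14452 mod 701 = 432
```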

  12. Hash Functions
    • Division method: h(k) = k mod m
    • Multiplication method: let 0 < A < 1; then
      h(k) = ⌊m (kA mod 1)⌋, where kA mod 1 is the fractional part of kA
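Both methods are sketched below in Python. The slide does not specify A. The default A = (√5 − 1)/2 ≈ 0.618 is a commonly suggested constant (attributed to Knuth) and is an assumption here. Floating-point arithmetic is used for simplicity; practical implementations often use fixed-point integer arithmetic instead.

```python
import math

# Assumed default: A = (sqrt(5) - 1) / 2, roughly 0.618, a commonly suggested constant.
GOLDEN_A = (math.sqrt(5) - 1) / 2


def hash_division(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


def hash_multiplication(k: int, m: int, A: float = GOLDEN_A) -> int:
    """Multiplication method: h(k) = floor(m * (k*A mod 1)), with 0 < A < 1."""
    if not 0.0 < A < 1.0:
        raise ValueError("A must satisfy 0 < A < 1")
    fractional_part = (k * A) % 1.0  # k*A mod 1
    return math.floor(m * fractional_part)


print(hash_division(123456, 16384))        # 8768
print(hash_multiplication(123456, 16384))  # 67
```

One design point: with the multiplication method the value of m is not critical, so a power of 2 such as 16384 is a convenient choice.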

  13. Open Addressing
    • Systematically examine, or probe, slots until the item is found (or, when inserting, until an empty slot is found)
    • No lists and no elements stored outside the table; thus α ≤ 1
    • Instead of following pointers, we compute the sequence of slots to probe
    • The probe sequence is not a fixed order such as 0, 1, …, m − 1; it depends on the key
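This Python sketch implements open addressing with the probe sequence passed in as a function. As an assumed example, it uses linear probing with h′(k) = k mod m. Search stops at the first empty slot, because insertion would have placed the key there. Deletion is omitted: simply emptying a slot would break later searches, so it needs extra care, such as a special DELETED marker. The names `OpenAddressTable` and `linear_probe` are hypothetical.

```python
from typing import Callable, Optional


class OpenAddressTable:
    """Open addressing: every key lives in the table itself, so n <= m (alpha <= 1)."""

    def __init__(self, m: int, probe: Callable[[int, int, int], int]) -> None:
        self.m = m
        self.probe = probe  # probe(k, i, m) -> slot examined on the i-th probe
        self.slots: list[Optional[int]] = [None] * m

    def insert(self, k: int) -> int:
        # Follow the probe sequence h(k, 0), h(k, 1), ..., h(k, m - 1).
        for i in range(self.m):
            j = self.probe(k, i, self.m)
            if self.slots[j] is None:
                self.slots[j] = k
                return j
        raise OverflowError("hash table overflow")

    def search(self, k: int) -> Optional[int]:
        for i in range(self.m):
            j = self.probe(k, i, self.m)
            if self.slots[j] is None:  # empty slot: k would have been stored here
                return None
            if self.slots[j] == k:
                return j
        return None


def linear_probe(k: int, i: int, m: int) -> int:
    # Linear probing with an assumed auxiliary hash h'(k) = k mod m.
    return (k % m + i) % m


# Usage: 10, 17 and 24 all have h'(k) = 3, so they fill slots 3, 4, 5 in turn.
t = OpenAddressTable(7, linear_probe)
for key in (10, 17, 24):
    print(key, "->", t.insert(key))
print(t.search(24))  # 5
print(t.search(31))  # None: probes 3, 4, 5, then finds empty slot 6
```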

  14. Kinds of Open Addressing
    • Linear probing: h(k, i) = (h′(k) + i) mod m
    • Quadratic probing: h(k, i) = (h′(k) + c₁i + c₂i²) mod m
    • Double hashing: h(k, i) = (h₁(k) + i·h₂(k)) mod m
    • Here i = 0, 1, …, m − 1 is the probe number, h′, h₁ and h₂ are auxiliary hash functions, and c₁, c₂ are constants
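The Python sketch below shows the three probe sequences side by side for m = 13 and key k = 14. The auxiliary functions are assumptions for illustration: h′(k) = h₁(k) = k mod m, h₂(k) = 1 + (k mod (m − 1)), and c₁ = 1, c₂ = 3 for quadratic probing. Because h₂(k) is never 0 and m is prime, double hashing here visits every slot. The arbitrary quadratic constants give no such guarantee.

```python
def h1(k: int, m: int) -> int:
    return k % m              # assumed auxiliary hash h'(k) = h1(k)


def h2(k: int, m: int) -> int:
    return 1 + (k % (m - 1))  # assumed second hash; never 0


def linear(k: int, i: int, m: int) -> int:
    return (h1(k, m) + i) % m


def quadratic(k: int, i: int, m: int, c1: int = 1, c2: int = 3) -> int:
    return (h1(k, m) + c1 * i + c2 * i * i) % m  # c1, c2 are arbitrary choices


def double_hash(k: int, i: int, m: int) -> int:
    return (h1(k, m) + i * h2(k, m)) % m


def probe_sequence(probe, k: int, m: int) -> list[int]:
    """Slots examined for key k on probes i = 0, 1, ..., m - 1."""
    return [probe(k, i, m) for i in range(m)]


m, k = 13, 14
print(probe_sequence(linear, k, m))       # 1, 2, ..., 12, 0
print(probe_sequence(quadratic, k, m))    # starts 1, 5, 2, 5, ... (revisits slot 5)
print(probe_sequence(double_hash, k, m))  # starts 1, 4, 7, 10, ... (all 13 slots)
```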

