1 / 22

Hash Functions

Hash Functions. Andy Wang Data Structures, Algorithms, and Generic Programming. Introduction. Hash function Maps keys to integers (buckets) Hash(Key) = Integer Ideally in a random-like manner Evenly distributed bucket values Even if the input data is not evenly distributed. An Example.

morna
Télécharger la présentation

Hash Functions

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Hash Functions Andy Wang Data Structures, Algorithms, and Generic Programming

  2. Introduction • Hash function • Maps keys to integers (buckets) • Hash(Key) = Integer • Ideally in a random-like manner • Evenly distributed bucket values • Even if the input data is not evenly distributed

  3. An Example • ID Number Generation • Key = your name • Hash(Key) = a number • Not a great hash function… • Two people with the same name will have the same number…

  4. Simple Hash Functions • Assumptions: • K: an unsigned 32-bit integer • M: the number of buckets (the number of entries in a hash table) • Goal: • If a bit is changed in K, all bits are equally likely to change for Hash(K)

  5. A Simple Hash Function… • What if K = M? • Hash(K) = K • What is wrong? • Your student ID = SSN • I can’t use your SSN to post your grades…

  6. Another Simple Function • If K > M • Hash(K) = K % M • What is wrong? • Suppose M = 4, K = 2, 4, 6, 8 • K % M = 2, 0, 2, 0

  7. Yet Another Simple Function • If K > P, P = prime number • Hash(K) = K % P • Suppose P = 3, K = 2, 4, 6, 8 • K % P = 2, 1, 0, 3 • More uniform distribution…but still problematic for other cases

  8. More on Prime Numbers • K > P1 > P2, P1 and P2 are prime numbers • Hash(K) = (K % P1) % P2 • Suppose P1 = 5, P2 = 3, K = 2, 4, 6, 8, 10 • (K % 5) = 2, 4, 1, 3, 0 • (K % 5) % 3 = 2, 1, 1, 0, 0 • Still uniform distribution

  9. Polynomial Functions • If K > P, P = prime number • Hash(K) = K(K + 3) % P • Slightly better than pure modulo functions

  10. How About… • Hash(K) = rand() • What is wrong? • Not repeatable

  11. How About… • K > P, P = prime number • Hash(K) = rand(K) % P • Better randomness • Can be expensive to compute random numbers

  12. Pre-generated Randomness • Two prime numbers: P1 and P2 • K > P1 and K > P2 • A table R[P1], with R[i] pre-initialized to rand(i) % P2 • Hash(K) = R[K % P1] • Slight Problem: Possible duplicate mapping

  13. To Avoid Duplicate Mapping… • Two prime numbers: P1 and P2 • K > P1 and K > P2 • A table R[P1], with R[i] pre-initialized to unique random numbers • Hash(K) = R[K % P1]

  14. An Example • K = 0…232, P1 = 3, P2 = 5 • R[3] = {0, 4, 1} • Hash(K) = R[K % 3]

  15. Hashing a Sequence of Keys • K = {K1, K2, …, Kn) • E.g., Hash(“test”) = 98157 • Design Principles • Use the entire key • Use the ordering information • Use pre-generated randomness

  16. Use the Entire Key unsigned int Hash(const char *Key) { unsigned int hash = 0; for (unsigned int j = 0; j < K; j++) { hash = hash ^ Key[j] } return hash; } • Problem: Hash(“ab”) == Hash(“ba”)

  17. Use the Ordering Information unsigned int Hash(const char *Key) { unsigned int hash = 0; for (unsigned int j = 0; j < K; j++) { hash = hash ^ Key[j] hash = /* hash with some shiftings */ } return hash; } • Problem: H(short keys) will not perturb all 32-bits (clustering)

  18. Use Pre-generated Randomness unsigned int Hash(const char *Key) { unsigned int hash = 0; for (unsigned int j = 0; j < K; j++) { hash = hash ^ R[Key[j]] hash = /* hash with some shiftings */ } return hash; }

  19. CRC Variant • Do 5-bit circular shift of hash • XOR hash and K[j] … for (…) { highorder = hash & 0xf8000000; hash = hash << 5; hash = hash ^ (highorder >> 27) hash = hash ^ K[j]; } …

  20. CRC Variant + For long keys, all 32-bits are exercised + More randomness toward lower bits - Not all bits are changed for short keys

  21. BUZ Hash • Set up an array R to store precomputed random numbers … for (…) { highorder = hash & 0x80000000; hash = hash << 1; hash = hash ^ (highorder >> 31) hash = hash ^ R[K[j]]; } …

  22. References • Aho, Sethi, and Ullman. Compilers: Principles, Techniques, and Tools, 1986. • Cormen, Leiserson, River. Introduction to Algorithms, 1990 • Knuth. The Art of Computer Programming, 1973 • Kuenning. Hash Functions, 2003.

More Related