Discover the concept of hashing, its implementation in hash tables, key characteristics of good hash functions, and methods like division, middle square, and multiplication for optimal data storage and retrieval.
Hashing and Hash Tables, Chapter 8
Introduction • Definition: • Key: a key is a field or composite of fields that uniquely identifies an entry in a table.
Example • Table of students in a course, sorted by name
--------------------------------------
Name               Year   Mark
--------------------------------------
Adams, Keith          3     94
Davis, Susan          1     75
Jordan, Ann           1     86
Patterson, Lynn       4     73
Williams, George      1     65
Hashing • The implementation of hash tables is called hashing. • Hashing is a technique for performing insertions and finds in constant average time. • Efficient removal of items is not required.
The General Idea • The hash table is an array of some fixed size, containing the items.
Keys and Hash Functions • Each key is mapped into some number in the range 0 to TableSize-1 and placed in the appropriate cell. • The mapping is called a hash function
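The general idea can be sketched as a fixed-size array whose cells are addressed by a hash function. This is a minimal illustration under stated assumptions, not the chapter's implementation: `TableSize = 11`, the names `hashIndex` and `FixedTable`, and the use of `key % TableSize` as the mapping are all hypothetical choices, and collisions are only reported, not resolved.

```cpp
#include <array>
#include <cassert>
#include <optional>

// Hypothetical table size; the slides call the number of cells TableSize.
constexpr unsigned int TableSize = 11;

// Hypothetical hash function: maps any key into the range 0 .. TableSize-1.
unsigned int hashIndex(unsigned int key)
{
    return key % TableSize;
}

// A minimal fixed-size table: each cell is either empty or holds one key.
// Collisions are NOT resolved here; insert() just reports them.
class FixedTable {
public:
    // Places key in cell hashIndex(key); returns false if that cell
    // already holds a different key (a collision).
    bool insert(unsigned int key)
    {
        std::optional<unsigned int>& cell = cells_[hashIndex(key)];
        if (cell && *cell != key) {
            return false;
        }
        cell = key;
        return true;
    }

    // A find only inspects the single cell the key hashes to.
    bool contains(unsigned int key) const
    {
        const std::optional<unsigned int>& cell = cells_[hashIndex(key)];
        return cell && *cell == key;
    }

private:
    std::array<std::optional<unsigned int>, TableSize> cells_{};
};
```

For example, keys 24 and 13 both map to cell 2 (since 24 mod 11 = 13 mod 11 = 2), so inserting 13 after 24 reports a collision.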
Keys and Hash Functions • Characteristics of a good hash function • Avoids collisions • Spreads keys evenly in the array • Is easy to compute
Avoid Collisions • Ideal situation • Given a set of $n \le M$ distinct keys $\{k_1, k_2, \dots, k_n\}$, the set of hash values $\{h(k_1), h(k_2), \dots, h(k_n)\}$ contains no duplicates • In practice this cannot be guaranteed; we can only try to reduce the likelihood of a collision using knowledge about the keys • E.g., if we know the telephone numbers all come from the same district, the district code is of little use in our hash function
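The telephone-number remark can be checked with a small experiment. This sketch assumes 7-digit numbers whose first three digits are the district code, a hypothetical table size `M = 101`, and two hypothetical hash functions (`hashByDistrict`, `hashByLocalNumber`); it simply counts how many distinct cells each one uses.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

constexpr unsigned int M = 101; // hypothetical table size

// Hypothetical hash that looks only at the 3-digit district prefix of a 7-digit number.
unsigned int hashByDistrict(unsigned int phone)
{
    return (phone / 10000) % M;
}

// Hypothetical hash that looks only at the last 4 digits, the part that varies.
unsigned int hashByLocalNumber(unsigned int phone)
{
    return (phone % 10000) % M;
}

// Counts the distinct cells a hash function uses for the given keys;
// fewer cells than keys means at least one collision.
template <typename HashFn>
std::size_t distinctCells(const std::vector<unsigned int>& keys, HashFn h)
{
    std::set<unsigned int> cells;
    for (unsigned int key : keys) {
        cells.insert(h(key));
    }
    return cells.size();
}
```

With the numbers 555-1234, 555-9876, 555-0042 and 555-7777, hashing on the shared district code puts all four keys in one cell, while hashing on the local part gives four different cells.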
Spreading Keys Evenly • We need to know the distribution of the keys • Ideally, an equal number of keys should map into each array position
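One way to check how evenly a hash function spreads a given set of keys is to count the keys landing in each cell. The helper names `cellLoads` and `maxLoad` are hypothetical; the hash function is passed in, so any of the methods in this chapter can be plugged in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Tallies how many keys land in each of the M cells under hash function h.
// A perfectly even spread gives every cell keys.size() / M keys.
template <typename HashFn>
std::vector<std::size_t> cellLoads(const std::vector<unsigned int>& keys,
                                   unsigned int M, HashFn h)
{
    assert(M > 0);
    std::vector<std::size_t> loads(M, 0);
    for (unsigned int key : keys) {
        unsigned int i = h(key);
        assert(i < M); // h must map into 0 .. M-1
        ++loads[i];
    }
    return loads;
}

// Largest number of keys sharing a single cell.
std::size_t maxLoad(const std::vector<std::size_t>& loads)
{
    return loads.empty() ? 0 : *std::max_element(loads.begin(), loads.end());
}
```

For instance, the 100 keys 0..99 hashed with `x % 10` into 10 cells give exactly 10 keys per cell, the ideal even spread for that key set.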
Ease of Computation • The running time of the hash function should be O(1) (Jumping immediately to the desired record is a direct access approach, much like direct access of data on a disk)
Hashing Methods • We first deal with integer keys, $K = \mathbb{Z}$ • The value of the hash function falls between $0$ and $M-1$
Division Method • The simplest method of hashing an integer • The division method computes $h(x) = x \bmod M$.
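Choice of M • The division method works reasonably well for most values of M • However, if M shares a factor with many of the keys (e.g., M even and most keys even), those keys crowd into a subset of the cells • We therefore often choose M to be a prime number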
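The effect of the choice of M can be seen with keys that all share a factor. This sketch assumes the hypothetical key set 0, 10, 20, ..., 990 (100 multiples of 10) and counts how many distinct cells the division method reaches for a few values of M; the helper names `divisionHash` and `cellsUsed` are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Division method: h(x) = x mod M.
unsigned int divisionHash(unsigned int x, unsigned int M)
{
    assert(M > 0);
    return x % M;
}

// Number of distinct cells the division method uses for these keys.
std::size_t cellsUsed(const std::vector<unsigned int>& keys, unsigned int M)
{
    std::set<unsigned int> cells;
    for (unsigned int key : keys) {
        cells.insert(divisionHash(key, M));
    }
    return cells.size();
}
```

With M = 10 every key lands in cell 0, and with M = 20 only cells 0 and 10 are used, because multiples of 10 can only reach cells that are multiples of gcd(10, M). With the prime M = 11 all 11 cells are used.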
Implementation

    unsigned int const M = 1031; // a prime

    unsigned int h (unsigned int x)
    {
        return x % M;
    }
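Middle Square Method • Avoids division • Makes use of the fact that the computer does finite-precision integer arithmetic • All arithmetic is done modulo $W$, where $W = 2^w$ and $w$ is the word size of the computer • The table size is $M = 2^k$ • Meaning: • Multiply $x$ by itself, then shift the result right by $w - k$ bits, i.e. $h(x) = \lfloor \frac{M}{W} (x^2 \bmod W) \rfloor$ • This keeps the top $k$ bits of the $w$-bit product, which come from the middle of the full $2w$-bit square.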
Implementation

    #include <climits>

    unsigned int const k = 10; // M == 1024 == 2^k
    unsigned int const w = sizeof (unsigned int) * CHAR_BIT; // word size in bits

    unsigned int h (unsigned int x)
    {
        return (x * x) >> (w - k); // top k bits of x*x mod 2^w
    }
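The middle-square computation can be exercised directly. This sketch fixes the word size at 32 bits by using `std::uint32_t` (an assumption, so the results do not depend on the platform's `unsigned int`), keeps k = 10, and renames the function `middleSquareHash` for illustration. Unsigned arithmetic wraps modulo 2^32, so `x * x` already equals x² mod W.

```cpp
#include <cassert>
#include <cstdint>

// Middle-square hashing, assuming a 32-bit word (w = 32) and M = 2^k cells.
constexpr unsigned int w = 32;
constexpr unsigned int k = 10;            // M == 1024
constexpr std::uint32_t M = 1u << k;

static_assert(k > 0 && k < w, "need 0 < k < w");

std::uint32_t middleSquareHash(std::uint32_t x)
{
    std::uint32_t sq = x * x;             // x^2 mod 2^32 (unsigned wrap-around)
    return sq >> (w - k);                 // keep the top k of those 32 bits
}
```

One consequence worth noting: any key x < 2^11 = 2048 has x² < 2^22, so with w = 32 and k = 10 every such key hashes to cell 0, while 2048 (whose square is exactly 2^22) is the first key to reach cell 1.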
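Multiplication Method • We multiply the key by a carefully chosen constant $a$ (rather than by itself, as in the middle-square method) • As before, the hash value is the top $k$ bits of the $w$-bit product: $h(x) = \lfloor \frac{M}{W} (a x \bmod W) \rfloor$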
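Implementation

    #include <climits>

    unsigned int const k = 10; // M == 1024
    unsigned int const w = sizeof (unsigned int) * CHAR_BIT; // word size in bits
    unsigned int const a = 2654435769U; // constant chosen for w == 32

    unsigned int h (unsigned int x)
    {
        return (x * a) >> (w - k); // top k bits of a*x mod 2^w
    }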
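A fixed-width version of the multiplication method makes its behaviour on small keys easy to check. This sketch assumes a 32-bit word via `std::uint32_t` and the name `multiplicationHash` is illustrative. The constant a = 2654435769 (0x9E3779B9) is the integer part of 2^32 divided by the golden ratio, a choice commonly attributed to Knuth.

```cpp
#include <cassert>
#include <cstdint>

// Multiplication hashing, assuming a 32-bit word (w = 32) and M = 2^k cells.
constexpr unsigned int w = 32;
constexpr unsigned int k = 10;                 // M == 1024
constexpr std::uint32_t M = 1u << k;
constexpr std::uint32_t a = 2654435769u;       // 0x9E3779B9, floor(2^32 / golden ratio)

std::uint32_t multiplicationHash(std::uint32_t x)
{
    return (x * a) >> (w - k);                 // top k bits of a*x mod 2^32
}
```

Small keys no longer pile into cell 0 as they do under the middle-square method: keys 1, 2 and 3 map to cells 632, 241 and 874.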