110 likes | 123 Vues
A lecture on hashes. By Charles Morris. What is a hash table?. A hash table is an array-like data structure that associates its input (the key) with the associated output (the record, or value). They use a ‘Hashing Function’ to create the association; more on this later.
E N D
A lecture on hashes By Charles Morris
What is a hash table? • A hash table is an array-like data structure that associates its input (the key) with the associated output (the record, or value). • They use a ‘Hashing Function’ to create the association; more on this later. • Hashes were first known as “Associative Arrays” (and you may think of them as so); but using seven syllables is tiring.
Why a hash? • Hashes have many distinct benefits. • Hashes are designed so that you may find a variable without knowing its location. • Computational complexity for lookup varies; Almost always O(1) or O(2), but in the case of `c` collisions, (where c <= n) it is O(c) (like searching an array).
What is a ‘hash function’? • A hash function is simply a function that generates a fingerprint based on it’s input. • Hashing functions are used in many fields, in cryptography one-way hash functions are used to create a small pseudo-random checksum out of (normally) much larger data. This checksum is used for authentication and secure network transmissions, amongst other applications.
Hash functions and Collisions • In a one-way cryptographic hash function, the amount of collisions is not as important as the randomness of the collisions. These functions try to minimize the computational feasibility of finding a key ‘j’ such that j = f(k); where k is the original key.
Hash functions and Collisions • In a hash function used to store data in a hash table, collisions should be minimized as much as possible; as every time a collision happens for key ‘j’ where j = f(k); it increases the O(lookup of f(k)). • This assumes that your hash table watches for collisions, if it doesn’t, the old value will be trampled with the new value.
Collision mitigation • As was stated, if two keys ‘k’ and ‘j’ both hash to the same index ( f(j) == f(k) ), this causes a collision. • Collisions are avoided by using a good hashing algorithm, however they always happen to some degree when a hash function is given enough data to run on.
Collision resolution • Chaining • Instead of one value being at the location f(k), there is a chain (maybe a linked-list) of values. • Linear or Quadratic Probing is a similar solution, where space is reserved at certain locations. • Double hashing • Double hashing requires another hash computation, O(2n), however the records become so sparse that collisions become very rare.
Hash function example • The hashing function used in Perl 5.005: • (close relative of the popular ‘djb2’ algorithm) // (Defined by the PERL_HASH macro in hv.h) // ported by Charles Morris (me) for C++ programmers unsigned long hashingfunction( string key ) { unsigned long fingerprint = 0; for( int j = 0; j < key.length(); j++) //for each letter in the string { fingerprint = fingerprint * 33 + (int)key.at(j); //sum } return fingerprint; }
Hash function example • hf(‘abc’) using the previous function fingerprint = 0 //fingerprint = fingerprint * 33 + (int)’a’; fingerprint = 0 * 33 + 97; //fingerprint = 97 //fingerprint = fingerprint * 33 + (int)’b’; fingerprint = (97 * 33) + 98; //fingerprint = 3299 //fingerprint = fingerprint * 33 + (int)’c’; fingerprint = (3299 * 33) + 99; //fingerprint = 108966