Hashtables are abstract data structures that support insert, find, and remove in O(1) time on average. Unlike search trees, they don't require an order relation on the elements. This article explores the underlying principles of hashtables, including direct access tables, hash functions, and collision resolution techniques such as chaining and open addressing. We'll analyze the advantages and drawbacks of methods like linear and quadratic probing, with worked examples that illustrate how hashtables behave.
Hashtables • An abstract data type that supports the following operations: • Insert • Find • Remove • Search trees can be used for the same operations, but they require an order relation to be defined on the elements and take logarithmic time. • Hashtables do not require an order relation on the elements, and all operations take O(1) time on average.
Direct Access Tables • Assume that the keys are distinct numbers in the range U = {1, 2, 3, …, m}; use an array of size m and place the element with key k at index k of the array. • O(1) time for all operations • Problem: wasteful for small sets and impractical if m is very large
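As a minimal sketch of the idea above, a direct access table can be written as a plain array indexed by the key. The class and method names here are illustrative, not taken from the slides, and keys are assumed to be distinct integers in 1..m.

```python
class DirectAccessTable:
    """Direct access table for distinct integer keys in the range 1..m."""

    def __init__(self, m):
        # Index 0 is left unused so that key k lives exactly at index k.
        self.slots = [None] * (m + 1)

    def insert(self, key, value):
        self.slots[key] = value      # O(1): the key is the index

    def find(self, key):
        return self.slots[key]       # O(1); None means "not present"

    def remove(self, key):
        self.slots[key] = None       # O(1)


table = DirectAccessTable(10)
table.insert(3, "three")
```

The array always has m + 1 cells regardless of how many elements are stored, which is the waste the slide refers to.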
Hashtables • Main idea: instead of using the keys themselves as indices into the table, use a hash function that maps keys to indices. • Note: U is the set of all possible keys; it is therefore usually much larger than the table size m.
Simple Uniform Hashing • We assume a hash function that, given a key, hashes it into any slot with equal probability. • We will try to provide some reasonable hash functions later
Hash functions • The hash function is responsible for mapping keys to integers (slot numbers). A good hash function must have the following properties: • 1. Easy to evaluate: computing h(x) takes O(1) • 2. Uniform distribution over all the table slots • 3. Similar keys are mapped to different slots
Hash functions • The first step is to represent the key as a natural number. • For example, if S is a string, then we can interpret it as an integer value by applying a formula to its characters.
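The formula itself did not survive in the slide. One common choice, shown here purely as an assumption, is to read the string as a number written in some radix, with each character code acting as a digit. The radix of 128 (ASCII) and the function name are hypothetical.

```python
def string_to_int(s, radix=128):
    """Interpret a string as a natural number in the given radix.

    Assumption: radix 128 treats each ASCII character code as one digit.
    """
    value = 0
    for ch in s:
        # Horner's rule: shift the digits seen so far, then add the new one.
        value = value * radix + ord(ch)
    return value


example = string_to_int("ab")  # 97 * 128 + 98
```

The resulting integer can then be fed to any integer hash function, such as the division method.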
Collisions • Mapping keys to indices causes a collision when two keys are mapped by the hash function to the same index • Solutions • Chaining • Open addressing
Collision resolution - Chaining • All keys that have the same hash value are placed in a linked list • Insertion can be done at the beginning of the list in O(1) time • Searching is proportional to the length of the list
Collision resolution by chaining • Let h be a hash table of 9 slots with h(k) = k mod 9; insert the elements 6, 43, 23, 62, 1, 13, 34, 55, 25 • h(6) = 6 mod 9 = 6 • h(43) = 43 mod 9 = 7 • h(23) = 23 mod 9 = 5 • h(62) = 62 mod 9 = 8 • h(1) = 1 mod 9 = 1 • h(13) = 13 mod 9 = 4 • h(34) = 34 mod 9 = 7 • h(55) = 55 mod 9 = 1 • h(25) = 25 mod 9 = 7
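The example above can be replayed with a short sketch. It assumes each chain is a `collections.deque` standing in for a linked list, so that inserting at the head is O(1); the function name is illustrative.

```python
from collections import deque


def build_chained_table(keys, m):
    """Insert keys into a table of m chains using h(k) = k mod m."""
    table = [deque() for _ in range(m)]
    for k in keys:
        table[k % m].appendleft(k)   # insert at the head of the chain
    return table


table = build_chained_table([6, 43, 23, 62, 1, 13, 34, 55, 25], 9)
# Slot 7 receives 43, 34 and 25; slot 1 receives 1 and 55.
```

Because new keys go to the head, slot 7 ends up holding 25, 34, 43 in that order, and a search for 43 has to walk the whole chain.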
Analysis • The load factor α of a hashtable is the number of elements stored in the table, n, divided by the number of slots, m: α = n/m • With chaining, a search takes Θ(1 + α) time on average under the assumption of simple uniform hashing
Division method • An appropriate hash function for a hashtable that uses chaining is the division method: h(k) = k mod m. • Values of m that are powers of 10 or 2 should be avoided • Good values of m are primes not close to powers of 2
Open Addressing • Each element occupies a single slot in the hashtable. No chaining is done • To insert an element, we probe the table according to the hash function until an empty slot is found. • The hash function is now a function of both the key and the number of attempts in the insertion process
Hash Insert •
HashInsert (T, k) {
    int i, j;
    for (i = 0; i < m; i++) {
        j = h(k, i);              // i-th slot in the probe sequence of k
        if (T[j] == null) break;  // found an empty slot
    }
    if (i < m) T[j] = k;
    else error "hashtable overflow";
}
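Hash Search •
HashSearch (T, k) {
    int i, j;
    for (i = 0; i < m; i++) {
        j = h(k, i);                            // same probe sequence as insertion
        if (T[j] == null) return not found;     // an empty slot ends the search
        else if (T[j] == k) return j;
    }
    return not found;                           // every slot was probed
}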
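For readers who want to run the two routines, here is a minimal Python sketch of the same logic. The probe function `h(key, i)` is passed in as a parameter; the table size of 7 and the linear probe used in the demo are assumptions chosen only for illustration.

```python
def hash_insert(table, key, h):
    """Place key in the first empty slot of its probe sequence; return the slot."""
    for i in range(len(table)):
        j = h(key, i)
        if table[j] is None:
            table[j] = key
            return j
    raise OverflowError("hashtable overflow")


def hash_search(table, key, h):
    """Return the slot holding key, or None if key is not in the table."""
    for i in range(len(table)):
        j = h(key, i)
        if table[j] is None:      # an empty slot ends the search
            return None
        if table[j] == key:
            return j
    return None                   # every slot was probed


m = 7                                     # assumed table size for the demo
probe = lambda k, i: (k % m + i) % m      # assumed probe function
table = [None] * m
hash_insert(table, 10, probe)             # 10 mod 7 = 3
hash_insert(table, 17, probe)             # 17 mod 7 = 3 is taken, so slot 4
```

Searching follows exactly the same probe sequence as insertion, which is why it can stop at the first empty slot.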
Linear probing • Linear probing takes an ordinary hash function h’, such as one using the division method, and turns it into h(k, i) = (h’(k) + i) mod m • If a slot is occupied, we try the subsequent slot, and so on; thus the initial slot determines the entire probing sequence for insertion and search.
Linear Probing • Easy to implement but suffers from primary clustering. • The probability of probing into a slot following an occupied slot is greater than the probability of any other slot.
Linear Probing • Given a hash function h’, the linear probing scheme is simply h(k, i) = (h’(k) + i) mod m
Exercise • You are given a hash table h with m = 11 slots. Demonstrate inserting the following elements using linear probing and the hash function h’(k) = k mod m • 10, 22, 31, 4, 15, 28, 17, 88, 59
Solution • h(10,0) = (10mod11 + 0) mod 11 = 10 • h(22,0) = (22mod11 + 0) mod 11 = 0 • h(31,0) = (31mod11 + 0) mod 11 = 9 • h(4,0) = (4mod11 + 0) mod 11 = 4 • h(15,0) = (15mod11 + 0) mod 11 = 4 • h(15,1) = (15mod11 + 1) mod 11 = 5 • h(28,0) = (28mod11 + 0) mod 11 = 6 • h(17,0) = (17mod11 + 0) mod 11 = 6 • h(17,1) = (17mod11 + 1) mod 11 = 7 • h(88,0) = (88mod11 + 0) mod 11 = 0 • h(88,1) = (88mod11 + 1) mod 11 = 1 • h(59,0) = (59mod11 + 0) mod 11 = 4 • h(59,1) = (59mod11 + 1) mod 11 = 5 • h(59,2) = (59mod11 + 2) mod 11 = 6 • h(59,3) = (59mod11 + 3) mod 11 = 7 • h(59,4) = (59mod11 + 4) mod 11 = 8
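The exercise can be checked mechanically. The sketch below assumes h(k, i) = (k mod 11 + i) mod 11 and records, for each key, the list of slots probed until an empty one is found; the function name is illustrative.

```python
def linear_probe_insert(table, key):
    """Insert key with linear probing; return the list of slots probed."""
    m = len(table)
    probes = []
    for i in range(m):
        j = (key % m + i) % m
        probes.append(j)
        if table[j] is None:
            table[j] = key
            return probes
    raise OverflowError("hashtable overflow")


table = [None] * 11
trace = {k: linear_probe_insert(table, k)
         for k in [10, 22, 31, 4, 15, 28, 17, 88, 59]}
# trace[59] is [4, 5, 6, 7, 8]: five probes through the cluster around slot 4.
# Final table: [22, 88, None, None, 4, 15, 28, 17, 59, 31, 10]
```

The long probe sequence for 59 is a small instance of primary clustering: every key that lands anywhere in slots 4 to 7 extends the same run.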
Quadratic Probing • Quadratic probing again starts from an initial hash function h’, and the hash function is now h(k, i) = (h’(k) + c1·i + c2·i²) mod m, for constants c1 and c2 with c2 ≠ 0 • The slot tried after an occupied slot depends on the probe number i. • Quadratic probing suffers from a secondary form of clustering, since the initial probe alone determines the entire probing sequence.
Quadratic Probing • Given a hash function h’, quadratic probing is done by h(k, i) = (h’(k) + c1·i + c2·i²) mod m
Example • You are given a hash table h with 11 slots. Demonstrate inserting the following elements using quadratic probing and the hash function h(k, i) = (k mod 11 + i + 3i²) mod 11 • 10, 22, 31, 4, 15, 28, 17, 88, 59
h(10,0) = (10mod11 + 0) mod 11 = 10 • h(22,0) = (22mod11 + 0) mod 11 = 0 • h(31,0) = (31mod11 + 0) mod 11 = 9 • h(4,0) = (4mod11 + 0) mod 11 = 4 • h(15,0) = (15mod11 + 0) mod 11 = 4 • h(15,1) = (15mod11 + 1 + 3) mod 11 = 8 • h(28,0) = (28mod11 + 0) mod 11 = 6 • h(17,0) = (17mod11 + 0) mod 11 = 6 • h(17,1) = (17mod11 + 1 + 3) mod 11 = 10 • h(17,2) = (17mod11 + 2 + 12) mod 11 = 9 • h(17,3) = (17mod11 + 3 + 27) mod 11 = 3 • h(88,0) = (88mod11 + 0) mod 11 = 0 • h(88,1) = (88mod11 + 1 + 3) mod 11 = 4 • h(88,2) = (88mod11 + 2 + 12) mod 11 = 3 • h(88,3) = (88mod11 + 3 + 27) mod 11 = 8 • h(88,4) = (88mod11 + 4 + 48) mod 11 = 8 • h(88,5) = (88mod11 + 5 + 75) mod 11 = 3 • h(88,6) = (88mod11 + 6 + 108) mod 11 = 4 • h(88,7) = (88mod11 + 7 + 147) mod 11 = 0 • h(88,8) = (88mod11 + 8 + 192) mod 11 = 2 • h(59,0) = (59mod11 + 0) mod 11 = 4 • h(59,1) = (59mod11 + 1 + 3) mod 11 = 8 • h(59,2) = (59mod11 + 2 + 12) mod 11 = 7
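The same trace can be reproduced in a few lines. The constants c1 = 1 and c2 = 3 are read off the worked terms above (i + 3i²); the function name is illustrative.

```python
def quadratic_probe_insert(table, key, c1=1, c2=3):
    """Insert key with quadratic probing; return the list of slots probed."""
    m = len(table)
    probes = []
    for i in range(m):
        j = (key % m + c1 * i + c2 * i * i) % m
        probes.append(j)
        if table[j] is None:
            table[j] = key
            return probes
    raise OverflowError("no empty slot found within m probes")


table = [None] * 11
trace = {k: quadratic_probe_insert(table, k)
         for k in [10, 22, 31, 4, 15, 28, 17, 88, 59]}
# trace[88] is [0, 4, 3, 8, 8, 3, 4, 0, 2]: nine probes, several repeated.
# Final table: [22, None, 88, 17, 4, None, 28, 59, 15, 31, 10]
```

Note that the sequence for 88 revisits slots 0, 3, 4 and 8 before reaching slot 2. With these constants the probe sequence is not guaranteed to cover every slot, so an insert can fail even when the table is not full.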
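Double Hashing • Given two hash functions h1 and h2, the probe sequence is h(k, i) = (h1(k) + i·h2(k)) mod m • Problem: h2(k) and m should not have any common divisors; otherwise the probe sequence does not reach every slot.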
Double Hashing • Example 1: select m to be a power of 2, and design h2 to produce odd numbers. • Example 2: select m to be prime and m’ to be m − 1, for example with h1(k) = k mod m and h2(k) = 1 + (k mod m’).
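A small sketch of the second example, under the assumption that h1(k) = k mod m and h2(k) = 1 + (k mod (m − 1)) with m = 11. These two functions are a common textbook choice rather than something the slides spell out.

```python
def double_hash_sequence(key, m):
    """Return the full probe sequence of key for a prime table size m."""
    h1 = key % m
    h2 = 1 + key % (m - 1)       # always in 1..m-1, so never 0
    return [(h1 + i * h2) % m for i in range(m)]


seq = double_hash_sequence(15, 11)
# h1 = 4 and h2 = 6, so the sequence starts 4, 10, 5, 0, ...
```

Because m is prime and h2(k) lies between 1 and m − 1, the two share no common divisor, and the sequence visits each of the m slots exactly once. Two keys with the same initial slot usually get different step sizes, which is what avoids the clustering seen with linear and quadratic probing.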
Analysis • In open addressing the load factor α cannot exceed 1. • Under uniform hashing and for α < 1, insertion and unsuccessful search require at most 1/(1 − α) probes on average • A successful search requires at most (1/α) · ln(1/(1 − α)) probes on average
Analysis • When the table is 50% full, a successful search requires fewer than 1.387 probes on average • When the table is 90% full, a successful search requires fewer than 2.559 probes on average
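These figures follow from the standard open-addressing bounds under uniform hashing: 1/(1 − α) expected probes for an unsuccessful search or an insertion, and (1/α) · ln(1/(1 − α)) for a successful search. A quick sketch to evaluate them (function names are illustrative):

```python
import math


def unsuccessful_search_bound(alpha):
    """Expected probes for an unsuccessful search or insertion: 1 / (1 - alpha)."""
    return 1.0 / (1.0 - alpha)


def successful_search_bound(alpha):
    """Expected probes for a successful search: (1 / alpha) * ln(1 / (1 - alpha))."""
    return (1.0 / alpha) * math.log(1.0 / (1.0 - alpha))


half_full = successful_search_bound(0.5)    # about 1.386
nearly_full = successful_search_bound(0.9)  # about 2.558
```

For comparison, the unsuccessful-search bound at the same load factors is 2 and 10 probes, so misses degrade much faster than hits as the table fills.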
Problems with open addressing • If an element is deleted, we cannot simply empty its slot, since later searches for elements that were probed past that slot would stop early and fail. Rehashing the table on every deletion would ruin the running time • Solution: mark the slot with a special DELETED value.
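A minimal sketch of the DELETED marker, assuming linear probing with h(k, i) = (k mod m + i) mod m and keys that are not inserted twice. The sentinel object and function names are illustrative.

```python
DELETED = object()   # sentinel: the slot held an element that was removed


def _probe(key, i, m):
    return (key % m + i) % m          # assumed linear probe function


def insert(table, key):
    """Insert key (assumed not already present), reusing DELETED slots."""
    m = len(table)
    for i in range(m):
        j = _probe(key, i, m)
        if table[j] is None or table[j] is DELETED:
            table[j] = key
            return j
    raise OverflowError("hashtable overflow")


def search(table, key):
    m = len(table)
    for i in range(m):
        j = _probe(key, i, m)
        if table[j] is None:          # truly empty: key cannot be further on
            return None
        if table[j] is not DELETED and table[j] == key:
            return j                  # DELETED slots are skipped, not stopped at
    return None


def remove(table, key):
    j = search(table, key)
    if j is not None:
        table[j] = DELETED            # keep the probe chain intact
    return j


table = [None] * 11
for k in (4, 15, 26):                 # all three hash to slot 4
    insert(table, k)
remove(table, 15)                     # slot 5 becomes DELETED
```

After the removal, searching for 26 still succeeds because the search walks past the marker in slot 5. Had the slot been reset to empty, the search would have stopped there and reported 26 as missing.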
Rehashing • If we do not know the number of elements in advance, we use a technique similar to the one used in vectors (dynamic arrays). Once the load factor reaches some predefined threshold, rehash the data into a larger hashtable.
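As a rough sketch of this policy, the class below doubles the table whenever an insert would push the load factor above a threshold. The initial size of 4, the threshold of 0.5, the doubling factor, and linear probing are all assumptions made for the example.

```python
class GrowingTable:
    """Open-addressing table that rehashes into a larger array as it fills."""

    def __init__(self, m=4, max_load=0.5):
        self.slots = [None] * m
        self.count = 0
        self.max_load = max_load

    @staticmethod
    def _place(slots, key):
        m = len(slots)
        for i in range(m):
            j = (key % m + i) % m            # assumed linear probing
            if slots[j] is None:
                slots[j] = key
                return
        raise OverflowError("hashtable overflow")

    def _rehash(self, new_m):
        # Every key must be re-inserted: its slot depends on the table size.
        new_slots = [None] * new_m
        for key in self.slots:
            if key is not None:
                self._place(new_slots, key)
        self.slots = new_slots

    def insert(self, key):
        if (self.count + 1) / len(self.slots) > self.max_load:
            self._rehash(2 * len(self.slots))
        self._place(self.slots, key)
        self.count += 1


t = GrowingTable()
for k in range(10):
    t.insert(7 * k)
```

A single rehash costs time proportional to the table size, but with doubling it happens rarely enough that the cost per insert stays constant when averaged over a sequence of inserts, as with vectors.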
Example • Given a set S of unique integers and a number z, find x, y ∈ S such that x + y = z • An efficient worst case algorithm • An efficient average case algorithm
An efficient worst case algorithm • 1. Sort all elements in S: O(n log n). • 2. For every x in S, search for y = z − x in S using binary search: n searches of O(log n) each. • Total of O(n log n)
An efficient average case algorithm • 1. We use a hash table where m is of order n; for all x ∈ S we execute insert(x) • 2. For all x ∈ S we execute search(z − x) • Total: O(n) in the average case • Total: O(n²) in the worst case
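The two steps can be sketched with Python's built-in `set`, which is itself a hash table and stands in here for the table described above. The function name is illustrative. As in the steps as stated, x and y may turn out to be the same element when z = 2x; add a `y != x` check if two distinct elements are required.

```python
def find_pair_with_sum(S, z):
    """Return (x, y) with x + y == z and both in S, or None if no pair exists."""
    table = set(S)              # step 1: insert every x, O(n) on average
    for x in S:                 # step 2: one average O(1) search per element
        y = z - x
        if y in table:
            return x, y
    return None


pair = find_pair_with_sum([1, 4, 7, 10], 11)   # (1, 10)
```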
Example • Given a set S of sortable items, we are asked if all items in S are unique. • 1. Sort the elements of S. • 2. Iterate over the sorted elements of S, looking for adjacent equal values. • Execution time: O(n log n)
Example • 1. Use a hash table where m is of order n; for all x ∈ S we execute insert(x). We modify the insert operation to signal if x already exists in the table (every insert includes a search operation). • Execution time: O(n) in the average case
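A minimal sketch of this approach, again using Python's built-in `set` as the hash table. The membership test before each insertion plays the role of the modified insert that signals a duplicate; the function name is illustrative.

```python
def all_unique(items):
    """Return True if no value occurs twice in items."""
    seen = set()
    for x in items:
        if x in seen:        # the "search" part of the modified insert
            return False
        seen.add(x)
    return True


unique = all_unique([3, 1, 4, 1, 5])   # False: 1 appears twice
```

Unlike the sorting approach, this needs only hashing and equality on the items, not an order relation.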
Java hashCode • Each Java object has a method public int hashCode(), which is defined in class Object and is supported for the benefit of hash tables such as Hashtable and HashMap. • The default implementation typically returns a number derived from the object's identity (for example, its memory location); it is not guaranteed to be unique. • If two objects are equal according to equals(), they must have the same hash code
Java hashCode • Distinct objects are not required to have distinct hash codes, but distinct hash codes improve the performance of hashtables. • Can the hash code of an object change throughout its life cycle?