Hashtables

Hashtables
• An abstract data type that supports the following operations:
  • Insert
  • Find
  • Remove
• Search trees support the same operations, but they require an order relation to be defined on the keys and take logarithmic time per operation.
• Hashtables do not require an order relation on the elements, and all operations take O(1) time on average.

Direct Access Tables
• Assume that the keys are distinct numbers in the range U = {1, 2, 3, ..., m}. Use an array of size m and store the element with key k at index k of the array.
• O(1) time for all operations
• Problem: wasteful when the set of stored keys is small relative to m, and impractical if m is very large

Hashtables
• Main idea: instead of using the keys themselves as indices into the table, use a hash function to map keys to indices.
• Note: U is the set of all possible keys, so it is usually much larger than the table size m.

Simple Uniform Hashing
• We assume a hash function that, given a key, hashes it into any slot with equal probability.
• We will try to provide some reasonable hash functions later.

Hash functions
• The hash function is responsible for mapping keys to integers (slot numbers). A good hash function should have the following properties:
• 1. Easy to evaluate: computing h(x) takes O(1)
• 2. Uniform distribution over all the table slots
• 3. Similar keys are mapped to different slots

Hash functions
• The first step is to represent the key as a natural number.
• For example, if S is a string, we can interpret it as an integer value using the formula
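The formula itself did not survive in this transcript. One common choice, assumed here purely for illustration, is to read the string as a number written in radix 128 (one digit per ASCII character). The class name `StringKey`, the radix, and both method names are hypothetical. The sketch below computes the full integer and also the slot `value mod m` directly, using Horner's rule so the intermediate values stay small.

```java
import java.math.BigInteger;

class StringKey {
    // Assumed radix: 128, i.e. one "digit" per ASCII character.
    private static final int RADIX = 128;

    // Interprets s as a radix-128 natural number: s[0]*128^(n-1) + ... + s[n-1].
    static BigInteger toNatural(String s) {
        BigInteger value = BigInteger.ZERO;
        for (int i = 0; i < s.length(); i++) {
            value = value.multiply(BigInteger.valueOf(RADIX))
                         .add(BigInteger.valueOf(s.charAt(i)));
        }
        return value;
    }

    // Same interpretation, reduced modulo m after every step (Horner's rule),
    // so the result is the slot number under the division method.
    static int slot(String s, int m) {
        long r = 0;
        for (int i = 0; i < s.length(); i++) {
            r = (r * RADIX + s.charAt(i)) % m;
        }
        return (int) r;
    }
}
```

For the two-character string "pt" this gives 112 · 128 + 116 = 14452, since 'p' is 112 and 't' is 116 in ASCII.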

Collisions
• Mapping keys to indices can cause collisions: two keys collide if the hash function maps them to the same index.
• Solutions
  • Chaining
  • Open addressing

Collision resolution - Chaining
• All keys that have the same hash value are placed in a linked list.
• Insertion can be done at the beginning of the list in O(1) time.
• Search time is proportional to the length of the list.

Collision resolution by chaining
• Let T be a hash table with 9 slots and h(k) = k mod 9. Insert the elements: 6, 43, 23, 62, 1, 13, 34, 55, 25
h(6) = 6 mod 9 = 6
h(43) = 43 mod 9 = 7
h(23) = 23 mod 9 = 5
h(62) = 62 mod 9 = 8
h(1) = 1 mod 9 = 1
h(13) = 13 mod 9 = 4
h(34) = 34 mod 9 = 7
h(55) = 55 mod 9 = 1
h(25) = 25 mod 9 = 7
• Result: slots 1 and 7 hold chains (55, 1 and 25, 34, 43 when inserting at the head of the list); slots 4, 5, 6 and 8 hold one key each.
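The example above can be reproduced with a minimal chained table. This is a sketch, not a production container: the class name `ChainedTable` and its method names are made up for illustration, keys are plain ints, and the hash function is fixed to h(k) = k mod m.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedTable {
    // One linked list (chain) per slot.
    private final List<LinkedList<Integer>> slots = new ArrayList<LinkedList<Integer>>();

    ChainedTable(int m) {
        for (int i = 0; i < m; i++) slots.add(new LinkedList<Integer>());
    }

    // h(k) = k mod m selects the chain for key k.
    private LinkedList<Integer> chainFor(int k) {
        return slots.get(Math.floorMod(k, slots.size()));
    }

    // Insert at the head of the chain: O(1).
    void insert(int k) { chainFor(k).addFirst(k); }

    // Search and removal walk the chain: proportional to its length.
    boolean find(int k) { return chainFor(k).contains(k); }

    boolean remove(int k) { return chainFor(k).remove(Integer.valueOf(k)); }

    List<Integer> chain(int slot) { return slots.get(slot); }
}
```

With m = 9 and the keys 6, 43, 23, 62, 1, 13, 34, 55, 25 inserted in that order, `chain(7)` holds 25, 34, 43 and `chain(1)` holds 55, 1, because each new key goes to the front of its list.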

Analysis
• The load factor α of a hashtable is the number of elements stored in the table, n, divided by the number of slots, m: α = n/m.
• Under the assumption of simple uniform hashing, a search takes Θ(1 + α) time on average.

Division method
• An appropriate hash function for a hashtable that uses chaining is the division method: h(k) = k mod m.
• Values of m that are powers of 10 or 2 should be avoided.
• Good values of m are primes not close to powers of 2.

Open Addressing
• Each element occupies a single slot in the hashtable. No chaining is done.
• To insert an element, we probe the table according to the hash function until an empty slot is found.
• The hash function is now a function of both the key and the probe number (the number of attempts made so far): h(k, i).

Hash Insert
HashInsert(T, k) {
    int i, j = -1;
    for (i = 0; i < m; i++) {
        j = h(k, i);                // slot tried on the i-th probe
        if (T[j] == null) break;    // found an empty slot
    }
    if (i < m) T[j] = k;
    else error "hashtable overflow";
}

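Hash Search
HashSearch(T, k) {
    for (int i = 0; i < m; i++) {
        int j = h(k, i);                      // slot tried on the i-th probe
        if (T[j] == null) return not found;   // an empty slot ends the probe sequence
        else if (T[j] == k) return j;
    }
    return not found;                         // all m slots were probed
}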

Linear probing
• Linear probing takes an ordinary hash function h', such as one using the division method, and turns it into: h(k, i) = (h'(k) + i) mod m
• If a slot is occupied, we try the subsequent slot, and so on; thus the initial slot determines the entire probe sequence for insertion and search.

Linear Probing
• Easy to implement, but suffers from primary clustering.
• An empty slot that follows a run of occupied slots is more likely to be filled next than any other slot, so runs of occupied slots tend to grow.

Linear Probing
• Given a hash function h', the linear probing scheme is simply h(k, i) = (h'(k) + i) mod m, for i = 0, 1, ..., m - 1

Exercise
• You are given a hash table T with 11 slots (m = 11). Demonstrate inserting the following elements using linear probing and the hash function h'(k) = k mod m
• 10, 22, 31, 4, 15, 28, 17, 88, 59

Solution
• h(10,0) = (10 mod 11 + 0) mod 11 = 10
• h(22,0) = (22 mod 11 + 0) mod 11 = 0
• h(31,0) = (31 mod 11 + 0) mod 11 = 9
• h(4,0) = (4 mod 11 + 0) mod 11 = 4
• h(15,0) = (15 mod 11 + 0) mod 11 = 4 (occupied)
• h(15,1) = (15 mod 11 + 1) mod 11 = 5
• h(28,0) = (28 mod 11 + 0) mod 11 = 6
• h(17,0) = (17 mod 11 + 0) mod 11 = 6 (occupied)
• h(17,1) = (17 mod 11 + 1) mod 11 = 7
• h(88,0) = (88 mod 11 + 0) mod 11 = 0 (occupied)
• h(88,1) = (88 mod 11 + 1) mod 11 = 1
• h(59,0) = (59 mod 11 + 0) mod 11 = 4 (occupied)
• h(59,1) = (59 mod 11 + 1) mod 11 = 5 (occupied)
• h(59,2) = (59 mod 11 + 2) mod 11 = 6 (occupied)
• h(59,3) = (59 mod 11 + 3) mod 11 = 7 (occupied)
• h(59,4) = (59 mod 11 + 4) mod 11 = 8
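The exercise can be checked mechanically with a small open-addressing table. The sketch below assumes int keys, h'(k) = k mod m, and no deletions; the class name `LinearProbingTable` and its methods are illustrative only. `insert` and `search` follow the HashInsert and HashSearch pseudocode, returning -1 where the pseudocode reports overflow or "not found".

```java
class LinearProbingTable {
    private final Integer[] slots;   // null means "empty"

    LinearProbingTable(int m) { slots = new Integer[m]; }

    // h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m
    private int probe(int k, int i) {
        int m = slots.length;
        return (Math.floorMod(k, m) + i) % m;
    }

    // Returns the slot where k was stored, or -1 if the table is full.
    int insert(int k) {
        for (int i = 0; i < slots.length; i++) {
            int j = probe(k, i);
            if (slots[j] == null) {
                slots[j] = k;
                return j;
            }
        }
        return -1;
    }

    // Returns the slot holding k, or -1 if k is not in the table.
    int search(int k) {
        for (int i = 0; i < slots.length; i++) {
            int j = probe(k, i);
            if (slots[j] == null) return -1;   // empty slot: k cannot be further along
            if (slots[j] == k) return j;
        }
        return -1;
    }
}
```

Inserting 10, 22, 31, 4, 15, 28, 17, 88, 59 into a table of 11 slots stores them in slots 10, 0, 9, 4, 5, 6, 7, 1 and 8 respectively.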

Quadratic Probing
• Quadratic probing again starts from an initial hash function h', and the hash function is now: h(k, i) = (h'(k) + c1·i + c2·i²) mod m, where c1 and c2 are constants
• The slot tried after an occupied slot depends on the probe number i.
• Quadratic probing suffers from a secondary form of clustering, since the initial probe alone determines the entire probe sequence.

Quadratic Probing
• Given a hash function h', quadratic probing is done by: h(k, i) = (h'(k) + c1·i + c2·i²) mod m, where c1 and c2 are constants and c2 ≠ 0

Example
• You are given a hash table T with 11 slots. Demonstrate inserting the following elements using quadratic probing and the hash function h(k, i) = (k mod 11 + i + 3i²) mod 11
• 10, 22, 31, 4, 15, 28, 17, 88, 59

• h(10,0) = (10 mod 11 + 0) mod 11 = 10
• h(22,0) = (22 mod 11 + 0) mod 11 = 0
• h(31,0) = (31 mod 11 + 0) mod 11 = 9
• h(4,0) = (4 mod 11 + 0) mod 11 = 4
• h(15,0) = (15 mod 11 + 0) mod 11 = 4 (occupied)
• h(15,1) = (15 mod 11 + 1 + 3) mod 11 = 8
• h(28,0) = (28 mod 11 + 0) mod 11 = 6
• h(17,0) = (17 mod 11 + 0) mod 11 = 6 (occupied)
• h(17,1) = (17 mod 11 + 1 + 3) mod 11 = 10 (occupied)
• h(17,2) = (17 mod 11 + 2 + 12) mod 11 = 9 (occupied)
• h(17,3) = (17 mod 11 + 3 + 27) mod 11 = 3
• h(88,0) = (88 mod 11 + 0) mod 11 = 0 (occupied)
• h(88,1) = (88 mod 11 + 1 + 3) mod 11 = 4 (occupied)
• h(88,2) = (88 mod 11 + 2 + 12) mod 11 = 3 (occupied)
• h(88,3) = (88 mod 11 + 3 + 27) mod 11 = 8 (occupied)
• h(88,4) = (88 mod 11 + 4 + 48) mod 11 = 8 (occupied)
• h(88,5) = (88 mod 11 + 5 + 75) mod 11 = 3 (occupied)
• h(88,6) = (88 mod 11 + 6 + 108) mod 11 = 4 (occupied)
• h(88,7) = (88 mod 11 + 7 + 147) mod 11 = 0 (occupied)
• h(88,8) = (88 mod 11 + 8 + 192) mod 11 = 2
• h(59,0) = (59 mod 11 + 0) mod 11 = 4 (occupied)
• h(59,1) = (59 mod 11 + 1 + 3) mod 11 = 8 (occupied)
• h(59,2) = (59 mod 11 + 2 + 12) mod 11 = 7
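The probe sequences in this example can be replayed in code. The constants c1 = 1 and c2 = 3 are read off the worked steps (the added terms are i + 3i²); the class name `QuadraticProbing` and the choice to return the list of probed slots are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

class QuadraticProbing {
    // h(k, i) = (k mod m + c1*i + c2*i^2) mod m with c1 = 1, c2 = 3
    static int probe(int k, int i, int m) {
        return (Math.floorMod(k, m) + i + 3 * i * i) % m;
    }

    // Inserts k into table and returns every slot probed, ending with the slot used.
    static List<Integer> insert(Integer[] table, int k) {
        List<Integer> probed = new ArrayList<Integer>();
        for (int i = 0; i < table.length; i++) {
            int j = probe(k, i, table.length);
            probed.add(j);
            if (table[j] == null) {
                table[j] = k;
                return probed;
            }
        }
        // Quadratic probing may revisit slots, so this can happen before the table is full.
        throw new IllegalStateException("no free slot found for " + k);
    }
}
```

Replaying the insertions shows how often the sequence revisits slots: key 88 probes slots 0, 4, 3, 8, 8, 3, 4, 0 before finding slot 2 free. With these constants and m = 11 a probe sequence is not guaranteed to reach every slot, which is why the sketch reports failure instead of claiming the table is full.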

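Double Hashing
• Given two hash functions h1 and h2, the probe sequence is: h(k, i) = (h1(k) + i·h2(k)) mod m
• Problem: h2(k) and m should not have any common divisors (other than 1); otherwise the probe sequence does not reach every slot.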

Double Hashing
• Example 1: select m to be a power of 2, and design h2 to produce odd numbers.
• Example 2: select m to be prime and m' to be m - 1, with h1(k) = k mod m and h2(k) = 1 + (k mod m').
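A short sketch of the second example, under stated assumptions: m is prime, the first hash is h1(k) = k mod m, and the step size is h2(k) = 1 + (k mod (m - 1)). The transcript does not preserve the slide's own formulas, so this pair is a commonly used choice rather than a quotation, and the class name `DoubleHashing` is hypothetical.

```java
class DoubleHashing {
    // h(k, i) = (h1(k) + i * h2(k)) mod m
    static int probe(int k, int i, int m) {
        int h1 = Math.floorMod(k, m);           // first hash: starting slot
        int h2 = 1 + Math.floorMod(k, m - 1);   // second hash: step size in 1..m-1
        return (h1 + i * h2) % m;
    }

    // The full probe sequence for key k in a table of m slots.
    static int[] sequence(int k, int m) {
        int[] seq = new int[m];
        for (int i = 0; i < m; i++) seq[i] = probe(k, i, m);
        return seq;
    }
}
```

For k = 59 and m = 11 the start is 4 and the step is 10, giving the sequence 4, 3, 2, 1, 0, 10, 9, 8, 7, 6, 5. Because the step never shares a divisor with the prime m, every slot appears exactly once.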

Analysis
• In open addressing the load factor α cannot be more than 1.
• Insertion and unsuccessful search require at most 1/(1 - α) probes on average (for α < 1, assuming uniform hashing).
• A successful search requires at most (1/α) ln(1/(1 - α)) probes on average.

Analysis
• When the table is 50% full, a successful search requires fewer than 1.387 probes on average.
• When the table is 90% full, a successful search requires fewer than 2.559 probes on average.
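These figures can be recomputed from the standard bounds for open addressing under the uniform hashing assumption: 1/(1 - α) expected probes for an insertion or unsuccessful search, and (1/α) ln(1/(1 - α)) for a successful search. The class and method names below are illustrative.

```java
class ProbeBounds {
    // Upper bound on expected probes for insertion / unsuccessful search.
    static double unsuccessful(double alpha) {
        return 1.0 / (1.0 - alpha);
    }

    // Upper bound on expected probes for a successful search.
    static double successful(double alpha) {
        return Math.log(1.0 / (1.0 - alpha)) / alpha;
    }
}
```

At α = 0.5 the successful-search bound is about 1.386 and at α = 0.9 about 2.558, while the unsuccessful-search bound grows from 2 to 10 over the same range.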

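Problems with open addressing
• If an element is deleted, we cannot simply empty its slot, since later searches for keys whose probe sequence passed through that slot would stop early and fail. Rehashing the table after every deletion would ruin the running time.
• Solution: mark the slot with a special DELETED value. Search skips over DELETED slots; insert may reuse them.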
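A minimal sketch of the DELETED marker, assuming linear probing with h'(k) = k mod m, int keys, and callers that never insert a key twice. The class name `TombstoneTable` and the use of a private sentinel object are illustrative choices, not something the slides specify.

```java
class TombstoneTable {
    private static final Object DELETED = new Object();   // marks a removed key
    private final Object[] slots;                          // null means "never used"

    TombstoneTable(int m) { slots = new Object[m]; }

    private int probe(int k, int i) {
        return (Math.floorMod(k, slots.length) + i) % slots.length;
    }

    // Stores k in the first empty or DELETED slot; returns false on overflow.
    boolean insert(int k) {
        for (int i = 0; i < slots.length; i++) {
            int j = probe(k, i);
            if (slots[j] == null || slots[j] == DELETED) {
                slots[j] = k;
                return true;
            }
        }
        return false;
    }

    // Returns the slot holding k, or -1. DELETED slots do not stop the search.
    int search(int k) {
        for (int i = 0; i < slots.length; i++) {
            int j = probe(k, i);
            if (slots[j] == null) return -1;
            if (slots[j] != DELETED && slots[j].equals(k)) return j;
        }
        return -1;
    }

    // Replaces k with the DELETED marker instead of emptying the slot.
    boolean remove(int k) {
        int j = search(k);
        if (j < 0) return false;
        slots[j] = DELETED;
        return true;
    }
}
```

With m = 11, the keys 4, 15 and 59 all start at slot 4 and end up in slots 4, 5 and 6. After `remove(15)`, a search for 59 still succeeds because the marker in slot 5 keeps the probe sequence going; had the slot been set back to empty, the search would have stopped there.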

Rehashing
• If we do not know the number of elements in advance, we use a technique similar to the one used in vectors (dynamic arrays). Once the load factor reaches some predefined threshold, rehash the data into a larger hashtable.
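The idea can be sketched with a chained table that grows on demand. The threshold of 0.75, the initial size of 4, and doubling as the growth rule are all assumptions chosen for brevity; the class name `GrowingTable` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class GrowingTable {
    private static final double MAX_LOAD = 0.75;   // assumed threshold
    private List<List<Integer>> slots = newSlots(4);
    private int size = 0;

    private static List<List<Integer>> newSlots(int m) {
        List<List<Integer>> s = new ArrayList<List<Integer>>();
        for (int i = 0; i < m; i++) s.add(new ArrayList<Integer>());
        return s;
    }

    void insert(int k) {
        // Grow before the load factor would exceed the threshold.
        if ((double) (size + 1) / slots.size() > MAX_LOAD) rehash(2 * slots.size());
        slots.get(Math.floorMod(k, slots.size())).add(k);
        size++;
    }

    boolean find(int k) {
        return slots.get(Math.floorMod(k, slots.size())).contains(k);
    }

    // Every key must be re-inserted, because k mod m changes with m.
    private void rehash(int newM) {
        List<List<Integer>> old = slots;
        slots = newSlots(newM);
        for (List<Integer> chain : old) {
            for (int k : chain) slots.get(Math.floorMod(k, newM)).add(k);
        }
    }

    int capacity() { return slots.size(); }
}
```

Doubling keeps the total rehashing work proportional to the number of inserts, as with a vector. It also produces table sizes that are powers of 2, which the division method advises against, so a fuller implementation might instead pick a suitable prime near the doubled size.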

Example
• Given a set S of unique integers and a number z, find x, y ∈ S such that x + y = z
• An efficient worst case algorithm
• An efficient average case algorithm

An efficient worst case algorithm
• 1. Sort all elements in S: O(n log n).
• 2. For every x in S, search for y = z - x in S using binary search: n searches of O(log n) each.
• Total: O(n log n)
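The two steps above can be sketched as follows. The sketch assumes that x and y must be two different elements of S and that z - x does not overflow an int; the class name `PairSumSorted` is illustrative.

```java
import java.util.Arrays;

class PairSumSorted {
    // Returns {x, y} with x + y == z, or null if S contains no such pair.
    static int[] find(int[] s, int z) {
        int[] a = s.clone();
        Arrays.sort(a);                                  // step 1: O(n log n)
        for (int i = 0; i < a.length; i++) {
            int j = Arrays.binarySearch(a, z - a[i]);    // step 2: O(log n) per element
            if (j >= 0 && j != i) return new int[]{a[i], a[j]};
        }
        return null;
    }
}
```

For S = {8, 3, 11, 5} and z = 14 this returns the pair 3, 11. For z = 10 it returns null: 5 + 5 would work numerically, but 5 appears only once in S.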

An efficient average case algorithm
• 1. Use a hash table where m is of order n; for all x ∈ S we execute insert(x).
• 2. For all x ∈ S we execute search(z - x).
• Total: O(n) in the average case
• Total: O(n²) in the worst case
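A sketch of the hash-based version, using `java.util.HashSet` as the hash table. As an assumption, x and y must be two different elements of S; the class name `PairSumHashed` is illustrative.

```java
import java.util.HashSet;
import java.util.Set;

class PairSumHashed {
    // Returns {x, y} with x + y == z, or null if S contains no such pair.
    static int[] find(int[] s, int z) {
        Set<Integer> table = new HashSet<Integer>();
        for (int x : s) table.add(x);              // step 1: insert every x
        for (int x : s) {                          // step 2: search for z - x
            int y = z - x;
            if (y != x && table.contains(y)) return new int[]{x, y};
        }
        return null;
    }
}
```

Each insert and search is O(1) on average, which gives the O(n) average total. The quadratic worst case arises when many keys collide in the same slot.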

Example
• Given a collection S of sortable items, we are asked whether all items in S are unique.
• 1. Sort the elements of S.
• 2. Iterate over the sorted elements, looking for adjacent equal values.
• Execution time: O(n log n)

Example
• 1. Use a hash table where m is of order n. For all x ∈ S we execute insert(x). We modify the insert operation to signal if x already exists in the table (every insert includes a search operation).
• Execution time: O(n) in the average case
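In Java the "insert that signals duplicates" already exists: `Set.add` returns false when the element is present. A minimal sketch, with the class name `Uniqueness` chosen for illustration:

```java
import java.util.HashSet;
import java.util.Set;

class Uniqueness {
    // Returns true if no two items are equal (according to equals/hashCode).
    static <T> boolean allUnique(Iterable<T> items) {
        Set<T> seen = new HashSet<T>();
        for (T item : items) {
            if (!seen.add(item)) return false;   // add reports an existing element
        }
        return true;
    }
}
```

Unlike the sorting approach, this needs no order relation on the items, only consistent `equals` and `hashCode` methods.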

Java hashCode
• Every Java object has a method public int hashCode(), which is defined in class Object and exists to support hash-based collections such as Hashtable and HashMap.
• The default implementation typically returns a number derived from the object's identity (historically its memory location); distinct objects usually, but not necessarily, get distinct values.
• If two objects are equal according to equals(), they must have the same hash code.

Java hashCode
• Distinct objects are not required to have distinct hash codes, but distinct hash codes improve the performance of hashtables.
• Can the hash code of an object change throughout its life cycle?
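A small sketch of a class that keeps `equals` and `hashCode` consistent. The class `GridPoint` is hypothetical. Its fields are final, so the hash code cannot change while the object sits in a hash-based collection; a class that computes its hash code from mutable fields would return a different value after those fields change, and a table that stored the object under the old value would no longer find it.

```java
import java.util.Objects;

final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) o;
        return x == p.x && y == p.y;
    }

    // Built from the same fields as equals, so equal points get equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

Two separately constructed `GridPoint(1, 2)` objects are equal and share a hash code, so a `HashSet` containing one of them reports that it contains the other.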


JETT 2005 Session 5: Algorithms, Efficiency, Hashing and Hashtables
