This tutorial covers hash tables as an effective data structure supporting insertion, search, and deletion. It introduces direct addressing, collision handling, and hashing techniques such as the division and multiplication methods, and explains the collision-resolution strategies of separate chaining and open addressing together with the time complexities of each operation. It also shows how hash tables implement a data dictionary, how the load factor affects performance, and how the different probing methods work, before closing with tries as an alternative structure. Ideal for students and professionals seeking to master fundamental algorithms.
COMP 171 Data Structures and Algorithms Tutorial 10: Hash Tables
Data Dictionary • A data structure that supports: • Insert • Search • Delete • Examples: • Binary Search Tree • Red-Black Tree • B-Tree • Linked List
Hash Table • An effective data dictionary • Worst case: Θ(n) time • Under reasonable assumptions: O(1) expected time • Generalization of an array • Size proportional to the number of keys actually stored • Array index is computed from the key
Direct Address Tables • Universe U: • the set that contains all the possible keys • The table has |U| slots • Each key in U is mapped to one unique entry in the table • Insert, delete and search each take O(1) time • Works well when U is small • Impractical if U is large
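A minimal direct-address table sketch, assuming keys are small non-negative integers drawn from U = {0, 1, …, |U|−1}; the class and method names are illustrative, not from the tutorial.

```python
class DirectAddressTable:
    """Direct addressing: one slot for every possible key in the universe U."""
    def __init__(self, universe_size):
        self.slots = [None] * universe_size   # table has |U| slots

    def insert(self, key, value):             # O(1)
        self.slots[key] = value

    def search(self, key):                    # O(1)
        return self.slots[key]

    def delete(self, key):                    # O(1)
        self.slots[key] = None
```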
Hash Function • Assume the hash table has m slots • A hash function h is used to compute the slot from the key k • h maps U into the slots of the hash table • h: U → {0, 1, …, m−1} • Since |U| > m, at least 2 keys will have the same hash value: a collision • A "good" hash function minimizes the number of collisions
If the keys are not natural numbers: • Interpret them as natural numbers using a suitable radix notation • Example: character string interpreted as a radix-128 integer • Division method • h(k) = k mod m • m is usually a prime • Avoid m too close to an exact power of 2 • Ex 11.3-3 • Choose m = 2^p − 1 and let k be a character string interpreted in radix 2^p. Show that if x can be derived from y by permuting its characters, then h(x) = h(y). (Hint: 2^p ≡ 1 mod 2^p − 1, so each character contributes its value to k mod m regardless of its position; h(k) is simply the sum of the character values mod m, which is unchanged by permutation.)
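A sketch of the division method together with the radix-128 interpretation of a character string mentioned above; the prime 701 and the function names are illustrative choices, not part of the tutorial.

```python
def string_to_int(s, radix=128):
    """Interpret a character string as a natural number in radix-128 notation."""
    k = 0
    for ch in s:
        k = k * radix + ord(ch)
    return k

def hash_division(k, m=701):
    """Division method: h(k) = k mod m, with m a prime not too close to a power of 2."""
    return k % m

print(hash_division(string_to_int("key")))    # hash a string key via its radix-128 value
```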
Multiplication method • h(k) = ⌊ m ( k A mod 1 ) ⌋, 0 < A < 1 • The value of m is not critical • Usually m is chosen to be a power of 2 • The method works better with some values of A • e.g. A = (√5 − 1) / 2
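A sketch of the multiplication method with the commonly suggested A = (√5 − 1)/2 and an illustrative table size m = 2^14; the names are assumptions, not from the tutorial.

```python
import math

A = (math.sqrt(5) - 1) / 2                    # ~0.618, a value that tends to work well

def hash_multiplication(k, m=2**14):
    """Multiplication method: h(k) = floor(m * (k*A mod 1)); m can be a power of 2."""
    frac = (k * A) % 1.0                      # fractional part of k*A
    return int(m * frac)

print(hash_multiplication(123456))
```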
Separate Chaining • Put all the elements that hash to the same slot into a linked list • Each element is inserted at the head of the linked list • Worst case insertion: O(1) • Worst case search: O(n) • Worst case deletion: O(1) (given a pointer to the element, with a doubly linked list)
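A minimal separate-chaining sketch. Note that the slide's O(1) worst-case deletion assumes a doubly linked list and a pointer to the element; this sketch deletes by key, so it first walks the chain. Class and method names are illustrative.

```python
class ChainedHashTable:
    def __init__(self, m=8):
        self.m = m
        self.buckets = [[] for _ in range(m)]      # one chain (list) per slot

    def _slot(self, key):
        return hash(key) % self.m

    def insert(self, key, value):                  # O(1): insert at the head of the chain
        self.buckets[self._slot(key)].insert(0, (key, value))

    def search(self, key):                         # O(length of chain), O(n) worst case
        for k, v in self.buckets[self._slot(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):                         # walks the chain to find the element
        chain = self.buckets[self._slot(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return
```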
Given a hash table with m slots that stores n elements, we define the load factor α • α = n/m • Simple uniform hashing • Each element is equally likely to hash into any of the m slots, independently of where any other element has hashed to • Under simple uniform hashing • Average time for a search = Θ(1 + α)
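As a worked example with assumed numbers: storing n = 2000 elements in m = 1000 slots gives α = 2, so under simple uniform hashing a search examines about 1 + α = 3 items on average (one slot computation plus an expected chain of length 2); the cost stays constant as long as n = O(m).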
Open Addressing • Each table slot contains either an element or NIL • When a collision happens, we successively examine, or probe, the hash table until we find an empty slot for the key • Deletion is done by marking the slot as "Deleted" rather than NIL • The hash function h now takes two arguments: • the key and the probe number • h(k, i)
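A generic open-addressing sketch with a "Deleted" sentinel; the probe function h(k, i) is passed in as a parameter, and concrete probe functions appear in the sketches after the next two slides. Names and structure are illustrative, not the tutorial's own code.

```python
NIL = None
DELETED = object()                          # sentinel: slot once held a key, now deleted

class OpenAddressTable:
    def __init__(self, m, probe):
        self.m = m
        self.probe = probe                  # probe(k, i) -> slot index for probe number i
        self.slots = [NIL] * m

    def insert(self, key):
        for i in range(self.m):             # try probe numbers 0 .. m-1
            j = self.probe(key, i)
            if self.slots[j] is NIL or self.slots[j] is DELETED:
                self.slots[j] = key
                return j
        raise RuntimeError("hash table overflow")

    def search(self, key):
        for i in range(self.m):
            j = self.probe(key, i)
            if self.slots[j] is NIL:        # a never-used slot ends the search
                return None
            if self.slots[j] == key:        # DELETED slots are skipped, not stopped at
                return j
        return None

    def delete(self, key):
        j = self.search(key)
        if j is not None:
            self.slots[j] = DELETED         # mark "Deleted", not NIL
```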
Linear Probing • h(k, i) = ( h'(k) + i ) mod m • The initial probe determines the entire probe sequence, so there are only m distinct probe sequences • Primary clustering • Quadratic Probing • h(k, i) = ( h'(k) + c1·i + c2·i² ) mod m • Again there are only m distinct probe sequences • Secondary clustering
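Linear and quadratic probe functions matching the formulas above, assuming h'(k) = k mod m, c1 = c2 = 1, and an illustrative table size m = 13; either can be plugged into the open-addressing sketch above.

```python
M = 13                                             # illustrative table size

def h_prime(k):
    return k % M                                   # auxiliary hash function h'(k)

def linear_probe(k, i):
    return (h_prime(k) + i) % M                    # h(k, i) = (h'(k) + i) mod m

def quadratic_probe(k, i, c1=1, c2=1):
    return (h_prime(k) + c1 * i + c2 * i * i) % M  # h(k, i) = (h'(k) + c1*i + c2*i^2) mod m

# table = OpenAddressTable(M, linear_probe)        # plug into the sketch above
```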
Double Hashing • Makes use of 2 different hash functions • h(k, i) = ( h1(k) + i·h2(k) ) mod m • h2(k) must be relatively prime to m • Usually m is taken to be a prime number • The probe sequence depends on both hash functions, so there are Θ(m²) probe sequences • Double hashing is better than linear or quadratic probing
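A double-hashing probe in the same style, with an illustrative prime m = 13 and the common choices h1(k) = k mod m and h2(k) = 1 + (k mod (m − 1)), so h2(k) is never 0 and is relatively prime to the prime m.

```python
M = 13                                            # prime table size (illustrative)

def h1(k):
    return k % M

def h2(k):
    return 1 + (k % (M - 1))                      # in 1..M-1, so relatively prime to a prime M

def double_hash_probe(k, i):
    return (h1(k) + i * h2(k)) % M                # h(k, i) = (h1(k) + i*h2(k)) mod m

# table = OpenAddressTable(M, double_hash_probe)  # plug into the sketch above
```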
Trie • Assumption: • Digital data / radix • Tree structure is used • Insertion is done by creating a path of nodes from the root to the data • Deletion is done by removing the pointer that points to that element • Time Complexity: O(L) • Max # of keys for given L = 128^(L+1) − 1
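A minimal radix trie sketch over ASCII (radix 128): insert builds a path of nodes from the root, search follows it in O(L), and delete simply clears the stored reference, as the slide describes. Class names are illustrative.

```python
RADIX = 128                                # ASCII alphabet, as assumed on the slide

class TrieNode:
    def __init__(self):
        self.children = [None] * RADIX     # one pointer per possible character
        self.value = None                  # payload stored where a key ends

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):          # O(L): create a path of nodes from the root
        node = self.root
        for ch in key:
            c = ord(ch)
            if node.children[c] is None:
                node.children[c] = TrieNode()
            node = node.children[c]
        node.value = value

    def search(self, key):                 # O(L)
        node = self.root
        for ch in key:
            node = node.children[ord(ch)]
            if node is None:
                return None
        return node.value

    def delete(self, key):                 # O(L): remove the reference to the element
        node = self.root
        for ch in key:
            node = node.children[ord(ch)]
            if node is None:
                return
        node.value = None
```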
Memory Usage: • Node size × number of nodes • ((N+1) × pointer size) × (L × n) • N: radix • L: maximum length of the keys • n: number of keys • Improvement 1 • Put all nodes into an array of nodes • Replace pointers by array indices • Array index size = lg(L × n) bits
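As an illustration with assumed numbers (N = 128, L = 10, n = 1000, 8-byte pointers): each node takes (128 + 1) × 8 = 1032 bytes, there are at most L × n = 10,000 nodes, so roughly 10 MB in total; with Improvement 1, each child reference needs only lg(10,000) ≈ 14 bits instead of a 64-bit pointer.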
Improvement 2 • Eliminate nodes with a single child • Do skipping • Label each internal node with its position • Improvement 3 • De La Briandais Tree • Eliminate null pointers in the internal nodes • Saves memory when the child arrays are sparsely populated
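A minimal De La Briandais-style sketch, replacing each node's child array with a linked list of (character, child, sibling) nodes so that sparse levels carry no null pointers; the node layout and insert routine below are illustrative, not the tutorial's own code.

```python
class DLBNode:
    """De La Briandais node: one character plus child/sibling links, no null-filled array."""
    def __init__(self, ch):
        self.ch = ch
        self.child = None          # subtree for the next character of the key
        self.sibling = None        # next alternative character at this level
        self.value = None          # payload if some key ends at this node

def dlb_insert(root, key, value):
    """Insert a key by walking (or extending) the sibling chain for each character."""
    if not key:
        return root
    node, prev = root, None
    while node is not None and node.ch != key[0]:
        prev, node = node, node.sibling
    if node is None:                       # character not present at this level: add it
        node = DLBNode(key[0])
        if prev is None:
            root = node
        else:
            prev.sibling = node
    if len(key) == 1:
        node.value = value
    else:
        node.child = dlb_insert(node.child, key[1:], value)
    return root
```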