This tutorial covers hash tables as an effective data structure supporting insertion, search, and deletion. It introduces direct addressing, collision handling, and hashing techniques such as the division and multiplication methods, and explains the collision-resolution strategies of separate chaining and open addressing together with the time complexities of each operation. It also shows how hash tables implement a data dictionary, how the load factor affects performance, and how the different probing methods work, before closing with tries as an alternative structure. Ideal for students and professionals seeking to master fundamental algorithms.
COMP 171 Data Structures and Algorithms Tutorial 10: Hash Tables
Data Dictionary • A data structure that supports: • Insert • Search • Delete • Examples: • Binary Search Tree • Red-Black Tree • B-Tree • Linked List
Hash Table • An effective data dictionary • Worst case: Θ(n) time • Under reasonable assumptions: O(1) expected time • Generalization of an array • Size proportional to the number of keys actually stored • Array index is computed from the key
Direct Address Tables • Universe U: • the set that contains all the possible keys • The table has |U| slots • Each key in U is mapped to one unique entry in the table • Insert, delete and search each take O(1) time • Works well when U is small • Impractical if U is large
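A minimal direct-address table sketch, assuming keys are small non-negative integers drawn from U = {0, 1, …, |U|−1}; the class and method names are illustrative, not from the tutorial.

```python
class DirectAddressTable:
    """Direct addressing: one slot for every possible key in the universe U."""
    def __init__(self, universe_size):
        self.slots = [None] * universe_size   # table has |U| slots

    def insert(self, key, value):             # O(1)
        self.slots[key] = value

    def search(self, key):                    # O(1)
        return self.slots[key]

    def delete(self, key):                    # O(1)
        self.slots[key] = None
```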
Hash Function • Assume the hash table has m slots • A hash function h is used to compute the slot from the key k • h maps U into the slots of the hash table • h: U → {0, 1, …, m−1} • Since |U| > m, at least 2 keys will have the same hash value: a collision • A "good" hash function minimizes the number of collisions
If the keys are not natural numbers: • Interpret them as natural numbers using a suitable radix notation • Example: character string interpreted as a radix-128 integer • Division method • h(k) = k mod m • m is usually a prime • Avoid m too close to an exact power of 2 • Ex 11.3-3 • Choose m = 2^p − 1 and let k be a character string interpreted in radix 2^p. Show that if x can be derived from y by permuting its characters, then h(x) = h(y). (Hint: 2^p ≡ 1 mod 2^p − 1, so each character contributes its value to k mod m regardless of its position; h(k) is simply the sum of the character values mod m, which is unchanged by permutation.)
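A sketch of the division method together with the radix-128 interpretation of a character string mentioned above; the prime 701 and the function names are illustrative choices, not part of the tutorial.

```python
def string_to_int(s, radix=128):
    """Interpret a character string as a natural number in radix-128 notation."""
    k = 0
    for ch in s:
        k = k * radix + ord(ch)
    return k

def hash_division(k, m=701):
    """Division method: h(k) = k mod m, with m a prime not too close to a power of 2."""
    return k % m

print(hash_division(string_to_int("key")))    # hash a string key via its radix-128 value
```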
Multiplication method • h(k) = ⌊ m ( k A mod 1 ) ⌋, 0 < A < 1 • The value of m is not critical • Usually m is chosen to be a power of 2 • The method works better with some values of A • e.g. A = (√5 − 1) / 2
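A sketch of the multiplication method with the commonly suggested A = (√5 − 1)/2 and an illustrative table size m = 2^14; the names are assumptions, not from the tutorial.

```python
import math

A = (math.sqrt(5) - 1) / 2                    # ~0.618, a value that tends to work well

def hash_multiplication(k, m=2**14):
    """Multiplication method: h(k) = floor(m * (k*A mod 1)); m can be a power of 2."""
    frac = (k * A) % 1.0                      # fractional part of k*A
    return int(m * frac)

print(hash_multiplication(123456))
```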
Separate Chaining • Put all the elements that hash to the same slot into a linked list • Each element is inserted at the head of the linked list • Worst case insertion: O(1) • Worst case search: O(n) • Worst case deletion: O(1) (given a pointer to the element, with a doubly linked list)
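A minimal separate-chaining sketch. Note that the slide's O(1) worst-case deletion assumes a doubly linked list and a pointer to the element; this sketch deletes by key, so it first walks the chain. Class and method names are illustrative.

```python
class ChainedHashTable:
    def __init__(self, m=8):
        self.m = m
        self.buckets = [[] for _ in range(m)]      # one chain (list) per slot

    def _slot(self, key):
        return hash(key) % self.m

    def insert(self, key, value):                  # O(1): insert at the head of the chain
        self.buckets[self._slot(key)].insert(0, (key, value))

    def search(self, key):                         # O(length of chain), O(n) worst case
        for k, v in self.buckets[self._slot(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):                         # walks the chain to find the element
        chain = self.buckets[self._slot(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return
```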
Given a hash table with m slots that stores n elements, we define the load factor α • α = n/m • Simple uniform hashing • Each element is equally likely to hash into any of the m slots, independently of where any other element has hashed to • Under simple uniform hashing • Average time for a search = Θ(1 + α)
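As a worked example with assumed numbers: storing n = 2000 elements in m = 1000 slots gives α = 2, so under simple uniform hashing a search examines about 1 + α = 3 items on average (one slot computation plus an expected chain of length 2); the cost stays constant as long as n = O(m).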
Open Addressing • Each table slot contains either an element or NIL • When a collision happens, we successively examine, or probe, the hash table until we find an empty slot for the key • Deletion is done by marking the slot as "Deleted" rather than NIL • The hash function h now takes two arguments: • the key and the probe number • h(k, i)
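A generic open-addressing sketch with a "Deleted" sentinel; the probe function h(k, i) is passed in as a parameter, and concrete probe functions appear in the sketches after the next two slides. Names and structure are illustrative, not the tutorial's own code.

```python
NIL = None
DELETED = object()                          # sentinel: slot once held a key, now deleted

class OpenAddressTable:
    def __init__(self, m, probe):
        self.m = m
        self.probe = probe                  # probe(k, i) -> slot index for probe number i
        self.slots = [NIL] * m

    def insert(self, key):
        for i in range(self.m):             # try probe numbers 0 .. m-1
            j = self.probe(key, i)
            if self.slots[j] is NIL or self.slots[j] is DELETED:
                self.slots[j] = key
                return j
        raise RuntimeError("hash table overflow")

    def search(self, key):
        for i in range(self.m):
            j = self.probe(key, i)
            if self.slots[j] is NIL:        # a never-used slot ends the search
                return None
            if self.slots[j] == key:        # DELETED slots are skipped, not stopped at
                return j
        return None

    def delete(self, key):
        j = self.search(key)
        if j is not None:
            self.slots[j] = DELETED         # mark "Deleted", not NIL
```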
Linear Probing • h(k, i) = ( h'(k) + i ) mod m • The initial probe determines the entire probe sequence, so there are only m distinct probe sequences • Primary clustering • Quadratic Probing • h(k, i) = ( h'(k) + c1·i + c2·i² ) mod m • Again there are only m distinct probe sequences • Secondary clustering
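Linear and quadratic probe functions matching the formulas above, assuming h'(k) = k mod m, c1 = c2 = 1, and an illustrative table size m = 13; either can be plugged into the open-addressing sketch above.

```python
M = 13                                             # illustrative table size

def h_prime(k):
    return k % M                                   # auxiliary hash function h'(k)

def linear_probe(k, i):
    return (h_prime(k) + i) % M                    # h(k, i) = (h'(k) + i) mod m

def quadratic_probe(k, i, c1=1, c2=1):
    return (h_prime(k) + c1 * i + c2 * i * i) % M  # h(k, i) = (h'(k) + c1*i + c2*i^2) mod m

# table = OpenAddressTable(M, linear_probe)        # plug into the sketch above
```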
Double Hashing • Makes use of 2 different hash functions • h(k, i) = ( h1(k) + i·h2(k) ) mod m • h2(k) must be relatively prime to m • Usually m is taken to be a prime number • The probe sequence depends on both hash functions, so there are Θ(m²) probe sequences • Double hashing is better than linear or quadratic probing
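A double-hashing probe in the same style, with an illustrative prime m = 13 and the common choices h1(k) = k mod m and h2(k) = 1 + (k mod (m − 1)), so h2(k) is never 0 and is relatively prime to the prime m.

```python
M = 13                                            # prime table size (illustrative)

def h1(k):
    return k % M

def h2(k):
    return 1 + (k % (M - 1))                      # in 1..M-1, so relatively prime to a prime M

def double_hash_probe(k, i):
    return (h1(k) + i * h2(k)) % M                # h(k, i) = (h1(k) + i*h2(k)) mod m

# table = OpenAddressTable(M, double_hash_probe)  # plug into the sketch above
```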
Trie • Assumption: • Digital data / radix • Tree structure is used • Insertion is done by creating a path of nodes from the root to the data • Deletion is done by removing the pointer that points to that element • Time Complexity: O(L) • Max # of keys for given L = 128^(L+1) − 1
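A minimal radix trie sketch over ASCII (radix 128): insert builds a path of nodes from the root, search follows it in O(L), and delete simply clears the stored reference, as the slide describes. Class names are illustrative.

```python
RADIX = 128                                # ASCII alphabet, as assumed on the slide

class TrieNode:
    def __init__(self):
        self.children = [None] * RADIX     # one pointer per possible character
        self.value = None                  # payload stored where a key ends

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):          # O(L): create a path of nodes from the root
        node = self.root
        for ch in key:
            c = ord(ch)
            if node.children[c] is None:
                node.children[c] = TrieNode()
            node = node.children[c]
        node.value = value

    def search(self, key):                 # O(L)
        node = self.root
        for ch in key:
            node = node.children[ord(ch)]
            if node is None:
                return None
        return node.value

    def delete(self, key):                 # O(L): remove the reference to the element
        node = self.root
        for ch in key:
            node = node.children[ord(ch)]
            if node is None:
                return
        node.value = None
```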
Memory Usage: • Node size × number of nodes • ((N+1) × pointer size) × (L × n) • N: radix • L: maximum length of the keys • n: number of keys • Improvement 1 • Put all nodes into an array of nodes • Replace pointers by array indices • Array index size = lg(L × n) bits
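As an illustration with assumed numbers (N = 128, L = 10, n = 1000, 8-byte pointers): each node takes (128 + 1) × 8 = 1032 bytes, there are at most L × n = 10,000 nodes, so roughly 10 MB in total; with Improvement 1, each child reference needs only lg(10,000) ≈ 14 bits instead of a 64-bit pointer.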
Improvement 2 • Eliminate nodes with a single child • Do skipping • Label each internal node with its position • Improvement 3 • De La Briandais Tree • Eliminate null pointers in the internal nodes • Saves memory when the child arrays are sparsely populated
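A minimal De La Briandais-style sketch, replacing each node's child array with a linked list of (character, child, sibling) nodes so that sparse levels carry no null pointers; the node layout and insert routine below are illustrative, not the tutorial's own code.

```python
class DLBNode:
    """De La Briandais node: one character plus child/sibling links, no null-filled array."""
    def __init__(self, ch):
        self.ch = ch
        self.child = None          # subtree for the next character of the key
        self.sibling = None        # next alternative character at this level
        self.value = None          # payload if some key ends at this node

def dlb_insert(root, key, value):
    """Insert a key by walking (or extending) the sibling chain for each character."""
    if not key:
        return root
    node, prev = root, None
    while node is not None and node.ch != key[0]:
        prev, node = node, node.sibling
    if node is None:                       # character not present at this level: add it
        node = DLBNode(key[0])
        if prev is None:
            root = node
        else:
            prev.sibling = node
    if len(key) == 1:
        node.value = value
    else:
        node.child = dlb_insert(node.child, key[1:], value)
    return root
```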