
CS 261 – Data Structures


Presentation Transcript


  1. CS 261 – Data Structures Hash Tables Part II: Using Buckets

  2. Hash Tables, Review
  • Hash tables are similar to Vectors except…
    • Elements can be indexed by values other than integers
    • A single position may hold more than one element
  • Arbitrary values (hash keys) map to integers by means of a hash function
  • Computing a hash function is usually a two-step process:
    • Transform the value (or key) to an integer
    • Map that integer to a valid hash table index
  • Example: storing names
    • Compute an integer from a name
    • Map the integer to an index in a table (i.e., a vector, array, etc.)
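
As a concrete sketch of the two steps (not from the slides; the multiply-by-33 string scheme is just one common choice):

    /* Step 1: transform a string key into an integer. */
    unsigned int hashString(const char *key) {
        unsigned int h = 0;
        int i;
        for (i = 0; key[i] != '\0'; i++)
            h = h * 33 + (unsigned char) key[i];
        return h;
    }

    /* Step 2: map that integer to a valid table index. */
    int hashIndex(const char *key, int tablesize) {
        return (int)(hashString(key) % (unsigned int) tablesize);
    }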

  3. Hash Tables
  Say we’re storing names: Angie, Joe, Abigail, Linda, Mark, Max, Robert, John
  The hash function maps each name to a table index:
    0: Angie, Robert
    1: Linda
    2: Joe, Max, John
    3: (empty)
    4: Abigail, Mark

  4. Hash Tables: Resolving Collisions
  There are two general approaches to resolving collisions:
  • Open address hashing: if a spot is full, probe for the next empty spot
  • Chaining (or buckets): keep a linked list at each table entry
  Today we will look at option 2.

  5. Resolving Collisions: Chaining / Buckets
  Maintain a collection at each hash table entry:
  • Chaining/buckets: maintain a linked list (or other collection type data structure, such as an AVL tree) at each table entry
    0: Robert → Angie
    1: Linda
    2: Max → John → Joe
    3: (empty)
    4: Mark → Abigail
  (Each bucket lists newest first, since new links are added at the front; see the add code on slide 8.)

  6. Combining arrays and linked lists

    struct hashTable {
        struct link ** table;   /* array of bucket lists, initialized to null pointers */
        int tablesize;          /* number of buckets */
        int count;              /* number of elements stored */
        /* ... */
    };
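
The deck never shows struct link itself; a minimal definition consistent with the add code on slide 8 would be (EleType is whatever element type the table stores):

    struct link {
        EleType value;        /* the stored element */
        struct link * next;   /* next link in this bucket's list */
    };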

  7. Hash table init

    void hashTableInit(struct hashTable * ht, int size) {
        int i;
        ht->count = 0;
        ht->table = (struct link **) malloc(size * sizeof(struct link *));
        assert(ht->table != 0);
        ht->tablesize = size;
        for (i = 0; i < size; i++)
            ht->table[i] = 0;   /* null pointer */
    }
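
A minimal usage sketch (the table size 10 is arbitrary):

    struct hashTable ht;
    hashTableInit(&ht, 10);   /* all ten buckets start out as empty lists */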

  8. Adding a value to a hash table

    void hashTableAdd(struct hashTable * ht, EleType newValue) {
        /* find correct bucket, add to its list */
        int indx = abs(hashfun(newValue)) % ht->tablesize;
        struct link * newLink = (struct link *) malloc(sizeof(struct link));
        assert(newLink != 0);
        newLink->value = newValue;
        newLink->next = ht->table[indx];
        ht->table[indx] = newLink;   /* add to front of bucket */
        ht->count++;
        /* note: next step: reorganize if load factor > 3.0 */
    }

  9. Contains test, size
  • Contains: find the correct bucket, then see if the element is in its list (see the sketch below)
  • Remove: slightly more tricky, because you only want to decrement the count if the element is actually in the list
  • Alternative: instead of keeping a count in the hash table, call count on each list. What are the pros and cons of this?
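
A sketch of the contains test using the definitions above. It assumes hashfun from slide 8, <stdlib.h> for abs, and that EleType values can be compared with == (a richer element type would need a comparison function):

    int hashTableContains(struct hashTable * ht, EleType testValue) {
        int indx = abs(hashfun(testValue)) % ht->tablesize;   /* find correct bucket */
        struct link * cur;
        for (cur = ht->table[indx]; cur != 0; cur = cur->next)
            if (cur->value == testValue)   /* walk the list looking for a match */
                return 1;   /* found it */
        return 0;   /* not in this bucket, so not in the table */
    }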

  10. Remove
  • As with many data structures, remove is the tricky one
  • In order to remove from a singly linked list, you need a pointer to the preceding node
  • Use the same trick we talked about earlier in the term:
  • Keep a pointer to the “current” node (the one you are checking) and a pointer to the “previous” node (the one right before)
  • The previous pointer is null at the first link (a special case)
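
A sketch of remove using that current/previous trick, under the same assumptions as the contains sketch (hashfun, == comparison, <stdlib.h>):

    void hashTableRemove(struct hashTable * ht, EleType testValue) {
        int indx = abs(hashfun(testValue)) % ht->tablesize;
        struct link * cur = ht->table[indx];
        struct link * prev = 0;   /* null at the first link: the special case */
        while (cur != 0) {
            if (cur->value == testValue) {
                if (prev == 0)
                    ht->table[indx] = cur->next;   /* removing the first link */
                else
                    prev->next = cur->next;        /* splice cur out of the list */
                free(cur);
                ht->count--;   /* decrement only when an element was actually found */
                return;
            }
            prev = cur;
            cur = cur->next;
        }
    }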

  11. Hash Table Size
  • Load factor: λ = n / m, where n is the number of elements and m is the size of the table
  • So the load factor represents the average number of elements at each table entry
  • Want the load factor to remain small
  • Can do the same trick as open table hashing: if the load factor becomes larger than some fixed limit (say, 3.0), then double the table size
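
The slides state the doubling policy but not the code; here is one sketch consistent with the functions above (the name hashTableResize is illustrative). Existing links are re-threaded into the new array rather than reallocated, and count is unchanged:

    void hashTableResize(struct hashTable * ht) {
        struct link ** oldTable = ht->table;
        int oldSize = ht->tablesize;
        int i;

        /* allocate a new array with twice the buckets, all empty */
        ht->tablesize = 2 * oldSize;
        ht->table = (struct link **) malloc(ht->tablesize * sizeof(struct link *));
        assert(ht->table != 0);
        for (i = 0; i < ht->tablesize; i++)
            ht->table[i] = 0;

        /* re-hash every link into the new table */
        for (i = 0; i < oldSize; i++) {
            struct link * cur = oldTable[i];
            while (cur != 0) {
                struct link * next = cur->next;
                int indx = abs(hashfun(cur->value)) % ht->tablesize;
                cur->next = ht->table[indx];
                ht->table[indx] = cur;
                cur = next;
            }
        }
        free(oldTable);
    }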

  12. Hash Tables: Algorithmic Complexity
  • Assumptions:
    • Time to compute the hash function is constant
    • Chaining uses a linked list
  • Worst case analysis: all values hash to the same position
  • Best case analysis: the hash function uniformly distributes the values (all buckets have the same number of objects in them)
  • Find element operation:
    • Worst case for open addressing: O(n)
    • Worst case for chaining: O(n), or O(log n) if each bucket uses an AVL tree
    • Best case for open addressing: O(1)
    • Best case for chaining: O(1)

  13. Hash Tables: Average Case
  • Assuming that the hash function distributes elements uniformly (a BIG if)
  • Then the average case for all operations is O(λ)
  • So you want to try to keep the load factor relatively small
  • You can do this by resizing the table (doubling the size) if the load factor is larger than some fixed limit, say 10
  • But that only improves things IF the hash function distributes values uniformly
  • What happens if the hash value is always zero?

  14. So when should you use hash tables?
  • Your data values must be objects with good hash functions defined (string, Double)
  • Or you need to write your own definition of hashCode
  • You need to know that the values are uniformly distributed
  • If you can’t guarantee that, then a skip list or AVL tree is often faster

  15. Your turn
  • Now do the worksheet to implement a hash table with buckets
  • Run down the linked list for the contains test
  • Think about how to do remove
  • Keep track of the number of elements
  • Resize the table if the load factor is bigger than 3.0
  Questions?
