Understanding Hashing & Map Interface in C++

CS2 in C++ Peer Instruction Materials by Cynthia Bailey Lee is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.Permissions beyond the scope of this license may be available at http://peerinstruction4cs.org. CS106X – Programming Abstractions in C++ Cynthia Bailey Lee

Today’s Topics: Map Interface, continued • Hashing: collisions • The birthday problem • Hashing: live coding • Tree traversals (time permiting)

Hash collisions • We may need to worry about hash collisions • Hash collision: • Two keys a, b, a≠b, have the same hash code index (i.e. hash(a) = hash(b))

Hash key collisions • We can NOT overwrite the value the way we would if it really were the same key • Use a linked list chain extending from each hash table array entry to allow multiple distinct keys to share a single hash table location

Map Interface: hash-map.h … private: struct node { Key key; Value value; node *next; }; node **buckets; intnumBuckets; int count; int hash(const Key& key) const; }; #include "hash-map-impl.h“

HashMap

HashMap TRUE or FALSE: We don’t actually need to store the keys in the nodes—we only really need the key to find which index, then store the value there • TRUE • FALSE

Hash key collisions & Big-O of HashMap • If there are no collisions, find/add/remove are all O(1)—just compute the key and go! • Two factors for ruining this magical land of instantaneous lookup: • Too-small table (worst case = 1) • Hash function doesn’t produce a good spread intawfulHashFunction(char* input, int size) { return 4; } • Find/add/remove all O(n) worst case http://xkcd.com/221/

One simple, common approach • Hash table (array) size, numBuckets, is some prime number • Keep numBuckets big enough that at most 70% of the slots are filled (resize as needed) • Somehow convert your data to an int and then intarray_index = input % numBuckets; • Can we still get super unlucky? YES • Hashmap analysis of O(1) is “average case” • For a nicely designed hash: 99.9% of the time

The Birthday Problem • Commonly mentioned fact of probability theory: • In a group of n people, what are the odds that 2 people have the same birthday? • (assume birthdays are uniformly distributed across 365 days, which is wrong in a few ways) • For n ≥ 23, it is more likely than not • For n ≥ 57, >99% chance • Means that hash tables almost certainly have at least one collision http://commons.wikimedia.org/wiki/File:Birthday_Paradox.svg Creative Commons share-alike license.

(Live coding of hash-map.h)

Quickly Back to Trees: Tree Traversals!

What does this print? void traverse(Node *node) { if (node != NULL) { cout << node->key << " "; traverse(node->left); traverse(node->right); } } • A B C D E F • A B D E C F • D B E F C A • D E B F C A • Other/none/more A B C D E F

What does this print? void traverse(Node *node) { if (node != NULL) { traverse(node->left); cout << node->key << " "; traverse(node->right); } } • A B C D E F • A B D E C F • D B E F C A • D E B F C A • Other/none/more A B C D E F

How can we get this to print our ABCs in order? A void traverse(Node *node) { if (node != NULL) { cout << node->key << " "; traverse(node->left); traverse(node->right); } } • You can’t—we already tried moving the cout to all 3 possible places. • You can but you use a stack instead of recursion. • You can but you use a queue instead of recursion. • Other/none/more B C D E F

That prints the ABC nodes in order, but notice that our ABC tree isn’t a BST. How do we print BST nodes in order? void traverse(Node *node) { if (node != NULL) { traverse(node->left); traverse(node->right); } } • Use the BFS (queue instead of recursion). • Use the DFS and put cout first. • Use the DFS and put cout between. • Use the DFS and put cout last. • Other/none/more

We can play the same game with non-alpha characters as keys: What does this print? void traverse(Node *node) { if (node != NULL) { traverse(node->left); traverse(node->right); cout<< node->key << " "; } } • * + / 3 4 8 2 • * + 3 4 / 2 • 3+4 * 8 /2 • 34 + 8 2 / * • Other/none/more * + / 3 4 8 2

Remember our old friend:

Practical uses of inorder/preorder/postorder traversals void traverse(Node *node) { if (node != NULL) { traverse(node->left); traverse(node->right); cout<< node->key << " "; } } To calculate the answer: postorder To print this in human-friendly format: inorder * + / 3 4 8 2

Understanding Hashing & Map Interface in C++