Hash Tables

Briana B. Morrison
Adapted from William Collins

Hashing

Sequential Search
• Given a vector of integers: v = {12, 15, 18, 3, 76, 9, 14, 33, 51, 44}
• What is the best case for sequential search?
• O(1), when the value is the first element
• What is the worst case?
• O(n), when the value is the last element or is not in the list
• What is the average case?
• About n/2 comparisons for a successful search, which is O(n)

Binary Search
• Given a sorted vector of integers: v = {3, 9, 12, 14, 15, 18, 33, 44, 51, 76}
• What is the best case for binary search?
• O(1), when the element is the middle element
• What is the worst case?
• O(log n), when the element is first, last, or not in the list (about log2 n halvings of the search range)
• What is the average case?
• O(log n)

Map vs. Hashmap
• What are the differences between a map and a hashmap?
• Interface
• Efficiency
• Applications
• Implementation

Contiguous: array? vector? deque? heap?
Linked: linked list? list? map?
But none of these will give constant average time for searches, insertions, and removals.

To make these values fit into the table, we need to mod by the table size, i.e., key % 1000.
[Figure: keys reduced to the indices 210, 256 and 816, annotated "OOPS!"]

Hash Codes
• Suppose we have a table of size N
• A hash code is:
• A number in the range 0 to N-1
• We compute the hash code from the key
• You can think of this as a "default position" when inserting, or a "position hint" when looking up
• A hash function is a way of computing a hash code
• Desire: the set of keys should spread evenly over the N values
• When two keys have the same hash code: collision

Hash Functions
• A hash function should be quick and easy to compute.
• A hash function should achieve an even distribution of the keys that actually occur across the range of indices for both random and non-random data.
• Calculation should involve the entire search key.

Examples of Hash Functions
• Usually involves taking the key, chopping it up, and mixing the pieces together in various ways
• Examples:
• Truncation: ignore part of the key and use the remaining part as the index
• Folding: partition the key into several parts and combine the parts in a convenient way (adding, etc.)
• After calculating the index, use modular arithmetic: divide by the size of the index range and take the remainder as the result

Example Hash Function

Devising Hash Functions
• Simple functions often produce many collisions
• ... but complex functions may not be good either!
• It is often an empirical process
• Adding the letter values in a string gives the same hash for strings with the same letters in a different order (e.g., "stop" and "pots")
• Better approach:
    size_t hash = 0;
    for (size_t i = 0; i < s.size(); ++i)
        hash = hash * 31 + s[i];

Devising Hash Functions (2)
• The String hash is good in that:
• Every letter affects the value
• The order of the letters affects the value
• The values tend to be spread well over the integers

Devising Hash Functions (3)
• Guidelines for good hash functions:
• Spread values evenly: as if "random"
• Cheap to compute
• Generally, the number of possible hash values should be much greater than the table size

Hash Code Maps
• Memory address: we reinterpret the memory address of the key object as an integer. Good in general, except for numeric and string keys (two equal keys stored in different objects would get different hash codes)
• Integer cast: we reinterpret the bits of the key as an integer. Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., char, short, int and float on many machines)
• Component sum: we partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and sum the components, ignoring overflows. Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double on many machines)

Hash Code Maps (cont.)
• Polynomial accumulation: we partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits) $a_0, a_1, \ldots, a_{n-1}$
• We evaluate the polynomial $p(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_{n-1} z^{n-1}$ at a fixed value $z$, ignoring overflows
• Especially suitable for strings (e.g., the choice $z = 33$ gives at most 6 collisions on a set of 50,000 English words)
• Polynomial $p(z)$ can be evaluated in $O(n)$ time using Horner's rule: the following polynomials are successively computed, each from the previous one in $O(1)$ time:
    $p_0(z) = a_{n-1}$
    $p_i(z) = a_{n-i-1} + z\,p_{i-1}(z)$ for $i = 1, 2, \ldots, n-1$
• We have $p(z) = p_{n-1}(z)$