Hash Table Chapter 12
Hash Table • So far, the best worst-case time for searching is O(log n). • Hash tables have • an average search time of O(1), • but a worst-case search time of O(n).
Learning Objectives • Develop the motivation for hashing. • Study hash functions. • Understand collision resolution and compare and contrast various collision resolution schemes. • Summarize the average running times for hashing under various collision resolution schemes. • Explore the java.util.HashMap class.
12.1 Motivation • Let's design a data structure using an array for which the indices could be the keys of entries. • Suppose we wanted to store the keys 1, 3, 5, 8, 10, with a guaranteed one-step access to any of these.
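The sketch below shows this idea in Java. Each key is used directly as the array index, so every lookup is a single array access. The class name `DirectAddressTable` and the `"entry-k"` values are hypothetical.

```java
// Direct addressing: the key itself is the array index (hypothetical class name).
class DirectAddressTable {
    // slots[k] holds the value stored under key k, or null if k is absent.
    private final String[] slots;

    DirectAddressTable(int maxKey) {
        // One slot for every possible key 0..maxKey, however few are actually stored.
        slots = new String[maxKey + 1];
    }

    void put(int key, String value) {
        slots[key] = value;
    }

    String get(int key) {
        return slots[key]; // one-step access
    }

    int capacity() {
        return slots.length;
    }

    public static void main(String[] args) {
        DirectAddressTable t = new DirectAddressTable(10);
        for (int k : new int[] {1, 3, 5, 8, 10}) {
            t.put(k, "entry-" + k);
        }
        System.out.println(t.get(8));     // entry-8
        System.out.println(t.get(4));     // null: key 4 was never stored
        System.out.println(t.capacity()); // 11 slots for 5 entries
    }
}
```

The array needs a slot for every key in the range 0..10, not just for the five keys actually stored.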
12.1 Motivation • The space consumption does not depend on the actual number of entries stored. • It depends on the range of keys. • What if we wanted to store strings? • For each string, we would first have to compute a numeric key that is equivalent to it. • java.lang.String.hashCode() computes the numeric equivalent (or hashcode) of a string by an arithmetic manipulation involving its individual characters.
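The `java.lang.String.hashCode()` documentation defines the result as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed in `int` arithmetic. The sketch below recomputes that formula with Horner's rule and checks it against the library. It prints the real Java hashcodes, which need not match the illustrative values in the chapter's figures. The class name `StringHash` is hypothetical.

```java
// Recomputes String.hashCode() to show how characters become a numeric key.
class StringHash {
    // Horner's rule for s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].
    // int arithmetic wraps on overflow, exactly as String.hashCode() does.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"cat", "ear", "sad", "aid"}) {
            System.out.println(s + " -> " + hash(s) + " (String.hashCode(): " + s.hashCode() + ")");
        }
    }
}
```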
12.1 Motivation • Using numeric keys directly as indices is out of the question for most applications. • There isn't enough space for an array with one slot per possible key.
12.2 Hashing • A simple hash function • table size of 10 • h(k) = k mod 10
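Here is the mapping function above in Java. The use of `Math.floorMod` instead of `%` is our own choice. Java's `%` returns a negative remainder for a negative hashcode, and `String.hashCode()` can produce one. `floorMod`, by contrast, always yields a location in 0..9. The class name is hypothetical.

```java
// Maps a numeric hashcode to one of 10 table locations: h(k) = k mod 10.
class ModMapping {
    static final int TABLE_SIZE = 10;

    static int h(int k) {
        // floorMod keeps the result in 0..TABLE_SIZE-1 even for negative k.
        return Math.floorMod(k, TABLE_SIZE);
    }

    public static void main(String[] args) {
        for (int k : new int[] {1, 3, 5, 8, 10, 23, -7}) {
            System.out.println(k + " -> " + h(k));
        }
        // 10 -> 0; 23 -> 3 and -7 -> 3, both colliding with key 3
    }
}
```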
12.2 Hashing • ear collides with cat at position 4. • There is empty space in the table, and it is up to the collision resolution scheme to find an appropriate position for this string. • A better mapping function • For any hash function one could devise, there are always hashcodes that could force the mapping function to be ineffective by generating lots of collisions.
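For the mod mapping in particular, this claim is easy to check. Whatever table size n is chosen, the keys r, r + n, r + 2n, ... all land in location r. The sketch below builds such keys for a few table sizes. The helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// For h(k) = k mod n, builds keys that all collide, whatever n is.
class ForcedCollisions {
    static int slot(int key, int n) {
        return Math.floorMod(key, n);
    }

    // Keys r, r + n, r + 2n, ... share the remainder r, so they share a slot.
    static List<Integer> collidingKeys(int n, int r, int count) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            keys.add(r + i * n);
        }
        return keys;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 11, 101}) {
            for (int key : collidingKeys(n, 4, 5)) {
                System.out.print(key + "->" + slot(key, n) + "  ");
            }
            System.out.println();
        }
    }
}
```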
12.3 Collision Resolution • There are two ways to resolve collisions. • open addressing • Find another location for the colliding key within the hash table. • closed addressing • store all keys that hash to the same location in a data structure that “hangs off” that location.
12.3.1 Linear Probing • As more and more entries are hashed into the table, they tend to form clusters that get bigger and bigger. • The number of probes on collisions gradually increases, slowing insertion and search to a crawl.
12.3.1 Linear Probing • Insert "cat", "ear", "sad", and "aid"
12.3.1 Linear Probing • Clustering is the downfall of linear probing, so we need to look to another method of collision resolution that avoids clustering.
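Below is a minimal linear-probing sketch. Integer keys stand in for the strings' hashcodes, and the mapping is h(k) = k mod N. On a collision the code tries h + 1, h + 2, ..., wrapping around the end of the table. It also counts probes, so you can see the cost of a growing cluster. Class and method names are hypothetical.

```java
// Open addressing with linear probing over integer keys (hypothetical sketch).
class LinearProbing {
    private final Integer[] table; // null marks an empty slot
    int probes;                    // total slots examined by all insertions

    LinearProbing(int capacity) {
        table = new Integer[capacity];
    }

    // Tries slots h, h+1, h+2, ... (mod N); returns the slot used, or -1 if full.
    int insert(int key) {
        int n = table.length;
        int h = Math.floorMod(key, n);
        for (int i = 0; i < n; i++) {
            int slot = (h + i) % n;
            probes++;
            if (table[slot] == null) {
                table[slot] = key;
                return slot;
            }
        }
        return -1;
    }

    // Search follows the same probe sequence; an empty slot means "not present".
    boolean contains(int key) {
        int n = table.length;
        int h = Math.floorMod(key, n);
        for (int i = 0; i < n; i++) {
            Integer occupant = table[(h + i) % n];
            if (occupant == null) {
                return false;
            }
            if (occupant == key) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LinearProbing t = new LinearProbing(10);
        for (int key : new int[] {12, 22, 32, 13}) {
            System.out.println(key + " -> slot " + t.insert(key));
        }
        System.out.println("probes: " + t.probes);
    }
}
```

Key 13 hashes to slot 3. The earlier keys have already filled slots 2..4, so 13 is pushed to slot 5 and the cluster grows to 2..5. These four insertions take 9 probes in total.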
12.3.2 Quadratic Probing • Avoids clustering • When the probing stops with a failure to find an empty spot, as many as half the locations of the table may still be unoccupied. • For a key that hashes to 2, the probe locations 2, 3, 6, 0, 7, and 5 are endlessly repeated, and the insertion fails even though half the table is empty.
12.3.2 Quadratic Probing • For any given prime N, once a location is examined twice, all locations that are examined thereafter are also ones that have been already examined.
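The repeating sequence 2, 3, 6, 0, 7, 5 is produced by the probe function (h + i*i) mod N with h = 2 and N = 11. The table size 11 is our inference: it is the one that yields exactly that sequence. Here is a small Java sketch with hypothetical names:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Quadratic probing: the i-th probe for home slot h is (h + i*i) mod n.
class QuadraticProbing {
    static int probe(int h, int i, int n) {
        return (h + i * i) % n;
    }

    // Distinct slots reached in the first `steps` probes, in first-visit order.
    static Set<Integer> distinctProbes(int h, int n, int steps) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (int i = 0; i < steps; i++) {
            seen.add(probe(h, i, n));
        }
        return seen;
    }

    public static void main(String[] args) {
        int n = 11; // assumed prime table size
        for (int i = 0; i < 12; i++) {
            System.out.print(probe(2, i, n) + " ");
        }
        System.out.println(); // 2 3 6 0 7 5 5 7 0 6 3 2
        // However many probes are made, only (n + 1) / 2 = 6 slots are ever visited.
        System.out.println(distinctProbes(2, n, 1000)); // [2, 3, 6, 0, 7, 5]
    }
}
```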
12.3.3 Chaining • If a collision occurs at location i of the hash table, chaining simply adds the colliding entry to a linked list that is built at that location.
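Below is a minimal chaining sketch that keeps one `java.util.LinkedList` per table location. It assumes non-null keys and the mapping hashCode() mod N. The class and method names are hypothetical.

```java
import java.util.LinkedList;

// Closed addressing: each table location holds a linked list (chain) of entries.
class ChainedHashTable<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;

        Node(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final LinkedList<Node<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int capacity) {
        buckets = (LinkedList<Node<K, V>>[]) new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Non-null keys assumed; floorMod handles negative hashcodes.
    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(K key, V value) {
        LinkedList<Node<K, V>> chain = buckets[indexFor(key)];
        for (Node<K, V> node : chain) {
            if (node.key.equals(key)) {
                node.value = value; // existing key: replace its value
                return;
            }
        }
        chain.add(new Node<>(key, value)); // a colliding key simply joins the chain
    }

    V get(K key) {
        for (Node<K, V> node : buckets[indexFor(key)]) {
            if (node.key.equals(key)) {
                return node.value;
            }
        }
        return null;
    }

    int chainLength(int index) {
        return buckets[index].size();
    }

    public static void main(String[] args) {
        ChainedHashTable<Integer, String> t = new ChainedHashTable<>(10);
        t.put(3, "three");
        t.put(13, "thirteen"); // collides with 3 at location 3
        t.put(23, "twenty-three");
        System.out.println(t.get(13));        // thirteen
        System.out.println(t.chainLength(3)); // 3
    }
}
```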
Running times • We assume that the hashing process itself (hashcode and mapping) takes O(1). • Running time of insertion is determined by the collision resolution scheme.
12.4 The java.util.HashMap Class • Consider a university-wide database that stores student records. • Every student is assigned a unique id (key), with which are associated several pieces of information such as name, address, credits, gpa, etc. • These pieces of information constitute the value.
12.4 The java.util.HashMap Class • A StudentInfo dictionary stores (id, info) pairs for all the students enrolled in the university. • The operations corresponding to this relationship can be found in java.util.Map<K,V>.
12.4 The java.util.HashMap Class • The Map interface also provides operations to enumerate all the keys, enumerate all the values, get the size of the dictionary, check whether the dictionary is empty, and so on. • The java.util.HashMap implements the dictionary abstraction as specified by the java.util.Map interface. It resolves collisions using chaining.
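The sketch below builds the StudentInfo dictionary with the real `java.util.Map` and `java.util.HashMap` APIs. The `StudentInfo` class, with only two fields, and the sample ids and names are hypothetical. `HashMap` does not specify an iteration order, so the enumeration loops may print in any order.

```java
import java.util.HashMap;
import java.util.Map;

// A StudentInfo dictionary: student id (key) -> student information (value).
class StudentDirectory {
    // Hypothetical value class holding two of the fields the text mentions.
    static final class StudentInfo {
        final String name;
        final double gpa;

        StudentInfo(String name, double gpa) {
            this.name = name;
            this.gpa = gpa;
        }

        @Override
        public String toString() {
            return name + " (gpa " + gpa + ")";
        }
    }

    static Map<Integer, StudentInfo> sample() {
        Map<Integer, StudentInfo> students = new HashMap<>();
        students.put(1001, new StudentInfo("Ada", 3.9));
        students.put(1002, new StudentInfo("Alan", 3.7));
        return students;
    }

    public static void main(String[] args) {
        Map<Integer, StudentInfo> students = sample();
        System.out.println(students.get(1001));         // search by id
        System.out.println(students.containsKey(1003)); // false
        System.out.println(students.size() + " " + students.isEmpty());
        for (Integer id : students.keySet()) {          // enumerate the keys
            System.out.println(id);
        }
        for (StudentInfo info : students.values()) {    // enumerate the values
            System.out.println(info);
        }
        students.remove(1002);                          // delete by id
        System.out.println(students.size());            // 1
    }
}
```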
12.4.1 Table and Load Factor • When the no-arg constructor is used • Default initial capacity of 16 • Default load factor of 0.75 • The table size is defined as the actual number of key-value mappings in the hash table.
12.4.1 Table and Load Factor • We can choose an initial capacity • HashMap only uses capacities that are powers of 2, so a requested capacity is rounded up • 101 becomes 128
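You can picture the rounding rule as a loop that doubles until it reaches the requested capacity. The sketch below illustrates the rule only and is not quoted from `HashMap`'s source. The name `roundUpToPowerOfTwo` is hypothetical.

```java
// Rounds a requested capacity up to the next power of 2 (e.g. 101 -> 128).
class CapacityRounding {
    // Assumes 1 <= requested <= 2^30 so the doubling cannot overflow.
    static int roundUpToPowerOfTwo(int requested) {
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1; // double until it covers the request
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int requested : new int[] {101, 16, 17, 1}) {
            System.out.println(requested + " -> " + roundUpToPowerOfTwo(requested));
        }
        // 101 -> 128, 16 -> 16, 17 -> 32, 1 -> 1
    }
}
```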
12.4.1 Table and Load Factor • An initial capacity of 128.
12.4.2 Storage of Entries • Relevant fields in the HashMap class. • threshold is the size threshold at which the table is resized • It is the product of the capacity N and the load factor t (N * t); with the defaults, 16 * 0.75 = 12.
12.4.2 Storage of Entries • Entry[] table sets up an array of chains. • Map.Entry<K,V> is defined inside the Map<K,V> interface. • next holds a reference to the next Entry in its linked list.
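Here is a simplified chain node along these lines. It models only the fields named here: key, value, and next. The real `HashMap` node carries more state. The class name `ChainEntry` is hypothetical. The class implements `Map.Entry<K,V>`, so it exposes `getKey`, `getValue`, and `setValue`.

```java
import java.util.Map;

// One link in a bucket's chain: a key-value pair plus a reference to the next entry.
// (equals/hashCode from Map.Entry's contract are omitted in this sketch.)
class ChainEntry<K, V> implements Map.Entry<K, V> {
    final K key;
    V value;
    ChainEntry<K, V> next; // next entry in the same linked list, or null at the end

    ChainEntry(K key, V value, ChainEntry<K, V> next) {
        this.key = key;
        this.value = value;
        this.next = next;
    }

    @Override
    public K getKey() {
        return key;
    }

    @Override
    public V getValue() {
        return value;
    }

    @Override
    public V setValue(V newValue) {
        V old = value;
        value = newValue;
        return old;
    }

    public static void main(String[] args) {
        // table is an array of chains; bucket 3 holds two entries linked through next.
        @SuppressWarnings("unchecked")
        ChainEntry<String, Integer>[] table = (ChainEntry<String, Integer>[]) new ChainEntry[8];
        table[3] = new ChainEntry<>("b", 2, null);
        table[3] = new ChainEntry<>("a", 1, table[3]); // new entry placed at the head
        for (ChainEntry<String, Integer> e = table[3]; e != null; e = e.next) {
            System.out.println(e.getKey() + "=" + e.getValue()); // a=1, then b=2
        }
    }
}
```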
12.4.3 Adding an Entry • Example • Name serves as a key to the phone number value.
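The same example with the real `java.util.HashMap` API looks like this. The names and numbers are made up. `put` returns the value previously associated with the key, or `null` if there was none.

```java
import java.util.HashMap;
import java.util.Map;

// Name (key) -> phone number (value), stored in a java.util.HashMap.
class PhoneBook {
    static Map<String, String> sample() {
        Map<String, String> phones = new HashMap<>();
        phones.put("Alice", "555-0100");
        phones.put("Bob", "555-0199");
        return phones;
    }

    public static void main(String[] args) {
        Map<String, String> phones = sample();
        // Adding an existing key replaces its value; put returns the old one.
        String previous = phones.put("Alice", "555-0111");
        System.out.println(previous);            // 555-0100
        System.out.println(phones.get("Alice")); // 555-0111
        System.out.println(phones.size());       // 2
    }
}
```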
12.4.3 Adding an Entry • If the key argument is null, a special object, NULL_KEY, is returned; otherwise the argument key is returned as is.
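The sketch below shows this null-key substitution with a single sentinel object. The helper names `maskNull` and `unmaskNull` are illustrative here. The last lines use the real `java.util.HashMap` to show the observable effect: `null` works as an ordinary key.

```java
import java.util.HashMap;
import java.util.Map;

// Substitutes a sentinel object for a null key so the table never stores null keys.
class NullKeyMasking {
    static final Object NULL_KEY = new Object(); // sentinel standing in for null

    // null becomes NULL_KEY; any other key is returned as is.
    static Object maskNull(Object key) {
        return key == null ? NULL_KEY : key;
    }

    // Reverses the substitution when a key is handed back to the caller.
    static Object unmaskNull(Object key) {
        return key == NULL_KEY ? null : key;
    }

    public static void main(String[] args) {
        System.out.println(maskNull(null) == NULL_KEY); // true
        System.out.println(maskNull("Alice"));          // Alice

        Map<String, String> phones = new HashMap<>();
        phones.put(null, "555-0000");                   // java.util.HashMap permits a null key
        System.out.println(phones.get(null));           // 555-0000
    }
}
```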
12.4.3 Adding an Entry • Example • h = 25 and length = 16 • The binary representations of h and length - 1 are 11001 and 01111; their bitwise and is 01001, which is 9 = 25 mod 16.
12.4.3 Adding an Entry • Since length is a power of 2, say $2^k$, its binary representation is 100...0 with k zeros, and length - 1 is k ones. • Any h is expressible as $h = c \cdot 2^k + r$, with $0 \le r < 2^k$. • r is the result of the bitwise and, since the $c \cdot 2^k$ part occupies only bits at positions k and higher, which the and zeroes out. So h & (length - 1) = h mod length.
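Here is the masking step in Java. The method name `indexFor` is our label for it. Because `length - 1` has its low k bits set to 1, `h & (length - 1)` equals `h mod length`. For a negative `h`, the same two's-complement mask also yields the non-negative remainder, matching `Math.floorMod`.

```java
// Maps hashcode h to a bucket index for a table whose length is a power of 2.
class IndexFor {
    // length - 1 is k one-bits, so the AND keeps only the low k bits of h.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int h = 25, length = 16;
        System.out.println(Integer.toBinaryString(h));          // 11001
        System.out.println(Integer.toBinaryString(length - 1)); // 1111
        System.out.println(indexFor(h, length));                // 9
        System.out.println(h % length);                         // 9
        System.out.println(indexFor(-7, 16));                   // 9, same as Math.floorMod(-7, 16)
    }
}
```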
12.4.3 Adding an Entry • The if statement triggers a rehashing process if the size is equal to or greater than the threshold.
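Below is a compact sketch of adding an entry under this resize rule. It assumes integer keys, the default capacity 16, and the default load factor 0.75, so the threshold is 16 * 0.75 = 12. The capacity doubles on each resize. All names are hypothetical, and the code is a simplification rather than `HashMap`'s source. Rehashing recomputes every entry's index against the new length.

```java
// Chained table that doubles its capacity once size reaches threshold = capacity * loadFactor.
class ResizingTable {
    private static final class Entry {
        final int key;
        String value;
        Entry next;

        Entry(int key, String value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private static final float LOAD_FACTOR = 0.75f;
    private Entry[] table = new Entry[16];
    private int threshold = (int) (16 * LOAD_FACTOR); // 12
    private int size;

    private static int indexFor(int h, int length) {
        return h & (length - 1); // length is always a power of 2
    }

    void put(int key, String value) {
        int i = indexFor(Integer.hashCode(key), table.length);
        for (Entry e = table[i]; e != null; e = e.next) {
            if (e.key == key) {
                e.value = value; // existing key: replace, size unchanged
                return;
            }
        }
        table[i] = new Entry(key, value, table[i]); // new entry at the head of the chain
        size++;
        if (size >= threshold) {
            resize(2 * table.length);
        }
    }

    // Rehash: every entry's index is recomputed against the new, larger length.
    private void resize(int newCapacity) {
        Entry[] newTable = new Entry[newCapacity];
        for (Entry head : table) {
            Entry e = head;
            while (e != null) {
                Entry next = e.next; // save before relinking e
                int i = indexFor(Integer.hashCode(e.key), newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
        table = newTable;
        threshold = (int) (newCapacity * LOAD_FACTOR);
    }

    String get(int key) {
        for (Entry e = table[indexFor(Integer.hashCode(key), table.length)]; e != null; e = e.next) {
            if (e.key == key) {
                return e.value;
            }
        }
        return null;
    }

    int capacity() {
        return table.length;
    }

    public static void main(String[] args) {
        ResizingTable t = new ResizingTable();
        for (int k = 0; k < 12; k++) {
            t.put(k, "v" + k);
            System.out.println("size " + (k + 1) + " -> capacity " + t.capacity());
        }
        // capacity stays 16 for sizes 1..11 and becomes 32 when size reaches 12
    }
}
```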
12.5 Quadratic Probing: Repetition of Probe Locations • Quadratic probing examines only about N/2 locations of the table before it starts to repeat locations. • Suppose a key is hashed to location h, where there is a collision. • The locations examined are $(h + i^2) \bmod N$ for $i = 0, 1, 2, \dots$: h itself, then h + 1, h + 4, h + 9, and so on, modulo N.
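12.5 Quadratic Probing: Repetition of Probe Locations • Suppose two different probes i and j, with $0 \le j < i$, end up at the same location. • Then $(h + i^2) \bmod N = (h + j^2) \bmod N$, so N divides $i^2 - j^2 = (i + j)(i - j)$.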
12.5 Quadratic Probing: Repetition of Probe Locations • Since N is a prime number, it must divide one of the factors $(i + j)$ or $(i - j)$. • N divides $(i - j)$ only when at least N probes have already been made. • Otherwise N divides $(i + j)$, which happens at the earliest when $i + j = N$. • That is, $j = N - i$: probe $N - i$ revisits the location of probe i.
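The argument can be written out as a single derivation. Probe i lands at $(h + i^2) \bmod N$, and two probes $i > j$ are compared:

```latex
\begin{aligned}
(h + i^2) \bmod N = (h + j^2) \bmod N,\ 0 \le j < i < N
  &\;\Longrightarrow\; N \mid (i^2 - j^2) = (i + j)(i - j) \\
0 < i - j < N
  &\;\Longrightarrow\; N \nmid (i - j)
  \;\Longrightarrow\; N \mid (i + j) \\
0 < i + j < 2N
  &\;\Longrightarrow\; i + j = N
  \;\Longrightarrow\; j = N - i
\end{aligned}
```

So probe N − i always revisits the location of probe i. Among probes 0, 1, ..., (N − 1)/2, however, every pair has 0 < i + j < N, so no two of them coincide. An odd prime N therefore yields exactly (N + 1)/2 distinct locations, for example 6 of the 11 locations when N = 11.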
12.6 Summary • A hash table implements the dictionary operations of insert, search, and delete on (key, value) pairs. • Given a key, a hash function for a given hash table computes an index into the table as a function of the key by first obtaining a numeric hashcode, and then mapping this hashcode to a table location.
12.6 Summary • When a new key hashes to a location in the hash table that is already occupied, it is said to collide with the occupying key. • Collision resolution is the process used upon collision to determine an unoccupied location in the hash table where the colliding key may be inserted. • In searching for a key, the same hash function and collision resolution scheme must be used as for its insertion.
12.6 Summary • A good hash function must run in O(1) time and must distribute entries uniformly over the hash table. • Open addressing relocates a colliding entry in the hash table itself. Closed addressing stores all entries that hash to a location in a data structure that “hangs off” that location. • Linear probing and quadratic probing are instances of open addressing, while chaining is an instance of closed addressing.
12.6 Summary • Linear probing leads to clustering of entries with the clusters becoming increasingly larger as more and more collisions occur. Clustering degrades performance significantly. • Quadratic probing attempts to reduce clustering. On the other hand, quadratic probing may leave as many as half the hash table empty while reporting failure to insert a new entry.
12.6 Summary • Chaining is the simplest way to resolve collisions and also results in better performance than linear probing or quadratic probing. • The worst-case search time for linear probing, quadratic probing, and chaining is O(n). • The load factor of a hash table is the ratio of the number of keys, n, to the capacity, N.
12.6 Summary • The average performance of chaining depends on the load factor. For a perfect hash function that always distributes keys uniformly, the average search time for chaining is O(1 + n/N), which is O(1) as long as the load factor is kept bounded by a constant.