Chapter 8 Hashing

Presentation Transcript


  1. Chapter 8 Hashing Part II

  2. Introduction • Suppose we need to perform insertion, searching, and deletion on a dictionary (symbol table). Is it possible to perform these operations in O(1) time?

  3. Introduction • If we can find a mapping from a key to an index, then we can locate a record quickly according to its key and access it directly (random access). (Figure: records S1, S2, S3, … stored at indices 0, 1, 2, ….)

  4. Introduction • This mapping can be illustrated as follows: Key → h → i. • Hashing: define a function h so that h(Key) = i, where h is called a hash function. • Two kinds: • Static hashing • Dynamic hashing

  5. 8.2 Static Hashing

  6. Definition • In static hashing, identifiers/keys are stored in a fixed-size table called a hash table. • Bucket: each bucket has its own address and is capable of holding a key. (Figure: the hash function h maps an identifier x to a bucket address h(x); the table consists of Bucket 0, Bucket 1, Bucket 2, …, Bucket n, each with slots slot1 and slot2.)

  7. Definition • Slot: each bucket may consist of s slots that hold synonyms. • i1 and i2 are synonyms if h(i1) = h(i2). • Distinct synonyms enter the same bucket as long as the bucket has slots available.

  8. Example • Number of buckets: • Number of slots for each bucket: • Define the hash function f(x) = {i | i is the order of the initial of x}, that is, the alphabetical position of the first letter of x. • A and A2 are synonyms. • GA and GB are synonyms. • If “Doll” enters, it will be put in bucket _______ (according to the hash function). (Figure: a table with rows Bucket 0 through Bucket 25 and columns slot1 and slot2; Bucket 0 holds A and A2, and D, GA, and GB appear in later buckets.)

  9. Overflow and Collision • Overflow occurs when a new identifier is mapped into a full bucket. • Collision occurs when two non-identical identifiers are hashed into the same bucket. • If each bucket has only one slot, then overflow and collision occur simultaneously. • Example: Bucket 0 has two slots, holding A and A2. If A3 enters bucket 0, A3 collides with A and A2. The bucket overflows as well.
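
The difference between a collision and an overflow can be sketched in Python. This is only an illustration, not code from the slides: the names `first_letter_hash` and `insert` are made up here, and the constants (26 buckets, 2 slots per bucket) are assumptions read off the example figure.

```python
NUM_BUCKETS = 26        # assumed: Bucket 0 .. Bucket 25, as in the figure
SLOTS_PER_BUCKET = 2    # assumed: slot1 and slot2, as in the figure


def first_letter_hash(identifier):
    """Return the 0-based alphabetical position of the first letter."""
    return ord(identifier[0].upper()) - ord("A")


def insert(table, identifier):
    """Try to store an identifier; return a (collision, overflow) pair."""
    bucket = table[first_letter_hash(identifier)]
    # Collision: a different identifier already hashed to this bucket.
    collision = any(other != identifier for other in bucket)
    # Overflow: the home bucket has no free slot left.
    overflow = len(bucket) >= SLOTS_PER_BUCKET
    if not overflow:
        bucket.append(identifier)
    return collision, overflow


table = [[] for _ in range(NUM_BUCKETS)]
results = {name: insert(table, name) for name in ["A", "A2", "A3", "GA", "GB"]}
```

With two slots per bucket, A2 collides with A without overflowing, while A3 both collides and overflows. Setting `SLOTS_PER_BUCKET = 1` would make every collision an overflow as well.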

  10. 8.2.2 Hash Functions • Ideally, we expect to find a hash function that is one-to-one and easy to compute. • The hash function f(x) = {i | i is the order of the initial of x} can result in a lot of collisions because it considers only the initial character. • Key point: use as many characters of the identifier as possible.

  11. Common Approaches • Division • Mid-square • Folding • Digit Analysis

  12. Division • The most widely used hash function. • The key k is divided by some number D, and the remainder is used as the bucket address: h(k) = k % D. • Since bucket addresses range from 0 to b-1 when there are b buckets, D is usually chosen to be the number of buckets b.

  13. Selecting The Divisor • When the divisor is an even number, odd integers hash into odd home buckets and even integers into even home buckets. • 20%14 = 6, 30%14 = 2, 8%14 = 8 • 15%14 = 1, 3%14 = 3, 23%14 = 9 • When the divisor is an odd number, both odd and even integers may hash into any home bucket. • 20%15 = 5, 30%15 = 0, 8%15 = 8 • 15%15 = 0, 3%15 = 3, 23%15 = 8 • With an odd divisor, a bias toward odd or even keys does not result in a bias toward either the odd or even home buckets. • This gives a better chance of uniformly distributed home buckets. • So do not use an even divisor.
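
The parity effect can be checked with a short Python sketch. The function name `division_hash` is chosen here for illustration; the keys and the divisors 14 and 15 are the ones used on the slide.

```python
def division_hash(key, divisor):
    """Division hash function: the remainder is the home bucket."""
    return key % divisor


keys = [20, 30, 8, 15, 3, 23]

for divisor in (14, 15):
    homes = {key: division_hash(key, divisor) for key in keys}
    # True when every key lands in a home bucket of the same parity.
    parity_preserved = all(key % 2 == home % 2 for key, home in homes.items())
    print(divisor, homes, "parity preserved:", parity_preserved)
```

For the even divisor 14 every key keeps its parity, so a key set that is mostly even would fill mostly even buckets. For the odd divisor 15 the parity is not preserved (for example, 20 goes to bucket 5).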

  14. Selecting The Divisor • A similar biased distribution of home buckets is seen, in practice, when the divisor is a multiple of small prime numbers such as 3, 5, 7, … • The effect of each prime divisor p of b decreases as p gets larger. • Ideally, choose b so that it is a prime number. • Alternatively, choose b so that it has no prime factor smaller than 20.
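
One way to screen candidate divisors against these two rules is sketched below. The helper names `smallest_prime_factor` and `is_acceptable_divisor` are hypothetical, and combining the rules as "b is prime, or b has no prime factor below 20" is an interpretation of the slide, not something it states explicitly.

```python
def smallest_prime_factor(n):
    """Return the smallest prime factor of n (n itself when n is prime)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factor = 2
    while factor * factor <= n:
        if n % factor == 0:
            return factor
        factor += 1
    return n


def is_acceptable_divisor(b, bound=20):
    """Accept b when it is prime or has no prime factor smaller than bound."""
    spf = smallest_prime_factor(b)
    return spf == b or spf >= bound


candidates = [14, 15, 17, 221, 529]
accepted = [b for b in candidates if is_acceptable_divisor(b)]
```

Here 17 passes because it is prime and 529 = 23 × 23 passes because its only prime factor is at least 20, while 14, 15, and 221 = 13 × 17 are rejected.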

  15. Mid-square • Square the key and then use an appropriate number of bits from the middle of the square as the bucket address. • Example: suppose a character is represented in 6 bits and the table has 2^r buckets. The identifier A1 is encoded as 01 34 (octal) = 000001 011100 (binary) = 92 (decimal). 92 × 92 = 8464 = 010000100010000 (binary), and the middle r bits of this square are used as the bucket address.

  16. Mid-square • Example: Key = 113586 and m = 10000, so 9999 is the largest bucket address. • Squaring the key gives 12901779396. • Taking four digits from the middle of the square gives h(x) = 1779.
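
The decimal mid-square example can be reproduced with the sketch below. The function name `mid_square_hash` is made up for illustration. When the digits left over cannot be split evenly on both sides, this sketch drops the extra digit on the left; that tie-breaking rule is an assumption chosen to match the slide's result, not something the slide specifies.

```python
def mid_square_hash(key, width):
    """Square the key and return `width` decimal digits from the middle."""
    digits = str(key * key)
    if len(digits) < width:
        digits = digits.zfill(width)  # pad very small squares with zeros
    # Index of the first middle digit (extra digit, if any, dropped on the left).
    start = (len(digits) - width + 1) // 2
    return int(digits[start:start + width])


address = mid_square_hash(113586, 4)  # square is 12901779396
```

For key 113586 and four-digit addresses (m = 10000) this yields 1779.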

  17. Folding • The key k is partitioned into several parts, all of the same length except possibly the last. The parts are then added together to obtain the hash address of k. • Two schemes: • Shift folding • Folding at the boundaries • Example: the key 12320324111220 is partitioned into P1 = 123, P2 = 203, P3 = 241, P4 = 112, P5 = 20.

  18. Folding • Shift folding: P1 + P2 + P3 + P4 + P5 = 123 + 203 + 241 + 112 + 20 = 699. • Folding at the boundaries (the digits of P2 and P4 are reversed): 123 + 302 + 241 + 211 + 20 = 897.
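
Both folding schemes can be sketched in a few lines of Python. The function names are illustrative, and the key is passed as a string of decimal digits with a part length of 3, which is how the slide's example is partitioned.

```python
def split_parts(key_digits, part_len):
    """Cut the digit string into parts of part_len (the last may be shorter)."""
    return [key_digits[i:i + part_len]
            for i in range(0, len(key_digits), part_len)]


def shift_folding(key_digits, part_len):
    """Add the parts as they are."""
    return sum(int(part) for part in split_parts(key_digits, part_len))


def boundary_folding(key_digits, part_len):
    """Reverse every second part (P2, P4, ...) before adding."""
    total = 0
    for index, part in enumerate(split_parts(key_digits, part_len)):
        if index % 2 == 1:
            part = part[::-1]
        total += int(part)
    return total


key = "12320324111220"
shift_address = shift_folding(key, 3)        # 123 + 203 + 241 + 112 + 20
boundary_address = boundary_folding(key, 3)  # 123 + 302 + 241 + 211 + 20
```

The two results are 699 and 897. In practice the sum would still have to be reduced to the range of valid bucket addresses, a step the slide does not show.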

  19. Overflow Handling • An overflow occurs when the home bucket for a new pair (key, element) is full. • We may handle overflows by: • Searching the hash table in some systematic fashion for a bucket that is not full: linear probing (linear open addressing), quadratic probing, or rehashing. • Eliminating overflows by permitting each bucket to keep a list of all pairs for which it is the home bucket: array linear list or chain.

  20. Linear Probing • Also called linear open addressing. • Search the buckets one by one until an empty one is found. • Procedure (b denotes the number of buckets): • Compute h(k). • Examine the hash table buckets in the order ht[h(k)], ht[(h(k)+1)%b], …, ht[(h(k)+j)%b] until one of the following happens: • ht[(h(k)+j)%b] has a pair whose key is k; k is found. • ht[(h(k)+j)%b] is empty; k is not in the table. • The search returns to ht[h(k)]; the table is full.

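  21. Linear Probing • divisor = b (number of buckets) = 17. • Bucket address = key % 17. • Insert pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45. • Resulting table: [0] 34, [1] 0, [2] 45, [6] 6, [7] 23, [8] 7, [11] 28, [12] 12, [13] 29, [14] 11, [15] 30, [16] 33; buckets 3 to 5, 9, and 10 are empty.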
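
The linear probing procedure can be sketched as follows. This is a minimal version that stores bare keys instead of (key, element) pairs and holds one key per bucket; the names `lp_insert` and `lp_search` are chosen for illustration.

```python
EMPTY = None


def lp_insert(ht, key):
    """Insert key with linear probing; return the bucket index used."""
    b = len(ht)
    home = key % b
    for j in range(b):
        i = (home + j) % b
        if ht[i] is EMPTY:
            ht[i] = key
            return i
        if ht[i] == key:          # key is already in the table
            return i
    raise OverflowError("hash table is full")


def lp_search(ht, key):
    """Return (index or None, number of buckets examined)."""
    b = len(ht)
    home = key % b
    for j in range(b):
        i = (home + j) % b
        if ht[i] is EMPTY:        # empty bucket: key is not in the table
            return None, j + 1
        if ht[i] == key:          # key found
            return i, j + 1
    return None, b                # back at the home bucket: table is full


ht = [EMPTY] * 17
for key in [6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45]:
    lp_insert(ht, key)
```

Inserting the slide's keys in the given order places 45 in bucket 2, far from its home bucket 11, because buckets 11 through 16 and then 0 and 1 are already occupied when it arrives.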

  22. Linear Probing • Consider: when 51 enters the table built on the previous slide, how many comparisons are required? • Linear open addressing tends to create clusters. • These clusters become larger as more synonyms enter.

  23. Quadratic Probing • Suppose i is used as the increment. • When overflow occurs, the search is carried out by examining h(x), (h(x)+i^2)%b, and (h(x)-i^2)%b for 1 ≤ i ≤ (b-1)/2, where b is a prime number of the form 4j+3. • For example, b = 3, 7, 11, …, 43, 59, …
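
The probe order can be written as a small Python generator. The name `quadratic_probe_sequence` is illustrative. Note that Python's `%` returns a non-negative result for a positive modulus, so `(home - i * i) % b` needs no extra adjustment; in C or Java the negative case would have to be handled separately.

```python
def quadratic_probe_sequence(home, b):
    """Yield home, then (home + i^2) % b and (home - i^2) % b
    for i = 1 .. (b - 1) // 2."""
    yield home % b
    for i in range(1, (b - 1) // 2 + 1):
        yield (home + i * i) % b
        yield (home - i * i) % b


probes_b7 = list(quadratic_probe_sequence(3, 7))
probes_b11 = list(quadratic_probe_sequence(0, 11))
```

For b = 7 and home bucket 3 the probes are 3, 4, 2, 0, 6, 5, 1. When b is a prime of the form 4j+3, the b probes are all distinct, so every bucket is examined exactly once.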

  24. Rehashing • Use a series of hash functions h_1, h_2, …, h_m to find an empty bucket. • If overflow occurs at h_i(x), then try h_(i+1)(x). (Figure: x is passed through h_1, h_2, …, h_m in turn, ending at h_m(x).)
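
A rough sketch of rehashing is given below. The slide does not say which hash functions make up the series, so the three functions here are purely hypothetical placeholders, as are the name `rehash_insert` and the table size of 17.

```python
def rehash_insert(ht, key, hash_functions):
    """Try h_1, h_2, ..., h_m in turn until a usable bucket is found."""
    for h in hash_functions:
        i = h(key)
        if ht[i] is None or ht[i] == key:
            ht[i] = key
            return i
    raise OverflowError("every hash function led to an occupied bucket")


b = 17
# Hypothetical series h_1, h_2, h_3 (not taken from the slides).
hash_functions = [
    lambda k: k % b,
    lambda k: (k // b) % b,
    lambda k: (k * k) % b,
]

ht = [None] * b
placed = [rehash_insert(ht, k, hash_functions) for k in (6, 23, 40)]
```

All three keys have home bucket 6 under the first function. Key 6 takes it, and 23 and 40 overflow there and are placed by the second function in buckets 1 and 2.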

  25. Chaining • Disadvantage of linear probing: identifiers with different hash values are compared with one another. • Use a linked list to connect the identifiers with the same hash value and to increase the capacity of a bucket. • Example (the keys from slide 21, b = 17): [0] 0 → 34, [6] 6 → 23, [7] 7, [11] 11 → 28 → 45, [12] 12 → 29, [13] 30, [16] 33; all other chains are empty.
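
Chaining can be sketched as below. For brevity this uses a Python list per bucket in place of a hand-written linked list, and it keeps each chain in ascending order, which is an assumption made to match the order shown in the figure. The names `chain_insert` and `chain_search` are illustrative.

```python
import bisect


def chain_insert(ht, key):
    """Add key to the chain of its home bucket, keeping the chain sorted."""
    chain = ht[key % len(ht)]
    if key not in chain:
        bisect.insort(chain, key)


def chain_search(ht, key):
    """Only keys with the same hash value are compared."""
    return key in ht[key % len(ht)]


ht = [[] for _ in range(17)]
for key in [6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45]:
    chain_insert(ht, key)
```

Searching for 45 now examines only the chain of bucket 11 (11, 28, 45), instead of walking through buckets that hold keys with other hash values.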
