Hashing

rona

Presentation Transcript


  1. Hashing

  2. Motivation
     • The primary goal is to locate the desired record in a single disk access.
     • Sequential search: O(N)
     • B+ trees: O(log_k N)
     • Hashing: O(1)
     • In hashing, the key of a record is transformed into an address and the record is stored at that address.
     • Hash-based indexes are best for equality selections. They cannot support range searches.
     • Static and dynamic hashing techniques exist.
     CENG 351

  3. Hash-based Index
     • Data entries are kept in buckets (an abstract term).
     • Each bucket is a collection of one primary block and zero or more overflow blocks.
     • Given a search key value, k, we can find the bucket where the data entry k* is stored as follows:
        • Use a hash function, denoted by h.
        • The value of h(k) is the address for the desired bucket.
     • h(k) should distribute the search key values uniformly over the collection of buckets.

  4. Hash Functions
     • Key mod N:
        • N is the size of the table; a prime N works better.
     • Folding:
        • e.g. split the key into parts, 123|456|789, add the parts, and take the sum mod N.
     • Truncation:
        • e.g. map 123456789 to a table of 1000 addresses by picking 3 digits of the key.
     • Squaring:
        • Square the key and then truncate.
     • Radix conversion:
        • e.g. treat the digits 1 2 3 4 as a base-11 number; truncate if necessary.
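
The five techniques on this slide can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names, the choice of the last 3 digits for truncation, and the choice of the middle digits after squaring are assumptions, since the slide does not say which digits to keep.

```python
def mod_hash(key: int, table_size: int) -> int:
    """Key mod N; a prime table_size tends to spread keys better."""
    return key % table_size


def folding_hash(key: int, table_size: int, part_len: int = 3) -> int:
    """Split the decimal digits into parts, add the parts, take the sum mod N."""
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % table_size


def truncation_hash(key: int, num_digits: int = 3) -> int:
    """Keep num_digits digits of the key (assumption: the last ones)."""
    return key % 10 ** num_digits


def squaring_hash(key: int, num_digits: int = 3) -> int:
    """Square the key, then truncate (assumption: keep the middle digits)."""
    squared = str(key * key)
    start = max(0, (len(squared) - num_digits) // 2)
    return int(squared[start:start + num_digits])


def radix_hash(key: int, table_size: int, base: int = 11) -> int:
    """Read the decimal digits of the key as a base-11 number, then reduce mod N."""
    return int(str(key), base) % table_size


print(folding_hash(123456789, 1000))   # 123 + 456 + 789 = 1368 -> 368
print(truncation_hash(123456789))      # 789
print(squaring_hash(1234))             # 1234^2 = 1522756 -> 227
print(radix_hash(1234, 1000))          # 1234 in base 11 = 1610 -> 610
```

Here "truncate if necessary" for radix conversion is modeled as a final mod by the table size, which is one possible reading of the slide.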

  5. Static Hashing
     • Primary area: the number of primary pages is fixed; pages are allocated sequentially and never de-allocated (say M buckets).
     • A simple hash function: h(k) = f(k) mod M
     • Overflow area: disjoint from the primary area. It keeps buckets which hold records whose key maps to a full bucket.
     • Adding the address of an overflow bucket to a primary area bucket is called chaining.
     • A collision does not cause a problem as long as there is still room in the mapped bucket. Overflow occurs during insertion when a record is hashed to a bucket that is already full.

  6. Example
     • Assume f(k) = k. Let M = 5. So, h(k) = k mod 5.
     • Bucket factor = 3 records.
     • Insert records with keys: 12, 35, 44, 60, 6, 46, 57, 33, 62, 17

     Bucket   Primary area   Overflow
     0        35, 60
     1        6, 46
     2        12, 57, 62     17
     3        33
     4        44
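
The example can be replayed with a short Python sketch of static hashing with an overflow area. The function names and the list-of-lists layout are illustrative assumptions; a real file would store buckets as disk blocks linked by addresses.

```python
M = 5      # number of primary buckets (from the slide)
BKFR = 3   # bucket factor: records per primary bucket (from the slide)


def build_static_hash_file(keys, num_buckets=M, bucket_factor=BKFR):
    """Insert keys using h(k) = k mod num_buckets, spilling into overflow when full."""
    primary = [[] for _ in range(num_buckets)]
    overflow = [[] for _ in range(num_buckets)]   # one overflow chain per bucket
    for key in keys:
        bucket = key % num_buckets
        if len(primary[bucket]) < bucket_factor:
            primary[bucket].append(key)           # collision, but still room
        else:
            overflow[bucket].append(key)          # bucket already full: overflow
    return primary, overflow


def lookup(key, primary, overflow):
    """Search the primary bucket first, then its overflow chain."""
    bucket = key % len(primary)
    return key in primary[bucket] or key in overflow[bucket]


keys = [12, 35, 44, 60, 6, 46, 57, 33, 62, 17]
primary, overflow = build_static_hash_file(keys)
for bucket in range(M):
    print(bucket, primary[bucket], overflow[bucket])
```

Only key 17 lands in the overflow area, because bucket 2 already holds 12, 57, and 62 when 17 arrives.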

  7. Load Factor (Packing density)
     • To limit the amount of overflow we allocate more space to the primary area than we need (i.e. the primary area will be, say, 70% full).
     • Load Factor = (# of records in the file) / (# of spaces in primary area)
       => Lf = n / (M * Bkfr)
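
As a small worked sketch of the formula, the Python below computes Lf and, going the other way, the number of primary buckets needed to hit a target load factor. The helper names are hypothetical, and the sample numbers reuse n = 10, M = 5, and Bkfr = 3 from the earlier example.

```python
import math


def load_factor(n: int, num_buckets: int, bkfr: int) -> float:
    """Lf = n / (M * Bkfr): records stored divided by record slots in the primary area."""
    return n / (num_buckets * bkfr)


def buckets_needed(n: int, bkfr: int, target_lf: float) -> int:
    """Smallest M for which the load factor does not exceed target_lf."""
    return math.ceil(n / (target_lf * bkfr))


print(round(load_factor(10, 5, 3), 3))   # 10 / 15
print(buckets_needed(10, 3, 0.7))        # primary buckets for a 70% target
```

With 10 records and a bucket factor of 3, a 70% target calls for 5 primary buckets, which gives an actual load factor of about 0.67.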

  8. Effects of Lf and Bkfr
     • Performance can be enhanced by the choice of bucket size and load factor.
     • In general, a smaller load factor means
        • less overflow and a faster fetch time;
        • but more wasted space.
     • A larger Bkfr means
        • less overflow in general,
        • but slower fetch.

  9. Insertion and Deletion
     • Insertion: New records are inserted at the end of the chain.
     • Deletion: Two ways are possible:
        • Mark the record to be deleted.
        • Consolidate sparse buckets when deleting records.
     • In the 2nd approach:
        • When a record is deleted, fill its place with the last record in the chain of the current bucket.
        • Deallocate the last bucket when it becomes empty.
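
Insertion at the end of the chain and the second (consolidating) deletion approach might look like the Python sketch below. The `BucketChain` class is a hypothetical in-memory model: each inner list stands for one block, with the first being the primary block and the rest overflow blocks.

```python
class BucketChain:
    """One bucket: a primary block followed by zero or more overflow blocks."""

    def __init__(self, bucket_factor: int):
        self.bucket_factor = bucket_factor
        self.blocks = [[]]   # blocks[0] is the primary block and is never deallocated

    def insert(self, key: int) -> None:
        """Add the record at the end of the chain, allocating a block if needed."""
        if len(self.blocks[-1]) == self.bucket_factor:
            self.blocks.append([])
        self.blocks[-1].append(key)

    def delete(self, key: int) -> bool:
        """Fill the deleted slot with the last record in the chain, then shrink."""
        for block in self.blocks:
            if key in block:
                idx = block.index(key)
                last_block = self.blocks[-1]
                last_record = last_block.pop()
                # If the deleted record was itself the last one, nothing moves.
                if not (block is last_block and idx == len(last_block)):
                    block[idx] = last_record
                # Deallocate the last overflow block once it is empty.
                if not last_block and len(self.blocks) > 1:
                    self.blocks.pop()
                return True
        return False


chain = BucketChain(bucket_factor=3)
for key in (12, 57, 62, 17):
    chain.insert(key)
print(chain.blocks)   # primary block full, 17 in an overflow block
chain.delete(57)
print(chain.blocks)   # 17 moved into the freed slot, overflow block released
```

The first approach from the slide (marking records as deleted) avoids moving records but leaves the freed space unused until some later cleanup, which is why it is not modeled here.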
