
Multiple Choice Hash Tables with Moves on Deletes and Inserts

Adam Kirsch, Michael Mitzenmacher. Hashing: Modern Perspective. In many settings (e.g., hardware for routers), multiple choice hash tables are state of the art. Each item gets d possible hash locations and is placed in one of them.


Presentation Transcript


  1. Multiple Choice Hash Tables with Moves on Deletes and Inserts • Adam Kirsch, Michael Mitzenmacher

  2. Hashing: Modern Perspective • For many situations (e.g., hardware for routers), multiple choice hash tables are state of the art. • Each item gets d possible hash locations and is placed in one of them. • Moving items among their choices (e.g., cuckoo hashing) greatly improves space utilization. • Only cost: an insert may take many moves.

  3. Previously • Schemes that move at most 1 item per insertion: limit the cost of cuckoo hashing. • Schemes that batch move operations in a queue: amortize the cost of cuckoo hashing. • Using content addressable memories (CAMs) to reduce the chance of overflow: small CAMs yield big gains.

  4. Contributions • Consider potential of moving items on deletions. • Focus on one move per deletion/insertion. • Examine alternative approach using weaker hashing from [KTC, Peacock Hashing]. • Analyze limits of performance.

  5. Multilevel Hash Table [BK90] • Use a multilevel hash table (MHT). • Can store n elements with d = log log n + O(1) levels in O(n) space with high probability. • Example with d = 4 hash functions. [Figure: levels 1 to 4, with item x] • Skew: more elements are placed by early hash functions (double exponential decay).
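A minimal Python sketch of standard MHT insertion may help make the top-down fill concrete. It is illustrative only: the class name, the halving level sizes (borrowed from the simulation slide), and the use of a salted built-in `hash` as a stand-in for independent hash functions are all assumptions, not details taken from the talk.

```python
import random


class MultilevelHashTable:
    """Sketch of an MHT: d sub-tables, one item per bucket (assumed layout)."""

    def __init__(self, top_size, d=4, seed=0):
        # Assumption: each level is half the size of the one above it.
        self.sizes = [max(1, top_size >> i) for i in range(d)]
        self.levels = [[None] * size for size in self.sizes]
        rng = random.Random(seed)
        # Assumption: one random salt per level imitates d independent hashes.
        self.salts = [rng.getrandbits(64) for _ in range(d)]

    def _slot(self, level, item):
        # Items are assumed to be integers, so hash() is deterministic.
        return hash((self.salts[level], item)) % self.sizes[level]

    def insert(self, item):
        """Place item in the first level whose bucket is free.

        Returns the level index, or None if all d buckets are occupied
        (the overflow case that a CAM would absorb).
        """
        for level in range(len(self.levels)):
            slot = self._slot(level, item)
            if self.levels[level][slot] is None:
                self.levels[level][slot] = item
                return level
        return None

    def lookup(self, item):
        """Return the level holding item, or None if it is not stored."""
        for level in range(len(self.levels)):
            if self.levels[level][self._slot(level, item)] == item:
                return level
        return None
```

Because every item tries level 1 first, most items settle in the early levels, which is the skew the slide describes.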

  6. Second Chance (SC) Scheme • Standard MHT fills from the top down: elements cascade from table to table. • We try to slow the cascade at every step. [Figure: standard MHT insertion of item x]
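The slide does not spell out the exact rule, so the following Python sketch is only one plausible reading of "slow the cascade": before letting a new item x fall past level i+1, try to push the item y that blocks x at level i down one level, and put x in y's place. The helper names (`make_table`, `slot`, `find`, `sc_insert`), the halving level sizes, and the salted `hash` are hypothetical choices for illustration.

```python
import random


def make_table(top_size, d=4, seed=1):
    """Build d levels; sizes halve per level (an assumption)."""
    sizes = [max(1, top_size >> i) for i in range(d)]
    rng = random.Random(seed)
    salts = [rng.getrandbits(64) for _ in range(d)]
    return {"sizes": sizes, "salts": salts,
            "levels": [[None] * s for s in sizes]}


def slot(t, level, item):
    # Integer items assumed; salted hash stands in for independent hashes.
    return hash((t["salts"][level], item)) % t["sizes"][level]


def find(t, item):
    """Return the level holding item, or None."""
    for level in range(len(t["levels"])):
        if t["levels"][level][slot(t, level, item)] == item:
            return level
    return None


def sc_insert(t, x):
    """Insert x with at most one move; returns (level or None, moves)."""
    levels = t["levels"]
    d = len(levels)
    s0 = slot(t, 0, x)
    if levels[0][s0] is None:
        levels[0][s0] = x
        return 0, 0
    for i in range(d - 1):
        s_cur = slot(t, i, x)        # known to be occupied at this point
        s_next = slot(t, i + 1, x)
        if levels[i + 1][s_next] is None:
            levels[i + 1][s_next] = x
            return i + 1, 0
        # Second chance: can the blocker y at level i move down one level?
        y = levels[i][s_cur]
        s_y = slot(t, i + 1, y)
        if levels[i + 1][s_y] is None:
            levels[i + 1][s_y] = y
            levels[i][s_cur] = x
            return i, 1
    return None, 0                   # overflow; a CAM would take x
```

Each insertion performs at most one move, in line with the "one move per insertion" focus stated earlier in the talk.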


  9. CAMs • The last few collisions are hard to stop. • Can waste lots of space on a few items. • Solution: content addressable memory. • CAMs are fully associative. • They hold small numbers of items.

  10. Moves on Deletions • Harder to manage. • What item to move up? [Figure: levels 1 to 4, with item x]

  11. Hint-Based Approach • Each cell stores a hint for where an item to move on delete is held. • Hints can be kept fairly small: about log n bits. • Various hint approaches are possible. • We found “replace hint on any collision” works well. • May depend on item lifetime distribution, etc. • Variations: one move, or recursive moves.
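As a rough illustration of the hint idea, the Python sketch below keeps one hint per cell, overwrites it whenever an insertion collides with that cell ("replace hint on any collision"), and on a delete tries a single move using the hint. The data layout, the function names, and the stale-hint check are assumptions made for this sketch; the talk does not give these details.

```python
import random


def make_hinted_table(top_size, d=4, seed=2):
    """Levels plus a parallel array of hints (assumed layout)."""
    sizes = [max(1, top_size >> i) for i in range(d)]
    rng = random.Random(seed)
    return {"sizes": sizes,
            "salts": [rng.getrandbits(64) for _ in range(d)],
            "levels": [[None] * s for s in sizes],
            "hints": [[None] * s for s in sizes]}


def cell(t, level, item):
    # Integer items assumed; salted hash imitates independent hashes.
    return hash((t["salts"][level], item)) % t["sizes"][level]


def locate(t, item):
    """Return (level, slot) of item, or None if it is not stored."""
    for level in range(len(t["levels"])):
        s = cell(t, level, item)
        if t["levels"][level][s] == item:
            return level, s
    return None


def hinted_insert(t, x):
    """Standard top-down insert that also records hints on collisions."""
    collided = []
    for level in range(len(t["levels"])):
        s = cell(t, level, x)
        if t["levels"][level][s] is None:
            t["levels"][level][s] = x
            # Every cell that x collided with now points at x's location.
            for cl, cs in collided:
                t["hints"][cl][cs] = (level, s)
            return level
        collided.append((level, s))
    return None  # overflow; no hint is recorded in this sketch


def hinted_delete(t, x):
    """Delete x, then try one move up via the hint. Returns moves (0 or 1)."""
    where = locate(t, x)
    if where is None:
        raise KeyError(x)
    level, s = where
    t["levels"][level][s] = None
    hint, t["hints"][level][s] = t["hints"][level][s], None
    if hint is not None:
        h_level, h_slot = hint
        z = t["levels"][h_level][h_slot]
        # Hints can be stale: check z exists and really hashes to this cell.
        if z is not None and h_level > level and cell(t, level, z) == s:
            t["levels"][h_level][h_slot] = None
            t["levels"][level][s] = z
            return 1
    return 0
```

A hint here is a (level, slot) pair, which needs roughly log n bits, matching the size estimate on the slide. A recursive variant would repeat the same step for the cell that the moved item vacated.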

  12. Simulation Data • No current method of analysis for hints, so we use simulations. • 10,000 trials per data point. • MHT levels decrease in size by a factor of 2, plus a small CAM. • With n items, the top level has size n. • Space usage just above 50%. • Load table to n elements, then alternate inserts/deletes for 2^18 steps. • Exponentially distributed lifetimes. • Goal: how many hash functions are needed?
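The setup above can be sketched as a small Python harness. With d = 4 halving levels the table has n(1 + 1/2 + 1/4 + 1/8) = 1.875n cells, so n items give a load of about 53%, which fits "just above 50%". The sketch uses a plain MHT with no moves and an unbounded CAM, and it models exponentially distributed lifetimes by deleting a uniformly random live item (valid because exponential lifetimes are memoryless). The function name, default parameters, and the short run length are assumptions, far smaller than the 10,000 trials described.

```python
import random


def simulate(n=1024, d=4, steps=1 << 12, seed=0):
    """Return (load, peak CAM occupancy) for a no-move MHT (a sketch)."""
    rng = random.Random(seed)
    sizes = [max(1, n >> i) for i in range(d)]   # top level has size n
    salts = [rng.getrandbits(64) for _ in range(d)]
    levels = [[None] * s for s in sizes]
    cam = set()          # assumption: unbounded, we only record its peak
    live = []            # items currently stored
    next_id = 0

    def insert(x):
        for i in range(d):
            s = hash((salts[i], x)) % sizes[i]
            if levels[i][s] is None:
                levels[i][s] = x
                return
        cam.add(x)       # all d choices occupied: overflow to the CAM

    def delete(x):
        if x in cam:
            cam.remove(x)
            return
        for i in range(d):
            s = hash((salts[i], x)) % sizes[i]
            if levels[i][s] == x:
                levels[i][s] = None
                return

    # Phase 1: load the table to n elements.
    for _ in range(n):
        insert(next_id)
        live.append(next_id)
        next_id += 1
    peak_cam = len(cam)

    # Phase 2: alternate deletes and inserts.
    for step in range(steps):
        if step % 2 == 0:
            k = rng.randrange(len(live))         # memoryless: uniform victim
            live[k], live[-1] = live[-1], live[k]
            delete(live.pop())
        else:
            insert(next_id)
            live.append(next_id)
            next_id += 1
        peak_cam = max(peak_cam, len(cam))

    return n / sum(sizes), peak_cam
```

Swapping `insert` and `delete` for move-based variants and comparing the peak CAM occupancy is one way to reproduce the kind of comparison the talk reports.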

  13. Simulation Results

  14. Lessons from Simulations • Using no moves is very weak. • Second Chance (move on insert) is more powerful than hint-based move on delete. • But the two combine well. • Four hash functions: better than 50% load, with a small CAM.

  15. Alternative: Weak Hashes • To avoid hints, the overflow at each bucket splits to two buckets at the next level. • Each bucket receives from four buckets. • Less spreading of items, but we know where to look on deletes. • Conjecture: loss of randomness implies weak performance.
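The counting on this slide is consistent with levels that halve in size: if each of m buckets feeds 2 buckets below and there are m/2 buckets below, each lower bucket has 2m / (m/2) = 4 feeders. The Python sketch below checks this for one hypothetical wiring. The specific mapping in `children` is an invented example, not the wiring used in the talk or in Peacock Hashing.

```python
from collections import Counter


def children(bucket, level_size):
    """Two next-level buckets fed by `bucket` (hypothetical wiring).

    Assumes the next level has level_size // 2 buckets, with at least 2.
    """
    next_size = level_size // 2
    return (bucket % next_size, (bucket + 1) % next_size)


def in_degrees(level_size):
    """Count how many upper buckets feed each next-level bucket."""
    counts = Counter()
    for b in range(level_size):
        for c in children(b, level_size):
            counts[c] += 1
    return counts
```

Under such a fixed wiring, a deletion at a bucket only needs to inspect that bucket's two children for a candidate to move up, which is why no hints are required.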

  16. Picturing Weak Hashes

  17. Two Idealized Schemes • Each bucket holds a random item and splits the rest. • Each bucket counts items passed to bucket A and bucket B at the next level, and greedily holds an item from the bucket with the larger count. • Assume invariants are kept over insertions/deletions at all times. • Can be analyzed recursively, level by level. • Get the distribution of bucket loads at each level. • Obtain average-case performance.

  18. Results

  19. Conclusions • Weak hashes, based on buckets, are much less effective than hints, even under optimistic assumptions. • One-move approaches are effective. • Moves on insert and moves on delete complement each other. • Need methods for analysis. • Challenging dependencies; hard to get exact numbers.
