Multiple Choice Hash Tables with Moves on Deletes and Inserts

##### Presentation Transcript

1. Multiple Choice Hash Tables with Moves on Deletes and Inserts • Adam Kirsch, Michael Mitzenmacher

2. Hashing: Modern Perspective • For many situations (e.g., hardware for routers) multiple choice hash tables are state-of-the-art. • Each item gets d possible hash locations and is placed in one. • Moving items among choices (e.g., cuckoo hashing) greatly improves space utilization. • Only cost: an insert may take many moves.
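
To make the cost of moving items concrete, here is a minimal Python sketch of a two-table cuckoo insert that counts relocations. It is an illustration, not code from the talk: `cuckoo_insert`, `slot_of`, and `max_moves` are hypothetical names, and `slot_of(side, key)` stands in for the two hash functions.

```python
def cuckoo_insert(tables, slot_of, key, max_moves=32):
    """Insert key into a two-table cuckoo hash table.

    tables  : two lists of cells (None marks an empty cell)
    slot_of : slot_of(side, key) -> index into tables[side] (assumed hash functions)
    Returns (moves, leftover): the number of relocated items and, if the
    move budget ran out, the item left without a cell (otherwise None).
    """
    moves = 0
    side = 0
    while True:
        s = slot_of(side, key)
        # Put the item in hand into its cell and pick up whatever was there.
        key, tables[side][s] = tables[side][s], key
        if key is None:
            return moves, None      # the cell was empty: done
        if moves == max_moves:
            return moves, key       # budget exhausted: caller handles leftover
        moves += 1                  # the evicted item gets relocated next
        side = 1 - side             # it goes to its cell in the other table
```

A single insert can trigger a chain of evictions, which is why the schemes on the following slides cap the work at one move per operation.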

3. Previously • Schemes that move at most 1 item per insertion. • Limit cost of cuckoo hashing. • Schemes that batch move operations in a queue. • Amortize cost of cuckoo hashing. • Using content addressable memories (CAMs) to reduce chance of overflow. • Small CAMs yield big gains.

4. Contributions • Consider potential of moving items on deletions. • Focus on one move per deletion/insertion. • Examine alternative approach using weaker hashing from [KTC, Peacock Hashing]. • Analyze limits of performance.

5. Multilevel Hash Table [BK90] • Use a multilevel hash table (MHT). • Can store n elements with d = log log n + O(1) levels in O(n) space with high probability. • Example with d = 4 hash functions. [Figure: levels 1 to 4 and an item x.] • Skew: more elements are placed by early hash functions (double exponential decay).
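
The standard MHT insertion described on this slide can be sketched as follows. This is a toy version under stated assumptions: one item per cell, one hash function per level, and a per-level random salt combined with Python's built-in `hash` standing in for independent hash functions. The class and method names are illustrative, not taken from [BK90].

```python
import random


class MultilevelHashTable:
    """Toy MHT: d levels, one item per cell, one hash function per level."""

    def __init__(self, level_sizes, seed=0):
        rng = random.Random(seed)
        self.levels = [[None] * size for size in level_sizes]
        # Assumption: a random salt per level approximates independent hashes.
        self.salts = [rng.getrandbits(64) for _ in level_sizes]

    def _slot(self, level, key):
        return hash((self.salts[level], key)) % len(self.levels[level])

    def insert(self, key):
        """Place key in the first level whose cell is free.

        Returns the level index used, or None if every choice is occupied
        (an overflow, which the later slides hand to a small CAM).
        """
        for level in range(len(self.levels)):
            slot = self._slot(level, key)
            if self.levels[level][slot] is None:
                self.levels[level][slot] = key
                return level
        return None

    def lookup(self, key):
        """Check the single candidate cell at each level."""
        return any(
            self.levels[level][self._slot(level, key)] == key
            for level in range(len(self.levels))
        )
```

Because an item only reaches level i+1 after colliding at level i, later levels see far fewer items, which is the skew the slide refers to.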

6. Second Chance (SC) Scheme • Standard MHT fills from top down. • Elements cascade from table to table. • We try to slow cascade at every step. [Figure: standard MHT insertion of item x.]
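
One possible reading of "slow the cascade" is sketched below: when the new item is blocked at level i and would also be blocked at level i+1, try to push the blocking item down one level instead, so that at most one item moves per insertion. The slide does not spell out the exact rule, so this is an assumption; `second_chance_insert` and `slot_of` are hypothetical names.

```python
def second_chance_insert(levels, slot_of, key):
    """Insert key with at most one move of an existing item.

    levels  : list of lists of cells (None marks an empty cell)
    slot_of : slot_of(level, key) -> index into levels[level] (assumed hashes)
    Returns the level where key was placed, or None on overflow.
    """
    d = len(levels)
    for i in range(d):
        s = slot_of(i, key)
        if levels[i][s] is None:
            levels[i][s] = key
            return i
        # key is blocked here. If it would also be blocked one level down,
        # see whether the blocker can step down instead (the "second chance").
        if i + 1 < d and levels[i + 1][slot_of(i + 1, key)] is not None:
            blocker = levels[i][s]
            t = slot_of(i + 1, blocker)
            if levels[i + 1][t] is None:
                levels[i + 1][t] = blocker   # the single move
                levels[i][s] = key
                return i
    return None
```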

9. CAMs • Last few collisions hard to stop. • Can waste lots of space on few items. • Solution: content addressable memory. • CAMs are fully associative. • Hold small numbers of items.

10. Moves on Deletions • Harder to manage. • What item to move up? [Figure: levels 1 to 4 and an item x.]

11. Hint-Based Approach • Each cell stores hint for where an item to move on delete is held. • Hints can be kept fairly small. • About log n bits. • Various hint approaches possible. • We found “replace hint on any collision” works well. • May depend on item lifetime distribution, etc. • One move, recursive move variations.
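
A minimal sketch of the "replace hint on any collision" rule with the one-move variation might look like this. The details are assumptions made for illustration: a hint is a `(level, slot)` pair, a hint is validated before use because it can be stale, and it is cleared once consumed. `HintedMHT` and `slot_of` are hypothetical names.

```python
class HintedMHT:
    """Toy MHT where each cell keeps a hint to an item that could move up."""

    def __init__(self, level_sizes, slot_of):
        self.cells = [[None] * n for n in level_sizes]
        self.hints = [[None] * n for n in level_sizes]
        self.slot_of = slot_of  # slot_of(level, key) -> index (assumed hashes)

    def insert(self, key):
        collided = []
        for i, level in enumerate(self.cells):
            s = self.slot_of(i, key)
            if level[s] is None:
                level[s] = key
                # Every cell the key collided with now points at its new home.
                for ci, cs in collided:
                    self.hints[ci][cs] = (i, s)
                return i
            collided.append((i, s))
        return None  # overflow; a CAM would take the item

    def delete(self, key):
        for i, level in enumerate(self.cells):
            s = self.slot_of(i, key)
            if level[s] == key:
                level[s] = None
                self._pull_up(i, s)
                return True
        return False

    def _pull_up(self, i, s):
        """One-move variation: fill the freed cell from its hint, if valid."""
        hint = self.hints[i][s]
        self.hints[i][s] = None
        if hint is None:
            return
        j, t = hint
        cand = self.cells[j][t]
        # The hinted item may have been deleted or moved since; re-check it.
        if cand is not None and j > i and self.slot_of(i, cand) == s:
            self.cells[i][s] = cand
            self.cells[j][t] = None
```

A recursive variation would call `_pull_up(j, t)` after the move so the cell vacated at the lower level can be refilled in turn.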

12. Simulation Data • No current method of analysis for hints. • Use simulations. 10,000 trials per data point. • MHT levels decreasing in size by factor of 2. Plus small CAM. • With n items, top level has size n. • Space usage just above 50%. • Load table to n elements, alternate inserts/deletes for 2^18 steps. • Exponentially distributed lifetimes. • Goal: how many hash functions needed?
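
The "just above 50%" figure follows from the geometric level sizes. As a quick check, ignoring the small CAM (a simplification made here) and writing d for the number of levels:

```latex
\[
\text{cells} \;=\; \sum_{i=0}^{d-1} \frac{n}{2^{i}} \;=\; n\left(2 - 2^{\,1-d}\right),
\qquad
\text{load} \;=\; \frac{n}{n\left(2 - 2^{\,1-d}\right)} \;=\; \frac{1}{2 - 2^{\,1-d}} .
\]
\[
d = 4:\quad \text{load} \;=\; \frac{1}{2 - \tfrac{1}{8}} \;=\; \frac{8}{15} \;\approx\; 0.533 .
\]
```

The load tends to 1/2 from above as d grows, so each extra level costs little space.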

13. Simulation Results

14. Lessons from Simulations • No moves very weak. • Second Chance (move on insert) more powerful than hint-based move on delete. • But the two combine well. • Four hash functions: better than 50% load, small CAM.

15. Alternative: Weak Hashes • To avoid hints, overflow at each bucket splits to two buckets at next level. • Each bucket receives from four buckets. • Less spreading of items, but we know where to look on deletes. • Conjecture: loss of randomness implies weak performance.
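
The slide does not give the wiring between levels. The sketch below shows one hypothetical wiring that satisfies both stated properties (each bucket feeds two buckets at the next level, each bucket receives from four), assuming the next level has half as many buckets. The functions `children` and `parents` and the index arithmetic are illustrative assumptions, not the scheme from [KTC].

```python
def children(i, width):
    """Two next-level buckets fed by bucket i of a level with `width` buckets.

    Assumes width is even and at least 4; the next level has width // 2 buckets.
    """
    nxt = width // 2
    base = i // 2
    return (base % nxt, (base + 1) % nxt)


def parents(j, width):
    """The four buckets at a level of `width` buckets that feed next-level bucket j.

    On a delete at bucket j, these are the only places to look for an
    item that could move down into the freed space, so no hint is needed.
    """
    prev = (j - 1) % (width // 2)
    return sorted({2 * j, 2 * j + 1, 2 * prev, 2 * prev + 1})
```

The fixed fan-in is what makes deletes easy to handle, and the fixed fan-out is the loss of randomness the conjecture refers to.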

16. Picturing Weak Hashes

17. Two Idealized Schemes • Each bucket holds random item, splits rest. • Each bucket counts items passed to bucket A and bucket B at next level, greedily holds item from bucket with larger count. • Assume invariants kept over insertions/deletions at all times. • Can be analyzed recursively level by level. • Get distribution of bucket loads at each level. • Obtain average case performance.

18. Results

19. Conclusions • Weak hashes, based on buckets, much less effective than hints. • Even under optimistic assumptions. • One move approaches effective. • Move on insert/delete complement each other. • Need methods for analysis. • Challenging dependencies; hard to get exact numbers.