This guide provides an in-depth exploration of hashing techniques in C++11, focusing on hash functors, their instantiation, and usage. We delve into different data structures such as hash tables, including closed addressing through chaining with linked lists, and examine their efficiency in terms of time complexity and load factors. Key concepts include the importance of immutability in hashed values, effective use of bitwise operations, and strategies for preventing collisions. This comprehensive resource caters to developers looking to optimize their hashing strategies.
std::hash Functors • The C++11 STL includes hash functors • Instantiate one and use it as a function • Or create and call it in one line
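The two usage styles on this slide can be sketched as follows. The function names `hashName` and `hashNumber` are hypothetical wrappers added for illustration; only `std::hash` itself comes from the standard library.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Two-step version: instantiate the functor, then call it like a function.
// (hashName is a hypothetical helper name, not part of the STL.)
std::size_t hashName(const std::string& name) {
    std::hash<std::string> hasher;  // the functor object
    return hasher(name);            // operator() computes the hash code
}

// One-line version: construct a temporary functor and call it immediately.
// (hashNumber is likewise a hypothetical helper name.)
std::size_t hashNumber(int value) {
    return std::hash<int>()(value);
}
```

The actual hash codes are implementation-defined, so code should rely only on equal inputs producing equal results within one run of the program.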
Other Types • How do we hash: • Point? • Employee? • BitmapImage?
Other Types • Cover as many bits as possible • Combine all values that vary • "John Smith" K100203 vs "John Smith" K923424 • Try to make the lowest bits most random • For 2013/05/28, prefer (year << 20) + (month << 10) + day over (day << 20) + (month << 10) + year
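A minimal sketch of the date example, assuming the fields are packed into a 32-bit value. `packDate` and `packDateBad` are hypothetical names. Note that in C++ `+` binds tighter than `<<`, so the parentheses are required.

```cpp
#include <cstdint>

// Day varies most from one date to the next, so it goes in the lowest bits.
// (packDate is a hypothetical helper name.)
std::uint32_t packDate(std::uint32_t year, std::uint32_t month, std::uint32_t day) {
    return (year << 20) + (month << 10) + day;
}

// Counter-example: year ends up in the lowest bits, so every date in the
// same year shares the same low-order bits.
std::uint32_t packDateBad(std::uint32_t year, std::uint32_t month, std::uint32_t day) {
    return (day << 20) + (month << 10) + year;
}
```

If the bucket index is taken as the hash modulo a small table size, consecutive days collide under `packDateBad` but land in different buckets under `packDate`.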
Bitwise XOR • Bitwise XOR: ^ • Combines binary values, preserves entropy • 0101 ^ 1111 = 1010 • 0101 ^ 0000 = 0101 • 0101 ^ 1011 = 1110
Other Types • Use existing hash functions • Combine their results with bitwise XOR
Other Types • Use bit shifts to spread out values if needed
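One way to apply these ideas to the Point and Employee examples is sketched below. The struct layouts and the functor names `PointHash` and `EmployeeHash` are assumptions made for illustration; the slides do not show the actual definitions.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical types standing in for the slide's Point and Employee.
struct Point {
    int x;
    int y;
    bool operator==(const Point& o) const { return x == o.x && y == o.y; }
};

struct Employee {
    std::string name;
    std::string id;
};

// Hash each field with the existing std::hash, then combine with XOR.
// The shift keeps x == y from cancelling to zero and usually makes
// (a, b) hash differently from (b, a).
struct PointHash {
    std::size_t operator()(const Point& p) const {
        std::size_t hx = std::hash<int>()(p.x);
        std::size_t hy = std::hash<int>()(p.y);
        return hx ^ (hy << 1);
    }
};

// Combine every field that varies: two "John Smith" records differ only by id.
struct EmployeeHash {
    std::size_t operator()(const Employee& e) const {
        std::size_t hn = std::hash<std::string>()(e.name);
        std::size_t hi = std::hash<std::string>()(e.id);
        return hn ^ (hi << 1);
    }
};
```

A functor like this can be passed as the second template argument of a standard container, for example `std::unordered_set<Point, PointHash>`.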
Hashing Danger • Person p1: "John Smith" • Say the hash code for John Smith is 17… • p1.firstName = "Bob" • hash(p1) just changed: the table won't find p1!
Hashing Danger • NEVER modify something being used as a hashed value in a hash table!!! • Remove, modify, reinsert, or • Use immutable values for hashing
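The remove, modify, reinsert pattern might look like this with `std::unordered_set`, whose elements are read-only through iterators, so the container already forces this approach. `Person`, `PersonHash`, `PersonSet`, and `changeFirstName` are hypothetical names chosen for this sketch.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical Person type; both fields take part in hashing and equality.
struct Person {
    std::string firstName;
    std::string lastName;
    bool operator==(const Person& o) const {
        return firstName == o.firstName && lastName == o.lastName;
    }
};

struct PersonHash {
    std::size_t operator()(const Person& p) const {
        return std::hash<std::string>()(p.firstName) ^
               (std::hash<std::string>()(p.lastName) << 1);
    }
};

using PersonSet = std::unordered_set<Person, PersonHash>;

// Remove, modify, reinsert. Returns false if `who` is not in the set.
bool changeFirstName(PersonSet& people, const Person& who, const std::string& newFirst) {
    auto it = people.find(who);
    if (it == people.end()) return false;
    Person updated = *it;          // copy the stored value out
    people.erase(it);              // remove while its hash still matches its bucket
    updated.firstName = newFirst;  // modify the copy
    people.insert(updated);        // reinsert under the new hash
    return true;
}
```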
Probing Review • Probing issues: • Clusters • Extra work proportional to 1/(1 − λ), where λ is the load factor
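Reading the estimate above as 1/(1 − λ), with λ the load factor (the symbol is an assumption here), a small helper shows how quickly the work grows as the table fills. `probingWork` is a hypothetical name.

```cpp
// The slide's estimate of probing work at a given load factor.
// Assumes 0 <= lambda < 1; the value grows without bound as lambda nears 1.
// (probingWork is a hypothetical helper name.)
double probingWork(double lambda) {
    return 1.0 / (1.0 - lambda);
}
```

At a load factor of 0.5 the estimate is 2, at 0.9 it is about 10, and at 0.99 it is about 100.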
Chaining • Chaining (Closed Addressing): each bucket can hold multiple values • Implementation • Linked list • Holds a few/zero items efficiently • Time efficiency of the list is not a big concern because chains stay short
IntHashSet • Storage = array of std::list
IntHashSet • Contains: • Find right linked list • Search it
IntHashSet • Insert: • If not there • Find right list and add value
IntHashSet • Remove: • Find right list • Look for item in list • If found, remove
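The three operations above can be sketched as a small class. This is an illustration under stated assumptions, not the original IntHashSet: a `std::vector` of `std::list<int>` stands in for the array of lists, the default of 16 buckets is arbitrary, the bucket count is fixed (no growth), and the member names are guesses.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <list>
#include <vector>

class IntHashSet {
public:
    // A bucket count of 0 is bumped to 1 to avoid dividing by zero.
    explicit IntHashSet(std::size_t bucketCount = 16)
        : buckets_(bucketCount == 0 ? 1 : bucketCount), size_(0) {}

    // Contains: find the right linked list, then search it.
    bool contains(int value) const {
        const std::list<int>& chain = buckets_[indexFor(value)];
        return std::find(chain.begin(), chain.end(), value) != chain.end();
    }

    // Insert: if not already there, find the right list and add the value.
    bool insert(int value) {
        if (contains(value)) return false;
        buckets_[indexFor(value)].push_back(value);
        ++size_;
        return true;
    }

    // Remove: find the right list, look for the item, erase it if found.
    bool remove(int value) {
        std::list<int>& chain = buckets_[indexFor(value)];
        std::list<int>::iterator it = std::find(chain.begin(), chain.end(), value);
        if (it == chain.end()) return false;
        chain.erase(it);
        --size_;
        return true;
    }

    std::size_t size() const { return size_; }

    // Load factor = items / buckets.
    double loadFactor() const {
        return static_cast<double>(size_) / static_cast<double>(buckets_.size());
    }

private:
    std::size_t indexFor(int value) const {
        return std::hash<int>()(value) % buckets_.size();
    }

    std::vector<std::list<int>> buckets_;
    std::size_t size_;
};
```

Each operation hashes once to pick a bucket and then walks only that bucket's chain, so its cost depends on the chain length rather than on the total number of items.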
Efficiency • Avg time proportional to load factor λ • O(λ) = O(n/k) for n items in k buckets • If k is constant, technically O(n) • Massive constant divisor • If k grows proportionally with n, O(n/k) = O(1)
Real World • Hash table grows when the load factor gets too large • Average cost of all ops is O(1) • Insert is amortized O(1) • Cache use is often the determining factor
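The growth behavior can be observed with `std::unordered_set`, which exposes `bucket_count()`, `load_factor()`, and `max_load_factor()`. The helper below, with the hypothetical name `countRehashes`, counts how often the bucket array changes size during a run of inserts. The exact growth policy is implementation-defined, so the count differs between standard libraries.

```cpp
#include <cstddef>
#include <unordered_set>

// Inserts 0..count-1 and returns how many times the bucket count changed.
// (countRehashes is a hypothetical helper name.)
std::size_t countRehashes(std::size_t count) {
    std::unordered_set<int> values;
    std::size_t rehashes = 0;
    std::size_t buckets = values.bucket_count();
    for (std::size_t i = 0; i < count; ++i) {
        values.insert(static_cast<int>(i));
        if (values.bucket_count() != buckets) {  // the table grew
            ++rehashes;
            buckets = values.bucket_count();
        }
    }
    return rehashes;
}
```

Only a small number of the inserts trigger a rehash, which is why insert is amortized O(1) even though an individual rehash touches every element.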
But • No natural ordering
Ordered - O(1) • Space vs Time trade offs • Hybrid/Duplicative representations
HashMap • Map • Key/value pairs, e.g. John Smith → 521-1234 • HashMap • Identity determined by key • Only the key is hashed • Value stored with key in table
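In the C++11 STL the corresponding container is `std::unordered_map`. The sketch below uses the slide's phone-number pair; `PhoneBook`, `makePhoneBook`, and `lookup` are hypothetical names.

```cpp
#include <string>
#include <unordered_map>

// Key = name, value = phone number. Only the key is hashed and compared.
using PhoneBook = std::unordered_map<std::string, std::string>;

PhoneBook makePhoneBook() {
    PhoneBook book;
    book["John Smith"] = "521-1234";
    return book;
}

// Returns the stored number, or an empty string if the name is absent.
std::string lookup(const PhoneBook& book, const std::string& name) {
    PhoneBook::const_iterator it = book.find(name);
    return it == book.end() ? std::string() : it->second;
}
```

Because identity is determined by the key, assigning a new number to "John Smith" replaces the stored value instead of adding a second entry.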