Storage by Hashing

Travis Roe • Topics of Computer Science • Chapter 43 • 2-5-2006

Outline • A Problem • A Solution: Hashing • Questions • Q & A

A Problem • A company organizes its data by social security number or a similar identifier. • It needs to add identifiers to a collection and search that collection to find the matching objects.

Potential Solution: 1-1 • One array slot per possible key • Adding: O(1) • Searching: O(1) • Pros: Very fast, very easy to implement • Cons: Far too much memory, most of it unused
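
To make the memory cost of the 1-1 approach concrete, here is a rough back-of-the-envelope sketch. The helper name `table_bytes` and the 8 bytes per slot are assumptions chosen for illustration, not figures from the slides.

```python
def table_bytes(key_digits: int, bytes_per_slot: int = 8) -> int:
    """Size of a table with one slot for every possible key.

    bytes_per_slot=8 is an assumption (one 64-bit reference per slot).
    """
    slots = 10 ** key_digits          # one slot per possible decimal key
    return slots * bytes_per_slot


# A 9-digit identifier needs a billion slots, however few are actually used.
print(table_bytes(9) / 1e9, "GB")
```

Under these assumptions the table takes roughly 8 GB before a single object is stored, which is the "far too much memory" drawback listed on the slide.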

Potential Solution: Unsorted Array • Adding: O(1) • Searching: O(n) • Pros: Easy to implement, fast adding • Cons: Searching is O(n), which is far too slow for large collections

Potential Solution: Sorted Array • Adding: O(n) • Searching: O(lg n) • Pros: Fast searching • Cons: Slow adding

Potential Solution: Balanced BST • Adding: O(lg n) • Searching: O(lg n) • Pros: Fast adding and searching • Cons: Hard to program, and still not O(1)

A New Solution: Hashing • Adding: • Use the key to compute an index. • Place the object at that index. • O(1), as long as collisions are rare • Searching: • Use the key to compute the same index. • Get the object from that index. • O(1), as long as collisions are rare

Hashing: An Example • Social Security Numbers are the keys • The hash key is taken from the last 4 digits of the number • 154-38-1287 -> 1287 • 987-65-4321 -> 4321 • 123-45-6789 -> 6789 • 192-83-7465 -> 7465
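
The example above can be sketched in a few lines of Python. The names `hash_ssn`, `add`, and `search`, and the table size of 10,000, are assumptions made for this sketch. It deliberately ignores collisions, so a second key with the same last four digits would overwrite the first.

```python
TABLE_SIZE = 10_000                  # assumption: one slot per 4-digit hash key
table = [None] * TABLE_SIZE


def hash_ssn(ssn: str) -> int:
    """Use the last 4 digits of an SSN such as '154-38-1287' as the index."""
    digits = ssn.replace("-", "")
    return int(digits[-4:]) % TABLE_SIZE


def add(ssn: str, record: str) -> None:
    table[hash_ssn(ssn)] = record    # no collision handling in this sketch


def search(ssn: str):
    return table[hash_ssn(ssn)]


add("154-38-1287", "record A")
print(hash_ssn("154-38-1287"))       # 1287
print(search("154-38-1287"))         # record A
```

Both operations compute one index and touch one slot, which is where the O(1) cost comes from.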

Collisions • Expected problems: • Two objects with the same key • Two different keys that hash to the same value • Ways to solve the problems: • Chaining • Probing

Collision Handling: Chaining • Every slot in the table holds a list of some sort. • Whenever there is a collision, add the new item to the list at that slot.
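
A minimal sketch of chaining, assuming Python lists as the per-slot lists and the last-4-digits hash from the earlier example. The class name `ChainedHashTable` and its methods are hypothetical, introduced only for illustration.

```python
class ChainedHashTable:
    def __init__(self, size: int = 10_000):
        self.size = size
        # Every slot starts as its own empty list (the "chain").
        self.buckets = [[] for _ in range(size)]

    def _index(self, ssn: str) -> int:
        return int(ssn.replace("-", "")[-4:]) % self.size

    def add(self, ssn: str, record: str) -> None:
        bucket = self.buckets[self._index(ssn)]
        for i, (key, _) in enumerate(bucket):
            if key == ssn:                    # same key: replace the old record
                bucket[i] = (ssn, record)
                return
        bucket.append((ssn, record))          # new key: extend the chain

    def search(self, ssn: str):
        for key, record in self.buckets[self._index(ssn)]:
            if key == ssn:
                return record
        return None                           # key is not in the table


t = ChainedHashTable()
t.add("123-45-6789", "first")
t.add("543-21-6789", "second")                # collides: same last 4 digits
print(t.search("543-21-6789"))                # second
```

Searching now costs time proportional to the length of one chain, so it stays close to O(1) only while the chains stay short.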

Collision Handling: Probing • Whenever there is a collision, go to another location some distance away and try to place the item there. • Can cause grouping (clustering) of occupied locations. • Probe location: h(k) + a * x, with a = 2 • Example: 123-45-6789 and 543-21-6789 both hash to 6789
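
A minimal sketch of probing with the formula h(k) + a * x and a = 2. Reading x as the attempt number (0, 1, 2, ...) and wrapping around with a modulus are assumptions of this sketch, as are the class name `ProbingHashTable` and the table size 10,007. That size is prime, so a step of 2 eventually reaches every slot.

```python
class ProbingHashTable:
    def __init__(self, size: int = 10_007, step: int = 2):
        self.size = size
        self.step = step                      # the "a" in h(k) + a * x
        self.slots = [None] * size

    def _home(self, ssn: str) -> int:
        return int(ssn.replace("-", "")[-4:]) % self.size

    def _probe(self, ssn: str):
        home = self._home(ssn)
        for x in range(self.size):            # x = 0 is the home slot itself
            yield (home + self.step * x) % self.size

    def add(self, ssn: str, record: str) -> int:
        for index in self._probe(ssn):
            entry = self.slots[index]
            if entry is None or entry[0] == ssn:
                self.slots[index] = (ssn, record)
                return index                  # report where the item landed
        raise RuntimeError("table is full")

    def search(self, ssn: str):
        for index in self._probe(ssn):
            entry = self.slots[index]
            if entry is None:                 # empty slot ends the probe chain
                return None
            if entry[0] == ssn:
                return entry[1]
        return None


t = ProbingHashTable()
print(t.add("123-45-6789", "first"))          # 6789
print(t.add("543-21-6789", "second"))         # 6791
```

This sketch does not support deletion. Simply clearing a slot would cut the probe chain for items stored beyond it.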

Reducing Collisions • Use prime numbers for array sizes • Allocate more space than you expect to need • Choose a better hash function
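
One way to see why a prime array size can help is a small experiment, sketched below. It assumes the simple hash `key % size` and a deliberately patterned key set (multiples of 100). The helper name `used_slots` is hypothetical.

```python
def used_slots(keys, size: int) -> int:
    """Count how many distinct slots the keys land in under key % size."""
    return len({key % size for key in keys})


keys = range(0, 100_000, 100)        # 1,000 keys, all multiples of 100

print(used_slots(keys, 1000))        # 10: the keys pile into just ten slots
print(used_slots(keys, 1009))        # 1000: with a prime size, every key gets its own slot
```

The size 1000 shares factors with the pattern in the keys, so most slots are never used. The prime 1009 shares none, and the same keys spread out with no collisions at all.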

References • Dewdney, A.K. "Storage by Hashing". The New Turing Omnibus. 1993. Computer Science Press. • "Hash Tables". Recording My Programming Path. http://qiang-ma.blogspot.com/2007/10/hash-tables.html (last accessed 2-5-2008) • Standish, Thomas. Data Structures, Algorithms & Software Principles in C. 1995. Addison-Wesley Publishing Company, Inc. pp. 450-475 (approx.)

Questions • What are the two methods for handling collisions that were discussed in this lecture? • What is one situation in which hash tables are not useful?
