Understanding Storage by Hashing in Computer Science
This chapter discusses how to implement data storage with hashing, with a focus on quick data retrieval. It compares several storage solutions (one-to-one indexing, unsorted arrays, sorted arrays, and balanced binary search trees) by their time complexity for adding and searching data. The main approach is hashing, which gives O(1) adding and searching by key (for example, a social security number) as long as collisions are rare. The chapter also covers collisions and two ways of handling them: chaining and probing.
Presentation Transcript
Travis Roe Topics of Computer Science Chapter 43 2-5-2006 Storage by Hashing
Outline • A Problem • A Solution: Hashing • Questions • Q & A
A Problem • A company organizes its data by social security number or a similar identifier. • It needs to add to and search through collections of identifiers to find objects.
Potential Solution: 1-1 • One array index per possible key value • Adding: O(1) • Searching: O(1) • Pros: Very fast, very easy to implement • Cons: Far too much memory (a nine-digit key needs 10^9 slots), much of it unused
Potential Solution: Unsorted Array • Adding: O(1) • Searching: O(n) • Pros: Easy to implement, fast adding. • Cons: Everything else. O(n) searching is far too slow for large collections.
Potential Solution: Sorted Array • Adding: O(n) • Searching: O(lg n) • Pros: Fast searching. • Cons: Slow adding.
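The sorted-array trade-off can be sketched in a few lines. This is only an illustration, assuming keys are stored as plain integers; the helper names `add_sorted` and `contains_sorted` are made up for this example and use Python's standard `bisect` module.

```python
import bisect

def add_sorted(items, key):
    """Insert key and keep the list sorted.

    Finding the position is O(lg n), but shifting the later
    elements to make room makes the whole insert O(n).
    """
    bisect.insort(items, key)

def contains_sorted(items, key):
    """Binary search for key: O(lg n) comparisons."""
    i = bisect.bisect_left(items, key)
    return i < len(items) and items[i] == key

# Hypothetical identifiers, stored as integers without dashes.
ids = []
for ssn in (987654321, 123456789, 192837465):
    add_sorted(ids, ssn)
```

Searching stays fast because the list is always sorted; the cost is paid at insertion time.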
Potential Solution: Balanced BST • Adding: O(lg n) • Searching: O(lg n) • Pros: Fast adding and searching. • Cons: Hard to program. Still not O(1).
A New Solution: Hashing • Adding: • Use the key to choose an index. • Place the object at that index. • O(1), as long as collisions are rare • Searching: • Use the key to find the index. • Get the object from that index. • O(1), as long as collisions are rare
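The two operations on this slide might look like the following minimal sketch. The class name `SimpleTable`, the table size, and the "key modulo size" index rule are all assumptions made for illustration, and collisions are deliberately not handled here.

```python
class SimpleTable:
    """Hypothetical fixed-size hash table; collisions are NOT handled."""

    def __init__(self, size=10_000):
        self.size = size
        self.slots = [None] * size

    def _index(self, key):
        # Assumed hash: integer key modulo the table size.
        return key % self.size

    def add(self, key, obj):
        # O(1): compute the index and store the (key, object) pair there.
        # A colliding key would overwrite the previous entry.
        self.slots[self._index(key)] = (key, obj)

    def search(self, key):
        # O(1): compute the index and check that the stored key matches.
        entry = self.slots[self._index(key)]
        if entry is not None and entry[0] == key:
            return entry[1]
        return None
```

Storing the key next to the object lets `search` tell a real match apart from a different key that happens to land in the same slot.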
Hashing: An Example • Social Security Numbers are the key • The hash-key is based off the last 4 digits of the number • 154-38-1287 hashes to 1287 • 987-65-4321 hashes to 4321 • 123-45-6789 hashes to 6789 • 192-83-7465 hashes to 7465
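The hash in this example can be written as a one-line function. The name `last_four` is made up for this sketch, and it assumes the number arrives as a dash-separated string.

```python
def last_four(ssn: str) -> int:
    """Hash a social security number to its last four digits (0..9999)."""
    digits = ssn.replace("-", "")
    return int(digits[-4:])

# The four numbers from the slide and the indices they map to.
for ssn in ("154-38-1287", "987-65-4321", "123-45-6789", "192-83-7465"):
    print(ssn, "->", last_four(ssn))
```

A table indexed this way needs only 10,000 slots, instead of one slot for every possible nine-digit number.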
Collisions • Expected problems: • Two objects with the same key • Two different keys that hash to the same value • Ways to solve the problems: • Chaining • Probing
Collision Handling: Chaining • Every slot in the table is a list of some sort. • Whenever there is a collision, put the new item into that slot's list.
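A minimal sketch of chaining, assuming the last-four-digits hash from the earlier example and plain Python lists as the per-slot chains. The class name `ChainedTable` and its methods are hypothetical.

```python
class ChainedTable:
    """Hypothetical hash table where each slot holds a list of (key, object) pairs."""

    def __init__(self, size=10_000):
        self.buckets = [[] for _ in range(size)]

    def _index(self, ssn):
        # Assumed hash: last four digits of the dash-separated number.
        return int(ssn.replace("-", "")[-4:]) % len(self.buckets)

    def add(self, ssn, obj):
        bucket = self.buckets[self._index(ssn)]
        for i, (key, _) in enumerate(bucket):
            if key == ssn:              # same key again: replace the old object
                bucket[i] = (ssn, obj)
                return
        bucket.append((ssn, obj))       # new key, possibly a collision: extend the list

    def search(self, ssn):
        for key, obj in self.buckets[self._index(ssn)]:
            if key == ssn:
                return obj
        return None
```

Searching now costs time proportional to the length of one list, so it stays close to O(1) only while the lists stay short.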
Collision Handling: Probing • Whenever there is a collision, go to another location some distance away and attempt to fill that location. • Can cause grouping. • Probe location: h(k) + a * x, with a = 2 • Example: 123-45-6789 and 543-21-6789 both hash to 6789
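A minimal sketch of probing with the step a = 2 from the slide. Several details are assumptions not stated in the talk: x is read as the attempt number (0, 1, 2, ...), the probe wraps around the end of the table, the table size is the prime 10,007 so that a step of 2 eventually visits every slot, and deletion is left out. The class name `ProbingTable` is hypothetical.

```python
class ProbingTable:
    """Hypothetical open-addressing table that probes at h(k) + a * x."""

    def __init__(self, size=10_007, step=2):
        self.size = size        # assumed prime, so step 2 reaches every slot
        self.step = step        # the "a" in h(k) + a * x
        self.slots = [None] * size

    def _home(self, ssn):
        # Assumed hash: last four digits of the dash-separated number.
        return int(ssn.replace("-", "")[-4:]) % self.size

    def _probe(self, ssn):
        home = self._home(ssn)
        for x in range(self.size):      # x = 0, 1, 2, ... (attempt number)
            yield (home + self.step * x) % self.size

    def add(self, ssn, obj):
        for i in self._probe(ssn):
            if self.slots[i] is None or self.slots[i][0] == ssn:
                self.slots[i] = (ssn, obj)
                return i                # index actually used
        raise RuntimeError("table is full")

    def search(self, ssn):
        for i in self._probe(ssn):
            if self.slots[i] is None:   # empty slot: the key was never stored
                return None
            if self.slots[i][0] == ssn:
                return self.slots[i][1]
        return None
```

With these assumptions, 123-45-6789 lands in slot 6789 and 543-21-6789 is pushed two slots along to 6791. Runs of occupied slots that build up this way are the grouping the slide warns about.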
Reducing Collisions • Use prime numbers for array sizes • Take more space than you'll need • Choose a better hash function
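A small experiment shows why a prime array size can help. The keys here are made-up values that share a common factor with a table size of 100, and `distinct_slots` is a helper invented for this sketch; the index rule is assumed to be "key modulo table size".

```python
def distinct_slots(keys, size):
    """Count how many different slots the keys occupy in a table of this size."""
    return len({k % size for k in keys})

# 100 hypothetical keys, all multiples of 10.
keys = range(0, 1000, 10)

print(distinct_slots(keys, 100))   # composite size sharing a factor with the keys
print(distinct_slots(keys, 101))   # prime size
```

With size 100 the keys pile into only 10 slots, while with the prime size 101 each of the 100 keys gets its own slot, because 101 shares no factor with the pattern in the keys.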
References • Dewdney, A.K. "Storage By Hashing". The New Turing Omnibus. Computer Science Press, 1993. • "Hash Tables". Recording My Programming Path. http://qiang-ma.blogspot.com/2007/10/hash-tables.html (last accessed 2-5-2008) • Standish, Thomas. Data Structures, Algorithms & Software Principles in C. Addison-Wesley Publishing Company, Inc., 1995. pp. 450-475 (approximately)
Questions • What are the two methods for handling collisions that were discussed in this lecture? • What is one situation in which hash tables are not useful?