Searching / Hashing
Learn about searching algorithms such as sequential and binary search, hashing basics, resolving collisions with linear probing and chaining, hash functions, and choosing array sizes that minimize wasted space.
Big-O of Search Algorithms
• Sequential Search - O(n)
   • unsorted list in an array (did not do this term)
   • linked list, even if sorted (gradelnklist files)
• Binary Search - O(log₂ n)
   • sorted list in an array (gradelistarray files)
   • BST if reasonably balanced (tree files)
• Hashing - O(1) when there are no collisions: constant search time!
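The two non-hashing searches above can be sketched in C++ as follows. This is a minimal sketch on hypothetical `long` keys stored in a `std::vector`, not the course's grade-list files; the function names are illustrative. Sequential search inspects one element per step, while binary search halves the remaining range on every comparison, which is where O(log₂ n) comes from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential search: O(n). Works whether or not the array is sorted.
// Returns the index of target, or -1 if it is not present.
int sequentialSearch(const std::vector<long>& a, long target) {
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] == target) return static_cast<int>(i);
    return -1;
}

// Binary search: O(log2 n). Requires an array sorted in ascending order.
// Each comparison discards half of the remaining range [lo, hi].
int binarySearch(const std::vector<long>& a, long target) {
    // Debug-only check of the precondition (this check itself is O(n)).
    assert(std::is_sorted(a.begin(), a.end()));
    int lo = 0;
    int hi = static_cast<int>(a.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   // avoids overflow of lo + hi
        if (a[mid] == target) return mid;
        if (a[mid] < target) lo = mid + 1;   // target is in the upper half
        else hi = mid - 1;                   // target is in the lower half
    }
    return -1;
}
```

For example, `binarySearch({3, 7, 19, 42}, 42)` checks positions 1, 2 and 3 before finding the key, whereas a sequential search of an unsorted array may have to look at every element.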
Hashing Fundamentals
• Records (structs) are stored in an array
• Records are not sorted on any particular key
• Hash function: calculates, from the key, the position in the array where a record is stored
• Ideally, the hash function should be one-to-one, i.e., two different keys should never "hash" to the same position
Hashing Fundamentals
• To add an item to a hash table, use the hash function to calculate its position and store it directly there
• To locate (search for) an item in a hash table, use the hash function to calculate its position and look for it directly there
• Unused positions in the hash table need to hold a default "empty" value
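The add and locate steps above can be sketched in C++ as follows. This is a minimal sketch under stated assumptions: keys are hypothetically restricted to 0-99 so that h(key) = key is one-to-one, and -1 serves as the "empty" value. The names `TABLE_SIZE`, `EMPTY`, `Record`, `add` and `locate` are hypothetical.

```cpp
#include <array>
#include <cassert>

// Hypothetical tiny table: keys are 0..99, so h(key) = key is one-to-one.
constexpr int TABLE_SIZE = 100;
constexpr int EMPTY = -1;   // default "empty" value for unused positions

struct Record {
    int key = EMPTY;        // every slot starts out empty
    double value = 0.0;
};

int h(int key) { return key; }   // one-to-one for keys in [0, TABLE_SIZE)

// Add: calculate the position and store the record directly there.
void add(std::array<Record, TABLE_SIZE>& table, const Record& r) {
    assert(r.key >= 0 && r.key < TABLE_SIZE);
    table[h(r.key)] = r;
}

// Locate: calculate the position and look directly there.
// Returns nullptr if that slot is empty (or holds a different key).
const Record* locate(const std::array<Record, TABLE_SIZE>& table, int key) {
    if (key < 0 || key >= TABLE_SIZE) return nullptr;
    const Record& slot = table[h(key)];
    return slot.key == key ? &slot : nullptr;
}
```

Neither operation loops over the table, which is why this ideal, collision-free case gives O(1) searching.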
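Example 1
Student Records with SSN as Key
Hash function: h(ssn) = ssn

    #include <string>
    using std::string;

    const int MAXSTUDENTS = 1000000000;   // one position for every possible 9-digit SSN

    struct StudentType {
        long ssn;
        string lastname;
        string firstname;
        char midinit;
        float gpa;
    };

    StudentType students[MAXSTUDENTS];    // a billion records, almost all of them unused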
Example 1
• Pros?
   • very simple hash function
   • hash function is one-to-one
• Cons?
   • a LOT of wasted space: with about 1,000 students, this example wastes 99.9999% of array positions
Example 2
Student Records with SSN as Key
Hash function: h(ssn) = ssn % 10000

    #include <string>
    using std::string;

    const int MAXSTUDENTS = 10000;        // h(ssn) keeps only the last 4 digits of the SSN

    struct StudentType {
        long ssn;
        string lastname;
        string firstname;
        char midinit;
        float gpa;
    };

    StudentType students[MAXSTUDENTS];
Example 2
• Pros?
   • still a relatively simple hash function
• Cons?
   • still some wasted space, but much less (only 90% of array positions are wasted)
   • hash function is no longer guaranteed to be one-to-one
   • O(1) searching is no longer guaranteed
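The wasted-space percentages in Examples 1 and 2 follow from the fraction of unused positions, 1 - n/M, for n stored records in a table of M positions. Assuming n = 1,000 students (the class size both percentages imply; the slides do not state it), the arithmetic is:

```latex
\text{Example 1: } 1 - \frac{1000}{10^{9}} = 0.999999 = 99.9999\%
\qquad
\text{Example 2: } 1 - \frac{1000}{10^{4}} = 0.9 = 90\%
```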
Collisions
• A collision occurs when two keys hash to the same value
• As seen in Example 1, a perfect (one-to-one) hash function can waste a lot of space, but ...
• ... reducing the wasted space can introduce the possibility of collisions!
• Goal: choose an array size and hash function that keep both wasted space and collisions to a minimum
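A concrete collision under Example 2's hash function: any two SSNs that share their last four digits land in the same position. The sketch below uses two hypothetical, made-up SSNs to show this.

```cpp
#include <cassert>

const int MAXSTUDENTS = 10000;

// Example 2's hash function: keeps only the last four digits of the SSN.
long h(long ssn) {
    assert(ssn >= 0);   // SSNs are assumed non-negative
    return ssn % MAXSTUDENTS;
}

// Two different (hypothetical) SSNs:
//   h(123456789) == 6789
//   h(987656789) == 6789   -> same position: a collision
```

Because 10,000 positions must serve a billion possible SSNs, many keys share each position; the hash function cannot be one-to-one.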
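Ways to Handle Collisions: Linear Probing
• To insert a record
   • Start by calculating the hash value
   • Starting at that position, do a sequential search for an empty spot, wrapping around to the start of the array if necessary
   • Store the record in the empty spot

    indx = h(insertssn);
    while (students[indx].ssn != EMPTY)       // EMPTY: the default "empty" value
        indx = (indx + 1) % MAXSTUDENTS;      // wraps from the last position back to 0
    students[indx] = newstudentrecord;

   (This loop never ends if the table is full, so at least one position must stay empty.)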
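The insertion steps above can be made runnable roughly as follows. This is a sketch under assumptions: the table is a `std::vector` whose size plays the role of `MAXSTUDENTS`, -1 is the hypothetical `EMPTY` SSN, only two of the slide's fields are kept, and `insertStudent` is a hypothetical name. Unlike the slide's loop, it stops after checking every position and reports a full table instead of looping forever.

```cpp
#include <cassert>
#include <string>
#include <vector>

const long EMPTY = -1;   // assumed "empty" SSN marking an unused position

struct StudentType {
    long ssn = EMPTY;
    std::string lastname;
};

// Linear-probing insert. Returns false if the table is full.
bool insertStudent(std::vector<StudentType>& students, const StudentType& rec) {
    assert(!students.empty() && rec.ssn >= 0);
    const long size = static_cast<long>(students.size());
    long indx = rec.ssn % size;                 // h(ssn): division method
    for (long probes = 0; probes < size; ++probes) {
        if (students[indx].ssn == EMPTY) {      // found an empty spot
            students[indx] = rec;
            return true;
        }
        indx = (indx + 1) % size;               // next position, wrapping to 0
    }
    return false;                               // every position is occupied
}
```

With a table of 10 positions, SSNs 123456785 and 987654325 both hash to 5; the first is stored at position 5 and the second probes forward to position 6.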
Ways to Handle Collisions: Linear Probing
• To locate (search for) a record
   • Start by calculating the hash value
   • Starting at that position, do a sequential search for the record
   • If an empty spot is encountered before the record is found, the record is not in the table

    indx = h(searchssn);
    while (students[indx].ssn != searchssn && students[indx].ssn != EMPTY)
        indx = (indx + 1) % MAXSTUDENTS;
    bool found = (students[indx].ssn == searchssn);
    // found:  students[indx] is the student with searchssn
    // !found: no student in table with searchssn

   (As with insertion, this loop relies on the table having at least one empty position.)
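A runnable version of the search steps above might look like this. It is a sketch under the same assumptions as a vector-based table: the vector's size stands in for `MAXSTUDENTS`, -1 is the hypothetical `EMPTY` SSN, and `findStudent` is a hypothetical name. It stops at an empty spot, as the slide describes, and also after checking every position so that a full table cannot cause an endless loop.

```cpp
#include <cassert>
#include <string>
#include <vector>

const long EMPTY = -1;   // assumed "empty" SSN marking an unused position

struct StudentType {
    long ssn = EMPTY;
    std::string lastname;
};

// Linear-probing search. Returns the position of the record, or -1.
long findStudent(const std::vector<StudentType>& students, long searchssn) {
    assert(!students.empty() && searchssn >= 0);
    const long size = static_cast<long>(students.size());
    long indx = searchssn % size;                      // h(ssn)
    for (long probes = 0; probes < size; ++probes) {
        if (students[indx].ssn == searchssn) return indx;  // found it
        if (students[indx].ssn == EMPTY) return -1;        // empty spot: not there
        indx = (indx + 1) % size;                          // keep probing
    }
    return -1;   // checked every position without finding it
}
```

In a 10-position table holding 123456785 at position 5 and 987654325 (which collided with it) at position 6, a search for 555555555 starts at 5, passes 5 and 6, and stops at the empty position 7.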
Ways to Handle Collisions: Chaining
• Make each element of the array the head pointer of a linked list of the records whose keys hash to that position
• Somewhat better than linear probing: it limits the length of the sequential search required once collisions start to occur, because a search scans only the records that hashed to the same position (with linear probing, the search can also run through records that hashed to nearby positions)
• Requires more storage than linear probing, even with the same table size, because of the space required for the pointers
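Chaining can be sketched in C++ with a `std::vector` of `std::forward_list`s, where each list is one chain. This is a minimal sketch, not the course's implementation: the class name `ChainedTable`, the two-field `StudentType`, and the division-method hash are assumptions for illustration.

```cpp
#include <cassert>
#include <forward_list>
#include <string>
#include <vector>

struct StudentType {
    long ssn;
    std::string lastname;
};

// Each array element heads a linked list (chain) of the records
// whose keys hash to that position.
class ChainedTable {
public:
    explicit ChainedTable(long size) : buckets_(size) { assert(size > 0); }

    void insert(const StudentType& rec) {
        buckets_[h(rec.ssn)].push_front(rec);   // O(1): add at the head of the chain
    }

    // Sequential search, but only within the one chain for this key.
    const StudentType* find(long ssn) const {
        for (const StudentType& s : buckets_[h(ssn)])
            if (s.ssn == ssn) return &s;
        return nullptr;
    }

private:
    // Division method; keys are assumed non-negative.
    long h(long ssn) const { return ssn % static_cast<long>(buckets_.size()); }

    std::vector<std::forward_list<StudentType>> buckets_;
};
```

Colliding keys simply share a chain, so the table never "fills up"; the cost is the per-node pointer storage noted above.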
Possible Hash Functions
• Division Method
   h(key) = key % MAXSTUDENTS
• Folding
   break the key into "pieces" and do calculations with the pieces
   ex: h(123 45 6321) = 12 + 34 + 56 + 32 + 1 = 135
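Both hash functions above can be sketched in C++. The folding version follows the slide's example: split the digits into two-digit pieces from the left and add them. Two assumptions go beyond the slide and are labeled in the code: the table size of 10,000 (borrowed from Example 2), and a final `% MAXSTUDENTS` so the folded sum is always a valid array position (the slide stops at 135). The function names `hDivision` and `hFolding` are hypothetical.

```cpp
#include <cassert>
#include <string>

const long MAXSTUDENTS = 10000;   // assumed table size, as in Example 2

// Division method: remainder after dividing by the table size.
long hDivision(long key) {
    assert(key >= 0);
    return key % MAXSTUDENTS;
}

// Folding: split the decimal digits into two-digit pieces from the left
// (the last piece may have one digit) and add the pieces.
// Assumes a non-negative key written without leading zeros.
long hFolding(long key) {
    assert(key >= 0);
    const std::string digits = std::to_string(key);
    long sum = 0;
    for (std::string::size_type i = 0; i < digits.size(); i += 2)
        sum += std::stol(digits.substr(i, 2));   // e.g. "12", "34", ..., "1"
    return sum % MAXSTUDENTS;   // assumption: reduce to a valid position
}
```

For the slide's key, `hFolding(123456321)` computes 12 + 34 + 56 + 32 + 1 = 135, while `hDivision(123456321)` keeps the last four digits, 6321.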
For more info
• Read pages 647-662 in the text
• Look at problems 29, 32, and 33 (for 33, only the columns for 29 and 32)
• Food for thought: Do you think a hash table is a good storage option for a group of records that you want to display in various sorted orders?