
Data Structures and Algorithms for Information Processing




Presentation Transcript


  1. Data Structures and Algorithms for Information Processing Lecture 10: Searching II Lecture 10: Searching

  2. Outline
     • One more open-addressing (O/A) scheme: Ordered Hashing (Tough Schoolboy problem)
     • Analysis of hashing algorithms
     • Some practical considerations
     • Radix searching

  3. Open vs. Chained Hashing
     • How big should the table be?
     • Open addressing can be inconvenient when the number of insertions and deletions is unpredictable: the table can overflow.
     • Simple solution to overflow: resize (double) the table and rehash everything into the new table.
     • Use Knuth’s approach and double hashing to avoid clustering.
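
The resize-and-rehash idea can be sketched as follows. This is a minimal illustration, not code from the lecture: the class name `LinearProbingTable`, the 0.5 load threshold, and the use of linear probing (rather than the double hashing the slide recommends) are all assumptions made to keep the example short. `None` marks an empty slot, so `None` itself cannot be stored as a key.

```python
class LinearProbingTable:
    """Open-addressing hash set that doubles its table before it gets too full."""

    def __init__(self, capacity=8, max_load=0.5):
        self.slots = [None] * capacity
        self.count = 0
        self.max_load = max_load  # assumed threshold, not taken from the slides

    def _probe(self, key):
        """Index of the slot holding key, or of the first empty slot on its path."""
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i] != key:
            i = (i + 1) % len(self.slots)
        return i

    def _resize(self):
        """Double the table and rehash every stored key into the new table."""
        old_keys = [k for k in self.slots if k is not None]
        self.slots = [None] * (2 * len(self.slots))
        self.count = 0
        for k in old_keys:
            self.insert(k)

    def insert(self, key):
        # Grow first if one more key would push the load factor past the limit.
        if (self.count + 1) / len(self.slots) > self.max_load:
            self._resize()
        i = self._probe(key)
        if self.slots[i] is None:
            self.slots[i] = key
            self.count += 1

    def __contains__(self, key):
        return self.slots[self._probe(key)] == key


table = LinearProbingTable()
for n in range(100):
    table.insert(n)
```

Because the table never fills beyond the threshold, every probe sequence is guaranteed to reach an empty slot, so `_probe` always terminates.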

  4. Variant: Ordered Hashing
     • In linear probing, we stop the search when we find an empty cell or a record with a key equal to the search key.
     • In ordered hashing, we stop when we find an empty cell or a key less than or equal to the search key (tough schoolboy hashing).

  5. Tough Schoolboy hashing
     • 13 chairs in the classroom
     • Each boy has a preferred seat
     • Each boy has a jump value
     • Boys later in the alphabet are bigger

  6. Class in the morning
     • Inserts:
       Don prefers 3, jumps 2
       Bill prefers 5, jumps 4
       Al prefers 3, jumps 6
       Joe prefers 3, jumps 4


  15. Searching the classroom
     • Search for Don, Bill, Al, and Joe
     • Search for Ken, who prefers 3 and jumps 1
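
The classroom example can be traced with a short sketch. It assumes that chairs are numbered 0 to 12 with wrap-around, that "bigger" means alphabetically later, that an evicted boy continues along his own jump sequence, and that all names are distinct. The function names and the `PREFS` table are illustrative, not from the slides.

```python
M = 13  # chairs in the classroom (table size)

# name -> (preferred seat, jump value), as listed on the "Class in the morning" slide
PREFS = {"Don": (3, 2), "Bill": (5, 4), "Al": (3, 6), "Joe": (3, 4)}

def insert(chairs, name):
    """Seat `name`. A bigger (alphabetically later) boy evicts a smaller one,
    and the evicted boy keeps looking using his own jump value."""
    pos = PREFS[name][0]
    while chairs[pos] is not None:
        if chairs[pos] < name:
            chairs[pos], name = name, chairs[pos]  # swap: carry the evicted boy onward
        pos = (pos + PREFS[name][1]) % M
    chairs[pos] = name

def search(chairs, name, seat, jump):
    """Return (found, probes). Stop at an empty chair or at a boy <= name."""
    pos, probes = seat, 1
    while chairs[pos] is not None and chairs[pos] > name:
        pos = (pos + jump) % M
        probes += 1
    return chairs[pos] == name, probes

chairs = [None] * M
for boy in ["Don", "Bill", "Al", "Joe"]:
    insert(chairs, boy)

print({i: b for i, b in enumerate(chairs) if b is not None})
print(search(chairs, "Ken", 3, 1))
```

Under these assumptions Joe ends up in seat 3, Don in 5, Bill in 9, and Al in 2. The search for Ken stops after a single probe, because the boy in seat 3 (Joe) is smaller than Ken, so Ken cannot be seated anywhere further along his path.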

  16. Variant: Ordered Hashing
     • Ordered hashing reduces the time of an unsuccessful search to about the same as a successful search.
     • Useful for applications where we expect a large number of unsuccessful searches.

  17. Summary of Basic Searching
     • Hashing is generally preferred to binary tree methods because it is faster on average.
     • But binary search trees are truly dynamic (no advance information on size is needed).
     • BSTs, when balanced, also give worst-case guarantees (a hash function could be lousy).
     • BSTs support more operations, e.g. sorting.

  18. Time Analysis
     • Open address hashing methods store N records in a table of size M, with M > N.
     • The performance of the operations depends on the load factor alpha = N/M.
     • For chained hashing, alpha may be greater than 1.

  19. Linear Probing
     • Open address hashing with linear probing requires, on average:
       1/2 (1 + 1/(1-alpha)^2) probes for an unsuccessful search
       1/2 (1 + 1/(1-alpha)) probes for a successful search
     • E.g., for alpha = 2/3 we’ll make 5 probes for an average unsuccessful search, and 2 for a successful search.

  20. Double Hashing
     • Open address hashing with double hashing requires, on average:
       1/(1-alpha) probes for an unsuccessful search
       -ln(1-alpha)/alpha probes for a successful search (ln is the natural logarithm)
     • E.g., for alpha = 2/3 we’ll make 3 probes for an average unsuccessful search, and 1.65 for a successful search.

  21. Chained Hashing
     • Chained hashing requires, on average:
       1 + alpha probes for an unsuccessful search
       1 + alpha/2 probes for a successful search
     • E.g., for alpha = 2/3 we’ll make 1.67 probes for an average unsuccessful search, and 1.33 for a successful search.
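
The three pairs of formulas above are easy to evaluate side by side. The sketch below simply transcribes them. The function names are illustrative, and each function returns an (unsuccessful, successful) pair of expected probe counts.

```python
import math

def linear_probing(alpha):
    """Expected probes for open addressing with linear probing (alpha < 1)."""
    return 0.5 * (1 + 1 / (1 - alpha) ** 2), 0.5 * (1 + 1 / (1 - alpha))

def double_hashing(alpha):
    """Expected probes for open addressing with double hashing (0 < alpha < 1)."""
    return 1 / (1 - alpha), -math.log(1 - alpha) / alpha  # math.log is the natural log

def chaining(alpha):
    """Expected probes for chained hashing (alpha may exceed 1)."""
    return 1 + alpha, 1 + alpha / 2

for alpha in (0.5, 2 / 3, 0.9):
    for name, f in (("linear", linear_probing),
                    ("double", double_hashing),
                    ("chained", chaining)):
        miss, hit = f(alpha)
        print(f"alpha={alpha:.2f} {name:8s} unsuccessful={miss:6.2f} successful={hit:5.2f}")
```

Evaluating at alpha = 2/3 reproduces the worked figures on the slides, and pushing alpha toward 1 shows how quickly the linear probing cost for unsuccessful searches grows compared with the other two schemes.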

  22. Time Analysis
     • These formulas require significant mathematical analysis, which we won’t go into.

  23. Average Number of Probes: Successful Search (figure)

  24. Radix Searching
     • For many applications, keys can be thought of as numbers.
     • Searching methods that take advantage of digital properties of these keys are called radix searches.
     • Radix searches treat keys as numbers in base M (the radix) and work with individual digits.

  25. Radix Searching
     • Provides reasonable worst-case performance without the complication of balanced trees.
     • Provides a way to handle variable-length keys.
     • Biased data can lead to degenerate data structures with bad performance.

  26. The Simplest Radix Search
     • Digital Search Trees: like BSTs, but branch according to the key’s bits.
     • Key comparison is replaced by a function that accesses the key’s next bit.

  27. Digital Search Example
     Keys and their 5-bit codes:
       A 00001
       S 10011
       E 00101
       R 10010
       C 00011
       H 01000
     (Figure: digital search tree with nodes A, E, S, C, H, R)

  28. Digital Search
     • Requires O(log N) comparisons on average
     • Requires b comparisons in the worst case for a tree built with N random b-bit keys
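
A digital search tree for the six example keys can be sketched as below. The encoding of letters as 5-bit integers (A = 1, ..., Z = 26) matches the codes on the example slide. The `Node` class, the function names, and the most-significant-bit-first branching order are assumptions of this sketch, and keys are assumed to be distinct 5-bit values.

```python
BITS = 5  # key width used in the example (A = 00001, ..., Z = 11010)

def bit(key, d):
    """Bit number d of key, counting from the most significant of BITS bits."""
    return (key >> (BITS - 1 - d)) & 1

class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.child = [None, None]  # child[0] for bit 0, child[1] for bit 1

def insert(root, key, value):
    """Insert key and return the (possibly new) root."""
    if root is None:
        return Node(key, value)
    node, d = root, 0
    while node.key != key:          # full key comparison at every node
        b = bit(key, d)             # then branch on the next bit of the key
        if node.child[b] is None:
            node.child[b] = Node(key, value)
            return root
        node, d = node.child[b], d + 1
    node.value = value              # key already present: overwrite
    return root

def search(root, key):
    node, d = root, 0
    while node is not None and node.key != key:
        node, d = node.child[bit(key, d)], d + 1
    return None if node is None else node.value

def code(letter):
    return ord(letter) - ord("A") + 1

root = None
for letter in "ASERCH":
    root = insert(root, code(letter), letter)
```

Inserting the keys in the order A, S, E, R, C, H puts A at the root, E and S on the second level, and C, H, R on the third. Each step of a search still compares the whole key, which is the cost the next slide sets out to avoid.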

  29. Digital Search
     • Problem: at each node we make a full key comparison, which may be expensive, e.g. for very long keys.
     • Solution: store keys only at the leaves, and use radix expansion to do intermediate key comparisons.

  30. Radix Tries
     • Used for retrieval (the name “trie” comes from “reTRIEval”).
     • Internal nodes are used for branching; external nodes are used for the final key comparison and to store data.

  31. Radix Trie Example
     Keys and their 5-bit codes:
       A 00001
       S 10011
       E 00101
       R 10010
       C 00011
       H 01000
     (Figure: radix trie with leaves H, E, A, C, S, R)

  32. Radix Tries
     • The left subtree has all keys which have 0 for the leading bit; the right subtree has all keys which have 1 for the leading bit.
     • An insert or search requires O(log N) bit comparisons in the average case, and b bit comparisons in the worst case.
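
A binary radix trie with keys stored only at the leaves might look like the sketch below. The `Leaf` and `Branch` classes and the recursive `insert` are illustrative choices, not the lecture's code. Keys are assumed to be distinct 5-bit integers, encoded as on the example slide (A = 1, ..., Z = 26).

```python
BITS = 5

def bit(key, d):
    """Bit number d of key, counting from the most significant of BITS bits."""
    return (key >> (BITS - 1 - d)) & 1

class Leaf:
    def __init__(self, key):
        self.key = key

class Branch:
    def __init__(self):
        self.child = [None, None]  # child[0]: next bit is 0, child[1]: next bit is 1

def insert(node, key, d=0):
    """Return the subtree rooted at `node` with `key` added; d is the current depth."""
    if node is None:
        return Leaf(key)
    if isinstance(node, Leaf):
        if node.key == key:
            return node
        # Two keys collide here: push the old leaf down one level and retry.
        branch = Branch()
        branch.child[bit(node.key, d)] = node
        return insert(branch, key, d)
    b = bit(key, d)
    node.child[b] = insert(node.child[b], key, d + 1)
    return node

def search(node, key):
    """Return (found, bits_examined); only one full key comparison, at the leaf."""
    d = 0
    while isinstance(node, Branch):
        node, d = node.child[bit(key, d)], d + 1
    return (node is not None and node.key == key), d

def code(letter):
    return ord(letter) - ord("A") + 1

root = None
for letter in "ASERCH":
    root = insert(root, code(letter))
```

With the example keys, finding H examines only 2 bits, while R and S (10010 and 10011) differ only in their last bit and so sit below a chain of four one-way branches. That chain is the wasted space that motivates Patricia trees.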

  33. Radix Tries
     • Problem: lots of extra nodes for keys that differ only in low-order bits (see the R and S nodes in the example above).
     • This is addressed by Patricia trees, which allow “lookahead” to the next relevant bit.
     • Patricia: Practical Algorithm To Retrieve Information Coded In Alphanumeric.
     • In the slides that follow, each node’s index shows only a few letters; in practice the entire alphabet would be included in the indexes.

  34. // Insert word K (see Drozdek and Simon; needs work)
      i = 0; p = root;
      while not inserted
        if (K[i] == '\0')
          set end-of-word marker in p to true
        else if (p.ptrs[K[i]] == null)
          create leaf containing K and put its address in p.ptrs[K[i]]
        else if (reference p.ptrs[K[i]] refers to a leaf)
          K_L = key in leaf p.ptrs[K[i]];
          do
            create a non-leaf and put its address in p.ptrs[K[i]]
            p = the new non-leaf;
            i++;
          while (K[i] == K_L[i]);
          create a leaf containing K and put its address in p.ptrs[K[--i]]
          if (end-of-word K reached)
            set end-of-word marker in p to true
          else
            create leaf containing K_L and put address in p.ptrs[K_L[i]]
        else
          p = p.ptrs[K[i++]]
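
The pseudocode above is marked as needing work, so the following is a reworked sketch of the same idea rather than a line-by-line translation. It assumes that leaves are plain strings holding whole words, that non-leaf nodes keep a dictionary `ptrs` instead of a fixed alphabet array, and that a colliding leaf is pushed down one level at a time. The names `TrieNode`, `insert`, and `contains` are illustrative.

```python
class TrieNode:
    def __init__(self):
        self.end_of_word = False  # True if some word ends exactly at this node
        self.ptrs = {}            # letter -> TrieNode, or str (a leaf holding a whole word)

def insert(root, word):
    p, i = root, 0
    while True:
        if i == len(word):              # ran out of letters: mark this node
            p.end_of_word = True
            return
        c = word[i]
        child = p.ptrs.get(c)
        if child is None:               # free slot: store the word as a leaf
            p.ptrs[c] = word
            return
        if isinstance(child, str):      # slot holds a leaf: push it one level down
            if child == word:
                return
            node = TrieNode()
            p.ptrs[c] = node
            if len(child) == i + 1:     # the old word ends at the new node
                node.end_of_word = True
            else:
                node.ptrs[child[i + 1]] = child
        p, i = p.ptrs[c], i + 1         # descend and look at the next letter

def contains(root, word):
    p = root
    for c in word:
        child = p.ptrs.get(c)
        if child is None:
            return False
        if isinstance(child, str):      # reached a leaf: one full comparison
            return child == word
        p = child
    return p.end_of_word

WORDS = ["PIER", "EIRE", "IPA", "IRE", "EERIE", "A", "ARA", "ERA",
         "ERIE", "ERE", "PEER", "ARE", "PEAR", "PER", "AREA"]
root = TrieNode()
for w in WORDS:
    insert(root, w)
```

Inserting "ARA" into an empty trie stores it as a leaf directly under the root. Inserting "AREA" then pushes "ARA" down two levels until the words diverge at their third letter, and inserting "A" only sets the end-of-word marker on the node reached by the letter A.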

  35. Empty Radix Trie: Insert “ARA”
     (Figure: root node indexed by # A E I P R, with the leaf ARA)

  36. Insert “AREA”
     (Figure: before and after; nodes indexed by # A E I P R, pointer P, and leaves K_L = ARA and K = AREA)

  37. Insert “A”
     (Figure: nodes indexed by # A E I P R, pointer P, and leaves A, ARA, AREA)

  38. (Figure: trie with nodes indexed by # A E I P R holding the words PIER, EIRE, IPA, IRE, EERIE, A, ARA, ERA, ERIE, ERE, PEER, ARE, PEAR, PER, AREA)

  39. Radix Trie
     (Figure: radix trie holding the words ADAM, LOGGIA, LOGGING, LOGGED, LOGGERHEAD)

  40. Patricia Tree
     (Figure: Patricia tree holding the words ADAM, LOGGIA, LOGGING, LOGGERHEAD, LOGGED; branch nodes carry numeric labels 0, 4, 5)
