Lecture 11: Why I Like Hash

Presentation Transcript


  1. CSC 213 – Large Scale Programming Lecture 11: Why I Like Hash

  2. Today’s Goal • Consider what will be important when searching • Why search in the first place? What is its purpose? • What should we expect & handle when searching? • What factors matter to our users (and ourselves)? • (Besides being a source of bad jokes) What is hashing? • Why is it important for searching? How can it help? • What are the critical factors of a good hash function? • Commonly used hash function example examined

  3. Key Ideas Behind Map • Used to convert the key into a value • Values cannot share a key and be in the same Map • In searching, failure is normal, not exceptional

  4. Entry ADT • Needs 2 pieces: what we have & what we want • First part is the key: data used in search • Item we want is the value; the second part of an Entry • Implementations must define 2 methods • key() & value() return the appropriate item • Usually includes setValue() but NOT setKey()
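
The Entry ADT described on this slide could be sketched roughly as follows. The class name `SimpleEntry` is hypothetical and only illustrates the three methods the slide names; the course's actual interface may differ. One plausible reason for omitting `setKey()` is that a hash table places an Entry according to its key, so changing the key would strand the Entry at the wrong index.

```java
// Hypothetical illustration of the Entry ADT; not the course's actual class.
final class SimpleEntry<K, V> {
    private final K key;   // fixed at construction: there is no setKey()
    private V value;       // may be replaced through setValue()

    SimpleEntry(K key, V value) {
        this.key = key;
        this.value = value;
    }

    // The data used in the search
    K key() { return key; }

    // The item we actually want
    V value() { return value; }

    // Replaces the stored value and returns the previous one
    V setValue(V newValue) {
        V old = value;
        value = newValue;
        return old;
    }
}
```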

  5. Sequence-Based Map • Sequence’s perspective of the Map that it holds (diagram labels: Positions, elements)

  6. Sequence-Based Map • Outside view of the Map and how it is stored (diagram labels: Positions, Entrys)

  7. Sequence-Based Map • Map implementation’s view of data and storage (diagram labels: Positions, Elements/Entrys)

  8. Emergency

  9. Please hold while the machine searches 100,000 records for your location

  10. Map Performance • In all seriousness, can be a matter of life or death • 911 operators immediately need addresses • Google’s search performance in TB/s • O(log n) time too slow for these uses • Would love to use arrays • Get O(1) time to add, remove, or look up data • This HUGE array needs a massive RAM purchase

  13. Monster Amounts of RAM • Java requires using int as array index • Limited by the range of int and the RAM available in a machine • Integer.MAX_VALUE = 2,147,483,647 • 8,200,000,000 pages in Google’s index (2005) • In US, possible phone numbers = 10,000,000,000 • Must do more for O(1) array usage time • As with all life’s problems, we turn to hash

  14. Hashing To The Rescue • Hash function turns key into an int from 0 to N-1 • Result is usable as index for an array • Specific to the key’s type; cannot be reused • Store the Entrys in an array (“hash table”) • (Great name for a shop in Amsterdam, too) • Begin by computing the key’s hash value • Result is the array index for that Entry • Now it is possible to use an array for O(1) time!

  15. Hash Table Example • Example shows table of Entry<Long,String> • Simple hash function is h(x) = x mod 10,000 • x is/from the Entry’s key • h(x) computes the index to use • Always is mod array length • Not all locations used • Holes will appear in the array • Empties: set to null -or- use a sentinel value
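
A minimal sketch of the table on this slide, assuming non-negative `long` keys (such as phone numbers) and `String` values. The class `PhoneTable` and its method names are hypothetical, and collisions are simply overwritten here because the slides have not yet discussed how to handle them.

```java
// Hypothetical sketch of a table using h(x) = x mod 10,000.
final class PhoneTable {
    private static final int LENGTH = 10_000;            // array length from the slide
    private final long[] keys = new long[LENGTH];
    private final String[] values = new String[LENGTH];  // null marks an empty slot (a "hole")

    // h(x) = x mod array length; assumes x is non-negative
    static int hash(long x) {
        return (int) (x % LENGTH);
    }

    // Stores the value at the key's index; a colliding key overwrites in this sketch
    void put(long key, String value) {
        int i = hash(key);
        keys[i] = key;
        values[i] = value;
    }

    // Returns the stored value, or null when the search fails (a normal outcome)
    String get(long key) {
        int i = hash(key);
        return (values[i] != null && keys[i] == key) ? values[i] : null;
    }
}
```

Both `put` and `get` do one hash computation and one array access, which is where the O(1) time comes from.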

  17. Properties of Good Hash • To really be useful, hash must have these properties: • Reliable • Fast • Use entire table • Make good brownies

  18. Reliability of Hash Function • Implement Map with a hash table • To use an Entry, get its key to easily look up its index • Always computes the same index for that key

  19. Speed of Hash Function • Hash must be computed on each access • Goal: O(1) efficiency by using an array • Efficiency of array wasted if hash is slow • If O(1) computation performed by hash function • It is possible to perform get in O(1) time • O(1) time for put & remove could also occur • None of this is guaranteed; many problems can occur

  20. Use Entire Table Important • Hashing takes lots of space because an array is used • When creating, make the array big enough to hold all data • Can copy to a larger array, but this is not an O(1) operation • Use prime number lengths, but these quickly get large • Spreads out Entrys equally across the entire table • The further apart they are spread, the easier it is to find an opening

  23. Hash Function Analogy (image labels: Hash function, Hash table)

  24. Examples of Bad Hash • h(x) = 0 • Reliable, fast, little use of table • h(x) = random.nextInt() • Unreliable, fast, uses entire table • h(x) = current index -or- free index • Reliable, slow, uses entire table • h(x) = x^34 + 2x^33 + 24x^32 + 10x^31 … • Reliable, moderate, too large
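
The first two bad examples can be made concrete with a short sketch. The class `BadHashes`, the table length of 101, and the helper `slotsUsed` are all assumptions for illustration; the helper counts how many distinct indices a hash function produces for the keys 0 to keyCount-1.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.function.LongToIntFunction;

// Hypothetical illustration of why some hash functions are poor choices.
final class BadHashes {
    static final int LENGTH = 101;  // assumed table length (a prime)

    // h(x) = 0: reliable and fast, but every key lands in the same slot
    static int constantHash(long x) {
        return 0;
    }

    // Random index: uses the whole table, but the same key may get a
    // different index on each call, so a later lookup cannot find it
    static int randomHash(long x, Random random) {
        return random.nextInt(LENGTH);
    }

    // h(x) = x mod LENGTH, for comparison; assumes non-negative keys
    static int modHash(long x) {
        return (int) (x % LENGTH);
    }

    // Counts the distinct indices produced for keys 0 .. keyCount-1
    static int slotsUsed(LongToIntFunction h, int keyCount) {
        Set<Integer> used = new HashSet<>();
        for (long k = 0; k < keyCount; k++) {
            used.add(h.applyAsInt(k));
        }
        return used.size();
    }
}
```

With 1,000 keys, `slotsUsed(BadHashes::constantHash, 1000)` reports a single slot, while `slotsUsed(BadHashes::modHash, 1000)` reports all 101.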

  29. Incredibly Bad Hash • Using only part of the key & not the whole thing • No matter what, inevitably, you will guess wrong (image labels: Part used for hash, Part that matters)

  30. Censored Good Hash • Hash must first turn key into an int • Easy for numbers, but rarely that simple in real life • For a String, could add the value of each character • Would hash “spot”, “pots”, “stop” to the same index • Instead we usually use polynomial code:
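
The formula itself did not survive in this transcript, so the sketch below assumes the common textbook form x0·a^(n-1) + x1·a^(n-2) + … + x(n-1), where xi is the i-th character and a is a constant. The class and method names are hypothetical. It contrasts the additive hash, which ignores character order, with a polynomial code, which does not.

```java
// Hypothetical comparison of an additive hash and a polynomial hash code.
final class StringHashes {
    // Adds the character values; order is ignored, so anagrams collide
    static int additiveHash(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    // Assumed form: x0*a^(n-1) + x1*a^(n-2) + ... + x(n-1).
    // Each power of a is recomputed from scratch, which is deliberately naive.
    // Long strings silently overflow the long and wrap around.
    static long polynomialHash(String s, int a) {
        int n = s.length();
        long sum = 0;
        for (int i = 0; i < n; i++) {
            long power = 1;
            for (int j = 0; j < n - 1 - i; j++) {
                power *= a;
            }
            sum += s.charAt(i) * power;
        }
        return sum;
    }
}
```

`additiveHash` returns 454 for each of “spot”, “pots”, and “stop”, while `polynomialHash` with a = 33 gives three different values because each character is weighted by its position.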

  33. Good, Fast Hash • Polynomial codes are good, but very slow • Major bummer since we use hash for its speed • Cause of slowdown: computing a^n takes n operations • Horner’s method is better by piggybacking work
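
The piggybacking can be sketched as below, assuming the polynomial has the form x0·a^(n-1) + x1·a^(n-2) + … + x(n-1); the class and method names are hypothetical. The naive version rebuilds every power of a from scratch, so it needs on the order of n² multiplications for a string of length n. Horner’s method reuses the running total and needs one multiplication and one addition per character.

```java
// Hypothetical comparison of naive polynomial evaluation and Horner's method.
final class Horner {
    // Naive: recomputes a^(n-1-i) for every character
    static long naive(String s, int a) {
        int n = s.length();
        long sum = 0;
        for (int i = 0; i < n; i++) {
            long power = 1;
            for (int j = 0; j < n - 1 - i; j++) {
                power *= a;
            }
            sum += s.charAt(i) * power;
        }
        return sum;
    }

    // Horner: ((x0*a + x1)*a + x2)*a + ... ; each step reuses the previous work
    static long horner(String s, int a) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = h * a + s.charAt(i);
        }
        return h;
    }
}
```

Both methods return the same value for the same input; for “spot” with a = 33, the steps in `horner` are 115, 3,907, 129,042, and finally 4,258,502.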

  34. Compression • Hash’s only use is computing array indices • Useless if larger than the table’s length: no index exists! • When a=33, “spot” hashed to 4,293,383 • Some hashes are incalculable (like “triskaidekaphobia”) • To compress the result, work like an array-based queue: hash = (result + length) % length • % returns the modulus (the remainder from division) • Serves the exact same purpose: keeps the index within limits
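
A small sketch of the compression step, with hypothetical class and method names. `slideCompress` is the slide’s formula as written. In Java, `%` keeps the sign of the dividend, so that formula only yields a valid index when the result is greater than -length. `Math.floorMod` is a substitution of mine, not something the slide uses; it returns a value from 0 to length-1 for any int result when length is positive.

```java
// Hypothetical helpers for compressing a hash code into a table index.
final class Compress {
    // The slide's formula; valid only when result > -length
    // (and result + length does not overflow int)
    static int slideCompress(int result, int length) {
        return (result + length) % length;
    }

    // Alternative (an assumption, not from the slide): always in 0 .. length-1
    static int compress(int result, int length) {
        return Math.floorMod(result, length);
    }
}
```

For the slide’s value 4,293,383 and a table of length 10,000, both methods give index 3,383. For a result of -27 and length 10, `slideCompress` gives -7, which is not a usable index, while `compress` gives 3.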

  35. Before Next Lecture… • Continue working on week #4 assignment • Due at usual time Tues. so may want to get cracking • Start thinking of designs & CRC cards for project • Due next Friday as projects completed in stages • Read sections 9.2.1 & 9.2.5 – 9.2.8 of the book • Consider better ways of handling this situation:
