# the hash table

##### Presentation Transcript

1. the hash table

2. hash table

3. hash table A hash table consists of two major components …

4. hash table … a bucket array

5. hash table … and a hash function

6. hash table Performance is expected to be O(1)

7. bucket array

8. bucket array hash table
    • A bucket array is an array A of size N
    • A[i] is a bucket, i.e. a collection of <key,value> pairs
    • N is the capacity of A
    • <k,e> is inserted in A[k]
    • if keys are well distributed between 0 .. N-1
    • if keys are unique integers in range 0 .. N-1
      • then each bucket holds at most one entry
      • consequently O(1) for get, insert, delete
    • downside: space is proportional to N
      • if N is much larger than n (number of entries) we waste space
    • downside: keys must be in range 0 .. N-1
      • this may not be the case (think matric number)
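The direct use of keys as array indices described above can be sketched in Java. This is only an illustration under stated assumptions: the class name `DirectAddressTable`, the `String` value type and the capacity of 11 are hypothetical choices, not part of the original slides.

```java
// Hypothetical sketch: a bucket array where key k is stored directly in A[k].
class DirectAddressTable {
    private final String[] buckets; // buckets[k] holds the value for key k, or null

    DirectAddressTable(int capacity) { // capacity is N
        buckets = new String[capacity];
    }

    // Each operation touches exactly one array slot, hence O(1).
    void insert(int key, String value) { checkRange(key); buckets[key] = value; }
    String get(int key)                { checkRange(key); return buckets[key]; }
    void delete(int key)               { checkRange(key); buckets[key] = null; }

    // The downside noted on the slide: keys outside 0 .. N-1 cannot be stored.
    private void checkRange(int key) {
        if (key < 0 || key >= buckets.length)
            throw new IllegalArgumentException("key must be in 0.." + (buckets.length - 1));
    }

    public static void main(String[] args) {
        DirectAddressTable t = new DirectAddressTable(11);
        t.insert(1, "D");
        t.insert(3, "C");
        t.insert(6, "C");
        t.insert(7, "Q");
        System.out.println(t.get(3)); // prints C
        t.delete(3);
        System.out.println(t.get(3)); // prints null
    }
}
```

Note that the array always occupies N slots, however few entries are stored, which is the space downside listed above.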

9. bucket array hash table
    [Figure: bucket array with indices 0 to 10, showing the entries (1,D), (3,C), (6,C) and (7,Q)]
    Bucket array of size 11 for the entries (1,D), (3,C), (3,F), (6,C) and (7,Q).
    If the hashed keys are unique integers in the range [0..10] then each bucket holds at most one entry. Otherwise we have a collision and need to deal with it.

10. collision bucket array hash table When two different entries map to the same bucket we have a collision.

11. collision bucket array hash table When two different entries map to the same bucket we have a collision. It’s good to avoid collisions.

12. hash function

13. hash function hash table
    A hash function maps each key to an integer in the range [0,N-1].
    Given entry <k,e> … h(k) is the index into the bucket array: store entry <k,e> in A[h(k)].
    • h is a good hash function if
      • h maps keys so as to minimise collisions
      • h is easy to compute/program
      • h is fast to compute
    • computing h(k) takes two actions
      • map k to a hash code
      • map the hash code into the range [0,N-1]

14. hash function hash table Hash codes in Java (every Java object provides hashCode()). But care should be taken, as this might not be “good”.

15. a bit of maths … that you know (af2)

16. af2
    • Let A and B be sets
    • A function is
      • a mapping from elements of A
      • to elements of B
      • and is a subset of A × B
      • i.e. it can be defined by a set of tuples!

17. af2
    • A is the domain
    • B is the codomain
    • f(x) = y
      • y is the image of x
      • x is a preimage of y
    • There may be more than one preimage of y
    • There is only one image of x
      • otherwise not a function
    • There may be an element in the codomain with no preimage
    • The range of f is the set of all images of elements of A
      • the set of all results

18. Injection (aka one-to-one, 1-1) af2
    [Figure: two mapping diagrams, one labelled “not an injection” and one labelled “injection”]
    If a function is an injection then preimages are unique.

19. Injection (aka one-to-one, 1-1) af2
    • Ideally we want our hash function to
      • be injective (no collisions)
      • have a small codomain and range
      • we may need to compress the range
    [Figure: two mapping diagrams, one labelled “not an injection” and one labelled “injection”]
    If a function is an injection then preimages are unique.

21. hash code & hash function Just to clear this up (but let’s not make too big a deal about it) …

22. hash code & hash function Just to clear this up (but let’s not make too big a deal about it) … We assume the hash code is an integer in the codomain. The hash function brings hash codes into the range [0,N-1]. We will examine just a few hash functions, acting on strings.

23. Polynomial hash codes hash code & hash function
    Assume we have a key s that is a character String. Here is a really dumb hash code:

        public int dumbHash(String s){
            int code = 0;
            for (int i=0;i<s.length();i++) code = code + s.charAt(i);
            return code;
        }

    • What would we get for
      • dumbHash("spot")
      • dumbHash("pots")
      • dumbHash("tops")
      • dumbHash("post")
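To answer the question on the slide, the following sketch runs the same summing logic on the four words. The wrapper class `DumbHashDemo` and the `static` modifier are added here only so the example runs on its own.

```java
// Runs the slide's dumbHash on four anagrams to show that they all collide.
class DumbHashDemo {
    // Same logic as the slide: add up the character codes, ignoring position.
    static int dumbHash(String s) {
        int code = 0;
        for (int i = 0; i < s.length(); i++) code = code + s.charAt(i);
        return code;
    }

    public static void main(String[] args) {
        // 's' = 115, 'p' = 112, 'o' = 111, 't' = 116, so every word sums to 454.
        for (String w : new String[] {"spot", "pots", "tops", "post"})
            System.out.println(w + " -> " + dumbHash(w));
    }
}
```

Because addition ignores order, every anagram of a key gets the same hash code, which is why the position of each character needs to be taken into account.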

24. Polynomial hash codes hash code & hash function Take into consideration the “position” of the elements of the key. The resulting polynomial doesn’t look any different from an everyday number: it is written to the base a, and its coefficients are the components of the key.
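The slide's formula did not survive in this transcript, so the following is a sketch under an assumption: that the polynomial has the usual form x_0·a^(k-1) + x_1·a^(k-2) + … + x_(k-1), where x_i is the i-th character of the key. The names `PolynomialHash` and `polyHash` are hypothetical, and Horner's rule is used here as one convenient way to evaluate the polynomial.

```java
// Hypothetical sketch of a polynomial hash code to the base a.
class PolynomialHash {
    static long polyHash(String s, long a) {
        long code = 0;
        for (int i = 0; i < s.length(); i++) {
            // Horner's rule: multiply the running value by a, then add the next character.
            // long arithmetic wraps silently on overflow, so results may be negative.
            code = code * a + s.charAt(i);
        }
        return code;
    }

    public static void main(String[] args) {
        // Unlike a plain sum, anagrams now get different codes.
        for (String w : new String[] {"spot", "pots", "tops", "post"})
            System.out.println(w + " -> " + polyHash(w, 33));
    }
}
```

With a = 33, the two-character key "ab" hashes to 97·33 + 98 = 3299, so the position of each character now matters.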

25. Polynomial hash codes hash code & hash function Good values for a appear to be 33, 37, 39, 41

26. Polynomial hash codes hash code & hash function
    Yikes! Look at that range!!!!
    • Small scale experiments on Unix dictionary
      • a = 33
      • 25104 words/strings
      • minimum hash value -9165468936209580338
      • maximum hash value 8952279818009261254
      • collision count 7

27. Cyclic shift hash codes hash code & hash function Start moving bits around
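The code shown on the following slides is not reproduced in this transcript. As a stand-in, here is a minimal sketch of one common cyclic shift scheme: rotate the running 32-bit value left by 5 bits, then add the next character. The shift amount of 5 and the names `CyclicShiftHash` and `cyclicShiftHash` are assumptions made for illustration.

```java
// Hypothetical sketch of a cyclic shift hash code over a 32-bit int.
class CyclicShiftHash {
    static int cyclicShiftHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // Rotate left by 5: the top 5 bits wrap around to the bottom.
            // Equivalent to Integer.rotateLeft(h, 5).
            h = (h << 5) | (h >>> 27);
            h += s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"spot", "pots", "tops", "post"})
            System.out.println(w + " -> " + cyclicShiftHash(w));
    }
}
```

The rotation plays the role that multiplication by a plays in the polynomial hash code, but uses only shifts and an OR.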

29. Cyclic shift hash codes hash code & hash function Thanks to Arash Partow

37. Compression Functions hash code & hash function So, you think you’ve found something that produces a good hash code … How do we compress its range to fit into our machine?

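38. Compression Functions hash code & hash function
    Assume we want to limit storage to buckets in range [0,N-1].
    The division method (NOTE: keep N prime):

        // floorMod keeps the index in [0,N-1]; in Java, % gives a negative result for a negative hash code
        int i = (int) Math.floorMod(hash(s), (long) N);
        S[i] = s;

    … ideally, but there may be collisions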
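The division method can be sketched as a small runnable example. The names `DivisionCompression` and `compress` are hypothetical, and Java's built-in `String.hashCode()` stands in for whichever hash code is being compressed. The sketch uses `Math.floorMod` rather than `%` because hash codes can be negative, and Java's `%` keeps the sign of the dividend.

```java
// Hypothetical sketch of the division method for compressing a hash code.
class DivisionCompression {
    // Maps any long hash code into the range [0, n-1].
    static int compress(long hashCode, int n) {
        return (int) Math.floorMod(hashCode, (long) n);
    }

    public static void main(String[] args) {
        int n = 11;                 // number of buckets, kept prime as the slide advises
        String[] S = new String[n]; // bucket array, one string per slot in this sketch

        String s = "spot";
        int i = compress(s.hashCode(), n);
        S[i] = s;                   // ideally; a later key may map to the same slot
        System.out.println(s + " stored at index " + i);

        System.out.println(-7 % 11);           // prints -7, not usable as an array index
        System.out.println(compress(-7L, 11)); // prints 4
    }
}
```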

39. Compression Functions hash code & hash function
    Assume we want to limit storage to buckets in range [0,N-1].
    The multiply, add and divide (MAD) method
    • N is prime
    • a > 1 is a scaling factor
    • b ≥ 0 is a shift
    • a % N ≠ 0
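The MAD formula itself is missing from this transcript. The sketch below assumes the simple form (a·k + b) mod N, which matches the constraints listed on the slide. The names `MadCompression` and `mad`, and the sample values a = 3, b = 5, N = 11, are hypothetical.

```java
// Hypothetical sketch of multiply, add and divide (MAD) compression,
// assuming the form (a * k + b) mod N.
class MadCompression {
    static int mad(long hashCode, long a, long b, int n) {
        // If a were a multiple of N, every key would land in bucket b mod N.
        if (a % n == 0)
            throw new IllegalArgumentException("a must not be a multiple of N");
        // a * hashCode + b may overflow and wrap; floorMod still yields 0 .. n-1.
        return (int) Math.floorMod(a * hashCode + b, (long) n);
    }

    public static void main(String[] args) {
        int n = 11;           // prime number of buckets
        long a = 3, b = 5;    // illustrative scaling factor and shift
        for (long k : new long[] {4, 15, 26})
            System.out.println(k + " -> " + mad(k, a, b, n));
    }
}
```

For example, with these values the hash code 4 maps to (3·4 + 5) mod 11 = 6.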

40. hash tables Collision handling schemes

41. Collision handling schemes hash tables Separate Chaining

42. Collision handling schemes Separate Chaining hash tables
    • bucket[i] is a small map
      • implemented as a list
    bucket[i] should be a short list.
    It may be sorted.
    It might be something other than a list.

43. Collision handling schemes Separate Chaining hash tables
    Let N be the number of buckets and n the amount of data stored; the load factor is n/N.
    • Upside:
      • simple
    • Downside:
      • requires auxiliary data structures (to resolve collisions)
      • this may put additional burden on space

44. Collision handling schemes Separate Chaining hash tables
    A simple view: an array whose elements are linked lists.
    [Figure: locations (locn) 0 to 7, each with an empty list]

45. Collision handling schemes Separate Chaining hash tables
    A simple view: an array whose elements are linked lists.
    put(Jon,plumber), hash(Jon) = 3
    [Figure: locations 0 to 7, all lists still empty]

46. Collision handling schemes Separate Chaining hash tables
    A simple view: an array whose elements are linked lists.
    put(Jon,plumber), hash(Jon) = 3
    [Figure: locations 0 to 7; location 3 holds (Jon,plumber)]

47. Collision handling schemes Separate Chaining hash tables
    A simple view: an array whose elements are linked lists.
    put(Fred,painter), hash(Fred) = 6
    [Figure: locations 0 to 7; location 3 holds (Jon,plumber)]

48. Collision handling schemes Separate Chaining hash tables
    A simple view: an array whose elements are linked lists.
    put(Fred,painter), hash(Fred) = 6
    [Figure: locations 0 to 7; location 3 holds (Jon,plumber), location 6 holds (Fred,painter)]

49. Collision handling schemes Separate Chaining hash tables
    A simple view: an array whose elements are linked lists.
    put(Joe,prof), hash(Joe) = 1
    [Figure: locations 0 to 7; location 3 holds (Jon,plumber), location 6 holds (Fred,painter)]
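The put sequence walked through on these slides can be sketched as a small separate-chaining table. This is an illustration only: the class name `ChainedHashTable`, the `String` key and value types, and the use of `String.hashCode()` with `Math.floorMod` for the bucket index are assumptions, so the actual bucket numbers will not match the hash values shown in the figures.

```java
import java.util.LinkedList;

// Hypothetical sketch: an array whose elements are linked lists of entries.
class ChainedHashTable {
    private static final class Entry {
        final String key;
        String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] buckets;
    private int size; // n, the number of entries stored

    @SuppressWarnings("unchecked")
    ChainedHashTable(int capacity) { // capacity is N, the number of buckets
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    // Hash code, then compression into 0 .. N-1.
    private int indexOf(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, String value) {
        LinkedList<Entry> chain = buckets[indexOf(key)];
        for (Entry e : chain) {
            if (e.key.equals(key)) { e.value = value; return; } // replace existing
        }
        chain.add(new Entry(key, value)); // a collision just lengthens the chain
        size++;
    }

    String get(String key) {
        for (Entry e : buckets[indexOf(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null; // key not present
    }

    double loadFactor() { return (double) size / buckets.length; } // n/N

    public static void main(String[] args) {
        ChainedHashTable t = new ChainedHashTable(8); // 8 locations, as in the figures
        t.put("Jon", "plumber");
        t.put("Fred", "painter");
        t.put("Joe", "prof");
        System.out.println(t.get("Fred"));  // prints painter
        System.out.println(t.loadFactor()); // prints 0.375
    }
}
```

Lookups stay correct even when every key lands in the same bucket; they simply degrade to a linear scan of that one list, which is why short chains (a low load factor) matter.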