Hashing

hakan

Presentation Transcript


  1. Hashing 1. Def. Hash table: a data structure in which objects are associated with an integer and stored in an array at that integer location (i.e. a key field is used to determine the index of the item). This provides constant-time insertion and lookup on average. Applications that need quick lookup include 911 call handling and airline control tower flight number tables. Ex. Student records stored in an array where each student is assigned an ID number and that number is used as the index. Are there any problems with this idea? Space may be wasted, insertions of new students are limited by the original size of the array, and a lookup requires knowing the student's ID number, which is not convenient.

  2. 2. Def. Hash function: a function used to convert numbers from a large range into numbers in a small range. (The key field is usually the large range and the index of the array is usually the small range.) Ex. Dictionary of 50,000 words. Use the word itself as the key field, but code it numerically to determine where to store the word in the array. Let a = 1, b = 2, c = 3, …, z = 26 and let the positions of the letters in the word have power-of-ten values. Ex. dab = 4 * 10^2 + 1 * 10^1 + 2 * 10^0 = 412. What size array would be needed to store these 50,000 words if no word is longer than 10 characters?
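The letter coding above can be sketched in Java as follows. The class name `WordCode` and method name `encode` are hypothetical, chosen for illustration, and the sketch assumes words contain only lowercase a to z. A `long` is used because codes for ten-letter words can exceed the `int` range.

```java
// Hypothetical helper, not from the slides: encodes a lowercase word
// with a = 1 ... z = 26, weighting each letter by a power of ten
// according to its position (leftmost letter = highest power).
class WordCode {
    static long encode(String word) {
        long code = 0;
        for (char c : word.toCharArray()) {
            if (c < 'a' || c > 'z') {
                throw new IllegalArgumentException("lowercase a-z only: " + word);
            }
            // Multiplying the running total by ten shifts earlier letters
            // up one power of ten before the next letter value is added.
            code = code * 10 + (c - 'a' + 1);
        }
        return code;
    }

    public static void main(String[] args) {
        System.out.println(encode("dab")); // 4*100 + 1*10 + 2 = 412
    }
}
```

Because letter values above 9 carry into the next power of ten, different words can receive the same code.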

  3. zzzzzzzzzz would have the code 26 * 1,111,111,111 = 28,888,888,886 (too big: larger than the largest int, and no array could be that big). Also, if locations were chosen this way, there would be many empty cells. What size array is needed for this dictionary? 100,000: usually twice as large as the number of items, to allow room for collisions (defined on a later slide). A hash function is needed to convert the numeric code to a smaller range.

  4. Commonly used hash function: index = largerange % arraysize. Ex. Hash the word gave to find its location in the array dictionary: 7 * 10^3 + 1 * 10^2 + 22 * 10^1 + 5 * 10^0 = 7325. Ex. Hash the word gaty to find its location in the array dictionary: 7 * 10^3 + 1 * 10^2 + 20 * 10^1 + 25 * 10^0 = 7325. Both codes are smaller than the array size of 100,000, so both words hash to index 7325. COLLISION!
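A minimal sketch of the modulo hash applied to the two example words, assuming the array size of 100,000 suggested for the dictionary. The names `ModHash`, `encode`, and `hash` are hypothetical.

```java
// Illustrative only: index = largerange % arraysize for word keys.
class ModHash {
    static final int ARRAY_SIZE = 100_000; // assumed dictionary array size

    // Letter coding from the slides: a = 1 ... z = 26, powers of ten by position.
    static long encode(String word) {
        long code = 0;
        for (char c : word.toCharArray()) {
            code = code * 10 + (c - 'a' + 1);
        }
        return code;
    }

    // Reduce the large numeric code to a valid array index.
    static int hash(String word) {
        return (int) (encode(word) % ARRAY_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(hash("gave")); // 7325
        System.out.println(hash("gaty")); // 7325, the same cell: a collision
    }
}
```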

  5. 3. Def. Collision: a key hashes to a cell that is already occupied. 4. There are 2 methods to resolve collisions. Def. Open addressing: in case of collision, search for or store in some other available cell. Def. Separate chaining: install a linked list at each index of the array and insert all items that hash to an index at the beginning of that list.

  6. Ex. gaty would be stored in location 7326 if available, otherwise location 7327, or 7328, etc. 5. Types of open addressing: Linear probe method: if a collision occurs at index x, search locations x+1, x+2, etc. Note: resolves collisions, but primary clusters occur. Quadratic probe method: search x+1^2, x+2^2, x+3^2, etc. (i.e. x+1, x+4, x+9). Note: resolves primary clusters, but secondary clusters occur.
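The two probe sequences can be compared with a short sketch. The class and method names are hypothetical, and wrapping around the end of the array with `%` is an assumption added here so that every probe stays inside the table.

```java
import java.util.Arrays;

// Illustrative only: the first few cells tried after a collision at index x.
class ProbeSequences {
    // Linear probe: x+1, x+2, x+3, ...
    static int[] linear(int x, int count, int arraySize) {
        int[] cells = new int[count];
        for (int i = 1; i <= count; i++) {
            cells[i - 1] = (x + i) % arraySize;
        }
        return cells;
    }

    // Quadratic probe: x+1^2, x+2^2, x+3^2, ...
    static int[] quadratic(int x, int count, int arraySize) {
        int[] cells = new int[count];
        for (int i = 1; i <= count; i++) {
            // long arithmetic avoids int overflow for large i
            cells[i - 1] = (int) ((x + (long) i * i) % arraySize);
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(linear(7325, 3, 100_000)));    // [7326, 7327, 7328]
        System.out.println(Arrays.toString(quadratic(7325, 3, 100_000))); // [7326, 7329, 7334]
    }
}
```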

  7. Rehashing (also called double hashing): when a collision occurs, determine the step used to search for an available cell by hashing the key value again with a second function. Ex. step = 5 - key % 5. What steps result? 5, 4, 3, 2, 1. How is this different from the linear and quadratic probe methods? The step is different for different keys. Note: the table size must be prime in order to probe all cells. (Ex. size = 20, step = 5, x = 0 visits only 0, 5, 10, 15, 0, 5, 10, 15, ...; try size = 19, step = 5, x = 0: 0, 5, 10, 15, 1, 6, 11, 16, 2, 7, 12, 17, 3, 8, 13, 18, 4, 9, 14, which reaches all 19 cells.)
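The effect of table size on coverage can be checked with a small sketch. `DoubleHashProbe`, `step`, and `cellsVisited` are hypothetical names, and the sketch assumes a non-negative key, a positive step, and a start index inside the table.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: lists the cells a fixed step visits before the
// probe sequence returns to its starting cell.
class DoubleHashProbe {
    // Second hash function from the slide; yields a step from 1 to 5.
    static int step(int key) {
        return 5 - key % 5;
    }

    static List<Integer> cellsVisited(int start, int step, int arraySize) {
        List<Integer> cells = new ArrayList<>();
        int cell = start;
        do {
            cells.add(cell);
            cell = (cell + step) % arraySize; // advance by step, wrapping around
        } while (cell != start);
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(cellsVisited(0, 5, 20));        // [0, 5, 10, 15]
        System.out.println(cellsVisited(0, 5, 19).size()); // 19, every cell is reached
    }
}
```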

  8. Write code to increase a hash value by step: hashVal += step. What do we do if a hash value reaches or exceeds the size of the array? Wrap around: hashVal %= arraySize. What do we do about duplicate key values? They should not be allowed. When the first item with a given key is found, the search stops, so a second item with the same key would never be found. Select a key value that is unique to the item (ex. Social Security number).

  9. How do we handle deletions? Replace one field by -1 rather than replace the entire object by null. Often the object's information may be needed in the future (ex. even when an employee leaves, pension and tax information is needed). However, there is another reason in this code: something undesirable occurs if the object is replaced by null. Demonstrate what and explain why. while (hashRay[hashVal] != null && hashRay[hashVal].iData != -1) What method requires this condition and why?
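One way to explore the deletion question is the sketch below. It is an assumption-laden illustration, not the slides' own class: it uses linear probing, non-negative unique keys, and a table that always has a free or deleted cell. `ProbeTable`, `DataItem`, and `deleteWithNull` are hypothetical names, and `deleteWithNull` exists only for contrast.

```java
class DataItem {
    int iData; // key field; -1 marks a deleted item
    DataItem(int key) { iData = key; }
}

class ProbeTable {
    private final DataItem[] hashRay;
    private final int arraySize;

    ProbeTable(int size) {
        arraySize = size;
        hashRay = new DataItem[size];
    }

    private int hash(int key) { return key % arraySize; }

    void insert(int key) {
        int hashVal = hash(key);
        // Skip occupied cells; a cell whose item is marked -1 may be reused.
        while (hashRay[hashVal] != null && hashRay[hashVal].iData != -1) {
            hashVal = (hashVal + 1) % arraySize;
        }
        hashRay[hashVal] = new DataItem(key);
    }

    DataItem find(int key) {
        int hashVal = hash(key);
        // A null cell ends the search; a deleted marker does not.
        for (int probes = 0; probes < arraySize && hashRay[hashVal] != null; probes++) {
            if (hashRay[hashVal].iData == key) return hashRay[hashVal];
            hashVal = (hashVal + 1) % arraySize;
        }
        return null;
    }

    // Deletion as described on the slide: keep the object, mark its key -1.
    boolean delete(int key) {
        DataItem item = find(key);
        if (item == null) return false;
        item.iData = -1;
        return true;
    }

    // For contrast only: emptying the cell breaks the probe chain.
    boolean deleteWithNull(int key) {
        int hashVal = hash(key);
        for (int probes = 0; probes < arraySize && hashRay[hashVal] != null; probes++) {
            if (hashRay[hashVal].iData == key) {
                hashRay[hashVal] = null;
                return true;
            }
            hashVal = (hashVal + 1) % arraySize;
        }
        return false;
    }

    public static void main(String[] args) {
        ProbeTable marked = new ProbeTable(11);
        ProbeTable nulled = new ProbeTable(11);
        for (int key : new int[] {7, 18, 29}) { // all three hash to cell 7
            marked.insert(key);
            nulled.insert(key);
        }
        marked.delete(18);
        nulled.deleteWithNull(18);
        System.out.println(marked.find(29) != null); // true
        System.out.println(nulled.find(29) != null); // false: the search stops at the emptied cell
    }
}
```

In this sketch the quoted loop condition appears in `insert`, where a cell counts as available if it is either null or holds a deleted item.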

  10. 6. Def. Load factor: the ratio of the number of items in a hash table to the size of the table (array). The fuller a table is, the worse clustering becomes. Therefore, hash tables should be designed never to become more than 1/2 to 2/3 full when open addressing is used. 7. When separate chaining is used to resolve collisions, is load factor a concern? No. n or more items can be placed in a table of size n, and the load factor will be 1 or more (i.e. a location may hold more than one item in its linked list).

  11. How do we handle duplicates with separate chaining? Duplicates are allowed and will be stored in the same list. Note: the search process slows because each list is searched linearly. How do we handle deletions? Deletions can be made from a linked list, if appropriate for the application, without the empty-cell problems that arise with open addressing.
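A minimal separate-chaining sketch, assuming integer keys and using `java.util.LinkedList` for each cell's list. `ChainTable` and its method names are hypothetical. It shows insertion at the head of the list, duplicates sharing a list, deletion without markers, and a load factor above 1.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative only: one linked list per array index.
class ChainTable {
    private final List<LinkedList<Integer>> cells = new ArrayList<>();
    private int itemCount = 0;

    ChainTable(int arraySize) {
        for (int i = 0; i < arraySize; i++) {
            cells.add(new LinkedList<>());
        }
    }

    private int hash(int key) {
        return Math.floorMod(key, cells.size());
    }

    // New items go at the beginning of the list; duplicates are allowed.
    void insert(int key) {
        cells.get(hash(key)).addFirst(key);
        itemCount++;
    }

    // Only the one list the key hashes to is searched, linearly.
    boolean contains(int key) {
        return cells.get(hash(key)).contains(key);
    }

    // Removes one occurrence; no deleted marker is needed.
    boolean delete(int key) {
        // Integer.valueOf selects remove(Object) rather than remove(int index).
        boolean removed = cells.get(hash(key)).remove(Integer.valueOf(key));
        if (removed) itemCount--;
        return removed;
    }

    double loadFactor() {
        return (double) itemCount / cells.size();
    }

    public static void main(String[] args) {
        ChainTable table = new ChainTable(4);
        for (int key : new int[] {1, 5, 9, 5, 2}) {
            table.insert(key);
        }
        System.out.println(table.loadFactor()); // 1.25
        table.delete(5);
        System.out.println(table.contains(5));  // true, the duplicate remains
    }
}
```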

  12. 8. What is the advantage of a hash table? O(1) complexity on average to search for or insert an item (i.e. constant time regardless of the number of items). 9. Disadvantage? The size of the array must be known in advance (in Java, arrays cannot be resized, so another, bigger array would be needed). This problem is reduced when separate chaining is used. Also, there is no way to access items in order.
