
Hashing



Presentation Transcript


  1. Hashing
     Problem: storing and retrieving an item using its key (for example, ID number, name)
     • Linked list: takes O(N) time
     • Binary search tree: takes O(log N) time
     • Array indexed directly by the key: takes O(1) time
     2110211 Intro. to Data Structures, Chapter 5: Hashing. Veera Muangsin, Dept. of Computer Engineering, Chulalongkorn University

  2. Array
     (Figure: an array with cells 0 to 99999 holding the records below.)
     • ID: 4112041, Name: Somsri, Faculty: Science
     • ID: 4163490, Name: Sompong, Faculty: Engineering
     Problem: a lot of empty space

  3. Hashing
     (Figure: a smaller array with cells 0 to 9999 holding the same records.)
     • ID: 4112041, Name: Somsri, Faculty: Science
     • ID: 4163490, Name: Sompong, Faculty: Engineering
     Map the key into some number between 0 and ArraySize-1

  4. Hashing
     • Map the key into an array position using a “hash function”
     • ArrayIndex = hash(key)
     • Takes O(1) time to access an item
     • Much less empty space than using a normal array

  5. Hash Function
     • Must return a valid array index.
     • Should ideally be a 1-to-1 mapping.
       • If key1 != key2 then hash(key1) != hash(key2)
       • A collision occurs when two distinct keys hash to the same location in the array
     • Should distribute the keys evenly.
       • Any key value k is equally likely to hash to any of the m array locations.

  6. Simple Hash Function
     • ArrayIndex = key mod TableSize
     • Example (TableSize = 1000):
       • 4112041 -> 12041 mod 1000 -> 41
       • 4163490 -> 63490 mod 1000 -> 490
     • TableSize should be a prime number for even distribution
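
The simple hash function on this slide can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `ModHashDemo` and the method `hash` are hypothetical names, and keys are assumed to be non-negative integers.

```java
// Illustrative sketch; ModHashDemo and hash are hypothetical names, not from the slides.
class ModHashDemo {
    // ArrayIndex = key mod TableSize, assuming key >= 0 and tableSize > 0.
    static int hash(int key, int tableSize) {
        return key % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(hash(4112041, 1000)); // 41
        System.out.println(hash(4163490, 1000)); // 490
    }
}
```

The slide's example uses a table size of 1000 for readable arithmetic; the same method works unchanged with a prime table size, which the slide recommends.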

  7. Another Hash Function
     • ArrayIndex = (k0 + 37*k1 + 37^2*k2 + . . . ) mod TableSize
     • Example: 3-character key
       ArrayIndex = (k0 + 37*k1 + 37^2*k2) mod TableSize
       ArrayIndex = (k0 + 37 * (k1 + 37 * k2)) mod TableSize
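
The two forms of the 3-character formula are algebraically identical; the nested form (Horner's rule) just avoids computing powers of 37. A small sketch that checks this, assuming character codes are used as k0, k1, k2 and using hypothetical names (`HornerDemo`, `expanded`, `horner`):

```java
// Illustrative sketch; all names here are hypothetical, not from the slides.
class HornerDemo {
    // Expanded form: (k0 + 37*k1 + 37^2*k2) mod tableSize.
    static int expanded(int k0, int k1, int k2, int tableSize) {
        return (k0 + 37 * k1 + 37 * 37 * k2) % tableSize;
    }

    // Nested form: (k0 + 37 * (k1 + 37 * k2)) mod tableSize.
    static int horner(int k0, int k1, int k2, int tableSize) {
        return (k0 + 37 * (k1 + 37 * k2)) % tableSize;
    }

    public static void main(String[] args) {
        String key = "abc"; // k0 = 'a', k1 = 'b', k2 = 'c'
        int a = expanded(key.charAt(0), key.charAt(1), key.charAt(2), 101);
        int b = horner(key.charAt(0), key.charAt(1), key.charAt(2), 101);
        System.out.println(a == b); // true
    }
}
```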

  8. Hash Function

         public static int hash( String key, int tableSize )
         {
             int hashVal = 0;

             for( int i = 0; i < key.length( ); i++ )
                 hashVal = 37 * hashVal + key.charAt( i );

             hashVal %= tableSize;
             if( hashVal < 0 )   // integer overflow can make hashVal negative
                 hashVal += tableSize;

             return hashVal;
         }

  9. Collision
     • When an element is inserted, if it hashes to the same value as an already inserted element, then we have a collision.
     • Collision resolution techniques
       • Separate Chaining
       • Open Addressing
         • Linear Probing, Quadratic Probing, Double Hashing

  10. Separate Chaining
      (Figure: a table with cells 0 to 999.)

  11. Separate Chaining
      • Load factor λ = number of elements / table size
      • Average length of a list = λ
      • A successful search costs 1 + (λ/2) link traversals
      • Cost depends on λ

  12. Separate Chaining: evenly distributed
      (Figure: a table with cells 0 to 999.)

  13. Separate Chaining: last digit is zero
      (Figure: a table with cells 0 to 999, with cells 0, 10, 20, . . . marked.)
      Solution: TableSize is prime
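
A short experiment can make the effect on this slide concrete. The sketch below assumes keys that all end in zero (0, 10, 20, ..., 9990) and the simple `key mod TableSize` hash; the class and method names are hypothetical. It counts how many distinct cells the keys land in for a table size of 1000 and for the prime 997.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch; PrimeTableSizeDemo and cellsUsed are hypothetical names.
class PrimeTableSizeDemo {
    // Counts the distinct cells hit by the keys 0, 10, 20, ..., 9990
    // when each key is hashed with key mod tableSize.
    static int cellsUsed(int tableSize) {
        Set<Integer> cells = new HashSet<>();
        for (int key = 0; key < 10000; key += 10) {
            cells.add(key % tableSize);
        }
        return cells.size();
    }

    public static void main(String[] args) {
        System.out.println(cellsUsed(1000)); // 100: only every tenth cell is used
        System.out.println(cellsUsed(997));  // 997: every cell is used
    }
}
```

With TableSize = 1000 the 1000 keys pile up in 100 chains, while the prime table size spreads them over every cell.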

  14. Open Addressing
      • No linked lists. All items are in the array.
      • If a collision occurs, alternative locations are tried until an empty cell is found
        • try h0(x), h1(x), h2(x), …
        • hi(x) = (hash(x) + f(i)) mod TableSize
        • f(i) is the collision resolution strategy
      • Requires a bigger table; λ should be below 0.5

  15. Linear Probing
      • If a collision occurs, try the next cell sequentially
      • f(i) = i
      • hi(x) = (hash(x) + i) mod TableSize
      • Try hash(x) mod TableSize, (hash(x) + 1) mod TableSize, (hash(x) + 2) mod TableSize, (hash(x) + 3) mod TableSize, . . .

  16. Linear Probing
      Insert: 89, 18, 49, 58, 69

      Cell:  0   1   2   3   4   5   6   7   8   9
      Key:   49  58  69  -   -   -   -   -   18  89

      • 89 is directly inserted into cell 9
      • 18 is directly inserted into cell 8
      • 49 has a collision at cell 9 and is finally put into cell 0
      • 58 has collisions at cells 8, 9, 0 and is finally put into cell 1
      • 69 has collisions at cells 9, 0, 1 and is finally put into cell 2
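
The trace on this slide can be reproduced with a minimal linear probing sketch. It assumes integer keys, hash(x) = x mod TableSize, and a table that never becomes completely full; `LinearProbingDemo` and `insertAll` are hypothetical names.

```java
import java.util.Arrays;

// Illustrative sketch; names are hypothetical, not from the slides.
class LinearProbingDemo {
    // Inserts the keys in order using f(i) = i, that is, the next cell on each collision.
    // Assumes fewer keys than cells, otherwise the probing loop would not terminate.
    static Integer[] insertAll(int tableSize, int... keys) {
        Integer[] table = new Integer[tableSize];
        for (int key : keys) {
            int pos = key % tableSize;
            while (table[pos] != null) {
                pos = (pos + 1) % tableSize; // wrap around at the end of the array
            }
            table[pos] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(insertAll(10, 89, 18, 49, 58, 69)));
        // [49, 58, 69, null, null, null, null, null, 18, 89]
    }
}
```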

  17. Primary Clustering
      • Blocks of occupied cells (called clusters) form in the table.
      • A collision occurs if a key hashes to any cell in a cluster. It may then take several attempts to resolve the collision before a free cell is found, and the new item is added to the cluster.

  18. Linear Probing
      • Problem: primary clustering
      • Normal deletion cannot be performed: some later find operations would fail because the chain of collisions that leads to the data is cut. Use lazy deletion instead.
      • Insertion cost = number of probes to find an empty cell ≈ 1/(fraction of empty cells) = 1/(1 - λ). This estimate ignores clustering; with primary clustering the expected insertion cost of linear probing is higher, about (1 + 1/(1 - λ)^2)/2.
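
To get a feel for how quickly the 1/(1 - λ) estimate grows, the sketch below simply evaluates it for a few load factors. The class and method names are hypothetical, and the formula is only the estimate quoted on the slide, not a measurement.

```java
// Illustrative sketch; ProbeCostDemo and estimatedProbes are hypothetical names.
class ProbeCostDemo {
    // Estimated probes per insertion: 1 / (fraction of empty cells) = 1 / (1 - loadFactor).
    // Valid only for 0 <= loadFactor < 1.
    static double estimatedProbes(double loadFactor) {
        return 1.0 / (1.0 - loadFactor);
    }

    public static void main(String[] args) {
        for (double lf : new double[] {0.25, 0.5, 0.75, 0.9}) {
            System.out.printf("load factor %.2f: about %.1f probes%n", lf, estimatedProbes(lf));
        }
    }
}
```

At a load factor of 0.5 the estimate is 2 probes, and at 0.75 it is already 4, which is one reason open addressing tables are kept at most half full.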

  19. Quadratic Probing
      • Eliminates primary clustering
      • f(i) = i^2
      • hi(x) = (hash(x) + i^2) mod TableSize
      • Try hash(x) mod TableSize, (hash(x) + 1^2) mod TableSize, (hash(x) + 2^2) mod TableSize, (hash(x) + 3^2) mod TableSize, . . .
      • The table must be at most half full and the table size must be prime; otherwise insertion may fail (every probe collides)

  20. Quadratic Probing
      Insert: 89, 18, 49, 58, 69

      Cell:  0   1   2   3   4   5   6   7   8   9
      Key:   49  -   58  69  -   -   -   -   18  89

      • Insert 89, try cell 9
      • Insert 18, try cell 8
      • Insert 49, try cells 9, 0
      • Insert 58, try cells 8, 9, 2
      • Insert 69, try cells 9, 0, 3

  21. Quadratic Probing
      Insert: 10, 20, 30, 40, 50, 60, 70

      Cell:  0   1   2   3   4   5   6   7   8   9
      Key:   10  20  -   -   30  60  50  -   -   40

      • Insert 10, try cell 0
      • Insert 20, try cells 0, 1
      • Insert 30, try cells 0, 1, 4
      • Insert 40, try cells 0, 1, 4, 9
      • Insert 50, try cells 0, 1, 4, 9, 6 (16)
      • Insert 60, try cells 0, 1, 4, 9, 6 (16), 5 (25)
      • Insert 70, try cells 0, 1, 4, 9, 6 (16), 5 (25), 6 (36), 9 (49), 4 (64), 1 (81), 0 (100), 1 (121), 4 (144), 9 (169), 6 (196), . . . (an empty cell is never reached)
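
Both quadratic probing traces can be reproduced with a small sketch. It assumes integer keys and hash(x) = x mod TableSize; `QuadraticProbingDemo` and `insert` are hypothetical names, and returning -1 on failure is a choice made for this illustration only.

```java
import java.util.Arrays;

// Illustrative sketch; names and the -1 failure value are assumptions, not from the slides.
class QuadraticProbingDemo {
    // Tries cells (hash(x) + i^2) mod n for i = 0, 1, 2, ...
    // Returns the cell used, or -1 if no probe reaches an empty cell.
    static int insert(Integer[] table, int key) {
        int n = table.length;
        // (i + n)^2 mod n equals i^2 mod n, so the probe sequence repeats after n probes.
        for (int i = 0; i < n; i++) {
            int pos = (key % n + i * i) % n;
            if (table[pos] == null) {
                table[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] first = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 69}) insert(first, key);
        System.out.println(Arrays.toString(first));
        // [49, null, 58, 69, null, null, null, null, 18, 89]

        Integer[] second = new Integer[10];
        for (int key : new int[] {10, 20, 30, 40, 50, 60}) insert(second, key);
        System.out.println(insert(second, 70)); // -1: cells 2, 3, 7, 8 are free but never probed
    }
}
```

The failing insertion of 70 happens with a table size of 10, which is not prime, and a table that is more than half full, so both conditions stated for quadratic probing are violated.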

  22. Quadratic Probing
      • Secondary clustering
        • Elements that hash to the same position probe the same alternative cells and are placed in the next available one, forming a cluster.
        • In the first example, inserting 89, 49, 69 forms a secondary cluster. Inserting 18, 58 forms another secondary cluster.

  23. Double Hashing
      • f(i) = i * hash2(x)
      • hi(x) = (hash(x) + i * hash2(x)) mod TableSize
      • Try hash(x) mod TableSize, (hash(x) + hash2(x)) mod TableSize, (hash(x) + 2*hash2(x)) mod TableSize, . . .
      • Example: hash2(x) = R - (x mod R)
        • R is a prime number smaller than TableSize

  24. Double Hashing
      Insert: 89, 18, 49, 58, 69, 23

      Cell:  0   1   2   3   4   5   6   7   8   9
      Key:   69  -   -   58  -   -   49  -   18  89

      • hash2(49) = 7 - (49 mod 7) = 7
      • hash2(58) = 7 - (58 mod 7) = 5
      • hash2(69) = 7 - (69 mod 7) = 1
      • hash2(23) = 7 - (23 mod 7) = 5
      • Insert 49, try 9, (9 + 7) mod 10 = 6
      • Insert 58, try 8, (8 + 5) mod 10 = 3
      • Insert 69, try 9, (9 + 1) mod 10 = 0
      • Insert 23, try 3, (3 + 5) mod 10 = 8, (3 + 10) mod 10 = 3, (3 + 15) mod 10 = 8, . . .
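
The double hashing trace can be checked with the sketch below. It assumes integer keys, hash(x) = x mod TableSize, and hash2(x) = R - (x mod R) with R = 7 as on the slide; the class name, method names, and the -1 failure value are hypothetical choices for this illustration.

```java
import java.util.Arrays;

// Illustrative sketch; names and the -1 failure value are assumptions, not from the slides.
class DoubleHashingDemo {
    static final int R = 7; // a prime smaller than the table size, as on the slide

    static int hash2(int x) {
        return R - (x % R);
    }

    // Tries cells (hash(x) + i * hash2(x)) mod n for i = 0, 1, 2, ...
    // Returns the cell used, or -1 if no probe reaches an empty cell.
    static int insert(Integer[] table, int key) {
        int n = table.length;
        // The probe sequence repeats after at most n steps, so n probes are enough.
        for (int i = 0; i < n; i++) {
            int pos = (key % n + i * hash2(key)) % n;
            if (table[pos] == null) {
                table[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 69}) insert(table, key);
        System.out.println(Arrays.toString(table));
        // [69, null, null, 58, null, null, 49, null, 18, 89]
        System.out.println(insert(table, 23)); // -1: the probes alternate between cells 3 and 8
    }
}
```

Inserting 23 fails because its step size 5 shares a factor with the table size 10, so only two cells are ever probed; a prime table size avoids this.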

  25. Rehashing
      • When the table is too full, create a new table at least twice as big (with a prime size), compute the new hash value of each element, and insert it into the new table.
      • Rehash when the table is half full, or when an insertion fails, or when a certain load factor is reached.
      • Because of lazy deletion, deleted cells are also counted when the load factor is calculated.
      • Rehashing takes O(N) time, but the cost is shared by the preceding N/2 insertions, so it adds only a constant cost to each insertion.

  26.
          public interface Hashable
          {
              int hash( int tableSize );
          }

          public class MyInteger implements Comparable, Hashable
          {
              // Other members (the value field, constructor, intValue, compareTo)
              // are not shown on the slide.
              public int hash( int tableSize )
              {
                  if( value < 0 )
                      return -value % tableSize;
                  else
                      return value % tableSize;
              }
          }
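
One edge case in the `MyInteger.hash` method above is worth noting: in Java, `-value` overflows when `value` is `Integer.MIN_VALUE`, so the result can be negative. A possible alternative, shown here only as a sketch and not as the course's method, uses the standard `Math.floorMod`, which returns a result in the range 0 to tableSize-1 for a positive table size. The class and method names are hypothetical, and note that for negative values this maps to different cells than the slide's version.

```java
// Illustrative sketch; FloorModHashDemo, slideHash, and floorModHash are hypothetical names.
class FloorModHashDemo {
    // The approach from the slide, reproduced for comparison.
    static int slideHash(int value, int tableSize) {
        if (value < 0)
            return -value % tableSize; // -Integer.MIN_VALUE overflows back to Integer.MIN_VALUE
        else
            return value % tableSize;
    }

    // Alternative: Math.floorMod yields a non-negative result when tableSize > 0.
    static int floorModHash(int value, int tableSize) {
        return Math.floorMod(value, tableSize);
    }

    public static void main(String[] args) {
        System.out.println(slideHash(Integer.MIN_VALUE, 101) < 0);     // true: not a valid index
        System.out.println(floorModHash(Integer.MIN_VALUE, 101) >= 0); // true
        System.out.println(floorModHash(-7, 10));                      // 3
    }
}
```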

  27.
          public static void main( String [ ] args )
          {
              SeparateChainingHashTable H = new SeparateChainingHashTable( );

              final int NUMS = 4000;
              final int GAP  = 37;

              // Insert 1 .. NUMS-1 in a scattered order
              for( int i = GAP; i != 0; i = ( i + GAP ) % NUMS )
                  H.insert( new MyInteger( i ) );

              // Remove the odd numbers
              for( int i = 1; i < NUMS; i += 2 )
                  H.remove( new MyInteger( i ) );

              // Every even number must still be found
              for( int i = 2; i < NUMS; i += 2 )
                  if( ((MyInteger)(H.find( new MyInteger( i ) ))).intValue( ) != i )
                      System.out.println( "Find fails " + i );
          }

  28.
          // Class outline (method bodies are on the following slides)
          public class SeparateChainingHashTable
          {
              private LinkedList [ ] theLists;

              public SeparateChainingHashTable( )
              public SeparateChainingHashTable( int size )
              public void insert( Hashable x )
              public void remove( Hashable x )
              public Hashable find( Hashable x )
              public void makeEmpty( )
              public static int hash( String key, int tableSize )

              private static final int DEFAULT_TABLE_SIZE = 101;
              private static int nextPrime( int n )
              private static boolean isPrime( int n )
          }

  29.
          public class SeparateChainingHashTable
          {
              public SeparateChainingHashTable( )
              {
                  this( DEFAULT_TABLE_SIZE );
              }

              public SeparateChainingHashTable( int size )
              {
                  theLists = new LinkedList[ nextPrime( size ) ];
                  for( int i = 0; i < theLists.length; i++ )
                      theLists[ i ] = new LinkedList( );
              }

              public void makeEmpty( )
              {
                  for( int i = 0; i < theLists.length; i++ )
                      theLists[ i ].makeEmpty( );
              }
              // (class continues on the following slides)

  30.
          public static int hash( String key, int tableSize )
          {
              int hashVal = 0;

              for( int i = 0; i < key.length( ); i++ )
                  hashVal = 37 * hashVal + key.charAt( i );

              hashVal %= tableSize;
              if( hashVal < 0 )
                  hashVal += tableSize;

              return hashVal;
          }

  31.
          public void insert( Hashable x )
          {
              LinkedList whichList = theLists[ x.hash( theLists.length ) ];
              LinkedListItr itr = whichList.find( x );

              // Insert at the front of the list only if x is not already present
              if( itr.isPastEnd( ) )
                  whichList.insert( x, whichList.zeroth( ) );
          }

          public void remove( Hashable x )
          {
              theLists[ x.hash( theLists.length ) ].remove( x );
          }

          public Hashable find( Hashable x )
          {
              return (Hashable)theLists[ x.hash( theLists.length ) ].find( x ).retrieve( );
          }

  32.
          public class Employee implements Hashable
          {
              public int hash( int tableSize )
              {
                  return SeparateChainingHashTable.hash( name, tableSize );
              }

              public boolean equals( Object rhs )
              {
                  return name.equals( ((Employee)rhs).name );
              }

              private String name;
              private double salary;
              private int seniority;
          }

  33.
          // Class outline (method bodies are on the following slides)
          public class QuadraticProbingHashTable
          {
              public static final int DEFAULT_TABLE_SIZE = 11;
              protected HashEntry [ ] array;
              private int currentSize;

              public QuadraticProbingHashTable( )
              public QuadraticProbingHashTable( int size )
              public void makeEmpty( )
              public Hashable find( Hashable x )
              public void insert( Hashable x )
              public void remove( Hashable x )
              public static int hash( String key, int tableSize )
          }

  34.
          class HashEntry
          {
              Hashable element;   // the element
              boolean  isActive;  // false if deleted

              public HashEntry( Hashable e )
              {
                  this( e, true );
              }

              public HashEntry( Hashable e, boolean i )
              {
                  element  = e;
                  isActive = i;
              }
          }

  35.
          public class QuadraticProbingHashTable
          {
              public QuadraticProbingHashTable( )
              {
                  this( DEFAULT_TABLE_SIZE );
              }

              public QuadraticProbingHashTable( int size )
              {
                  allocateArray( size );
                  makeEmpty( );
              }

              public void makeEmpty( )
              {
                  currentSize = 0;
                  for( int i = 0; i < array.length; i++ )
                      array[ i ] = null;
              }

              private void allocateArray( int arraySize )
              {
                  array = new HashEntry[ arraySize ];
              }
              // (class continues on the following slides)

  36.
          public Hashable find( Hashable x )
          {
              int currentPos = findPos( x );
              return isActive( currentPos ) ? array[ currentPos ].element : null;
          }

          private int findPos( Hashable x )
          {
              int collisionNum = 0;
              int currentPos = x.hash( array.length );

              while( array[ currentPos ] != null &&
                      !array[ currentPos ].element.equals( x ) )
              {
                  // Advance from offset (i-1)^2 to i^2 by adding 2i - 1
                  currentPos += 2 * ++collisionNum - 1;
                  if( currentPos >= array.length )
                      currentPos -= array.length;
              }

              return currentPos;
          }

  37.
          private boolean isActive( int currentPos )
          {
              return array[ currentPos ] != null && array[ currentPos ].isActive;
          }

          public void insert( Hashable x )
          {
              int currentPos = findPos( x );
              if( isActive( currentPos ) )
                  return;   // already present

              array[ currentPos ] = new HashEntry( x, true );

              // Rehash once the table is more than half full
              if( ++currentSize > array.length / 2 )
                  rehash( );
          }

          public void remove( Hashable x )
          {
              int currentPos = findPos( x );
              if( isActive( currentPos ) )
                  array[ currentPos ].isActive = false;   // lazy deletion
          }

  38.
          private void rehash( )
          {
              HashEntry [ ] oldArray = array;

              // Create a new double-sized, empty table
              allocateArray( nextPrime( 2 * oldArray.length ) );
              currentSize = 0;

              // Copy table over
              for( int i = 0; i < oldArray.length; i++ )
                  if( oldArray[ i ] != null && oldArray[ i ].isActive )
                      insert( oldArray[ i ].element );

              return;
          }

  39.
          private static int nextPrime( int n )
          {
              if( n % 2 == 0 )
                  n++;

              for( ; !isPrime( n ); n += 2 )
                  ;

              return n;
          }

          private static boolean isPrime( int n )
          {
              if( n == 2 || n == 3 )
                  return true;

              if( n == 1 || n % 2 == 0 )
                  return false;

              for( int i = 3; i * i <= n; i += 2 )
                  if( n % i == 0 )
                      return false;

              return true;
          }

  40. Summary
      • Insert and find take constant average time
      • Load factor affects performance
        • The load factor of separate chaining hashing should be close to 1
        • The load factor of open addressing hashing should not exceed 0.5

  41. Summary
      • Hashing is good when ordering information is not required
      • Applications:
        • symbol table
        • on-line spelling checker
