Chapter 11 Searching
Chapter Outline
• Objectives
  • Do a simulation by hand of a serial search or binary search
  • Demonstrate why binary search has logarithmic worst-case performance
  • Do a simulation by hand of an insertion or removal of an element from an open-address or chained hash table
  • Implement a hash table class using open-address hashing
• Contents
  • Serial search and binary search
  • Open-address hashing
  • Chained hashing
  • Time analysis of hashing
Data Structure
Serial Search
• Serial search
  • Steps through part of an array one element at a time, looking for a "desired element"
  • Stops when the element is found or when the search has examined each element without success
  • Easy to write and applicable to many situations
• Serial search analysis
  • Worst-case time
    • When the target is not in the array, the search requires n array accesses
  • Average-case time
    • Obtained by averaging the running times over all the different inputs of a particular kind
Serial Search
• Average case
  • Suppose the array has 10 elements
  • The target may be at the first location, the second location, and so on; finding it at location k takes k array accesses
  • Average over the ten locations: $\frac{1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10}{10} = 5.5$
• Generalization
  • Using n as the size of the array: $\frac{1 + 2 + 3 + \dots + n}{n} = \frac{n(n+1)/2}{n} = \frac{n+1}{2}$
• Worst-case time and average-case time are both O(n)
• But the average case takes about half the time of the worst case
Serial Search
• Best-case time
  • The smallest of all the running times on inputs of a particular size
  • For an array of n elements, the best-case time is just one array access
  • Unless the best case occurs with high probability, this time is not used during an analysis
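The serial search analysis above can be checked with a short Java sketch. The class name `SerialSearchDemo` and both method names are assumptions made for illustration; they do not come from the slides. The second method simply averages the access counts 1 through n, which should agree with (n + 1) / 2.

```java
// Illustrative sketch only: class and method names are hypothetical.
class SerialSearchDemo {
    // Steps through the array one element at a time.
    // Returns the index of target, or -1 after examining every element.
    static int serialSearch(int[] a, int target) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == target) {
                return i;
            }
        }
        return -1;
    }

    // Average number of array accesses over all successful searches,
    // assuming each of the n locations is equally likely to hold the target.
    static double averageAccesses(int n) {
        long total = 0;
        for (int location = 1; location <= n; location++) {
            total += location; // a target at location k costs k accesses
        }
        return (double) total / n;
    }

    public static void main(String[] args) {
        int[] data = {7, 22, 29, 32, 42, 52, 59, 66, 69, 76};
        System.out.println(serialSearch(data, 42)); // best case would be a[0]
        System.out.println(serialSearch(data, 5));  // worst case: not present
        System.out.println(averageAccesses(10));    // matches (10 + 1) / 2
    }
}
```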
Binary search
• May be used only if the array is sorted
• Three examples for which binary search is applicable
  • Searching an array of integers that is sorted from the smallest integer to the largest integer
  • Searching an array of strings that is sorted alphabetically
  • Searching an array in which each component is an Object containing information about some item, such as an auto part, and the array is sorted by "part number"
Binary search
• search
  • public static int search(int[] a, int first, int size, int target)
  • Searches part of a sorted array for a specified target
• Parameters
  • a: the array to search
  • first: the first index of the part of the array to search
  • size: the number of elements to search, starting at a[first]
  • target: the element to search for
• Returns
  • The index of a location that contains target, or -1 if target does not appear in the searched part of the array
Binary search
• Design
  • Suppose the array of numbers is a list of invalid credit card numbers, and it is so long that it takes a book to list them
  • Open the book to the middle and see if the number is there
  • If not and the number is smaller than the middle entry, search the part of the book before the middle
  • If not and the number is larger than the middle entry, search the part of the book after the middle
• Pseudocode
    if (size <= 0)
        return -1;
    else
    {
        middle = index of the approximate midpoint of the array segment;
        if (target == a[middle])
            the target has been found at a[middle]
        else if (target < a[middle])
            search for the target in the area before the midpoint
        else if (target > a[middle])
            search for the target in the area after the midpoint
    }
Binary Search
    public static int search(int[] a, int first, int size, int target)
    {
        int middle;
        if (size <= 0)
            return -1;                 // empty segment: target is not present
        else
        {
            middle = first + size/2;   // approximate midpoint of the segment
            if (target == a[middle])
                return middle;
            else if (target < a[middle])
                return search(a, first, size/2, target);          // part before the midpoint
            else
                return search(a, middle+1, (size-1)/2, target);   // part after the midpoint
        }
    }
Binary Search
• Example array a
    index: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
    value:   7  22  29  32  42  52  59  66  69  76
• Suppose the call is search(a, 0, 10, 42)
  • middle = first + size/2 = 5; 42 < a[5] = 52, so the next call is search(a, 0, 5, 42)
Binary Search
• search(a, 0, 5, 42): middle = 2; 42 > a[2] = 29, so the next call is search(a, 3, 2, 42)
• search(a, 3, 2, 42): middle = 4; the search finds the target 42 at a[4] and returns the index 4
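The hand simulation above can be reproduced by logging each call. The sketch below reuses the search method from the slides unchanged in its logic; the class name `BinarySearchTrace` and the `calls` list are additions made only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: the logging list and class name are hypothetical additions.
class BinarySearchTrace {
    // Records every activation of search so the chain of recursive calls is visible.
    static final List<String> calls = new ArrayList<>();

    static int search(int[] a, int first, int size, int target) {
        calls.add("search(a, " + first + ", " + size + ", " + target + ")");
        if (size <= 0) {
            return -1;
        }
        int middle = first + size / 2;
        if (target == a[middle]) {
            return middle;
        } else if (target < a[middle]) {
            return search(a, first, size / 2, target);
        } else {
            return search(a, middle + 1, (size - 1) / 2, target);
        }
    }

    public static void main(String[] args) {
        int[] a = {7, 22, 29, 32, 42, 52, 59, 66, 69, 76};
        int index = search(a, 0, 10, 42);
        calls.forEach(System.out::println);
        System.out.println("found at index " + index);
    }
}
```

Running `main` should list the same three calls as the slides: search(a, 0, 10, 42), search(a, 0, 5, 42), and search(a, 3, 2, 42).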
Binary Search Analysis
• Let n be the number of array elements
• How many operations will the algorithm perform in the worst case?
  • One operation
    • the test to see if the array segment is empty (size <= 0)
  • Three more operations
    • middle = first + size/2
    • one each for the division, the addition, and the assignment
  • Two more operations
    • a[middle]: 1 operation
    • ==: 1 operation
  • Another two operations
    • assuming that the target is not in the array
    • (target < a[middle]): one array access and one comparison
Binary Search Analysis
• How many operations will the algorithm perform in the worst case? (continued)
  • The exact number of operations for a recursive call depends on the way in which methods are activated and the return value is provided
  • Just use a symbol c for the number of operations needed to pass the arguments and obtain the return value
  • Each call that makes a recursive call therefore costs 1 + 3 + 2 + 2 = 8 operations plus c
  • T(n) = (8 + c) * (the length of the longest chain of recursive calls) + (the number of operations performed in the stopping case)
Binary Search Analysis
• How many operations will the algorithm perform in the worst case? (continued)
  • There are two possible stopping cases
    • when the size becomes zero
    • when the target is found
  • Worst-case time
    • T(n) = (8 + c) * (the length of the longest chain of recursive calls) + 2
    • T(n) = (8 + c) * (the depth of recursive calls) + 2
Binary Search Analysis
• We will determine an upper bound on the number of recursive calls
  • Calculate this upper bound on the length of the longest chain of recursive calls
  • Each recursive call is made on at most half of the array
    • divide n in half, then divide that half in half again, then divide that result in half, until the array is "all gone"
  • T(n) = (8 + c) * (the number of times that n can be divided by two, stopping when the result is less than one) + 2
Binary Search Analysis
• Halving function H(n)
  • is defined by H(n) = (the number of times that n can be divided by two, stopping when the result is less than one)
  • T(n) = (8 + c) * H(n) + 2
• Value of the halving function
  • $H(n) = \lfloor \log_2 n \rfloor + 1$
• Worst-case time for binary search
  • $T(n) = (8 + c) \cdot H(n) + 2 = (8 + c)(\lfloor \log_2 n \rfloor + 1) + 2$
  • is logarithmic, O(log n)
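As a quick check on the halving function, the following sketch counts the divisions directly and compares the count with floor(log2 n) + 1. The class `HalvingDemo` and its method names are hypothetical; the closed form is computed with `Integer.numberOfLeadingZeros` so that no floating-point logarithm is involved.

```java
// Illustrative sketch: names are assumptions, not part of the slides.
class HalvingDemo {
    // H(n): the number of times n can be divided by two,
    // stopping when the result is less than one.
    static int halving(int n) {
        double value = n;
        int count = 0;
        while (value >= 1) {
            value /= 2; // dividing a double by two is exact
            count++;
        }
        return count;
    }

    // floor(log2 n) + 1 for n >= 1, using the position of the highest set bit.
    static int closedForm(int n) {
        int floorLog2 = 31 - Integer.numberOfLeadingZeros(n);
        return floorLog2 + 1;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 10, 1000, 1_000_000}) {
            System.out.println(n + ": H(n) = " + halving(n)
                    + ", floor(log2 n) + 1 = " + closedForm(n));
        }
    }
}
```

For the ten-element example array, H(10) = 4, so at most four calls reach a non-empty segment.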
Open-Addressing Hashing
• Hashing is a means of ordering and accessing elements in a list quickly (the goal is O(1) time) by using a function of the key value to identify the element's location in the list
• The function of the key value is called a hash function
• For example . . .
Using a hash function
• HandyParts company makes no more than 100 different parts, but the parts all have four-digit numbers
• This hash function can be used to store and retrieve parts in an array named values: Hash(key) = partNum % 100
• values: [0] Empty, [1] 4501, [2] Empty, [3] 7803, [4] Empty, . . ., [97] Empty, [98] 2298, [99] 3699
Placing elements in the array
• Use the hash function Hash(key) = partNum % 100 to place the element with part number 5502 in the array
• values: [0] Empty, [1] 4501, [2] Empty, [3] 7803, [4] Empty, . . ., [97] Empty, [98] 2298, [99] 3699
Placing elements in the array
• Next place part number 6702 in the array
• Hash(key) = partNum % 100, and 6702 % 100 = 2
• But values[2] is already occupied: a COLLISION OCCURS
• values: [0] Empty, [1] 4501, [2] 5502, [3] 7803, [4] Empty, . . ., [97] Empty, [98] 2298, [99] 3699
How to resolve the collision?
• One way is by linear probing
• This uses the rehash function (HashValue + 1) % 100 repeatedly until an empty location is found for part number 6702
• values: [0] Empty, [1] 4501, [2] 5502, [3] 7803, [4] Empty, . . ., [97] Empty, [98] 2298, [99] 3699
Resolving collision
• Still looking for a place for 6702 using the function (HashValue + 1) % 100: values[3] is also occupied
• values: [0] Empty, [1] 4501, [2] 5502, [3] 7803, [4] Empty, . . ., [97] Empty, [98] 2298, [99] 3699
Collision resolved
• Part 6702 can be placed at the location with index 4
• values: [0] Empty, [1] 4501, [2] 5502, [3] 7803, [4] 6702, . . ., [97] Empty, [98] 2298, [99] 3699
Collision resolved
• Part 6702 is placed at the location with index 4
• Where would the part with number 4598 be placed using linear probing?
• values: [0] Empty, [1] 4501, [2] 5502, [3] 7803, [4] 6702, . . ., [97] Empty, [98] 2298, [99] 3699
Open-address hashing
• Collisions are resolved by placing the object in the next open spot of the array
  • this is one way of dealing with collisions
• For an object with key value given by key, compute the index hash(key)
• If data[hash(key)] does not already contain an object, then store the object in data[hash(key)], and the storage algorithm is finished
• If the location data[hash(key)] already contains an object, then try data[hash(key)+1]. If that location already contains an object, try data[hash(key)+2], and so forth until a vacant position is found. When the end of the array is reached, continue the search at data[0]
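The HandyParts walkthrough and the storage steps above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions: the class `LinearProbingDemo`, the `EMPTY` sentinel, and the `insert` method are hypothetical, and the table stores bare part numbers rather than objects.

```java
import java.util.Arrays;

// Illustrative sketch of linear probing; all names here are assumptions.
class LinearProbingDemo {
    static final int EMPTY = -1; // assumed sentinel for a vacant slot

    private final int[] values;
    private int manyItems;

    LinearProbingDemo(int capacity) {
        values = new int[capacity];
        Arrays.fill(values, EMPTY);
    }

    int hash(int partNum) {
        return partNum % values.length;
    }

    // Stores partNum at the first vacant location at or after hash(partNum),
    // wrapping around to index 0 at the end of the array.
    // Returns the index that was used.
    int insert(int partNum) {
        if (manyItems == values.length) {
            throw new IllegalStateException("Table is full.");
        }
        int index = hash(partNum);
        while (values[index] != EMPTY) {
            index = (index + 1) % values.length; // the rehash function
        }
        values[index] = partNum;
        manyItems++;
        return index;
    }

    public static void main(String[] args) {
        LinearProbingDemo table = new LinearProbingDemo(100);
        for (int part : new int[] {4501, 7803, 2298, 3699, 5502}) {
            table.insert(part);
        }
        System.out.println("6702 stored at index " + table.insert(6702));
        System.out.println("4598 stored at index " + table.insert(4598));
    }
}
```

With the table contents from the slides, 6702 probes indexes 2 and 3 before landing at index 4. Part 4598 hashes to 98, finds 98 and 99 occupied, and wraps around to index 0.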
Table ADT-specification
• Table
  • is a collection of objects with operations for adding, removing, and locating objects
  • each table operation is controlled by a single key rather than by the entire object's value
• The constructor
  • creates a hash table with a specified fixed capacity
  • Table catalog = new Table(811)
• The size and capacity methods
  • size returns the number of elements currently in the table
  • capacity returns the fixed capacity given to the constructor
Table ADT-specification
• The put method
  • public Object put(Object key, Object element)
  • Adds a new element to this table using the specified key
• Parameters
  • key: the non-null key to use for the new element
  • element: the new element being added to this table
• Postcondition
  • If this table already has an object with the specified key, then that object is replaced by element, and the return value is a reference to the replaced object
  • Otherwise, the new element is added with the specified key, and the return value is null
Table ADT-specification
• The containsKey method
  • public boolean containsKey(Object key)
  • Determines whether a specified key is in this table
  • Parameters
    • key: the non-null key to look for
  • Returns
    • true if this table contains an object with the specified key; otherwise false
• The get method
  • public Object get(Object key)
  • Retrieves an object for a specified key
  • Returns
    • A reference to the object with the specified key, or null if this table has no such object
Table ADT-Design
    public class Table
    {
        private int manyItems;
        private Object[] keys;
        private Object[] data;
        private boolean[] hasBeenUsed;
        . . .
    }
Table ADT-Design
• Invariant for the Table ADT
  • The number of elements in the table is in the instance variable manyItems
  • The preferred location for an element with a given key is at index hash(key). If a collision occurs, then nextIndex is used to search forward to find the next open address. When an open address is found at an index i, the element is placed in data[i] and the element's key is placed at keys[i]
  • An index i that is not currently used has data[i] and keys[i] set to null
  • If an index i has been used at some point, then hasBeenUsed[i] is true, even if the element there was later removed
Table ADT-Implementation
• The put method
  • Set index to findIndex(key), which is -1 when the key is not in the table, then:
  1. if (index != -1)
     • The key is already in the table, so replace the data at that index
  2. else if (manyItems < data.length)
     2a. Use a loop to set index so that data[index] is the first vacant location at or after data[hash(key)]. If the loop reaches the end of the array, then it should continue searching at data[0]
     2b. Set keys[index], data[index], and hasBeenUsed[index], and increment manyItems
  3. else
     • There is no room for another element
Table ADT-Implementation
    public Object put(Object key, Object element)
    {
        int index = findIndex(key);
        Object answer;
        if (index != -1)
        {   // The key is already in the table: replace its data
            answer = data[index];
            data[index] = element;
            return answer;
        }
        else if (manyItems < data.length)
        {   // Find the first vacant location at or after hash(key)
            index = hash(key);
            while (keys[index] != null)
                index = nextIndex(index);
            keys[index] = key;
            data[index] = element;
            hasBeenUsed[index] = true;
            manyItems++;
            return null;
        }
        else
        {   // There is no room for another element
            throw new IllegalStateException("Table is full.");
        }
    }
Table ADT-Implementation
• The remove method
  • int index = findIndex(key);
  • Object answer = null;
  • if (index != -1)
    • An element has been found with the specified key. Set answer to data[index]; then set data[index] and keys[index] to null
    public Object remove(Object key)
    {
        int index = findIndex(key);
        Object answer = null;
        if (index != -1)
        {
            answer = data[index];
            keys[index] = null;
            data[index] = null;
            // hasBeenUsed[index] stays true, as the invariant requires
            manyItems--;
        }
        return answer;
    }
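The put and remove methods rely on findIndex, hash, and nextIndex, whose bodies do not appear on these slides. The following is one possible sketch of how they could fit together, written to satisfy the invariant; the class name `MiniTable` and all three helper bodies are assumptions, not the original implementation. It illustrates why hasBeenUsed is kept: a search must continue past a slot whose element was removed.

```java
// Hypothetical sketch: MiniTable and its helper bodies are assumed for illustration.
class MiniTable {
    private int manyItems;
    private final Object[] keys;
    private final Object[] data;
    private final boolean[] hasBeenUsed;

    MiniTable(int capacity) {
        keys = new Object[capacity];
        data = new Object[capacity];
        hasBeenUsed = new boolean[capacity];
    }

    // Division hash; taking the remainder first keeps Math.abs from overflowing.
    private int hash(Object key) {
        return Math.abs(key.hashCode() % data.length);
    }

    // Linear probing step that wraps from the last index back to 0.
    private int nextIndex(int i) {
        return (i + 1 == data.length) ? 0 : i + 1;
    }

    // Returns the index of key, or -1. The search passes over removed slots
    // (hasBeenUsed is true but keys[i] is null) and stops at a never-used slot.
    private int findIndex(Object key) {
        int i = hash(key);
        for (int count = 0; count < data.length && hasBeenUsed[i]; count++) {
            if (key.equals(keys[i])) {
                return i;
            }
            i = nextIndex(i);
        }
        return -1;
    }

    Object get(Object key) {
        int index = findIndex(key);
        return (index == -1) ? null : data[index];
    }

    Object put(Object key, Object element) {
        int index = findIndex(key);
        if (index != -1) {
            Object answer = data[index];
            data[index] = element;
            return answer;
        }
        if (manyItems == data.length) {
            throw new IllegalStateException("Table is full.");
        }
        index = hash(key);
        while (keys[index] != null) {
            index = nextIndex(index);
        }
        keys[index] = key;
        data[index] = element;
        hasBeenUsed[index] = true;
        manyItems++;
        return null;
    }

    Object remove(Object key) {
        int index = findIndex(key);
        if (index == -1) {
            return null;
        }
        Object answer = data[index];
        keys[index] = null;
        data[index] = null; // hasBeenUsed[index] deliberately stays true
        manyItems--;
        return answer;
    }

    public static void main(String[] args) {
        MiniTable table = new MiniTable(10);
        table.put(12, "a"); // 12, 22, and 32 all hash to index 2
        table.put(22, "b");
        table.put(32, "c");
        table.remove(22);   // leaves a used-but-vacant slot at index 3
        System.out.println(table.get(32)); // still reachable past index 3
    }
}
```

If findIndex stopped at the first null key instead of consulting hasBeenUsed, the element stored under key 32 would become unreachable after key 22 is removed.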
Choosing a hash function to reduce collisions
• Division hash function
  • depends on the remainder upon division
  • Math.abs(key.hashCode()) % data.length
  • Certain table sizes are better
    • a prime number of the form 4k + 3, according to C. E. Radke's 1970 study
• Mid-square hash function
  • The key is converted to an integer and multiplied by itself
  • The hash function returns some middle digits of the result
• Multiplicative hash function
  • The key is converted to an integer and multiplied by a constant less than one
  • The hash function returns the first few digits of the fractional part of the result
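The three hash functions described above might look roughly like this for non-negative integer keys. Everything specific in the sketch is an assumption chosen for illustration: the class name `HashFunctionSketch`, which digits count as the "middle" ones, the constant 0.618, and the choice of three fractional digits.

```java
// Illustrative sketch only; digit positions and the constant are arbitrary assumptions.
class HashFunctionSketch {
    // Division hash: the remainder upon division by the table size.
    static int divisionHash(int key, int tableSize) {
        return Math.abs(key % tableSize);
    }

    // Mid-square hash: square the key, then keep two digits from the middle.
    // Here "middle" is taken to mean the thousands and ten-thousands digits.
    static int midSquareHash(int key) {
        long squared = (long) key * key;
        return (int) ((squared / 1000) % 100);
    }

    // Multiplicative hash: multiply by a constant less than one and keep
    // the first three digits of the fractional part of the product.
    static int multiplicativeHash(int key) {
        final double constant = 0.618; // assumed constant, any value below one works
        double product = key * constant;
        double fraction = product - Math.floor(product);
        return (int) (fraction * 1000);
    }

    public static void main(String[] args) {
        int key = 4501;
        System.out.println(divisionHash(key, 100));  // remainder of 4501 / 100
        System.out.println(midSquareHash(key));      // from 4501 * 4501 = 20259001
        System.out.println(multiplicativeHash(key)); // value in 0..999
    }
}
```

In a table class, the mid-square and multiplicative results would still need to be reduced to a valid index, for example with a final `% data.length`.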
Double Hashing to reduce clustering
• Linear probing
  • When you put a new element in a hash table, the put moves forward from the original index until a vacant spot is found
• Problem
  • When several different keys are hashed to the same array location, the result is a small cluster of elements
  • As the table approaches its capacity, these clusters tend to merge into larger and larger clusters
  • This is the problem of clustering
Double Hashing to reduce clustering
• Double hashing
  • A technique to avoid clustering
  • Uses a second hash function to determine how we move through the array to resolve a collision
• Example
  • When an element is added, double hashing begins by hashing the key to an array index using hash1
  • If there is a collision, we calculate hash2(key) and use that value as the step size for moving forward through the array
Double Hashing to reduce clustering
• Because hash2(key) is added to the index at each step, there are two considerations
  • The array index must not leave the valid range of 0 to data.length-1
    • we can keep the index in this range with the "%" operation
    • next index to examine: (i + hash2(key)) % data.length
  • We must ensure that every array position is examined
    • Make sure the array's capacity is relatively prime with respect to the value returned by hash2
    • Choose data.length as a prime number and have hash2 return values in the range 1 through data.length-1
Double Hashing to reduce clustering
• We must ensure that every array position is examined
  • One possibility, suggested by Donald Knuth
    • Both data.length and data.length-2 should be prime numbers
    • hash1(key) = Math.abs(key.hashCode()) % data.length
    • hash2(key) = 1 + (Math.abs(key.hashCode()) % (data.length-2))
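To see why a prime table length lets double hashing examine every position, the sketch below lists the full probe sequence for one key. The class `DoubleHashingDemo`, the table length 13 (chosen because 13 and 11 are both prime), and the `probeSequence` method are assumptions for illustration. The remainder is taken before `Math.abs` as a small safety variation on the formulas above.

```java
// Illustrative sketch: the table length and all names are assumptions.
class DoubleHashingDemo {
    static final int LENGTH = 13; // 13 and 13 - 2 = 11 are both prime

    static int hash1(Object key) {
        return Math.abs(key.hashCode() % LENGTH);
    }

    // Always in the range 1 through LENGTH - 2, so the step is never zero.
    static int hash2(Object key) {
        return 1 + Math.abs(key.hashCode() % (LENGTH - 2));
    }

    // The indexes examined, in order, when every probe results in a collision.
    static int[] probeSequence(Object key) {
        int[] sequence = new int[LENGTH];
        int index = hash1(key);
        int step = hash2(key);
        for (int probe = 0; probe < LENGTH; probe++) {
            sequence[probe] = index;
            index = (index + step) % LENGTH; // stay within 0..LENGTH-1
        }
        return sequence;
    }

    public static void main(String[] args) {
        // For the Integer key 6702: hash1 is 7 and hash2 is 4.
        for (int index : probeSequence(6702)) {
            System.out.print(index + " ");
        }
        System.out.println();
    }
}
```

Because the step size is relatively prime to the table length, the sequence visits all 13 indexes before repeating.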
Chained Hashing
• The hash table contains an array in which each component can hold more than one element of the hash table
• If there is a collision
  • simply place the new element in its proper array component along with the other elements that happened to hash to the same array index
• How does chaining place more than one element in each component of the array?
  • each array component must have some underlying structure
  • each array component is a head reference for a linked list
Chained Hashing
• Each node of the linked list is an object of the class:
    class ChainedHashNode
    {
        Object element;
        Object key;
        ChainedHashNode link;
    }
    public class ChainedTable
    {
        private ChainedHashNode[] table = new ChainedHashNode[811];
        . . .
Chained Hashing
• [Figure: a table array with components [0] through [4], each a head reference for a linked list of nodes]
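A minimal sketch of how put and get could work with chaining follows. The class `ChainedTableSketch`, its nested `Node` class (mirroring the node fields shown on the earlier slide), and the method bodies are assumptions rather than the original ChainedTable implementation.

```java
// Hypothetical sketch of chained hashing; names and method bodies are assumed.
class ChainedTableSketch {
    // Same three fields as the node class on the slide.
    private static class Node {
        Object element;
        Object key;
        Node link;
    }

    private final Node[] table; // each component is the head of a linked list

    ChainedTableSketch(int capacity) {
        table = new Node[capacity];
    }

    private int hash(Object key) {
        return Math.abs(key.hashCode() % table.length);
    }

    // Replaces and returns the old element if the key is present;
    // otherwise adds a new node at the head of the chain and returns null.
    Object put(Object key, Object element) {
        int index = hash(key);
        for (Node cursor = table[index]; cursor != null; cursor = cursor.link) {
            if (cursor.key.equals(key)) {
                Object answer = cursor.element;
                cursor.element = element;
                return answer;
            }
        }
        Node node = new Node();
        node.key = key;
        node.element = element;
        node.link = table[index];
        table[index] = node;
        return null;
    }

    // Walks the chain for hash(key); returns null if the key is absent.
    Object get(Object key) {
        for (Node cursor = table[hash(key)]; cursor != null; cursor = cursor.link) {
            if (cursor.key.equals(key)) {
                return cursor.element;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedTableSketch table = new ChainedTableSketch(5);
        table.put(2, "a");
        table.put(7, "b"); // collides with key 2: both hash to index 2
        System.out.println(table.get(2) + " " + table.get(7));
    }
}
```

A collision costs only one extra node in the chain, so the table never fills up the way an open-address table does.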
Time Analysis of hashing
• Worst case for hashing
  • occurs when every key gets hashed to the same array index
• Load factor of a hash table
  • $\alpha = \frac{\text{number of elements in the table}}{\text{size of the table's array}}$
• Open-address hashing
  • each array component holds at most one element, so the load factor can never exceed 1
• Chained hashing
  • each array component can hold many elements, so the load factor might be higher than 1
Time Analysis of hashing
• Average search time for open addressing (linear probing)
  • Applies to open-address hashing with linear probing, a hash table that is not full, and no removals
  • Average number of table elements examined in a successful search is approximately $\frac{1}{2}\left(1 + \frac{1}{1 - \alpha}\right)$
  • Assumes that we have not removed any elements and that the hash function does a good job of uniformly distributing all possible keys throughout the array (uniform hashing)
Time Analysis of hashing
• Average search time for open addressing (double hashing)
  • Provides some relief from clustering
  • A smaller average time: approximately $\frac{-\ln(1 - \alpha)}{\alpha}$ table elements examined in a successful search
• Average search time for chained hashing
  • Each component of the table's array is a reference to the head of a linked list
  • Average search time: approximately $1 + \frac{\alpha}{2}$ elements examined in a successful search
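To get a feel for the three average search time formulas, the sketch below evaluates them at a few load factors. The class `HashSearchCost` and its method names are hypothetical; the formulas are the ones stated above, with alpha as the load factor.

```java
// Illustrative sketch: evaluates the average search time formulas for sample load factors.
class HashSearchCost {
    // Open addressing with linear probing: (1/2) * (1 + 1 / (1 - alpha)), alpha < 1.
    static double linearProbing(double alpha) {
        return 0.5 * (1 + 1 / (1 - alpha));
    }

    // Open addressing with double hashing: -ln(1 - alpha) / alpha, 0 < alpha < 1.
    static double doubleHashing(double alpha) {
        return -Math.log(1 - alpha) / alpha;
    }

    // Chained hashing: 1 + alpha / 2, valid even when alpha exceeds 1.
    static double chained(double alpha) {
        return 1 + alpha / 2;
    }

    public static void main(String[] args) {
        System.out.println("alpha  linear  double  chained");
        for (double alpha : new double[] {0.5, 0.75, 0.9}) {
            System.out.printf("%.2f   %.2f    %.2f    %.2f%n",
                    alpha, linearProbing(alpha), doubleHashing(alpha), chained(alpha));
        }
    }
}
```

At a load factor of 0.5, linear probing examines about 1.5 elements per successful search; at 0.9 the figure rises to about 5.5, while double hashing and chaining grow much more slowly.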