Data Structures and Algorithms Searching Algorithms

Data Structures and Algorithms Searching Algorithms M. B. Fayek CUFE 2006

Agenda
• Introduction
• Sequential Search
• Binary Search
• Interpolation Search
• Indexed Search

1. Introduction
• What is a search? “Searching is the task of finding a certain data item (record) in a large collection of such items.”
• A key field that identifies the item sought is given. (For simplicity we consider only the key field instead of the complete record.)
• If the item is found, either its location or the complete item is returned.
• If the item is not found, an indication is given, usually by returning a non-existent index such as -1.

2. Sequential Search
• Sequential Search is also called Exhaustive Search because, in the worst case, the complete collection is searched.

2. Sequential Search
Example (figure): list = 13 15 20 21 17 8 41 6, key item = 20. At each step the test "Key item = list[i]?" is applied: NO for i = 0 and i = 1, YES for i = 2, so i = 2 is returned as the found location.

2. Sequential Search
• The first implementation will be:
    for i = 0 to n-1 do
        get next item Ai
        if Ai == k return i
    endfor
    return -1

2. Sequential Search
• Another pseudocode is:
    i = 0
    while i < n and item Ai <> k
        i <-- i+1
    if i < n return i
    else return -1
• Check boundary conditions: the test i < n must come first so that Ai is never read past the end of the list.
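The while-loop pseudocode can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation; the function name `sequential_search` and the sample list are assumptions chosen for the example.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def sequential_search(items: Sequence[T], key: T) -> int:
    """Return the index of the first item equal to key, or -1 if absent."""
    i = 0
    # The bound check comes first, so items[i] is never read past the end.
    while i < len(items) and items[i] != key:
        i += 1
    return i if i < len(items) else -1


data = [13, 15, 20, 21, 17, 8, 41, 6]   # sample list (assumed for illustration)
print(sequential_search(data, 20))      # 2
print(sequential_search(data, 99))      # -1
```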

2. Sequential Search
• How is the algorithm implemented?
• The way the collection is constructed affects the way the next item Ai is retrieved.
• In a static array: Ai is the indexed item A[i].
• In a linked list: Ai is the next node, fetched by following the “next pointer” in the present node. In this case usually the address of the node found (a pointer to the found node) is returned, or a NULL pointer to indicate that it was not found.
• In a file: Ai is the next record retrieved from the file.
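For the linked-list case, a minimal sketch might look like the following. The `Node` class and the function name `list_search` are hypothetical, and Python's `None` stands in for the NULL pointer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    next: Optional["Node"] = None   # the "next pointer"; None marks the end


def list_search(head: Optional[Node], key: int) -> Optional[Node]:
    """Return the first node whose key matches, or None if there is none."""
    node = head
    while node is not None and node.key != key:
        node = node.next            # follow the next pointer
    return node


head = Node(13, Node(15, Node(20)))
found = list_search(head, 20)
print(found.key if found else None)   # 20
print(list_search(head, 7))           # None
```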

2. Sequential Search
• Complexity:
• The basic operation is the comparison.
• For a collection of n data items there are several cases:
• Best case: item found at the first location; number of comparisons = 1
• Worst case: item found at the last location or item not found; number of comparisons = n
• Average case (item found, every location equally likely): number of comparisons = (1+n)/2
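The average-case figure can be checked with a short derivation. It assumes the item is present and equally likely to be at any of the n locations, so that finding it at location i (counted from 1) costs i comparisons. The symbol C_avg is introduced here only for illustration.

```latex
\[
  C_{\text{avg}}
  \;=\; \frac{1}{n}\sum_{i=1}^{n} i
  \;=\; \frac{1}{n}\cdot\frac{n(n+1)}{2}
  \;=\; \frac{n+1}{2}
\]
```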

2. Sequential Search Enhancements
• Sequential Search may be enhanced using several techniques:
• Sorting before searching (Presorting)
• Sentinel Search
• Probabilistic Search

2. Sequential Search Enhancements: 1. Presorting
• A good question to ask before searching is whether the collection is sorted or not.
• How do we use that information? If the collection is sorted, the search is terminated as soon as the value of the indexed item in the collection exceeds that of the search item.
• What is the effect? This will not affect the worst case of finding the element at the last position, but it will decrease the average number of comparisons for an element that is not found, whenever its logical position is somewhere before the end of the list.
• A more efficient search is the binary search.

2. Sequential Search Enhancements: 2. Sentinel Search
• The basic loop in sequential search includes 2 comparisons at each iteration: while ((i < n) && (key <> A[i]))
• To decrease the number of comparisons to one per iteration, a sentinel value = key is inserted at the end of the array (beyond its end, i.e. at index n).
• Hence the first comparison (i < n) becomes redundant. The search will always stop on finding key, either within A (if it already existed) or at the sentinel outside A (if it originally did not exist).
• A check on the location at which key was found indicates whether it existed or not.
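A minimal sketch of the sentinel idea follows. It assumes a Python list that can temporarily grow by one slot; the function name `sentinel_search` is hypothetical, and the sentinel is removed again before returning so the caller's list is unchanged.

```python
def sentinel_search(items: list, key) -> int:
    """Sequential search with one comparison per iteration."""
    n = len(items)
    items.append(key)            # sentinel placed at index n, beyond the data
    i = 0
    while items[i] != key:       # no bound check needed: the sentinel stops us
        i += 1
    items.pop()                  # restore the original list
    # Stopping inside the data means found; stopping at n means only the
    # sentinel matched, so the key was not originally present.
    return i if i < n else -1


data = [13, 15, 20, 21, 17, 8, 41, 6]   # sample list (assumed for illustration)
print(sentinel_search(data, 20))        # 2
print(sentinel_search(data, 99))        # -1
```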

2. Sequential Search Enhancements: 3. Probabilistic Search
• The basic idea here is that popular elements of the list, which are searched for more frequently, should require fewer comparisons to find.
• This is implemented by advancing an element found in the array one location ahead each time it is searched for, by swapping it with the element before it.
• Hence, each time an element is found, the number of comparisons needed to find it next time is decremented by one (until it reaches the front of the list).
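The swap-with-predecessor rule can be sketched as below. The function name `probabilistic_search` is hypothetical, and returning the item's new position after the swap is a design choice made for this example rather than something the slides specify.

```python
def probabilistic_search(items: list, key) -> int:
    """Sequential search that moves a found item one place toward the front.

    Returns the item's position after the swap, or -1 if it is absent.
    """
    for i, item in enumerate(items):
        if item == key:
            if i > 0:
                # Swap with the element before it, so the next search for
                # this key needs one comparison fewer.
                items[i - 1], items[i] = items[i], items[i - 1]
                return i - 1
            return 0             # already at the front; nothing to swap
    return -1


data = [13, 15, 20, 21]
print(probabilistic_search(data, 20), data)   # 1 [13, 20, 15, 21]
print(probabilistic_search(data, 20), data)   # 0 [20, 13, 15, 21]
```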

2. Sequential Search
• Modifying the first sequential algorithm for the case of a sorted list gives:
    for i = 0 to n-1 do
        if Ai > k return -1   // as the list is sorted, the possible location has been passed
        if Ai == k return i
    endfor
    return -1

2. Sequential Search
• Modifying the second sequential algorithm for the case of a sorted list gives:
    i = 0
    while i < n and next item Ai < k
        i <-- i+1
    if i < n and Ai == k return i
    else return -1
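The sorted-list variant can be sketched as follows, assuming the list is in ascending order. The function name `sorted_sequential_search` is hypothetical. Note that the bound check i < n is evaluated before the item is read.

```python
def sorted_sequential_search(items, key) -> int:
    """Sequential search on an ascending list with early termination."""
    i = 0
    # Stop as soon as the current item is no longer smaller than the key.
    while i < len(items) and items[i] < key:
        i += 1
    # Check the bound before reading items[i].
    if i < len(items) and items[i] == key:
        return i
    return -1


data = [13, 15, 20, 21, 27, 38, 41, 65]
print(sorted_sequential_search(data, 20))   # 2
print(sorted_sequential_search(data, 22))   # -1 (stops at 27, index 4)
```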

3. Binary Search
• How does it work?
• Basic idea: the sorted list is divided at each search step into 2 sublists by checking the mid item; the range to be searched for the possible location is then either the left or the right sublist (i.e. decreased to half).
• Note, however, that determining the middle item in the collection is a simple task if the data collection is represented in memory by a sequential array, whereas it is not so if the collection is represented using a linked list. Hence we will assume that the collection is a sequential array.

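3. Binary Search
Example (figure): sorted list = 13 15 20 21 27 38 41 65, n = 8, key item = 20. The figure probes mid = 4, then mid = 2, then mid = 3; at each probe the tests "Key item = list[mid]?", "Key item < list[mid]" and "Key item > list[mid]" select YES (found) or the left or right sublist. The key is found and i = 2 is returned as the found location after 3 comparisons.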

3. Binary Search
• For the same input and output specs as before, the algorithm is:
    low = 0; high = n-1;
    while (low <= high) do {
        mid = (low+high)/2   // integer division
        if ( k < A[mid] ) then high = mid - 1
        else if ( k > A[mid] ) then low = mid + 1
        else return mid   // found
    }
    return -1   // not found
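A runnable sketch of this algorithm in Python is given below, assuming an ascending list. The function name `binary_search` is hypothetical. The loop condition is low <= high, so a range holding a single item is still examined.

```python
def binary_search(items, key) -> int:
    """Return an index of key in the ascending list items, or -1 if absent."""
    low, high = 0, len(items) - 1
    while low <= high:               # the range low..high is still non-empty
        mid = (low + high) // 2      # integer division
        if key < items[mid]:
            high = mid - 1           # continue in the left sublist
        elif key > items[mid]:
            low = mid + 1            # continue in the right sublist
        else:
            return mid               # found
    return -1                        # not found


data = [13, 15, 20, 21, 27, 38, 41, 65]
print(binary_search(data, 20))   # 2
print(binary_search(data, 22))   # -1
```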

3. Binary Search
• Complexity:
• For a collection of n data items:
• In each step the mid item is compared to k and the range of search is divided by 2.
• This is repeated until the range is empty (in the worst case).
• i.e. we should ask: how many times can we divide n by 2 until the length of the sublist is zero? → about log2 n, which is better than n.
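The halving argument can be checked empirically. The sketch below counts loop iterations for one worst case, a key larger than every item, and compares the count with floor(log2 n) + 1, which Python exposes as `n.bit_length()`. The function name `worst_case_probes` is hypothetical.

```python
def worst_case_probes(n: int) -> int:
    """Count binary search iterations when the key exceeds all n items."""
    low, high = 0, n - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        low = mid + 1            # key > items[mid], so always go right
    return probes


for n in (1, 8, 1000, 1_000_000):
    # For n >= 1, n.bit_length() equals floor(log2 n) + 1.
    print(n, worst_case_probes(n), n.bit_length())
```

For a million items this gives 20 probes, against up to 1,000,000 comparisons for a sequential search.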

4. Interpolation Search
• What is meant by interpolation?
• Here we try to guess more precisely where the search key resides.
• Instead of calculating the middle as the physical middle (low+high)/2, it is calculated in a weighted manner w.r.t. the value of k relative to the max and min values in the list.
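The slides do not give the weighting formula. A commonly used form, assumed here, is mid = low + (k - A[low]) * (high - low) / (A[high] - A[low]). The sketch below applies it to an ascending list of integers; the function name `interpolation_search` is hypothetical.

```python
def interpolation_search(items, key) -> int:
    """Return an index of key in an ascending list of integers, or -1."""
    low, high = 0, len(items) - 1
    # Continue only while key can still lie inside the current range.
    while low <= high and items[low] <= key <= items[high]:
        if items[high] == items[low]:
            # All values in the range are equal, and the loop condition
            # guarantees they equal key; this also avoids division by zero.
            return low
        # Weighted guess: position of key relative to the min and max values.
        mid = low + (key - items[low]) * (high - low) // (items[high] - items[low])
        if key < items[mid]:
            high = mid - 1
        elif key > items[mid]:
            low = mid + 1
        else:
            return mid
    return -1


evenly_spaced = list(range(0, 1000, 10))          # 0, 10, 20, ..., 990
print(interpolation_search(evenly_spaced, 730))   # 73, found on the first probe
```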

4. Interpolation Search
• Analysis:
• Calculations for mid are more complex.
• Significant improvement in search time, especially when the values of the data items in the collection are evenly distributed.

5. Indexed Search
• What is an index?
• Similar to the index of a book (e.g. telephone book), items in the index point to significant items in the collection.
• This implies that in this search an additional table is used, the index table, where each item in the index table points to a specific location in the original search list.

5. Indexed Search
• Algorithm:
    // Input: search array A of n items + index table of d items + key item k
    // Output: location of the item with the search key, or false if the key is not found
    Step 1: Determine the search range (imin to imax) for the key inside the original search list, using the index table
    Step 2: Search sequentially for the key in the range (imin to imax) inside the original search list
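The two steps can be sketched as below. The slides do not specify the layout of the index table, so this example assumes an ascending list and an index that stores every `step`-th item as a (key, position) pair. The names `build_index` and `indexed_search` and the sample data are hypothetical, and -1 is returned for a key that is not found.

```python
def build_index(items, step):
    """Index table: (key, position) for every step-th item of the sorted list."""
    return [(items[pos], pos) for pos in range(0, len(items), step)]


def indexed_search(items, index, key) -> int:
    # Step 1: scan the index table to determine the range imin..imax.
    imin, imax = 0, -1                     # empty range until an entry fits
    for j, (entry_key, pos) in enumerate(index):
        if entry_key > key:
            break                          # key lies before this index entry
        imin = pos
        # The range ends just before the next index entry, or at the list end.
        imax = index[j + 1][1] - 1 if j + 1 < len(index) else len(items) - 1
    # Step 2: sequential search inside imin..imax of the original list.
    for i in range(imin, imax + 1):
        if items[i] == key:
            return i
    return -1


data = [3, 8, 14, 21, 27, 35, 53, 60, 71, 88]   # sample data (assumed)
index = build_index(data, 3)                    # [(3, 0), (21, 3), (53, 6), (88, 9)]
print(indexed_search(data, index, 53))          # 6
print(indexed_search(data, index, 50))          # -1
```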

5. Indexed Search
Example (figure): searching for key = 53. Step 1 uses the index table; Step 2 searches the original list. Result: Pos = 5 + 1 = 6.

5. Indexed Search
• Analysis: assuming that:
• the original table is of size n
• the index is of size d
    Step 1: Determining the search range has average complexity O(d/2)
    Step 2: Searching for the key in the range (imin to imax) inside the original search list; assume average range length = n/d
