COP 3540 Data Structures with OOP

Chapter 7 - Part 2: Advanced Sorting

Quicksort
• Very famous and popular.
• For many (not all) cases, it provides excellent performance, generally O(n log₂ n).
• Excellent for internal (in-memory) sorting, not for disk files.
• Quicksort is based on partitioning:
• It operates by partitioning an array into two parts, as expected,
• and then calls itself recursively to quicksort each of the two partitions.

Consider the algorithm:

    public void recQuickSort(int left, int right)
    {
        if (right - left <= 0)   // base case: if size <= 1, it is already sorted
            return;
        else
        {
            int partition = partitionIt(left, right);  // can you explain this code?
            recQuickSort(left, partition - 1);         // sort left side recursively
            recQuickSort(partition + 1, right);        // sort right side recursively
        }
    } // end recQuickSort()

Notes:
• We first check for the trivial case: the base case. If it isn't the base case, go!
• Next we partition the array into smaller keys (left) and larger keys (right). Note: this version doesn't yet say what 'left' and 'right' hold, or where the pivot is.
• Then the recursive part: sort the left side recursively, and then the right side recursively.
• But each recursive call to recQuickSort invokes the partition algorithm again, followed by another recursive call to recQuickSort (on the left…) again…

So what does this actually mean?
• Each call to recQuickSort either hits the base case (a sub-array of size 0 or 1), which it eventually will, or partitions its sub-array and calls itself twice.
• Because the left call comes first, execution keeps partitioning the leftmost piece: the original left side is split into a smaller left side and a smaller right side, that new left side is split again, and again, and again…
• As we keep going to the left, to the left, each corresponding right side is also becoming smaller and smaller.
• So we sort the sub-arrays by calling ourselves recursively, and partitioning is the first step of every such call. We keep partitioning until we arrive at a base case.
• Only when a left call returns does its paired right call run. From this 'smallest of arrays' we process the right sub-array for the first time, which essentially starts the process over: partition it, recursively sort its left sub-array, and so on, over and over.

Selection of the Pivot Value
• The partition method requires a pivot value to partition around.
• Ideally, the pivot value should be one of the key values you are trying to sort.
• Simple approach: select the rightmost item of the sub-array being partitioned. (At least this is guaranteed to be an element of the array being sorted.)
• After partitioning, it would be nice if the pivot were already in its final place between the left and right sub-arrays, but partitioning alone does not do this. The pivot is still sitting at the right end of the right sub-array; we only know that the items in the left sub-array are <= the pivot and the items in the right sub-array are >= the pivot, neither side sorted.

Pivot Values
• Since every item in the right sub-array is >= the pivot (just unsorted), we finish by swapping the pivot (at the right end) with the item where the left scan stopped, once the scans have crossed (left scan >= right scan).
• That item is the leftmost item of the right sub-array, so the swap puts the original pivot in its final sorted position. (See?)
• The array is now partitioned at the position where the left scan stopped and the original pivot was placed.
• (Remember: the left scan moves to the right until it finds an item >= the pivot value; the right scan moves to the left until it finds an item <= the pivot value. At the end of the scans, left >= right, which is what lets us draw the conclusion above.)
• This works because everything to the left of that position is <= the pivot, everything to the right is >= the pivot, and the item swapped out of that position is >= the pivot, so it still belongs in the right sub-array. We now have two smaller sub-arrays.
• Now go to the left sub-array and partition it, and so on, recursively.
• Selecting the rightmost item of each sub-array as the pivot requires only minor changes to the quicksort routine. These are reflected in quickSort1.java, ahead.
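The partitioning step just described can be sketched as a single method on a plain long[] array. This is a minimal sketch in the style of the book's partitionIt(), assuming the rightmost item is the pivot; the class name PartitionDemo, the static signature, and the sample data are illustrative, not the book's code.

```java
import java.util.Arrays;

// Hypothetical demo class: one partition pass with the rightmost item as pivot.
class PartitionDemo {

    // Partitions a[left..right] around pivot = a[right]; returns the pivot's final index.
    static int partitionIt(long[] a, int left, int right) {
        long pivot = a[right];
        int leftPtr = left - 1;   // incremented before its first use
        int rightPtr = right;     // decremented before its first use, so the pivot is skipped
        while (true) {
            // Left scan: stop at the first item >= pivot (the pivot itself stops it at worst).
            while (a[++leftPtr] < pivot) { }
            // Right scan: stop at the first item <= pivot, never moving below 'left'.
            while (rightPtr > left && a[--rightPtr] > pivot) { }
            if (leftPtr >= rightPtr) break;   // scans have crossed: partitioning is done
            swap(a, leftPtr, rightPtr);       // both items are on the wrong side: exchange them
        }
        // a[leftPtr] is the leftmost item of the right sub-array (>= pivot), so moving it
        // to the right end is safe, and the pivot drops into its final sorted position.
        swap(a, leftPtr, right);
        return leftPtr;
    }

    static void swap(long[] a, int i, int j) {
        long tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        long[] a = {42, 89, 63, 12, 94, 27, 78, 3, 50, 36};
        int p = partitionIt(a, 0, a.length - 1);
        System.out.println("pivot " + a[p] + " landed at index " + p);
        System.out.println(Arrays.toString(a));
    }
}
```

For the sample array the pivot 36 ends at index 3, with the three smaller keys (3, 27, 12) to its left in no particular order and the larger keys to its right.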

Of course, we have the driver:

    class QuickSort1App
    {
        public static void main(String[] args)
        {
            int maxSize = 16;                      // array size
            ArrayIns arr = new ArrayIns(maxSize);  // create the array
            for (int j = 0; j < maxSize; j++)      // fill array with random numbers
            {
                // Math.random() returns a double in [0.0, 1.0), so n falls in 0..98
                long n = (int) (java.lang.Math.random() * 99);
                arr.insert(n);
            } // end for
            arr.display();    // display items
            arr.quickSort();  // quicksort them (here's the easy part)
            arr.display();    // display them again
        } // end main()
    } // end class QuickSort1App

recQuickSort itself:

    public void recQuickSort(int left, int right)
    {
        if (right - left <= 0)   // if size <= 1, already sorted (base case)
            return;
        else                     // size is 2 or larger
        {
            long pivot = theArray[right];  // rightmost item (theArray is an instance variable)
            // partitionIt() partitions around the pivot and, when done,
            // swaps the pivot into the cell where the left scan stopped
            int partition = partitionIt(left, right, pivot);
            recQuickSort(left, partition - 1);   // sort left side
            recQuickSort(partition + 1, right);  // sort right side
        } // end else
    } // end recQuickSort()

Notes:
• The recursive calls start one cell in from where the pivot was placed (partition - 1 and partition + 1): each time we call recQuickSort and partition, the new pivot IS in its correct position with respect to its sub-array. So, little by little, elements are moved into their correct positions.
• The right-side call takes a while to be reached! Note how the recursion works, and what happens each time a call returns.
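The driver above expects an ArrayIns class with insert(), display() and quickSort(). A minimal sketch of such a class, assuming a rightmost-item pivot and Lafore-style method names, follows; the bodies of insert(), display() and swap(), and the get() accessor, are assumptions added for illustration. Compiled together with QuickSort1App, it should print the 16 random keys before and after sorting.

```java
// Hypothetical sketch of the ArrayIns class used by QuickSort1App (rightmost-item pivot).
class ArrayIns {
    private long[] theArray;   // the data
    private int nElems;        // number of items stored

    public ArrayIns(int max) {
        theArray = new long[max];
        nElems = 0;
    }

    public void insert(long value) {
        theArray[nElems++] = value;   // append at the end (no capacity check in this sketch)
    }

    public void display() {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < nElems; j++) sb.append(theArray[j]).append(' ');
        System.out.println(sb.toString().trim());
    }

    // Hypothetical accessor so results can be checked programmatically.
    public long get(int index) {
        return theArray[index];
    }

    public void quickSort() {
        recQuickSort(0, nElems - 1);
    }

    public void recQuickSort(int left, int right) {
        if (right - left <= 0) return;        // size <= 1: already sorted
        long pivot = theArray[right];         // rightmost item is the pivot
        int partition = partitionIt(left, right, pivot);
        recQuickSort(left, partition - 1);    // sort left side
        recQuickSort(partition + 1, right);   // sort right side
    }

    public int partitionIt(int left, int right, long pivot) {
        int leftPtr = left - 1;               // just left of the sub-array
        int rightPtr = right;                 // the pivot's cell; skipped by the pre-decrement
        while (true) {
            while (theArray[++leftPtr] < pivot) { }                      // find item >= pivot
            while (rightPtr > left && theArray[--rightPtr] > pivot) { }  // find item <= pivot
            if (leftPtr >= rightPtr) break;   // scans crossed
            swap(leftPtr, rightPtr);
        }
        swap(leftPtr, right);                 // move the pivot into its final position
        return leftPtr;
    }

    private void swap(int i, int j) {
        long tmp = theArray[i];
        theArray[i] = theArray[j];
        theArray[j] = tmp;
    }
}
```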

Let's look at the applet: QuickSort1.html
• Show Lafore Applets….
• Show QuickSort1, size = 100, random.
• The dashed line shows the sub-arrays.
• You can see the pivots being selected, and that the algorithm successively goes to the left, to the left, to the left, then to the right, then to the left, to the left, etc.
• Successively smaller sub-arrays are created.
• Sample run: 170 swaps; 663 comparisons.
• If you wish to spend some time on these, the book gives a VERY detailed explanation of the solid line, dashed line, etc.

Some particulars (Things to Notice)
• Looking at the code in the algorithm: the left scan starts at left-1 (one cell before the sub-array) and the right scan starts at right (the pivot's own cell), so both start just outside the range they will examine; left-1 may even be out of bounds.
• But each is incremented / decremented before it is used to access the array for the first time.
• So, not to worry…

QuickSort1 can provide horrible performance!
• What if the 100 bars are inversely sorted? Swaps: 99; comparisons: 5098!
• More sub-arrays, and larger ones, are being processed.
• The problem is in the selection of the pivot.
• This really comes to bear when the data is way out of whack! Even data that is not inversely sorted but contains extraneous / extreme values is affected.
• Such values would certainly affect the choice of pivot and the resulting sizes of the sub-arrays.
• Ideally, perhaps the pivot should be the median?
• Seems like this might provide better performance?

A problem:
• When sub-arrays are out of balance (as with extreme values or skewed data), the array must be divided more times, which degrades performance.
• With inversely sorted data (that is, data that comes in descending order when we want it ascending), each partition splits off an empty sub-array, leaving the other side with n-1 items, then n-2, and so on down to 1.
• This degenerates the sort into an O(n²) sort!!!
• (Recall: QuickSort1 used the rightmost element as the pivot.)

That's not all the problems with inversely sorted arrays!
• Because n partitions are required, the recursion goes about n calls deep.
• For large arrays this can overflow the call stack (in Java, a StackOverflowError) and crash the program.
• So, in QuickSort1, choosing the rightmost element as the pivot may be fine if the data is truly random.
• If the data is inversely sorted, this choice of pivot is disastrous: the sort degenerates into an O(n²) sort and loses all its potential advantages!
• We need a better approach.
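To make the degradation concrete, here is a sketch that instruments the rightmost-pivot quicksort with counters for comparisons, swaps and recursion depth. The class name QuickSortCounter and the counting conventions are assumptions: every key comparison in the two scans counts, every swap call counts (including an item swapped with itself), and base-case calls count toward the depth.

```java
import java.util.Random;

// Hypothetical instrumented version of the rightmost-pivot quicksort.
class QuickSortCounter {
    long comparisons, swaps;   // operation counts
    int depth, maxDepth;       // current and deepest recursion level
    private final long[] a;

    QuickSortCounter(long[] data) { a = data; }   // sorts the caller's array in place

    void sort() { recQuickSort(0, a.length - 1); }

    private void recQuickSort(int left, int right) {
        depth++;
        maxDepth = Math.max(maxDepth, depth);
        if (right - left > 0) {                   // size >= 2: partition and recurse
            int p = partitionIt(left, right, a[right]);
            recQuickSort(left, p - 1);
            recQuickSort(p + 1, right);
        }
        depth--;
    }

    private int partitionIt(int left, int right, long pivot) {
        int leftPtr = left - 1, rightPtr = right;
        while (true) {
            while (less(a[++leftPtr], pivot)) { }                       // item >= pivot
            while (rightPtr > left && less(pivot, a[--rightPtr])) { }   // item <= pivot
            if (leftPtr >= rightPtr) break;
            swap(leftPtr, rightPtr);
        }
        swap(leftPtr, right);
        return leftPtr;
    }

    private boolean less(long x, long y) {
        comparisons++;
        return x < y;
    }

    private void swap(int i, int j) {
        swaps++;   // counted even when i == j
        long t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int n = 100;
        long[] random = new long[n];
        long[] descending = new long[n];
        Random rng = new Random(42);
        for (int i = 0; i < n; i++) {
            random[i] = rng.nextInt(1000);
            descending[i] = n - i;   // inversely sorted input
        }
        for (long[] data : new long[][] {random, descending}) {
            QuickSortCounter q = new QuickSortCounter(data);
            q.sort();
            System.out.printf("comparisons=%d swaps=%d maxDepth=%d%n",
                    q.comparisons, q.swaps, q.maxDepth);
        }
    }
}
```

For 100 descending keys, this sketch counts 5098 comparisons, 99 swaps and a maximum depth of 100: roughly n²/2 comparisons and a call chain about n deep, while the random array should need far fewer comparisons.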

Median of Three Partitioning
• We need a better approach that avoids selecting the largest or smallest value as the pivot.
• How to do this?
• Take the median of the first, last, and middle elements and use it as the pivot.
• This is faster than examining all the elements.
• It avoids selecting the largest or smallest value.
• It may still pick a poor pivot, but this approach is pretty sound.
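The median-of-three idea can be sketched as one small method, in the style of the book's medianOf3(): order the left, center and right items, then park the median at right-1 so it can serve as the pivot. The class name MedianOfThree, the static signature and the sample array are illustrative assumptions.

```java
import java.util.Arrays;

// Hypothetical sketch of median-of-three pivot selection.
class MedianOfThree {

    // Orders a[left], a[center], a[right], moves the median to right-1, and returns it.
    static long medianOf3(long[] a, int left, int right) {
        int center = (left + right) / 2;
        if (a[left] > a[center]) swap(a, left, center);
        if (a[left] > a[right]) swap(a, left, right);
        if (a[center] > a[right]) swap(a, center, right);
        // Now a[left] <= a[center] <= a[right]; park the median next to the right end.
        swap(a, center, right - 1);
        return a[right - 1];
    }

    static void swap(long[] a, int i, int j) {
        long tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        long[] a = {9, 1, 5, 7, 3};
        long pivot = medianOf3(a, 0, a.length - 1);
        System.out.println("pivot = " + pivot + ", array = " + Arrays.toString(a));
    }
}
```

Here the candidates are 9, 5 and 3, so the pivot is 5, and the program prints `pivot = 5, array = [3, 1, 7, 5, 9]`. As a side benefit, a[left] is now <= the pivot and a[right] is >= it, so the partitioning scans need only cover left+1 through right-2.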

Applet: run QuickSort2, which uses median-of-three partitioning, and compare it with QuickSort1 on 100 inversely sorted bars:
• QuickSort1: 99 swaps, 5098 comparisons!
• QuickSort2: 217 swaps, 712 comparisons!

Constraints
• Median-of-three partitioning cannot be used on partitions of three or fewer items.
• For small partitions, we might want to use an insertion sort instead, so we don't have to worry about the cutoff of 3 for median-of-three partitioning.
• Studies are available on different cutoff sizes…
• Your book presents the algorithm in Listing 7.5, where an insertion sort is used to handle sub-arrays with fewer than 10 cells. This makes sense for small n.
• Let's look at the operative routine: QuickSort3.
• (QuickSort1 used the rightmost element as the pivot; QuickSort2 used median-of-three as the pivot…)

recQuickSort method in QuickSort3:

    public void recQuickSort(int left, int right)
    {
        int size = right - left + 1;
        if (size < 10)                  // insertion sort if small
            insertionSort(left, right);
        else                            // quicksort if large
        {
            long median = medianOf3(left, right);
            // the median is fed in as the pivot; all else is the same
            int partition = partitionIt(left, right, median);
            recQuickSort(left, partition - 1);
            recQuickSort(partition + 1, right);
        } // end if
    } // end recQuickSort()

(QuickSort3 uses insertionSort for sub-arrays of fewer than 10 items and median-of-three pivot selection for sub-arrays of 10 or more.)
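A self-contained sketch of the whole QuickSort3 approach follows, built around the recQuickSort() shown above. The class name QuickSort3Sketch and the bodies of medianOf3(), partitionIt() and insertionSort() are assumptions written in the book's style, not its listing. Because median-of-three leaves a[left] <= pivot and parks the pivot at right-1, those two cells act as sentinels and the scans need no bounds checks.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: median-of-three quicksort with an insertion-sort cutoff of 10.
class QuickSort3Sketch {
    private final long[] a;

    QuickSort3Sketch(long[] data) { a = data; }   // sorts the caller's array in place

    void quickSort() { recQuickSort(0, a.length - 1); }

    private void recQuickSort(int left, int right) {
        int size = right - left + 1;
        if (size < 10) {
            insertionSort(left, right);           // small sub-array
        } else {
            long median = medianOf3(left, right);
            int partition = partitionIt(left, right, median);
            recQuickSort(left, partition - 1);
            recQuickSort(partition + 1, right);
        }
    }

    // Orders the three candidates, parks the median at right-1, and returns it.
    private long medianOf3(int left, int right) {
        int center = (left + right) / 2;
        if (a[left] > a[center]) swap(left, center);
        if (a[left] > a[right]) swap(left, right);
        if (a[center] > a[right]) swap(center, right);
        swap(center, right - 1);
        return a[right - 1];
    }

    // Scans left+1..right-2; a[left] stops the right scan, the pivot stops the left scan.
    private int partitionIt(int left, int right, long pivot) {
        int leftPtr = left;
        int rightPtr = right - 1;
        while (true) {
            while (a[++leftPtr] < pivot) { }
            while (a[--rightPtr] > pivot) { }
            if (leftPtr >= rightPtr) break;
            swap(leftPtr, rightPtr);
        }
        swap(leftPtr, right - 1);                 // restore the pivot to its final position
        return leftPtr;
    }

    private void insertionSort(int left, int right) {
        for (int out = left + 1; out <= right; out++) {
            long temp = a[out];                   // item to insert
            int in = out;
            while (in > left && a[in - 1] > temp) {
                a[in] = a[in - 1];                // shift larger items right
                in--;
            }
            a[in] = temp;
        }
    }

    private void swap(int i, int j) {
        long t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        long[] data = new long[20];
        Random rng = new Random(1);
        for (int i = 0; i < data.length; i++) data[i] = rng.nextInt(100);
        new QuickSort3Sketch(data).quickSort();
        System.out.println(Arrays.toString(data));
    }
}
```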

Efficiency of QuickSort
• One older approach uses a stack to store deferred sub-array bounds, and loops instead of recursive calls, to oversee the partitioning of smaller and smaller sub-arrays.
• The goal was to eliminate costly recursive method calls and their system overhead.
• Older machines paid a real performance penalty for each function call. Not really a big deal nowadays!
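The stack-and-loop approach just described can be sketched as follows, assuming the same rightmost-pivot partition as QuickSort1. The class name IterativeQuickSort is hypothetical, and pushing the larger side first (so the smaller side is handled next) is an added refinement, not something the slides specify; it keeps the stack at roughly log₂ n pending entries.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: quicksort driven by an explicit stack of deferred bounds.
class IterativeQuickSort {

    static void quickSort(long[] a) {
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[] {0, a.length - 1});
        while (!stack.isEmpty()) {
            int[] range = stack.pop();
            int left = range[0], right = range[1];
            if (right - left <= 0) continue;      // size <= 1: nothing to do
            int p = partitionIt(a, left, right);
            // Defer the larger side (pushed first) and handle the smaller side next.
            if (p - left > right - p) {
                stack.push(new int[] {left, p - 1});
                stack.push(new int[] {p + 1, right});
            } else {
                stack.push(new int[] {p + 1, right});
                stack.push(new int[] {left, p - 1});
            }
        }
    }

    // Rightmost-item pivot, same scheme as QuickSort1's partitionIt().
    static int partitionIt(long[] a, int left, int right) {
        long pivot = a[right];
        int leftPtr = left - 1, rightPtr = right;
        while (true) {
            while (a[++leftPtr] < pivot) { }
            while (rightPtr > left && a[--rightPtr] > pivot) { }
            if (leftPtr >= rightPtr) break;
            swap(a, leftPtr, rightPtr);
        }
        swap(a, leftPtr, right);
        return leftPtr;
    }

    static void swap(long[] a, int i, int j) {
        long t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        long[] desc = new long[5000];
        for (int i = 0; i < desc.length; i++) desc[i] = desc.length - i;
        quickSort(desc);   // no deep call chain, though still O(n²) time on this input
        System.out.println(desc[0] + " .. " + desc[desc.length - 1]);
    }
}
```

Note the trade-off: removing recursion avoids a deep call stack on inversely sorted input, but with a rightmost pivot the running time is still O(n²) there; only a better pivot choice fixes that.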

Efficiency of QuickSort
• QuickSort runs in O(n log₂ n) time on average, which is very good (the worst case, as we saw with inversely sorted data, is O(n²)).
• (Recall that Shell sort runs in about O(n (log₂ n)²) time.)
• Quicksort is typical of divide-and-conquer algorithms, where the data is successively divided into smaller and smaller 'halves' that are processed recursively.
• No need to plow deeper into this algorithm.
• Except for some real fine tuning, we have the idea of how to use it.

Comparisons
• See Table 7.2 (approximate operation counts; the figures use base-10 logarithms):

    O() value     Type of sort    n = 10    n = 100    n = 1,000    n = 10,000
    n²            Insertion          100     10,000    1,000,000   100,000,000
    n (log n)²    Shell sort          10        400        9,000       160,000
    n log n       Quicksort           10        200        3,000        40,000
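The table entries are plain arithmetic, so a short sketch can recompute them. It assumes base-10 logarithms (for example, 100 · log₁₀ 100 = 200), since those reproduce the figures; with base-2 logarithms the last two rows would be larger by constant factors, which does not change the O() comparison. The class name GrowthTable is illustrative.

```java
// Hypothetical helper that recomputes the growth-rate table with base-10 logs.
class GrowthTable {
    static long nSquared(long n) {
        return n * n;
    }

    static long nLogSquared(long n) {
        double lg = Math.log10(n);
        return Math.round(n * lg * lg);
    }

    static long nLog(long n) {
        return Math.round(n * Math.log10(n));
    }

    public static void main(String[] args) {
        long[] sizes = {10, 100, 1000, 10000};
        System.out.printf("%-10s%14s%14s%12s%n", "n", "n^2", "n(log n)^2", "n log n");
        for (long n : sizes) {
            System.out.printf("%-10d%14d%14d%12d%n", n, nSquared(n), nLogSquared(n), nLog(n));
        }
    }
}
```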

Read about the Radix Sort!! • Study end of chapter questions and terms.
