Analyzing sorting algorithms



  1. Analyzing sorting algorithms • Recall that T(n) represents the time required for an algorithm to process input of size n. • For a sorting algorithm, T(n) represents the time needed to sort. • This time may be the worst-case time, the average-case time, or something else.

  2. Comparison-based sorting algorithms • Many sorting algorithms are comparison based – they depend on being able to compare pairs of elements. • This constraint is naturally represented in Java by the Comparable and Comparator interfaces. • In this case we often let T(n) be the number of element comparisons.
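
One way to make this notion of T(n) concrete in Java is to count calls through a wrapped Comparator. A minimal sketch; the CountingComparator class is illustrative, not part of any library:

    import java.util.Arrays;
    import java.util.Comparator;

    // A Comparator wrapper that counts element comparisons, so that
    // T(n) can be read off directly as the number of compare() calls.
    class CountingComparator<T> implements Comparator<T> {
        private final Comparator<T> base;
        long comparisons = 0;

        CountingComparator(Comparator<T> base) { this.base = base; }

        public int compare(T a, T b) {
            comparisons++;
            return base.compare(a, b);
        }
    }

    // Usage:
    //   Integer[] a = {5, 2, 8, 1};
    //   CountingComparator<Integer> cc = new CountingComparator<>(Comparator.naturalOrder());
    //   Arrays.sort(a, cc);
    //   System.out.println(cc.comparisons);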

  3. Divide-and-conquer sorting algorithms • For divide-and-conquer, we define T(n) in terms of the value of T on smaller arguments. • For example, selection sort applies divide-and-conquer to the output, as follows: • find a piece of size 1 of the output sequence (find the smallest item and swap with the first item) • sort the remaining n-1 elements • Here T(n) = n-1 + T(n-1), where T(1) = 0 • since the first phase uses n-1 comparisons, and the second phase takes time T(n-1)
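
A minimal Java sketch of this reading of selection sort (the details are mine):

    // Selection sort: find the smallest remaining item (n-1 comparisons
    // on the first pass), swap it into place, then sort the rest.
    static void selectionSort(int[] a) {
        for (int first = 0; first < a.length - 1; first++) {
            int min = first;
            for (int j = first + 1; j < a.length; j++)
                if (a[j] < a[min]) min = j;       // one element comparison
            int tmp = a[first]; a[first] = a[min]; a[min] = tmp;
        }
    }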

  4. Insertion sort • The insertion sort algorithm has the following form: • insert a piece of size 1 (usually, the last element) • into the result of sorting a piece of size n-1 • This gives the same constraint as above for T(n) • if T(n) is interpreted as the worst-case number of comparisons • note that this worst case can be achieved (e.g., by reverse-sorted input)
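
A matching Java sketch of insertion sort (again, details mine):

    // Insertion sort: a[0..i-1] is already sorted; insert a[i] into it.
    // Reverse-sorted input forces i comparisons at step i, achieving
    // the worst case T(n) = n-1 + T(n-1).
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int x = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > x) {          // one element comparison
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = x;
        }
    }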

  5. Time complexity of insertion sort and selection sort • We need to solve the recurrence relation T(n) = n-1 + T(n-1); T(1) = 0 • But this is the relation that describes the sum of the first n nonnegative integers • cf. the first Σ expression on p.5 of Weiss, but with index bounds 0 and n-1 • So T(n) = (n-1)n/2 • And selection sort is Θ(n²) • And insertion sort is Θ(n²) in the worst case
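
Unrolling the recurrence gives the closed form directly:

    T(n) = (n-1) + T(n-1)
         = (n-1) + (n-2) + ... + 1 + T(1)
         = 1 + 2 + ... + (n-1)
         = n(n-1)/2 = Θ(n²)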

  6. Other divide-and-conquer sorting algorithms • Mergesort: • sort two pieces of size n/2 • merge the sorted pieces • Quicksort constructs the output in a divide-and-conquer way: • preprocess the input so that small items are to the left and large items are to the right • sort both pieces

  7. Sorting algorithms we’ve seen • Binary search tree (BST) sort • insert all items into a BST, and traverse • time complexity is Θ(n²) in the worst case • and Θ(n log n) in the best case • Heapsort • build a heap, then delete all items • building a heap takes time Θ(n) • deletions take time Θ(n log n) in the worst case • so sorting also has worst-case time Θ(n log n)
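
A heapsort sketch using java.util.PriorityQueue as the heap; note that building the heap by repeated insertion, as here, is Θ(n log n), unlike the Θ(n) bottom-up construction mentioned above:

    import java.util.PriorityQueue;

    // Heapsort via a library binary heap: insert all items,
    // then delete the minimum n times to emit sorted output.
    static int[] heapSort(int[] a) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int x : a) heap.add(x);              // n insertions
        int[] out = new int[a.length];
        for (int i = 0; i < out.length; i++)
            out[i] = heap.remove();               // n deleteMins: Θ(n log n)
        return out;
    }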

  8. Nonrecursive sorting algorithms • Insertion sort, selection sort, and mergesort are easy to formulate nonrecursively. • Quicksort can be formulated nonrecursively by using an explicit stack • some optimization is possible by doing so, as the sketch below shows.
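
A sketch of that formulation, assuming a partition(a, left, right) helper that returns the pivot's final index (one such helper is sketched under slide 18 below):

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Nonrecursive quicksort: an explicit stack of (left, right)
    // index pairs replaces the call stack.
    static void quickSort(int[] a) {
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[] {0, a.length - 1});
        while (!stack.isEmpty()) {
            int[] r = stack.pop();
            int left = r[0], right = r[1];
            if (right - left < 10) {              // small piece: insertion sort
                for (int i = left + 1; i <= right; i++) {
                    int x = a[i], j = i - 1;
                    while (j >= left && a[j] > x) a[j + 1] = a[j--];
                    a[j + 1] = x;
                }
                continue;
            }
            int p = partition(a, left, right);
            // One available optimization: push the larger piece first so
            // the smaller is sorted next, bounding the stack depth by O(log n).
            if (p - left > right - p) {
                stack.push(new int[] {left, p - 1});
                stack.push(new int[] {p + 1, right});
            } else {
                stack.push(new int[] {p + 1, right});
                stack.push(new int[] {left, p - 1});
            }
        }
    }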

  9. Bottom-up mergesort • Mergesort is easy to state in a bottom-up manner. • Initial sorted subsequences (runs) may be created in any convenient manner • or may simply be taken to be sequences of size 1 • A single pass merges pairs of adjacent runs into larger runs • passes may copy data alternately into and out of a temporary array
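
A minimal Java sketch of the bottom-up version, taking the initial runs to have size 1:

    // Bottom-up mergesort: runs of size 1, 2, 4, ... are merged pairwise,
    // each pass copying from src into dst; then the two arrays swap roles.
    static void bottomUpMergeSort(int[] a) {
        int n = a.length;
        int[] src = a, dst = new int[n];
        for (int run = 1; run < n; run *= 2) {
            for (int lo = 0; lo < n; lo += 2 * run) {
                int mid = Math.min(lo + run, n), hi = Math.min(lo + 2 * run, n);
                int i = lo, j = mid, k = lo;
                while (i < mid && j < hi)          // merge two adjacent runs
                    dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
                while (i < mid) dst[k++] = src[i++];
                while (j < hi)  dst[k++] = src[j++];
            }
            int[] t = src; src = dst; dst = t;     // alternate the arrays
        }
        if (src != a) System.arraycopy(src, 0, a, 0, n);  // leave result in a
    }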

  10. Space use in mergesort • Top-down: all recursive calls can share the same temporary array • those subarrays that overlap in time don’t overlap in space • In the bottom-up version, passes may copy data alternately into and out of a temporary array

  11. Time complexity of mergesort • In the bottom-up version, there are Θ(log n) passes, each requiring time Θ(n), for Θ(n log n) time altogether. • The data flow here (and also in the top-down case) may be modeled by a binary merge tree of height Θ(log n) • where each level takes time Θ(n) to process • The relevant recurrence T(n) = 2T(n/2) + cn has solution T(n) = Θ(n log n)
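
Expanding the recurrence shows where the bound comes from: each of the log₂ n levels of the merge tree contributes cn work:

    T(n) = 2T(n/2) + cn
         = 4T(n/4) + 2cn
         = ...
         = 2^k T(n/2^k) + k·cn

so taking k = log₂ n gives T(n) = n·T(1) + cn·log₂ n = Θ(n log n).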

  12. Merge tree (bottom up, n = 19) • run sizes at each level, root at the top:

    XXXXXXXXXXXXXXXXXXX               (19)
    XXXXXXXXXXXXXXXX XXX              (16 + 3)
    XXXXXXXX XXXXXXXX XXX             (8 + 8 + 3)
    XXXX XXXX XXXX XXXX XXX           (4 + 4 + 4 + 4 + 3)
    XX XX XX XX XX XX XX XX XX X      (nine 2s + 1)

  13. Merge tree (top down, n = 19) • each run splits into halves of sizes ⌊n/2⌋ and ⌈n/2⌉, root at the top:

    XXXXXXXXXXXXXXXXXXX               (19)
    XXXXXXXXX XXXXXXXXXX              (9 + 10)
    XXXX XXXXX XXXXX XXXXX            (4 + 5 + 5 + 5)
    XX XX XX XXX XX XXX XX XXX        (2+2, 2+3, 2+3, 2+3)
    X XX X XX X XX                    (each run of 3 splits into 1 + 2)

  14. Quicksort • Recall that for quicksort, a preprocessing step is needed • to get small elements to the left of the array • and large elements to the right of the array • A partition function performs this step. • Partition compares each array element to a pivot element. • pivot elements usually come from the input array • If so, partition can put the pivot between the small and large items

  15. Quicksort details • Small input needn’t be sorted recursively • another sorting algorithm can be used • “small” means of size less than about 10 or 20 • Partition typically works in terms of two index variables i and j • i starts at the left and moves right, looking for large values to move right • j starts at the right and moves left, looking for small values to move left • the loops moving i and j stop when i and j cross

  16. Quicksort issues • how to choose the pivot • the pivot element should be unlikely to be large or small (even for nonrandom input) • how to initialize i and j • what if i or j finds a copy of the pivot? • how to keep i and j from passing the end of the array • where does the pivot element go?

  17. Weiss suggests: • letting the pivot be the median of the left, center, and right elements • sorting these three values in place • swapping the pivot with the element in position right-1

  18. Weiss also says: • i and j should be advanced first, and then referenced • When either i or j sees the pivot element, it should stop • Explicit tests shouldn't be needed to keep i and j from running off the end of the array • instead, sentinels should be available • The pivot element should be swapped into position i
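
Putting slides 17 and 18 together, a partition sketch in Java (helper names mine; pieces below the cutoff size are assumed to go to another sort, per slide 15, so right - left >= 2 here):

    // Median-of-three partition: sort a[left], a[center], a[right] in
    // place, hide the pivot at right-1, then scan inward. i and j are
    // advanced before being referenced, stop on copies of the pivot, and
    // need no end-of-array tests: a[left] and a[right-1] act as sentinels.
    static int partition(int[] a, int left, int right) {
        int center = (left + right) / 2;
        if (a[center] < a[left])  swap(a, left, center);
        if (a[right] < a[left])   swap(a, left, right);
        if (a[right] < a[center]) swap(a, center, right);
        swap(a, center, right - 1);                // pivot now at right-1
        int pivot = a[right - 1];
        int i = left, j = right - 1;
        while (true) {
            while (a[++i] < pivot) { }             // advance first, then test
            while (a[--j] > pivot) { }
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, i, right - 1);                     // pivot into position i
        return i;
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }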

  19. Bucket sort and radix sort • There's an important family of sorting algorithms that don’t depend on comparing pairs of elements • If the elements being sorted needn't all be distinct, these algorithms can run in time faster than n log n

  20. Conditions for bucket sort • Bucket sort can be used when there is a function f that assigns indices to input elements so that if A <= B, f(A) <= f(B). • Here f is similar to a hash function. • It's used as an index into a table, where the table locations are called buckets. • However f is supposed to preserve regularity, while a hash function is supposed to destroy it.

  21. Two special cases • For a character string s, f(s) can be the first character of s • or its character code • For an integer i, f(i) can be the leftmost digit of i • provided that integers are padded with leading 0s.

  22. Bucket sort • The top-down bucket sort algorithm is then very simple: • assign elements to buckets • sort the buckets (perhaps recursively) • append the sorted buckets • For both strings and integers, recursive sorting of the buckets is possible • by ignoring the first character(s) or digit(s)
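
A minimal Java sketch for strings, using the first character as f; for brevity the buckets here are sorted with the library sort rather than recursively:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Top-down bucket sort on f(s) = first character (assumes nonempty
    // ASCII strings, so 128 buckets suffice).
    static List<String> bucketSort(List<String> input) {
        List<List<String>> buckets = new ArrayList<>();
        for (int b = 0; b < 128; b++) buckets.add(new ArrayList<>());
        for (String s : input)
            buckets.get(s.charAt(0)).add(s);       // assign to bucket
        List<String> out = new ArrayList<>();
        for (List<String> b : buckets) {
            Collections.sort(b);                   // sort each bucket
            out.addAll(b);                         // append in order
        }
        return out;
    }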

  23. Radix sort • There’s also a bottom-up version of bucket sort called radix sort, which is easiest to state for character strings of the same length p: • for i from p down to 1 • for each string s, assign s to the bucket corresponding to its ith character • concatenate the buckets into an output list • clear each bucket • For b buckets, the time is Θ(b+n) per iteration and thus Θ(p(b+n)) overall
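
A Java sketch for strings all of length p, over ASCII (so b = 128); appending to per-bucket lists keeps each distribution pass stable:

    import java.util.ArrayList;
    import java.util.List;

    // LSD radix sort: one stable distribution pass per character
    // position, rightmost position first.
    static List<String> radixSort(List<String> input, int p) {
        List<String> items = new ArrayList<>(input);
        for (int pos = p - 1; pos >= 0; pos--) {
            List<List<String>> buckets = new ArrayList<>();
            for (int b = 0; b < 128; b++) buckets.add(new ArrayList<>());
            for (String s : items)
                buckets.get(s.charAt(pos)).add(s); // stable: keeps input order
            items = new ArrayList<>();
            for (List<String> b : buckets) items.addAll(b);  // concatenate
        }
        return items;
    }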

  24. Radix sort details • Concatenation is easiest if linked lists are used for the individual buckets. • It is important that distribution into buckets be stable – elements should appear in the buckets in the order of the original input. • If strings have different lengths, they can be padded (explicitly or implicitly) with nulls on the right

  25. Radix sort analysis • Note that if p and b are independent of n, then radix sort has Θ(n) time complexity • However if p is independent of n, then there can be at most b^p distinct strings • So if all strings are distinct, then n is O(b^p), so p is Ω(log n) • And thus the time complexity is Ω(n log n)

  26. Selection using bucket sort • Top-down bucket sort can easily be converted to a selection algorithm • To find the kth smallest item, distribute the items into buckets, counting the number of items in each bucket • Then select recursively from the appropriate bucket, replacing k by a value that depends on the counts of the preceding buckets
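
A minimal sketch of this conversion, again bucketing strings on their first character; the library sort of the chosen bucket stands in for the recursive selection step:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Select the k-th smallest string (k = 0 gives the smallest).
    static String select(List<String> items, int k) {
        List<List<String>> buckets = new ArrayList<>();
        for (int b = 0; b < 128; b++) buckets.add(new ArrayList<>());
        for (String s : items) buckets.get(s.charAt(0)).add(s);
        for (List<String> b : buckets) {
            if (k < b.size()) {                    // answer is in this bucket
                Collections.sort(b);               // stand-in for recursing
                return b.get(k);
            }
            k -= b.size();                         // skip preceding buckets
        }
        throw new IllegalArgumentException("k out of range");
    }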

  27. Radix sort example • To sort: • 123, 12, 313, 321, 212, 112, 221, 132, 131 • Pass 1 assignment to buckets: • 0: • 1: 321, 221, 131 • 2: 12, 212, 112, 132 • 3: 123, 313 • Concatenated result • 321, 221, 131, 12, 212, 112, 132, 123, 313

  28. Pass 2 • From previous pass • 321, 221, 131, 12, 212, 112, 132, 123, 313 • Pass 2 assignment to buckets: • 0: • 1: 12, 212, 112, 313 • 2: 321, 221, 123 • 3: 131, 132 • Concatenated result • 12, 212, 112, 313, 321, 221, 123, 131, 132

  29. Pass 3 • From previous pass • 12, 212, 112, 313, 321, 221, 123, 131, 132 • Pass 3 assignment to buckets: • 0: 12 • 1: 112, 123, 131, 132 • 2: 212, 221 • 3: 313, 321 • Concatenated result • 12, 112, 123, 131, 132, 212, 221, 313, 321
