
Introduction to Algorithms



  1. Introduction to Algorithms Chapter 7: Linear-Time Sorting Algorithms

  2. Sorting So Far • Insertion sort: • Easy to code • Fast on small inputs (less than ~50 elements) • Fast on nearly-sorted inputs • O(n²) worst case • O(n²) average case (over equally-likely input orders) • O(n²) on reverse-sorted input
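
A minimal Python sketch of the algorithm being recapped (the function name is ours, not from the slides):

    # Insertion sort: O(n^2) worst case, but fast on small or nearly-sorted inputs.
    def insertion_sort(a):
        for j in range(1, len(a)):
            key = a[j]
            i = j - 1
            while i >= 0 and a[i] > key:  # shift larger elements one slot right
                a[i + 1] = a[i]
                i -= 1
            a[i + 1] = key
        return a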

  3. Sorting So Far • Merge sort: • Divide-and-conquer: • Split array in half • Recursively sort subarrays • Linear-time merge step • O(n lg n) worst case • Doesn’t sort in place
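
A short Python sketch of the same scheme (ours, for illustration; it returns a new list rather than sorting in place, matching the last bullet):

    # Merge sort: split in half, sort halves recursively, merge in linear time.
    def merge_sort(a):
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):  # linear-time merge step
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]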

  4. Sorting So Far • Heap sort: • Uses the very useful heap data structure • Complete binary tree • Heap property: parent key ≥ children’s keys • O(n lg n) worst case • Sorts in place • Fair amount of shuffling memory around
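
For illustration, a Python sketch using the standard-library heap (note this is a min-heap variant, not the in-place max-heap sort described on the slide):

    import heapq

    # Heap sort via heapq: heapify is O(n), each of the n pops is O(lg n).
    def heap_sort(a):
        heapq.heapify(a)
        return [heapq.heappop(a) for _ in range(len(a))]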

  5. Sorting So Far • Quick sort: • Divide-and-conquer: • Partition array into two subarrays, recursively sort • All of first subarray < all of second subarray • No merge step needed! • O(n lg n) average case • Fast in practice • O(n²) worst case • Worst case occurs on sorted input • Address this with randomized quicksort
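
A Python sketch of the randomized variant mentioned in the last bullet (ours; not in place, for clarity):

    import random

    # Randomized quicksort: a random pivot makes the O(n^2) sorted-input
    # worst case unlikely; expected time is O(n lg n).
    def quicksort(a):
        if len(a) <= 1:
            return a
        pivot = random.choice(a)
        return (quicksort([x for x in a if x < pivot])
                + [x for x in a if x == pivot]
                + quicksort([x for x in a if x > pivot]))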

  6. How Fast Can We Sort? • We will provide a lower bound, then beat it • How do you suppose we’ll beat it? • First, an observation: all of the sorting algorithms so far are comparison sorts • The only operation used to gain ordering information about a sequence is the pairwise comparison of two elements • Theorem: all comparison sorts are Ω(n lg n) • A comparison sort must do Ω(n) comparisons (why?) • What about the gap between Ω(n) and O(n lg n)?

  7. Decision Trees • Decision trees provide an abstraction of comparison sorts • A decision tree represents the comparisons made by a comparison sort; everything else is ignored • What do the leaves represent? • How many leaves must there be?

  8. Lower Bound of Comparison Sort: Decision Tree Model A decision tree represents the comparisons performed by a sorting algorithm when it operates on an input of a given size. (Figure: the decision tree for sorting three elements a1, a2, a3; each internal node is a comparison ai : aj, and the six leaves are the permutations ⟨1 2 3⟩, ⟨1 3 2⟩, ⟨2 1 3⟩, ⟨2 3 1⟩, ⟨3 1 2⟩, ⟨3 2 1⟩.)

  9. Decision Trees • Decision trees can model comparison sorts. For a given algorithm: • One tree for each n • Tree paths are all possible execution traces • What’s the longest path in a decision tree for insertion sort? For merge sort? • What is the asymptotic height of any decision tree for sorting n elements? • Answer: Ω(n lg n)
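
The standard counting argument behind that answer, in brief: a binary tree of height h has at most 2^h leaves, and a correct comparison sort needs at least n! leaves (one per permutation of the input), so

    2^h \ge n! \;\Longrightarrow\; h \ge \lg(n!) \ge \lg\!\left((n/2)^{n/2}\right) = \tfrac{n}{2}\lg\tfrac{n}{2} = \Omega(n \lg n)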

  10. Lower Bound For Comparison Sorts • Thus the time to comparison sort n elements is Ω(n lg n) • Heapsort and Mergesort are asymptotically optimal comparison sorts • But the name of this lecture is “Sorting in linear time”! • How can we do better than Ω(n lg n)?

  11. Sorting In Linear Time • The sorting algorithms we have introduced so far determine the sort order based only on comparisons between the input elements (comparison sorts) • We will examine three sorting algorithms that run in linear time: counting sort, radix sort, and bucket sort • These algorithms use operations other than comparisons to determine the sorted order

  12. Sorting In Linear Time • Counting sort • No comparisons between elements! • But… depends on an assumption about the numbers being sorted • We assume numbers are in the range 1..k • The algorithm: • Input: A[1..n], where A[j] ∈ {1, 2, 3, …, k} • Output: B[1..n], sorted (notice: not sorting in place) • Also: array C[1..k] for auxiliary storage

  13. Counting sort • Counting sort assumes that each of the n input elements is an integer in the range 0 to k, for some integer k. • The basic idea of counting sort is to determine, for each input element x, the number of elements less than x. This information can be used to place element x directly into its position in the output array. • In the code for counting sort, we assume that the input is an array A[1 … n], and thus length[A] = n. • We require two other arrays: the array B[1 … n] holds the sorted output, and the array C[0 … k] provides temporary working storage.

  14. Counting Sort Example: n = 11 array elements (Andy, Bill, Cora, Dora, Edith, Fred, Gina, Herb, Ian, Joe, Kay) with k = 4 distinct keys.

  15. Count Number of Each Key Single pass through the array, time O(n). Resulting counts: ⟨4, 1, 3, 3⟩.

  16. Change to Cumulative Counts Single pass through the count table, time O(k). Cumulative counts: ⟨4, 5, 8, 11⟩.

  17. Copy to Second Array Single pass through the array, starting on the right. Each record is copied to the output position given by its key’s entry in the count table, and that entry is then decremented (e.g., Kay is placed first, decrementing its count entry from 5 to 4). Time O(n). (Slides 18-26 step through the remaining records one at a time.)

  27. The sorting algorithm is stable: original order is maintained among records with equal keys. Total running time is O(n + k). If k is constant and small compared to n, this is an O(n) sorting algorithm.

  28. Counting Sort
      CountingSort(A, B, k)
          for i = 1 to k
              C[i] = 0
          for j = 1 to n
              C[A[j]] += 1
          for i = 2 to k
              C[i] = C[i] + C[i-1]
          for j = n downto 1
              B[C[A[j]]] = A[j]
              C[A[j]] -= 1

  29. Counting Sort What will be the running time?
      CountingSort(A, B, k)
          for i = 1 to k              // takes time O(k)
              C[i] = 0
          for j = 1 to n              // takes time O(n)
              C[A[j]] += 1
          for i = 2 to k              // takes time O(k)
              C[i] = C[i] + C[i-1]
          for j = n downto 1          // takes time O(n)
              B[C[A[j]]] = A[j]
              C[A[j]] -= 1

  30. Counting Sort • Total time: O(n + k) • Usually, k = O(n) • Thus counting sort runs in O(n) time • But sorting is Ω(n lg n)! • No contradiction: this is not a comparison sort (in fact, there are no comparisons at all!) • Notice that this algorithm is stable
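
A runnable Python version of the pseudocode above, assuming integer keys in 1..k as on slide 12 (the names are ours):

    # Counting sort: stable, O(n + k), output array b, count array c.
    def counting_sort(a, k):
        c = [0] * (k + 1)            # c[1..k]; index 0 unused
        for x in a:                  # count occurrences of each key
            c[x] += 1
        for i in range(2, k + 1):    # cumulative: c[i] = # of elements <= i
            c[i] += c[i - 1]
        b = [None] * len(a)
        for x in reversed(a):        # right-to-left pass preserves stability
            b[c[x] - 1] = x          # c[x] is a 1-based output position
            c[x] -= 1
        return b

    # counting_sort([7, 1, 3, 1, 2, 4, 5, 7, 2, 4, 3], 7)
    # -> [1, 1, 2, 2, 3, 3, 4, 4, 5, 7, 7]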

  31. Counting Sort • Cool! Why don’t we always use counting sort? • Because it depends on the range k of the elements • Could we use counting sort to sort 32-bit integers? Why or why not? • Answer: no, k is too large (2³² = 4,294,967,296)

  32. Counting Sort Non-comparison sort. Precondition: n numbers in the range 1..k. Key ideas: for each x, count the number C(x) of elements ≤ x; insert x at output position C(x) and decrement C(x).

  33. An Example Input: A[1..11] = ⟨7, 1, 3, 1, 2, 4, 5, 7, 2, 4, 3⟩, keys in 1..7; auxiliary storage C[1..7]. After the counting pass, C = ⟨2, 2, 2, 2, 1, 0, 2⟩ (e.g., there are two 1’s in A). After the cumulative pass (for i = 2 to 7 do C[i] = C[i] + C[i–1]), C = ⟨2, 4, 6, 8, 9, 9, 11⟩ (e.g., 6 elements are ≤ 3).

  34. Example (continued) Place A[11] = 3 into the output B[1..11]: B[C[A[11]]] = B[C[3]] = B[6] = 3, then C[A[11]] = C[A[11]] – 1, leaving C = ⟨2, 4, 5, 8, 9, 9, 11⟩.

  35. Example (continued) Place A[10] = 4: B[C[A[10]]] = B[8] = 4, then C[A[10]] = C[A[10]] – 1, leaving C = ⟨2, 4, 5, 7, 9, 9, 11⟩.

  36. Example (continued) A few steps later, C = ⟨2, 3, 5, 7, 8, 9, 10⟩, and placing A[6] = 4 sets B[C[A[6]]] = B[7] = 4, leaving C = ⟨2, 3, 5, 6, 8, 9, 10⟩. Continuing right to left fills B[1..11] = ⟨1, 1, 2, 2, 3, 3, 4, 4, 5, 7, 7⟩.

  37. Stability Counting sort is stable: input order is maintained among items with equal keys. Stability is important because data are often carried with the keys being sorted. Radix sort (which uses counting sort as a subroutine) relies on it to work correctly. How is stability realized?
      for j = length[A] downto 1
          do B[C[A[j]]] = A[j]
             C[A[j]] = C[A[j]] – 1
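
A small illustration of how the right-to-left pass realizes stability, using hypothetical (key, name) records (not from the slides):

    # Stable counting sort over records, ordered by their integer key.
    def counting_sort_records(records, k):
        c = [0] * (k + 1)
        for key, _ in records:
            c[key] += 1
        for i in range(2, k + 1):
            c[i] += c[i - 1]
        out = [None] * len(records)
        for rec in reversed(records):   # right-to-left keeps ties in input order
            out[c[rec[0]] - 1] = rec
            c[rec[0]] -= 1
        return out

    # counting_sort_records([(2, 'Andy'), (1, 'Bill'), (2, 'Cora')], 2)
    # -> [(1, 'Bill'), (2, 'Andy'), (2, 'Cora')]: Andy still precedes Cora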

  38. Radix Sort • How did IBM get rich originally? • Answer: punched card readers for census tabulation in the early 1900s • In particular, a card sorter that could sort cards into different bins • Each column can be punched in 12 places • Decimal digits use 10 places • Problem: only one column can be sorted on at a time

  39. Radix Sort Sort a set of numbers in multiple passes, starting from the rightmost digit, then the 10’s digit, then the 100’s digit, etc. Example: sort 23, 45, 7, 56, 20, 19, 88, 77, 61, 13, 52, 39, 80, 2, 99 Pass 1 (1’s digit): 20, 80, 61, 52, 2, 23, 13, 45, 56, 7, 77, 88, 19, 39, 99 Pass 2 (10’s digit): 2, 7, 13, 19, 20, 23, 39, 45, 52, 56, 61, 77, 80, 88, 99

  40. Analysis of Radix Sort Correctness follows by induction on the number of passes. Sort n d-digit numbers. Let k be the range of each digit. Each pass takes time Θ(n + k) (use counting sort). There are d passes in total. The running time for radix sort is Θ(dn + dk). Linear running time when d is a constant and k = O(n).

  41. Radix Sort • You might sort on the most significant digit, then the second-most-significant digit, etc. Try it. • Key idea: sort the least significant digit first
      RadixSort(A, d)
          for i = 1 to d
              StableSort(A) on digit i
  • Problem: sort 1 million 64-bit numbers • Treat as four-digit radix-2¹⁶ numbers • Can sort in just four passes with radix sort!

  42. Radix Sort • Can we prove it will work? • Assume the lower-order digits {j : j < i} are sorted • Show that sorting on the next digit i leaves the array correctly sorted • If two digits at position i differ, ordering the numbers by that digit is correct (lower-order digits are irrelevant) • If they are the same, the numbers are already sorted on the lower-order digits; since we use a stable sort, they stay in the right order

  43. Radix Sort • What sort will we use to sort on digits? • Counting sort is the obvious choice: • Sort n numbers on digits that range from 1..k • Time: O(n + k) • Each pass over n numbers with d digits takes time O(n + k), so total time is O(dn + dk) • When d is constant and k = O(n), this takes O(n) time
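
A Python sketch combining the two slides above: d passes of a stable counting sort, one per digit (the base parameter and names are ours; assumes nonnegative integers):

    # LSD radix sort: O(d(n + k)) with k = base; linear when d is constant.
    def radix_sort(a, base=10):
        if not a:
            return a
        passes = 0
        while base ** passes <= max(a):   # number of digits in the largest key
            passes += 1
        for p in range(passes):
            digit = lambda x: (x // base ** p) % base
            c = [0] * base
            for x in a:                   # counting sort keyed on digit p
                c[digit(x)] += 1
            for i in range(1, base):
                c[i] += c[i - 1]
            b = [None] * len(a)
            for x in reversed(a):         # stability is essential here
                c[digit(x)] -= 1
                b[c[digit(x)]] = x
            a = b
        return a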

  44. Radix Sort • In general, radix sort based on counting sort is • Fast • Asymptotically fast (i.e., O(n)) • Simple to code • A good choice

  45. Another Radix Sort Example

  46. Bucket Sort • Assumption: input is n real numbers from [0, 1) • Basic idea: • Create n linked lists (buckets) to divide interval [0, 1) into subintervals of size 1/n • Add each input element to appropriate bucket and sort buckets with insertion sort • Uniform input distribution ⇒ O(1) expected bucket size • Therefore the expected total time is O(n)

  47. Bucket Sort
      Bucket-Sort(A)
          n ← length(A)
          for i ← 1 to n
              do insert A[i] into list B[⌊n · A[i]⌋]
          for i ← 0 to n – 1
              do Insertion-Sort(B[i])
          concatenate lists B[0], B[1], …, B[n – 1] in order
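
A Python rendering of the pseudocode (a sketch; assumes n real inputs uniformly distributed in [0, 1), and uses the built-in sort as a stand-in for the Insertion-Sort on each small bucket):

    # Bucket sort: expected O(n) when inputs are uniform on [0, 1).
    def bucket_sort(a):
        n = len(a)
        buckets = [[] for _ in range(n)]
        for x in a:
            buckets[int(n * x)].append(x)   # bucket i holds [i/n, (i+1)/n)
        out = []
        for b in buckets:
            b.sort()                        # each bucket has O(1) expected size
            out.extend(b)
        return out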


  49. Bucket Sort Example n = 10 values; bucket i holds values in the half-open interval [i/10, (i + 1)/10):
      bucket 0: (empty)
      bucket 1: .12 .17
      bucket 2: .21 .23 .26
      bucket 3: .39
      bucket 4: (empty)
      bucket 5: (empty)
      bucket 6: .68
      bucket 7: .72 .78
      bucket 8: (empty)
      bucket 9: .94
  Concatenating the buckets gives the sorted output: .12 .17 .21 .23 .26 .39 .68 .72 .78 .94
