
Introduction to Sorting Methods


Presentation Transcript


  1. Introduction to Sorting Methods • Basics of Sorting • Elementary Sorting Algorithms • Selection sort • Insertion sort • Shellsort

  2. Sorting • Given n records R1, …, Rn, called a file. Each record Ri has a key Ki and may also contain other (satellite) information. The keys can be objects drawn from an arbitrary set on which equality is defined. There must be an order relation defined on the keys that satisfies the following properties: • Trichotomy: For any two keys a and b, exactly one of a < b, a = b, or a > b is true. • Transitivity: For any three keys a, b, and c, if a < b and b < c, then a < c. The relation < is a total ordering (linear ordering) on keys.

  3. Basic Definitions • Sorting: determine a permutation P = (p1, …, pn) of the n records that puts the keys in non-decreasing order Kp1 ≤ … ≤ Kpn. • Permutation: a one-to-one function from {1, …, n} onto itself. There are n! distinct permutations of n items. • Rank: Given a collection of n keys, the rank of a key is the number of keys that are less than it. That is, rank(Kj) = |{Ki | Ki < Kj}|. If the keys are distinct, the rank of a key gives its position in the output file.

  4. Terminology • Internal sort (the file is stored in main memory and can be randomly accessed) vs. external sort (the file is stored in secondary memory and can be accessed sequentially only) • Comparison-based sort: uses only the order relation among keys, not any special property of the representation of the keys themselves • Stable sort: records with equal keys retain their original relative order; i.e., i < j and Kpi = Kpj ⇒ pi < pj • Array-based sort (consecutive keys are stored in consecutive memory locations) vs. list-based sort (keys may be stored in nonconsecutive locations in a linked manner) • In-place sort: needs only a constant amount of extra space in addition to that needed to store the keys

  5. Elementary Sorting Methods • Easier to understand the basic mechanisms of sorting • May be more suitable for small files • Good for well-structured files that are relatively easy to sort, such as those that are almost sorted • Can be used to improve the efficiency of more powerful methods

  6. Sorting Categories • Sorting by insertion: insertion sort, Shellsort • Sorting by exchange: bubble sort, quicksort • Sorting by selection: selection sort, heapsort • Sorting by merging: merge sort • Sorting by distribution: radix sort

  7. Selection Sort 1. for i = n downto 2 do { 2. max ← i 3. for j = i - 1 downto 1 do { 4. if A[max] < A[j] then 5. max ← j 6. } 7. t ← A[max] 8. A[max] ← A[i] 9. A[i] ← t 10. }
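
A minimal runnable Python rendering of the pseudocode above, shifted to 0-based indexing (the function name selection_sort and the demo values are illustrative, not from the slides):

    def selection_sort(a):
        # Each pass finds the maximum of a[0..i] and places it at
        # position i, mirroring lines 1-10 of the pseudocode.
        for i in range(len(a) - 1, 0, -1):
            max_idx = i
            for j in range(i - 1, -1, -1):
                if a[max_idx] < a[j]:
                    max_idx = j
            a[i], a[max_idx] = a[max_idx], a[i]
        return a

    print(selection_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]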

  8. Algorithm Analysis • In-place sort • Not stable • The number of comparisons is Θ(n²) in the worst case, but it can be improved by a sequence of modifications, which leads to heapsort (see next lecture).

  9. Insertion Sort 1. for j = 2 to n do { 2. key ← A[j] 3. i ← j - 1 4. while i > 0 and key < A[i] { 5. A[i+1] ← A[i] 6. i ← i - 1 7. } 8. A[i+1] ← key 9. }
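
The same slide rendered as runnable Python, again 0-indexed (a sketch; the name insertion_sort is mine, not the slides'):

    def insertion_sort(a):
        for j in range(1, len(a)):
            key = a[j]
            i = j - 1
            # Shift larger elements one slot to the right until
            # key's final position is found.
            while i >= 0 and key < a[i]:
                a[i + 1] = a[i]
                i -= 1
            a[i + 1] = key
        return a

    print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]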

  10. Algorithm Analysis • In-place sort • Stable • If A is already sorted: Θ(n) comparisons • If A is reverse sorted: Θ(n²) comparisons • If A is randomly ordered: Θ(n²) comparisons

  11. Worst Case Analysis • The maximum number of comparisons while inserting A[i] is i - 1. So the total number of comparisons is Cwc(n) = Σi=2..n (i - 1) = Σj=1..n-1 j = n(n-1)/2 = Θ(n²)

  12. Average Case Analysis • Consider inserting the key A[i]. There are i places where it can end up after insertion. Assume all possibilities are equally likely, each with probability 1/i. Then the average number of comparisons to insert A[i] is Σj=1..i-1 (1/i)·j + (1/i)·(i - 1) = (i+1)/2 - 1/i • Summing over the insertion of all keys, we get Cavg(n) = Σi=2..n [(i+1)/2 - 1/i] ≈ n²/4 + 3n/4 - 1 - ln n = Θ(n²) • Therefore, Tavg(n) = Θ(n²)

  13. Analysis of Inversions in Permutations • Worst case: n(n-1)/2 inversions • Average case: • Consider each permutation π and its transpose (reverse) permutation πT. Given any π, πT is unique and π ≠ πT. • Consider the pairs (i, j) with i < j; there are n(n-1)/2 such pairs. • (i, j) is an inversion of π if and only if (n-j, n-i) is not an inversion of πT. This implies that the pair (π, πT) together have n(n-1)/2 inversions. ⇒ The average number of inversions is n(n-1)/4.
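
The n(n-1)/4 average can be checked empirically with a brute-force inversion counter over all permutations of a small n (a quick sanity-check sketch, not part of the slides):

    from itertools import permutations
    from math import factorial

    def count_inversions(a):
        # O(n^2) brute force: count pairs (i, j), i < j, with a[i] > a[j].
        n = len(a)
        return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])

    n = 4
    avg = sum(count_inversions(p) for p in permutations(range(n))) / factorial(n)
    print(avg, n * (n - 1) / 4)  # 3.0 3.0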

  14. Theorem • Any algorithm that sorts by comparison of keys and removes at most one inversion after each comparison must do at least n(n-1)/2 comparisons in the worst case and at least n(n-1)/4 comparisons on average. ⇒ If we want to do better than Θ(n²), we have to remove more than a constant number of inversions with each comparison.

  15. Insertion Sort to Shellsort • Shellsort is a simple extension of insertion sort. It gains speed by allowing exchanges between elements that are far apart. • The idea is to rearrange the file so that taking every h-th element (starting anywhere) yields a sorted file. Such a file is “h-sorted”. An h-sorted file is h independent sorted files, interleaved together. • By h-sorting for some large values of the “increment” h, we can move records far apart, which makes it easier to h-sort for smaller values of h. Using such a procedure for any sequence of values of h that ends in 1 will produce a sorted file.

  16. Shellsort • A family of algorithms, characterized by the sequence {hk} of increments used in sorting. • By interleaving, we can fix multiple inversions with each comparison, so that later passes see files that are “nearly sorted”. This implies that either there are many keys not too far from their final positions, or only a small number of keys are far off.

  17. Shellsort 1. h ← 1 2. while h ≤ n { 3. h ← 3h + 1 4. } 5. repeat 6. h ← ⌊h/3⌋ 7. for i = h to n do { 8. key ← A[i] 9. j ← i 10. while key < A[j - h] { 11. A[j] ← A[j - h] 12. j ← j - h 13. if j < h then break 14. } 15. A[j] ← key 16. } 17. until h ≤ 1
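
A runnable Python version of the slide's pseudocode, using the same 3h + 1 increment sequence (0-indexed; a sketch, with names of my choosing):

    def shellsort(a):
        n = len(a)
        h = 1
        while h <= n:          # grow h through 1, 4, 13, 40, ...
            h = 3 * h + 1
        while h > 1:
            h //= 3
            # h-sort the file: insertion sort with stride h.
            for i in range(h, n):
                key = a[i]
                j = i
                while j >= h and key < a[j - h]:
                    a[j] = a[j - h]
                    j -= h
                a[j] = key
        return a

    print(shellsort([9, 8, 7, 6, 5, 4, 3, 2, 1]))

Note that the last pass, with h = 1, is plain insertion sort on a file that is by then nearly sorted.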

  18. Algorithm Analysis • In-place sort • Not stable • The exact behavior of the algorithm depends on the sequence of increments -- difficult & complex to analyze the algorithm. • For hk = 2^k - 1, T(n) = Θ(n^(3/2))

  19. Heapsort • Heapsort • Data Structure • Maintain the Heap Property • Build a Heap • Heapsort Algorithm • Priority Queue

  20. Heap Data Structure • Construct in O(n) time • Extract maximum element in O(lg n) time • Leads to an O(n lg n) sorting algorithm: • Build heap • Repeatedly extract the largest remaining element (constructing the sorted list from back to front) • Heaps are useful for other purposes too

  21. Properties • Conceptually a complete binary tree • Stored as an array • Heap property: for every node i other than the root, A[Parent(i)] ≥ A[i] • Algorithms maintain the heap property as data is added/removed

  22. Array Viewed as Binary Tree • Last row filled from left to right

  23. Basic Operations • Parent(i): return ⌊i/2⌋ • Left(i): return 2i • Right(i): return 2i + 1

  24. Height • Height of a node in a tree: the number of edges on the longest simple downward path from the node to a leaf • Height of a tree: the height of the root • Height of the tree for a heap: Θ(lg n) • Basic operations on a heap run in O(lg n) time

  25. Maintaining Heap Property

  26. Heapify(A, i) 1. l ← Left(i) 2. r ← Right(i) 3. if l ≤ heap-size[A] and A[l] > A[i] 4. then largest ← l 5. else largest ← i 6. if r ≤ heap-size[A] and A[r] > A[largest] 7. then largest ← r 8. if largest ≠ i 9. then exchange A[i] ↔ A[largest] 10. Heapify(A, largest)
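
A 0-indexed Python sketch of Heapify (heap_size stands in for heap-size[A]; children sit at 2i + 1 and 2i + 2 because of the 0-based layout):

    def heapify(a, i, heap_size):
        # Assumes the subtrees rooted at the children of i are already
        # max-heaps; sifts a[i] down until the heap property holds at i.
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < heap_size and a[left] > a[largest]:
            largest = left
        if right < heap_size and a[right] > a[largest]:
            largest = right
        if largest != i:
            a[i], a[largest] = a[largest], a[i]
            heapify(a, largest, heap_size)

    a = [1, 14, 10, 8, 7, 9, 3]
    heapify(a, 0, len(a))
    print(a)  # [14, 8, 10, 1, 7, 9, 3]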

  27. Running Time for Heapify(A, i) • Lines 1-9 take constant time; line 10 recurses on the subtree rooted at largest, so T(i) = Θ(1) + T(largest)

  28. Running Time for Heapify(A, n) • So, T(n) = T(largest) + Θ(1) • Also, the subtree rooted at largest has size at most 2n/3 (the worst case occurs when the last row of the tree is exactly half full) • T(n) ≤ T(2n/3) + Θ(1) ⇒ T(n) = O(lg n) • Alternately, Heapify takes O(h) time, where h is the height of the node where Heapify is applied

  29. Build-Heap(A) 1. heap-size[A] ← length[A] 2. for i ← ⌊length[A]/2⌋ downto 1 3. do Heapify(A, i)

  30. Running Time • The time required by Heapify on a node of height h is O(h) • Express the total cost of Build-Heap as Σh=0..⌊lg n⌋ ⌈n/2^(h+1)⌉ O(h) = O(n Σh=0..⌊lg n⌋ h/2^h) And Σh=0..∞ h/2^h = (1/2)/(1 - 1/2)² = 2 Therefore, O(n Σh=0..⌊lg n⌋ h/2^h) = O(n) • We can build a heap from an unordered array in linear time

  31. Heapsort(A) 1. Build-Heap(A) 2. for i ← length[A] downto 2 3. do exchange A[1] ↔ A[i] 4. heap-size[A] ← heap-size[A] - 1 5. Heapify(A, 1)
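
Putting the pieces together as runnable Python (a self-contained sketch: the inner sift_down is an iterative version of Heapify, and the first loop is Build-Heap):

    def heapsort(a):
        def sift_down(i, size):
            # Iterative Heapify: push a[i] down to restore the heap property.
            while True:
                largest, left, right = i, 2 * i + 1, 2 * i + 2
                if left < size and a[left] > a[largest]:
                    largest = left
                if right < size and a[right] > a[largest]:
                    largest = right
                if largest == i:
                    return
                a[i], a[largest] = a[largest], a[i]
                i = largest

        n = len(a)
        for i in range(n // 2 - 1, -1, -1):   # Build-Heap
            sift_down(i, n)
        for end in range(n - 1, 0, -1):       # lines 2-5 of Heapsort(A)
            a[0], a[end] = a[end], a[0]       # move the max to its final slot
            sift_down(0, end)                 # re-heapify the shrunken heap
        return a

    print(heapsort([16, 4, 10, 14, 7, 9, 3, 2, 8, 1]))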

  32. Algorithm Analysis • In-place • Not Stable • Build-Heap takes O(n) and each of the n-1 calls to Heapify takes time O(lg n). • Therefore, T(n) = O(n lg n)

  33. Priority Queues • A data structure for maintaining a set S of elements, each with an associated value called a key. • Applications: scheduling jobs on a shared computer, prioritizing events to be processed based on their predicted time of occurrence. • Heap can be used to implement a priority queue.

  34. Basic Operations • Insert(S, x) - inserts the element x into the set S, i.e. S ← S ∪ {x} • Maximum(S) - returns the element of S with the largest key • Extract-Max(S) - removes and returns the element of S with the largest key

  35. Heap-Extract-Max(A) 1. if heap-size[A] < 1 2. then error “heap underflow” 3. max ← A[1] 4. A[1] ← A[heap-size[A]] 5. heap-size[A] ← heap-size[A] - 1 6. Heapify(A, 1) 7. return max

  36. Heap-Insert(A, key) 1. heap-size[A] ← heap-size[A] + 1 2. i ← heap-size[A] 3. while i > 1 and A[Parent(i)] < key 4. do A[i] ← A[Parent(i)] 5. i ← Parent(i) 6. A[i] ← key
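
Both priority-queue operations as 0-indexed Python (a sketch; note that in practice Python's heapq module provides the min-heap mirror of these routines):

    def heap_extract_max(a):
        # Remove and return the largest element of the max-heap a.
        if not a:
            raise IndexError("heap underflow")
        max_val = a[0]
        last = a.pop()                  # shrink heap-size by one
        if a:
            a[0] = last
            i, n = 0, len(a)
            while True:                 # sift the moved element down (Heapify)
                largest, l, r = i, 2 * i + 1, 2 * i + 2
                if l < n and a[l] > a[largest]:
                    largest = l
                if r < n and a[r] > a[largest]:
                    largest = r
                if largest == i:
                    break
                a[i], a[largest] = a[largest], a[i]
                i = largest
        return max_val

    def heap_insert(a, key):
        # Append key, then float it up along the path to the root;
        # (i - 1) // 2 is Parent(i) in the 0-based layout.
        a.append(key)
        i = len(a) - 1
        while i > 0 and a[(i - 1) // 2] < key:
            a[i] = a[(i - 1) // 2]
            i = (i - 1) // 2
        a[i] = key

    h = []
    for x in [3, 1, 4, 1, 5]:
        heap_insert(h, x)
    print(heap_extract_max(h))  # 5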

  37. Running Time • Running time of Heap-Extract-Max is O(lg n). • It performs only a constant amount of work on top of Heapify, which takes O(lg n) time • Running time of Heap-Insert is O(lg n). • The path traced from the new leaf to the root has length O(lg n).

  38. Examples

  39. QuickSort • Divide: A[p…r] is partitioned (rearranged) into two nonempty subarrays A[p…q] and A[q+1…r] s.t. each element of A[p…q] is less than or equal to each element of A[q+1…r]. Index q is computed here. • Conquer: two subarrays are sorted by recursive calls to quicksort. • Combine: no work needed since the subarrays are sorted in place already.

  40. Quicksort(A, p, r) 1. if p < r 2. then q ← Partition(A, p, r) 3. Quicksort(A, p, q) 4. Quicksort(A, q+1, r) * In place, not stable

  41. Partition(A, p, r) 1. x ← A[p] 2. i ← p - 1 3. j ← r + 1 4. while TRUE 5. do repeat j ← j - 1 6. until A[j] ≤ x 7. repeat i ← i + 1 8. until A[i] ≥ x 9. if i < j 10. then exchange A[i] ↔ A[j] 11. else return j
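
The divide step and the recursion as runnable Python (this is the Hoare partition scheme with A[p] as pivot, 0-indexed; a sketch, not a library implementation):

    def partition(a, p, r):
        # Returns q such that every key in a[p..q] is <= every key
        # in a[q+1..r] (inclusive bounds, as on the slide).
        x = a[p]
        i, j = p - 1, r + 1
        while True:
            j -= 1
            while a[j] > x:
                j -= 1
            i += 1
            while a[i] < x:
                i += 1
            if i < j:
                a[i], a[j] = a[j], a[i]
            else:
                return j

    def quicksort(a, p=0, r=None):
        if r is None:
            r = len(a) - 1
        if p < r:
            q = partition(a, p, r)
            quicksort(a, p, q)
            quicksort(a, q + 1, r)
        return a

    print(quicksort([13, 19, 9, 5, 12, 8, 7, 4, 11, 2, 6, 21]))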

  42. Example: Partitioning Array

  43. Algorithm Analysis The running time of quicksort depends on whether the partitioning is balanced or not. • Worst-Case Performance (unbalanced): T(n) = T(1) + T(n-1) + Θ(n) (partitioning takes Θ(n)) = Σk=1..n Θ(k) (T(1) takes Θ(1) time; expand the recurrence) = Θ(Σk=1..n k) = Θ(n²) * This occurs when the input is completely sorted.

  44. Worst Case Partitioning

  45. Best Case Partitioning

  46. Analysis for Best-Case Partition • When the partitioning procedure produces two regions of size n/2, we get a balanced partition with best-case performance: T(n) = 2T(n/2) + Θ(n) So, T(n) = Θ(n lg n) • Can it perform better than O(n lg n) on any input?

  47. Average-Case Behavior • For example, when the partitioning algorithm always produces a 7-to-3 proportional split: T(n) = T(7n/10) + T(3n/10) + n Solve the recurrence by visualizing the recursion tree: each level has a cost of n, and the height is Θ(lg n). So we get T(n) = Θ(n lg n) whenever the split has constant proportionality. • For a split of proportionality α, where 0 < α ≤ 1/2, the minimum depth of the tree is -lg n / lg α and the maximum depth is -lg n / lg(1 - α).

  48. Average-Case Splitting The combination of good and bad splits still results in T(n) = Θ(n lg n), but with a slightly larger constant hidden by the O-notation. (A rigorous average-case analysis comes later.)

  49. Randomized Quicksort • An algorithm is randomized if its behavior is determined not only by the input but also by values produced by a random-number generator, so no particular input elicits worst-case behavior. Two possible versions of quicksort: • Impose a distribution on the input to ensure that every permutation is equally likely. This does not improve the worst-case running time, but it makes the running time independent of the input ordering. • Exchange A[p] with an element chosen at random from A[p…r] in Partition. This ensures that the pivot element is equally likely to be any of the input elements.

  50. Randomized-Partition(A, p, r) 1. i ← Random(p, r) 2. exchange A[p] ↔ A[i] 3. return Partition(A, p, r)
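
The second version in Python, layered on the Hoare partition from the earlier sketch and repeated here so the block stands alone (random.randint is inclusive on both ends, matching Random(p, r)):

    import random

    def randomized_partition(a, p, r):
        # Swap a uniformly chosen element into the pivot slot, then
        # partition exactly as before (Hoare scheme, pivot = a[p]).
        k = random.randint(p, r)
        a[p], a[k] = a[k], a[p]
        x, i, j = a[p], p - 1, r + 1
        while True:
            j -= 1
            while a[j] > x:
                j -= 1
            i += 1
            while a[i] < x:
                i += 1
            if i < j:
                a[i], a[j] = a[j], a[i]
            else:
                return j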
