  1. Introduction to Computer Science 2 Lecture 12: Sorting (Part 2) Prof. Neeraj Suri Constantin Sârbu Overview: Heapsort Proxmap Sort Counting Sort

  2. Recap: Sorting • Fundamental problem in computer science • Methods already presented: • Selection sort • Insertion sort • Bubblesort • All three have complexity O(n²) • Better: Shellsort (O(n^1.2)) • Even better: Quicksort (divide and conquer) with complexity O(n·log₂ n) on average

  3. Sorting Methods: Overview • Sorting themes: • HW-supported parallel sort: Systolic arrays • Address calculation sorting: Proxmap sort, Radixsort, Counting sort • Comparison-based sorting, O(n²): • Transposition sorting: Bubblesort • Insert and keep sorted: Insertion sort, Tree sort • Diminishing increment sort: Shellsort • Priority queue sorting: Selection sort, Heapsort • Comparison-based sorting, O(n·log n): • Divide and conquer: Quicksort, Mergesort

  4. The Sorting Problem • Given: • A list of n elements (or data records) E1, E2, ..., En • Each element Ei contains a key Ki • We need: • “a sorted series”, more exactly: • A permutation (reordering) π of the numbers 1 to n, such that if the series is sorted according to π, then Kπ(1) ≤ Kπ(2) ≤ ... ≤ Kπ(n)

  5. Sorting: Basic Framework • Java class: class Element { int key; /* sorting key */ Info info; /* real information */ } • The elements are stored in an array: Element L[n-1]; • The costs: • The number of comparisons (of keys) • The number of swaps (of elements in the array)

  6. Logarithmic Sorting Methods • Sorting algorithms with complexity O(n²) are not usable for large numbers of keys • Even the cost of Shellsort rises fast with n: O(n^1.25) • We already know a sorting method with complexity O(n·log₂ n): Quicksort • But there are other sorting methods in this class, based on the tree structures introduced in the previous lecture

  7. Merge Sort • Important sorting algorithm • Based on “Divide and Conquer” • John von Neumann (1945) • mergesort(array): • Divide the array into two (equal) halves • Run mergesort on both halves • Merge the results

  8. Mergesort (splitting phase) • 6 2 7 1 8 4 3 5 • 6 2 7 1 | 8 4 3 5 • 6 2 | 7 1 | 8 4 | 3 5 • 6 | 2 | 7 | 1 | 8 | 4 | 3 | 5

  9. Mergesort (merging phase) • 6 | 2 | 7 | 1 | 8 | 4 | 3 | 5 • 2 6 | 1 7 | 4 8 | 3 5 • 1 2 6 7 | 3 4 5 8 • 1 2 3 4 5 6 7 8

  10. Mergesort • Complexity: • Divide: O(1) • Recursion: 2T(n/2) • Merging: O(n) • Total: O(n·log n) • Mergesort is used in Perl (since 5.8) and in Java 1.5
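The divide/recurse/merge steps above can be sketched in Java; the class and helper names here are illustrative, not from the lecture:

```java
import java.util.Arrays;

public class MergeSortDemo {
    // Sorts a[lo..hi) by divide and conquer: split, recurse, merge.
    static void mergeSort(int[] a, int lo, int hi) {
        if (hi - lo < 2) return;          // 0 or 1 element: already sorted
        int mid = (lo + hi) / 2;
        mergeSort(a, lo, mid);            // sort left half
        mergeSort(a, mid, hi);            // sort right half
        merge(a, lo, mid, hi);            // merge the two sorted halves
    }

    // Merges the sorted runs a[lo..mid) and a[mid..hi) in O(n) time.
    static void merge(int[] a, int lo, int mid, int hi) {
        int[] tmp = new int[hi - lo];
        int i = lo, j = mid, k = 0;
        while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi)  tmp[k++] = a[j++];
        System.arraycopy(tmp, 0, a, lo, tmp.length);
    }

    public static void main(String[] args) {
        int[] a = {6, 2, 7, 1, 8, 4, 3, 5};     // the lecture's example input
        mergeSort(a, 0, a.length);
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```

Note the temporary array in merge: mergesort is not in place, which is exactly the extra-memory cost that motivates heapsort below.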

  11. Heapsort • Goal: a sorting algorithm that works in O(n·log₂ n) without additional memory requirements • Heapsort: classic sorting method (J. W. J. Williams, 1964) • The algorithm is based on a binary tree, the heap, and consists of two phases: • First, build the heap • Then progressively edit the heap to sort it

  12. Heap • Definition: A binary tree B with n nodes is a heap of size n when: • B is nearly complete • The keys are ordered such that for each node i, Ki ≤ Kj, where node j is the parent node of node i • Important: a heap is a nearly complete binary tree, but not a binary search tree • [Figure: examples of heaps]
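The heap condition of the definition can be checked directly on the array representation; this small sketch (class and method names are mine, not the lecture's) assumes a 1-based array so that the parent of node i sits at position i/2:

```java
public class HeapCheck {
    // Checks the heap condition on a 1-based array h[1..n]:
    // every node's key must be <= its parent's key (max-heap).
    static boolean isHeap(int[] h, int n) {
        for (int i = 2; i <= n; i++) {
            if (h[i] > h[i / 2]) return false; // violates Ki <= Kparent
        }
        return true;
    }

    public static void main(String[] args) {
        // an example heap over the lecture's keys; index 0 is unused
        int[] heap = {0, 96, 77, 56, 56, 44, 13, 2, 17, 9, 12};
        System.out.println(isHeap(heap, 10));    // true
        int[] notHeap = {0, 5, 9, 3};            // 9 > 5 violates the condition
        System.out.println(isHeap(notHeap, 3));  // false
    }
}
```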

  13. Heap Properties • Each subtree of a heap is itself a heap • The biggest element of a heap is in the root • [Figure: non-heaps, violating the first or the second condition]

  14. The Heapsort Algorithm • Construction of the heap from the input sequence • Input sequence: 2, 9, 56, 77, 12, 13, 96, 17, 56, 44 • The initial tree is not a heap just yet • Reminder: in the array representation, the node at position n has its children at positions 2n and 2n+1

  15. The Heap Construction • Fulfill the heap conditions by swapping nodes, starting at the leaves

  16. The Heap Construction • Fulfill the heap conditions by swapping nodes, starting at the leaves

  17. The Heap Construction • Fulfill the heap conditions by swapping nodes, starting at the leaves

  18. The Heap Construction (2) • Now the heap conditions are fulfilled; the heap is finished.
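One standard way to realize the construction phase is Floyd's bottom-up method, sifting every inner node down in turn; the details may differ from the slide figures, and the names are illustrative. The sketch uses the lecture's input sequence and the 1-based array layout from the reminder above:

```java
public class HeapBuild {
    // Restores the heap condition below node i by sifting it down:
    // swap with the larger child until both children are <= the node.
    static void siftDown(int[] h, int i, int n) {
        while (2 * i <= n) {
            int child = 2 * i;                                 // left child at 2i
            if (child < n && h[child + 1] > h[child]) child++; // larger child at 2i+1
            if (h[i] >= h[child]) break;                       // heap condition holds
            int t = h[i]; h[i] = h[child]; h[child] = t;
            i = child;
        }
    }

    // Builds a heap in place: sift down every inner node, last one first.
    static void buildHeap(int[] h, int n) {
        for (int i = n / 2; i >= 1; i--) siftDown(h, i, n);
    }

    public static void main(String[] args) {
        // index 0 unused so that the children of node i sit at 2i and 2i+1
        int[] h = {0, 2, 9, 56, 77, 12, 13, 96, 17, 56, 44};
        buildHeap(h, 10);
        System.out.println(h[1]); // 96: the biggest key is now in the root
    }
}
```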

  19. Sorting Procedure • After the construction of the heap follows the sorting procedure: • Extract the biggest element from the root and place it at its final (the “last”) position • Mark the extracted elements so they are separated from the rest of the tree • Fulfill the heap conditions again • [Figures: the initial heap, and the heap after the biggest element was extracted and marked]

  20. Sorting Procedure (2)

  21. Sorting Procedure (3)

  22. Result • The tree can be stored in a sequential representation without additional memory (no holes) • The sorted sequence is obtained after the biggest element has been extracted and the heap conditions re-fulfilled n times • Re-fulfilling the heap conditions needs O(log₂ n) time: • Analogous to the transformation of other trees with limited transformation complexity • The heap conditions are fulfilled again each time after a subtree has been processed • The complexity is proportional to the maximal length of a path to a leaf, i.e. the height of the tree • Total run time: O(n·log₂ n) • Can we do any better?
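Putting both phases together gives a heapsort sketch; the names are illustrative, and the "marking" of extracted elements is realized simply by shrinking the heap boundary `last`, so no extra memory is needed:

```java
import java.util.Arrays;

public class HeapSortDemo {
    // Restore the heap condition below node i (1-based array, heap size n).
    static void siftDown(int[] h, int i, int n) {
        while (2 * i <= n) {
            int child = 2 * i;
            if (child < n && h[child + 1] > h[child]) child++; // larger child
            if (h[i] >= h[child]) break;
            int t = h[i]; h[i] = h[child]; h[child] = t;
            i = child;
        }
    }

    static void heapSort(int[] h, int n) {
        // Phase 1: build the heap.
        for (int i = n / 2; i >= 1; i--) siftDown(h, i, n);
        // Phase 2: extract the biggest element n-1 times.
        for (int last = n; last > 1; last--) {
            int t = h[1]; h[1] = h[last]; h[last] = t; // root -> "last" position
            siftDown(h, 1, last - 1);                  // re-fulfill heap conditions
        }
    }

    public static void main(String[] args) {
        int[] h = {0, 2, 9, 56, 77, 12, 13, 96, 17, 56, 44}; // index 0 unused
        heapSort(h, 10);
        System.out.println(Arrays.toString(h));
        // [0, 2, 9, 12, 13, 17, 44, 56, 56, 77, 96]
    }
}
```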

  23. Lower Bounds on Sorting • What we’ve seen so far is comparison sort • Based on comparing elements • Assume only comparisons cost anything • Still, we have a lower bound of Ω(n·log n)! • But we can do better … • Proxmap sort • Counting sort

  24. Lower Bounds on Sorting • We must somehow compare all elements to get the sorting correct • What is the minimum number of comparisons that needs to be done? • Idea: we must do pair-wise comparisons • We could build a binary decision tree …

  25. Lower Bounds on Sorting • Decision tree for sorting three elements a1, a2, a3 (inner nodes compare two elements, leaves are the resulting orders):
      a1:a2
      ≤: a2:a3
         ≤: a1,a2,a3
         >: a1:a3
            ≤: a1,a3,a2
            >: a3,a1,a2
      >: a1:a3
         ≤: a2,a1,a3
         >: a2:a3
            ≤: a2,a3,a1
            >: a3,a2,a1

  26. Lower Bounds on Sorting • (Decision tree as on the previous slide) • What is the maximum number of comparisons? • The height of the tree! • The number of leaves is n! • All permutations of n numbers • A complete binary tree of height h has 2^h leaves • 2^h ≥ n!  ⇒  h ≥ log₂(n!)
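The last inequality already yields the claimed bound: the largest n/2 factors of n! are each at least n/2, so

```latex
h \;\ge\; \log_2(n!) \;\ge\; \log_2\!\left(\left(\tfrac{n}{2}\right)^{n/2}\right)
  \;=\; \tfrac{n}{2}\,\log_2\tfrac{n}{2} \;=\; \Omega(n \log n)
```

Hence no comparison sort can beat n·log n in the worst case.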

  27. Linear Sorting • Idea: Do not base the sorting on comparisons! • Proxmap sort • Counting sort

  28. Sorting with Address Computation (Proxmap Sort) • Basic idea (the same as in counting sort): • With the help of an address computation, i.e. a mapping f(K) = address, each key is moved in a first pass to the proximity of its final destination • The final sorting is done by local, fine-grained sorting • Steps: • Definition of the mapping • Assignment of the neighbors • Interchange of the keys • Fine-grained sorting

  29. Example • Mapping: MapKey(K) = ⌊K⌋

  30. Possible Improvements • No linked lists; instead, reserve areas in the array • Pass over the array once and apply MapKey • Count the frequency of the mapped values (hit counts): • H[0] = 1 • H[1] = 3 • H[2] = 0 etc. • Reserve areas in the array proportional to the frequencies: • 0-region starts at A[0] • 1-region starts at A[1] • 3-region starts at A[4] etc.

  31. Possible Improvements (2) • Placing the keys: • Placing into a second array: • easier and faster, but demands double the memory • In-situ placing: • more complicated (keys must be marked) and slower, but demands only one array • Better mapping: • Similar problem as with hashing (later) • Must find a coding for non-numeric keys • Mapping should, if possible, not generate collisions • Each collision means a local sorting (for example insertion sort)

  32. Execution Time • The worst case: • All the keys are mapped to the same address • Degenerates to insertion sort: O(n²) • The best case: • Perfect mapping, i.e. each key is mapped to a different address • One pass to apply MapKey and to generate the hit counts • One pass to generate the ProxMap • One pass to copy the keys from the input array to their final positions • One pass over the ProxMap • Total O(n)!

  33. Execution Time • Average case: • We divide the n keys into areas of c keys each and obtain n/c areas • Each area must be sorted with insertion sort in O(c²) • Total cost is (n/c)·O(c²), i.e. (n/c)·(ac² + bc + d) = nac + nb + nd/c • For c ≪ n this is O(n) • The linear execution time of Proxmap sort can be verified experimentally
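A minimal sketch of the improved, array-based variant with hit counts and a ProxMap; all names are assumptions, not the lecture's, and the illustrative mapping MapKey(K) = K/10 assumes integer keys in 0..99. Collisions inside a region are resolved by local insertion sort, as described above:

```java
import java.util.Arrays;

public class ProxmapDemo {
    static int mapKey(int k) { return k / 10; }    // illustrative mapping into buckets 0..9

    static void proxmapSort(int[] a) {
        int n = a.length, m = 10;
        int[] hit = new int[m];
        for (int k : a) hit[mapKey(k)]++;           // pass 1: hit counts per mapped address
        int[] proxMap = new int[m];
        for (int i = 1; i < m; i++)                 // pass 2: start index of each region
            proxMap[i] = proxMap[i - 1] + hit[i - 1];
        int[] b = new int[n];                       // second array (easier than in-situ)
        int[] next = proxMap.clone();
        for (int k : a) {                           // pass 3: insert each key into its region,
            int pos = next[mapKey(k)]++;            // keeping the region sorted locally
            int j = pos;                            // (insertion sort within the region)
            while (j > proxMap[mapKey(k)] && b[j - 1] > k) { b[j] = b[j - 1]; j--; }
            b[j] = k;
        }
        System.arraycopy(b, 0, a, 0, n);
    }

    public static void main(String[] args) {
        int[] a = {66, 12, 5, 77, 34, 3, 91, 35};
        proxmapSort(a);
        System.out.println(Arrays.toString(a)); // [3, 5, 12, 34, 35, 66, 77, 91]
    }
}
```

With a good mapping each region stays small (c ≪ n), so the local insertion sorts cost O(1) each and the total stays linear, matching the average-case argument above.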

  34. Counting Sort • Assumption: sort n integers in the range 1..k • Idea: find the number of elements less than or equal to a number i; then we know the final position of i • Needed: 3 arrays • A[1..n]: the input • B[1..n]: the output • C[1..k]: auxiliary array (initialized to 0)

  35. Counting Sort • A = [5 2 7 1 2 4 3 5] • Step 1: Count the occurrences of the elements, store in C (indices 1..7): • C = [1 2 1 1 2 0 1] • Step 2: Count the number of elements less than or equal to i, update C: • C = [1 3 4 5 7 7 8]

  36. Counting Sort • A = [5 2 7 1 2 4 3 5], C = [1 3 4 5 7 7 8] • Step 3: For each element A[n..1] with key i, place it at B[C[i]] and do C[i]-- • B = [_ _ _ _ _ _ _ _] (positions 1..8, still empty)

  37. Counting Sort • Step 3 (cont.): A[8] = 5 → B[C[5]] = B[7]; C[5]-- (7 → 6) • C = [1 3 4 5 6 7 8] • B = [_ _ _ _ _ _ 5 _]

  38. Counting Sort • Step 3 (cont.): A[7] = 3 → B[C[3]] = B[4]; C[3]-- (4 → 3) • C = [1 3 3 5 6 7 8] • B = [_ _ _ 3 _ _ 5 _]

  39. Counting Sort • Step 3 (cont.): A[6] = 4 → B[C[4]] = B[5]; C[4]-- (5 → 4) • C = [1 3 3 4 6 7 8] • B = [_ _ _ 3 4 _ 5 _]

  40. Counting Sort • Step 3 (cont.): A[5] = 2 → B[C[2]] = B[3]; C[2]-- (3 → 2) • C = [1 2 3 4 6 7 8] • B = [_ _ 2 3 4 _ 5 _]

  41. Counting Sort • Step 3 (cont.): A[4] = 1 → B[C[1]] = B[1]; C[1]-- (1 → 0) • C = [0 2 3 4 6 7 8] • B = [1 _ 2 3 4 _ 5 _]

  42. Counting Sort • Step 3 (cont.): A[3] = 7 → B[C[7]] = B[8]; C[7]-- (8 → 7) • C = [0 2 3 4 6 7 7] • B = [1 _ 2 3 4 _ 5 7]

  43. Counting Sort • Step 3 (cont.): A[2] = 2 → B[C[2]] = B[2]; C[2]-- (2 → 1) • C = [0 1 3 4 6 7 7] • B = [1 2 2 3 4 _ 5 7]

  44. Counting Sort • Step 3 (cont.): A[1] = 5 → B[C[5]] = B[6]; C[5]-- (6 → 5) • C = [0 1 3 4 5 7 7] • B = [1 2 2 3 4 5 5 7] • Done: B is sorted

  45. Counting Sort • Costs: • Step 1 in Θ(n) [count occurrences] • Step 2 in Θ(k) [sum up the elements ≤ i] • Step 3 in Θ(n) [place the elements in B] • Overall: Θ(k+n) • In practice, usually k = O(n), so • the complexity is Θ(n) • Note the stability of the sorting order! • Equal elements appear in the output in the same order as in the input
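The three steps walked through above can be written out directly. A sketch with illustrative names, assuming keys in 1..k; internally it uses a 0-based output array, so the final position B[C[i]] from the slides becomes index C[i]-1:

```java
import java.util.Arrays;

public class CountingSortDemo {
    // Sorts n keys in the range 1..k, stably, in Theta(k+n).
    static int[] countingSort(int[] a, int k) {
        int n = a.length;
        int[] c = new int[k + 1];
        for (int key : a) c[key]++;                    // step 1: count occurrences
        for (int i = 2; i <= k; i++) c[i] += c[i - 1]; // step 2: number of elements <= i
        int[] b = new int[n];
        for (int j = n - 1; j >= 0; j--) {             // step 3: right to left, for stability
            b[c[a[j]] - 1] = a[j];                     // place at final position (0-based)
            c[a[j]]--;
        }
        return b;
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 7, 1, 2, 4, 3, 5};            // the lecture's input
        System.out.println(Arrays.toString(countingSort(a, 7)));
        // [1, 2, 2, 3, 4, 5, 5, 7]
    }
}
```

Processing A right to left in step 3 is what makes the sort stable: of two equal keys, the later one in A is placed at the higher position in B.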
