7.11 External Sorting

This article discusses external sorting methods to efficiently handle large datasets that exceed internal memory limits. It explains how access to secondary storage is significantly slower than memory access and presents strategies to minimize this access. Through clear examples, it elaborates on a simple merge sort with multiple tapes, demonstrating the concept of runs and passes needed for sorting. Additionally, it explores multiway merge techniques with k input devices, detailing the calculations required for the number of passes necessary based on available memory and record size.

yan

Presentation Transcript


  1. 7.11 External Sorting
     • Access to secondary storage is orders of magnitude slower than memory access.
     • Minimize the number of accesses to secondary storage (tape or disk).
     • The data may also have to be read sequentially (tapes).

  2. 7.11 External Sorting
     • Simple merge example: sort M records at a time (M = 3), using 4 tapes (Ta1, Ta2, Tb1, Tb2).
       Ta1: 81 94 11 ; 96 12 35 ; 17 99 28 ; 58 41 75 ; 15
       Ta2: empty
       Tb1, Tb2: empty

  3. 7.11 External Sorting
     After the run-constructing pass:
       Ta1, Ta2: empty
       Tb1: 11 81 94 ; 17 28 99 ; 15
       Tb2: 12 35 96 ; 41 58 75
     After the first merge pass:
       Ta1: 11 12 35 81 94 96 ; 15
       Ta2: 17 28 41 58 75 99
       Tb1, Tb2: empty
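The run-constructing pass can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each tape is modeled as an in-memory list of runs, and the function names `make_runs` and `distribute` are hypothetical, not taken from the slides.

```python
def make_runs(records, m):
    """Sort each block of m records in memory; each sorted block is one run."""
    return [sorted(records[i:i + m]) for i in range(0, len(records), m)]


def distribute(runs, k):
    """Write the runs round-robin onto k output tapes (a tape is a list of runs)."""
    tapes = [[] for _ in range(k)]
    for i, run in enumerate(runs):
        tapes[i % k].append(run)
    return tapes


# Input tape Ta1 from the example, with M = 3 and two output tapes.
ta1 = [81, 94, 11, 96, 12, 35, 17, 99, 28, 58, 41, 75, 15]
tb1, tb2 = distribute(make_runs(ta1, 3), 2)
print(tb1)  # [[11, 81, 94], [17, 28, 99], [15]]
print(tb2)  # [[12, 35, 96], [41, 58, 75]]
```

A real external sort would stream the blocks from and to storage instead of holding every run in memory; the lists here only mirror the tape contents shown on the slide.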

  4. 7.11 External Sorting
     • Read M records at a time and sort them internally.
     • A set of sorted records is called a run.
     • Sorting requires $\lceil \log_2(N/M) \rceil$ merge passes, plus the initial run-constructing pass.
     • Example: 10 million records of 128 bytes each, and 4 MB ($4 \times 10^6$ bytes) of internal memory:
       $N = 10 \times 10^6$, $M = 4 \times 10^6 / 128 = 31{,}250$
       number of runs $= N/M = 320$
       number of passes $= \lceil \log_2 320 \rceil + 1 = 9 + 1 = 10$
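The arithmetic in this example can be checked with a short Python sketch. The function name `two_way_sort_cost` and its parameters are assumptions made for illustration, and 4 MB is taken to mean 4 × 10^6 bytes, as in the slide's calculation.

```python
import math


def two_way_sort_cost(n_records, record_bytes, memory_bytes):
    """Return (M, number of runs, total passes) for a two-way external merge sort."""
    m = memory_bytes // record_bytes      # records that fit in memory at once
    runs = math.ceil(n_records / m)       # runs produced by the first pass
    merge_passes = math.ceil(math.log2(runs)) if runs > 1 else 0
    return m, runs, merge_passes + 1      # +1 for the run-constructing pass


m, runs, passes = two_way_sort_cost(10_000_000, 128, 4_000_000)
print(m, runs, passes)  # 31250 320 10
```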

  5. 7.11 External Sorting
     After the second merge pass:
       Ta1, Ta2: empty
       Tb1: 11 12 17 28 35 41 58 75 81 94 96 99
       Tb2: 15
     After the third merge pass:
       Ta1: 11 12 15 17 28 35 41 58 75 81 94 96 99
       Ta2: empty
       Tb1, Tb2: empty
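The merge passes of the four-tape example can be simulated as follows. This is a minimal sketch, assuming tapes are modeled as Python lists of runs; `merge_two` and `merge_pass` are hypothetical helper names.

```python
def merge_two(a, b):
    """Two-pointer merge of two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def merge_pass(t1, t2):
    """Merge the i-th runs of both input tapes, writing results alternately to two output tapes."""
    out = [[], []]
    for i in range(max(len(t1), len(t2))):
        a = t1[i] if i < len(t1) else []
        b = t2[i] if i < len(t2) else []
        out[i % 2].append(merge_two(a, b))
    return out


# Tape contents after the run-constructing pass (M = 3).
tapes = [[[11, 81, 94], [17, 28, 99], [15]], [[12, 35, 96], [41, 58, 75]]]
merge_passes = 0
while len(tapes[0]) + len(tapes[1]) > 1:   # stop when a single run remains
    tapes = merge_pass(*tapes)
    merge_passes += 1

print(merge_passes)   # 3
print(tapes[0][0])    # [11, 12, 15, 17, 28, 35, 41, 58, 75, 81, 94, 96, 99]
```

The three merge passes agree with $\lceil \log_2 5 \rceil = 3$ for the five initial runs.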

  6. 7.11 External Sorting
     • Multiway merge
     • Use k input devices instead of just 2.
     • E.g., k = 3 for the previous example:
       Ta1: 81 94 11 ; 96 12 35 ; 17 99 28 ; 58 41 75 ; 15
       Ta2, Ta3: empty
       Tb1, Tb2, Tb3: empty

  7. 7.11 External Sorting
     After the run-constructing pass:
       Ta1, Ta2, Ta3: empty
       Tb1: 11 81 94 ; 41 58 75
       Tb2: 12 35 96 ; 15
       Tb3: 17 28 99
     After the first merge pass:
       Ta1: 11 12 17 28 35 81 94 96 99
       Ta2: 15 41 58 75
       Ta3: empty
       Tb1, Tb2, Tb3: empty
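One common way to merge k runs at once is to keep the current head of each run in a min-heap. The slides do not say how the smallest of the k heads is chosen, so the heap is an assumption here, and `kway_merge` and `merge_pass_k` are hypothetical names. Tapes are again modeled as lists of runs.

```python
import heapq


def kway_merge(runs):
    """Merge any number of sorted runs using a min-heap holding one head per run."""
    heap = [(run[0], idx, 0) for idx, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, idx, pos = heapq.heappop(heap)   # smallest head among the runs
        out.append(value)
        if pos + 1 < len(runs[idx]):            # advance within the same run
            heapq.heappush(heap, (runs[idx][pos + 1], idx, pos + 1))
    return out


def merge_pass_k(tapes):
    """Merge the i-th run of every input tape; write results round-robin to k output tapes."""
    k = len(tapes)
    out = [[] for _ in range(k)]
    for i in range(max(len(t) for t in tapes)):
        group = [t[i] for t in tapes if i < len(t)]
        out[i % k].append(kway_merge(group))
    return out


# Tb1, Tb2, Tb3 after the run-constructing pass (M = 3, k = 3).
tb = [[[11, 81, 94], [41, 58, 75]], [[12, 35, 96], [15]], [[17, 28, 99]]]
ta = merge_pass_k(tb)
print(ta[0])  # [[11, 12, 17, 28, 35, 81, 94, 96, 99]]
print(ta[1])  # [[15, 41, 58, 75]]
print(ta[2])  # []
```

Applying `merge_pass_k` once more to `ta` leaves a single sorted run on the first output tape.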

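  8. 7.11 External Sorting
     After the second merge pass:
       Ta1, Ta2, Ta3: empty
       Tb1: 11 12 15 17 28 35 41 58 75 81 94 96 99
       Tb2, Tb3: empty
     • Sorting requires $\lceil \log_k(N/M) \rceil$ merge passes, plus the initial run-constructing pass.
     • For $N = 10 \times 10^6$, $M = 4 \times 10^6 / 128$, and k = 5:
       number of passes $= \lceil \log_5(10 \times 128 / 4) \rceil + 1 = \lceil \log_5 320 \rceil + 1 = 4 + 1 = 5$
     Skip rest of Chapter 7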
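To see how the pass count changes with k, the formula can be evaluated by repeated ceiling division, which avoids floating-point logarithms with an arbitrary base. This is a sketch; the function name `passes_needed` is an assumption for illustration.

```python
def passes_needed(n_records, m_records, k):
    """Total passes for a k-way external merge sort, including run construction."""
    runs = -(-n_records // m_records)   # ceil(N / M) initial runs
    merge_passes = 0
    while runs > 1:
        runs = -(-runs // k)            # each merge pass combines up to k runs into one
        merge_passes += 1
    return merge_passes + 1


for k in (2, 3, 5):
    print(k, passes_needed(10_000_000, 31_250, k))
# 2 10
# 3 7
# 5 5
```

The loop counts exactly $\lceil \log_k(N/M) \rceil$ merge passes, so k = 2 reproduces the 10 passes of the simple merge and k = 5 reproduces the 5 passes computed above.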
