
  1. CS 484

  2. Sorting • One of the most common operations • Definition: • Arrange an unordered collection of elements into a monotonically increasing or decreasing order. • Two categories of sorting • internal (fits in memory) • external (uses auxiliary storage)

  3. Sorting Algorithms • Comparison based • compare-exchange is the basic operation • Ω(n log n) comparisons required; the best algorithms achieve O(n log n) • Noncomparison based • uses known properties of the elements (e.g., their range or distribution) • O(n) is possible - bucket sort etc.
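
To make the O(n) case concrete, here is a minimal bucket sort sketch in Python. The bucket count and the assumption of nonnegative, roughly uniform integer keys are choices of this sketch, not the slides:

    def bucket_sort(keys, num_buckets=16):
        # Noncomparison-style sort: exploit the known range of the keys.
        # Assumes nonnegative integers, roughly uniformly distributed.
        key_max = max(keys)
        buckets = [[] for _ in range(num_buckets)]
        for k in keys:
            # Map each key to a bucket by its value, not by comparisons.
            buckets[k * num_buckets // (key_max + 1)].append(k)
        # With uniform keys each bucket has O(1) expected size, so the
        # per-bucket sorts cost O(n) in total.
        return [k for b in buckets for k in sorted(b)]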

  4. Parallel Sorting Issues • Input and output sequence storage • Where? Local to one processor or distributed • Comparisons • How to compare elements that reside on different nodes • Number of elements per processor • One each (each comparison is a compare-exchange --> communication) • Multiple each (each comparison is a compare-split --> communication)

  5. Parallel Sorting Algorithms • Merge Sort • Quick Sort • Bitonic Sort • Others …

  6. Merge Sort • Simplest parallel sorting algorithm? • Steps • Distribute the elements • Everybody sort their own sequence • Merge the lists • Problem • How to merge the lists
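
A minimal sequential sketch of these three steps; the block distribution, p = 4, and the use of a p-way heap merge are this sketch's choices:

    import heapq

    def parallel_mergesort_sketch(data, p=4):
        # Steps 1 + 2: give each of p "processors" a block and sort it locally.
        blocks = [sorted(data[i::p]) for i in range(p)]
        # Step 3, the hard part: merge the p sorted lists (here a p-way merge).
        return list(heapq.merge(*blocks))

On a real machine step 3 is where the communication happens; doing the p-way merge on a single node serializes it, which is exactly the problem the slide points out.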

  7. Quicksort • Simple, low overhead • O(n log n) on average • Divide and conquer • Divide recursively into smaller subsequences.

  8. Quicksort • n elements stored in A[1…n] • Divide • Divide a sequence into two parts • A[q…r] becomes A[q…s] and A[s+1…r] • make all elements of A[q…s] smaller than or equal to all elements of A[s+1…r] • Conquer • Recursively apply Quicksort
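
A sequential version of this divide step, as a sketch: the 0-based indices and the Hoare-style partition are this sketch's choices (the slides use 1-based A[q…r]):

    def quicksort(A, q, r):
        # Sort A[q..r] in place (0-based, inclusive bounds).
        if q >= r:
            return
        pivot = A[(q + r) // 2]           # pivot choice: see the next slide
        i, j = q - 1, r + 1
        while True:                       # divide: Hoare partition around pivot
            i += 1
            while A[i] < pivot:
                i += 1
            j -= 1
            while A[j] > pivot:
                j -= 1
            if i >= j:
                break
            A[i], A[j] = A[j], A[i]
        s = j                             # now A[q..s] <= A[s+1..r]
        quicksort(A, q, s)                # conquer: recurse on both parts
        quicksort(A, s + 1, r)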

  9. Quicksort • Partition the sequence A[q…r] by picking a pivot. • Performance is greatly affected by the choice of the pivot. • If we pick a bad pivot, we end up with an O(n²) algorithm.

  10. Parallelizing Quicksort • Task parallelism • At each step of the algorithm, 2 recursive calls are made. • Farm out one of the recursive calls to another processor. • Problem • The O(n) work of partitioning at each level is still done by one processor, which limits speedup.

  11. Parallelizing Quicksort • Consider domain decomposition. • Hypercube • A d-dimensional hypercube can be split into two (d-1)-dimensional hypercubes such that each processor in one cube is connected to one in the other cube. • If all processors know the pivot, neighbors split their respective lists: all elements larger than the pivot go to one subcube, and the smaller elements go to the other.

  12. Parallelizing Quicksort • After we go through each dimension, if n > p the numbers are not totally sorted. • Why? • Each processor then sorts its own sublist using a sequential quicksort. • Pivot selection is particularly important • A bad pivot can leave some processors with empty sublists, idling them for the rest of the computation.

  13. Pivot Selection • Random selection • During the ith split, one of the processors in each subcube picks a random element from its list and broadcasts it to the others. • Problem • What if a bad pivot is selected early on? The first splits move the most data, so early mistakes are the most costly.

  14. Pivot Selection • Median selection • If the distribution is uniform, then each processor's list is a representative sample, so its median is representative of the whole. • Problem • Is the distribution really uniform? • Can we assume that a single processor's list has the same distribution as the full list?

  15. Procedure HypercubeQuickSort(B)
        sort B using sequential quicksort
        for i = 1 to d
          select pivot and broadcast, or receive pivot
          partition B into B1 and B2 such that B1 <= pivot < B2
          if ith bit of iproc is zero then
            send B2 to neighbor along ith dimension
            C = subsequence received along ith dimension
            merge B1 and C into B
          else
            send B1 to neighbor along ith dimension
            C = subsequence received along ith dimension
            merge B2 and C into B
          endif
        endfor
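
A toy, single-process Python simulation of this procedure. The high-to-low dimension order, the median-of-one-block pivot, and the use of sorted() instead of a proper O(n/p) split-and-merge are simplifications of this sketch:

    def hypercube_quicksort_sim(blocks):
        # blocks[q] = keys held by "processor" q; len(blocks) must be 2^d.
        p = len(blocks)
        d = p.bit_length() - 1
        B = [sorted(b) for b in blocks]              # local sequential sort
        for dim in range(d - 1, -1, -1):             # one split per dimension
            bit = 1 << dim
            for base in range(0, p, 2 * bit):        # one subcube per group
                sample = next((B[q] for q in range(base, base + 2 * bit) if B[q]), None)
                if sample is None:
                    continue                          # subcube holds no keys
                pivot = sample[len(sample) // 2]      # one block's median, "broadcast"
                for q in range(base, base + bit):     # pair q <-> q+bit along dim
                    lo = [x for r in (q, q + bit) for x in B[r] if x <= pivot]
                    hi = [x for r in (q, q + bit) for x in B[r] if x > pivot]
                    B[q], B[q + bit] = sorted(lo), sorted(hi)
        return B

After the loop, concatenating B[0], B[1], …, B[p-1] gives the sorted sequence, but a bad pivot leaves some blocks nearly empty, which is the load-imbalance problem the earlier slides warn about.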

  16. Analysis • Iterations = log2 p • Select a pivot = O(n) • keep sublist sorted • Broadcast pivot = O(log2 p) • Split the sequence • split own sequence = O(log(n/p)) (binary search, since the block is kept sorted) • exchange blocks with neighbor = O(n/p) • merge blocks = O(n/p)

  17. Hypercube Quicksort Model

  Execution Time = MyPortionSortTime + NumSteps * (PivotSelection + Exchange + CompareData)

  Execution Time = (n/p) * log2(n/p) * CompareTime
                   + log2(p) * [ (latency + 1/bandwidth)
                                 + 2*(latency + n/(p*bandwidth))
                                 + CompareTime * (2*n/p) ]
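
The model is easy to evaluate numerically. A small sketch; parameter names follow the slide, and the concrete values in the comment are made up for illustration:

    import math

    def hypercube_qsort_time(n, p, compare_time, latency, bandwidth):
        # Local sort of the n/p-element block.
        local = (n / p) * math.log2(n / p) * compare_time
        # Per-dimension costs: pivot broadcast, block exchange, split + merge.
        step = (latency + 1 / bandwidth) \
             + 2 * (latency + n / (p * bandwidth)) \
             + compare_time * (2 * n / p)
        return local + math.log2(p) * step

    # e.g. hypercube_qsort_time(n=1_000_000, p=64, compare_time=1e-8,
    #                           latency=1e-6, bandwidth=1e8)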

  18. Analysis • Quicksort appears very scalable • Depends heavily on the pivot • Easy to parallelize • Hypercube sorting algorithms depend on the ability to map a hypercube onto the node communication architecture.

  19. Sorting Networks • Specialized hardware for sorting • Built from comparators: a comparator takes inputs x and y and outputs min{x,y} and max{x,y}. An increasing comparator outputs (min{x,y}, max{x,y}); a decreasing comparator outputs (max{x,y}, min{x,y}).

  20. Compare-Exchange

  21. Compare-Split
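
Slides 20-21 illustrate the two per-step operations. A minimal sketch of both, assuming each processor's block is kept sorted as in the algorithms above:

    import heapq

    def compare_exchange(x, y):
        # One element per processor: after the exchange, one side holds
        # min{x, y} and the other max{x, y}.
        return (x, y) if x <= y else (y, x)

    def compare_split(block_a, block_b):
        # One block per processor: exchange whole blocks, merge, and keep
        # the lower half on one side and the upper half on the other.
        merged = list(heapq.merge(block_a, block_b))   # O(n/p) merge of sorted blocks
        return merged[:len(block_a)], merged[len(block_a):]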

  22. Sorting Network

  23. Bitonic Sort • Key operation: • rearrange a bitonic sequence into sorted order • Bitonic Sequence • a sequence of elements <a0, a1, …, an-1> such that: • there exists i such that <a0, …, ai> is monotonically increasing and <ai+1, …, an-1> is monotonically decreasing, or • there exists a cyclic shift of indices such that the above is satisfied.

  24. Bitonic Sequences • <1, 2, 4, 7, 6, 0> • first increases, then decreases • i = 3 • <8, 9, 2, 1, 0, 4> • consider a cyclic shift • depending on the shift chosen, i = 2 or i = 3
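
A small checker for this definition, as a sketch (assumes distinct keys): it counts direction changes around the cycle, which is at most two for a bitonic sequence:

    def is_bitonic(seq):
        # True if some cyclic shift of seq increases then decreases.
        n = len(seq)
        rising = [seq[(i + 1) % n] > seq[i] for i in range(n)]
        flips = sum(rising[i] != rising[i - 1] for i in range(n))
        return flips <= 2

    # is_bitonic([1, 2, 4, 7, 6, 0])  -> True
    # is_bitonic([8, 9, 2, 1, 0, 4])  -> True  (via a cyclic shift)
    # is_bitonic([1, 3, 2, 4])        -> False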

  25. Rearranging a Bitonic Sequence • Let s = <a0, a1, …, an-1>, where an/2 is the beginning of the decreasing part. • Let s1 = <min{a0, an/2}, min{a1, an/2+1}, …, min{an/2-1, an-1}> • Let s2 = <max{a0, an/2}, max{a1, an/2+1}, …, max{an/2-1, an-1}> • In sequence s1 there is an element bi = min{ai, an/2+i} such that • all elements before bi come from the increasing part • all elements after bi come from the decreasing part • Sequence s2 has a similar crossover point • Sequences s1 and s2 are therefore bitonic

  26. Rearranging a Bitonic Sequence • Every element of s1 is smaller than every element of s2 • Thus, we have reduced the problem of rearranging a bitonic sequence of size n to rearranging two bitonic sequences of size n/2 then concatenating the sequences.
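
This reduction is exactly the bitonic merge. A recursive sketch, assuming the length is a power of two:

    def bitonic_merge(s, ascending=True):
        # Rearrange a bitonic sequence s into sorted order.
        n = len(s)
        if n == 1:
            return list(s)
        s = list(s)
        half = n // 2
        for i in range(half):                 # build s1 (mins) and s2 (maxes)
            if (s[i] > s[i + half]) == ascending:
                s[i], s[i + half] = s[i + half], s[i]
        # Both halves are now bitonic, and (for ascending order) every
        # element of the first half is <= every element of the second.
        return bitonic_merge(s[:half], ascending) + bitonic_merge(s[half:], ascending)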

  27. Rearranging a Bitonic Sequence

  28. Bitonic Merging Network

  29. What about unordered lists? • To use the bitonic merge for n items, we must first have a bitonic sequence of n items. • Any two elements form a bitonic sequence. • So any unsorted sequence is a concatenation of bitonic sequences of size 2. • Merge those into larger and larger bitonic sequences until we end up with a single bitonic sequence of size n (see the sketch below).
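
Putting the two ideas together as a recursive sketch (it reuses bitonic_merge from the sketch above; power-of-two length assumed):

    def bitonic_sort(s, ascending=True):
        # Sort the two halves in opposite directions, producing one bitonic
        # sequence, then rearrange it with a bitonic merge.
        if len(s) <= 1:
            return list(s)
        half = len(s) // 2
        first = bitonic_sort(s[:half], True)      # increasing half
        second = bitonic_sort(s[half:], False)    # decreasing half
        return bitonic_merge(first + second, ascending)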

  30. Creating a Bitonic Sequence

  Wire   Input   After stage 1   After stage 2   Output (bitonic)
  0000     10         10               5                3
  0001     20         20               9                5
  0010      5          9              10                8
  0011      9          5              20                9
  0100      3          3              14               10
  0101      8          8              12               12
  0110     12         14               8               14
  0111     14         12               3               20
  1000     90          0               0               95
  1001      0         90              40               90
  1010     60         60              60               60
  1011     40         40              90               40
  1100     23         23              95               35
  1101     35         35              35               23
  1110     95         95              23               18
  1111     18         18              18                0

  Each stage merges bitonic sequences of twice the previous size (pairs, then fours, then eights); the output column is a single bitonic sequence of 16 elements, increasing on wires 0000-0111 and decreasing on wires 1000-1111.

  31. Mapping onto a hypercube • One element per processor • Start from the sorting network: each wire represents a processor. • Map processors to wires so as to minimize the distance traveled during each exchange.

  32. Bitonic Merge on Hypercube

  33. Bitonic Sort
        Procedure BitonicSort
          for i = 0 to d-1
            for j = i downto 0
              if (i+1)st bit of iproc <> jth bit of iproc then
                comp_exchange_max(j, item)
              else
                comp_exchange_min(j, item)
              endif
            endfor
          endfor
        comp_exchange_max and comp_exchange_min compare-exchange the item with the neighbor along the jth dimension, keeping the larger or the smaller item, respectively.
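
A single-process simulation of this procedure with one element per "processor". Updating each pair from one side is a simplification of this sketch; on a real hypercube both partners act simultaneously:

    def bitonic_sort_hypercube_sim(items):
        # items[q] = the one key on "processor" q; len(items) must be 2^d.
        p = len(items)
        d = p.bit_length() - 1
        a = list(items)
        for i in range(d):
            for j in range(i, -1, -1):
                for q in range(p):
                    partner = q ^ (1 << j)     # neighbor along dimension j
                    if partner < q:
                        continue               # handle each pair once
                    # q keeps the max exactly when its (i+1)st bit
                    # differs from its jth bit (which is 0 here).
                    keep_max_at_q = ((q >> (i + 1)) & 1) != ((q >> j) & 1)
                    lo, hi = sorted((a[q], a[partner]))
                    a[q], a[partner] = (hi, lo) if keep_max_at_q else (lo, hi)
        return a

    # bitonic_sort_hypercube_sim([10, 20, 5, 9, 3, 8, 12, 14])
    # -> keys in ascending order of processor number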

  34. Bitonic Sort Stages

  35. Assignment • Pick 16 random integers • Draw the Bitonic Sort network • Step through the Bitonic sort network to produce a sorted list of integers. • Explain how the if statement in the Bitonic sort algorithm works.
