
Chapter 11: Broadcasting with Selective Reduction (BSR)



Presentation Transcript


  1. Chapter 11: Broadcasting with Selective Reduction (BSR) Serpil Tokdemir, GSU, Department of Computer Science

  2. What is Broadcasting with Selective Reduction? • BSR is an extension of the PRAM that requires asymptotically no more resources than the PRAM for its implementation. • It consists of: • N processors • M shared-memory locations • an MAU (memory access unit) • Forms of memory access: • ER (exclusive read) • EW (exclusive write) • CR (concurrent read) • CW (concurrent write)

  3. The BSR Model of Parallel Computation (figure: the N processors P1, P2, …, PN are connected through a memory access unit (MAU) to the M shared-memory locations)

  4. Broadcasting with Selective Reduction • During the execution of an algorithm: • several processors may read from or write to the same memory location; • all processors may gain access to all memory locations at the same time for the purpose of writing; • at each memory location, a subset of the incoming broadcast data is selected, according to an appropriate selection and reduction operator, and reduced to one value; • this value is finally stored in the memory location. • BSR accommodates all forms of memory access allowed by the PRAM, plus broadcasting with selective reduction.

  5. BSR Continued • the width of the resulting MAU: O(M) • the depth of the resulting MAU: O(log M) • the size of the resulting MAU: O(M log M) • How Long Does a Step Take in BSR? • Memory access through an MAU of depth O(log M) requires a(N, M) = O(log M) time. • We assume here (uniform analysis) that a(N, M) = O(1). • Similarly, a computational operation takes constant time: c(N, M) = O(1).

  6. THE BSR MODEL • An additional form of concurrent access to shared memory • BROADCAST – allows all processors to write to all shared-memory locations simultaneously. • 3 phases: • A broadcasting phase: • each processor Pi broadcasts a datum di and a tag gi, 1 ≤ i ≤ N, destined to all M memory locations. • A selection phase: • each memory location Uj uses a limit lj, 1 ≤ j ≤ M, and a selection rule σ to test the condition gi σ lj. • σ is selected from the set • {<, ≤, =, ≥, >, ≠}.

  7. The BSR Model (Continued) • A reduction phase: • all data di selected by Uj during the selection phase are combined into one datum that is finally stored in Uj. • The reduction operator ℛ is one of: • SUM, • PRODUCT, • AND, OR, • EXCLUSIVE-OR, • MAXIMUM, MINIMUM. • All three phases are performed simultaneously for all processors Pi and all memory locations Uj.

  8. The three phases of the BROADCAST instruction (figure: each processor Pi broadcasts its pair (gi, di) to every memory location; each Uj tests gi σ lj against its limit lj; the accepted data are reduced to a single value stored in Uj)

  9. The BSR Model • If a datum or a tag is not in a processor's local register, • obtain it from the shared memory by an ER or a CR. • The limits, selection rule, and reduction operator are assumed to be known by the memory locations. • If not, they can be stored in memory by an EW or a CW. • Notation for the BROADCAST instruction: • the BROADCAST instruction of BSR is written as follows: • Uj ← ℛ { di : gi σ lj }, for 1 ≤ j ≤ M, • where ℛ is the reduction operator and σ the selection rule.
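On a real BSR machine the three phases happen simultaneously for all processors and memory locations; a sequential Python sketch of one BROADCAST instruction (the function name and loop order are illustrative, not part of the model) might look like this:

```python
from functools import reduce

def broadcast(pairs, limits, select, reduce_op, memory):
    """Simulate one BSR BROADCAST instruction sequentially.

    pairs:     (tag g_i, datum d_i) for each processor P_i
    limits:    limit l_j for each memory location U_j
    select:    selection rule sigma, tested as select(g_i, l_j)
    reduce_op: binary reduction operator (SUM, MAX, ...)
    memory:    current contents of U_1 .. U_M
    """
    for j, l in enumerate(limits):
        accepted = [d for g, d in pairs if select(g, l)]
        if accepted:  # if no datum is accepted, U_j is left unchanged
            memory[j] = reduce(reduce_op, accepted)
    return memory

# Prefix sums of {1, 2, 3}: tag i, datum x_i, limit j, selection <=, reduction SUM
mem = broadcast([(1, 1), (2, 2), (3, 3)], [1, 2, 3],
                lambda g, l: g <= l, lambda a, b: a + b, [0, 0, 0])
# mem == [1, 3, 6]
```

Leaving Uj untouched when no datum is accepted matches the rule stated on slide 10.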

  10. THE BSR MODEL • If no data are accepted by a given memory location, • its value is not affected by the BROADCAST instruction. • If only one datum is accepted, • Uj is assigned the value of that datum. • Comparing BSR to the PRAM • In BSR, the BROADCAST instruction requires O(1) time. On a PRAM with the same number of processors and memory locations it requires O(M) time, since • BROADCAST is equivalent to M CW instructions. • BSR supports all forms of PRAM memory access, so it is at least as powerful as the PRAM. • The BROADCAST instruction makes BSR strictly more powerful than the PRAM.

  11. THE BSR MODEL • Given a sequence X = {x1, x2, …, xn} of n numbers, in nondecreasing order, • and a sequence L = {l1, l2, …, ln} of n distinct numbers, in increasing order, • it is required to compute, for 1 ≤ j ≤ n, the sum sj of all those elements of X not equal to lj. • On the PRAM – O(n) operations – obviously optimal: • The sum S of all the elements of X is first computed, • Y = X is merged with L, sorted in increasing order, • Y is scanned, and each sj, 1 ≤ j ≤ n, is computed by subtracting from S all the elements of X equal to lj. • n processors can each compute one of the sj in O(1) time.

  12. THE BSR MODEL • BSR solves the problem using one BROADCAST instruction: • Processor Pi, 1 ≤ i ≤ n, broadcasts (xi, xi) as its tag and datum pair. • Memory location Uj selects those xi not equal to its limit lj (selection rule ≠). • The xi selected by Uj are added up (reduction Σ) to obtain sj. • This requires O(1) time. • It does not depend on X and L being sorted.
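A sequential Python sketch of this single BROADCAST; the sample values of X and L are assumed for illustration:

```python
X = [2, 5, 7]   # data (the BSR solution does not need X sorted)
L = [5, 6, 7]   # distinct limits l_j, one per memory location U_j

s = []
for l in L:                                # every U_j works simultaneously on BSR
    accepted = [x for x in X if x != l]    # selection rule: tag x_i != limit l_j
    s.append(sum(accepted))                # reduction operator: SUM
# s[j-1] holds the sum of the elements of X not equal to l_j
```

Here s comes out as [9, 14, 7]: for l = 5 only 2 and 7 are accepted, for l = 6 every element is, and for l = 7 only 2 and 5 are.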

  13. BSR ALGORITHMS • Prefix Sums • Given n numbers x1, x2, …, xn, • their prefix sums are sj = x1 + x2 + ⋯ + xj, for 1 ≤ j ≤ n. • BSR PREFIX SUMS – n processors and n memory locations • Pi broadcasts its index i as tag and xi as datum. • Memory location Uj uses its index j as limit. • The relation ≤ is used for selection and Σ (SUM) as the reduction operator. • When the instruction completes, Uj holds sj = x1 + x2 + ⋯ + xj.

  14. BSR Algorithms – Prefix Sums • Algorithm BSR PREFIX SUMS • Consists of one BROADCAST instruction: • for j = 1 to n do in parallel • for i = 1 to n do in parallel • sj ← Σ { xi : i ≤ j } • end for • end for. • p(n) = n, t(n) = O(1), and c(n) = p(n) × t(n) = O(n) • optimal.

  15. BSR Algorithms – Prefix Sums • Example: X = {1, 2, 3} • The prefix sums are s1 = 1, s2 = 3, s3 = 6.
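The single BROADCAST of BSR PREFIX SUMS can be sketched sequentially in Python:

```python
X = [1, 2, 3]
n = len(X)

# P_i broadcasts (tag i, datum x_i); U_j uses its own index j as limit,
# the relation <= for selection, and SUM as the reduction operator
s = [sum(x for i, x in enumerate(X, start=1) if i <= j)
     for j in range(1, n + 1)]
# s == [1, 3, 6]
```

Each memory location accepts exactly the data whose tags (indices) do not exceed its own index, so Uj ends up holding x1 + ⋯ + xj.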

  16. BSR Algorithms – Sorting • Given a sequence X = {x1, x2, …, xn}, rearrange the elements of X in nondecreasing order. • Requires n processors and n memory locations. • Consists of two steps: • Step 1: the rank rj of each element xj is computed: • Pi broadcasts (xi, 1) as its tag and datum pair, • xj – limit, • < – selection relation, • Σ – reduction operator. • Uj then holds rj, the number of elements of X smaller than xj, for 1 ≤ j ≤ n. • Step 2: xj is placed in position rj + 1 of the sorted sequence S. • If xi and xk are equal, they receive the same rank.

  17. BSR Algorithms - Sorting • Second step continued: • if xi = xk, both have the same rank r, • so the common value goes to position r + 1, • and is also stored in position r + 2, • and so on, up to the position before the next element of higher rank. • The next element with the next higher rank r′ is placed in position r′ + 1 of S. • Pi broadcasts the pair (ri + 1, xi). • Uj uses its index j as limit, • ≤ for selection, • ⋁ (MAX) as reduction. • When this step terminates, • Uj holds sj – that is, the jth element of the sorted sequence.

  18. BSR Algorithms - Sorting • Algorithm BSR SORT • Step 1: for j = 1 to n do in parallel • for i = 1 to n do in parallel • rj ← Σ { 1 : xi < xj } • end for • end for • Step 2: for j = 1 to n do in parallel • for i = 1 to n do in parallel • sj ← ⋁ { xi : ri + 1 ≤ j } • end for • end for.

  19. BSR Algorithms - Sorting • Example: X = {8, 5, 2, 5} • Step 1: processors broadcast the (tag, datum) pairs to all memory locations: • (8,1), (5,1), (2,1), (5,1) • The limits are 8, 5, 2, and 5. • Since • 5 < 8, 2 < 8, and 5 < 8, r1 = 3 • Only 2 < 5, so r2 = 1 • r3 = 0 • Only 2 < 5, so r4 = 1

  20. BSR Algorithms - Sorting • Example continued: • Step 2 of the algorithm • Processors broadcast the pairs (ri + 1, xi): • (4,8), (2,5), (1,2), (2,5) • The limits at the memory locations are their indices: • 1, 2, 3, 4 • This gives the sorted sequence • {2, 5, 5, 8}
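The two BROADCAST steps of BSR SORT, sketched sequentially in Python on the example X = {8, 5, 2, 5}:

```python
X = [8, 5, 2, 5]
n = len(X)

# Step 1: rank r_j = number of elements strictly smaller than x_j.
# P_i broadcasts (tag x_i, datum 1); U_j uses limit x_j, selection <, reduction SUM.
r = [sum(1 for x in X if x < xj) for xj in X]
# r == [3, 1, 0, 1]

# Step 2: P_i broadcasts (tag r_i + 1, datum x_i); U_j uses its index j as limit,
# selection <=, reduction MAX.  Equal elements share a rank, and the MAX
# reduction fills every position up to the next higher-ranked element.
s = [max(x for rank, x in zip(r, X) if rank + 1 <= j) for j in range(1, n + 1)]
# s == [2, 5, 5, 8]
```

Position 2 and position 3 both accept the two copies of 5 (tags 2 ≤ 2 and 2 ≤ 3), which is how the duplicate is placed twice.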

  21. BSR Algorithms - Sorting • Analysis: • BSR SORT • uses p(n) = n processors and runs in t(n) = O(1) time, so c(n) = O(n). • This is the uniform analysis: • the time required for memory access was taken to be O(1). • Discriminating analysis: • the memory access time is taken to be O(log M) – for both BSR and the PRAM. • BSR: N = M = O(n), thus each memory access takes O(log n). • Each step is executed once and contains a constant number of computations and memory accesses, so t(n) = O(log n) and c(n) = O(n log n).

  22. BSR Algorithms - Sorting • c(n) = O(n log n) – OPTIMAL for comparison-based sorting. • PRAM SORT: N = M = O(n), thus each memory access takes O(log n). • It executes O(log n) computational and memory-access steps, therefore • its cost is n × O(log n) × O(log n) = O(n log² n) – NOT optimal.

  23. BSR Algorithms – Computing Maximal Points • Given S = {q1, q2, …, qn}, n points in the plane, • with qi = (xi, yi), for 1 ≤ i ≤ n. • A point qk dominates qi if xk > xi and yk > yi. • A point of S is said to be maximal with respect to S if and only if it is not dominated by any other point of S. • The algorithm uses n processors and n memory locations • and consists of three steps: • An auxiliary sequence m1, m2, …, mn is created: • mi, associated with point qi, is set initially to equal yi. • The largest y-coordinate among the points to the right of qi is found, • and mi is assigned the value of that coordinate: • Pi broadcasts (xi, yi), xi = tag, yi = datum.

  24. BSR Algorithms – Computing Maximal Points • Uj uses xj as its limit, • the relation > for selection, • ⋁ (MAX) for reduction, to compute mj. • If xi > xj for some i, Uj accepts the y-coordinate of every such point • and assigns the max of these to mj. • A decision is made as to whether qi is a maximal point. • If mi was assigned the y-coordinate of some point qk: • if mi > yi, then qk dominates qi, • and qi is not maximal; • else (mi ≤ yi), neither qk nor any other point dominates qi, • and qi is maximal.

  25. BSR Algorithms – Computing Maximal Points • Algorithm BSR MAXIMAL POINTS • Step 1: for i = 1 to n do in parallel • mi ← yi • end for • Step 2: for j = 1 to n do in parallel • for i = 1 to n do in parallel • mj ← ⋁ { yi : xi > xj } • end for • end for • Step 3: for i = 1 to n do in parallel • if mi ≤ yi then qi is maximal • else qi is not maximal • end if • end for.

  26. BSR Algorithms – Computing Maximal Points • Analysis: • Each step uses n processors and runs in O(1) time. • p(n) = n, t(n) = O(1), and c(n) = O(n). • By taking the memory access time to be O(log n), the cost becomes O(n log n). • On the other hand, the cost on the PRAM is O(n log² n) – not optimal. • Example: q1, q2, q3 are three points in the plane, with x1 < x2 < x3 and y1 > y3 > y2.

  27. BSR Algorithms – Computing Maximal Points • After step 1 of the algorithm, • m1 = y1, m2 = y2, m3 = y3 • After step 2, • m1 = y3, m2 = y3, m3 = y3 • Since • m1 < y1, m2 > y2, and • m3 = y3, • both q1 and q3 are maximal, while q2 is not.
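A sequential Python sketch of the three steps; the concrete coordinates are assumed for illustration (they satisfy x1 < x2 < x3 and y1 > y3 > y2, as in the example):

```python
pts = [(1, 5), (2, 3), (4, 4)]        # assumed coordinates for q1, q2, q3

# Step 1: m_i is initialised to y_i
m = [y for (x, y) in pts]

# Step 2: P_i broadcasts (tag x_i, datum y_i); U_j uses limit x_j,
# selection > and reduction MAX.  If no datum is accepted (no point lies
# strictly to the right), m_j keeps its initial value y_j.
for j, (xj, yj) in enumerate(pts):
    accepted = [yi for (xi, yi) in pts if xi > xj]
    if accepted:
        m[j] = max(accepted)

# Step 3: q_i is maximal if and only if m_i <= y_i
maximal = [q for q, mi in zip(pts, m) if mi <= q[1]]
# q1 and q3 are maximal; q2 is dominated by q3
```

Here m ends up as [4, 4, 4]: q2 fails the test (4 > 3), while q3 passes because nothing lies to its right and m3 retained y3.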

  28. BSR Algorithms – Maximum Sum Subsequence • Given a sequence X = {x1, x2, …, xn}, find indices u and v, u ≤ v, such that the subsequence {xu, …, xv} has the largest possible sum • among all subsequences of X. • Algorithm BSR MAXIMUM SUM SUBSEQUENCE • Step 1: for j = 1 to n do in parallel • for i = 1 to n do in parallel • sj ← Σ { xi : i ≤ j } • end for • end for

  29. BSR Algorithms – Maximum Sum Subsequence • Step 2: • (2.1) for j = 1 to n do in parallel • for i = 1 to n do in parallel • mj ← ⋁ { si : i ≥ j } • end for • end for • (2.2) for j = 1 to n do in parallel • for i = 1 to n do in parallel • aj ← ⋁ { i : si = mj } • end for • end for

  30. BSR Algorithms – Maximum Sum Subsequences • Step 3: for i = 1 to n do in parallel • bi ← mi − si + xi • end for • Step 4: • (4.1) for i = 1 to n do in parallel • (i) L ← MAX { bi } (MAX CW) • (ii) if bi = L then u ← i (ARBITRARY CW) end if • end for • (4.2) v ← au

  31. BSR Algorithms – Maximum Sum Subsequences • Steps of the algorithm: • The prefix sums sj are computed – uses BSR PREFIX SUMS. • For each j: • the max prefix sum at or to the right of sj is found; • its value is mj and its index is aj. • To compute mj: • Pi broadcasts (i, si) as tag and datum; • Uj uses j as limit, ≥ for selection, and ⋁ (MAX) for reduction. • To compute aj: • Pi broadcasts (si, i) as its tag and datum pair; • Uj uses mj as limit, = for selection, and ⋁ for reduction. • For each i, the sum bi of the maximum-sum subsequence beginning at xi is computed: bi = mi − si + xi. • Uses an EW instruction.

  32. BSR Algorithms – Maximum Sum Subsequences • Steps of the algorithm, continued: • The sum L and starting index u of the overall maximum sum subsequence are found. • Requires a MAX CW instruction and an ARBITRARY CW instruction. • Analysis: each step of the algorithm runs in O(1) time and uses n processors. Thus: • p(n) = n, • t(n) = O(1), • and c(n) = O(n) – • optimal.

  33. BSR Algorithms – Maximum Sum Subsequences • Example: X = {-1, 1, 2, -2} • After step 1, the prefix sums sj are • -1, 0, 2, 0 • After the second BROADCAST instruction, • mj: 2, 2, 2, 0

  34. BSR Algorithms – Maximum Sum Subsequences • Example continued: • The third BROADCAST instruction computes each aj: • aj: 3, 3, 3, 4 • Step 3 computes each bi: • bi: 2, 3, 2, -2 • Finally: • L = 3 • u = 2 • v = a2 = 3 • so the maximum sum subsequence is {x2, x3} = {1, 2}.
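The whole algorithm can be traced sequentially in Python on this example (each comprehension stands in for one parallel BROADCAST or CW instruction):

```python
X = [-1, 1, 2, -2]
n = len(X)

# Step 1: prefix sums s_j (BSR PREFIX SUMS)
s = [sum(X[:j + 1]) for j in range(n)]                  # [-1, 0, 2, 0]

# Step 2.1: m_j = max prefix sum at or to the right of j
# (tag i, datum s_i; limit j, selection >=, reduction MAX)
m = [max(s[j:]) for j in range(n)]                      # [2, 2, 2, 0]

# Step 2.2: a_j = an index i with s_i = m_j
# (tag s_i, datum i; limit m_j, selection =, MAX reduction here)
a = [max(i + 1 for i in range(n) if s[i] == m[j]) for j in range(n)]

# Step 3: b_i = m_i - s_i + x_i, the best sum of a subsequence starting at x_i
b = [m[i] - s[i] + X[i] for i in range(n)]              # [2, 3, 2, -2]

# Step 4: L by a MAX CW; u by an ARBITRARY CW among the i with b_i = L; v = a_u
L = max(b)                                              # 3
u = b.index(L) + 1                                      # 2
v = a[u - 1]                                            # 3
# maximum sum subsequence: X[u-1:v] == [1, 2]
```

The trace reproduces the slide's values: a = [3, 3, 3, 4], b = [2, 3, 2, -2], and (L, u, v) = (3, 2, 3).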
