Extreme Models Part II
The models of parallel computation range from very abstract to very concrete, with real parallel machine implementations and user models falling somewhere between these two extremes.
• At one extreme lies the abstract shared-memory PRAM model.
• At the other extreme lies the circuit model of parallel processing.
• Intermediate models fall between these two extremes.
PRAM and Basic Algorithms
In this chapter, we examine:
• the relative computational powers of several PRAM submodels
• five key building-block algorithms:
• Data broadcasting
• Semigroup or fan-in computation
• Parallel prefix computation
• Ranking the elements of a linked list
• Matrix multiplication
PRAM Submodels and Assumptions (1)
• The PRAM model prescribes the concurrent operation of p processors (in SIMD or MIMD mode) on data that are accessible to all of them in an m-word shared memory.
• In the synchronous SIMD or SPMD version of PRAM, Processor i can do the following in the three phases of one cycle (not all three phases need be present in every cycle): read a value from a memory location, perform a computation, and write a value into a memory location.
PRAM Submodels and Assumptions (2)
• It is possible that several processors may want to read data from the same memory location or write their values into a common location.
• Based on whether such concurrent reads and concurrent writes are allowed, four submodels of the PRAM model have been defined: EREW, ERCW, CREW, and CRCW.
PRAM Submodels and Assumptions (3)
• Here are a few example submodels based on the semantics of concurrent writes in CRCW PRAM: common (all writers must agree on the value), arbitrary (one of the writing processors succeeds), priority (the lowest-indexed writer succeeds), and combining (the written values are combined, e.g., summed).
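The concurrent-write rules above can be illustrated with a small sequential sketch in Python. The function name `resolve_writes` and the choice of summation as the combining rule are illustrative assumptions, not part of any standard API:

```python
def resolve_writes(writes, submodel):
    """Resolve concurrent writes to one memory cell under a CRCW rule.
    writes: list of (processor_id, value) pairs aimed at the same cell."""
    if submodel == "common":
        # All processors must attempt to write the same value.
        values = {v for _, v in writes}
        assert len(values) == 1, "common CRCW requires identical values"
        return values.pop()
    if submodel == "arbitrary":
        return writes[0][1]                     # any one write succeeds
    if submodel == "priority":
        return min(writes, key=lambda w: w[0])[1]  # lowest index wins
    if submodel == "sum":                       # one combining variant
        return sum(v for _, v in writes)
    raise ValueError(submodel)

print(resolve_writes([(3, 5), (1, 7), (2, 6)], "priority"))  # → 7
```

Under the priority rule, Processor 1 has the lowest index among the writers, so its value 7 is the one stored.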
PRAM Submodels and Assumptions (4)
• The following relationships have been established between some of the PRAM submodels:
DATA BROADCASTING (1)
• One-to-all broadcasting is used when one processor needs to send a data value to all other processors.
• In the CREW or CRCW submodels, broadcasting is trivial: the sending processor writes the data value into a memory location, and all processors read that value in the following machine cycle, for Θ(1) steps.
• All-to-all broadcasting, where each of the p processors needs to send a data value to all other processors, can be done through p separate broadcast operations in Θ(p) steps, which is optimal.
DATA BROADCASTING (2)
• The above scheme is clearly inapplicable to one-to-all broadcasting in the EREW model, where concurrent reads are not allowed.
• The simplest scheme for EREW broadcasting is to make p copies of the data value, say in a broadcast vector B of length p, and then let Processor j read its own copy by accessing B[j].
• A method known as recursive doubling is used to copy B[0] into all elements of B in ⌈log2 p⌉ steps.
DATA BROADCASTING (3)
• The complete EREW broadcast algorithm is based on this recursive doubling scheme.
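The recursive doubling scheme can be sketched as a sequential Python simulation of the parallel steps; the function name and loop structure are my own sketch, not the book's pseudocode. In each step, the processors that already hold the value copy it into distinct new cells, so the number of copies doubles and no cell is read or written concurrently:

```python
def erew_broadcast(value, p):
    """One-to-all broadcast on an EREW PRAM via recursive doubling
    (sequential simulation). The broadcast vector B gains twice as
    many copies per step, so all p cells hold the value after
    ceil(log2 p) parallel steps."""
    B = [None] * p
    B[0] = value
    copies = 1
    while copies < p:
        # Step: processors 0..copies-1 each copy their cell into a
        # distinct new cell -- exclusive reads, exclusive writes.
        for j in range(min(copies, p - copies)):
            B[copies + j] = B[j]
        copies *= 2
    return B

print(erew_broadcast(42, 5))  # → [42, 42, 42, 42, 42]
```

For p = 5 the loop runs 3 = ⌈log2 5⌉ times, matching the step count claimed above.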
DATA BROADCASTING (4)
• To perform all-to-all broadcasting, so that each processor broadcasts a value that it holds to each of the other p - 1 processors, we let Processor j write its value into B[j], rather than into B[0].
DATA BROADCASTING (5)
• Given a data vector S of length p, a naive sorting algorithm can be designed based on the above all-to-all broadcasting scheme.
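One way such a naive sort can work, sketched here as a sequential Python simulation under the assumption that the all-to-all broadcast has already made every element visible to every processor: Processor j computes the rank of S[j] (the number of elements smaller than it, with ties broken by processor index) and writes S[j] into that position of the result vector. The function name `rank_sort` is an illustrative choice:

```python
def rank_sort(S):
    """Naive PRAM sort via all-to-all broadcast (sequential simulation).
    Each "processor" j sees all p elements, computes the rank of its
    own element S[j], and writes S[j] into the cell with that index.
    Tie-breaking by processor index makes all ranks distinct."""
    p = len(S)
    R = [None] * p
    for j in range(p):                  # each j is one "processor"
        rank = sum(1 for i in range(p)
                   if S[i] < S[j] or (S[i] == S[j] and i < j))
        R[rank] = S[j]
    return R

print(rank_sort([3, 1, 4, 1, 5]))  # → [1, 1, 3, 4, 5]
```

Each processor spends Θ(p) steps computing its rank, so after the Θ(p) all-to-all broadcast the whole sort takes Θ(p) time, which is why the scheme is called naive.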
SEMIGROUP OR FAN-IN COMPUTATION
• This computation combines p values with an associative binary operator; it is trivial for a CRCW PRAM whose concurrent-write rule combines the written values.
• Here too, the recursive doubling scheme can be used to perform the computation on an EREW PRAM in ⌈log2 p⌉ steps.
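The recursive-doubling semigroup computation can be sketched as follows (a sequential Python simulation of the synchronous parallel steps; the function name is an illustrative choice). In step k, Processor j combines the value 2^k cells to its left with its own, so after ⌈log2 p⌉ steps the last cell holds the combination of all p values:

```python
def erew_semigroup(X, op):
    """Semigroup (fan-in) computation on an EREW PRAM by recursive
    doubling (sequential simulation).  After step k, cell j holds the
    combination of up to 2**k consecutive inputs ending at j; cell
    p-1 ends up with the overall result."""
    S = list(X)
    p = len(S)
    s = 1
    while s < p:
        old = list(S)            # snapshot = synchronous read phase
        for j in range(s, p):    # these combines happen in parallel
            S[j] = op(old[j - s], old[j])
        s *= 2
    return S[p - 1]

print(erew_semigroup([1, 2, 3, 4, 5], lambda a, b: a + b))  # → 15
```

The snapshot of the old values mimics the PRAM cycle in which all reads complete before any writes, so no processor sees a partially updated cell.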
PARALLEL PREFIX COMPUTATION
• Parallel prefix computation constitutes the first phase of the recursive-doubling semigroup computation: as that scheme runs, it produces all p prefix results, not just the final one.
• The computation can also be organized using the divide-and-conquer paradigm.
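The divide-and-conquer scheme can be sketched in Python as a sequential simulation (the function name and exact index arithmetic are my own sketch): combine adjacent pairs in one parallel step, recursively compute the prefixes of the half-length pair vector, then recover the remaining prefixes with one more parallel combining step:

```python
def parallel_prefix(X, op):
    """Parallel prefix via divide and conquer (sequential sketch).
    Odd-indexed prefixes come directly from the recursive solution on
    the pairwise combinations; even-indexed prefixes need one extra
    combine with the corresponding original element."""
    p = len(X)
    if p == 1:
        return list(X)
    # One parallel step: combine adjacent pairs.
    pairs = [op(X[2 * i], X[2 * i + 1]) for i in range(p // 2)]
    sub = parallel_prefix(pairs, op)     # recurse on half-size problem
    result = [None] * p
    result[0] = X[0]
    for i in range(1, p):                # one more parallel step
        if i % 2 == 1:
            result[i] = sub[i // 2]      # prefix of X[0..i] directly
        else:
            result[i] = op(sub[i // 2 - 1], X[i])
    return result

print(parallel_prefix([1, 2, 3, 4, 5, 6], lambda a, b: a + b))
# → [1, 3, 6, 10, 15, 21]
```

Each level of recursion costs a constant number of parallel steps on a half-size problem, giving Θ(log p) parallel time overall.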
MATRIX MULTIPLICATION
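A common PRAM formulation assigns one processor to each output entry: with p = m² processors, Processor (i, j) computes C[i][j] as an m-term inner product, giving Θ(m) parallel time for m×m matrices. A sequential Python simulation of this assignment (the function name is an illustrative choice):

```python
def pram_matmul(A, B):
    """PRAM-style matrix multiplication (sequential simulation).
    Each (i, j) pair plays the role of one processor; all C[i][j]
    inner products would proceed in parallel on a real PRAM."""
    m = len(A)
    C = [[0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            # Processor (i, j): m-term inner product, Theta(m) steps.
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(m))
    return C

print(pram_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# → [[19, 22], [43, 50]]
```

With m³ processors instead, each product term gets its own processor and the m-term sums can be done by fan-in computation in Θ(log m) time.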
Skip Chapters
• Chapter 6: More Shared-Memory Algorithms
• Chapter 7: Sorting and Selection Networks
• Chapter 8: Other Circuit-Level Examples