
Parallel Processing & Distributed Systems






Presentation Transcript


  1. Parallel Processing & Distributed Systems Chapter 2 Thoai Nam

  2. Chapter 2: Parallel Computer Models & Classification
  • Models of Parallel Computation: PRAM, BSP, Phase Parallel; Pipeline, Processor Array, Multiprocessor, Data Flow Computer
  • Flynn Classification: SISD, SIMD, MISD, MIMD
  • Speedup: Amdahl, Gustafson
  • Pipeline Computer
  Khoa Công Nghệ Thông Tin – Đại Học Bách Khoa Tp.HCM

  3. Models of Parallel Computation
  • PRAM
  • BSP
  • Phase Parallel

  4. RAM
  • RAM (random access machine): a sequential model with a read-only input tape, a write-only output tape, a program with a location counter, and registers r0, r1, r2, r3, …
  • Complexity measures: worst-case time complexity, expected time complexity, worst-case space complexity, expected space complexity
  • Uniform cost criterion: every RAM instruction requires 1 time unit, and every register requires 1 unit of space
  [Figure: block diagram of the RAM — read-only input tape, program with location counter, registers r0–r3, write-only output tape]

  5. PRAM (1)
  [Figure: PRAM architecture — a control unit and processors P1, P2, …, Pn, each with its own private memory, connected through an interconnection network to a global memory]

  6. PRAM (2)
  • A control unit
  • An unbounded set of processors, each with its own private memory and a unique index
  • Input is stored in global memory, or in a single active processing element
  • A step: (1) read a value from a single private/global memory location, (2) perform a RAM operation, (3) write into a single private/global memory location
  • During a computation step, a processor may activate another processor
  • All active, enabled processors must execute the same instruction (albeit on different memory locations)
  • The computation terminates when the last processor halts

  7. PRAM (3)
  • Definition: the cost of a PRAM computation is the product of the parallel time complexity and the number of processors used.
  • Example: a PRAM algorithm with time complexity O(log p) using p processors has cost O(p log p).
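To make the cost definition concrete, here is a small illustrative calculation (not from the slides; the numbers n = 1024 and p = n/2 are made up for the example), showing why a logarithmic-time reduction on p processors is fast but not cost-optimal:

```python
# Hypothetical sketch: PRAM cost = parallel time complexity * processors used.
import math

def pram_cost(time_steps, processors):
    return time_steps * processors

# Summing n = 1024 values with p = n/2 processors takes ceil(log2 n) = 10
# parallel steps, so the cost is p * log n = 5120 -- far more than the
# ~n operations a single processor needs, hence not cost-optimal.
n = 1024
p = n // 2
print(pram_cost(math.ceil(math.log2(n)), p))  # 10 * 512 = 5120
```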

  8. Conflict Resolution Schemes (1)
  • A PRAM execution can result in simultaneous accesses to the same location in shared memory.
  • Exclusive Read (ER): no two processors can simultaneously read the same memory location.
  • Exclusive Write (EW): no two processors can simultaneously write to the same memory location.
  • Concurrent Read (CR): processors can simultaneously read the same memory location.
  • Concurrent Write (CW): processors can simultaneously write to the same memory location, using some conflict resolution scheme.

  9. Conflict Resolution Schemes (2)
  • Common/Identical CRCW: all processors writing to the same memory location must write the same value. The software must ensure that different values are not written.
  • Arbitrary CRCW: different values may be written to the same memory location, and an arbitrary one succeeds.
  • Priority CRCW: an index is associated with each processor; when more than one processor writes, the lowest-numbered processor succeeds. The hardware must resolve any conflicts.
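The three CRCW schemes above can be sketched as a small simulation (hypothetical code, not from the slides): pending writes to one shared cell are collected as (processor index, value) pairs, and the scheme decides which value lands.

```python
# Hypothetical sketch: resolving simultaneous writes to one shared memory
# cell under the Common, Arbitrary, and Priority CRCW schemes.

def resolve_cw(writes, scheme):
    """writes: list of (processor_index, value); returns the stored value."""
    if not writes:
        return None
    if scheme == "common":
        # All writers must agree on the value; otherwise the program is invalid.
        values = {v for _, v in writes}
        assert len(values) == 1, "Common CRCW requires identical values"
        return values.pop()
    if scheme == "arbitrary":
        # Any one write may succeed; here we simply take the first.
        return writes[0][1]
    if scheme == "priority":
        # The lowest-numbered processor wins.
        return min(writes, key=lambda w: w[0])[1]
    raise ValueError(f"unknown scheme: {scheme}")

# Processors 3, 1, and 7 attempt to write different values simultaneously.
writes = [(3, "c"), (1, "a"), (7, "b")]
print(resolve_cw(writes, "priority"))  # processor 1 wins -> "a"
```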

  10. PRAM Algorithm
  • Begins with a single active processor
  • Two phases: (1) a sufficient number of processors are activated; (2) the activated processors perform the computation in parallel
  • ⌈log p⌉ activation steps suffice for p processors to become active, since the number of active processors can be doubled by executing a single instruction
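The doubling argument in the activation phase can be checked with a short sketch (hypothetical code, not from the slides): starting from one active processor, each step lets every active processor activate one more, so the count doubles per step.

```python
# Hypothetical sketch: counting activation steps when the number of active
# processors doubles each step, starting from a single active processor.
import math

def activation_steps(p):
    """Steps needed until at least p processors are active."""
    active = 1
    steps = 0
    while active < p:
        active *= 2   # every active processor activates one more
        steps += 1
    return steps

print(activation_steps(1000))  # 10 doubling steps reach >= 1000 processors
```

For powers of two this matches ⌈log₂ p⌉ exactly, e.g. activation_steps(8) == 3.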

  11. Parallel Reduction (1)

  12. Parallel Reduction (2) (EREW PRAM algorithm in Figure 2-7, page 32, book [1])
  SUM (EREW PRAM)
  Initial condition: list of n ≥ 1 elements stored in A[0..(n-1)]
  Final condition: sum of the elements stored in A[0]
  Global variables: n, A[0..(n-1)], j
  begin
    spawn(P0, P1, …, P⌈n/2⌉-1)
    for all Pi where 0 ≤ i ≤ ⌈n/2⌉ - 1 do
      for j ← 0 to ⌈log n⌉ - 1 do
        if i mod 2^j = 0 and 2i + 2^j < n then
          A[2i] ← A[2i] + A[2i + 2^j]
        endif
      endfor
    endfor
  end
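The EREW reduction above can be simulated sequentially (a sketch, not the book's code): in round j, each processor i with i mod 2^j = 0 adds A[2i + 2^j] into A[2i]; since the reads and writes in a round touch disjoint locations, running the processors in a plain loop gives the same result.

```python
# Hypothetical sequential simulation of the EREW PRAM SUM algorithm:
# ceil(log2 n) rounds, with ceil(n/2) simulated processors in lockstep.
import math

def parallel_sum(A):
    n = len(A)
    A = list(A)                         # work on a copy
    if n == 1:
        return A[0]
    procs = math.ceil(n / 2)
    for j in range(math.ceil(math.log2(n))):
        for i in range(procs):          # one round: all P_i act in lockstep
            if i % (2 ** j) == 0 and 2 * i + 2 ** j < n:
                A[2 * i] += A[2 * i + 2 ** j]
    return A[0]

print(parallel_sum([1, 2, 3, 4, 5]))  # 15
```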

  13. BSP
  • BSP (Bulk Synchronous Parallel) model
  • Proposed by Leslie Valiant of Harvard University
  • Developed by W. F. McColl of Oxford University

  14. BSP Computer (1)
  • Distributed-memory architecture with 3 components:
  • Node: a processor with its local memory
  • Router (communication network): point-to-point message passing (or shared variables)
  • Barrier synchronization facility: synchronizes all processors or a subset

  15. BSP Computer (2)
  [Figure: BSP machine — nodes (computation cost w), each a processor P with local memory M, connected by a communication network (parameter g) and a barrier synchronization facility (parameter l)]

  16. Three Parameters
  • w parameter: maximum computation time within each superstep; a computation operation takes at most w cycles.
  • g parameter: number of cycles to communicate a unit message when all processors are involved in communication (the network bandwidth); equal to (total number of local operations performed by all processors in one second) / (total number of words delivered by the communication network in one second). With an h-relation coefficient, a communication operation takes gh cycles.
  • l parameter: barrier synchronization takes l cycles.
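Putting the parameters together, the cost of a single superstep in the additive form is w + gh + l. A small sketch (hypothetical machine parameters, not from the slides):

```python
# Hypothetical sketch: cost of one BSP superstep in the additive model,
# T_step = w + g*h + l, with made-up machine parameters.

def superstep_cost(w, h, g, l):
    # w: max local computation cycles on any processor
    # h: max messages sent or received by any processor (an h-relation)
    # g: cycles per unit message, l: barrier synchronization cost in cycles
    return w + g * h + l

# Example machine: g = 4 cycles/word, l = 100 cycles.
print(superstep_cost(w=1000, h=20, g=4, l=100))  # 1000 + 80 + 100 = 1180
```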

  17. BSP Program (1)
  • A BSP computation consists of S supersteps.
  • A superstep is a sequence of steps followed by a barrier synchronization.
  • Any remote memory accesses take effect at the barrier (loosely synchronous).

  18. BSP Program (2)
  [Figure: processors P1, P2, P3, P4 executing superstep 1 (computation, then communication, then barrier), followed by superstep 2]

  19. BSP Algorithm Design
  • 2 variables for time complexity:
  • N: number of time steps (local operations) per superstep
  • h-relation: each node sends and receives at most h messages; gh is the number of time steps for communication within a superstep
  • Time complexity:
  • Valiant's original: max{w, gh, l} per superstep (overlapped communication and computation)
  • McColl's revised: N + gh + l per superstep; summed over all supersteps, T = Ntot + Ncom·g + S·l
  • Algorithm design: with Ntot = T/p, minimize Ncom and S.
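In McColl's additive form, the total time of a BSP program is just the sum of the per-superstep costs. A sketch (hypothetical code and made-up superstep sizes, not from the slides):

```python
# Hypothetical sketch: total BSP time in McColl's additive form,
# T = Ntot + g*Ncom + S*l, summed over a program's supersteps.

def bsp_time(supersteps, g, l):
    """supersteps: list of (N_i, h_i) pairs; returns the total cycle count."""
    n_tot = sum(n for n, _ in supersteps)   # total local operations
    n_com = sum(h for _, h in supersteps)   # total h-relation size
    s = len(supersteps)                     # number of barriers paid
    return n_tot + g * n_com + s * l

steps = [(500, 10), (300, 5), (200, 0)]   # three made-up supersteps
print(bsp_time(steps, g=4, l=100))        # 1000 + 4*15 + 3*100 = 1360
```

The design advice on the slide falls out of this formula: Ntot is fixed by the work, so a good BSP algorithm minimizes the communication volume Ncom and the superstep count S.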

  20. Phase Parallel
  • Proposed by Kai Hwang & Zhiwei Xu
  • Similar to BSP: a parallel program is a sequence of phases, and the next phase cannot begin until all operations in the current phase have finished.
  • Three types of phases: parallelism phase, computation phase, interaction phase
  • Different computation phases may execute different workloads at different speeds.
