
ECE 669 Parallel Computer Architecture, Lecture 23: Parallel Compilation




  1. ECE 669 Parallel Computer Architecture, Lecture 23: Parallel Compilation

  2. Parallel Compilation
     • Two approaches to compilation:
       • Parallelize a program manually: sequential code converted to parallel code
       • Develop a parallel compiler
     • Compiler phases: intermediate form; partitioning (block based or loop based); placement; routing

  3. Compilation technologies for parallel machines
     Assumptions:
     • Input: parallel program
     • Output: "coarse" parallel program plus directives for:
       • which threads run in one task
       • where tasks are placed
       • where data is placed
       • which data elements go in each data chunk
     Limitation: no special optimizations for synchronization; synchronization memory references are treated like any other communication.

  4. Toy example
     • Loop parallelization
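
     A minimal sketch of what such a toy loop looks like after parallelization, assuming a C SPMD model; the names N, P, and pid are illustrative, not from the lecture:

```c
/* Sequential form:  for (i = 0; i < N; i++) a[i] = b[i] + c[i];
   Parallel form: each of P processors runs this body with its own pid,
   executing a contiguous block of the iteration space. Safe because no
   iteration depends on another. */
void vadd_spmd(double *a, const double *b, const double *c,
               int N, int P, int pid)
{
    int chunk = (N + P - 1) / P;                 /* ceiling of N / P */
    int lo = pid * chunk;
    int hi = (lo + chunk < N) ? lo + chunk : N;
    for (int i = lo; i < hi; i++)
        a[i] = b[i] + c[i];
}
```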

  5. Example
     • Matrix multiply
     • Typically, c_ij = Σ_k a_ik · b_kj (an O(n³) triply nested loop)
     • Looking to find parallelism...
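
     For concreteness, the standard loop nest (row-major arrays assumed; a sketch, not the lecture's exact code). The i and j loops carry no dependences and can run in parallel; only the k loop is a reduction.

```c
/* C = A * B for n x n matrices. Every (i, j) iteration is independent,
   so each one (or each block of them) can become a thread; only the
   k accumulation is sequential within a thread. */
void matmul(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; i++)              /* parallelizable */
        for (int j = 0; j < n; j++) {        /* parallelizable */
            double sum = 0.0;
            for (int k = 0; k < n; k++)      /* reduction */
                sum += A[i*n + k] * B[k*n + j];
            C[i*n + j] = sum;
        }
}
```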

  6. Choosing a program representation...
     [Figure: dataflow graph; values a0, a2, a5 and b0, b2, b5 flow into "+" nodes producing c0, c2, c5]
     • Dataflow graph
       • No notion of storage problem
       • Data values flow along arcs
       • Nodes represent operations

  7. Compiler representation
     • For certain kinds of structured programs: loop nests over data arrays, linked by index expressions
     • Unstructured programs: tasks linked to data by communication weights
     [Figure: structured case shows a LOOP nest connected to data array A via index expressions; unstructured case shows Task A and Task B connected to Data X with communication weights]

  8. Process reference graph
     [Figure: threads P0, P1, P2, ..., P5 with weighted reference edges into a monolithic memory; this monolithic-memory view is not very useful]
     • Nodes represent threads (processes): computation
     • Edges represent communication (memory references)
     • Can attach weights to edges to represent volume of communication
     • Extension: precedence-relation edges can be added too
     • Can also try to represent multiple loop-produced threads as one node

  9. Process communication graph
     [Figure: threads P0, P1, P2, P5 and data objects d0–d5 connected by weighted edges]
     • Allocate data items to nodes as well
     • Nodes: threads and data objects
     • Edges: communication
     • Key: works for shared-memory, object-oriented (message-passing), and dataflow systems!
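
     A PCG is just a weighted graph whose nodes are tagged as threads or data objects. A minimal C sketch, assuming an adjacency-list layout (field names are illustrative, not the lecture's data structure):

```c
#include <stdlib.h>

enum node_kind { THREAD_NODE, DATA_NODE };

struct pcg_edge {
    int dst;                   /* index of the neighboring node     */
    int weight;                /* communication volume on this edge */
    struct pcg_edge *next;
};

struct pcg_node {
    enum node_kind kind;       /* thread (computation) or data object */
    struct pcg_edge *edges;    /* adjacency list                      */
};

/* Record an undirected communication edge of weight w between nodes a and b. */
void pcg_connect(struct pcg_node *g, int a, int b, int w)
{
    struct pcg_edge *e1 = malloc(sizeof *e1), *e2 = malloc(sizeof *e2);
    e1->dst = b; e1->weight = w; e1->next = g[a].edges; g[a].edges = e1;
    e2->dst = a; e2->weight = w; e2->next = g[b].edges; g[b].edges = e2;
}
```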

  10. PCG for Jacobi relaxation
     [Figure: fine PCG with computation nodes and data nodes and edge weights of 4, coarsened into a coarse PCG]

  11. Compilation with PCGs
     Fine process communication graph → partitioning → coarse process communication graph

  12. Compilation with PCGs
     Fine process communication graph
       → partitioning → coarse process communication graph
       → MP: placement → placed coarse process communication graph
       → ... other phases, scheduling. Dynamic?
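
     Partitioning has to merge fine-grain nodes into coarser tasks while keeping heavily communicating nodes together, so their edges become cheap local references. One common tactic is heavy-edge matching (an assumption here, not necessarily the lecture's heuristic); a sketch reusing the pcg_node type above:

```c
/* One coarsening pass: each still-unmatched node is paired with the
   unmatched neighbor it exchanges the most data with. Matched pairs are
   then collapsed into single coarse-PCG nodes by the caller. */
void coarsen_pass(const struct pcg_node *g, int n, int *match)
{
    for (int v = 0; v < n; v++)
        match[v] = -1;                        /* -1 means unmatched */
    for (int v = 0; v < n; v++) {
        if (match[v] != -1) continue;
        int best = -1, best_w = 0;
        for (struct pcg_edge *e = g[v].edges; e; e = e->next)
            if (e->dst != v && match[e->dst] == -1 && e->weight > best_w) {
                best = e->dst;
                best_w = e->weight;
            }
        if (best != -1) { match[v] = best; match[best] = v; }
    }
}
```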

  13. Parallel Compilation
     • Consider loop partitioning
     • Create small, local computations
     • Consider static routing between tiles
       • Short neighbor-to-neighbor communication
       • Compiler orchestrated

  14. Flow Compilation
     • Modulo unrolling
     • Partitioning
     • Scheduling

  15. Modulo Unrolling – Smart Memory
     • Loop unrolling relies on dependencies
     • Allow maximum parallelism
     • Minimize communication
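
     A hedged sketch of the idea: unroll by the number of memory banks so that every static reference in the unrolled body falls in a fixed bank, letting the compiler wire each reference to one tile's memory. Assumes 4 low-order-interleaved banks and a bank-aligned array; neither constant comes from the slide.

```c
#define BANKS 4   /* assumed number of interleaved memory banks */

/* After unrolling by BANKS, a[i + k] always resides in bank k under
   low-order interleaving, so each reference below is a static, local
   access needing no runtime bank disambiguation. N is assumed to be a
   multiple of BANKS. */
void scale(double *a, int N, double s)
{
    for (int i = 0; i < N; i += BANKS) {
        a[i + 0] *= s;    /* always bank 0 */
        a[i + 1] *= s;    /* always bank 1 */
        a[i + 2] *= s;    /* always bank 2 */
        a[i + 3] *= s;    /* always bank 3 */
    }
}
```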

  16. Array Partitioning – Smart Memory
     • Assign each line to a separate memory
     • Consider exchange of data
     • Approach is scalable
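
     A sketch of one such assignment, assuming row-wise low-order interleaving across P tile memories (the mapping choice is illustrative): with a fixed mapping, a reference to a neighboring row identifies exactly which tiles must exchange data.

```c
/* Row i of a 2-D array lives in the memory of tile i % P. With P = 4,
   rows 0, 4, 8, ... live on tile 0, and a stencil access from row 4 to
   row 3 becomes a known tile-0/tile-3 exchange the compiler can schedule. */
static inline int home_memory(int row, int P)
{
    return row % P;
}
```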

  17. Communication Scheduling – Smart Memory
     • Determine where data should be sent
     • Determine when data should be sent

  18. Speedup for Jacobi – Smart Memory
     • "Virtual wires" indicate scheduled paths
     • "Hard wires" are dedicated paths
     • Hard wires require more wiring resources
     • RAW is a parallel processor from MIT

  19. Partitioning
     • Use heuristics for unstructured programs
     • For structured programs, start from:
       • a list of arrays (e.g., A, B, C)
       • a list of loop nests (e.g., L0, L1, L2)

  20. Notion of iteration space, data space
     [Figure: for matrix A, the data space (array elements) sits beside the (i, j) iteration space; each point of the iteration space represents a "thread" with given values of i and j]

  21. Notion of iteration space, data space (continued)
     • Partitioning: how to "tile" iteration and data spaces for MIMD machines?
     [Figure: as before, with an arrow showing which iteration-space thread affects which data-space element]
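
     To make the tiling concrete, a sketch of a Jacobi-style loop nest (matching the earlier Jacobi example) whose 2-D iteration space is cut into T × T tiles, each of which can become one thread placed next to its data tile. T and the stencil body are assumptions, not the lecture's code.

```c
#define T 16   /* assumed tile edge; one T x T tile = one thread */

void jacobi_tiled(int n, double A[n][n], double Anew[n][n])
{
    for (int ii = 1; ii < n - 1; ii += T)
        for (int jj = 1; jj < n - 1; jj += T)
            /* One (ii, jj) tile of the iteration space; the matching
               tile of A plus a one-element halo is its data space. */
            for (int i = ii; i < ii + T && i < n - 1; i++)
                for (int j = jj; j < jj + T && j < n - 1; j++)
                    Anew[i][j] = 0.25 * (A[i-1][j] + A[i+1][j] +
                                         A[i][j-1] + A[i][j+1]);
}
```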

  22. Loop partitioning for caches
     • Machine model: processors P, each with a private cache, connected through a network to a memory holding A
     • Assume all data starts in memory
     • Minimize first-time cache fetches
     • Ignore secondary effects such as invalidations due to writes
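
     Under this model, a partition's cost is the number of distinct cache lines it touches on first use. A minimal sketch of that count for one processor's rows-by-columns tile of A, assuming each tile row starts line-aligned and L is the line size in array elements (both assumptions, not from the slide):

```c
/* First-time fetches for a tile touching `rows` rows of `cols`
   contiguous elements each: every distinct line is fetched exactly once,
   so the loop partitioner prefers tile shapes that minimize this count. */
long first_fetches(int rows, int cols, int L)
{
    long lines_per_row = (cols + L - 1) / L;   /* ceiling of cols / L */
    return (long)rows * lines_per_row;
}
```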

  23. Summary
     • Parallel compilation often targets block-based and loop-based parallelism
     • Compilation steps address identification of parallelism and its representation
     • Graphs are often useful to represent program dependencies
     • For static scheduling, both computation and communication can be represented
     • Data positioning is an important consideration for computation
