
Code Generation Framework for Process Network Models onto Parallel Platforms


Presentation Transcript


  1. Code Generation Framework for Process Network Models onto Parallel Platforms Man-Kit Leung, Isaac Liu, Jia Zou Final Project Presentation

  2. Outline • Motivation • Demo • Code Generation Framework • Application and Results • Conclusion

  3. Motivation • Parallel programming is difficult… • Functional correctness • Performance debugging + tuning (Basically, trial & error) • Code generation as a tool • Systematically explore implementation space • Rapid development / prototyping • Optimize performance • Maximize (programming) reusability • Correct-by-construction [E. Dijkstra ’70] • Minimize human errors (bugs) • Eliminates the need for low-level testing • Because, otherwise, manual coding is too costly • Especially true for multiprocessors/distributed platforms

  4. Higher-Level Programming Model • A Kahn Process Network (KPN) is a distributed model of computation (MoC) in which a group of processing units is connected by communication channels to form a network of processes. • The communication channels are FIFO queues. • Deterministic • Inherently parallel • Expressive • “The Semantics of a Simple Language for Parallel Programming” [GK ’74] • [Figure: two source actors feeding a sink actor through implicit FIFO buffers]
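
  To make the blocking-FIFO semantics concrete, here is a minimal sketch (not taken from the presentation) of a three-process network in C: two source threads write tokens into POSIX pipes, which behave like FIFO channels with blocking reads, and a sink thread consumes one token from each channel per firing. All names and token values are illustrative.

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static int chan1[2], chan2[2];   /* pipe fds: [0] = read end, [1] = write end */

    static void* source1(void* arg) {
        for (int i = 0; i < 5; i++) write(chan1[1], &i, sizeof i);
        close(chan1[1]);             /* closing the write end ends the stream */
        return NULL;
    }

    static void* source2(void* arg) {
        for (int i = 100; i < 105; i++) write(chan2[1], &i, sizeof i);
        close(chan2[1]);
        return NULL;
    }

    static void* sink(void* arg) {
        int a, b;
        /* Blocking reads in a fixed order keep the output deterministic
           regardless of how the threads are scheduled. */
        while (read(chan1[0], &a, sizeof a) > 0 &&
               read(chan2[0], &b, sizeof b) > 0)
            printf("%d + %d = %d\n", a, b, a + b);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2, t3;
        pipe(chan1);
        pipe(chan2);
        pthread_create(&t1, NULL, source1, NULL);
        pthread_create(&t2, NULL, source2, NULL);
        pthread_create(&t3, NULL, sink, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        pthread_join(t3, NULL);
        return 0;
    }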

  5. MPI Code Generation Workflow • [Figure: Model → Partitioning (Mapping) → Code Generation → Executable] • Given a (KPN) model: analyze & annotate the model, assume weights on edges & nodes, generate cluster info (buffer & grouping) • Code generation: generate MPI code, SIMD (Single Instruction Multiple Data) • Executable: execute the code and obtain execution statistics for tuning

  6. Demo The codegen facility is in the Ptolemy II nightly release - http://chess.eecs.berkeley.edu/ptexternal/nightly/

  7. Role of Code Generation • Platform-based Design [AS ‘02] • [Figure: Models → Partitioning (Mapping) → Code Generation → Executable, with Ptolemy II providing the modeling and code generation steps]

  8. Implementation Space for Distributed Environment • Mapping • # of logical processing units • # of cores / processors • Network costs • Latency • Throughput • Memory Constraint • Communication buffer size • Minimization metrics • Costs • Power consumption • …

  9. Partition • Uses node and edge weights as the abstraction • Annotated on the model • From the model, the input file to Chaco is generated (sketched below). • After Chaco produces the output file, the partitions are automatically annotated back onto the model.
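
  As an illustration of that step, the sketch below dumps a small annotated actor graph to a weighted-graph text file in the general style read by Chaco-family partitioners (a header with vertex/edge counts plus weight flags, then one adjacency line per vertex, 1-indexed). The weights, file name, and exact header flags are assumptions made for illustration; the real layout should be checked against the Chaco user's guide.

    #include <stdio.h>

    #define N 3                                  /* actors (graph vertices) */
    static const int nodeWeight[N] = {4, 2, 7};  /* estimated computation per actor */
    static const int edgeWeight[N][N] = {        /* estimated tokens exchanged */
        {0, 5, 0},
        {5, 0, 3},
        {0, 3, 0},
    };

    int main(void) {
        FILE* f = fopen("model.graph", "w");
        if (!f) return 1;
        int edges = 0;
        for (int i = 0; i < N; i++)
            for (int j = i + 1; j < N; j++)
                if (edgeWeight[i][j] > 0) edges++;   /* each undirected edge counted once */
        fprintf(f, "%d %d 11\n", N, edges);          /* header: |V| |E| weight flags (assumed) */
        for (int i = 0; i < N; i++) {
            fprintf(f, "%d", nodeWeight[i]);         /* vertex weight first */
            for (int j = 0; j < N; j++)
                if (j != i && edgeWeight[i][j] > 0)
                    fprintf(f, " %d %d", j + 1, edgeWeight[i][j]);  /* neighbor, edge weight */
            fprintf(f, "\n");
        }
        fclose(f);
        return 0;
    }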

  10. Multiprocessor Architectures • Shared memory vs. message passing • We want to generate code that will run on both kinds of architectures • Message passing: Message Passing Interface (MPI) as the implementation • Shared memory: Pthread implementation available for comparison; UPC and OpenMP as future work

  11. Pthread Implementation

    void* Actor1(void* arg) { /* ... */ return NULL; }
    void* Actor2(void* arg) { /* ... */ return NULL; }

    void Model(void) {
        pthread_t actor1, actor2;
        pthread_create(&actor1, NULL, Actor1, NULL);
        pthread_create(&actor2, NULL, Actor2, NULL);
        pthread_join(actor1, NULL);
        pthread_join(actor2, NULL);
    }

  [Figure: the model containing Actor1 and Actor2]

  12. MPI Code Generation • KPN scheduling: determine when actors are safe to fire; actors can’t block other actors on the same partition; termination is based on a firing count • MPI send/recv • Local buffers • MPI tag matching

  13. Sample MPI Program

    main() {
        if (rank == 0) { Actor0(); Actor1(); }
        if (rank == 1) { Actor2(); }
        ...
    }

    Actor#() {
        MPI_Irecv(input);
        if (hasInput && !sendBufferFull) {
            output = localCalc();
            MPI_Isend(1, output);
        }
    }
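
  For reference, the following is a minimal, self-contained MPI program in the same spirit as the pseudocode above; it is a hedged sketch, not the framework's actual output. Rank 0 acts as a producer actor, rank 1 as a consumer, a tag identifies the logical channel, and both sides stop after a fixed firing count. Compile with mpicc and run with mpirun -np 2.

    #include <mpi.h>
    #include <stdio.h>

    #define FIRINGS 10        /* termination based on a firing count */
    #define CHANNEL_TAG 42    /* tag identifies the logical channel (assumed value) */

    int main(int argc, char** argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {                       /* producer actor */
            for (int i = 0; i < FIRINGS; i++) {
                int token = i * i;             /* stand-in for localCalc() */
                MPI_Request req;
                MPI_Isend(&token, 1, MPI_INT, 1, CHANNEL_TAG, MPI_COMM_WORLD, &req);
                MPI_Wait(&req, MPI_STATUS_IGNORE);   /* buffer is reused next iteration */
            }
        } else if (rank == 1) {                /* consumer actor */
            for (int i = 0; i < FIRINGS; i++) {
                int token;
                MPI_Recv(&token, 1, MPI_INT, 0, CHANNEL_TAG, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("fired with token %d\n", token);
            }
        }
        MPI_Finalize();
        return 0;
    }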

  14. Application

  15. Execution Platform

  16. Preliminary Results

  17. Conclusion & Future Work • Conclusion • Framework for code generation to parallel platforms • Generates scalable MPI code from Kahn Process Network models • Future Work • Target more platforms (UPC, OpenMP, etc.) • Additional profiling techniques • Support more partitioning tools • Improve performance of the generated code

  18. Acknowledgments • Edward Lee • Horst Simon • Shoaib Kamil • Ptolemy II developers • NERSC • John Kubiatowicz • Questions / Comments

  19. Extra slides

  20. Why MPI • Message passing • Good for distributed (shared-nothing) systems • Very generic • Easy to set up • Required setup (i.e., mpicc, etc.) only on one “master” • Worker nodes only need SSH • Flexible (explicit): nonblocking + blocking send/recv • Cons: requires explicit syntax modification (as opposed to OpenMP, Erlang, etc.) • Solution: automatic code generation

  21. Actor-oriented design: a formalized model of concurrency • [Figure: object-oriented vs. actor-oriented components] • Actor-oriented design hides the state of each actor and makes it inaccessible to other actors • The emphasis of data flow over control flow leads to conceptually concurrent execution of actors • The interaction between actors happens in a highly disciplined way • Threads and mutexes become implementation mechanisms instead of part of the programming model

  22. Pthread implementation • Each actor runs as a separate thread • Implicit buffers • Each buffer has a read count and a write count • A condition variable sleeps and wakes up threads • Each buffer has a bounded capacity • A global notion of scheduling exists at the OS level • When all actors are in blocking-read mode, the model should terminate (see the buffer sketch below)
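
  A hedged sketch of such an implicit buffer, assuming one reader and one writer per channel (the KPN case): a bounded FIFO protected by a mutex, with read/write counts and a condition variable that blocks a reader on an empty buffer or a writer on a full one. The names and capacity are illustrative, not the actual generated code.

    #include <pthread.h>

    #define CAPACITY 8

    typedef struct {
        int data[CAPACITY];
        int readCount, writeCount;       /* total tokens read / written so far */
        pthread_mutex_t lock;            /* initialize with pthread_mutex_init */
        pthread_cond_t  changed;         /* initialize with pthread_cond_init */
    } Buffer;

    void buffer_put(Buffer* b, int token) {
        pthread_mutex_lock(&b->lock);
        while (b->writeCount - b->readCount == CAPACITY)   /* full: writer sleeps */
            pthread_cond_wait(&b->changed, &b->lock);
        b->data[b->writeCount % CAPACITY] = token;
        b->writeCount++;
        pthread_cond_signal(&b->changed);                  /* wake a blocked reader */
        pthread_mutex_unlock(&b->lock);
    }

    int buffer_get(Buffer* b) {
        pthread_mutex_lock(&b->lock);
        while (b->writeCount == b->readCount)              /* empty: blocking read */
            pthread_cond_wait(&b->changed, &b->lock);
        int token = b->data[b->readCount % CAPACITY];
        b->readCount++;
        pthread_cond_signal(&b->changed);                  /* wake a blocked writer */
        pthread_mutex_unlock(&b->lock);
        return token;
    }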

  23. MPI Implementation • A mapping of actors to cores is needed: a classic graph partitioning problem • Nodes: actors • Edges: messages • Node weights: computation on each actor • Edge weights: amount of messages communicated • Partitions: processors • Chaco was chosen as the graph partitioner (a simplified stand-in is sketched below).
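
  The sketch below is a naive stand-in for the partitioner, using made-up weights: it greedily assigns each actor to the currently least-loaded processor based on node weights alone. Chaco additionally minimizes the edge-weight (message volume) cut between partitions; this only illustrates the shape of the mapping problem and of its output.

    #include <stdio.h>

    #define ACTORS 5
    #define PARTS  2

    int main(void) {
        const int nodeWeight[ACTORS] = {8, 3, 5, 2, 6};   /* per-actor cost (assumed) */
        int load[PARTS] = {0};
        int partition[ACTORS];

        for (int a = 0; a < ACTORS; a++) {
            int best = 0;
            for (int p = 1; p < PARTS; p++)
                if (load[p] < load[best]) best = p;       /* least-loaded processor */
            partition[a] = best;
            load[best] += nodeWeight[a];
        }
        for (int a = 0; a < ACTORS; a++)
            printf("Actor%d -> rank %d\n", a, partition[a]);
        return 0;
    }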

  24. Partition Profiling • Challenge: providing the user with enough information so that node weights and edge weights can be annotated and modified to achieve load balancing • Solution 1: static analysis • Solution 2: simulation • Solution 3: dynamic load balancing • Solution 4: profiling the current run and feeding the information back to the user
