
A Process Splitting Transformation for Kahn Process Networks

Sjoerd Meijer


Presentation Transcript


  1. A Process Splitting Transformation for Kahn Process Networks Sjoerd Meijer

  2. Contents • Background • Problem Definition and Project Goal • Splitting • Producer Selection • Inter-process Communication • Consumer Selection • Implementation • Conclusion And Further Work

  3. Background [Figure: four CPUs, each with its own cache, sharing main memory over a memory bus] • Parallelization is not new • Forking a sequential application. Classic example, matrix-matrix multiplication:
      :
      do i = 1, n
        do j = 1, n
          a(i,j) = …
          :
        end do
      end do
      :
    • Master processor executes the code up to the parallel loop • Parallel iterations are executed on the other processors • Synchronize at the end of the parallel loop
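The fork-join scheme above can be sketched in C as follows. This is a minimal sketch, assuming a block distribution of the outer `i` loop over the processors; the helpers `block_range`, `worker` and `parallel_loop` are hypothetical names, the assignment is a placeholder for the elided `a(i,j) = …`, and the workers run one after another instead of on separate CPUs so that no threading library is needed.

```c
#include <assert.h>

/* Hypothetical helper: block distribution of iterations 0 .. n-1 of the
 * parallel outer loop over nproc processors. Processor p gets the
 * half-open range [*lo, *hi); the first n % nproc processors get one
 * extra iteration, so loads differ by at most one. */
void block_range(int n, int nproc, int p, int *lo, int *hi)
{
    int base = n / nproc;   /* iterations every processor receives */
    int extra = n % nproc;  /* leftovers, one each for the first processors */
    *lo = p * base + (p < extra ? p : extra);
    *hi = *lo + base + (p < extra ? 1 : 0);
}

/* Hypothetical work of one processor: rows lo .. hi-1 of the i, j loop
 * nest over an n x n row-major matrix. */
void worker(double *a, int n, int lo, int hi)
{
    for (int i = lo; i < hi; i++)
        for (int j = 0; j < n; j++)
            a[i * n + j] = (double)(i + j);  /* placeholder loop body */
}

/* Hypothetical fork-join skeleton: the master "forks" one worker per
 * processor for the parallel loop and then waits for all of them. Here the
 * workers simply run in sequence; on a multiprocessor each call would run
 * on its own CPU, followed by a barrier. */
void parallel_loop(double *a, int n, int nproc)
{
    for (int p = 0; p < nproc; p++) {
        int lo, hi;
        block_range(n, nproc, p, &lo, &hi);
        worker(a, n, lo, hi);
    }
    /* synchronization point: every row of a has been written */
}
```

For n = 10 and three processors, `block_range` yields the ranges [0, 4), [4, 7) and [7, 10).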

  4. Background Applications are specified as parallel tasks: • Example JPEG decoder:
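In a Kahn process network such as the JPEG decoder, processes communicate only through FIFO channels: a read blocks while its channel is empty and a write never blocks. A minimal single-threaded sketch of such a channel follows, assuming a bounded ring buffer in place of the unbounded Kahn FIFO; `fifo_t`, `fifo_read`, `fifo_write`, `producer` and `consumer` are hypothetical names, and a read on an empty channel returns `false` where a real Kahn process would block.

```c
#include <assert.h>
#include <stdbool.h>

#define FIFO_CAP 64  /* assumption: a bounded buffer stands in for the unbounded Kahn FIFO */

/* Hypothetical FIFO channel between two processes. */
typedef struct {
    int buf[FIFO_CAP];
    int head;   /* index of the oldest token */
    int count;  /* number of tokens currently stored */
} fifo_t;

void fifo_init(fifo_t *f) { f->head = 0; f->count = 0; }

/* Kahn writes never block; this bounded sketch reports a full buffer instead. */
bool fifo_write(fifo_t *f, int token)
{
    if (f->count == FIFO_CAP)
        return false;
    f->buf[(f->head + f->count) % FIFO_CAP] = token;
    f->count++;
    return true;
}

/* A Kahn read blocks on an empty channel; here it returns false instead. */
bool fifo_read(fifo_t *f, int *token)
{
    if (f->count == 0)
        return false;
    *token = f->buf[f->head];
    f->head = (f->head + 1) % FIFO_CAP;
    f->count--;
    return true;
}

/* Two toy processes: the producer emits the squares 0, 1, 4, ... and the
 * consumer sums every token it reads. */
void producer(fifo_t *out, int n)
{
    for (int i = 0; i < n; i++)
        fifo_write(out, i * i);
}

int consumer(fifo_t *in)
{
    int sum = 0, t;
    while (fifo_read(in, &t))
        sum += t;
    return sum;
}
```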

  5. Problem Definition? [Figure: Cakesim (eCos+CCP) profile for the JPEG KPN]

  6. Problem Definition: Automatic procedure for process splitting in KPNs to take advantage of multiprocessor architectures. [Figure: original process network and split-up network]

  7. Splitting: The Concept. Required: • Determine the computationally expensive process: profiling or pragmas + static support • Partitioning of the Iteration Space (IS) • N = number of times a process has to be split • L = loop-nest level at which the splitting takes place. To do: • Duplication of code and FIFOs • Adding control for token production and consumption
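The partitioning step can be illustrated with a small C sketch. It assumes the cyclic (modulo) distribution used in the interprocess communication example, applied to a two-deep loop nest: copy k of the split process executes exactly the iterations whose loop counter at level L satisfies `counter % N == k`. The function name `iterations_of_copy` is hypothetical; it only counts the iterations each copy would execute.

```c
#include <assert.h>

/* Hypothetical sketch: a process with a two-deep loop nest (i over ni,
 * j over nj) is split into n_split copies at loop level `level`
 * (1 = outer i loop, 2 = inner j loop). Copy k executes the iterations
 * whose counter at that level satisfies counter % n_split == k, i.e. the
 * modulo condition placed around the original loop body.
 * Returns the number of iterations copy k executes. */
int iterations_of_copy(int ni, int nj, int level, int n_split, int k)
{
    int count = 0;
    for (int i = 0; i < ni; i++) {
        for (int j = 0; j < nj; j++) {
            int counter = (level == 1) ? i : j;  /* loop counter at level L */
            if (counter % n_split == k)
                count++;                         /* original body would run here */
        }
    }
    return count;
}
```

Every iteration lands in exactly one copy, so the counts over all copies always add up to `ni * nj`.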

  8. Techniques used: Data dependence analysis: • Data flow analysis • Array data flow analysis Tree transformations: • Adding/removing/duplicating tree statements Compiler framework: • GCC

  9. Solution for KPNs [Figure: network P1 → P2 → P3, in which P2 is split up into P21 and P22] Four-step approach: COMPUTATION: 1. Partitioning (computation) COMMUNICATION: 2. Interprocess communication 3. Token production 4. Token consumption

  10. Partitioning of the original process computation over the resulting split-up processes

  11. Interprocess Communication
      :
      for (int i = 1; i < 10; i++)
          a[i] = a[i-1] + i;  // s1
      :
    • Interprocess communication is given by the loop-carried dependency: a[i-1] at iteration i is produced at iteration i-1. • If the execution of statement s1 is distributed over different processes, token a[i-1] needs to be communicated:
    Split-up process executing the even iterations:
      :
      for (int i = 1; i < 10; i++) {
          if (i % 2 == 0)
              a[i] = a[i-1] + i;
      }
      :
    Split-up process executing the odd iterations:
      :
      for (int i = 1; i < 10; i++) {
          if (i % 2 == 1)
              a[i] = a[i-1] + i;
      }
      :
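The communication pattern of the split loop can be simulated in C. This is a sketch under stated assumptions: `a[0]` is an initial value known to both copies, the two copies are scheduled round-robin in a single thread, and a copy returns instead of blocking when its input FIFO is empty. The names `proc_t`, `proc_step` and `run_split` are hypothetical. Each copy sends `a[i]` to the other copy, which needs it as `a[i-1]` in the next iteration.

```c
#include <assert.h>
#include <stdbool.h>

#define N_ITER 10  /* loop bound from the slide: i = 1 .. 9 */

/* Minimal FIFO; N_ITER slots are enough because each copy sends at most 5 tokens. */
typedef struct { int buf[N_ITER]; int head, tail; } fifo_t;

static bool fifo_empty(const fifo_t *f) { return f->head == f->tail; }
static void fifo_put(fifo_t *f, int v) { f->buf[f->tail++] = v; }
static int  fifo_get(fifo_t *f)        { return f->buf[f->head++]; }

/* Hypothetical state of one split-up copy of the process. */
typedef struct {
    int parity;     /* executes the iterations with i % 2 == parity */
    int i;          /* next loop counter value to consider */
    int a[N_ITER];  /* local copy of the array */
    fifo_t *in, *out;
} proc_t;

/* Run the copy until its loop finishes (returns true) or it needs a
 * token that has not arrived yet (returns false). */
static bool proc_step(proc_t *p)
{
    for (; p->i < N_ITER; p->i++) {
        if (p->i % 2 != p->parity)
            continue;                        /* iteration of the other copy */
        if (p->i > 1) {                      /* a[i-1] comes from the other copy */
            if (fifo_empty(p->in))
                return false;                /* a Kahn process would block here */
            p->a[p->i - 1] = fifo_get(p->in);
        }
        p->a[p->i] = p->a[p->i - 1] + p->i;  /* s1 */
        if (p->i + 1 < N_ITER)
            fifo_put(p->out, p->a[p->i]);    /* token for iteration i+1 */
    }
    return true;
}

/* Run both copies and gather a[0..9] into result; returns a[9]. */
int run_split(int a0, int result[N_ITER])
{
    fifo_t even_to_odd = { {0}, 0, 0 }, odd_to_even = { {0}, 0, 0 };
    proc_t even = { .parity = 0, .i = 1, .in = &odd_to_even, .out = &even_to_odd };
    proc_t odd  = { .parity = 1, .i = 1, .in = &even_to_odd, .out = &odd_to_even };
    even.a[0] = odd.a[0] = a0;  /* assumption: the initial value is known to both */

    bool even_done = false, odd_done = false;
    while (!even_done || !odd_done) {
        if (!even_done) even_done = proc_step(&even);
        if (!odd_done)  odd_done  = proc_step(&odd);
    }
    result[0] = a0;
    for (int i = 1; i < N_ITER; i++)
        result[i] = (i % 2 == 0) ? even.a[i] : odd.a[i];
    return result[N_ITER - 1];
}
```

Because every `a[i-1]` is produced by the other copy, the two copies strictly alternate, and the gathered array equals the result of the sequential loop.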

  12. Token Production & Consumption [Figure: producer side, P1 feeding the split-up processes P2’ and P2’’; consumer side, P2’ and P2’’ feeding P3] Problems: • PI. At the producer side: where should the tokens be sent? • PII. At the consumer side: from which FIFO should tokens be consumed? Solutions PI: • The producer filters the tokens (static solution) • The producer sends all tokens to all split-up processes (runtime solution) Solutions PII: • The consumer itself knows when to switch (static solution) • Each split-up producer signals the consumer when to switch to reading from a different FIFO (runtime solution)

  13. Token Production: runtime vs. static. Original network: P1 sends 100 tokens to P2. • Static solution: P1 sends 50 tokens to each of P2’ and P2’’. • Runtime solution: P1 sends all 100 tokens to both P2’ and P2’’.
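The two producer strategies can be compared with a small C sketch. It assumes that the token for consumer iteration i is needed by copy `i % N_SPLIT` (the modulo split used earlier); `produce_static`, `produce_runtime` and `tokens_used` are hypothetical names that only count the tokens written to each copy's FIFO.

```c
#include <assert.h>

#define N_SPLIT 2  /* number of split-up consumer copies, e.g. P2' and P2'' */

/* Hypothetical static producer: it filters, writing the token of consumer
 * iteration i only to the FIFO of copy i % N_SPLIT.
 * sent[k] counts the tokens written towards copy k. */
void produce_static(int n_tokens, int sent[N_SPLIT])
{
    for (int k = 0; k < N_SPLIT; k++)
        sent[k] = 0;
    for (int i = 0; i < n_tokens; i++)
        sent[i % N_SPLIT]++;
}

/* Hypothetical runtime producer: it needs no knowledge of the partitioning;
 * every token is written to every copy, and each copy drops the tokens of
 * iterations it does not execute. */
void produce_runtime(int n_tokens, int sent[N_SPLIT])
{
    for (int k = 0; k < N_SPLIT; k++)
        sent[k] = 0;
    for (int i = 0; i < n_tokens; i++)
        for (int k = 0; k < N_SPLIT; k++)
            sent[k]++;
}

/* Tokens that copy k actually uses, whichever way they were delivered. */
int tokens_used(int n_tokens, int k)
{
    int used = 0;
    for (int i = 0; i < n_tokens; i++)
        if (i % N_SPLIT == k)
            used++;
    return used;
}
```

With the 100 tokens of the slide, the static producer writes 50 tokens to each copy (100 in total), while the runtime producer writes 100 to each copy (200 in total), half of which are discarded.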

  14. Token Consumption: runtime vs. static. Original network: P2 sends 100 tokens to P3. • Static solution: P2’ and P2’’ each send 50 tokens to P3; the switch is known internally by the consumer. • Runtime solution: P2’ and P2’’ each send 50 tagged tokens to P3; the switch is communicated over the channels to the consumer.
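Both consumer strategies can be sketched in C. The tag encoding is an assumption, not taken from the slides: each token carries a `switch_after` flag that the producer sets on the last token it sends before the consumer must move to the next FIFO. All names (`token_t`, `produce_tagged`, `consume_static`, `consume_tagged`) are hypothetical. The static consumer only works for a split it knows in advance (here cyclic); the tagged consumer also handles, for example, chunks of consecutive iterations.

```c
#include <assert.h>
#include <stdbool.h>

#define N_SPLIT 2  /* split-up producers, e.g. P2' and P2'' */
#define CAP 64     /* assumption: bounded FIFOs are enough for the example */

/* Hypothetical tagged token. */
typedef struct { int value; bool switch_after; } token_t;
typedef struct { token_t buf[CAP]; int head, tail; } fifo_t;

void fifos_init(fifo_t in[N_SPLIT])
{
    for (int k = 0; k < N_SPLIT; k++)
        in[k].head = in[k].tail = 0;
}

static void put(fifo_t *f, int value, bool switch_after)
{
    f->buf[f->tail].value = value;
    f->buf[f->tail].switch_after = switch_after;
    f->tail++;
}

static token_t get(fifo_t *f) { return f->buf[f->head++]; }

/* Runtime producers: iterations 0 .. n-1 are handed out in chunks of
 * `chunk` consecutive iterations, round-robin over the copies; the last
 * token of every chunk carries the switch tag. */
void produce_tagged(fifo_t in[N_SPLIT], int n, int chunk)
{
    for (int i = 0; i < n; i++) {
        int copy = (i / chunk) % N_SPLIT;
        bool last = (i % chunk == chunk - 1) || (i == n - 1);
        put(&in[copy], i, last);
    }
}

/* Static consumer: it knows token i comes from copy i % N_SPLIT, so it
 * switches FIFOs by itself and ignores the tags. */
void consume_static(fifo_t in[N_SPLIT], int n, int out[])
{
    for (int i = 0; i < n; i++)
        out[i] = get(&in[i % N_SPLIT]).value;
}

/* Runtime consumer: it reads the current FIFO until a token tagged
 * switch_after arrives, then moves on to the next FIFO. */
void consume_tagged(fifo_t in[N_SPLIT], int n, int out[])
{
    int cur = 0;
    for (int i = 0; i < n; i++) {
        token_t t = get(&in[cur]);
        out[i] = t.value;
        if (t.switch_after)
            cur = (cur + 1) % N_SPLIT;
    }
}
```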

  15. Token Production & Consumption: static solution • Establish the data dependencies across the processes. How? • Data dependence function $DD$ and its inverse $DD^{-1}$: $DD^{-1}\colon \text{Producer} \to \text{Consumer}$, $DD\colon \text{Consumer} \to \text{Producer}$ • However, $DD$ cannot always be determined at compile time
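As a worked instance, the loop of the interprocess communication example can be expressed with these functions. Here the producer and the consumer of a token are the same statement s1 in different iterations, and $\text{copy}(i)$ is shorthand (introduced for this example only) for the split-up process that executes iteration $i$ under a modulo split with $N = 2$. The last line shows that every token crosses from one copy to the other.

```latex
% s1: a[i] = a[i-1] + i for 1 <= i <= 9. Consumer iteration i reads a[i-1],
% written by producer iteration i-1; iteration 1 reads the initial a[0].
\begin{aligned}
DD(i) &= i - 1, && 2 \le i \le 9,\\
DD^{-1}(w) &= w + 1, && 1 \le w \le 8,\\
\text{copy}(i) &= i \bmod 2, && N = 2,\\
\text{copy}\bigl(DD^{-1}(w)\bigr) &= (w + 1) \bmod 2 \;\neq\; w \bmod 2 = \text{copy}(w).
\end{aligned}
```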

  16. Token Production: static solution without $DD^{-1}$. Observation: the loop counters on the producer side equal the loop counters on the consumer side.

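  17. Token Production: static solution without $DD^{-1}$: $DD^{-1}(w_1,w_2,w_3) = (w_4,w_5,w_6)$, so $P_2\bigl(DD^{-1}(w_1,w_2,w_3)\bigr) = w_5$, where $P_2$ selects the second loop counter. Since $w_5 = w_2$, it follows that $P_2\bigl(DD^{-1}(w_1,w_2,w_3)\bigr) \bmod 2 = w_2 \bmod 2$.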
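The point of the observation is that the producer can pick the destination copy from its own loop counter without evaluating $DD^{-1}$. A minimal C sketch, assuming the consumer is split on its second loop counter with modulo N; `dd_inverse` is a made-up dependence chosen only because it satisfies $w_5 = w_2$, and all function names are hypothetical.

```c
#include <assert.h>

/* Consumer iteration (w4, w5, w6). */
typedef struct { int c1, c2, c3; } iter3_t;

/* Hypothetical DD^-1: the consumer iteration that reads the token produced
 * at producer iteration (w1, w2, w3). Any DD^-1 with w5 == w2 would do. */
static iter3_t dd_inverse(int w1, int w2, int w3)
{
    iter3_t c = { w3, w2, w1 + 1 };  /* (w4, w5, w6) with w5 == w2 */
    return c;
}

/* Destination copy computed through DD^-1: the consumer is split on its
 * second loop counter, so the token goes to copy w5 % n_split. */
int dest_via_dd_inverse(int w1, int w2, int w3, int n_split)
{
    return dd_inverse(w1, w2, w3).c2 % n_split;
}

/* Destination copy without DD^-1: because w5 == w2, the producer can use
 * its own second loop counter directly. */
int dest_via_loop_counter(int w2, int n_split)
{
    return w2 % n_split;
}
```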

  18. Token Consumption: static solution without $DD$. Similar to the production of tokens.

  19. Runtime solution:

  20. Multiple split-up processes [Figure: network P1 → P2 → P3 → P4, in which P2 and P3 are each split up into 3 processes: P2’, P2’’, P2’’’ and P3’, P3’’, P3’’’]

  21. Copy-nodes insertion [Figure: network P1 → P2 → P3 → P4; after the splitting transformation, P2 is split up into P2’ and P2’’ and P3 into P3’ and P3’’, and copy nodes are inserted into the split-up network]

  22. Copy-nodes • Pros: • Simple network structure • Apply four-step splitting approach • Cons: • More processes => more communication (can be improved) => overhead
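The slides do not spell out what a copy node does internally. One plausible reading, assumed here, is a node that restores the original token order from the split-up producers and redistributes the tokens over the split-up consumers, both by a cyclic (modulo) rule. The C sketch below follows that assumption; `copy_node`, `channels_mesh` and `channels_copy_node` are hypothetical names.

```c
#include <assert.h>

#define MAXC 4   /* assumption: at most 4 copies on each side */
#define MAXT 64  /* assumption: at most 64 tokens per FIFO */

/* Channels needed if every split-up producer is wired to every split-up consumer. */
int channels_mesh(int n_prod, int n_cons) { return n_prod * n_cons; }

/* Channels needed with one copy node in between: n_prod inputs, n_cons outputs. */
int channels_copy_node(int n_prod, int n_cons) { return n_prod + n_cons; }

/* Hypothetical copy node: token t is read from producer copy t % n_prod
 * (restoring the original order) and forwarded to consumer copy
 * t % n_cons. in[p][k] is the k-th token of producer copy p; out[c][k]
 * receives the k-th token for consumer copy c, and out_count[c] counts them. */
void copy_node(int n_tokens, int n_prod, int n_cons,
               int in[MAXC][MAXT], int out[MAXC][MAXT], int out_count[MAXC])
{
    int read_pos[MAXC] = {0};
    for (int c = 0; c < n_cons; c++)
        out_count[c] = 0;
    for (int t = 0; t < n_tokens; t++) {
        int p = t % n_prod;
        int value = in[p][read_pos[p]++];
        int c = t % n_cons;
        out[c][out_count[c]++] = value;
    }
}
```

The structure stays simple (n_prod + n_cons channels instead of n_prod × n_cons), but every token now travels over two channels and passes through an extra process, which matches the overhead listed under Cons.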

  23. Implementation • Used technique: • Runtime solution (general) • Used framework: • GCC (GNU Compiler Collection) • Advantages of GCC: • Availability of data dependence information • Supported by a large community • We are in contact with Sebastian Pop, maintainer and developer of various compiler phases, e.g. data dependence analysis, control flow and induction variable analysis.

  24. Implementation • Data dependence analysis (already present): • scalars • arrays • Data Dependence Graph (DDG) present only at the RTL level, not at the tree-SSA level • Two new passes: • Create DDG • Splitting

  25. Implementation • Splitting pragma • Data dependence graph • Class definition reconstruction • Function cloning • Modulo condition insertion

  26. Implementation To do: • Copying of the class definition • Copying of the class member functions • Reconstruction of the network structure • FIFO • Network definition

  27. Implementation Final result: • Data dependence information tells whether splitting is legal, i.e. whether it requires no interprocess communication (IPC) • Semi-automatic transformation/case study
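A minimal sketch of such a legality test, assuming dependences are summarized as constant distance vectors (sink iteration minus source iteration). `dep_t` and `split_without_ipc` are hypothetical names, not GCC's data-dependence API. With the modulo split at level L, source and sink of a dependence land in the same copy exactly when the distance at that level is a multiple of N; a dependence the analysis could not resolve is conservatively treated as requiring IPC.

```c
#include <assert.h>
#include <stdbool.h>

#define DEPTH 3  /* assumption: loop nests of depth at most 3 */

/* Hypothetical dependence summary: constant distance vector d; known ==
 * false models a dependence that could not be determined at compile time. */
typedef struct { int d[DEPTH]; bool known; } dep_t;

/* Splitting at loop level `level` (0-based) into n_split copies with the
 * modulo condition needs no interprocess communication if every
 * dependence keeps its source and sink in the same copy. */
bool split_without_ipc(const dep_t *deps, int n_deps, int level, int n_split)
{
    for (int k = 0; k < n_deps; k++) {
        if (!deps[k].known)
            return false;                  /* unknown distance: assume IPC */
        if (deps[k].d[level] % n_split != 0)
            return false;                  /* source and sink in different copies */
    }
    return true;
}
```

For the loop `a[i] = a[i-1] + i` the distance is 1 at the split level, so splitting it over two copies is reported as requiring IPC.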

  28. Results [Figure: original KPN, KPN with copy nodes, and KPN with processes split up into two] Improvement of 21%

  29. Future work: YAPI and CCP [Figure: Merge, Fork and Mesh connections in the split-up network of P1, P2’, P2’’, P3’, P3’’ and P4] • Difference between active and passive connectors: • Active connectors in YAPI are modeled as a thread • Passive connectors do not run in a separate thread • More connectors in CCP:

  30. Future Work • Connect GCC with SCOTTY: • GCC branch • Main branch: may not accept the patch • GOMP branch targets parallelization + data dependence + Network topology

  31. Conclusion • Only split-up the most computationally expensive processes • The transformation is profitable
