
Dynamic Scheduling and Dynamic Percolation





  1. Dynamic Scheduling and Dynamic Percolation Elkin Garcia CAPSL – UD Based on a presentation by Rishi Khan, ET International

  2. Motivation • Static Scheduling is not able to achieve maximum performance on a many-core architecture (C64) • This holds even for regular applications like matrix multiplication

  3. Issues of Static Scheduling • Blocks are not necessarily multiples of the Optimal Tile Size (OTS) • Extra overhead is incurred when processing non-optimally sized tiles • The problem is worse when many processors share a small, fixed amount of on-chip memory
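The remainder-tile problem on this slide can be illustrated with a small sketch. The helper `tile_sizes` below is hypothetical (not part of the C64 toolchain); it shows how a block edge that is not a multiple of the OTS always leaves a smaller, less efficient remainder tile:

```python
# Hypothetical illustration of slide 3: splitting one edge of a block into
# tiles of the optimal tile size (OTS) leaves a remainder tile whenever the
# edge length is not a multiple of the OTS.
def tile_sizes(n, ots):
    """Return the tile edge lengths produced along one dimension of length n."""
    full, rem = divmod(n, ots)
    return [ots] * full + ([rem] if rem else [])

# A 100-wide edge with OTS 6 yields 16 full tiles plus one 4-wide remainder
# tile, which wastes part of the tuned inner kernel's work.
print(tile_sizes(100, 6))
```

The remainder tile forces either a second, slower kernel variant or padding, which is the extra overhead the slide refers to.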

  4. Issues of Static Scheduling (2) • SS does not consider stalls due to arbitration of shared resources • SS assumes that Thread Units (TUs) doing the same amount of work will complete at the same time • Many-core architectures have many shared resources (FPUs, crossbar, memory, I-caches) that can produce unexpected stalls

  5. Dynamic Scheduling (DS) Balances load in the presence of shared resources with higher efficiency. • Only matrix C is partitioned into tiles • Atomic operations are used to claim tiles with low overhead [Diagram: C = A × B]
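The tile-claiming idea on this slide can be sketched as follows. This is an assumed illustration, not the C64 implementation: a lock-protected counter stands in for the hardware atomic fetch-and-add, and the worker body is a placeholder for computing one tile of C.

```python
import threading

# Sketch of dynamic scheduling via an atomic counter: each thread repeatedly
# claims the next unprocessed tile of C, so faster threads naturally take
# more tiles and stalls on shared resources are absorbed automatically.
class TileCounter:
    def __init__(self):
        self._lock = threading.Lock()
        self._next = 0

    def next_tile(self):
        with self._lock:          # models the hardware atomic fetch-and-add
            t = self._next
            self._next += 1
            return t

def worker(counter, num_tiles, done):
    while True:
        t = counter.next_tile()
        if t >= num_tiles:        # no tiles left: this thread retires
            return
        done.append(t)            # placeholder for computing tile t of C
```

Because tiles are claimed one at a time, no thread is assigned work it cannot finish promptly, which is the load-balancing advantage over static scheduling.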

  6. Results on Cyclops64

  7. Dynamic Percolation (DP) • Tasks (codelets) are assigned to TUs at runtime with low overhead using a lock-free queue: • Computation tasks: compute optimally sized 6x6 tiles • Data-movement tasks: move inputs and outputs between SRAM and DRAM using double buffering

  8. Codelets • A nonpreemptive unit of code that runs to completion once its dependencies/events are met • Dependencies/Events: • Data dependencies • Resource constraints (threads/bandwidth) • Desired behavior (power)
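The firing rule on this slide can be sketched with a dependency counter per codelet. This is an assumed dataflow-style representation, not the actual C64 runtime structures:

```python
# Sketch of a codelet: a nonpreemptive body plus a count of outstanding
# dependencies; it becomes schedulable only when that count reaches zero.
class Codelet:
    def __init__(self, name, deps, body):
        self.name = name
        self.deps = deps          # outstanding dependencies/events
        self.body = body          # runs to completion, never preempted
        self.waiters = []         # codelets signalled when this one finishes

    def signal(self, ready):
        self.deps -= 1            # one dependency/event satisfied
        if self.deps == 0:
            ready.append(self)    # all dependencies met: enqueue for execution

    def run(self, ready):
        self.body()               # nonpreemptive: runs start to finish
        for w in self.waiters:
            w.signal(ready)

log = []
a = Codelet("load", 0, lambda: log.append("load"))
b = Codelet("mult", 1, lambda: log.append("mult"))
a.waiters.append(b)               # mult depends on load
ready = [a]
while ready:
    ready.pop(0).run(ready)
# log is now ["load", "mult"]: mult fired only after load completed
```

Resource and power constraints would appear as additional events decrementing the same counter, which keeps the firing rule uniform.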

  9. Matrix Multiply in SRAM using Petri nets (size 192x192) [Petri-net diagram for C = A × B: place Comp1 holding 1024 tokens, INIT and Clean transitions, low-priority compute tasks]

  10. Matrix Multiply in DRAM [Petri-net diagram: INIT, Clean, and Done transitions; place Comp1 with 1024 compute tokens and place Copy1 with 8 copy tokens, at low priority]

  11. Double Buffer Computation Example (start state) [Petri-net diagram: two buffer subnets, each with INIT, Clean, and Done transitions; places Comp1/Copy1 and Comp2/Copy2 holding 1024 compute tokens and 8 copy tokens] Rules (repeated on slides 11–23): Always take the highest-priority task first. If two tasks have the same priority, take the task that was enabled first. Otherwise, choose arbitrarily.

  12. Double Buffer Computation Example [Petri-net diagram] High queue: Init Set 1. Low queue: empty.

  13. Double Buffer Computation Example [Petri-net diagram] High queue: Init Set 2. Low queue: Comp1(1024).

  14. Double Buffer Computation Example [Petri-net diagram] High queue: empty. Low queue: Comp1(1020), Comp2(1024).

  15. Double Buffer Computation Example [Petri-net diagram] High queue: Clean. Low queue: Comp2(1024).

  16. Double Buffer Computation Example [Petri-net diagram] High queue: Init Copy Set. Low queue: Comp2(1020).

  17. Double Buffer Computation Example [Petri-net diagram] High queue: Copy(8). Low queue: Comp2(1000).

  18. Double Buffer Computation Example [Petri-net diagram] High queue: Clean. Low queue: Comp2(500).

  19. Double Buffer Computation Example [Petri-net diagram] High queue: Done. Low queue: Comp2(495).

  20. Double Buffer Computation Example [Petri-net diagram] High queue: Init Set 1. Low queue: Comp2(490).

  21. Double Buffer Computation Example [Petri-net diagram] High queue: empty. Low queue: Comp1(1024), Comp2(480).

  22. Double Buffer Computation Example [Petri-net diagram] High queue: Clean. Low queue: Comp1(1024).

  23. Double Buffer Computation Example [Petri-net diagram] High queue: Init Copy Set 2. Low queue: Comp1(1020).
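The three scheduling rules walked through on these slides can be sketched with a priority heap where a sequence counter breaks ties by arrival order. The class name and task labels are illustrative, not from the C64 runtime:

```python
import heapq
import itertools

# Sketch of the slides' rules: always take the highest-priority enabled task;
# among equal priorities, take the one enabled first (FIFO). Smaller numbers
# mean higher priority; the monotonic counter breaks ties by arrival order.
class RuleQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def enable(self, priority, task):
        # The (priority, arrival) pair encodes rules 1 and 2; Python's heap
        # comparison on tuples applies them in that order.
        heapq.heappush(self._heap, (priority, next(self._seq), task))

    def take(self):
        return heapq.heappop(self._heap)[2]

q = RuleQueue()
q.enable(1, "Comp2")       # low priority, enabled first
q.enable(0, "Init Set 1")  # high priority, enabled later
q.enable(1, "Comp1")       # low priority, enabled last
# Taken in order: "Init Set 1", then "Comp2", then "Comp1"
```

Rule 3 ("otherwise, choose arbitrarily") never triggers here because the arrival counter makes every key unique; a real lock-free implementation could drop the counter and accept arbitrary ordering within a priority level.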

  24. The Cool Demo on MxM • Due to Rishi Khan (ETI) UHPC-Portland-Meeting-06-2009

  25. Final Results

  26. Scalability

  27. Summary • Static optimizations increase performance substantially • Dynamic Scheduling and Dynamic Percolation mitigate the unpredictable effects of resource sharing • The optimizations implemented are also power efficient

  28. Acknowledgements • Professor Guang Gao • ETI and CAPSL people who have helped on this project (Rishi Khan, Daniel Orozco, Kelly Livingston, Ioannis Venetis) • Members of CAPSL Tutorial Project Part 2
