
Dynamic Scheduling and Dynamic Percolation





  1. Dynamic Scheduling and Dynamic Percolation Elkin Garcia CAPSL – UD Based on a presentation by Rishi Khan, ET International

  2. Motivation • Static Scheduling is not able to achieve maximum performance on a many-core architecture (C64) • This holds even for regular applications like matrix multiplication

  3. Issues of Static Scheduling • Blocks are not necessarily multiples of the Optimal Tile Size (OTS) • Extra overhead is incurred when processing non-optimally sized tiles • The problem is worse when many processors share a small, fixed amount of on-chip memory
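The remainder-tile problem on this slide can be illustrated with a small sketch. The helper `tile_sizes` below is hypothetical (not part of the C64 toolchain); it shows how a block edge that is not a multiple of the OTS always leaves a smaller, less efficient remainder tile:

```python
# Hypothetical illustration of slide 3: splitting one edge of a block into
# tiles of the optimal tile size (OTS) leaves a remainder tile whenever the
# edge length is not a multiple of the OTS.
def tile_sizes(n, ots):
    """Return the tile edge lengths produced along one dimension of length n."""
    full, rem = divmod(n, ots)
    return [ots] * full + ([rem] if rem else [])

# A 100-wide edge with OTS 6 yields 16 full tiles plus one 4-wide remainder
# tile, which wastes part of the tuned inner kernel's work.
print(tile_sizes(100, 6))
```

The remainder tile forces either a second, slower kernel variant or padding, which is the extra overhead the slide refers to.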

  4. Issues of Static Scheduling (2) • SS does not consider stalls due to arbitration of shared resources • SS assumes that Thread Units (TUs) doing the same amount of work will complete at the same time • Many-core architectures have many shared resources (FPUs, crossbar, memory, I-caches) that can produce unexpected stalls

  5. Dynamic Scheduling (DS) Balances load in the presence of shared resources with higher efficiency. • Only matrix C is partitioned into tiles • Atomic operations are used to claim tiles with low overhead [Diagram: C = A × B]
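The tile-claiming idea on this slide can be sketched as follows. This is an assumed illustration, not the C64 implementation: a lock-protected counter stands in for the hardware atomic fetch-and-add, and the worker body is a placeholder for computing one tile of C.

```python
import threading

# Sketch of dynamic scheduling via an atomic counter: each thread repeatedly
# claims the next unprocessed tile of C, so faster threads naturally take
# more tiles and stalls on shared resources are absorbed automatically.
class TileCounter:
    def __init__(self):
        self._lock = threading.Lock()
        self._next = 0

    def next_tile(self):
        with self._lock:          # models the hardware atomic fetch-and-add
            t = self._next
            self._next += 1
            return t

def worker(counter, num_tiles, done):
    while True:
        t = counter.next_tile()
        if t >= num_tiles:        # no tiles left: this thread retires
            return
        done.append(t)            # placeholder for computing tile t of C
```

Because tiles are claimed one at a time, no thread is assigned work it cannot finish promptly, which is the load-balancing advantage over static scheduling.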

  6. Results on Cyclops64

  7. Dynamic Percolation (DP) • Tasks (codelets) are assigned to TUs at runtime with low overhead using a lock-free queue: • Computation tasks: compute optimally sized 6x6 tiles • Data-movement tasks: move inputs and outputs between SRAM and DRAM using double buffering

  8. Codelets • A nonpreemptive unit of code that runs to completion once its dependencies/events are met • Dependencies/Events: • Data dependencies • Resource constraints (threads/bandwidth) • Desired behavior (power)
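The firing rule on this slide can be sketched with a dependency counter per codelet. This is an assumed dataflow-style representation, not the actual C64 runtime structures:

```python
# Sketch of a codelet: a nonpreemptive body plus a count of outstanding
# dependencies; it becomes schedulable only when that count reaches zero.
class Codelet:
    def __init__(self, name, deps, body):
        self.name = name
        self.deps = deps          # outstanding dependencies/events
        self.body = body          # runs to completion, never preempted
        self.waiters = []         # codelets signalled when this one finishes

    def signal(self, ready):
        self.deps -= 1            # one dependency/event satisfied
        if self.deps == 0:
            ready.append(self)    # all dependencies met: enqueue for execution

    def run(self, ready):
        self.body()               # nonpreemptive: runs start to finish
        for w in self.waiters:
            w.signal(ready)

log = []
a = Codelet("load", 0, lambda: log.append("load"))
b = Codelet("mult", 1, lambda: log.append("mult"))
a.waiters.append(b)               # mult depends on load
ready = [a]
while ready:
    ready.pop(0).run(ready)
# log is now ["load", "mult"]: mult fired only after load completed
```

Resource and power constraints would appear as additional events decrementing the same counter, which keeps the firing rule uniform.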

  9. Matrix Multiply in SRAM using Petri nets (size 192x192) [Petri-net diagram for C = A × B: place Comp1 holding 1024 tokens, INIT and Clean transitions, low-priority compute tasks]

  10. Matrix Multiply in DRAM [Petri-net diagram: INIT, Clean, and Done transitions; place Comp1 with 1024 compute tokens and place Copy1 with 8 copy tokens, at low priority]

  11. Double Buffer Computation Example (start state) [Petri-net diagram: two buffer subnets, each with INIT, Clean, and Done transitions; places Comp1/Copy1 and Comp2/Copy2 holding 1024 compute tokens and 8 copy tokens] Rules (repeated on slides 11–23): Always take the highest-priority task first. If two tasks have the same priority, take the task that was enabled first. Otherwise, choose arbitrarily.

  12. Double Buffer Computation Example [Petri-net diagram] High queue: Init Set 1. Low queue: empty.

  13. Double Buffer Computation Example [Petri-net diagram] High queue: Init Set 2. Low queue: Comp1(1024).

  14. Double Buffer Computation Example [Petri-net diagram] High queue: empty. Low queue: Comp1(1020), Comp2(1024).

  15. Double Buffer Computation Example [Petri-net diagram] High queue: Clean. Low queue: Comp2(1024).

  16. Double Buffer Computation Example [Petri-net diagram] High queue: Init Copy Set. Low queue: Comp2(1020).

  17. Double Buffer Computation Example [Petri-net diagram] High queue: Copy(8). Low queue: Comp2(1000).

  18. Double Buffer Computation Example [Petri-net diagram] High queue: Clean. Low queue: Comp2(500).

  19. Double Buffer Computation Example [Petri-net diagram] High queue: Done. Low queue: Comp2(495).

  20. Double Buffer Computation Example [Petri-net diagram] High queue: Init Set 1. Low queue: Comp2(490).

  21. Double Buffer Computation Example [Petri-net diagram] High queue: empty. Low queue: Comp1(1024), Comp2(480).

  22. Double Buffer Computation Example [Petri-net diagram] High queue: Clean. Low queue: Comp1(1024).

  23. Double Buffer Computation Example [Petri-net diagram] High queue: Init Copy Set 2. Low queue: Comp1(1020).
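The three scheduling rules walked through on these slides can be sketched with a priority heap where a sequence counter breaks ties by arrival order. The class name and task labels are illustrative, not from the C64 runtime:

```python
import heapq
import itertools

# Sketch of the slides' rules: always take the highest-priority enabled task;
# among equal priorities, take the one enabled first (FIFO). Smaller numbers
# mean higher priority; the monotonic counter breaks ties by arrival order.
class RuleQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def enable(self, priority, task):
        # The (priority, arrival) pair encodes rules 1 and 2; Python's heap
        # comparison on tuples applies them in that order.
        heapq.heappush(self._heap, (priority, next(self._seq), task))

    def take(self):
        return heapq.heappop(self._heap)[2]

q = RuleQueue()
q.enable(1, "Comp2")       # low priority, enabled first
q.enable(0, "Init Set 1")  # high priority, enabled later
q.enable(1, "Comp1")       # low priority, enabled last
# Taken in order: "Init Set 1", then "Comp2", then "Comp1"
```

Rule 3 ("otherwise, choose arbitrarily") never triggers here because the arrival counter makes every key unique; a real lock-free implementation could drop the counter and accept arbitrary ordering within a priority level.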

  24. The Cool Demo on MxM • Due to Rishi Khan (ETI) UHPC-Portland-Meeting-06-2009

  25. Final Results

  26. Scalability

  27. Summary • Static optimizations increase performance substantially • Dynamic Scheduling and Dynamic Percolation mitigate the unpredictable effects of resource sharing • The optimizations implemented are also power efficient

  28. Acknowledgements • Professor Guang Gao • ETI and CAPSL people who have helped on this project (Rishi Khan, Daniel Orozco, Kelly Livingston, Ioannis Venetis) • Members of CAPSL Tutorial Project Part 2
