Low Power Hardware Synthesis from Concurrent Action Oriented Specifications (CAOS)

Presentation Transcript


  1. Low Power Hardware Synthesis from Concurrent Action Oriented Specifications (CAOS). Sandeep K. Shukla, Gaurav Singh. FERMAT Lab, Virginia Tech.

  2. Outline • CAOS Scheduling Problem: Complexity Analysis • Peak Power Problem: Complexity Analysis; Technique: Rescheduling (suppressing actions) • Dynamic Power Problem: Complexity Analysis; Techniques: Rescheduling, Operand Isolation, Clock Gating, Gated Guards. FERMAT / Virginia Tech

  3. CAOS Scheduling Problem (Complexity Analysis)

  4. SCHEDULING PROBLEMS WITHOUT A PEAK POWER CONSTRAINT • Maximum Non-conflicting Subset of actions (MNS): choosing actions that can execute together in one clock cycle. • Minimum Length Schedule Construction (MLS): distributing actions over multiple clock cycles.

  5. MAXIMUM NON-CONFLICTING SUBSET OF ACTIONS (MNS) Instance - A set A = {a1, a2, …, an} of enabled actions; a collection C of pairs of actions, where {ai, aj} ∈ C means that actions ai and aj conflict; an integer K ≤ n. Question - Is there a subset A' ⊆ A such that |A'| ≥ K and no pair of actions in A' conflicts? • The MNS problem is NP-complete. • It corresponds to the Maximum Independent Set (MIS) problem.

  6. MAXIMUM NON-CONFLICTING SUBSET OF ACTIONS (MNS) NOTE - For any ρ ≥ 1, a ρ-approximation algorithm for a combinatorial optimization problem is a heuristic that produces a solution whose value is within a factor ρ of the optimal value. • It is known that for any ε > 0, there is no O(n^(1-ε))-approximation algorithm for the MIS problem unless P = NP. • The same holds for the MNS problem.

  7. MAXIMUM NON-CONFLICTING SUBSET OF ACTIONS (MNS) SOLUTION - Heuristics with good performance guarantees can be devised by exploiting the relationship between the MNS and MIS problems. • SPECIAL CASES - • Each action conflicts with at most Δ other actions, for some constant Δ: an approximation algorithm exists with a performance guarantee of Δ + 1. • The conflict graph is planar, near-planar, or a unit disk graph: efficient approximation algorithms are known for these classes of graphs.
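
The bounded-conflict special case can be illustrated with a small greedy sketch. This is not necessarily the algorithm the authors had in mind; it is the textbook greedy independent-set heuristic, and the function name and the action labels are hypothetical. Each chosen action rules out at most Δ + 1 actions (itself and its conflicting neighbours), which is where the Δ + 1 guarantee comes from.

```python
from typing import Iterable


def greedy_non_conflicting(actions: list[str],
                           conflicts: Iterable[tuple[str, str]]) -> list[str]:
    """Return a maximal set of pairwise non-conflicting actions (greedy heuristic)."""
    neighbours: dict[str, set[str]] = {a: set() for a in actions}
    for a, b in conflicts:
        neighbours[a].add(b)
        neighbours[b].add(a)

    chosen: list[str] = []
    blocked: set[str] = set()
    # Visiting low-conflict actions first is a common heuristic choice;
    # the (delta + 1) bound holds for any visiting order.
    for a in sorted(actions, key=lambda x: len(neighbours[x])):
        if a not in blocked:
            chosen.append(a)
            blocked.add(a)
            blocked |= neighbours[a]  # neighbours can no longer be chosen
    return chosen


# Hypothetical example: five actions whose conflicts form a chain.
chain = [("a1", "a2"), ("a2", "a3"), ("a3", "a4"), ("a4", "a5")]
print(greedy_non_conflicting(["a1", "a2", "a3", "a4", "a5"], chain))
```

On this chain the sketch selects a1, a5 and a3, which happens to be optimal; in general only the Δ + 1 bound is guaranteed.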

  8. MINIMUM LENGTH SCHEDULE CONSTRUCTION (MLS) Instance - A set A = {a1, a2, …, an} of actions; a collection C of pairs of actions, where {ai, aj} ∈ C means that actions ai and aj conflict; an integer t ≤ n. Question - Is there a partition of A into r subsets A1, A2, …, Ar for some r ≤ t such that for each i, 1 ≤ i ≤ r, the actions in Ai are pairwise non-conflicting? • The MLS problem is NP-complete. • It corresponds to the Minimum K-coloring (MINCOLOR) problem.

  9. MINIMUM LENGTH SCHEDULE CONSTRUCTION (MLS) • It is known that for any ε > 0, there is no O(n^(1-ε))-approximation algorithm for the MINCOLOR problem unless P = NP. • The same holds for the MLS problem.

  10. MINIMUM LENGTH SCHEDULE CONSTRUCTION (MLS) SOLUTION - Heuristics for graph coloring can be used to construct schedules of near-minimum length. • SPECIAL CASES - • The schedule length is bounded by two: this corresponds to deciding whether the conflict graph is 2-colorable, for which efficient algorithms are known. • Each action conflicts with at most Δ other actions: for such instances, a schedule of length at most Δ + 1 can be constructed in polynomial time.
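
A minimal sketch of the Δ + 1 bound, assuming the standard greedy graph-coloring heuristic (the slides do not say which heuristic is used; the names below are hypothetical). Each action goes into the earliest clock cycle that holds none of its conflicting actions. An action has at most Δ conflicts, so one of the first Δ + 1 cycles is always free.

```python
from typing import Iterable


def greedy_schedule(actions: list[str],
                    conflicts: Iterable[tuple[str, str]]) -> list[list[str]]:
    """Partition actions into clock cycles so that no cycle holds a conflicting pair."""
    neighbours: dict[str, set[str]] = {a: set() for a in actions}
    for a, b in conflicts:
        neighbours[a].add(b)
        neighbours[b].add(a)

    slot_of: dict[str, int] = {}
    for a in actions:
        # Cycles already taken by actions that conflict with `a`.
        used = {slot_of[b] for b in neighbours[a] if b in slot_of}
        slot = 0
        while slot in used:
            slot += 1
        slot_of[a] = slot

    n_slots = max(slot_of.values(), default=-1) + 1
    schedule: list[list[str]] = [[] for _ in range(n_slots)]
    for a in actions:
        schedule[slot_of[a]].append(a)
    return schedule


# Hypothetical example: a1, a2, a3 conflict pairwise, and a4 conflicts with a3.
pairs = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"), ("a3", "a4")]
print(greedy_schedule(["a1", "a2", "a3", "a4"], pairs))
```

Here Δ = 3 (action a3) and the sketch uses three cycles, within the bound of Δ + 1 = 4.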

  11. PEAK POWER PROBLEM (Complexity Analysis)

  12. SCHEDULING PROBLEMS INVOLVING A POWER CONSTRAINT Single Clock Cycle - • Maximum Number of Actions in a Time Slot Subject to Peak Power Constraint (MNA-PP). • Maximizing Utility Subject to Peak Power Constraint (MU-PP).

  13. Maximum Number of Actions in a Time Slot Subject to Peak Power Constraint (MNA-PP) Instance - • a set A = {a1, a2, …, an} of non-conflicting actions, • for each action ai, the power pi needed to execute that action, • a positive number P representing the peak power constraint. Requirement - Find a subset A' ⊆ A such that • the total power needed to execute the actions in A' is at most P and • |A'| is maximum over all subsets of A that satisfy the peak power constraint. Optimal Solution - • Sort the actions in A into non-decreasing order of power. • Keep adding actions in that order as long as the peak power constraint remains satisfied.
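
The optimal solution described on this slide can be sketched directly. The function name, the dictionary representation and the power values are assumptions made for illustration.

```python
def max_actions_under_peak(power: dict[str, float], peak: float) -> list[str]:
    """Select the largest set of actions whose total power stays within `peak`."""
    chosen: list[str] = []
    total = 0.0
    # Cheapest actions first (non-decreasing power).
    for name, p in sorted(power.items(), key=lambda kv: kv[1]):
        if total + p > peak:
            break  # every remaining action needs at least as much power
        chosen.append(name)
        total += p
    return chosen


# Hypothetical power figures in arbitrary units.
print(max_actions_under_peak({"a1": 5, "a2": 2, "a3": 4, "a4": 1}, peak=7))
```

Stopping at the first action that does not fit is safe because every later action in the sorted order needs at least as much power.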

  14. Maximizing Utility Subject to Peak Power Constraint (MU-PP) Instance - • a set A = {a1, a2, …, an} of non-conflicting actions, • for each action ai, the power pi it consumes and its utility ui, • a positive number P representing the peak power, • a positive number Γ representing the required utility. Question - Is there a subset A' ⊆ A such that the total power needed to execute all the actions in A' is at most P and the utility of A' is at least Γ? • The MU-PP problem is NP-complete. • It corresponds to the KNAPSACK problem.

  15. Maximizing Utility Subject to Peak Power Constraint (MU-PP) • Any approximation algorithm for the KNAPSACK problem can be used, with the same performance guarantee, as an approximation algorithm for the optimization version of MU-PP. • When the weights and profits are integers, there is a polynomial-time approximation scheme (PTAS) for the KNAPSACK problem.
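
To make the KNAPSACK correspondence concrete, here is a sketch of the classic 0/1 knapsack dynamic program applied to MU-PP. It is an exact pseudo-polynomial method, not the approximation scheme mentioned above, and it assumes integer power values. All names and numbers are hypothetical.

```python
def best_utility(actions: list[tuple[str, int, int]], peak: int) -> tuple[int, list[str]]:
    """0/1 knapsack DP: `actions` holds (name, power, utility); returns (utility, names)."""
    # best[p] = best (utility, chosen names) using total power at most p
    best: list[tuple[int, list[str]]] = [(0, [])] * (peak + 1)
    for name, power, utility in actions:
        # Go downwards so that each action is used at most once.
        for p in range(peak, power - 1, -1):
            candidate = best[p - power][0] + utility
            if candidate > best[p][0]:
                best[p] = (candidate, best[p - power][1] + [name])
    return best[peak]


# Hypothetical instance: peak power P = 6.
utility, chosen = best_utility([("a1", 4, 10), ("a2", 3, 7), ("a3", 3, 6)], peak=6)
print(utility, chosen)
```

The decision version on the previous slide then amounts to checking whether the returned utility is at least Γ.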

  16. SCHEDULING PROBLEMS INVOLVING A POWER CONSTRAINT Multiple Clock Cycles - • Minimizing Makespan Subject to Peak Power Constraint (MM-PP). • Minimizing Peak Power Subject to Makespan Constraint (MPP-M). • Minimizing Makespan and Peak Power, Decision Version (MPP-DECISION).

  17. Minimizing Makespan Subject to Peak Power Constraint (MM-PP) Instance - • a set A = {a1, a2, …, an} of non-conflicting actions, • for each action ai, the power pi needed to execute that action, • a positive number P representing the peak power. Requirement - Find a schedule of minimum length for the actions in A such that the total power needed to execute the actions in each time slot is at most P.

  18. Minimizing Peak Power Subject to a Makespan Constraint (MPP-M) Instance - • a set A = {a1, a2, …, an} of non-conflicting actions, • for each action ai, the power pi needed to execute that action, • a positive number L representing the makespan (the number of time slots used by a schedule). Requirement - Find a schedule of length at most L for the actions in A such that the maximum total power used in any time slot is minimum over all schedules of length at most L. NOTE - MPP-M is the dual of MM-PP.

  19. Minimizing Makespan and Peak Power (MPP-DECISION): decision version of MM-PP and MPP-M. Instance - • a set A = {a1, a2, …, an} of non-conflicting actions, • for each action ai, the power pi needed to execute that action, • a positive number P representing the peak power, • a positive number L representing the makespan. Question - Is there a schedule of length at most L for the actions in A such that the total power used in any time slot is at most P? • The MPP-DECISION problem is strongly NP-complete. • It corresponds to the 3-PARTITION problem. • There is no pseudo-polynomial-time algorithm for MPP-DECISION unless P = NP.

  20. Approximation Algorithms for MM-PP • Efficient approximation algorithms are possible by reducing the problem to the well-known BIN PACKING problem: each time slot is a bin of capacity P, and each action is an item of size pi. • Example - The simple First Fit Decreasing (FFD) algorithm provides an asymptotic performance guarantee of 11/9. • Sort the items in non-increasing order of size, then assign each item to the first bin in which it fits. • A more sophisticated implementation reduces the running time to O(n log n).
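
A straightforward sketch of First Fit Decreasing for MM-PP, with time slots as bins of capacity P. This is the simple quadratic-time version, not the O(n log n) implementation mentioned above; the names and power values are hypothetical.

```python
def first_fit_decreasing(power: dict[str, int], peak: int) -> list[list[str]]:
    """Pack actions into as few time slots as FFD manages, each slot using at most `peak`."""
    slots: list[list] = []  # each entry is [remaining power budget, [action names]]
    # Largest power first (non-increasing order).
    for name, p in sorted(power.items(), key=lambda kv: kv[1], reverse=True):
        if p > peak:
            raise ValueError(f"action {name} alone exceeds the peak power constraint")
        for slot in slots:
            if slot[0] >= p:  # first slot with enough budget left
                slot[0] -= p
                slot[1].append(name)
                break
        else:
            slots.append([peak - p, [name]])  # open a new time slot
    return [names for _, names in slots]


# Hypothetical power figures, peak power P = 8.
print(first_fit_decreasing({"a1": 5, "a2": 4, "a3": 3, "a4": 2, "a5": 2}, peak=8))
```

For these values the sketch produces two slots, each using exactly 8 units, so the schedule length is optimal in this instance.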

  21. Approximation Algorithms for MPP-M • Efficient approximation algorithms are possible by reducing the problem to the classical multiprocessor scheduling problem. • Example - A 4/3-approximation algorithm: • Sort the actions in non-increasing order of their power requirements. • Assign each action, in that order, to the time slot whose total power so far is smallest. • This can be implemented to run in O(n log n) time.
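
The two steps above match the classical longest-processing-time-first rule from multiprocessor scheduling. A minimal sketch using a binary heap follows; the function name and the power values are assumptions.

```python
import heapq


def balance_peak_power(power: dict[str, int], length: int) -> tuple[int, list[list[str]]]:
    """Spread actions over `length` time slots, keeping the largest slot total small."""
    heap = [(0, i) for i in range(length)]  # (power used so far, slot index)
    heapq.heapify(heap)
    slots: list[list[str]] = [[] for _ in range(length)]
    # Largest power first, always into the currently least-loaded slot.
    for name, p in sorted(power.items(), key=lambda kv: kv[1], reverse=True):
        load, i = heapq.heappop(heap)
        slots[i].append(name)
        heapq.heappush(heap, (load + p, i))
    peak = max(load for load, _ in heap)
    return peak, slots


# Hypothetical power figures, makespan L = 2.
print(balance_peak_power({"a1": 5, "a2": 4, "a3": 3, "a4": 2, "a5": 2}, length=2))
```

On this instance the heuristic reaches a peak of 9 while the optimum is 8 (5 + 3 and 4 + 2 + 2), which is consistent with the 4/3 guarantee.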

  22. LOW PEAK POWER TECHNIQUE Re-scheduling - Suppress some actions in each cycle to reduce the peak power of the design. Possible Ways - • Conflict-based: add extra conflicts between actions purely to limit peak power. • Memory-based: use a memory to select which actions execute in each cycle.

  23. MEMORY-BASED LOW PEAK POWER TECHNIQUE ALGORITHM - • Arrange the actions according to their TRS ordering. • Find the combinations of non-conflicting actions that would violate the peak power constraint if executed concurrently. • For each violating combination: • find a satisfying combination by suppressing some actions, • give priority to actions that come earlier in the TRS ordering, • store the satisfying combination in a memory. • In hardware, the memory is used to select the appropriate actions in each clock cycle so that the peak power constraint is satisfied.
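
One possible reading of this algorithm is sketched below. The slides give no implementation details, so the table layout, the tie-breaking rule and every name are assumptions; in particular, the list `order` merely stands in for the TRS ordering.

```python
from itertools import combinations


def build_suppression_table(order: list[str], power: dict[str, int],
                            conflicts: list[tuple[str, str]],
                            peak: int) -> dict[tuple[str, ...], tuple[str, ...]]:
    """Map each power-violating combination of actions to a satisfying sub-combination."""
    conflict_set = {frozenset(c) for c in conflicts}
    table: dict[tuple[str, ...], tuple[str, ...]] = {}
    for size in range(1, len(order) + 1):
        for combo in combinations(order, size):  # keeps the priority order
            if any(frozenset(pair) in conflict_set for pair in combinations(combo, 2)):
                continue  # conflicting actions never execute together anyway
            if sum(power[a] for a in combo) <= peak:
                continue  # already satisfies the peak power constraint
            kept: list[str] = []
            total = 0
            for a in combo:  # earlier actions get priority
                if total + power[a] <= peak:
                    kept.append(a)
                    total += power[a]
            table[combo] = tuple(kept)
    return table


# Hypothetical design: three non-conflicting rules, peak power P = 6.
print(build_suppression_table(["r1", "r2", "r3"], {"r1": 4, "r2": 3, "r3": 2}, [], peak=6))
```

The enumeration is exponential in the number of actions, which is consistent with the remark later in the deck that designs with many actions may need a large memory.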

  24. MEMORY-BASED LOW PEAK POWER TECHNIQUE Implemented in the Bluespec Compiler - • Around 10% peak-power savings were achieved for small designs such as a vending machine. • Larger savings may be possible for larger designs. • Experiments are ongoing.

  25. MEMORY-BASED LOW PEAK POWER TECHNIQUE LIMITATIONS - • Designs written under the assumption that the maximum number of actions executes in each clock cycle might not be able to use this technique. • It increases latency, so it is applicable mostly to latency-insensitive designs. • Designs with a large number of actions may require a large memory.

  26. DYNAMIC POWER PROBLEM (Complexity Analysis)

  27. DYNAMIC POWER PROBLEM (DPP) Instance - • a set A = {a1, a2, …, an} of actions, • a positive integer P representing the dynamic power consumed. Requirement - Select the order of execution of the actions in A such that P is minimized. • DPP is NP-complete. • It corresponds to the Traveling Salesman Problem, which is a sub-problem of DPP.

  28. LOW DYNAMIC POWER TECHNIQUES • Re-scheduling. • Operand Isolation. • Clock Gating. • Gated Guards.

  29. RE-SCHEDULING • Actions can be re-scheduled so that switching at the inputs of the functional units is minimized. • Resource sharing: conflicts can be created so that actions performing the same operations on the same operands share the same functional units.
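
As a rough illustration of ordering actions to reduce input switching, the sketch below uses a deliberately simplified model that is not taken from the slides: each action drives one operand word onto a shared functional-unit input, and the switching cost between consecutive actions is the Hamming distance between those words. A nearest-neighbour tour heuristic, familiar from the Traveling Salesman Problem, then picks the order. All names and bit patterns are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two operand words differ."""
    return bin(a ^ b).count("1")


def order_by_switching(operands: dict[str, int], start: str) -> list[str]:
    """Nearest-neighbour heuristic: always run next the action that toggles fewest bits."""
    remaining = dict(operands)
    order = [start]
    current = remaining.pop(start)
    while remaining:
        nxt = min(remaining, key=lambda name: hamming(current, remaining[name]))
        order.append(nxt)
        current = remaining.pop(nxt)
    return order


# Hypothetical 4-bit operand patterns.
words = {"a1": 0b0000, "a2": 0b1111, "a3": 0b0001, "a4": 0b0111}
print(order_by_switching(words, start="a1"))
```

Nearest-neighbour carries no optimality guarantee; it only shows how an ordering can be driven by a switching-cost estimate.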

  30. OPERAND ISOLATION • Operand isolation: the computation corresponding to the body of an action is allowed only when its output is used in the present clock cycle. • It involves: • insertion of gates at the appropriate points without affecting the guards, • selection of the activation signal: the guards of the actions are used as gating signals. • The algorithm implemented in the Bluespec Compiler saved up to 25% dynamic power.

  31. OPERAND ISOLATION - SINGLE ACTION action foo (… cond … (x < y) …); x <= x + z … endrule • Computations stay quiescent except when the action executes, i.e. when its guard is True. [Figure: datapath of action foo. Labels: inputs x, y, z; next-state values x', y', z'; Φ2; body logic (current state to next state); state register (D, Q, EN); cond logic producing the enable signals.]

  32. OPERAND ISOLATION - MULTIPLE ACTIONS Isolating multiple actions of a design. [Figure: block diagram. Labels: Rule1 … RuleN; Cond1 … CondN; Scheduler; Rule Control; Data Select; Action1 … ActionN (Φ2 … ΦN); State register (D, Q, Enable).]

  33. REGISTER CLOCK GATING • Register clock gating: registers that have a common ENABLE signal can be given the same gated clock. • In CAOS, registers updated by the same set of actions can be given the same gated clock. • The algorithm implemented in the Bluespec Compiler saved up to 45% dynamic power.

  34. REGISTER CLOCK GATING In CAOS, the guards of the actions provide the control for gating the clocks of the registers. [Figure: clock-gating circuit. Labels: CLK, EN, GATED_CLK; Register with DIN and QOUT.]
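
The grouping rule "registers updated by the same set of actions share a gated clock" can be sketched as a simple partition. The input format and all names are hypothetical; deriving each group's enable from the guards of its writing actions is an assumption about how the gating signal would be formed.

```python
from collections import defaultdict


def clock_gating_groups(writers: dict[str, set[str]]) -> dict[frozenset[str], list[str]]:
    """Group registers by the exact set of actions that update them."""
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for register, actions in writers.items():
        groups[frozenset(actions)].append(register)
    return dict(groups)


# Hypothetical design: which rules write which registers.
writers = {"a": {"r1", "r2"}, "b": {"r2", "r1"}, "c": {"r3"}}
for actions, registers in clock_gating_groups(writers).items():
    # Assumption: a group's clock is enabled when any of its writing actions fires.
    print(sorted(actions), "->", registers)
```

Registers a and b end up in one group and could share a single gated clock, while c needs its own.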

  35. GATED GUARDS • To save power, the hardware should compute only the guards that are actually required in each clock cycle. • Static analysis can determine which guards need to be computed.

  36. Gated Guards • Rule 1: (x > y) && (y != 0) --> (x = y; y = x;) • Rule 2: (x <= y) && (y != 0) --> (y = y - x;) • Rule 3: (y == 0) --> (result = x;) Let P = (x > y) and Q = (y == 0). Then g1: P && !Q; g2: !P && !Q; g3: Q. Hence the guards are mutually exclusive: g1 && g2 = false; g1 && g3 = false; g2 && g3 = false.
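
The mutual exclusion of the three guards can be checked by enumerating the four truth assignments of P and Q. This is only a brute-force sanity check, not the static analysis the slides refer to; the function names are hypothetical.

```python
from itertools import product


def guards(p: bool, q: bool) -> tuple[bool, bool, bool]:
    """Guards g1, g2, g3 written in terms of P = (x > y) and Q = (y == 0)."""
    return (p and not q, (not p) and not q, q)


def mutually_exclusive() -> bool:
    """True if at most one guard holds for every truth assignment of P and Q."""
    return all(sum(guards(p, q)) <= 1 for p, q in product([False, True], repeat=2))


print(mutually_exclusive())
```

In fact exactly one guard holds for every assignment, so some rule is always enabled.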

  37. Gated Guards What else can we infer? From (x > y), (y != 0), (x' == y), (y' == x), we can conclude ((x' <= y') && (y' != 0)) OR (y' == 0). So after Rule 1 executes, we know for sure that G1 cannot be true in the next cycle, but G2 or G3 may be true; hence G1 need not be evaluated. Also, prioritize G3.

  38. Gated Guard • GCD(70, 42) • x = 70, y = 42 --> Rule 1 • x = 42, y = 70 --> Rule 2 • x = 42, y = 28 --> Rule 1 • x = 28, y = 42 --> Rule 2 • x = 28, y = 14 --> Rule 1 • x = 14, y = 28 --> Rule 2 • x = 14, y = 14 --> Rule 2 • x = 14, y = 0 --> Rule 3 • result = 14
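
The trace above can be reproduced with a small simulation of the three rules. The sketch assumes that the two assignments in Rule 1 happen simultaneously (a swap), which is what the trace implies, and that x is positive and y non-negative; the function name is hypothetical.

```python
def gcd_trace(x: int, y: int) -> tuple[int, list[tuple[int, int, int]]]:
    """Run the three GCD rules; return the result and a list of (x, y, rule fired)."""
    if x <= 0 or y < 0:
        raise ValueError("this sketch assumes x > 0 and y >= 0")
    trace: list[tuple[int, int, int]] = []
    while True:
        if x > y and y != 0:       # Rule 1: swap x and y
            trace.append((x, y, 1))
            x, y = y, x
        elif x <= y and y != 0:    # Rule 2: subtract
            trace.append((x, y, 2))
            y = y - x
        else:                      # Rule 3: y == 0, deliver the result
            trace.append((x, y, 3))
            return x, trace


result, trace = gcd_trace(70, 42)
for x, y, rule in trace:
    print(f"x = {x}, y = {y} --> Rule {rule}")
print("result =", result)
```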

  39. Gated Guard • Use a flip-flop (F/F) that is set to 1 when Rule 1 fires and reset to 0 when any other rule fires. • If this F/F holds the value 1, evaluate only G3 and then G2. • Unless Rule 1 fires, this F/F stays at 0, and hence it can be clock gated most of the time. • This example may not be very useful, since the guards are simple to evaluate, but guard calculus on complex guards can lead to savings.
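
To get a feel for the saving, the sketch below counts guard evaluations for the GCD example with and without the flip-flop. The cost model is an assumption made for illustration: without gating, all three guards are evaluated every cycle; with gating, a cycle that follows Rule 1 evaluates G3 first and G2 only if G3 is false.

```python
def count_guard_evaluations(x: int, y: int) -> tuple[int, int]:
    """Return (evaluations without gating, evaluations with the Rule 1 flip-flop)."""
    if x <= 0 or y < 0:
        raise ValueError("this sketch assumes x > 0 and y >= 0")
    baseline = gated = 0
    ff = 0  # 1 exactly when Rule 1 fired in the previous cycle
    while True:
        g1 = x > y and y != 0
        g2 = x <= y and y != 0
        g3 = y == 0
        baseline += 3
        if ff:
            gated += 1 if g3 else 2  # G1 skipped; G3 checked before G2
        else:
            gated += 3
        if g1:
            x, y = y, x
            ff = 1
        elif g2:
            y = y - x
            ff = 0
        else:
            return baseline, gated


print(count_guard_evaluations(70, 42))
```

For GCD(70, 42) this model gives 24 evaluations without gating and 21 with it, a small gain that fits the slide's remark that the benefit shows up mainly with complex guards.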

  40. GATED GUARDS • Theorem-proving techniques can be used to make such deductions. • The same analysis can be applied to more complicated designs. • A memory in hardware can store the information about which guards need not be computed in the present clock cycle.

  41. Thank You !! ?
