290 likes | 420 Vues
“Rate-Optimal” Resource-Constrained Software Pipelining. Objectives. Establish good bounds Help compiler writers Help architects Study pragmatic issues when implemented as an option of compilers Study its payoffs. A Motivating Example. for i = 0 to n do
E N D
“Rate-Optimal” Resource-Constrained Software Pipelining \course\cpeg421-08s\Topic-7a.ppt
Objectives • Establish good bounds • Help compiler writers • Help architects • Study pragmatic issues when implemented as an option of compilers • Study its payoffs \course\cpeg421-08s\Topic-7a.ppt
A Motivating Example for i = 0 to n do 0: a [i] = X + d [i - 2]; 1: b [i] = a [i] * F + f [i - 2] + e [i - 2]; 2: c [i] = Y - b [i]; 3: d [i] = 2 * c [i]; 4: e [i] = X - b [i]; 5: f [i] = Y + b [i] end \course\cpeg421-08s\Topic-7a.ppt
2 L: for ( i = 0; i < n; i ++) { 3 0 S : a [i] = X + d [i - 2]; 0 S : b [i] = a [i] * F + f [i - 2] + e [i - 2]; 1 1 2 S : c [i] = Y - b [i]; 2 S : d [i] = 2 * c [i]; 2 2 3 S : e [i] = X - b [i]; 4 4 5 S : f [i] = Y + b [i] 5 } Data Dependence Graph Program Representation Assume all statements have a delay = 1 \course\cpeg421-08s\Topic-7a.ppt
Question • How fast can L run (I.e.optimal computation rate) without resource constraints? • How many FUs it needs minimally to achieve the optimal rate? \course\cpeg421-08s\Topic-7a.ppt
Schedule A: t (i, Sj) = 2i + tsj where: ts0 = 0 ts1 = 1 ts2 = ts4 = ts5 = 2 ts3 = 3 iter #1 #2 #3 . . . 0 S 0 1 S 1 2 S , S , S S 2 4 5 0 3 S S 3 1 4 S , S , S S 2 4 5 0 5 S S 3 1 6 S , S , S 2 4 5 7 S . . . 3 Schedule A : # of FUs = 4 \course\cpeg421-08s\Topic-7a.ppt
Can we do better? \course\cpeg421-08s\Topic-7a.ppt
Schedule B : t (i, Sj) = 2i + tsj where: ts0 = 0 ts1 = 1 ts2 = ts4 = 2 ts3 = ts5 = 3 iter #1 #2 #3 . . . 0 S 0 1 S 1 2 S , S S 2 4 0 3 S S S 3, 5 1 4 S , S S 2 4 0 5 S S S 3, 5 1 6 S , S 2 4 7 S S . . . 3, 5 Schedule B : # of FUs = 3 \course\cpeg421-08s\Topic-7a.ppt
Problem Statements Assumptions: homogeneous function units Problem 0: Given a loop L, determine a “rate-optimal” schedule for L which uses minimum # of FUs. Problem I (FIxed Rate SofTware Pipelining with minimum Resource - FIRST): Given a loop L and a fixed initiation rate determine a schedule for L which uses minimum # of FUs. Problem II (REsource Constrained SofTware Pipelining - REST) Given a loop L and the # of Fus, determine an optimal schedule which can run under the given resource constraints. \course\cpeg421-08s\Topic-7a.ppt
Problem Formulation of FIRST How to formulate resource constraints into a linear form? \course\cpeg421-08s\Topic-7a.ppt
R = Max (width) Min R • Subject to: • all dependence constraints • R is the maximum width • for a fixed rate (or period) The SWP kernel: --- The “frustum” \course\cpeg421-08s\Topic-7a.ppt
i time 0 1 2 . . . N-1 r 0 1 1 1 2 1 3 1 . . . II-1 1 Min (max ari) R Frustum A Contribution of node i to step r’s resource requirement Now R = max ari r [ 0, II - 1] \course\cpeg421-08s\Topic-7a.ppt
FIRST: A Linear Programming Formulation Example 0 1 2 3 4 5 a00 a01 a02 a03 a04 a05 a10 a11 a12 a13 a14 a15 So for node I: ti = ki II + (aoi , a1i ) * 0 1 Where ari = 1 means node i start at step r = 0 otherwise \course\cpeg421-08s\Topic-7a.ppt
FIRST: A Linear Programming Formulation Minimize R Subject to R: Max T T = T Periodic Dependence Constraints and are integers \course\cpeg421-08s\Topic-7a.ppt
Example Revisited Min R Subjected to: - + + + + + ³ R ( a a a a a a ) 0 00 01 02 03 04 05 - + + + + + ³ R ( a a a a a a ) 0 10 11 12 13 14 15 × + × + × = 2 k 0 a 1 a t 0 00 10 0 × + × + × = 2 k 1 0 a 1 a t 1 01 11 . . × + × + × = 2 k 0 a 1 a t 5 05 15 5 + = a a 1 00 10 + = a a 1 01 11 . . + = a a 1 05 15 - . . t t ³ 1 0 1 . . - t t ³ - 1 4 3 - t t ³ - 1 5 3 . . ³ ³ ³ t 0 , k 0 , a 0 1 i ri \course\cpeg421-08s\Topic-7a.ppt
Solution Schedule C t(i, s) = 2i + ts where t0 = 0 t1 = 1 t2 = 2 t3 = 3 t4 = 3 t5 =2 so, # of FUs = 3! \course\cpeg421-08s\Topic-7a.ppt
Linear Programming • A Linear Program is a problem that can be expressed in the following form: • minimize cx • subject to • Ax = b • x >= 0 • where x: vector of variables to be solved • A: matrix of known coefficients • c, b: vectors of known coefficients • cx: it is called the objective function • Ax=b is called the constraints \course\cpeg421-08s\Topic-7a.ppt
Linear Programming • “programming” actually means “planning” here. • Importance of LP • many applications • the existence of good general-purpose techniques for finding optimal solutions \course\cpeg421-08s\Topic-7a.ppt
Integer Linear Programming(ILP) • Integer programming have proved valuable for modeling many and diverse types of problems in : • planning, • routing, • scheduling, • assignment, and • design. • Industry applications: • transportation, energy, telecommunications, and manufacturing \course\cpeg421-08s\Topic-7a.ppt
Integer Linear Programming • Classification: • mixed integer (part of variables) • pure integer (all variables) • zero-one (only takes 0 or 1) • Much harder to solve • Many existing tools and product • solve the LP/ILP problem and get optimal solution • help to model the problem, and then solve the problem \course\cpeg421-08s\Topic-7a.ppt
Quiz How to handle cases where di 1 for node i ? >= \course\cpeg421-08s\Topic-7a.ppt
The “Trick” At time step r, the # of FUs required Contributed by node i {Started at steps in [0, (t + di-1)%II]} a((r - 1)%II)i Note: if actor i requires a FU at time r or 0 otherwise. So objective function becomes: Max Min \course\cpeg421-08s\Topic-7a.ppt
How to extend the FIRST formulation to heterogeneous function units? \course\cpeg421-08s\Topic-7a.ppt
Mk - 0 FU(i) = k Heterogeneous Function Units Solution Hints: “weighted Sum”! For FU of type k: Mk = max " Î - r [ 0 , II 1 ] and so, the objective should be min Ck Mk Where h is the total # of FU types " Î - " Î - k [ 0 , h 1 ], r [ 0 , II 1 ] \course\cpeg421-08s\Topic-7a.ppt
Solution of the REST Resource - constraint rate-optimal software pipelining problem \course\cpeg421-08s\Topic-7a.ppt
Hints Based on scheme for “FIRST” and play “trial-and improve” \course\cpeg421-08s\Topic-7a.ppt
Solution Space for REST MII = max { RecMII, ResMII} 1 2 \course\cpeg421-08s\Topic-7a.ppt
Max cycles C d(C) m(C) RecMII = # of nodes # of FUs ResMII = MII = max {RecMII, ResMII } Note: The bounds of initiation interval 1. Loop-carried dependence constraints: 2. Resource constraints As a result: \course\cpeg421-08s\Topic-7a.ppt
Other Work • Extend the technique to “more benchmarks” and establish “bounds” • Extend the framework to include register constraints • Extend the model for pipelined architectures (with structural “hazards”) • Extend the model to consider superscalar and multithreaded with dynamic scheduling support. \course\cpeg421-08s\Topic-7a.ppt