EE5900 Advanced Embedded System For Smart Infrastructure

Static Scheduling

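Time Frame
• Given a set of tasks, let H denote the hyperperiod, i.e., the least common multiple of all task periods.
• Example, with each task written as (execution time, period): T1=(1,4), T2=(1.8,5), T3=(1,20), T4=(2,20)
• H=20
• Divide time into frames; the frame size f should divide H.
• f could be 2, 4, 5, 10, or 20 (f = 1 is excluded because it is shorter than the longest execution time, 2).
• Prefer a small frame size: a job can only be assigned to frames that lie entirely within its release-deadline window, and smaller frames fit those windows more easily.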
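The numbers above can be checked by computing H as the least common multiple of the periods and listing its divisors. This is a small illustrative sketch: it assumes each task is written as (execution time, period) and keeps only divisors that are at least the longest execution time; the helper names are our own.

```python
from math import lcm

# Tasks from the example, assumed to be (execution time e, period p).
tasks = [(1, 4), (1.8, 5), (1, 20), (2, 20)]

def hyperperiod(periods):
    """Least common multiple of all (integer) periods."""
    return lcm(*periods)

def candidate_frame_sizes(tasks):
    """Return H and the divisors f of H with f >= the longest execution time."""
    H = hyperperiod([p for _, p in tasks])
    e_max = max(e for e, _ in tasks)
    return H, [f for f in range(1, H + 1) if H % f == 0 and f >= e_max]

H, frames = candidate_frame_sizes(tasks)
print(H, frames)  # 20 [2, 4, 5, 10, 20]
```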

Network flow formulation
• Denote all jobs released in one hyperperiod as J1, J2, …, Jn
• Vertices
  • n job vertices
  • H/f time frame vertices
  • A source
  • A sink
• Edges
  • Source to job vertex Ji, with capacity equal to its execution time ei
  • Job vertex to time frame vertex, with capacity f, if the job can run in that frame (the frame starts no earlier than the job's release time and ends no later than its deadline)
  • Time frame vertex to sink, with capacity f

Flow network

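Computing the schedule
• If the maximum flow equals the total execution time of all jobs, the task set is schedulable; the flow on each edge from Ji to a frame gives how much of Ji executes in that frame.
• Otherwise, no schedule exists with this frame size.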
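The formulation can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: it assumes integer time units (so the example's 1.8 is scaled by 10), deadlines equal to periods, and uses the hypothetical helper names `max_flow` and `schedulable`. The max-flow routine augments along BFS paths in the residual graph.

```python
from collections import deque
from math import lcm

def max_flow(cap, s, t):
    """Max flow on an adjacency-matrix network using BFS augmenting paths."""
    n = len(cap)
    res = [row[:] for row in cap]  # residual capacities
    value = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return value
        b, v = float("inf"), t  # bottleneck residual capacity on the path
        while v != s:
            b = min(b, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            res[u][v] -= b
            res[v][u] += b
            v = u
        value += b

def schedulable(tasks, f):
    """tasks: (e, p) pairs in integer time units; deadline = period (assumed)."""
    H = lcm(*(p for _, p in tasks))
    # Every job released in one hyperperiod: (release, deadline, e).
    jobs = [(k * p, (k + 1) * p, e) for e, p in tasks for k in range(H // p)]
    nf = H // f
    n = 2 + len(jobs) + nf
    s, t = 0, n - 1
    cap = [[0] * n for _ in range(n)]
    for j, (r, d, e) in enumerate(jobs):
        cap[s][1 + j] = e  # source -> job: execution time
        for k in range(nf):
            if r <= k * f and (k + 1) * f <= d:  # frame lies inside [r, d]
                cap[1 + j][1 + len(jobs) + k] = f
    for k in range(nf):
        cap[1 + len(jobs) + k][t] = f  # frame -> sink: f
    return max_flow(cap, s, t) == sum(e for _, _, e in jobs)

# Example task set scaled by 10 (1.8 -> 18), frame size 2 -> 20.
print(schedulable([(10, 40), (18, 50), (10, 200), (20, 200)], 20))  # True
```

Scaling to integers avoids floating-point comparisons between the flow value and the total execution time.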

Flow network
• Given a directed graph G in which each edge e has a capacity c(e) ≥ 0
• A source node s
• A sink node t
Goal: send as much flow as possible from s to t

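Flows
An s-t flow is a function f that assigns a value f(e) to each edge e and satisfies:
• (capacity constraint) $0 \le f(e) \le c(e)$ for every edge e
• (conservation of flow at intermediate vertices) $\sum_{e \text{ into } v} f(e) = \sum_{e \text{ out of } v} f(e)$ for every vertex $v \ne s, t$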

Value of the flow
• $v(f) = \sum_{e \text{ out of } s} f(e)$
Maximum flow problem: maximize this value.
[Figure: example network G with a flow of value 19]
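The two constraints and the flow value translate directly into a checker. The function name `flow_value` and the dict-of-edges representation are assumptions for this sketch; the value is taken as the total flow leaving s, assuming no edges enter s.

```python
def flow_value(cap, flow, s, t):
    """Check that `flow` is a valid s-t flow for `cap` and return its value.

    cap and flow map an edge (u, v) to a number; edges missing from flow carry 0.
    """
    for e, c in cap.items():
        if not 0 <= flow.get(e, 0) <= c:  # capacity constraint
            raise ValueError(f"capacity constraint violated on {e}")
    nodes = {u for u, _ in cap} | {v for _, v in cap}
    for x in nodes - {s, t}:  # conservation at intermediate vertices
        inflow = sum(a for (u, v), a in flow.items() if v == x)
        outflow = sum(a for (u, v), a in flow.items() if u == x)
        if inflow != outflow:
            raise ValueError(f"conservation violated at {x}")
    return sum(a for (u, _), a in flow.items() if u == s)  # flow out of s

cap = {("s", "a"): 3, ("a", "t"): 2, ("s", "t"): 1}
print(flow_value(cap, {("s", "a"): 2, ("a", "t"): 2, ("s", "t"): 1}, "s", "t"))  # 3
```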

Cuts
• An s-t cut is a set of edges whose removal leaves no directed path from s to t
• The capacity of a cut is the sum of the capacities of the edges in the cut
Minimum s-t cut problem: find an s-t cut of minimum capacity

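Flows ≤ cuts
• Let C be a cut and S be the set of vertices reachable from s in G − C, so s ∈ S and t ∉ S.
• Every edge leaving S belongs to C. By flow conservation, $v(f) = \sum_{e \text{ out of } S} f(e) - \sum_{e \text{ into } S} f(e) \le \sum_{e \text{ out of } S} c(e) \le \mathrm{cap}(C)$.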

Main result
• Value of max s-t flow ≤ capacity of min s-t cut
• (Ford–Fulkerson, 1956) Max flow = min cut
• A maximum flow, and hence a minimum cut, can be computed in polynomial time

Greedy method?
• Find an s-t path on which every edge has f(e) < c(e)
• Push as much flow as possible along this path
• Repeat until no such path can be found
• Does it work?

A counterexample
[Figure: a network with edge capacities 20, 10, 30, 10, 20]
The greedy algorithm produces a flow of value 20, while the maximum flow has value 30.
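The counterexample can be replayed in code. The edge layout below (s→u 20, s→v 10, u→v 30, u→t 10, v→t 20) is our assumption about the missing figure, chosen to match the listed capacities and flow values. The hypothetical `greedy_flow` uses only original edges and finds paths by DFS in insertion order, so it first takes s→u→v→t and then gets stuck.

```python
def greedy_flow(cap, s, t):
    """Greedy augmentation on original edges only (no pushing back)."""
    flow = {e: 0 for e in cap}
    adj = {}
    for (u, v) in cap:
        adj.setdefault(u, []).append(v)

    def find_path(u, seen):
        # DFS over edges that still have f(e) < c(e), in adjacency order.
        if u == t:
            return [u]
        seen.add(u)
        for v in adj.get(u, []):
            if v not in seen and flow[(u, v)] < cap[(u, v)]:
                rest = find_path(v, seen)
                if rest:
                    return [u] + rest
        return None

    value = 0
    while (path := find_path(s, set())) is not None:
        edges = list(zip(path, path[1:]))
        b = min(cap[e] - flow[e] for e in edges)  # bottleneck on the path
        for e in edges:
            flow[e] += b
        value += b
    return value

cap = {("s", "u"): 20, ("s", "v"): 10, ("u", "v"): 30,
       ("u", "t"): 10, ("v", "t"): 20}
print(greedy_flow(cap, "s", "t"))  # 20, although the max flow is 30
```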

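Residual graph
• Key idea: allow flow to be pushed back.
• Example: an edge with c(e) = 10 carrying f(e) = 2 can send 8 more units forward or push 2 units back. In the residual graph it becomes a forward edge with c(e) = 8 and a backward edge with c(e) = 2.
• Advantage of this representation: the algorithm does not need to distinguish between sending forward and pushing back.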
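The forward/backward split for each edge can be written as a tiny helper. The name `residual_edges` and the (u, v, c, f) tuple layout are assumptions for this sketch; zero-capacity residual edges are omitted.

```python
def residual_edges(edges):
    """edges: list of (u, v, c, f) on the original graph.

    Returns residual edges (u, v, capacity), omitting zero-capacity ones.
    """
    res = []
    for u, v, c, f in edges:
        if c - f > 0:
            res.append((u, v, c - f))  # forward: room left to send
        if f > 0:
            res.append((v, u, f))  # backward: flow that can be pushed back
    return res

print(residual_edges([("u", "v", 10, 2)]))  # [('u', 'v', 8), ('v', 'u', 2)]
```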

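Ford-Fulkerson Algorithm
• Start from the zero flow f
• While there is an s-t path P in the residual graph: augment f along P by the bottleneck (the minimum residual capacity on P), increasing the flow on original edges used forward and decreasing it on original edges used backward; then update the residual graph
• Return f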
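A minimal Ford-Fulkerson sketch that finds augmenting paths by DFS in the residual graph, assuming integer capacities given as a dict of edges (the name `ford_fulkerson` is ours). On a five-edge network with capacities 20, 10, 30, 10, 20 (one possible layout of the greedy counterexample, assumed here), it reaches the maximum value 30 by pushing 10 units back along u→v.

```python
def ford_fulkerson(cap, s, t):
    """cap: dict (u, v) -> integer capacity. Returns the max flow value."""
    res = {}
    for (u, v), c in cap.items():
        res[(u, v)] = res.get((u, v), 0) + c
        res.setdefault((v, u), 0)  # reverse residual edge
    adj = {}
    for (u, v) in res:
        adj.setdefault(u, []).append(v)

    def dfs(u, seen):
        # Any s-t path with positive residual capacity will do.
        if u == t:
            return [u]
        seen.add(u)
        for v in adj[u]:
            if v not in seen and res[(u, v)] > 0:
                rest = dfs(v, seen)
                if rest:
                    return [u] + rest
        return None

    value = 0
    while (path := dfs(s, set())) is not None:
        edges = list(zip(path, path[1:]))
        b = min(res[e] for e in edges)  # bottleneck
        for (u, v) in edges:
            res[(u, v)] -= b
            res[(v, u)] += b  # allows this flow to be pushed back later
        value += b
    return value

cap = {("s", "u"): 20, ("s", "v"): 10, ("u", "v"): 30,
       ("u", "t"): 10, ("v", "t"): 20}
print(ford_fulkerson(cap, "s", "t"))  # 30
```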

Ford-Fulkerson Algorithm (example)
[Figures: a sequence of slides showing an example network G with source s, sink t, and intermediate vertices 2, 3, 4, 5; each slide gives the current flow on G and the residual graph Gf. The flow value grows 0 → 8 → 10 → 16 → 18 → 19. On the final slide, the cut obtained from the residual graph has capacity 19, equal to the flow value.]

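Max-flow min-cut theorem
• When the algorithm stops, consider the set S of all vertices reachable from s in the residual graph
• s is in S, but t is not (otherwise there would be another augmenting path)
• No flow enters S (otherwise it could be pushed back, giving a residual edge out of S)
• Every edge from S to the rest of the graph is at full capacity, so the flow value equals the capacity of the cut formed by these edges: a min cut!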
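The argument is constructive: after the algorithm stops, one more search over positive residual edges yields S, and hence a minimum cut. The sketch below is illustrative only; the network (vertices s, 2, 3, 4, 5, t) is a small hypothetical example chosen by us, and `max_flow_min_cut` is a made-up name. It uses BFS augmenting paths.

```python
from collections import deque

def max_flow_min_cut(cap, s, t):
    """Return (flow value, S, capacity of the cut leaving S)."""
    res = {}
    for (u, v), c in cap.items():
        res[(u, v)] = res.get((u, v), 0) + c
        res.setdefault((v, u), 0)
    adj = {}
    for (u, v) in res:
        adj.setdefault(u, []).append(v)

    def bfs():
        # Parent pointers for every vertex reachable from s in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    value = 0
    while t in (parent := bfs()):
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(res[e] for e in path)
        for (u, v) in path:
            res[(u, v)] -= b
            res[(v, u)] += b
        value += b
    S = set(parent)  # vertices still reachable from s
    cut = sum(c for (u, v), c in cap.items() if u in S and v not in S)
    return value, S, cut

cap = {("s", "2"): 10, ("s", "3"): 10, ("2", "3"): 2, ("2", "4"): 4,
       ("2", "5"): 8, ("3", "5"): 9, ("5", "4"): 6, ("4", "t"): 10,
       ("5", "t"): 10}
value, S, cut = max_flow_min_cut(cap, "s", "t")
print(value, cut, sorted(S))  # 19 19 ['3', 's']
```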

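Integrality theorem
• If every edge has an integer capacity, then there is a maximum flow in which every edge carries an integer amount of flow (Ford-Fulkerson finds one, since every bottleneck is an integer).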

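Complexity
• Assume integer edge capacities between 1 and C
• Each iteration increases the flow value by at least 1, and the maximum flow is at most mC, so there are at most mC iterations
• Finding an s-t path in the residual graph can be done in O(m) time
• Total running time: O(m²C)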

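Speedup with capacity scaling
• Use capacity scaling to find augmenting paths with large capacity
• Find p such that $2^{p-1} \le C < 2^p$
• For i from p−1 down to 0:
  • Consider only the residual edges with capacity at least $2^i$
  • Augment along s-t paths in this graph until none remains
• Analysis: when phase i starts, there is no s-t path whose residual edges all have capacity at least $2^{i+1}$, so some cut of at most m edges has residual capacity at most $m 2^{i+1}$. Each augmenting path in phase i adds at least $2^i$, so phase i performs at most 2m augmentations. With p = O(log C) phases and O(m) time per augmentation, the running time is O(m² log C).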
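A capacity-scaling sketch, assuming integer capacities and the hypothetical name `capacity_scaling`. Here `delta` plays the role of 2^i, starting from the largest power of two not exceeding C, and each phase augments (by DFS) only along residual edges with capacity at least `delta`. The test network is a small hypothetical example.

```python
def capacity_scaling(cap, s, t):
    """Max flow by capacity scaling over integer capacities."""
    res = {}
    for (u, v), c in cap.items():
        res[(u, v)] = res.get((u, v), 0) + c
        res.setdefault((v, u), 0)
    adj = {}
    for (u, v) in res:
        adj.setdefault(u, []).append(v)

    def dfs(u, delta, seen):
        # Only residual edges with capacity >= delta are usable in this phase.
        if u == t:
            return [u]
        seen.add(u)
        for v in adj[u]:
            if v not in seen and res[(u, v)] >= delta:
                rest = dfs(v, delta, seen)
                if rest:
                    return [u] + rest
        return None

    C = max(cap.values())
    delta = 1 << (C.bit_length() - 1)  # 2^(p-1) <= C < 2^p
    value = 0
    while delta >= 1:
        while (path := dfs(s, delta, set())) is not None:
            edges = list(zip(path, path[1:]))
            b = min(res[e] for e in edges)  # at least delta
            for (u, v) in edges:
                res[(u, v)] -= b
                res[(v, u)] += b
            value += b
        delta //= 2
    return value

cap = {("s", "2"): 10, ("s", "3"): 10, ("2", "3"): 2, ("2", "4"): 4,
       ("2", "5"): 8, ("3", "5"): 9, ("5", "4"): 6, ("4", "t"): 10,
       ("5", "t"): 10}
print(capacity_scaling(cap, "s", "t"))  # 19
```

The final phase (delta = 1) is plain Ford-Fulkerson, so the result is always a maximum flow; the larger phases only reduce the number of augmentations.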

Speedup with BFS
• In each iteration, run breadth-first search in the residual graph and augment along a path with the fewest edges.
• Let $G_i$ be the residual graph in iteration i and let $\mathrm{level}_i(v)$ denote the distance from s to v in $G_i$.
• $\mathrm{level}_i(v)$ cannot decrease during iterations. Prove by induction on $\mathrm{level}_{i+1}(v)$.
• Let u→v be the last edge of a shortest s-v path in $G_{i+1}$, so $\mathrm{level}_{i+1}(v) = \mathrm{level}_{i+1}(u)+1$, and by induction $\mathrm{level}_i(u) \le \mathrm{level}_{i+1}(u)$. If u→v is also an edge of $G_i$, then $\mathrm{level}_i(v) \le \mathrm{level}_i(u)+1 \le \mathrm{level}_{i+1}(u)+1 = \mathrm{level}_{i+1}(v)$.
• Otherwise, u→v was created by pushing flow along v→u, so v→u is on the augmenting path of iteration i, which is a shortest path. Hence $\mathrm{level}_i(v) = \mathrm{level}_i(u)-1 < \mathrm{level}_i(u)+1 \le \mathrm{level}_{i+1}(u)+1 = \mathrm{level}_{i+1}(v)$.
• No edge can disappear and reappear many times.
• Consider an edge u→v that disappears in $G_i$ and next reappears in $G_j$. Then u→v is on the augmenting path of $G_i$ and v→u is on the augmenting path of $G_j$, so $\mathrm{level}_i(v) = \mathrm{level}_i(u)+1$ and $\mathrm{level}_j(u) = \mathrm{level}_j(v)+1$. Since $\mathrm{level}_j(v) \ge \mathrm{level}_i(v)$, we have $\mathrm{level}_j(u) \ge \mathrm{level}_i(u)+2$.

Speedup with BFS (2)
• Between a disappearance and the next reappearance of u→v, the distance from s to u increases by at least 2. Since a level is at most n, each edge disappears at most n/2 times.
• The residual graph has at most 2m edges, so there are at most nm disappearances in total.
• In every iteration at least one edge (the bottleneck edge of the augmenting path) disappears, so there are at most nm iterations.
• Each iteration takes O(m) time, so the total running time is O(nm²).
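A sketch of the shortest-augmenting-path (BFS) variant that also records each iteration's BFS levels, so the monotonicity claim can be checked on an example. The name `edmonds_karp` and the small network are hypothetical choices for illustration.

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """Max flow via shortest (BFS) augmenting paths.

    Returns (value, levels), where levels[i] maps each vertex reachable in
    the i-th residual graph to its BFS distance from s.
    """
    res = {}
    for (u, v), c in cap.items():
        res[(u, v)] = res.get((u, v), 0) + c
        res.setdefault((v, u), 0)  # reverse residual edge
    adj = {}
    for (u, v) in res:
        adj.setdefault(u, []).append(v)

    value, levels = 0, []
    while True:
        level, parent = {s: 0}, {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in level and res[(u, v)] > 0:
                    level[v] = level[u] + 1
                    parent[v] = u
                    queue.append(v)
        levels.append(level)
        if t not in level:
            return value, levels
        path, v = [], t
        while parent[v] is not None:  # walk back from t to s
            path.append((parent[v], v))
            v = parent[v]
        b = min(res[e] for e in path)  # the bottleneck edge disappears
        for (u, v) in path:
            res[(u, v)] -= b
            res[(v, u)] += b
        value += b

cap = {("s", "2"): 10, ("s", "3"): 10, ("2", "3"): 2, ("2", "4"): 4,
       ("2", "5"): 8, ("3", "5"): 9, ("5", "4"): 6, ("4", "t"): 10,
       ("5", "t"): 10}
value, levels = edmonds_karp(cap, "s", "t")
print(value)  # 19
```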

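Precedence and nonpreemption
• Suppose that J1 must be scheduled before J2. Make sure that the release time of J1 is no later than that of J2 and the deadline of J1 is no later than that of J2 (if necessary, raise J2's release time to J1's and lower J1's deadline to J2's). In the resulting schedule, if (part of) J1 is scheduled after (part of) J2, just swap them.
• Nonpreemption cannot be handled this way: nonpreemptive scheduling is NP-hard.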
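One way to enforce the precedence constraints before building the flow network is to propagate release times forward and deadlines backward along the precedence edges. This sketch assumes jobs are given in a topological order of the precedence graph and uses the hypothetical name `effective_times`.

```python
def effective_times(jobs, preds):
    """jobs: name -> (release, deadline), listed in topological order.
    preds: name -> list of jobs that must precede it.
    Returns (effective releases, effective deadlines)."""
    order = list(jobs)
    rel = {}
    for j in order:  # a job cannot start before its predecessors are released
        r, _ = jobs[j]
        rel[j] = max([r] + [rel[i] for i in preds.get(j, [])])
    succs = {j: [] for j in order}
    for j, ps in preds.items():
        for i in ps:
            succs[i].append(j)
    dl = {}
    for j in reversed(order):  # a job must finish before its successors' deadlines
        _, d = jobs[j]
        dl[j] = min([d] + [dl[k] for k in succs[j]])
    return rel, dl

rel, dl = effective_times({"J1": (2, 10), "J2": (0, 6)}, {"J2": ["J1"]})
print(rel, dl)
```

With these adjusted times, the swap argument works: moving a piece of J1 earlier respects its release time, and moving a piece of J2 later respects its deadline.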

NP-completeness proof
• Reduce from the 3-partition problem
• Given a set S of 3m elements, where each element s has a value v(s) and $\sum_{s \in S} v(s) = mB$, can S be partitioned into m disjoint subsets S1, S2, …, Sm such that $\sum_{s \in S_i} v(s) = B$ for each subset?
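For tiny instances, 3-partition can be decided by brute force, which helps when experimenting with the reduction. This exponential-time sketch (hypothetical name `three_partition`) only checks the subset sums; in the standard definition of 3-partition each value also lies strictly between B/4 and B/2, which forces every subset to contain exactly three elements.

```python
def three_partition(values, m, B):
    """Brute-force search for m groups of indices, each summing to B.
    Returns the groups or None. Exponential time; positive values assumed."""
    if sum(values) != m * B:
        return None
    groups = [[] for _ in range(m)]
    sums = [0] * m
    order = sorted(range(len(values)), key=lambda i: -values[i])

    def place(k):
        if k == len(order):
            return True
        i = order[k]
        for g in range(m):
            if sums[g] + values[i] <= B:
                sums[g] += values[i]
                groups[g].append(i)
                if place(k + 1):
                    return True
                sums[g] -= values[i]
                groups[g].pop()
            if sums[g] == 0:
                break  # empty groups are interchangeable; trying another is useless
        return False

    return groups if place(0) else None

print(three_partition([1, 2, 3, 2, 2, 2], 2, 6))  # [[2, 1, 0], [3, 4, 5]]
```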

Reduction
• Given an instance of 3-partition, form an instance of the nonpreemptive scheduling problem with 3m+1 tasks T1, T2, …, T3m+1 as follows (p: period, d: relative deadline, c: execution time).
• For each element si, create a task Ti with p = d = mB+m and c = v(si).
• Create a task T3m+1 with p = B+1 and d = c = 1.
• Claim: the task set is schedulable if and only if the 3-partition instance is feasible.
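The construction is mechanical. The sketch below (hypothetical name `reduction_tasks`, toy instance chosen by us) builds the 3m+1 tasks as (p, d, c) triples. It also shows that the total utilization is exactly mB/(mB+m) + 1/(B+1) = 1, so a feasible schedule leaves no idle time.

```python
from fractions import Fraction

def reduction_tasks(values, m, B):
    """Tasks (p, d, c) of the nonpreemptive instance built from 3-partition."""
    tasks = [(m * B + m, m * B + m, v) for v in values]  # T1 .. T3m
    tasks.append((B + 1, 1, 1))  # T3m+1
    return tasks

tasks = reduction_tasks([1, 2, 3, 2, 2, 2], m=2, B=6)
print(tasks[-1])  # (7, 1, 1)
print(sum(Fraction(c, p) for p, _, c in tasks))  # 1
```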

Only if direction
• Suppose the task set is schedulable.
• Since d = c = 1 for T3m+1, its jobs must run exactly in the intervals starting at times 0, B+1, 2(B+1), …
• Consider the hyperperiod mB+m. Each of the first 3m tasks releases one job at time 0 that must be scheduled within it.
• During this hyperperiod, T3m+1 runs m times for a total time of m.
• Thus, exactly mB time remains for all other tasks.
• The available time between two consecutive jobs of T3m+1 is B. Because scheduling is nonpreemptive, every job of T1, …, T3m runs entirely within one of these m gaps.
• The tasks in the first gap have total execution time at most B. Let S1 denote the corresponding set in S, so $\sum_{s \in S_1} v(s) \le B$.
• Similarly, $\sum_{s \in S_i} v(s) \le B$ for all 1 ≤ i ≤ m, since T3m+1 runs m times.
• On the other hand, $\sum_{i=1}^{m} \sum_{s \in S_i} v(s) = mB$, so $\sum_{s \in S_i} v(s) = B$ for every i.

If direction
• Suppose there is a feasible 3-partition solution S1, …, Sm.
• Schedule T3m+1 at times 0, B+1, 2(B+1), …
• In the i-th gap of length B, run the tasks corresponding to Si back to back; they fit exactly because $\sum_{s \in S_i} v(s) = B$.
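The construction can be sketched as laying out one hyperperiod of length m(B+1): each block starts with the unit-length job of T3m+1, followed by the tasks of one subset Si. The helper name `schedule_from_partition` and the toy partition are assumptions for illustration; element indices are 0-based in the input and printed as T1, T2, …

```python
def schedule_from_partition(groups, values, B):
    """One hyperperiod [0, m(B+1)) as (start, finish, task) triples.

    groups[k] lists 0-based element indices of S_{k+1}; 'T*' denotes T3m+1.
    """
    timeline = []
    for k, group in enumerate(groups):
        start = k * (B + 1)
        timeline.append((start, start + 1, "T*"))  # T3m+1 at k(B+1)
        t = start + 1
        for i in group:  # tasks of S_{k+1}, back to back, no preemption
            timeline.append((t, t + values[i], f"T{i + 1}"))
            t += values[i]
    return timeline

for slot in schedule_from_partition([[2, 1, 0], [3, 4, 5]], [1, 2, 3, 2, 2, 2], B=6):
    print(slot)
```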

3-partition
• First show that numerical 4-dimensional matching (numerical 4DM) is NP-complete, by a reduction from 3-dimensional matching (3DM).
• Numerical 4DM: given four sets S1, S2, S3, S4, each consisting of distinct elements with values, and a collection C ⊆ S1×S2×S3×S4 of candidate sets, is there a subcollection C' ⊆ C that partitions the union of the four sets (every element appears in exactly one set of C') such that the values in each set of C' sum to B?

Reduce from 3DM to numerical 4DM
• For each candidate set (xa, yb, zc) in M, create four elements: e1 in S1, e2 in S2, e3 in S3, and e4 in S4 (q is a sufficiently large integer).
• For xa, create an element e1 with value either $2q^3+aq^2$ (core) or $aq^2$ (dummy).
• For yb, create an element e2 with value either $bq$ (core) or $q^3+bq$ (dummy).
• For zc, create an element e3 with value either $c$ (core) or $q^3+c$ (dummy).
• Create an element e4 with value $2q^3-aq^2-bq-c$.
• If a variable (e.g., x1) occurs only once in M, only one element, a core element, is generated for it.
• If a variable (e.g., z7) occurs k times in M, k elements are generated for it: one core element and k−1 dummy elements. Note that different elements can have the same value.
• Candidate sets in numerical 4DM are created so that each contains either only core elements or only dummy elements. Enumerate all possible candidate sets.
• Set $B = 4q^3$.
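The element values can be generated mechanically, which makes it easy to confirm that a candidate set of core elements, or one of dummy elements, for the same (xa, yb, zc) sums to $4q^3$. The helper names and the choice q = 100 are assumptions for this sketch.

```python
def element_values(a, b, c, q):
    """Core and dummy values generated for the candidate set (x_a, y_b, z_c)."""
    return {
        "e1": {"core": 2 * q**3 + a * q**2, "dummy": a * q**2},
        "e2": {"core": b * q, "dummy": q**3 + b * q},
        "e3": {"core": c, "dummy": q**3 + c},
        "e4": 2 * q**3 - a * q**2 - b * q - c,
    }

def quadruple_sum(vals, kind):
    """Sum of e1, e2, e3 of one kind ('core' or 'dummy') plus e4."""
    return sum(vals[e][kind] for e in ("e1", "e2", "e3")) + vals["e4"]

q = 100
v = element_values(1, 5, 7, q)  # candidate set (x1, y5, z7)
print(quadruple_sum(v, "core") == 4 * q**3, quadruple_sum(v, "dummy") == 4 * q**3)  # True True
```

The core case cancels as $(2q^3+aq^2) + bq + c + (2q^3-aq^2-bq-c) = 4q^3$, and the dummy case as $aq^2 + (q^3+bq) + (q^3+c) + (2q^3-aq^2-bq-c) = 4q^3$.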

Reduction example
• Suppose that the candidate sets M in 3DM are (x1,y5,z7), (x2,y2,z7), (x2,y5,z5), …
• (x1,y5,z7) produces e11 with value $2q^3+q^2$, e21 with value $5q$, e31 with value $q^3+7$, and e41 with value $2q^3-q^2-5q-7$.
• (x2,y2,z7) produces e12 with value $2q^3+2q^2$, e22 with value $2q$, e32 with value $7$, and e42 with value $2q^3-2q^2-2q-7$.
• (x2,y5,z5) produces e13 with value $2q^2$, e23 with value $q^3+5q$, e33 with value $q^3+5$, and e43 with value $2q^3-2q^2-5q-5$.
• If (x1,y5,z7) is picked in M, we pick (e11, e21, e32, e41).
• Since (x2,y2,z7) and (x2,y5,z5) are not picked, we pick ($2q^2$, $q^3+2q$, e31, e42) and ($2q^2$, e23, e33, e43).
• The elements written by value are generated from other candidate sets in M.
• e12, e22, and e13 are not picked here; they will be picked in sets corresponding to other sets in M.

If direction
• Suppose there is a solution to the 3DM problem.
• If a set is picked in 3DM, the corresponding core set is picked in numerical 4DM. Otherwise, the corresponding dummy set is picked.
• Each variable is covered exactly once in 3DM, so each core element is picked exactly once. Note that core elements generated from different sets in M may be combined into one picked set (since all candidate sets are enumerated in numerical 4DM).
• A variable with k occurrences in M appears in k candidate sets in M. One of them is picked (and so is the corresponding core element), and k−1 of them are not (so the corresponding k−1 dummy elements are picked). Thus, every generated element is picked exactly once. There is exactly one e4 for each set in M, which brings the sum of values of its set to $4q^3$.
• These sets form a subcollection that partitions the union of the four sets, with each set summing to B.

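Only if direction
• Given a solution to numerical 4DM, each core element is covered exactly once, by a set consisting only of core elements. Through its e4 element, with value $2q^3-aq^2-bq-c$, each such set identifies a set (xa, yb, zc) in M; picking these sets in M covers every variable exactly once, which solves 3DM.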
