
Hardware/Software Codesign of Embedded Systems



  1. Hardware/Software Codesign of Embedded Systems OPTIMIZATION II Voicu Groza SITE Hall, Room 5017 562 5800 ext. 2159 Groza@SITE.uOttawa.ca

  2. Hill Climbing • Simulated Annealing • Tabu search

  3. Hill Climbing (Gradient Descent) • Trying to maximize a function: 1. Pick a random point in the search space. 2. Consider all the neighbours of the current state. 3. Choose the neighbour with the best quality and move to that state. 4. Repeat steps 2 and 3 until all the neighbouring states are of lower quality. 5. Return the current state as the solution state. http://www.ndsu.nodak.edu/instruct/juell/vp/cs724s00/hill_climbing/hill_climbing.html

  4. Hill Climbing Algorithm
Function HILL-CLIMBING(Problem) returns a solution state
  Inputs: Problem, a problem
  Local variables: Current, a node; Next, a node
  Current = MAKE-NODE(INITIAL-STATE[Problem])
  Loop do
    Next = a highest-valued successor of Current
    If VALUE[Next] < VALUE[Current] then return Current
    Current = Next
  End
(Russell, 1995)
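The pseudocode above can be sketched in Python as follows. The toy objective f(x) = -(x - 3)^2 and the ±1 integer neighbourhood are illustrative assumptions, not part of the slides; any state type with a neighbour generator and a value function would do.

```python
import random

def hill_climb(initial, neighbours, value):
    """Steepest-ascent hill climbing, following the HILL-CLIMBING
    pseudocode: repeatedly move to the highest-valued neighbour,
    returning the current state once no neighbour improves on it."""
    current = initial
    while True:
        # Consider all neighbours and pick the highest-valued one.
        best = max(neighbours(current), key=value, default=None)
        if best is None or value(best) <= value(current):
            return current  # local maximum reached
        current = best

# Assumed toy problem: maximize f(x) = -(x - 3)^2 over the integers,
# with neighbours x - 1 and x + 1. The function is unimodal, so hill
# climbing reaches the global maximum from any start.
f = lambda x: -(x - 3) ** 2
nbrs = lambda x: [x - 1, x + 1]
print(hill_climb(random.randint(-10, 10), nbrs, f))  # -> 3
```

On a multimodal function the same code would stop at whichever local maximum is uphill from the starting point, which is exactly the failure mode the next slide discusses.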

  5. Problems in Hill Climbing http://www.cs.nott.ac.uk/~gxk/courses/g5baim/Hill%20Climbing/Hill02-problems.html • Tendency to get stuck at a foothill, a plateau or a ridge; if the algorithm reaches any of them, it fails to find a global solution. • Local maxima = states that are better than all their neighbours but not better than some other states farther away. • Foothills (local maxima) are potential traps for the algorithm. • A plateau is a flat area of the search space in which a whole set of neighbouring states have the same value; on a plateau it is not possible to determine the best direction to move by making local comparisons. • Possible remedies: • Multi-start hill climbing: try several runs, each starting at a random point in the search space. • Increase the size of the neighbourhood. There is still no guarantee of finding a global optimum.

  6. Simulated Annealing • Motivated by the physical annealing process, in which a material is heated and slowly cooled into a uniform structure • Simulated annealing (SA) mimics this process • The first SA algorithm was developed in 1953 (Metropolis) • Compared to hill climbing, the main differences are that • SA allows downward steps; • a move is selected at random, and SA then decides whether to accept it • In SA better moves are always accepted; worse moves are not always accepted! Dr. Graham Kendall, The University of Nottingham, UK

  7. To accept or not to accept - SA? Accept a worse move if P = exp(-c/t) > r where • c is the change in the evaluation function • t is the current temperature • r is a random number between 0 and 1 • Example • The laws of thermodynamics state that at temperature t, the probability of an increase in energy of magnitude δE is given by • P(δE) = exp(-δE/kt) • where k is Boltzmann’s constant

  8. To accept or not to accept - SA? • The probability of accepting a worse state is a function of both the temperature of the system and the change in the cost function • As the temperature decreases, the probability of accepting worse moves decreases • If t = 0, no worse moves are accepted (i.e. hill climbing)

  9. SA Algorithm The most common way of implementing an SA algorithm is to implement hill climbing with an accept function and modify it for SA
Function SIMULATED-ANNEALING(Problem, Schedule) returns a solution state
  Inputs: Problem, a problem; Schedule, a mapping from time to temperature
  Local variables: Current, a node; Next, a node; T, a "temperature" controlling the probability of downward steps
  Current = MAKE-NODE(INITIAL-STATE[Problem])
Russell/Norvig

  10. SA Algorithm
For i = 1 to ∞ do
  T = Schedule[i]
  If T = 0 then return Current
  Next = a randomly selected successor of Current
  ΔE = VALUE[Next] – VALUE[Current]
  If ΔE > 0 then Current = Next
  else Current = Next only with probability exp(ΔE/T)  (here ΔE < 0, so the probability is below 1)
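The loop above translates almost line for line into Python. The problem below (maximize f(x) = -x^2 over the integers with ±1 steps) and the geometric schedule are assumptions for illustration, not part of the slides; the acceptance rule exp(ΔE/T) is the one from the pseudocode.

```python
import math
import random

def simulated_annealing(initial, neighbour, value, schedule):
    """SA per the slide's pseudocode: always accept improving moves,
    and accept a worse move with probability exp(dE / T), where
    dE = value(next) - value(current) is negative for worse moves."""
    current = initial
    for t in schedule:
        if t <= 0:
            return current
        nxt = neighbour(current)
        d_e = value(nxt) - value(current)
        if d_e > 0 or random.random() < math.exp(d_e / t):
            current = nxt
    return current

# Assumed toy problem: maximize f(x) = -x^2 with a geometric
# cooling schedule (starting temperature 100, alpha = 0.95).
f = lambda x: -x * x
step = lambda x: x + random.choice([-1, 1])
geo = [100 * 0.95 ** i for i in range(500)]
random.seed(0)
print(simulated_annealing(50, step, f, geo))
```

At high temperatures the run behaves like a random walk; as the schedule cools it converges toward the maximum at 0, mirroring the hill-climbing limit t = 0 mentioned on the previous slide.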

  11. SA Algorithm - Observations • The cooling schedule is hidden in this algorithm; its important components are • Starting temperature • Final temperature • Temperature decrement • Iterations at each temperature • The algorithm assumes that annealing will continue until the temperature reaches zero - this is not necessarily the case

  12. SA Cooling Schedule - Starting Temperature • Must be hot enough to allow a move to almost any neighbourhood state (else we are in danger of implementing hill climbing) • Must not be so hot that we conduct a random search for a period of time • The problem is finding a suitable starting temperature

  13. SA Cooling Schedule – Starting Temperature • Choosing the starting temperature • If we know the maximum change in the cost function, we can use it to estimate a starting temperature • Start high and reduce quickly until about 60% of worse moves are accepted; use this as the starting temperature • Alternatively, heat rapidly until a certain percentage of worse moves are accepted, then start cooling

  14. SA Cooling Schedule - Final Temperature • It is usual to let the temperature decrease until it reaches zero. However, this can make the algorithm run for a lot longer, especially when a geometric cooling schedule is being used • In practice, it is not necessary to let the temperature reach zero, because close to zero the chances of accepting a worse move are already almost the same as at zero • => the stopping criterion can be either a suitably low temperature or the system being “frozen” at the current temperature (i.e. no better or worse moves are being accepted)

  15. SA Cooling Schedule - Temperature Decrement Theory states that: • we should allow enough iterations at each temperature so that the system stabilises at that temperature • the number of iterations needed at each temperature to achieve this might be exponential in the problem size => compromise: • do a large number of iterations at a few temperatures, or • a small number of iterations at many temperatures, or • a balance between the two

  16. SA Cooling Schedule - Temperature Decrement • Linear • temp = temp - x • Geometric • temp = temp * α • Experience has shown that α should be between 0.8 and 0.99, with better results being found in the higher end of the range. Of course, the higher the value of α, the longer it will take to decrement the temperature to the stopping criterion

  17. SA Cooling Schedule - Iterations POSSIBLE APPROACHES: • A constant number of iterations at each temperature • Lundy, 1986: do only one iteration at each temperature, but decrease the temperature very slowly: • t = t/(1 + βt) • where β is a suitably small value • Dynamically change the number of iterations as the algorithm progresses: • at lower temperatures, a large number of iterations is done so that the local optimum can be fully explored • at higher temperatures, the number of iterations can be smaller
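The three decrement rules discussed above (linear, geometric, and Lundy's one-iteration-per-temperature rule) can be sketched as simple generators. The starting temperature and the α and β constants below are illustrative assumptions within the ranges the slides mention.

```python
def linear(t0, x, n):
    """Linear decrement from the slides: t <- t - x each step."""
    t = t0
    for _ in range(n):
        yield t
        t -= x

def geometric(t0, alpha, n):
    """Geometric decrement: t <- t * alpha, with alpha typically
    between 0.8 and 0.99 (higher values cool more slowly)."""
    t = t0
    for _ in range(n):
        yield t
        t *= alpha

def lundy(t0, beta, n):
    """Lundy (1986): one iteration per temperature, cooling very
    slowly via t <- t / (1 + beta * t) for a small beta."""
    t = t0
    for _ in range(n):
        yield t
        t = t / (1 + beta * t)

# Illustrative parameters (assumed, not from the slides).
print([round(t, 2) for t in geometric(100, 0.9, 5)])
print([round(t, 2) for t in lundy(100, 0.001, 5)])
```

A schedule built this way can be passed directly to an SA loop that iterates over temperatures, with Lundy's rule trading many temperature levels for a single move at each.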

  18. Problem Specific Decisions • The cooling schedule is specific to SA, but there are other decisions we need to make about the problem • These decisions are not related only to SA

  19. Problem Specific Decisions - Cost Function • The evaluation function is calculated at every iteration • Often the cost function is the most expensive part of the algorithm => Need to evaluate the cost function as efficiently as possible • Use delta evaluation • Use partial evaluation • The cost function should be designed to lead the search • Avoid cost functions where many states return the same value; this amounts to a plateau in the search space, on which the search has no guidance about which way to proceed

  20. Problem Specific Decisions - Cost Function Many cost functions cater for the fact that some solutions are illegal. This is achieved using constraints • Hard constraints: these constraints cannot be violated in a feasible solution • Soft constraints: these constraints should, ideally, not be violated but, if they are, the solution is still feasible • Hard constraints are given a large weighting, so solutions that violate them have a high cost function • Soft constraints are weighted depending on their importance • Weights can be dynamically changed as the algorithm progresses; this allows hard-constraint violations to be accepted at the start of the algorithm but rejected later

  21. Problem Specific Decisions - Neighbourhood • How do you move from one state to another? • When you are in a certain state, what other states are reachable? • Some results have shown that the neighbourhood structure should be symmetric: if you can move from state i to state j, then it must be possible to move from state j to state i • However, a weaker condition suffices to ensure convergence: • every state must be reachable from every other. It is therefore important, when thinking about your problem, to ensure that this condition is met

  22. Problem Specific Decisions - Performance • What is performance? • Quality of the solution returned • Time taken by the algorithm • We already have the problem of finding suitable SA parameters (cooling schedule)

  23. Problem Specific Decisions - Initialisation • Start with a random solution and let the annealing process improve on that. • Might be better to start with a solution that has been heuristically built (e.g. for the TSP problem, start with a greedy search)

  24. Problem Specific Decisions - Hybridisation … or memetic algorithms: combine two search algorithms • Often a population-based search strategy is used as the primary search mechanism, and a local search mechanism is applied to move each individual to a local optimum • It may be possible to apply some heuristic to a solution in order to improve it

  25. Example: VLSI Floorplanning with Simulated Annealing Modules are displayed as yellow rectangles; terminals are marked by small black rectangles. Connections between modules are displayed using red lines. This display of nets is sometimes known as a "rat's nest" diagram. On the left, it displays a diagram of a floorplan and its estimated cost. On the right, it displays annealing status information, controls, and a plot of maximum (red), average (black), and minimum (green) accepted configuration costs at each temperature. http://foghorn.cadlab.lafayette.edu/cadapplets/fp/fpApplet.html

  26. SA Modifications – Acceptance Probability • The probability of accepting a worse move is normally based on the physical analogy (the Boltzmann distribution) • Is there any reason why a different function would not perform better for all, or at least certain, problems? • Why might we use a different acceptance criterion? • The one proposed does not work, or we suspect we might be able to produce better solutions • The exponential calculation is computationally expensive • (Johnson, 1991) found that the acceptance calculation took about one third of the computation time

  27. SA Modifications – Acceptance Probability • Johnson experimented with P(δ) = 1 – δ/t • This approximates the exponential • A better approach was found by building a look-up table of a set of values over the range δ/t • During the course of the algorithm δ/t was rounded to the nearest integer and this value was used to access the look-up table • This method was found to speed up the algorithm by about a third with no significant effect on solution quality
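The look-up-table idea described above might be sketched as follows. The table size and the integer granularity are assumptions for illustration; the point is that a precomputed table replaces an exp() call per iteration, as in Johnson's experiments.

```python
import math

# Precompute exp(-k) at integer points k = 0..49; the range is an
# assumed bound on the delta/t values encountered during the run.
TABLE = [math.exp(-k) for k in range(50)]

def accept_prob(delta, t):
    """Approximate exp(-delta/t) by rounding delta/t to the nearest
    integer and indexing a precomputed table, as in the speed-up
    reported by (Johnson, 1991)."""
    k = min(round(delta / t), len(TABLE) - 1)
    return TABLE[max(k, 0)]

# Table value vs. the exact exponential for delta/t = 2.
print(accept_prob(2.0, 1.0), math.exp(-2))
```

The rounding makes the acceptance curve a step function of δ/t; per the slide, this coarsening had no significant effect on solution quality while cutting runtime by about a third.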

  28. SA Modifications - Cooling • If you plot a typical cooling schedule, you are likely to find that at high temperatures many solutions are accepted • If you start at too high a temperature, a random search is emulated; until the temperature cools sufficiently, any solution can be reached and could have been used as a starting position • At lower temperatures, a plot of the cooling schedule is likely to show that very few worse moves are accepted, almost making simulated annealing emulate hill climbing • Taking this one stage further, we can say that simulated annealing does most of its work during the middle stages of the cooling schedule

  29. SA Modifications - Cooling • Connolly suggested annealing at a constant temperature! • But what temperature? • It must be high enough to allow movement but not so low that the system is frozen • However, the optimum temperature will vary from one type of problem to another, and also from one instance of a problem to another instance of the same problem • One solution is to spend some time searching for the optimum temperature and then stay at that temperature for the remainder of the algorithm • The fixed temperature is chosen as the one that returned the best cost function value during the search phase

  30. SA Modifications - Neighbourhood • The neighbourhood of any move is normally the same throughout the algorithm but… • The neighbourhood could be changed as the algorithm progresses • For example, a cost function based on penalty values can be used to restrict the neighbourhood if the weights associated with the penalties are adjusted as the algorithm progresses

  31. SA Modifications – Cost Function • The cost function is calculated at every iteration of the algorithm • Various researchers (e.g. Burke, 1999) have shown that the cost function can be responsible for a large proportion of the execution time of the algorithm • Some techniques have been suggested to alleviate this problem: (Rana, 1996) - Coors Brewery • Used a GA, but the idea could be applied to SA • The evaluation function is approximated (one tenth of a second) • Potentially good solutions are fully evaluated (three minutes)

  32. SA Modifications – Cost Function • (Ross, 1994) uses delta evaluation on the timetabling problem • Since only small changes are made between one timetable and the next, instead of evaluating every timetable from scratch it is possible to evaluate just the changes and update the previous cost function with the result of that calculation • (Burke, 1999) uses a cache • The cache stores cost functions (partial and complete) that have already been evaluated • They can be retrieved from the cache rather than having to go through the evaluation function again
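The caching idea attributed to (Burke, 1999) can be mimicked with Python's functools.lru_cache: already-evaluated solutions are served from memory instead of re-running the evaluation. The quadratic cost below is an assumed stand-in for a real (expensive) objective.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def cost(solution):
    """Expensive evaluation, cached so that previously seen states
    are never re-evaluated. A real implementation would evaluate a
    timetable or a floorplan here; this quadratic is an assumption."""
    return sum(x * x for x in solution)

# Solutions must be hashable (e.g. tuples) to serve as cache keys.
print(cost((1, 2, 3)))          # evaluated -> 14
print(cost((1, 2, 3)))          # served from the cache -> 14
print(cost.cache_info().hits)   # -> 1
```

Delta evaluation (Ross, 1994) is complementary: rather than caching whole evaluations, it updates the previous cost with only the contribution of the changed components.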

  33. Tabu Search Overview • Tabu – socially or culturally proscribed: forbidden to be used, mentioned, or approached because of social or cultural rather than legal prohibitions. (http://encarta.msn.com/dictionary_1861698691/taboo.html) • The Tabu search proceeds according to the supposition that there is no point in accepting a new (poor) solution unless it is to avoid a path already investigated. • Tabu search is based on introducing flexible memory structures in conjunction with strategic restrictions (Tabu lists) and aspiration levels as a means for exploiting search spaces.

  34. Characteristics • Always performs the best move or the least non-improving move • Employs a memory to diversify or intensify the search • Is deterministic (although probabilistic elements may exist) • Three main strategies: • Forbidding strategy: control what enters the tabu list • Freeing strategy: control what exits the tabu list and when • Short-term strategy: manage the interplay between the forbidding and freeing strategies to select trial solutions

  35. Basic Ingredients of Tabu Search • To exploit memory in tabu search => classify a subset of the moves in a neighborhood as forbidden (or tabu)* • A neighborhood is constructed to identify adjacent solutions that can be reached from the current solution. • The classification depends on the history of the search, and particularly on the recency or frequency with which certain move or solution components, called attributes, have participated in generating past solutions*. • A tabu list records forbidden moves, which are referred to as tabu moves [5]. • Tabu restrictions are subject to an important exception: when a tabu move has a sufficiently attractive evaluation, i.e. it would result in a solution better than any visited so far, its tabu classification may be overridden. A condition that allows such an override to occur is called an aspiration criterion*. * Glover, F. and Laguna, M., “Tabu Search,” Norwell, MA: Kluwer Academic Publishers, 1997

  36. Components of Tabu Search
procedure tabu search
begin
  initialize Tabu List (TL) = empty
  generate a (current) solution S
  let S* = S be the best solution so far
  repeat
    repeat
      select the best point B in the neighbourhood of S: N(S)
      if (B is not tabu: B ∉ TL)
        accept move: S = B
        update Tabu List: TL = TL ∪ B
        if (eval(B) > eval(S*)) S* = B
      else if (eval(B) > eval(S*))    // aspiration: tabu, but better than any solution so far
        accept move: S = B
        update best: S* = B
    until (acceptable move found)
  until (halting-criterion)
end
• Encoding • Initial solution • Objective Function • Move operator • Definition of Neighbourhood • Structure of Tabu list(s) • Aspiration criteria (optional) • Termination criteria
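A minimal Python sketch of the procedure above. As a simplification of the slide's scheme, the tabu list here stores recently visited states rather than move attributes, and the toy objective and ±1 neighbourhood are assumptions for illustration.

```python
def tabu_search(initial, neighbours, evaluate, tabu_len=7, max_iter=100):
    """Tabu search per the pseudocode: take the best neighbour that
    is not tabu, unless a tabu neighbour beats the best solution so
    far (aspiration). States stay tabu for tabu_len iterations."""
    current, best = initial, initial
    tabu = []  # most recently visited states (a simple tabu list)
    for _ in range(max_iter):
        candidates = neighbours(current)
        if not candidates:
            break
        # Best move overall; keep it if non-tabu or if it satisfies
        # the aspiration criterion, else fall back to non-tabu moves.
        b = max(candidates, key=evaluate)
        if b in tabu and evaluate(b) <= evaluate(best):
            allowed = [c for c in candidates if c not in tabu]
            if not allowed:
                break  # no admissible move remains
            b = max(allowed, key=evaluate)
        current = b
        tabu.append(b)
        if len(tabu) > tabu_len:
            tabu.pop(0)  # freeing strategy: drop the oldest entry
        if evaluate(current) > evaluate(best):
            best = current
    return best

# Assumed toy problem: maximize f(x) = -(x mod 11) over 0..30.
f = lambda x: -(x % 11)
nb = lambda x: [y for y in (x - 1, x + 1) if 0 <= y <= 30]
print(tabu_search(5, nb, f))  # -> 0
```

Note that, unlike SA, every step is deterministic: the algorithm always takes the best admissible move, and the memory (not randomness) is what lets it escape cycles.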

  37. Components Initial Solution • Random • Constructive Memory Aspects • Recency (short term): How recently was I here? • Frequency (long term): How often have I been here? • Quality (aspiration): How good is being here? • Influence (aspiration): How far away am I from where I have just been?

  38. Components Tabu List Specifics • The main objective of the tabu list is to avoid “cycles”, making the search a global rather than a local optimizer. • Length – fixed or dynamic (generally 7 to 20) • Content – “from” attributes, “to” attributes, “move” attributes; the more specific, the less restrictive • Frequency – tallying similar solutions through tabu content; usually used in penalty form rather than as a strict tabu.

  39. Components Neighbourhood • Complete (deterministic) • Partial (probabilistic) • First improvement • Only improving Aspiration criteria specifics • Best so far (Global) • Best in neighbourhood (Local) • Dissimilar to existing solution (diversification) • Similar to existing solution (intensification) • High influence – degree of change in structure or feasibility

  40. Basic Tabu Search Algorithm • Step 1: Choose an initial solution i in S. Set i* = i and k = 0. • Step 2: Set k = k+1 and generate a subset V* of solutions in N(i,k) such that none of the tabu conditions is violated or at least one of the aspiration conditions holds. • Step 3: Choose a best j in V* and set i = j. • Step 4: If f(i) < f(i*) then set i* = i. • Step 5: Update tabu and aspiration conditions. • Step 6: If a stopping condition is met then stop. Else go to Step 2.

  41. Tabu Search Stopping Conditions Some immediate stopping conditions could be the following: • N(i, k+1) = ∅ (no feasible solution in the neighborhood of solution i) • k is larger than the maximum number of iterations allowed • The number of iterations since the last improvement of i* is larger than a specified number • Evidence can be given that an optimum solution has been obtained Hillier and Lieberman outlined the tabu search stopping criteria as, for example, a fixed number of iterations, a fixed amount of CPU time, or a fixed number of consecutive iterations without an improvement in the best objective function value. One can also stop at any iteration where there are no feasible moves in the local neighborhood of the current trial solution.

  42. Flowchart of a Tabu Search Algorithm Initial solution (i in S) → Create a candidate list of solutions → Evaluate solutions → Choose the best admissible solution → Stopping conditions satisfied? • No: update tabu & aspiration conditions and loop back to creating a candidate list • Yes: final solution

  43. Example: N-Queens Problem • Consists of placing n queens on an n × n chessboard in such a way that no two queens capture each other (aspiration criterion) [4 × 4 board diagram showing a sample queen placement]

  44. Encoding • The n-queens problem can be represented as a permutation problem. • Let the queens be indexed by i • Mark queen i with the row’s index i • Let π(i) be the index of the column where queen i is placed • A configuration is given by the permutation π = {π(1), π(2), …, π(n)} • Example (4 × 4 board): Q1: π(1) = 3, Q2: π(2) = 1, Q3: π(3) = 4, Q4: π(4) = 2, so π = {3,1,4,2}

  45. Neighbourhood • Swaps (pair-wise exchanges) can be used to define neighborhoods in permutation problems. • Current solution π = 4 5 3 6 7 1 2 • Move operator: swap queens 2 and 6 • New solution π′ = 4 1 3 6 7 5 2
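The swap move and the collision count used in this example can be written directly from the encoding: with one queen per row and per column, only diagonal attacks remain, and queens i and j collide iff |π(i) − π(j)| = |i − j|. The data below (π = 4 5 3 6 7 1 2, swap of queens 2 and 6) is the slide's own example.

```python
def swap(perm, i, j):
    """Swap move: exchange the columns of queens i and j (1-based)."""
    p = list(perm)
    p[i - 1], p[j - 1] = p[j - 1], p[i - 1]
    return p

def collisions(perm):
    """Count attacking pairs. In the permutation encoding, rows and
    columns are collision-free by construction, so only diagonal
    attacks (|pi(i) - pi(j)| == |i - j|) need to be counted."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if abs(perm[i] - perm[j]) == j - i)

pi = [4, 5, 3, 6, 7, 1, 2]
print(swap(pi, 2, 6))   # -> [4, 1, 3, 6, 7, 5, 2], the slide's pi'
print(collisions(pi))   # -> 4, matching iteration 0 below
```

This collisions function is the objective minimized in the iteration slides that follow, and swap(…) generates the n(n−1)/2 candidate moves (21 for n = 7).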

  46. Structure of Tabu List • To prevent the search from repeating swap combinations tried in the recent past, we classify all swaps as tabu for three iterations. • The tabu memory is a triangular matrix indexed by pairs of queens; each cell contains the number of iterations remaining until the corresponding queens are allowed to exchange positions again. [triangular tabu-memory table over queens 1–7]

  47. Iteration 0 • Current solution π = 4 5 3 6 7 1 2, # collisions = 4 • Top 5 (of 21) candidate swaps and their values: (1,7): −2; (2,4): −2; (2,6): −2; (5,6): −2; (1,5): 1 [board diagram and tabu-memory table]

  48. Iteration 1 • Move applied (from iteration 0): swap Q1 with Q7 • Current solution π = 2 5 3 6 7 1 4, # collisions = 2 • Top 5 (of 20) candidate swaps and their values: (2,4): −1; (1,6): 0; (2,5): 0; (1,2): 1; (1,3): 1 [board diagram and tabu-memory table]

  49. Iteration 2 • Move applied (from iteration 1): swap Q2 with Q4 • Current solution π = 2 6 3 5 7 1 4, # collisions = 1 • Top 5 (of 19) candidate swaps and their values: (1,3): 0; (1,7): 1 (tabu); (2,4): 1 (tabu); (4,5): 1; (6,7): 1 [board diagram and tabu-memory table]

  50. Iteration 3 • Move applied (from iteration 2): swap Q1 with Q3 • Current solution π = 3 6 2 5 7 1 4, # collisions = 1 • Top 5 (of 18) candidates evaluated as before, with the recent swaps (1,7), (2,4) and (1,3) still tabu [board diagram and tabu-memory table]
