
Parallelization of Stochastic Evolution for Cell Placement



  1. Parallelization of Stochastic Evolution for Cell Placement • MS Thesis Presentation • Khawar S. Khan, Computer Engineering • Committee Members: Dr. Sadiq M. Sait, Dr. Aiman El-Maleh, Dr. Mohammed Al-Suwaiyel

  2. Outline • Problem Focus • Brief Overview of Related Concepts • Motivation & Literature Review • Parallelization Aspects • Parallel strategies: Design and Implementation • Experiments and results • Comparisons • Contributions • Conclusion & Future work

  3. Problem Focus • Real-world combinatorial optimization problems are complex, difficult to navigate, and have vast, multi-modal search spaces. • Iterative heuristics work well for such problems but incur heavy runtimes, e.g., Stochastic Evolution.

  4. Problem Focus • Accelerating performance: reducing runtime with consistent quality, and/or achieving higher-quality solutions within comparable runtime. HOW? Parallel computing in a distributed environment.

  5. VLSI Design Steps (CAD subproblem level: generic CAD tools) • Behavioral/Architectural: behavioral modeling and simulation tools • Register transfer/logic: functional and logic minimization, logic fitting, and simulation tools • Cell/mask: tools for partitioning, placement, routing, etc.

  6. Placement Problem • The problem under investigation is the VLSI standard cell placement problem. • Given a collection of cells or modules, placement consists of finding a suitable physical location for each cell on the layout. • The goal is to find locations that optimize given objective functions (wire-length, power, area, delay, etc.), subject to constraints imposed by the designer, the implementation process, the layout strategy, or the design style.

  7. Iterative heuristics • The computational complexity increases as the module count on a chip increases. • For example, the Alpha 21464 (EV8) "Arana" 64-bit SMT microprocessor has a transistor count of 250 million. • Considering brute-force solutions, a combinatorial problem with just 250 modules requires 250 factorial (3.23e492) combinations to be evaluated. • This is where iterative heuristics come into play.
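
The scale of the brute-force figure quoted above is easy to verify directly:

```python
import math

# Brute-force placement of n modules means evaluating all n! orderings.
n = 250
combinations = math.factorial(n)

# 250! has 493 decimal digits, i.e. about 3.23e492, so exhaustive
# enumeration is hopeless and iterative heuristics are used instead.
digits = len(str(combinations))
```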

  8. Iterative heuristics • Iterative heuristics have proven remarkably effective for many complex NP-hard optimization problems in industrial applications • Placement • Partitioning • Network optimization • Non-linear systems’ simulation

  9. Iterative heuristics • Characteristics • Conceptually simple • Robust towards real-life complexities • Well suited for complex decision-support applications • May be interrupted virtually at any time • Storage of generated solutions

  10. Iterative heuristics • The following five dominant algorithms are instances of general iterative non-deterministic algorithms: • Genetic Algorithm (GA), • Tabu Search (TS), • Simulated Annealing (SA), • Simulated Evolution (SimE), and • Stochastic Evolution (StocE). • In our research, Stochastic Evolution is used for VLSI standard cell placement.

  11. Stochastic Evolution (StocE) (Historical Background) • Stochastic Evolution (StocE) is a powerful, general, randomized iterative heuristic for solving combinatorial optimization problems. • The first paper describing Stochastic Evolution, by Youssef Saab, appeared in 1989.

  12. Stochastic Evolution (Characteristics) • It is stochastic because the decision to accept a move is probabilistic: good moves are accepted with probability one, and bad moves may also be accepted with a non-zero probability. • Hill-climbing property. • Searches for solutions within the constraints while optimizing the objective function.
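
The acceptance behavior described above can be sketched with a uniform random threshold; `accept_move` and the threshold formulation are illustrative (the thesis's exact PERTURB internals may differ):

```python
import random

def accept_move(gain, p):
    """StocE-style acceptance test, where gain = old_cost - new_cost.

    Good moves (gain >= 0) are always accepted, since the threshold
    drawn uniformly from [-p, 0] is never positive; bad moves are
    accepted with a non-zero probability controlled by the range
    variable p, which gives the heuristic its hill-climbing ability."""
    return gain >= random.uniform(-p, 0)
```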

  13. Stochastic Evolution (Algorithm) • Inputs to the algorithm: • an initial valid solution, • the initial range variable p0, and • the termination parameter R.

  14. Stochastic Evolution (Algorithm)

  15. Stochastic Evolution (Application) • Applied to the VLSI placement problem. • Cost function: a multi-objective function that calculates wire-length, power, and delay. • The three objectives are combined and represented as a single objective through fuzzy calculations.

  16. Stochastic Evolution (Cost Functions) • Wire-length Calculation
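
The wire-length equation image was not preserved in the transcript. Placement cost functions conventionally estimate a net's wire-length by the half-perimeter of its pins' bounding box (HPWL); the sketch below shows that standard formulation, which may not be the thesis's exact formula:

```python
def hpwl(net_pins):
    """Half-perimeter wire length of one net: width plus height of
    the bounding box around its pin coordinates (x, y)."""
    xs = [x for x, _ in net_pins]
    ys = [y for _, y in net_pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_wirelength(nets):
    """Placement wire-length estimate: sum of HPWL over all nets."""
    return sum(hpwl(net) for net in nets)
```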

  17. Stochastic Evolution (Cost Functions) • Power Calculation • Delay Calculation

  18. Stochastic Evolution (Cost Functions) • Fuzzy Cost Calculation: μ(x) is the membership of solution x in the fuzzy set of acceptable solutions. μj(x) for j = p, d, l are the membership values in the fuzzy sets of acceptable power, delay, and wire-length respectively. β is a constant in the range [0,1].
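
The equation images for this slide are lost. One common way such memberships are combined into a single value is an ordered-weighted-average style operator balancing the worst objective against the mean; this is an assumption for illustration, and the thesis's exact operator may differ:

```python
def fuzzy_cost(mu_p, mu_d, mu_l, beta):
    """Illustrative fuzzy combination of the three objective
    memberships (power, delay, wire-length) into one acceptance
    membership; beta in [0, 1] trades off the min (pure AND-like
    aggregation) against the plain average."""
    mus = (mu_p, mu_d, mu_l)
    return beta * min(mus) + (1 - beta) * sum(mus) / len(mus)
```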

  19. Motivation & Literature Review • With the proliferation of parallel computers, powerful workstations, and fast communication networks, parallel implementations of meta-heuristics arise quite naturally as an alternative to modifying the algorithm itself for speedup. • Parallel implementations allow solving large problem instances and finding improved solutions in less time than their sequential counterparts.

  20. Motivation & Literature Review • Advantages of Parallelization • Faster Runtimes, • Solving Large Problem Sizes, • Better quality, and • Cost Effective Technology.

  21. Motivation & Literature Review • Efforts have been made to parallelize certain heuristics for cell placement, which produced good results, but as far as we know no attempts have been made for Stochastic Evolution. • Parallelization of Simulated Annealing • E.H.L. Aarts and J. Korst. Simulated annealing and Boltzmann machines. Wiley, 1989. • R. Azencott. Simulated annealing: Parallelization techniques. Wiley, 1992. • D.R. Greening. Parallel simulated annealing techniques. Physica.

  22. Motivation & Literature Review • Parallelization of Genetic Algorithms • Erick Cantú-Paz. Markov chain models of parallel genetic algorithms. IEEE Transactions on Evolutionary Computation, 2000. • Johan Berntsson and Maolin Tang. A convergence model for asynchronous parallel genetic algorithms. IEEE, 2003. • Lucas A. Wilson, Michelle D. Moore, Jason P. Picarazzi, and Simon D. San Miguel. Parallel genetic algorithm for search and constrained multi-objective optimization, 2004. • Parallelization of Tabu Search • S. M. Sait, M. R. Minhas, and J. A. Khan. Performance and low-power driven VLSI standard cell placement using tabu search, 2002. • Hiroyuki Mori. Application of parallel tabu search to distribution network expansion planning with distributed generation, 2003. • Michael Ng. A parallel tabu search heuristic for clustering data sets, 2003.

  23. Motivation & Literature Review • A literature survey of parallel Stochastic Evolution reveals the absence of any research effort in this direction. • This lack of prior research presented both the challenge of implementing a first parallelization scheme for Stochastic Evolution and vast room for experimentation and evaluation. • Parallel models adopted for other iterative heuristics were also studied and analyzed.

  24. Sequential Flow Analysis • To proceed with StocE parallelization, the sequential implementation of Stochastic Evolution was first analyzed. • The analysis was carried out by profiling the sequential code with the Linux gprof tool.

  25. Sequential Flow Analysis • The results of the sequential code analysis:

  26. Sequential Flow Analysis

  27. Parallelization Issues • The three possible domains for any heuristic parallelization are as follows: • Low-level parallelization: the operations within an iteration can be parallelized. • Domain decomposition: the search space (problem domain) is divided and assigned to different processors. • Parallel search: multiple concurrent explorations of the solution space using search threads with various degrees of synchronization or information exchange.

  28. Parallelization Issues • Based on the profiling results, a simple approach would be to parallelize the cost functions, i.e., low-level parallelization. • In a distributed computing environment, however, communication carries a high cost.

  29. Parallelization Issues • For Simulated Annealing, low-level parallelization gave poor speedups, i.e., a maximum speedup of three with eight processors. • StocE also invokes the cost function calculations after each swap in the PERTURB function, which makes it computationally intensive like Simulated Annealing. • Thus the approach may work well for StocE on shared-memory architectures, but it is not at all suited to a distributed environment.

  30. Parallelization Issues • For StocE parallelization, all the following parallelization categories were evaluated while designing the parallel strategies • Low-Level Parallelization, • Domain Decomposition, and • Multithreaded or Parallel Search.

  31. Parallelization Issues • A thorough analysis of StocE's sequential flow combined with the profiling results led to the following conclusions. • Any strategy designed to parallelize StocE in a distributed computing environment should address the following issues: • divide the workload while keeping the algorithm's sequential flow intact, • keep communication overhead minimal, • continuously remedy the errors introduced by parallelization, and • avoid low-level or fine-grained parallelization.

  32. Parallel strategies: Design and Implementation • Broadly, the designed parallel models are classified as: • Asynchronous Multiple Markov Chains (AMMC) • Row Division • Fixed Pattern Row Division • Random Row Division

  33. Asynchronous Markov Chain (AMC) • A randomized local search technique that operates on a state space. The search proceeds step by step, moving from a configuration (state) Si to a neighbor Sj with a certain probability Prob(Si, Sj), denoted pij. • The AMC approach is an example of the parallel search strategy.

  34. Asynchronous Markov Chain (Working) • A managing node, or server, maintains the best cost and placement. • At periodic intervals, processors query the server; if their current placement is better than the server's, they export their solution to it, otherwise they import the server's placement. • This removes the need for expensive synchronization across all processors. • The managing node can either share the computing load with its own search process or be restricted to serving queries. • For a very small number of processors the server may join the clients in the search process, but in a scalable design it is better off servicing queries only.
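
The query protocol described above reduces, on the server side, to a compare-and-exchange on the best solution. The real implementation communicates over MPI; this minimal sketch only illustrates that logic, with class and method names chosen for illustration:

```python
class Server:
    """Managing node in the AMC scheme: keeps the best placement and
    cost seen so far and answers periodic queries from workers."""

    def __init__(self, placement, cost):
        self.best_placement, self.best_cost = placement, cost

    def exchange(self, placement, cost):
        # A worker exports its solution if it beats the server's best;
        # either way the worker gets back the current global best.
        if cost < self.best_cost:
            self.best_placement, self.best_cost = placement, cost
        return self.best_placement, self.best_cost
```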

  35. Asynchronous Markov Chain (AMC): Master Process

  36. Asynchronous Markov Chain (AMC): Slave Process

  37. Fixed Pattern Row Division • A randomized search technique that carries out the search by dividing the work among the candidate processors. • Row division is an example of the domain decomposition strategy.

  38. Fixed Pattern Row Division • More promising than AMC, since it ensures a reduction in the effective load on each working node. • Fair distribution of rows among processors. • Each processor is assigned two sets of rows and is responsible for swapping cells among them. • The sets of rows alternate in every iteration. • Little communication overhead, since the processors do not need to exchange information or synchronize during iterations.
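
A sketch of how the alternating row sets might be generated: contiguous blocks on one iteration, an interleaved (strided) assignment on the next, mirroring the figures on the following slides. The exact pattern in the thesis may differ; this only illustrates the alternation idea (and assumes the row count divides evenly among processors):

```python
def fixed_pattern_rows(num_rows, num_procs, iteration):
    """Return one list of rows per processor. Even iterations use
    contiguous blocks; odd iterations use a strided assignment, so
    rows that sat on a block boundary end up inside one processor's
    set and can be swapped across the old boundary."""
    rows = list(range(num_rows))
    if iteration % 2 == 0:
        size = num_rows // num_procs
        return [rows[p * size:(p + 1) * size] for p in range(num_procs)]
    return [rows[p::num_procs] for p in range(num_procs)]
```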

  39. Fixed Pattern Row Division [Figure: row-set assignments in iteration i and iteration i+1]

  40. Fixed Pattern Row Division [Figure: iteration i, contiguous sets of rows R1 through R12 assigned to processors P1, P2, P3]

  41. Fixed Pattern Row Division [Figure: iteration i+1, the same rows and processors before alternation]

  42. Fixed Pattern Row Division [Figure: iteration i+1, interleaved sets: R1, R4, R7, R10 on P1; R2, R5, R8, R11 on P2; R3, R6, R9, R12 on P3]

  43. Fixed Pattern Row Division: Master Process

  44. Fixed Pattern Row Division: Slave Process

  45. Random Row Division • A variation of Fixed Pattern Row Division. • Instead of two fixed sets of non-overlapping rows, the master processor generates the non-overlapping row sets randomly. • The sets of rows are broadcast to the slaves in each iteration. • The apparent advantage of this scheme over the previous one is the randomness in rows, which ensures that no row remains with any specific processor throughout the search process.
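
The master-side generation of random non-overlapping row sets can be sketched as a shuffle followed by an even split (in the real implementation the resulting sets are then broadcast to the slaves over MPI; the function name here is illustrative):

```python
import random

def random_row_division(num_rows, num_procs):
    """Shuffle the row indices and split them into non-overlapping
    sets of equal size, one per processor, regenerated each iteration
    so no row stays with any specific processor."""
    rows = list(range(num_rows))
    random.shuffle(rows)
    size = num_rows // num_procs
    return [rows[p * size:(p + 1) * size] for p in range(num_procs)]
```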

  46. Random Row Division: Master Process

  47. Random Row Division: Slave Process

  48. Experiments and Results • The parallel implementation was tested on a range of ISCAS-89 benchmark circuits. • These benchmark circuits cover a set of circuits of varying sizes, in terms of number of gates and paths. • Cluster specs: • eight generic workstation nodes, • Intel x86 3.20 GHz, 512 MB DDR RAM, • Cisco 3550 switch for the cluster interconnect, • Linux kernel 2.6.9, and • MPICH ver 1.7 (MPI implementation from Argonne laboratories).

  49. Experiments and Results

  50. Experiments and Results • Parallel Asynchronous Markov Chain Strategy
