This presentation discusses the development of a full transistor-level system simulator at the University of California, San Diego. It addresses the motivations driven by Moore’s Law, focusing on challenges and limitations of existing commercial simulators. Key topics include multigrid solver techniques for improved convergence, activity-driven analysis for circuit latency, and methods for incorporating nonlinear transistor models effectively. The findings demonstrate significant improvements in handling large-scale circuit simulations, which are crucial as technology advances toward millions of transistors.
SPICEDiego: A Transistor Level Full System Simulator Chung-Kuan Cheng May 27, 2004 Computer Science & Engineering Department University of California, San Diego
Outline • Motivation • Status of Commercial Simulators • Solver Engine: Multigrid Review • Activity Driven Analysis • Nonlinear Transistor Devices • Experimental Results • Conclusion
Motivation • Moore’s Law • # transistors and clock frequency double every 2 years • 1984: 100K transistors, 10 MHz • 2003: 100M transistors, 5 GHz • More Challenges for Circuit Simulators • Electrical Coupling (C&L): interconnect delay, crosstalk, voltage drop, ground bounce • Short Channel Devices • SPICE • Cannot perform large chip analysis; capacity limited to 50,000 transistors (O(n^2) complexity)
Status of Commercial Simulators • Partition Based Simulation • Most commercial fast SPICE simulators (HSIM, Power Mill / Time Mill / Rail Mill, NanoSim, RedHawk, Ultrasim) • Advantage • Smaller matrix size • Easy to apply different time steps to different subcircuits • Disadvantage • Hard to catch coupling effects between subcircuits • Device model ignores Miller’s effect • Potential convergence problems • Accuracy not guaranteed
Motivation • Direct Method: High Complexity • Basic Iterative Method: Slow Convergence • Conjugate Gradient • Multigrid Method
Multigrid Review • Error Components • High frequency error (More oscillatory between neighboring nodes) • Low frequency error (Smooth between neighboring nodes) • Basic iterative methods only efficiently reduce high frequency error • Basic Idea of Multigrid • Convert hard-to-damp low frequency error to easy-to-damp high frequency error
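The claim that basic iterative methods damp only high frequency error can be checked numerically. The following is a minimal sketch, assuming a 1D resistor-chain matrix (tridiagonal [-1, 2, -1]) as a stand-in for a circuit matrix; the function names, the grid size, and the chosen modes are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def gauss_seidel(A, x, b, sweeps=1):
    """In-place Gauss-Seidel sweeps on A x = b; returns x."""
    for _ in range(sweeps):
        for i in range(len(b)):
            # Row residual divided by the diagonal gives the node update.
            x[i] += (b[i] - A[i] @ x) / A[i, i]
    return x

def laplacian_1d(n):
    """Tridiagonal [-1, 2, -1] matrix (assumed stand-in for a resistor chain)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def error_kept(mode, n=63, sweeps=5):
    """Fraction of the error norm left after smoothing one sine mode."""
    A = laplacian_1d(n)
    b = np.zeros(n)  # exact solution is zero, so x itself is the error
    j = np.arange(1, n + 1)
    x = np.sin(mode * np.pi * j / (n + 1))
    before = np.linalg.norm(x)
    gauss_seidel(A, x, b, sweeps)
    return np.linalg.norm(x) / before

low = error_kept(mode=1)    # smooth between neighboring nodes
high = error_kept(mode=48)  # oscillatory between neighboring nodes
print(f"low-frequency error kept:  {low:.3f}")
print(f"high-frequency error kept: {high:.3f}")
```

After the same number of sweeps, most of the smooth error remains while the oscillatory error is largely gone, which is the gap multigrid is meant to close.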
Multigrid: A Hierarchy of Problems • Finest level: A0•X0=b0 (Smoothing) • Intermediate level: A1•X1=b1 (Smoothing) • Coarsest level: A2•X2=b2 (Gauss Elimination) • Restriction moves from a finer level to a coarser level • Interpolation moves from a coarser level back to a finer level • Hierarchically, all error components are smoothed efficiently
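The hierarchy of smoothing, restriction, coarse solve, and interpolation can be sketched with two levels instead of three. This is a hedged illustration, not the SPICEDiego solver: it assumes a 1D [-1, 2, -1] test matrix, linear interpolation, full-weighting restriction, and a Galerkin coarse matrix A1 = R A0 P; all function names are hypothetical.

```python
import numpy as np

def gauss_seidel(A, x, b, sweeps):
    """Gauss-Seidel smoothing on A x = b (in place)."""
    for _ in range(sweeps):
        for i in range(len(b)):
            x[i] += (b[i] - A[i] @ x) / A[i, i]
    return x

def interpolation(n_coarse):
    """Linear interpolation from n_coarse nodes to 2 * n_coarse + 1 nodes."""
    P = np.zeros((2 * n_coarse + 1, n_coarse))
    for k in range(n_coarse):
        P[2 * k, k] = 0.5      # left fine neighbor
        P[2 * k + 1, k] = 1.0  # fine node that coincides with coarse node k
        P[2 * k + 2, k] = 0.5  # right fine neighbor
    return P

def two_grid_cycle(A0, b0, x0, P, sweeps=2):
    """One cycle: smooth, restrict, solve coarse problem, interpolate, smooth."""
    R = 0.5 * P.T                        # restriction (full weighting)
    A1 = R @ A0 @ P                      # coarse-level matrix (Galerkin)
    x = gauss_seidel(A0, x0.copy(), b0, sweeps)
    r1 = R @ (b0 - A0 @ x)               # restricted residual
    e1 = np.linalg.solve(A1, r1)         # direct solve at the coarsest level
    x += P @ e1                          # interpolated error correction
    return gauss_seidel(A0, x, b0, sweeps)

P = interpolation(15)
n_fine = P.shape[0]
A0 = 2.0 * np.eye(n_fine) - np.eye(n_fine, k=1) - np.eye(n_fine, k=-1)
b0 = np.random.default_rng(0).standard_normal(n_fine)

x_mg = np.zeros(n_fine)
for _ in range(10):                      # 10 cycles = 40 fine-level sweeps
    x_mg = two_grid_cycle(A0, b0, x_mg, P)
x_gs = gauss_seidel(A0, np.zeros(n_fine), b0, sweeps=40)

res_mg = np.linalg.norm(b0 - A0 @ x_mg)
res_gs = np.linalg.norm(b0 - A0 @ x_gs)
print(f"two-grid residual:          {res_mg:.2e}")
print(f"Gauss-Seidel only residual: {res_gs:.2e}")
```

Both runs spend the same number of fine-level sweeps; the only difference is the coarse-level correction between them.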
Geometric vs Algebraic • Geometric multigrid method • Requires a regular grid structure • Algebraic Multigrid • Coarsening relies on the matrix • No requirement of a regular grid structure • Coloring scheme • Error Smoothing Operator: Gauss-Seidel • Interpolation • The residual is small but the error decreases very slowly. • In practice, we use only coarse nodes on the RHS of the interpolation formula to approximate the error correction of a fine node.
Convergence of Multigrid Method • The matrix needs to be symmetric positive definite (S.P.D.) • Key to the convergence of the iterative methods SOR, PCG, and Multigrid • RC network • System Equation • Apply Trapezoidal Rule • The system matrix is S.P.D. • The LHS matrix is S.P.D.; this also holds for the B.E. and F.E. formulae
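For reference, a standard form of the RC equations is sketched below. It assumes nodal analysis with capacitance matrix C, conductance matrix G, node voltages v, source vector u, and time step h; these symbols are assumptions and may differ from the notation on the original slide.

```latex
% RC network, nodal analysis (assumed notation)
C\,\dot{v}(t) + G\,v(t) = u(t)

% Trapezoidal rule with step h
\left(\frac{C}{h} + \frac{G}{2}\right) v(t+h)
  = \left(\frac{C}{h} - \frac{G}{2}\right) v(t) + \frac{u(t+h) + u(t)}{2}

% Backward Euler (B.E.) and forward Euler (F.E.) for comparison
\left(\frac{C}{h} + G\right) v(t+h) = \frac{C}{h}\,v(t) + u(t+h)
\qquad
\frac{C}{h}\,v(t+h) = \left(\frac{C}{h} - G\right) v(t) + u(t)
```

When C is symmetric positive definite and G is symmetric positive semidefinite, each left-hand-side matrix is a positive combination of the two and is therefore S.P.D.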
Convergence of Multigrid Method • RLKC network • System Equation • Apply Trapezoidal Rule • The LHS matrix is not S.P.D., but it can be converted to an S.P.D. matrix • The LHS matrix of the first equation is now S.P.D.; similar for B.E. and F.E. • L^-1 is called the K / Susceptance / Reluctance matrix
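One way to carry out this conversion is sketched below. It assumes modified nodal analysis with node voltages v, inductor currents i, an inductor incidence matrix A_l, inductance matrix L (including mutual terms), and time step h; the notation is an assumption and may differ from the original slide.

```latex
% RLKC network (assumed MNA notation)
C\,\dot{v} + G\,v + A_l\,i = u, \qquad L\,\dot{i} = A_l^{T} v

% Trapezoidal rule, with v_0 = v(t), v_1 = v(t+h), and likewise for i, u
\frac{C}{h}(v_1 - v_0) + \frac{G}{2}(v_1 + v_0) + \frac{A_l}{2}(i_1 + i_0)
  = \frac{u_1 + u_0}{2}
\qquad
\frac{L}{h}(i_1 - i_0) = \frac{A_l^{T}}{2}(v_1 + v_0)

% Eliminate i_1 using K = L^{-1}
i_1 = i_0 + \frac{h}{2}\,K A_l^{T}(v_1 + v_0)

% Converted system for v_1
\left(\frac{C}{h} + \frac{G}{2} + \frac{h}{4}\,A_l K A_l^{T}\right) v_1
  = \left(\frac{C}{h} - \frac{G}{2} - \frac{h}{4}\,A_l K A_l^{T}\right) v_0
    - A_l\,i_0 + \frac{u_1 + u_0}{2}
```

The coupled system in (v, i) is not symmetric, but the converted left-hand-side matrix is S.P.D. whenever C is S.P.D., G is symmetric positive semidefinite, and K is S.P.D.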
Why Algebraic Method • No Requirement of Regular Grid • Works for general circuits. • Circuit with Mutual Inductance • Adjacency graph of the converted system matrix is different from circuit topology. Converted System Matrix:
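The point about the adjacency graph can be illustrated with a small numeric sketch. It assumes the converted matrix has the form C/h + G/2 + (h/4) A_l K A_l^T with K = L^-1, and a hypothetical 4-node circuit with two inductors that share no node and are linked only by a mutual inductance; all element values are made up for illustration.

```python
import numpy as np

def converted_matrix(C, G, A_l, L, h):
    """Assumed converted trapezoidal matrix: C/h + G/2 + (h/4) A_l K A_l^T."""
    K = np.linalg.inv(L)  # K (reluctance) matrix
    return C / h + G / 2.0 + (h / 4.0) * (A_l @ K @ A_l.T)

n = 4
C = 1e-12 * np.eye(n)            # 1 pF from each node to ground
G = 1e-3 * np.eye(n)             # 1 kOhm from each node to ground
h = 1e-11                        # 10 ps time step

# Inductor 1 connects nodes 0-1, inductor 2 connects nodes 2-3.
A_l = np.array([[1.0, 0.0],
                [-1.0, 0.0],
                [0.0, 1.0],
                [0.0, -1.0]])

L_coupled = np.array([[1e-9, 0.5e-9],
                      [0.5e-9, 1e-9]])   # mutual inductance of 0.5 nH
L_uncoupled = np.diag([1e-9, 1e-9])      # same inductors, no coupling

M_coupled = converted_matrix(C, G, A_l, L_coupled, h)
M_uncoupled = converted_matrix(C, G, A_l, L_uncoupled, h)

# np.linalg.cholesky raises LinAlgError if the matrix is not S.P.D.
np.linalg.cholesky(M_coupled)

# No circuit element joins nodes 0 and 2, yet the matrix couples them.
print("entry (0, 2) with mutual inductance:   ", M_coupled[0, 2])
print("entry (0, 2) without mutual inductance:", M_uncoupled[0, 2])
```

With mutual inductance, the matrix gains entries between nodes that no circuit element connects, so coarsening has to follow the matrix rather than the layout.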
Activity Driven Analysis • Circuit Latency & Multi-rate Behavior • Spatial Latency • Only portions of the circuit are active at any given time; 80%-90% of total gates are non-switching • Temporal Latency • A given portion of the circuit is not always active • Multi-rate Behavior • Varying time constants give multi-rate behavior • How to utilize? • Circuit Partitioning: common technique used in timing simulators.
Adaptive Smoothing • HOW? • Only active regions get their error smoothed • Varying “time step size” • Inactive subcircuits may get their error smoothed at the finest level only once every several time points • WHY? • Error smoothing at the finer levels takes most of the iteration time • Smoothing at the coarser levels is sufficient for inactive portions of the circuit • Figure: Adaptive smoothing at finest grid level
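A minimal sketch of the idea at the finest level only: a Gauss-Seidel sweep that skips inactive nodes. The activity test (voltage change above a tolerance), the function names, and the small test matrix are assumptions for illustration; the slides do not specify how activity is detected.

```python
import numpy as np

def active_nodes(x_now, x_prev, tol):
    """Hypothetical activity test: node moved more than tol since last time point."""
    return np.abs(x_now - x_prev) > tol

def adaptive_gauss_seidel(A, x, b, active, sweeps=1):
    """Gauss-Seidel sweeps that update only the nodes flagged as active."""
    for _ in range(sweeps):
        for i in np.flatnonzero(active):
            x[i] += (b[i] - A[i] @ x) / A[i, i]
    return x

n = 8
A = 2.1 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # S.P.D. test matrix
b = np.zeros(n)
b[6] = 1.0                               # excitation in the right half only

x_prev = np.zeros(n)
x_now = np.zeros(n)
x_now[4:] = 0.1                          # right half moved since last time point
active = active_nodes(x_now, x_prev, tol=1e-3)

x = adaptive_gauss_seidel(A, np.zeros(n), b, active, sweeps=50)
residual = b - A @ x
print("active mask:        ", active)
print("inactive node values:", x[:4])
print("active-node residual:", np.linalg.norm(residual[4:]))
```

The inactive nodes keep their previous values and cost nothing at this level; in the scheme described above they would still be corrected through the coarser levels and smoothed at the finest level once every several time points.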
Incorporating Transistor Devices (1) • Direct simulation of transistor devices makes the linear solver diverge • Conventional method: abstract the device as a current waveform and ignore the interaction with VDD/VSS • How to include transistor devices? • Inside the innermost Newton-Raphson linearization iteration, decouple the linear and nonlinear parts at their interface and replace the nonlinear side with a Norton equivalent circuit.
Incorporating Transistor Devices (2) • Advantage • Possible to use a fast linear matrix solver (which requires a symmetric positive definite matrix, a property that does not hold for nonlinear transistors) • Less memory required: the matrix for nonlinear components can be generated on the fly, making it possible to run large cases with millions of transistors • Linear and nonlinear parts are decoupled only in the innermost Newton-Raphson iteration of transient analysis • Accuracy guaranteed via linear-nonlinear iteration (typically 4 ~ 10 iterations)
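The Norton equivalent step can be sketched on a single node. This illustration swaps the transistor for a diode (an assumption made to keep the device model to one line) driven from a voltage source through a resistor; the function names and parameter values are hypothetical and are not SPICEDiego's device interface.

```python
import math

def diode_current(v, i_s=1e-14, v_t=0.02585):
    """Shockley diode equation, used here as a stand-in nonlinear device."""
    return i_s * (math.exp(v / v_t) - 1.0)

def norton_equivalent(v0, i_s=1e-14, v_t=0.02585):
    """Linearize the device at v0: i(v) ~ g * v + i_eq."""
    g = i_s * math.exp(v0 / v_t) / v_t            # small-signal conductance
    i_eq = diode_current(v0, i_s, v_t) - g * v0   # Norton current source
    return g, i_eq

def solve_node(v_src, r, v_guess=0.6, tol=1e-12, max_iter=100):
    """Newton-Raphson on KCL: (v_src - v) / r = i_device(v)."""
    v = v_guess
    for iteration in range(1, max_iter + 1):
        g, i_eq = norton_equivalent(v)
        # The linear network now sees only a conductance and a current source.
        v_new = (v_src / r - i_eq) / (1.0 / r + g)
        if abs(v_new - v) < tol:
            return v_new, iteration
        v = v_new
    raise RuntimeError("Newton-Raphson did not converge")

v_node, iterations = solve_node(v_src=1.0, r=1000.0)
kcl_error = (1.0 - v_node) / 1000.0 - diode_current(v_node)
print(f"node voltage {v_node:.6f} V after {iterations} iterations")
```

In each iteration the nonlinear device contributes only a conductance on the matrix diagonal and a current on the right-hand side, so the linear part stays symmetric and can be handed to a linear solver unchanged.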
Experimental Results (1) • Test Case #1 • Board / Packaging / Chip Power Network • Fully coupled packaging inductance • 60k elements, 5000 nodes • SPICE failed • Our tool • Less than 2 minutes • Figure labels: chip, board, power supply
Experimental Results (2) • Power/Clock network case • 30k nodes, 1000 transistor devices • SPICE run time: 41323s • Our run time: 1859s, 22x Speedup
Experimental Results (3) • 1K cell design • 10,286 nodes • 751 Gates • SPICE run time: 2121s • Our run time: 26.1s, 8x Speedup • 10K cell design • 123,590 nodes • 7,481 Gates • SPICE run time: 44293s • Our run time: 3572s, 12.4x Speedup
Why is SPICEDiego better? • SPICEDiego: fast, accurate transistor level circuit simulator • Powerful matrix solver engine • Handles transistor devices • Capable of capturing coupling effects • Device model includes Miller’s effect • Less memory required (no LU factorization, does not save the matrix for transistors) • Application • Interconnect delay • Crosstalk • Voltage drop, ground bounce • Simultaneous switching noise
Conclusion • Moore’s Law demands an extraordinarily fast circuit simulator with guaranteed accuracy. • Current tools cannot cover Miller’s effect or mutual inductance, and they give no bound on the error. • SPICEDiego offers a solution for circuit designers.