Modeling Requirements for System-Level Living Roadmap
System Roadmap. Andrew B. Kahng, Core Pillar. September 29, 2006.
swamy@vlsicad.ucsd.edu sharma@vlsicad.ucsd.edu kambiz@vlsicad.ucsd.edu abk@ucsd.edu
Core Pillar Requirements (ASV)
• Benefits of technology scaling can be sustained by migrating the design process to a system-scaling paradigm → design elements are IP blocks / processor cores as opposed to devices and standard cells
• New system synthesis paradigms rely on accurate yet simple models of delay/area/power/cost trade-offs for parameterized design elements
• Models of block-level design metrics should account for:
  • Cost and impact of design techniques to cope with variability
  • Cost of hardware (in terms of design metrics) required for adaptivity / resiliency
• Goal: synthesize and abstract the impact of low-level technology parameters (and their variabilities) on design metrics of system-level blocks
BEOL Stack Optimization (Nagaraj, TI)
• Quality assessment of BEOL interconnect stacks
• Inputs: technology parameters (resistivity, ILD thickness, etc.), geometry parameters (wire widths, pitches), Rent parameters
• Stack quality assessment is required for blocks (instead of individual wires) such as datapath elements, SoC components, and processor cores
• Outputs: reports of trade-offs and models of stack performance metrics for system-level exploration
• Existing wire-length distribution models and interconnect performance metrics that optimize stack metrics do not have a system-level view
• Stack parameter exploration and optimization should be driven by "design-level" throughput and power considerations
  • E.g., area-normalized throughput and power density
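The Rent parameters listed as inputs drive wire-length distribution models. As a hedged illustration of the kind of model the slide refers to, the sketch below uses a Davis-style power law i(l) ∝ l^(2p-4) for the short-wire region; the length cutoff and function names are assumptions for illustration, not the actual GSRC tooling:

```python
import math

def wirelength_distribution(num_gates, rent_p, max_len=None):
    """Normalized per-length wire counts i(l) ~ l^(2p-4), a Davis-style
    power law for the short-wire region (illustrative simplification)."""
    if max_len is None:
        # assume a square placement; cap lengths at roughly the chip edge
        max_len = int(2 * math.sqrt(num_gates))
    raw = [l ** (2 * rent_p - 4) for l in range(1, max_len + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def average_wirelength(num_gates, rent_p):
    """Expected wire length (in gate pitches) under the distribution."""
    dist = wirelength_distribution(num_gates, rent_p)
    return sum((l + 1) * p for l, p in enumerate(dist))
```

A larger Rent exponent p fattens the tail of the distribution and pushes up average wire length, which is exactly the coupling between Rent parameters and stack quality that the assessment flow needs.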
Interconnect Library Modeling (Carloni)
• The focus of the design process is shifting from "computation" to "communication"
• Mismatch between device scaling and interconnect performance scaling is causing breakdown of traditional across-chip communication mechanisms
• New techniques (wave pipelining, stateful repeaters) → communication- and network-centric approach for future designs
• Communication-driven design synthesis:
  • System-level design requirements are translated to communication mechanisms between computational blocks, analogous to the classic synthesis process (design requirements translated to computational blocks)
  • Mapping stage associates communication apparatus (links, repeaters, buses, routers, etc.) with the high-level synthesis solution, similar to technology mapping of standard cells onto a generic netlist
• The new synthesis methodology requires a "characterized" interconnect library composed of links, repeaters, routers, etc.
• The Modeling/Metrics SIG can provide models of latency, bandwidth, throughput, and power (high-level metrics) based on thorough characterization of library elements against device and process technology roadmaps
Concurrent Theme Requirements (Keutzer)
• The current technology extrapolation framework doesn't allow study of the impact of design choices on high-level parameters
  • E.g., what if a vector unit is added? What if local memory size is increased? What is the impact of architectural design choices on chip-level attributes?
• Architectural exploration requires models of design metrics that are accurate to within ~20%
• There is a significant gap between numbers in the ITRS and technology extrapolation frameworks (BACPAC/GTX)
• Design-space / architectural exploration based on rule/inference chains will run out of steam → require models for higher-level design blocks
• Specific questions:
  • What will be the size of an economical die in future nodes?
  • # RISC processors that can be implemented
  • # clock / power regimes (i.e., voltage islands)
  • Clock frequencies in future nodes
  • Power implications / trade-offs
Other Guidance
• Questions from Intel mentors:
  • How to model the reliability and error rate of SRAM
  • How to embed technological variability and reliability issues into system diagnosis
  • How to identify the "hot spots" of a design
  • How to efficiently validate the design under variations
• Other:
  • What are the impacts of variability on NoC?
  • NBTI power-law modeling (Purdue-TI)
The Challenge of System Projection and Design
• What is the impact of new technology on system macro parameters?
  • Execution speed, power consumed, latency, reliability, cost, …
• What macromodeling will enable system-level optimization?
  • System optimization : large block :: logic optimization : standard cell
  • "Large block" = microprocessor, memory, network, bus, …
• Logic cell abstraction through 65nm WAS: size, power, delay
• Block abstraction beyond 65nm MUST BE: much more
  • Cost and resource trade-offs, especially in the face of variability and reliability
  • From latency and bandwidth to flexibility and resilience
• Scaling of future systems will be dominated by non-determinism
• GSRC Modeling SIG: Toward System Scaling Theory
Towards Parameterized Scalable Macromodels
• Low-level (device- or gate-level) models are accurate but unusable for system-level exploration
• Macromodels:
  • Estimate metrics such as delay, power, area, power/performance variability, and reliability for higher-level blocks
  • Are scalable to novel technologies
  • Are scalable to different design styles, Vdd, Vth, etc.
  • Are parameterized by architectural parameters of higher-level blocks
• Allow designers to:
  • Speculatively achieve highest performance given area and power budgets
  • Explore reliability trade-offs with area and power
  • Assess system-level resiliency requirements
  • Develop robust designs
Use Model: Facilitate System-Level Exploration
[Flow diagram: an instruction-set or cycle-accurate simulator and the system-level design consume macromodel outputs: delay macromodels → cycle time / performance; power macromodels → power; area macromodels → area; reliability macromodels → vulnerable system components; variability macromodels → yield-determining components]
• Optimizations enabled:
  • Evaluation for future technologies
  • Area-performance trade-off
  • Power-performance trade-off
  • Resilience requirements due to reliability and/or variability
Challenges in Macromodeling
• Lots of high-level blocks, algorithms, and design styles
• Some identified blocks (cf. Gajski "Architecture SC" request):
  • Array structures: single- and multiple-port SRAMs, content-addressable memories, register files, reservation stations, renaming units, issue queues, branch target buffers, etc.
  • Complex logic blocks: adders, multipliers, dividers, vector blocks, normalization, rounding, etc.
  • IP blocks: encryption/decryption, JPEG/MPEG compression/decompression, CRC, etc.
  • On-chip communication: buses, NoCs (Polaris)
  • Clocking network
• Lack of robust reliability and variability prediction
Parametric Yield Estimation and Optimization
[Flow diagram: variability data and technology / circuit data feed macromodeling of Fmax variability, SER, and statistical clock skew]
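One simple way to connect Fmax variability data to a parametric yield number is a Monte Carlo sweep; the sketch below assumes independent Gaussian critical-path delays, a deliberate simplification rather than the statistical machinery the slide implies:

```python
import random

def fmax_yield(nominal_delay_ps, sigma_ps, target_period_ps,
               n_paths=50, trials=5000, seed=0):
    """Fraction of simulated dies whose slowest of n_paths critical paths
    still meets the target clock period (parametric yield estimate).
    Independent Gaussian path delays are an illustrative assumption."""
    rng = random.Random(seed)
    passing = 0
    for _ in range(trials):
        die_max = max(rng.gauss(nominal_delay_ps, sigma_ps)
                      for _ in range(n_paths))
        if die_max <= target_period_ps:
            passing += 1
    return passing / trials
```

Sweeping target_period_ps traces the Fmax yield curve; correlated variations, SER, and statistical clock skew would enter as additional terms in a real flow.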
Example: Carry-Lookahead Adder
• Parameters: bit width, lookahead stages
• Design styles: dynamic, static, pass-gate
• Delay: carry generation for the MSB is slowest → from bit width and lookahead, calculate hierarchy levels → identify critical path → project delay from gate-level delay projections (ITRS + BPTM)
• Power: calculated using bit width and lookahead stages in terms of gates, projected using gate-level power
• Area: similar to power
• Reliability and variability projections from iTunes
• All metrics calibrated with implementations for a few parameters and technologies
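Once calibrated, a macromodel of the kind this slide describes can be only a few lines. The sketch below parameterizes delay and gate count by bit width and lookahead group size; the stage counts and coefficients are placeholders to be fit against real implementations, not calibrated values:

```python
def cla_levels(bit_width, group_size):
    """Number of lookahead hierarchy levels needed to span bit_width."""
    levels, span = 1, group_size
    while span < bit_width:
        span *= group_size
        levels += 1
    return levels

def cla_delay(bit_width, group_size, gate_delay_ps):
    """Critical path: p/g generation + up/down the carry tree + sum XOR.
    gate_delay_ps would come from gate-level projections (ITRS + BPTM)."""
    levels = cla_levels(bit_width, group_size)
    return (1 + 2 * levels + 1) * gate_delay_ps

def cla_gate_count(bit_width, group_size):
    """Rough gate count for power/area scaling (placeholder model)."""
    return bit_width * (3 + cla_levels(bit_width, group_size))
```

The logarithmic growth of cla_levels with bit width is what makes the model cheap to evaluate inside an architectural exploration loop.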
Example: Memory Array
[Diagram: memory array of 6T memory cells, with address decoder, memory core, write column logic, and precharge / read column logic]
• Parameters: #bitlines, #wordlines, #ECC bits, etc.
• Design styles: memory cell design, layouts, drive strength ratios, etc.
• Delay: addr. decoder delay + memory cell read/write delay + bitline mux delay → project delay from gate-level delay projections
• Power: CACTI, IDAP (uses wordline cap, bitline cap, precharge device cap / memory cell cap, #bit flips, etc.)
• Area: memory cell area dominated, easy to predict & project
• Reliability and variability projections from iTunes, along with #ECC bits
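Following the slide's delay decomposition (decoder + cell read/write + bitline mux), a hedged sketch of the delay macromodel might look like this; the 64-bit output word and the per-row wire-delay term are illustrative assumptions, and a production model would calibrate against CACTI / IDAP:

```python
import math

def sram_access_delay(n_wordlines, n_bitlines, gate_delay_ps,
                      wire_delay_ps_per_row):
    """Address decoder + cell read + column mux, per the slide's
    decomposition. All coefficients are placeholders for calibration."""
    decoder = math.ceil(math.log2(n_wordlines)) * gate_delay_ps
    # bitline delay grows with the number of rows hanging on the bitline
    cell_read = 2 * gate_delay_ps + n_wordlines * wire_delay_ps_per_row
    # hypothetical 64-bit output word selects among n_bitlines columns
    mux_fanin = max(2, n_bitlines // 64)
    column_mux = math.ceil(math.log2(mux_fanin)) * gate_delay_ps
    return decoder + cell_read + column_mux
```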
Why New Models?
• Classic scaling laws are unaware of the design implications of scaling
• Models of scaling do not represent the system constraint-driven design of the future
• Hardware overheads for resiliency, power, adaptability, and tuning work against the performance gains of scaling
• Models of the design infrastructure in future nodes should understand the implications of circuit and interconnect unreliability
  • Static variations: process variations, NBTI
  • Dynamic variations: temperature, SEU, EM
• Existing models are too low-level to be usable in a system-design scenario, even with inference-chain analysis (e.g., GTX)
Technology Scaling: Interconnect Implications
• Vdd scaling is slowing → delay scaling is slowing down
  • Subthreshold slope limit
  • Vt scaling has Ioff consequences
  • Power concerns push Vdd down
• Scaling interconnect dimensions
  • Wire delays become worse
  • Huge performance penalties (because devices also are not as fast)
  • Global wires are the worst victims
• Repeaters are of limited help → significant area and power penalty → global communication is a costly overhead
Image source: Prof. Saraswat, Stanford Univ.
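The repeater penalty can be made concrete with the classic Bakoglu-style formulas for optimal repeater count and sizing; the expressions below are the standard textbook ones, applied here to hypothetical wire and driver parameters:

```python
import math

def optimal_repeaters(r_wire, c_wire, r_drv, c_in):
    """Optimal repeater count k, sizing h, and near-optimal delay for a
    global wire with total resistance r_wire and capacitance c_wire,
    driven by a minimum-size stage with output resistance r_drv and
    input capacitance c_in (classic Bakoglu-style expressions)."""
    k = math.sqrt(0.4 * r_wire * c_wire / (0.7 * r_drv * c_in))
    h = math.sqrt(r_drv * c_wire / (r_wire * c_in))
    delay = 2.5 * math.sqrt(r_drv * c_in * r_wire * c_wire)
    return k, h, delay
```

Because r_wire and c_wire both grow linearly with length, optimally repeated delay grows only linearly in length, but the repeater count k grows linearly too, which is exactly the area and power penalty the slide calls out.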
Design Impact of Interconnect (non-)Scaling
• Repeater-driven interconnect is energy-, congestion-, and performance-limited
• Maximum reachable distance in a clock cycle = ? (low-swing, differential, …)
• Bandwidth vs. latency envelope = ? (encoding, power, signal reliability, …)
• Latency is not the only problem: temperature, power density, and EM
  • Temperature of global interconnect rises with low-k dielectrics → performance impact
• Future NoC interconnects should address performance/thermal/reliability issues in both the fabric design and design optimization phases
• This work searches for optimal NoC interconnect stack parameters
New Directions for System-Level Interconnects
• Wire pipelining, state-aware repeaters
  • Methodologies?
  • Globally asynchronous, locally synchronous
  • Latency-insensitive design
• Design paradigm shift from "computation-driven" to "communication-driven"
  • Computation is no longer the bottleneck
  • Computation is cheap → exploit the computation infrastructure to develop efficient communication mechanisms
• Designs are transforming into distributed systems
  • Interconnection network performance is key for system performance → power, bandwidth, and throughput envelope constrained by elements in the system
Design-Centric Modeling of Interconnects
• Traditional modeling techniques consider individual wires for characterization / optimization of interconnect performance metrics → no notion of design specificity
• Multi-core / NoC / communication-system design exploration and synthesis methodologies should consider the interconnect fabric in the context of the design → design-centric modeling of interconnects
• Modeling design fabrics:
  • Design fabrics for communication-based design: nodes, interconnect
  • Global interconnect of datapath elements, processor cores
  • Point-to-point / broadcast buses, links, switch/multiplexer interfaces, and routers
• Metrics for performance analysis of design fabrics:
  • Conventional design metrics: performance, power, area
  • New metrics needed → should reflect system-level power/performance characteristics
Interconnect Metrics (IM)
• Traditional interconnect performance metrics:
  • Signaling: delay, power, bandwidth, noise, crosstalk, area
  • Clocking: skew/jitter, power, slew rate, area
  • Power distribution: supply fidelity
  • Reliability: electromigration
• Some recent metrics:
  • Interconnect architecture rank: inclusive metric combining delay, routability, area
  • Bandwidth/Energy: signifies throughput as well as the energy spent in signaling at a specific bandwidth
• Problems with existing metrics:
  • No notion of design specificity → interconnect stack performance is heavily dependent on a design's wire-length distribution
  • IM optimization based on canonical test structures is not valid for all wiring topologies → sub-optimal results
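The design-specific metrics the deck proposes, area-normalized throughput and power density, reduce to short expressions for a macro-block link; the function and parameter names here are illustrative assumptions, not the SIG's definitions:

```python
def area_normalized_throughput(bits_per_cycle, freq_hz, width_um, length_um):
    """Throughput per unit of stack area (bits/s per um^2) for a
    macro-block link occupying width_um x length_um of a metal layer."""
    return bits_per_cycle * freq_hz / (width_um * length_um)

def power_density(energy_per_bit_j, bits_per_cycle, freq_hz,
                  activity, width_um, length_um):
    """Switching power per unit area (W/um^2); activity is the average
    fraction of wires toggling per cycle."""
    power_w = energy_per_bit_j * bits_per_cycle * activity * freq_hz
    return power_w / (width_um * length_um)
```

Widening wires may raise raw bandwidth while lowering area-normalized throughput, which is the design-level trade-off that canonical per-wire metrics miss.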
BEOL Stack Metrics
• Design-centric BEOL interconnect stack architectures → global interconnect topologies for NoC
• Macro-block configurations may vary in # of wires, geometric parameters (width, spacing), and link structure
• Stack metrics:
  • Traditional metrics can be adapted to macro-blocks
  • New metrics: area-normalized throughput, power density
[Interconnect library of macro blocks: bus (1), curves (2), cross w/o contacts (3), cross w/ contacts (4)-(6). Source: Addino et al.]
Recall: TI Request for BEOL Stack Optimization
• WANTED: BEOL stack optimization tool (Nagaraj, TI)
• Inputs:
  • Stack options: thickness, pitch, dielectric materials, process variations
  • Class of representative designs at RT-level: logic-only, logic+memory, datapaths, CPU cores, SoC
  • Cell and IP library
• Outputs:
  • Concise summary of trade-offs for different BEOL stack options
  • Area, clock and power distribution (on die and package), performance, reliability, cost
BEOL Stack Optimization
• Stack optimization: search for the set of macro-block parameters that yields optimum points for the performance metrics
• Methodology:
  • Step 1: Construction of the interconnect library
  • Step 2: Electrical characterization of library elements for different choices of geometric, user-specified parameters (Addino et al., PATMOS'03)
  • Step 3: Computation of performance metrics
    • Bit transfer rate across a cross-section of the elements
    • Power density per unit area
    • "Traditional metrics": latency, bandwidth, noise, etc.
• Best solutions for different performance metrics may be mutually conflicting → intelligent search in the parameter space to obtain an optimal solution
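Since the best solutions for different metrics conflict, Step 3 ends in a constrained search. A minimal brute-force version is sketched below with a toy electrical model; all coefficients and the RC figure of merit are placeholders, and a real flow would use the characterized library from Step 2 rather than this model:

```python
import itertools

def evaluate_stack(width_um, spacing_um, r_per_um, c_area, c_fringe):
    """Toy per-unit-length electrical model: resistance falls with width,
    coupling capacitance rises as spacing shrinks (placeholder numbers)."""
    r = r_per_um / width_um
    c = c_area * width_um + c_fringe / spacing_um
    delay = r * c                              # RC figure of merit
    pitch = width_um + spacing_um
    bandwidth_density = 1.0 / (delay * pitch)  # speed-weighted wires/um
    return delay, bandwidth_density

def best_stack(widths, spacings, max_delay):
    """Brute-force sweep: maximize bandwidth density subject to a delay
    constraint, standing in for the 'intelligent search' on the slide."""
    best = None
    for w, s in itertools.product(widths, spacings):
        delay, bw = evaluate_stack(w, s, r_per_um=0.1,
                                   c_area=0.05, c_fringe=0.08)
        if delay <= max_delay and (best is None or bw > best[2]):
            best = (w, s, bw)
    return best
```

In this toy model, wide wires and generous spacing win because the fringe-capacitance term dominates; with a characterized library the sweep would instead expose genuine width/spacing/metric conflicts.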
Interconnect Characterization for Communication-Based Design
• BEOL stack exploration: initial step toward interconnect fabric design and optimization for MPSoCs, CMPs, and heterogeneous systems → best interconnect stack for a specific communication objective
• Novel interconnect metrics:
  • Capture technology scaling
  • Capture system scaling (design constraints): consider the impact of memory hierarchy, interface timing, power, signal swing levels
• Interconnect characterization: create models of performance metrics for interconnect structures
  • E.g., which structure gives the best throughput per area for given performance constraints?
  • How does power density change with bus parameters and power constraints?
• Probabilistic, continuum/hierarchy of models
  • Dial effort/information vs. accuracy
  • "N+1 N+2 shrink"; "Side + Ngate + Rent p"; "run Architecture Compiler"; …
  • Dial guardband vs. certainty
Conclusions
• Existing BEOL stack analysis/optimization is oblivious to system design constraints
• Individual wires are no longer sufficient for performance analysis → move to higher levels of abstraction
• The communication-driven design synthesis paradigm drives system-level interconnect analysis
• Standalone metrics (e.g., delay, power, bandwidth) cannot give a complete picture of performance
  • New metrics: area-normalized throughput, power density
• Explore the parameter space to efficiently obtain stack parameters for optimum performance