Rob A. Rutenbar Director, MARCO Center for Circuit & System Solutions

Design at the Top of the Semiconductor Foodchain: How Manufacturing Challenges Below 90nm Impact Circuits & Systems
Rob A. Rutenbar
Director, MARCO Center for Circuit & System Solutions
Professor of ECE, Carnegie Mellon
rutenbar@ece.cmu.edu

About This Talk
• Some background on the MARCO FCRP program
  • What all these acronyms are…
  • …and why you might want to know about them
• A brief look at work in the C2S2 “Circuits” Center
  • How mfg challenges at highly scaled nodes percolate up the foodchain
  • What gives circuit designers nightmares, and what we’re doing about it

C2S2, MARCO, FCRP, etc.: A Little Bit of Background and Context

FCRP: Focus Center Research Program
• Vision: National research centers in semiconductor technology
• Multiple-university teams & large-scale efforts (~$10M/center/year)
• Long-range research horizon
• Focus on discovery: where evolutionary R&D may not find solutions
“The focus center program is designed to create a nationwide multi-university network of research centers that will keep the United States and U.S. semiconductor firms at the front of the global microelectronics revolution.”
Craig R. Barrett, President and CEO, Intel; (Former) Chair, Semiconductor Technology Council

Focus Center Research Program: Timeline
• First centers chartered in 1999; five centers in operation today
• Total program currently ~$25M/year
[Timeline figure: “Systems”, “Interconnect”, “Circuits”, “Devices”, and “Materials” centers; axis marks 1999, 2001, 2003, Aug 2004; markers labeled “recompete” and “restart”]

MARCO: Microelectronics Advanced Research Corp.
• MARCO coordinates FCRP Centers, funding, industry/govt interfaces
[Organization chart labels: US DOD, Funding, MARCO Governing Council, Management, Centers, Universities (~30 schools)]

FCRP Centers: Designed To Target Entire “Semiconductor Foodchain”
• Pushing CMOS to its limits, and beyond
• Containing the growing cost of complexity
• Driving down cost of design & verification
• Containing latency & power of interconnect
• Overcoming the tyranny of kT/q
[Figure labels: integrated products, application HW/SW, system software, logic / architecture, circuits, devices, structures, materials, physics]

C2S2: Center for Circuit & System Solutions
• C2S2 core competency: Circuits
  • Technology scaling impacts
  • Analog, digital, RF, MEMS ckts
  • Some photonics, too
  • Assoc. design tools & methodologies
• Logistics
  • CMU is lead school
  • Now 12 universities
  • ~47 faculty, 67 grad students
[Figure labels: logic / architecture, circuits, devices]

The C2S2 Research Team
• Executive team
• Research team
Rob Rutenbar, CMU, Director
Art Davidson, CMU, Exec Dir
Bob Brodersen, Berkeley
Mark Horowitz, Stanford
Wen-Mei Hwu, Illinois
Teresa Meng, Stanford
Larry Pileggi, CMU
Charles Sodini, MIT

C2S2: Doing Circuits in Highly Scaled Technologies
Coping with scaling
How will we do circuit design with tomorrow’s different, difficult devices?
[Figure labels: logic / architecture, circuits, devices]

…and, how do we deal with the “conscientious objectors”…?

C2S2: Doing “Reluctant” Circuits at Scaled Nodes
How will we approach circuits that don’t want to scale? Circuits that prefer (for $$, or for performance) a different technology platform?
[Figure: ITRS-03 plot, voltage (V) versus node (nm), labeled Vsupply, analog range, digital, 10nm; foodchain labels: logic / architecture, circuits, devices]


Two Different Scaling-Related Problems
What about delay?
• Past expectation
  • Next process node is faster
  • We rely on this for new designs
• New problems
  • Yes, it’s faster…
  • Chip-scale, size of logic hurts
  • Speed of light is a big limiter
What about mfg variability?
• Past expectation
  • Next process node is worse
  • …but we’re smart, we’ll manage
• New problems
  • It’s a lot worse than the last node
  • Cannot pretend it’s deterministic
  • Cannot just look at a few “corners”

View from the “Top” – Circuits & Systems
What’s happening with circuits & systems, with CAD & methodology, to help design in scaled technologies?
[Figure labels: Architectures, Systems, Basic blocks, Fundamental circuits, Devices & wires, Materials & structures]

Let’s Look at Delay…
• Unfortunate fact: despite a century of physics funding…
• …“c” has not budged, not even 1 m/s!
[Figure labels: ~6 ps, ~6 ps]

Delay: Two Different Flavors for Wires
• Local wires
  • Get shorter with scaling
  • ~ constant complexity, span constant # gates
• Global wires
  • Don’t get shorter with scaling (that’s why they’re global…)
  • ~ constant length
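The local-versus-global contrast can be illustrated with a toy calculation. This is a minimal sketch, not the model behind the talk's data: it assumes an unbuffered distributed RC line (50% delay of about 0.38·R·C), a 0.7x linear shrink per generation, wire resistance per mm growing as the cross-section shrinks in both dimensions, and capacitance per mm staying constant. All numeric values and function names are hypothetical.

```python
S = 0.7  # assumed linear shrink factor per process generation


def unbuffered_delay(r_per_mm, c_per_mm, length_mm):
    """50% delay of a distributed RC line: about 0.38 * R_total * C_total."""
    return 0.38 * (r_per_mm * length_mm) * (c_per_mm * length_mm)


def scale_generations(n, r0=100.0, c0=0.2e-12, local0_mm=0.1, global_mm=5.0):
    """Return (generation, local_delay_s, global_delay_s) for n shrinks.

    r0 (ohm/mm), c0 (F/mm) and both lengths are illustrative assumptions.
    """
    rows = []
    for g in range(n + 1):
        shrink = S ** g
        r = r0 / shrink ** 2                 # thinner, narrower wire: more ohms per mm
        local = unbuffered_delay(r, c0, local0_mm * shrink)  # local wire shrinks
        glob = unbuffered_delay(r, c0, global_mm)            # global wire does not
        rows.append((g, local, glob))
    return rows


for g, local, glob in scale_generations(3):
    print(f"gen {g}: local {local * 1e12:.3f} ps, global {glob * 1e12:.1f} ps")
```

Under these assumptions the local wire delay stays flat while the fixed-length global wire gets slower every generation. Buffering changes the numbers (the Stanford data in this talk is for optimally buffered wires) but not the direction of the trend.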

Delay: Global Wire Trends
Optimally buffered global wires that span 5mm (roughly ¼ die)
• 30x-40x delay penalty over nine process generations
• Cannot contract global communications (they're global…)
[Figure labels: Mid-layer metals, 30x delay increase; Upper-layer metals, 40x delay increase]
Courtesy Mark Horowitz & Ron Ho, Stanford

Consequence: More MHz Not Necessarily Better Anymore
This is SpecInt/MHz, a measure of CPU performance / clock speed
Curve is flattening, more MHz isn’t paying off anymore…
Courtesy Mark Horowitz, Stanford

Consequence: 1 Chip ≠ 1 CPU Going Forward
Recent big Intel news:
• No more single-core CPUs
• Two high-profile designs abruptly canceled
• Future is multiple CPUs on a single chip
“Intel Corp. has cancelled its single-core processor development efforts… and will move to dual-core designs across the mobile, desktop, and server markets…”

Idea: More CPUs Instead of More MHz
• Make most of the wires local
• Can still clock a small CPU fast
• Use parallelism smarter
Local wires: still OK, still manageable
Global wires: bad even with buffering or multi-cycle delays
[Figure labels: CPU (repeated), Memory, scale]

Next Problem: Manufacturing Variability

Manufacturing Variability: A Little History
Historically, how have we coped? By hiding as much as possible
• Behind logic/memory libraries
• Behind circuit and shapes rules
• Behind design methodologies
[Figure labels: Library abstractions, Qualification & characterization, Design rules…]

At Nanoscale: Predictability ∝ (Chip Variability)⁻¹
ASIC library abstraction broken: doesn’t “hide” the details anymore as we scale below ~65nm
• Correlated random variations hit ckt level
• Global effects
• Demise of context-free layout design rules
• Local printability problems
[Figure labels: Cu thickness distrib, Cu thickness histogram]

So, How Do We Cope With Growing Variability?
Three broad kinds of solutions
• Model it accurately, manage it early, inside CAD tools
• Minimize it aggressively, via smarter ASIC chip architectures
• Measure it on the fly, calibrate for it, like analog has had to do

“Model It”: Pulling Statistics Up into CAD
• Statistical static timing analysis
  • Gate delays are statistical
  • Signal arrival times are statistical
  • Gates and signals are correlated
  • Correlations are both local and global
  • Want distribution of delay at output
• Statistical interconnect delay analysis
  • R, L, C parameters are statistical
  • Based on mfg variations in BEOL fab
  • R, L, C parameters are correlated
  • Correlations are both local and global
  • Want distribution of delay at outputs

Key Ideas: Direct Manipulation of the Statistics
• Wrong way: Monte Carlo trials with existing CAD analysis tools
  • Cannot afford time to run 1000s of randomly parameterized samples
• Right way: pull the statistics up directly into the analysis engines
  • Represent key circuit/interconnect quantities in a statistical form
[Figure labels: D1, D2, DN, S / N]
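To make the "wrong way versus right way" contrast concrete, here is a minimal sketch for a single path of gates. It assumes each gate delay is Gaussian, of the form mean + a·G + b·X, where G is one global variation shared by every gate and X is an independent local variation per gate. That delay model, the gate numbers, and the function names are illustrative assumptions, not the formulation used by the C2S2 tools. A real statistical timer must also handle the max of arrival times, which this sketch omits.

```python
import math
import random


def path_delay_stats(gates):
    """Direct method: closed-form mean and sigma of the summed path delay.

    gates: list of (mean, global_sensitivity, local_sigma) tuples.
    """
    mean = sum(m for m, _, _ in gates)
    global_part = sum(a for _, a, _ in gates)    # fully correlated terms add linearly
    local_var = sum(b * b for _, _, b in gates)  # independent terms add in variance
    return mean, math.sqrt(global_part ** 2 + local_var)


def monte_carlo_stats(gates, trials, seed=0):
    """Sampling method: the same quantities estimated from random trials."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        g = rng.gauss(0.0, 1.0)  # one global draw shared by all gates in this trial
        samples.append(sum(m + a * g + b * rng.gauss(0.0, 1.0) for m, a, b in gates))
    mu = sum(samples) / trials
    var = sum((x - mu) ** 2 for x in samples) / (trials - 1)
    return mu, math.sqrt(var)


gates = [(10.0, 0.5, 0.3)] * 4  # four identical gates, delays in arbitrary units
print(path_delay_stats(gates))
print(monte_carlo_stats(gates, trials=20000))
```

The direct method is one pass over the gates, while the sampled estimate needs thousands of trials to agree to a couple of digits. Note that ignoring the shared global term would understate the spread, since correlated variations do not average out along the path.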

Example: Interval-Valued Interconnect Models
• Represent all statistical quantities as correlated intervals
  • Each parameter is a range, not a scalar now
• Result is interval poles/zeros
• Recast the numerical ‘recipes’ for linear model order reduction to use intervals instead of scalars
• Transform back to delay distrib
[Figure labels: Root loci “histograms”; Pole/zero complex plane; P zeros, P poles; %, Delay]
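The idea of swapping scalars for ranges inside an existing numerical recipe can be sketched with plain interval arithmetic. This is a deliberately simplified stand-in: it uses the Elmore delay of an RC ladder rather than model order reduction, and it uses ordinary intervals that ignore the correlations the slide calls for, so its bounds are conservative. The `Interval` class, `around`, and `elmore_delay` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed range [lo, hi] standing in for a scalar parameter."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        products = (self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi)
        return Interval(min(products), max(products))


def around(nominal, rel_tol):
    """Interval covering nominal +/- rel_tol (for example 0.1 for 10%)."""
    return Interval(nominal * (1 - rel_tol), nominal * (1 + rel_tol))


def elmore_delay(rs, cs):
    """Elmore delay of an RC ladder: sum over nodes of C_k * (R_1 + ... + R_k).

    The recipe is unchanged from the scalar version; only the number type differs.
    """
    delay = Interval(0.0, 0.0)
    upstream_r = Interval(0.0, 0.0)
    for r, c in zip(rs, cs):
        upstream_r = upstream_r + r
        delay = delay + upstream_r * c
    return delay


rs = [around(10.0, 0.1)] * 3  # three segments, 10 ohms +/- 10% (assumed values)
cs = [around(1.0, 0.1)] * 3   # 1 unit of capacitance +/- 10% (assumed values)
print(elmore_delay(rs, cs))
```

A single pass yields guaranteed bounds on the delay instead of one number per random sample. Tracking correlation between parameters, as the slide describes, would tighten those bounds.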

Ex: 123-elem RLC Wire, 5%-Global 30%-Local Variation
[Panel: 8th order reduction, 4 dominant poles, Monte Carlo results]
[Panel: 8th order reduction, 4 dominant poles, Interval-valued predictions]
Plots show perspective view of complex plane (bottom & right axes) with pole histograms shown shaded blue (left axis, 10,000 interconnect samples)
Courtesy James D. Ma, CMU

Delay PDF and CDF of the Same Example
• Very early research result, but accuracy is promising, and speedup is currently ~20X over simplistic Monte Carlo
[Panel: PDF of Interconnect Delay, Full Monte Carlo vs. Interval-valued predictor]
[Panel: CDF of Interconnect Delay, Full Monte Carlo vs. Interval-valued predictor]
Courtesy James D. Ma, CMU

“Minimize It” Attack: Make Variability Small…
• Starting from basic mfg processes, from shapes-level layout, thru circuits, thru logic, thru interconnect arch: extremely regular
• Tries to minimize impact of short, medium, long range mfg variations
• Of course, it also breaks all our design tools and flows…
[Figure labels: Yesterday’s designs; A “regular fabric” for tomorrow; Or, maybe this pattern…?]

Example: CMU VPGA Architecture
Via Patterned Gate Array
• Uses only 4 masks to define total application-specific interconnect
• Logic tiles, and the interconnect, are totally regularized chip-wide
• Like “gate array” but better, and informed by ~20 years of FPGAs
• Example: Replace FPGA switchblock of 36-120 devices with 8 mask-config vias
Goal
• Minimize variations (e.g., CMP) at all length scales on chip
• Make logic and interconnect very predictable for designers
[Figure label: Cu thickness distribution]

Example: Manufacturability of VPGA BEOL
• Reduced CMP effects
• Copper dishing < 40Å
• Post-CMP copper thickness variation is less than 2-3%
• Highly promising as a manufacturable ASIC replacement structure
[Figure panels: M4 Density of CMU VPGA FPU; Plated Thickness (M4); Cu Dishing (M4); Oxide Erosion (M4); Final Post-CMP Cu Thickness (M4)]
Courtesy Duane Boning (MIT) & Larry Pileggi (CMU)

“Measure It”: Circuits to Measure & Adapt
• With scaling, not only are transistors getting worse, … but the neighborhoods they live in are getting noisier
• What we worry about
  • Behavior of chip-scale interconnects like clock and power distribution
  • Ability to predict worst-case behavior for robust design
  • Ability to understand data-driven noise problems, and to reduce them
• Big idea
  • Some things you can just design for “up front”
  • Increasingly, we may need to add circuits that measure & adapt on the fly

Ex: Supply Noise Measurement Circuits
• To measure autocorrelation, just need 2 samplers with fine timing control.
• Sampling switches are the only components required to have high bandwidth.
• High-resolution, on-chip ADCs to minimize additional noise and allow measurement circuits to hook up to scan chain.
• VCO-based ADC
  • VCO acts as V-to-f, clock edge count gives digital estimate of f.
  • Averaging improves noise tolerance.
  • Calibration relaxes linearity and offset requirements.
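The VCO-based ADC idea (voltage to frequency, count edges, average, calibrate) can be mimicked in a few lines. This is a behavioral sketch only, not the Stanford circuit: the VCO is assumed perfectly linear, the free-running frequency, gain, window length, and noise level are made-up values, and calibration is a simple two-point fit against known reference voltages, which removes offset and gain error but not nonlinearity.

```python
import math
import random

# Hypothetical VCO parameters, chosen only for illustration.
F0_HZ = 1.0e9        # assumed free-running frequency at 0 V
K_HZ_PER_V = 0.5e9   # assumed voltage-to-frequency gain
WINDOW_S = 1.0e-6    # assumed counting window


def edge_count(v, rng, noise_v=0.005):
    """Edges counted in one window for input v, with additive Gaussian noise."""
    f = F0_HZ + K_HZ_PER_V * (v + rng.gauss(0.0, noise_v))
    return math.floor(rng.random() + f * WINDOW_S)  # random starting phase


def averaged_count(v, rng, n_windows=1000):
    """Average many windows to suppress noise and quantization error."""
    return sum(edge_count(v, rng) for _ in range(n_windows)) / n_windows


def calibrate(rng, v_a=0.9, v_b=1.1):
    """Two-point calibration: map counts to volts without knowing F0 or K."""
    n_a, n_b = averaged_count(v_a, rng), averaged_count(v_b, rng)
    counts_per_volt = (n_b - n_a) / (v_b - v_a)
    return lambda n: v_a + (n - n_a) / counts_per_volt


rng = random.Random(1)
to_volts = calibrate(rng)
print(to_volts(averaged_count(1.0123, rng)))
```

With these assumed numbers a single window resolves about 2 mV per count, and averaging pushes the estimate well below that. The calibration step is why the converter itself never needs an accurate absolute offset or gain.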

Result: 10Gb/s Rambus Link Measurement
• Rambus 0.13μm design
• Demonstration of concept
• Noise floor < 300mV rms
• Measurements verify cyclostationarity: 1GHz noise at t2 – but not at t1!
• Link runs at 1GHz for this data-rate; high link activity at t2, relatively quiet at t1.
• This is still 130nm, but we think idea holds as we scale aggressively
• We also put some of these ckts on a next-gen Itanium™ chip
[Figure labels: Measured Vdd and VddAnalog; Noise injected from ASIC core; PSD(t1); PSD(t2)]
(Courtesy E. Alon, V. Stojanovic, Mark Horowitz, Stanford)

Analog Too: Calibrate & Adapt (…or Die)
• Example: Massively parallel ADCs
  • Thousands of small ADCs
  • DSP combines data adaptively
  • Ignore (give low weight to) faulty ckts
• Idea: Massive time-interleaving
  • Relaxes speed req’t of each path
  • Allows device bias in the optimum power efficiency/gain region…
  • …at the knee of weak inversion.
• In design: 12b 600 MS/s self-calibrated ADC in 0.18 μm, 128 channels – 100mW
• Next: 8b 20GS/s with 1000 channels
• Looks promising as a scalable ADC architecture for below 90nm
(Courtesy H-S Lee, MIT)
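A small behavioral sketch shows why time-interleaving relaxes the per-path speed requirement and why per-channel calibration matters. It is not the MIT design: each channel is modeled as an ideal linear converter with a hypothetical gain and offset error and no quantization, and calibration is a two-point fit against assumed reference levels. The class and function names are invented for illustration.

```python
def per_channel_rate(aggregate_sps, channels):
    """Each interleaved path only has to run at the aggregate rate / channel count."""
    return aggregate_sps / channels


class Channel:
    """One slow converter with a (hypothetical) gain and offset mismatch."""

    def __init__(self, gain, offset):
        self.gain, self.offset = gain, offset
        self.cal_gain, self.cal_offset = 1.0, 0.0  # uncalibrated defaults

    def convert(self, x):
        return self.gain * x + self.offset  # raw, mismatched output

    def calibrate(self, ref_lo=0.0, ref_hi=1.0):
        """Two-point self-calibration against assumed known reference levels."""
        y_lo, y_hi = self.convert(ref_lo), self.convert(ref_hi)
        self.cal_gain = (y_hi - y_lo) / (ref_hi - ref_lo)
        self.cal_offset = y_lo - self.cal_gain * ref_lo

    def corrected(self, x):
        return (self.convert(x) - self.cal_offset) / self.cal_gain


def interleave(signal, channels):
    """Round-robin: sample n goes to channel n mod M, then outputs are merged."""
    return [channels[n % len(channels)].corrected(x) for n, x in enumerate(signal)]


print(per_channel_rate(600e6, 128))  # per-path rate for the 600 MS/s, 128-channel case
print(per_channel_rate(20e9, 1000))  # per-path rate for the 20 GS/s, 1000-channel case

channels = [Channel(1.02, 0.01), Channel(0.97, -0.02), Channel(1.00, 0.03)]
signal = [0.1, 0.5, 0.9, 0.3, 0.7, 0.2]
before = interleave(signal, channels)  # mismatch shows up as a repeating error pattern
for ch in channels:
    ch.calibrate()
after = interleave(signal, channels)
print(before)
print(after)
```

Without calibration, channel mismatch appears as a periodic error in the merged stream. After the per-channel fit, the merged output tracks the input in this idealized model.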

Summary: FCRP Innovating Across Whole Foodchain
• Exotic interconnect
• Novel devices
• Radical new ckts / tools
• Radical architectures

A Lot More Work To Do At “Top” of Foodchain
(No shortage of problems, even up here, in the clouds)
• New system architectures
  • For CPUs and for ASICs, to overcome wire delay & mfg variation limitations
• New circuits & design methodologies
  • CAD tools that understand and optimize statistical models of interconnect
  • Circuits that measure interconnect problems and try to adapt to them

Acknowledgements
Many participants in the MARCO Focus Center for Circuit & System Solutions (C2S2) provided material for this talk. I want to acknowledge them here.
• Carnegie Mellon: Prof. Larry Pileggi, Prof. Andrzej Strojwas, James D. Ma
• MIT: Prof. Duane Boning, Prof. H.-S. Harry Lee
• Stanford: Prof. Mark Horowitz, E. Alon, Ron Ho, V. Stojanovic
More info on all these projects at www.fcrp.org and www.c2s2.org
