
Design challenges in sub-100nm high performance microprocessors

Nitin Borkar, Siva Narendra, James Tschanz, Vasantha Erraguntla. Circuit Research, Intel Labs. nitin.borkar@intel.com, siva.g.narendra@intel.com, james.w.tschanz@intel.com, vasantha.erraguntla@intel.com


Presentation Transcript


  1. Design challenges in sub-100nm high performance microprocessors • Nitin Borkar, Siva Narendra, James Tschanz, Vasantha Erraguntla • Circuit Research, Intel Labs • nitin.borkar@intel.com, siva.g.narendra@intel.com, james.w.tschanz@intel.com, vasantha.erraguntla@intel.com

  2. Outline • Section 1: Challenges for low power and high performance (90 mins) • Historical device and system scaling trends • Sub-100nm device scaling challenges • Power delivery and dissipation challenges • Power efficient design choices • Section 2a: Circuit techniques for variation tolerance (90 mins) • Short channel effects • Adaptive circuit techniques for variation tolerance

  3. Outline (contd.) • Section 2b: Circuit techniques for leakage control (90 mins) • Leakage power components • Leakage power prediction • Leakage reduction and control techniques • Section 3: Full-chip power reduction techniques (90 mins) • Micro-architecture innovations • Coding techniques for interconnect power reduction • CMOS compatible dense memory design • Special purpose hardware • Design methodologies & challenges for CAD

  4. Section 1 Challenges for low power and high performance

  5. Moore’s Law on scaling

  6. Scaling of dimensions • [Figure: a 1 × 1 square of area 1 shrinks to 0.7 × 0.7 with area 0.49]
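
The numbers on this slide follow from scaling each linear dimension by 0.7. The sketch below is only an illustration of that arithmetic; the function name `shrink` and the density-gain figure are not from the slides.

```python
def shrink(width: float, height: float, linear_factor: float = 0.7):
    """Scale both dimensions by linear_factor; return (width, height, area)."""
    new_w = width * linear_factor
    new_h = height * linear_factor
    return new_w, new_h, new_w * new_h


w, h, area = shrink(1.0, 1.0)
print(f"{w:.2f} x {h:.2f} -> area {area:.2f}")
# The same layout fits in about half the area, so roughly twice as many
# devices fit in a fixed area after one such shrink.
print(f"density gain: {1 / area:.2f}X")
```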

  7. Transistors on a chip • [Plot: transistors (MT, log scale) vs. year, 1970-2010, for lead microprocessors from the 4004 to the Pentium 4] • 2X growth in 1.96 years • Transistors on lead microprocessors double every 2 years
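
As a rough illustration, the quoted doubling time of 1.96 years can be converted into an equivalent annual growth factor. This is a minimal sketch assuming smooth exponential growth; the helper names are hypothetical.

```python
def annual_growth(doubling_years: float) -> float:
    """Yearly growth factor implied by a given doubling time."""
    return 2 ** (1 / doubling_years)


def growth_over(years: float, doubling_years: float) -> float:
    """Total growth factor after `years` at the given doubling time."""
    return 2 ** (years / doubling_years)


rate = annual_growth(1.96)
print(f"annual growth factor: {rate:.3f}")
print(f"growth over 10 years: {growth_over(10, 1.96):.1f}X")
```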

  8. Die size growth • [Plot: die size (mm, log scale) vs. year, 1970-2010, for lead microprocessors from the 4004 to the Pentium 4] • ~7% growth per year • ~2X growth in 10 years • Die size grows by 14% to satisfy Moore’s Law
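
The three growth figures on this slide are consistent under compound growth. The sketch below assumes the 14% figure refers to one two-year generation; that reading is an assumption, not something the slide states.

```python
def compound_growth(rate_per_year: float, years: float) -> float:
    """Growth factor after compounding rate_per_year for `years` years."""
    return (1 + rate_per_year) ** years


per_generation = compound_growth(0.07, 2)   # assumed 2-year generation
per_decade = compound_growth(0.07, 10)
print(f"2 years:  {per_generation:.3f}X (about 14% growth)")
print(f"10 years: {per_decade:.2f}X (about 2X growth)")
```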

  9. Frequency doubles every 2 years • [Plot: frequency (MHz, log scale) vs. year, 1970-2010, for lead microprocessors from the 4004 to the Pentium 4] • Lead microprocessor frequency doubles every 2 years

  10. Performance Applications will demand TIPS performance

  11. Power • [Plot: power (W, log scale) vs. year, 1971-2000, for lead microprocessors from the 4004 to the Pentium 4, with a label "Future"] • Lead microprocessor power continues to increase

  12. Obeying Moore’s Law... • [Plot: transistors (MT, log scale) vs. year, 1970-2010, from the 4004 to the Pentium 4, with projected points at 200M, 425M, 900M, and 1.8B] • 200M to 1.8B transistors on the lead microprocessor

  13. Vcc will continue to reduce • [Plot: supply voltage (V, log scale) vs. year, 1970-2010, with projected values of 1.35, 1.15, 1, and 0.9 V] • Only 15% Vcc reduction to meet frequency demand

  14. Constant Electric Field Scaling • [Plot: oxide field (MV/cm), 0 to 5, vs. technology dimension (µm), 1.5 down to 0.18]

  15. Active capacitance density Active capacitance grows 30-35% each technology generation

  16. Power will be a problem • [Plot: power (W, log scale) vs. year, 1971-2008, from the 4004 to the Pentium 4, with projected points at 500 W, 1.5 kW, 5 kW, and 18 kW] • Power delivery and dissipation will be prohibitive

  17. Closer look at the power • [Plot: power (W, log scale) vs. year, 2002-2008] • Will be: 500 W, 1.5 kW, 5 kW, 18 kW • Should be: 135 W, 225 W, 375 W, 623 W

  18. Advanced transistor design • [Figure: transistor cross-section with p+ source and drain in an n-well] • Shallow, highly doped source/drain extension • Thin TOX • Halo/pocket • Retrograde well • Shallow trench isolation • Deep source/drain

  19. Intel’s 15 nm bulk transistor R. Chau et al., IEDM 2000

  20. Transistor scaling trends - SCE • [Plot: MOSFET aspect ratio (lateral/vertical), 1.0 to 7.0, vs. technology generation (µm), 0.25 down to 0.07, for uniform doping and retrograde doping] • [Formula: aspect ratio in terms of Le, Tox, Dj, and D] • Short channel effect (SCE), measured as aspect ratio, has been worsening with scaling

  21. Transistor scaling challenges - Dj • Junction depth reduction • Benefit: device channel length can decrease for the same SCE • Cost: series resistance to the channel increases

  22. Transistor scaling challenges - Tox • Thinning gate oxide • Increased gate tunneling leakage • Electrical thickness is ~2X physical thickness • Gate stress now limits max VCC • Solutions • New decoupling caps • Modified oxides/gate materials • Model gate leakage in circuit simulation

  23. VCC and VT scaling

  24. Vcc scaling & Soft errors • Vcc and cap scaling with technology reduces charge stored • Soft errors prominent in logic circuits • No error correction in logic circuits • Storage nodes per chip increasing • Higher soft errors at the chip level

  25. Motivation • SER per bit ∝ A_diff / (C_gate × V_cc) • Soft error rate (SER) per bit staying constant in future processes • T. Karnik et al., 2001 VLSI Circuits Symposium • Need to reduce SER/bit • Goal: Reduce chip-level SER with no performance penalty and minimum power penalty

  26. Measured Latch Data • [Plot: errors, 0 to 7,000, and SER improvement (X), 1 to 2.25, vs. supply voltage (V), 0.5 to 1.3, for original and hardened latches] • Will need ~2X SER improvement in latches with no performance loss • T. Karnik et al., 2001 VLSI Circuits Symposium

  27. VT vs. leakage • Leakage rises as the VT is lowered • MOS has a sub-threshold slope of ~110 mV/decade • Lowering VT by 50 mV → ~3X leakage • Solutions • Dual VT • Stacking of off gates • Controlled back gate bias? • Multiple process technologies: Mobile vs. Performance?
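
The ~3X figure follows from the quoted sub-threshold slope: each 110 mV of VT reduction raises leakage by one decade. A minimal sketch of that relation, with a hypothetical function name:

```python
def leakage_multiplier(vt_reduction_mv: float,
                       slope_mv_per_decade: float = 110.0) -> float:
    """Sub-threshold leakage increase for a given VT reduction.

    One decade (10X) of leakage per `slope_mv_per_decade` of VT reduction.
    """
    return 10 ** (vt_reduction_mv / slope_mv_per_decade)


print(f"50 mV lower VT:  {leakage_multiplier(50):.2f}X leakage")
print(f"110 mV lower VT: {leakage_multiplier(110):.0f}X leakage")
```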

  28. Sub-threshold Leakage • [Plot: MOS transistor characteristics, Ids (log) vs. Vgs, showing an exponential increase in Ioff for a shift ΔVt] • [Plot: Ioff (nA/µm, log scale, 1 to 10,000) vs. temperature (°C), 30 to 130] • Sub-threshold leakage current will increase exponentially • Assumptions: 0.25 µm, Ioff = 1 nA/µm • 5X increase each generation at 30°C
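
The stated assumptions, read here as 1 nA/µm at the 0.25 µm node and a 5X increase per generation at 30°C, can be projected forward. In this sketch the list of later nodes is borrowed from the node labels on the leakage plot in slide 30; pairing them with this projection is an assumption.

```python
def project_ioff(start_na_per_um: float, factor: float, nodes_um: list) -> dict:
    """Multiply Ioff by `factor` once for every node after the starting one."""
    projection = {}
    ioff = start_na_per_um
    for node in nodes_um:
        ioff *= factor
        projection[node] = ioff
    return projection


# Nodes after 0.25 um (assumed sequence, in micrometers).
later_nodes = [0.18, 0.13, 0.10, 0.07, 0.05]
ioff_by_node = project_ioff(1.0, 5.0, later_nodes)
for node, ioff in ioff_by_node.items():
    print(f"{node:.2f} um: {ioff:.0f} nA/um")
```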

  29. Leakage Power Excessive sub-threshold leakage power

  30. Leakage Power increases • [Plot: drain leakage power as a percentage, 0% to 50%, vs. year, 2000-2008, with points labeled 12 W, 88 W, 400 W, 1.7 kW, and 8 kW] • [Plot: Ioff (nA/µm, log scale, 10 to 100,000) vs. temperature (°C), 30 to 100, for 0.18, 0.13, 0.1, 0.07, and 0.05 µm] • Drain leakage will have to increase to meet frequency demand • Results in excessive leakage power

  31. Wide Domino Functionality • [Figure: static gate and domino gate schematics; labels: CLK, A, B, C, D1, D2, Q1, Q2] • Lower AC noise margin ~ Vt • Ioff could limit NOR fan-in • High activity, higher power, ~2X • Irreversible logic evaluation • Scalability is not good • High performance ~30% over static • High fan-in NOR, fewer logic gates • High fan-in complex gates possible • Smaller area

  32. Bitline Delay Scaling Problem • [Plot: normalized delay, 0 to 1.2, vs. technology generation (µm), 0.25 down to 0.10, for logic circuit delay, bit line delay (15% swing scaling), and bit line delay (const swing)] • Bit line swing limited by parameter mismatch & differential noise • Cell stability degrades with Vt lowering • Bit line delay ∝ (Cap/W) × Vswing / (Ion/W − #rows × Ioff/W) • Reducing # of rows per bitline approaching limit
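
The bit line delay proportionality on this slide shows why leakage hurts: the leakage of every unselected row subtracts from the read current. The sketch below evaluates that expression up to a constant factor. All numeric values are made-up illustration inputs in arbitrary units, and the function name is hypothetical.

```python
def bitline_delay(cap_per_w: float, v_swing: float,
                  ion_per_w: float, ioff_per_w: float, rows: int) -> float:
    """Relative bit line delay: (Cap/W) * Vswing / (Ion/W - rows * Ioff/W)."""
    net_current = ion_per_w - rows * ioff_per_w
    if net_current <= 0:
        # Leakage from the other rows swamps the read current entirely.
        raise ValueError("row leakage exceeds cell read current")
    return cap_per_w * v_swing / net_current


# Illustrative inputs only (arbitrary units), same bit line with 256 rows.
low_leak = bitline_delay(1.0, 0.1, 100.0, 0.01, 256)
high_leak = bitline_delay(1.0, 0.1, 100.0, 0.25, 256)
print(f"delay penalty from higher Ioff: {high_leak / low_leak:.2f}X")
```

Halving the number of rows per bit line halves the leakage term, which is the lever the last bullet says is approaching its limit.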

  33. Restrict transistor leakage • [Plot: frequency (MHz, log scale) vs. year, 1985-2010, from the 386 to the Pentium 4, with projected points at 2.5, 4, 5.5, and 7 GHz] • Reduce leakage • Frequency will not double every 2 years

  34. Interconnect scaling trends

  35. Interconnect performance • R increases faster at lower levels • C increases faster at higher levels • RC increases ~40-60%

  36. Interconnect distribution Interconnect distribution does not change significantly

  37. Wire Scaling • Tall wires to reduce R • thickness to width ratios of 2 to 1 • large cross cap • Uarch for short wires • Repeaters

  38. Optimum Repeater • [Figure: repeater with labels P size, N size, repeater distance, and pitch] • Vary • N size, P size • Repeater distance • Metal width, space • Best speed at • space ~2X width • Include metal thickness and optimize for PD³ • thickness ~2X width

  39. P, V, T Variations • Process • Die-to-die variation • Within-die variation • Static for each die • Voltage • Chip activity change • Current delivery: RLC • Dynamic: ns to 10-100us • Within-die variation • Temperature • Activity & ambient change • Dynamic: 100-1000us • Within-die variation • Time dependent degradation • Device Ion • Very slow: years

  40. Frequency & SD Leakage • [Scatter plot: normalized frequency, 0.9 to 1.4, vs. normalized leakage (Isb), 0 to 20; 0.18 micron, ~1000 samples; groups: low freq / low Isb, high freq / medium Isb, high freq / high Isb] • 30% frequency spread • 20X leakage spread

  41. Vt Distribution • [Histogram: # of chips, 0 to 120, vs. ΔVTn (mV), -39.71 to 32.49; 0.18 micron, ~1000 samples; groups: low freq / low Isb, high freq / medium Isb, high freq / high Isb] • Annotation: ~30 mV

  42. Frequency Distribution • [Histogram: # of chips, 0 to 150, vs. normalized frequency, 1.00 to 1.37; groups: low freq / low Isb, high freq / medium Isb, high freq / high Isb]

  43. Isb Distribution • [Histogram: # of chips, 1 to 100, vs. normalized Isb, 1.00 to 20.11; groups: low freq / low Isb, high freq / medium Isb, high freq / high Isb]

  44. Supply Voltage Variation • [Plot: supply voltage (V) vs. time (msec), bounded by Vmax, set by reliability & power, and Vmin, set by frequency] • Activity changes • Current delivery RI and L(di/dt) drops • Dynamic: ns to 10-100us • Within-die variation
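
The two current delivery terms named on this slide, the resistive RI drop and the inductive L(di/dt) drop, add up to the total supply droop. A minimal sketch follows; the resistance, inductance, and current values are made-up illustration inputs, not figures from the presentation.

```python
def supply_droop(r_ohm: float, i_amp: float,
                 l_henry: float, di_dt_amp_per_s: float) -> float:
    """Supply droop in volts: resistive (R*I) plus inductive (L*di/dt)."""
    return r_ohm * i_amp + l_henry * di_dt_amp_per_s


# Hypothetical delivery network: 1 mOhm, 10 pH, 50 A load, 1 A/ns current step.
droop = supply_droop(1e-3, 50.0, 10e-12, 1e9)
print(f"droop: {droop * 1000:.0f} mV")
```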

  45. Handling di/dt • Land-side / package capacitors • High frequency or local VRMs • Low leakage on-die capacitors • [Figure labels: bulk decoupling, high frequency decoupling, local decoupling, on-die decoupling, VRM response, silver box response]

  46. Vcc Variation Reduction • On-die decoupling capacitors reduce ΔVcc • Cost, area, and gate oxide leakage concerns • On-die voltage down converters & regulators

  47. Temperature Variation • [Figure: die temperature map; cache 70°C, core 120°C] • Activity & ambient change • Dynamic: 100-1000us • Within-die variation

  48. Major Paradigm Shift • From deterministic design to probabilistic and statistical design • A path delay estimate is probabilistic (not deterministic) • Multi-variable design optimization for • Yield and bin splits • Parameter variations • Active and leakage power • Performance

  49. Performance Efficiency of µArch • Pollack’s Rule • [Plot: growth (X), 0 to 4, of area (lead / compaction) and performance (lead / compaction) vs. technology generation, 1.5 down to 0.18; performance measured using SpecINT and SpecFP] • Implications (in the same technology) • New microarchitecture ~2-3X die area of the last uArch • Provides 1.5-1.7X performance of the last uArch • We are on the wrong side of a Square Law
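
Pollack's Rule is commonly stated as performance growing with the square root of the area spent on a new microarchitecture. The sketch below applies that square-root form to the 2-3X area range on this slide; the function name is hypothetical.

```python
import math


def pollack_performance(area_ratio: float) -> float:
    """Performance gain predicted by a square-root law for a given area growth."""
    return math.sqrt(area_ratio)


for area in (2.0, 3.0):
    print(f"{area:.0f}X area -> {pollack_performance(area):.2f}X performance")
```

The results, roughly 1.4X and 1.7X, sit close to the 1.5-1.7X range quoted in the slide.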

  50. Frequency & Performance • [Plot: relative performance (log scale, 1 to 100) vs. technology, 1.0µ down to 0.18µ, for the i486, Pentium®, Pentium® II & III, and Pentium® 4 processors; regions labeled "Perf due to Freq" (14X) and "Additional due to uArch" (7X)] • Frequency increased 61X • 18.3X from process technology • Additional 3.3X from uArch • Performance increased 100X • 14X from process technology • Additional 7X from uArch, design
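
The process and uArch contributions on this slide combine multiplicatively, and the products land close to the quoted totals. A small check of that arithmetic, with a hypothetical helper name:

```python
def combined_gain(process_gain: float, uarch_gain: float) -> float:
    """Total gain when process and microarchitecture gains multiply."""
    return process_gain * uarch_gain


frequency_gain = combined_gain(18.3, 3.3)   # slide quotes 61X
performance_gain = combined_gain(14, 7)     # slide quotes 100X
print(f"frequency:   {frequency_gain:.1f}X")
print(f"performance: {performance_gain}X")
```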
