FE-I4 Chip Design

FE-I4 Chip Design Marlon Barbero, Bonn University Vertex 2009, Putten, The Netherlands Feb. 17th 2009

Contents • IBL (sensor damage) / sLHC (increased luminosity). From FE-I3, current ATLAS pixel FE, to FE-I4, new ATLAS pixel FE for IBL & sLHC. • Analog pixel. • Digital pixel and digital Double-Column. • Periphery. • Yield. • Conclusion and milestones.

FE-I4 for IBL & sLHC • sLHC tentative layout (>2017): 4 to 5 pixel layers, small radii / large(r) radii(note: Discussion on boundary pixel / short strips, …). • IBL (~2014): inserted layer @ 3.7cm in current pixel detector. ATLAS present Inner Detector (ID) FE-I4 Present beam pipe & B-Layer tentative ID layout for sLHC? Existing B-layer • - All Silicon. • Long Strips/ Short Strips / Pixels. • Pixels: • 2 or 3 fixed layers at ‘large’ radii (large area at 16 / 20 / 25 cms?) • 2 removable layers at ‘small’ radii New beam pipe r~37 mm Tobias Flick’s IBL mounted on beam pipe

Motivation for Redesign of FE FE-I3FE-I4 • Need for a new FE? • Smaller b-layer radius + potential luminosity increase • higher hit rate. • FE-I3 column-drain architecture saturated. • FE-I4 new digital architecture: local regional memories, stop moving hits around (unless RO). • FE-I4 has smaller pixel (reduced cross-section). • New technology:  Higher integration density for digital circuits, rad-hard, availibility. 100 Inefficiency [%] sLHC 80 IBL 60 40 FE-I3 at r=3.7 cm! LHC 20 0 Hit prob. / DC The “inefficiency wall” 0.25 μm130 nm

Future FE-I4-Based Module (& Consequences for FE-I4) Flex • Increased active area: from less than 75 % to ~90 %:  Reduced periphery; bigger IC; cost down for sLHC (main driver is flip-chip costs per chip). • No MCC:  More digital functionality in the IC. • Power:  Analog design for reduced currents; decrease of digital activity (digital logic sharing for neighbor pixels); new powering concepts. 8 metal layers [2 thick Alu.]  power routing. 4 4 Sensor 3 1 2 FE-Chip 3 1 2 5 1) Big chip (periphery on one side of module). 2) Reduce size of periphery (2.8 mm2 mm). 3) Thin down FE chips (190 μm90 μm). 4) Thin down the sensor (250 μm 200 μm)? 5) Less cables (powering scheme)? challenging: power (routing, start-up), clk. distrib., simulation / management, yield

Some Target Specs for FE-I4 • Rad.-hardness: >200 MRad ionizing dose (FE-I3: >50 Mrad). • Minimal guidelines: no ELT, NMOS guard rings for analog & sensitive digital circuitry. • ToT coded 4 bits. • DC leakage current tolerant to > 100 nA. biggest in HEP to date analog / digital power tuned for IBL occupancy

Analog Pixel • In FE-I4_proto1 (FE-I4 prototype submitted in 2008): • 2-stage architecture optimized for low power, low noise, fast rise time.  regul. casc. preamp. nmos input.  folded casc. 2nd stage pmos input.  Additional gain, Cc/Cf2~6.  2nd stage decoupled from leakage related DC potential shift.  Cf1~17fF (~4 MIPs dyn. range). • 12b configuration:  FDAC: tuning feedback current.  TDAC: tuning of discriminator threshold.  Local charge injection circuitry. TDAC Amp2 50 mm discri Preamp Config Logic FDAC 150 mm

Analog Pixel: Noise & Irradiation ~200 Mrad • Excellent un-tuned threshold dispersion. • Dose received: ~200 Mrad Low I (10 μA): noise increases ~20 %. (17μA) (12μA) (loaded ~400fF) (loaded ~400fF) m~65 e- m~90 e- m~65 e- m~100 e- Low current (which is target value) is 10 μA/pixel for preamp+amp2+comparator Current for minimum noise: 17 μA/pixel

Digital Pixel: Regional Architecture 4-Pixel Unit Digital Region Read & Trigger Token disc. top left disc. top right hit proc.: TS/sm/big/ToT 5 ToT memory /pixel disc. bot. left disc. bot. right L1T Read Neighbor 5 latency counter / region low traffic on DC bus local storage • Store hits locally in region until L1T. • Only 0.25% of pixel hits are shipped to EoC  DC bus traffic “low”. • Each pixel is tied to its neighbors -time info- (clustered nature of real hits). Small hits are close to large hits! To record small hits, use space instead of time. Handle on TW. • Consequences: • physics simulation shows efficient architecture. • lowers digital power consumption. • spatial association of digital hit to recover lower analog performance.

Performance / Efficiency IBL: charge sharing in Z comparable with r/phi. Regional Buffer Overflow η=0 0.6% @ IBL rate, pile-up inefficiency is the dominant source of inefficiency • Inefficiency: • Pile-up inefficiency (related to pixel x-section and return to baseline behavior of analog pixel)  ~ 0.5%. • Regional Buffer overflow  ~0.05% • Inefficiency under control for IBL occupancy Mean ToT = 4

Digital Column Architecture • 168 regions + CLK + buffering scheme  1 digital DC

Digital Power Drop on Vdd • Digital power: • at IBL occupancy, • digital power < 10μW/pixel. 4-pixel region for 21 regions <7mV

Pixel Layout 250 mm TDAC Amp2 50 mm synthezised digital region (1/4th ) discri Preamp FDAC Config Logic Note: Digital ground tied to substrate, mixed signal environment BUT digital region placed in “T3” deep n-well.

FE-I4 Periphery Pix Array: 80×336 pixel array L1T, token, read, … pixel config token token 28 b × 40 DC EoC EoC EoC L1T, token, read, … data formatting / compression config. monitoring Periphery: Asynch. FIFO (hamming code) PLL (40MHz  160MHz), high speed serializer… The pixel array + End of Column logic (triple redundant token) pixel config Powering: DC-DC converters, Shunt-LDO readout FIFO Data formatting block digital ctrl block Command decoder: configuration, reset & L1T 8b10b encoder unit LVDS output, 160 Mb/s, suited for IBL occupancy trigger FIFO digital ctrl block L1T 160MHz global config PLL, 40MHz in, 160MHz out global register bank interface clk select 40MHz ‘LVDS’-out 160Mb/s Powering aux 2

Yield • Estimated from: • Small analog test chips. • 8 fully tested wafers of Medipix 3 ICs, assuming same defect density for synthesized logic. • Expect of order ~39% digitally perfect chips. • Yield enhancement: • Triple redundant read tokens. • Hamming coded pixel data and address (w. minimal # of gates). • Redundant configuration shift register. • Fully functional chips yield might be as high as 76%. (with isolated dead pixels at level <0.1%).

Test Chip Submission FE-I4-P1 3mm LDORegulator SEU test IC 61x14 array ShuLDO+trist LVDS/LDO/10b-DAC Control Block ChargePump 4-LVDS Rx/Tx CapacitanceMeasurement turboPLL: PLL core + PRBS + 8b10b coder + LVDS driv low power discri. 4mm DACs CurrentReference

FE-I4 Collaboration • Meeting 1 time a week. • Collaborate remotely using Cliosoft platform. • Participating institutes: Bonn: D. Arutinov, M. Barbero, T. Hemperek, A. Kruth, M. Karagounis. CPPM: D. Fougeron, M. Menouni. Genova: R. Beccherle, G. Darbo. LBNL: S. Dube, D. Elledge, M. Garcia-Sciveres, D. Gnani, A. Mekkaoui. Nikhef: V. Gromov, R. Kluit, J.D. Schipper FE-I3 18 160 FE-I4 Schedule: Submission planned for November 2009 (with submission readiness review 3-4 Nov. 2009)

backup BACKUP SLIDES

Existing B-layer Newbeam pipe IBL mounted on beam pipe Present beam pipe through present B-Layer

Preamp. & Leakage Compensation I_leakage compensation • Regulated cascode preamp.  high gain.  less crosstalk path through biasing voltage. • Triple well NMOS input.  shield from substrate noise. • Feedback capacitor discharged by NMOS feedback transistor. • Leakage current compensation scheme based on differential amplifier. Vout[V] 2ke<Qin<22ke Vout[V] 250m 100nA leakage 10mV DC shift 380m 225m 300m 200m t[s] 0.1m 0.5m Cst I feedback 200m t[s] 0.1m 0.5m

2nd stage & Comparator 2nd stage amplif. • PMOS input folded regulated cascode (straight cascode in futur?). • Negative going output. • Classic 2-stage comparator. ENC[e-] 150 ENC=160e- @ Cd=0.4p & Il=100nA tLE[s] Il=100nA 100 20n Discriminator Il=0nA 60 10n 100f 200f 300f Cd[F] 20ns timewalk for 2ke- < Qin < 52ke- & threshold @1.5ke- 0 Qin[C] 10k 20k 30k 40k

Clock Distribution • Physical implementation • Differential Logic. • Single Ended. • Possible structures • simple • H-tree • with skew balancing 22

Clock Multiplier I/O choices for ATLAS IBL, ATLAS Pixel System Design Task Force • For IBL, need to transmit data out at BW of 160Mb/s • 2 options: • send a 80MHz CLK to the FE and use both edges to transmit • Needs modification of BOC / ROD to produce higher speed TTC • Needs synchronization protocol on the FE between 80MHz clock & beam crossing. • A new DORIC needs to decode CLK at twice frequency • send a 40MHz CLK to the FE and multiply clock on FE • Needs a clock multiplier on chip • Note: synergy with what the strip MCC need • In FE-I4, we will provide both options: • Clock multiplier from the 40MHz input clock • AUX: possibility to send the 80MHz to the FE

8b10b encoder I/O choices for ATLAS IBL, ATLAS Pixel System Design Task Force • For IBL, need to transmit data out at BW of 160Mb/s • At BOC/ROD: • Data rate 4 times the clock rate • Phase adjustment • Use Clock Data Recovery mechanism • CDR requires an output data stream with good engineering properties • 8b10b: • adequate for this purpose, enough transitions for reliable CDR • widely used  easy to implement • provides some level of error detection • provides comma for frame identification & synchronization

SEU-hardened latch • CPPM has studied the influence of various layout of a DICE latch on the SEU x-section. Physical separation of sensitive node pairs. Latch5.1 and latch5.2 ; Area :12µm × 4µm = 48 μm2 nMos separation : 7µm ; pMos separation : 3 µm Triple Redundant Logic with Interleaved Layout. Calin et al, IEEE Trans. Nucl. Sci. vol43, n.6, 1996 • X-section [cm2.bit-1]: • Standard Latch: ~ 5.10-14 • DICE w. improved layout: ~ 3.10-16 1.a 2.a 3.a 1.a 2.a 3.a 1.b 2.b 3.b 1.b 2.b 3.b X-section : < 1.10-17

PLL Overview Voltage Controlled Oscillator Charge Pump Loop Filter Phase Frequency Detector 640 MHz 40 MHz Frequency Divider Conversion and Buffering

Upset detection II 3 pC upset charge in fast divider Feedback too fast signal Feedback too slow signal

Settling Behaviour Control Voltage 3 pC upset charge in fast divider Frequency of single-ended 40 MHz clock Frequency of single-ended 640 MHz clock

3×LHC / b-layer replacement FE-I3, 50μm×400μm. FE-I4 simul., 50μm×250μm. r [mm] η=1.0 η=0.1 η=0.2 η=0.3 η=0.4 η=0.5 η=0.6 η=0.7 η=0.8 η=0.9 200 η=1.2 160 1.41 1.24 1.26 1.26 1.37 1.34 1.33 122.5 120 rates given in [pixel hits.bx-1cm-2] η=1.5 2.55 2.56 2.54 2.55 2.64 2.65 2.64 88.5 80 η=2.0 6.30 6.46 6.03 5.85 5.91 6.46 6.11 50.5 η=2.5 40 12.10 11.53 12.01 11.85 11.72 12.11 8.02 37 η=3.0 η=3.5 z [mm] 0 600 0 100 200 300 400 500

Pixel occupancy  Data bandwidth • Pixel hit rate  FE output bandwidth: • # bits / pixel transmitted? • address 7+9 bits, analog info 4+2 bits 22b? • data output protocol? • Reduce data output by taking into account clustered nature of real physics hits. NUMBER OF PIXELS 3xLHC FE-I4, central module, 3.7cm layer 3xLHC 10xLHC FE-I4, central module, 3.7cm layer FE-I4, central module, 21cm layer

Pixel occupancy  Data bandwidth preliminary assumption: 100kHz L1T, 336×80 pixels FE-I4 • Example 3: clustered data out with fixed format. • compression factor (all at 3×LHC) 3.7cm (vs. 21cm), η=0 • indiv pixels: 4.09 (0.25)×(7+9+4+2)= 1.00 (1.00) A.U. • static 1×2: 3.45 (0.18)×(7+8+2×4+2)=0.96 (0.83) A.U. • dynamic 1×2: 3.02 (0.15)×(7+9+2×4+2)= 0.87(0.74) A.U. • static 1×4: 2.86 (0.17)×(6+8+4×4+4)=1.08(1.08) A.U. • dyn. in-DC 1×4: 2.43 (0.15)×(6+9+4×4+4)= 0.95(0.95) A.U. • dynamic 1×4: 2.13 (0.14)×(7+9+4×4+4)= 0.85(0.94) A.U. (×336) column NL row 106.count.FE-1.s-1 row ToT DC (×40) Disclaimer: no header, trailer, DC-balancing, error correction…

Shunt Regulator (FE-I3 approach) • Shunt regulator generates a constant output voltage out of the current supply • current that is not drawn by the load is shunted by transistor M1 • Very steep voltage to current characteristic • Mismatch & process variation will lead to different Vref and Vout potentials • Most of the shunt current will flow to the regulator with lowest Vout potential • Potential risk of device break down at turn on • Using an input series resistor reduces the slope of the voltage to current characteristic I=f(V) • RSLOPE helps distributing the shunt current between the parallel placed regulators • RSLOPE does not contribute to the regulation and consumes additional power without resistor Iin[mA] with resistor 750 500 250 0 Vout[V] Slide 32 0 0.5 1 1.5 2

LDO Regulator with Shunt Transistor (ShuLDO) Simplified Schematic • Combination of LDO and shunt transistor • M4 shunts the current not drawn by the load • Fraction of M1 current is mirrored & drained into M5 • Amplifier A2 & M3 improve mirroring accuracy • Ref. current defined by resistor R3 & drained into M6 • Comparison of M5 and ref. current leads to constant current flow in M1 • Ref. current depends on voltage drop VIin which again depends on supply current Iin • „Shunt-LDO“ regulators having completely different output voltages can be placed in parallel without any problem regarding mismatch & shunt current distribution • Resistor R3 mismatch will lead to some variation of shunt current (10-20%) • „Shunt-LDO“ can cope with an increased supply current if one FE-I4 does not contribute to shunt current e.g. disconnected wirebond  ref current goes up • Can be used as an ordinary LDO when shunt is disabled Slide 33

Parallel Regulator Operation Simulation Vout=1.5 Vout=1.2 • 2 regulators placed in parallel with Vout1=1.2 and Vout2=1.5 • Output voltages settle at different potentials • Current flowing through the regulator stays the same

LVDS Driver • Standard LVDS architecture for 320MHz clock rate and adapted to low supply voltage 1.5-1.2V • Current is routed to/from the output by four switches M3-M6 • Common-mode voltage is measured by a resistive divider and controlled by a common- mode feedback circuit • Output current is switchable between 0.6 -3mA Boni et al., IEEE JSSC VOL. 36 NO. 4, 2001

LVDS Receiver • parallel PMOS/NMOS comparator input stages allow operation in a wide range of common-mode voltages • cross coupled positive feedback structure allows introduction of hysteresis • second stage sums singals from both input stage and converts them to a CMOS output signal Tyhach et al., A 90-nm FPGA I/O Buffer Design With 1.6-Gb/s Data Rate for Source-Synchronous System and 300-MHz Clock Rate for External Memory Interface, IEEE JSSC, Sept. 2005

LVDS transciever IBM 130nm • For IBL and outer layers sLHC, need for a 320Mb.s-1 BW/ LVDS i/0. • LVDS transciever IC irradiated up to ~180Mrad. No degradation observed. 1.8mm tests with differential probe and 100 Ω on board term. @ 1.2V supply TX output Chained RxTx output @ 320 MHz Clock Clock-Rate 1050mV 320MHz 600mV 160MHz 150mV 40MHz Common Mode Voltage 0.8mm

Output data protocol • 8b10b frame: SOF (K.28.7), followed by 24 bits record word(s) and an EOF (K.28.5) + idle (K.28.1) Table: possible data words

Future Directions • Still higher rate capability • Need smaller, faster pixel • Yet need more memory per pixel to buffer higher rate • Two directions to explore: 3D and 65nm • FE-I4 region placed on 2 130nm tiers would have 60% pixel size and 50% more logic/memory FE-I3 pixel -- 3.2 bits FE-I4 -- 25 bits Goal ~35 bits

FE-I4 Chip Design

FE-I4 Chip Design

Presentation Transcript

FE-I4 irradiated chip tests at low temperature

Cold tests of irradiated FE-I4 at CERN

The FE-I4 Pixel Readout System-on-Chip for ATLAS Experiment Upgrades

Multi-IO board and FE-I4 emulator

The Design for Test Architecture in Digital Section of the ATLAS FE-I4 Chip

Shunt-LDO in FE- I4

Status of measurements of FE-I4 SEU and PRD

Design Tools, Flows and Library Aspects during the FE-I4 Implementation on Silicon

Status of measurements of FE-I4 SEU and PRD

Status of measurements of FE-I4 SEU and PRD

Noise in the irradiated FE-I4-B versus temperature

Status of measurements of FE-I4 SEU and PRD

FE-I4a Single Chip Card issues

FE-I4 Digital Test Bench

Chip design

FE-I4 pixel readout chip and IBL module

August 2011 GR SEU measurements of FE-I4

FE-I4 and testing status

Status of measurements of FE-I4 SEU and PRD

FE-I4 Architecture & Performance

Test Setup for FE-I3 single chips / modules, FE-I4_proto1 and for full scale FE-I4

FE-I4 Test Setup Hardware Needs: Boards & Interfaces

FE-I4 Chip Design

FE-I4 Chip Design

Presentation Transcript

FE-I4 irradiated chip tests at low temperature

Cold tests of irradiated FE-I4 at CERN

The FE-I4 Pixel Readout System-on-Chip for ATLAS Experiment Upgrades

Multi-IO board and FE-I4 emulator

The Design for Test Architecture in Digital Section of the ATLAS FE-I4 Chip

Shunt-LDO in FE- I4

Status of measurements of FE-I4 SEU and PRD

Design Tools, Flows and Library Aspects during the FE-I4 Implementation on Silicon

Status of measurements of FE-I4 SEU and PRD

Status of measurements of FE-I4 SEU and PRD

Noise in the irradiated FE-I4-B versus temperature

Status of measurements of FE-I4 SEU and PRD

FE-I4a Single Chip Card issues

FE-I4 Digital Test Bench

Chip design

FE-I4 pixel readout chip and IBL module

August 2011 GR SEU measurements of FE-I4

FE-I4 and testing status

Status of measurements of FE-I4 SEU and PRD

FE-I4 Architecture &amp; Performance

Test Setup for FE-I3 single chips / modules, FE-I4_proto1 and for full scale FE-I4

FE-I4 Test Setup Hardware Needs: Boards &amp; Interfaces

FE-I4 Architecture & Performance

FE-I4 Test Setup Hardware Needs: Boards & Interfaces