Summary of the Workshop on FPGAs for High-Energy Physics
PhD Student: Salvatore Danzeca
Supervisor: Giovanni Spiezia
Summary
• Introduction
• Altera Arria GX test for LHCb
• Microsemi IGLOO2 for CMS HCAL
• Kintex-7 mitigation techniques and general TMR
• Experience from using SRAM-based FPGAs in the ALICE TPC detector and future plans
The eternal fight: flash-based vs. SRAM-based FPGAs
• Flash-based FPGA
  • 130 nm dual-poly CMOS process (ProASIC3)
  • 65 nm process (IGLOO2)
• SRAM-based FPGA
  • Xilinx 28 nm high-k metal gate (HKMG) process
RADWG Community
• Most of the projects in the RADWG community use flash-based FPGAs.
  • Better SEU immunity
  • Easy to harden against SEUs by using global TMR
  • Available resources are comparable to those of an SRAM FPGA of 3 years ago
  • TID limit is not as high as for SRAM FPGAs
• In which cases should we use an SRAM FPGA?
  • TID is a concern
  • Performance is a concern
  • SEUs can be tolerated
Proton irradiation test of an Altera SRAM-based FPGA for possible use in the readout electronics of the LHCb experiment
Presenter: Christian FAERBER
• 2x Arria GX, EP1AGX35DF780I6 (90 nm)
• Application: LHCb Outer Tracker detector
• FPGA used as TDC and Gbit/s transceiver
• Tested with 22 MeV protons
Results
• Current
  • FPGA core current rises after 150 krad(Si) and reaches 107% after 7 Mrad(Si).
  • FPGA I/O current starts to drop after 400 krad(Si) and reaches 94% at 7 Mrad(Si).
  • All permanent current changes are between 5% and 20% and begin after 150 krad(Si).
• Stability of the implemented TDC
  • Wrong time measurement after a TID of 400 krad(Si)
  • Shifted time measurement after a TID of 4 Mrad
• Stability of the PLL
  • 3 PLL clock signals monitored
  • The 3 frequencies did not change
  • The phase between clk1 and clk2 shifts from -150° to larger values after 3 Mrad(Si)
• FPGA Gbit/s transceiver tests
  • Loss of bit alignment: recovered by sending the next bit-alignment word. Cross section: (1.3 ± 0.5) × 10⁻¹⁰ cm² per Gbit transceiver
  • De-synchronization of transmitter and receiver: required reprogramming of the FPGA. Cross section: (8 ± 4) × 10⁻¹¹ cm² per Gbit transceiver
• FPGA configuration registers
  • Checked with the cyclic redundancy checker tool from Altera. Cross section: (1.6 ± 0.2) × 10⁻⁹ cm² per FPGA
FPGAs in the upgrade of CMS HCAL
Presenter: Tullio Grassi
FPGAs are used both in the Front-End Electronics (FEE), mounted on the detector, and in the Back-End Electronics (BEE), located in the counting rooms.
Solutions
• Microsemi ProASIC3L
  • Interfacing the ProASIC3L to the CERN GBTX requires the ability to receive SLVS (a differential signal similar to LVDS but with smaller amplitude).
  • The ProASIC3L can receive SLVS: A. Belloni et al., “Radiation tolerance of an SLVS receiver based on commercial components”, Journal of Instrumentation (JINST 2014)
• Microsemi IGLOO2
  • Ongoing tests by the Univ. of Minnesota with 230 MeV protons
  • Failure after 2 × 10¹² protons
  • No SEU seen on a TMR-type shift register
  • No SET seen
  • PLL: 400 SEUs observed over a fluence of 10¹¹ protons/cm²
  • LATEST NEWS (2014): serializer running at 4.8 Gbps: loss of sync observed with a cross section of 1.7 × 10⁻¹⁰ cm². A power cycle was issued after every loss of sync, after which the link worked again. No attempt was made to reset (part of) the serializer.
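The cross sections quoted in these test results relate an event count to the particle fluence the device received. A minimal sketch of that arithmetic, assuming the simple definition σ = events / fluence; the function names and the 5 × 10¹⁰ protons/cm² example fluence are illustrative and not from the talks:

```python
def cross_section(events: float, fluence_per_cm2: float) -> float:
    """Event cross section in cm^2: observed events divided by fluence."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return events / fluence_per_cm2


def expected_events(sigma_cm2: float, fluence_per_cm2: float) -> float:
    """Mean number of events expected for a given cross section and fluence."""
    return sigma_cm2 * fluence_per_cm2


# IGLOO2 PLL figures from the slide: 400 SEUs over 1e11 protons/cm^2.
pll_sigma = cross_section(400, 1e11)
print(f"PLL SEU cross section: {pll_sigma:.1e} cm^2")

# Hypothetical fluence, chosen only to illustrate the inverse calculation
# with the quoted serializer loss-of-sync cross section.
example_fluence = 5e10
loss_of_sync = expected_events(1.7e-10, example_fluence)
print(f"Expected loss-of-sync events: {loss_of_sync:.1f}")
```

The same two functions apply to the Arria GX transceiver and configuration cross sections, provided the fluence is expressed in particles/cm².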
Scrubbing Approaches for Kintex-7 FPGAs
Presenter: Michael Wirthlin
• Xilinx Kintex-7
  • Commercially available FPGA
  • 28 nm, low-power programmable logic
  • High-speed serial transceivers (MGT)
  • High density (logic and memory)
• Built-in configuration scrubbing
  • Support for configuration readback and self-repair
  • Auto-detects and repairs single-bit upsets within a frame
  • SEU Mitigation IP for correcting multiple-bit upsets
• Proven mitigation techniques
  • Single-Event Upset Mitigation (SEM) IP
  • Configuration scrubbing
  • Triple Modular Redundancy (TMR)
  • Fault-tolerant serial I/O state machines
  • BRAM ECC protection
Kintex-7 ARCHITECTURE and SCRUBBING
• Device configuration is organized as “frames”
  • Smallest unit of configuration and readback
  • Individual frames can be configured (partial reconfiguration)
  • Individual frames can be read (readback)
  • 101 words x 32 bits/word = 3232 bits/frame
• Frames are organized into different “blocks”
  • Block 0: logic/routing configuration data (22546 frames)
  • Block 1: BlockRAM configuration/contents (5774 frames)
• Frames can be “scrubbed” during device operation
  • Writing an individual configuration frame overwrites the previous data
  • Replaces “bad” data in the presence of upsets
  • Writes the “same” data when no upsets are present
  • Scrubbing involves continuous reading/writing of configuration data
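The frame figures on this slide fix how much configuration data a full scrub pass has to cover. A short sketch of that arithmetic, using only the numbers quoted above; the variable names are illustrative:

```python
WORDS_PER_FRAME = 101
BITS_PER_WORD = 32

# Frame counts per block as quoted on the slide.
FRAMES = {
    "block0_logic_routing": 22546,
    "block1_bram": 5774,
}

bits_per_frame = WORDS_PER_FRAME * BITS_PER_WORD      # 3232 bits
total_frames = sum(FRAMES.values())
total_bits = total_frames * bits_per_frame

print(f"bits per frame: {bits_per_frame}")
print(f"total frames:   {total_frames}")
print(f"total bits:     {total_bits} ({total_bits / 8 / 1e6:.1f} MB)")
for name, count in FRAMES.items():
    share = count / total_frames
    print(f"{name}: {count * bits_per_frame} bits ({share:.0%} of frames)")
```

Dividing `total_bits` by the bandwidth of whichever configuration port is used gives a lower bound on the time for one complete scrub cycle.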
SCRUBBING CONFIGURATION DATA
• Each frame contains a SECDED ECC code
  • Provides single-bit correction and double-bit detection
  • Identifies the location of a single-bit upset
  • Identifies the presence of a double-bit upset
  • Double-error detection can be masked with >2 upsets in a frame
• Entire bitstream checked with a global CRC
  • Detects failure of individual ECC words (masked ECC)
  • Suggests full reconfiguration if a global CRC error is detected
• Internal FrameECC block
  • Dedicated block for ECC computation and error correction
• Internal scrubber
• External scrubber
  • Must respond to >2-bit frame errors
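The SECDED behaviour listed above, including the masking that occurs with more than two upsets, can be reproduced with a toy code. The sketch below uses a generic extended Hamming code on a 4-bit word; it is an illustration of the SECDED principle only and does not reproduce the actual Kintex-7 frame ECC layout. All function names are hypothetical.

```python
def encode(data_bits):
    """Extended Hamming (SECDED) encoding.

    Index 0 holds the overall parity bit, power-of-two indices hold the
    Hamming parity bits, and the remaining indices hold the data bits.
    """
    r = 0
    while (1 << r) < len(data_bits) + r + 1:
        r += 1
    n = len(data_bits) + r
    code = [0] * (n + 1)
    data = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of two: data position
            code[pos] = next(data)
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos
    for i in range(r):                   # parity bits force the syndrome to zero
        code[1 << i] = (syndrome >> i) & 1
    code[0] = sum(code) % 2              # overall parity: total weight is even
    return code


def check(code):
    """Classify a received word as 'ok', 'single' (with position) or 'double'."""
    syndrome = 0
    for pos in range(1, len(code)):
        if code[pos]:
            syndrome ^= pos
    parity_ok = sum(code) % 2 == 0
    if syndrome == 0 and parity_ok:
        return ("ok", None)
    if not parity_ok:
        # Odd number of flips: treated as one upset at `syndrome`
        # (0 means the overall parity bit itself).
        return ("single", syndrome)
    return ("double", None)


def flipped(code, *positions):
    """Return a copy of `code` with the given bit positions inverted."""
    out = list(code)
    for p in positions:
        out[p] ^= 1
    return out


word = encode([1, 0, 1, 1])
print(check(word))                       # clean word
print(check(flipped(word, 5)))           # one upset: located
print(check(flipped(word, 3, 6)))        # two upsets: detected, not located
print(check(flipped(word, 1, 2, 4)))     # three upsets: misreported as single
```

The last case shows why a global CRC is still needed: three upsets look like a single correctable upset at the wrong position, so a scrubber relying on the ECC alone would "repair" a bit that was never wrong.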
Triple Modular Redundancy (TMR)
• Without repair, TMR has lower reliability than a non-redundant design for long mission times
• Effective TMR is almost always coupled with “repair”
• TMR + repair = very reliable!
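The claim that TMR alone loses to a non-redundant design on long missions follows from the textbook reliability model. The sketch below assumes independent modules with a constant failure rate, a perfect voter, and an idealised repair that fully restores all three modules at a fixed interval; the failure rate and times are hypothetical values, not figures from the talk.

```python
import math


def r_simplex(lam, t):
    """Reliability of a single (non-redundant) module: R(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


def r_tmr(lam, t):
    """TMR without repair, perfect voter: at least 2 of 3 modules still good."""
    r = r_simplex(lam, t)
    return 3 * r**2 - 2 * r**3


def r_tmr_with_repair(lam, t, interval):
    """Idealised TMR + periodic repair: every `interval` all three modules
    are restored, so the mission succeeds only if every interval does."""
    return r_tmr(lam, interval) ** (t / interval)


LAM = 1e-3                       # hypothetical failure rate per hour
crossover = math.log(2) / LAM    # R = 0.5: beyond this, plain TMR is worse

for t in (100.0, crossover, 2000.0):
    print(f"t={t:7.1f} h  simplex={r_simplex(LAM, t):.4f}  "
          f"TMR={r_tmr(LAM, t):.4f}  "
          f"TMR+repair(1 h)={r_tmr_with_repair(LAM, t, 1.0):.4f}")
```

In this model the crossover sits where a single module's reliability drops to 0.5; past that point two of three modules are more likely to have failed than one alone, which is why repair (scrubbing) is what makes TMR worthwhile.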
Fault repair through scrubbing
• Fixes the cause of the error
• Does NOT fix the state of the circuit
• The state of the repaired circuit must be synchronized to the working circuits
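The difference between repairing the cause and repairing the state can be seen in a toy model. In the sketch below, three replicated counters stand in for a triplicated circuit, the `step` attribute stands in for configuration memory, and resynchronization copies the voted value into every replica. This is an illustrative model under those assumptions, not the mechanism of any particular TMR tool.

```python
from collections import Counter


def vote(values):
    """Majority vote over the replica outputs."""
    value, count = Counter(values).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority")
    return value


class Replica:
    def __init__(self):
        self.state = 0    # flip-flop contents (circuit state)
        self.step = 1     # stands in for configuration: correct logic adds 1

    def clock(self):
        self.state += self.step


replicas = [Replica() for _ in range(3)]


def tick(n=1):
    for _ in range(n):
        for r in replicas:
            r.clock()


tick(3)
replicas[0].step = 2          # configuration upset in replica 0
tick(2)
before_scrub = [r.state for r in replicas]

replicas[0].step = 1          # scrubbing rewrites the correct configuration
tick()
after_scrub = [r.state for r in replicas]

voted = vote(after_scrub)     # the voter still masks the stale replica
for r in replicas:
    r.state = voted           # resynchronize state from the majority
tick()
after_resync = [r.state for r in replicas]

print(before_scrub, after_scrub, after_resync)
```

After scrubbing, replica 0 computes correctly again but still carries a wrong count; only the resynchronization step brings it back in line, so that a later upset in another replica can again be outvoted.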
BYU-LANL TMR Tool
• BYU-LANL Triple Modular Redundancy
  • Developed at BYU with the support of Los Alamos National Laboratory (Cibola Flight Experiment)
  • Used to test TMR on many designs
  • Fault injection, radiation testing, in orbit
  • Testbed for experimenting with various TMR application techniques (used for research)
Experience from using SRAM-based FPGAs in the ALICE TPC Detector and Future Plans
Presenter: Johan Alme
• 1000 samples/event (10 bit)
• 4356 * 128 channels
• 700 MByte/event
• 200 Hz/1 kHz event rate
• 142-710 GByte/s (raw)
• Data compression: 5-20 GByte/s (~x30)
• The RCU main FPGA sits in the data path
• Data readout is handled by the Readout Node
  • 92% of CLBs used
  • 75% of BRAM blocks used (the remaining 25% cannot be used because of the Active Partial Reconfiguration)
• Result: TMR or any other mitigation technique is not applicable
• Solution: reconfiguration
  • Consists of: a radiation-tolerant flash memory, a radiation-tolerant flash-based FPGA and the DCS board (an embedded PC running Linux)
  • Corrects SEUs in the configuration memory of the Xilinx Virtex-II Pro VP7
  • Why it works: Active Partial Reconfiguration
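The data-volume figures at the top of this slide can be cross-checked from the channel count and sample size. The sketch below assumes decimal units (1 GByte = 10⁹ bytes) and no headers or padding, so it lands slightly below the rounded values quoted on the slide; the source does not explain that small difference.

```python
SAMPLES_PER_EVENT = 1000
BITS_PER_SAMPLE = 10
CHANNELS = 4356 * 128

# Raw event size, ignoring any headers or padding (an assumption).
event_bytes = SAMPLES_PER_EVENT * BITS_PER_SAMPLE * CHANNELS // 8
print(f"channels:   {CHANNELS}")
print(f"event size: {event_bytes / 1e6:.0f} MByte")

raw_rates = {}
for rate_hz in (200, 1000):
    raw_rates[rate_hz] = event_bytes * rate_hz / 1e9
    print(f"{rate_hz:5d} Hz -> {raw_rates[rate_hz]:.0f} GByte/s raw")

# Compression factors implied by the slide's own raw and compressed rates.
for raw, compressed in ((142, 5), (710, 20)):
    print(f"{raw} / {compressed} GByte/s -> factor {raw / compressed:.0f}")
```

The implied compression factors bracket the "~x30" quoted on the slide.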
Problems and solution
• 2011 Pb-Pb run:
  • 300-400 × 10²⁴ cm⁻² s⁻¹: ~5 SEUs/h for all 216 FPGAs
• Run 2 scenario:
  • Peak luminosity 1-4 × 10²⁷ cm⁻² s⁻¹: ~45 SEUs/h for all 216 FPGAs
• Solution: upgrade the RCU -> RCU2
  • New «state of the art» System-on-Chip FPGA: Microsemi SmartFusion2
  • Faster, bigger, better in radiation!
  • First flash-based FPGA with SERDES
  • Test carried out at the end of April
  • Waiting for the results!
Monitoring of Radiation Levels
• On the present RCU, the Reconfiguration Network acts as a radiation monitor
• Additional SRAM memory and a Microsemi ProASIC3 250 added to the RCU2
• Cypress SRAM: same as used for the latest LHC RadMon devices
Closer Look
• Workshop on FPGAs for High-Energy Physics, http://indico.cern.ch/event/300532/timetable/#20140321
• Fault-Tolerance Techniques for SRAM-Based FPGAs (Frontiers in Electronic Testing), by Fernanda Lima Kastensmidt and Ricardo Reis (May 3, 2006)
• Kintex 7 article, http://www.eetimes.com/author.asp?section_id=36&doc_id=1287740&page_number=1
• Soft Error Rate Estimations of the Kintex-7 FPGA within the ATLAS Liquid Argon (LAr) Calorimeter, Helio Takai, http://indico.cern.ch/event/228972/session/19/contribution/49/material/slides/0.pdf
• What's Microsemi Done With Actel's IGLOO Product Range? http://www.eetimes.com/author.asp?section_id=36&doc_id=1319435