
FPGA: From Flashing LED to Reconfigurable Computing

FPGA: From Flashing LED to Reconfigurable Computing. Wu, Jinyuan Fermilab IIT Mar, 2009. Outline. Electronic Aspect of FPGA: LED Flashing Logic Elements in a Nutshell TDC and ADC FPGA as a Computing Fabric: Moore’s Law Forever? Space Charge Computing with FPGA Cores


Presentation Transcript


  1. FPGA: From Flashing LED to Reconfigurable Computing Wu, Jinyuan Fermilab IIT Mar, 2009 Wu Jinyuan, Fermilab jywu168@fnal.gov

  2. Outline • Electronic Aspect of FPGA: • LED Flashing • Logic Elements in a Nutshell • TDC and ADC • FPGA as a Computing Fabric: • Moore’s Law Forever? • Space Charge Computing with FPGA Cores • Doublet Matching & Hash Sorter • Triplet Matching & Tiny Triplet Finder • Enclosed Loop Micro-Sequencer (ELMS)

  3. Flashing LED, First Things First [Figure: Counter, Q[23..0]] • Design at least one LED into every FPGA board. • When a board is first powered up, test the LED flashing function first. • Many things have to be right for the LED to flash: • All power pins must be connected. • The configuration device must be in the correct mode. • The design software must work correctly.
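
As a rough software model of the flashing logic, the sketch below assumes the LED is driven by the most significant bit of the 24-bit counter Q[23..0]; the slide does not say which bit is used, and the 50 MHz clock in the usage note is a hypothetical value.

```python
def led_state(cycle, bits=24):
    """Model an LED driven by the MSB of a free-running counter (assumed wiring)."""
    q = cycle % (1 << bits)        # counter Q[bits-1..0] wraps on overflow
    return (q >> (bits - 1)) & 1   # MSB toggles every 2**(bits-1) clock cycles


def blink_period_s(clock_hz, bits=24):
    """Full on/off period of the MSB, in seconds."""
    return (1 << bits) / clock_hz
```

With a hypothetical 50 MHz clock, `blink_period_s(50e6)` is about 0.34 s, a rate that is easy to see by eye.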

  4. LED Brightness Variation [Figure: Counter Q[23..0] and comparator A<B, with a LUT on input A in the second version] • The LED brightness is varied by changing the duty cycle of the output pulse. • Comparator input A is the brightness and B is the clock cycle count. • A look-up table can be added at input A to obtain a different brightness variation curve.

  5. Duty-Cycle Based Single-Pin DAC (1) [Figure: DAC Input at A, Counter Q at B, comparator A>B] • The duty cycle (pulse width) of the comparator output is proportional to the DAC input at port A. • An external RC network serves as the low-pass filter. • The output voltage of an ideal low-pass filter is proportional to the DAC input.
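
The comparator scheme can be sketched in a few lines. This is a minimal model, assuming an 8-bit counter (the width is not given on the slide) and treating the ideal low-pass filter as a plain average over one counter period; the function names are illustrative only.

```python
def pwm_wave(a, bits=8):
    """One counter period of the comparator output A > B, with B counting 0..2**bits-1."""
    return [1 if a > b else 0 for b in range(1 << bits)]


def mean_level(a, bits=8):
    """Average output level, standing in for an ideal low-pass filter."""
    wave = pwm_wave(a, bits)
    return sum(wave) / len(wave)
```

For example, `mean_level(64)` gives 0.25, i.e. a quarter of the full-scale output for a DAC input of 64 out of 256.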

  6. LED Brightness Exponential Drop if (CO==1) {Q = Q - Q/32;} [Figure: subtracting accumulator S(-) with a D register (SET), comparator A<B, Counter Q with carry-out CO] • Narrow pulses are typically stretched for LED display at a fixed brightness. • The circuit here instead dims the LED gradually for a better visual effect.

  7. Exponential Sequence Generator (Possible Student Lab) if (CO==1) {Q = Q - Q/32;} [Figure: subtracting accumulator S(-) with a D register (SET)] • An exponential sequence is generated using the accumulator shown above. • Note that no multiplier is used. • Other function sequences (sine, cosine, tangent, cotangent, etc.) can be generated similarly.
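
The update rule Q = Q - Q/32 can be tried out in software. The sketch below is an illustrative integer model, assuming the division by 32 is done as a right shift by 5 bits; the starting value and number of steps are arbitrary.

```python
def exp_sequence(q0, steps, shift=5):
    """Generate Q, Q*(31/32), Q*(31/32)**2, ... using only a shift and a subtract."""
    q = q0
    out = []
    for _ in range(steps):
        out.append(q)
        q -= q >> shift   # Q = Q - Q/32 when shift == 5
    return out
```

`exp_sequence(65536, 3)` returns `[65536, 63488, 61504]`. Because the shift truncates, the integer sequence stops decaying once Q drops below 32.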

  8. Duty-Cycle Based Single-Pin DAC (2) (Possible Student Lab) [Figure: accumulator S with D register Q, DAC Input, carry-out CO] • Use the carry-out of the accumulator as the output. • The number of pulses is proportional to the DAC input. • The rounding error is carried over to later cycles. • The output is smoother.
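
A small model makes the carry-out behavior concrete. It assumes an 8-bit accumulator (the width is not stated on the slide) that adds the DAC input every clock and emits its carry-out as the output bit.

```python
def carry_out_stream(dac_in, cycles, bits=8):
    """Output bit stream of an accumulator-based DAC: one pulse per overflow."""
    mask = (1 << bits) - 1
    acc = 0
    out = []
    for _ in range(cycles):
        acc += dac_in
        out.append(acc >> bits)  # carry-out of the adder
        acc &= mask              # the remainder (rounding error) stays for later cycles
    return out
```

Over 256 cycles, an input of 64 produces exactly 64 pulses, spread evenly as one pulse every fourth cycle rather than bunched into one wide pulse, which is why the filtered output is smoother.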

  9. Outline (repeat of slide 2)

  10. Logic Elements [Figure: Normal Mode: LUT4 (16 RAM cells, inputs A, B, C, D) + DFF; Arithmetic Mode: 2 x LUT3 (8 cells each, inputs A, B, CI, output CO) + DFF; each DFF has D, Q, ENA, CLRN] • LUT = Look-Up Table

  11. What Can Be Done With a Lookup Table [Figure: LUT with inputs A, B, C, D] • “Any” 4-input function.
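
The "any 4-input function" idea can be illustrated by treating the LUT as a 16-bit truth table addressed by its inputs. The bit ordering (A as the least significant address bit) and the function names are assumptions made for this sketch, not a vendor convention.

```python
def lut4(init, a, b, c, d):
    """Read one bit of a 16-bit truth table; A is the LSB of the address (assumed)."""
    addr = (d << 3) | (c << 2) | (b << 1) | a
    return (init >> addr) & 1


def make_init(func):
    """Build the 16-bit table contents for any Boolean function of four inputs."""
    init = 0
    for addr in range(16):
        a, b, c, d = addr & 1, (addr >> 1) & 1, (addr >> 2) & 1, (addr >> 3) & 1
        init |= (func(a, b, c, d) & 1) << addr
    return init


NAND4 = make_init(lambda a, b, c, d: 1 - (a & b & c & d))  # 0x7FFF
XOR4 = make_init(lambda a, b, c, d: a ^ b ^ c ^ d)         # 0x6996
```

Changing the function only changes the 16 stored bits; the hardware stays the same.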

  12. Xilinx Look-Up Table [Figure: look-up table + DFF (D, Q, ENA, CLRN)] • LUT4: 4-input Look-Up Table • RAM16: 16-bit Distributed RAM • SRL16: 16-bit Shift Register

  13. Pipeline Structure [Figure: LUT4 (16 RAM cells) blocks separated by DFFs (D, Q, ENA, CLRN)] • Logic cells are usually designed into pipeline structures.

  14. Logic Element as a Full Adder Bit [Figure: chained logic elements, each with 2 x LUT3 (8 cells), inputs A and B, carry-in CI, carry-out CO, and a DFF (D, Q, ENA, CLRN)] • A logic cell resembles a full adder bit.
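
The arithmetic mode can be mimicked with two 8-entry tables per bit, one for the sum and one for the carry. The table contents below follow from the full-adder truth table; the address ordering (A as the least significant bit, CI as the most significant) is an assumption of this sketch.

```python
SUM_INIT = 0x96    # A xor B xor CI
CARRY_INIT = 0xE8  # majority(A, B, CI)


def lut3(init, a, b, ci):
    """Read one bit of an 8-entry table addressed by (CI, B, A)."""
    return (init >> ((ci << 2) | (b << 1) | a)) & 1


def ripple_add(x, y, bits=8):
    """Add two numbers with a chain of 2 x LUT3 full-adder bits; returns (sum, carry_out)."""
    ci = 0
    total = 0
    for i in range(bits):
        a, b = (x >> i) & 1, (y >> i) & 1
        total |= lut3(SUM_INIT, a, b, ci) << i
        ci = lut3(CARRY_INIT, a, b, ci)  # CO of this bit feeds CI of the next
    return total, ci
```

For instance, `ripple_add(200, 100)` returns `(44, 1)`: the 8-bit sum wraps and the final carry-out is set.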

  15. Myths on FPGA • We commonly hear about FPGAs: • FPGA is cheap. • FPGA is fast. • FPGA is large. • FPGA can do anything. • Not really; at least, it is not always the case. • The reality is: • FPGA is ultra-flexible. • As the cost of that flexibility, transistor usage in an FPGA is NOT efficient. • Good design tricks are needed.

  16. 4-Input NAND, 4-Input NOR, 4-Input NAOR (In ASIC) [Figure: transistor-level schematics with inputs A, B, C, D and output Y] • 8 transistors each

  17. Transistor Usage of Logic Element (In FPGA) [Figure: 16-bit LUT + DFF (D, Q, ENA, CLRN)] • 16 RAM bits x 6 transistors per RAM bit • At least 96 transistors

  18. The Mirror Adder (Weste93) (In ASIC) [Figure: transistor-level schematic with inputs A, B, Ci and outputs Sb, Cob] • 24-28 transistors

  19. Full Adder (In FPGA) [Figure: full adder with inputs A, B, CI and outputs S, CO, built from two 8-bit LUTs + DFF (D, Q, ENA, CLRN)] • At least 96 transistors

  20. Other FPGA Resources • Other resources are available in FPGA devices: • RAM blocks • Multipliers • Serial data receivers, PowerPC, etc. [Figure labels: Multipliers, RAM Blocks, 16 Logic Elements]

  21. Outline (repeat of slide 2)

  22. TDC Using FPGA Logic Chain Delay [Figure: logic chain with inputs IN and CLK] • This scheme uses current FPGA technology. • A low-cost chip family can be used (e.g., EP2C8T144C6, $31.68). • Fine TDC precision can be implemented in slow devices (e.g., 20 ps in a 400 MHz chip).

  23. Two Major Issues in a Free-Operating FPGA • The bin widths differ from one another and vary with supply voltage and temperature. • Some bins are ultra-wide due to LAB boundary crossings.

  24. Auto Calibration Using Histogram Method [Figure: 16K Events, DNL Histogram, S, LUT, In (bin), Out (ps)] • It provides a bin-by-bin calibration at a given temperature. • It is a turn-key solution (bin in, ps out). • It is semi-continuous (the LUT is updated automatically every 16K events).
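
The histogram method can be sketched as follows, under the usual assumption that hits arrive uniformly in time relative to the clock, so each bin's count is proportional to its width. Reporting the bin center, and the function name itself, are choices made for this illustration rather than details taken from the slides.

```python
def build_calibration_lut(hist, clock_period_ps):
    """Turn per-bin hit counts into a bin -> time (ps) table via a running sum."""
    total = sum(hist)
    lut = []
    edge = 0.0                                    # left edge of the current bin, in ps
    for count in hist:
        width = clock_period_ps * count / total   # wide bins collect more hits
        lut.append(edge + width / 2)              # report the bin center
        edge += width
    return lut
```

With a toy histogram `[100, 300, 100, 500]` and a 1000 ps clock period, the table is `[50.0, 250.0, 450.0, 750.0]`: the ultra-wide last bin is given its proper 500 ps span instead of a nominal quarter of the period.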

  25. The Test Module • FPGA with 8ch TDC • Data output via Ethernet • Two NIM inputs • BNC adapters add delay in 150 ps steps.

  26. TDC Test Result, NIM Inputs [Figure: LeCroy 429A NIM Fan-out and NIM/LVDS converters feeding Wave Union TDC B channels; plot labels: RMS 10ps, 140ps, 0, 1, 2] • As good as an ASIC TDC. • BNC adapters add delays in 140 ps steps.

  27. Multi-Sampling TDC FPGA [Figure: Multiple Sampling with clock phases c0, c90, c180, c270 into Q0-Q3; Clock Domain Changing into QD, QE, QF; Trans. Detection & Encode with outputs DV, T0, T1; 4Ch Coarse Time Counter with output TS] • Ultra low-cost: 48 channels in a $18.27 EP2C5Q208C7. • Sampling rate: 360 MHz x 4 phases = 1.44 GHz. • LSB = 0.69 ns. • Logic elements with non-critical timing are freely placed by the fitter of the compiler. • The picture represents a placement in a Cyclone FPGA.
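
The LSB quoted above follows directly from the sampling rate. The sketch below reproduces that arithmetic and adds a hypothetical fine-time encoder that reports which of the four phase samples first sees the input high; the encoder is only an illustration of the idea, not the transition-detection logic used in the actual design.

```python
def multisample_lsb_ns(clock_mhz, phases):
    """Time bin width in ns when sampling with several phases of one clock."""
    return 1e3 / (clock_mhz * phases)


def fine_time(samples):
    """Hypothetical encoder: index of the first phase sample that is high, else None."""
    for i, s in enumerate(samples):
        if s:
            return i
    return None
```

`multisample_lsb_ns(360, 4)` evaluates to roughly 0.69 ns, matching the slide.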

  28. ADC Using FPGA [Figure: four AMP & Shaper + ADC channels feeding an FPGA; four AMP & Shaper channels feeding TDC blocks inside an FPGA; signals V1-V4 and times T1-T4; VREF network with R1, R1, R2, C] • Analog signals from the AMP & Shapers are fed directly to FPGA pins. • FPGA outputs and a passive RC network generate a ramping reference voltage VREF. • The input voltages are compared with VREF using FPGA differential input receivers. • The transition times, which represent the input voltage values, are digitized by TDC blocks in the FPGA.

  29. ADC Test: Waveform Digitization on BD3_19 (Possible Student Lab) [Figure: FPGA with two TDC blocks and a VREF network labeled 50, 50, 100 and 1000pF; plots: Input Waveform, Overlap Trigger & Reference Voltage, Raw Data, Converted]

  30. Outline (repeat of slide 2)

  31. Moore’s Law • Number of transistors in a package: x2 / 18 months. [Figure taken from www.intel.com]

  32. Status of Moore’s Law: an Inconvenient Truth • Number of transistors: yes, via multi-core. • Clock speed: ? [Figure taken from www.intel.com]

  33. The Fever of Moore’s Law vs. Maxwell’s Equations [Figure: Op/sec vs. year, 1998-2010; labels: WRW, MIT, 2002] • During the hot days of Moore’s Law, the rules of thumb were: • BRB – Buy Rather than Build • URU – Use Rather than Understand • WRW – Wait Rather than Work • Fundamental principles such as Maxwell’s Equations tell us that Moore’s Law has limits. Technology advances come from hard work.

  34. The Execution & Non-Execution Cycles [Figure from MIT 6.823 Open Course Site] • In current microprocessors: • Each instruction takes one clock cycle to execute. • It takes many clock cycles to prepare for executing an instruction. • Pipelined? Yes, but the non-execution pipeline stages consume silicon area, power, etc. • Executing an instruction != doing useful calculation. • Can we do something different?

  35. Outline (repeat of slide 2)

  36. The Space Charge Computing • Each electron sees the sum of the Coulomb forces from the other N-1 electrons. • The total number of calculations is about N^2, and each Coulomb force calculation requires a square root, a division and several multiplications. • Regular sequential computers are not fast enough.
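
For reference, a plain sequential version of this calculation looks like the sketch below. It assumes unit charges and a force constant `k` of 1 (arbitrary units chosen for illustration), and it makes the N(N-1) inner iterations, each with one square root, one division and several multiplications, easy to see.

```python
import math


def coulomb_forces(pos, k=1.0):
    """Sum the pairwise Coulomb forces on each particle (like charges repel)."""
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = pos[i]
        for j in range(n):
            if i == j:
                continue
            dx, dy, dz = xi - pos[j][0], yi - pos[j][1], zi - pos[j][2]
            r2 = dx * dx + dy * dy + dz * dz
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))   # one sqrt and one division per pair
            forces[i][0] += k * dx * inv_r3
            forces[i][1] += k * dy * inv_r3
            forces[i][2] += k * dz * inv_r3
    return forces
```

Two particles placed 2 units apart on the x axis push each other apart with a force of magnitude 0.25 in these units.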

  37. The FPGA Board • Up to 16 FPGA devices ($32 each) can be installed on each board. • Each FPGA hosts one core.

  38. The 16-bit Demo Core [Figure: datapath with 16-bit coordinates (xi, yi, zi and xj, yj, zj), subtractors, squaring blocks (x2), adders, a LUT (10b in, 16b out), multipliers (X), and accumulators (S) for 32-bit forces and 16-bit velocities (vxj, vyj, vzj)]

  39. The Lookup Table [Figure: three squaring blocks (x2) summed and fed to the LUT, 10b in, 16b out]

  40. Two Electrons with Natural Scales [Figure labels: 256 nm, 28ps]

  41. 256 Charged Particles, Iteration 0

  42. 256 Charged Particles, Iteration 5

  43. 256 Charged Particles, Iteration 10

  44. 256 Charged Particles, Iteration 15

  45. 256 Charged Particles, Iteration 20

  46. 256 Charged Particles, Iteration 25

  47. 256 Charged Particles, Iteration 30

  48. 256 Charged Particles, Iteration 35

  49. 256 Charged Particles, Iteration 40

  50. Speed Comparison with Regular CPU • The FPGA core is about 10x faster than a typical 2.2 GHz CPU core. • The FPGA core runs at 200 MHz, i.e., 200 M Coulomb force calculations/s. • The CPU core appears to need 80-100 clock cycles for each Coulomb force calculation.
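
The quoted figures can be cross-checked with a line of arithmetic. The helper below is only a consistency check on the slide's rounded numbers, with a hypothetical function name.

```python
def cpu_cycles_per_force(cpu_hz, fpga_forces_per_s, speedup):
    """CPU clock cycles spent per force calculation implied by a given FPGA speedup."""
    cpu_forces_per_s = fpga_forces_per_s / speedup
    return cpu_hz / cpu_forces_per_s
```

With a speedup of exactly 10, `cpu_cycles_per_force(2.2e9, 200e6, 10)` comes out near 110 cycles, the same order as the 80-100 cycles quoted on the slide, so the 10x figure should be read as approximate.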
