Reconfigurable Computing: HPC Network Aspects

Presentation Transcript

  1. Reconfigurable Computing: HPC Network Aspects
     Craig Ulmer (8963) cdulmer@sandia.gov
     Mitch Sukalski (8961)
     David Thompson (8963)
     Pete Dean R&D Seminar December 11, 2003

  2. FPGAs are promising… But what’s the catch?
     Three main challenges must be addressed before FPGAs can be applied to practical scientific computing.

  3. RC Challenge #1: Floating Point
     • Most FPGAs fine grained
     • Floating point units are large
       • 32b FP occupies ~1,000 CLBs
     • Commercial capacity improving
       • 2000: 6,000 CLBs
       • 2003: 40,000 CLBs (Max: 220,000)
     • Keith Underwood at Sandia/NM
       • LDRD: Working on high-speed 64b floating-point cores
     [Figure: 32b FP in Xilinx V2P7]
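
The capacity figures on this slide imply a rough ceiling on how many 32b floating-point units fit on one device. The sketch below only does that division. It assumes every CLB could be given to FP units, which overstates what is achievable once routing, control logic, and I/O are counted. The function and dictionary names are illustrative, not from the slides.

```python
# Approximate CLB cost of one 32b floating-point unit (figure quoted on the slide).
CLBS_PER_FP32_UNIT = 1_000

# Device capacities quoted on the slide, in CLBs.
CAPACITY_CLBS = {
    "2000": 6_000,
    "2003": 40_000,
    "2003 (max quoted)": 220_000,
}


def max_fp32_units(total_clbs, clbs_per_unit=CLBS_PER_FP32_UNIT):
    """Upper bound on FP units, assuming all CLBs are available to them."""
    return total_clbs // clbs_per_unit


for label, clbs in CAPACITY_CLBS.items():
    print(f"{label}: at most {max_fp32_units(clbs)} 32b FP units")
```

Under that optimistic assumption, the ceiling grows from a handful of units in 2000 to a few dozen on a typical 2003 part.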

  4. RC Challenge #2: Design Tools
     • Hardware design is non-trivial
       • Micromanage computations, clock-by-clock
       • Not appropriate for most scientists
     • Need languages, APIs that are easy to use
     • Maya Gokhale at LANL
       • Streams-C: C-like language for HW design
       • Pipeline/unroll loops
       • Schedules access to external memory

  5. RC Challenge #3: High-speed I/O
     • FPGAs have large internal computational power
       • How do we get data into/out of FPGA?
       • How do we connect to our existing HPC machines?
     • Mitch Sukalski, David Thompson, Craig Ulmer
       • LDRD: Connect FPGAs to high-performance SANs
     [Figure labels: ?, FPGA, FPGA]

  6. Outline
     • Where we have been: Networking FPGAs using external NI cards
     • Where we are going: Networking FPGAs using internal transceivers
     • Project status: Early details

  7. Previous Work
     Where we’ve been…

  8. Networking Earlier FPGAs
     • Previous generation of FPGAs were like blank ASICs
       • Configurable logic and pins
     • Attach a network card to an FPGA card
       • Communication over PCI
     • Examples:
       • Virginia Tech: Myrinet
       • Washington U. in St. Louis: ATM (inline)
       • Clemson University: Gigabit Ethernet
       • Georgia Tech: Myrinet
     [Figure labels: CPU, FPGA, PCI Bus, NIC]

  9. GRIM Project at Georgia Tech
     • Add multimedia devices to cluster
     • Message layer connects CPUs, memory, and peripherals
     • Myrinet between hosts, PCI within hosts
     • Celoxica RC-1000 FPGA
       • Virtex FPGA (1M logic gates)
       • Four SRAM banks
       • PCI w/ PMC
     [Figure labels: SRAM 0 to SRAM 3, CPU (x6), Control & Switching, GRIM, FPGA, PCI, RAID, Ethernet]

  10. FPGA Organization
      [Figure labels: Incoming Message Queues, Outgoing Message Queues, Application Data, Communication Library API, Memory API, User Circuit API, User Circuit 1 … User Circuit n, FPGA Card Memory, Frame, Circuit Canvas, FPGA]

  11. Lessons Learned
      • Frame provides simple OS
        • Isolates users from board
        • Portability
      • Dynamically manage resources
        • Card memory
        • Computational circuits
      • PCI bottleneck
        • Distance between NI and FPGA
        • PCI difficult to work with
      [Figure labels: Host CPU, NIC, FPGA, SRAM 1, SRAM 2, Page A, Page B, Page C, Page Fault, Function Fault, Circuit E, Circuit F, Circuit G, Circuit X, Circuit Y, Message: Use Circuit F on $C0000000]
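
The slide says the frame dynamically manages computational circuits, and its figure shows a message ("Use Circuit F on $C0000000") triggering a "Function Fault". The sketch below is one way such behaviour could be modelled in software. It is not the GRIM implementation: the class name, the fixed number of circuit slots, the least-recently-used eviction policy, and the use of Python callables as stand-ins for circuits are all assumptions made for illustration.

```python
from collections import OrderedDict


class Frame:
    """Toy model of a frame that keeps a bounded set of circuits loaded."""

    def __init__(self, circuit_slots, library):
        self.circuit_slots = circuit_slots  # circuits that fit at once (assumed)
        self.library = library              # name -> callable standing in for a circuit
        self.loaded = OrderedDict()         # loaded circuits, least recently used first
        self.function_faults = 0

    def _ensure_loaded(self, name):
        if name in self.loaded:
            self.loaded.move_to_end(name)   # mark as most recently used
            return
        # Function fault: the requested circuit is not on the FPGA yet.
        self.function_faults += 1
        if len(self.loaded) >= self.circuit_slots:
            self.loaded.popitem(last=False)  # evict the least recently used circuit
        self.loaded[name] = self.library[name]

    def handle(self, message):
        """Process a (circuit_name, address) message and return the circuit's result."""
        circuit, address = message
        self._ensure_loaded(circuit)
        return self.loaded[circuit](address)


# Hypothetical circuit library; each "circuit" just reports what it was asked to do.
library = {name: (lambda addr, n=name: f"{n}@{addr:#010x}") for name in "EFG"}

frame = Frame(circuit_slots=2, library=library)
print(frame.handle(("F", 0xC0000000)))  # first use of F causes a function fault
print(frame.handle(("F", 0xC0000000)))  # F is already loaded, no fault
print("function faults so far:", frame.function_faults)
```

A page fault on card memory could be modelled the same way, with pages in place of circuits.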

  12. Network Features of Recent FPGAs
      Where we’re going…

  13. Network Improvements
      • Recent FPGAs have special, built-in cores
        • High-speed transceivers, dedicated processors
      • Idea: Build our NI inside the FPGA
        • FPGA becomes a networked, compute resource
        • Removes the PCI bottleneck
      [Figure labels: CPU, NIC, FPGA, User-defined Computational Circuits, NI, Tx, Rx, System Area Network]

  14. Xilinx Virtex-II/Pro FPGA
      • Up to 4 PowerPC405 cores
        • Embedded version of PPC
        • 300-400MHz
      • Multiple gigabit transceivers
        • Run at 600Mbps to 3.125Gbps
        • Up to twenty-four transceivers
      • Additional cores
        • Distributed internal memory
        • Arrays of 18b multipliers
        • Digital clock multipliers, PLLs
      [Figure: Xilinx V2P20]
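
To put the transceiver numbers in perspective, the sketch below converts line rate to payload rate under 8B/10B coding, where every 8 data bits travel as 10 line bits. It then compares the result with the peak rate of a 32-bit, 33 MHz PCI bus. The PCI variant is an assumption, since the slides do not say which one the earlier boards used, and protocol overhead above the line code is ignored on both sides.

```python
def payload_gbps(line_rate_gbps, data_bits=8, coded_bits=10):
    """Payload rate of a serial link after line-coding overhead (8B/10B by default)."""
    return line_rate_gbps * data_bits / coded_bits


LINE_RATE_GBPS = 3.125   # top transceiver rate quoted on the slide
TRANSCEIVERS = 24        # maximum transceiver count quoted on the slide

per_link = payload_gbps(LINE_RATE_GBPS)
aggregate = per_link * TRANSCEIVERS

# Assumption: classic 32-bit, 33 MHz PCI, peak rate only.
PCI_PEAK_GBPS = 32 * 33e6 / 1e9

print(f"payload per link:      {per_link:.3f} Gbps")
print(f"aggregate, {TRANSCEIVERS} links:   {aggregate:.1f} Gbps")
print(f"assumed PCI peak:      {PCI_PEAK_GBPS:.3f} Gbps")
```

Even a single link at the top rate carries more payload than the assumed PCI bus can at peak, which is the motivation for moving the network interface onto the FPGA.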

  15. Multi-Gigabit Transceivers: Rocket I/O
      • Flexible, high-speed transceivers
      • Can be configured to connect with different physical layers
        • InfiniBand, GigE, FC, 10GigE, Aurora
      • Note: low-level interface (commas, disparity, clock mismatches)
      [Figure labels: FPGA Fabric, Rocket I/O, PIN +/-; transmit side: CRC, 8B/10B Encoder, Tx FIFO, Serializer; receive side: Clock Recover, CRC check, Rx Elastic Buffer, 8B/10B Decoder, Deserializer]
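
The block diagram includes a CRC stage on the transmit side and a CRC check on the receive side. In the transceiver these run in hardware. The sketch below only illustrates the idea in software, using the CRC-32 from Python's `zlib` module. The frame layout (payload followed by a 4-byte little-endian CRC) and the function names are assumptions, not the Rocket I/O format.

```python
import struct
import zlib

CRC_BYTES = 4  # size of the CRC-32 trailer


def add_crc(payload: bytes) -> bytes:
    """Transmit side: append a CRC-32 trailer to the payload."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def check_crc(frame: bytes) -> bytes:
    """Receive side: verify the trailer and return the payload, or raise ValueError."""
    if len(frame) < CRC_BYTES:
        raise ValueError("frame shorter than CRC trailer")
    payload, trailer = frame[:-CRC_BYTES], frame[-CRC_BYTES:]
    (expected,) = struct.unpack("<I", trailer)
    if zlib.crc32(payload) != expected:
        raise ValueError("CRC mismatch")
    return payload


frame = add_crc(b"raw transmission over fiber")
print(check_crc(frame))

# Flip one bit to imitate a transmission error; the check should reject it.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
try:
    check_crc(corrupted)
except ValueError as err:
    print("rejected:", err)
```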

  16. Why MGTs are Important
      • Direct connection to networks
        • Same chip, different network
        • Remove PCI from equation
      • Fast connections between FPGAs
        • Reduces analog design issues
        • Chain FPGAs together
        • Reduce pin count
      • Update: Virtex II/ProX
        • Now 2.488 Gbps to 10.3125 Gbps
        • Chips have either 8 or 20 transceivers
      [Figure: 3.125 Gbps over 44” FR4 *]
      * From Xilinx, http://www.xilinx.com/products/virtex2pro/mgtcharacter.htm

  17. Hard PowerPC Core
      • PowerPC 405
        • 16KB Instruction / 16KB Data caches
        • Real and Virtual memory modes
        • GCC is available
      • Multiple memory ports for core
        • On-chip memory (OCM)
        • Processor Local Bus (PLB)
      • User-defined memory map
        • Connect memory blocks or cores
        • External memory cores available
      [Figure labels: PowerPC, I-Cache, D-Cache, Processor Local Bus (PLB), On-Chip Memory (OCM) Interface]
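
A user-defined memory map means the designer decides which address ranges reach which memory blocks or cores. The sketch below shows the address-decode idea in software. Every region name, base address, and size in it is hypothetical and chosen only for illustration; none comes from the slides or from Xilinx documentation.

```python
from typing import NamedTuple, Optional, Tuple


class Region(NamedTuple):
    name: str
    base: int
    size: int


# Hypothetical memory map: names, bases, and sizes are illustrative only.
MEMORY_MAP = [
    Region("on_chip_memory", 0x0000_0000, 16 * 1024),
    Region("user_circuit_regs", 0x4000_0000, 4 * 1024),
    Region("external_memory", 0x8000_0000, 2 * 1024 * 1024),
]


def decode(address: int) -> Optional[Tuple[str, int]]:
    """Return (region name, offset within region), or None if nothing is mapped."""
    for region in MEMORY_MAP:
        if region.base <= address < region.base + region.size:
            return region.name, address - region.base
    return None


def regions_overlap(memory_map) -> bool:
    """Check that no two regions claim the same address."""
    ordered = sorted(memory_map, key=lambda r: r.base)
    return any(a.base + a.size > b.base for a, b in zip(ordered, ordered[1:]))


print(decode(0x4000_0010))
print(decode(0x2000_0000))
print("overlap:", regions_overlap(MEMORY_MAP))
```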

  18. System on a Chip (SoC)
      • Commercial SoC
        • Designing with cores
        • Customize system
      • New tools
        • Rapidly connect cores
        • Library of cores & buses
        • Saves on wiring legwork
      [Figure: Xilinx Platform Studio]

  19. Current Status
      • Exploring V2P
        • New architecture, new tools
      • Two reference boards
        • ML300 (V2P7-6)
        • Avnet (V2P20-6)
      • Transceiver work
        • Raw transmission over fiber
        • Working towards IB
      http://cdulmer.ran.sandia.gov