This report outlines the progress in exploring and optimizing FPGA-based infrastructures, focusing on high-performance and high-throughput computing platforms. It details the architecture of the IBOB, ROACH, and BEE2 platforms, highlighting their key features such as high memory bandwidth and advanced communication interfaces. Testing requirements and results are documented, emphasizing asynchronous clock domains and limitations encountered during high-frequency operations. Recommendations for future designs are provided, along with an overview of the design environment and tools used for FPGA programming and testing.
Progress Report FPGA-based Infrastructure Henry Chen henryic@ee.ucla.edu June 11, 2010
Motivation • Architectural & algorithmic exploration/optimization • High-performance/high-throughput computation • Closed-loop test environment [1,2]
Platform Architecture [3] • Large design effort; amortize widely • As general-purpose as possible • Large memories • High I/O bandwidth • Use embedded CPU to provide high-level interface to FPGA resources
IBOB • IBOB (Interconnect Break-Out Board) • 1x Virtex-II Pro (FPGA + PowerPC405) • 2x 18Mb (36-bit) SRAMs (~250MHz) • 2x CX4 10Gb high-speed-serial • 2x Z-DOK+ high-speed differential GPIO (80 diff pairs) • 80x LVCMOS/LVTTL GPIO • RS232 UART to PPC; major I/O bottleneck • read_xps/write_xps • Our primary test platform; have 2 in-house
ROACH • ROACH (Reconfigurable Open Architecture Compute Hardware) • 1x Virtex 5 FPGA • External PPC440 • 1x DDR2 DIMM • 2x 72Mbit (18-bit) QDR SRAMs (~350MHz) • 4x CX4 • 2x Z-DOK+ (80 diff pairs) • External PPC provides much faster interface to FPGA resources (1GbE) • None in-house (for now)
BEE2 • BEE2 (Berkeley Emulation Engine) • 5x Virtex-II Pro • 20x DDR2 DRAM DIMMs (200MHz) • 18x CX4 ports • High-End Reconfigurable Computer • High I/O bandwidth per FPGA • High memory bandwidth per FPGA • High memory capacity per FPGA • Have one in-house
BORPH [4] • Linux kernel modification for hardware abstraction; runs on embedded CPU connected to FPGA • “Hardware process” • Programming an FPGA is like running a Linux executable • Some FPGA resources accessible in Linux process memory space • Makes FPGA board look just like a Linux workstation • Used on BEE2, ROACH; limited version on IBOB w/ expansion board
Design Environment • Simulink • Schematic-like • Integration w/ Matlab for analysis • Good for dataflow designs (i.e., DSP) • Designed by BWRC, now maintained by international collaboration • Tutorials aplenty! See wiki
Design Environment • Xilinx System Generator for Simulink • Custom DSP and system blocksets • One-click design compilation
Testing w/ ROACH + KATCP • Digital frontend receiver (Rashmi)
[Figure: test setup block diagram. Labels: Matlab, 1GbE, PowerPC, FPGA, BRAM, QDR SRAM, LVDS IO, ASIC Test Board, ASIC]
Testing Requirements • High TX clock rate (400MHz target) • Beyond practical limits of IBOB’s V2P • Long test vectors (~4Mb) • Asynchronous clock domains for TX and RX
Asynchronous Clock Domains • Easily supported by FPGA hardware • XSG has very limited capability for expressing multiple clocks; CE toggling • Further restricted by bee_xps tool automation; assumes single clock design (though many different clocks available)
Asynchronous Clock Domains • Manually merged separate designs for test vector and readback datapaths • Fixed 60MHz RX • 255-315MHz TX
Results • Tested up to 315MHz w/ loadable vectors in QDR; up to 340MHz with pre-compiled vectors in ROMs • 55dB SNR @ 20MHz bandwidth
Limitations • DDR output FF critical path @ 340MHz (clock out) • QDR SRAM bus interface critical path @ 315MHz • Output clock jitter? • LVDS receivers usually only 400-500Mbps • OK for data, not good for faster clocks • Get LVDS I/O cells?
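The point that LVDS receivers are fine for data but not for faster clocks comes down to toggle-rate arithmetic. The following is a minimal sketch, assuming the data lines are single-data-rate (one bit per clock cycle) and treating a forwarded clock as a 1010... pattern with two bit times per cycle. Only the 400-500Mbps receiver rating and the clock frequencies come from the slides; the function names are illustrative.

```python
# Receiver rating quoted on the slide (Mbps); everything else is an assumption.
LVDS_RX_LIMIT_MBPS = (400, 500)

def data_toggle_rate_mbps(clock_mhz, bits_per_cycle=1):
    """Bit rate of a data line; bits_per_cycle=1 assumes single-data-rate data."""
    return clock_mhz * bits_per_cycle

def clock_toggle_rate_mbps(clock_mhz):
    """A forwarded clock looks like a 1010... pattern: two bit times per cycle."""
    return 2 * clock_mhz

def within_rating(rate_mbps, limit_mbps=LVDS_RX_LIMIT_MBPS[1]):
    """Compare against the optimistic (upper) end of the quoted rating."""
    return rate_mbps <= limit_mbps

for f_mhz in (255, 315, 340):
    d = data_toggle_rate_mbps(f_mhz)
    c = clock_toggle_rate_mbps(f_mhz)
    print(f"{f_mhz} MHz: data {d} Mbps (ok={within_rating(d)}), "
          f"clock {c} Mbps (ok={within_rating(c)})")
```

Under these assumptions the data lines stay inside the rating across the tested TX range, while the clock at the same frequency does not.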
Future Design Recommendations • Send source-synchronous clock with returned data • Send synchronization information with returned data • “Vector warning” or frame start • Data valid
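To illustrate why returned synchronization information helps, here is a minimal sketch of how a readback path could align captured samples once the chip sends "data valid" and "frame start" flags alongside the data. The tuple layout and the name `extract_frames` are hypothetical; the slides only name the signals, not an implementation.

```python
def extract_frames(samples):
    """Group captured samples into frames.

    samples: iterable of (data, valid, frame_start) tuples, one per cycle of
    the forwarded (source-synchronous) clock. Hypothetical layout.
    """
    frames = []
    current = None  # no frame is open until the first frame_start is seen
    for data, valid, frame_start in samples:
        if not valid:
            continue  # ignore cycles where the sender flags no data
        if frame_start:
            current = [data]  # a new frame begins with this sample
            frames.append(current)
        elif current is not None:
            current.append(data)
        # valid samples before the first frame_start cannot be aligned; drop them
    return frames

capture = [
    (9, 0, 0), (7, 1, 0),             # startup junk and an unaligned sample
    (1, 1, 1), (2, 1, 0), (0, 0, 0),  # frame starts, one invalid cycle
    (3, 1, 0),
    (4, 1, 1), (5, 1, 0),             # second frame
]
print(extract_frames(capture))
```

Without these flags, the receiver has to infer latency and alignment from the data itself, which is fragile when TX and RX run in separate clock domains.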
KATCP • Comm. protocol interfacing to BORPH • Can be implemented over TCP telnet connection • Libraries and clients for C, Python
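Because KATCP is a line-oriented text protocol, a request can be built by hand and typed over a telnet session. The sketch below formats and parses messages following my reading of the KATCP conventions: a type character (`?` request, `!` reply, `#` inform), a message name, then whitespace-separated arguments with backslash escapes. The request name and arguments used here are placeholders, not a statement of what any particular server accepts, and the sketch ignores optional message IDs.

```python
import re

# Backslash escapes for characters that cannot appear raw inside an argument.
_ESCAPES = {
    "\\": "\\\\", " ": "\\_", "\0": "\\0", "\n": "\\n",
    "\r": "\\r", "\x1b": "\\e", "\t": "\\t",
}
# Map the character following a backslash back to the original character.
_UNESCAPES = {esc[1]: raw for raw, esc in _ESCAPES.items()}

def escape_arg(arg):
    """Escape one argument; an empty argument is written as backslash-@."""
    if arg == "":
        return "\\@"
    return "".join(_ESCAPES.get(ch, ch) for ch in arg)

def unescape_arg(text):
    """Reverse escape_arg."""
    if text == "\\@":
        return ""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            out.append(_UNESCAPES.get(nxt, nxt))
        else:
            out.append(ch)
    return "".join(out)

def format_request(name, *args):
    """Build one request line, terminated by a newline."""
    parts = ["?" + name] + [escape_arg(str(a)) for a in args]
    return " ".join(parts) + "\n"

def parse_message(line):
    """Split a message line into (type_char, name, [arguments])."""
    fields = re.split(r"[ \t]+", line.rstrip("\r\n").strip(" \t"))
    head, raw_args = fields[0], fields[1:]
    return head[0], head[1:], [unescape_arg(a) for a in raw_args]

# Placeholder request name and arguments, for illustration only.
print(repr(format_request("example", "some_register", 0, "hello world")))
print(parse_message("!example ok hello\\_world\n"))
```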
KATCP Matlab Client • For our purposes, replaces read_xps, write_xps • Can program FPGA directly from Matlab; no more JTAG cable! • Provides byte-level read/write granularity • Increases speed from ~KB/s to ~MB/s • Room for improvement; currently high protocol overhead
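To put the ~KB/s to ~MB/s improvement in context, the sketch below estimates how long one ~4Mb test vector (the size quoted under Testing Requirements) takes to load. The rates of 1 KB/s and 1 MB/s are order-of-magnitude placeholders standing in for "~KB/s" and "~MB/s", not measurements, and 1Mb is assumed to mean 10^6 bits.

```python
def transfer_seconds(n_bits, rate_bytes_per_s):
    """Time to move n_bits at a sustained rate given in bytes per second."""
    return (n_bits / 8) / rate_bytes_per_s

VECTOR_BITS = 4e6  # ~4Mb test vector; assumes 1Mb = 1e6 bits

# Placeholder rates: orders of magnitude only, not measured values.
for label, rate in (("~KB/s", 1e3), ("~MB/s", 1e6)):
    seconds = transfer_seconds(VECTOR_BITS, rate)
    print(f"{label}: about {seconds:g} s per vector")
```

Under these assumptions a vector load drops from minutes to well under a second, which is what makes repeated closed-loop test iterations practical.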
Towards Streaming • Transition to TCP/IP-based protocols facilitates streaming • Osort test vectors 10Mb of data at ~Mb/s (IBOB) • Single-vector load and read via SRAM • LWIP UDP read/write_xps • Ethernet streaming w/o going through shared memory
New Windows Server(s) • dsp experiencing severe stability problems • eecls-{1, 2, 3, 4}.ee.ucla.edu • Windows Server 2008 (32-bit) • Matlab R2007b (+ XSG 10.1) • Matlab R2009b (+ XSG 11.5, Synphony 2009.12) • Xilinx Suite 10.1 • Xilinx Suite 11.5 • ModelSim 6.6a • Synplify 2010.03 • sherwin is now a print server
References [1] D. Marković et al., “ASIC Design and Verification in an FPGA Environment,” IEEE CICC, 2007. [2] D. Marković, UCLA EEM216A, Fall 2008, Lecture 20. [3] C. Chang et al., “BEE2: A High-End Reconfigurable Computing System,” IEEE Design & Test of Computers, 2005. [4] H. So, R. Brodersen, “A Unified Hardware/Software Runtime Environment for FPGA-Based Reconfigurable Computers using BORPH,” ACM TECS, 2008.