RAMP Infrastructure • Andrew Putnam, University of Washington • RAMP Retreat, January 17, 2008
Complaint • I can’t get RAMP [color] to work because Greg doesn’t have [feature] in RDLC.
Fundamental Problem • RDLC will someday support: • Platform independence • Tool independence • Integrated debugging • HW / SW integration • But what do we do now? • Use existing tools? (EDK, Synplicity, BlueSpec…) • How do we make sure it’s compatible with RDL?
Problem Statement • RDLC3 is a tool in development • We need a way to make progress when RDLC is missing features • Solution: provide a clear, simple coding specification for hand-coded modules (RDF) • Ensure that these are compatible with RDL
Terminology – Inside the Riddle • RDF: RAMP Design Framework • RDL: RAMP Design Language • RDLC: RAMP Design Language Compiler • [Diagram labels: RDLC, RDL, RDF]
RDF Model (Detailed) • See Greg’s Breakout Presentation for more details
FSL Unit/Channel • [Diagram: FSL Output and FSL Input, each with an FSL Control block, joined by an FSL Link] • Unit-side signals: __WRITE, __READ, __VALID, __READY, __STALL • Link-side signals, write side: Data, Control, Write, Full • Link-side signals, read side: Data, Control, Read, Exists • Implementation varies based on Latency, Bitwidth, FIFO Depth
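The Write/Full and Read/Exists pairs on the link can be illustrated with a small behavioral model. This is only a sketch in Python, not the RDF implementation: the class name `FslChannel`, the `depth` parameter, and the return conventions are assumptions made for illustration, and latency and bitwidth are not modeled.

```python
from collections import deque


class FslChannel:
    """Behavioral model of a FIFO link with FSL-style Full/Exists flags.

    Hypothetical sketch: names and conventions are illustrative only.
    """

    def __init__(self, depth=16):
        self.depth = depth      # FIFO depth (assumed parameter)
        self.fifo = deque()

    @property
    def full(self):
        # Flag seen by the writer: no room for another word.
        return len(self.fifo) >= self.depth

    @property
    def exists(self):
        # Flag seen by the reader: at least one word is waiting.
        return len(self.fifo) > 0

    def write(self, data, control=False):
        """Write side: returns False (writer must stall) when full."""
        if self.full:
            return False
        self.fifo.append((data, control))
        return True

    def read(self):
        """Read side: returns (data, control), or None when empty."""
        if not self.exists:
            return None
        return self.fifo.popleft()


# A depth-2 link accepts two words, then reports back-pressure.
link = FslChannel(depth=2)
accepted = [link.write(word) for word in (10, 20, 30)]
```

A unit wrapped this way would stall whenever `write` returns False or `read` returns None, which is roughly the role the __STALL signal plays in the diagram.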
RAMP Purple • Andrew Putnam, University of Washington • RAMP Retreat, January 17, 2008
Simple, Correct >> Fast, Buggy • Eric Chung (CMU) • Jan 08 RAMP Retreat
Basic Idea • Use the RAMP library of parts when you can • Use MicroBlaze to prototype parts you don’t have • Refine the MicroBlaze to HDL • Add your HDL to the RAMP library
[Figure sequence: an array of processors (Proc); the same array with data caches (D$) and instruction caches (I$) added among the processors; the same array with L3 blocks added.]
RAMP Purple Compute Node • Reuse commercial IP if it can be clock gated • Can’t have internal clock management • Can’t have negative-edge sensitivity • ~8-16 BUFGMUXs / FPGA, which limits cores / chip • [Diagram labels: Cycle Control, FSL Input Control, Processor (MicroBlaze), Clk, FSL Output Control]
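The effect of clock gating on an unmodified core can be sketched as a cycle-counting model. The slide does not state the exact gating condition, so the rule used here (enable the clock only when input data exists and the output is not full) is an assumption, and `run_gated` is a hypothetical name.

```python
def run_gated(host_cycles, input_exists, output_full):
    """Count target cycles seen by a clock-gated core.

    input_exists[i] and output_full[i] are the link flags sampled on
    host cycle i. Assumed rule: the core's clock is enabled only when
    it can neither starve on input nor overflow its output.
    """
    target_cycles = 0
    for i in range(host_cycles):
        clock_enable = input_exists[i] and not output_full[i]
        if clock_enable:
            target_cycles += 1  # gated Clk ticks; core state advances
        # Otherwise Clk is held and the core sees no time pass.
    return target_cycles


# Five host cycles; the core is starved on cycle 1 and blocked on cycle 3.
exists = [True, False, True, True, True]
full = [False, False, False, True, False]
ticks = run_gated(5, exists, full)
```

Because the core only observes the gated clock, IP with its own clock management or negative-edge logic would break this model, which matches the restrictions listed on the slide.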
RAMP Purple Cache • MicroBlaze emulation of generic unit • Monitor special memory locations for Start, Done • Sure it’s slow • 250 MHz on Virtex-5 • [Diagram labels: Cycle Control (Start, Done), FSL Input Control, Processor (MicroBlaze), Clk, FSL Output Control, DRAM Port (non-RDF)]
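One way to picture the Start/Done handshake is a polling loop over flagged memory locations. This is a sketch under stated assumptions: the addresses, the dictionary standing in for memory, the function `emulated_unit_step`, and the choice that software consumes Start and raises Done are all hypothetical, since the slide does not specify them.

```python
START_ADDR = 0x00   # hypothetical "Start" location
DONE_ADDR = 0x04    # hypothetical "Done" location
ARG_ADDR = 0x08     # hypothetical input operand location
RESULT_ADDR = 0x0C  # hypothetical result location


def emulated_unit_step(mem, behavior):
    """One polling pass of a software-emulated unit.

    Returns True if a Start request was found and serviced.
    """
    if mem.get(START_ADDR, 0) != 1:
        return False                    # nothing to do yet
    mem[START_ADDR] = 0                 # consume the request
    mem[RESULT_ADDR] = behavior(mem[ARG_ADDR])
    mem[DONE_ADDR] = 1                  # signal completion
    return True


mem = {ARG_ADDR: 21}
idle = emulated_unit_step(mem, lambda x: 2 * x)      # Start not set yet
mem[START_ADDR] = 1
serviced = emulated_unit_step(mem, lambda x: 2 * x)  # request serviced
```

Swapping the `behavior` function changes what unit is emulated without touching the handshake, which is the appeal of prototyping in software before refining to HDL.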
FSL Unit/Channel • [Diagram: FSL Output and FSL Input, each with an FSL Control block, joined by an FSL Link] • Unit-side signals: __WRITE, __READ, __VALID, __READY, __STALL • Link-side signals, write side: Data, Control, Write, Full • Link-side signals, read side: Data, Control, Read, Exists • Implementation varies based on Latency, Bitwidth, FIFO Depth
Software • Control Core -- Linux • Handles syscalls • Dispatches jobs to compute cores • Could be arranged hierarchically • Compute Cores -- Xilinx MicroKernel • XMK provides libc, pthreads, scheduling, semaphores • Cores wait on task queue in main memory • Floating point unit is optional, can be shared
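The dispatch pattern described here (a control core feeding compute cores that wait on a shared task queue) can be sketched with host threads. This is only an analogy: Python threads and `queue.Queue` stand in for the cores and the queue in main memory, and the names `control_core` and `compute_core` are hypothetical.

```python
import queue
import threading


def compute_core(tasks, results):
    """Stand-in for a compute core: block on the shared queue, run jobs."""
    while True:
        job = tasks.get()
        if job is None:         # sentinel: no more work for this core
            break
        func, arg = job
        results.put(func(arg))


def control_core(jobs, num_cores=4):
    """Stand-in for the control core: dispatch jobs, collect results."""
    tasks, results = queue.Queue(), queue.Queue()
    cores = [threading.Thread(target=compute_core, args=(tasks, results))
             for _ in range(num_cores)]
    for core in cores:
        core.start()
    for job in jobs:
        tasks.put(job)
    for _ in cores:
        tasks.put(None)         # one sentinel per core
    for core in cores:
        core.join()
    # Completion order is nondeterministic, so sort before returning.
    return sorted(results.get() for _ in range(len(jobs)))


squares = control_core([(lambda x: x * x, n) for n in range(5)])
```

A hierarchical arrangement would let some workers act as control cores for their own queues, but that is not modeled here.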
RAMP Purple WaveScalar • [Figure sequence: two groups of four Processors around a Coherent Data Cache and an Instruction Cache; the Processors relabeled as PEs; the single Instruction Cache replaced by one I$ per PE; the data cache relabeled WaveOrdered Coherent Data Cache; a larger array of PEs and I$s around the WaveOrdered Coherent Data Cache; the same array titled WaveScalar Cluster.]
Destiny: You were meant for me. Perhaps as a punishment.