This talk covers soft connections for flexible control of the communication hierarchy in FPGA hardware-software integration: the challenges, the solutions, and case studies, including logical versus physical topologies.
On Controllers, Soft Connections, and Logical Topologies
Michael Pellauer (MIT CSAIL)
Angshuman Parashar, Michael Adler, Joel Emer (Intel VSSAD)
The Setup
• (For both our HAsim simulator and the talk)
• Virtex5 110t on HiTechGlobal PCIe accelerator
  • Future: FSB-based accelerators. Larrabee?
• Use HAsim’s Remote-Request-Response (RRR)
  • Protocol of communication between SW/HW
  • Allows calls from one to the other
[Figure: Host Processor and FPGA connected over PCIe; example calls: run program, emulate instr, translate address, dump stats]
The Problem of the Day
• Just because you can talk doesn’t mean you have anything interesting to say!
• We must control higher-level interactions between software and hardware
• Example: “Dump Stats” command
  • Transmit requests intra-FPGA, aggregate responses
• Future: think about multiple-FPGA setup
[Figure: “dump stats” path on the FPGA: PCIe Interface, RRR, Controller, then modules such as Cache, Branch Pred, …]
The HAsim Controller
• Software sees it as…
[Figure: Host Software reaches the Controller through RRR: run, pause, …; setParam; dump stats]
• Hardware sees it as…
[Figure: Controller services: run, pause, …; setParam; dump stats; enable events; debug; assertion fail]
• Different modules access different services
• Which modules use which service is very fluid
Problem: HDLs’ Inflexible Interfaces
• Branch Predictor has a bug
• Want to send some debug info to the Controller
• Fundamental Problem: HDLs allow communication only up and down the instantiation hierarchy
• Verilog OOMRs (out-of-module references) are not an acceptable solution
• Gets worse if we have alternative modules
[Figure: HW module instantiation hierarchy: Simulator, PCIe, RRR, Controller, Core, Front End, Fetch, Branch Pred]
Our Solution: Soft Connections
• Goal: “soften” rigid communication hierarchy
• Users separately instantiate named endpoints
• Can read and write as if they were half of a guarded FIFO (FI and FO)
• Instantiator’s interface does not change
• Bluespec standard ModuleCollect library
[Figure: mkSend “fet2dec” (send()) and mkRecv “fet2dec” (recv()); the connection between them is added during the Bluespec static elaboration compiler phase]
Review: Static Elaboration Phase
[Figure: Software toolflow: source, Compile, .exe, run w/ params, giving run1, run2, run3, …. Hardware toolflow: source, Elaborate w/ params, giving design1, design2, design3, then run w/ params, giving run1.1, run2.1, run3.1, …]
• Inline function calls and datatypes as combinational logic
• Instantiate modules with specific parameters
• Resolve polymorphism/overloading
Elaboration-Time Algorithm

    let (sends, recvs) = getCollection()      // Get from ModuleCollect
    foreach s in sends do
        let rs = matchByName(s.name, recvs)
        if rs == {} then
            if not s.optional then
                error("Unmatched Send: " + s.name)
            // an unmatched optional send is simply left unconnected
        else if rs == {r} then
            connect(s, r)                     // instantiate buffering
        else
            error("Multiple Receives connected to: " + s.name)
        recvs = recvs – rs                    // remove matched recvs
    foreach r in recvs do
        error("Unmatched Receive: " + r.name)

Open Question: Can we do this in SystemVerilog as well?
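The matching pass above can be mimicked in ordinary software. The following is a minimal Python sketch, not the HAsim implementation: the `Endpoint` record, its `module` field, and the function name `match_connections` are hypothetical, and "connecting" is reduced to returning the matched pairs instead of instantiating buffering.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str               # connection name, e.g. "fet2dec"
    module: str             # hypothetical: which module instantiated it
    optional: bool = False  # only meaningful for sends


def match_connections(sends, recvs):
    """Pair each send with the single recv that shares its name.

    Returns a list of (send, recv) pairs. Raises ValueError for an
    unmatched non-optional send, a send with several recvs, or a recv
    that no send claimed.
    """
    recvs_by_name = defaultdict(list)
    for r in recvs:
        recvs_by_name[r.name].append(r)

    pairs = []
    for s in sends:
        rs = recvs_by_name.pop(s.name, [])   # remove matched recvs
        if not rs:
            if not s.optional:
                raise ValueError(f"Unmatched Send: {s.name}")
        elif len(rs) == 1:
            pairs.append((s, rs[0]))         # hardware would add buffering here
        else:
            raise ValueError(f"Multiple Receives connected to: {s.name}")

    for name in recvs_by_name:               # leftovers were never claimed
        raise ValueError(f"Unmatched Receive: {name}")
    return pairs
```

Because matching is purely by name, the modules that instantiate the two endpoints never appear in each other's interfaces, which is the point of the technique.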
“Multicast” Connections
• A one-to-many Send (broadcast)
• A many-to-one Recv (listener)
• (now multiple recvs are no longer an error)
[Figure: several standard send modules (mkSend “debug_out”, send()) feed one mkListener “debug_out” (listen()), which receives ID + data; one mkBcast “start_prog” (broadcast()) feeds several standard receive modules (mkRecv “start_prog”, recv())]
Building 2-Way Communication
• More complex abstractions from primitives
• Client/Server
• “Multicast” Client/Server
[Figure: mkClient “mem_load” (makeReq(), getResp()) paired with mkServer “mem_load” (getReq(), makeResp()), built from a pair of normal send and recv. Multicast variants: one client with broadcastReq() and getResp() talks to several standard Server modules on “stats_count”, with responses carrying ID + data; several standard Client modules on “mem_load” talk to one server with getReq() and makeResp(), with requests carrying ID + data]
Controller Services: Revisited
• Which should get which type of soft connection?
• Commands/Params:
  • Receive from software, send to many modules
  • One-to-Many Broadcast
  • Can make a nice abstraction for local commands, params
• Events/Stats:
  • Receive from software, send to many modules, aggregate responses
  • Many-to-one Client
• Assertions/Debug:
  • Receive from many modules, send to software
  • Many-to-one Receive
Case Study: span
• span(c) = number of instantiation boundaries crossed between sender and receiver
• Roughly, the pain of changing a communication path
• In HAsim, 118 of 217 connections (about 54%) are to/from the Controller
• We start to worry about the massive fan-in
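One plausible way to compute span(c) is sketched below in Python. It assumes each endpoint is identified by the path of module instances that contain it, from the top of the design down, and counts the boundaries crossed going up to the lowest common ancestor and back down. The function name and the example hierarchy are hypothetical; the slides do not give the exact counting rule.

```python
def span(sender_path, receiver_path):
    """Count instantiation boundaries between two endpoints.

    Each path lists the enclosing module instances from the top of the
    design down to the module that instantiates the endpoint.
    """
    common = 0
    for a, b in zip(sender_path, receiver_path):
        if a != b:
            break
        common += 1
    # Boundaries crossed upward from the sender, then downward to the receiver.
    return (len(sender_path) - common) + (len(receiver_path) - common)


# Hypothetical hierarchy: a debug connection from the branch predictor
# to the controller.
bpred = ("Simulator", "Core", "FrontEnd", "Fetch", "BranchPred")
controller = ("Simulator", "Controller")
debug_span = span(bpred, controller)
```

Under this reading, a connection between two endpoints in the same module has span 0, and every extra level of hierarchy between the endpoints adds one interface that would otherwise need editing.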
Logical Topology vs Physical Topology
• We described the “logical” communication topology
• Could be implemented with different physical topology
• Could use Rings/Trees/Grids to offset massive fan-in
• Implemented: Rings and Trees
• So far no improvement over physical point-to-point
• Connection interface does not change!
[Figure: six stations carry a “foo” send to a “foo” recv. Station routing tables are made at elaboration: the sending station has an address for “foo” (#5), the receiving station has to know #5 means “foo”, and a station in between doesn’t have #5]
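The station idea can be illustrated with a small Python model of a unidirectional ring. This is a sketch under assumptions, not the implemented design: the table layout, the address assignment (sorted names), and the function names `build_routing` and `ring_deliver` are all hypothetical.

```python
def build_routing(stations):
    """Build routing tables once, as elaboration would.

    stations: list of dicts in ring order, each with a "recvs" list of
    connection names received at that station.
    Returns (addr_of, local): addr_of maps a name to a numeric address,
    local[i] maps the addresses terminating at station i back to names.
    """
    names = sorted({n for st in stations for n in st["recvs"]})
    addr_of = {name: i for i, name in enumerate(names)}
    local = [{addr_of[n]: n for n in st["recvs"]} for st in stations]
    return addr_of, local


def ring_deliver(addr_of, local, src, name):
    """Forward a message around the ring until a station owns its address.

    Returns (destination_station, hops). Raises KeyError if no station
    receives `name`.
    """
    addr = addr_of[name]          # the sender only needs the address
    pos, hops = src, 0
    while addr not in local[pos]:  # this station doesn't have the address
        pos = (pos + 1) % len(local)
        hops += 1
    return pos, hops


stations = [{"recvs": []}, {"recvs": ["bar"]}, {"recvs": []}, {"recvs": ["foo"]}]
addr_of, local = build_routing(stations)
```

The sender and receiver still name the connection “foo”; only the tables know that the name became a number, which is why the connection interface can stay unchanged.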
Take Aways
• FPGA-as-accelerator model is rapidly maturing
  • The FPGA-as-raw-fabric model is not ideal
• Something like HAsim’s Controller helps
  • Coordinates interaction between FPGA/SW
• Need different hardware-design techniques for FPGA accelerators
  • More flexibility needed: reconfigurations common
• Soft Connections bring flexibility to interfaces
  • Make it easier to have a fluid set of modules which interact with the controller
• Logical topology != Physical topology
  • Designer needs help with both
Thank You! pellauer@csail.mit.edu
The Controller’s Services
• Commands:
  • Receive “start” or “pause” from software
  • Controller distributes to all interested hardware modules
• Params:
  • Receive dynamic command line values
  • Controller distributes to interested hardware modules
• Events:
  • Software can enable, disable
  • Controller aggregates, sends to software
• Stats:
  • Software requests dump periodically
  • Controller passes on request, aggregates responses
• Assertions:
  • Controller passes failures on to software
• Debug:
  • Controller passes info on to software
Making “Gateware” more like Software
• Ultimately we want many distributed “services” throughout the FPGA talking to software
• They communicate at different rates
• It makes sense for the variable/rare services to share the same interconnect on the FPGA
• Flexibility of communication == Easier development
• Today: Development plan and issues
Review: Soft Connections
• Point-to-Point
[Figure: mkSend “fet2dec” (send()) connected to mkRecv “fet2dec” (recv())]
• Client/Server
[Figure: mkClient “funcp_fet” (makeReq(), getResp()) connected to mkServer “funcp_fet” (getReq(), makeResp())]
• “Smart” Synthesis Boundaries
[Figure: modules A and B; inside mkB a send “fet2dec” reaches the boundary through one of the outg ports, using try_xfer() and xfer_ack()]
    addDanglingSend(mkB.outg[3], “fet2dec”, “Inst”);
    Compiler Log: “Dangling Send fet2dec [3] {Inst}”
Proposed Primitive: One-To-Many
• A “Broadcast” Send
• One rule per receiver i (the slide shows i = 0, 1, 2, 3):
    rule when (r[i] == 0):
        try_xfer(q.first())
        if (ack) r[i] <= 1
• One rule to finish the broadcast:
    rule when (all r == 1):
        all r <= 0
        q.deq()
• All rules and registers inserted during static elaboration (don’t know how many receivers during instantiation)
• Tougher alternative: many FIFOs
[Figure: mkBcast “start_prog” (broadcast()) feeding four standard receive modules, each mkRecv “start_prog” (recv())]
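The per-receiver acknowledge bits can be modeled in software to check the bookkeeping. The Python sketch below is a cycle-level approximation under assumptions: the class name `Broadcast`, the `step()` method, and the use of callables returning an ack are hypothetical stand-ins for the hardware rules and for try_xfer().

```python
from collections import deque


class Broadcast:
    """One queue, one 'delivered' bit per receiver, as in the rules above."""

    def __init__(self, receivers):
        # Each receiver is a callable try_xfer(value) -> bool (the ack).
        self.receivers = receivers
        self.q = deque()
        self.r = [False] * len(receivers)

    def send(self, value):
        self.q.append(value)

    def step(self):
        """Model one cycle: either finish the head item or offer it again."""
        if not self.q:
            return
        if all(self.r):
            # Every receiver has taken the head item: clear bits, dequeue.
            self.r = [False] * len(self.r)
            self.q.popleft()
            return
        head = self.q[0]
        for i, try_xfer in enumerate(self.receivers):
            # Only offer the item to receivers that have not acked it yet.
            if not self.r[i] and try_xfer(head):
                self.r[i] = True
```

A slow receiver delays the dequeue but never causes a fast receiver to see the same item twice, because its bit stays set until the whole broadcast completes.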
Proposed Primitive: Many-to-One
• A “listener” receive
• One rule per sender i (the slide shows i = 0, 1, 2, 3):
    rule when (qi.notEmpty):
        try_xfer(qi.first(), i)
        if (ack) qi.deq()
• All rules inserted during static elaboration (don’t know IDs during instantiation)
• Is a fairness guarantee needed?
[Figure: four standard send modules, each mkSend “debug_out” (send()), feeding one mkListener “debug_out” (listen()), which receives ID + data]
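A software model of the listener makes the ID tagging concrete. The Python sketch below is hypothetical throughout (class and method names included), and it picks round-robin arbitration as one possible answer to the fairness question; the slides leave that question open.

```python
from collections import deque


class Listener:
    """Many senders, one receiver; each message is delivered with its sender ID."""

    def __init__(self, n_senders):
        self.queues = [deque() for _ in range(n_senders)]
        self.next_id = 0  # assumption: round-robin pointer for fairness

    def send(self, sender_id, data):
        self.queues[sender_id].append(data)

    def listen(self):
        """Return (sender_id, data) from the next non-empty queue, or None."""
        n = len(self.queues)
        for k in range(n):
            i = (self.next_id + k) % n
            if self.queues[i]:
                # Start the next search just after the sender served now.
                self.next_id = (i + 1) % n
                return i, self.queues[i].popleft()
        return None
```

Without the rotating pointer, a sender with a low ID that always has data queued could starve the others, which is the situation the fairness question is asking about.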
Proposed Primitive: Hub Servers
• Hub Server, Distributed Clients
• 1 Many-to-One Connection
• Reverse is many One-to-One connections
• Remove the ID and send it to the appropriate destination
[Figure: standard Client modules, each mkClient “mem_load” (makeReq(), getResp()), send ID + data to one mkHubServer “mem_load” (getReq(), makeResp())]
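The request and response paths of a hub server can be sketched as follows. This Python model is an illustration under assumptions: the class `HubServer`, its methods, and the `handler` callback are hypothetical, and the memory contents in the usage lines are made up.

```python
from collections import deque


class HubServer:
    """Requests arrive on one many-to-one queue tagged with a client ID;
    each response goes back on that client's own one-to-one queue."""

    def __init__(self, n_clients, handler):
        self.requests = deque()                       # (client_id, request)
        self.responses = [deque() for _ in range(n_clients)]
        self.handler = handler                        # computes a response

    def make_req(self, client_id, request):
        self.requests.append((client_id, request))

    def step(self):
        """Serve one request: strip the ID and route the response by it."""
        if self.requests:
            client_id, request = self.requests.popleft()
            self.responses[client_id].append(self.handler(request))

    def get_resp(self, client_id):
        q = self.responses[client_id]
        return q.popleft() if q else None


# Hypothetical usage: a "mem_load" server shared by two clients.
memory = {0x10: 7, 0x20: 9}
hub = HubServer(2, lambda addr: memory[addr])
hub.make_req(1, 0x20)
hub.make_req(0, 0x10)
hub.step()
hub.step()
```

The clients never see the ID: it exists only between the many-to-one request path and the response routing.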
Proposed Primitive: Hub Client
• Hub Client, Distributed Servers
• 1 One-to-Many Connection
• 1 Many-to-One Connection
• Ability to send to individuals as well?
[Figure: one mkHubClient “stats_count” (broadcastReq(), getResp()) talks to several standard Server modules, each mkServer “stats_count” (getReq(), makeResp()); responses carry ID + data]
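A hub client combines the two multicast primitives: a broadcast request out and an ID-tagged collection of responses back. The Python sketch below uses a stats dump as the example; `StatServer`, `hub_client_dump`, and the `"dump"` request string are hypothetical names chosen for illustration.

```python
class StatServer:
    """Stand-in for a hardware module that holds one statistics counter."""

    def __init__(self, count):
        self.count = count

    def serve(self, request):
        if request == "dump":
            return self.count
        raise ValueError(f"unknown request: {request}")


def hub_client_dump(servers):
    """Broadcast a dump request and gather (server_id, count) responses.

    Returns the tagged responses plus their total, the kind of
    aggregation a controller might do before replying to software.
    """
    responses = [(i, s.serve("dump")) for i, s in enumerate(servers)]
    total = sum(count for _, count in responses)
    return responses, total


responses, total = hub_client_dump([StatServer(3), StatServer(0), StatServer(5)])
```

Keeping the server ID with each response lets the aggregator report per-module values as well as the total.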