
HAsim On-Chip Network Model Configuration


Presentation Transcript


  1. HAsim On-Chip Network Model Configuration • Michael Adler

  2. The Front End, Multiplexed • Legend: Ready to simulate? [diagram: multiplexed front-end pipeline shared by CPU 1 and CPU 2 — IMEM/FET stages with Line Pred, Branch Pred, ITLB, I$, PC Resolve, and Inst Q; redirect, training, and fault paths arrive from the Back End]

  3. On-Chip Networks in a Time-Multiplexed World

  4. Problem: On-Chip Network [diagram: CPU 0–2 tiles, each with L1/L2 $ and a router (r), plus a Memory Control block; msg and credit wires run to/from every router] • Problem: routing wires to/from each router • Similar to the “global controller” scheme • Also, utilization is low

  5. Multiplexing On-Chip Network Routers • Simulate the network without a network: a single multiplexed Router 0..3 stands in for Routers 0–3, with reorder stages applying σ(x) = (x + 1) mod 4, σ(x) = (x + 2) mod 4, and σ(x) = (x + 3) mod 4 between passes [diagram: rotation schedules over node IDs 0–3]

  6. On-Chip Network Model Multiplexed Topology

  7. HAsim’s Network Model is Abstract
  • In a software model the target network can be built at run time
  • Dynamism is expensive in FPGAs and recompilation is slow
  • Solution: constrained dynamism
  • Fixed parameters: max nodes, max edges per node, max VCs
  • Dynamic:
    • Number of active contexts (nodes)
    • Endpoints of each edge (indirection table)
    • Routing table
    • Address mapping of distributed LLC

  8. Topology Manager
  • Software – runs once at startup, so no need to optimize
  • HASIM_CHIP_TOPOLOGY_CLASS:
    • Manages streaming of parameters to the FPGA
    • Iterates over all software topology mapping classes until convergence
  • Namespace defined by dictionaries
    • .dic files are preprocessed by LEAP tools
    • Hierarchy of enumerated types

  9. How do I…
  • Map address ranges to LLC segments?
  • Map target cores to nodes?
  • Pick a number of memory controllers and map them to nodes?
  • Define a target machine network topology?
  • Manage interleaving for multiplexing the network and cores?

  10. Map Address Ranges to LLC Segments (SW)
  • Build a table of n_llc_map_entries, where each entry is an index to a portion of the distributed LLC.
  • icn-mesh.cpp:

    for (int addr_idx = 0; addr_idx < n_llc_map_entries; addr_idx++)
    {
        bool is_last = (addr_idx + 1 == n_llc_map_entries);
        topology->SendParam(TOPOLOGY_NET_LLC_ADDR_MAP,
                            &cores_net_pos[addr_idx % num_cores],
                            sizeof(TOPOLOGY_VALUE),
                            is_last);
    }

  11. Map Address Ranges to LLC Segments (FPGA)
  Consume the table that was streamed in from SW
  • last-level-cache-no-coherence.bsv:

    // Define a node that will stream in the topology. This builds a node
    // on a ring. The node looks for messages tagged TOPOLOGY_NET_MEM_CTRL_MAP
    // and emits associated payloads.
    let ctrlAddrMapInit <- mkTopologyParamStream(`TOPOLOGY_NET_MEM_CTRL_MAP);

    // Allocate a local memory and initialize it with the streamed-in entries.
    LUTRAM#(Bit#(TLog#(TMul#(8, MAX_NUM_MEM_CTRLS))),
            STATION_ID) memCtrlDstForAddr <- mkLUTRAMWithGet(ctrlAddrMapInit);

    // Map an address to a node ID using the table
    function STATION_ID getMemCtrlDstForAddr(LINE_ADDRESS addr);
        // Use the low bits of the address as the index (resize does this).
        return memCtrlDstForAddr.sub(resize(addr));
    endfunction

  12. Map Address Ranges to LLC Segments (LLC Hub)
  • rule . . .

    // Incoming request from core
    if (m_reqFromCore matches tagged Valid .req) begin
        // Which instance of the distributed cache is responsible?
        let dst = getLLCDstForAddr(req.physicalAddress);
        if (dst == local_station_id) begin
            // Local cache handles the address.
            if (can_enq_reqToLocalLLC &&& ! isValid(m_new_reqToLocalLLC)) begin
                // Port to LLC is available. Send the local request.
                did_deq_reqFromCore = True;
                m_new_reqToLocalLLC = tagged Valid LLC_MEMORY_REQ { src: tagged Invalid,
                                                                    mreq: req };
                debugLog.record(cpu_iid, $format("1: Core REQ to local LLC, ") + fshow(req));
            end
        end
        else if (can_enq_reqToRemoteLLC && ! isValid(m_new_reqToRemoteLLC)) begin
            // Remote cache instance handles the address and the OCN request
            // port is available.
            //
            // These requests share the OCN request port since only one type of
            // request goes to a given remote station. Memory stations get memory
            // requests above. LLC stations get core requests here.
            did_deq_reqFromCore = True;
            m_new_reqToRemoteLLC = tagged Valid tuple2(dst, req);
            debugLog.record(cpu_iid, $format("1: Core REQ to LLC %0d, ", dst) + fshow(req));
        end
    end
    . . .
  endrule

  13. Map Cores and Memory Controllers to Nodes
  • All computed (currently) in icn-mesh.cpp
  • Given number of target cores and number of memory controllers:
    • Builds a rectangle of cores as close to square as possible
    • Adds a row of memory controllers at the top and bottom
  • Topology streamed to FPGA using same mechanism as address mapping
  • E.g., 15 cores and 3 memory controllers:
    [diagram: near-square grid of cores (C) with memory-controller rows (M) above and below; unused node positions marked x]

  14. Network Topology: Map Cores/Memory Controllers to Nodes
  • Multiplexed order of nodes is the same as order of cores
    • No permutations required for local port
  • Nodes are connected to:
    • Core
    • Memory controller
    • Nothing
  • The node doesn’t care what is connected!
    • Hide indirection in ports

  15. Network Topology: Map Cores/Memory Controllers to Nodes
  • In icn-mesh.bsv:

    //
    // Local ports are a dynamic combination of CPUs, memory controllers, and
    // NULL connections.
    //
    // localPortMap indicates, for each multiplexed port instance ID, the type
    // of local port attached (CPU, memory controller, NULL).
    //
    let localPortInit <- mkTopologyParamStream(`TOPOLOGY_NET_LOCAL_PORT_TYPE_MAP);
    LUTRAM#(Bit#(TLog#(TAdd#(TAdd#(MAX_NUM_CPUS, 1), NUM_STATIONS))),
            Bit#(2)) localPortMap <- mkLUTRAMWithGet(localPortInit);

    PORT_SEND_MULTIPLEXED#(MAX_NUM_CPUS, OCN_MSG) enqToCores <-
        mkPortSend_Multiplexed("Core_OCN_Connection_InQ_enq");
    PORT_SEND_MULTIPLEXED#(MAX_NUM_MEM_CTRLS, OCN_MSG) enqToMemCtrl <-
        mkPortSend_Multiplexed("ocn_to_memctrl_enq");
    PORT_SEND_MULTIPLEXED#(NUM_STATIONS, OCN_MSG) enqToNull <-
        mkPortSend_Multiplexed_NULL();

    let enqToLocal <- mkPortSend_Multiplexed_Split3(enqToCores, enqToMemCtrl, enqToNull,
                                                    localPortMap);

  16. Network Topology: Defining Inter-Node Edges
  • Each network node: [diagram: a node with North, East, South, and West edge ports plus a Local port]

  17. Network Multiplexing
  • Logically, there are n nodes in the network.
  • Each has a local port connected either to a core, to memory, or to nothing.
  • Network connection mapping and routing will determine the topology.
  • Topology manager defines the routing table.
  • Note: Dateline not yet implemented

  18. Network Topology and Routing • Torus:

  19. Network Topology and Routing • Mesh (connections identical, routing table ignores some edges):

  20. Network Topology and Routing • Bi-directional ring:

  21. Network Topology and Routing • Uni-directional ring:

  22. Final Problem: Multiplexing On-Chip Network Routers [diagram: as on slide 5 — a single multiplexed Router 0..3 stands in for Routers 0–3, with reorder stages applying σ(x) = (x + 1) mod 4, σ(x) = (x + 2) mod 4, and σ(x) = (x + 3) mod 4]

  23. Network Topology: Communication Across Multiplexed Nodes
  • Each node talks to a different multiplexed node instance
  • Naïve port binding would have each node talk only to itself
  • A-Ports are already buffered
  • Bury transformation in A-Ports
  • Retain simple read next / write next port semantics within models

  24. Network Topology: Communication Across Multiplexed Nodes
  • icn-mesh.bsv:

    // Initialization from topology manager
    ReadOnly#(STATION_IID) meshWidth <- mkTopologyParamReg(`TOPOLOGY_NET_MESH_WIDTH);
    ReadOnly#(STATION_IID) meshHeight <- mkTopologyParamReg(`TOPOLOGY_NET_MESH_HEIGHT);

    // Outbound and inbound ports are loopbacks to the same multiplexed module.
    // Ports connect to logically different nodes but physically to the same
    // simulator object.
    Vector#(NUM_PORTS, PORT_SEND_MULTIPLEXED#(NUM_STATIONS, MESH_MSG)) enqTo = newVector();
    Vector#(NUM_PORTS, PORT_RECV_MULTIPLEXED#(NUM_STATIONS, MESH_MSG)) enqFrom = newVector();

    // Outbound port is a normal A-Port. It has no buffering.
    enqTo[portEast] <- mkPortSend_Multiplexed("mesh_interconnect_enq_E");

    // Inbound port provides buffering for multiplexing. Instead of forwarding
    // messages FIFO it must transform the messages so they cross to the correct
    // multiplexed instance when instances (nodes) are traversed sequentially.
    enqFrom[portWest] <- mkPortRecv_Multiplexed_ReorderLastToFirstEveryN("mesh_interconnect_enq_E", 1,
                                                                         meshWidth, meshHeight);
    . . .
    enqFrom[portEast] <- mkPortRecv_Multiplexed_ReorderFirstToLastEveryN("mesh_interconnect_enq_W", 1,
                                                                         meshWidth, meshHeight);
    enqFrom[portSouth] <- mkPortRecv_Multiplexed_ReorderFirstNToLastN("mesh_interconnect_enq_N", 1,
                                                                      meshWidth);
    enqFrom[portNorth] <- mkPortRecv_Multiplexed_ReorderLastNToFirstN("mesh_interconnect_enq_S", 1,
                                                                      meshWidth);
