This paper presents a novel approach for packet buffering in routers, emphasizing deterministic packet departures to optimize performance. By utilizing interleaved DRAM banks and addressing bank conflicts, we propose a high-speed buffer architecture to manage incoming packets at 40Gbps with minimal latency. Our model effectively calculates and schedules packet access based on known departure times, allowing for efficient memory operations while leveraging small SRAM bypass buffers. This method enhances the throughput of router line cards while addressing challenges associated with traditional DRAM accesses.
Fast Buffer Memory with Deterministic Packet Departures Mayank Kabra, Siddhartha Saha, Bill Lin University of California, San Diego
Packet Buffer in Router Linecards
• At 40 Gb/s, a 40-byte packet occupies 8 ns, so an incoming linecard has 8 ns to read and write a packet.
• Routers need to store packets to deal with congestion.
• Bandwidth × RTT = 40 Gb/s × 250 ms = 10 Gb of buffer.
• This is too big to store in SRAM, so DRAM is needed.
• Problem: DRAM access time is ~40 ns, so there is roughly a 10x speed difference.
[Figure: router core (scheduler and packet buffers) with In and Out linecards]
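The figures on this slide can be checked with a few lines of integer arithmetic. This is a minimal sketch; the constant and variable names are illustrative and do not come from the slides.

```python
# Back-of-the-envelope check of the linecard numbers (names are illustrative).
LINE_RATE_BPS = 40_000_000_000   # 40 Gb/s line rate
PACKET_BYTES = 40                # fixed packet size
RTT_MS = 250                     # round-trip time in milliseconds

packet_bits = PACKET_BYTES * 8
# Time one packet occupies on the wire, in nanoseconds.
slot_ns = packet_bits * 1_000_000_000 // LINE_RATE_BPS
# Bandwidth x RTT buffer size, in bits.
buffer_bits = LINE_RATE_BPS * RTT_MS // 1000
# The same buffer expressed in fixed-size packets.
buffer_packets = buffer_bits // packet_bits

print(slot_ns, buffer_bits, buffer_packets)
```

The packet count is the quantity the later bitmap-size slide calls T.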
Parallel and Interleaved DRAM banks
• Assume the speed difference is 3x.
[Figure: packets (P), SRAM, and DRAM banks]
Problems with Parallelism
• The access pattern can create problems.
• If we try to access 3, 6, 9 and 11 one after another, it is possible to issue interleaved read requests and read those packets out at line speed.
[Figure: packets 1-14 stored across the DRAM banks]
Problems with Parallelism
• But accessing 2 & 3 or 10 & 11 in succession is problematic.
• This is an example of a bank conflict.
[Figure: packets 1-14 stored across the DRAM banks]
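A bank conflict can be detected by tracking when each bank was last accessed. The sketch below assumes one access per time slot and that a bank stays busy for b slots once an access starts; the function name and the bank sequences in the usage lines are hypothetical, since the exact packet-to-bank layout of the figure is not recoverable from the text.

```python
def first_bank_conflict(bank_sequence, b):
    """Return the slot index of the first access that hits a busy bank, or None.

    bank_sequence[i] is the bank accessed in time slot i. A bank is assumed
    to stay busy for b slots after an access to it starts.
    """
    last_access = {}
    for slot, bank in enumerate(bank_sequence):
        if bank in last_access and slot - last_access[bank] < b:
            return slot
        last_access[bank] = slot
    return None


# Hypothetical access patterns with a 3x speed difference (b = 3).
print(first_bank_conflict([0, 1, 2, 0], 3))  # banks rotate, no conflict
print(first_bank_conflict([0, 0, 1], 3))     # same bank twice in a row
```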
Use The Packet Departure Time
• In a wide class of routers (e.g., crossbar routers), packet departures are determined by the scheduler on the fly.
• Packet buffers that cater to these routers exist, but they are complex.
• In other high-performance routers, such as Switch-Memory-Switch and Load-Balanced Routers, the packet departure time can be calculated when the packet is inserted into the buffer.
Solution Idea: Use the known departure times of the packets to schedule them to different DRAM banks such that there are no bank conflicts.
Packet Buffer Abstraction
• Fixed-size packets; time is slotted (example: 40 Gb/s, 40-byte packet => 8 ns).
• The buffer may contain an arbitrarily large number of logical queues, but with deterministic access.
• Single-write, single-read, time-deterministic packet buffer model.
Packet Buffer Architecture
• Interleaved memory architecture with K slower DRAM banks.
• A single memory read or write operation takes b time slots to complete.
• b consecutive time slots form a frame.
• A time slot t belongs to frame ⌈t/b⌉.
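The slot-to-frame mapping can be written with integer arithmetic. This sketch assumes time slots are numbered from 1 and that the bracket on the slide denotes a ceiling, which matches the later examples (slots 1-3 in frame 1, slots 58-60 in frame 20 for b = 3). The function name is illustrative.

```python
def frame_of(t, b):
    """Frame containing time slot t (slots numbered from 1), i.e. ceil(t / b)."""
    return (t + b - 1) // b


for t in (1, 3, 4, 58, 60):
    print(t, frame_of(t, 3))
```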
Packet Buffer Operation
[Figure: arriving packets, aggregate (b packets), DRAM banks 1, 2, ..., K-1, K, de-aggregate, departing packets; SRAM bypass buffer]
Packet Arrival [Frame 1]
• Assume b = 3.
• Packets P1, P2 & P3 arrive in time slots 1, 2 and 3 respectively.
• They are aggregated before being written to the DRAM.
[Figure: packets P1-P3 and DRAM banks 1-5]
Packet Arrival [Frame 2]
• Packets P1, P2 & P3 are written to the DRAM banks (1, 2 & 3) during frame 2.
• New packets P4, P5 & P6 arrive and are stored in the buffer.
[Figure: packets P1-P6 and DRAM banks 1-5]
Packet Departure [Frame 19]
• Packets P58, P59 & P60 are scheduled to depart at time slots 58, 59 and 60 respectively (frame 20).
• They will be read from the DRAM banks one frame slot before their departure frame slot (frame 19).
[Figure: packets P58-P60 and DRAM banks 1-5]
Packet Departure [Frame 20]
• Packets P58, P59 & P60 are read from the buffer and are output from the switch at time slots 58, 59 and 60 respectively.
[Figure: packets P58-P60 and DRAM banks 1-5]
SRAM Bypass Buffer
• The operational model dictates that the minimum round-trip latency to write a packet to one of the DRAM banks and read it back is 4 frames.
• Thus, a packet with a departure time less than 4b-1 time slots away cannot be stored in DRAM.
• A small amount of SRAM (size 4b) is used as a bypass buffer.
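The bypass rule reduces to a single comparison on how far away the departure slot is. The sketch below takes the 4b-1 threshold directly from the slide; the function and parameter names are illustrative.

```python
def needs_bypass(arrival_slot, departure_slot, b):
    """True if the packet departs too soon for a DRAM write/read round trip."""
    return departure_slot - arrival_slot < 4 * b - 1


# With b = 3 the threshold is 11 time slots.
print(needs_bypass(0, 10, 3))  # departs in 10 slots
print(needs_bypass(0, 11, 3))  # departs in 11 slots
```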
Number of DRAM banks
• Arrival Write Conflicts: At any current frame f, there can be at most b packets that will be written to the DRAM banks (including the current packet).
• Hence, for each packet, there is a maximum of b-1 "Arrival Write Conflicts".
Number of DRAM banks
• Arrival Read Conflicts: At any current frame f, there can be at most b packets that will be read from the DRAM banks. Those b banks are busy in the current frame and are unavailable.
• Hence, for each packet, there is a maximum of b "Arrival Read Conflicts".
Number of DRAM banks
• Departure Read Conflicts: Any packet that is written in the current frame f will eventually need to be read in a future frame d for departure. At that future frame d, there are b-1 other departing packets.
• Hence, for each packet, there is a maximum of b-1 "Departure Read Conflicts".
How Many DRAM Banks?
• Total conflicts:
  • Arrival Write: (b-1)
  • Arrival Read: b
  • Departure Read: (b-1)
• Hence, a total of (3b-2) conflicts.
• If the number of banks is more than (3b-2), i.e., at least (3b-1), there will always be a free bank for every packet.
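The counting argument can be written out directly. This is a small sketch with illustrative function names; it adds up the three worst-case conflict counts from the preceding slides and adds one more bank so that a free bank always remains.

```python
def worst_case_conflicts(b):
    """Sum of the three worst-case conflict counts for one arriving packet."""
    arrival_write = b - 1    # other packets written in the same frame
    arrival_read = b         # banks busy with reads in the current frame
    departure_read = b - 1   # other packets read in the same departure frame
    return arrival_write + arrival_read + departure_read


def banks_needed(b):
    """Smallest bank count that always leaves one conflict-free bank."""
    return worst_case_conflicts(b) + 1


print(worst_case_conflicts(3), banks_needed(3))
```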
DRAM Bank Selection
• To find a compatible memory bank, maintain a two-dimensional read-transaction bitmap R.
• Each row corresponds to a frame slot.
• Each column corresponds to a DRAM bank (hence 3b-1 columns).
• R(f, m) denotes whether the m-th DRAM bank holds an already stored packet that must be read at the f-th frame slot.
DRAM Bank Selection
• Write-reservation bitmap W of size (3b-1).
• W(m) denotes that, in the current frame, the m-th memory bank has been assigned an arriving packet.
DRAM Bank Selection
• Approach: a greedy solution that avoids the three types of conflicts.
• To check whether memory bank m is compatible for a packet p arriving in frame f with departure frame d:
  • Check NOT(W(m) | R(f, m) | R(d, m)).
• Instead of checking one memory bank at a time, we can check all of them at once:
  • V = NOT(W | R(f) | R(d)), where R(f) and R(d) are row vectors.
  • From V, get the index of the first compatible memory bank.
• If n is the bank selected for p, then set W(n) = 1 and R(d, n) = 1.
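The greedy selection can be sketched with each bitmap row held in a Python integer, where bit m stands for bank m. This is an illustration under stated assumptions rather than the authors' implementation: the class and method names are hypothetical, the rows of R are indexed circularly by frame number, and W is cleared at each frame boundary.

```python
class BankSelector:
    """Greedy bank selection; bitmaps are Python ints (bit m = bank m)."""

    def __init__(self, b, num_frames):
        self.num_banks = 3 * b - 1
        self.all_banks = (1 << self.num_banks) - 1
        self.num_frames = num_frames   # rows of R, indexed circularly (assumption)
        self.R = [0] * num_frames      # read-transaction bitmap, one int per frame
        self.W = 0                     # write-reservation bitmap for current frame

    def start_frame(self):
        """Clear write reservations at a frame boundary (assumption)."""
        self.W = 0

    def select(self, f, d):
        """Pick a bank for a packet arriving in frame f and departing in frame d."""
        r_f = self.R[f % self.num_frames]
        r_d = self.R[d % self.num_frames]
        v = ~(self.W | r_f | r_d) & self.all_banks   # V = NOT(W | R(f) | R(d))
        if v == 0:
            return None                # no compatible bank
        n = (v & -v).bit_length() - 1  # index of the lowest set bit of V
        self.W |= 1 << n               # W(n) = 1
        self.R[d % self.num_frames] |= 1 << n   # R(d, n) = 1
        return n


sel = BankSelector(b=3, num_frames=64)
print([sel.select(1, 20) for _ in range(3)])
```

The sketch never clears old rows of R; a circular bitmap would have to reset a row once its frame has passed.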
Size of the Bitmap
• The size of the packet buffer is T packets, i.e., T is the farthest departure time slot relative to the current time slot.
• Farthest departure frame: ⌈T/b⌉
• Each row in the bitmap is (3b-1) bits, so the size of the bitmap is ⌈T/b⌉ × (3b-1) bits.
• Assuming an RTT of 250 ms and a line rate of 40 Gb/s, the packet buffer would hold T = 3 × 10^7 packets, which makes the bitmap size close to 11 MB.
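The bitmap size follows from the row count and the row width. In the sketch below the function name is illustrative, and b = 10 is an assumption taken from the roughly 10x speed difference mentioned earlier; the slide itself does not state which b it uses.

```python
def bitmap_bits(T, b):
    """Bits in the read-transaction bitmap: ceil(T / b) rows of (3b - 1) bits."""
    rows = (T + b - 1) // b
    return rows * (3 * b - 1)


T = 30_000_000   # packets, from the slide
b = 10           # assumed, not stated on this slide
bits = bitmap_bits(T, b)
print(bits, bits / 8 / 1_000_000)   # total bits, and size in megabytes
```

Under that assumption the result is just under 11 MB, in line with the figure quoted on the slide.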
Additional Details
• Location of a packet in the DRAM:
  • Once a bank has been selected, we need a way to assign the actual memory location to write and, later, read the packet.
  • Determine the memory location from the departure frame, using circular indexing to map a frame to a packet location in the memory.
• How to reorder/de-aggregate the packets?
  • Store the timestamp in the DRAM with the packet.
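Both details can be illustrated together. The sketch assumes one cell per frame slot in each bank, which is consistent with the bank-selection rule that a bank holds at most one packet per departure frame, and it stores the departure time slot as the timestamp. The class, method, and parameter names are hypothetical.

```python
class DramBank:
    """One DRAM bank with circular indexing by departure frame (a sketch)."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.cells = [None] * num_frames   # one cell per frame slot (assumption)

    def write(self, departure_slot, b, payload):
        """Store the packet, with its timestamp, at the cell for its departure frame."""
        frame = (departure_slot + b - 1) // b          # ceil(departure_slot / b)
        self.cells[frame % self.num_frames] = (departure_slot, payload)

    def read(self, departure_frame):
        """Return and clear the (timestamp, payload) entry for a departure frame."""
        idx = departure_frame % self.num_frames
        entry, self.cells[idx] = self.cells[idx], None
        return entry


def deaggregate(entries):
    """Order the packets read in one frame by their stored timestamps."""
    return [payload for _, payload in sorted(entries, key=lambda e: e[0])]


banks = [DramBank(num_frames=8) for _ in range(3)]
banks[0].write(59, 3, "P59")
banks[1].write(60, 3, "P60")
banks[2].write(58, 3, "P58")
print(deaggregate([bank.read(20) for bank in banks]))
```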
Conclusion
• Developed a simple packet buffer architecture for routers in which the packet departure times are known, e.g., Switch-Memory-Switch and Load-Balanced Routers.
• Can support an arbitrarily large number of logical queues.
• The number of DRAM banks and the size of the SRAM bypass buffer depend only on the physical parameters.