EE384 Packet Switch Architectures, Winter 2012. Lecture 7: Packet Buffers. Sundar Iyer
The Problem • All packet switches (e.g. Internet routers, Ethernet switches) require packet buffers for periods of congestion. • Size: A commonly used "rule of thumb" says that buffers need to hold one RTT (about 0.25s) of data. Even if this could be reduced to 10ms, a 4x10Gb/s linecard would require 400Mbits of buffering. • Speed: Clearly, the buffer needs to store (retrieve) packets as fast as they arrive (depart). At 4x10Gb/s, minimum-sized packets arrive and depart every 8ns.
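As a quick sanity check of those numbers, here is a small Python sketch using just the slide's parameters (a 4x10Gb/s linecard and 40B minimum-size packets):

```python
# Buffer size = line rate x RTT; access period = packet size / line rate.
line_rate_bps = 4 * 10e9            # 4 x 10 Gb/s linecard

for rtt_s in (0.25, 0.010):         # one full RTT vs. the reduced 10ms
    print(f"RTT {rtt_s * 1e3:.0f} ms -> "
          f"{line_rate_bps * rtt_s / 1e6:.0f} Mbits of buffering")

packet_bits = 40 * 8                # minimum-sized 40B packet
print(f"one 40B packet every {packet_bits / line_rate_bps * 1e9:.0f} ns")
```

This prints 10000 Mbits for the full RTT, 400 Mbits for the 10ms case, and an 8ns packet period, matching the slide.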
An Example: Packet buffers for a 40Gb/s linecard
[Figure: a Buffer Manager sits between the line and the Buffer Memory. Write rate R: one 40B packet every 8ns. Read rate R: one 40B packet every 8ns, driven by unpredictable scheduler requests. The memory needs to be accessed for a write or a read every 4ns.]
Memory Operations Per Second (MOPS) What is MOPS? • Number of unique memory operations per second • Refers to the speed of the address (not data) bus • Inverse of the random access time Examples • SRAM with 4ns access time = 250M MOPS • DRAM with 50ns access time = 20M MOPS
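Since MOPS is just the reciprocal of the random access time, the two examples fall out directly (a trivial sketch):

```python
def mops_millions(access_time_ns: float) -> float:
    """Unique memory operations per second, in millions (1e3 / t_ns)."""
    return 1e3 / access_time_ns

print(mops_millions(4))    # SRAM, 4ns access  -> 250.0, i.e. 250M MOPS
print(mops_millions(50))   # DRAM, 50ns access -> 20.0,  i.e. 20M MOPS
```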
Memory Technology Use SRAM? + Fast enough random access time, but • Low density, high cost, high power. Use DRAM? + High density, so we can store a lot of data cheaply, but • Can't meet the random access time.
Memory technology comparison:
• SRAM (S): 800M MOPS, $1 per Mb, 800 Mb/s per pin
• FCRAM/RLDRAM (F): 50M MOPS, 4c per Mb, 1000 Mb/s per pin
• XDRAM (X): 25M MOPS, 2c per Mb, 3200 Mb/s per pin
• DDR3 (D): 25M MOPS, 1c per Mb, 1600 Mb/s per pin
The Problem: No single memory technology is a good match. Ideally we would have the accesses/s of SRAM with the cost and density of DRAM.
Sol 1: Can't we just use lots of DRAMs as separate memories in parallel?
[Figure: eight parallel '32ns access time' buffer memories, each reading or writing 40B, so that overall a 40B packet can be read or written every 4ns.]
Solution • Write 40B packets to available banks • Read 40B packets from specified banks
Problem • What if back-to-back reads occur from a small number of banks?
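The failure mode is easy to see in a toy model. The sketch below (the round-robin and worst-case request patterns are my own illustration) tracks how far each bank falls behind when one request arrives every 4ns against a 32ns bank access time:

```python
BANKS, ACCESS_NS, PERIOD_NS = 8, 32, 4   # 8 banks, 32ns access, 4ns arrivals

def worst_lag_ns(bank_of_request):
    """Largest delay any request suffers because its bank is still busy."""
    free_at = [0] * BANKS                # when each bank next goes idle
    worst = 0
    for i, b in enumerate(bank_of_request):
        arrival = i * PERIOD_NS
        start = max(arrival, free_at[b]) # stall if bank b is busy
        free_at[b] = start + ACCESS_NS
        worst = max(worst, start - arrival)
    return worst

print(worst_lag_ns([i % BANKS for i in range(64)]))  # round-robin -> 0
print(worst_lag_ns([0] * 64))  # back-to-back reads on one bank -> 1764
```

With round-robin access the eight banks keep up exactly; under an adversarial pattern the lag grows without bound, which is why the scheduler's unpredictable reads make Sol 1 unsafe.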
Sol 2: Can't we just use lots of DRAMs as one monolithic memory in parallel?
[Figure: the Buffer Manager stripes each 320B block across eight buffer memories (bytes 0-39, 40-79, ..., 280-319), reading/writing 320B every 32ns. Write rate R: one 40B packet every 8ns. Read rate R: one 40B packet every 8ns.]
Sol 2: Works fine if there is only one FIFO
[Figure: the Buffer Manager aggregates arriving 40B packets into 320B blocks, writes each block to the slow buffer memory as one wide access, and reads 320B blocks back to serve the 40B departures.]
Sol 2: Works fine if there is only one FIFO, and supports variable-length packets
[Figure: same arrangement, but arriving and departing packets are of variable length (?B); the Buffer Manager still packs them into fixed 320B blocks.]
Sol 2: In practice, the buffer holds many FIFOs
[Figure: Q FIFOs (Q might be 1k to 64k), each built from 320B blocks striped over bytes 0-39, 40-79, ..., 280-319.]
How can we write multiple variable-length packets into different queues?
Problem: A block contains packets for different queues, which must be written to, or read from, different memory locations.
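One way to see the tension: the DRAM wants one wide 320B access per 32ns, but arrivals are small packets spread over Q queues, so each queue may be sitting on up to 319 bytes of partial block that have to live somewhere. A minimal sketch of per-queue staging (the function names and 2-queue usage are hypothetical):

```python
from collections import defaultdict

BLOCK = 320                          # bytes per wide DRAM access
staged = defaultdict(bytearray)      # per-queue partial blocks (needs SRAM!)

def enqueue(q: int, pkt: bytes):
    """Stage a packet for queue q; emit full 320B blocks as they form."""
    staged[q] += pkt
    while len(staged[q]) >= BLOCK:
        block = bytes(staged[q][:BLOCK])
        del staged[q][:BLOCK]
        print(f"queue {q}: one wide DRAM write of {len(block)}B")

for i in range(9):                   # 40B packets alternating over 2 queues
    enqueue(i % 2, b"x" * 40)
print({q: len(v) for q, v in staged.items()})  # {0: 200, 1: 160} still staged
```

No block is ever full in this run; the staged bytes are exactly why the next solutions put a per-queue cache in SRAM.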
Sol 3: Hybrid Memory Hierarchy
[Figure: arriving packets at rate R enter a small fast SRAM cache in front of a big slow DRAM; departing packets leave at rate R toward the packet processor.]
A CPU cache is probabilistic: there is a small but nonzero miss rate. Q: Why is randomness a problem in this context?
Sol 4: Hybrid Memory Hierarchy with 100% Cache Hit Rate
[Figure: a large DRAM memory holds the FIFO body of each of the Q queues. A small SRAM cache holds the FIFO tails: arriving packets are collected there and written to DRAM b bytes at a time. A second small SRAM cache holds the FIFO heads: b bytes at a time are read from DRAM to replenish it, and departing packets are served from it in response to unpredictable scheduler requests.]
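A structural sketch of this hierarchy in Python (the class shape, names, and the DRAM-bypass shortcut for near-empty queues are my assumptions, not the lecture's reference design):

```python
from collections import deque

class HybridFIFO:
    """One of the Q FIFOs: SRAM tail + DRAM body + SRAM head."""

    def __init__(self, b: int):
        self.b = b
        self.tail = bytearray()      # SRAM: newest bytes, staged for DRAM
        self.dram = deque()          # DRAM body, b-byte blocks, wide access
        self.head = bytearray()      # SRAM: oldest bytes, ready to depart

    def write(self, data: bytes):
        self.tail += data
        while len(self.tail) >= self.b:      # one wide DRAM write
            self.dram.append(bytes(self.tail[: self.b]))
            del self.tail[: self.b]

    def replenish(self):
        """One wide DRAM read: move b bytes into the head cache."""
        if self.dram:
            self.head += self.dram.popleft()
        else:                                # nearly-empty queue: bypass DRAM
            self.head += self.tail
            self.tail.clear()

    def read(self, n: int) -> bytes:
        out = bytes(self.head[:n])
        del self.head[:n]
        return out

q = HybridFIFO(b=320)
q.write(b"p" * 1000)     # 3 blocks go to DRAM, 40B stay in the tail
q.replenish()            # head cache now holds the first 320B
assert q.read(40) == b"p" * 40
```

Every DRAM access is b bytes wide, so the slow memory only needs one operation per b-byte window; the open question is how big the two SRAM caches must be.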
Design questions • What is the minimum SRAM needed to guarantee that a byte is always available in SRAM when requested? • What algorithm minimizes the SRAM size?
An Example: Q = 5, w = 9+, b = 6
[Animation, t = 0 through t = 23: the per-queue SRAM occupancies evolve as bytes are requested; every b = 6 timeslots one queue is replenished with b bytes from DRAM, while reads continue to drain the queues.]
The size of the SRAM cache
Necessity • How large does the SRAM cache need to be under any management algorithm? • Claim: Qw > Q(b - 1)(2 + lnQ)
Sufficiency • For any pattern of arrivals, what is the smallest SRAM cache needed so that a byte is always available when requested? • For one particular algorithm: Qw = Qb(2 + lnQ)
Definitions
Occupancy: X(q,t) = the number of bytes in FIFO q (in SRAM) at time t.
Deficit: D(q,t) = w - X(q,t)
Smallest SRAM cache In addition to the deficit that can accumulate, each queue needs to hold (b - 1) bytes for the case where it is replenished with b bytes when only 1 byte has been removed. Therefore, the SRAM size must be at least: Qw > Q(b - 1)(2 + lnQ).
Most Deficit Queue First (MDQF)
• Algorithm: Every b timeslots, replenish the queue with the largest deficit.
• Claim: An SRAM cache of size Qw > Qb(2 + lnQ) is sufficient.
Examples: • 40Gb/s linecard, b = 640, Q = 128: SRAM = 560 kBytes • 160Gb/s linecard, b = 2560, Q = 512: SRAM = 10 MBytes
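A sketch of the MDQF decision plus a check of the two sizing examples (the decision function and its arbitrary tie-breaking are a simplification of mine; the formula is the slide's):

```python
import math

def mdqf_pick(deficits):
    """Every b timeslots, replenish the queue with the largest deficit."""
    return max(range(len(deficits)), key=deficits.__getitem__)

# Qw = Qb(2 + ln Q), evaluated for the two linecards above:
for rate, b, q in (("40Gb/s", 640, 128), ("160Gb/s", 2560, 512)):
    size = q * b * (2 + math.log(q))
    print(f"{rate}: Q={q}, b={b} -> {size / 1e6:.2f} MBytes of SRAM")
```

This reproduces the slide's figures: about 0.56 MBytes (560 kBytes) and roughly 10 MBytes.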
Intuition for Theorem • The maximum number of un-replenished requests for any i queues, w_i, is the solution of a difference equation with boundary conditions. [Equation not recoverable from the transcript; solving it for i = Q yields the Qb(2 + lnQ) bound above.]