
Sundar Iyer

Winter 2012, Lecture 7: Packet Buffers. EE384 Packet Switch Architectures. Sundar Iyer. The Problem: All packet switches (e.g., Internet routers, Ethernet switches) require packet buffers for periods of congestion.


Presentation Transcript


  1. Winter 2012, Lecture 7: Packet Buffers. EE384 Packet Switch Architectures. Sundar Iyer

  2. The Problem
     • All packet switches (e.g., Internet routers, Ethernet switches) require packet buffers for periods of congestion.
     • Size: A commonly used “rule of thumb” says that buffers need to hold one RTT (about 0.25s) of data. Even if this could be reduced to 10ms, a 4x10Gb/s linecard would require 400Mbits of buffering.
     • Speed: Clearly, the buffer needs to store (retrieve) packets as fast as they arrive (depart). At 4x10Gb/s, minimum-sized (40B) packets arrive and depart every 8ns.
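The size and speed numbers on this slide come from two short calculations: buffer size is line rate times the time to cover, and the packet interval is packet size divided by line rate. A minimal Python sketch; the helper names `buffer_bits` and `packet_time_ns` are illustrative, not from the lecture:

```python
def buffer_bits(line_rate_bps: float, hold_time_s: float) -> float:
    """Buffer size in bits: line rate times the amount of time worth of data to hold."""
    return line_rate_bps * hold_time_s


def packet_time_ns(packet_bytes: int, line_rate_bps: float) -> float:
    """Interval between back-to-back packets of a given size at the line rate, in ns."""
    return packet_bytes * 8 / line_rate_bps * 1e9


LINE_RATE_BPS = 4 * 10e9  # 4 x 10 Gb/s linecard

print(f"{buffer_bits(LINE_RATE_BPS, 0.25) / 1e9:.0f} Gbit for one 0.25 s RTT")
print(f"{buffer_bits(LINE_RATE_BPS, 0.010) / 1e6:.0f} Mbit for 10 ms")
print(f"{packet_time_ns(40, LINE_RATE_BPS):.0f} ns per 40-byte packet")
```

Run as-is, it prints 10 Gbit for a full 0.25s RTT, 400 Mbit for 10ms, and 8ns per 40-byte packet.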

  3. An Example: Packet buffers for a 40Gb/s linecard
     [Figure: a Buffer Manager writes packets into the Buffer Memory at rate R (one 40B packet every 8ns) and reads them out at rate R (one 40B packet every 8ns), driven by unpredictable scheduler requests.]
     Memory needs to be accessed for a write or a read every 4ns.

  4. Memory Operations Per Second (MOPS)
     What is MOPS?
     • Number of unique memory operations per second
     • Refers to the speed of the address (not data) bus
     • Inverse of the random access time
     Examples
     • SRAM with 4ns access time = 250M MOPS
     • DRAM with 50ns access time = 20M MOPS
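Because MOPS is the inverse of the random access time, the 4ns requirement from the 40Gb/s example translates directly into a MOPS target: one write and one read per 8ns packet. A minimal Python sketch comparing the two example memories against that target; the function names are hypothetical:

```python
def mops(access_time_ns: float) -> float:
    """Memory operations per second: the inverse of the random access time."""
    return 1e9 / access_time_ns


def required_mops(packet_time_ns: float, ops_per_packet: int = 2) -> float:
    """Accesses per second needed if every packet is written once and read once."""
    return ops_per_packet * 1e9 / packet_time_ns


need = required_mops(8.0)  # 40B packets every 8 ns at 40 Gb/s -> one access every 4 ns
for name, access_ns in [("SRAM", 4.0), ("DRAM", 50.0)]:
    rate = mops(access_ns)
    print(f"{name}: {rate / 1e6:.0f}M MOPS, need {need / 1e6:.0f}M, ok={rate >= need}")
```

With these inputs the 4ns SRAM just meets the 250M MOPS target, while the 50ns DRAM falls short by a factor of 12.5.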

  5. Memory Technology
     Use SRAM?
     + Fast enough random access time, but
     - Low density, high cost, high power.
     Use DRAM?
     + High density means we can store data, but
     - Can't meet random access time.

  6. The Problem: No single memory technology is a good match
     Technology          Random access   Cost        Bandwidth
     SRAM (S)            800M MOPS       $1 per Mb   800 Mb/s per pin
     FCRAM/RLDRAM (F)    50M MOPS        4c per Mb   1000 Mb/s per pin
     XDRAM (X)           25M MOPS        2c per Mb   3200 Mb/s per pin
     DDR3 (D)            25M MOPS        1c per Mb   1600 Mb/s per pin
     Ideally, we would have the accesses per second of SRAM with the cost and density of DRAM.
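To make the mismatch concrete, the sketch below checks each technology on the slide against the 40Gb/s linecard's needs: 250M MOPS (one access every 4ns) and a 400Mb buffer (10ms at 40Gb/s). The MOPS, cost, and per-pin figures are the slide's; treating cost as linear in capacity is an assumption for illustration, and `MemoryTech` is a hypothetical type.

```python
from dataclasses import dataclass


@dataclass
class MemoryTech:
    name: str
    mops: float          # random accesses per second
    cents_per_mb: float  # cost per megabit
    mbps_per_pin: float  # per-pin bandwidth in Mb/s


TECHS = [
    MemoryTech("SRAM", 800e6, 100.0, 800),
    MemoryTech("FCRAM/RLDRAM", 50e6, 4.0, 1000),
    MemoryTech("XDRAM", 25e6, 2.0, 3200),
    MemoryTech("DDR3", 25e6, 1.0, 1600),
]

REQUIRED_MOPS = 250e6  # one access every 4 ns for a 40 Gb/s linecard
BUFFER_MB = 400        # 10 ms of buffering at 40 Gb/s, in megabits

for t in TECHS:
    fast_enough = t.mops >= REQUIRED_MOPS
    cost_dollars = t.cents_per_mb * BUFFER_MB / 100  # assumes cost scales linearly
    print(f"{t.name:13s} fast enough: {fast_enough!s:5s}  cost of 400 Mb: ${cost_dollars:,.0f}")
```

Only SRAM is fast enough, and under this cost model it is 100 times more expensive than DDR3 for the same 400Mb.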

  7. Sol 1: Can't we just use lots of DRAMs as separate memories in parallel?
     [Figure: eight separate Buffer Memory banks, each holding 40B packets.]
     Read or write 40B every 4ns, each time from a different memory with a 32ns access time.
     Solution
     • Write 40B packets to available banks
     • Read 40B packets from specified banks
     Problem
     • What if back-to-back reads occur from a small number of banks?
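The problem on this slide can be shown with a small timing model: eight banks, each busy for 32ns per access, and a buffer manager that issues one access every 4ns in order. A minimal Python sketch under those assumptions (in-order issue, no reordering); `finish_time_ns` is a hypothetical helper:

```python
def finish_time_ns(banks, num_banks=8, access_ns=32, slot_ns=4):
    """Time at which the last of a sequence of accesses completes.

    banks: bank index of each access, in the order the buffer manager issues them.
    The manager issues at most one access per slot_ns; an access whose bank is
    still busy with an earlier access stalls until that bank is free.
    """
    free_at = [0] * num_banks  # when each bank can start its next access
    t = 0                      # earliest time the next access may be issued
    for bank in banks:
        start = max(t, free_at[bank])
        free_at[bank] = start + access_ns
        t = start + slot_ns
    return max(free_at)


round_robin = [i % 8 for i in range(16)]  # reads spread over all eight banks
hot_bank = [0] * 16                       # back-to-back reads from one bank
print(finish_time_ns(round_robin))  # 92
print(finish_time_ns(hot_bank))     # 512
```

Spreading accesses over all banks sustains one 40B access every 4ns; concentrating them on one bank drops to one every 32ns. Writes can always pick an idle bank, but a read must go to whichever bank holds the requested packet, which is why the scheduler's unpredictable requests are the problem.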

  8. Sol 2: Can't we just use lots of DRAMs as one monolithic memory in parallel?
     [Figure: eight Buffer Memory devices striped as one wide memory (bytes 0-39, 40-79, ..., 280-319), so each access moves a 320B block; the Buffer Manager writes at rate R and reads at rate R, one 40B packet every 8ns.]
     Read/write 320B every 32ns.

  9. Sol 2: Works fine if there is only one FIFO
     [Figure: the Buffer Manager gathers arriving 40B packets (one every 8ns, rate R) into 320B blocks, writes them to the slow buffer memory (bytes 0-39, ..., 280-319), and reads 320B blocks back to send out 40B packets at rate R.]

  10. Sol 2: Works fine if there is only one FIFO, and supports variable-length packets
     [Figure: the arrangement of slide 9 with packets of arbitrary length (?B) arriving and departing at rate R; the Buffer Manager still reads and writes the buffer memory in 320B blocks.]
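Slides 9 and 10 work because, with a single FIFO, the buffer manager can treat the arrivals as one byte stream and move it to and from the slow memory in 320B blocks, whatever the packet lengths. A minimal Python sketch of that idea, assuming b = 320 bytes and a `deque` of blocks standing in for the slow DRAM; the class `WideFifo` and the tail-to-head bypass for a not-yet-full block are hypothetical simplifications:

```python
from collections import deque


class WideFifo:
    """One FIFO stored in a slow memory that is read and written in b-byte blocks."""

    def __init__(self, b: int = 320):
        self.b = b
        self.tail = bytearray()  # partial block being assembled from arrivals
        self.slow = deque()      # full blocks in slow memory, one access each
        self.head = bytearray()  # bytes fetched for departure
        self.slow_accesses = 0

    def write(self, packet: bytes) -> None:
        # Packets may be any length: the FIFO only sees a byte stream.
        self.tail += packet
        while len(self.tail) >= self.b:
            self.slow.append(bytes(self.tail[: self.b]))
            del self.tail[: self.b]
            self.slow_accesses += 1  # one wide write

    def read(self, n: int) -> bytes:
        # Refill the head one block at a time, preserving FIFO order.
        while len(self.head) < n:
            if self.slow:
                self.head += self.slow.popleft()
                self.slow_accesses += 1  # one wide read
            elif self.tail:
                # Assumed bypass: hand over a partial block without a slow access.
                self.head += self.tail
                self.tail.clear()
            else:
                raise IndexError("FIFO underflow")
        out = bytes(self.head[:n])
        del self.head[:n]
        return out


fifo = WideFifo(b=320)
packets = [bytes([i]) * n for i, n in enumerate([40, 1500, 64, 40, 576])]
for p in packets:
    fifo.write(p)
assert [fifo.read(len(p)) for p in packets] == packets  # FIFO order preserved
print(fifo.slow_accesses)  # 12: six 320B block writes and six block reads
```

The 2220 bytes above cost 12 slow-memory accesses in total, each moving a full 320B block, regardless of how the bytes were split into packets.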

  11. Sol 2: In practice, the buffer holds many FIFOs
     [Figure: the buffer memory holds Q separate FIFOs (1, 2, ..., Q), each a chain of 320B blocks striped over bytes 0-39, ..., 280-319; the Buffer Manager receives variable-length (?B) packets at rate R and sends them out at rate R.]
     • Q might be 1k to 64k.
     • How can we write multiple variable-length packets into different queues?

  12. Problem: A block contains packets for different queues, which must be written to, or read from, different memory locations.
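To see the problem concretely, the sketch below cuts an arrival stream into 320B blocks and lists the queues each block touches. The arrival trace of 40B packets is hypothetical, and `queues_per_block` is an illustrative helper, not part of the lecture.

```python
def queues_per_block(arrivals, b=320):
    """Split the arrival byte stream into b-byte blocks; list the queues in each block.

    arrivals: sequence of (queue_id, packet_length) in arrival order.
    """
    blocks = []
    current, filled = set(), 0
    for q, length in arrivals:
        while length > 0:
            take = min(length, b - filled)  # bytes of this packet that fit in the block
            current.add(q)
            filled += take
            length -= take
            if filled == b:
                blocks.append(sorted(current))
                current, filled = set(), 0
    if filled:
        blocks.append(sorted(current))  # final partial block
    return blocks


arrivals = [(1, 40), (7, 40), (3, 40), (1, 40), (2, 40), (7, 40), (5, 40), (4, 40)]
print(queues_per_block(arrivals))  # [[1, 2, 3, 4, 5, 7]]
```

A single 320B write of this block would have to scatter its contents to six different queues' locations in DRAM, which defeats the purpose of the wide access.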

  13. Sol 3: Hybrid Memory Hierarchy
     [Figure: arriving packets at rate R pass through a packet processor with a small fast SRAM cache in front of a big slow DRAM; departing packets leave at rate R.]
     • A small probability of a cache miss
     • A CPU cache is probabilistic.
     • Q: Why is randomness a problem in this context?

  14. Sol 4: Hybrid Memory Hierarchy with 100% Cache Hit Rate
     [Figure: arriving packets (rate R) enter a small SRAM holding the FIFO tails for Q queues; a large DRAM holds the FIFO bodies and is written and read b bytes at a time; a small SRAM holds the FIFO heads, from which departing packets (rate R) are served in response to unpredictable scheduler requests.]
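The three-level structure on this slide can be sketched as a small Python class: a tail SRAM per queue that gathers bytes into b-byte blocks, a DRAM body that only ever sees whole blocks, and a head SRAM per queue that serves the scheduler one byte at a time. This is a minimal sketch under simplifying assumptions: bytes are tracked individually, `HybridPacketBuffer` and its methods are hypothetical, the tail-to-head bypass when DRAM is empty is assumed, and the choice of which queue to replenish is left to the caller. Slide 15's design question amounts to how large `head` must be so that `read_byte` never misses.

```python
from collections import deque


class HybridPacketBuffer:
    """Q FIFOs split across a tail SRAM, a DRAM body, and a head SRAM."""

    def __init__(self, num_queues: int, b: int):
        self.b = b
        self.tail = [bytearray() for _ in range(num_queues)]  # tail SRAM
        self.dram = [deque() for _ in range(num_queues)]      # DRAM body, b-byte blocks
        self.head = [bytearray() for _ in range(num_queues)]  # head SRAM

    def write(self, q: int, data: bytes) -> None:
        self.tail[q] += data
        # Whenever queue q's tail holds b bytes, write them to DRAM as one block.
        while len(self.tail[q]) >= self.b:
            self.dram[q].append(bytes(self.tail[q][: self.b]))
            del self.tail[q][: self.b]

    def replenish(self, q: int) -> None:
        """Move up to b bytes of queue q into its head SRAM (one DRAM read)."""
        if self.dram[q]:
            self.head[q] += self.dram[q].popleft()
        else:
            # Assumed bypass: DRAM holds nothing for q, so take bytes from the tail.
            chunk = self.tail[q][: self.b]
            self.head[q] += chunk
            del self.tail[q][: len(chunk)]

    def read_byte(self, q: int) -> int:
        """Serve one byte from the head SRAM; a miss here is a design failure."""
        if not self.head[q]:
            raise RuntimeError(f"head-cache miss on queue {q}")
        value = self.head[q][0]
        del self.head[q][0]
        return value


buf = HybridPacketBuffer(num_queues=2, b=4)
buf.write(0, b"abcdefgh")  # two full 4-byte blocks go to DRAM
buf.replenish(0)           # one DRAM read moves "abcd" into queue 0's head SRAM
print(bytes(buf.read_byte(0) for _ in range(4)))  # b'abcd'
```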

  15. Design questions
     • What is the minimum SRAM needed to guarantee that a byte is always available in SRAM when requested?
     • What algorithm minimizes the SRAM size?

  16. An Example: Q = 5, w = 9+, b = 6
     [Figure: snapshots of the five head-cache FIFOs at t = 0 through t = 7, showing bytes being read and queues being replenished.]

  17. An Example (continued): Q = 5, w = 9+, b = 6
     [Figure: snapshots at t = 8 through t = 13, t = 19, and t = 23, showing further reads and replenishments.]

  18. The size of the SRAM cache
     [Figure: Q head-cache FIFOs, each of size w bytes.]
     Necessity
     • How large does the SRAM cache need to be under any management algorithm?
     • Claim: wQ > Q(b - 1)(2 + lnQ)
     Sufficiency
     • For any pattern of arrivals, what is the smallest SRAM cache needed so that a byte is always available when requested?
     • For one particular algorithm: wQ = Qb(2 + lnQ)
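The two bounds on this slide are easy to evaluate numerically. A minimal Python sketch (function names hypothetical) computing the necessary size Q(b - 1)(2 + ln Q) and the sufficient size Qb(2 + ln Q) in bytes, for the two linecard examples given with the Most Deficit Queue First algorithm on slide 22:

```python
import math


def sram_lower_bound(Q: int, b: int) -> float:
    """Necessity: any algorithm needs more than Q(b - 1)(2 + ln Q) bytes of head SRAM."""
    return Q * (b - 1) * (2 + math.log(Q))


def sram_sufficient(Q: int, b: int) -> float:
    """Sufficiency: Qb(2 + ln Q) bytes suffice for one particular algorithm."""
    return Q * b * (2 + math.log(Q))


for label, b, Q in [("40Gb/s", 640, 128), ("160Gb/s", 2560, 512)]:
    lo, hi = sram_lower_bound(Q, b), sram_sufficient(Q, b)
    print(f"{label}: lower bound {lo / 1e3:,.0f} kB, sufficient {hi / 1e3:,.0f} kB")
```

The results (about 561kB and 10.8MB for the sufficient size) are close to the slide's 560kBytes and 10MBytes, and the two bounds differ only by the factor b/(b - 1), so for large b the algorithm is essentially optimal.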

  19. Definitions
     • Occupancy: X(q,t) is the number of bytes in FIFO q (in SRAM) at time t.
     • Deficit: D(q,t) = w - X(q,t)
     [Figure: Q FIFOs of size w, each split into an occupancy part and a deficit part.]

  20. Smallest SRAM cache

  21. Smallest SRAM cache
     In addition, each queue needs room for (b - 1) extra bytes, in case it is replenished with b bytes when only 1 byte has been removed. Therefore, the SRAM size must satisfy: Qw > Q(b - 1)(2 + lnQ).

  22. Most Deficit Queue First
     • Algorithm: Every b timeslots, replenish the queue with the largest deficit.
     • Claim: An SRAM cache of size Qw ≥ Qb(2 + lnQ) is sufficient.
     Examples:
     • 40Gb/s linecard, b=640, Q=128: SRAM = 560kBytes
     • 160Gb/s linecard, b=2560, Q=512: SRAM = 10MBytes
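A minimal simulation of the Most Deficit Queue First rule, using the deficit D(q,t) = w - X(q,t) from slide 19. It rests on assumptions that are not from the lecture: DRAM always has data for every queue, every head cache starts full (all deficits 0), one byte is requested per timeslot, a replenishment lands at the end of its timeslot, and ties go to the lowest-numbered queue. The function `simulate_mdqf` is hypothetical; it reports the largest deficit any queue reaches, which the per-queue cache size w must exceed for a byte to always be available.

```python
import math
import random


def simulate_mdqf(requests, Q: int, b: int) -> int:
    """Largest deficit any queue reaches under MDQF for a given request trace.

    requests: queue id requested in each timeslot (one byte per timeslot).
    A negative deficit means the cache briefly holds more than w bytes,
    which is the (b - 1) extra room discussed on slide 21.
    """
    deficit = [0] * Q
    worst = 0
    for t, q in enumerate(requests, start=1):
        deficit[q] += 1  # the scheduler reads one byte from queue q
        worst = max(worst, deficit[q])
        if t % b == 0:   # every b timeslots: one b-byte DRAM read
            target = max(range(Q), key=lambda i: deficit[i])  # most deficit queue
            deficit[target] -= b
    return worst


Q, b = 32, 8
rng = random.Random(0)
trace = [rng.randrange(Q) for _ in range(100_000)]
print(simulate_mdqf(trace, Q, b), "vs b(2 + ln Q) =", round(b * (2 + math.log(Q)), 1))
```

A simulation like this can only spot-check particular traces; the slide's sufficiency claim covers every request pattern. For a round-robin trace over Q = 4 queues with b = 4, for instance, the worst deficit is exactly b.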

  23. Intuition for Theorem
     • The maximum number of un-replenished requests for any i queues, w_i, is the solution of a difference equation with boundary conditions.
