1 / 18

Analysis of a Memory Architecture for Fast Packet Buffers

Analysis of a Memory Architecture for Fast Packet Buffers. Sundar Iyer, Ramana Rao Kompella & Nick McKeown (sundaes,ramana, nickm)@stanford.edu Departments of Electrical Engineering & Computer Science, Stanford University http://klamath.stanford.edu. Example

xuxa
Télécharger la présentation

Analysis of a Memory Architecture for Fast Packet Buffers

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Analysis of a Memory Architecture for Fast Packet Buffers Sundar Iyer, Ramana Rao Kompella & Nick McKeown (sundaes,ramana, nickm)@stanford.edu Departments of Electrical Engineering & Computer Science, Stanford University http://klamath.stanford.edu

  2. Example OC768c = 40Gb/s; RTT*BW = 10Gb; 64 byte packets Write Rate, R Buffer Memory Read Rate, R 1 packet every 12.8 ns 1 packet every 12.8 ns Packet Buffer Architecture Goal Determine and analyze techniques for building high speed (>40Gb/s) electronic packet buffers. Stanford University

  3. This is not Possible Today Use SRAM? + fast enough random access time, but - too expensive, and - too low density to store 10Gb of data. Use DRAM? + high density means we can store data, but - too slow (typically 50ns random access time) Stanford University

  4. Characteristics of Packet Buffer Architectures • The total throughput needed is at least 2(Ingress Rate) • Size of Buffer is at least R * RTT • The buffers have one or more FIFOs • The sequence in which the FIFOs are accessed is determined by an arbiter and is unknown apriori Stanford University

  5. Large DRAM memory holds the middle of FIFOs 54 53 52 51 50 10 9 8 7 6 5 1 95 94 93 92 91 90 89 88 87 86 15 14 13 12 11 10 9 8 7 6 2 86 85 84 83 82 11 10 9 8 7 Q Reading b cells Writing b cells 1 1 55 60 59 58 57 56 1 4 3 2 Arriving Departing 2 2 Packets Packets 97 96 2 1 4 3 5 R R Q Q 87 88 91 90 89 6 5 4 3 2 1 Arbiter or Scheduler Small ingress SRAM Small egress SRAM Requests cache of FIFO tails cache of FIFO heads Memory Hierarchy Stanford University

  6. Some Thoughts • The buffer architecture itself is well known • Many buffers are designed to give only statistical guarantees • We would like to be able to give deterministic guarantees Stanford University

  7. System Design Parameters • Main Parameters • SRAM Size • Latency faced by a cell • System Parameters • I/O Bandwidth • Number of addresses • Use single address on every DRAM • Use different addresses on every DRAM • Use/Non Use of DRAM Burst Mode • (non) Existence of Bank conflicts Stanford University

  8. The Six Commandments…. Thou shall … • Assumptions on system parameters • Have no speedup on DRAM • Have no speedup on I/O = 2R • Use a single address bus for every DRAM • Read/Write cells from the same queue • Split packets into cells of size “C”. • Use no DRAM Burst Mode • Design with no bank conflicts Stanford University

  9. Symmetry Argument • The analysis and working of the ingress and egress buffer architectures are similar • We shall analyze only the egress buffer architecture Stanford University

  10. Questions How large does the SRAM cache need to be: • To guarantee that a packet is immediately available in SRAM when requested, or • To guarantee that a packet is available within a maximum bounded time? What Memory Management Algorithm (MMA) should we use? Stanford University

  11. Cells w Cells Cells Cells Q t = 0 t = 1 t = 2 t = 3 Cells Cells Cells Cells t = 4 t = 5 t = 6 t = 7 A Bad Case for the Queues …1 Q=5, w=9+, b=5 Stanford University

  12. Cells Cells Cells Cells t = 8 t = 9 t = 10 t = 11 Cells Cells Cells Cells t = 13 t = 14 t = 17 … t = 12 A Bad Case for the Queues …2 Q=5, w=9+, b=5 Stanford University

  13. ResultSingle Address Bus • Impatient Arbiter: MDQF-MMA (maximum deficit queue first), with a SRAM buffer of size Qb[2 +ln Q] guarantees zero latency. Stanford University

  14. Questions How large does the SRAM cache need to be: • To guarantee that a packet is immediately available in SRAM when requested, or • To guarantee that a packet is available within a maximum bounded time? What Memory Management Algorithm (MMA) should we use? This talk… Stanford University

  15. Cells • In SRAM: A Dynamic Buffer of size Q(b-1) Q Requests in lookahead buffer b - 1 • Lookahead: Arbiter requests for future Q(b-1) + 1 slots are known Q(b-1) + 1 • Compute: Find out the queue which will runs into “trouble” earliest green! Cells • Replenish: “b” cells for the “troubled” queue Q b - 1 Earliest Critical Queue First (ECQF-MMA) Stanford University

  16. Cells Cells Cells Requests in lookahead buffer Requests in lookahead buffer Requests in lookahead buffer t = 0; Green Critical t = 1 t = 2 Cells Cells Cells Requests in lookahead buffer Requests in lookahead buffer Requests in lookahead buffer t = 3 t = 5 t = 4; Blue Critical Cells Cells Cells Requests in lookahead buffer Requests in lookahead buffer Requests in lookahead buffer t = 6 t = 7 t = 8; Red Critical Example of ECQF-MMA: Q=4, b=4 Stanford University

  17. ResultSingle Address Bus • Patient Arbiter: ECQF-MMA (earliest critical queue first), minimizes the size of SRAM buffer to Q(b-1) cells; and guarantees that requested cells are dispatched within Q(b-1)+1 cell slots. Stanford University

  18. Implementation Numbers (64byte cells, b = 8, DRAM T = 50ns) • VOQ Switch - 32 ports • Brute Force : Egress. SRAM = 10 Gb, no DRAM • Patient Arbiter : Egress. SRAM = 115kb, Lat. = 2.9 us, DRAM = 10Gb • Impatient Arbiter : Egress. SRAM = 787 kb, DRAM = 10Gb • Patient Arbiter(MA) : No SRAM, Lat. = 3.2us, DRAM = 10Gb • VOQ Switch - 32 ports, 16 Diffserv classes • Brute Force : Egress. SRAM = 10Gb, no DRAM • Patient Arbiter : Egress. SRAM = 1.85Mb, Lat. = 45.9us, DRAM 10Gb • Impatient Arbiter : Egress. SRAM = 18.9Mb, DRAM = 10Gb • Patient Arbiter(MA) : No SRAM, Lat. = 51.2us, DRAM = 10Gb Stanford University

More Related