
Packet Arbitration in VoQ switches and Others and QoS


tirza

Presentation Transcript


  1. Packet Arbitration in VoQ switches and Others and QoS

  2. Recap • High-Performance Switch Design • We need scalable switch fabrics – crossbar, bit-sliced crossbar, Clos networks. • We need to solve the memory bandwidth problem • Our conclusion is to go for input-queued switches • We need to use VOQs instead of FIFO queues • For these switches to function at high speed, we need efficient and practically implementable scheduling/arbitration algorithms

  3. Switch core architecture [Figure labels: Port #1 to Port #256, port processors, optics, LCS protocol, crossbar, scheduler, cell data, request, grant/credit]

  4. Algorithms for VOQ Switching • We analyzed several algorithms for matching inputs and outputs • Maximum size matching: these are based on bipartite maximum matching – which can be solved using max-flow techniques in O(N^2.5) • These are not practical for high-speed implementations • They are stable (100% throughput for uniform traffic) • They are not stable for non-uniform traffic • Maximal size matching: they try to approximate maximum size matching • PIM, iSLIP, SRR, etc. • These are practical – can be executed in parallel in O(log N) or even O(1) • They are stable for uniform traffic and unstable for non-uniform traffic
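
The difference between a maximal and a maximum size matching on the slide above can be seen on a tiny request pattern. The following is only a sketch: the function names are illustrative, and the maximum matching uses simple augmenting paths rather than the O(N^2.5) max-flow technique the slide mentions. Here `requests[i]` lists the outputs for which input `i` has queued cells.

```python
def greedy_maximal_matching(requests):
    """Maximal matching: add edges greedily until no more can be added."""
    match, taken = {}, set()
    for i, outs in enumerate(requests):
        for j in outs:
            if j not in taken:
                match[i] = j
                taken.add(j)
                break
    return match


def maximum_matching(requests):
    """Maximum matching via augmenting paths (illustrative, not O(N^2.5))."""
    owner = {}  # output -> input currently matched to it

    def try_assign(i, seen):
        for j in requests[i]:
            if j in seen:
                continue
            seen.add(j)
            # Take a free output, or re-route the input that holds it.
            if j not in owner or try_assign(owner[j], seen):
                owner[j] = i
                return True
        return False

    for i in range(len(requests)):
        try_assign(i, set())
    return {i: j for j, i in owner.items()}


# Input 0 can go to outputs 0 or 1; input 1 can only go to output 0.
requests = [[0, 1], [0]]
print(len(greedy_maximal_matching(requests)))  # 1
print(len(maximum_matching(requests)))         # 2
```

The greedy result is maximal (no edge can be added without removing one) but smaller than the maximum, which is the gap that algorithms such as PIM and iSLIP accept in exchange for speed.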

  5. Algorithms for VOQ Switching • Maximum weight matching: These are maximum matchings based on weights such as queue length (LQF, LPF) or cell age (OCF), with a complexity of O(N^3 log N) • These are not practical for high-speed implementations. Much more difficult to implement than maximum size matching • They are stable (100% throughput) under any admissible traffic • Maximal weight matching: they try to approximate maximum weight matching. They use a request-grant-accept (RGA) mechanism like iSLIP • iLQF, iLPF, iOCF, etc. • These are “somewhat” practical – can be executed in parallel in O(log N) or even O(1) like iSLIP, BUT the arbiters are much more complex to build
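
To make the idea of a maximum weight matching with queue-length weights (LQF) concrete, here is a minimal brute-force sketch. It tries every input-to-output permutation, so it is only usable for tiny N and is not the O(N^3 log N) algorithm referred to above. The function name and the `queue_len[i][j]` layout (cells queued at input `i` for output `j`) are assumptions made for illustration.

```python
from itertools import permutations


def lqf_max_weight_matching(queue_len):
    """Return (perm, weight) where perm[i] is the output matched to input i."""
    n = len(queue_len)
    best_perm, best_weight = None, -1
    for perm in permutations(range(n)):
        # Weight of a matching = sum of the VOQ lengths it serves.
        weight = sum(queue_len[i][perm[i]] for i in range(n))
        if weight > best_weight:
            best_perm, best_weight = perm, weight
    return best_perm, best_weight


queue_len = [[5, 4],
             [4, 0]]
print(lqf_max_weight_matching(queue_len))  # ((1, 0), 8)
```

In this example the longest single queue (5 cells) is not served, because serving the two 4-cell queues together gives a larger total weight.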

  6. Differences between RRM, iSLIP & FIRM

  7. Algorithms for VOQ Switching • Randomized algorithms • They try in a smart way to approximate maximum weight matching while avoiding an iterative process • They are stable under any admissible traffic • Their time complexity is small (depending on the algorithm) • Their hardware complexity is as yet untested.

  8. Can we avoid having schedulers altogether?

  9. Remember: Two Successive Scaling Problems • OQ routers: • + work-conserving (QoS) • - memory bandwidth = (N+1)R • IQ routers: • + memory bandwidth = 2R • - arbitration complexity (bipartite matching)
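
As a worked instance of the memory-bandwidth figures above, the sketch below evaluates (N+1)R for an output-queued router and 2R for an input-queued one. The function names are illustrative, and N = 64 with R = 10 Gbps is borrowed from the 64-port, 10 Gbps example on the next slide.

```python
def oq_memory_bandwidth(n_ports, line_rate):
    """Output-queued: up to N writes plus 1 read per cell time."""
    return (n_ports + 1) * line_rate


def iq_memory_bandwidth(line_rate):
    """Input-queued: 1 write plus 1 read per cell time."""
    return 2 * line_rate


n, r = 64, 10  # ports, Gbps per port
print(oq_memory_bandwidth(n, r))  # 650 (Gbps)
print(iq_memory_bandwidth(r))     # 20 (Gbps)
```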

  10. IQ Arbitration Complexity • Today: 64 ports at 10 Gbps, 64-byte cells. • Arbitration time = 64 bytes / 10 Gbps = 51.2 ns • Request/grant communication BW = 17.5 Gbps • Scaling to 160 Gbps: • Arbitration time = 3.2 ns • Request/grant communication BW = 280 Gbps • Two main alternatives for scaling: • Increase cell size • Eliminate arbitration
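
The arbitration-time numbers on this slide follow from dividing the cell size by the line rate. The short sketch below reproduces them. The function name is illustrative, and scaling the request/grant bandwidth linearly with line rate is an assumption that happens to match the slide's 17.5 Gbps and 280 Gbps figures.

```python
def arbitration_time_ns(cell_bytes, line_rate_gbps):
    """Time to transmit one cell: bits divided by Gbit/s gives nanoseconds."""
    return cell_bytes * 8 / line_rate_gbps


t_today = arbitration_time_ns(64, 10)    # 64-byte cells at 10 Gbps
t_scaled = arbitration_time_ns(64, 160)  # same cells at 160 Gbps

# Assumption: request/grant bandwidth grows in proportion to line rate.
bw_today_gbps = 17.5
bw_scaled_gbps = bw_today_gbps * (160 / 10)

print(t_today, t_scaled, bw_scaled_gbps)
```

Doubling the cell size doubles the available arbitration time at a given line rate, which is why increasing the cell size is listed as one of the two scaling alternatives.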

  11. Desirable Characteristics for Router Architecture • Ideal: OQ • 100% throughput • Minimum delay • Maintains packet order • Necessary: able to regularly connect any input to any output • What if the world was perfect? Assume Bernoulli iid uniform arrival traffic...

  12. Round-Robin Scheduling • Uniform & non-bursty traffic => 100% throughput • Problem: traffic is non-uniform & bursty

  13. Two-Stage Switch (I) [Figure labels: External Inputs 1..N, First Round-Robin, Internal Inputs 1..N, Second Round-Robin, External Outputs 1..N]

  14. Two-Stage Switch (I) [Figure labels: Load Balancing, External Inputs 1..N, First Round-Robin, Internal Inputs 1..N, Second Round-Robin, External Outputs 1..N]

  15. Two-Stage Switch Characteristics • 100% throughput • Problem: unbounded mis-sequencing [Figure labels: External Inputs, Cyclic Shift, Internal Inputs, Cyclic Shift, External Outputs]
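
A cyclic shift is a fixed, scheduler-free connection pattern. One common way to write it, assumed here for illustration, is to connect input i to output (i + t) mod N in time slot t. The sketch below checks that this pattern connects every input to every output exactly once every N slots, which is the "regularly connect any input to any output" property the slides ask for.

```python
def cyclic_shift(t, n):
    """Connection pattern in slot t: element i is the output wired to input i."""
    return [(i + t) % n for i in range(n)]


n = 4
for t in range(n):
    print(t, cyclic_shift(t, n))

# Over n consecutive slots, each input reaches every output exactly once.
for i in range(n):
    visited = [cyclic_shift(t, n)[i] for t in range(n)]
    assert sorted(visited) == list(range(n))
```

Because the pattern ignores queue state, no arbitration is needed. The price, as the slide notes, is that cells of one flow can take different paths through the two stages and arrive out of order.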

  16. Two-Stage Switch (II) • New: N^3 instead of N^2

  17. Expanding VOQ Structure • Solution: expand VOQ structure by distinguishing among switch inputs

  18. What is being done in practice (Cisco, for example) • They want schedulers that achieve 100% throughput and very low delay (like MWM) • They want it to be as simple as iSLIP in terms of hardware implementation • Is there any solution to this?

  19. Typical Performance of iSLIP-like Algorithms • PIM with 4 iterations

  20. What is being done in practice (Cisco, for example)

  21. Can we make these scheduling algorithms simpler? Using a Simpler Architecture

  22. Buffered Crossbar Switches • A buffered crossbar switch is a switch with a buffered fabric (memory inside the crossbar). • A pure buffered crossbar switch architecture has buffering only inside the fabric and none anywhere else. • Because of the HOL blocking problem, VOQs are used on the input side.

  23. Buffered Crossbar Architecture [Figure labels: Input Cards 1..N, each with VOQs 1..N and an arbiter; Data; Flow Control; one arbiter per output; Output Cards 1..N]

  24. Scheduling Process • Scheduling is divided into three steps: • Input scheduling: each input selects in a certain way one cell from the HoL of an eligible queue and sends it to the corresponding internal buffer. • Output scheduling: each output selects in a certain way from all internally buffered cells in the crossbar to be delivered to the output port. • Delivery notifying: for each delivered cell, inform the corresponding input of the internal buffer status.

  25. Advantages • Total independence between input and output arbiters (distributed design) (1/N complexity as compared to centralized schedulers) • Switch performance is much better (because there is much less output contention) – a combination of IQ and OQ switches • Disadvantage: the crossbar is more complicated

  26. I/O Contention Resolution [Figure: ports labeled 1 to 4 on each side]

  27. I/O Contention Resolution [Figure: ports labeled 1 to 4 on each side]

  28. The Round Robin Algorithm • InRr-OutRr • Input scheduling: InRr (Round-Robin) - Each input selects the next eligible VOQ, based on its highest priority pointer, and sends its HoL packet to the internal buffer. • Output scheduling: OutRr (Round-Robin) - Each output selects the next nonempty internal buffer, based on its highest priority pointer, and sends it to the output link.
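
The InRr-OutRr steps can be sketched as a small simulation of one time slot. This is a minimal sketch under stated assumptions, not the authors' implementation: each crosspoint buffer holds one cell, a VOQ is "eligible" when it is non-empty and its crosspoint buffer is free, each pointer moves to one position beyond the chosen entry, and freeing the buffer stands in for the delivery notification. The class and attribute names are hypothetical.

```python
class BufferedCrossbarRR:
    def __init__(self, n):
        self.n = n
        self.voq = [[0] * n for _ in range(n)]     # voq[i][j]: cells at input i for output j
        self.xb = [[False] * n for _ in range(n)]  # xb[i][j]: crosspoint buffer occupied
        self.in_ptr = [0] * n                      # highest-priority VOQ per input
        self.out_ptr = [0] * n                     # highest-priority buffer per output

    def arrive(self, i, j):
        self.voq[i][j] += 1

    def step(self):
        n = self.n
        # Input scheduling (InRr): each input acts independently of the others.
        for i in range(n):
            for k in range(n):
                j = (self.in_ptr[i] + k) % n
                if self.voq[i][j] > 0 and not self.xb[i][j]:
                    self.voq[i][j] -= 1
                    self.xb[i][j] = True
                    self.in_ptr[i] = (j + 1) % n
                    break
        # Output scheduling (OutRr): each output drains one buffered cell.
        delivered = []
        for j in range(n):
            for k in range(n):
                i = (self.out_ptr[j] + k) % n
                if self.xb[i][j]:
                    self.xb[i][j] = False  # input i now sees this buffer as free
                    self.out_ptr[j] = (i + 1) % n
                    delivered.append((i, j))
                    break
        return delivered


sw = BufferedCrossbarRR(2)
sw.arrive(0, 0)
sw.arrive(1, 0)
sw.arrive(0, 1)
print(sw.step())  # [(0, 0)]
print(sw.step())  # [(1, 0), (0, 1)]
```

In the first slot both inputs push a cell toward output 0 without coordinating. The contention is absorbed by the crosspoint buffers and resolved by the output arbiter alone, which is what removes the need for a centralized matching.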

  29. Input Scheduling (InRr) [Figure: 4x4 buffered crossbar, ports 1 to 4, with a round-robin pointer wheel (1 to 4) per input]

  30. Output Scheduling (OutRr) [Figure: 4x4 buffered crossbar, ports 1 to 4, with a round-robin pointer wheel (1 to 4) per input and per output]

  31. Output pointer update + notification delivery [Figure: 4x4 buffered crossbar, ports 1 to 4, with a round-robin pointer wheel (1 to 4) per input and per output]

  32. Performance study • Delay/throughput under Bernoulli uniform and bursty uniform traffic • Stability performance

  33. 32x32 Switch under Bernoulli Uniform Traffic [Plot: average delay (log scale, 10^0 to 10^3) versus normalized load (0.3 to 1), Bernoulli uniform arrivals; curves: OQ, RR-RR, 1-SLIP, 4-SLIP]

  34. 32x32 Switch under Bursty Uniform Traffic [Plot: average delay (log scale, 10^1 to 10^3) versus normalized load (0.3 to 1), bursty uniform arrivals; curves: OQ, RR-RR, 1-SLIP, 4-SLIP]

  35. Scheduling Process • Because the arbitration is simple: • We can afford weight-based algorithms (for example LQF, OCF). • We can afford algorithms that provide QoS

  36. Buffered Crossbar Solution: Scheduler • The algorithm MVF-RR is composed of two parts: • Input scheduler – MVF (most vacancies first) • Each input selects the column of internal buffers (destined to the same output) where there are most vacancies (non-full buffers). • Output scheduler – Round-robin • Each output chooses the internal buffer which appears next on its static round-robin schedule from the highest priority one and updates the pointer to 1 location beyond the chosen one.
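
The MVF input decision can be sketched as follows. The sketch assumes one-cell crosspoint buffers, restricts the choice to VOQs that are non-empty and whose own crosspoint buffer is free, and breaks ties by lowest output index. None of these details are stated on the slide, and the function name and the `voq`/`xb` layout are hypothetical.

```python
def mvf_select(i, voq, xb):
    """Pick the output column with the most free crosspoint buffers.

    voq[i][j]: cells queued at input i for output j
    xb[r][j]:  True if the crosspoint buffer (input r, output j) is occupied
    Returns an output index, or None if input i has nothing eligible.
    """
    n = len(xb)
    best, best_vacancies = None, -1
    for j in range(n):
        if voq[i][j] == 0 or xb[i][j]:
            continue  # nothing to send, or this input's buffer is full
        vacancies = sum(1 for r in range(n) if not xb[r][j])
        if vacancies > best_vacancies:
            best, best_vacancies = j, vacancies
    return best


voq = [[1, 1, 0], [0, 0, 0], [0, 0, 0]]
xb = [[False, False, False],
      [True,  False, False],
      [True,  False, False]]
print(mvf_select(0, voq, xb))  # 1: column 1 has 3 vacancies, column 0 has 1
```

Favoring the emptiest column steers cells toward outputs that currently have little buffered work, which helps keep those outputs busy.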

  37. Buffered Crossbar Solution: Scheduler • The algorithm ECF-RR is composed of two parts: • Input scheduler – ECF (empty column first) • Each input selects the first empty column of internal buffers (destined to the same output). If there is no empty column, it selects on a round-robin basis. • Output scheduler – Round-robin • Each output chooses the internal buffer which appears next on its static round-robin schedule from the highest priority one and updates the pointer to 1 location beyond the chosen one.
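
A minimal sketch of the ECF input decision, under assumptions that the slide leaves open: crosspoint buffers hold one cell, "first empty column" is read as the lowest-indexed eligible output whose column has no occupied buffer, and the round-robin fallback starts from a pointer `ptr` supplied by the caller. The function name and data layout are hypothetical.

```python
def ecf_select(i, voq, xb, ptr):
    """Empty column first, else round-robin from ptr.

    voq[i][j]: cells queued at input i for output j
    xb[r][j]:  True if the crosspoint buffer (input r, output j) is occupied
    Returns an output index, or None if input i has nothing eligible.
    """
    n = len(xb)
    eligible = [j for j in range(n) if voq[i][j] > 0 and not xb[i][j]]
    # First preference: an output whose whole column of buffers is empty.
    for j in eligible:
        if not any(xb[r][j] for r in range(n)):
            return j
    # Fallback: ordinary round-robin over the eligible VOQs.
    for k in range(n):
        j = (ptr + k) % n
        if j in eligible:
            return j
    return None


voq = [[1, 1, 1], [0, 0, 0], [0, 0, 0]]
xb = [[False, False, False],
      [True,  False, False],
      [False, False, True]]
print(ecf_select(0, voq, xb, ptr=0))  # 1: the only completely empty column
```

An empty column means the corresponding output would otherwise idle, so filling it first is a cheap approximation of the most-vacancies rule.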

  38. Buffered Crossbar Solution: Scheduler • The algorithm RR-REMOVE is composed of two parts: • Input scheduler – Round-robin (with remove-request signal sending) • Each input chooses the non-empty VOQ which appears next on its static round-robin schedule from the highest priority one and updates the pointer to 1 location beyond the chosen one. It then sends out at most one remove-request signal to outputs. • Output scheduler – REMOVE • For each output, if it receives any remove-request signals, it chooses one of them based on its highest priority pointer and removes the cell. If no signal is received, it does simple round-robin arbitration.

  39. Buffered Crossbar Solution: Scheduler • The algorithm ECF-REMOVE is composed of two parts: • Input scheduler – ECF (with remove-request signal sending) • Each input selects the first empty column of internal buffers (destined to the same output). If there is no empty column, it selects on a round-robin basis. It then sends out at most one remove-request signal to outputs. • Output scheduler – REMOVE • For each output, if it receives any remove-request signals, it chooses one of them based on its highest priority pointer and removes the cell. If no signal is received, it does simple round-robin arbitration.

  40. Hardware Implementation of ECF-RR: An Input Scheduling Block [Figure labels: two round-robin arbiters, highest priority pointer, any grant, grants, Selector 0 to Selector N-1, arbitration results]

  41. Performance Evaluation: Simulation Study • Uniform Traffic

  42. Performance Evaluation: Simulation Study • ECF-REMOVE over RR-RR

  43. Performance Evaluation: Simulation Study • Bursty Traffic

  44. Performance Evaluation: Simulation Study • ECF-REMOVE over RR-RR

  45. Performance Evaluation: Simulation Study • Hotspot Traffic

  46. Performance Evaluation: Simulation Study • ECF-REMOVE over RR-RR

  47. Quality of Service Mechanisms for Switches/Routers and the Internet

  48. VOQ Algorithms and Delay • But, delay is key • Because users don’t care about throughput alone • They care (more) about delays • Delay = QoS (= $ for the network operator) • Why is delay difficult to approach theoretically? • Mainly because it is a statistical quantity • It depends on the traffic statistics at the inputs • It depends on the particular scheduling algorithm used • The last point makes it difficult to analyze delays in input-queued switches • For example, in VOQ switches it is almost impossible to give any guarantees on delay.

  49. VOQ Algorithms and Delay • This does not mean that no algorithm can do that. It means that none exists at this moment. • For this exact reason, almost all quality of service schemes (whether for delay or bandwidth guarantees) assume that you have an output-queued switch [Figure labels: Links 1 to 4, each with ingress and egress]

  50. QoS Router [Figure labels: classifiers, policers, queue management, per-flow queues, schedulers, shapers]
