350 likes | 487 Vues
Configuring a Load-Balanced Switch in Hardware. Srikanth Arekapudi, Shang-Tse (Da) Chuang, Isaac Keslassy, Nick McKeown Stanford University. Outline. Load Balanced Switch Scalability Reconfiguration Algorithm Hardware Implementation. Typical Router Architecture.
E N D
Configuring a Load-Balanced Switch in Hardware Srikanth Arekapudi, Shang-Tse (Da) Chuang, Isaac Keslassy, Nick McKeown Stanford University
Outline • Load Balanced Switch • Scalability • Reconfiguration Algorithm • Hardware Implementation
Typical Router Architecture Input Output Input Output Input Output N x N Switch Fabric R 1 R 2 R R 1 R R Scheduler
Load-Balanced Switch Out Out Out Forwarding mesh Load-balancing mesh R R In 3 2 1 R/N R/N R/N R/N R/N R/N R/N R/N R R In R/N R/N R/N R/N R/N R/N R/N R R R/N In R/N R/N
Load-Balanced Switch Out Out Out Forwarding mesh Load-balancing mesh R R In R/N R/N R/N R/N 1 R/N R/N R/N R/N R R In R/N R/N 2 R/N R/N R/N R/N R/N R R R/N In R/N R/N 3
Load-Balanced Switch Out Out Out Forwarding mesh Load-balancing mesh R R In R/N R/N R/N R/N R/N R/N • 100% throughput for broad class of traffic • No scheduler needed a Scalable R/N R/N R R In R/N R/N R/N R/N R/N R/N R/N R R R/N In 3 R/N R/N
N*2R/N = 2R = R +R R R In In Out Out In In Out Out In In Out Out In In Out Out A Single Combined Mesh 2R/N
R R (N-1)*2R/N < R +R In In Out Out In In Out Out In In Out Out In In Out Out A Single Combined Mesh 2R/N
1 2 3 4 7 1 2 3 4 5 8 6 1 2 3 4 5 6 7 8 ScalabilityN=8 2R/8
8 2 1 3 4 7 6 5 5 3 2 1 6 7 8 4 When N is Too LargeDecompose into groups (or racks) 2R 2R 4R 4R 4R/4 2R 2R
1 1 2 L L 2 1 2 L L 2 1 When N is Too LargeDecompose into groups (or racks) Group/Rack 1 Group/Rack 1 2R 2R 2RL/G 2R 2R 2RL 2RL 2R 2R 2RL/G Group/RackG Group/Rack G 2RL/G 2R 2R 2R 2R 2RL 2RL 2R 2RL/G 2R
2RL/G 2RL/G 2RL/G 2RL/G 2RL/G 2RL/G + + = 2RL/G 2RL/G 2RL/G 2RL 2RL/G + = + L L 2 L 1 1 2 1 2 2 L 1 When Linecards are MissingFailures, Incremental Additions, and Removals… Group/Rack 1 Group/Rack 1 2R 2R 2RL 2RL/G 2R 2R 2RL 2RL 2R 2R • Solution: replace mesh with sum of permutations Group/RackG Group/Rack G 2R 2R 2R 2R 2RL 2RL 2R 2R
1 1 2 L L 2 1 1 2 L L 2 When Linecards Fail Group/Rack 1 Group/Rack 1 2R 2R 2R 2R MEMS Switch 2R 2R MEMS Switch Group/RackG Group/Rack G 2R 2R 2R 2R 2R 2R
Questions • Number of MEMS Switches? • TDM Schedule?
Example – 3 Linecards 2R/3 In In R R Out Out In In R R Out Out In In R R Out Out
2 1 1 1 Example2 Groups Group/Rack 1 Group/Rack 1 2R 2R 1 8R/3 4R 4R 2R 2R 2 4R/3 Group/Rack 2 Group/Rack 2 4R/3 2R 2R 2R 2R/3 2R
2 1 1 1 Example2 Groups Group/Rack 1 Group/Rack 1 2R 2R 1 4R/3 4R 4R 2R 2R 4R/3 2 4R/3 Group/Rack 2 Group/Rack 2 4R/3 2R 2R 2R 2R 2R/3
Number of MEMS Switches • MEMS switches between groups i and j • Total Number of MEMS switches:M ≤ L+G-1
Questions • Number of MEMS Switches? • TDM Schedule?
Constraints on groups at each time-slot 2R 2R 2R 2R Constraints on linecards at each time-slot 2 1 1 2 2 1 TDM Schedule Group A Group A 2R 2R 1 4R 4R 2R 2R 2 Group B Group B 2R 2R 4R 4R 2R 2R
Rules for TDM Schedule At each time-slot: • Each transmitting linecard sends one packet • Each receiving linecard receives one packet • (MEMS constraint) Each transmitting group i sends at most one packet to each receiving group j through each MEMS connecting them In a schedule of N time-slots: • Each transmitting linecard sends exactly one packet to each receiving linecard
Tx Group A Tx Group B TDM Schedule
Tx Group A Tx Group B TDM Schedule
Tx Group A Tx Group B Bad TDM Schedule
TDM Schedule Algorithm • The algorithm constructs three consecutive schedules. • Sending Groups to Receiving Groups • Connection Assignment Problem • Sending Linecards to Receiving Groups. • Matrix Decomposition Problem • Sending Linecards to Receiving Linecards • Matrix Decomposition Problem
Tx Group A Tx Group B Good TDM Schedule
Tx Group A Tx Group B Good TDM Schedule
G1 G1 2 2 G1 G1 0 0 G2 G2 1 1 G2 G2 1 0 0 1 G3 G3 1 1 G3 G3 0 1 0 Connection Assignment Problem Not Scheduled Scheduled
G1 G1 0 0 0 G2 G2 1 0 0 1 G3 G3 1 1 1 G1 G1 G1 G1 0 0 0 0 1 G2 G2 1 0 G2 G2 1 0 1 0 1 0 G3 G3 G3 G3 0 1 0 0 1 1 Connection Assignment Problem After Greedy Back Tracing G1 G1 0 0 G2 G2 1 0 1 0 G3 G3 1 1
Matrix Decomposition Problem 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 1 1 0 1 0 1 0 1 0 1 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 + + =
Matrix Decomposition Problem • Use of sparsity of matrices to represent the ones as a row-column pair • Consists of two stages • Greedy Algorithm • Slepian-Duguid Algorithm • Decomposes all the permutation matrices at once • Uses the row-column pair list structure
Synthesis • 40 Groups and 640 Linecards • 0.13u process • Cycle time within 4ns • Connection Assignment Problem • 10K gates • 24Kbits memory • Matrix Decomposition Problem • 25K gates • 230Kbits of memory