VL2: A Scalable and Flexible Data Center Network • Authors: Greenberg et al. • Presenter: Syed M Irteza – CS @ LUMS • CS678: 2 April 2013
Brief Overview • VL2: A data center network architecture that aims to remedy problems with existing data center designs, such as oversubscription, collateral damage, and resource fragmentation.
Related Efforts • BCube: Server-centric network structure - servers forward packets on behalf of other servers and connect to multiple layers of switches. • Monsoon: Multi-homed tree built from commodity switches; special servers manage load balancing. • SEATTLE: Like Monsoon, but uses a Distributed Hash Table to store the location of each server. • PortLand (contemporary work): Based on a fat-tree; geared toward easy VM migration; uses hierarchical Pseudo MACs.
VL2: Contributions • Detailed data center traffic analysis • Traffic is unpredictable: the network cycles among 50 to 60 distinct patterns in a day, and most patterns last no longer than 100 seconds. • 0.09% of failures last over 10 days! • Design & deployment of VL2 • Shuffled 2.7 TB of data among 75 servers in 395 seconds. • Compares the cost of a VL2 network with equivalent networks based on existing designs.
Problems with Existing Designs • Limited server-to-server capacity • Technical and financial barriers lead to heavy oversubscription at the highest levels (near the core routers). • Fragmentation of resources • A VM cannot be shifted across VLAN borders without reconfiguration (which takes time), so one part of the DCN can be congested even while other areas sit idle. • Poor reliability and utilization • Layer-2 domains use Spanning Tree, so only one path is used even when multiple paths exist.
Goals • Uniform high capacity • Performance isolation • Layer-2 semantics
VL2: 3-level Clos Topology • Scale-out (a broad network of inexpensive commodity devices) rather than scale-up (increasing the capacity and features of individual devices).
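A rough sketch of why scale-out works, following the paper's Clos description: with Aggregation switches of radix d_a and Intermediate switches of radix d_i, the topology supports d_a * d_i / 4 ToR switches (the default of 20 servers per ToR is illustrative).

```python
def servers_supported(d_a: int, d_i: int, servers_per_tor: int = 20) -> int:
    """Capacity of a VL2-style 3-level Clos.

    With d_a-port Aggregation switches and d_i-port Intermediate
    switches, the topology supports d_a * d_i / 4 ToR switches,
    each serving `servers_per_tor` servers.
    """
    tors = (d_a * d_i) // 4
    return tors * servers_per_tor

# e.g. 144-port commodity switches throughout:
print(servers_supported(144, 144))  # 103680 servers
```

Doubling the switch radix quadruples the server count, which is the scale-out argument in one line.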
Routing Design • Switches operate as layer-3 routers; routing tables are computed by OSPF. • With multiple paths available, VL2 uses Valiant Load Balancing (VLB): each flow's packets bounce "off" a randomly chosen Intermediate switch at the top level of the Clos. • Packets are further load-balanced along the two segments (source to the chosen Intermediate switch, then switch to destination) using ECMP (Equal-Cost Multi-Path) routing.
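A minimal sketch of the per-flow hashing idea behind VLB/ECMP. The switch names and the 5-tuple encoding here are made up; real ECMP hashing happens in switch hardware. The key property is that all packets of one flow take the same path (no reordering), while different flows spread across switches.

```python
import hashlib

# Hypothetical Intermediate switches at the top of the Clos
INTERMEDIATE_SWITCHES = ["int-1", "int-2", "int-3", "int-4"]

def pick_intermediate(flow_5tuple, switches=INTERMEDIATE_SWITCHES):
    """Deterministically map a flow to one Intermediate switch.

    Hashing the 5-tuple keeps every packet of a flow on the same
    path (avoiding TCP reordering) while spreading distinct flows
    roughly uniformly across the available switches.
    """
    key = "|".join(map(str, flow_5tuple)).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return switches[digest % len(switches)]
```

A given flow, e.g. `("10.0.0.1", "10.0.0.2", 1234, 80, "tcp")`, always maps to the same switch across calls.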
Addressing • LA: Location-specific IP addresses • AA: Application-specific IP addresses • This separation enables easy VM migration, which is needed because independent services with unpredictable demands must be able to draw on one large shared server pool.
Addressing • A Directory Service stores the AA-to-LA mapping - realized on servers, not switches. • Eliminates ARP and DHCP bottlenecks. • The directory service can also enforce access-control policies. • A shim layer ("layer 2.5") in each server's network stack invokes the directory service.
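A toy sketch of what the shim does on each packet: resolve the destination AA to an LA and encapsulate. The in-memory dict and all addresses are illustrative stand-ins; in VL2 the lookup is an RPC to the directory servers, with caching in the shim.

```python
# Hypothetical directory: application addresses (AAs) -> the location
# address (LA) of the ToR switch currently serving each AA.
directory = {
    "20.0.0.55": "10.1.1.1",   # illustrative values only
    "20.0.0.56": "10.1.1.1",
}

def encapsulate(packet: dict, directory: dict) -> dict:
    """VL2 shim ('layer 2.5') sketch: look up the destination AA and
    wrap the packet in an outer header addressed to the LA.

    Because applications only ever see AAs, a VM can migrate to a new
    ToR by just updating its directory entry - no ARP, no renumbering.
    """
    la = directory[packet["dst_aa"]]   # cached RPC to directory servers in practice
    return {"outer_dst_la": la, "inner": packet}
```

Migration then amounts to `directory["20.0.0.55"] = <new ToR's LA>`; in-flight lookups pick up the new location.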
Anycast • The same LA is assigned to all Intermediate switches. • The directory service returns this anycast address to agents upon lookup, so ECMP can spread traffic across all Intermediate switches.
Questions/Possible Issues • Elephant flows may disrupt VLB: since VLB balances per flow rather than per packet, one long-lived large flow keeps all its bytes on a single randomly chosen path.