CloudNet: Enterprise Ready Virtual Private Clouds K. K. Ramakrishnan AT&T Labs Research, Florham Park, NJ Joint work with: Timothy Wood, Jacobus van der Merwe, and Prashant Shenoy Thanks to Toby Ford, Joe Houle and others of AT&T for providing a view of the competitive landscape and of their efforts with Synaptic Computing and Storage
Evolution of Computing and Communication Systems • Over the last several decades we have seen computing, storage and networking capabilities evolve • Each step in the evolution was caused by changes in the relative speed or cost of computing, storage and communications • These changes moved processing and storage closer to or farther from each other and from users • E.g., we moved from mainframes to mini-computers to workstations and PCs as computing costs and capabilities evolved relative to I/O • We then moved to client-server and eventually to web-based services as networking capabilities evolved • Each step may be viewed as an attempt to take advantage of shared resources and multiplexing, driven by cost, functionality and performance capabilities • So, is “Cloud Computing” déjà vu all over again?
Cloud Computing “The interesting thing about cloud computing is that we've redefined cloud computing to include everything that we already do. I can't think of anything that isn't cloud computing with all of these announcements.” (Larry Ellison, CEO, Oracle; Wall Street Journal, September 26, 2008) • Cloud computing takes advantage of the current balance in cost, performance and capability of computing vs. communications
Cloud Computing • Rent computation and storage resources on demand • Fundamental Value Proposition: • Shared, dynamic allocation of resources to reduce cost through multiplexing • Virtualization is a key component • Accessed by multiple enterprise sites • Cloud Platform types: • Software as a Service • Hotmail, Google Docs • Platform as a Service • Google App Engine, Microsoft Azure • Infrastructure as a Service • Amazon EC2, VMware vCloud [Diagram: a Cloud Platform accessed by multiple Enterprise Sites]
‘Cloud’ services grouped in 3 categories [Figure labels: Simple DB; EC2, S3; Brew; Android; Apple iPhone; Connect / UC; MEAP; Mobile Apps / Vertically Integrated]
Moving to the Cloud • Acme wants to move part of its payroll app into the cloud • Should be easy, right…? [Diagram labels: Acme LAN (Front End, Reports, Data Store, Processing Tier); Cloud Platform (Processing Tier)]
Cloud Eco-System [Diagram labels: Admin Portal; APIs and Services; Common XaaS Tools; Service Orchestration/Scheduling; Customer Service Policy; API Adaptation Services; Cloud Enablers; Compute as a Service; Storage as a Service; Application Platform as a Service; XaaS; OSS/BSS; Events; Provisioning/Configuration; Charging etc.; Fault/Perf Monitoring; Servers; Storage; LAN / LB; Network Capabilities] • Physical Resources • Provides basic server and storage facilities • Provides LAN and load balancing facilities • Compute as a Service • Virtualized machine: compute capacity • Image creation / management / marketplace • Virtual servers / server-accessible storage • Address management / load balancing services • Multi-tenant shared and dedicated environments • Storage as a Service • IP-accessible storage services (HTTP, NFS, CIFS, etc.) • Application Platform as a Service • Application environment to create, test, host/run applications (e.g. Java, .NET, J2EE, Python, etc.) • Multi-tenant shared and dedicated environments • XaaS • Other services (database, queuing, etc.) • Software as a Service • Advanced IaaS/PaaS/XaaS services • Cross-service coordination and integration • Capacity / Availability Services, etc. • Common XaaS Tools For Providing Services • Service Orchestration / Scheduling • Build/support enhanced and new service offerings from existing service and OSS/BSS APIs • Customer Service Policy/Rules • Support enhanced services, SLA management, etc. • Likely part of OSS/BSS framework (TBD) • API Adaptation • Enable the integration of disparate underlying service APIs into a coherent set of services to our customer • Cloud enablers • Support capabilities to aid in the development of XaaS services (e.g. logging, resource/service ID/reference mgmt, IDM, OSS/BSS integration, etc.)
• Leveraging Services • Services should leverage the capabilities of other services as appropriate • Compute and Storage as a Service are foundational services • Basic building blocks for other services • Managing Scale / Availability • Modularity is key to managing scale and availability • Physical/service modularity • Availability domains • Includes network design, storage, servers, etc. • Strive for commonality, but do not assume one solution fits all • At scale, individual services may require a specific architecture to scale appropriately
Enterprise Cloud Challenges • Existing platforms do not meet the needs of enterprise customers • Insufficient security controls • Need isolation at server and network level • Deployment is difficult - transparency • Cloud resources are completely separate from local ones • Can’t make VMs look like part of existing LAN • Limited control over network resources • Cannot specify network topology or IP addresses • Cannot reserve bandwidth or request QoS guarantees for network links
Vision and Research Direction • We have been working towards making compute and storage resources location transparent for enterprises, and applications in general • Enable Location of Compute and Storage resources remotely • Ensure Transparency and Security • Minimize performance impact from having remote resources • Facilitate Migration of Resources in a Transparent and Seamless manner • With as little application impact as possible • Provide capability for Disaster Recovery • Remote resources need to be far enough away to avoid sharing of risk between local and remote resources • Minimum RPO/RTO – to minimize impact on enterprise operations
Problem #1: Transparency • Application may have been written for a LAN environment • Might utilize broadcast or LAN service discovery • Must add Internet gateways for apps previously only on the LAN • Must now communicate via public IPs or configure DNS • Lack of transparency causes application modifications and infrastructure reconfigurations [Diagram labels: Acme LAN with gateway (GW): Front End front.acme.com, Data Store data.acme.com; Cloud Platform with gateway (GW): Processing proc.cloud.com]
Problem #2: Security • Acme’s servers are now accessible from the public Internet! • Servers formerly on a secure LAN are now exposed to malicious users • Must configure firewall rules to limit access • Fine-grained rules are difficult to manage in dynamic environments • Lack of secure cloud connections exposes the enterprise to threats from both inside and outside the cloud [Diagram labels: Acme LAN: Front End front.acme.com, Data Store data.acme.com; Cloud Platform: Processing proc.cloud.com, Hacker123 hax.cloud.com]
Problem #3: Flexible Resource Mgmt • Benefit of cloud computing: ability to easily adjust resource capacities and add new VMs • After a change, must deal with transparency and security issues all over again! • Current platforms do not support network resource reservation (bandwidth/QoS guarantees) • Enterprises want control over network resources; the cloud must support dynamic changes [Diagram labels: Acme LAN: Front End front.acme.com (+1), Data Store data.acme.com (+1); Cloud Platform: Processing proc.cloud.com (+1), Processing #2 proc2.cloud.com]
Key Observation • Existing cloud platforms primarily cover storage and computation • Enterprise Clouds need control over the network as well [Diagram labels: Cloud Platform, Disk, VM, Enterprise Sites]
Virtual Private Clouds • A Virtual Private Cloud is… • A secure collection of server, storage, and network resources spanning one or more cloud data centers • That is seamlessly connected to one or more enterprise sites Virtual Private Networks (VPNs) • Layer 2 and 3 MPLS-based VPNs • Created by the network provider with no end host configuration • Already used by many businesses! [Diagram labels: VMs at Cloud Sites; Enterprise Sites]
VPC Benefits • For the customer: • Isolates network & compute resources • Cloud resources are only accessible through VPN • Simplifies deployment since cloud looks same as local resources • For the service provider: • Provides mechanism for control over resource reservation within provider network • Simplifies management of multiple data centers by combining them into large resource pools
CloudNet • Cloud Manager • Allocates computation and storage resources • Manages VLAN assignment within the cloud network • Network Manager • Creates and configures VPN endpoints • Reserves network resources [Diagram labels: Cloud Manager, Network Manager, Provider Edge and Customer Edge routers, VMs, VLAN, VPN]
CloudNet: Virtual Private Clouds • Virtual Private Cloud: • Collection of cloud resources presented to the customer as a private set of cloud resources, transparently and securely connected to the customer VPN [Diagram labels: Cloud Site X and Cloud Site Y, each with Servers and PE routers; AT&T Backbone; VPN A, VPN B; VPC A, VPC B]
CloudNet System Components • High level abstraction: • Create compute resources • Map into VPN • Cross domain interaction • Cloud Manager: • Create compute resources • Map into VPN (cloud side) • Network Manager (IRSCP): • VPN management (network side) • Dynamic VPN mapping/stitching [Diagram labels: CloudNet Controller, Portal; Cloud Domain: Cloud Platform, Cloud Manager, Servers, CE Router; Network Domain: Network Manager, PE routers, AT&T Backbone; VPN A, VPN B]
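The division of labor on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CloudNet implementation: the class names, method names, VLAN numbering and the one-VLAN-per-VPC rule are all hypothetical. It shows a controller that asks a cloud-side manager to create a VM, then asks a network-side manager to map that VM's VLAN into the customer's VPN.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class CloudManager:
    """Hypothetical cloud-side manager: creates VMs, assigns one VLAN per VPC."""
    _vlan_ids: count = field(default_factory=lambda: count(100))
    vlan_of_vpc: dict = field(default_factory=dict)
    vms: dict = field(default_factory=dict)

    def create_vm(self, vpc: str, image: str) -> str:
        # Assumption: one VLAN per VPC isolates customers inside the cloud site.
        if vpc not in self.vlan_of_vpc:
            self.vlan_of_vpc[vpc] = next(self._vlan_ids)
        vm_id = f"{vpc}-vm{len(self.vms) + 1}"
        self.vms[vm_id] = {"image": image, "vlan": self.vlan_of_vpc[vpc]}
        return vm_id


@dataclass
class NetworkManager:
    """Hypothetical network-side manager: binds a cloud VLAN to a customer VPN."""
    vpn_of_vlan: dict = field(default_factory=dict)

    def map_vlan_to_vpn(self, vlan: int, vpn: str) -> None:
        self.vpn_of_vlan[vlan] = vpn


@dataclass
class CloudNetController:
    """High-level abstraction: create compute resources, then map them into a VPN."""
    cloud: CloudManager = field(default_factory=CloudManager)
    network: NetworkManager = field(default_factory=NetworkManager)

    def add_vm(self, vpc: str, vpn: str, image: str) -> str:
        vm_id = self.cloud.create_vm(vpc, image)       # cloud domain
        vlan = self.cloud.vms[vm_id]["vlan"]
        self.network.map_vlan_to_vpn(vlan, vpn)        # network domain
        return vm_id
```

A caller would only touch the controller, for example `CloudNetController().add_vm("vpcA", "VPN A", "payroll")`, which keeps the cross-domain interaction in one place.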
CloudNet System Components - Network Domain • Network side: Dynamic VPN mapping with IRSCP • On the PE connected to the cloud platform • Pre-configure VPN interfaces with route targets (RTs) • IRSCP rewrites the RTs as needed to dynamically map these interfaces into specific VPNs [Diagram labels: Network Manager, IRSCP, PE routers, VPN A, VPN B]
CloudNet System Components - Network Domain Remapping • Network side: Dynamic VPN mapping with IRSCP • On the PE connected to the cloud platform • Pre-configure VPN interfaces with route targets (RTs) • IRSCP rewrites the RTs as needed to dynamically map these interfaces into specific VPNs [Diagram labels: Network Manager, IRSCP, PE routers, VPN A, VPN B]
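The route-target idea above can be modeled as a toy in Python. This is a sketch under assumptions, not IRSCP's interface: the RT values, the interface names and the `remap` helper are hypothetical. Interfaces on the cloud-facing PE start with a placeholder RT, and rewriting the RT is what places an interface into a specific VPN.

```python
from dataclasses import dataclass

# Hypothetical route-target values, chosen only for illustration.
VPN_ROUTE_TARGETS = {"VPN A": "65000:1", "VPN B": "65000:2"}
UNASSIGNED_RT = "65000:999"


@dataclass
class PeInterface:
    name: str
    route_target: str = UNASSIGNED_RT  # pre-configured placeholder RT


def remap(interface: PeInterface, vpn: str) -> None:
    """Rewrite the interface's RT so that it joins `vpn`."""
    interface.route_target = VPN_ROUTE_TARGETS[vpn]


def members(interfaces, vpn: str):
    """Names of the interfaces whose RT currently matches `vpn`."""
    rt = VPN_ROUTE_TARGETS[vpn]
    return [i.name for i in interfaces if i.route_target == rt]


# Pre-configure a pool of interfaces, then map them on demand.
pool = [PeInterface("if0"), PeInterface("if1"), PeInterface("if2")]
remap(pool[0], "VPN A")
remap(pool[1], "VPN B")
remap(pool[1], "VPN A")  # remapping is just another RT rewrite
```

The point of the design is that no new interface has to be provisioned when a customer's cloud footprint changes; only the RT on an existing interface changes.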
System Components - Cloud Domain Remapping • On the Cloud Computing Platform side, the Cloud Manager: • Dynamically creates and configures a logical router on a physical router platform • Creates virtual machines with the requested images and configuration • Hooks all of them together [Diagram labels: Cloud Computing Platform, Cloud Manager, Servers, Switch, Router, CE, PE]
WAN Migration • Layer 2 VPNs make the WAN act like a LAN • Can use existing LAN migration techniques to move across the WAN
WAN Migration • Layer 2 VPNs make the WAN act like a LAN • Take advantage of LAN migration techniques, suitably enhanced, to move across the WAN [Diagram labels: Layer 2 VPN (VPLS); VPN endpoint; Router; VLAN Switch; Customer Site with CE and PE; Cloud Site 1 and Cloud Site 2 with CE and PE; VMs A and B; ARP messages]
WAN Migration Optimization • Once connectivity is set up, migration requires • Storage Migration • Live Memory Migration • Storage Migration is done through a combination of • Asynchronous copy of disk storage to the remote site initially • Synchronous copy of incremental updates subsequently, during live memory migration • Live Memory Migration needs to balance multiple needs • Total Migration Time for live memory (reduced application performance) • Pause Time (application has to be quiescent for the final transfer) • Amount of Data Transfer (bandwidth requirement)
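The three quantities on this slide trade off against each other, which a small model makes concrete. The sketch below assumes a generic iterative pre-copy scheme with a constant page-dirtying rate and constant link bandwidth; the function name, parameters and stopping threshold are hypothetical, and it is not CloudNet's algorithm. Each round resends the memory dirtied during the previous round, and the VM is paused for the final transfer once the remainder is small enough.

```python
def simulate_precopy(memory_mb, dirty_mb_per_s, bandwidth_mb_per_s,
                     stop_threshold_mb=50.0, max_rounds=30):
    """Estimate total time, pause time and data sent for iterative pre-copy."""
    to_send = float(memory_mb)
    live_time = 0.0   # time spent copying while the VM keeps running
    data_sent = 0.0
    rounds = 0
    while to_send > stop_threshold_mb and rounds < max_rounds:
        round_time = to_send / bandwidth_mb_per_s
        live_time += round_time
        data_sent += to_send
        # Memory dirtied while this round was in flight must be resent.
        to_send = min(float(memory_mb), dirty_mb_per_s * round_time)
        rounds += 1
    pause_time = to_send / bandwidth_mb_per_s  # VM is quiescent for this copy
    return {
        "rounds": rounds,
        "pause_s": pause_time,
        "total_s": live_time + pause_time,
        "data_mb": data_sent + to_send,
    }


fast = simulate_precopy(memory_mb=1000, dirty_mb_per_s=10, bandwidth_mb_per_s=100)
slow = simulate_precopy(memory_mb=1000, dirty_mb_per_s=10, bandwidth_mb_per_s=20)
```

With these illustrative numbers the slower link needs more rounds, a longer pause and more total data, which is why the choice of stopping point matters more over a WAN than over a LAN.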
Live Memory Migration over WANs • There has been quite a bit of work in the past for VM migration over LANs • But these need to be enhanced to work well over WANs, especially to accommodate varying/limited network capacities and latencies • We’ve been working on an approach to combine multiple techniques • Dynamic Stop and Copy • Content Based Redundancy • Incremental updates (page deltas) • Overall benefit is significant reduction in migration and pause times, especially for limited bandwidth between sites • We’ve been experimenting with our implementation for multiple applications
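Two of the techniques named above, content-based redundancy and page deltas, can be illustrated with a short Python sketch. The slides do not give the encoding, so the details here are assumptions: SHA-1 digests to detect pages already sent, and an XOR against the previously sent version compressed with zlib as the delta. The function names and message tags are hypothetical.

```python
import hashlib
import zlib
from typing import Optional

PAGE_SIZE = 4096


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_page(page: bytes, sent_hashes: set, previous: Optional[bytes] = None):
    """Pick the cheapest encoding for one memory page."""
    digest = hashlib.sha1(page).digest()
    if digest in sent_hashes:
        # Content-based redundancy: identical content was already sent.
        return ("ref", digest)
    sent_hashes.add(digest)
    if previous is not None:
        # Page delta: a mostly-zero XOR compresses well when few bytes changed.
        delta = zlib.compress(xor_bytes(previous, page))
        if len(delta) < len(page):
            return ("delta", delta)
    return ("full", page)


def apply_delta(previous: bytes, delta: bytes) -> bytes:
    """Receiver side: rebuild the new page from the old one plus the delta."""
    return xor_bytes(previous, zlib.decompress(delta))


sent = set()
zero_page = bytes(PAGE_SIZE)
first = encode_page(zero_page, sent)    # sent in full the first time
second = encode_page(zero_page, sent)   # later copies become a short reference

old = bytes(range(256)) * 16            # one 4096-byte page
new = bytearray(old)
new[10] = 0xFF                          # a single byte changes between rounds
new = bytes(new)
kind, payload = encode_page(new, sent, previous=old)
```

Both encodings shrink the bytes on the wire per round, which is what reduces migration and pause times when bandwidth between sites is limited.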
Performance of CloudNet Live Migration over WANs [Figure: results for three workloads: Kernel, SpecJBB, TPC-W]
AT&T delivers services globally across a footprint of 38 IDCs, including five ‘Super IDCs’ • North America - 23 • Super IDCs: Annapolis, Piscataway, San Diego • Other locations: Boston, New York, Secaucus, Ashburn, Atlanta (2), Orlando, Miami, Chicago (2), Dallas (2), Nashville, Mesa, Los Angeles (2), San Jose, San Francisco, Seattle, Toronto • Europe - 6 • Super IDC: Amsterdam • Other locations: London, Birmingham, Frankfurt, Paris, Nice • Asia/Pacific - 9 • Super IDC: Singapore • Other locations: Bangalore, Hong Kong, Shanghai (3), Tokyo, Osaka, Sydney • Additional expansions underway and planned for 2009. Locations and dates are subject to change. [Map legend: Super IDCs; Additional AT&T IDCs]
“The Enterprise Cloud” Flexible Delivery Models • Hybrid … • Access to client, partner network, and third party resources • Public … • Efficient, resilient • Preserves capital • Access by subscription • Flexible, Agile, Open, Fast • Leverage broad experience base • Faster time to market • Private … • Limited access • Added security and privacy • Use Cases … • Disaster Recovery • Portable Migration • Follow the Sun; Follow the Moon • Low Cost • Lowest Latency [Diagram labels: Cloud Services, “The Enterprise Cloud”, “Workload Distribution Network”]
Summary • Cloud Computing for enterprises requires: • Security • Transparency • Flexibility • CloudNet can help provide these features • Defines interface between cloud platform and network provider • Uses VPNs for secure, seamless connections • Employs virtualization at server, router, and network levels to improve agility and efficiency Current Work • Algorithms to optimize migration time, pause time and network bandwidth requirements for WAN migration • Network optimizations to reduce latency of WAN migration