Cost of Data Centers

Presentation Transcript


  1. CMPT 880: Large-scale Multimedia Systems and Cloud Computing Cost of Data Centers Queenie Wong

  2. Introduction “Facebook placed over $1 billion into a new datacenter in Iowa...” “Google spent $400 million to expand its datacenter, bringing its total spending in the area to $1.5 billion…” GigaOm Tech News • Does a datacenter really cost over a billion dollars? • How do we calculate and model the cost of building and operating a datacenter? • What is the total cost of ownership (TCO) of a datacenter? • How can the cost be reduced effectively? Queenie Wong

  3. Modeling Costs • Simplified model • Capital expense (Capex) of datacenter and server • Operational expense (Opex) of datacenter and server Total Cost of Ownership (TCO) = datacenter depreciation & Opex + server depreciation & Opex • Costs of software and administrators are omitted from the calculation • Focus on running the physical infrastructure • Costs vary greatly Queenie Wong
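
A minimal sketch of this simplified model, assuming all terms are expressed as monthly dollar figures (the function and parameter names are illustrative, not from the slides):

```python
def total_cost_of_ownership(dc_capex_monthly, dc_opex_monthly,
                            server_capex_monthly, server_opex_monthly):
    """Simplified TCO: monthly facility and server depreciation (amortized
    Capex) plus their Opex; software and administration costs are
    deliberately left out, as in the slides."""
    return (dc_capex_monthly + dc_opex_monthly
            + server_capex_monthly + server_opex_monthly)
```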

  4. Capital Costs • Datacenter construction costs • Server costs • Infrastructure costs • Facilities dedicated to consistent power delivery • Networking • Switches, routers, and load balancers, etc Queenie Wong

  5. Datacenters • Datacenter construction costs • Design, size, location, reliability, and redundancy • Depreciated over 10-15 years • Interest rate • Most large DCs cost $12-15/W to build; the very small or large ones cost more • Approximately 80% goes toward power and cooling; the remaining 20% goes toward the general building and site construction Queenie Wong

  6. Datacenter • Example • Cost of $15/W • Amortized over 12 years • $1.25/W per year • $0.10/W per month • Financing at 8% adds $0.06/W per month • Total of $0.16/W per month Queenie Wong

  7. Servers • Server costs • Depreciated over 3-4 years (shorter lifetime) • Interest rate • Characterize server costs per watt • Example • $4000 server with peak power consumption of 500W • $8/W • Depreciated over 4 years • $0.17/W per month • Financing at 8% adds $0.03/W per month • Total cost of $0.20/W per month Queenie Wong
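
Both per-watt figures follow from a standard level-payment amortization. A minimal sketch, assuming straight annuity financing (the function name is illustrative):

```python
def monthly_cost_per_watt(capex_per_watt, years, annual_rate=0.08):
    """Level monthly payment per watt of capacity: the standard annuity
    formula for capital amortized over `years` and financed at `annual_rate`."""
    n = years * 12              # number of monthly payments
    r = annual_rate / 12        # monthly interest rate
    if r == 0:
        return capex_per_watt / n
    return capex_per_watt * r / (1 - (1 + r) ** -n)

# Datacenter: $15/W over 12 years -> roughly $0.16/W per month
print(round(monthly_cost_per_watt(15, 12), 2))
# Server: $4000 / 500 W = $8/W over 4 years -> roughly $0.20/W per month
print(round(monthly_cost_per_watt(4000 / 500, 4), 2))
```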

  8. Infrastructure Cost • Facilities dedicated to consistent power delivery and to evacuating heat • Generators, transformers, and UPS systems Queenie Wong

  9. Networking • Links & transit • Inter-datacenter links between geographically distributed datacenters • Traffic to Internet Service Providers • Links from regional facilities to wide-area network interconnection sites • Equipment • Switches, routers, load balancers Queenie Wong

  10. Operational Costs • Datacenter • Geographic location factors (Climate, taxes, salary levels) • Design and age • Security • Server • Hardware maintenance • Power Queenie Wong

  11. Power • A 2007 US Environmental Protection Agency (EPA) report predicted that datacenter power consumption could rise to 3% of total US electricity by 2011 • In 2010, datacenters in the US consumed between 1.7% and 2.2% of total US electricity, much lower than the EPA's prediction [Koomey, Analytics Press] • Google's datacenters consumed less than 1% of the electricity used by datacenters worldwide • The cost of electricity is still significant Queenie Wong

  12. Case Study A: High-end Servers Queenie Wong

  13. Case Study B: Low-end Servers Queenie Wong

  14. Real-World Datacenter • Real-world costs are even higher than the model suggests • The model assumes the datacenter is 100% full, with servers at 50% CPU utilization • Empty space is kept for future expansion • Power is provisioned for servers' maximum consumption rather than the average they actually draw, to avoid overheating and tripping a breaker (shut-off) • Reserves of 20–50% • For example • A DC with 10MW of critical power will often consume just 4-6 MW Queenie Wong

  15. Case Study C: Partially Filled Datacenter 3yr TCO = $12,968 Queenie Wong

  16. Energy Efficiency • Datacenter facilities • 30% utilization • Servers • 30% utilization • Power Usage Effectiveness (PUE) • State-of-the-art DC facilities have a PUE of about 1.7 • Inefficient DC facilities have a PUE of 2.0 to 3.0 • Google recently reported a PUE of 1.12 Queenie Wong
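
PUE is the ratio of total facility power to IT equipment power, so it scales the electricity bill directly. A small sketch of that effect (the 1 MW IT load and $0.07/kWh price are assumptions chosen for illustration):

```python
def monthly_electricity_cost(it_load_kw, pue, price_per_kwh=0.07):
    """Monthly utility bill: IT load scaled up by PUE (facility overhead),
    times hours per month, times the electricity price."""
    hours_per_month = 730
    return it_load_kw * pue * hours_per_month * price_per_kwh

# The same 1 MW of IT load under different facility efficiencies
for pue in (1.12, 1.7, 2.0, 3.0):
    print(f"PUE {pue}: ${monthly_electricity_cost(1000, pue):,.0f} per month")
```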

  17. Resilience • Built at the hardware level to mask failures • UPS • Generators • Proposed: build it at the system level • Eliminate expensive infrastructure (generators, UPS) • The failure unit becomes an entire datacenter • The workload of a failed DC can be distributed across sites Queenie Wong

  18. Agility • Any server can be dynamically assigned to any service anywhere in the datacenter • Dynamically grow and shrink server pools while maintaining a high level of security and performance isolation between services • Rapid virtual machine migration • Conventional datacenter designs work against agility • Fragmentation of resources • Poor server-to-server connectivity Queenie Wong

  19. Design Objectives for Agility • Location-independent Addressing • Decouple a server's location from its address • Any server can become part of any server pool • Uniform Bandwidth and Latency • Services can be distributed arbitrarily in the DC • No bandwidth choke points • Achieve high performance regardless of location Queenie Wong

  20. Design Objectives for Agility • Security and Performance Isolation • Any server can be part of any service • Services are sufficiently isolated • Maintain a high level of security • One service has no impact on another • E.g., Denial-of-Service attacks, configuration errors Queenie Wong

  21. Geo-Distribution • Goal: maximize performance • High speed and low latency • Google: 20% revenue loss • Caused by a 500 ms delay in displaying search results • Amazon: 1% sales decrease • Caused by an additional 100 ms delay • Strong motivation for building geographically distributed DCs to reduce delays Queenie Wong

  22. Placement • Optimal placement and size • Diverse locations • Reduce latency between the DC and clients • Helps with redundancy: not all areas lose power at once • Size • Determined by • local demand • physical size • network cost • Maximum benefit Queenie Wong

  23. Geo-Distributing • Resilience at the System Level • Allow an entire DC to fail • Eliminate expensive infrastructure costs, such as UPS systems and generators • Turn geo-diversity into geo-redundancy • Requires applications to be distributed across sites, and frameworks to support this • Balance between communication cost and service performance Queenie Wong

  24. Cost saving approaches • Architectural redesigns • Maximizing utilization of datacenter • Energy-aware load balancing algorithm • Minimizing electricity cost • Energy cost-aware routing scheme • DC power • Virtualization • New cooling technologies • Multi-core servers Queenie Wong

  25. Internet-Scale Systems • Large distributed systems with request routing and replication incorporated • Able to manage millions of users concurrently • Composed of tens or even hundreds of sites • Tolerate faults • Dynamic mapping of clients to servers • Replicate data at multiple sites if necessary Queenie Wong

  26. Energy Elasticity • Assumption: elastic clusters • The energy consumed by a cluster depends on the load placed on it • Ideal: consume no power in the absence of load • Reality: about 60% of peak power in the absence of load • Savings can be achieved by routing power demand away from high-priced areas and turning off under-utilized components • Key: the system's energy elasticity is turned into energy savings Queenie Wong
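
A simple linear power model captures this gap between ideal and real elasticity (the function and numbers below are illustrative, not from the slides):

```python
def cluster_power(utilization, peak_watts, idle_fraction=0.6):
    """Linear power model: a fixed idle draw plus a load-proportional part.
    idle_fraction=0.6 mirrors the slide's 'about 60% of peak' at zero load."""
    idle = idle_fraction * peak_watts
    return idle + (peak_watts - idle) * utilization

print(cluster_power(0.0, 1000))   # 600.0 W even with no load
print(cluster_power(0.3, 1000))   # 720.0 W at 30% utilization
print(cluster_power(1.0, 1000))   # 1000.0 W at full load
```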

  27. Energy cost-aware routing • System requirements • Fully replicated data • Clusters with energy elasticity • Electricity prices with temporal and geographic disparity • Map client requests to clusters so that the total electricity cost of the system is minimized under certain constraints • Applicable to both large and small systems Queenie Wong

  28. Price variation • Geographic • US electricity markets differ regionally • Different generation sources (coal, natural gas, nuclear power, etc.) • Taxes • Temporal • Real-time markets: prices are calculated every 5 minutes • Volatile Queenie Wong

  29. Constraints • Latencies • High service performance with low client latencies • E.g., map a client's request to a cluster within the maximum radial geographic distance • Bandwidth • Temporal and spatial variation • Additional cost when exceeding the limit Queenie Wong

  30. Simulation • Data • Hourly electricity prices (Jan 2006 – Mar 2009) • Akamai workload data set from public clusters in 18 US cities • Insufficient network distance info; only coarse measurements • Routing schemes • Akamai's original allocation • Price-conscious optimizer Queenie Wong

  31. Price-conscious Optimizer • Map a client to the cluster with the lowest price within some predefined maximum radial distance • Consider another cluster if the selected cluster is nearing its capacity • Map a client to the closest cluster when no clusters fall within the maximum radial distance, then consider any other nearby clusters • Controlled by two parameters • Price differential threshold (minimum price difference) • Distance threshold (maximum radial geographic distance) Queenie Wong
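
A sketch of those rules (see also the worked cases near the end of the deck); the data layout and field names here are hypothetical, chosen just to make the logic concrete:

```python
def assign_cluster(clusters, max_distance=1500, price_gap=5):
    """Price-conscious mapping for one client. Each cluster is a dict with
    hypothetical fields: 'price', 'distance' (from this client), 'has_capacity'."""
    nearby = [c for c in clusters if c["distance"] <= max_distance]
    if not nearby:
        # Case 2: nothing within the radius -> fall back to the closest cluster
        return min(clusters, key=lambda c: c["distance"])
    closest = min(nearby, key=lambda c: c["distance"])
    # Case 1: prefer a cheaper in-radius cluster only if it undercuts the
    # closest one by at least `price_gap` and still has spare capacity
    cheaper = [c for c in nearby
               if closest["price"] - c["price"] >= price_gap and c["has_capacity"]]
    return min(cheaper, key=lambda c: c["price"]) if cheaper else closest
```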

  32. Simulation Results • Reduced energy cost • by at least 2% without any increase in bandwidth costs or significant reduction in performance • by 30% with relaxed bandwidth constraints • by around 13% with strict bandwidth constraints • A dynamic solution (with no distance constraint) beats a static solution (placing all servers in the cheapest market) when there are no bandwidth constraints • 45% versus 35% savings Queenie Wong

  33. Cons • Only applicable to locations with temporal and spatial electricity price variation • Increase in routing energy • Delay • Reduction in client performance • Bandwidth • May increase bandwidth costs • Complexity Queenie Wong

  34. VL2 • A practical network architecture that supports agility • Uniform high capacity between servers • Traffic flow should be limited only by the network-interface cards, not by the architecture of the network • Performance isolation between services • Traffic of one service should not be affected by traffic of any other service • Virtual Layer 2 - just as if each service were connected by a separate physical switch Queenie Wong

  35. VL2 • Ethernet layer-2 semantics • Flat addressing • allow services to be placed anywhere • Load balancing to spread traffic uniformly across the DC • Just as if servers were on a LAN - where any IP address can be connected to any port of an Ethernet switch • Configure server with whatever IP address the service expects Queenie Wong

  36. VL2 Addressing Scheme • Separate server names from locations • Two separate address families • Topologically significant Locator Addresses (LAs) • Flat Application Addresses (AAs) Queenie Wong
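
The split lets applications keep a flat AA while a directory service tracks the LA of whatever location currently hosts it. A toy sketch of that decoupling (the addresses and function names are made up for illustration; this is not VL2's actual directory protocol):

```python
# AA -> LA mapping maintained by a directory service (illustrative addresses)
directory = {"10.0.0.5": "20.1.3.2"}

def resolve(aa):
    """Look up the current locator for an application address."""
    return directory[aa]

def migrate(aa, new_la):
    """When a VM moves, only its directory entry changes; the AA stays stable."""
    directory[aa] = new_la

migrate("10.0.0.5", "20.4.7.9")
print(resolve("10.0.0.5"))   # services keep addressing 10.0.0.5 as before
```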

  37. FORTE • FORTE: Flow Optimization based framework for request-Routing and Traffic Engineering • The carbon emissions of a DC depend on the electricity fuel mix in its region • Dynamically controls the user traffic directed to each DC by weighting each request's effect on three metrics: • Access latency • Carbon footprint • Electricity cost Queenie Wong

  38. FORTE • Allows operators to balance performance against cost and carbon footprint by applying a linear programming approach to the user-assignment problem • Then determines whether data replication or migration to the selected DC is needed • Results: • Reduces carbon emissions by 10% without increasing the mean latency or the electricity bill Queenie Wong
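
A toy, single-time-step version of such a traffic-assignment LP, just to make the weighting idea concrete (all numbers, weights, and variable names are invented for illustration and are not FORTE's actual formulation):

```python
import numpy as np
from scipy.optimize import linprog

# Per-unit-of-traffic metrics for three candidate datacenters (made-up values)
latency   = np.array([20.0, 50.0, 90.0])   # ms
carbon    = np.array([0.8, 0.4, 0.2])      # kg CO2
elec_cost = np.array([5.0, 3.5, 3.0])      # $
capacity  = np.array([0.5, 0.6, 0.7])      # max fraction of traffic per DC

# Operator-chosen weights trading off performance, footprint, and cost
w_lat, w_co2, w_usd = 1.0, 10.0, 1.0
c = w_lat * latency + w_co2 * carbon + w_usd * elec_cost

# Decision variables: fraction of user traffic routed to each DC
res = linprog(c,
              A_eq=[[1, 1, 1]], b_eq=[1.0],            # all traffic must be served
              bounds=list(zip([0, 0, 0], capacity)))   # per-DC capacity limits
print(res.x)   # traffic split that minimizes the weighted objective
```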

  39. TIVC • TIVC: Time-Interleaved Virtual Clusters • Problems: • Current resource reservation models only provision CPU and memory resources • Cloud applications have time-varying bandwidth needs • A new virtual network abstraction to specify the time-varying network requirements of cloud applications • Increases utilization of both network resources and VMs Queenie Wong

  40. TIVC • Compared to virtual cluster (VC), TIVC reduced the completion time significantly Queenie Wong

  41. Energy Storage Devices • Different types of Energy Storage Devices (ESDs) • Lead-acid batteries (commonly used in DCs) • Ultra-capacitors (UC) • Compressed Air Energy Storage (CAES) • Flywheels (gaining acceptance in DCs) • Different trade-offs in power and energy costs, lifetime, and energy efficiency • Hybrid combinations may be more effective • Place different ESDs at different levels of the power hierarchy according to their advantages Queenie Wong

  42. Lyapunov Optimization • Online control algorithm to minimize the time-averaged cost • Makes use of the UPS to store electricity • Stores electricity when prices are low and draws from it when prices are high • Does not suffer from the “curse of dimensionality” the way dynamic programming does • Requires no knowledge of the system statistics • Easy to implement Queenie Wong
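
A minimal sketch of the buy-low/use-high idea behind such an online controller (a simplified threshold rule for illustration, not the paper's actual Lyapunov algorithm; all names, units, and parameters are assumed):

```python
def ups_control_step(price, battery_kwh, demand_kwh,
                     capacity_kwh, price_threshold, max_rate_kwh):
    """One time slot: buy extra energy from the grid and charge the UPS when
    the price is below a threshold; serve part of the demand from the UPS
    when the price is above it. Returns (cost this slot, new charge level)."""
    if price < price_threshold and battery_kwh < capacity_kwh:
        charge = min(max_rate_kwh, capacity_kwh - battery_kwh)
        grid_kwh = demand_kwh + charge        # serve the load and charge the UPS
        battery_kwh += charge
    else:
        discharge = min(max_rate_kwh, battery_kwh, demand_kwh)
        grid_kwh = demand_kwh - discharge     # offset the load with stored energy
        battery_kwh -= discharge
    return grid_kwh * price, battery_kwh
```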

  43. Summary • Maximize utilization of datacenters • Minimize the cost of electricity • Architectural redesign of datacenters, networks, and servers • Geo-redundancy to mask the failure of a datacenter • Optimization of resources • Trends • High demand for low-end servers to lower hardware costs, given the low utilization of datacenters • Electricity costs dominate TCO • Power & energy efficiency Queenie Wong

  44. Review Queenie Wong

  45. TCO Comparisons Queenie Wong

  46. TCO Breakdown Queenie Wong

  47. Geo-Redundancy • In the case of a datacenter failure, requests can be directed to a different datacenter • Requirements • Data replication across sites • Special software and frameworks to support it • Pros • Eliminates the cost of infrastructure redundancy • Cons • Expensive inter-datacenter communication costs • Reliability versus communication costs Queenie Wong

  48. Energy Elasticity • Assumption: elastic clusters • The energy consumed by a cluster depends on the load placed on it • Ideal: consume no power in the absence of load • Reality: about 60% of peak power in the absence of load • Savings can be achieved from • routing power demand away from high-priced areas • turning off under-utilized components • Key: the system's energy elasticity is turned into energy savings Queenie Wong

  49. Cost-aware Routing: Case 1 • Map a client to the cluster with the lowest price within some predefined maximum radial distance • Consider another cluster if the selected cluster is approaching its capacity [Figure: client A and clusters C1:50, C2:40, C3:35, C4:43 (cluster : electricity price); distance threshold = 1500, price threshold = 5] Queenie Wong

  50. Cost-aware Routing: Case 2 • Map a client to the closest cluster when no clusters fall within the maximum radial distance • Consider any other nearby clusters [Figure: client B with clusters C3:35 and C4:43, less than 50 km apart; distance threshold = 1500, price threshold = 5] Queenie Wong
