
Power Control for Data Centers




  1. ECE 692 Topic Presentation • Power Control for Data Centers • Ming Chen • Oct. 8th, 2009

  2. Why power control in data centers? Power is one of the most important computing resources.
  • Facility under-utilized: power facilities are costly, so the investment should be economically amortized; provision to fully utilize the power facility.
  • Facility over-utilized: dangerous, risking system failures and overheating; power must be kept below the capacity.

  3. SHIP: Scalable Hierarchical Power Control for Large-Scale Data Centers
  Xiaorui Wang, Ming Chen (University of Tennessee, Knoxville, TN); Charles Lefurgy, Tom W. Keller (IBM Research, Austin, TX)

  4. Introduction
  • Data centers are expanding to meet new business requirements.
  • Expanding the power facility is cost-prohibitive; upgrades of power/cooling systems lag far behind (example: the NSA data center).
  • Power overload may cause system failures.
  • Power provisioning CANNOT guarantee freedom from overload, and over-provisioning causes unnecessary expenses.
  Power control for an entire data center is therefore necessary.

  5. Challenges
  • Scalability: can one centralized controller handle thousands of servers?
  • Coordination: if multiple controllers are designed, how do they interact with each other?
  • Stability and accuracy: workload is time-varying and unpredictable.
  • Performance: how to allocate power budgets among different servers, racks, etc.?

  6. State of the Art
  • Reduce power by improving energy efficiency: [Lefurgy], [Nathuji], [Zeng], [Lu], [Brooks], [Horvath], [Chen]
  • These do NOT enforce a power budget.
  • Power control for a server [Lefurgy], [Skadron], [Minerick] or a rack [Wang], [Ranganathan], [Femal]
  • These cannot be directly applied to data centers.
  • "No Power Struggles" [Raghavendra] presents a multi-level power manager, but it is:
  • NOT designed based on the power supply hierarchy
  • without a rigorous overall stability analysis
  • evaluated only with simulation results for 180 servers

  7. What is This Paper About?
  • SHIP: a highly Scalable Hierarchical Power control architecture for large-scale data centers
  • Scalability: decompose the power control for a data center into three levels.
  • Coordination: the hierarchy is based on the power distribution system in data centers.
  • Stability and accuracy: theoretically guaranteed by Model Predictive Control (MPC) theory.
  • Performance: differentiate power budgets based on performance demands, i.e., utilization.

  8. Power Distribution Hierarchy
  • A simplified example for a three-level data center:
  • Data center level
  • PDU level
  • Rack level
  • Thousands of servers in total

  9. Control Architecture (diagram)
  • A Rack Power Controller (RPC) per rack and a PDU Power Controller per PDU, each with its own power monitor (a Power Monitor per rack, a PDU-Level Power Monitor per PDU).
  • Each server has a Utilization Monitor (UM) and a Frequency Modulator (FM).
  • The rack-level control loop comes from the authors' HPCA'08 paper; the upper levels are the contribution of this paper.

  10. PDU-level Power Model
  • System model: pd(k+1) = pd(k) + Σi Δpi(k), where pd(k) is the total power of the PDU and Δpi(k) is the power change of rack i.
  • Uncertainties: Δpi(k) = gi · Δbi(k), where Δbi(k) is the change of power budget for rack i and gi is the power change ratio.
  • Actual model: pd(k+1) = pd(k) + Σi gi · Δbi(k).
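A minimal sketch (my own illustration, not the authors' code) of how the PDU-level model above can be simulated; the rack count, power values, and gains are made-up assumptions:

```python
# Illustrative simulation of the PDU-level power model:
#   pd(k+1) = pd(k) + sum_i g_i * dbi(k)
# where dbi(k) is the budget change given to rack i and g_i is the uncertain
# power change ratio (assumed to be 1 at controller design time).

def pdu_power_next(pd, budget_changes, gains):
    """One control step of the PDU power model with uncertain gains g_i."""
    return pd + sum(g * db for g, db in zip(gains, budget_changes))

pd = 2400.0                      # current total PDU power (W), made-up value
db = [-50.0, -30.0, 20.0]        # budget changes for three racks (W)
nominal = pdu_power_next(pd, db, [1.0, 1.0, 1.0])   # design-time assumption g_i = 1
actual = pdu_power_next(pd, db, [0.9, 1.2, 1.0])    # gains unknown a priori
print(nominal, actual)           # 2340.0 2339.0
```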

  11. Model Predictive Control (MPC)
  • Control objective: keep the total power of the PDU at its budget by adjusting the power budgets of its racks.
  • Design steps:
  • Design a dynamic model for the controlled system.
  • Design the controller.
  • Analyze the stability and accuracy.

  12. MPC Controller Design (block diagram)
  • Inputs: the power budget and the measured power.
  • A reference trajectory defines the ideal trajectory for tracking the budget after budget changes.
  • The cost function combines the tracking error with a control penalty.
  • A least-squares solver minimizes the cost function subject to the constraints and the system model, and outputs the budget changes.
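As a rough illustration of the block diagram above (not the paper's actual controller), the sketch below solves a one-step least-squares problem that trades off the tracking error against a control penalty and then clips the result to per-rack limits; the penalty weight and limits are assumptions:

```python
# Simplified one-step stand-in for the MPC controller: a least-squares solve whose
# cost combines the tracking error (total power vs. budget) with a control penalty
# on the budget changes. Real MPC also uses a prediction horizon and a constrained
# solver; here constraints are approximated by clipping.
import numpy as np

def allocate_budget_changes(measured_power, power_budget, n_racks,
                            penalty=0.1, max_step=50.0):
    A_track = np.ones((1, n_racks))                  # sum of changes tracks the gap
    b_track = np.array([power_budget - measured_power])
    A_pen = np.sqrt(penalty) * np.eye(n_racks)       # penalize large budget changes
    b_pen = np.zeros(n_racks)
    A = np.vstack([A_track, A_pen])
    b = np.concatenate([b_track, b_pen])
    db, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.clip(db, -max_step, max_step)          # crude per-rack constraint

# Example: PDU measured at 2550 W with a 2400 W budget, split across 3 racks.
print(allocate_budget_changes(2550.0, 2400.0, 3))    # roughly -48.4 W per rack
```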

  13. Stability
  • Local stability:
  • gi is assumed to be 1 at design time but is unknown a priori.
  • The system remains stable for 0 < gi < 14.8, i.e., the actual power change may be up to 14.8 times the allocated budget change.
  • Global stability:
  • Decouple controllers at different levels by running them on different time scales.
  • The period of the upper-level control loop must be longer than the settling time of the lower-level loop.
  • This condition is sufficient but not necessary.

  14. System Implementation
  • Physical testbed:
  • 10 Linux servers
  • Power meter (Wattsup): sampling period 1 s (measurement error given in the paper)
  • Workloads: HPL, SPEC
  • Controller periods: 5 s for the rack level, 30 s for the PDU level
  • Simulator (C++):
  • Simulates large-scale data centers at all three levels
  • Utilization trace files from 5,415 servers in real data centers
  • Power model based on experiments on real servers

  15. Precise Power Control (Testbed)
  • Power can be precisely controlled at the budget; the budget is reached within 4 control periods.
  • The power of each rack is controlled at its budget, with budgets proportional to the racks' estimated maximum power consumptions.
  • Tested for many power set points (see the paper for more results).

  16. Power Differentiation (Testbed)
  • (Figures: budgets differentiated by utilization vs. budget allocation proportional to estimated max consumptions, for racks running at 100%, 80%, and 50% CPU utilization.)
  • Capability to differentiate budgets based on workload to improve performance.
  • Utilization is taken as the optimization weight.
  • Other possible differentiation metrics: response time, throughput.
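The sketch below is a simplified, hypothetical version of utilization-weighted budget allocation (the paper formulates it as a constrained optimization rather than a straight proportional split); all numbers are made up:

```python
# Illustrative utilization-weighted split of a PDU budget across racks: racks with
# higher CPU utilization get budgets closer to their estimated maximum consumption.
# This proportional rule is a simplification of the weighted optimization in the paper.

def differentiate_budgets(pdu_budget, est_max_power, utilizations):
    weights = [u * p for u, p in zip(utilizations, est_max_power)]
    total = sum(weights)
    return [pdu_budget * w / total for w in weights]

# Three racks at 100%, 80%, and 50% CPU utilization (as in the testbed experiment),
# with identical (assumed) 400 W estimated maximum consumptions.
print(differentiate_budgets(900.0, [400.0, 400.0, 400.0], [1.0, 0.8, 0.5]))
```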

  17. Simulation for Large-scale Data Centers
  • Setup 1: a data center with 6 PDUs, 270 racks, and a 750 kW budget, driven by real data traces.
  • Setup 2: 3 randomly generated data centers, also driven by real data traces.

  18. Budget Differentiation for PDUs
  • Power differentiation also works in large-scale data centers (figure shows per-PDU budgets, e.g., PDU2 and PDU5).
  • The controller minimizes the difference from the estimated maximum power consumption, with utilization as the weight.
  • The resulting difference order is consistent with the utilization order.

  19. Scalability of SHIP
  • (Figures: overhead of SHIP; the maximum scale of a centralized controller; execution time of the MPC controller vs. the number of servers.)

  20. Conclusion
  • SHIP: a highly Scalable Hierarchical Power control architecture for large-scale data centers
  • Three levels: rack, PDU, and data center
  • MIMO controllers based on optimal control theory (MPC)
  • Theoretically guaranteed stability and accuracy
  • Discussion of coordination among controllers
  • Experiments on a physical testbed and a simulator
  • Precise power control
  • Budget differentiation
  • Scalable to large-scale data centers

  21. Power Provisioning for a Warehouse-sized Computer
  Xiaobo Fan, Wolf-Dietrich Weber, Luiz Andre Barroso
  Acknowledgments: the organization and contents of some slides are based on Xiaobo Fan's own slides (PDF).

  22. Introduction
  • (Figure: power facility cost of $10-$20/watt vs. electricity at < $0.8/watt-year; values of ~10 and ~18 years shown at utilizations around 0.85 and 0.5.)
  • Strong economic incentives to fully utilize facilities: the investment is best amortized, and upgrades can be made without any new power facility investment.
  • Running close to the limit runs the risk of outages or costly violations of SLAs.
  • Goal: power provisioning given the budget.

  23. Reasons for Facility Under-utilization
  • Staged deployment: new facilities are rarely fully populated.
  • Fragmentation: conservative machine power ratings (nameplate values) leave capacity stranded.
  • Statistical effects: the larger the machine population, the lower the probability of simultaneous peaks.
  • Variable load.

  24. What is This Paper About?
  • Investigate the over-subscription potential to increase power facility utilization.
  • A lightweight and accurate model for estimating power.
  • Long-term characterization of the simultaneous power usage of a large number of machines.
  • Study of techniques for saving energy as well as peak power:
  • Power capping (physical testbed)
  • DVS (simulation)
  • Reducing idle power (simulation)

  25. Data Center Power Distribution (diagram)
  • Main supply → transformer → ATS (backed by a generator) → switchboard (1000 kW) → UPS → STS → PDUs (200 kW each) → panels (50 kW) → circuits (2.5 kW) → racks.
  • Data center level: 5-10 PDUs
  • PDU level: 20-40 racks
  • Rack level: 40-80 servers

  26. Power Estimation Model
  • Direct power measurements are not always available.
  • Input: CPU utilization u.
  • Models:
  • Linear: P(u) = Pidle + (Pbusy − Pidle) · u
  • Empirical: P(u) = Pidle + (Pbusy − Pidle) · (2u − u^r)
  • Measure and derive <Pidle, Pbusy, r> for each family of machines.
  • The main interest is the power of a group of machines rather than a single machine.
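A minimal sketch of the two utilization-based models above; the calibration values for Pidle, Pbusy, and r below are assumed for illustration, not taken from the paper:

```python
# The two power estimation models from the slide, with P_idle, P_busy, and r
# measured per machine family (illustrative values used here).

def power_linear(u, p_idle, p_busy):
    return p_idle + (p_busy - p_idle) * u

def power_empirical(u, p_idle, p_busy, r):
    # The empirical correction term (2u - u^r) fits observed machine behavior better.
    return p_idle + (p_busy - p_idle) * (2.0 * u - u ** r)

# Assumed machine family: 100 W idle, 200 W busy, r = 1.4.
for u in (0.0, 0.25, 0.5, 1.0):
    print(u, power_linear(u, 100, 200), round(power_empirical(u, 100, 200, 1.4), 1))
```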

  27. Model Validation
  • PDU-level validation example (800 machines).
  • There is an almost constant offset, from loads not accounted for in the model such as networking equipment.
  • The relative error is below 1%.

  28. Analysis Setup
  • Data center setup:
  • More than 5,000 servers picked for each workload.
  • Rack: 40 machines; PDU: 800 machines; cluster: 5,000+ machines.
  • Monitoring period: 6 months, sampled every 10 minutes.
  • Distribution of power usage:
  • Aggregate power at each time interval at the different levels.
  • Normalize to the aggregated peak power.

  29. Webmail
  • (Figure: CDFs of normalized power at rack, PDU, and cluster level; peaks between 86% and 92% of aggregate peak, with minimums around 65-72%.)
  • The higher the level, the narrower the range, so it is more difficult to improve facility utilization at lower levels.
  • The peak lowers as more machines are aggregated: since the cluster stays around 86% of its aggregate peak, about 16% more machines can be deployed.

  30. Websearch
  • (Figure: CDFs of normalized power at rack, PDU, and cluster level; peaks between 93% and 98% of aggregate peak, with minimums around 45-52%.)
  • The higher the level, the narrower the range, so it is more difficult to improve facility utilization at lower levels.
  • The peak lowers as more machines are aggregated, but less than for Webmail: with the cluster peaking near 93%, only about 7% more machines can be deployed.

  31. Real Data Centers
  • Clusters have a much narrower dynamic range than racks.
  • Clusters peak at 72% of aggregate peak, so 39% more machines could be deployed.
  • MapReduce shows similar results.

  32. Summary of Characterization
  • Average power indicates the utilization of the power facilities.
  • Dynamic range indicates how difficult it is to improve facility utilization.
  • Peak power indicates the potential for over-subscribed deployment.

  33. Power Capping
  • (Figures: CDFs of power over time, showing the fraction of time spent power-capping and the resulting peak power saving.)
  • Capping is active only a small fraction of the time, yet yields a substantial saving in peak power.
  • It also provides a safety valve when workload behaves unexpectedly.
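A small, illustrative calculation of the trade-off the capping figures show: the fraction of time a cap would be active vs. the peak power it shaves off. The synthetic trace and the 95% cap level are assumptions:

```python
# Given a (synthetic) normalized power trace, pick a cap below the observed peak and
# report how often capping would kick in and how much peak provisioning it saves.
import random

random.seed(0)
trace = [0.6 + 0.2 * random.random() for _ in range(10_000)]  # typical load samples
trace += [0.9 + 0.1 * random.random() for _ in range(100)]    # occasional spikes near peak
peak = max(trace)
cap = 0.95 * peak                                             # assumed cap level

time_in_capping = sum(p > cap for p in trace) / len(trace)
peak_saving = 1.0 - cap / peak

print(f"time spent power-capping: {100 * time_in_capping:.2f}%")
print(f"peak power saving: {100 * peak_saving:.2f}%")
```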

  34. Results for Power Capping
  • Applicable to workloads with loose SLAs or low priority; Websearch and Webmail are therefore excluded.
  • Mechanisms: de-scheduling tasks or DVFS.

  35. CPU Voltage/Frequency Scaling
  • Motivation:
  • A large portion of dynamic power is consumed by the CPU.
  • DVS is widely available in modern CPUs.
  • Method (simulation):
  • Oracle-style policy with utilization thresholds of 5%, 20%, and 50%.
  • CPU power is halved when DVS is triggered.
  • (Figure: CPU power vs. utilization with the DVS threshold marked.)
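A minimal sketch of the oracle-style DVS simulation described above; the split between CPU and non-CPU power and the utilization trace are my assumptions:

```python
# Oracle-style DVS model: whenever utilization is below the threshold, CPU power is
# halved. Machine power is modeled as a fixed non-CPU part plus a CPU part that
# scales with utilization (an assumed split, not the paper's exact model).

def machine_power(u, threshold, cpu_peak=0.4, non_cpu=0.6):
    cpu = cpu_peak * u
    if u < threshold:
        cpu *= 0.5                     # DVS triggered: CPU power halved
    return non_cpu + cpu

trace = [0.10, 0.30, 0.60, 0.90, 0.20]   # made-up utilization samples
for thr in (0.05, 0.20, 0.50):
    energy = sum(machine_power(u, thr) for u in trace)
    peak = max(machine_power(u, thr) for u in trace)
    print(f"threshold {thr:.2f}: energy {energy:.3f}, peak {peak:.3f}")
```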

  36. Results for DVS
  • Energy savings are larger than the peak power reductions.
  • The biggest savings are at the data center level.
  • Benefits vary with the workload.

  37. Lower Idle Power
  • Motivation:
  • Idle power is high (more than 50% of peak).
  • Most of the time is spent at non-peak activity levels.
  • What if idle power were 10% of peak, keeping peak power unchanged? (Evaluated in simulation.)
  • (Figure: CPU power vs. utilization, with idle power lowered from about 0.6 of peak to 0.1.)
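And a tiny, illustrative what-if calculation for the idle-power question above, comparing an idle level of about 0.6 of peak with a hypothetical 0.1 of peak (utilization trace made up):

```python
# Energy comparison when idle power drops from ~0.6 of peak to 0.1 of peak while
# peak power stays unchanged; power is assumed to scale linearly with utilization.

def machine_power(u, idle_frac, peak=1.0):
    idle = idle_frac * peak
    return idle + (peak - idle) * u

trace = [0.05, 0.15, 0.30, 0.50, 0.20, 0.10]     # made-up utilization samples
energy_now = sum(machine_power(u, 0.6) for u in trace)
energy_low = sum(machine_power(u, 0.1) for u in trace)
print(f"energy saving: {100 * (1 - energy_low / energy_now):.1f}%")
```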

  38. Conclusions
  • Power provisioning is important to amortize the facility investment.
  • Load variation and statistical effects lead to facility under-utilization.
  • Over-subscribed deployment is more attractive at the cluster level than at the rack level.
  • Three simple strategies to improve facility utilization: power capping, DVS, and lowering idle power.

  39. Comparison of the Two Papers

  40. Critiques
  • Paper 1:
  • The workload is not typical of real data centers.
  • The power model could incorporate CPU utilization.
  • No convincing baseline is compared against.
  • Paper 2:
  • The trade-off between power provisioning and performance violations is not analyzed.
  • The power model is workload-sensitive.
  • What is the estimation accuracy at the rack level?
  • More quantitative analysis of idle power and peak power reduction is needed.

  41. Thank you!
