
Latency-aware and performance-preserving Power Capping


Presentation Transcript


  1. Latency-aware and performance-preserving Power Capping Arka Bhattacharya, David Culler (UCB); Aman Kansal, Sriram Sankar, Sriram Govindan (Microsoft)

  2. What do I mean by power capping? • Restrict server power consumption to a specific power budget, through manipulation of load or scaling of processor frequency.

  3. Summary of the talk: • Data centers need power capping. • Any power capping technique should be • fast, and • ensure graceful degradation of performance. • Related work has proposed power capping through either • frequency scaling, or • processor utilization capping. • In an open system, using either of these knobs alone can lead to a cascading failure. • Hence, to maintain a stable system one needs to • maintain the desired power level through admission control, and • implement a frequency-scaling governor for safety.

  4. Data Center Cost Analysis • James Hamilton's 2010 figures for a 50k-server, 8 MW facility. • Power distribution and cooling account for close to 20% of the total data center budget. - James Hamilton

  5. Why do power capping? • To under-provision UPS batteries/generators. • Per the previous figure, the annual cost of power distribution and cooling equipment is > $7M for about 50k servers. • Current UPS provisioning is mostly based on worst-case faceplate or spec-power ratings.

  6. [Figure: PDF of the power consumption of a colo containing an online application, contrasting current UPS provisioning (at peak power, leaving headroom) with aggressive UPS provisioning.]

  7. Other reasons for doing power capping • Ensure circuit protection • Reclaim the UPS re-charge budget • Shave off data center peak power usage (for data centers paying peak-pricing rates) • Differentiate service among data center apps • React to changes in power supply from the utility

  8. Ramp rate of power spikes • For two consecutive samples S1 and S2: Power spike = PowerConsumption(S2) − PowerConsumption(S1); Ramp rate = power spike per sampling period. [Figure: power consumption over time, with the 95th-percentile level and samples S1, S2 marked.]
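As a concrete reading of the definitions above, here is a minimal Python sketch of computing power spikes and ramp rates from a sampled power trace; the trace values and the 30 s period below are illustrative, not measured data:

```python
# Sketch: power spikes and ramp rate from a sampled power trace.

def power_spikes(samples, period_s):
    """Return per-interval power spikes (W) and ramp rates (W/s)."""
    spikes = [s2 - s1 for s1, s2 in zip(samples, samples[1:])]
    ramp_rates = [spike / period_s for spike in spikes]
    return spikes, ramp_rates

# Illustrative colo-level trace sampled every 30 s (values in watts).
trace = [5200, 5300, 5900, 6100, 6000]
spikes, rates = power_spikes(trace, period_s=30)
print(max(spikes), max(rates))  # worst spike and worst ramp rate
```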

  9. [Figure: power spikes in the case of an under-provisioned UPS; sampling rate = 30 s.]

  10. For circuit protection: latency analysis of power-capping methods

  11. Prior work in feedback-based power capping, e.g.: • Power budgeting for virtualized data centers - Lim et al. (USENIX ATC, 2011) • Coordinated power control and performance management for virtualized server clusters - Wang et al. (IEEE TPDS, 2010) • SHIP: Scalable hierarchical power control for large-scale data centers (PACT, 2009) • Dynamic voltage scaling in multitier web servers with end-to-end delay control - Horvath et al. (IEEE Trans. Comput., 2007)

  12. Worst-case power rise in servers • Intel Xeon L5520 server: fastest observed power rise (from min to max): 100 ms. • Intel Xeon L5640 server: fastest observed power rise (from min to max): 200 ms.

  13. Methods to decrease server power • DVFS (dynamic voltage and frequency scaling): reduces the frequency and voltage the processor runs at (see the sketch below). • Processor utilization capping: imposes a certain number of idle cycles on the CPU while it runs at the same frequency. • Admission control: reduces the amount of network traffic that the server serves.
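For reference, the DVFS knob is exposed on Linux through the cpufreq sysfs interface; a minimal sketch of capping every core's maximum frequency follows (requires root; the 1.6 GHz target is illustrative):

```python
# Sketch: cap per-core max frequency via the standard cpufreq sysfs files.

import glob

def cap_frequency(khz):
    """Cap every core's maximum frequency to `khz` (kilohertz)."""
    for path in glob.glob(
        "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_max_freq"
    ):
        with open(path, "w") as f:
            f.write(str(khz))

cap_frequency(1_600_000)  # e.g. cap at 1.6 GHz
```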

  14. Timeline of events: (1) central controller gives the actuation command → (2) command reaches the destination server → (3) command received by the daemon → (4) settings changed in hardware; function call returns → (5) power decreases.

  15. Per-stage latencies along the same timeline (current implementation, using user-level code): command reaches the destination server: ~20 ms; command received by the agent: < 1 ms; settings changed in hardware: < 40-60 ms; function call returns: < 1 ms; power decreases: frequency scaling 200-350 ms, processor capping ~2 s, admission control > 2 s*. (* still to be measured accurately)
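A sketch of how such end-to-end actuation latency could be timed on the server; `issue_dvfs_command()` and `read_power()` are hypothetical stand-ins for the controller's actuation path and the platform's power meter:

```python
# Sketch: time from issuing an actuation command until power drops.

import time

def actuation_latency(issue_dvfs_command, read_power, target_watts,
                      timeout_s=5.0):
    """Seconds from issuing the command until power falls below target."""
    start = time.monotonic()
    issue_dvfs_command()          # hypothetical actuation call
    while time.monotonic() - start < timeout_s:
        if read_power() <= target_watts:   # hypothetical power meter read
            return time.monotonic() - start
        time.sleep(0.001)         # poll at ~1 ms granularity
    raise TimeoutError("power did not drop within the timeout")
```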

  16. Take-away 1: if UPS capacity < peak power of the IT equipment, one needs to implement a non-feedback governor (based on DVFS / processor capping / hardware capping).

  17. Why do we need network admission control? • In an open system, if a frequency-scaled server is stuck with more work than it can handle: • server latency goes up (because of filled queues); • requests that get dropped are retried by the clients' TCP stacks; • the total load on the system keeps increasing => cascading failure. • In a closed system: • an implicit admission control takes place, because new requests are not issued until old requests are served; • latency increases, but the system does not spiral into cascading failure. The toy model below illustrates the difference.
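A toy fluid model of the open-system argument; the arrival rate, service rate, and retry fraction are made-up numbers chosen only to show the divergence:

```python
# Toy model: an open system where unserved work is retried by clients.

def open_system_queue(arrival_rate, service_rate, retry_fraction, steps):
    """Queue length per step when dropped/late requests are retried."""
    queue, lengths = 0.0, []
    for _ in range(steps):
        backlog = max(queue - service_rate, 0.0)   # work left unserved
        # New arrivals keep coming, and retries re-add part of the backlog.
        queue = backlog + arrival_rate + retry_fraction * backlog
        lengths.append(queue)
    return lengths

# After frequency scaling, service_rate < arrival_rate: the queue diverges.
print(open_system_queue(arrival_rate=100, service_rate=80,
                        retry_fraction=0.5, steps=10))
# A closed system with N outstanding requests can never exceed queue = N.
```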

  18. Experiment setup to check frequency-scaling effects on closed and open systems • Open system: a Xeon X5550 server running the Wikipedia benchmark on the Apache web server, on Linux. • Closed system: the StockTrader application on Windows Server 2008, Xeon L5520. • In both systems, load was generated by 3-5 external servers.

  19. Open loop: effect of frequency scaling (1) [figure]

  20. Open loop: effect of frequency scaling (2) [figure]

  21. Open loop: effect of frequency scaling (3) [figure]

  22. Frequency scaling: demarcating stable and unstable regions for each frequency [figure]

  23. [Figure: unstable open-loop system due to frequency scaling, with the point at which frequency scaling was applied marked.]

  24. [Figure: open-system power capping, marking the region where power reduction is required and the region where admission control is required.]

  25. Take-away 2: in an open system, one must perform admission control while power capping in order to maintain stability.

  26. Capping a closed online application • Experiment: generate constant load for the server, lower the processor cap gradually, and observe the effect on latency.

  27. Relation between power, response rate and latency

  28. Admission control effectiveness [figure]

  29. Take-away 3: in a closed system, doing admission control along with frequency scaling yields almost the same throughput but with better latency.

  30. Admission control • Assumption: a known relation between network traffic (T) and power consumption (P). • Problem statement: reduce an application's traffic from its current level T1 to T2, such that power goes from its current level P1 to P2. • Challenges: • traffic changes every instant; • a request from a user may spawn multiple flows; • how to do it in an app-agnostic way? A worked example under an assumed power model follows.
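A worked example of the problem statement, assuming (purely for illustration) a linear power model P(T) = P_idle + k·T; all constants below are made up:

```python
# Sketch: pick the traffic target T2 that meets power budget P2,
# under the assumed linear model P = P_idle + k * T.

def target_traffic(p_idle, k, p2):
    """Traffic level T2 (req/s) that yields power budget P2 (W)."""
    return max((p2 - p_idle) / k, 0.0)

# Suppose idle power 150 W, 0.5 W per req/s, current T1 = 400 req/s
# (so P1 = 350 W), and a budget P2 = 300 W.
t2 = target_traffic(p_idle=150.0, k=0.5, p2=300.0)
print(t2)  # -> 300.0 req/s: admit at most 300 req/s, shed the rest
```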

  31. Admission control trade-offs at each layer • Layer 2: • Pro: simple implementation. • Con: all connections get hurt equally.

  32. Admission control, continued • Layer 3: • Pros: • cuts off entire requests, spanning multiple flows; • easy to configure in a firewall; • does not need app-level compliance. • Con: • coarse admission control due to NATs.

  33. Admission control, continued • Layer 4: • Pro: can do finer-grained admission control than IP. • Con: a web page may be served over multiple flows, and different flows of the same request might get different service. • Layer 7: • Pro: has the most insight into the app's workings; can do fine-grained admission control. • Con: the data center needs app compliance / load-balancer compliance. A layer-3 sketch follows below.
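As one concrete instance of the layer-3 option on Linux, inbound connection attempts can be probabilistically dropped with iptables' statistic match; the port and drop probability below are illustrative:

```python
# Sketch: shed a fraction of new inbound TCP connections with iptables.
# Matches only SYN packets, so established connections are untouched.

import subprocess

def drop_fraction(probability, port=80):
    """Drop `probability` of new inbound TCP connections to `port`."""
    subprocess.run(
        ["iptables", "-A", "INPUT", "-p", "tcp", "--syn",
         "--dport", str(port),
         "-m", "statistic", "--mode", "random",
         "--probability", str(probability),
         "-j", "DROP"],
        check=True,
    )

drop_fraction(0.25)  # shed ~25% of incoming connection attempts
```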

  34. Future work • Evaluate the trade-offs of doing network admission control at the different layers. • Devise and implement admission-control algorithms at the various layers.

  35. Thank you
