
CSE 691: Energy-Efficient Computing Lecture 4 SCALING: stateless vs. stateful






Presentation Transcript


  1. CSE 691: Energy-Efficient Computing, Lecture 4: Scaling: stateless vs. stateful • Anshul Gandhi • 1307, CS building • anshul@cs.stonybrook.edu

  2. The AutoScale paper

  3. Data Centers • [Photo: Facebook data center in Oregon] • Collection of thousands of servers • Stores data and serves user requests

  4. Power is expensive • Annual US data centers: 100 billion kWh = $7.4 billion • As much CO2 as all of Argentina • Google is investing in power plants • Most of this power is actually wasted! [energystar.gov, McKinsey & Co., Gartner]

  5. A lot of power is actually wasted • Servers are only busy ~30% of the time on average, but they are often left on, wasting power • BUSY server: 200 Watts • IDLE server: 140 Watts • OFF server: 0 Watts • Setup cost: 260 s at 200 W (+ more) • Testbed: Intel Xeon E5520, dual quad-core, 2.27 GHz • [Figure: demand over time; provisioning for peak keeps servers on even when demand is low]
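The slide's power numbers make the waste easy to quantify. A minimal sketch, using only the figures from the slide (idle 140 W, busy 200 W, setup 260 s at 200 W); the 70% idle fraction below is just the complement of the slide's "busy 30% of the time on average":

```python
# Power states from the slide (Intel Xeon E5520 testbed)
IDLE_W, BUSY_W, OFF_W = 140, 200, 0
SETUP_S, SETUP_W = 260, 200

def idle_energy_wh(hours_idle):
    """Energy in watt-hours burned by a server left idle instead of off."""
    return IDLE_W * hours_idle

def setup_energy_wh():
    """Energy spent bringing an off server back up: 260 s at 200 W."""
    return SETUP_W * SETUP_S / 3600

# A server that sits idle ~70% of the day still draws 140 W the whole time:
daily_idle_waste = idle_energy_wh(0.7 * 24)
print(round(daily_idle_waste))  # 2352 Wh wasted per server per day
```

Compare that to a setup: one power-up costs only about 14 Wh, which is why turning servers off looks attractive despite the setup cost.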

  6. Problem statement • Given unpredictable demand, how do we provision capacity to minimize power consumption without violating response time guarantees (95th percentile)? • Turn servers off: save power • Release VMs: save rental cost • Repurpose servers: get additional work done • [Figure: unpredictable demand over time]

  7. Experimental setup • [Diagram: 28 servers; 7 servers (key-value store); 1 server (500 GB)] • Response time: time taken to complete the request • A single request: 120 ms, 3000 KV pairs

  8. Experimental setup • [Diagram: 28 servers; 7 servers (key-value store); 1 server (500 GB)] • Goal: provision capacity to minimize power consumption without violating the response time SLA • SLA: T95 < 400-500 ms

  9. AlwaysOn • Static provisioning policy • Knows the maximum request rate into the entire data center (rmax = 800 req/s) • What request rate can each server handle? • [Figure: single-server 95th-percentile response time vs. arrival rate; T95 crosses 400 ms at about 60 req/s]
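The AlwaysOn sizing rule follows directly from the slide's two numbers: divide the peak rate by the measured per-server capacity and round up. A sketch:

```python
import math

# Numbers from the slide: peak rate into the data center, and the
# per-server capacity at which T95 reaches the 400 ms SLA.
rmax = 800        # peak request rate (req/s)
per_server = 60   # measured single-server capacity (req/s)

n_always_on = math.ceil(rmax / per_server)
print(n_always_on)  # 14 servers, kept on at all times regardless of load
```

Those 14 always-on servers explain the policy's high average power (Pavg = 2,323 W on the next slide) even during troughs in demand.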

  10. AlwaysOn • T95 = 291 ms • Pavg = 2,323 W

  11. Reactive • T95 = 487 ms, Pavg = 2,218 W • T95 = 11,003 ms, Pavg = 1,281 W • x = 100%

  12. Predictive • Use a window of observed request rates to predict the request rate at time (t + 260) seconds • Turn servers on/off based on this prediction • Linear Regression: T95 = 2,544 ms, Pavg = 2,161 W • Moving Window Average: T95 = 7,740 ms, Pavg = 1,276 W
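A hedged sketch of the two predictors the slide names. Both look at a recent window of observed request rates; the window contents and step granularity here are illustrative, not from the paper:

```python
def moving_window_average(rates):
    """Predict the next request rate as the mean of the observed window."""
    return sum(rates) / len(rates)

def linear_regression(rates, horizon):
    """Fit rate = a*t + b over the window, extrapolate `horizon` steps
    ahead (the slide predicts one setup time, 260 s, into the future)."""
    n = len(rates)
    t_mean = (n - 1) / 2
    r_mean = sum(rates) / n
    cov = sum((t - t_mean) * (r - r_mean) for t, r in enumerate(rates))
    var = sum((t - t_mean) ** 2 for t in range(n))
    a = cov / var                      # slope (req/s per step)
    return r_mean + a * (n - 1 + horizon - t_mean)

window = [100, 120, 140, 160, 180]     # hypothetical observed req/s
print(moving_window_average(window))   # 140.0 -- lags a rising trend
print(linear_regression(window, 5))    # 280.0 -- extrapolates the trend
```

The contrast matches the slide's results: the averaging predictor under-provisions on rising load (hence its much worse T95), while regression tracks trends but overshoots on bursts.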

  13. AutoScale • Predictive and Reactive are too quick to turn servers off • If the request rate rises again, they must wait the full setup time (260 s) • Two new ideas: • Idea 1: wait for some time (twait) before turning idle servers off; choose twait by the heuristic Energy(wait) = Energy(setup), i.e. Pidle · twait = Pmax · tsetup • Idea 2: "un-balance" the load: instead of load balancing evenly, pack jobs on as few servers as possible (about 10 jobs/server) without violating SLAs • [Figure: single-server 95th-percentile response time vs. jobs at server; the SLA holds up to about 10 jobs/server] • [Gandhi et al., Allerton Conference on Communication, Control, and Computing, 2011] • [Gandhi et al., Open Cirrus Summit, 2011]
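Plugging the testbed's numbers into the slide's heuristic Pidle · twait = Pmax · tsetup gives a concrete wait time:

```python
# Heuristic from the slide: keep an idle server on for twait, chosen so
# that waiting costs the same energy as a later setup would.
P_idle = 140   # W, idle server (slide 5)
P_max = 200    # W, busy/setup server (slide 5)
t_setup = 260  # s, setup time (slide 5)

t_wait = P_max * t_setup / P_idle
print(round(t_wait))  # ~371 s: keep an idle server on this long before turning it off
```

Intuitively, because idle power (140 W) is not much below peak power (200 W), the break-even wait is only modestly longer than the setup time itself.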

  14. Results • [Figures: Reactive vs. AutoScale on two traces] • [Gandhi et al., International Green Computing Conference, 2012] • [Gandhi et al., HotPower, 2011]

  15. The CacheScale paper

  16. Application in the Cloud • [Diagram: Load Balancer → Application Tier → Caching Tier → Database; λ req/sec enter, λDB req/sec reach the database] • Why have a caching tier? Reduce database (DB) load (λDB << λ)

  17. Application in the Cloud • [Same diagram; the caching tier accounts for > 1/3 of the cost [Krioukov`10] [Chen`08]] • Why have a caching tier? Reduce database (DB) load (λDB << λ); reduce latency [Ousterhout`10] • Idea: shrink your cache during low load

  18. Will cache misses overwhelm the DB? • Hit rate p: a fraction p of requests are served by the cache; the misses, λDB = λ(1 − p), go to the database • [Diagram: λ req/sec enter; λp hit the cache; λDB req/sec reach the database; slide values: 100, 50, 10, 5 req/sec; p = 0.9, 0.8] • Goal: keep λDB = λ(1 − p) low • If λ drops, (1 − p) can be higher, i.e. p can be lower: SAVE $$$
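A worked instance of the slide's arithmetic; the pairing of λ = 50 with p = 0.8 is one consistent reading of the numbers on the slide, used here for illustration:

```python
def db_load(lam, p):
    """Request rate reaching the database: the cache misses."""
    return lam * (1 - p)

# At peak: 100 req/s with hit rate 0.9 sends 10 req/s to the DB.
print(round(db_load(100, 0.9)))  # 10

# If load halves, the hit rate can drop to 0.8 (a smaller cache)
# and the DB still sees the same 10 req/s:
print(round(db_load(50, 0.8)))   # 10
```

This is the core of the argument: during low load you can tolerate a lower hit rate, so you can rent fewer cache servers without overwhelming the database.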

  19. Are the savings significant? • It depends on the popularity distribution • [Figure: hit rate p vs. % of data cached, for Zipf and Uniform popularity] • Zipf: a small decrease in hit rate permits a large decrease in caching tier size • Uniform: a small decrease in hit rate permits only a small decrease in caching tier size
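A sketch of why the Zipf curve on the slide bends so favorably. The Zipf exponent s = 1 and the item count are assumptions for illustration, not values from the slide:

```python
def zipf_hit_rate(cached_fraction, n=100_000, s=1.0):
    """Hit rate when the top cached_fraction of n Zipf(s)-popular items
    are cached (assumes the cache holds exactly the most popular items)."""
    weights = [1 / (i ** s) for i in range(1, n + 1)]
    k = int(cached_fraction * n)
    return sum(weights[:k]) / sum(weights)

def uniform_hit_rate(cached_fraction):
    """Under uniform popularity, hit rate equals the fraction cached."""
    return cached_fraction

print(round(zipf_hit_rate(0.1), 2))  # ~0.81: 10% of the data serves most requests
print(uniform_hit_rate(0.1))         # 0.1: shrinking the cache hurts proportionally
```

Under Zipf popularity, cutting the cache from 100% to 10% of the data costs only a modest hit-rate drop, which is exactly the "large decrease in caching tier size" the slide points at.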

  20. Is there a problem? • Performance can temporarily suffer if we lose a lot of hot data • [Figure: mean response time (ms) vs. time (min); response time spikes when the cache is shrunk, then stabilizes]

  21. What can we do about the hot data? • We need to transfer the hot data before shrinking the cache • [Diagram: start state and end state of the caching tier under each option] • Option 1: Retiring • Option 2: Transfer to a primary caching tier
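A minimal sketch of the Transfer option (Option 2): copy the hottest keys from a retiring cache node into the surviving primary cache before shutting the node down. All names and the hit-count bookkeeping here are illustrative, not the paper's mechanism:

```python
def transfer_hot_data(retiring_cache, primary_cache, hit_counts, budget):
    """Copy the `budget` most-requested keys from a retiring node into
    the primary cache, so shrinking does not evict the hot working set."""
    hottest = sorted(retiring_cache, key=lambda k: hit_counts[k], reverse=True)
    for key in hottest[:budget]:
        primary_cache[key] = retiring_cache[key]

retiring = {"a": "v1", "b": "v2", "c": "v3"}   # node being decommissioned
hits = {"a": 500, "b": 10, "c": 300}           # observed request counts
primary = {}                                   # surviving cache node

transfer_hot_data(retiring, primary, hits, budget=2)
print(sorted(primary))  # ['a', 'c'] -- only the hot keys move
```

With the hot keys pre-copied, the response-time spike from the previous slide is avoided because misses on hot data never reach the database.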
