Lecture 9: More Cloud Applications

Xiaowei Yang (Duke University)

News: Buffalo as Data Center Mecca • $1.9 billion, at least 200 employees • Low-cost electric power, tax incentives, plenty of shovel-ready sites, cool climate

Review • Cloud Computing • Elasticity • Pay-as-you-go • Challenges • Security: co-residence, inference • Performance • Coarse-grained sharing • Lack of virtualized interface for specialized hardware

Today • Cloud Applications • Execution augmentation for mobile devices • Energy saving for mobile • Energy saving for desktops • Disaster recovery

The Case for Energy-Oriented Partial Desktop Migration Nilton Bila, Eyal de Lara, Matti Hiltunen, Kaustubh Joshi, H. Andrés Lagar-Cavilla, and M. Satyanarayanan

Motivations • Offices and homes have many PCs • But they are often left running idle • PCs are idle on average 12 hours a day • “Skilled in the art of being idle” by Nedevschi et al. in NSDI 2009 • 60% of desktops remain powered overnight • “After-hours power status of office equipment in the USA” by Webber, in Energy 2006

Why is it important? • Dell Optiplex 745 Desktop • Peak power: 280W • Idle power: 102.1W • Sleep power: 1.2W • If we put one to sleep when it is idle, the saving is (102.1-1.2)W.
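The per-desktop saving can be turned into an annual energy figure. The sketch below uses the idle and sleep power from this slide and the 12 idle hours per day cited earlier from Nedevschi et al.; it assumes every idle hour is spent asleep, so the result is an upper bound rather than a measured value. The function name is illustrative.

```python
# Power figures for the Dell Optiplex 745 quoted on the slide (watts).
IDLE_W = 102.1
SLEEP_W = 1.2


def energy_saved_kwh(idle_hours_per_day: float, days: int = 365) -> float:
    """Energy saved by sleeping instead of idling, in kWh.

    Assumes the desktop sleeps for the whole idle period (an upper bound).
    """
    saved_watts = IDLE_W - SLEEP_W  # 100.9 W saved while asleep
    return saved_watts * idle_hours_per_day * days / 1000.0


if __name__ == "__main__":
    # 12 idle hours/day is the average cited on the Motivations slide.
    print(f"{energy_saved_kwh(12):.1f} kWh per desktop per year")
```

Under these assumptions one desktop saves roughly 442 kWh per year.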

Why do we leave desktops on? • Applications with always-on semantics • Skype, IM, email, personal media sharing • Activities interspersed with idle periods • Lunch break • Chatting with colleagues

Related work • Full VM migration • LiteGreen, USENIX 2010 best paper • Encapsulate user session in VM • When idle, migrate VM to consolidation server and power down PC • When busy, migrate back to user’s PC • [Figure: User0 and User1 VMs alongside Dom0 on Xen]

Partial VM migration • An idle VM accesses only part of its memory and disk state (its working set) • Migrate only the working set to a server • Potentially a cloud server • Cloud provider can further aggregate

Advantages • Small migration footprint • Client • Fast migration • Low energy cost • Network • Reduce bandwidth demand • Server • More VMs per server

Feasibility Study • Can a desktop save energy by sleeping while its VM runs in the cloud? • Can an entire domain save energy by migrating idle sessions and letting the desktops sleep?

Methodology • Prototyped a simple on-demand migration approach with SnowFlock • Prepared a VM image and ran the VM • After five minutes, used SnowFlock to clone the VM • Monitored memory and disk page migration to the clone VM

Setup • Dell Optiplex 745 Desktop • 4GB RAM, 2.66GHz Intel C2D • Peak power: 280W • Idle power: 102.1W • Sleep power: 1.2W • VM Image: • Debian Linux 5 • 1GB RAM • 12 GB disk

Workloads

Memory Request Pattern • Spatial locality • Pre-fetching
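To see why spatial locality makes pre-fetching attractive, here is a minimal sketch that counts remote fetch round trips for a sequence of page accesses. It is not the paper's mechanism: the policy (pull the next `window` pages on every miss), the function name, and the sample trace are all assumptions made for illustration.

```python
from typing import Iterable


def count_fetches(page_accesses: Iterable[int], window: int) -> int:
    """Count round trips to the desktop when each miss also pulls
    the `window` pages that follow the missing page.

    window=0 models pure on-demand fetching (no pre-fetching).
    """
    resident = set()  # pages already present on the cloud side
    fetches = 0
    for page in page_accesses:
        if page not in resident:
            fetches += 1
            # Hypothetical policy: fetch the missing page and its successors.
            resident.update(range(page, page + window + 1))
    return fetches


if __name__ == "__main__":
    trace = [10, 11, 12, 13, 50, 51]   # made-up, spatially clustered accesses
    print(count_fetches(trace, window=0))  # every access is a separate fetch
    print(count_fetches(trace, window=3))  # neighbours arrive with the miss
```

With clustered accesses, a small window collapses many requests into a few, which means fewer occasions on which the desktop must be awake.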

Page Request Interval • 98% of requests arrive in close succession

Potential Sleep Intervals
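One way to estimate potential sleep intervals from a page-request trace: the desktop can sleep only in gaps between requests that are long enough to be worth a suspend/resume cycle. The sketch below is an assumption-laden illustration, not the study's exact procedure; `min_gap` (the shortest gap worth sleeping through) and the sample timestamps are hypothetical.

```python
from typing import List, Tuple


def sleep_intervals(request_times: List[float],
                    trace_end: float,
                    min_gap: float) -> List[Tuple[float, float]]:
    """Return (start, end) gaps between page requests that last at
    least `min_gap` seconds. Time 0 is the moment of migration.
    """
    intervals = []
    prev = 0.0
    # Treat the end of the trace as a final boundary.
    for t in sorted(request_times) + [trace_end]:
        if t - prev >= min_gap:
            intervals.append((prev, t))
        prev = t
    return intervals


if __name__ == "__main__":
    times = [1, 2, 3, 100, 101, 400]  # made-up request timestamps (s)
    gaps = sleep_intervals(times, trace_end=600, min_gap=60)
    total = sum(end - start for start, end in gaps)
    print(gaps, total)
```

Because requests arrive in bursts, most of the trace falls into a few long gaps rather than many short ones.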

Energy Savings: an hour-long trace

Hourly Energy Savings: an overnight session • Saves 69% of energy

Memory footprint • A cloud node with 4GB of RAM can run ~30 VMs

Domain-wide Energy Savings

Annual Energy Savings • No partial migration

Annual Energy Savings • V = 23

Annual Savings

Open issues • Can it save cost? • Network • Cloud rental • Frequent power cycling reduces hardware life expectancy and limits power savings • Reduce the number of sleep cycles and increase sleep duration • Predict page access patterns and prefetch • Leverage content-addressable memory • Fast reintegration • Big question: can it be fast enough that a user does not suffer a long delay? • Policies • When to migrate/re-integrate? • When does the desktop go to sleep? • On re-integration, should state be maintained in the cloud? For how long?

Disaster Recovery as a Cloud Service: Economic Benefits & Deployment Challenges Timothy Wood and Emmanuel Cecchet, University of Massachusetts Amherst; K.K. Ramakrishnan, AT&T Labs—Research; Prashant Shenoy, University of Massachusetts Amherst; Jacobus van der Merwe, AT&T Labs—Research; Arun Venkataramani, University of Massachusetts Amherst

Datacenter Disasters • Disasters cause expensive application downtime • Truck crash shuts down Amazon EC2 site (May 2010) • Lightning strikes EC2 data center (May 2009) • Comcast Down: Hunter shoots cable (2008) • Squirrels bring down NASDAQ exchange (1987 and 1994)

DR Fits in the Cloud • Customer: pay-as-you-go and elasticity • Normal operation is cheap (backup needs fewer resources than running the application) • Rapidly scale up resources after a disaster is detected • Provider: high degree of multiplexing • Customers are unlikely to all fail at once • Can offer extra services like disaster detection

What is disaster recovery • Use DR services to prevent lengthy service disruptions • Data backups + failover mechanism • Periodically replicate state • Switch to backup site after disaster

DR Metrics • Recovery Point Objective (RPO): the point in time of the most recent backup prior to any failure, which bounds how much data can be lost • Recovery Time Objective (RTO): how long it can take for an application to come back online after a failure occurs • Time to detect failure • Provision servers • Initialize applications • Configure networks to connect
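The RTO components listed above can be added up to check a recovery plan against its objective. This is a minimal sketch that assumes the four steps run strictly one after another; the class name, field names, and the durations in the example are hypothetical, not figures from the paper.

```python
from dataclasses import dataclass


@dataclass
class RecoverySteps:
    """Durations (seconds) of the recovery steps listed on the slide."""
    detect_s: float       # time to detect the failure
    provision_s: float    # provision servers
    init_apps_s: float    # initialize applications
    network_s: float      # configure networks to connect

    def recovery_time_s(self) -> float:
        # Assumption: steps are sequential, so their durations simply add.
        return (self.detect_s + self.provision_s
                + self.init_apps_s + self.network_s)

    def meets_rto(self, rto_s: float) -> bool:
        return self.recovery_time_s() <= rto_s


if __name__ == "__main__":
    plan = RecoverySteps(detect_s=30, provision_s=120,
                         init_apps_s=60, network_s=45)  # made-up numbers
    print(plan.recovery_time_s(), plan.meets_rto(rto_s=300))
```

If some steps can overlap in practice, the sum overestimates the recovery time, so this check is conservative.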

Performance • Have a minimal impact on the performance of each application being protected under failure-free operation • How can DR impact performance? • Consistency • The application can be restored to a consistent state • Geographic separation • Challenge: increasing network latency

DR Mechanisms • Hot Backup Site • Provides a set of mirrored stand-by servers that are always available • Minimal RTO and RPO • Use synchronous replication to prevent any data loss

Warm Backup Site • Cheaply synchronize state during normal operations • Obtain resources on demand after failure • Short delay to provision resources and start applications

Cost analysis study • Compare DR in a colocation center to DR in the cloud • Colocation • Pays for servers and space at all times • Cloud DR • Pays for resources as they are used

Case Study 1 • RUBiS: an ebay-like multi-tier web application • Three front ends • One database server • Only database state is replicated

Cost analysis • 99% Uptime cost (3 days of disaster per year)
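A rough model of how uptime weights the two cost modes: at 99% uptime, 1% of the year (about 3.65 days, which the slide rounds to 3) is spent in disaster mode. The sketch below is not the paper's cost model; the hourly rates are hypothetical placeholders, chosen only to show the shape of the comparison.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def cloud_dr_yearly_cost(normal_per_hour: float,
                         disaster_per_hour: float,
                         uptime: float = 0.99) -> float:
    """Cloud DR: cheap replication most of the year, full-size
    resources only while the primary site is down."""
    normal_hours = HOURS_PER_YEAR * uptime
    disaster_hours = HOURS_PER_YEAR * (1 - uptime)
    return normal_per_hour * normal_hours + disaster_per_hour * disaster_hours


def colo_yearly_cost(per_hour: float) -> float:
    """Colocation: servers and space are paid for at all times."""
    return per_hour * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical rates, not taken from the case study.
    cloud = cloud_dr_yearly_cost(normal_per_hour=0.10, disaster_per_hour=1.00)
    colo = colo_yearly_cost(per_hour=1.00)
    print(f"cloud: ${cloud:.2f}/yr, colo: ${colo:.2f}/yr")
```

The larger the gap between the normal-mode rate and the post-disaster rate, the more the pay-as-you-go model gains over a fixed colocation bill.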

Case Study 2: Data Warehouse • Post-disaster operation is expensive because it needs a high-powered VM instance • Still cheaper overall, because at 99% uptime that instance is rarely needed

RPO vs Cost Tradeoff • Flexible • Colo has a fixed cost regardless of RPO requirements

Cost Analysis Summary • Cloud DR’s benefits depend on • Type of resources needed to run the application • Variation between normal and post-disaster costs • RPO and RTO requirements • Uptime • Cloud is better if the post-disaster cost is much higher than the normal-mode cost

Provider Challenges • How to maximize revenue? • Makes money from storage in the normal case • But must pay for servers and keep them available for DR • Possible solutions • Spot instances (EC2 uses them) • Higher prices for higher-priority resources • Correlated failures • Large disasters may affect many customers • Possible solutions • Decide provisioning using a risk model • Spread out customers
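As a toy illustration of provisioning with a risk model, the sketch below finds how many customers' worth of standby capacity a provider needs so that the chance of running short stays below a target. It deliberately assumes customers fail independently with the same probability, which is exactly what correlated failures violate, so it is a baseline rather than the paper's model. All names and numbers are hypothetical.

```python
from math import comb


def capacity_needed(n_customers: int, p_fail: float,
                    max_shortfall_prob: float) -> int:
    """Smallest k such that P(more than k customers are in disaster
    mode at once) <= max_shortfall_prob.

    Assumption: failures are independent and identically likely
    (binomial model); correlated disasters would need more capacity.
    """
    tail = 1.0  # P(X > k), updated as k grows
    for k in range(n_customers + 1):
        tail -= (comb(n_customers, k)
                 * p_fail ** k * (1 - p_fail) ** (n_customers - k))
        if tail <= max_shortfall_prob:
            return k
    return n_customers


if __name__ == "__main__":
    # Made-up inputs: 100 customers, each in disaster 1% of the time.
    print(capacity_needed(100, 0.01, max_shortfall_prob=0.01))
```

Under independence the provider can reserve capacity for only a handful of the 100 customers, which is the multiplexing gain; spreading customers across regions is one way to push real deployments closer to that assumption.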

Mechanisms Needed for Cloud DR • Network reconfiguration • Application must be brought back online after being moved to a backup site • May require setting up a private business network • Security and isolation • VM migration and cloning • Restore an application to the primary site after a disaster is handled • Cloud providers do not yet support VM migration into and out of the cloud

Summary • Cloud-based disaster recovery • Can reduce cost • Up to 85% in a case study • Flexible tradeoff between cost and RPO

Forecast • Next lecture • Another cloud application for group collaboration • Monday is during fall break • Next Wednesday • Midterm • http://www.cs.duke.edu/courses/fall10/cps296.2/syllabus.html
