Virtualizing Mission-Critical Apps
1PM EST, 3/29/2011
Ilya Mirman, Philip Thomas


Agenda
• The Rise of “The Virtualization Chasm”
• 3 Fundamental Inefficiencies
• Best Practices
• Live Demonstration

Background

Before Virtualization
• Traditional IT guarantees apps’ performance by:
  • Dedicating physical machines (PMs) to apps
  • Provisioning sufficient capacity to service peak loads, with excess capacity to keep utilization under 80% at peak CPU workload
• Consider an app requiring 16 cores, 8 GB of memory and 10k IOPS (IO operations per second) of IO bandwidth to service its peaks
[Figure: PM capacity sized for the app’s peaks: CPU 16 cores, memory 8 GB, IO 10k IOPS]

Over-Provisioning Waste
• Workloads are ‘bursty’: the average-to-peak ratio is often under 10%
• Dedicating hardware wastes the slack capacity between average and peak
[Figure: capacity over-provisioned for peak demands; wasted capacity above an average utilization of 10%]
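The average-to-peak arithmetic can be made concrete with a short sketch. It assumes a hypothetical trace of 20 CPU samples (in cores) for an app whose dedicated PM is sized for its 16-core peak; `utilization_stats` is an illustrative helper, not part of any product.

```python
def utilization_stats(demand, capacity):
    """Return (average utilization, average/peak ratio, idle fraction) for a demand trace
    served by a dedicated machine of the given capacity."""
    avg = sum(demand) / len(demand)
    peak = max(demand)
    return avg / capacity, avg / peak, 1 - avg / capacity


# Hypothetical bursty trace (cores): one 16-core burst, otherwise 0-1 cores.
cpu_cores = [1] * 16 + [16] + [0] * 3

util, avg_to_peak, idle = utilization_stats(cpu_cores, capacity=16)
print(f"average utilization {util:.0%}, average/peak {avg_to_peak:.0%}, idle {idle:.0%}")
# prints: average utilization 10%, average/peak 10%, idle 90%
```

Because the PM is sized for the peak, roughly 90% of its CPU capacity sits idle on average in this hypothetical trace.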

Virtualization Is Set to Resolve This Waste
• Consolidate workloads into shared PMs
• This increases average utilization additively
• But it also increases interference among VMs
  • E.g., peak traffic of VM1 can interfere with CPU availability for other VMs
[Figure: peak workloads of VM1 to VM10 consolidated into shared PMs]
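A small sketch of the consolidation arithmetic, using hypothetical per-interval CPU traces: averages add exactly, so the shared PM’s average utilization rises additively, while the peak of the combined load is at most the sum of the individual peaks. If the bursts coincided instead, the combined peak would approach the sum of peaks, which is the interference described above. The helper names are illustrative.

```python
def consolidate(traces):
    """Per-interval total demand of several VMs placed on one shared PM."""
    return [sum(values) for values in zip(*traces)]


def average(xs):
    return sum(xs) / len(xs)


# Hypothetical CPU traces (cores) for three VMs whose bursts fall in different intervals.
vm1 = [1, 1, 6, 1]
vm2 = [6, 1, 1, 1]
vm3 = [1, 6, 1, 1]

combined = consolidate([vm1, vm2, vm3])
print(combined)                                               # [8, 8, 8, 3]
print(average(combined), sum(map(average, [vm1, vm2, vm3])))  # 6.75 6.75
print(max(combined), sum(map(max, [vm1, vm2, vm3])))          # 8 18
```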

VMs Compete for Resources
• Best-effort resource allocations (vs. dedicated):
  • VMs get their allocations if capacity is available
  • VMs experience interference when capacity is insufficient
  • Interference can create congestion, bottlenecks and delays
• Performance-insensitive apps can tolerate interference, permitting simple, risk-free virtualization
• But mission-critical apps are highly vulnerable to interference!

The Rise of “The Virtualization Chasm”
[Figure: ROI vs. percentage of apps virtualized (20% to 100%); Virtualization 1.0 (performance-insensitive apps) and Virtualization 2.0 (production apps) are separated by “The Virtualization Chasm”]
• Virtualization 1.0: virtualize performance-insensitive apps
  • E.g., print servers, non-critical web apps (the low-hanging fruit)
  • 20%-30% of enterprise apps
• Virtualization 2.0: virtualize production apps
  • The remaining 70%-80%: important/critical production apps

Virtualizing Mission-Critical Apps

The Key Challenge: Ensuring That Production Apps Get Their Resources
• Interference results from statistical over-commitment: apps’ combined demands can momentarily exceed capacity
• Interference may be controlled by two mechanisms:
  • Resource allocation: protect apps against over-commitment
  • Workload placement: move workloads to minimize interference
• Let’s take a look at recommendations from the hypervisor vendors…
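Statistical over-commitment can be illustrated with hypothetical traces: two VMs whose combined average is well under the PM’s 16 cores still exceed it in one interval when their bursts coincide. `overcommit_intervals` is an illustrative helper assumed for this sketch.

```python
def overcommit_intervals(traces, capacity):
    """Return (intervals where total demand exceeds capacity, fraction of such intervals)."""
    totals = [sum(values) for values in zip(*traces)]
    hot = [t for t, total in enumerate(totals) if total > capacity]
    return hot, len(hot) / len(totals)


# Hypothetical CPU demand (cores) of two VMs sharing a 16-core PM.
vm_a = [2, 3, 9, 2, 2, 3]
vm_b = [3, 2, 8, 3, 2, 2]

hot, fraction = overcommit_intervals([vm_a, vm_b], capacity=16)
avg_total = (sum(vm_a) + sum(vm_b)) / len(vm_a)
print(hot, round(fraction, 3), round(avg_total, 2))  # [2] 0.167 6.83
```

On average the pair needs fewer than 7 cores, yet in interval 2 it asks for 17, so one VM must wait: a momentary over-commitment that averages alone would hide.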

VMware Best Practices: Managing Production Apps’ Performance
• Avoid over-commitment: “For performance-critical Exchange virtual machines (i.e., production systems), try to ensure the total number of vCPUs assigned to all the virtual machines is equal to or less than the total number of cores on the host machine.”
• Assure peak utilization: “It is recommended that standalone servers…be designed to not exceed 70% utilization during peak period.”
Source: Best Practice Guide to Exchange Server Virtualization: http://www.vmware.com/files/pdf/Exchange_2010_on_VMware_-_Best_Practices_Guide.pdf
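The two quoted guidelines translate into simple checks. This is a hypothetical helper, not a VMware tool; it applies the 70% figure (which the guide states for standalone servers) to a whole host purely for illustration.

```python
def check_host(vcpus_per_vm, host_cores, peak_utilization, peak_limit=0.70):
    """Return the guideline violations for one host (an empty list means it passes).

    vcpus_per_vm: vCPUs assigned to each VM on the host.
    peak_utilization: observed or planned peak utilization as a fraction (0.65 = 65%).
    """
    problems = []
    total_vcpus = sum(vcpus_per_vm)
    if total_vcpus > host_cores:
        problems.append(f"over-commitment: {total_vcpus} vCPUs on {host_cores} cores")
    if peak_utilization > peak_limit:
        problems.append(f"peak utilization {peak_utilization:.0%} exceeds {peak_limit:.0%}")
    return problems


print(check_host([4, 4], host_cores=8, peak_utilization=0.65))  # []
print(check_host([4, 4, 2], host_cores=8, peak_utilization=0.75))
```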

VMware Best Practices: Managing Production Apps’ Performance
VMware’s production-apps strategy rests on 2 rules. VMs running production apps should ensure that:
• R-I: “Resource allocations are sufficient to serve peak demands.” R-I guarantees that an app can get its peak demands served, if capacity is available.
• R-II: “Aggregate allocations do not exceed the PM capacity.” R-II guarantees that the allocated capacity will be available.
• I.e., if VM1 and VM2 each need 4 vCPUs, we need a PM with ≥8 physical cores!
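Rules R-I and R-II can be checked mechanically. This sketch assumes each VM’s allocation and peak demand are known in the same units (here vCPUs); the `VM` type and `check_rules` function are hypothetical. The example reproduces the case of two 4-vCPU VMs.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    allocation: float   # reserved capacity, e.g. vCPUs
    peak_demand: float  # peak demand in the same units


def check_rules(vms, pm_capacity):
    """Return (names of VMs violating R-I, whether R-II holds for this PM)."""
    r1_violations = [vm.name for vm in vms if vm.allocation < vm.peak_demand]
    r2_holds = sum(vm.allocation for vm in vms) <= pm_capacity
    return r1_violations, r2_holds


vms = [VM("VM1", allocation=4, peak_demand=4), VM("VM2", allocation=4, peak_demand=4)]
print(check_rules(vms, pm_capacity=8))  # ([], True)
print(check_rules(vms, pm_capacity=6))  # ([], False)
```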

Wait… Really? Then Why Virtualize?
• Under rules R-I and R-II there is no sharing of resources, but you still enjoy the other benefits of virtualization (app isolation, VM set-up, back-up, etc.)

Virtualization Can Result in 3 Fundamental Inefficiencies
1. Over-provisioning inefficiency
2. Workload-packing inefficiency
3. Non-adaptive control inefficiency
These fundamental inefficiencies are considered next…

Over-provisioning Inefficiency

How to Avoid Over-Provisioning Waste?
• To avoid waste: increase the average workload without increasing reservations
• Add performance-insensitive apps with a high average workload
  • E.g., consolidate spam-filter and email-archival apps alongside mission-critical apps
• This needs an additional best-practice rule: smart consolidation
Best Practice #1: Maintain a consolidation balance between performance-sensitive and performance-insensitive workloads
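Balanced consolidation can be sketched as follows, assuming the hypervisor always serves the mission-critical app first (for example through its reservation or higher priority) and gives a performance-insensitive batch app only the leftover capacity. The traces and the `run_best_effort` helper are hypothetical. The critical app’s 16-core reservation is unchanged, yet average PM utilization rises from 10% to about 57.5%, and the batch app yields entirely during the critical peak.

```python
def run_best_effort(critical, batch_demand, capacity):
    """Serve the critical app first in every interval; the batch app gets only the leftover.

    Returns the batch capacity actually served per interval and the PM's average utilization.
    """
    served_batch = [min(batch_demand, capacity - c) for c in critical]
    utilization = [(c + b) / capacity for c, b in zip(critical, served_batch)]
    return served_batch, sum(utilization) / len(utilization)


# Hypothetical critical-app trace (cores): reserved for its 16-core peak, 10% average.
critical = [1] * 16 + [16] + [0] * 3

served, avg_util = run_best_effort(critical, batch_demand=8, capacity=16)
print(served[16], f"{avg_util:.1%}")  # 0 57.5%
```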

Workload-Packing Inefficiency

A Greatly Simplified Example
[Figure: manual ad hoc assignment of virtualized workloads VM1 to VM6 to PM1, PM2 and PM3; each PM has a CPU capacity of 16 cores, a memory capacity of 8 GB and an IO capacity of 10k IOPS]

What If We Get New VMs?
[Figure: new VMs VM7 to VM10 arrive; ad hoc assignment spreads the workloads across PM1 to PM5]
• Can we do better?
• Optimized assignment uses 40% fewer resources (3 PMs vs. 5)
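The gap between ad hoc and optimized assignment can be reproduced in one dimension (CPU cores only) with a classic bin-packing heuristic. This sketch compares a naive arrival-order “next fit” placement with first-fit decreasing (FFD); the VM sizes are hypothetical, chosen so the outcome mirrors the 5-vs-3 PM result, and FFD is a standard textbook heuristic rather than the method behind the slide.

```python
def next_fit(sizes, capacity):
    """Ad hoc placement: fill the current PM in arrival order; open a new PM when a VM doesn't fit."""
    pms = [[]]
    for size in sizes:
        if sum(pms[-1]) + size > capacity:
            pms.append([])
        pms[-1].append(size)
    return pms


def first_fit_decreasing(sizes, capacity):
    """Sort VMs largest first and put each into the first PM with enough room left."""
    pms = []
    for size in sorted(sizes, reverse=True):
        for pm in pms:
            if sum(pm) + size <= capacity:
                pm.append(size)
                break
        else:
            pms.append([size])
    return pms


# Hypothetical CPU sizes (cores) of arriving VMs; each PM has 16 cores.
arrivals = [8, 12, 8, 12, 4, 4]
print(next_fit(arrivals, 16))              # [[8], [12], [8], [12, 4], [4]]
print(first_fit_decreasing(arrivals, 16))  # [[12, 4], [12, 4], [8, 8]]
```

FFD is not guaranteed to be optimal in general, but sorting largest-first typically packs much tighter than placing VMs in whatever order they happen to arrive.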

What Can We Learn from This Example?
• Changes may require (re-)assignment of workloads
• Even a greatly simplified example can be very complex
• Complexity and waste can grow dramatically:
  • When the number of VMs increases
  • When physical machines vary
  • When there are constraints (e.g., storage access, security policies)
  • When the rate of change is high
• Ad hoc processes can lead to costly inefficiencies
• Planning and workload placement must consider all resource types (not just CPU)
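Why placement must look beyond CPU can be shown with a multi-resource variant of first-fit decreasing. This is a sketch, not the deck’s algorithm: it assumes the example PM’s capacities (16 cores, 8 GB, 10k IOPS), hypothetical VM demands, and orders VMs by their largest demand relative to capacity.

```python
from typing import NamedTuple


class Res(NamedTuple):
    cpu: float   # cores
    mem: float   # GB
    iops: float  # k-IOPS


# Per-PM capacity taken from the deck's example PM: 16 cores, 8 GB, 10k IOPS.
PM_CAPACITY = Res(cpu=16, mem=8, iops=10)


def fits(used, demand, cap=PM_CAPACITY):
    """True only if the VM fits on every resource dimension, not just CPU."""
    return all(u + d <= c for u, d, c in zip(used, demand, cap))


def add(a, b):
    return Res(*(x + y for x, y in zip(a, b)))


def place(vms, cap=PM_CAPACITY):
    """Multi-resource first-fit decreasing, ordering VMs by their largest normalized demand."""
    order = sorted(vms, key=lambda vm: max(d / c for d, c in zip(vm[1], cap)), reverse=True)
    pms = []  # each entry: (resources used so far, names of VMs placed there)
    for name, demand in order:
        for i, (used, names) in enumerate(pms):
            if fits(used, demand, cap):
                pms[i] = (add(used, demand), names + [name])
                break
        else:
            pms.append((demand, [name]))
    return pms


# Hypothetical VMs: (name, Res(cpu, mem, iops)).
vms = [
    ("web", Res(4, 1, 1)),
    ("db", Res(4, 6, 8)),
    ("batch", Res(8, 4, 1)),
    ("cache", Res(2, 3, 0.5)),
]

# db + batch need only 12 of 16 cores but 10 GB of an 8 GB PM: a CPU-only check would accept it.
print(fits(Res(4, 6, 8), Res(8, 4, 1)))    # False
print([names for _, names in place(vms)])  # [['db', 'web'], ['batch', 'cache']]
```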

Overcoming the Packing Inefficiency
• Use improved workload placement algorithms that:
  • Look holistically at all workloads and resources
  • Exploit the flexibility of performance-insensitive workloads
  • Exploit the dynamics of workload peaks & troughs
Best Practice #2: Use improved workload placement algorithms
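Exploiting peaks and troughs can be sketched as a compatibility test between two workloads: instead of adding their peaks, compare the capacity against their combined demand interval by interval. The hourly profiles below are hypothetical, and the approach assumes the profiles are predictable; if the peaks drift into each other, the placement would need to be revisited.

```python
def peak_of_sum(*traces):
    """Highest combined demand in any interval when the traces share one PM."""
    return max(sum(values) for values in zip(*traces))


def can_share_by_peaks(a, b, capacity):
    """Conservative check: treats both peaks as if they happened at the same time."""
    return max(a) + max(b) <= capacity


def can_share_over_time(a, b, capacity):
    """Time-aware check: only the combined demand in each interval must fit."""
    return peak_of_sum(a, b) <= capacity


# Hypothetical hourly CPU demand (cores): a nightly job and a daytime app.
night_job = [12 if 1 <= h < 6 else 2 for h in range(24)]
day_app = [10 if 9 <= h < 17 else 2 for h in range(24)]

print(can_share_by_peaks(night_job, day_app, 16))   # False (12 + 10 > 16)
print(can_share_over_time(night_job, day_app, 16))  # True (combined peak is 14)
```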

Non-adaptive Control Inefficiency

Mission-Critical App Example
• Virtualized MS Exchange app
• High IOPS during the night (2AM-5AM)
  • Peak: 10k IOPS
  • <1k IOPS during the rest of the time
[Figure: IO rate (k-IOPS) vs. time of day over 24 hours]
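To quantify what a static reservation costs for this profile, a rough sketch compares reserving the 10 k-IOPS peak around the clock with reserving each hour’s expected demand. It assumes the 2AM-5AM window covers the hours starting at 2, 3 and 4, and treats “<1 k-IOPS” as exactly 1 k-IOPS; reserved k-IOPS-hours is an illustrative metric, not one the deck defines.

```python
def exchange_profile(hour):
    """Hypothetical hourly IO demand (k-IOPS) modeled on the slide: a 10 k-IOPS peak from
    2AM to 5AM, and 1 k-IOPS (an upper bound for "<1 k-IOPS") the rest of the day."""
    return 10.0 if 2 <= hour < 5 else 1.0


hourly_demand = [exchange_profile(h) for h in range(24)]

# Static policy: reserve the peak all day. Time-varying policy: reserve each hour's demand.
static_reservation = [max(hourly_demand)] * 24
timed_reservation = hourly_demand

static_total = sum(static_reservation)  # 240.0 k-IOPS-hours
timed_total = sum(timed_reservation)    # 51.0 k-IOPS-hours
savings = 1 - timed_total / static_total
print(static_total, timed_total, f"{savings:.0%}")  # 240.0 51.0 79%
```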

What If Workloads Grow?
• What if VM1 needs more memory & storage?
[Figure: ad hoc vs. optimized reassignment of VM1 to VM6 across PMs after VM1 grows]
• Can we do better?
• Optimized assignment uses 25% fewer resources

Adaptive vs. Non-Adaptive Workload Control
• Workload demands (and interference) change over time
  • E.g., the Exchange server is active through the night
  • Why keep its reservation during the day?
• Static workload mgmt is limited in handling emergent problems
  • App profiles reflect long-term statistics; fluctuations can cause interference
• Adaptive workload control offers superior mgmt
  • Exploit workload dynamics to reduce the waste of static policies
  • Eliminate emergent interference
Best Practice #3: Provide adaptive control to optimize resource use & avoid interference
Best Practice #4: Use forward-looking workload projection
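Forward-looking projection feeding adaptive control can be sketched as: forecast each hour of the next day from recent history, then set that hour’s reservation to the forecast plus headroom. The per-hour exponential-smoothing forecast, the 20% headroom and the 10 k-IOPS cap are all assumptions made for illustration; the deck does not specify a projection method.

```python
def seasonal_forecast(history, alpha=0.5):
    """Forecast tomorrow's 24 hourly demands from past days (oldest first).

    For each hour, an exponentially weighted average across days: the most recent day
    gets weight alpha, earlier days geometrically less.
    """
    forecast = list(history[0])
    for day in history[1:]:
        forecast = [alpha * actual + (1 - alpha) * prev for actual, prev in zip(day, forecast)]
    return forecast


def plan_reservations(forecast, headroom=0.2, capacity=10.0):
    """Reserve the forecast plus a safety margin for each hour, never above PM capacity."""
    return [min(capacity, f * (1 + headroom)) for f in forecast]


# Hypothetical history: three days of an Exchange-like IO profile (k-IOPS).
day = [10.0 if 2 <= h < 5 else 1.0 for h in range(24)]
history = [day, day, day]

plan = plan_reservations(seasonal_forecast(history))
print(plan[3], round(plan[12], 2))  # 10.0 1.2
```

An adaptive controller would rerun such a projection regularly and also react to live measurements, since a forecast cannot anticipate every emergent spike.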

Adaptive Control: Too Complex for Manual Management
Manual management requires administrators to:
• Master voluminous details of hypervisor and application internals
• Manage interference and waste problems manually
• Manage resource allocations and move applications as workloads change
• Maintain tight coordination between virtualization & app administrators
This complexity is a central barrier for Virtualization 2.0!

Virtualizing Production Apps: Improved Best Practices

Conclusions
• Workload placement can be very inefficient:
  • Over-provisioning waste; workload-packing waste; non-adaptive inefficiencies
• Virtualization is much too complex for manual administration
• It must be augmented by workload management that can:
  • Eliminate the over-provisioning waste through balanced consolidation
  • Minimize the workload-packing waste by exploiting workload features
  • Support adaptive control to optimize resource use & avoid interference
Virtualization 2.0 Strategy: Replace manual mgmt with automated, optimized workload management

Live Demonstration

Thank you! www.vmturbo.com
