SLA Driven Capacity Planning for Cloud Based Systems

Rahul Ghosh (IBM Research & Duke University) • Co-authors: Vijay K. Naik and Murthy Devarakonda (IBM Research), Kishor S. Trivedi and DongSeong Kim (Duke University)

Talk outline • Capacity planning problem in IaaS Cloud • Cost and SLA components, optimization problem • Description of a capacity planning engine (CPE) • Key components and their descriptions • An example problem to describe the working of CPE • Problem description, solution approach, results • Future Research • Conclusions

Capacity planning (provider’s perspective) • Hardware and software failures • Service times and priorities vary across job types • Workload demands vary over time [Figure: cloud service provider]

SLA driven capacity planning • What is the optimal number of physical machines (PMs) so that total cost is minimized and the SLA is upheld? • Large cloud, large variability, fixed number of configurations

Cost components • Providers have two key costs for providing cloud based services: Capital Expenditure (CapEx) and Operational Expenditure (OpEx) • Capital Expenditure (CapEx) • Examples of CapEx include infrastructure cost and software licensing cost • CapEx is usually fixed over time • Operational Expenditure (OpEx) • Examples of OpEx include power usage cost, cost or penalty due to violation of different SLA metrics, and management costs • OpEx is more interesting since it varies over time with factors such as system configuration, management strategy, and workload arrivals

Examples of SLAs in IaaS cloud • Availability of cloud service • Typically, enterprises require availability of five nines or higher to host their business applications on a cloud provided service • Provisioning response delay • It is the duration from the submission of a user request to an IaaS cloud to the time instant when the request is actually provisioned • Provisioning response delay can vary depending on how VMs are provisioned on physical machines • How are these SLA and cost components connected in an optimization problem?
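For a sense of scale on the availability SLA, the short sketch below converts "N nines" into a yearly downtime budget. It assumes a 365-day year; the function name is illustrative and not part of the talk.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines (e.g. 5 -> 99.999%)."""
    unavailability = 10.0 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines -> {allowed_downtime_minutes(n):.3f} minutes of downtime per year")
```

Under this assumption, five nines leaves a budget of roughly 5.3 minutes of downtime per year.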

Minimize total cost of operation (TCO) • In our analysis, the two components of TCO are total infrastructure cost and total power cost • Total infrastructure cost is an example of CapEx and total power cost is an example of OpEx • Our goal is to minimize TCO subject to cloud SLA constraints • The number of PMs is the decision variable • All other constraints are equality constraints that specify the values of parameters • We use a capacity planning engine (CPE) to solve this optimization problem
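One way to write the problem stated above, using assumed notation rather than the authors' own symbols: $n$ is the number of PMs, $C_{\mathrm{infra}}$ and $C_{\mathrm{power}}$ are the infrastructure and power costs, and the two SLA bounds are placed on the metrics named elsewhere in the talk (total job rejection probability and mean provisioning response delay). The superscript "max" thresholds are hypothetical SLA values.

```latex
\begin{aligned}
\min_{n \in \{1, 2, \dots\}} \quad & C_{\mathrm{TCO}}(n) = C_{\mathrm{infra}}(n) + C_{\mathrm{power}}(n) \\
\text{subject to} \quad & P_{\mathrm{reject}}(n) \le P_{\mathrm{reject}}^{\max}, \\
& E[T_{\mathrm{resp}}](n) \le T_{\mathrm{resp}}^{\max}.
\end{aligned}
```

If both cost terms grow with $n$, the optimum is the smallest $n$ that satisfies the SLA constraints, which is why the problem reduces to a search over capacity.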

Capacity Planning Engine (CPE) • Inputs: workload analyzer (job arrival rates, mean job service times); SLA (maximum job rejection probability or mean provisioning delay); IaaS cloud specifications (mean searching delay, mean time to provision a VM, MTTF, MTTR, number of CPU cores per PM, etc.) • Capacity estimator: performance and availability models, plus a search algorithm to find the optimal capacity • Output: optimal capacity • What are these models?

Performance and availability models • Characterize cloud service as a function of arrival rate, available capacity, service requirements, and failure properties • Main assumptions: • All requests are homogeneous; each request is for one virtual machine (VM) with a fixed number of CPU cores and fixed RAM and disk capacity • Submitted requests are served on an FCFS basis by the resource provisioning decision engine (RPDE) • Once resources are assigned and provisioned to a request, the request runs until it completes or a failure occurs, and it exits the system after either event • Resources are grouped into pools of physical machines for performance • All physical machines (PMs) in a particular type of pool are identical

Life cycle of a job inside an IaaS cloud • Provisioning and servicing steps: (i) resource provisioning decision, (ii) VM provisioning and (iii) run-time execution • Job flow: arrival, queuing, provisioning decision (resource provisioning decision engine), instantiation (VM deployment, instance creation), actual service (run-time execution), out • A job can be rejected because the buffer is full or because capacity is insufficient • We translate these steps into analytic sub-models [Figure: job life cycle, with the provisioning response delay marked]

Novelty of our approach • Single monolithic model vs. interacting sub-models approach • Even for a simple case of 6 physical machines and 1 virtual machine per physical machine, a monolithic model has 126720 states • In contrast, our interacting sub-models approach has only 41 states • For a real cloud, a naïve modeling approach leads to a very large analytical model whose solution is practically impossible • The interacting sub-models approach is scalable, tractable and of high fidelity • Adding a new feature in the interacting sub-models approach does not require reconstructing the entire model • What are the different sub-models? How do they interact?

Model interactions: Performability

Results from model solutions

Problem description • We assume that the PMs do not fail, so only the performance model is used • All PMs are in the hot pool • From the workload analyzer of the CPE, we compute the job arrival rate and mean job service time • From the cloud specifications, we obtain the mean delay to find a PM and the mean provisioning delay • The number of PMs is the decision variable • A fast search algorithm is needed to solve this problem for large clouds

Overview of the algorithm to find optimal #PMs • Initialization: flag: search_type = 1 • while(1) { • Solve pure performance model to obtain and • if and : • break • else : • if (search_type == 1) : • if and : • else : • else: • // Run a binary search when optimal value lies between • ( & ) or between and • Determine the condition when search_type != 1 • Determine the condition when optimal value is reached • }// end of while
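The two-phase idea outlined above (first close the gap between the starting capacity and the optimum, then binary search) can be sketched as follows. This is a minimal sketch, not the authors' exact algorithm: it assumes the SLA check is monotone in the number of PMs (more PMs never hurt), it uses halving and doubling for the first phase, and `meets_sla`, `n_init` and `n_max` are hypothetical names. In the CPE, `meets_sla` would solve the pure performance model and compare its outputs with the SLA bounds.

```python
from typing import Callable


def find_min_capacity(meets_sla: Callable[[int], bool], n_init: int, n_max: int = 10**7) -> int:
    """Return the smallest n >= 1 for which meets_sla(n) is True.

    Assumes meets_sla is monotone: if it holds for n, it holds for every larger n.
    """
    if n_init < 1:
        raise ValueError("n_init must be at least 1")

    if meets_sla(n_init):
        # Phase 1a: the start is feasible, so halve until the SLA breaks (or 0 is reached).
        hi, lo = n_init, n_init // 2
        while lo >= 1 and meets_sla(lo):
            hi, lo = lo, lo // 2
    else:
        # Phase 1b: the start is infeasible, so double until the SLA holds.
        lo, hi = n_init, 2 * n_init
        while not meets_sla(hi):
            if hi > n_max:
                raise ValueError("SLA cannot be met within n_max PMs")
            lo, hi = hi, 2 * hi

    # Phase 2: binary search. Invariant: lo is infeasible (or 0), hi is feasible.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if meets_sla(mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Toy stand-in for the performance model: the SLA holds from 37 PMs upward.
    print(find_min_capacity(lambda n: n >= 37, n_init=10))
```

Each call to `meets_sla` corresponds to one model solution, so the number of search iterations grows only logarithmically with the distance between the starting value and the optimum.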

Illustration of the algorithm [Figure: values of capacity vs. number of search iterations; annotations: initialization, reduce the capacity gap, constraint violation (condition for search_type != 1 is true), binary search begins, optimal value]

Results for an instance of optimization problem • Job arrival rate = 1000 jobs/hr, mean job service time = 1 hr • Mean provisioning delay = 5 min, mean time to find a PM = 3 sec • We solved the problem for different combinations of SLA values
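To get a feel for how a rejection-probability SLA translates into capacity for this workload, the sketch below swaps in the classical Erlang-B loss formula as a stand-in for the talk's interacting sub-models. This is a deliberate simplification and not the authors' model: it ignores queuing, decision and provisioning delays and the pool structure. The `vms_per_pm` value and the SLA targets are hypothetical.

```python
def erlang_b(servers: int, offered_load: float) -> float:
    """Blocking probability of an M/M/c/c loss system, via the standard recursion."""
    b = 1.0  # B(0) = 1: with no servers every job is blocked
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return b


def min_pms(arrival_rate_per_hr: float, mean_service_hr: float,
            max_reject_prob: float, vms_per_pm: int = 1) -> int:
    """Smallest number of PMs whose VM slots keep Erlang-B blocking within the target."""
    if not 0.0 < max_reject_prob < 1.0:
        raise ValueError("max_reject_prob must lie strictly between 0 and 1")
    if vms_per_pm < 1:
        raise ValueError("vms_per_pm must be at least 1")
    offered_load = arrival_rate_per_hr * mean_service_hr  # in Erlangs
    b, slots = 1.0, 0
    while True:
        # Add one PM worth of VM slots by extending the recursion step by step.
        for _ in range(vms_per_pm):
            slots += 1
            b = offered_load * b / (slots + offered_load * b)
        if b <= max_reject_prob:
            return slots // vms_per_pm


if __name__ == "__main__":
    # Workload from the slide: 1000 jobs/hr, 1 hr mean service time.
    for target in (0.1, 0.01, 0.001):  # hypothetical SLA values
        print(target, min_pms(1000.0, 1.0, target, vms_per_pm=1))
```

Tighter rejection targets require more PMs, which is the qualitative SLA-versus-capacity relationship the talk reports from its own models.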

SLA vs. optimal capacity

Future Research • Development of a monitoring and modeling based tool • The tool will accomplish the following tasks: (i) Workload and system characterization (ii) For a well-characterized cloud service, predicting workload demand, system usage, failure rates at unit level and in aggregate (iii) When limited information is available on system resources and workload variability, such as under bursty conditions, provide short term forecasts on capacity requirements and online decisions to manage the dynamic behavior of the system (iv) Such a tool should be embedded into the cloud management platform to detect and predict performance bottlenecks, analyze risks and tradeoffs, provide provisioning and planning strategies for the short and long terms Challenges: Managing scale, variability, and stochastic behavior of the system

Conclusions • We described a capacity planning engine (CPE) for IaaS cloud that computes the optimal number of physical machines to minimize overall cost while meeting the SLA constraints • The core of the CPE consists of interacting stochastic process models that characterize performance and availability of IaaS cloud • Such an approach is scalable and tractable compared to a monolithic modeling approach • Results show that our method of solution has small overhead and is hence useful for both online and offline planning • In the future, we want to extend this capacity planning engine to a measurement and modeling based tool and integrate it with a cloud management platform

Thanks! Questions/Comments?

Backup

Meanings of symbols used

Resource Provisioning Decision Model • Continuous Time Markov Chain (CTMC) for the resource provisioning decision model • i = number of jobs in queue, s = pool (hot, warm or cold) [Figure: steps of a resource provisioning decision]

VM Provisioning Model [Figure labels: provisioning decision engine, accepted jobs, hot pool (hot PM), warm pool, cold pool, idle resources in hot, warm and cold machines, running VMs, service out]

VM Provisioning Model (contd.) • CTMC for each hot PM • i = number of jobs in the queue, j = number of VMs being provisioned, k = number of VMs running • For a warm/cold machine, the resource usage model is similar, with the following exceptions: (i) effective job arrival rate, (ii) for the first job, a warm/cold machine requires additional start-up work, (iii) mean time to provision a VM for the first job is longer compared to a hot machine [Figure: CTMC state diagram over states (i,j,k), including (0,0,0), (0,1,0), (Lh,1,0), (0,0,1), (Lh-1,1,1), (Lh,1,1), (0,0,m-1), (0,1,m-1), (Lh-1,1,m-1), (Lh,1,m-1), (0,0,m), (1,0,m), (Lh,0,m)]

Run-time Model • The run-time model is used to determine the mean time to job completion • We use a Discrete Time Markov Chain (DTMC) to capture the details of job execution • Model output is the mean job service time

Results from Pure Performance Models • All these models are used for pure performance analysis since we do not consider any failure • Output of resource provisioning decision model: • Job rejection probability due to buffer full (Pblock) • Job rejection probability due to insufficient capacity (Pdrop) • Mean queuing delay (E[Tqueue]) • Mean decision delay (E[Tdecision]) • Output of VM provisioning models: • Probability that at least one machine in the hot/warm/cold pool can accept a job for provisioning • These probabilities are denoted by Ph, Pw and Pc for the hot, warm and cold pool respectively • Mean instantiation, provisioning and configuration delay (E[Tprov]) • Output of run-time model: • Mean job service time • Output of pure performance models: • Total job rejection probability (Preject = Pblock + Pdrop) • Mean end-to-end provisioning delay (E[Tresp] = E[Tqueue] + E[Tdecision] + E[Tprov])
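The two aggregate outputs above are simple sums of the sub-model outputs. A minimal sketch of that bookkeeping, with an SLA check on top, follows; the class name, field names, time unit (seconds) and numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PerformanceOutputs:
    """Sub-model outputs; all delays are in seconds (an assumed unit)."""
    p_block: float     # Pblock: rejection probability due to buffer full
    p_drop: float      # Pdrop: rejection probability due to insufficient capacity
    t_queue: float     # E[Tqueue]: mean queuing delay
    t_decision: float  # E[Tdecision]: mean decision delay
    t_prov: float      # E[Tprov]: mean instantiation, provisioning and configuration delay

    @property
    def p_reject(self) -> float:
        # Preject = Pblock + Pdrop
        return self.p_block + self.p_drop

    @property
    def t_resp(self) -> float:
        # E[Tresp] = E[Tqueue] + E[Tdecision] + E[Tprov]
        return self.t_queue + self.t_decision + self.t_prov


def meets_sla(out: PerformanceOutputs, max_reject: float, max_delay: float) -> bool:
    """True when both SLA bounds hold for the given model outputs."""
    return out.p_reject <= max_reject and out.t_resp <= max_delay


if __name__ == "__main__":
    out = PerformanceOutputs(p_block=0.01, p_drop=0.02, t_queue=1.0, t_decision=3.0, t_prov=300.0)
    print(out.p_reject, out.t_resp, meets_sla(out, max_reject=0.05, max_delay=360.0))
```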

Availability Model • The availability model helps to understand the effect of failure and repair • We model only the net equivalent effect of different failures and repairs • (i, j, k) denote the numbers of available (or “up”) hot, warm and cold machines respectively

Effect of Varying Job Arrival Rate • Increasing system capacity reduces rejection probability and provisioning delay. However, the marginal gain (reduction in rejection probability or provisioning delay) diminishes with increasing capacity.

Effect of Varying Mean Job Service Time • Rejection probability and provisioning delay change non-linearly with job service time • The benefit of increasing system capacity is greater in the high service time regime

Effect of Varying Number of Machines Across Pools • Having all PMs in the hot pool achieves the best performance (lowest rejection probability and mean provisioning delay). However, the cost of operation may increase, leading to a cost-performance trade-off.

Intuition behind initial value of #PMs • There are three types of servers in our model: decision server, provisioning server and actual job processing server. • The decision server runs significantly faster than the other two servers, so we neglect its impact. • Typically, the job processing server runs at a slower pace than the provisioning server. • If we divide the job arrival rate (lambda) by the rate of the job processing server, the #PMs to begin with might be large and too conservative. • This might lead to a higher number of search iterations to find the optimal #PMs.
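The arithmetic behind this intuition can be checked with the example workload from the results slide (1000 jobs/hr, 1 hr mean service time, 5 min mean provisioning delay). The sketch below assumes the starting value is the arrival rate divided by the rate of one stage, rounded up; the slide does not give the exact formula, so treat `initial_pms` as a hypothetical helper.

```python
import math


def initial_pms(arrival_rate_per_hr: float, mean_stage_time_hr: float) -> int:
    """Arrival rate divided by a stage's service rate (1 / mean stage time), rounded up."""
    return math.ceil(arrival_rate_per_hr * mean_stage_time_hr)


if __name__ == "__main__":
    lam = 1000.0  # jobs/hr
    # Dividing by the job processing rate (mean service time 1 hr).
    print(initial_pms(lam, 1.0))        # 1000
    # Dividing by the provisioning rate (mean provisioning delay 5 min).
    print(initial_pms(lam, 5.0 / 60.0))  # 84
```

The job processing rate gives a starting point more than ten times larger than the provisioning rate does, which illustrates why the former is described as conservative.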
