HotDep’05
Computational Risk Management for Building Highly Reliable Network Services
Chaki Ng, Brent N. Chun, Philip Buonadonna
Network Service Performance
• Desire for Hard Performance Guarantees
  • “99.999% availability,” “all trades < 30 seconds”
• Difficult to Achieve Consistently
  • Demand: workload varies and can be bursty
  • Supply: resource needs vary and are hard to plan for
• Dedicated and Over-Provisioned Infrastructure
  • $$$, low utilization
• Shared Infrastructure
  • Resource supply varies (competition, failures)
  • Trade off supply against performance guarantees
Chaki Ng || Computational Risk Management
Computational Service Provider (CSP)
• Goal: mechanism to manage supply
  • Resources (e.g., server nodes)
  • Accommodate peak demand of most services
• Markets of nodes
  • Each node sells resource contracts
  • Spot, futures, options
  • Contracts priced based on supply and demand
Measure Risk
• How to quantify performance guarantees
• Risk metrics: simple statistical summaries of undesirable outcomes
• Example: Value-at-Risk (VaR)
  • Finance: “The Fidelity mutual fund will lose no more than $25MM monthly, with 95% probability”
  • Computation: “Amazon.com will process orders in less than 30 seconds daily for 95% of all orders”
• Two challenges: calculating VaR and performing sensitivity analysis of VaR
Calculate VaR
[Figure: probability distributions of Fidelity Fund profit/loss (95% VaR: -$27MM) and Amazon.com order time in seconds (95% VaR: 33 seconds)]
• Calculate the expected performance distribution
• Example method: historical
• Methods: variance, Monte Carlo, stress testing
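The historical method amounts to reading a percentile off past observations. A minimal Python sketch, assuming a lower-is-better metric such as order time and using the nearest-rank percentile (one of several percentile conventions); the function name `historical_var` and the sample data are illustrative, not taken from the slides:

```python
import math


def historical_var(samples, confidence=0.95):
    """Historical-simulation VaR for a lower-is-better metric such as order time.

    Returns the nearest-rank percentile: the smallest observed value v such that
    at least `confidence` of the observations are <= v.
    """
    if not samples:
        raise ValueError("need at least one observation")
    ordered = sorted(samples)
    rank = math.ceil(confidence * len(ordered))  # 1-based rank in the sorted data
    return ordered[max(rank, 1) - 1]


# Hypothetical daily order times in seconds (not data from the slides)
order_times = [12, 30, 8, 33, 15, 22, 9, 31, 18, 27]
print(historical_var(order_times))  # 33: with only 10 samples this is the maximum
```

For a profit/loss series such as the Fidelity example, applying the same function to the negated values gives the 95% loss level, whose negation is the profit/loss VaR.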
Compute VaR: Model Supply and Demand
[Diagram components: Own Service Workload Forecast; Supply (Set of Accessible Node Resources); Node Performance and Trade Forecast; Aggregate Workload Forecast; VaR]
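One way to combine these forecasts into a VaR is Monte Carlo simulation, one of the methods listed for calculating VaR. The sketch below is a toy model under stated assumptions: the `NodeForecast` fields, the Gaussian own-workload forecast, the uniform share of capacity taken by the aggregate workload, and the M/M/1 mean response time 1/(mu - lam) are all hypothetical stand-ins, not the authors' model:

```python
import math
import random
from dataclasses import dataclass


@dataclass
class NodeForecast:
    """Hypothetical supply forecast for one accessible node."""
    capacity: float       # orders/second the node serves when idle
    p_up: float           # probability the node is up in a given trial
    max_competing: float  # largest fraction of capacity other tenants may take


def sample_order_time(nodes, mean_load, load_sd, rng, timeout=3600.0):
    """Draw one order time (seconds) from the combined supply/demand model."""
    # Demand: own-service arrival rate (orders/second), clipped at zero
    lam = max(0.0, rng.gauss(mean_load, load_sd))
    # Supply: capacity left on live nodes after the aggregate (competing) workload
    mu = 0.0
    for node in nodes:
        if rng.random() < node.p_up:
            mu += node.capacity * (1.0 - rng.uniform(0.0, node.max_competing))
    if mu <= lam:
        return timeout  # overloaded or no live nodes: record a capped worst case
    # Assumption: treat the pool as one M/M/1 queue with mean response 1/(mu - lam)
    return min(timeout, 1.0 / (mu - lam))


def monte_carlo_var(nodes, mean_load, load_sd, confidence=0.95,
                    trials=10_000, seed=0):
    """Nearest-rank percentile of simulated order times."""
    rng = random.Random(seed)  # seeded so results are reproducible
    times = sorted(sample_order_time(nodes, mean_load, load_sd, rng)
                   for _ in range(trials))
    return times[math.ceil(confidence * trials) - 1]
```

For example, `monte_carlo_var([NodeForecast(0.5, 0.99, 0.3)] * 4, mean_load=1.2, load_sd=0.2)` estimates the 95% order time for four identical hypothetical nodes; swapping in real forecasts only changes `sample_order_time`.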
Sensitivity Analysis of VaR
• Goal: model how VaR varies as the set of resource contracts changes
  • VaR = F(set of resource contracts)
• Forecast demand and supply
  • Node and aggregate workload forecasts
  • Own client workload forecast
• Model portfolio VaR
  • Swap sets of resource contracts
  • Calculate VaR improvements
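A sketch of the swap step, assuming each resource contract is reduced to a hypothetical `(name, capacity, probability up)` tuple and that VaR is estimated by Monte Carlo with an assumed M/M/1-style response time. `portfolio_var` and `swap_sensitivity` are illustrative names; a positive gain means the swap lowers the 95% order-time VaR:

```python
import math
import random


def portfolio_var(contracts, load, confidence=0.95, trials=5000, seed=42,
                  timeout=3600.0):
    """Monte Carlo VaR of order time (seconds) for one set of contracts.

    Each contract is a hypothetical (name, capacity in orders/s, P(up)) tuple.
    """
    # A fixed seed gives every portfolio the same random stream (common random
    # numbers), so VaR differences mostly reflect the swap rather than noise.
    rng = random.Random(seed)
    times = []
    for _ in range(trials):
        mu = sum(capacity for _, capacity, p_up in contracts
                 if rng.random() < p_up)
        # Assumption: M/M/1-style mean response time 1/(mu - load), capped
        times.append(timeout if mu <= load else min(timeout, 1.0 / (mu - load)))
    times.sort()
    return times[math.ceil(confidence * trials) - 1]


def swap_sensitivity(portfolio, candidates, load, **kw):
    """VaR improvement (seconds saved) for every one-for-one contract swap."""
    base = portfolio_var(portfolio, load, **kw)
    gains = {}
    for out in portfolio:
        for new in candidates:
            trial = [c for c in portfolio if c[0] != out[0]] + [new]
            gains[(out[0], new[0])] = base - portfolio_var(trial, load, **kw)
    return base, gains
```

The largest entry of `gains` (for example `max(gains, key=gains.get)`) names the most promising sell/buy pair. Reusing one seed is a variance-reduction choice: differences are less noisy, but all estimates share the same sampling error.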
Portfolio Management
• Goal: meet target VaR within budget at minimal cost
• Continuous portfolio optimization
  • Find available sets of resources
  • Find sets that achieve the best VaR
  • Trade resource contracts
  • Buy the best set within budget
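For a small market the search can simply be exhaustive. This sketch assumes hypothetical offers of the form `(name, price, capacity, probability up)`, computes VaR exactly by enumerating every up/down node state under an assumed M/M/1-style response time, and returns the cheapest affordable set meeting the target (or, failing that, the affordable set with the best VaR). It is exponential in the number of offers, so it illustrates the objective rather than a scalable optimizer:

```python
import itertools


def exact_var(offers, load, confidence=0.95, timeout=3600.0):
    """Exact VaR of order time by enumerating every up/down state of the nodes."""
    outcomes = []
    for states in itertools.product((True, False), repeat=len(offers)):
        prob, mu = 1.0, 0.0
        for up, (_, _, capacity, p_up) in zip(states, offers):
            prob *= p_up if up else 1.0 - p_up
            mu += capacity if up else 0.0
        # Assumption: M/M/1-style mean response time 1/(mu - load), capped
        t = timeout if mu <= load else min(timeout, 1.0 / (mu - load))
        outcomes.append((t, prob))
    outcomes.sort()
    cumulative = 0.0
    for t, prob in outcomes:
        cumulative += prob
        if cumulative >= confidence - 1e-12:  # tolerance for float rounding
            return t
    return outcomes[-1][0]


def choose_portfolio(offers, load, target_var, budget, confidence=0.95):
    """Cheapest affordable set meeting target_var, else best-VaR affordable set.

    Returns (cost, var, names), or None if no non-empty set fits the budget.
    """
    best = fallback = None
    for k in range(1, len(offers) + 1):
        for combo in itertools.combinations(offers, k):
            cost = sum(o[1] for o in combo)
            if cost > budget:
                continue
            var = exact_var(combo, load, confidence)
            names = tuple(o[0] for o in combo)
            if var <= target_var and (best is None or (cost, var) < best[:2]):
                best = (cost, var, names)
            if fallback is None or (var, cost) < (fallback[1], fallback[0]):
                fallback = (cost, var, names)
    return best if best is not None else fallback


# Prices for Node1 and Node4 echo the slides; Node2, capacities, load are made up.
offers = [("Node1", 50.0, 1.0, 1.0), ("Node2", 40.0, 1.0, 1.0),
          ("Node4", 30.0, 2.0, 1.0)]
print(choose_portfolio(offers, load=1.5, target_var=30.0, budget=100.0))
# (30.0, 2.0, ('Node4',))
```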
Finance: Manage Portfolio VaR
[Figure: Fidelity profit/loss distribution, 95% VaR: -$27MM; portfolio of MSFT, ORCL, EBAY, and IBM traded through financial markets]
• Trades: sell IBM @ $75, buy EBAY @ $37
• Target VaR: “The Fidelity mutual fund will lose no more than $25MM monthly with 95% probability.”
Computation: Manage Portfolio VaR
[Figure: Amazon.com order-time distribution in seconds, 95% VaR: 33 seconds; node portfolio (Node1, Node2, Node3, Node4) traded through the CSP]
• Trades: sell Node1 @ $50, buy Node4 @ $30
• Target VaR: “Amazon.com will process orders in less than 30 seconds for 95% of all orders.”
Open Problems
• Resource contracts: pricing, base units
• Programming: model, API
• Modeling supply and demand
• Portfolio strategies: “standard portfolios”
• Interoperability across different CSPs
Conclusion
• Dedicated vs. shared infrastructure
• CSP: share resources via markets
• Achieve performance goals in a shared CSP
  • Quantify performance goals via risk metrics such as VaR
  • Calculation and sensitivity analysis
  • Portfolio optimization
Backup Slides
Simple Experiment
[Diagram components: Service Workload; Failover; Node Failures]
• Each request tries N nodes, chosen at random
• If all nodes tried are down (both nodes, when N = 2), the request fails
• Daily service availability = successful requests / all requests
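The backup experiment can be reproduced in simulation. A minimal sketch, assuming a failure model the slides do not specify (each node is independently down for a whole hour with probability `p_down`) and a hypothetical pool size `num_nodes`; each request picks N distinct nodes at random and fails only if every node it tries is down. `availability_var` then reports the low-tail VaR of daily availability, the level met on at least 95% of simulated days:

```python
import math
import random


def daily_availability(num_nodes, tries, p_down, requests_per_hour, rng,
                       hours=24):
    """Fraction of one day's requests that reach at least one live node."""
    ok = total = 0
    for _ in range(hours):
        # Hypothetical failure model: each node is down for the whole hour
        # with probability p_down, independently of the others.
        down = [rng.random() < p_down for _ in range(num_nodes)]
        for _ in range(requests_per_hour):
            picked = rng.sample(range(num_nodes), tries)  # N distinct nodes
            total += 1
            if not all(down[i] for i in picked):  # fails only if all are down
                ok += 1
    return ok / total


def availability_var(runs=100, confidence=0.95, seed=1, **kw):
    """Availability level met on at least `confidence` of simulated days."""
    rng = random.Random(seed)
    daily = sorted(daily_availability(rng=rng, **kw) for _ in range(runs))
    # Higher is better, so take the low tail: values from this index up
    # cover at least `confidence` of the runs.
    return daily[runs - math.ceil(confidence * runs)]
```

With the settings on the results slide (100 daily runs, 100 requests per hour), a call would look like `availability_var(runs=100, num_nodes=10, tries=2, p_down=0.05, requests_per_hour=100)`, where `num_nodes` and `p_down` are placeholder values.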
Results
• Each point: 100 daily runs, 100 requests/hour