## How Useful is Old Information?


**How Useful is Old Information?**
IEEE Transactions on Parallel and Distributed Systems, 1997
Michael Mitzenmacher
"On the Management and Efficiency of Cloud Based Services" seminar
Alexander Zlotnik, November 2010
Computer Science faculty, Technion

**Amazon Elastic Load Balancing**
• Share incoming traffic
• Scale up
• Scale down
• Detect and shut down unhealthy EC2 instances
• Balance single/multiple Availability Zones
• Reports
• $0.025 per hour [$18/month] + traffic

**Background**
• What this paper is not:
  • A real system
  • A scale up/down system
  • Groups/AZ balancing

**Background**
• Supermarket model and its benefits
  • On an incoming task, poll d queues for their load
  • Send the task to the shortest of the polled queues
  • For d = 2 the expected time in the system is exponentially shorter than for d = 1
  • For d = 3, 4, … the further improvement is only a constant factor

**Background**
• Load balancing types
  • Centralized / Distributed
  • Static / Dynamic (adaptive)

**Setting**
• n nodes
• Task arrivals: Poisson with rate λn
• Each task is forwarded to a node for execution
• Execution time is distributed Exp(µ)
  • Normalized to µ = 1
• Every T time units a bulletin board is updated with the current loads of all nodes

**Policy**
• Look at d random entries on the board and send the task to the shortest posted queue
• d = 1: M/M/1
• d = n: shortest queue
Static

**Periodic updates**
[Figure: timeline t with board updates at 0, T, 2T, 3T, …]

**Definitions**
• $P_{i,j}(t)$: fraction of queues with posted load i and true load j
• $q_i(t)$: rate of arrivals at a queue with posted load i
[Figure: grid of $P_{i,j}(t)$; axes: Posted Load, True Load]

**Between board updates**
[Figure: grid of $P_{i,j}(t)$; axes: Posted Load, True Load; transitions labeled $q_i$ and µ]

**On board update**
[Figure: grid of $P_{i,j}(t)$; axes: Posted Load, True Load; entries labeled 0]

**$b_i(t)$: fraction of queues with load i posted**
[Figure: grid of $P_{i,j}(t)$ summed (Σ) over the true load; axes: Posted Load, True Load]

**$b_i(t)$: fraction of queues with posted load i**
• Arrival rate to a queue with posted load i, built from:
  • The task arrival rate
  • The chance that all d selected queues have load i or more
  • The chance that a queue of size i is selected
  • The same chance for all queues of size i

**Fixed Cycle**
• Hope: convergence to a fixed point
  • A state that there is no "motivation" to exit from
• Periodic updates
  • "Jump" on bulletin board updates, t = kT
  • Fixed cycle: the $b_i$ do not change
  • Or, for $k \ge k_0$, $P(kT) = P(k_0 T)$

**Fixed Cycle**
[Figure: grid of $P_{i,j}(t)$ with sums (Σ) along both axes: Posted Load, True Load]
If π((k-1)T) = π(kT), the next phase will be the same

**Fixed Cycle: finding vector π**
• Method A: run the system until changes in π are small
• Method B: solve a system of equations
  • $m_{i,j}(T)$: probability that an M/M/1 queue that starts with i tasks has j tasks after time T
• Method A on a truncated system of differential equations

**Fixed Cycle: finding vector π**
• Method B
  • $m_{i,j}(T)$: probability that an M/M/1 queue that starts with i tasks has j tasks after time T
  • Solving a system of equations by iteration
    [Equation not preserved; labels: π at iteration start, π at iteration end]
  • Until changes in π are small

**Fixed Cycle: finding vector π**
• Method B
  • $m_{i,j}(T)$: probability that an M/M/1 queue that starts with i tasks has j tasks after time T
  • Iterating a system of equations
    [Equation not preserved; label: Bessel function of the first kind]

**Fixed Cycle: finding vector π**
• Method A on a truncated system of differential equations
  • Bound I, J
  • Iterate on [equation not preserved]
  • Until changes in π are small

**More complex Centralized Strategies**
• Time based
  • Split T into subintervals
  • During subinterval $[t_k, t_{k+1})$, tasks are sent to a random server with load at most k

**More complex Centralized Strategies**
• Record-insert
  • Upon task arrival:
    • The task is sent to a random server among the servers with the lowest load
    • That server's load is incremented on the bulletin board
  • Task completions are not recorded (until the end of the phase)
  • Uses real loads, not expectations
  • Always better than time-based

**Continuous Updates**
• Tasks benefit from occasional accurate information
  • Supported by other graphs too
• Partially accurate information is not helpful
  • If it is not known which data is up to date

**Conclusions**
• Load balancing is useful even with stale information
• Choosing the least loaded of 2 random nodes is better than choosing the shortest queue overall or the least loaded of 3 or more
• More complex policies (such as time-based or record-insert) can further help load balancing

**Open Questions**
• Additional metrics
• Different theoretical frameworks
• More realistic arrival patterns (heavy-tailed)

**Personal review**
• Powerful theoretical framework
• Interesting and relevant results
• Cons
  • Too few load scenarios
  • No comparison of the bandwidth needed against the load-balancing benefit

**Relevant Information**
• More references
  • Collection of works:
    • "The Power of Two Random Choices: A Survey of Techniques and Results", M. Mitzenmacher et al., 2001
• Not covered in this presentation:
  • Deviation of simulating differential equations instead of full system [Sections 3.4, 3.5]
  • Theory of some systems, but their results are
  • Competitive Scenarios [Section 6]
  • Small number of servers
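
**Illustration: simulating the bulletin board model**
The Setting and Policy slides above can be tried out with a small simulation. The sketch below is not the paper's differential-equation method: it is a plain stochastic simulation that assumes uniformization to generate events and Little's law to turn the average number of tasks into a mean time in the system. The names `simulate`, `horizon`, `warmup` and `seed` are hypothetical, chosen only for this example.

```python
import random


def simulate(n=100, lam=0.9, d=2, T=1.0, horizon=1000.0, warmup=100.0, seed=0):
    """Estimate the mean time a task spends in the system.

    Assumed model (from the Setting/Policy slides): n servers, Poisson
    arrivals at total rate lam*n, Exp(1) service times, and a bulletin
    board that is refreshed with the true loads every T time units.
    """
    rng = random.Random(seed)
    true_load = [0] * n       # actual queue lengths
    board = [0] * n           # posted (possibly stale) queue lengths
    total = 0                 # tasks currently in the system
    area = 0.0                # integral of `total` over [warmup, horizon]
    t = 0.0
    next_update = T
    rate = n * (lam + 1.0)    # uniformized rate: arrivals + potential departures

    while True:
        t_event = t + rng.expovariate(rate)
        t_stop = min(t_event, next_update, horizon)
        # Accumulate queue-length area, counting only time after the warm-up.
        area += total * max(0.0, t_stop - max(t, warmup))
        t = t_stop
        if t == horizon:
            break
        if t == next_update:
            # Board refresh. Exponential clocks are memoryless, so the
            # pending event can simply be resampled on the next iteration.
            board = true_load[:]
            next_update += T
            continue
        if rng.random() < lam / (lam + 1.0):
            # Arrival: look at d random board entries and join the smallest
            # posted load (ties fall to the first pick, which is random).
            picks = rng.sample(range(n), d)
            target = min(picks, key=board.__getitem__)
            true_load[target] += 1
            total += 1
        else:
            # Potential departure at a uniformly chosen server (no-op if idle).
            s = rng.randrange(n)
            if true_load[s] > 0:
                true_load[s] -= 1
                total -= 1

    mean_in_system = area / (horizon - warmup)
    return mean_in_system / (lam * n)   # Little's law: W = L / arrival rate


if __name__ == "__main__":
    for d in (1, 2, 3, 100):
        print(d, round(simulate(d=d, T=5.0), 2))
```

With d = 1 the board is irrelevant and every server behaves as an M/M/1 queue, which gives a quick sanity check against the known mean time 1/(1 − λ). Varying T and d is one way to explore the stale-information effects described in the Conclusions slide.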