
Task Assignment with Unknown Duration

Mor Harchol-Balter, Carnegie Mellon. A load balancer (L.B.) in front of a distributed server employs a TAP (Task Assignment Policy): a rule for assigning jobs to hosts. Age-old question: what is a good TAP?


Presentation Transcript


  1. Task Assignment with Unknown Duration. Mor Harchol-Balter, Carnegie Mellon.

  2. [Diagram: a large number of jobs arrive at a load balancer (L.B.), which dispatches them to hosts 1 to 4 of a distributed server.] The load balancer employs a TAP (Task Assignment Policy): a rule for assigning jobs to hosts. Age-old question: what is a good TAP?

  3. The Model. [Diagram: a large number of jobs arrive at the load balancer (L.B.), which dispatches them to four hosts, each serving its own queue in FCFS order.]
     • The processing requirement (size) of a job is not known.
     • Jobs are not preemptible.
     • Jobs queued at a host are processed in FCFS order.
     • Hosts are identical.
     Motivation for the model: distributed servers for supercomputing, where each host is a multiprocessor.

  4. Which TAP is best (given the model)? “Best” means minimizing mean waiting time. [Diagram: load balancer (L.B.) dispatching to hosts 1 to 4.]
     1. Round-Robin
     2. Random
     3. Shortest-Queue: send the job to the host with the fewest jobs.
     4. Least-Work-Left: send the job to the host with the least total work left. Central-Queue: a host grabs the next job when it is free.
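The dispatch rules listed on this slide can be sketched as small chooser functions. This is only an illustrative sketch: the function names, the `queue_lengths` and `work_left` arguments, and the zero-based host indices are assumptions introduced here, not part of the talk. Note that `least_work_left` reads per-host remaining work directly, which the sketch simply assumes is available.

```python
import random
from itertools import count


def round_robin(num_hosts):
    """Return a chooser that cycles through hosts 0, 1, ..., num_hosts - 1."""
    counter = count()
    return lambda queue_lengths, work_left: next(counter) % num_hosts


def random_policy(rng):
    """Return a chooser that picks a host uniformly at random."""
    return lambda queue_lengths, work_left: rng.randrange(len(queue_lengths))


def shortest_queue(queue_lengths, work_left):
    """Pick the host with the fewest jobs (ties go to the lowest index)."""
    return min(range(len(queue_lengths)), key=queue_lengths.__getitem__)


def least_work_left(queue_lengths, work_left):
    """Pick the host with the least total remaining work (ties go to the lowest index)."""
    return min(range(len(work_left)), key=work_left.__getitem__)


if __name__ == "__main__":
    # Hypothetical snapshot of three hosts: job counts and remaining work.
    queue_lengths = [3, 1, 2]
    work_left = [0.5, 9.0, 4.0]
    print("Shortest-Queue picks host", shortest_queue(queue_lengths, work_left))
    print("Least-Work-Left picks host", least_work_left(queue_lengths, work_left))
```

In the hypothetical snapshot the two rules disagree: the host with the fewest jobs holds one very long job, so counting jobs and counting work lead to different choices.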

  5. Which TAP is best (given the model)? “Best” means minimizing mean waiting time. [Diagram: load balancer (L.B.) dispatching to hosts 1 to 4.]
     1. Round-Robin
     2. Random
     3. Shortest-Queue: send the job to the host with the fewest jobs.
     4. Least-Work-Left: send the job to the host with the least total work left. Central-Queue: a host grabs the next job when it is free.
     Known: optimal for exponentially distributed sizes.

  6. But real jobs do NOT have exponentially-distributed sizes! They have heavy-tailed sizes.

  7. Unix process CPU lifetime measurements [Harchol-Balter, Downey TOCS 97]. [Plot: fraction of jobs with CPU duration > x versus duration x in seconds, log-log scale.] $\Pr\{\text{Size} > x\} = 1/x$
     • We measured over 1 million UNIX processes.
     • Instructional, research, and sys. admin. machines.
     • A job of CPU age x has probability 1/2 of using another x.
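The "probability 1/2 of using another x" statement can be checked with a one-line conditional-probability calculation. This is a sketch that assumes the measured tail is exactly Pr{Size > x} = 1/x and that x is at least 1 second (so that 1/x is a valid probability); both are assumptions made here for the worked step.

```latex
\Pr\{\text{Size} > 2x \mid \text{Size} > x\}
  = \frac{\Pr\{\text{Size} > 2x\}}{\Pr\{\text{Size} > x\}}
  = \frac{1/(2x)}{1/x}
  = \frac{1}{2}, \qquad x \ge 1 .
```

The result does not depend on x: under this tail, the longer a job has already run, the longer it is expected to keep running.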

  8. Bounded Pareto (heavy-tailed) distribution: $\Pr\{\text{Size} > x\} \sim x^{-\alpha}$, $0 < \alpha < 2$. [Plot: vertical axis from 0 to 1; horizontal axis: job size, from min to max.]
     α: degree of variability. As α approaches 0, job sizes are more variable and more heavy-tailed; as α approaches 2, they are less variable and less heavy-tailed.
     Properties:
     • Decreasing failure rate.
     • Very high variance!
     • Heavy-tail property: a minuscule fraction (<1%) of the very largest jobs comprise half the load.
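The heavy-tail property can be explored numerically. The sketch below assumes a Bounded Pareto density proportional to x^(-α-1) on [k, p] and computes, in closed form, the share of total load carried by the largest fraction of jobs. The function names and the parameter values (k = 1, p = 10^10, α = 1.1) are illustrative assumptions, not values taken from the talk.

```python
def bounded_pareto_quantile(u, k, p, alpha):
    """Inverse CDF of a Bounded Pareto on [k, p]: the size x with F(x) = u."""
    c = 1.0 - (k / p) ** alpha  # normalizing constant of the truncated tail
    return k * (1.0 - u * c) ** (-1.0 / alpha)


def partial_mean(a, b, k, p, alpha):
    """Integral of x * f(x) over [a, b] for the Bounded Pareto density (alpha != 1)."""
    c = 1.0 - (k / p) ** alpha
    scale = alpha * k ** alpha / c
    return scale * (b ** (1.0 - alpha) - a ** (1.0 - alpha)) / (1.0 - alpha)


def load_share_of_largest(fraction, k, p, alpha):
    """Share of the total load contributed by the largest `fraction` of jobs."""
    threshold = bounded_pareto_quantile(1.0 - fraction, k, p, alpha)
    return partial_mean(threshold, p, k, p, alpha) / partial_mean(k, p, k, p, alpha)


if __name__ == "__main__":
    # Illustrative parameters only (assumed, not from the slides).
    share = load_share_of_largest(0.01, k=1.0, p=1e10, alpha=1.1)
    print(f"Largest 1% of jobs carry {share:.2%} of the load")
```

Under these assumed parameters the largest 1% of jobs carry roughly 62% of the load, in line with the "more than half" claim on the slide; other choices of α and p give different shares.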

  9. Which TAP is best for heavy-tailed job sizes? “Best” means minimizing mean waiting time. [Diagram: load balancer (L.B.) dispatching to hosts 1 to 4.]
     1. Round-Robin
     2. Random
     3. Shortest-Queue: send the job to the host with the fewest jobs.
     4. Least-Work-Left: send the job to the host with the least total work left. Central-Queue: a host grabs the next job when it is free.
     Known: optimal for exponentially distributed sizes.

  10. The TAGS algorithm: “Task Assignment by Guessing Size”. [Diagram: outside arrivals enter Host 1; Hosts 1, 2, and 3 are labeled with cutoffs s1, s2, and s3, followed by Host 4.] When a job at host j reaches size s_j, the job is killed and restarted from scratch at host j+1.
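The kill-and-restart rule can be sketched as a small trace-driven simulation. This is a minimal sketch under stated assumptions, not the analysis used in the talk: the function name `simulate_tags`, the `(arrival_time, size)` job format, and the convention that a job whose size equals the cutoff still completes are all choices made here. Each host serves its queue in FCFS order, and the last host has no cutoff.

```python
import math


def simulate_tags(jobs, cutoffs):
    """Simulate TAGS on a fixed list of jobs.

    jobs: list of (arrival_time, size), sorted by arrival_time.
    cutoffs: [s1, ..., s_{h-1}], increasing; the last host has no cutoff.
    Returns the completion time of each job, indexed like `jobs`.
    """
    limits = list(cutoffs) + [math.inf]
    completion = [None] * len(jobs)
    # All outside arrivals enter at the first host.
    pending = [(arrival, idx) for idx, (arrival, _) in enumerate(jobs)]
    for limit in limits:
        free_at = 0.0   # time at which this host next becomes idle
        forwarded = []  # jobs killed here; they restart at the next host
        for arrival, idx in pending:
            size = jobs[idx][1]
            start = max(arrival, free_at)
            if size <= limit:
                free_at = start + size
                completion[idx] = free_at
            else:
                # Job reaches the cutoff: kill it and restart from scratch downstream.
                free_at = start + limit
                forwarded.append((free_at, idx))
        # Kill times are nondecreasing under FCFS, so `forwarded` is already sorted.
        pending = forwarded
    return completion


if __name__ == "__main__":
    jobs = [(0.0, 1.0), (0.0, 5.0), (1.0, 1.0)]
    print(simulate_tags(jobs, cutoffs=[2.0]))  # → [1.0, 8.0, 4.0]
```

In this toy trace the size-5 job wastes 2 units of work at the first host before restarting at the second, which is the non-work-conserving behavior the talk goes on to discuss.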

  11. 3 Flavors of TAGS. How to choose the cutoffs s1, s2, s3, …:
     • TAGS-opt-meanslowdown
     • TAGS-opt-meanwaitingtime
     • TAGS-opt-fairness

  12. TAGS is counterintuitive.
     • TAGS wastes resources: it is not work-conserving.
     • Big jobs seem unfairly penalized, yet somehow TAGS turns out to be fair?
     • TAGS always operates under unbalanced load.

  13. Results of Analysis: 2 hosts only, system load = 0.5. [Plots: one comparing Random, Least-Work-Left, and TAGS-opt-slowdown; one comparing Random, Least-Work-Left, and TAGS-opt-fairness.]

  14. Results of Analysis: 2 hosts only, system load = 0.5. [Plot comparing Random, Least-Work-Left, and TAGS-opt-waitingtime.]

  15. More Results: 4 hosts, system load = 0.3. [Plot comparing Random, Least-Work-Left, and TAGS.]

  16. More Results. New metric: server expansion. Server expansion = the number of hosts we would have to add to the system to get mean slowdown down to 2 or 3. (Initial system: 2 hosts, system load = 0.7.) [Plot comparing Least-Work-Left and TAGS.]

  17. WHY does TAGS work so well? 1) Reduction of variance of job size distribution 2) Load Unbalancing

  18. WHY does TAGS work so well? Recall the P-K formula for the mean waiting time in an M/G/1 FCFS queue, where $E\{X^2\}$ is the second moment of the job size distribution: $E\{W\} = \frac{\lambda\, E\{X^2\}}{2(1 - \rho)}$. 1) Reduction of variance of job size distribution: TAGS reduces the variance of the job size distribution at the hosts. No other policy does this!
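To see how strongly the second moment drives waiting time, the P-K formula can be evaluated directly. The sketch below assumes the standard relation ρ = λ·E{X} for a single M/G/1 queue; the function name and the example numbers are hypothetical and chosen only for illustration.

```python
def pk_mean_wait(arrival_rate, mean_size, second_moment):
    """Mean waiting time E{W} in an M/G/1 FCFS queue (Pollaczek-Khinchine formula)."""
    rho = arrival_rate * mean_size  # load, assumed to satisfy 0 <= rho < 1
    if not 0.0 <= rho < 1.0:
        raise ValueError("the queue is stable only for load 0 <= rho < 1")
    return arrival_rate * second_moment / (2.0 * (1.0 - rho))


if __name__ == "__main__":
    # Same arrival rate and mean size (load 0.5), different second moments.
    lam, mean = 0.5, 1.0
    for label, m2 in [("deterministic", 1.0), ("exponential", 2.0), ("high variance", 200.0)]:
        print(f"{label:>14}: E{{W}} = {pk_mean_wait(lam, mean, m2)}")
```

With load held at 0.5, E{W} scales linearly with the second moment, so anything that lowers the variance of the sizes seen by a host lowers its mean waiting time proportionally.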

  19. WHY does TAGS work so well? 2) Load unbalancing: all other policies aim to balance the load. TAGS unbalances load. [Figures: load at Host 1 and Host 2 under TAGS-opt-slowdown and under TAGS-opt-fairness.] This is fair? YES.

  20. Conclusion. This research challenges our common wisdom:
     • Load unbalancing may be better than load balancing.
     • It may be worthwhile to waste resources by restarting a job from scratch at a new machine … even if the new machine has a much higher load than the original machine!
     • A policy which appears to greatly penalize large jobs may actually be fair.
