290 likes | 303 Vues
Explore the impact of FIFO policy, SITA, and LWL in task assignments. Discover the efficiency and outcomes under differing assignment policies for high-variability workloads.
E N D
Surprising results on task assignment for high-variability workloads Mor Harchol-Balter, CMU, Computer Sci. Alan Scheller-Wolf, CMU, Tepper Business Andrew Young, Morgan Stanley
Goal: Minimize mean response time: E[T] FIFO Server farm model Assignment Policy incoming jobs FIFO Poisson(l) FIFO POLICY MATTERS! Q: What is a good Assignment Policy? (high-variability jobs) Mor Harchol-Balter, CMU
SITA (Size Interval) Split jobs based on size cutoffs. LWL (Least Work Left) Send job to host with least remaining work. +Protects against high variability +High throughput +Isolation for smalls Good Answers SITA LWL M/G/2 Mor Harchol-Balter, CMU
SITA in Practice • Supercomputing Centers • [Hotovy, Schneider, O’Donnell 96] • [Schroeder, Harchol-Balter 00] • Manufacturing Centers • [Buzacott, Shanthikumar 93] • File Server Farms • [Cardellini, Colajanni, Yu 01] • Supermarkets Prior Work on SITA SITA variants Optimizing SITA cutoffs SITA vs. LWL SITA LWL • [Broberg, Tari, Zeephongsekul 06] • [Harchol-Balter 02] • [Ciardo, Riska, Smirni 01] • [El-Taha, Maddah 06] • [Fu, Broberg, Tari 03] • [Harchol-Balter, Crovella, Murta 99] • [Oida, Shinjo 99] • [ Tari, Broberg, Zomaya, Baldoni 05] • [Thomas 08] • [Harchol-Balter 00] • [Harchol-Balter 02] • [Thomas 08] • [Tari,Broberg,Zomaya, Baldoni 05] • [Fu, Broberg, Tari 03] All conclude SITA far superior for high variability • [Harchol-Balter,Crovella,Murta 98] • [Bachmat, Sarfati 08] • [Sarfati 08] • [Harchol-Balter, Vesilo 10]
Should at least beat all commonly used policies when variability is high enough. In search of a proof of SITA’s total dominance. OK, so not optimal, but definite win for high variability. Years later Months later SITA Can’t prove anything because it’s not true! Mor Harchol-Balter, CMU
Alternative Title: The TRUTH about SITA,under very high job size variability Mor Harchol-Balter, CMU
as C2∞ Q: In this talk we will show ... • SITA diverges & LWL diverges? • SITA converges & LWL diverges ? • SITA diverges & LWL converges? • SITA converges & LWL converges? SITA A: All of the above LWL
as C2∞ Q: In this talk we will show ... Convergent LWL Divergent LWL Convergent SITA Looking for simple job size distributions to illustrate each. Divergent SITA Mor Harchol-Balter, CMU
Results (2 server system) a p Bimodal Conv. LWL Diverg. LWL 1-p b Conv. SITA or depends on pa & (1-p)b Diverg. SITA Exp(ma) p H2 1-p Exp(mb) Mor Harchol-Balter, CMU
Results (2 server system) a b Trimodal Conv. LWL Diverg. LWL r < 1 c=bm > 1 Conv. SITA or depends on m Exp(ma) Diverg. SITA Exp(mb) H3 r < 1 Exp(mc) Mor Harchol-Balter, CMU
Results (2 server system) Conv. LWL Diverg. LWL Conv. SITA depends on a Diverg. SITA Bounded Pareto(a) 1< a < 2 Mor Harchol-Balter, CMU
Diverg LWL Conv. LWL Bimodal Results Conv. SITA Diverg SITA a pa = QE[X] p depends ra & rb 1-p Lemma: As C2 ∞, but E[X], Q: const, a’s get little smaller QE[X] b’s get much bigger ∞ p 1 X ~ b (1-p)b = (1-Q)E[X] THM: LWL always diverges. THM: If ra < 1 & rb < 1 Convergent SITA Mor Harchol-Balter, CMU
Understanding LWL Isn’t LWL always bad for high C2? It depends … But shorts stuck behind longs, so E[T] ∞ Need 2 longs for this to be a problem! So we need: Pr{ 2 longs } * E[T| 2 longs] ? LWL Suffices to just look at E[X3/2]. Mor Harchol-Balter, CMU
Understanding LWL Thm: [Scheller-Wolf, Sigman 97], [Scheller-Wolf, Vesilo 06] (2 SERVERS) 1 spare server C2∞ Thm: I can make both happen! LWL Mor Harchol-Balter, CMU
Diverg LWL Conv. LWL Bimodal Results Conv. SITA Diverg SITA a pa = QE[X] p Lemma: As C2 ∞, but E[X], Q: const, a QE[X], b ∞, p 1 depends ra & rb 1-p X ~ b (1-p)b = (1-Q)E[X] THM: LWL always diverges. THM: If ra < 1 & rb < 1 Convergent SITA Mor Harchol-Balter, CMU
Diverg LWL Conv. LWL Trimodal Results Conv. SITA Diverg SITA pa a pb=b-3/2 b X~ depends on m Lemma: As C2 ∞, but E[X]: const, a E[X] b ∞, c ∞ pa 1 pc=c-3/2 c=bm THM: LWL always converges for r<1 THM: If m≤3, SITA converges If m>3, SITA diverges Mor Harchol-Balter, CMU
Results (2 server system) a a p b Trimodal Bimodal Conv. LWL Diverg. LWL r < 1 1-p c=bm > 1 b Conv. SITA Exp(ma) Diverg. SITA Exp(ma) p Exp(mb) H3 H2 Way more complex, because job types overlap! “Separation in the limit” r < 1 1-p Exp(mc) Exp(mb) Mor Harchol-Balter, CMU
E[T] LWL E[T] SITA SITA LWL C2 C2 Conv. LWL Diverg. LWL LWL Conv. SITA SITA Diverg. SITA E[T] E[T] SITA LWL C2 C2 Mor Harchol-Balter, CMU
Diverg LWL Conv. LWL Bounded Pareto (2 server system) Conv. SITA Diverg SITA X~ Bounded Pareto (k,p,a) 1 < a < 2 Lemma: As C2 ∞, but E[X], a: const, k (a -1)/a E[X] p ∞ k p depends on a THM: If a>3/2 and r<1, then LWL converges. Else LWL diverges. THM: SITA always diverges. Extends to n>2 servers when r < n-1
Bounded Pareto Results Why was this not noticed? Conv. LWL Diverg. LWL a = 1.4 a = 1.6 Conv. SITA SITA E[T] E[T] SITA LWL LWL Diverg. SITA C2 C2 Mor Harchol-Balter, CMU
Summary a a p b Trimodal Bimodal Conv. LWL Diverg. LWL r < 1 1-p c=bn > 1 b Conv. SITA or or Exp(ma) Diverg. SITA Exp(ma) p Exp(mb) H3 H2 r < 1 Bounded Pareto(a) 1-p Exp(mc) Exp(mb) 1< a < 2 Mor Harchol-Balter, CMU
“There once was a girl, who had a little curl right in the middle of her forehead. When she was good, she was very very good. But when she was bad, she was horrid.” Old Nursery Rhyme Conv. LWL Diverg. LWL Conv. SITA Diverg. SITA When SITA is good, it is very, very good But when it is bad, it is horrid. Mor Harchol-Balter, CMU
Epilogue … Mor Harchol-Balter, CMU
Where did SITA go wrong? SITA designed to keep shorts from getting stuck behind longs. Isn’t that good? But stringent segregation of shorts & longs can lead to underutilization of servers. Also, for some distributions, can’t subdivide to avoid infinite variability. Mor Harchol-Balter, CMU
SITA Split jobs based on size cutoffs. LWL Send job to host with least remaining work. +Protects against high variability +High throughput +Isolation for smalls Must be a better way… LWL SITA CS Take next small Take next job M/G/2 Take next job Take next job
CS Take next small Must be a better way… Take next job WIN/WIN ! Shorts have isolation from longs And server utilization is high WRONG! Thm: Whenever SITA diverges, CS diverges too. Mor Harchol-Balter, CMU
SITA Split jobs by size. SITA CS Take next small Thm: Whenever SITA diverges, CS diverges too. Take next job e.g., Pareto(a) e.g., Bimodal y y • PROOF: There are 2 reasons why SITA diverges under given y: • Any way of slicing leads to • (2) There is a way of slicing • away variability, but it forces S’s Smalls Larges L’s CS can’t help. CS should help, but doesn’t.
SITA Split jobs by size. SITA CS Take next small Thm: Whenever SITA diverges, CS diverges too. Take next job Pareto(a) Bimodal y y • PROOF: There are 2 reasons why SITA diverges under given y: • Any way of slicing leads to • (2) There is a way of slicing • away variability, but it forces • Small job sees L of age ~Le. • Small server has been in overload for ~Le time • Small sees ~Le work experiences Le delay. • Delay of small ∞ as C2 ∞ S’s Smalls Larges L’s CS can’t help. CS should help, but doesn’t.
Maybe isolating short jobs is not the panacea for high-variability workloads after all … CS Take next small Conclusion . Take next job SITA LWL Take next job M/G/2 Take next job Mor Harchol-Balter, CMU