Coordinated Workload Scheduling
A New Application Domain for Mechanism Design
Elie Krevat
Introduction
• Distributed systems becoming larger, more complex
• Nodes perform computation and storage tasks
• Workloads enter system and are distributed across nodes
• Clients run many workloads, can pay for resources
• Nodes service many workloads (not dedicated)
• System provides QoS guarantees:
  • Performance – load-balance workloads to faster, free nodes
  • Efficiency – minimize cycles wasted when tasks available
  • Fairness – nodes share resources across workloads
Benefits of Shared Storage
• Why cluster? Scaling, cost, and management
• Why share? Slack sharing, economies of scale, uniformity
Throughput Performance Insulation in Shared Storage
• Each of n workloads on a server:
  • Executes efficiently within its portion of time (timeslice)
  • Ideally: gets ≥ 1/n of its standalone performance
  • In practice: within a fraction of the ideal (sketch below)
• Argon project [Wachs07] provides bounds on efficiency across workloads for one server
• Problems extending to many servers (cluster-style):
  • Synchronized workloads need coordination of schedules
  • Performance of system limited by slowest node
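A minimal sketch of the insulation guarantee, assuming an Argon-style R-fraction of the 1/n ideal (the function and parameter names are ours, and the default R value is illustrative, not from the slides):

```python
def insulated_throughput(standalone_tput, n_workloads, r_fraction=0.9):
    """Lower bound on one workload's throughput when sharing a server
    with n_workloads - 1 others: an R-fraction of 1/n of the
    throughput it would achieve running alone."""
    return r_fraction * standalone_tput / n_workloads

# Example: a workload reading 100 MB/s standalone, sharing with two
# others under R = 0.9, should still see at least 30 MB/s.
print(insulated_throughput(100.0, 3))  # 30.0
```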
Timeslice challenges
[Figure: timeslice schedules on three servers – Server A (140 ms timeslices) servicing Workloads 1–3, Server B (280 ms) servicing Workloads 1 and 4, Server C (100 ms); mismatched timeslice lengths misalign a shared workload's slots across servers]
Cluster-style Storage Systems
[Figure: a client issues a synchronized read through a switch to the storage servers; each data block is striped into data fragments across servers, and the client sends the next batch of requests only after all fragments return]
Environment Assumptions
• One client per workload
• Bounded number W of workloads, N of nodes
• Constant set of workloads to be scheduled
  • But mechanism might support changing set
• Communication doesn't interfere with computation/storage tasks
Workload Distribution Settings
• Two alternative workload distribution settings
• Setting I: Free Workload Assignment
  • Workloads can be freely assigned to many nodes
  • Example: embarrassingly parallel distributed apps
  • Problem: determine best set of nodes to assign
• Setting II: Fixed Workload Assignment
  • Workloads must be assigned to fixed set of nodes
  • Example: cluster-style storage
  • Problem: coordinate responses of nodes with better timeslice scheduling
Computing Environments with Monetary Incentives
• Workloads pay for resources:
  • Weather forecasting
  • Seismic measurement simulations of oil fields
• Distributed systems sell resources:
  • Supercomputing centers
  • Shared infrastructures
  • Grid computing
• Individually-owned computers sell spare cycles
  • SETI@Home for $$
• May not have single administrative domain
Why Mechanism Design?
• Central coordinator(s) lack per-node information
  • Different performance capabilities and revenue models
• Enforce cooperation and global QoS
  • Efficiency and fairness not always goals of players
• Reduce scheduling problems to general mechanism
  • Scheduling coordinated workloads is hard (proof later)
  • Divide scheduling problems across nodes
  • Design mechanism to produce coordination
Outline
• Background and Motivation
• Mechanism I: Free Workload Assignment
• Mechanism II: Fixed Workload Assignment
• Conclusions
Revenue Model: Free Assignment
• Clients pay nodes directly after task
• Payment is per-workload
• Amount depends on many factors:
  • Speed of response
  • Number of requests/computations per timeslot
• Clients may also pay fixed cost to central scheduler
• Workloads want the best and fastest nodes
  • Central scheduler doesn't know load/speed of nodes
• Nodes are greedy and want lots of workloads
  • May lie about load/speed if asked directly
• System goal: assign workloads to nodes that will respond fastest
Mechanism Design: VCG
• Run auction to decide which M nodes to assign
  • FIFO approach to scheduling each workload
  • Can also run combinatorial auction on bundles
• Nodes respond with bids
  • Valuations depend on speed and current load
  • Same factors that affect final payment
• Apply Vickrey-Clarke-Groves mechanism (sketch below)
  • First auction iteration finds top M bids
  • Removing Node X and recomputing the top M simply promotes the (M+1)-st bidder, so an additional auction iteration isn't actually necessary
  • Each winner pays the (M+1)-st highest bid; the difference between X's bid and the (M+1)-st bid is X's surplus
• May also normalize payments to share wealth over nodes
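A minimal sketch of the top-M auction with standard VCG payments, assuming a single workload auctioned to unit-demand nodes (the data encoding and names are ours):

```python
def vcg_top_m(bids, m):
    """Select the m highest-bidding nodes for a workload. Each winner's
    VCG payment is the (m+1)-st highest bid: removing any winner
    promotes the (m+1)-st bidder, so that bid is the welfare each
    winner's presence displaces. `bids` maps node id -> reported
    valuation (a hypothetical encoding)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winners = [node for node, _ in ranked[:m]]
    # Uniform price: the (m+1)-st bid, or 0 if fewer than m+1 bidders.
    payment = ranked[m][1] if len(ranked) > m else 0.0
    return winners, {node: payment for node in winners}

# Example: 4 nodes bid for a workload that needs m = 2 nodes.
winners, payments = vcg_top_m({"A": 9.0, "B": 7.0, "C": 5.0, "D": 2.0}, 2)
print(winners, payments)  # ['A', 'B'] {'A': 5.0, 'B': 5.0}
```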
Mechanism Results
• Incentive compatible
  • Nodes have no incentive to lie: a node's bid affects only whether it wins, not the price it pays, so over-reporting a valuation cannot improve its utility
• Global efficiency (i.e., best allocation for workload)
• Related to general task allocation problem [Nisan99]
  • k tasks allocated to n agents
  • Goal is to minimize completion time of last assignment (make-span), as in the greedy sketch below
  • Valuation of agent is negation of total time spent on tasks
  • Approximation/randomized algorithms exist for CAs (combinatorial auctions)
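For intuition, a greedy longest-processing-time sketch of the underlying make-span problem (the mechanism-design version in [Nisan99] works with reported per-agent times; the names here are ours):

```python
def lpt_makespan(task_times, n_agents):
    """Longest-processing-time-first assignment of k tasks to n
    identical agents; a classic greedy approximation for minimizing
    the make-span (completion time of the last task)."""
    loads = [0.0] * n_agents
    assignment = [[] for _ in range(n_agents)]
    for t in sorted(task_times, reverse=True):
        i = min(range(n_agents), key=loads.__getitem__)  # least-loaded agent
        assignment[i].append(t)
        loads[i] += t
    return assignment, max(loads)

# Example: 5 tasks on 2 agents.
print(lpt_makespan([3.0, 1.0, 4.0, 2.0, 2.0], 2))
# ([[4.0, 2.0], [3.0, 2.0, 1.0]], 6.0)
```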
Outline
• Background and Motivation
• Mechanism I: Free Workload Assignment
• Mechanism II: Fixed Workload Assignment
• Conclusions
Revenue Model: Fixed Assignment
• Nodes paid by system at every timestep
  • Payment is part of mechanism payment scheme
• System wants quick resolution of workload requests
• Nodes need monetary incentives to schedule fully coordinated workloads efficiently and fairly
  • All M nodes service workload in same timeslice
  • Uncoordinated workloads not important
• System goals:
  • Enforce coordination of workloads per timeslice
  • Achieve fair distribution of resources
  • Achieve efficient schedule allocations
Coordination is hard
[Figure: dependency graph over Workloads 1–4]
• Reduce Max Independent Set problem to problem of scheduling max # of fully coordinated workloads per timeslice
• For every node xi that services a workload wi, wi has a dependency edge to every other workload serviced by xi
• NP-Complete, but approximation algorithms exist (greedy sketch below)
• For above example, max independent set is {1, 3}
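A minimal greedy sketch of the conflict-graph view, assuming each workload is tagged with the set of nodes that service it (the encoding and names are ours):

```python
def max_coordinated(services):
    """Greedily pick a large set of workloads that can all be fully
    coordinated in one timeslice. Two workloads conflict if some node
    services both, so a conflict-free set is an independent set in the
    dependency graph; this is a greedy MIS approximation.
    `services` maps workload -> set of servicing nodes."""
    conflicts = {
        w: {v for v in services if v != w and services[v] & services[w]}
        for w in services
    }
    chosen, remaining = [], set(services)
    while remaining:
        # Pick the workload with the fewest remaining conflicts.
        w = min(sorted(remaining), key=lambda x: len(conflicts[x] & remaining))
        chosen.append(w)
        remaining -= conflicts[w] | {w}
    return chosen

# Example: a 4-cycle of conflicts (1-2, 2-3, 3-4, 4-1), matching the
# slide's answer {1, 3}.
services = {1: {"A", "D"}, 2: {"A", "B"}, 3: {"B", "C"}, 4: {"C", "D"}}
print(max_coordinated(services))  # [1, 3]
```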
Properties of Schedule Allocations
• Two types of schedule allocations:
  • Basic quanta timeslice allocation a
  • Longer sequence of timeslices atot
• Set of workloads serviced by node nx is Sx
• Number of timeslice quanta allocated per workload wi is qi
• Total quanta count Qx for each node nx
• Delay between consecutive schedules of a workload in allocation atot is the schedule distance di,k (computed in the sketch below)
  • k refers to schedule instance in atot
• Average schedule distance is di,avg (per-node: dx,avg)
• Maximum schedule distance is di,max (per-node: dx,max)
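A minimal sketch of computing the distance properties from an allocation, assuming atot is encoded as a list of timeslices, each mapping node -> scheduled workload (the encoding is ours):

```python
from collections import defaultdict

def schedule_distances(a_tot):
    """Per-workload schedule distances over a long allocation a_tot.
    d[k] is the gap between the k-th and (k+1)-st timeslices in which
    a workload is scheduled; also reports d_avg and d_max."""
    slots = defaultdict(set)
    for t, timeslice in enumerate(a_tot):
        for node, w in timeslice.items():
            slots[w].add(t)
    props = {}
    for w, ts in slots.items():
        ts = sorted(ts)
        d = [b - a for a, b in zip(ts, ts[1:])]
        props[w] = {
            "d": d,
            "d_avg": sum(d) / len(d) if d else 0.0,
            "d_max": max(d) if d else 0,
        }
    return props

# Example: workload 1 runs in timeslices 0 and 2 on node A.
a_tot = [{"A": 1, "B": 2}, {"A": 2, "B": 3}, {"A": 1, "B": 3}]
print(schedule_distances(a_tot)[1])  # {'d': [2], 'd_avg': 2.0, 'd_max': 2}
```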
Possible Payment Scheme
• Node is paid at most P credits for each scheduled time quantum
  • No credits for uncoordinated schedule
• For every cycle of time that workload isn't scheduled, payment decreases by c (c << P)
• Node is fined F if it starves a workload over a period of quanta greater than Qthr
• Using derived properties of schedule allocations, each node calculates payments (sketch below)
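A minimal sketch of the per-quantum payment rule under this scheme. The slide doesn't fix the exact functional form or constants, so this is one plausible reading; all names and values are ours:

```python
def quantum_payment(coordinated, wait_cycles, starved_quanta,
                    P=10.0, c=0.1, F=50.0, Q_thr=8):
    """Credits a node earns for one scheduled quantum of a workload:
    at most P per coordinated quantum, nothing for an uncoordinated
    one, minus c per cycle the workload waited (c << P), and a fine
    of F if the workload was starved for more than Q_thr quanta.
    Parameter values are illustrative, not from the slides."""
    if starved_quanta > Q_thr:
        return -F
    if not coordinated:
        return 0.0
    return max(P - c * wait_cycles, 0.0)

# Example: a coordinated quantum after 5 waiting cycles earns 9.5 credits.
print(quantum_payment(True, 5, 0))  # 9.5
```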
Mechanism Design: Open Research Problem
• Goal is to improve efficiency and fairness
• Nodes compute their best allocations (through heuristics) using payment scheme that rewards efficiency/fairness
  • Send valuations to central scheduler
  • General mechanism determines best global allocation
• But coordination is hard optimization problem
  • May be better suited only for central scheduler
• Expected properties of a mechanism:
  • Nodes are players
  • No additional utility past payments?
  • Auctioned good may be single or total allocations
  • Tradeoff: ability to adapt to changing workloads vs. better assessment of efficient allocations over longer time
Outline
• Background and Motivation
• Mechanism I: Free Workload Assignment
• Mechanism II: Fixed Workload Assignment
• Conclusions
Conclusions
• Distributed systems environments provide new applications for mechanism design
• Goals of better global performance, efficiency, fairness
  • Not always shared by individual nodes
• Model and analysis of 2 different distribution settings
  • Free workload assignment solved with VCG
  • Fixed workload assignment still open problem
• Revenue model and goals of mechanism vary
  • Payment functions use derived allocation properties
• Coordination of workloads is hard optimization problem
• Motivation for further research in related areas