Operating Systems Engineering OS Scheduling

Operating Systems Engineering OS Scheduling By Dan Tsafrir, 27/4/2011

What’s OS scheduling • OS policies & mechanisms to allocate resources to entities • Enforce a policy (e.g., “shortest first”) • Using an OS mechanism (e.g., “thread chooser”) • Entities • Threads, processes, process groups, users, I/O ops (web requests, disk accesses…) • Resources • CPU cycles, memory, I/O bandwidth (network, disk) • Dynamic setting • Scheduling is usually needed when resource reassignment can happen with some frequency • Not applicable to more static resources (like disk space) • Varying scale • From shared memory accesses (HW) to parallel jobs (SW)

Overload • Scheduling unavoidable • When resources are shared • But it’s mostly interesting in a state of overload • When demand exceeds available resources • E.g., if |threads| <= |cores|, scheduling is trivial • Well, not really, as thread placement might impact performance greatly (caches are shared) • For parallel jobs on, say, the Blue Gene supercomputer, this example is more or less true • Likewise, in the cloud, when there are more physical servers than VMs (and you’re willing to pay for power) • A good scheduling policy • Gives the most important entity what it needs • Equally important => fair share

Popularity of OS scheduling research • Especially popular in the days of time-sharing • When there was a shortage of resources all around • Many scheduling problems become uninteresting • When you can just cheaply buy more/faster resources • But there were always important exceptions • Web servers handling peak demands (flash crowds, attacks, prioritizing paying customers) • And nowadays… • Embedded (power/performance considerations on your handheld device) • Cloud servers

Key challenges • Knowing what’s important • Can’t read clients’ minds; often unrealistic to explicitly ask • Many relevant, often conflicting performance metrics • Throughput vs. latency (e.g., network packets) • Throughput vs. fairness (e.g., DRAM accesses) • Power vs. speed (e.g., DVFS) • Soft/hard realtime • … • Many schedulers • CPU, disk, I/O, memory, … • Interaction not necessarily healthy • Countless domain-specific, workload-dependent, ad-hoc solutions • No generic solution

Addressing challenges – baseline • Understand where scheduling is occurring • Expose scheduling decisions, allow control, allow different policies • Account for resource consumption to allow intelligent control

Multilevel priority queue • Running reduces priority to run more • Multiple ready queues, ordered by importance • Run most important first in a round-robin fashion • Use preemption if more important process enters the system • The negative feedback loop ensures no starvation • If you sleep a lot => consume little CPU => you’re important => when awakened, you’d typically immediately get a CPU • Used by all general-purpose OSes • Unix family, Windows family • Problematic in many respects • What if you run a lot but are still very important to the user?
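The feedback loop on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular OS implements it: the class `MLFQ`, its method names, the three levels, and the task names are all hypothetical, and preemption and time quanta are not modeled.

```python
from collections import deque


class MLFQ:
    """Toy multilevel feedback queue; level 0 is the most important."""

    def __init__(self, levels=3):
        self.queues = [deque() for _ in range(levels)]
        self.level = {}  # task name -> current priority level

    def add(self, name):
        # Assumption: new tasks start at the most important level.
        self.level[name] = 0
        self.queues[0].append(name)

    def pick(self):
        # Take the head of the most important non-empty queue.
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None

    def used_full_quantum(self, name):
        # Negative feedback: running demotes the task by one level.
        self.level[name] = min(self.level[name] + 1, len(self.queues) - 1)
        self.queues[self.level[name]].append(name)

    def woke_from_sleep(self, name):
        # A sleeper consumed little CPU, so it returns at top priority.
        self.level[name] = 0
        self.queues[0].append(name)


sched = MLFQ()
for task in ("hog_a", "hog_b", "editor"):
    sched.add(task)

trace = []
for _ in range(2):                 # both hogs burn a full quantum
    task = sched.pick()
    trace.append(task)
    sched.used_full_quantum(task)
trace.append(sched.pick())         # editor runs, then blocks on input
task = sched.pick()                # hog_a runs again and is demoted again
trace.append(task)
sched.used_full_quantum(task)
sched.woke_from_sleep("editor")    # a keystroke arrives
trace.append(sched.pick())         # editor is chosen ahead of waiting hog_b
print(trace)
```

In this run the editor is picked as soon as it wakes, even though `hog_b` has been waiting longer, which is the "sleep a lot => you're important" behavior described above.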

Pitfall: priority inversion • Example • One CPU • P_low holds a lock • P_high waits for it • P_med becomes runnable => OS preempts P_low => P_high is stuck until P_med is done • (From real life: exactly what happened on the Mars Pathfinder mission) • Example • Many CPU-bound background processes • X server serves multiple clients => CPU quota might run out • Possible solution: priority inheritance • P_high lends its priority to P_low until the lock is released • X clients lend their priority to X server until serviced
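The first example on this slide can be replayed as a small tick-by-tick simulation. It is a minimal sketch under assumptions that are not in the slides: the function `simulate`, the numeric priorities, and the amounts of remaining work are made up for illustration, and P_low is assumed to spend all of its remaining work inside the critical section.

```python
def simulate(inherit):
    """Return the tick at which P_high finishes on a single CPU."""
    prio = {"low": 1, "med": 2, "high": 3}
    work = {"low": 2, "med": 3, "high": 1}  # CPU ticks each task still needs
    lock_holder = "low"                      # P_low is in the critical section
    tick = 0
    while work["high"] > 0:
        tick += 1
        # P_high needs the lock, so it cannot run while someone holds it.
        runnable = [
            t for t in work
            if work[t] > 0 and not (t == "high" and lock_holder is not None)
        ]
        # With inheritance, the lock holder borrows P_high's priority
        # (P_high is waiting for as long as the lock is held here).
        current = max(
            runnable,
            key=lambda t: prio["high"] if inherit and t == lock_holder else prio[t],
        )
        work[current] -= 1
        if current == lock_holder and work[current] == 0:
            lock_holder = None               # critical section done, unlock
    return tick


print("without inheritance:", simulate(False))
print("with inheritance:   ", simulate(True))
```

Without inheritance, P_med runs to completion before P_low can release the lock, so P_high finishes last despite being the most important. With inheritance, P_low finishes its critical section first and P_high runs right after it.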

Pitfall: uncoordinated schedulers • Example • Even though CPU-scheduler favors emacs (always sleeps), • disk I/O scheduler does not, preferring higher throughput over emacs’s I/O ops • Example • Emacs needs memory => memory is tight => must wait • Other processes have dirty pages => write to disk • Disk I/O scheduler doesn’t know these writes are important

Active field of research • “No justified complaints: On fair sharing of multiple resources” [Dec 2010; Dolev, Feitelson, Linial et al.; TR] “We define fairness in such a scenario as the situation where every user either gets all the resources he wishes for, or else gets at least his entitlement on some bottleneck resource, and therefore cannot complain about not getting more. We then prove that a fair allocation according to this definition is guaranteed to exist for any combination of user requests and entitlements. The proof, which uses tools from the theory of ordinary differential equations, is constructive and provides a method to compute the allocations numerically.”
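The quoted fairness condition can be written down as a simple check. This is only an illustrative sketch, not the paper's method for computing allocations. Two points are assumptions made here rather than taken from the quote: a resource is treated as a bottleneck when its capacity is fully allocated, and an entitlement is treated as a fraction of each resource's capacity. The function name and the example numbers are hypothetical.

```python
def no_justified_complaints(requests, allocations, entitlements, capacity,
                            eps=1e-9):
    """Check the quoted condition for every user.

    Assumptions (not from the source): a bottleneck is a resource whose
    capacity is fully allocated; an entitlement is a fraction of capacity.
    """
    resources = range(len(capacity))
    used = [sum(a[r] for a in allocations.values()) for r in resources]
    bottlenecks = [r for r in resources if used[r] >= capacity[r] - eps]
    for user, want in requests.items():
        got = allocations[user]
        # Case 1: the user gets everything it asked for.
        satisfied = all(got[r] >= want[r] - eps for r in resources)
        # Case 2: the user gets its entitlement on some bottleneck.
        has_share = any(
            got[r] >= entitlements[user] * capacity[r] - eps
            for r in bottlenecks
        )
        if not (satisfied or has_share):
            return False
    return True


capacity = [1.0, 1.0]                        # resource 0: CPU, 1: memory
entitlements = {"a": 0.5, "b": 0.5}
requests = {"a": [0.8, 0.2], "b": [0.6, 0.3]}

even_split = {"a": [0.5, 0.125], "b": [0.5, 0.25]}
greedy_a = {"a": [0.8, 0.2], "b": [0.2, 0.1]}

print(no_justified_complaints(requests, even_split, entitlements, capacity))
print(no_justified_complaints(requests, greedy_a, entitlements, capacity))
```

In the first allocation the CPU is saturated and both users hold their half of it, so neither can complain. In the second, user `b` gets neither its full request nor its entitlement on the saturated CPU, so the check fails.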

Active field of research • “RSIO: Automatic User Interaction Detection and Scheduling” [Jun 2010; Zheng, Viennot, Nieh; SIGMETRICS] “We present RSIO, a processor scheduling framework for improving the response time of latency-sensitive applications by monitoring accesses to I/O channels and inferring when user interactions occur. RSIO automatically identifies processes involved in a user interaction and boosts their priorities at the time the interaction occurs to improve system response time. RSIO also detects processes indirectly involved in processing an interaction, automatically accounting for dependencies and boosting their priorities accordingly.”

Active field of research • “Secretly monopolizing the CPU without superuser privileges” [Aug 2007; Tsafrir, Etsion, Feitelson; USENIX Security] • See next presentation…
