Processes and Threads

Processes and Threads • Processes and their scheduling • Multiprocessor scheduling • Threads • Distributed Scheduling/migration CS677: Distributed OS

Processes: Review • Multiprogramming versus multiprocessing • Kernel data structure: process control block (PCB) • Each process has an address space • Contains code, global and local variables • Process state transitions • Uniprocessor scheduling algorithms • Round-robin, shortest job first, FIFO, lottery scheduling, EDF • Performance metrics: throughput, CPU utilization, turnaround time, response time, fairness

Process Scheduling • Priority queues: multiple queues, each with a different priority • Use strict priority scheduling • Example: page swapper, kernel tasks, real-time tasks, user tasks • Multi-level feedback queue • Multiple queues with priority • Processes dynamically move from one queue to another • Depending on priority/CPU characteristics • Gives higher priority to I/O bound or interactive tasks • Lower priority to CPU bound tasks • Round robin at each level
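The multi-level feedback queue can be sketched in a few lines. This is a minimal illustration, not a real kernel scheduler: the function name `mlfq_schedule`, the quantum values, and the rule "demote a job that uses its whole quantum" are assumptions chosen for the example, and I/O is not modeled.

```python
from collections import deque

def mlfq_schedule(jobs, quanta=(1, 2, 4)):
    """Simulate a multi-level feedback queue (hypothetical sketch).

    jobs:   dict mapping job name -> CPU time still needed
    quanta: time slice for each level, highest priority first
    Returns the job names in order of completion.
    """
    queues = [deque() for _ in quanta]
    for name, need in jobs.items():
        queues[0].append((name, need))        # every job starts at the top level

    finished = []
    while any(queues):
        # Strict priority between levels: serve the highest non-empty queue.
        level = next(i for i, q in enumerate(queues) if q)
        name, need = queues[level].popleft()  # round robin within a level
        need -= min(need, quanta[level])
        if need == 0:
            finished.append(name)
        else:
            # The job used its full quantum, so treat it as CPU bound and demote it.
            lower = min(level + 1, len(queues) - 1)
            queues[lower].append((name, need))
    return finished
```

With `mlfq_schedule({"batch": 6, "short": 2})` the short job completes first even though it was queued second, because the long job is demoted after consuming its first quantum.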

Processes and Threads • Traditional process • One thread of control through a large, potentially sparse address space • Address space may be shared with other processes (shared mem) • Collection of system resources (files, semaphores) • Thread (lightweight process) • A flow of control through an address space • Each address space can have multiple concurrent control flows • Each thread has access to entire address space • Potentially parallel execution, minimal state (low overheads) • May need synchronization to control access to shared variables

Threads • Each thread has its own stack, PC, registers • Share address space, files, …

Why use Threads? • Large multiprocessors need many computing entities (one per CPU) • Switching between processes incurs high overhead • With threads, an application can avoid per-process overheads • Thread creation, deletion, switching cheaper than processes • Threads have full access to address space (easy sharing) • Threads can execute in parallel on multiprocessors

Why Threads? • Single threaded process: blocking system calls, no parallelism • Finite-state machine [event-based]: non-blocking with parallelism • Multi-threaded process: blocking system calls with parallelism • Threads retain the idea of sequential processes with blocking system calls, and yet achieve parallelism • Software engineering perspective • Applications are easier to structure as a collection of threads • Each thread performs several [mostly independent] tasks

Multi-threaded Clients Example: Web Browsers • Browsers such as IE are multi-threaded • Such browsers can display data before the entire document is downloaded: they perform multiple simultaneous tasks • Fetch main HTML page, activate separate threads for other parts • Each thread sets up a separate connection with the server • Uses blocking calls • Each part (gif image) fetched separately and in parallel • Advantage: connections can be set up to different sources • Ad server, image server, web server…

Multi-threaded Server Example • Apache web server: pool of pre-spawned worker threads • Dispatcher thread waits for requests • For each request, choose an idle worker thread • Worker thread uses blocking system calls to service web request
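The dispatcher/worker structure can be sketched with Python's standard `threading` and `queue` modules. This is only an illustration of the pattern, not how Apache is implemented: the function `serve`, the `"handled ..."` response string, and the use of `None` as a shutdown marker are assumptions made for the example.

```python
import queue
import threading

def serve(requests, num_workers=3):
    """Dispatch each request to a pool of pre-spawned worker threads."""
    pending = queue.Queue()          # hand-off point between dispatcher and workers
    results = {}
    results_lock = threading.Lock()  # protects the shared results dict

    def worker():
        while True:
            req = pending.get()      # blocking call: an idle worker sleeps here
            if req is None:          # hypothetical shutdown marker
                break
            response = f"handled {req}"   # stand-in for blocking I/O work
            with results_lock:
                results[req] = response

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()

    # Dispatcher role: queue each incoming request; any idle worker picks it up.
    for req in requests:
        pending.put(req)
    for _ in workers:
        pending.put(None)            # one shutdown marker per worker
    for w in workers:
        w.join()
    return results
```

Each worker is written as simple sequential code with a blocking `get()`, yet several requests can be in service at once, which is the point made on the slide.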

Thread Management • Creation and deletion of threads • Static versus dynamic • Critical sections • Synchronization primitives: blocking, spin-lock (busy-wait) • Condition variables • Global thread variables • Kernel versus user-level threads
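A critical section guarded by a condition variable might look like the following sketch, which uses Python's `threading.Condition`. The `Mailbox` class and the `demo` function are hypothetical names introduced only for this example.

```python
import threading

class Mailbox:
    """A shared list protected by a condition variable (blocking, not busy-wait)."""

    def __init__(self):
        self._items = []
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:               # enter the critical section
            self._items.append(item)
            self._cond.notify()        # wake one waiting consumer, if any

    def get(self):
        with self._cond:
            while not self._items:     # re-check the predicate after every wakeup
                self._cond.wait()      # releases the lock while blocked
            return self._items.pop(0)

def demo():
    box = Mailbox()
    received = []

    def consume():
        for _ in range(3):
            received.append(box.get())

    consumer = threading.Thread(target=consume)
    consumer.start()
    for i in range(3):
        box.put(i)
    consumer.join()
    return received
```

The `while` loop around `wait()` is the usual idiom: a woken thread must confirm that the condition it was waiting for actually holds before proceeding.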

User-level versus kernel threads • Key issues: • Cost of thread management • More efficient in user space • Ease of scheduling • Flexibility: many parallel programming models and schedulers • Process blocking – a potential problem

User-level Threads • Threads managed by a threads library • Kernel is unaware of presence of threads • Advantages: • No kernel modifications needed to support threads • Efficient: creation/deletion/switches don’t need system calls • Flexibility in scheduling: library can use different scheduling algorithms, can be application dependent • Disadvantages • Need to avoid blocking system calls [all threads block] • Threads compete with one another • Does not take advantage of multiprocessors [no real parallelism]
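The idea of a threads library that schedules entirely in user space can be illustrated with Python generators standing in for threads. This is a toy analogy rather than a real threads package: `run_user_threads` and `worker` are hypothetical names, and each `yield` plays the role of a voluntary context switch.

```python
from collections import deque

def worker(name, steps):
    """A 'thread' body: each yield hands control back to the library scheduler."""
    for i in range(steps):
        yield f"{name}{i}"

def run_user_threads(threads):
    """Round-robin scheduler that runs in user space; no system calls to switch."""
    ready = deque(threads)
    trace = []
    while ready:
        thread = ready.popleft()
        try:
            trace.append(next(thread))   # run the thread until its next yield
            ready.append(thread)         # still runnable: back of the ready queue
        except StopIteration:
            pass                         # thread finished: drop it
    return trace
```

The sketch also shows both disadvantages listed above: if a thread body made a blocking call instead of yielding, every other thread would stall, and all threads share the single kernel-visible flow of control, so there is no real parallelism.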

User-level threads

Kernel-level threads • Kernel aware of the presence of threads • Better scheduling decisions, more expensive • Better for multiprocessors, more overheads for uniprocessors

Process Migration • Transfer of a sufficient amount of the state of a process from one machine to another • The process executes on the target machine

Motivation • Load sharing • Move processes from heavily loaded to lightly loaded systems • Load can be balanced to improve overall performance • Communications performance • Processes that interact intensively can be moved to the same node to reduce communications cost • May be better to move the process to where the data resides when the data is large

Motivation • Availability • Long-running process may need to move because the machine it is running on will be down • Utilizing special capabilities • Process can take advantage of unique hardware or software capabilities

Initiation of Migration • Operating system • When goal is load balancing • Process • When goal is to reach a particular resource

What is Migrated? • Must destroy the process on the source system and create it on the target system • Process control block and any links must be moved

What is Migrated? • Eager (all): Transfer entire address space • No trace of process is left behind • If address space is large and if the process does not need most of it, then this approach may be unnecessarily expensive

What is Migrated? • Precopy: Process continues to execute on the source node while the address space is copied • Pages modified on the source during precopy operation have to be copied a second time • Reduces the time that a process is frozen and cannot execute during migration
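A small simulation can make the precopy trade-off concrete. It is a sketch under stated assumptions: the function `precopy_migrate`, the `max_rounds` limit, and the `stop_threshold` cut-off are all hypothetical, and the set of pages dirtied during each round is supplied by the caller rather than measured.

```python
def precopy_migrate(num_pages, dirtied_per_round, max_rounds=5, stop_threshold=2):
    """Simulate precopy migration (hypothetical sketch).

    dirtied_per_round[i] is the set of pages the still-running process
    modifies on the source while copy round i is in flight.
    Returns (total pages sent, pages sent while the process is frozen).
    """
    to_copy = set(range(num_pages))   # the first round copies the whole address space
    pages_sent = 0
    for round_no in range(max_rounds):
        if len(to_copy) <= stop_threshold:
            break                     # little enough left: freeze and finish
        pages_sent += len(to_copy)
        # Pages modified during this round must be copied a second time.
        if round_no < len(dirtied_per_round):
            to_copy = set(dirtied_per_round[round_no])
        else:
            to_copy = set()
    frozen_pages = len(to_copy)       # copied while the process cannot execute
    pages_sent += frozen_pages
    return pages_sent, frozen_pages
```

For a 10-page process that dirties pages `{1, 2, 3, 4}` and then `{1}`, the sketch sends 15 pages in total but only 1 while the process is frozen, whereas eager (all) would send 10 pages with the process frozen for all of them.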

What is Migrated? • Eager (dirty): Transfer only that portion of the address space that is in main memory and has been modified • Any additional blocks of the virtual address space are transferred on demand • The source machine is involved throughout the life of the process

What is Migrated? • Copy-on-reference: Pages are only brought over on reference • Variation of eager (dirty) • Has lowest initial cost of process migration
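Copy-on-reference amounts to lazy fetching of pages. The following sketch shows the idea with a hypothetical `CopyOnReferenceSpace` class, where a plain dictionary stands in for the address space still held on the source machine.

```python
class CopyOnReferenceSpace:
    """Address space on the target that pulls pages from the source on first use."""

    def __init__(self, source_pages):
        self._source = source_pages   # stand-in for pages left on the source node
        self._local = {}              # pages already brought over
        self.remote_fetches = 0

    def read(self, page):
        if page not in self._local:
            # First reference: fetch the page from the source machine.
            self._local[page] = self._source[page]
            self.remote_fetches += 1
        return self._local[page]
```

Nothing is copied at migration time, which is why the initial cost is lowest, but every first touch of a page pays a remote fetch and the source must keep the pages available.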

What is Migrated? • Flushing: Pages are cleared from main memory by flushing dirty pages to disk • Relieves the source of holding any pages of the migrated process in main memory

Negotiation of Migration • Migration policy is the responsibility of the Starter utility • Starter utility is also responsible for long-term scheduling and memory allocation • Decision to migrate must be reached jointly by two Starter processes (one on the source and one on the destination)

Eviction • A system may evict a process that has been migrated to it • If a workstation is idle, a process may have been migrated to it • Once the workstation is active, it may be necessary to evict the migrated processes to provide adequate response time

Distributed Scheduling: Motivation • Distributed system with N workstations • Model each workstation as an identical, independent M/M/1 system • Utilization u, P(system idle)=1-u • What is the probability that at least one system is idle and one job is waiting?
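One way to answer the question is by inclusion-exclusion over the N independent queues. The sketch below is a derivation under the stated M/M/1 assumptions, not a formula taken from the slides: in an M/M/1 queue P(k jobs) = (1-u)u^k, so P(idle) = 1-u, P(exactly one job) = u(1-u), and P(a job is waiting) = P(2 or more jobs) = u^2. The function name `p_idle_and_waiting` is hypothetical.

```python
def p_idle_and_waiting(n, u):
    """P(at least one system idle AND at least one system has a waiting job).

    Uses P(A and B) = 1 - P(not A) - P(not B) + P(not A and not B), where
      not A: no system is idle                  -> u**n
      not B: no system has a waiting job        -> (1 - u**2)**n
      both : every system holds exactly one job -> (u * (1 - u))**n
    """
    p_none_idle = u ** n
    p_none_waiting = (1 - u * u) ** n
    p_all_exactly_one = (u * (1 - u)) ** n
    return 1 - p_none_idle - p_none_waiting + p_all_exactly_one
```

Evaluating the function for a fixed N over a range of u shows the shape the argument relies on: the probability is small at very low and very high utilization and large in between.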

Implications • Probability high for moderate system utilization • Potential for performance improvement via load distribution • High utilization => little benefit • Low utilization => rarely a job waiting • Distributed scheduling (aka load balancing) potentially useful • What is the performance metric? • Mean response time • What is the measure of load? • Must be easy to measure • Must reflect performance improvement

Components • Transfer policy: when to transfer a process? • Threshold-based policies are common and easy • Selection policy: which process to transfer? • Prefer new processes • Transfer cost should be small compared to execution cost • Select processes with long execution times • Location policy: where to transfer the process? • Polling, random, nearest neighbor • Information policy: when and from where to collect load information? • Demand driven [only if sender/receiver], time-driven [periodic], state-change-driven [send update if load changes]

Sender-initiated Policy • Transfer policy • Selection policy: newly arrived process • Location policy: three variations • Random: may generate lots of transfers => limit max transfers • Threshold: probe n nodes sequentially • Transfer to first node below threshold, if none, keep job • Shortest: poll Np nodes in parallel • Choose least loaded node below T
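The threshold and shortest location policies can be sketched as two small functions. The names `threshold_location` and `shortest_location`, the dictionary of node loads, and the explicit probe order are assumptions made for the example; a real system would typically pick the nodes to probe at random and learn their loads from probe replies.

```python
def threshold_location(loads, sender, threshold, probe_order, probe_limit):
    """Probe nodes one at a time; transfer to the first one below the threshold."""
    for node in probe_order[:probe_limit]:
        if node != sender and loads[node] < threshold:
            return node
    return None                      # no suitable node found: keep the job locally

def shortest_location(loads, sender, threshold, polled):
    """Poll a set of nodes together; pick the least loaded one below the threshold."""
    below = [n for n in polled if n != sender and loads[n] < threshold]
    return min(below, key=lambda n: loads[n]) if below else None
```

With loads `{"a": 5, "b": 4, "c": 1, "d": 0}`, threshold 3, and probe order `["b", "c", "d"]`, the threshold variant stops at `"c"` while the shortest variant selects `"d"`, at the price of polling every candidate.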

Receiver-initiated Policy • Transfer policy: If departing process causes load < T, find a process from elsewhere • Selection policy: newly arrived or partially executed process • Location policy: • Threshold: probe up to Np other nodes sequentially • Transfer from first one above threshold, if none, do nothing • Shortest: poll n nodes in parallel, choose node with heaviest load above T

Symmetric Policies • Nodes act as both senders and receivers: combine previous two policies without change • Use average load as threshold • Improved symmetric policy: exploit polling information • Two thresholds: LT, UT, LT <= UT • Maintain sender, receiver and OK nodes using polling info • Sender: poll first node on receiver list … • Receiver: poll first node on sender list …
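The two-threshold classification can be sketched as follows. The function names and the boundary conventions (strictly above UT is a sender, strictly below LT is a receiver, everything else is OK) are assumptions for the example; the slide does not state how ties at the thresholds are handled.

```python
def classify(load, lt, ut):
    """Classify a node by its load using the lower and upper thresholds."""
    assert lt <= ut
    if load > ut:
        return "sender"      # overloaded: wants to give work away
    if load < lt:
        return "receiver"    # underloaded: willing to accept work
    return "ok"

def build_lists(polled_loads, lt, ut):
    """Group nodes into sender, receiver and OK lists from polling information."""
    lists = {"sender": [], "receiver": [], "ok": []}
    for node, load in polled_loads.items():
        lists[classify(load, lt, ut)].append(node)
    return lists
```

A sender would then poll the first node on `lists["receiver"]`, and a receiver the first node on `lists["sender"]`, updating the lists as poll replies reveal that a node's state has changed.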
