CSNB334 Advanced Operating Systems 4. Process & Resource Management

CSNB334 Advanced Operating Systems4. Process & ResourceManagement Semester 2, 2007/2008 Lecturer: Asma Shakil

What is a process? • A process is a program (object code) in execution. • Threads of execution, or threads, are the objects of activity within the process. • Each thread includes a unique program counter, process stack, and set of processor registers. • The kernel schedules individual threads, not processes. • Processes provide two virtualizations: • a virtualized processor • The virtual processor gives the process the illusion that it alone monopolizes the system, despite possibly sharing the processor among dozens of other processes. • virtual memory. • Virtual memory lets the process allocate and manage memory as if it alone owned all the memory in the system.

Levels of Abstraction for An Executing Program

Processes and Threads Process abstraction combines two concepts Concurrency Each process is a sequential execution stream of instructions Protection Each process defines an address space Address space identifies all addresses that can be touched by the program Threads Key idea: separate the concepts of concurrency from protection A thread represents a sequential execution stream of instructions A process defines the address space that may be shared by multiple threads Threads can execute on different cores on a multicore CPU (parallelism for performance) and can communicate with other threads by updating memory

Threads vs. Processes Threads A thread has no data segment or heap A thread cannot live on its own, it must live within a process There can be more than one thread in a process, the first thread calls main & has the process’s stack If a thread dies, its stack is reclaimed Inter-thread communication via memory. Processes • A process has code/data/ heap & other segments • There must be at least one thread in a process • Threads within a process share code/data/heap, share I/O, but each has its own stack & registers • If a process dies, its resources are reclaimed & all threads die • Inter-process communication via OS and data copying.

Single and Multithreaded Processes

Implementing Threads Processes define an address space; threads share the address space Process Control Block (PCB) contains process-specific information Owner, PID, heap pointer, priority, active thread, and pointers to thread information Thread Control Block (TCB) contains thread-specific information Stack pointer, PC, thread state (running, …), register values, a pointer to PCB, … TCB for Thread1 PC SP State Registers … TCB for Thread2 Stack – thread2 PC SP State Registers … Stack – thread1 Process’s address space mapped segments DLL’s Heap Initialized data Code

Examples of Threads • A web browser • One thread displays images • One thread retrieves data from network • A word processor • One thread displays graphics • One thread reads keystrokes • One thread performs spell checking in the background • A web server • One thread accepts requests • When a request comes in, separate thread is created to service • Many threads to support thousands of client requests

Process Life Cycle A process begins its life when it is created. In Linux, this occurs by means of the fork() system call, which creates a new process by duplicating an existing one int main() { if (pid = fork() == 0) /* Child */ execve(“foo”); exit(); else /* Parent */ waitpid(pid); }

Process Creation : fork(), vfork() and clone() system calls • All are declared in <unistd.h>

Program 2 Program 1 Program 3 fork() vfork() clone() C Library System Call 3 fork() Sys_fork() 120 Sys_clone() clone() 289 Sys_vfork() vfork() do_fork()

Fork() Function • Generates a clone of the parent process • The child is almost identical • It executes the same program • In a copy of its virtual memory space

System Call: fork() Create a Child Process • #include <sys/types.h> • #include <unistd.h> • pid_t fork(); • Semantics • The child process is identical to its parent, except: • Its PID and PPID (parent process ID) • Zero resource utilization (initially, relying on copy-on-write) • No pending signals, file locks, inter-process communication objects • On success, returns the child PID • Simple way to detect “from the inside” which of the child or parent runs • Also use getpid(), getppid() • Returns −1 on error

Copy-on-Write • Traditionally, upon fork() all resources owned by the parent were duplicated and the copy was given to the child. • Copy-on-write (or COW) is a technique to delay or altogether prevent copying of the data. • Rather than duplicate the process address space, the parent and the child can share a single copy. • The data, however, is marked in such a way that if it is written to, a duplicate is made and each process receives a unique copy. • Consequently, the duplication of resources occurs only when they are written; until then, they are shared read-only. • In the case that the pages are never written—for example, if exec() is called immediately after fork()—they never need to be copied.

Terminating a Process #include <stdlib.h> void exit (int status) • This function instructs the kernel to terminate the process. • exit() never returns – therefore, no statement should follow the exit() call. • The process status is used to denote the process’ exit status. • Can be used by other programs or at the shell. • It invokes the system call _exit() to tell the kernel to terminate the process. #include <unistd.h> void _exit(int status); • When a process terminates, the kernel reclaims • Allocated memory • Open files • System V semaphores • Notifies the parent of it’s child’s demise

Waiting for terminated child processes • When a child dies before its parent, the kernel puts it in a special state – zombie. • A process in this state waits for its parent to inquire about its status. • Only after this inquiry does the terminated child exit and cease to exist even as a zombie. #include <sys/types.h> #include <sys/wait.h> pid_t wait (int *status) • A call to wait() returns • pid of a terminated child, or • -1 on error • If no child has terminated, the call blocks until a child terminates.

A simple example main() { int status; int pid,child1,child2; if((child1=fork()==0)) printf("child 1 is running\n"); else { if ((child2=fork())==0) printf("child 2 is running\n"); else { pid=wait(&status); if(pid==child1) printf("child1 has terminated first\n"); else printf("child2 has terminated first\n"); printf("goodbye\n"); } } }

Execve() System call • Load and execute a different program. • The execve() system call has the following form: • execve(char *path, char *argv[], char*envp[]) • path : pathname of the program to be executed (Relative/Absolute) • argv : list of parameter strings with the command. • envp : list of environment variable strings and values. • Causes the kernel to do the following • Find the new executable file in file system • Check for access permissions • Load the file for execution into the address space currently being used by the calling process. • Set the argv array and environment variables for the new program. • Start the process executing at the new program’s entry point.

System Call: execve() and variants Execute a Program: Variants #include <unistd.h> int execl(const char *path, const char *arg, ...); int execv(const char *path, char *const argv[]); int execlp(const char *file, const char *arg, ...); int execvp(const char *file, char *const argv[]); int execle(const char *path, const char *arg, ..., char *const envp[]); int execve(const char *filename, char *const argv[], char *const envp[]); Arguments execl() operates on NULL-terminated argument list Warning: arg, the first argument after the pathname/filename corresponds to argv[0] (the program name) execv() operates on argument array execlp() and execvp() are $PATH-relative variants (if file does not contain a ‘/’ character) execle() also provides an environment

Some examples int ret; ret = execl(“/bin/vi”, “vi”, “/home/STUDENT/file.txt”, NULL); if (ret == -1) perror(“execl”); const char *args[ ] = {“vi”, “/home/STUDENT/file.txt”, NULL}; int ret; ret = execvp (“vi”, args); if (ret == -1) perror(“execvp”);

Process Descriptor • A process contains a number of characteristics that can describe the process as being unique from other processes. • These charcs are kept in a single data type – process desciptor. • The kernel stores the process descriptors in a circular doubly linked list called the task list. • Each element in the task list is a process descriptor of the type • struct task_struct, • which is defined in <linux/sched.h>.

Process State • TASK_RUNNING—The process is runnable; it is either currently running or on a runqueue waiting to run. • TASK_INTERRUPTIBLE—The process is sleeping (that is, it is blocked), waiting for some condition to exist. When this condition exists, the kernel sets the process's state to TASK_RUNNING. The process also awakes prematurely and becomes runnable if it receives a signal. • TASK_UNINTERRUPTIBLE—Thisstate is identical to TASK_INTERRUPTIBLE except that it does not wake up and become runnable if it receives a signal. This is used in situations where the process must wait without interruption or when the event is expected to occur quite quickly. • TASK_ZOMBIE—The task has terminated, but its parent has not yet issued a wait() system call. The task's process descriptor must remain in case the parent wants to access it. If the parent calls wait(), the process descriptor is deallocated. • TASK_STOPPED—Process execution has stopped; the task is not running nor is it eligible to run. This occurs if the task receives the SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU signal or if it receives any signal while it is being debugged.

Creating a New Task • When a parent process invokes the fork() system call, it creates a new process by performing the following steps: • Allocate new process descriptor to hold info needed to manage task (struct task_struct) and link it to the task list • Creates new kernel space stack • Duplicate info from parent’s process descriptor into the child’s. • Modify the fields in child’s process descriptor • New PID • Link to parents task and siblings • Initialize task specific timers (e.g. creation time) • Create file table and file descriptor for each open file in the parent’s file table. • Create new user data segment for child and copy from parent • Copy info regarding signals and handlers • Copy virtual memory tables • Change child state to TASK_RUNNING

The Linux Implementation of Threads • Linux has a unique implementation of threads. To the Linux kernel, there is no conceptof a thread. • Linux implements all threads as standard processes. • The Linux kernel does not provide any special scheduling semantics or data structures to represent threads. • Instead, a thread is merely a process that shares certain resources with other processes. • Each thread has a unique task_struct and appears to the kernel as a normal process (which just happens to share resources, such as an address space or open files, with other processes).

The Scheduler • The scheduler handles the running and stopping of processes. • Before the 2.6 kernel, the scheduler had significant limitations when many tasks were active. • This was due to the scheduler being implemented using an algorithm with O(n) complexity. • At very high loads, the processor can be consumed with scheduling and devote little time to the tasks themselves. • The pre-2.6 scheduler also used a single runqueue for all processors in a symmetric multiprocessing system (SMP) - good for load balancing but bad for memory caches. • The Linux 2.4.x kernel is not preemptable, and thus the support for RT applications is weak. Interrupts and exceptions result in short periods of kernel mode execution during which runnable RT tasks cannot resume execution immediately. • Linux 2.6.x is a preemptable kernel, and thus RT application support is considerably better.

The Linux 2.6 scheduler • Major scheduling structures • Each CPU has a runqueue is serviced in FIFO order. • The core data structures within a runqueue are two priority-ordered arrays. • One contains active tasks • Other contains expired tasks. • Each task has a time slice that determines how much time it's permitted to execute after which it is added to the expired array to wait for more CPU time. • When the active array is empty, the scheduler swaps the two arrays by exchanging their pointers and begins executing tasks on the new active array.

O(1) time in the Linux 2.6 scheduler • Possible because of the priority array • The Linux 2.6 scheduler always schedules the highest priority task on a system, and if multiple tasks exist at the same priority level, they are scheduled roundrobin with each other. • After running, tasks are put at the bottom of their priority level’s list. • The job of the scheduler is simple: choose the task on the highest priority list to execute. • To make this process more efficient, a bitmap is used to define when tasks are on a given priority list. • A find-first-bit-set instruction is used to find the highest priority bit set in one of five 32-bit words (for the 140 priorities). • The time it takes to find a task to execute depends not on the number of active tasks but instead on the number of priorities • Also, using two priority arrays in unison (the active and expired priority arrays) makes transitions between timeslice epochs a constant-time operation. • An epoch is the time between when all runnable tasks begin with a fresh timeslice and when all runnable tasks have used up their timeslices.

Static task prioritization • All tasks have a static priority, often called a nice value. • On Linux, nice values range from -20 to 19, with higher values being lower priority (tasks with high nice values are nicer to other tasks). • By default, tasks start with a static priority of 0, but that priority can be changed via the nice() system call. • Apart from its initial value and modifications via the nice() system call, the scheduler never changes a task’s static priority. • Static priority is the mechanism through which users can modify task’s priority, and the scheduler will respect the user’s input • A task’s static priority is stored in its static_prio variable. Where p is a task_struct, p->static_prio is its static priority.

Nice() #include <unistd.h> int nice (int inc); A successful call to nice increments a process’ nice value by inc and returns the newly updated value. int ret; errno = 0; ret = nice(10); if (ret == -1 && errno !=0) perror(nice); else printf(“nice value is now %d\n”, ret);

Dynamic task prioritization • The Linux 2.6 scheduler can dynamically alter a task's priority by adding or subtracting from a its static priority • It does so by penalizing CPU-bound tasks and rewarding tasks that are I/O bound. . • The priority of I/O bound tasks is decreased (a reward) by a maximum of five priority levels. CPU-bound tasks are punished by having their priority increased by up to five levels. • Tasks are determined to be I/O-bound or CPU-bound based on an interactivity heuristic. • It's important to note that priority adjustments are performed only on user tasks, not on real-time tasks.

Scheduler source code • In the file /usr/src/linux/kernel/sched.c including the runqueue structure • The statistics are available at /proc/schedstat and present a variety of data for each CPU in the system, including load-balancing and process-migration statistics. • schedule • The main scheduler function. Schedules the highest priority task for execution. • load_balance • Checks the CPU to see whether an imbalance exists, and attempts to move tasks if not balanced. • effective_prio • Returns the effective priority of a task (based on the static priority, but includes any rewards or penalties). • recalc_task_prio Determines a task's bonus or penalty based on its idle time. • source_load Calculates the load of the source CPU (from which a task could be migrated) • .target_load Calculates the load of a target CPU (where a task has the potential to be migrated) • migration_thread High-priority system thread that migrates tasks between CPUs.

CSNB334 Advanced Operating Systems 4. Process & Resource Management