Advanced Operating Systems

Advanced Operating Systems Lecture 12: Process migration University of Tehran Dept. of EE and Computer Engineering By: Dr. Nasser Yazdani Distributed Operating Systems

Covered topics
• Process migration: why, and how
• References
  • Chapter 3 of the textbook
  • Fred Douglis and John Ousterhout, “Transparent Process Migration: Design Alternatives and the Sprite Implementation”

Outline
• Motivation for migration
• How does migration occur?
• Resource migration
• Agent-based systems
• Details of process migration
• Problems

Motivation
• Key reasons: performance and flexibility
• Process migration (strong mobility)
  • Improved system-wide performance: better utilization of system-wide resources
  • Idle workstations
• Code migration (weak mobility)
  • Shipment of server code to client, e.g. filling in forms (reduces communication, no need to pre-link stubs with client)
  • Ship parts of client application to server instead of data from server to client (e.g., databases)
  • Improve parallelism, e.g. agent-based web searches

Motivation
• Flexibility
  • Dynamic configuration of distributed system
  • Clients don’t need preinstalled software: download on demand

Migration models
• Process = code segment + resource segment + execution segment
• Weak versus strong mobility
  • Weak => transferred code (program) starts from its initial state (Java applets). Simple
  • Strong => the execution segment is moved as well
• Sender-initiated versus receiver-initiated
  • Sender-initiated (code is with sender)
    • Client sending a query to database server
    • Client should be pre-registered
  • Receiver-initiated
    • Java applets
    • Receiver can be anonymous

Who executes migrated entity?
• Code migration:
  • Execute in a separate process
  • [Applets] Execute in target process
• Process migration
  • Remote cloning
  • Migrate the process

Models for Code Migration
• Alternatives for code migration.

Do Resources Migrate?
• Depends on process-to-resource binding
  • By identifier: specific web site, FTP server
  • By value: Java libraries
  • By type: printers, local devices
• Depends on type of “attachment” to the machine
  • Unattached to any node: data files
  • Fastened resources (moved only at high cost): databases, web sites
  • Fixed resources: local devices, communication end points

Resource Migration Actions
• Actions to be taken with respect to the references to local resources when migrating code to another machine.
  • GR: establish global system-wide reference
  • MV: move the resource
  • CP: copy the resource
  • RB: rebind process to locally available resource
• [Table: action to take, indexed by resource-to-machine binding and process-to-resource binding]
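The choice among GR, MV, CP and RB can be sketched as a lookup keyed by the two kinds of binding. The table entries below are an assumption: they are the preferred actions as recalled from the textbook's table, not values recoverable from this slide, and should be checked against Chapter 3 before being relied on. The key names and the function `migration_action` are hypothetical.

```python
# Preferred action for each (process-to-resource binding, resource-to-machine binding).
# ASSUMPTION: entries recalled from the textbook table; verify before relying on them.
PREFERRED_ACTION = {
    ("identifier", "unattached"): "MV",
    ("identifier", "fastened"): "GR",
    ("identifier", "fixed"): "GR",
    ("value", "unattached"): "CP",
    ("value", "fastened"): "GR",
    ("value", "fixed"): "GR",
    ("type", "unattached"): "RB",
    ("type", "fastened"): "RB",
    ("type", "fixed"): "RB",
}


def migration_action(binding, attachment):
    """Return the preferred action code for one resource reference."""
    return PREFERRED_ACTION[(binding, attachment)]
```

For example, a reference by type to a local printer (a fixed resource) is simply rebound to a printer at the destination, while a reference by identifier to a fixed communication end point needs a global reference.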

Migration in Heterogeneous Systems
• Systems can be heterogeneous (different architecture, OS)
• Support only weak mobility: recompile code, no run-time information
• Strong mobility: recompile code segment, transfer execution segment [migration stack]
• Virtual machines: interpret source (scripts) or intermediate code [Java]
• [Figure: migration only on a subroutine or method call; the migration stack is transferred]

Cost of migration
• Multiprocessor (nondistributed)
  • Loss of the lines associated with the process in the processor's instruction and data caches
• Distributed environment
  • Moving a process's virtual memory
  • Forwarding a process's IPC (local and network) messages, informing senders of the process's new contact information
  • Moving file information: the open file table, the file descriptor table, the file offset, dirty blocks in the buffer cache, etc.
  • Moving the process's user-level state: registers, stack, etc.
  • Moving the process's kernel-level state: pwd, pid, signal masks, etc.

Cost of migration (partial migration)
• Migrating the whole process is too expensive.
• Move only certain aspects of a process
• The remaining portions of the process create residual dependencies: the migrated process still relies on the original host to provide the services that were not migrated.

Migrating Virtual memory
• Freeze-and-copy migration: suspend (freeze) the process on the original host, then copy all of its pages of memory to the new host. Once the copy is done, the process can be resumed on the new host.
  • Simple, clean and easy to implement.
  • Does not create a residual dependency
  • Many pages which are never used may be copied and sent over the network. If the process is migrated several times, this cost adds up
  • Process does nothing while copying?

Migrating Virtual memory
• Precopying: the process keeps running on the original host while its pages are being copied.
  • It is clean: it does not create any residual dependencies.
  • May copy pages that are never used.
  • Pages dirtied during the copy must be transferred again. Can be more expensive
• Lazy migration: like demand paging.
  • It creates residual dependencies
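The trade-off in precopying can be illustrated with a small simulation. This is a minimal sketch, not any particular system's algorithm: the round limit, the stop threshold, and the function name `precopy_rounds` are assumptions made for illustration. Each round sends the pages still outstanding while the process keeps running; whatever it dirties in the meantime has to go again, and the final remainder is sent with the process frozen.

```python
def precopy_rounds(num_pages, dirtied_per_round, max_rounds=5, stop_threshold=2):
    """Simulate iterative precopy of a process's pages.

    dirtied_per_round[i] is the set of page numbers the still-running process
    dirties while round i is being copied (hypothetical input).
    Returns (pages sent while running, pages sent while frozen).
    """
    to_send = set(range(num_pages))      # the first round copies every page
    sent_while_running = 0
    for round_no in range(max_rounds):
        if len(to_send) <= stop_threshold:
            break                        # small enough: freeze and finish
        sent_while_running += len(to_send)
        if round_no < len(dirtied_per_round):
            to_send = set(dirtied_per_round[round_no])
        else:
            to_send = set()              # nothing was dirtied in this round
    return sent_while_running, len(to_send)
```

With 8 pages, of which 3 are dirtied in the first round and 1 in the second, 11 pages cross the network while the process runs and only 1 while it is frozen. Freeze-and-copy would send 8 pages in total, all of them with the process stopped.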

Migrating Virtual memory
• Distributed file system: treat the process's memory as a memory-mapped file. The memory can then be migrated simply by flushing the dirty blocks and mapping the file from a different host.
  • Isn't as clean as it may seem

Migrating Communication Channels
• If a process migrates, its communications must be able to continue.
• Inform "interested" processes of the new location of a migrating process.
  • Unclean: unnecessary messages, and how do we know which other clients are communicating with it?
• Link redirection or forwarding at the original host of the migrating process.
  • Creates a residual dependency and can increase the latency of messages sent to the migrated process, but makes the migration itself cheaper
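The forwarding option can be sketched with a toy model in which each host keeps a forwarding pointer for every process that has left it. The `Host` class and `migrate` function are hypothetical names for illustration; real systems forward at the kernel or network level.

```python
class Host:
    """Toy host holding mailboxes for local processes and forwarding pointers."""

    def __init__(self, name):
        self.name = name
        self.local = {}      # pid -> list of messages delivered here
        self.forward = {}    # pid -> Host the process migrated to

    def deliver(self, pid, msg, hops=0):
        """Deliver msg to pid, following forwarding pointers; return (host, hops)."""
        if pid in self.local:
            self.local[pid].append(msg)
            return self.name, hops
        if pid in self.forward:
            return self.forward[pid].deliver(pid, msg, hops + 1)
        raise KeyError(f"process {pid} is unknown on host {self.name}")


def migrate(pid, src, dst):
    """Move pid's mailbox to dst and leave a forwarding pointer on src."""
    dst.local[pid] = src.local.pop(pid)
    dst.forward.pop(pid, None)   # drop a stale pointer if pid returns to dst
    src.forward[pid] = dst
```

After two migrations a message sent to the first host takes two extra hops, and both earlier hosts must stay up for it to arrive. That is the residual dependency and the added latency noted in the list above.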

Process with open files
• Show up at the new host and re-open the files. But, in truth, there is a great deal of state associated with an open file. Consider the system-wide open file table, the cached inodes, dirty blocks that may live only in the local buffer cache, etc.
• fork()'d processes share the same file offset.
• It is often much easier to leave the process dependent on the old host for file service.

Migrate kernel state
• It is often easier to leave a migrating process dependent on a prior (or perhaps first) host for these services.
• Checkpointing and recovery: a process's state is saved to a file (much like a persistent object), and a new process is then created (restored) from this checkpoint file. The checkpoint file contains all of the "goods", including the kernel material.
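The checkpoint-and-restore idea can be shown at user level with a computation whose entire state lives in one dictionary. This is only a sketch under that assumption: a real checkpoint also has to capture registers, memory and kernel state, which plain Python cannot do. All function names here are hypothetical.

```python
import json
import os
import tempfile


def run(state, steps):
    """Advance a resumable computation: add consecutive integers to a total."""
    for _ in range(steps):
        state["total"] += state["next"]
        state["next"] += 1
    return state


def checkpoint(state, path):
    """Save the complete state to a checkpoint file."""
    with open(path, "w") as f:
        json.dump(state, f)


def restore(path):
    """Create a fresh state object from a checkpoint file."""
    with open(path) as f:
        return json.load(f)


def migrate_via_checkpoint(state):
    """Checkpoint on the 'old host' and restore on the 'new host' (same machine here)."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "proc.ckpt")
        checkpoint(state, path)
        return restore(path)
```

Running three steps, migrating, and running two more gives the same result as five uninterrupted steps, which is the property a checkpoint must preserve.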

Migrate? Or not migrate?
• Several things to consider
  • If the home host suffers from a bursty load, it may not make sense to migrate a process: the home host will be free again soon.
  • Processes with significant virtual memory or IPC usage or many open files are poor choices for migration.
  • Historical consideration: long-running processes are better candidates than recent arrivals, since they are likely to continue to run for a long time. Short-lived processes are likely to complete shortly after migration, leaving little useful work over which to amortize the cost of migration.
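These considerations can be folded into a rough cost-benefit test. Everything numeric below is an assumption made for illustration: the cost model, the default bandwidth and per-file cost, the benefit factor, and in particular the heuristic that a process's expected remaining run time is about equal to its age so far.

```python
def migration_cost_s(mem_mb, open_files, bandwidth_mb_s=100.0,
                     per_file_s=0.01, fixed_s=0.05):
    """Rough migration cost in seconds (hypothetical linear model)."""
    return fixed_s + mem_mb / bandwidth_mb_s + open_files * per_file_s


def should_migrate(age_s, mem_mb, open_files, benefit_factor=2.0):
    """Migrate only if the expected remaining run time clearly exceeds the cost."""
    expected_remaining_s = age_s   # ASSUMPTION: old processes tend to keep running
    cost = migration_cost_s(mem_mb, open_files)
    return expected_remaining_s > benefit_factor * cost
```

Under these assumed numbers, a process that has run for a minute with 100 MB of memory is worth moving, while one that started half a second ago is not.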

Design Issues
• Measure of load
  • Queue lengths at CPU, CPU utilization
• Types of policies
  • Static: decisions hardwired into system
  • Dynamic: uses load information
  • Adaptive: policy varies according to load
• Preemptive versus non-preemptive
• Centralized versus decentralized
• Stability (λ = arrival rate, μ = service rate): λ > μ => instability; λ1 + λ2 < μ1 + μ2 => load balance
  • Job floats around and load oscillates
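The stability condition in the list above, read as arrival rate versus service rate, can be checked numerically. This is a minimal sketch; the function names and the example rates are made up for illustration.

```python
def node_is_stable(arrival_rate, service_rate):
    """A single node keeps up only if work arrives more slowly than it is served."""
    return arrival_rate < service_rate


def system_can_balance(arrival_rates, service_rates):
    """Load sharing can help only if total arrivals stay below total capacity."""
    return sum(arrival_rates) < sum(service_rates)
```

With arrival rates of 12 and 3 jobs per second and a service rate of 10 at each node, the first node is unstable on its own, but the pair has enough combined capacity (15 < 20) for transfers to restore stability. If the second node also received 9 jobs per second, no transfer policy could help, and jobs would only float between the nodes.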

Components
• Transfer policy: when to transfer a process?
  • Threshold-based policies are common and easy
• Selection policy: which process to transfer?
  • Prefer new processes
  • Transfer cost should be small compared to execution cost
    • Select processes with long execution times
• Location policy: where to transfer the process?
  • Polling, random, nearest neighbor
• Information policy: when and from where to collect load information?
  • Demand-driven [only if sender/receiver], time-driven [periodic], state-change-driven [send update if load changes]

Sender-initiated Policy
• Transfer policy
• Selection policy: newly arrived process
• Location policy: three variations
  • Random: may generate lots of transfers => limit max transfers
  • Threshold: probe n nodes sequentially
    • Transfer to first node below threshold; if none, keep job
  • Shortest: poll Np nodes in parallel
    • Choose least loaded node below T
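The threshold and shortest location policies can be sketched as two small functions over a table of node loads. This is an illustration under simplifying assumptions: loads are read from a dictionary instead of being obtained by probe messages, nodes to probe are chosen at random, and "below threshold" is taken to mean strictly less than T. The function and parameter names are hypothetical.

```python
import random


def threshold_policy(loads, self_id, T, probe_limit, rng=random):
    """Probe up to probe_limit other nodes one at a time; take the first below T."""
    candidates = [node for node in loads if node != self_id]
    rng.shuffle(candidates)
    for node in candidates[:probe_limit]:
        if loads[node] < T:
            return node
    return None          # no suitable node found: keep the job locally


def shortest_policy(loads, self_id, T, poll_count, rng=random):
    """Poll poll_count other nodes at once; take the least loaded one below T."""
    candidates = [node for node in loads if node != self_id]
    polled = rng.sample(candidates, min(poll_count, len(candidates)))
    below = [node for node in polled if loads[node] < T]
    return min(below, key=lambda node: loads[node]) if below else None
```

The threshold variant stops at the first acceptable node and so sends fewer probes; the shortest variant pays for every poll but picks the best of the nodes it asked.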

Receiver-initiated Policy
• Transfer policy: if a departing process causes load < T, find a process from elsewhere
• Selection policy: newly arrived or partially executed process
• Location policy:
  • Threshold: probe up to Np other nodes sequentially
    • Transfer from first one above threshold; if none, do nothing
  • Shortest: poll n nodes in parallel, choose node with heaviest load above T

Symmetric Policies
• Nodes act as both senders and receivers: combine previous two policies without change
  • Use average load as threshold
• Improved symmetric policy: exploit polling information
  • Two thresholds: LT, UT, LT <= UT
  • Maintain sender, receiver and OK nodes using polling info
  • Sender: poll first node on receiver list …
  • Receiver: poll first node on sender list …
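The bookkeeping behind the improved symmetric policy can be sketched as a classification of each polled node against the two thresholds. Whether a load exactly equal to LT or UT counts as OK is an assumption here, as are the function names.

```python
def classify(load, LT, UT):
    """Classify a node as 'sender', 'receiver' or 'ok' from its load."""
    if LT > UT:
        raise ValueError("LT must not exceed UT")
    if load > UT:
        return "sender"      # overloaded: wants to give work away
    if load < LT:
        return "receiver"    # underloaded: willing to take work
    return "ok"


def update_lists(polled_loads, LT, UT):
    """Rebuild the sender, receiver and OK lists from the latest polling results."""
    lists = {"sender": [], "receiver": [], "ok": []}
    for node, load in polled_loads.items():
        lists[classify(load, LT, UT)].append(node)
    return lists
```

A sender would then poll the head of the `receiver` list first, and a receiver the head of the `sender` list, instead of probing nodes blindly.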

Case Study: V-System (Stanford)
• State-change driven information policy
  • Significant change in CPU/memory utilization is broadcast to all other nodes
• M least loaded nodes are receivers, others are senders
• Sender-initiated with new job selection policy
• Location policy: probe random receiver; if still receiver, transfer job, else try another
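The receiver set and the probe-and-retry location policy can be sketched as follows. This is an illustration, not V-System code: the two load tables stand in for the broadcast (possibly stale) information and for what a probe would actually find, and ties are broken by node name as an arbitrary assumption.

```python
import random


def receivers(loads, M):
    """The M least loaded nodes (ties broken by node name) act as receivers."""
    return set(sorted(loads, key=lambda node: (loads[node], node))[:M])


def pick_target(advertised_loads, current_loads, M, rng=random):
    """Probe advertised receivers in random order; accept one that still is a receiver."""
    candidates = list(receivers(advertised_loads, M))
    rng.shuffle(candidates)
    still_receivers = receivers(current_loads, M)
    for node in candidates:
        if node in still_receivers:
            return node      # the probe confirms it: transfer the job here
    return None              # every advertised receiver has become busy
```

The retry step matters because the broadcast information can be out of date by the time a sender acts on it.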

Sprite (Berkeley)
• Workstation environment => owner is king!
• Centralized information policy: coordinator keeps info
  • State-change driven information policy
  • Receiver: workstation with no keyboard/mouse activity for 30 seconds and number of active processes < number of processors
• Selection policy: manually done by user => workstation becomes sender
• Location policy: sender queries coordinator
• Workstation with a foreign process becomes sender if its user becomes active: selection policy => home workstation
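The receiver test and the eviction rule can be written down directly. This is a minimal sketch of the rules as stated on the slide; the function names and the representation of foreign processes as a pid-to-home mapping are hypothetical.

```python
IDLE_SECONDS = 30   # idle period required before a workstation may receive work


def is_receiver(seconds_since_input, active_processes, processors):
    """A workstation is a receiver if its owner is idle and a processor is free."""
    return seconds_since_input >= IDLE_SECONDS and active_processes < processors


def evictions_on_user_input(foreign_processes):
    """When the owner returns, every foreign process is sent back to its home.

    foreign_processes maps pid -> home workstation name.
    Returns the list of (pid, destination) transfers to perform.
    """
    return sorted(foreign_processes.items())
```

Sending evicted processes home, and not to another idle machine, keeps the rule simple and guarantees the owner gets the workstation back promptly.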

Sprite (contd)
• Sprite process migration
  • Facilitated by the Sprite file system
  • State transfer
    • Swap everything out
    • Send page tables and file descriptors to receiver
    • Demand page process in
  • Only dependencies are communication-related
    • Redirect communication from home workstation to receiver
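The state-transfer steps can be mimicked with an in-memory stand-in for the shared file system. This is a toy sketch of the idea, not Sprite's implementation: `FileServer`, `swap_out` and `ResumedProcess` are hypothetical names, and pages are plain dictionary entries.

```python
class FileServer:
    """Stands in for the shared file system that backs swapped-out pages."""

    def __init__(self):
        self.pages = {}


def swap_out(memory, server):
    """Flush every page to the server; return the page table (page numbers only)."""
    server.pages.update(memory)
    return sorted(memory)


class ResumedProcess:
    """The process on the receiving host; pages arrive only when touched."""

    def __init__(self, page_table, server):
        self.page_table = page_table
        self.server = server
        self.resident = {}
        self.faults = 0

    def read(self, page):
        if page not in self.page_table:
            raise KeyError(page)
        if page not in self.resident:
            self.faults += 1                       # page fault: fetch on demand
            self.resident[page] = self.server.pages[page]
        return self.resident[page]
```

Only the page table travels with the process; a page that is never touched on the new host is never fetched, and the old host is not involved after the swap-out because the pages live on the file server.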

Overview of Code Migration in D'Agents (1)
• A simple example of a Tcl agent in D'Agents submitting a script to a remote machine (adapted from [gray.r95])

    proc factorial n {
        if {$n <= 1} { return 1 }                  ;# fac(1) = 1
        expr {$n * [factorial [expr {$n - 1}]]}    ;# fac(n) = n * fac(n - 1)
    }
    set number …     ;# tells which factorial to compute
    set machine …    ;# identify the target machine
    agent_submit $machine -procs factorial -vars number -script {factorial $number}
    agent_receive …  ;# receive the results (left unspecified for simplicity)

Overview of Code Migration in D'Agents (2)
• An example of a Tcl agent in D'Agents migrating to different machines where it executes the UNIX who command (adapted from [gray.r95])

    proc all_users machines {
        set list ""                  ;# Create an initially empty list
        foreach m $machines {        ;# Consider all hosts in the set of given machines
            agent_jump $m            ;# Jump to each host
            set users [exec who]     ;# Execute the who command
            append list $users       ;# Append the results to the list
        }
        return $list                 ;# Return the complete list when done
    }
    set machines …        ;# Initialize the set of machines to jump to
    set this_machine …    ;# Set to the host that starts the agent

    # Create a migrating agent by submitting the script to this machine, from where
    # it will jump to all the others in $machines.
    agent_submit $this_machine -procs all_users -vars machines -script { all_users $machines }
    agent_receive …       ;# receive the results (left unspecified for simplicity)

Agents
• Software agents
  • Autonomous process capable of reacting to, and initiating changes in, its environment, possibly in collaboration
  • More than a “process”: can act on its own
• Mobile agent
  • Capability to move between machines
  • Needs support for strong mobility
  • Example: D’Agents (aka Agent Tcl)
  • Support for heterogeneous systems, uses interpreted languages

Implementation Issues (1)
• The architecture of the D'Agents system.
  • Lowest layer: communication
  • Server: agent management, communication among agents, authentication
  • RTS: start and end agents, etc.

Implementation Issues (2)
• The parts comprising the state of an agent in D'Agents.

Software Agents in Distributed Systems
• Some important properties by which different types of agents can be distinguished.

Agent Technology
• The general model of an agent platform (adapted from [fipa98-mgt]).

Agent Communication Languages (1)
• Examples of different message types in the FIPA ACL [fipa98-acl], giving the purpose of a message, along with the description of the actual message content.

Agent Communication Languages (2)
• A simple example of a FIPA ACL message sent between two agents using Prolog to express genealogy information.

Next Lecture
• Files in distributed systems.
• References
  • Chapter 10 of the book
