Resource Management Policy and Mechanism

Jeff Chase, Duke University

The kernel • Entry and exit paths: syscall trap/return, fault/return, interrupt/return. • System call layer: files, processes, IPC, thread syscalls. • Fault entry: VM page faults, signals, etc. • Thread/CPU/core management: sleep and ready queues. • Memory management: block/page cache. • Interrupt sources: I/O completions, timer ticks. [Figure: layered view of the kernel with the sleep queue and ready queue; a second build of the slide adds two “policy” labels to the diagram.]

Separation of policy and mechanism • Every OS platform has mechanisms that enable it to mediate access to machine resources. • Gain control of core by timer interrupts • Fault on access to non-resident virtual memory • I/O through system call traps • Internal code and data structures to track resource usage and allocate resources • The mechanisms enable resource management policy. • But the mechanisms do not and must/should not determine the policy. • We might want to change the policy!
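
To make the separation concrete, here is a minimal Java sketch, not taken from the slides: `Dispatcher` stands in for the mechanism (it tracks ready threads and hands one out), while `SchedulingPolicy` is a pluggable policy that only chooses. All names here are hypothetical, and a real kernel's ready queue is far more involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Policy (hypothetical interface): chooses which ready thread runs next. */
interface SchedulingPolicy {
    /** Returns the index of the chosen thread, given the ready threads' priorities. */
    int pick(List<Integer> readyPriorities);
}

/** Mechanism: tracks ready threads and dispatches whichever one the policy picks. */
class Dispatcher {
    private final List<String> names = new ArrayList<>();
    private final List<Integer> priorities = new ArrayList<>();
    private SchedulingPolicy policy;

    Dispatcher(SchedulingPolicy policy) { this.policy = policy; }

    /** The policy can be swapped without touching the mechanism. */
    void setPolicy(SchedulingPolicy policy) { this.policy = policy; }

    void makeReady(String name, int priority) {
        names.add(name);
        priorities.add(priority);
    }

    /** Removes and returns the next thread to run, or null if none is ready. */
    String dispatch() {
        if (names.isEmpty()) return null;
        int i = policy.pick(Collections.unmodifiableList(priorities));
        priorities.remove(i);      // remove(int index)
        return names.remove(i);
    }
}

/** Two interchangeable example policies. */
class Policies {
    static final SchedulingPolicy FIFO = ready -> 0;
    static final SchedulingPolicy HIGHEST_PRIORITY = ready -> {
        int best = 0;
        for (int i = 1; i < ready.size(); i++)
            if (ready.get(i) > ready.get(best)) best = i;
        return best;
    };
}
```

Calling `setPolicy` changes which thread `dispatch()` returns without any change to how threads are queued or removed, which is the point of keeping the two apart.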

Goals of policy • Share resources fairly. • Use machine resources efficiently. • Be responsive to user interaction. But what do these things mean? How do we know if a policy is good or not? What are the metrics? What do we assume about the workload?

Memory Allocation How should an OS allocate its memory resources among contending demands? • Virtual address spaces: fork, exec, sbrk, page fault. • The kernel controls how many machine memory frames back the pages of each virtual address space (VAS). • The kernel can take memory away from a VAS at any time. • The kernel always gets control if a VAS (or rather a thread running within a VAS) asks for more. • The kernel controls how much machine memory to use as a cache for data blocks whose home is on slow storage. • Policy choices: which pages or blocks to keep in memory? And which ones to evict from memory to make room for others?

What is a Virtual Address Space? • Protection domain • A “sandbox” for threads that limits what memory they can access for read/write/execute. • Each thread is in exactly one sandbox, but many threads may play in the same sandbox. • Uniform name space • Threads access their code and data items without caring where they are in physical memory, or even if they are resident in memory at all. • A set of VP translations • A level of indirection from virtual pages to physical frames. • The OS kernel controls the translations in effect at any time.

Introduction to Virtual Addressing • Code addresses memory through virtual addresses. • The kernel controls the virtual-physical translations in effect (space). • The kernel and the machine collude to translate virtual addresses to physical addresses. • The machine does not allow a user process to access memory unless the kernel “says it’s OK”. • The specific mechanisms for implementing virtual address translation are machine-dependent. [Figure: virtual memory (big?) with text, data, BSS, user stack, args/env, and kernel regions, mapped through virtual-to-physical translations onto physical memory (small?).]

Virtual Memory as a Cache [Figure: virtual memory (big) holds the process segments (text, data, BSS, user stack, args/env, kernel); virtual-to-physical translations map them to page frames in physical memory (small); page fetch and pageout/eviction move pages between the page frames and backing storage, where the executable file holds the program sections (header, text, data, idata, wdata, symbol table, etc.).]

Virtual Address Translation • Example: typical 32-bit architecture with 4KB pages. The low 12 bits of a virtual address (bits 0-11) are the offset; the remaining high-order bits are the virtual page number (VPN). • Virtual address translation maps a VPN to a physical page frame number (PFN): the rest is easy. • The physical address is the PFN followed by the unchanged offset. • Deliver exception to OS if translation is not valid and accessible in requested mode.
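
The VPN/offset split for 4KB pages can be sketched in a few lines of Java. This is an illustration under stated assumptions, not how any particular MMU is built: the page table is a flat `int[]` indexed by VPN, `-1` marks an invalid entry, and the class and method names are made up for the example.

```java
/** Toy translation for 32-bit virtual addresses with 4KB pages. */
class AddressTranslation {
    static final int OFFSET_BITS = 12;               // 4KB = 2^12 bytes per page
    static final int PAGE_SIZE = 1 << OFFSET_BITS;
    static final int OFFSET_MASK = PAGE_SIZE - 1;

    /** High-order bits: the virtual page number (unsigned shift). */
    static int vpn(int virtualAddress) {
        return virtualAddress >>> OFFSET_BITS;
    }

    /** Low-order 12 bits: the byte offset within the page. */
    static int offset(int virtualAddress) {
        return virtualAddress & OFFSET_MASK;
    }

    /**
     * Assumed layout: pageTable[vpn] holds the PFN, or -1 if the translation
     * is invalid. An invalid translation raises an exception, standing in for
     * the fault the hardware would deliver to the OS. PFNs are assumed small
     * enough that the shifted result fits in an int.
     */
    static int translate(int[] pageTable, int virtualAddress) {
        int v = vpn(virtualAddress);
        if (v >= pageTable.length || pageTable[v] < 0) {
            throw new IllegalStateException("page fault at VPN " + v);
        }
        return (pageTable[v] << OFFSET_BITS) | offset(virtualAddress);
    }
}
```

For instance, with a table that maps VPN 3 to PFN 5, virtual address 0x3ABC has VPN 3 and offset 0xABC, so it translates to physical address 0x5ABC: only the page number changes.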

Cartoon View • Each process/VAS has its own page table. Virtual addresses are translated relative to the current page table. • In this example, each VPN j maps to PFN j, but in practice any physical frame may be used for any virtual page. • The maps are themselves stored in memory; a protected register holds a pointer to the current map. [Figure: a user virtual address (page #i, offset) indexes the process page table (map), whose entries are PFN 0, PFN 1, ..., PFN i; PFN i + offset selects a location in the physical memory page frames.]

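Under the Hood [Flowchart: start here → MMU probes the TLB → on a TLB miss, probe the page table and load the TLB → access valid? If yes, access physical memory; if no, raise an exception to the OS → page fault? If not, signal the process; if so, allocate a frame, then fetch from disk if the page is on disk, otherwise zero-fill → load TLB.]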

Page/block maps Idea: use a level of indirection through a map to assemble a storage object from “scraps” of storage in different locations. The “scraps” can be fixed-size slots: that makes allocation easy because they are interchangeable. map Example: page tables that implement a VAS.

Names and layers • User view: notes in notebook file. • Application: notefile; fd, byte range. • File System: fd, bytes; device, block #. • Disk Subsystem: surface, cylinder, sector. • Add more layers as needed.

Representing a File On Disk • File attributes (in the “inode”): may include owner, access control list, time of create/modify/access, etc. • Block map: index by logical block number; physical block pointers in the block map are sector IDs or physical block numbers. [Figure: an inode whose block map points to logical blocks 0, 1, and 2, which hold the text “once upon a time\nin a land far far away,\nlived the wise and sage wizard.”]
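
The block map lookup can be sketched as follows. This is a toy under assumptions: the block size of 8 bytes and the physical block numbers are invented for the example, and a real inode would also carry the attributes listed above.

```java
/** Toy block map: logical block number -> physical block number. */
class BlockMap {
    static final int BLOCK_SIZE = 8;   // hypothetical, tiny on purpose

    /** Which logical block holds the given byte offset of the file. */
    static int logicalBlock(long byteOffset) {
        return (int) (byteOffset / BLOCK_SIZE);
    }

    /** Position of that byte within its block. */
    static int offsetInBlock(long byteOffset) {
        return (int) (byteOffset % BLOCK_SIZE);
    }

    /** Index the map by logical block number to find the physical block. */
    static int physicalBlock(int[] map, long byteOffset) {
        int lbn = logicalBlock(byteOffset);
        if (byteOffset < 0 || lbn >= map.length) {
            throw new IndexOutOfBoundsException("offset beyond mapped blocks: " + byteOffset);
        }
        return map[lbn];
    }
}
```

With a map of `{18, 62, 32}`, byte 9 of the file lives at offset 1 of logical block 1, which the map places in physical block 62. The file's blocks need not be contiguous on disk; the map is the only thing that ties them together.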

A filesystem on disk • Fixed locations on disk: inode 0 is the bitmap file; inode 1 is the root directory. • The allocation bitmap file holds one bit per block (e.g., 11100010 00101101 10111101 ...). • Directory blocks hold name entries (e.g., wind: 18, snow: 62, rain: 32, hail: 48). • A regular file (inode) points to its file blocks (“once upon a time\nin a land far far away, lived th...”). • This is a toy example (Nachos).

The Buffer Cache Proc Memory File cache

File Buffer Cache Proc • Avoid the disk for as many file operations as possible. • Cache acts as a filter for the requests seen by the disk - reads served best. • Delayed writeback will avoid going to disk at all for temp files. Copyin/copyout File cache

Page/block cache internals • Each frame/buffer of memory is described by a meta-object (header). • Resident pages or blocks are accessible through a global hash table, indexed by HASH(blockID). • An ordered list of eviction candidates winds through the hash chains. • Some frames/buffers are free (no valid data). These are on a free list.
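
A minimal sketch of these three structures in Java follows. It is an assumption-laden illustration: `java.util.HashMap` plays the global hash table, the eviction order is simply oldest-first, and the names `BlockCache` and `Header` are hypothetical. Locking and I/O are left out entirely.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

/** Toy block cache: headers, a hash table, a free list, and an eviction list. */
class BlockCache {
    /** Meta-object (header) describing one frame/buffer of memory. */
    static final class Header {
        int blockID = -1;          // -1 means the frame holds no valid data
        final byte[] data;
        Header(int blockSize) { data = new byte[blockSize]; }
    }

    private final Map<Integer, Header> resident = new HashMap<>();          // blockID -> header
    private final ArrayDeque<Header> freeList = new ArrayDeque<>();         // frames with no valid data
    private final ArrayDeque<Header> evictionCandidates = new ArrayDeque<>(); // oldest first

    BlockCache(int frames, int blockSize) {
        if (frames < 1) throw new IllegalArgumentException("need at least one frame");
        for (int i = 0; i < frames; i++) freeList.add(new Header(blockSize));
    }

    /** Hash lookup only; returns null if the block is not resident. */
    Header lookup(int blockID) {
        return resident.get(blockID);
    }

    /** Returns the header for blockID, claiming a free frame or evicting if needed. */
    Header getBlock(int blockID) {
        Header h = resident.get(blockID);
        if (h != null) return h;                 // hit
        h = freeList.poll();                     // prefer a free frame
        if (h == null) {                         // none free: evict the oldest candidate
            h = evictionCandidates.poll();
            resident.remove(h.blockID);
        }
        h.blockID = blockID;                     // caller would now fetch the data
        resident.put(blockID, h);
        evictionCandidates.add(h);
        return h;
    }
}
```

Swapping the oldest-first queue for a different ordering changes the replacement policy while the hash table and free list stay as they are.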

VM page cache internals HASH(segment, page offset) 1. Pages in active use are mapped through the page table of one or more processes. 2. On a fault, the global object/offset hash table in kernel finds pages brought into memory by other processes. 3. Several page queues wind through the set of active frames, keeping track of usage. 4. Pages selected for eviction are removed from all page tables first.

Replacement • Think of physical memory as a cache • What happens on a cache miss? • Page fault • Must decide what to evict • Goal: reduce number of misses

Review of replacement algorithms • Random • Easy implementation, not great results • FIFO (first in, first out) • Replace page that came in longest ago • Popular pages often come in early • Problem: doesn’t consider last time used • OPT (optimal) • Replace the page that won’t be needed for longest time • Problem: requires knowledge of the future
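
As a rough illustration of FIFO, the sketch below counts page faults for a reference string given as characters, one character per page. The class name and the string-of-characters encoding are assumptions made for the example; `frames` is assumed to be at least 1.

```java
import java.util.ArrayDeque;
import java.util.HashSet;

/** Counts page faults under FIFO replacement (frames must be >= 1). */
class FifoReplacement {
    static int countFaults(String refs, int frames) {
        ArrayDeque<Character> arrivalOrder = new ArrayDeque<>(); // oldest arrival at the head
        HashSet<Character> resident = new HashSet<>();
        int faults = 0;
        for (char page : refs.toCharArray()) {
            if (resident.contains(page)) continue;   // hit: FIFO ignores recency of use
            faults++;
            if (resident.size() == frames) {
                char victim = arrivalOrder.poll();   // replace the page that came in longest ago
                resident.remove(victim);
            }
            arrivalOrder.add(page);
            resident.add(page);
        }
        return faults;
    }
}
```

Note that a hit changes nothing in the queue, which is exactly the weakness listed above: a popular page that arrived early is evicted as readily as an idle one.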

Review of replacement algorithms • LRU (least-recently used) • Use past references to predict future • Exploit “temporal locality” • Problem: expensive to implement exactly • Why? • Either have to keep sorted list • Or maintain time stamps + scan on eviction • Update info on every access (ugh)
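
Exact LRU is easy to simulate in software, which helps show why it is costly in a real VM system. The sketch below is only a simulation under assumptions (one character per page, `frames` at least 1, hypothetical class name); it leans on `java.util.LinkedHashMap` in access order, which reorders its entries on every `get`.

```java
import java.util.LinkedHashMap;

/** Counts page faults under exact LRU replacement (frames must be >= 1). */
class LruReplacement {
    static int countFaults(String refs, int frames) {
        // accessOrder = true: iteration runs from least to most recently used.
        LinkedHashMap<Character, Boolean> resident = new LinkedHashMap<>(16, 0.75f, true);
        int faults = 0;
        for (char page : refs.toCharArray()) {
            // get() moves the entry to the most-recent end: bookkeeping on EVERY access.
            if (resident.get(page) != null) continue;
            faults++;
            if (resident.size() == frames) {
                Character victim = resident.keySet().iterator().next(); // least recently used
                resident.remove(victim);
            }
            resident.put(page, Boolean.TRUE);
        }
        return faults;
    }
}
```

The reordering inside `get` is the "update info on every access" cost from the slide: a simulator can afford it, but the OS does not see individual loads and stores, so it cannot maintain this order exactly.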

LRU • LRU is just an approximation of OPT • Could try approximating LRU instead • Don’t have to replace oldest page • Just replace an old page
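
One common way to "just replace an old page" is the clock (second-chance) algorithm, which the slide does not name; it is offered here as one example of approximating LRU. Each frame carries a single reference bit. The sketch assumes one character per page, `frames` at least 1, and that a newly loaded page starts with its bit set (variants differ on this).

```java
/** Counts page faults under clock (second-chance) replacement (frames must be >= 1). */
class ClockReplacement {
    static int countFaults(String refs, int frames) {
        char[] page = new char[frames];
        boolean[] inUse = new boolean[frames];       // frame holds a page
        boolean[] referenced = new boolean[frames];  // one bit instead of a full ordering
        int hand = 0;
        int faults = 0;
        for (char p : refs.toCharArray()) {
            int hit = -1;
            for (int i = 0; i < frames; i++) {
                if (inUse[i] && page[i] == p) hit = i;
            }
            if (hit >= 0) {
                referenced[hit] = true;              // cheap: set a bit, no reordering
                continue;
            }
            faults++;
            // Sweep: recently referenced frames get a second chance (bit cleared).
            while (inUse[hand] && referenced[hand]) {
                referenced[hand] = false;
                hand = (hand + 1) % frames;
            }
            page[hand] = p;                          // victim is an old page, not necessarily the oldest
            inUse[hand] = true;
            referenced[hand] = true;
            hand = (hand + 1) % frames;
        }
        return faults;
    }
}
```

The sweep always terminates: after at most one full pass every bit has been cleared, so the hand finds an unreferenced frame.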

Locality Principle of Locality: • Programs tend to reuse data and instructions near those they have used recently, or that were recently referenced themselves. • Temporal locality: Recently referenced items are likely to be referenced in the near future. • Spatial locality: Items with nearby addresses tend to be referenced close together in time. • Locality Example: sum = 0; for (i = 0; i < n; i++) sum += a[i]; return sum; • Data: reference array elements in succession (stride-1 reference pattern): spatial locality. Reference sum each iteration: temporal locality. • Instructions: reference instructions in sequence: spatial locality. Cycle through loop repeatedly: temporal locality.

Memory Hierarchies Some fundamental and enduring properties of hardware and software: • Fast storage technologies cost more per byte and have less capacity. • The gap between CPU and main memory speed is widening. • Well-written programs tend to exhibit good locality. These fundamental properties complement each other beautifully. They suggest an approach for organizing memory and storage systems known as a memory hierarchy.

An Example Memory Hierarchy (smaller, faster, and costlier per byte at the top; larger, slower, and cheaper per byte at the bottom) • L0: registers. CPU registers hold words retrieved from L1 cache. • L1: on-chip L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache memory. • L2: off-chip L2 cache (SRAM). L2 cache holds cache lines retrieved from main memory. • L3: main memory (DRAM). Main memory holds disk blocks retrieved from local disks. • L4: local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers. • L5: remote secondary storage (distributed file systems, Web servers).

Caches Cache: A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device. Fundamental idea of a memory hierarchy: • For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1. Why do memory hierarchies work? • Programs tend to access the data at level k more often than they access the data at level k+1. • Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit. • Net effect: A large pool of memory that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.

Caching in a Memory Hierarchy • Smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1. • Data is copied between levels in block-sized transfer units. • Larger, slower, cheaper storage device at level k+1 is partitioned into blocks. [Figure: level k holds blocks 8, 9, 14, and 3; level k+1 holds blocks 0 through 15; blocks 4 and 10 are shown being copied between the levels.]

General Caching Concepts • Program needs object d, which is stored in some block b. • Cache hit: program finds b in the cache at level k. E.g., block 14. • Cache miss: b is not at level k, so level k cache must fetch it from level k+1. E.g., block 12. • If level k cache is full, then some current block must be replaced (evicted). Which one is the “victim”? • Placement policy: where can the new block go? E.g., b mod 4. • Replacement policy: which block should be evicted? E.g., LRU. [Figure: level k has four slots holding blocks 4, 9, 14, and 3; a request for block 14 hits; a request for block 12 misses, so block 12 is fetched from level k+1 and replaces block 4.]
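
The "b mod 4" placement policy can be sketched as a tiny direct-mapped cache. The class below is illustrative only (the name and the `-1` empty marker are assumptions); it tracks block numbers, not data.

```java
import java.util.Arrays;

/** Toy direct-mapped cache: block b may live only in slot (b mod slots). */
class DirectMappedCache {
    private final int[] slot;                 // block number held in each slot, or -1 if empty

    DirectMappedCache(int slots) {
        slot = new int[slots];
        Arrays.fill(slot, -1);
    }

    /**
     * Returns true on a hit. On a miss, the block is installed in its one
     * permitted slot, evicting whatever block was there.
     */
    boolean access(int block) {
        int i = block % slot.length;          // placement policy: b mod (number of slots)
        if (slot[i] == block) return true;
        slot[i] = block;                      // the victim is forced by placement
        return false;
    }

    int blockAt(int i) {
        return slot[i];
    }
}
```

With four slots holding blocks 4, 9, 14, and 3, a request for block 14 hits in slot 2, while a request for block 12 misses and displaces block 4 from slot 0. Because each block has exactly one legal slot, the placement policy leaves the replacement policy no choice; policies such as LRU only matter when a block may go in more than one place.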

A System with Virtual Memory • Examples: workstations, servers, modern PCs, etc. • Address Translation: Hardware converts virtual addresses to physical addresses via OS-managed lookup table (page table). [Figure: the CPU issues virtual addresses 0 to N-1; the page table maps them to physical addresses 0 to P-1 in memory, or to disk.]

Page Faults (like “Cache Misses”) What if an object is on disk rather than in memory? • Page table entry indicates virtual address not in memory. • OS exception handler invoked to move data from disk into memory. • Current process suspends, others can resume. • OS has full control over placement, etc. [Figure: CPU, page table, memory, and disk shown before and after the fault.]

Dynamic address translation • Will this allow us to provide protection? Sure, as long as the translation is correct. [Figure: user process → virtual address → translator (MMU) → physical address → physical memory.]

The Page Caching Problem Each thread/process/job utters a stream of page references. • reference string: e.g., abcabcdabce.. The OS tries to minimize the number of faults incurred. • The set of pages (the working set) actively used by each job changes relatively slowly. • Try to arrange for the resident set of pages for each active job to closely approximate its working set. Replacement policy is the key. • On each page fault, select a victim page to evict from memory; read the new page into the victim’s frame. • Simple: replace the page whose next reference is furthest in the future (OPT).
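
OPT cannot be run online, but it can be simulated when the whole reference string is known in advance. The sketch below does so for strings like the one above, one character per page; the class name is hypothetical and `frames` is assumed to be at least 1. Ties among pages that are never referenced again are broken arbitrarily (first found).

```java
import java.util.ArrayList;
import java.util.List;

/** Counts page faults under OPT: evict the page whose next reference is furthest away. */
class OptReplacement {
    static int countFaults(String refs, int frames) {
        List<Character> resident = new ArrayList<>();
        int faults = 0;
        for (int t = 0; t < refs.length(); t++) {
            char page = refs.charAt(t);
            if (resident.contains(page)) continue;            // hit
            faults++;
            if (resident.size() == frames) {
                int victim = 0;
                int farthest = -1;
                for (int i = 0; i < resident.size(); i++) {
                    char candidate = resident.get(i);
                    int next = refs.indexOf(candidate, t + 1); // peek into the future
                    if (next < 0) next = Integer.MAX_VALUE;    // never referenced again
                    if (next > farthest) {
                        farthest = next;
                        victim = i;
                    }
                }
                resident.remove(victim);                       // remove(int index)
            }
            resident.add(page);
        }
        return faults;
    }
}
```

Running it on the reference string abcabcdabce gives a baseline against which any realizable policy can be compared: no policy can incur fewer faults on the same string with the same number of frames.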

Managing the VM Page Cache • Managing a VM page cache is similar to a file block cache, but with some new twists. • Pages are typically referenced by page table (pmap) entries. • Must invalidate mappings before reusing the frame. • Reads and writes are implicit; the TLB hides them from the OS. • How can we tell if a page is dirty? • How can we tell if a page is referenced? • Cache manager must run policies periodically, sampling page state. • Continuously push dirty pages to disk to “launder” them. • Continuously check references to judge how “hot” each page is. • Balance accuracy with sampling overhead.

public interface IVirtualDisk {
    /* Read a block specified by the dBID into buffer */
    public void readBlock(int dBID, byte buffer[]) throws…;

    /* Write to block specified by the dBID from buffer */
    public void writeBlock(int dBID, byte buffer[]) throws…;

    /*
     * Start an asynchronous request to the device/disk.
     * -- operation is either READ or WRITE
     * -- callbackIdentifer is an identifier the caller may use to match the
     *    responses from the device (through a callback) with the requests. The
     *    device does not interpret the callbackIdentifer; it just passes
     *    it along with the callback.
     * -- blockID uniquely identifies the block to access
     * -- buffer[] is a byte array used for read/write operations
     */
    public void startRequest(DiskOperationType operation, int callbackIdentifer, int blockID, byte buffer[]) throws…;
}

public interface IDFS {
    /* creates a new DFile and returns the DFileID */
    public DFileID createDFile();

    /* destroys the file specified by the DFileID */
    public void destroyDFile(DFileID dFID);

    /* reads the file specified by DFileID starting from the offset startOffset
     * to the count specified into the buffer */
    public int read(DFileID dFID, byte[] buffer, int startOffset, int count);

    /* writes to the file specified by DFileID from the buffer starting at
     * offset startOffset up to the count specified */
    public int write(DFileID dFID, byte[] buffer, int startOffset, int count);

    /* List all the existing DFileIDs in the associated volume _volName */
    public List<DFileID> listAllDFiles();
}

public abstract class DBufferCache implements VirtualDiskCallback {
    /*
     * Buffer allocation: Get locked buffer that can be used for block specified
     * by blockID
     */
    public abstract DBuffer getBlock(int dBID);

    /* Release the locked buffer so that others waiting on it can use it */
    public abstract void releaseBuffer(byte[] buffer);

    /*
     * sync() writes back all dirty blocks to DStore and forces DStore
     * to write back all contents to the disk device. The sync() method should
     * maintain clean block copies in DBufferCache.
     */
    public abstract void sync();

    /* Similar to sync() but invalidates all cached blocks unlike sync(). */
    public abstract void flush();
}

public abstract class DBuffer {
    /* If the block is not in cache, start a fetch from disk asynchronously */
    public abstract void startFetch();

    /* Push a buffer block to device/disk asynchronously */
    public abstract void startPush();

    /* Check whether the buffer is in use */
    public abstract boolean checkValid();

    /* Wait until the buffer is free */
    public abstract boolean waitValid();

    /* Check whether the buffer is dirty, i.e., written to memory but not written to the disk device yet */
    public abstract boolean checkClean();

    /* Wait until the buffer is clean */
    public abstract boolean waitClean();
}

public abstract class DBuffer {
    /*
     * reads into the buffer[] array the cache block specified by blockID from
     * the DBufferCache if it is in cache, otherwise reads the corresponding
     * disk block from the disk device. Upon an error, it should return -1,
     * otherwise return number of bytes read.
     */
    public abstract int read(int blockID, byte[] buffer, int startOffset, int count);

    /*
     * writes the buffer[] array contents to the cache block specified by
     * blockID from the DBufferCache if it is in cache, otherwise finds a free
     * cache block and writes the buffer[] contents on it. Upon an error, it
     * should return -1, otherwise return number of bytes written.
     */
    public abstract int write(int blockID, byte[] buffer, int startOffset, int count);
}

How it should be

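[Figure: layered design. • DFS: create, destroy, read, write a dfile; list() dfiles; sync(). • DFS calls into the cache: DBuffer = getBlock(blockID); releaseBlock(buf); sync(). • DFS uses each DBuffer to copy bytes to/from buffer, and calls startFetch(), startPush(), waitValid(), waitClean(). • DBufferCache and DBuffer sit above VirtualDisk: requests go down with startRequest(r/w) and completions come back up with ioComplete().]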

DFS
    /* creates a new dfile and returns the DFileID */
    public DFileID createDFile();

    /* destroys the dfile named by the DFileID */
    public void destroyDFile(DFileID dFID);

    /* reads contents of the dfile named by DFileID into the buffer
     * starting from buffer offset startOffset; at most count bytes are transferred */
    public int read(DFileID dFID, byte[] buffer, int startOffset, int count);

    /* writes to the file specified by DFileID from the buffer
     * starting from buffer offset startOffset; at most count bytes are transferred */
    public int write(DFileID dFID, byte[] buffer, int startOffset, int count);

    /* List DFileIDs for all existing dfiles in the volume */
    public List<DFileID> listAllDFiles();

DBufferCache
    /* Get buffer for block specified by blockID.
     * The buffer is “busy” until the caller releases it. */
    public DBuffer getBlock(int blockID);

    /* Release the buffer so that others waiting on it can use it */
    public void releaseBlock(DBuffer buf);

    /* Write back all dirty blocks to the volume, and wait for completion. */
    public void sync();

DBuffer
    /* Start an asynchronous fetch of associated block from the volume */
    public abstract void startFetch();

    /* Start an asynchronous write of buffer contents to block on volume */
    public abstract void startPush();

    /* Check whether the buffer has valid data */
    public abstract boolean checkValid();

    /* Wait until the buffer has valid data, i.e., until a fetch operation completes */
    public abstract boolean waitValid();

    /* Check whether the buffer is clean, i.e., has no modified data waiting to be written back */
    public abstract boolean checkClean();

    /* Wait until the buffer is clean, i.e., until a push operation completes */
    public abstract boolean waitClean();

    /* Check if buffer is evictable: not evictable if I/O in progress, or buffer is held. */
    public abstract boolean isBusy();

DBuffer
    /*
     * reads into the buffer[] array from the contents of the DBuffer.
     * Check first that the DBuffer has a valid copy of the data!
     * startOffset and count are for the buffer array, not the DBuffer.
     */
    public int read(byte[] buffer, int startOffset, int count);

    /*
     * writes into the DBuffer from the contents of buffer[] array.
     * startOffset and count are for the buffer array, not the DBuffer.
     * Mark buffer dirty!
     */
    public int write(byte[] buffer, int startOffset, int count);
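
The valid/dirty protocol described in these comments can be sketched as below. This is one possible reading, not the reference solution: `SketchDBuffer` is a hypothetical class, `ioComplete()` is called by hand to stand in for the disk's completion callback, a write is assumed to make the buffer valid only when it overwrites the whole block, and isBusy/hold tracking is omitted.

```java
/** Hypothetical single-block buffer illustrating the valid/dirty protocol. */
class SketchDBuffer {
    private final byte[] block;      // cached copy of one disk block
    private boolean valid = false;   // true once the block contents can be trusted
    private boolean dirty = false;   // true if modified since the last completed push

    SketchDBuffer(int blockSize) { block = new byte[blockSize]; }

    /** Stand-in for I/O completion after a fetch or a push. */
    synchronized void ioComplete() {
        valid = true;
        dirty = false;
        notifyAll();                 // wake any thread blocked in waitValid()
    }

    synchronized boolean checkValid() { return valid; }
    synchronized boolean checkClean() { return !dirty; }

    synchronized boolean waitValid() throws InterruptedException {
        while (!valid) wait();
        return true;
    }

    /** Copies from the block into buffer[startOffset..]; -1 if invalid or out of bounds. */
    synchronized int read(byte[] buffer, int startOffset, int count) {
        if (!valid || !inBounds(buffer, startOffset, count)) return -1;
        int n = Math.min(count, block.length);
        System.arraycopy(block, 0, buffer, startOffset, n);
        return n;
    }

    /** Copies from buffer[startOffset..] into the block and marks it dirty. */
    synchronized int write(byte[] buffer, int startOffset, int count) {
        if (!inBounds(buffer, startOffset, count)) return -1;
        int n = Math.min(count, block.length);
        System.arraycopy(buffer, startOffset, block, 0, n);
        dirty = true;
        if (n == block.length) valid = true;   // assumption: a full overwrite needs no fetch
        return n;
    }

    private static boolean inBounds(byte[] buffer, int startOffset, int count) {
        return startOffset >= 0 && count >= 0 && count <= buffer.length - startOffset;
    }
}
```

In this reading, startOffset and count index only the caller's array; the copy always begins at byte 0 of the block, matching the comment that they "are for the buffer array, not the DBuffer".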

VirtualDisk
    /*
     * Start an asynchronous request to the device/disk.
     * Nature of the request is encoded in the state of the DBuffer
     * -- operation is either READ or WRITE
     * -- blockID uniquely identifies the block to access
     * -- buffer[] is a byte array used for read/write operations
     */
    public void startRequest(DBuffer buf) throws…;
