Memory Management — CS 550, Spring 2014, Kenneth Chiu
Background • Program must be brought (from disk) into memory and placed within a process for it to be run • Main memory and registers are the only storage the CPU can access directly • Register access takes one CPU clock (or less) • Main memory can take many cycles • Cache sits between main memory and CPU registers • Protection of memory is required to ensure correct operation.
Many Ways to Do Memory Management! • We will cover a variety of them. • Many of them are not mutually exclusive, and can be hybridized in various ways. • Also, many of them are not actually used as is, but are studied because they help illustrate basic principles. • What you need to learn is to get some idea of the possibilities so that when you encounter a specific one, you will be able to grasp it quickly.
The Problem • There is a finite amount of RAM on any machine. • There is a bus with finite speed, non-zero latency. • There is a CPU with a given clock speed. • There is a disk with a given speed. • How do we manage memory so that we make most effective use of the machine? • Mainly, that means preventing CPU idle time, maintaining good response time, etc. • While maintaining isolation and security.
Base and Limit Registers • Put every process at a different location in memory. • A pair of base and limit registers define the logical address space.
How do you handle memory references in the machine code in such a scheme?
Logical vs. Physical Address Space • The concept of a logical address space that is bound to a separate physical address space is central to proper memory management. • Logical address – generated by the CPU; also referred to as a virtual address. • Physical address – address seen by the memory unit. • Logical and physical addresses can be the same in primitive schemes; logical (virtual) and physical addresses differ in most modern schemes.
Memory-Management Unit (MMU) • Hardware device that maps virtual to physical address. • In a simple MMU scheme, the value in the relocation register is added to every address generated by a user process at the time it is sent to memory. • The user program deals with logical addresses; it never sees the real physical addresses.
Swapping • A process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution. • Backing store – fast disk large enough to accommodate copies of all memory images for all users; must provide direct access to these memory images. • Major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped. • Modified versions of swapping are found on many systems (e.g., UNIX, Linux, and Windows). • System maintains a ready queue of ready-to-run processes which have memory images on disk. • The term swap is not that precise; it is often used slightly differently, depending on the OS.
Contiguous Allocation • A common early way of allocating memory is to simply provide each process with a contiguous logical address space, and map that to a contiguous range of physical addresses, called a partition. • Relocation registers are used to protect user processes from each other, and from changing operating system code and data. • Relocation register contains the value of the smallest physical address. • Limit register contains the range of logical addresses; each logical address must be less than the limit register. • MMU maps logical addresses dynamically. • OS also needs a partition: • Usually held in low memory with the interrupt vector. • User processes then held in high memory.
Memory Allocation • Can statically divide memory into several fixed-size partitions. • Each process goes into one partition, and each partition can only hold one process. • Disadvantage? • Can also use variable sized partitions. • OS must keep a table of occupied parts of memory. • An unoccupied chunk is called a hole. • Memory consists of occupied partitions and holes of various sizes.
Holes of various sizes are scattered throughout memory. • When a process arrives, it is placed into an input queue. • OS makes periodic decisions about how to allocate a partition to a process from the input queue. • A hole that is too large will be split into a partition and a new hole. • When a process exits, any neighboring hole is merged with the newly created hole. (Figure: successive snapshots of memory as processes 5, 8, 9, 10, and 2 are loaded and exit, showing holes being split and merged around the OS partition.)
Dynamic Storage-Allocation Problem • How to satisfy a request of size n from a list of free holes? • First-fit: Allocate the first hole that is big enough. • Fast. • Best-fit: Allocate the smallest hole that is big enough; must search the entire list, unless it is ordered by size. • Produces the smallest leftover hole. • Worst-fit: Allocate the largest hole; must also search the entire list. • Produces the largest leftover hole. • Which do you think works best? • First-fit and best-fit are better than worst-fit in terms of speed and storage utilization.
Input queue interaction is also a factor. If there is a hole of size N, should we: • Use it for the smallest job in the queue that fits? • Use it for the largest job in the queue that fits? • Use it only for the next job in the queue, and if that job doesn't fit, don't allocate (FCFS)? • Which will guarantee the highest number of jobs per second?
Fragmentation • External fragmentation: Wasted space due to discontiguous holes. • General rule of thumb (the 50-percent rule): for a fully utilized memory with N allocated blocks, about 0.5N blocks are lost to fragmentation, so roughly one-third of memory may be unusable. • Should we allocate from the top or the bottom of a hole? • Internal fragmentation: Allocated memory may be larger than the actual requested memory, due to allocation granularity. • For example, if there is a hole of 2 bytes, there is no point in keeping track of it. • Usually very small for variable-sized partitions. • External fragmentation can be addressed by compaction: • Shuffle memory contents to place all free memory together in one large block. • Compaction is possible only if relocation is dynamic and is done at execution time. • I/O problem: What if I/O is going on? • Latch the job in memory while it is involved in I/O, or • Do I/O only into OS buffers.
Paging • Many of the issues seen so far are due to contiguity requirements of the RAM. Let’s relax that. • Divide physical memory into fixed-sized blocks called frames, size is a power of 2. • Divide logical memory into blocks of same size called pages • Keep track of all free frames. • To run a program of size n pages, need to find n free frames and load program. • Set up a page table to translate logical to physical addresses.
Addresses • With paging, the address generated by the CPU is divided into: • Page number (p): Used as an index into a page table which contains the base address of each page in physical memory. • Page offset (d): Added to the base address to define the physical memory address that is sent to the memory unit. • Assume that there are m bits in the logical address, and that the page size is 2^n bytes. Then the address is: page number p (m − n bits) | page offset d (n bits).
Example • How big is a page? • Any external fragmentation? • Any internal fragmentation? • How big should a page be? (Figure: a 16-byte logical address space mapped through a page table into a 32-byte physical memory.)
Free Frames • When a new process arrives, it will request N pages. • N frames need to be found. • How would you do this?
(Figure: the free-frame list before allocation and after allocation.)
Implementation of Page Table • Page tables are typically done via a combination of OS support and hardware support. Many variations. • Register scheme: • Page table is kept in registers. Registers are loaded during a context switch. • PDP-11: Address space is 16 bits, page size is 8 KB (13 bits). Page table needs ?? entries. • Will this work if the address space is 32 bits? How big does the page table need to be?
Another scheme: the page table is kept in main memory. • Page-table base register (PTBR) points to the page table. • Page-table length register (PTLR) indicates the size of the page table. • These registers are reloaded on context switch. • How many memory accesses does a load instruction take? • In this scheme every data/instruction access requires two memory accesses: one for the page table and one for the data/instruction. • The two-memory-access problem can be solved by the use of a special hardware cache called the translation look-aside buffer (TLB), which is a form of associative memory.
Associative Memory • Associative memory is a fancy name for a lookup table. • Use a key, produce a value. • In software, we would implement it using a hash table, binary search tree, etc. • In hardware, the idea is that it's a parallel search on all keys. Very expensive. • Address translation of (p, d) is then: • If p is in an associative register, produce the frame #. • Otherwise get the frame # from the page table in memory. (Figure: associative memory as a table of page # / frame # pairs.)
Effective Access Time • How long does a lookup take with a TLB? • A crucial performance measure is the hit ratio, h, which is the fraction of time that the desired page is in the TLB. • Assume TLB lookup time is l. • Assume memory cycle time is m. • The effective access time (EAT) is then: EAT = (m + l)h + (2m + l)(1 − h) = 2m + l − hm • Example: Assume an 80% hit ratio, 20 ns to search the TLB, 100 ns to access memory. • If in the TLB, how long does it take to access memory? • If not in the TLB, how long does it take? • EAT = .8*120 + .2*220 = 140 ns
Context Switches • What is the key in the TLB? • What happens when we do a context switch? • Normally must flush the TLB. • What does this do to performance? • Some TLBs store address-space identifiers (ASIDs) in each TLB entry which uniquely identifies each process to provide address-space protection for that process. • This allows the TLB to not be flushed.
Memory Protection • Does memory need to be protected? Why? • Should we associate the protection with the frame (physical), or the page (logical)? • Memory protection implemented by associating protection bit with each page. • Bits are added to the page table entry. • These bits indicate things like read/write/execute. • One important one is the valid-invalid: • “Valid” indicates that the associated page is in the process’ logical address space, and is thus a legal page. • “Invalid” indicates that the page is not in the process’ logical address space.
Consequence of internal fragmentation? • Process only uses addresses up to 10,468. • Page ends at 12,287, so the rest of that page is wasted.
Sharing Pages • Do we ever want to share memory between processes? Why? • Multiple processes executing the same program: • One copy of read-only code shared among processes (i.e., text editors, compilers, window systems). • Shared libraries: • Each process keeps a separate copy of the non-shared code and data. • The pages for the private code and data can appear anywhere in the logical address space. • IPC: • Implementing messages, etc. • How can this be done with paging?
Structure of the Page Table • How big is the page table? • 4K pages, 32-bit system • 2^12 == 4K, so 2^20 == 1 M entries • Each entry is 4 bytes, so 4 MB • Why is this a problem? • Is the page table managed by hardware or software? • Could be software: on a TLB miss, an exception is generated, and software is responsible for traversing the data structures and loading the TLB. • The page table is just a lookup. What techniques are available for lookup? • Binary search tree • Linked list • Arrays • Radix-based • Hash-based
Hierarchical Page Tables • Break up the logical address space into multiple page tables. • A simple technique is a two-level page table. • First-level page table is contiguous, but small. • Each first-level PTE points to a separate 2nd-level table. • Each 2nd-level table must still be contiguous, but there are many of them, so each one is (relatively) small.
Two-Level Paging Example • A logical address (on a 32-bit machine with 4K page size) is divided into: • A page number consisting of 20 bits. • A page offset consisting of 12 bits. • Since the page table is paged, the page number is further divided into: • A 10-bit outer page number (p1). • A 10-bit inner page number (p2). • Thus, a logical address is as follows, where p1 is an index into the outer page table, and p2 is the displacement within the page of the outer page table: p1 (10 bits) | p2 (10 bits) | d (12 bits).
VAX Example • 32-bit machine, 512-byte page size. • Logical address space divided into 4 sections, each 2^30 bytes. • High-order 2 bits used to designate the section. • Next 21 bits for the page number within the section. • Last 9 bits for the offset within the page. • How big is each page table? Layout: section (2 bits) | page (21 bits) | offset (9 bits).
64-Bit Systems • How big is the outer page table? • Three-level?
Hashed Page Tables • Use a hash table for lookup. • The virtual page number is hashed and used as an index into a hash table. • That entry in the table contains a chain of elements hashing to the same location. • I.e., virtual page numbers that have the same hash code. • Virtual page numbers are compared in this chain searching for a match. • If a match is found, the corresponding physical frame is extracted.
(Figure: hashed page table — the page number is hashed into a bucket, whose chain is searched for the matching entry holding the frame number.)
Inverted Page Table • Normal page table has one entry for each page. • Optimizing it amounts to figuring out how to not use resources for large chunks of the page table that are empty. • If the ratio of logical address space to physical address space is large, this suggests using one entry per physical frame, rather than one entry per page. • Entry consists of the virtual address of the page stored in that real frame, with information about the process that owns that page. • Decreases memory needed to store each page table, but increases time needed to search the table when a page reference occurs. • Use hash table to limit the search to one — or at most a few — page-table entries.
(Figure: inverted page table lookup — the pair (pid, p) from the logical address is hashed, via a hash anchor table, to locate entry i in the inverted page table; the physical address is then formed from frame number i and offset d.)