CT213 – Memory Management Petronel Bigioi
Content • Memory management requirements • Relocation • Protection • Sharing • Logical organization • Physical organization • Address space of a process • Address binding • Static address binding • Dynamic address binding • Hardware assisted relocation and protection • Segmentation • Segmented virtual addresses • Address translation • Segmentation summary • Paging • Paged virtual addresses • Address translation • Paging summary • Combining segmentation and paging • Operating Systems policies
Memory management • In uni-programming systems the main memory is divided into two parts • One for operating system • The other one for the program being executed • In multiprogramming systems, the user part of memory has to be further subdivided to accommodate multiple processes • The task of subdivision is carried out by the operating system and is known as memory management. • Memory needs to be allocated efficiently to pack as many processes into memory as possible
Memory management requirements • Relocation • Dynamically loading the program into an arbitrary memory space, whose address limits are known only at execution time • Protection • Each process should be protected against unwanted interference from other processes • Sharing • Any protection mechanism should be flexible enough to allow several processes to access the same portion of main memory • Logical organization • Most programs are organized in modules, some of which are unmodifiable (read only and/or execute only) and some of which contain data that can be modified; the operating system must support the sharing of modules across processes • Physical organization • Memory is organized as a hierarchy of at least two levels; the OS should hide this fact and move data between main memory and secondary memory without involving the programmer
Memory hierarchy review • It is a tradeoff between size, speed and cost • Register • Fastest memory element, but small storage; very expensive • Cache • Fast and small compared to main memory; acts as a buffer between the CPU and main memory: it contains the most recently used memory locations (address and contents are recorded here) • Main memory is the RAM of the system • Disk storage - HDD
Cache review • Every address reference goes first to the cache; if the desired address is not there, we have a cache miss • The contents are fetched from main memory into the indicated CPU register and are also saved in the cache • Most software exhibits temporal locality of access, meaning that the same address is likely to be used again soon; if so, the address will be found in the cache and we have a cache hit • Transfers between main memory and cache occur at the granularity of cache lines, around 32 or 64 bytes (rather than bytes or processor words). Burst transfers of this kind receive hardware support and exploit spatial locality of access (future accesses are often to addresses near the previous one) • Depending on the write policy, caches can be • Write-through (the update is made to main memory as well as to the cache) • Write-back (the update is made only to the cache, which propagates it to main memory at a later time)
Process address space • When accessing memory, a process is said to operate within an address space (data items are accessible within the range of addresses available to the process) • For example, a RISC processor has only a few instructions that contain addresses (an address field which contains a representation of a memory address) • LOAD (a specified processor register with the contents of an address) • STORE (the contents of a processor register to a specified memory address) • CALL (a procedure at a specified address) • JUMP or BRANCH (to a specified address) • The number of bits allocated to specify the address is an architectural decision • Many early computers had 16 bits for the address (thus allowing 64KB of directly addressable space) • A typical value is now 32, which allows for 4GB of directly addressable memory space; we say that such a system gives a virtual address space of 4GB (the amount of physical memory in such a system is likely to be less than this)
Address binding • An address used in an instruction can point anywhere in the virtual address space of the process, but it must still be bound to a physical memory address • Programs are made of modules. Compilers or assemblers that translate a module don’t know where the module will be loaded in physical memory • One way to deal with this is to assume that the code they output will start at address zero in memory • A linker can take a sequence of such modules and create a single composite module by adjusting the (relative) addresses in all but the first module; the addresses in the resulting module are still relative to its start • Compilers have the whole virtual address space at their disposal • The PC contains a virtual address that must be translated to a physical address before an instruction is fetched. Addresses of operands are also virtual addresses that need to be translated before the operand is brought from main memory (or cache) • Address translation can be dynamic or static.
Static address binding • In a simple system, a module has to be loaded into physical main memory • The OS is responsible for managing the memory, so it gives the loader a base address at which to load the module (assuming that the OS has identified a free memory zone large enough to contain the program) • The loader adjusts all the relative addresses in the module, converting them to absolute physical addresses • This is called static relocation or static binding • Problems with static binding: • Once loaded, the code or data of the program can’t be moved within memory without further relocation • All the processes executing in such a system share the same physical address space; they are not protected from each other if addressing errors occur; even the OS code is exposed to addressing errors
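The loader step above can be sketched in a few lines of Python. This is a minimal illustration, not a real loader: the module layout, the `relocate` helper and the list of relocation offsets are all hypothetical, and a module is modelled as a plain list of words.

```python
def relocate(module, relocation_offsets, base):
    """Return a loaded copy of `module` in which every word listed in
    `relocation_offsets` is converted from a module-relative address
    to an absolute physical address by adding `base`."""
    loaded = list(module)
    for i in relocation_offsets:
        loaded[i] += base
    return loaded


# Hypothetical module: words 1 and 3 hold addresses relative to the module start.
module = [0x10, 0x0004, 0x20, 0x0008]
loaded = relocate(module, [1, 3], base=0x4000)
print([hex(w) for w in loaded])
```

Once the addresses are patched, moving the module to another base means repeating the whole adjustment, which is the first problem listed above.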
Dynamic address binding • Considers keeping the loaded addresses relative to the start of a process and using some other mechanism to translate from the virtual address space of the program to the physical address space of the computer • Advantages of dynamic address binding: • A given program can run anywhere in the physical memory and can be moved around by the operating system; all of the addresses that it is using are relative to its own virtual address space, so it is unaware of the physical locations at which it happens to have been placed • It is possible to protect processes from each other and protect the operating system from application processes by a mechanism we employ for isolating the addresses seen by the processes • To realize the advantages we will need a mechanism to bind the virtual address within the loaded instructions to physical addresses when the instructions are executed
Dynamic address binding • The real machine is shared by a number of processes and we see the virtual memory of a process occupying a portion of real physical memory • The way in which the virtual address space is mapped to the physical memory must therefore be changed each time the operating system switches from one process to another
Hardware assisted relocation and protection • Dynamic binding must be implemented in hardware, since it introduces translation as part of every memory access • If the virtual address space is organized as described earlier, then the basic requirement is for modules to be held contiguously in physical memory and to contain addresses relative to their first location • The first location is called the base of the process • Suppose that an instruction is fetched and decoded and contains an address reference • This address reference is relative to the base, so the value of the base must be added to it in order to obtain the correct physical address to be sent to the memory controller
Hardware relocation and protection • The simplest form of dynamic relocation hardware is a base register and a memory management unit (MMU) to perform the translation • The operating system must load the base register as part of setting up the state of a process before passing control to it • The problem with this approach is that it provides no protection between processes: simply adding a base address to an address from the process space does nothing to check the result against the physical memory allocated to the process.
Hardware relocation and protection • It is natural to combine the relocation and protection functions in one unit, by adding a second register (the limit register) that delimits the upper bound of the program in physical memory • Dynamic hardware relocation and protection units were already in use in the early 1960s; since then those units have evolved, and contemporary processors use far more advanced units; however, the principles still apply • Nowadays, the basic scheme with one base register and one limit register is used extensively in embedded systems.
Hardware relocation and protection • Instruction execution with protection and relocation
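As a rough sketch of what the base and limit registers do on each memory access, the following Python models the check in software. The names `translate` and `AddressingError` are illustrative assumptions, and the limit is taken, as on the previous slide, to be the upper bound of the program in physical memory.

```python
class AddressingError(Exception):
    """Raised when a reference falls outside the memory allocated to the process."""


def translate(virtual, base, limit):
    """Relocate `virtual` by `base` and check the result against `limit`,
    the (exclusive) upper bound of the process in physical memory."""
    physical = base + virtual
    if virtual < 0 or physical >= limit:
        raise AddressingError(hex(virtual))
    return physical


# A process loaded at 0x4000 whose memory ends just below 0x5000.
print(hex(translate(0x100, base=0x4000, limit=0x5000)))  # 0x4100
```

In real hardware the addition and the comparison happen inside the MMU on every access; the operating system only reloads the two registers on a process switch.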
Segmented virtual memory • In practice it is not very useful for a program to occupy a single contiguous range of physical addresses • Such a scheme would prevent two processes from sharing code • e.g. with this scheme it is difficult to arrange for two executions of the same program (two processes) to access different data while still sharing the same code • This can be achieved if the system has two base registers and two limit registers, thus allowing two separate memory ranges or segments per process
Segmented virtual memory • Two processes sharing a code segment but having private data segments
Segmented virtual memory Most significant bit of the virtual address is taken as a segment identifier, with 0 for data segment and 1 for code segment
Segmented virtual memory • Within a single program it is usual to have separate areas for code, stack and heap • Language systems have conventions on how the virtual address space is arranged • Code segment will not grow in size • Heap (may be growing) • Stack at the top of virtual memory, growing downwards • In order to realize relocation (and protection), three segments would be preferable • It is easy to give examples where four segments would be useful
Segmented virtual addresses • The segment is the unit of protection and sharing; the more segments we have, the more flexible we can be • There are two different ways the segmented address can be organized: • The virtual address is split into a segment number and a byte number within the segment • The number of bits used for segment addressing is usually fixed by the CPU designer • The segment number is supplied separately from the offset portion of the address • This is done in x86 processors: by default, instruction fetches are performed from a designated code segment, data accesses from a data segment and stack operations from a stack segment; separate instructions are used to select which segments fulfill each of these roles
Segmented address translation • As part of the hardware support for dynamic address translation, the operating system must keep a segment table for each process, in which the location of each segment is recorded; these tables need to be in fast memory • If a process can have many segments, only those currently being used for instruction fetch and operand access need to be in main memory; other segments could be held on backing store until they are needed • If an address is presented for a segment that is not present in main memory, the address translation hardware generates an addressing exception. This is handled by the operating system, which causes the segment to be fetched into main memory and the access to be restarted • This may require an existing segment to be evicted from main memory in order to make space • The general concept is called virtual memory. A program’s segments are located throughout the memory hierarchy and the operating system takes care of segment transfers (we will see this again in the context of paging)
Segmentation example • Assume 16-bit logical addresses and 16-bit physical addresses • Assume the segment number is represented using 4 bits • The offset is represented using 12 bits • The virtual address 0001001011110000 therefore splits into segment number 0001 and offset 001011110000 • Assuming that this segment resides in physical memory at physical address 0010000000100000 (base address), adding the offset to the base gives the physical address 0010001100010000
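The arithmetic of this example can be checked with a short Python sketch. The `split` and `translate` helpers and the one-entry `segment_table` are assumptions made for illustration; only the bit widths and the two addresses come from the slide.

```python
SEGMENT_BITS = 4
OFFSET_BITS = 12


def split(virtual):
    """Split a 16-bit virtual address into (segment number, offset)."""
    return virtual >> OFFSET_BITS, virtual & ((1 << OFFSET_BITS) - 1)


def translate(virtual, segment_table):
    """Add the offset to the base address recorded for the segment."""
    segment, offset = split(virtual)
    return segment_table[segment] + offset


# Segment 1 is assumed to start at the base address given in the example.
segment_table = {1: 0b0010000000100000}
physical = translate(0b0001001011110000, segment_table)
print(format(physical, "016b"))  # 0010001100010000
```

A fuller model would also store a length per segment and reject offsets beyond it, in the same way the limit register does for a single contiguous program.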
Segmentation Summary • A process is divided into a number of segments that don’t need to be equal in size • When a process is brought into main memory, all of its segments are usually brought into main memory and a process segment table is set up • Advantages: • The virtual address space of a process is divided into logically distinct units which correspond to constituent parts of a process • Segments are the natural units of access control; the process may have different access rights for different segments • Segments are the natural units for sharing code and data objects with other processes • Disadvantages: • It is inconvenient for the operating system to manage storage allocation for variable-sized segments, because each segment held in physical memory must be contiguous; after the system has been running for a while, the available free memory can become fragmented • This is known as external fragmentation: although the total free memory might be far greater than the size of some segment that must be loaded, there is no single area large enough to load it • External fragmentation might be relieved by moving segments that are not needed out to disk and moving them back when they are needed. This is called swapping.
Paged virtual memory • The need to keep each loaded segment contiguous in the physical memory poses a significant disadvantage because it leads to fragmentation and complicates the physical storage allocation problem • A solution to this is called paging, where blocks of a fixed size are used for memory allocation (so that if there is any free space, it is of the right size) • Memory is divided into page frames, and the user program is divided into pages of the same size
Paged virtual memory • Typical page size is small (1 to 4KB) and in paged systems, a process would require many pages • The limited size of physical memory can cause problems so a portion of the disk storage (or a separate, dedicated high speed storage device) could be used as an extension to the main memory and the pages of a process may be in the main memory and/or in this backing store • The operating system must manage the two levels of storage and the transfer of pages between them • It must keep a page table for each process to record information about the pages • A present bit is needed to indicate whether the page is in main memory or not • A modify bit indicates if the page has been altered since last loaded into main memory • If not modified, the page does not have to be written to the disk when swapped out
Paging Example • All the processes (A, B, C and D) are stored on disk and are about to be loaded into memory (by the operating system) • Process A has four pages • Process B has three pages • Process C has four pages • Process D has five pages
Paging Example • Various page tables at the time – see previous slide • Each page table entry contains the number of the frame in main memory (if any) that holds that page • In addition, typically, the operating system maintains a list of all frames in main memory that are currently unoccupied and available for pages
Paged virtual memory address translation • Involves translation of a virtual address (page + offset) into a physical address (frame + offset) using a page table • The page table cannot be held in internal registers (size issues), so it must be in main memory. For each process, a pointer to its page table is kept in a register • The page number is used to index that table and look up the corresponding frame number, which, combined with the offset from the virtual address, gives the real physical address
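The lookup described above can be sketched as follows. This is a simplified model, assuming 4KB pages and a page table represented as a Python dict from page number to frame number; a page with no entry stands in for a page whose present bit is clear. The names `translate` and `PageFault` are illustrative.

```python
PAGE_SIZE = 4096  # 4KB pages, so the low 12 bits are the offset


class PageFault(Exception):
    """Raised when the referenced page has no frame in main memory."""


def translate(virtual, page_table):
    """Map a virtual address (page + offset) to a physical one (frame + offset)."""
    page, offset = divmod(virtual, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:
        raise PageFault(page)
    return frame * PAGE_SIZE + offset


page_table = {0: 5, 1: 2}  # hypothetical mapping: page -> frame
print(hex(translate(0x1234, page_table)))  # 0x2234
```

Note that the offset passes through unchanged; only the page number is replaced by a frame number.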
Paged virtual memory address translation • Processes could occupy huge amounts of virtual memory • e.g. assuming typical 32-bit addressing, each process could have up to 2^32 bytes (4GB) of virtual memory • Using 4KB pages (12 bits for offset) means that 2^20 entries could be required per process in the page table • Assuming that each page table entry (PTE) occupies 4 bytes, the page table would take 4MB per process, which is unacceptably high • The solution is to use a two-level scheme to organize large page tables • Root page table with 2^10 PTEs occupying 4KB of main memory • The root page table always remains in main memory; the pages of the user page table can reside in virtual memory
Paged virtual memory address translation • The first 10 bits of a virtual address are used to index into the root page table to find a PTE for a page of the user page table • If that page is in main memory, then the next 10 bits of the virtual memory address are used to index into the user PTE page table to find the PTE for the page that is referenced by the virtual address
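The 10/10/12 split of a 32-bit virtual address, and the table sizes it implies, can be worked through in Python. The constant and function names are assumptions for illustration; the bit widths and the 4-byte PTE size come from the slides.

```python
ROOT_BITS, TABLE_BITS, OFFSET_BITS = 10, 10, 12
PTE_SIZE = 4  # bytes per page table entry


def split(virtual):
    """Split a 32-bit virtual address into
    (root table index, user page table index, offset)."""
    offset = virtual & ((1 << OFFSET_BITS) - 1)
    table_index = (virtual >> OFFSET_BITS) & ((1 << TABLE_BITS) - 1)
    root_index = virtual >> (OFFSET_BITS + TABLE_BITS)
    return root_index, table_index, offset


# Size of a single flat page table versus the root table alone.
flat_table_bytes = (1 << (ROOT_BITS + TABLE_BITS)) * PTE_SIZE
root_table_bytes = (1 << ROOT_BITS) * PTE_SIZE

print(split(0x00403025))                   # (1, 3, 37)
print(flat_table_bytes, root_table_bytes)  # 4194304 4096
```

Each root entry covers one 4KB page of the user page table, so only the root table (4KB) has to stay resident; the rest of the 4MB table can be paged like any other data.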
Paged virtual memory address translation • TLB - Translation Lookaside Buffer (a kind of cache memory) contains the page table entries that have been most recently used • Before a page can be addressed, it must have an entry set up by the operating system in the TLB • The TLB is searched associatively as part of each address reference
Paged virtual memory address translation • The virtual page number is extracted from the virtual address and an associative lookup is initiated • With multiple processes, special care needs to be taken so that a page from one process is not confused with another’s (the solution is either to hold the process id in the TLB or to flush the TLB on every context switch) • If a match is found (TLB hit), an access check is made, based on the information stored in the flags • e.g. whether a write is being attempted to a page that is read only • The physical page base, taken from the TLB, is concatenated with the offset from the virtual address to form the complete physical address • The flags field indicates the access rights and other information • If an address reference is made to a page that is in main memory but not in the TLB, address translation fails (TLB miss) • A new entry in the TLB needs to be created for that page • If an address reference is made to a page that is not in main memory, the address translation fails again. No valid mapping is found in the page table and the addressing hardware raises an exception, called a page fault • The operating system handles this exception
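The hit, miss and fault cases above can be sketched as one function. This is a software model under several assumptions: the TLB and the page table are Python dicts keyed by virtual page number, each entry is a hypothetical `(frame, writable)` pair standing in for the physical page base and flags, the TLB is unbounded, and a single process is assumed so no process id is stored.

```python
PAGE_SIZE = 4096


class PageFault(Exception):
    """The page is not in main memory; the operating system must fetch it."""


class ProtectionFault(Exception):
    """The access is not permitted by the flags of the page."""


def translate(virtual, write, tlb, page_table):
    page, offset = divmod(virtual, PAGE_SIZE)
    entry = tlb.get(page)             # associative lookup
    if entry is None:                 # TLB miss
        entry = page_table.get(page)
        if entry is None:             # not in main memory either
            raise PageFault(page)
        tlb[page] = entry             # create a TLB entry for the page
    frame, writable = entry
    if write and not writable:        # access check based on the flags
        raise ProtectionFault(page)
    return frame * PAGE_SIZE + offset


page_table = {0: (7, False), 1: (3, True)}
tlb = {}
print(hex(translate(0x1010, True, tlb, page_table)))  # 0x3010, after a TLB miss
print(1 in tlb)                                       # True: the next access hits
```

A real TLB has a small fixed number of entries, so installing a new entry usually evicts an old one; that replacement step is left out here.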
Paging address translation example • Virtual to physical address translation in paging systems
Paging Summary • Advantages: by using fixed size pages in virtual address space and fixed size frames in physical address space, paging addresses some of the problems with segmentation: • External fragmentation is no longer a problem (all frames in physical memory are the same size) • Transfers to/from disk can be performed at the granularity of individual pages • Disadvantages • The page size is a choice made by the CPU or OS designer • It may not fit the size of program data structures and leads to internal fragmentation, in which a storage allocation request must be rounded up to an integral number of pages • There may be no correspondence between page protection settings and application data structures • If two processes are to share data structures, they can do so only at the level of entire pages • Because per-process page tables are required, the OS is likely to need more storage for its internal data structures
Combined paging and segmentation • Both paging and segmentation have their strengths • Paging: transparent to the programmer; eliminates external fragmentation, thus providing efficient use of main memory • Segmentation: visible to the programmer, but with the ability to handle growing data structures, modularity and support for sharing and protection • Some systems are equipped with hardware (processor) and software (operating system) to provide both • User address space is broken up into a number of segments, at the discretion of the programmer • Each segment is broken up into a number of fixed size pages, which are equal in length to a main memory frame • From the programmer’s point of view, a logical address still consists of a segment number and a segment offset • From the system’s point of view, the segment offset is seen as a page number and a page offset for a page within the specified segment
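A minimal sketch of the combined scheme, assuming 4KB pages and a segment table that maps each segment number to that segment's own page table (both represented as Python dicts; the names are illustrative, not taken from any particular system):

```python
PAGE_SIZE = 4096


def translate(segment, segment_offset, segment_table):
    """The programmer supplies (segment, segment offset); the system treats
    the segment offset as (page number, page offset) within that segment."""
    page_table = segment_table[segment]
    page, offset = divmod(segment_offset, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


# Hypothetical process: segment 0 (code) has two pages, segment 1 (data) has one.
segment_table = {0: {0: 8, 1: 9}, 1: {0: 4}}
print(hex(translate(0, 0x1004, segment_table)))  # 0x9004
print(hex(translate(1, 0x0010, segment_table)))  # 0x4010
```

Segments keep their role as the unit of sharing and protection, while the frames backing a segment no longer need to be contiguous.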
Memory Management Design Issues • The design of a memory management system for an OS depends on three areas of choice: • Whether or not to use virtual memory (VM) techniques • Any new OS provides support for it • Whether to use paging, segmentation or both • Pure segmentation is very rare; when segmentation is combined with paging, most of the design issues are in the area of paging • The algorithms employed for various aspects of memory management • The idea is to minimize the rate at which page faults occur, because page faults cause software overhead • The performance of any set of policies depends on main memory size, the relative speed of main and secondary memory, the size and number of processes competing for resources, and the execution behavior of individual processes.
Operating system policies for VM • Fetch Policy • When a page should be brought in main memory • Demand , Pre-paging • Placement policy • Where in real memory a process piece is to reside • Replacement policy • Which page to be replaced when a new one is brought in the main memory • Basic algorithms • Optimal, Least recently used (LRU), First in first out (FIFO), Clock
Operating system policies for VM • Resident set management • Not all pages of a process need to be (or can be) brought into main memory. How much memory should be allocated to a process? • Resident set size • Fixed, variable • Replacement scope • Global, local • Cleaning policy • The opposite of the fetch policy: determines when a modified page should be written to secondary memory • Demand, pre-cleaning • Load control • Determines the number of processes that will be resident in main memory • Degree of multiprogramming
Fetch Policy • When to bring a page into memory • Demand Paging – Load the page when a process tries to reference it • Tends to produce a large number of page faults when the process starts, then the page fault ratio settles down • Pre-paging – Bring in pages that are likely to be used in the near future • Try to take advantage of disk characteristics • Generally more efficient to load several consecutive sectors/pages than individual sectors due to seek, rotational latency • Hard to correctly guess which pages will be referenced • Easier to guess at program startup • May load unnecessary pages • Usefulness is not clear
Placement Policy • Determines where in the real memory a process piece is to reside • In pure segmentation systems, it is an important design issue; since most of the modern operating systems are not based on pure segmentation VM techniques, this discussion is not within our goal • In pure paging or segmentation combined with paging systems, placement is irrelevant, since the address translation hardware and main memory access hardware can perform their functions for any page-frame combinations with equal efficiency
Replacement Policy • Which page is to be replaced when a new one is brought into main memory? Several inter-related concepts are involved: • Resident set management • How many page frames are to be allocated to each active process? • Should the set of pages considered for replacement be limited to those of the process that caused the page fault, or include all the page frames in main memory? • Replacement Policy • Within the considered set of pages, which particular page should be selected for replacement? • The page removed should be the page least likely to be referenced in the near future • Most policies predict future behavior on the basis of past behavior
Replacement Policy • One restriction on replacement policy: some of the frames in main memory may be locked • Frame Locking • If a frame is locked, it may not be replaced • Typical usage for locked frames: • Kernel of the operating system • Key control structures • I/O buffers • Locking is achieved by associating a lock bit with each frame
Basic Replacement Algorithms • Replacement algorithms: • Optimal, Least recently used (LRU), First-in-first-out (FIFO), Clock • Optimal policy • Selects for replacement that page for which the time to the next reference is the longest • It can be shown that this algorithm results in the fewest number of page faults. • This algorithm is impossible to implement, since it is impossible to have perfect knowledge of future events
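Although the optimal policy cannot be used in a running system, it can be simulated offline when the whole reference string is known, which makes it a useful baseline. The sketch below is one such simulation; the function name and the reference string are assumptions chosen for illustration.

```python
def optimal_faults(refs, n_frames):
    """Count page faults for the optimal policy on a known reference string."""
    frames, faults = [], 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) < n_frames:
            frames.append(page)
            continue
        future = refs[i + 1:]
        # Evict the resident page whose next reference is farthest away
        # (a page never referenced again counts as farthest).
        victim = max(
            frames,
            key=lambda p: future.index(p) if p in future else len(future),
        )
        frames[frames.index(victim)] = page
    return faults


refs = [2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2]
print(optimal_faults(refs, 3))  # 6 (3 to fill the frames, then 3 replacements)
```

The look-ahead into `future` is exactly the knowledge a real operating system does not have.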
Basic Replacement Algorithms • Least Recently Used (LRU) • Replaces the page that has not been referenced for the longest time • By the principle of locality, this should be the page least likely to be referenced in the near future • Difficult to implement • Each page could be tagged with the time of last reference. This would require a great deal of overhead, since it has to be done at each page reference (both data and instructions) • Another approach would be to maintain a stack of page references, again an expensive prospect
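For study purposes LRU is easy to simulate, even though doing the same bookkeeping on every real memory reference would be too expensive. The sketch below keeps resident pages in an `OrderedDict` ordered from least to most recently used, which plays the role of the stack of page references mentioned above; the function name and reference string are illustrative assumptions.

```python
from collections import OrderedDict


def lru_faults(refs, n_frames):
    """Count page faults for LRU replacement on a reference string."""
    frames = OrderedDict()  # oldest reference first, newest last
    faults = 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)    # now the most recently used
            continue
        faults += 1
        if len(frames) == n_frames:
            frames.popitem(last=False)  # evict the least recently used
        frames[page] = True
    return faults


refs = [2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2]
print(lru_faults(refs, 3))  # 7
```

The reordering on every hit is the per-reference overhead that makes exact LRU hard to support in hardware.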
Basic Replacement Algorithms • First-in, first-out (FIFO) • Treats page frames allocated to a process as a circular buffer • Pages are removed in round-robin style • All that is required is a pointer that circles through the page frames of the process • Simplest replacement policy to implement • Page that has been in memory the longest is replaced • These pages may be needed again very soon
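A minimal simulation of FIFO, assuming a fixed number of frames per process and a pointer that circles through them as described above (the function name and reference string are illustrative):

```python
def fifo_faults(refs, n_frames):
    """Count page faults for FIFO replacement on a reference string."""
    frames = [None] * n_frames  # circular buffer of resident pages
    pointer = 0                 # next frame to be replaced
    faults = 0
    for page in refs:
        if page in frames:
            continue            # a hit does not change the replacement order
        faults += 1
        frames[pointer] = page
        pointer = (pointer + 1) % n_frames
    return faults


refs = [2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2]
print(fifo_faults(refs, 3))  # 9
```

In this string page 2 is evicted while still in heavy use, which illustrates the last point above: the oldest page is not necessarily the least useful one.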
Basic Replacement Algorithms • Clock Policy • Attempt to get performance of LRU with low overhead of FIFO • Include a use bit with each page • Think of pages as circular buffer • Keep a pointer to the buffer • Set the use bit when loaded or used • When we need to remove a page: • Use pointer to scan all available pages • Scan for page with use=0 • Set bit to 0 as we go by • Performance close to LRU
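The clock policy can be simulated with two parallel lists and a pointer. This is a sketch of the simple one-bit variant described above, with assumed names; real implementations often add a modify bit to the decision.

```python
def clock_faults(refs, n_frames):
    """Count page faults for the clock (second chance) policy."""
    pages = [None] * n_frames  # circular buffer of resident pages
    use = [0] * n_frames       # use bit per frame
    pointer = 0
    faults = 0
    for page in refs:
        if page in pages:
            use[pages.index(page)] = 1  # set the use bit when used
            continue
        faults += 1
        # Scan for a frame with use == 0, clearing bits as we go by.
        while use[pointer] == 1:
            use[pointer] = 0
            pointer = (pointer + 1) % n_frames
        pages[pointer] = page
        use[pointer] = 1                # set the use bit when loaded
        pointer = (pointer + 1) % n_frames
    return faults


refs = [2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2]
print(clock_faults(refs, 3))  # 8
```

On a hit only one bit is written, so the per-reference cost stays close to FIFO while recently used pages still get a second chance before eviction.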