Memory Management in Intel Pentium Processor

Chapter 8: Main Memory

Chapter 8: Memory Management • Background • Swapping • Contiguous Memory Allocation • Paging • Structure of the Page Table • Segmentation • Example: The Intel Pentium

Objectives • To provide a detailed description of various ways of organizing memory hardware • To discuss various memory-management techniques, including paging and segmentation • To provide a detailed description of the Intel Pentium, which supports both pure segmentation and segmentation with paging

Background • Program must be brought from disk into main memory and placed within a process for it to be run • Main memory and registers are only storage CPU can access directly • Registers internal to CPU, accessed in one CPU cycle • Main memory accessed through memory bus, can take many CPU cycles • Causes memory stalls • Cache sits between main memory and CPU registers • Faster access, compensates for speed difference between CPU and memory • Accurate memory management and hardware-enforced memory protection are essential to safe multitasking

Background • Each process has its own memory space • Defined by base registerand limitregister • Hardware-enforced protection: CPU checks every memory access the process makes to insure it’s in the process’ space • Generates interrupt on illegal memory access • Only kernel-level privilege can supersede this for: • Operations to load/change the base & limit register addresses • Accessing other process’s memory space

Address Binding • Memory starts at 0x0, but program can start from any base address • Program code uses symbolic addresses (like function and variable names) • Compiler binds them to relocatable addresses (fixed addresses relative to the start of the program’s memory space) • Linker or loader binds relocatable addresses to fixed physical addresses in memory

Address Binding • Compile time • If memory location known a priori, absolute codecan be generated • Must recompile code if starting location changes • Load time • Compiler generates relocatable code if memory location is not known at compile time • Can be loaded at any starting location without recompiling, but must be reloaded if starting location changes during execution • Execution time • Binding delayed until run time if the process can be moved during its execution from one memory segment to another • Approach used by most OS

Logical vs. Physical Address Space • We need to differentiate two different kinds of memory addresses • A memory address in a program, as seen by the CPU, is a logicaladdress, also called a virtual address • The corresponding address in physical memory is a physicaladdress • Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in execution-time address-binding scheme

Memory-Management Unit (MMU) • Run-time mapping from virtual to physical memory address done by special hardware: the MMU • Can see it as an interface hardware between CPU and memory • In reality, it’s part of the CPU (has been part of Intel CPU since 80286) • MMU adds the value in a relocation register (i.e. the base address) to every address generated by a user process at the time it is sent to memory • The user program only deals with logical addresses; it never sees the real physical addresses

Dynamic Loading • Routines are not loaded until it is called • Calling routine checks if called routine is in memory; load it if not • Until then, routine kept on disk in relocatable code format • Better memory-space utilization; unused routine is never loaded • Useful when large amounts of code are needed to handle infrequently occurring cases • Operating system provides dynamic loading library • User program’s responsibility to take advantage of dynamic loading

Dynamic Linking • Linking can be postponed until execution time • Dynamically linked libraries (*.DLL in Windows) • At compile time, a stub code is included in the binary instead of the routine; used to locate the appropriate library routine • At run time, stub replaces itself with the address of the routine in DLL, and executes the routine • Very advantageous • Only one copy of common routines • Saves disk/memory spaces, makes updates easier • Since routine can be loaded in another process’s protected memory space, kernel support needed to check if routine is in memory

Swapping • A process can be swapped temporarily out of memory to a backing store (typically, hard drive), and then brought back into memory for continued execution • Roll out, roll in: swapping variant used for priority-based scheduling algorithms; lower-priority process is swapped out so higher-priority process can be loaded and executed • Major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped • System maintains a special ready queue of runnable processes which have memory images on disk

Memory Protection • Several user processes loaded into memory at once • Registers used to protect user processes from each other, and from changing operating-system size due to transient code • Limit register contains range of logical addresses; each logical address must be less than the limit register or interrupt is generated • Relocation register contains value of base address in memory • MMU maps logical address to physical address dynamically

Contiguous Allocation • Multiple Partition Method • Memory divided in fixed-size partitions; each partition contains exactly one process • When a process terminates, its partition is freed and allocated to a new process • One of the earliest memory allocation techniques, used in IBM OS/360 (circa 1964) • Clearly limited • Limits multitasking to the number of partitions • Limits size of processes to size of partitions OS process 5 process 8 process 9 process 2

Contiguous Allocation • Variable Partition Scheme • OS keeps track of which parts of memory are occupied/free • Initially memory is empty; as processes are added and then removed, holes of available memory of various sizes form • When a process is added to memory, it is allocated memory from a hole large enough to accommodate it • If there isn’t one, OS can wait (reduce multitasking) or allocate a smaller process (risk starvation) • This is dynamic allocation OS OS OS OS process 5 process 5 process 5 process 5 process 9 process 9 process 8 process 10 process 2 process 2 process 2 process 2

Contiguous Allocation • Dynamic allocation problem: Given several possible holes of different sizes, how to pick the one to allocate process into? • First-fit: Allocate the first hole that is big enough • Fastest method • Best-fit: Allocate the smallest hole that is big enough • Must search entire list (unless ordered by size) • Produces the smallest leftover hole (least memory wasted) • Worst-fit: Allocate the largest hole • Must search entire list (unless ordered by size) • Produces the largest leftover hole (which can then be allocated to another process) • Good idea, but typically the worst of the three methods both in speed and memory use

Fragmentation • External Fragmentation • Total memory space exists to satisfy a request, but it is not contiguous • On average, a third of memory lost to fragmentation • Internal Fragmentation • Typically, memory is divided in blocks and allocated by block, rather than byte • Allocated blocks may be larger than requested memory (i.e. last block not entirely needed); the extra memory in the block is lost • Unused memory internal to a partition • Reduce external fragmentation by compaction • Shuffle memory contents to place all free memory together in one large hole • Done at execution time; only possible if dynamic relocation is allowed • Can be a very expensive operation OS process 5 process 9 process 10 process 2

Paging • Non-contiguous allocation eliminates problems of external fragmentation and compaction; also makes swapping easier • Paging • Supported by hardware • Physical memory divided into fixed-sized blocks called frames (size is power of 2, between 512 bytes and 16 MB) • Logical memory divided into blocks of same size called pages • OS keeps track of all free frames • To run a program of size n pages, need to find n free frames and load program • Page table used to translate logical to physical addresses • Internal fragmentation is still a problem though

Paging

Paging Address Translation • Address generated by CPU is divided into: • Page number (p) used as an index into the page table which contains base address of each frame (f) in physical memory • Page offset (d) combined with base address to define the physical memory address that is sent to the memory unit • For given logical address space 2m divided in page size2n, a logical address translates to: page number page offset p d m - n n

Paging Address Translation 16-byte of logical memory and 4-byte pages/frames: m = 4 and n = 2 page number page offset 10 01 1001 = 9 (contains j) Page number 10 (2, directs to frame 1, which begins at 1x4 = 4 bytes in physical memory) Offset 01 (1, means +1byte) Total: 1x4 +1 = 5 bytes in physical memory (contains j)

Paging Hardware

Paging • No external fragmentation • Any free frame can be allocated • Internal fragmentation capped at almost 1 frame per process • Worst case: process needs n frames + 1 byte, meaning OS allocates n+1 frames and wastes almost entire last one • So we’d like frames as small as possible • Overhead for a page table for each process, which is in main memory • A page table entry is 4 bytes (32 bits) large • Size of page table proportional to the number of pages it contains • So we’d like pages as large as possible • Typically, pages 4KB in size • Can be larger – Solaris pages are 8Kb to 4MB in size

Frames • Process sees its memory space as a contiguous set of pages • In reality, OS loads it page by page in whichever frames are available • OS needs to maintain a frame table • How many frames there are • Which frames are allocated • Which frames are free

Frames After allocation Before allocation

Implementation of Page Table • Hardware implementation • Page table stored in CPU in dedicated registers • Fast address translation • Modified and updated only by the kernel’s dispatcher • Changing page table requires changing all registers • Number of registers limits number of page table entries; typically less than 256 • Not practical for modern computers, with millions of page table entries • Software implementation • Page table stored in main memory • CPU only stores the page table base register (PTBR) • Only one register to modify to change the page table • Every data/instruction access requires two memory accesses, one for the page table and one for the data/instruction • Slower translation

Translation Look-aside Buffer • The two-memory-access problem can be solved by the use of a special fast-lookup hardware cache called translation look-aside buffer(TLB) • TLB entry: <page number, frame number> • Special hardware search all keys simultaneously • Very fast • Very expensive, so limited to 64 – 1024 entries • If page is found (TLB hit), this takes less than 10% longer than direct memory access! • If not found (TLB miss) • Look for it in the page table • Add it to the TLB, clearing another entry if TLB is full • Some TLB include an address space identifier (ASID) with each entry, identifying the process to which it belongs • TLB can safely store pages from multiple processes • Without this info, TLB needs to be flushed each time a new page table is loaded

Translation Look-aside Buffer

Translation Look-aside Buffer • The Effective Access Time(EAT) is the time to access an address in physical memory • For example • It takes 100 ns to access data in physical memory once the address is known • It takes 20 ns to look up an address in the TLB • 90% TLB hit rate • In the 10% TLB misses, we then need to access the page table in memory (an extra 100ns) • EAT for accessing memory directly (as in Real Mode) = 100ns • EAT for paging = 0.9 * (20 + 100) + 0.1 * (20 + 100 + 100) = 130ns

Memory Protection • Memory protection implemented by associating protection bits with each frame • Example: read / write / execute bits • Illegal operations caught by CPU and cause interrupt • Valid-invalid bit attached to each entry in the page table • “valid” indicates that the associated page is in the process’ logical address space, and is thus a legal page • “invalid” indicates that the page is not in the process’ logical address space

Valid or Invalid Bit Internal fragmentation page table length register

Shared Pages • Pages give us a new option for shared memory: shared pages • If reentrant code, execute-only code that cannot be modified, is used by several processes, we can put this code in shared pages that are part of all processes • Code is not modifiable, so no race condition / critical section problem • Shared code must appear in same location in the logical address space of all processes

Structure of the Page Table • Hierarchical Paging • Tables of page tables • Hashed Page Tables • Page table sorted by hash index • Inverted Page Tables • Table entries represent frames, maps to pages

Hierarchical Page Tables • Given a modern computer with large (32 bit) logical address space and small (4KB) pages, each process may have a million pages • Each page is represented by a 4 byte entry in the page table, giving a 4MB page table per process • Would be a lot to allocate in a contiguous block in main memory • Break up the logical address space into multiple page tables • A simple technique is a two-level page table • Outer page table point to (inner) page table • Page table points to page, as before

Two-Level Page-Table Scheme

Two-Level Paging Translation • A logical address (on 32-bit machine with 4K page size) is divided into: • a page number consisting of 20 bits • a page offset (d) consisting of 12 bits • Since the page table is paged, the page number is further divided into: • a 10-bit page number (p1) • a 10-bit page offset (p2) • Thus, a forward-mappedlogical address translates as follows:

Three-Level Paging Scheme • Given a 64-bit processor and logical space, two-level paging may be insufficient • To keep inner page table size constant, outer page table grows considerably • Alternative is three-level paging • Second outer page table maps to outer page table, which maps to inner page table, which maps to memory • Second outer page still big, we could split it again • And again, and again… • But for each split, we need one more memory access to translate the address, which slows down the system

Hashed Page Tables • Hash function converts large & complex data into “unique” & uniformly-distributed integer values • Unique in theory, but in practice hash collisions occur where different data to map to the same hash value • Hashed page table • Apply hash function to page number • Entry in hashed page table is a linked list of <page#, frame#> entries, to account for collisions • Find the entry that matches the page number, and retrieve the frame number

Inverted Page Table • Page table has one entry for each page in logical memory, consisting the address of the frame storing it • Inverted page table has one entry for each frame in physical memory • Each entry consists of the address of the page stored in that memory location • With information about the process that owns that page • No need for one page table per process – only one table for the system • Use hash function to improve search

Segmentation • Segmentationis a memory-management scheme similar to a user view of memory • Memory is separated in segments such as “the code”, “the data”, “the subroutine library”, “the stack” • No real order or relation between them • Memory access is done by specifying two information, segment number and offset

1 4 2 3 Segmentation 1 2 3 4 physical memory user view

Segmentation Architecture • Logical address consists of <segment-number, offset> • Segment tableit to a physical addresses • Segment base: starting physical address • Segment limit: length of the segment

Segmentation Architecture

Example: The Intel Pentium • Part of the Intel IA-32 architecture • Pentium combines segmentation and paging • CPU generates logical address • Given to segmentation unit (GDT), which produces linear addresses • Given to paging unit, which generates physical address in main memory • Segmentation unit + paging unit  MMU

Pentium Segmentation • Logical address space of a process divided in two partitions • Private segments to the process; information in local descriptor table • Shared segments between all processes; information stored in globaldescriptor table • Logical address is <selector, offset> • Selector is <segment, global/local, privilege level> • Converted to linear address • CPU • Six segment registers (CS, DS, ES, SS, FS, GS) • Six 8-byte cache registers to hold the corresponding descriptors; no need to fetch them from memory each time

Pentium Paging • Two-level paging scheme • 32-bit linear address • Page directory(outer page table) entry pointed by bits 22-31, points to an inner page table • Inner page table entry pointed by bits 12-21, points to 4KB page • Offset inside page pointed by bits 0-11 • Gives the physical address • CPU • Page directory of current process pointed to by Page Directory Base Register, contained in CR3 register linear address physical address

Example: Linux on Pentium • Only uses six segments • Kernel code; kernel data; user code; user data; task state segment (TSS); default LDT segment • All processes share user code/data segments, but have their own TSS and may create a custom LDT segment • Linux only recognises two privilege levels, kernel (00) and user (11)

Example: Linux on Pentium • Three-level paging scheme • Logical address is <global directory, middle directory, page table, offset> • But Pentium only has a two-level paging scheme! • On Pentium, Linux middle directory is 0 bits (i.e. non-existent)

Review • What is the difference between external and internal fragmentation? • What is the difference between logical and physical memory? • Compare the memory allocation schemes of contiguous variable-size memory allocation, contiguous segmentation, and paging, in terms of external fragmentation, internal fragmentation, and memory sharing.

Exercises • Read everything • If you have the “with Java” textbook, skip the Java sections and subtract 1 to the following section numbers • 8.1 • 8.2 • 8.3 • 8.5 • 8.6 • 8.9 • 8.11 • 8.13 • 8.14 • 8.15 • 8.17 • 8.23 • 8.25 • 8.26 • 8.27 • 8.28 • 8.29 • 8.32

Memory Management in Intel Pentium Processor

Memory Management in Intel Pentium Processor

Presentation Transcript

Chapter 5 Memory

Chapter 5. The Memory System

Chapter 3.1 : Memory Management

Memory Management

Chapter 5 Internal Memory

Chapter 5 Internal Memory

Main Memory

Chapter 7: Memory

The Memory Hierarchy Cache, Main Memory, and Virtual Memory (Part 2)

Memory

Lecture 17 Chapter 8: Main Memory (cont) Chapter 9: Virtual Memory

Chapter 5A: Exploiting the Memory Hierarchy, Part 1

Virtual Memory

Memory

Chapter 8: Main Memory

Lecture 15 Main Memory

Chapter 6: Memory

Chapter 8: Main Memory

Net 321 : Computer Operating System

Chapter 4 Internal Memory

Virtual Memory

Chapter 8: Main Memory