
Memory Management, Background and Hardware Support


Presentation Transcript


  1. Memory Management, Background and Hardware Support
  Fred Kuhns (fredk@arl.wustl.edu, http://www.arl.wustl.edu/~fredk)
  Applied Research Laboratory, Department of Computer Science and Engineering, Washington University in St. Louis

  2. Recall the Von Neumann Architecture
  • Central Processing Unit (CPU): Arithmetic-Logical Unit (ALU) and Control Unit
  • Primary Memory
  • Device Controllers, each managing a Device
  (Figure: block diagram of the Von Neumann architecture.)
  CSE522– Advanced Operating Systems

  3. Primary Memory
  • Primary Memory Design Requirements
    • Minimize access time: a hardware and software requirement
    • Maximize available memory: using physical and virtual memory techniques
    • Cost-effective: memory is limited to a small percentage of total system cost
  • Memory Manager Functions
    • Allocate memory to processes
    • Map the process address space to the allocated memory
    • Minimize access times while limiting memory requirements

  4. Process Address Space
  • The compiler produces relocatable object modules.
  • The linker combines modules into an absolute (loadable) module; addresses are relative, typically starting at 0.
  • The loader loads the program into memory and adjusts addresses to produce an executable module.

  5. UNIX Process Address Space
  From the low address (0x00000000) to the high address (0x7fffffff):
  • Text (shared)
  • Initialized Data
  • Uninitialized Data
  • Heap (dynamic)
  • Stack (dynamic)
  • Environment

  6. Big Picture
  (Figure: kernel memory holds a proc struct and a kernel stack/u area for each process; each process image contains a stack, a data segment, and a shared text segment.)

  7. Memory Management
  • A central component of any operating system.
  • Memory partitioning schemes: fixed, dynamic, paging, segmentation, or a combination.
  • Relocation.
  • Hierarchical layering to optimize performance and cost:
    • registers
    • cache
    • primary (main) memory
    • secondary memory (backing store, local disk)
    • file servers (networked storage)
  • Policies target the expected memory requirements of processes; consider short-, medium- and long-term resource requirements:
    • long term: admission of new processes (overall system requirements)
    • medium term: memory allocation (per-process requirements)
    • short term: processor scheduling (immediate needs)
  • Common goal: optimize the number of runnable processes resident in memory.

  8. Fixed Partitioning
  • Partition memory into regions with fixed boundaries.
  • Equal-size partitions:
    • program size <= partition size: the program fits in any partition
    • program size > partition size: must use overlays
    • use swapping when no partition is available
  • Unequal-size partitions.
  • Main memory use is inefficient: fixed partitioning suffers from internal fragmentation.
  (Figure: after an 8 M operating system region, memory divided into equal-sized 8 M partitions versus unequal-sized partitions from 2 M to 12 M; a process smaller than its partition leaves the remainder unused.)

  9. Placement Algorithm with Partitions
  • Equal-size partitions:
    • any partition may be used, since all are equal in size
    • balance partition size with expected allocation needs
  • Unequal-size partitions:
    • assign each process to the smallest partition within which it will fit, using per-partition process queues
    • processes are assigned in such a way as to minimize wasted memory within a partition (internal fragmentation)
  (Figure: new processes either wait in per-partition queues for the best-fit partition, or wait in a single queue and select the smallest available partition.)

  10. Variable Partitioning
  • Partitions are of variable length and number.
  • A process is allocated exactly as much memory as it requires.
  • External fragmentation: small holes accumulate in memory between allocated partitions.
  • Must use compaction to shift processes so they are contiguous and all free memory is in one block.

  11. Example: Dynamic Partitioning
  • Add processes 1 (320K), 2 (224K), and 3 (288K) into the 896K left after a 128K OS; 64K remains free.
  • Add process 4 (128K): swap out process 2 (224K) to make room; placing process 4 in its hole leaves 96K free.
  • Remove process 1 (320K), then swap process 2 back in: relocate process 2 into the memory freed by process 1.
  (Figure: memory snapshots after each step, showing the small holes that accumulate.)

  12. Variable Partition Placement Algorithms
  • Best-fit: generally the worst performer overall
    • place the process in the smallest unused block, to minimize unused fragment sizes
  • Worst-fit:
    • place the process in the largest unused block, to maximize unused fragment sizes
  • First-fit: simple and fast
    • scan from the beginning of memory and choose the first block that is large enough
    • may leave many processes loaded at the front end of memory that must be scanned on every allocation
  • Next-fit: tends to perform worse than first-fit
    • scan memory from the location of the last allocation and select the next available block large enough to hold the process
    • tends to allocate at the end of memory, where the largest block is found
  • Use compaction to combine unused blocks into larger contiguous blocks.
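The first-fit scan above can be sketched in C as a search over a free list; this is a minimal illustration, and the `free_block` structure and names are assumptions, not from any particular allocator:

```c
/* Sketch of a first-fit search over a singly linked free list.
 * Structure and names are illustrative assumptions. */
#include <stddef.h>

struct free_block {
    size_t size;              /* bytes available in this free block */
    struct free_block *next;  /* next free block, in address order */
};

/* Return the first block large enough for `request`, or NULL.
 * A real allocator would then split the block and fix up the list. */
struct free_block *first_fit(struct free_block *head, size_t request)
{
    for (struct free_block *b = head; b != NULL; b = b->next)
        if (b->size >= request)
            return b;         /* first hole that fits wins */
    return NULL;              /* no hole large enough */
}
```

Best-fit and worst-fit would instead scan the whole list, tracking the smallest (or largest) block that still fits; next-fit would start the scan from where the previous allocation left off.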

  13. Variable Partition Placement Algorithms: Example
  (Figure: allocating a 16K block. First-fit takes the first free block large enough, 22K, leaving a 6K fragment; best-fit takes an 18K free block, leaving a 2K fragment; next-fit, scanning from the last allocated 14K block, takes a 36K free block, leaving a 20K fragment.)

  14. Addresses
  • Logical address: a reference to a memory location, independent of the current assignment of data to memory.
  • Relative address (a type of logical address): an address expressed as a location relative to some known point.
  • Physical address: the absolute address, the actual location in memory.

  15. Relocation
  • Fixed partitions: absolute memory locations are assigned when the program is loaded.
  • A process may occupy different partitions over time:
    • swapping and compaction cause a program to occupy different partitions, and therefore different absolute memory locations
  • Dynamic address relocation: relative addresses are used, with hardware support.
  • Special-purpose registers are set when the process is loaded; addresses are relocated at run time:
    • Base register: the starting address of the process; the relative address is added to the base register to produce an absolute address
    • Bounds register: the ending location of the process; the absolute address is compared to the bounds register, and if it is not within bounds an interrupt is generated
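The base/bounds scheme above can be sketched in C; the register structure and fault signaling are illustrative assumptions (real hardware performs the add and compare on every reference):

```c
/* Sketch of dynamic address relocation with base and bounds registers.
 * The boolean return stands in for the interrupt the hardware raises. */
#include <stdint.h>
#include <stdbool.h>

struct reloc_regs {
    uint32_t base;    /* starting physical address of the process */
    uint32_t bounds;  /* last valid physical address of the process */
};

/* Translate a relative address; on success store the absolute address
 * and return true, otherwise report a bounds violation. */
bool relocate(const struct reloc_regs *r, uint32_t rel, uint32_t *abs)
{
    uint32_t a = r->base + rel;     /* adder: base + relative address */
    if (a > r->bounds)              /* comparator: outside the process? */
        return false;               /* would interrupt to the OS */
    *abs = a;
    return true;
}
```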

  16. Hardware Support for Relocation
  (Figure: the base and bounds registers are loaded from the process control block; an adder combines the relative address with the base register to form the absolute address, and a comparator checks it against the bounds register, interrupting to the operating system on a violation. The process image, consisting of program, data, and stack, resides in main memory.)

  17. Techniques
  • Paging:
    • Partition memory into small equal-size chunks, called frames.
    • Divide each process into chunks of the same size, called pages.
    • The operating system maintains a page table for each process, containing the frame location for each process page.
    • A memory address is a (page number, offset) pair.
  • Segmentation:
    • Segments of different programs do not all have to be the same length, but there is a maximum segment length.
    • An address consists of two parts: a segment number and an offset.
    • Since segments are unequal in size, segmentation is similar to dynamic partitioning.
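The (page number, offset) split is just a bit-field extraction. A minimal sketch in C, assuming an illustrative 4 KB page size (the 12-bit offset matches the examples later in the deck):

```c
/* Sketch: decompose a virtual address into (page number, offset).
 * The 4 KB page size is an illustrative assumption. */
#include <stdint.h>

#define PAGE_SHIFT 12                       /* 4 KB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1) /* low 12 bits */

static inline uint32_t page_number(uint32_t va) { return va >> PAGE_SHIFT; }
static inline uint32_t page_offset(uint32_t va) { return va & PAGE_MASK; }
```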

  18. Memory Management: Requirements
  • Relocation
    • The program's memory location is determined at load time, and the program may be moved to a different location at run time (relocated).
    • Consequence: memory references must be translated to actual physical memory addresses.
  • Protection
    • Protect against inter-process interference (transparent isolation).
    • Consequence: addresses must be checked at run time when relocation is supported.
  • Sharing
    • Controlled sharing between processes; access restrictions may depend on the type of access.
    • Permit sharing of read-only program text for efficiency reasons.
    • Require an explicit concurrency protocol for processes to share program data segments.

  19. Memory Hierarchy
  Executable memory:
  • CPU registers: ~500 bytes, 1 clock cycle
  • Cache memory: < 10MB, 1-2 clock cycles
  • Primary memory: < 1GB, 1-4 clock cycles
  Secondary storage:
  • Rotating magnetic memory: < 100GB (per device), 5-50 usec
  • Optical memory: < 15GB (per device), 25 usec - 1 sec
  • Sequentially accessed memory: < 5GB (per tape), seconds

  20. Principle of Locality
  • Programs tend to cluster memory references, for both data and instructions; further, this clustering changes slowly with time.
  • Hardware and software exploit the principle of locality.
  • Temporal locality: if a location is referenced once, it is likely to be referenced again in the "near" future.
  • Spatial locality: if a memory location is referenced, then other "nearby" locations will be referenced.
  • Stride-k (data) reference patterns:
    • visit every kth element of a contiguous vector
    • stride-1 reference patterns are very common:
  for (i = 1, Array[0] = 0; i < N; ++i)
      Array[i] = calc_element(Array[i-1]);
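A small C sketch contrasting stride-1 with stride-N access over the same row-major 2-D array; both functions compute the same sum, but the first visits memory sequentially while the second jumps N elements per access (the functions and the array size are illustrative):

```c
/* Stride-1 versus stride-N reference patterns over a row-major array. */
#include <stddef.h>
#define N 64

long sum_rowwise(int a[N][N])   /* stride-1: good spatial locality */
{
    long s = 0;
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j)
            s += a[i][j];       /* consecutive addresses */
    return s;
}

long sum_colwise(int a[N][N])   /* stride-N: poor spatial locality */
{
    long s = 0;
    for (size_t j = 0; j < N; ++j)
        for (size_t i = 0; i < N; ++i)
            s += a[i][j];       /* jumps N ints per access */
    return s;
}
```

On real hardware the row-wise traversal is typically faster for large arrays because each cache line fetched is fully used before eviction.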

  21. Caching: A Possible Scenario
  • A copy of a web page is moved from the web server to a file on the client (cached).
  • Part of the file is copied into primary memory so a program can process the data (cached).
  • A cache "line" is copied into the CPU cache for the program to use.
  • Individual words are copied into CPU registers as they are manipulated by the program.
  (Figure: page.html and image.jpg flowing from the web server's disk through the client host's DRAM and cache to the CPU.)

  22. Hardware Requirements
  • Protection: prevent a process from changing its own memory maps.
  • Residency: the CPU distinguishes between resident and non-resident pages.
  • Loading: load pages and restart interrupted program instructions.
  • Dirty: determine whether pages have been modified.

  23. Memory Management Unit
  • Translates virtual addresses using:
    • page tables
    • the Translation Lookaside Buffer (TLB)
  • Page tables:
    • one for kernel addresses
    • one or more for user-space processes
  • Page Table Entry (PTE), one per virtual page:
    • 32 bits: page frame, protection, valid, modified, referenced

  24. Caching Terminology
  • Cache hit: the requested data is found in the cache.
  • Cache miss: the data is not found in cache memory.
    • cold miss: the cache is empty
    • conflict miss: the cache line is occupied by a different memory location
    • capacity miss: the working set is larger than the cache
  • Placement policy: where a new block (i.e. cache line) may be placed:
    • direct mapped: one-to-one mapping between memory locations and cache lines
    • fully associative: any line in memory can be cached in any cache line
    • N-way set associative: a line in memory can be stored in any of the N lines associated with its mapped set
  • Replacement policy: controls which block is selected for eviction.

  25. Cache/Primary Memory Structure
  A memory address is divided into three fields: t tag bits, s set bits, and b block-offset bits. The s-bits select the set number, while the t-bits (tag) uniquely identify the memory location.
  • m = address bits = t + s + b; maximum memory address M = 2^m
  • S = 2^s sets in the cache (Set 0 ... Set S-1)
  • E = lines per set
  • B = 2^b data bytes per line
  • t = m - (s + b) tag bits per line
  • V = valid bit, 1 per line; a dirty bit may also be required
  • C = cache size = B·E·S = 2^(s+b)·E
  (Figure: each set holds E lines, each line consisting of a valid bit, a tag, and a data block.)
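The t/s/b decomposition above can be written directly in C. A minimal sketch, parameterized by the set and offset widths (the particular geometry used in the test is an illustrative assumption):

```c
/* Decompose an address into tag, set index, and block offset
 * for a cache with 2^s_bits sets and 2^b_bits bytes per line. */
#include <stdint.h>

struct cache_geom {
    unsigned s_bits;  /* set-index bits: S = 2^s sets */
    unsigned b_bits;  /* block-offset bits: B = 2^b bytes per line */
};

uint32_t cache_offset(struct cache_geom g, uint32_t addr)
{
    return addr & ((1u << g.b_bits) - 1);            /* low b bits */
}
uint32_t cache_set(struct cache_geom g, uint32_t addr)
{
    return (addr >> g.b_bits) & ((1u << g.s_bits) - 1);  /* middle s bits */
}
uint32_t cache_tag(struct cache_geom g, uint32_t addr)
{
    return addr >> (g.b_bits + g.s_bits);            /* remaining t bits */
}
```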

  26. Cache Design
  • Write policy:
    • on a hit: write-through versus write-back
    • on a miss: write-allocate versus no-write-allocate
  • Replacement algorithm: determines which block to replace (e.g. LRU).
  • Block size: the data unit exchanged between cache and main memory.

  27. Translation
  • A virtual address consists of a virtual page number plus an offset.
  • The MMU finds the PTE for the virtual page, extracts the physical page number, and adds the offset.
  • On failure, the MMU raises an exception (page fault):
    • bounds error: the address is outside the address range
    • validation error: the page is not resident
    • protection error: the access is not permitted
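The lookup and the three failure cases above can be sketched as a single-level table walk in C; the PTE layout, field names, and 4 KB page size are illustrative assumptions, not a real MMU's format:

```c
/* Sketch of single-level address translation with fault reporting. */
#include <stdint.h>

#define PAGE_SHIFT 12          /* 4 KB pages (assumed) */
#define PTE_VALID  0x1u        /* resident/valid bit (assumed layout) */

enum fault { OK, BOUNDS_ERROR, VALIDATION_ERROR };

struct addr_space {
    const uint32_t *page_table; /* pte = (frame << PAGE_SHIFT) | flags */
    uint32_t num_pages;         /* pages in the valid address range */
};

enum fault translate(const struct addr_space *as, uint32_t va, uint32_t *pa)
{
    uint32_t vpn = va >> PAGE_SHIFT;
    if (vpn >= as->num_pages)
        return BOUNDS_ERROR;                /* outside address range */
    uint32_t pte = as->page_table[vpn];
    if (!(pte & PTE_VALID))
        return VALIDATION_ERROR;            /* non-resident page */
    uint32_t mask = (1u << PAGE_SHIFT) - 1;
    *pa = (pte & ~mask) | (va & mask);      /* frame bits + offset */
    return OK;
}
```

A protection check would be a third test against permission bits in the PTE, omitted here for brevity.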

  28. Some Details
  • Limiting page table size:
    • segments
    • page the page table (multi-level page tables)
  • The MMU has registers which point to the current page table(s); the kernel and the MMU can modify the page tables and these registers.
  • Problem: page tables may require multiple additional memory accesses per instruction.
  • Solution:
    • rely on hardware caching (a virtually addressed cache)
    • cache the translations themselves in the TLB

  29. Translation Lookaside Buffer
  • An associative cache of address translations.
    • Entries may contain a tag identifying the process as well as the virtual address. Why is this important?
  • The MMU typically manages the TLB.
    • The kernel may need to invalidate entries. Would the kernel ever need to invalidate entries?
  • Contains the page table entries that have been most recently used.
  • Functions the same way as a memory cache:
    • Given a virtual address, the processor examines the TLB.
    • If the entry is present (a hit), the frame number is retrieved and the real address is formed.
    • If not found (a miss), the page number is used to index the process page table.
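A fully associative TLB lookup with a per-entry process tag (an address-space id) can be sketched as follows; the entry layout, the ASID field, and the TLB size are illustrative assumptions:

```c
/* Sketch of a fully associative TLB lookup with a process tag. */
#include <stdint.h>
#include <stdbool.h>

struct tlb_entry {
    bool     valid;
    uint16_t asid;   /* tag identifying the process (assumed) */
    uint32_t vpn;    /* virtual page number */
    uint32_t pfn;    /* physical frame number */
};

#define TLB_SIZE 8

/* Return true on a hit and store the frame number; a miss means the
 * page tables must be walked and an entry filled in (and possibly
 * another evicted). */
bool tlb_lookup(const struct tlb_entry tlb[TLB_SIZE],
                uint16_t asid, uint32_t vpn, uint32_t *pfn)
{
    for (int i = 0; i < TLB_SIZE; ++i)
        if (tlb[i].valid && tlb[i].asid == asid && tlb[i].vpn == vpn) {
            *pfn = tlb[i].pfn;
            return true;
        }
    return false;
}
```

The ASID tag is why the same virtual page number can hit different frames for different processes without flushing the whole TLB on a context switch.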

  30. Address Translation - General
  (Figure: the CPU issues a virtual address to the cache and the MMU; the MMU produces a physical address, and data is returned from global memory.)

  31. Address Translation Overview
  (Figure: the CPU's virtual address is translated by the MMU, which consults the TLB and, on a miss, the page tables located via the context table pointer and the current context; the resulting physical address goes to the cache.)

  32. Page Table Entry
  • The resident bit indicates whether the page is in memory.
  • The modify bit indicates whether the page has been altered since it was loaded into main memory.
  • Other control bits.
  • The frame number is the physical frame address.
  (Figure: the virtual address is split into a virtual page number and an offset in page; the PTE holds the M and R bits, other control bits, and the frame number.)

  33. Example: 1-Level Address Translation
  (Figure: a virtual address is split into a 20-bit virtual page number and a 12-bit offset in page. The current page table register locates the process page table; the virtual page number indexes it to fetch a PTE containing control bits, R, M, and the frame number. The frame number selects a DRAM frame, and the offset is added to reach the target location.)

  34. SuperSPARC Reference MMU
  • The Context Table Pointer register and the 12-bit Context register (4096 contexts) select an entry in the context table, which points to the level-1 page table.
  • The virtual address is divided into index 1 (8 bits, 256 entries), index 2 (6 bits, 64 entries), index 3 (6 bits, 64 entries), and a 12-bit page offset.
  • Each lookup step yields a page table descriptor (PTD) pointing to the next-level table, until a PTE is reached.
  • The virtual page number thus has 20 bits, for 1M pages.
  • The physical frame number has 24 bits with a 12-bit offset, permitting 16M frames.
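The 8/6/6/12 split above is again just bit-field extraction. A sketch of how the three table indices and the offset would be pulled out of a 32-bit virtual address (function names are illustrative):

```c
/* Sketch: split a 32-bit virtual address into the three table indices
 * and the page offset of the SuperSPARC Reference MMU (8 + 6 + 6 + 12). */
#include <stdint.h>

uint32_t srmmu_index1(uint32_t va) { return (va >> 24) & 0xFF; }  /* 8 bits  */
uint32_t srmmu_index2(uint32_t va) { return (va >> 18) & 0x3F; }  /* 6 bits  */
uint32_t srmmu_index3(uint32_t va) { return (va >> 12) & 0x3F; }  /* 6 bits  */
uint32_t srmmu_offset(uint32_t va) { return va & 0xFFF; }         /* 12 bits */
```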

  35. Page Table Descriptor/Entry
  • Page Table Descriptor (PTD): a page table pointer plus a type field (bits 1-0).
  • Page Table Entry (PTE): the physical page number plus C (bit 7), M (bit 6), R (bit 5), ACC (bits 4-2), and type (bits 1-0).
  • Type = PTD, PTE, or Invalid
  • C = Cacheable, M = Modify, R = Reference, ACC = Access permissions

  36. Page Size
  • Smaller page sizes:
    • reduce internal fragmentation
    • if a program uses relatively small segments of memory, small pages reflect this behavior
  • Larger page sizes:
    • secondary memory is designed to efficiently transfer large blocks of data
  • A smaller page size means more pages per process and therefore larger page tables; larger page tables mean a large portion of the page tables resides in virtual memory, and each TLB entry covers less memory (a smaller TLB footprint).
  • Multiple page sizes provide the flexibility needed to use the TLB effectively:
    • large pages can be used for program instructions, or for kernel memory, decreasing its footprint in the page tables
  • Most operating systems support only one page size.
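The trade-off above is easy to see with a back-of-the-envelope calculation; this sketch assumes 4-byte PTEs purely for illustration:

```c
/* Back-of-the-envelope page count and page-table size, assuming
 * a 4-byte PTE (an illustrative assumption). */
#include <stdint.h>

uint64_t pages_needed(uint64_t proc_bytes, uint64_t page_bytes)
{
    return (proc_bytes + page_bytes - 1) / page_bytes;  /* round up */
}

uint64_t page_table_bytes(uint64_t proc_bytes, uint64_t page_bytes)
{
    return pages_needed(proc_bytes, page_bytes) * 4;    /* 4-byte PTEs */
}
```

For a 1 MB process, 4 KB pages need 256 PTEs (a 1 KB table) while 64 KB pages need only 16, illustrating why smaller pages inflate the page tables.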

  37. Segmentation
  • Segments may be of unequal, dynamic size.
  • Simplifies the handling of growing data structures.
  • Allows programs to be altered and recompiled independently.
  • Lends itself to sharing data among processes, and to protection.
  • Segment tables:
    • each entry contains the starting address of the corresponding segment in main memory, and the length of the segment
    • a bit is needed to determine whether the segment is already in main memory
    • another bit is needed to determine whether the segment has been modified since it was loaded into main memory

  38. Segment Table Entries
  (Figure: the virtual address is split into a segment number and an offset; a segment table entry holds control bits, the segment length, and the segment base address.)
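Segment translation differs from base/bounds relocation only in that the table is indexed by segment number and the offset is checked against the entry's length. A minimal sketch, with assumed structure and names:

```c
/* Sketch of segment-table translation with a length check. */
#include <stdint.h>
#include <stdbool.h>

struct seg_entry {
    uint32_t base;    /* segment's starting address in main memory */
    uint32_t length;  /* segment length in bytes */
};

/* Translate (segment, offset); false stands in for the trap raised
 * when the segment number or offset is out of range. */
bool seg_translate(const struct seg_entry *table, uint32_t nsegs,
                   uint32_t seg, uint32_t off, uint32_t *pa)
{
    if (seg >= nsegs || off >= table[seg].length)
        return false;              /* would trap to the OS */
    *pa = table[seg].base + off;
    return true;
}
```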

  39. Combined Paging and Segmentation
  • Paging is transparent to the programmer and eliminates external fragmentation.
  • Segmentation is visible to the programmer and allows for growing data structures, modularity, and support for sharing and protection.
  • Each segment is broken into fixed-size pages.
