
Competing For Memory






Presentation Transcript


  1. Competing For Memory Vivek Pai / Kai Li Princeton University

  2. Mechanics • Feedback optionally anonymous • No real retribution anyway • Do it to make me happy • Quiz 1 Question 2 answer(s) • #regs != #bits • Registers at top of memory hierarchy • Lots of acceptable answers • Last quiz: feedback still being digested

  3. The Big Picture • We’ve talked about single evictions • Most computers are multiprogrammed • Single eviction decision still needed • New concern – allocating resources • How to be “fair enough” and achieve good overall throughput • This is a competitive world – local and global resource allocation decisions

  4. Lessons From Enhanced FIFO • Observations • it’s easier to evict a clean page than a dirty page • sometimes the disk and CPU are idle • Optimization: when system’s “free”, write dirty pages back to disk, but don’t evict • Called flushing – often falls to pager daemon

  5. x86 Page Table Entry [Figure: 32-bit x86 PTE layout. Bits 31–12: page frame number; below that, reserved bits and the flags Gl (global), L (PDE maps 4MB), D (dirty), A (accessed/referenced), Cd (cache disabled), Wt (write-through), O (owner: user/kernel), W (writable), V (valid).]

  6. Program Behaviors • 80/20 rule • > 80% memory references are made by < 20% of code • Locality • Spatial and temporal • Working set • Keeping the working set of pages in memory avoids most page faults [Figure: # page faults vs. # pages in memory; the fault rate drops sharply once the working set fits]

  7. Observations re Working Set • Working set isn’t static • There often isn’t a single “working set” • Multiple plateaus in previous curve • Program coding style affects working set • Working set is hard to gauge • What’s the working set of an interactive program?

  8. Working Set • Main idea • Keep the working set in memory • An algorithm • On a page fault, scan through all pages of the process • If the reference bit is 1, record the current time for the page • If the reference bit is 0, check the “last use time” • If the page has not been used within the window d, replace the page • Otherwise, go to the next page • Add the faulting page to the working set
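
The per-fault scan above can be sketched in a few lines. This is an illustrative simulation, not kernel code: `Page`, `tau` (the window d), and the timestamps are all hypothetical names for the slide's concepts.

```python
from dataclasses import dataclass

@dataclass
class Page:
    vpn: int            # virtual page number
    referenced: bool    # the hardware reference bit
    last_use: int       # software-recorded "last use time"

def working_set_fault(pages, faulting_vpn, now, tau):
    """On a page fault, scan all resident pages of the process.

    Referenced pages get their time recorded; pages idle longer than
    the window tau are evicted; the faulting page joins the set.
    """
    survivors = []
    for p in pages:
        if p.referenced:
            p.last_use = now          # record the current time
            p.referenced = False
        elif now - p.last_use > tau:
            continue                  # not used within d: replace (evict)
        survivors.append(p)           # otherwise, keep and go to the next
    survivors.append(Page(faulting_vpn, referenced=False, last_use=now))
    return survivors
```

Note the scan touches every resident page on every fault, which is why the clock-based variant on the next slide is the form actually used.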

  9. WSClock Paging Algorithm • Follow the clock hand • If the reference bit is 1, set the reference bit to 0, record the current time for the page, and go to the next • If the reference bit is 0, check the “last use time” • If the page has been used within d, go to the next • If the page hasn’t been used within d and the modify bit is 1 • Schedule the page for page-out and go to the next • If the page hasn’t been used within d and the modify bit is 0 • Replace this page
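
One step of the hand's traversal can be sketched as follows; this is a simplified model (frames as dicts, a `tau` window for d), not an actual pager.

```python
def wsclock_tick(frames, hand, now, tau):
    """Examine frames[hand] and return (action, new_hand).

    Mirrors the slide: referenced pages are aged and skipped; old
    dirty pages get a write-back scheduled; an old clean page is
    the replacement victim.
    """
    f = frames[hand]
    nxt = (hand + 1) % len(frames)            # clock hand wraps around
    if f['ref']:
        f['ref'] = False                      # clear the reference bit
        f['last_use'] = now                   # record the current time
        return ('advance', nxt)
    if now - f['last_use'] <= tau:            # used within d: skip
        return ('advance', nxt)
    if f['mod']:                              # old but dirty
        return ('schedule_writeback', nxt)    # page it out, keep scanning
    return ('replace', hand)                  # old and clean: victim found
```

A real implementation would also bound the sweep (e.g., fall back to any clean page after one full revolution), which the slide leaves implicit.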

  10. Simulating the Modify Bit with Access Bits • Set pages read-only even if they are read-write • Use a reserved bit to remember whether the page is really read-only • On a write fault • If the page is not really read-only, record a modify in the data structure and change the page to read-write • Restart the instruction
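
The fault-handler logic is short enough to show directly. A toy sketch, with PTEs as dicts and the flag names invented for illustration:

```python
def make_pte(really_read_only):
    """Every page starts write-protected; a reserved bit remembers
    whether it is genuinely read-only."""
    return {'writable': False,
            'really_read_only': really_read_only,
            'modified': False}

def on_write_fault(pte):
    """Handle the protection fault raised by a write to the page."""
    if pte['really_read_only']:
        return 'segfault'             # a genuine protection violation
    pte['modified'] = True            # the software-simulated modify bit
    pte['writable'] = True            # upgrade to read-write
    return 'restart_instruction'      # re-execute the faulting write
```

After the first write the page is read-write again, so the simulated modify bit costs exactly one extra fault per dirtied page.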

  11. Implementing LRU without a Reference Bit • Some machines have no reference bit • VAX, for example • Use the valid bit or access bit to simulate it • Invalidate all valid bits (even for pages that are valid) • Use a reserved bit to remember whether a page is really valid • On a page fault • If it is a valid reference, set the valid bit and place the page in the LRU list • If it is an invalid reference, do the page replacement • Restart the faulting instruction

  12. Demand Paging • Pure demand paging relies only on faults to bring in pages • Problems? • Possibly lots of faults at startup • Ignores spatial locality • Remedies • Loading groups of pages per fault • Prefetching/preloading • So why use it?

  13. Speed and Sluggishness • Slow is > .1 seconds (100 ms) • Speedy is << .1 seconds • Monitors tend to be 60+ Hz = <16.7ms between screen paints • Disks have seek + rotational delay • Seek is somewhere between 7-16 ms • At 7200rpm, one rotation = 1/120 sec = 8ms. Half-rotation is 4ms • Conclusion? One disk access OK, six are bad
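
The back-of-the-envelope arithmetic behind the conclusion:

```python
# Numbers from the slide: 7200 rpm disk, ~7 ms (optimistic) seek,
# average rotational delay = half a rotation.
rpm = 7200
rotation_ms = 60_000 / rpm            # one full rotation ~ 8.3 ms
half_rotation_ms = rotation_ms / 2    # average rotational delay ~ 4.2 ms
seek_ms = 7

one_access_ms = seek_ms + half_rotation_ms   # ~ 11 ms
six_accesses_ms = 6 * one_access_ms          # ~ 67 ms

# One access is comfortably under the 100 ms "slow" threshold;
# six accesses consume most of the interactive budget.
```

So a single fault to disk is tolerable, but a burst of faults on one user action is immediately felt as sluggishness.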

  14. Memory Pressure • “Swap” space • Region of disk used to hold “overflow” • Contains only data pages (stack/heap/globals). Why? • Swap may exist as a “regular file,” but a dedicated region of disk is more common

  15. Disk Address • Use physical memory as a cache for disk • Where to find a page on a page fault? • In an invalid PTE, the PPage# field holds a disk address • Observation: the OS knows such pages are real but not in memory [Figure: virtual address space whose invalid entries point to disk, while valid entries point into physical memory]

  16. Imagine a Global LRU • Global – across all processes • Idea – when a page is needed, pick the oldest page in the system • Problems? Process mixes? • Interactive processes • Active large-memory sweep processes • Mitigating damage?

  17. Sources of Disk Access • VM system • Main memory caches the full image on disk • Filesystem • Even here, caching is very useful • New competitive pressures/decisions • How do we allocate memory between these two? • How do we know we’re right?

  18. Partitioning Memory • Originally, specified by administrator • 20% used as filesystem cache by default • On fileservers, admin would set to 80% • Each subsystem owned pages, replaced them • Observation: they’re all basically pages • Why not let them compete? • Result: unified memory systems – file/VM

  19. File Access Efficiency • read(fd, buf, size) • Buffer in process’s memory • Data exists in two places – filesystem cache & process’s memory • Known as “double buffering” • Various scenarios • Many processes read same file • Process wants only parts of a file, but doesn’t know which parts in advance

  20. Result: Memory-Mapped Files [Figure: processes A, B, and C each hold a file mapping onto the same file, so a single copy of its pages in memory serves all three]

  21. Lazy Versus Eager • Eager: do things right away • read(fd, buf, size) – returns # bytes read • Bytes must be read before read completes • What happens if size is big? • Lazy: do them as they’re needed • mmap(…) – returns pointer to mapping • Mapping must exist before mmap completes • When/how are bytes read? • What happens if size is big?
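
The contrast can be seen concretely using Python's file API and `mmap` module as stand-ins for `read(2)` and `mmap(2)`; the file and its contents here are invented for the demo.

```python
import mmap
import os
import tempfile

# Create a small file to play with.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b'hello world')
    path = f.name

# Eager: read() copies all the requested bytes before it returns.
with open(path, 'rb') as f:
    buf = f.read()                  # the bytes are in our buffer now

# Lazy: mmap() only establishes the mapping before it returns;
# pages are faulted in when first touched.
with open(path, 'rb') as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_word = bytes(m[:5])       # this access may trigger a page fault
    m.close()

os.remove(path)
```

With a huge file, the eager `read()` must move (and find memory for) every byte up front, while the lazy mapping pays only for the pages actually touched.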

  22. Semantics: How Things Behave • What happens when • Two process obtain data (read or mmap) • One process modifies data • Two processes obtain data (read or mmap) • A third process modifies data • The two processes access the data

  23. Being Too Smart… • Assume a unified VM/File scheme • You’ve implemented perfect Global LRU • What happens on a filesystem “dump”?

  24. Amdahl’s Law • Gene Amdahl (IBM, then Amdahl) • Noticed the bottlenecks to speedup • Assume the speedup affects only one component • New time = (1 − fraction affected) + (fraction affected) / speedup • In other words, diminishing returns
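
In code, with time normalized to 1:

```python
def amdahl_speedup(fraction_affected, component_speedup):
    """Overall speedup when only `fraction_affected` of the original
    (normalized) run time benefits from `component_speedup`."""
    new_time = (1 - fraction_affected) + fraction_affected / component_speedup
    return 1 / new_time
```

Doubling the speed of half the work yields only a 1.33x overall win, and even an infinite speedup of that half caps out at 2x, which is the "diminishing returns" the slide refers to.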

  25. NT x86 Virtual Address Space Layouts [Figure: two layouts. Default: user space at 00000000–7FFFFFFF (application code, globals, per-thread stacks, DLL code); system space at 80000000–FFFFFFFF (kernel & executive, HAL, boot drivers; process page tables and hyperspace at C0000000–C0800000; system cache, paged pool, nonpaged pool above that). Alternate: 3-GB user space at 00000000–BFFFFFFF with a 1-GB system space at C0000000–FFFFFFFF.]

  26. Virtual Address Space in Win95 and Win98 [Figure: 00000000–7FFFFFFF user accessible, unique per process (per application), user mode; 80000000–BFFFFFFF shared, process-writable (DLLs, shared memory, Win16 applications), systemwide user mode; C0000000–FFFFFFFF operating system (Ring 0 components), systemwide kernel mode.]

  27. Details with VM Management • Create a process’s virtual address space • Allocate page table entries (reserve in NT) • Allocate backing store space (commit in NT) • Put related info into PCB • Destroy a virtual address space • Deallocate all disk pages (decommit in NT) • Deallocate all page table entries (release in NT) • Deallocate all page frames

  28. Page States (NT) • Active: Part of a working set and a PTE points to it • Transition: I/O in progress (not in any working sets) • Standby: Was in a working set, but removed. A PTE points to it, not modified and invalid. • Modified: Was in a working set, but removed. A PTE points to it, modified and invalid. • Modified no write: Same as modified but no write back • Free: Free with non-zero content • Zeroed: Free with zero content • Bad: hardware errors

  29. Dynamics in NT VM [Figure: page-frame flow between lists. Working-set replacement moves pages out of a process working set to the modified list (dirty) or standby list (clean); the modified writer writes pages back and moves them to standby; “soft” faults reclaim pages from the standby or modified lists back into the working set without disk I/O; the zero thread scrubs free-list frames onto the zeroed list; demand-zero faults and page-in/allocation are satisfied from the zeroed and free lists; frames with hardware errors go to the bad list.]
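
Those transitions are easy to model as list-to-list moves. A simplified sketch (sets of frame numbers standing in for NT's linked lists; method names echo the figure's labels):

```python
class FrameLists:
    """Toy model of the NT page-frame lists from slide 29."""

    def __init__(self):
        self.working_set = set()
        self.modified = set()
        self.standby = set()
        self.free = set()
        self.zeroed = set()

    def ws_replace(self, frame, dirty):
        """Working-set replacement: dirty pages go to the modified list,
        clean pages to standby."""
        self.working_set.discard(frame)
        (self.modified if dirty else self.standby).add(frame)

    def modified_writer(self, frame):
        """The modified writer pages the frame out, then moves it to standby."""
        self.modified.discard(frame)
        self.standby.add(frame)

    def soft_fault(self, frame):
        """A 'soft' fault reclaims a frame from standby/modified
        back into the working set with no disk I/O."""
        for lst in (self.standby, self.modified):
            if frame in lst:
                lst.discard(frame)
                self.working_set.add(frame)
                return True
        return False

    def zero_thread(self, frame):
        """The zero thread scrubs a free frame onto the zeroed list."""
        self.free.discard(frame)
        self.zeroed.add(frame)

    def demand_zero_fault(self):
        """Demand-zero faults are satisfied from the zeroed list."""
        frame = self.zeroed.pop()
        self.working_set.add(frame)
        return frame
```

The real memory manager adds ordering within the lists and fallbacks (e.g., taking from free when zeroed is empty), omitted here.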

  30. Shared Memory • How to destroy a virtual address space? • Link all PTEs • Reference count • How to swap out/in? • Link all PTEs • Operation on all entries • How to pin/unpin? • Link all PTEs • Reference count [Figure: two processes’ page tables, each with writable entries mapping the same physical pages]

  31. Copy-On-Write • Child’s virtual address space uses the same page mapping as the parent’s • Make all pages read-only • Make the child process ready • On a read, nothing happens • On a write, an access fault is generated • Map to a new page frame • Copy the page over • Restart the instruction [Figure: parent and child page tables with read-only entries pointing at the same physical pages]
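
The fork-time sharing and the write-fault path can be sketched as below. Everything here is a toy model: `physical` maps frame numbers to page contents, and page-table entries are plain dicts.

```python
physical = {0: b'parent data'}   # frame number -> page contents
next_frame = [1]                 # naive frame allocator

def fork_cow(parent_pt):
    """Child uses the same mappings; all pages become read-only
    on both sides so any write will fault."""
    for entry in parent_pt.values():
        entry['writable'] = False
    return {vpn: dict(entry) for vpn, entry in parent_pt.items()}

def write_fault(pt, vpn, data):
    """Access fault on a COW write: allocate a new frame, copy the
    page over, remap, then 'restart' the write."""
    new = next_frame[0]
    next_frame[0] += 1
    physical[new] = physical[pt[vpn]['frame']]   # copy the page over
    pt[vpn] = {'frame': new, 'writable': True}   # map to a new page frame
    physical[new] = data                         # the restarted write lands here
```

A full implementation also tracks a share count per frame so the last remaining mapper can regain write access without copying; that bookkeeping is what slide 32's questions are about.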

  32. Issues of Copy-On-Write • How to destroy an address space? • Same as the shared memory case? • How to swap in/out? • Same as shared memory • How to pin/unpin? • Same as shared memory
