Chapter 22. Beyond Physical Memory: Policies Hyunjoon Kim (hjkim@theory.snu.ac.kr) School of Computer Science and Engineering Seoul National University
Introduction • When little memory is free, memory pressure forces the OS to start paging out pages to make room for actively-used ones. • How can the OS decide which page to evict from memory? • This decision is made by the replacement policy of the system. • FIFO (First-In First-Out) policy • Random policy • LRU (Least Recently Used) policy • Viewpoint: main memory can be viewed as a cache for virtual memory pages in the system.
Cache Management • [Worked example on average memory access time, omitted here: with the higher hit rate, memory accesses are roughly 100 times faster than in case 1.]
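The "100 times faster" claim follows from the average memory access time formula, AMAT = P_Hit · T_M + P_Miss · T_D. A minimal sketch of the arithmetic, assuming illustrative values (T_M = 100 ns, T_D = 10 ms, a 90% hit rate as case 1 and a 99.9% hit rate as case 2) rather than numbers taken from this slide:

```python
# AMAT = P_hit * T_M + P_miss * T_D  (assumed illustrative costs)
T_M = 100e-9          # cost of a memory access, in seconds
T_D = 10e-3           # cost of a disk access, in seconds

def amat(hit_rate):
    """Average memory access time for a given hit rate."""
    return hit_rate * T_M + (1.0 - hit_rate) * T_D

case1 = amat(0.90)    # case 1: 90% hit rate   -> about 1 ms per access
case2 = amat(0.999)   # case 2: 99.9% hit rate -> about 10 us per access
print(f"case 1: {case1*1e6:.1f} us, case 2: {case2*1e6:.1f} us, "
      f"speedup: {case1/case2:.0f}x")   # roughly 100x faster than case 1
```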
The Optimal Replacement Policy • Replaces the page that will be accessed furthest in the future, resulting in the fewest possible misses. • [Trace example omitted: the first access to each page is a cold-start (compulsory) miss; a miss that occurs because the cache is full is a capacity miss.]
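As a rough sketch (not the book's code), the optimal policy can be simulated offline because the whole reference string is known in advance: on a miss with a full cache, evict the resident page whose next use lies furthest in the future. The example trace and 3-page cache below are assumed to be the chapter's running example, and the linear next-use scan is written for clarity rather than speed.

```python
def simulate_opt(trace, cache_size):
    """Count hits under the optimal policy: evict the page used furthest in the future."""
    cache, hits = set(), 0
    for i, page in enumerate(trace):
        if page in cache:
            hits += 1
            continue
        if len(cache) == cache_size:
            # Evict the resident page whose next reference is furthest away;
            # pages never referenced again are the best victims.
            def next_use(p):
                for j in range(i + 1, len(trace)):
                    if trace[j] == p:
                        return j
                return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return hits

# Assumed example trace with a 3-page cache: optimal achieves 6 hits.
print(simulate_opt([0, 1, 2, 0, 1, 3, 0, 3, 1, 2, 1], 3))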
Random Policy • Simply picks a random page to replace under memory pressure. • Simple to implement.
Random Policy • How well Random does depends on the luck of the draw. • Sometimes (just over 40% of the time), Random is as good as optimal (6 hits). • Sometimes it does much worse (2 hits or fewer).
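One way to see the "luck of the draw" effect is to run a random-replacement simulator many times over the same trace and look at the spread of hit counts. A sketch, assuming the same example trace and 3-page cache as above:

```python
import random
from collections import Counter

def simulate_random(trace, cache_size):
    """Count hits when the victim is chosen uniformly at random."""
    cache, hits = [], 0
    for page in trace:
        if page in cache:
            hits += 1
        else:
            if len(cache) == cache_size:
                cache.pop(random.randrange(len(cache)))  # evict a random victim
            cache.append(page)
    return hits

trace = [0, 1, 2, 0, 1, 3, 0, 3, 1, 2, 1]   # assumed example trace
results = Counter(simulate_random(trace, 3) for _ in range(10_000))
print(sorted(results.items()))   # distribution of hit counts over 10,000 trials
```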
Using History: LRU (Least Recently Used) Policy • Replaces the least-recently-used page. • FIFO and Random might kick out an important page, one that is about to be referenced again. • The LRU policy is based on the principle of locality • Spatial locality • Temporal locality • On the example trace, LRU matches optimal in its performance (6 hits).
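A minimal LRU sketch using Python's OrderedDict (an illustration, not how the book or any real OS implements it): keep resident pages ordered by recency and evict from the least-recently-used end.

```python
from collections import OrderedDict

def simulate_lru(trace, cache_size):
    """Count hits under LRU: on each access, move the page to the most-recently-used end."""
    cache, hits = OrderedDict(), 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)          # page is now most recently used
        else:
            if len(cache) == cache_size:
                cache.popitem(last=False)    # evict the least-recently-used page
            cache[page] = True
    return hits

# On the assumed example trace, LRU also gets 6 hits, matching optimal.
print(simulate_lru([0, 1, 2, 0, 1, 3, 0, 3, 1, 2, 1], 3))
```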
Workload Examples • See how each policy behaves over the range of cache sizes • Varies the cache size from very small (1 page) to enough to hold all the unique pages (100 pages) • No-locality workload • 80-20 workload • Looping-sequential workload
The No-Locality Workload • Has no locality • Accesses 100 unique pages over time, choosing the next page to refer to at random; overall, 10,000 pages are accessed. • Conclusions • FIFO = RAND = LRU < OPT: with no locality, the hit rate is determined exactly by the size of the cache, so it doesn't matter much which realistic policy you use. • When the cache is large enough to fit the entire workload, it also doesn't matter which policy you use.
The 80-20 Workload • Exhibits locality • 80% of the references are made to 20% of the pages (the "hot" pages). • 20% of the references are made to the remaining 80% of the pages (the "cold" pages). • Accesses 100 unique pages • Conclusions • FIFO = RAND < LRU < OPT: LRU is more likely to hold onto the hot pages. • If each miss is very costly, even a small increase in hit rate can make a huge difference in AMAT.
The Looping-Sequential Workload • Refers to 50 pages in sequence 0, 1, ..., 49 and repeats this loop, for a total of 10,000 accesses. • Conclusions • FIFO = LRU < RAND < OPT: older pages are going to be accessed sooner than the pages that FIFO and LRU prefer to keep in cache. • Random does notably better here; it has the nice property of not having weird corner-case behaviors.
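The three workloads above can be sketched as simple trace generators (assumed parameters: 100 unique pages or a 50-page loop, 10,000 references) and fed to any of the simulators sketched earlier; sweeping the cache size from 1 to 100 pages and plotting the hit rate reproduces the qualitative orderings described above.

```python
import random

NUM_PAGES, NUM_REFS = 100, 10_000   # assumed workload parameters

def no_locality():
    """Each reference picks one of the 100 pages uniformly at random."""
    return [random.randrange(NUM_PAGES) for _ in range(NUM_REFS)]

def eighty_twenty():
    """80% of references go to the 20 'hot' pages, 20% to the 80 'cold' pages."""
    trace = []
    for _ in range(NUM_REFS):
        if random.random() < 0.8:
            trace.append(random.randrange(20))        # hot pages 0..19
        else:
            trace.append(20 + random.randrange(80))   # cold pages 20..99
    return trace

def looping_sequential(loop_len=50):
    """Pages 0, 1, ..., 49 referenced in order, repeated for 10,000 references."""
    return [i % loop_len for i in range(NUM_REFS)]
```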
Implementing Historical Algorithms • Perfect LRU • To keep track of which pages have been least- and most-recently used, the system has to do some accounting work on every memory reference. • Such accounting could greatly reduce performance. • Can we approximate LRU and still obtain the desired behavior?
Approximating LRU • Requires hardware support in the form of a use bit (reference bit) • Whenever a page is referenced, the hardware sets the use bit to 1. • Clock algorithm • When a replacement must occur, the OS checks whether the currently-pointed-to page P has a use bit of 1 or 0. • If 1, the use bit of P is set to 0 and the clock hand is advanced to the next page (P+1). • This continues until a page with a use bit of 0 is found. • If 0, replace the current page: evict it!
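A sketch of the clock algorithm, with the use bits maintained in software purely for illustration (in a real system the hardware sets the bit on each reference):

```python
class ClockCache:
    """Toy clock replacement: pages sit in a circle and a hand sweeps for use bit == 0."""
    def __init__(self, size):
        self.size = size
        self.frames = []   # resident pages, in circular order
        self.use = {}      # page -> use (reference) bit
        self.hand = 0

    def access(self, page):
        if page in self.use:               # hit: hardware would set the use bit
            self.use[page] = 1
            return True
        if len(self.frames) == self.size:  # miss with a full cache: run the clock hand
            while self.use[self.frames[self.hand]] == 1:
                self.use[self.frames[self.hand]] = 0       # give the page a second chance
                self.hand = (self.hand + 1) % self.size
            victim = self.frames[self.hand]                # use bit is 0: evict it
            del self.use[victim]
            self.frames[self.hand] = page                  # replace the victim in place
            self.hand = (self.hand + 1) % self.size
        else:
            self.frames.append(page)
        self.use[page] = 1
        return False
```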
The Behavior of a Clock Algorithm Variant • Randomly scans pages when doing a replacement • When it encounters a page with a reference bit set to 1, it clears the bit. • When it finds a page with the reference bit set to 0, it chooses that page as the victim. • Conclusions • FIFO, RAND < Clock < LRU < OPT: by using the reference bit, the clock variant does better than approaches that don't consider history at all.
Considering Dirty Pages • Additional consideration • If a page has been modified (dirty), it must be written back to disk before it can be evicted → expensive! • If it has not been modified (clean), the eviction is free. • The hardware should include a dirty bit (modified bit). • The modified clock algorithm • Scans first for pages that are both unused and clean to evict. • Failing to find those, it evicts unused pages that are dirty.
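One way to sketch the "unused and clean first" preference (a simplified variant, not necessarily how any particular OS implements it) is to search for a victim in two passes: first a page with use bit 0 and dirty bit 0, then, failing that, a page with use bit 0 and dirty bit 1.

```python
def pick_victim(frames, use, dirty):
    """Return the index of the frame to evict, preferring unused-and-clean pages.

    frames: list of resident pages; use, dirty: dicts mapping page -> 0/1.
    A toy sketch: a real clock would also clear use bits and move a hand.
    """
    # Pass 1: unused (use bit 0) and clean (dirty bit 0) -> eviction is free.
    for i, p in enumerate(frames):
        if use[p] == 0 and dirty[p] == 0:
            return i
    # Pass 2: unused but dirty -> must be written back to disk before reuse.
    for i, p in enumerate(frames):
        if use[p] == 0:
            return i
    # All pages were recently used: fall back to the first frame.
    return 0
```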
When does the OS bring a page into memory? • Sometimes called the page selection policy. • For most pages, the OS simply uses demand paging. • The OS brings the page into memory when it is accessed, "on demand" as it were. • The OS could guess that a page is about to be used and bring it in ahead of time: pre-paging (or prefetching) • If a code page P is brought into memory, code page P+1 will likely be accessed soon and thus should be brought into memory too.
How does the OS write pages out to disk? • Collects a number of pending writes together in memory and writes them to disk in one write: clustering (or grouping) of writes. • Effective because of the nature of disk drives, which perform a single large write more efficiently than many small ones.
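A toy sketch of write clustering, assuming a hypothetical disk_write(pages) call whose cost is dominated by a fixed per-operation overhead; the point is only that flushing many dirty pages in one call amortizes that overhead.

```python
class WriteBuffer:
    """Collect dirty pages in memory and flush them to disk in one large write."""
    def __init__(self, disk_write, batch_size=16):
        self.disk_write = disk_write    # hypothetical: writes a list of pages in one I/O
        self.batch_size = batch_size
        self.pending = []

    def page_out(self, page):
        self.pending.append(page)       # defer the write instead of issuing it now
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.disk_write(self.pending)   # one large write instead of many small ones
            self.pending = []
```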
Thrashing • When the memory demands of the set of running processes exceed the available physical memory, • the system will constantly be paging: "thrashing". • (Figure source: http://codex.cs.yale.edu/avi/os-book/OS9/slide-dir/, retrieved 2015/05/10)
Thrashing • One solution: given a set of processes, run only a reduced set of them, so that the reduced set's working sets (the pages they are using actively) fit in memory and those processes can make progress. • Admission control: better to do less work well than to try to do everything at once poorly. • Another approach: Linux runs an out-of-memory killer when memory is oversubscribed. • This daemon chooses a memory-intensive process and kills it, • reducing memory demand in a none-too-subtle manner.
Summary • The replacement policy decides which page the OS evicts from memory. • Goal in picking a replacement policy • To minimize the number of cache misses • Among FIFO, Random, and LRU, LRU generally does best at minimizing the number of cache misses. • This is because programs tend to exhibit locality (the principle of locality). • Perfect LRU is expensive to implement, so systems approximate LRU instead. • Hardware support is required in the form of a reference bit (use bit). • Clock algorithm • Can also consider dirty pages • Other virtual memory policies • Demand paging, pre-paging • Clustering of writes