Token-ordered LRU: an Effective Policy to Alleviate Thrashing ECE7995 Presentation Presented by Xuechen Zhang, Pei Yan
Outline • Introduction • Challenges • Algorithm Design • Performance Evaluation • Related work • Conclusions
Memory Management for Multiprogramming • In a multiprogramming environment, multiple concurrent processes share the same physical memory space to support their virtual memory. • These processes compete for the limited memory to establish their respective working sets. • The page replacement policy is responsible for coordinating this competition by dynamically allocating memory pages among the processes. • A global LRU replacement policy is usually used to manage the aggregate process memory space as a whole.
Thrashing • Definition • In virtual memory systems, thrashing may be caused by programs that present insufficient locality of reference: if the working set of a program cannot be effectively held within physical memory, constant data swapping results. • Problems • No program is able to establish its working set • A large number of page faults • Low CPU utilization • Execution of each program practically stops Questions: How does thrashing develop in the kernel? How can we deal with thrashing?
How does thrashing develop in the kernel? [Figure: processes Proc1 and Proc2 repeatedly paging against the shared physical memory; their combined memory demand exceeds physical memory and the CPU sits idle.]
The performance degradation under thrashing [Figure: memory-page traces of dedicated vs. concurrent executions. With a memory shortage of 42%, the time of the first spike is extended by 70 times, and the start time of vortex is extended by 15 times.]
Insights into Thrashing • A page frame of a process becomes a replacement candidate in the LRU algorithm if the page has not been used for a certain period of time. • There are two conditions under which a page is not accessed by its owner process: • the process does not need to access the page; • the process is conducting page faults (sleeping), so it is unable to access the page even though it otherwise would have. • We call the LRU pages generated under the first condition true LRU pages, and those under the second condition false LRU pages. • False LRU pages are produced by the time delay of page faults, not by the access pattern of the program, so the LRU principle is not maintained (temporal locality does not apply!). • The amount of false LRU pages is a status indicator of serious thrashing. However, LRU page replacement implementations do not discriminate between these two types of LRU pages and treat them equally!
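The distinction can be expressed as a classification rule: a page that sat idle while its owner was runnable is a true LRU page, while a page that sat idle mainly because its owner was blocked in page faults is a false LRU page. A minimal sketch of such a rule, with hypothetical field names (this is not the actual Linux `struct page`):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page record: when the page was last touched, and how
 * long its owner process spent sleeping in page faults since that touch. */
struct page_info {
    unsigned long last_access;       /* timestamp of last reference */
    unsigned long owner_fault_sleep; /* owner's time blocked in page
                                        faults since last_access */
};

/* A page is a *false* LRU page if its apparent idle time is mostly
 * explained by its owner sleeping in page faults, rather than by the
 * program genuinely not needing the page. */
bool is_false_lru(const struct page_info *p, unsigned long now)
{
    unsigned long idle = now - p->last_access;
    return p->owner_fault_sleep * 2 > idle; /* majority of idle time */
}
```

The "majority of idle time" threshold is an illustrative choice; any rule that credits fault-sleep time back to the page would serve the same purpose.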
Challenges • How can the two kinds of LRU pages be distinguished? • How can a lightweight thrashing-prevention mechanism be implemented?
Algorithm Design • Why a token? [Figure: processes issuing page faults to I/O. Without the token, both false and true LRU page faults occur; with the token, the token holder's false LRU page faults are eliminated, leaving only true LRU page faults.]
Algorithm Design The basic idea: • Set a token in the system. • The token is taken by one of the processes when page faults occur. • The system eliminates the false LRU pages of the process holding the token, allowing it to quickly establish its working set. • The token process is expected to complete its execution and release its allocated memory as well as the token. • Other processes then compete for the token and complete their runs in turn. • By transferring this privilege among the thrashing processes one at a time, the system reduces the total number of false LRU pages and transforms the chaotic order of page usage into an arranged order. • The policy can be designed to transfer the token more intelligently among processes to address issues such as fairness and starvation.
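The steps above can be sketched in a few lines: a single system-wide token, taken on a page fault if free, whose holder's pages are exempt from replacement. This is a hedged sketch under simplified assumptions (no locking, hypothetical `struct proc`), not the kernel code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical process descriptor; token_holder is the single
 * system-wide token. */
struct proc { int pid; };

static struct proc *token_holder = NULL;

/* On a page fault, the faulting process takes the token if it is free. */
void try_take_token(struct proc *p)
{
    if (token_holder == NULL)
        token_holder = p;
}

/* The holder releases the token when it finishes (or exits). */
void release_token(struct proc *p)
{
    if (token_holder == p)
        token_holder = NULL;
}

/* The replacement scan skips pages owned by the token holder, so its
 * false LRU pages are not evicted while it rebuilds its working set. */
bool may_evict(struct proc *owner)
{
    return owner != token_holder;
}
```

Because only one process is privileged at a time, the other processes still supply eviction candidates, which is what lets the holder gather its working set quickly.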
Considerations about the Token-ordered LRU • Which process should receive the token? • A process whose memory shortage is urgent. • How long should a process hold the token? • It should be adjustable based on the seriousness of the thrashing. • What happens if thrashing is too serious? • The policy becomes a load-control mechanism: with a long token time, the programs are effectively executed one by one. • Can multiple tokens be effective against thrashing? • The token and its variations have been implemented in the Linux kernel.
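The adjustable holding time can be sketched as a timeout on the token grant: with a short timeout the token rotates quickly; with a very long timeout the policy degenerates into load control, since each program effectively runs alone until it finishes. A minimal sketch with an assumed tunable (`token_timeout` is a hypothetical name, not a kernel parameter):

```c
#include <stddef.h>

struct proc { int pid; };

static struct proc *holder = NULL;
static unsigned long grant_time = 0;
static unsigned long token_timeout = 100; /* tunable: set longer when
                                             thrashing is more serious */

/* A faulting process takes the token if it is free, or if the current
 * holder has kept it past the timeout. */
void request_token(struct proc *p, unsigned long now)
{
    if (holder == NULL || now - grant_time > token_timeout) {
        holder = p;
        grant_time = now;
    }
}
```

Making `token_timeout` depend on an indicator such as the volume of false LRU pages would give the adaptive behavior the slide describes.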
Performance evaluation • Experiment environment • System environment • Benchmark programs • Performance metrics
Performance evaluation (Cont'd) • System environment
Performance Evaluation (Cont'd) • Benchmark programs
Performance evaluation (Cont'd) • Status metrics • MAD (memory allocation demand) • The total amount of requested memory space reflected in the page table of a program, in pages • RSS (resident set size) • The total amount of physical memory used by a program, in pages • NPF (number of page faults) • The number of page faults of a program • NAP (number of accessed pages) • The number of pages accessed by a program within a 1-second interval
Performance evaluation (Cont'd) • Slowdown • The ratio between the execution time of an interacting program and its execution time in a dedicated environment without major page faults. • Measures the performance degradation of an interacting program.
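As a formula, slowdown = T_concurrent / T_dedicated, so a value of 1 means no degradation; the 70x spike extension on the earlier slide corresponds to a slowdown of 70. A one-line helper for the ratio:

```c
/* Slowdown: execution time under concurrent (interacting) execution
 * divided by execution time in a dedicated environment. */
double slowdown(double concurrent_time, double dedicated_time)
{
    return concurrent_time / dedicated_time;
}
```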
Performance Evaluation (Cont'd) • Quickly acquiring memory allotments • Apsi, bit-r, gzip and m-m
Performance Evaluation (Cont'd) • Gradually acquiring memory allotments • Mcf, m-sort, r-wing and vortex
Performance Evaluation (Cont'd) • Irregularly changing memory allotments • Gcc and LU
Performance evaluation (Cont'd) • Interaction group (gzip&vortex3) Without token Token At the 250th second of execution, the token was taken by vortex.
Performance evaluation (Cont'd) • Interaction group (bit-r&gcc) Without token Token At the 146th second of execution, the token was taken by gcc.
Performance evaluation (Cont'd) • Interaction group (vortex3&gcc) Without token Token At the 397th second of execution, the token was taken by gcc.
Performance evaluation (Cont'd) • Interaction group (vortex1&vortex3) Without token Token At the 433rd second of execution, the token was taken by vortex1.
Related work • Local page replacement • The paging system selects victim pages for a process only from its own fixed-size memory space. • Pros • No interference among concurrent programs. • Cons • Cannot adapt to the dynamically changing memory demands of programs. • Under-utilization of memory space.
Related work (Cont'd) • Working set model • The working set (WS) of a program is the set of its recently used pages: the subset of the program's pages that have been referenced within the previous time units (the working-set window). • Pros • Theoretically eliminates thrashing caused by chaotic memory competition. • Cons • The implementation of this model is extremely expensive, since it needs to track the WS of each process.
Related work (Cont'd) • Load control • Adjusts the memory demands of multiple processes by changing the multiprogramming level (suspending/reactivating processes). • Pros • Can eliminate thrashing completely. • Cons • Acts in a brute-force manner. • Suspension can severely delay processes that synchronize with the suspended ones. • It is difficult to determine the right time to adjust the load. • Rebuilding the working set of a reactivated process is expensive.
Conclusions • Distinguish between true LRU faults and false LRU faults. • Design the token-ordered LRU algorithm to eliminate false LRU faults. • Experiments show that the algorithm effectively alleviates thrashing.