
Windows 7 Memory Management

Landy Wang, Distinguished Engineer, Microsoft Corporation. Topics: working set management, fine-grained page locking, security, NUMA, non-volatile (flash) memory, handling of contiguous/large-page memory requests, high-end servers, and footprint and performance.


Presentation Transcript


  1. Windows 7 Memory Management. Landy Wang, Distinguished Engineer, Microsoft Corporation

  2. Topics • Working set management • Fine-grained page locking • Security • NUMA • Non-volatile (flash) memory • Handling of contiguous/large-page memory requests • High-end servers • Footprint and performance

  3. Working Set Background • Optimal usage of system memory is a constant area of investment! • Working set: comprises all the potentially trimmable virtual addresses for a given process, session or system resource. • Resources like nonpaged pool, kernel stacks, large pages and AWE regions are excluded (because they are not trimmable). • Working sets provide an efficient way for the system to make memory available under pressure... but maintaining them is not free, and care must be exercised during trim candidate selection... and the subsequent writing of those pages! • Trimmed pages go to the standby (clean), modified or zero page lists. The modified/mapped writer threads write them out in a timely fashion.
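For context, a user-mode process can observe its own working-set size through the documented Psapi interface. A minimal sketch (illustrative only; this is the externally visible counter, not the internal working-set manager described above, and it must be linked with psapi.lib):

    #include <windows.h>
    #include <psapi.h>
    #include <stdio.h>

    int main(void)
    {
        PROCESS_MEMORY_COUNTERS pmc = { sizeof(pmc) };
        if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc))) {
            printf("Working set size:      %llu bytes\n",
                   (unsigned long long)pmc.WorkingSetSize);
            printf("Peak working set size: %llu bytes\n",
                   (unsigned long long)pmc.PeakWorkingSetSize);
        }
        return 0;
    }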

  4. Working Set Aging/Trimming • Working sets are periodically aged to improve trim decisions • Which sets and which virtual addresses to trim? • How much to trim? • Memory events so applications can (optionally) participate...
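These memory events are exposed to applications through the documented memory resource notification API. A minimal sketch of how a program might participate (the blocking wait and the reaction below are hypothetical choices):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* Ask the system for a low-physical-memory notification object. */
        HANDLE hLow = CreateMemoryResourceNotification(LowMemoryResourceNotification);
        if (hLow == NULL)
            return 1;

        /* Block until the system signals memory pressure, then shed
           application-managed caches so the trimmer has less to do. */
        if (WaitForSingleObject(hLow, INFINITE) == WAIT_OBJECT_0)
            printf("Low-memory notification signaled: releasing cached data.\n");

        CloseHandle(hLow);
        return 0;
    }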

  5. Working Set General Policies • When memory is low, how are working sets managed equitably and efficiently so optimal usage is achieved? • Working sets are ordered based on their age distribution. • The trim goal is set higher than the immediate need to avoid subsequent additional trimming. • After the goal is met, the remaining sets continue to be trimmed, but only for their very old pages. This provides fairness, so it is never the case that one process surrenders pages while the others give up none. • Up to 4 passes may be performed; later passes consider higher percentages of each working set and lower ages (more recently accessed pages) as well. • When trimming occurs, all sets are also aged so future trims will have optimal (and fair!) candidates.
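A rough, runnable C restatement of this multi-pass policy; the data structures, the goal multiplier and the per-pass share below are invented for illustration and do not reflect the actual Windows implementation:

    #include <stddef.h>

    #define MAX_AGE    8   /* Windows 7 keeps 8 aging values per page */
    #define NUM_PASSES 4   /* up to 4 trim passes, as described above */

    typedef struct {
        size_t pagesAtAge[MAX_AGE];   /* pages in this set at each age */
    } WORKING_SET;

    /* Take up to maxToTake pages from this set, oldest ages first,
       never dipping below minAge. */
    static size_t TrimOlderThan(WORKING_SET *ws, int minAge, size_t maxToTake)
    {
        size_t taken = 0;
        for (int age = MAX_AGE - 1; age >= minAge && taken < maxToTake; age--) {
            size_t take = ws->pagesAtAge[age];
            if (take > maxToTake - taken)
                take = maxToTake - taken;
            ws->pagesAtAge[age] -= take;
            taken += take;
        }
        return taken;
    }

    /* sets[] is assumed to be ordered by age distribution already. */
    size_t TrimWorkingSets(WORKING_SET *sets, size_t count, size_t pagesNeeded)
    {
        size_t goal = pagesNeeded * 2;   /* aim past the need (multiplier is invented) */
        size_t trimmed = 0;

        for (int pass = 0; pass < NUM_PASSES && trimmed < goal; pass++) {
            /* Later passes accept lower (more recently used) ages and take
               a larger share of each working set. */
            int    minAge = (MAX_AGE - 1) - pass;
            size_t share  = (size_t)(pass + 1) * 64;

            for (size_t i = 0; i < count; i++) {
                if (trimmed < goal)
                    trimmed += TrimOlderThan(&sets[i], minAge, share);
                else
                    /* Goal already met: still visit the remaining sets, but
                       take only their very oldest pages, for fairness. */
                    trimmed += TrimOlderThan(&sets[i], MAX_AGE - 1, share);
            }
        }
        /* The real system then re-ages all sets so future trims have fresh,
           fair candidates. */
        return trimmed;
    }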

  6. Working Set Improvements • Expansion to 8 aging values (up from 4) • Keep exact age distribution counts instead of estimates • Force self-aging and trimming during rapid expansion • Don't skip processes due to lock contention, and ensure fair aging by removing pass limits • Don't ravage small sets, since subsequent hard faults penalize all sets • Separation of the system cache working set into 3 distinct working sets (system cache, paged pool and driver images) to prevent individual expansion from trimming the others • Factor in standby list repurposing when making age/trim decisions • Improved in-page clustering of system addresses • Result: a doubling of performance on memory-constrained systems!

  7. Task Manager’s Main Screen

  8. Task Manager Working Set Display

  9. PFN Lock Background • The PFN (page frame number) array is a virtually contiguous (but possibly physically sparse) data structure in which each PFN entry describes the state of one physical page of memory. • Information includes: • State (zero, free, standby, modified, modified-no-write, bad, active, etc.) • How many page table entries are mapping it • How many I/Os are currently in progress • The containing frame/PTE • The PTE value to restore when the page leaves its last working set or is repurposed • NUMA node • etc. • Size is critical... and how best to manage the information?
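A hypothetical sketch of the kind of per-page record the slide describes; the field names, types and layout are invented for illustration and do not match the real Windows MMPFN structure:

    #include <windows.h>

    typedef enum _PAGE_STATE {
        PageZeroed, PageFree, PageStandby, PageModified,
        PageModifiedNoWrite, PageBad, PageActive
    } PAGE_STATE;

    typedef struct _PFN_ENTRY {
        PAGE_STATE State;           /* which list/state the physical page is on */
        ULONG      ShareCount;      /* how many PTEs currently map this page */
        ULONG      IoCount;         /* I/Os currently in progress against it */
        ULONG_PTR  ContainingFrame; /* page table page (frame) / PTE that maps it */
        ULONGLONG  OriginalPte;     /* PTE value to restore when the page leaves
                                       its last working set or is repurposed */
        USHORT     NumaNode;        /* home NUMA node of the physical page */
    } PFN_ENTRY;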

  10. PFN Lock: The Problem • The huge majority of all virtual memory operations were synchronized via a single system-wide PFN lock. Thus even seemingly unrelated operations by threads, even those in different processes, would contend for and serialize at this lock, potentially causing significant performance degradation and spikes. • Larger processor counts and memory sizes intensify the lock pressure. For example, prior to this change SQL Server had an 88% PFN lock contention rate on systems with 128 processors. • Applications and device drivers seeking higher performance faced significant complexity at best: AWE, large pages, or even complete algorithmic redesigns.
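To illustrate the complexity of those workarounds, a rough sketch of an AWE-style allocation using the documented Win32 AWE API; it requires the SeLockMemoryPrivilege ("Lock pages in memory"), and error handling is trimmed for brevity:

    #include <windows.h>

    BOOL AweSketch(void)
    {
        SYSTEM_INFO si;
        GetSystemInfo(&si);

        ULONG_PTR pages = 16;
        ULONG_PTR pfns[16];
        SIZE_T    bytes = pages * si.dwPageSize;

        /* Ask the kernel for physical pages the application will manage itself. */
        if (!AllocateUserPhysicalPages(GetCurrentProcess(), &pages, pfns))
            return FALSE;

        /* Reserve a physically mappable VA region, then map the pages into it. */
        PVOID va = VirtualAlloc(NULL, bytes, MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);
        if (va == NULL || !MapUserPhysicalPages(va, pages, pfns))
            return FALSE;

        /* ... use the memory at va ... */

        MapUserPhysicalPages(va, pages, NULL);                 /* unmap */
        FreeUserPhysicalPages(GetCurrentProcess(), &pages, pfns);
        VirtualFree(va, 0, MEM_RELEASE);
        return TRUE;
    }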

  11. PFN Lock: The Scope • All page allocation, deallocation, access and state manipulation • All prefetching, prioritization, access logging and page identification • All page list manipulation (zero, free, standby, modified, modified-no-write, bad) • All pagefile space allocation/deletion and pagefile addition/expansion/contraction • Page fault management, trimming/theft/replacement, mapped/modified writing, flushing, purging • All control area, segment, subsection and prototype PTE usage • Virtual address space deletion/decommit, protection changing, trimming, large pages, etc. • Process/session creation, duplication, inswap/outswap, deletion • Kernel stack creation/deletion, inswap/outswap, stealing • System cache view mapping/unmapping/readahead, protection • Image validation, ASLR dynamic relocations • MDL probing/unlocking • Driver loading, unloading, paging • User event signaling (low memory, high memory, etc.) • Dynamic addition/removal of memory plus mirroring/hibernate/resume • Dynamic kernel virtual address space allocation/deletion/initialization

  12. PFN Lock: The Answer • In Windows 7, the system-wide PFN lock was replaced with fine-grained locking on an individual page basis. • This completely eliminated the bottleneck, resulting in much higher scalability. For example, the Usenix memclone microbenchmark is now 15x faster than Windows Server 2008 on 32-processor configurations. • Fully compatible (at both the binary and source level), so all software benefits without any changes. Developers don't need to resort to complex workarounds to achieve the highest performance!

  13. PFN Lock Replacement Hierarchy • Pool locks • System VA lock • Working set expansion list lock • Individual per-page locks • Access logging lock • Page list (free per color, zero per color, standby per priority, modified filesystem/pagefile-destined, bad) locks • Per-pagefile space lock • Memory event signaling lock • Per-control-area lock • Dynamic relocation VA (ASLR) assignment lock • Segment list lock • Section object pointers lock

  14. Security: ASLR Background (Diagram: the executable's load address is randomly chosen within +/- 16MB of the address in the image header; DLLs are loaded at a randomly chosen image-load bias of up to 16MB, below kernel mode.)

  15. Security: ASLR Background • Images are relocated dynamically when each image section is created. • When combined with NX, this makes life difficult for hackers! • Compresses the VA space to reduce the page table page cost as well as provide a larger contiguous VA range for applications. • Introduced in Vista; for compatibility, applications must opt in via /DYNAMICBASE.
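A small sketch of the opt-in: build with the /DYNAMICBASE linker option, and (purely for illustration) check the resulting flag in the PE optional header at run time. Minimal error checking; the messages are just examples:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HMODULE           mod = GetModuleHandle(NULL);
        PIMAGE_DOS_HEADER dos = (PIMAGE_DOS_HEADER)mod;
        PIMAGE_NT_HEADERS nt  = (PIMAGE_NT_HEADERS)((PBYTE)mod + dos->e_lfanew);

        /* IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE is set when the image was
           linked with /DYNAMICBASE and is therefore eligible for ASLR. */
        if (nt->OptionalHeader.DllCharacteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)
            printf("Linked with /DYNAMICBASE: ASLR opt-in.\n");
        else
            printf("No ASLR opt-in: image prefers its fixed base address.\n");

        return 0;
    }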

  16. Security: ASLR Improvements • Driver randomization increased to 64 possible load addresses for 32-bit drivers and 256 for 64-bit drivers, up from 16 for both. • The kernel, HAL and session drivers were relocated post-Vista RTM. Large session drivers (win32k.sys, for example) are also now relocated. • Extra effort is also made to relocate user-space images even when system VA space is tight or fragmented, by temporarily using the user address space of the system process. • The memory cost of ASLR has also been reduced by adding 2x compression for in-memory image relocation tables, which saves at least 11MB of pageable memory on every system. • Allow execute revocation (for NX opt-in on the fly) post-Vista RTM.

  17. NUMA • NUMA is the approach preferred by hardware designers to achieve optimum performance. • Typical far-node cost: clients 1.3-1.7x, servers 1.1x-3x+! • Windows 7 adds support for 64 NUMA nodes (up from 16). • Node graph construction so optimal allocations can always be performed automatically without drivers/apps doing the heavy lifting. • Apps can specify node preference at allocation/view/control area/thread/process boundaries. • Automatic page migration is performed by the system!
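One way applications express that node preference is the documented VirtualAllocExNuma API (Vista and later); the node number and allocation size below are arbitrary examples:

    #define _WIN32_WINNT 0x0600
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        ULONG highestNode = 0;
        GetNumaHighestNodeNumber(&highestNode);

        /* Prefer pages from node 0; the system may still fall back to, or
           later migrate pages between, other nodes. */
        SIZE_T size = 1 << 20;   /* 1 MB */
        PVOID  p = VirtualAllocExNuma(GetCurrentProcess(), NULL, size,
                                      MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE,
                                      0 /* preferred node */);
        if (p != NULL) {
            printf("Committed %llu bytes preferring NUMA node 0 (highest node: %lu)\n",
                   (unsigned long long)size, highestNode);
            VirtualFree(p, 0, MEM_RELEASE);
        }
        return 0;
    }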

  18. Integrated NVRAM Support • NVRAM: • Built directly into motherboards • In solid state drives • In USB sticks • As a replacement for main memory • Windows 7 delivers tight and efficient integration of NVRAM support directly into the core memory management system. • Eliminates numerous filter driver drawbacks... some examples: • The same disk page can be in memory, in a ReadyBoost cache and pinned in a ReadyDrive disk all at the same time, with each component unaware of the others. • Pagefile-backed pages can consume space in both ReadyBoost and ReadyDrive caches even though the application (and memory management) deleted them long ago.

  19. Contiguous/Large Page Memory • Significant redesign post-Vista RTM to obtain memory efficiently without trimming, issuing I/O or inserting fault delays. In-memory pages are swapped in place. Efficient scanning, including range skipping during the preliminary pass, ensures much higher yield. • Applications (e.g., databases) allocating large page regions. • Hypervisors allocating memory for guest VMs. • Device drivers making contiguous memory or MmAllocatePagesForMdl* calls. • Result: reductions in allocation times can be several orders of magnitude! Callers no longer run the risk of disrupting the entire system!
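A minimal sketch of the user-mode large-page allocation such applications make; it assumes the caller's token already holds SeLockMemoryPrivilege ("Lock pages in memory"), and the privilege-enabling code is omitted:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        SIZE_T large = GetLargePageMinimum();   /* 0 if large pages are unsupported */
        if (large == 0)
            return 1;

        /* The size must be a multiple of the large-page minimum (2 MB on x64). */
        PVOID p = VirtualAlloc(NULL, large,
                               MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                               PAGE_READWRITE);
        if (p != NULL) {
            printf("Allocated one large page of %llu bytes\n", (unsigned long long)large);
            VirtualFree(p, 0, MEM_RELEASE);
        } else {
            printf("Large-page allocation failed (is SeLockMemoryPrivilege enabled?)\n");
        }
        return 0;
    }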

  20. High-End System Support • Initial 64-bit nonpaged pool maximum bumped from 40% to 75%. • Reclaim initial nonpaged pool (up to a 3% RAM boost!). • Boot-time reductions by no longer depleting executive worker queues for page zeroing (which came at the expense of boot-time forward progress). • TLB flush reductions (especially valuable for virtualization). • Cache management improvements to avoid flushing/overflushing. • Enterprise clustered filesystem support APIs added. • Software mirroring for major OEMs (also in WS2008). • Avoid issuing modified page writes until absolutely necessary, post-Vista RTM.

  21. Footprint Analysis • Vista SP1 memory management code is ~460KB (25% is pageable or INIT). Windows 7 total code growth is 8KB! • Static data reduced from 41KB to 38KB. • Multiplicative data structures... even more important! Significant effort went into saving at this level; relocation tables are one example. • Locality of reference improvements, for speed, false-sharing elimination and footprint purposes.

  22. Focus Areas • Footprint / locality of reference • Memory and I/O prioritization and efficiency • Parallelism • Scalability • Security • Power consumption • New technologies - hardware and software

  23. Questions?
