
SE 292: High Performance Computing Memory Organization and Process Management



Presentation Transcript


  1. SE 292: High Performance Computing, Memory Organization and Process Management. Sathish Vadhiyar

  2. Memory Image of a Process – Structure of a process in main memory • A process consists of different sections (shown in the figure) in main memory • A process also includes current activity • Program counter • Process registers [Figure: process memory image with regions labelled stack, memory-mapped region for shared libraries (libc.so: data, text), heap, and a.out (uninitialized data, initialized data, text/code)]

  3. Memory Image • Data section contains global variables (initialized & uninitialized) • The memory image also consists of shared libraries used by a program (more later)

  4. Memory Image: Stack • Stack contains temporary data • Holds function parameters, return addresses, local variables • Expands and contracts on function calls and returns, respectively • Addresses are calculated and used by the compiler, relative to the top of the stack or some other base register associated with the stack • Growth of the stack area is thus managed by the program, as generated by the compiler

  5. Memory Image: Heap • Heap is dynamically allocated • Expands and contracts through malloc and free, respectively • Managed by a memory allocation library

  6. Memory Abstraction • The above scheme requires the entire process to be in memory • What if the process space is larger than physical memory? • Virtual memory • Provides an abstraction of physical memory • Allows execution of processes not completely in memory • Offers many other features

  7. Virtual Memory Concepts • Instructions must be in physical memory to be executed • But not all of the program needs to be in physical memory • Each program could take less memory; more programs can reside simultaneously in physical memory • Need for memory protection • One program should not be able to access the variables of another • Typically done through address translation (more later)

  8. Virtual Memory Concepts – Large size • Virtual memory can be extremely large when compared to a smaller physical memory [Figure: virtual memory (page 0, page 1, page 2, …, page v) mapped through a memory map onto a smaller physical memory]

  9. Virtual Memory Concepts - Sharing • Virtual memory allows programs to share pages and libraries [Figure: two virtual address spaces, each with stack, shared library, heap, data and code regions, connected through a memory map]

  10. Address Translation • Terminology – Virtual and Physical Addresses • Virtual memory addresses are translated to physical memory addresses • Memory Management Unit (MMU) provides the mapping • MMU – dedicated hardware on the CPU chip • Uses a lookup table (see below) • To translate a virtual address to the corresponding physical address, a table of translation information is needed

  11. Address Translation • Minimize the size of the table by managing translations not per byte but at a larger granularity • Page: fixed-size unit of memory (contiguous memory locations) for which a single piece of translation information is maintained • The resulting table is called the page table • The contents of the page table are managed by the OS • Thus, (address translation logic in MMU + OS + page table) is used for address translation

  12. Address Translation [Figure: the virtual address space of a program/process (0x00000000 to 0xFFFFFFFF; text, data, heap, …, stack) and the physical address space of main memory, each divided into 256-byte pages: virtual page 0 covers 0x00000000 to 0x000000FF, virtual page 1 covers 0x00000100 to 0x000001FF, and so on; example addresses 0x00000124 and 0x000024 are marked]

  13. Address Translation [Figure: a virtual address (virtual page number (VPN), offset) is translated through the translation table, the PAGE TABLE, into a physical address (physical page number (PPN), offset); the offset is carried over unchanged]

  14. Size of Page Table • Big Problem: Page Table size • Example: 32-bit virtual address, 16 KB page size • 256K virtual pages => a page table of about 1 MB per process (assuming 4-byte entries) • Has to be stored in memory • Solution – Multi-level page table (more later)
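The arithmetic behind this example can be sketched as follows. The entry size is not stated on the slide, so `pte_bytes=4` is an assumption, and the function name is purely illustrative.

```python
def page_table_size(virtual_address_bits, page_size_bytes, pte_bytes=4):
    """Return (number of virtual pages, flat page table size in bytes).

    pte_bytes is an assumed page-table-entry size; the slide does not give one.
    """
    num_pages = 2 ** virtual_address_bits // page_size_bytes
    return num_pages, num_pages * pte_bytes


# 32-bit virtual addresses with 16 KB pages, as in the slide's example.
pages, size = page_table_size(32, 16 * 1024)
print(pages // 1024, "K pages,", size // (1024 * 1024), "MB per process")
```

Under the 4-byte assumption this gives 256K pages and 1 MB of page table for every process, which is why a flat table becomes a problem as address spaces grow.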

  15. Address Translation [Fig 10.13 (Bryant): the Page Table Base Register (PTBR) points to the page table; the n-bit virtual address is split into a Virtual Page Number (VPN, bits n-1 to p) and a Virtual Page Offset (VPO, bits p-1 to 0); the VPN selects a page table entry holding a valid bit and a Physical Page Number (PPN); the m-bit physical address is the PPN (bits m-1 to p) followed by the Physical Page Offset (PPO, bits p-1 to 0)]
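A minimal sketch of the translation in this figure, assuming 256-byte pages (p = 8) as in the earlier address-space slide. The page table contents, the dictionary layout and the function name are hypothetical, chosen only for illustration.

```python
PAGE_OFFSET_BITS = 8  # p: 256-byte pages (assumed, matching the earlier example)


def translate(virtual_address, page_table, p=PAGE_OFFSET_BITS):
    """Translate a virtual address using a single-level page table.

    page_table maps VPN -> {"valid": bool, "ppn": int}; this layout is an
    illustrative assumption, not a real MMU format.
    """
    vpn = virtual_address >> p                 # upper bits: virtual page number
    vpo = virtual_address & ((1 << p) - 1)     # lower p bits: page offset
    entry = page_table.get(vpn)
    if entry is None or not entry["valid"]:
        raise LookupError(f"page fault on VPN {vpn}")
    return (entry["ppn"] << p) | vpo           # PPO equals VPO


# Hypothetical mappings: VPN 0 -> PPN 5, VPN 1 -> PPN 0, VPN 2 not in memory.
page_table = {
    0: {"valid": True, "ppn": 5},
    1: {"valid": True, "ppn": 0},
    2: {"valid": False, "ppn": None},
}
print(hex(translate(0x124, page_table)))
```

Only the page number is looked up; the offset bits pass through untouched, which is what lets the table hold one entry per page rather than one per byte.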

  16. What’s happening… [Figure: processes P1, P2, …, Pn each have a page table covering their virtual pages 1 to 4; some entries point to main memory, others are empty, and the virtual page contents are also held on disk]

  17. Demand Paging • When a program refers to a page, the page table is looked up to obtain the corresponding physical frame (page) • If the physical frame is not present, a page fault occurs • The page is then brought in from the disk • Thus pages are swapped in from the disk to physical memory only when needed – demand paging • To support this, the page table has valid and invalid bits • Valid – present in memory • Invalid – present on disk

  18. Page Fault • Situation where the page containing a virtual address generated by the processor is not in main memory • Detected on attempt to translate the address • Page table entry is invalid • Must be "handled" by the operating system • Identify slot in main memory to be used • Get page contents from secondary memory • Part of the disk can be used for this purpose • Update page table entry • Data can now be provided to the processor

  19. Demand Paging (Valid-Invalid Bit) Fig. 9.5 (Silberschatz)

  20. Page Fault Handling Steps Fig. 9.6 (Silberschatz)

  21. Page Fault Handling – a different perspective [Fig. 10.14 (Bryant): page fault handling in seven numbered steps involving the processor and MMU on the CPU chip, cache/memory, disk and the page fault exception handler; labels: VA, PTEA, PTE, exception, victim page, new page]

  22. Page Fault Handling Steps • Check page table to see if reference is valid or invalid • If invalid, a trap into the operating system • Find free frame, locate the page on the disk • Swap in the page from the disk • Replace an existing page (page replacement) • Modify page table • Restart instruction

  23. Performance Impact due to Page Faults • p – probability of page fault • ma – memory access time • Effective access time: EAT = (1-p) x ma + p x page fault time • With typical values, EAT = (1-p) x (200 nanoseconds) + p x (8 milliseconds) = 200 + 7,999,800 x p nanoseconds • Dominated by page fault time

  24. So, what happens during a page fault? • Trap to OS, save registers and process state, determine location of page on disk • Issue read from the disk • Major time-consuming step – 3 milliseconds latency, 5 milliseconds seek • Receive interrupt from the disk subsystem, save registers of the other program • Restore registers and process state of this program • To keep the EAT increase due to page faults very small (< 10%), only 1 in a few hundred thousand memory accesses should page fault • i.e., most of the memory references should be to pages in memory • But how do we manage this? • Luckily, locality • And smart page replacement policies that can make use of locality
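The effective access time formula and the 10% bound can be checked numerically. This is a small sketch using the slides' figures (200 ns memory access, 8 ms page fault service time); the function names are illustrative.

```python
def effective_access_time_ns(p, ma_ns=200.0, fault_ns=8_000_000.0):
    """EAT = (1 - p) * ma + p * page fault time, in nanoseconds."""
    return (1 - p) * ma_ns + p * fault_ns


def max_fault_probability(slowdown, ma_ns=200.0, fault_ns=8_000_000.0):
    """Largest p for which EAT <= (1 + slowdown) * ma.

    Solving (1 - p) * ma + p * fault <= (1 + slowdown) * ma for p
    gives p <= slowdown * ma / (fault - ma).
    """
    return slowdown * ma_ns / (fault_ns - ma_ns)


p_max = max_fault_probability(0.10)
print(f"p must stay below {p_max:.2e}: "
      f"about 1 fault in {round(1 / p_max):,} accesses")
```

With these numbers the bound works out to roughly one fault per 400,000 memory accesses, consistent with the "1 in a few hundred thousand" figure.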

  25. Page Replacement Policies • Question: How does the page fault handler decide which main memory page to replace when there is a page fault? • Principle of Locality of Reference • A commonly seen program property • If memory address A is referenced at time t, then it (temporal locality) and its neighbouring memory locations (spatial locality) are likely to be referenced in the near future • Suggests that a Least Recently Used (LRU) replacement policy would be advantageous

  26. Locality of Reference • Based on your experience, why do you expect that programs will display locality of reference? • Program, same address (temporal): loop, function • Program, neighbours (spatial): sequential code, loop • Data, same address (temporal): local variables, loop index • Data, neighbours (spatial): stepping through an array

  27. Page Replacement Policies • FIFO – performance depends on whether the initially loaded pages are actively used • Optimal page replacement – replace the page that will not be used for the longest time • Difficult to implement, since it needs knowledge of future references • Some ideas? • LRU • Least Recently Used is most commonly used • Implemented using counters • LRU might be too expensive to implement
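The three policies can be compared by counting page faults on a reference string. The sketch below is a simulation only: the reference string and the frame count of 3 are assumptions (a commonly used textbook example, not taken from these slides), and LRU is modelled with an ordered dictionary rather than hardware counters.

```python
from collections import OrderedDict, deque


def fifo_faults(refs, frames):
    """Count faults when the oldest loaded page is always evicted."""
    resident, queue, faults = set(), deque(), 0
    for page in refs:
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            resident.discard(queue.popleft())   # evict oldest arrival
        resident.add(page)
        queue.append(page)
    return faults


def lru_faults(refs, frames):
    """Count faults when the least recently used page is evicted."""
    resident, faults = OrderedDict(), 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)          # mark as most recently used
            continue
        faults += 1
        if len(resident) == frames:
            resident.popitem(last=False)        # evict least recently used
        resident[page] = True
    return faults


def optimal_faults(refs, frames):
    """Count faults when the page used farthest in the future is evicted."""
    resident, faults = set(), 0
    for i, page in enumerate(refs):
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            future = refs[i + 1:]

            def next_use(q):
                return future.index(q) if q in future else len(future)

            resident.discard(max(resident, key=next_use))
        resident.add(page)
    return faults


refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(fifo_faults(refs, 3), lru_faults(refs, 3), optimal_faults(refs, 3))
```

The optimal policy needs the whole future reference string, which is why it serves as a yardstick rather than something an OS can run; LRU approximates it by using the recent past as a predictor.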

  28. Page Replacement Algorithms - Alternatives • LRU Approximation – Second-Chance or Clock algorithm • Similar to FIFO, but when a page’s reference bit is 1, the page is given a second chance • Implemented using a circular queue • Counting-Based Page Replacement • LFU, MFU • Performance of page replacement depends on applications • Section 9.4.8
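A minimal sketch of the second-chance (clock) idea, assuming one reference bit per frame and a bit that is set when a page is loaded or referenced. Details such as the initial bit value vary between descriptions, so treat this as one possible variant rather than the definitive algorithm.

```python
def clock_faults(refs, frames):
    """Count page faults under a simple second-chance (clock) policy."""
    pages = [None] * frames     # circular buffer of resident pages
    ref_bit = [0] * frames      # one reference bit per frame
    hand, faults = 0, 0
    for page in refs:
        if page in pages:
            ref_bit[pages.index(page)] = 1      # referenced: earns a second chance
            continue
        faults += 1
        # Advance the hand, clearing reference bits, until a frame with bit 0 is found.
        while ref_bit[hand] == 1:
            ref_bit[hand] = 0
            hand = (hand + 1) % frames
        pages[hand] = page
        ref_bit[hand] = 1
        hand = (hand + 1) % frames
    return faults


print(clock_faults([1, 2, 3, 4, 2, 5, 2], 3))
```

In this hypothetical reference string, page 2 is referenced again just before page 5 arrives, so its reference bit saves it from eviction; plain FIFO would have evicted page 2 at that point and faulted on the final reference.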

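  29. Managing size of page table – TLB, multi-level tables • TLB: Translation Lookaside Buffer – a small and fast memory for storing page table entries • MMU need not fetch PTE from memory every time • TLB is a virtually addressed cache [Figure: TLB lookup in four numbered steps between the processor, the translation hardware and cache/memory; labels: VA, VPN, PTE, PA, data. The VPN part of the virtual address is split into a TLB tag (TLBT, bits n-1 to p+t) and a TLB index (TLBI, bits p+t-1 to p); the VPO occupies bits p-1 to 0]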

  30. Speeding up address translation • Translation Lookaside Buffer (TLB): memory in MMU that contains some page table entries that are likely to be needed soon • TLB Miss: required entry not available
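The effect of a TLB can be sketched as a small cache placed in front of the page table. The class below is a toy model under stated assumptions: it is fully associative with LRU replacement (real TLBs are often set-associative, indexed by the TLBI bits), and the page table is a plain dictionary from VPN to PPN.

```python
from collections import OrderedDict


class TLB:
    """Toy fully associative TLB with LRU replacement (both are assumptions)."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table    # VPN -> PPN, stands in for the in-memory table
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn):
        if vpn in self.entries:         # TLB hit: no page table access needed
            self.hits += 1
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        self.misses += 1                # TLB miss: fetch the PTE from memory
        ppn = self.page_table[vpn]
        if len(self.entries) == self.capacity:
            self.entries.popitem(last=False)
        self.entries[vpn] = ppn
        return ppn


tlb = TLB(2, {0: 7, 1: 3, 2: 9})
results = [tlb.lookup(v) for v in [0, 0, 1, 0, 2, 1]]
print(results, "hits:", tlb.hits, "misses:", tlb.misses)
```

Because of locality, a program tends to reuse the same few pages, so even a small TLB can satisfy most translations without a memory access for the PTE.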

  31. Multi Level Page Tables • A page table can occupy a significant amount of memory • Multi-level page tables reduce page table size • If a PTE in the level 1 PT is null, the corresponding level 2 PT does not exist • Only the level-1 PT needs to be in memory. Other PTs can be swapped in on demand. [Figure: a level 1 page table (PTE 0, PTE 1, PTE 2, …) whose non-null entries point to level 2 page tables; several level 1 entries are null]

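  32. Multi Level Page Tables (In general..) • Each VPN i is an index into a PT at level i • Each PTE in a level j PT points to the base of some PT at level (j+1) [Fig. 10.19 (Bryant): the virtual address (bits n-1 to 0) is split into VPN 1, VPN 2, …, VPN k and a VPO (bits p-1 to 0); the last-level PTE supplies the PPN; the physical address (bits m-1 to 0) is the PPN followed by the PPO (bits p-1 to 0)]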
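A sketch of a k-level page table walk, with nested dictionaries standing in for the tables and a missing key playing the role of a null PTE. The field widths (two 10-bit VPN fields and a 12-bit offset) and the mapping are hypothetical, and both function names are illustrative.

```python
def split_virtual_address(va, vpn_bits, p):
    """Split va into ([VPN 1, ..., VPN k], VPO).

    vpn_bits lists the width in bits of each VPN field, from level 1 to level k.
    """
    vpo = va & ((1 << p) - 1)
    rest = va >> p
    vpns = []
    for bits in reversed(vpn_bits):             # peel off VPN k first, VPN 1 last
        vpns.append(rest & ((1 << bits) - 1))
        rest >>= bits
    return list(reversed(vpns)), vpo


def walk(va, root, vpn_bits, p):
    """Follow VPN 1..k through nested tables; the level-k entry holds the PPN."""
    vpns, vpo = split_virtual_address(va, vpn_bits, p)
    node = root
    for vpn in vpns:
        node = node.get(vpn)
        if node is None:                        # null PTE: lower-level table absent
            raise LookupError("page fault: no mapping")
    return (node << p) | vpo


# Hypothetical two-level table: level-1 entry 1 -> level-2 table, entry 2 -> PPN 0x5.
root = {1: {2: 0x5}}
va = (1 << 22) | (2 << 12) | 0xABC
print(hex(walk(va, root, [10, 10], 12)))
```

Only one level-2 table exists in this example; every other level-1 entry is null, so no memory is spent on tables for unused parts of the address space.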

  33. Virtual Memory and Fork • During fork, a parent process creates a child process • Initially the pages of the parent process are shared with the child process • But the pages are marked as copy-on-write • When the parent or the child process writes to a page, a private copy of that page is created for the writer
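Copy-on-write can be illustrated with a toy model in which an address space is a list of references to page objects. This is a simulation under simplifying assumptions, not how a kernel implements it: there are no reference counts, so each side copies a shared page on its first write even if the other side has already copied it.

```python
class AddressSpace:
    """Toy address space: a list of references to pages (bytearray objects)."""

    def __init__(self, pages):
        self.pages = pages
        self.cow = [False] * len(pages)     # per-page copy-on-write flag

    def fork(self):
        """Create a child that shares every page; mark both sides copy-on-write."""
        self.cow = [True] * len(self.pages)
        child = AddressSpace(list(self.pages))   # copies references, not page data
        child.cow = [True] * len(self.pages)
        return child

    def write(self, page_no, offset, value):
        if self.cow[page_no]:
            # First write after fork: make a private copy of just this page.
            self.pages[page_no] = bytearray(self.pages[page_no])
            self.cow[page_no] = False
        self.pages[page_no][offset] = value


parent = AddressSpace([bytearray(4), bytearray(4)])
child = parent.fork()
child.write(0, 0, 9)
print(parent.pages[0][0], child.pages[0][0], child.pages[1] is parent.pages[1])
```

After the child's write, only page 0 has been duplicated; page 1 is still physically shared, which is what makes fork cheap when the child touches little memory.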

  34. Dynamic Memory Allocation • Allocator – Manages the heap region of the VM • Maintains a list of free blocks in memory and allocates these blocks when a process requests them • Can be explicit (malloc & free) or implicit (garbage collection) • Allocator’s twin goals • Maximizing throughput and maximizing utilization • Contradictory goals – why?

  35. Components of Memory Allocation • The heap is organized as a free list • When an allocation request is made, the list is searched for a free block • Search method or placement policy • First fit, best fit, worst fit • Coalescing – merging adjacent free blocks • Can be immediate or deferred • Garbage collection – to automatically free allocated blocks no longer needed by the program
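The three placement policies differ only in which sufficiently large free block they pick. A minimal sketch, assuming the free list is represented as a list of block sizes in address order; the sizes and the 212-byte request are illustrative values, not taken from the slides.

```python
def choose_block(free_blocks, request, policy="first"):
    """Return the index of the free block chosen for `request` bytes, or None.

    free_blocks is a list of free block sizes in address order (an assumed,
    simplified representation of the free list).
    """
    candidates = [(size, i) for i, size in enumerate(free_blocks) if size >= request]
    if not candidates:
        return None                                   # no single block is large enough
    if policy == "first":
        return min(candidates, key=lambda c: c[1])[1]  # lowest address that fits
    if policy == "best":
        return min(candidates)[1]                      # smallest block that fits
    if policy == "worst":
        return max(candidates, key=lambda c: (c[0], -c[1]))[1]  # largest block
    raise ValueError(f"unknown policy: {policy}")


free_blocks = [100, 500, 200, 300, 600]
for policy in ("first", "best", "worst"):
    index = choose_block(free_blocks, 212, policy)
    print(policy, "fit picks the", free_blocks[index], "byte block")
```

First fit stops at the earliest match and is fast, while best fit scans the whole list to leave the smallest remainder; this is one face of the throughput versus utilization trade-off.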

  36. Fragmentation • Cause of poor heap utilization • Unused memory is not available to satisfy allocation requests • Two types • Internal – when an allocated block is larger than the required payload • External – when there is enough total free memory to satisfy an allocation request, but no single free block is large enough

  37. Process Management

  38. Computer Organization: Software • Hardware resources of computer system are shared by programs in execution • Operating System: special program that manages this sharing • Ease-of-use, resource allocator, device controllers • Process: a program in execution • ps tells you the current status of processes • Shell: a command interpreter through which you interact with the computer system • csh, bash,…

  39. Operating System, Processes, Hardware [Figure: layers from top to bottom: processes, system calls, OS kernel, hardware]

  40. Operating System • Software that manages the resources of a computer system • CPU time • Main memory • I/O devices • OS functionalities • Process management • Memory management • Storage management

  41. Process Lifetime • Two modes • User mode – when executing on behalf of the user application • Kernel mode – when the user application requests an OS service or privileged instructions must be executed • Implemented using mode bits [Figure 1.10 (Silberschatz)]

  42. Modes • Can find out the total CPU time used by a process, as well as CPU time in user mode, CPU time in system mode
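One way to observe this split is Python's `os.times()`, which reports the user-mode and system-mode CPU time consumed by the current process. The workload below is an arbitrary illustration: a computation that runs mostly in user mode followed by many small writes that enter the kernel. Actual numbers depend on the machine.

```python
import os


def cpu_times():
    """Return user, system and total CPU seconds used so far by this process."""
    t = os.times()
    return {"user": t.user, "system": t.system, "total": t.user + t.system}


def busy_work(n=200_000):
    """Arbitrary workload: user-mode arithmetic, then many write system calls."""
    total = sum(i * i for i in range(n))
    with open(os.devnull, "wb") as sink:
        for _ in range(1000):
            sink.write(b"x")
            sink.flush()            # force a write system call each iteration
    return total


before = cpu_times()
busy_work()
after = cpu_times()
print("user   +%.3f s" % (after["user"] - before["user"]))
print("system +%.3f s" % (after["system"] - before["system"]))
```

On a Unix shell, the `time` command prefixed to a program reports the same user and system figures for a whole run.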

  43. Shell - What does a Shell do? • while (true) { • Prompt the user to type in a command • Read in the command • Understand what the command is asking for • Get the command executed • } • Shell – command interpreter • Shell interacts with the user and invokes system calls • Its functionality is to obtain and execute the next user command • Most of the commands deal with file operations – copy, list, execute, delete etc. • It loads the commands into memory and executes them • Q: What system calls are involved? write (prompt), read (read the command), fork and exec (execute the command)
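The loop on this slide can be sketched with Python's thin wrappers over the Unix system calls it names. This is a minimal sketch for Unix-like systems only: it has no pipes, redirection or built-in commands, the prompt string and the exit status 127 for a failed exec are conventions assumed here, and the function names are illustrative.

```python
import os
import shlex


def run_command(line):
    """Parse one command line, run it in a child process, return its exit status."""
    argv = shlex.split(line)                # "understand what the command is asking for"
    if not argv:
        return 0
    pid = os.fork()                         # create the child process
    if pid == 0:
        try:
            os.execvp(argv[0], argv)        # replace the child with the command
        except OSError:
            os._exit(127)                   # command not found or not executable
    _, status = os.waitpid(pid, 0)          # parent waits for the child to finish
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


def shell():
    """Prompt, read, execute, repeat until end of input."""
    while True:
        os.write(1, b"$ ")                  # prompt: write to standard output
        data = os.read(0, 4096)             # read the command from standard input
        if not data:                        # end of file (Ctrl-D) ends the shell
            break
        run_command(data.decode())
```

Calling `shell()` starts the interactive loop. The parent blocks in `waitpid` while the command runs, which is why a simple shell shows the next prompt only after the previous command has finished.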

  44. System Calls • How a process gets the operating system to do something for it; an interface or API for interaction with the operating system • Examples • File manipulation: open, close, read, write,… • Process management: fork, exec, exit,… • Memory management: sbrk,… • Device manipulation: ioctl, read, write • Information maintenance: date, getpid • Communications: pipe, shmget, mmap • Protection: chmod, chown • When a process is executing in a system call, it is actually executing operating system code • System calls allow transition between modes
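A few of the calls listed here can be exercised through Python's `os` module, whose functions are thin wrappers over the corresponding system calls. The file name and message below are arbitrary, and the function name is illustrative.

```python
import os
import tempfile


def syscall_demo(message=b"hello from write()\n"):
    """Use open, write, lseek, read, close and getpid on a scratch file."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "demo.txt")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)   # file manipulation: open
        try:
            os.write(fd, message)                            # write
            os.lseek(fd, 0, os.SEEK_SET)                     # rewind to the start
            data = os.read(fd, len(message))                 # read
        finally:
            os.close(fd)                                     # close
    return os.getpid(), data                                 # information: getpid


pid, data = syscall_demo()
print("process", pid, "read back", data)
```

Each of these calls switches the processor into kernel mode for the duration of the request and back to user mode on return.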

  45. Mechanics of System Calls • A process must be allowed to do sensitive operations while it is executing a system call • Requires hardware support • Processor hardware is designed to operate in at least 2 modes of execution • Ordinary, user mode • Privileged, system mode • A system call is entered using a special machine instruction (e.g. MIPS 1 syscall) that switches the processor mode to system before control transfer • System calls are used all the time • Accepting user’s input from keyboard, printing to console, opening files, reading from and writing to files

  46. System Call Implementation • Implemented as a trap to a specific location in the interrupt vector (the trapping instruction identifies the requested service; additional information is passed in registers) • Trap executed by the syscall instruction • Control passes to a specific service routine • System calls are usually not called directly: there is a mapping between an API function and a system call • The system call interface intercepts calls in the API, looks up a table of system call numbers, and invokes the system calls

  47. Figure 2.9 (Silberschatz)

  48. Traditional UNIX System Structure
