CPS 210 Unix and Beyond Jeff Chase Duke University http://www.cs.duke.edu/~chase/cps210
“Just make it” • To get started on the heap manager, download the files and type “make”. • Provides a script to build the heap manager test programs on Linux or MacOS. • This lab is just a taste of system programming in C. • The classic text is CS:APP (http://csapp.cs.cmu.edu). • Also see the PDF “What every computer systems student should know about computers” on the course website. • You may think of it as notes from CS:APP. It covers background from Computer Architecture and also some material for this class.
64 bytes: 3 ways [Figure: the same region of memory starting at p, viewed as char p[] (char *p), as int p[] (int *p), and as char *p[] (char **p), with offsets labeled from 0x0 to 0x1f.] Pointers (addresses) are 8 bytes on a 64-bit machine.
Alignment [Figure: the same three views of the region at p, with misaligned positions marked X.] The machine requires that an n-byte value is aligned on an n-byte boundary, where n = 2^i.
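The three views of a 64-byte region and the alignment rule can be checked directly in C. This is a minimal sketch, assuming a typical 64-bit Linux or MacOS machine with 4-byte int and 8-byte pointers; the names is_aligned, count_chars, count_ints, and count_pointers are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

enum { REGION = 64 };   /* size in bytes of the region being viewed */

/* Hypothetical helper: is address p a multiple of n, where n is a power of two? */
int is_aligned(const void *p, size_t n) {
    return ((uintptr_t)p & (n - 1)) == 0;
}

/* How many elements fit when the region is viewed as each array type. */
size_t count_chars(void)    { return REGION / sizeof(char); }    /* char p[]  */
size_t count_ints(void)     { return REGION / sizeof(int); }     /* int p[]   */
size_t count_pointers(void) { return REGION / sizeof(char *); }  /* char *p[] */
```

Under those assumptions the same 64 bytes hold 64 chars, 16 ints, or 8 pointers, and the address of any int variable passes is_aligned(&x, sizeof x).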
Heap allocation The heap is a contiguous chunk of memory obtained from the OS kernel, e.g., with the Unix sbrk() system call. A runtime library obtains the chunk and manages it as a “heap” for use by the programming language environment, to store dynamic objects, e.g., with the Unix malloc and free library calls. The library carves the chunk into allocated heap blocks for structs or objects. Align!
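A toy allocator makes the "Align!" reminder concrete. This sketch is not the lab's heap manager: it assumes 8-byte alignment, uses a static array as a stand-in for the region a real library would get from sbrk(), and never frees or reuses blocks. The names align_up and toy_alloc are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ALIGNMENT   8      /* assumed: 8-byte alignment, a power of two */
#define REGION_SIZE 4096   /* assumed size of the toy region */

/* Stand-in for the contiguous chunk a real heap manager obtains with sbrk().
 * Declared as uint64_t so the region itself starts on an 8-byte boundary. */
static uint64_t region[REGION_SIZE / sizeof(uint64_t)];
static size_t next_free = 0;   /* offset of the first unused byte */

/* Round n up to the next multiple of ALIGNMENT. */
size_t align_up(size_t n) {
    return (n + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
}

/* Hand out aligned blocks in order; returns NULL when the region is full. */
void *toy_alloc(size_t nbytes) {
    if (nbytes > REGION_SIZE)
        return NULL;                      /* also avoids overflow in align_up */
    size_t size = align_up(nbytes);
    if (size > REGION_SIZE - next_free)
        return NULL;                      /* region exhausted */
    void *block = (unsigned char *)region + next_free;
    next_free += size;
    return block;
}
```

Two consecutive 5-byte requests come back 8 bytes apart, so 3 bytes of each block are padding. A real heap manager adds headers and a free list on top of this.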
Variable Partitioning Variable partitioning is the strategy of parking differently sized cars along a street with no marked parking space dividers. Wasted space: external fragmentation.
Alternative: block maps The storage in a heap block is contiguous in the VAS. C and other PL environments require this. That complicates the heap manager because the heap blocks may be different sizes. Idea: use a level of indirection through a map to assemble a storage object from “scraps” of storage in different locations. The “scraps” can be fixed-size slots: that makes allocation easy because they are interchangeable. Example: page tables that implement a VAS.
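The indirection through a map can be sketched in a few lines of C. The slot size, the map length, and the names block_map and translate are assumptions chosen for illustration; real page tables and inodes are more elaborate.

```c
#include <stddef.h>

#define SLOT_SIZE   4096            /* assumed fixed slot ("scrap") size in bytes */
#define MAP_ENTRIES 4               /* assumed number of logical blocks */
#define UNMAPPED    ((size_t)-1)    /* marks a logical block with no slot */

/* Hypothetical block map: logical block number -> physical slot number. */
typedef struct {
    size_t slot[MAP_ENTRIES];
} block_map;

/* Translate a logical byte offset into a physical byte offset through the map.
 * Returns UNMAPPED if the offset is out of range or its block has no slot. */
size_t translate(const block_map *map, size_t logical_offset) {
    size_t block  = logical_offset / SLOT_SIZE;   /* which logical block */
    size_t within = logical_offset % SLOT_SIZE;   /* offset inside that block */
    if (block >= MAP_ENTRIES || map->slot[block] == UNMAPPED)
        return UNMAPPED;
    return map->slot[block] * SLOT_SIZE + within;
}
```

The logical blocks are contiguous from the user's point of view, while the slots they map to (7, 2, 9, ...) can sit anywhere, which is why any free slot will do when the object grows.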
Fixed Partitioning Wasted space: internal fragmentation.
Post-note • We took much of the class talking about some general issues for naming, illustrated in Unix. • Block maps and other indexed maps are a common structure to implement “machine” name spaces: • sequences of logical blocks, e.g., virtual address spaces, files • process IDs, etc. • For sparse block spaces we may use a tree hierarchy of block maps (e.g., inode maps or 2-level page tables, later). • Storage system software is full of these maps. • Symbolic name spaces use different kinds of maps. • They are sparse and require matching, which is more expensive. • Trees of maps create nested namespaces, e.g., the file tree.
Files: hierarchical name space [Figure: a file tree with the root directory at the top, and labeled subtrees for applications etc., a user home directory, and a mount point for an external media volume or network storage.]
File I/O Pathnames are translated through the directory tree, starting at the root directory or current directory.
    /* needs <fcntl.h>, <stdio.h>, <stdlib.h>, <unistd.h> */
    char buf[BUFSIZE];
    int fd;
    ssize_t n;

    if ((fd = open("../zot", O_TRUNC | O_RDWR)) == -1) {
        perror("open failed");
        exit(1);
    }
    /* copy standard input to the file, writing only the bytes actually read */
    while ((n = read(0, buf, BUFSIZE)) > 0) {
        if (write(fd, buf, n) != n) {
            perror("write failed");
            exit(1);
        }
    }
    if (n == -1) {
        perror("read failed");
        exit(1);
    }
Every system call should check for errors and handle them appropriately. The file grows as the process writes to it, so the system must allocate space dynamically. The system finds the physical disk locations of the file’s logical blocks by indexing a block map (the file’s index node or “inode”).
A filesystem on disk [Figure: inode 0 (the allocation bitmap file) and inode 1 (the root directory) are at fixed locations on disk. The bitmap file's blocks hold the allocation bitmap. The root directory's blocks hold the entries wind: 18, snow: 62, rain: 32, hail: 48, plus empty slots marked 0. A regular file (inode) points to file blocks holding the text fragments “once upo”, “n a time”, “/n in a l”, “and far”, “far away”, “, lived th”.] This is a toy example (Nachos).
Names and layers • User view: notes in notebook file • Application: notefile, fd, byte range* • File System: fd, bytes, block # • Disk Subsystem: device, block #; surface, cylinder, sector • Add more layers as needed.
Directories [Figure: a directory inode whose block holds the entries wind: 18, snow: 62, rain: 32, hail: 48, plus slots marked 0.] A creat operation must scan the directory to ensure that creates are exclusive. There can be no duplicate names: the name mapping is a function. Entries or free slots are typically found by a linear scan. Note: implementations vary. Large directories are problematic.
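The linear scan and the exclusive-create check can be sketched as follows. The entry layout, the slot count, and the names dir_entry, dir_lookup, and dir_add are hypothetical, and treating inode number 0 as "free slot" is an assumption suggested by the 0 entries in the figure.

```c
#include <string.h>

#define NAME_MAX_LEN 15   /* assumed maximum name length */
#define DIR_SLOTS    8    /* assumed number of slots in the directory block */

/* Hypothetical toy directory entry; inode 0 marks a free slot. */
typedef struct {
    char name[NAME_MAX_LEN + 1];
    int inode;
} dir_entry;

/* Linear scan for name; returns the slot index, or -1 if the name is absent. */
int dir_lookup(const dir_entry *dir, const char *name) {
    for (int i = 0; i < DIR_SLOTS; i++)
        if (dir[i].inode != 0 && strcmp(dir[i].name, name) == 0)
            return i;
    return -1;
}

/* Exclusive create: fails (-1) if the name already exists, the name is too
 * long, the inode number is not positive, or no slot is free. */
int dir_add(dir_entry *dir, const char *name, int inode) {
    if (inode <= 0 || strlen(name) > NAME_MAX_LEN)
        return -1;
    if (dir_lookup(dir, name) != -1)
        return -1;                        /* duplicate: the mapping is a function */
    for (int i = 0; i < DIR_SLOTS; i++) {
        if (dir[i].inode == 0) {          /* first free slot */
            strcpy(dir[i].name, name);
            dir[i].inode = inode;
            return i;
        }
    }
    return -1;
}
```

Both operations cost time proportional to the number of slots, which is one reason large directories are problematic with this layout.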
Operations on Directories (UNIX) • Link - make entry pointing to file • Unlink - remove entry pointing to file • Rename • Mkdir - create a directory • Rmdir - remove a directory
Links [Figure: the directories usr, Lynn, and Marty and the files foo and bar, annotated with the operations creat foo, creat bar, ln /usr/Lynn/foo bar, ln -s /usr/Marty/bar bar, unlink foo, and unlink bar.]
Unix File Naming (Hard Links) A Unix file may have multiple names. Each directory entry naming the file is called a hard link. Each inode contains a reference count showing how many hard links name it. [Figure: the entries hail: 48 in directory A and sleet: 48 in directory B both name inode 48, whose link count = 2; other entries shown are wind: 18 and rain: 32.]
link system call: link(existing name, new name)
    create a new name for an existing file
    increment inode link count
unlink system call (“remove”): unlink(name)
    destroy directory entry
    decrement inode link count
    if count == 0 and file is not in active use
        free blocks (recursively) and on-disk inode
Illustrates: garbage collection by reference counting.
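The link count can be watched from user space with stat(). The sketch below walks the create, link, unlink sequence on two scratch pathnames supplied by the caller; link_count and hard_link_demo are hypothetical helper names, and both paths are assumed to be on the same file system and not to exist beforehand.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the number of hard links naming path, or -1 on error. */
long link_count(const char *path) {
    struct stat st;
    if (stat(path, &st) == -1)
        return -1;
    return (long)st.st_nlink;
}

/* counts[0..2] receive the link count after create, after link, and after
 * unlinking the first name. Returns 0 on success, -1 if any call fails. */
int hard_link_demo(const char *a, const char *b, long counts[3]) {
    int fd = open(a, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd == -1)
        return -1;
    close(fd);
    counts[0] = link_count(a);            /* one name */
    if (link(a, b) == -1) {
        unlink(a);
        return -1;
    }
    counts[1] = link_count(a);            /* two names, same inode */
    if (unlink(a) == -1)
        return -1;
    counts[2] = link_count(b);            /* file survives under the second name */
    return unlink(b);                     /* last link gone: storage can be reclaimed */
}
```

On a typical Unix file system the three counts are 1, 2, 1, and the first name no longer resolves after its unlink.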
Unix Symbolic (Soft) Links A soft link is a file containing a pathname of some other file. [Figure: the entry hail: 48 in directory A names inode 48 (link count = 1); the entry sleet: 67 in directory B names inode 67, a symlink file whose contents are the pathname “../A/hail\0”; other entries shown are wind: 18 and rain: 32.]
symlink system call: symlink(existing name, new name)
    allocate a new file (inode) with type symlink
    initialize file contents with existing name
    create directory entry for new file with new name
The target of the link may be removed at any time, leaving a dangling reference. How should the kernel handle recursive soft links?
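A dangling soft link is easy to produce and detect with the POSIX calls symlink(), lstat(), and stat(). This is a small sketch with hypothetical helper names (is_dangling, make_dangling); the caller supplies two scratch pathnames that are assumed not to exist yet.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if path is a symlink whose target cannot be resolved,
 * 0 if path exists and is not dangling, -1 if path itself cannot be examined. */
int is_dangling(const char *path) {
    struct stat st;
    if (lstat(path, &st) == -1)           /* lstat examines the link itself */
        return -1;
    if (!S_ISLNK(st.st_mode))
        return 0;
    return stat(path, &st) == -1 ? 1 : 0; /* stat follows the link */
}

/* Create target, make a soft link to it, then remove the target.
 * Returns is_dangling(linkpath) afterwards, or -1 if a step fails. */
int make_dangling(const char *target, const char *linkpath) {
    int fd = open(target, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd == -1)
        return -1;
    close(fd);
    if (symlink(target, linkpath) == -1) {
        unlink(target);
        return -1;
    }
    if (unlink(target) == -1)             /* the link is not consulted or updated */
        return -1;
    return is_dangling(linkpath);
}
```

Unlike a hard link, the soft link does not contribute to the target's link count, so nothing stops the target from being reclaimed. On the recursion question, POSIX systems bound the number of links followed in one lookup and fail with ELOOP beyond that.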
Concepts • Reference counting and reclamation • Redirection/indirection • Dangling reference • Binding time (create time vs. resolve time) • Referential integrity
Processes and the kernel Programs run as independent processes. Each process has a private virtual address space and one thread. Threads enter the kernel for OS services through protected system calls, and the kernel may make upcalls (e.g., signals) into a process. The protected OS kernel mediates access to shared resources. The kernel is a separate component/context with enforced modularity. The kernel syscall interface supports processes, files, pipes, and signals.
GS4. Layered systems Garlan and Shaw, An Introduction to Software Architecture, 1994.
Processes: A Closer Look [Figure: a process = virtual address space + thread (with its stack) + process descriptor (PCB) holding the user ID, process ID, parent PID, sibling links, children, and resources.] Each process has a thread bound to the VAS. The thread has a stack addressable through the VAS. The kernel can suspend/restart the thread wherever and whenever it wants. The OS maintains some state for each process in the kernel’s internal data structures: a file descriptor table, links to maintain the process tree, and a place to store the exit status. The address space is a private name space for a set of memory segments used by the process. The kernel must initialize the process memory for the program to run.
VAS example (32-bit) • An addressable array of bytes… • Containing every instruction the process thread can execute… • And every piece of data those instructions can read/write… • i.e., read/write == load/store • Partitioned into logical segments with distinct purpose and use. • Every memory reference by a thread is interpreted in its VAS context. • Resolve to a location in machine memory • A given address in different VAS may resolve to different locations. [Figure: the segments from address 0x0 up to 0x7fffffff: text (code), static data, dynamic data (heap/BSS), stack, reserved.]
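A program can print one address from each of its own segments to see the partitioning. This is only a sketch: the function name print_segments and the variable names are made up, the addresses differ between runs and systems, and printing a function address with %p relies on POSIX behavior rather than strict ISO C.

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* lives in static data */
int zeroed_global;             /* lives in BSS */

/* Print a representative address from each segment of this process's VAS.
 * Returns 0 on success, -1 if the heap allocation fails. */
int print_segments(FILE *out) {
    int on_stack = 0;                          /* local variable: stack */
    int *on_heap = malloc(sizeof *on_heap);    /* dynamic data: heap */
    if (on_heap == NULL)
        return -1;
    fprintf(out, "text   %p\n", (void *)print_segments);
    fprintf(out, "data   %p\n", (void *)&initialized_global);
    fprintf(out, "bss    %p\n", (void *)&zeroed_global);
    fprintf(out, "heap   %p\n", (void *)on_heap);
    fprintf(out, "stack  %p\n", (void *)&on_stack);
    free(on_heap);
    return 0;
}
```

Calling print_segments(stdout) in two different processes may print the same virtual addresses even though they resolve to different locations in machine memory.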
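A Peek Inside a Running Program [Figure: the CPU registers (R0 through Rn, PC, SP) hold values x and y that refer into the address space (virtual or physical). From address 0 to high “memory”, the address space holds your program's code, the common runtime library, your data, the heap, and the stack.]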
Unix File Descriptors Illustrated [Figure: in user space, a process; in the kernel, that process's file descriptor table, whose entries point into the open file table, which refers to objects such as a file, a pipe, a socket, or a tty.] • Processes may share open files (“objects”), but the binding of file descriptors to objects is specific to each process. • e.g., see the dup system call Disclaimer: this drawing is oversimplified.
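The dup system call mentioned above gives a second descriptor bound to the same open file, so both descriptors share one file offset. A minimal sketch, with the hypothetical function name shared_offset_demo and a scratch pathname supplied by the caller:

```c
#include <fcntl.h>
#include <unistd.h>

/* Open path, duplicate the descriptor, write through each descriptor, and
 * return the file offset as seen through the original. Returns -1 on error. */
off_t shared_offset_demo(const char *path) {
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd == -1)
        return -1;
    int copy = dup(fd);                   /* new descriptor, same open file */
    if (copy == -1) {
        close(fd);
        return -1;
    }
    off_t pos = -1;
    if (write(fd, "abc", 3) == 3 && write(copy, "de", 2) == 2)
        pos = lseek(fd, 0, SEEK_CUR);     /* both writes advanced one shared offset */
    close(copy);
    close(fd);
    return pos;
}
```

The returned offset is 5: the second write, made through the copy, continued where the first one stopped. Two separate open() calls on the same path would instead give two independent offsets.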
Networking Some IPC mechanisms allow communication across a network. E.g.: sockets using Internet communication protocols (TCP/IP). Each endpoint on a node (host) has a port number. Each node has one or more interfaces, each on at most one network. Each interface may be reachable on its network by one or more names. E.g. an IP address and an (optional) DNS name. Operations: advertise (bind), listen, connect (bind), close, write/send, read/receive. [Figure: an endpoint with a port on node A is bound to a channel (connection) that reaches an endpoint on node B.]
What is a distributed system? "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." -- Leslie Lamport
GS6. Interpreter Garlan and Shaw, An Introduction to Software Architecture, 1994.
Interpreter: example An interpreter controls how a program executes and what it sees. An interpreter can “sandbox” a program for isolation.
Threads: a familiar metaphor Page links and back button navigate a “stack” of pages in each tab. Each tab has its own stack. One tab is active at any given time. You create/destroy tabs as needed. You switch between tabs at your whim. Similarly, each thread has a separate stack. The OS switches between threads at its whim. One thread is active per CPU core at any given time.
Fork • Child can’t be an exact copy • Is distinguished by one variable (the return value of fork)
    if (fork() == 0) {
        /* child */
        execute new program
    } else {
        /* parent */
        carry on
    }
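A runnable variant of the fork pattern is sketched below. To keep it self-contained, the child exits with a chosen status instead of executing a new program, and the parent waits for it; fork_and_wait is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with child_status. The parent waits
 * and returns the exit status it collects, or -1 on error. */
int fork_and_wait(int child_status) {
    pid_t pid = fork();
    if (pid == -1)
        return -1;                        /* fork failed: no child was created */
    if (pid == 0)
        _exit(child_status);              /* child: fork returned 0 */

    /* parent: fork returned the child's process ID */
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, fork_and_wait(7) returns 7 in the parent. To run a new program as on the slide, the child would call one of the exec functions where this sketch calls _exit.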
Post-note: understand garbage collection • Garbage collection: the language runtime system calls the underlying heap manager to free unused heap blocks automatically; the program itself does not have to do it. • Java does it for you, but C does not. • A heap block is “garbage” only when there are no references to the block, e.g., no pointers to the object that lives in that block. • A reference is a stored name. The garbage collector counts these references, and marks a block as garbage when all references to it are gone. To do that it must find/identify all stored references. • Java knows the types of all of a program’s data objects, so it can find stored references and identify their targets. • A language that supports garbage collection may also move objects around to compact the heap to reduce fragmentation. • Weakly typed languages like C cannot do this for you. Q: can a file system garbage collect or compact stored data on disk?
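C will not count references for you, but a program can do it by hand. The sketch below is a manual reference-counting scheme, not a garbage collector: the program must call obj_retain and obj_release itself at every point where a reference is stored or dropped. All names (object, obj_new, obj_retain, obj_release, live_objects) are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical reference-counted heap object. */
typedef struct {
    int refs;    /* number of stored references to this block */
    int value;
} object;

int live_objects = 0;   /* for illustration: blocks allocated and not yet freed */

/* Allocate an object; the caller holds the first reference. */
object *obj_new(int value) {
    object *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->refs = 1;
    o->value = value;
    live_objects++;
    return o;
}

/* Record one more stored reference. */
object *obj_retain(object *o) {
    o->refs++;
    return o;
}

/* Drop one reference; free the block when the last one is gone. */
void obj_release(object *o) {
    if (--o->refs == 0) {                 /* no references left: block is garbage */
        live_objects--;
        free(o);
    }
}
```

The block stays allocated while any holder still has a reference and is freed exactly once, at the last release. A missed release leaks the block and an extra one frees it too early, which is the bookkeeping a garbage-collected runtime takes over.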
Post-note • Next slide gives more detail on fork/exit. • We will discuss kernel protection and kernel entry and exit more later.
Mode Changes for Fork/Exit • Syscall traps and “returns” are not always paired. • Fork “returns” (to child) from a trap that “never happened”. • Exit system call trap never returns. • System may switch processes between trap and return. Exec enters the child by doctoring up a saved user context to “return” through. [Figure: timelines for parent and child. The parent makes a Fork call and gets a Fork return, then makes a Wait call and gets a Wait return; the child starts at Fork entry to user space and ends with an Exit call. Each transition is marked as user to kernel mode (callsys) or kernel to user mode (retsys).]
Example: System Call Traps • Programs in C, C++, etc. invoke system calls by linking to a standard library of procedures written in assembly language. • The library defines a stub or wrapper routine for each syscall. • The stub executes a special trap instruction (e.g., chmk or callsys or int). • Syscall arguments/results are passed in registers or on the user stack.
read() in the Unix libc.a library for the Alpha CPU architecture (executes in user mode):
    #define SYSCALL_READ 27          # op ID for a read system call
    move arg0…argn, a0…an            # syscall args in registers A0..AN
    move SYSCALL_READ, v0            # syscall dispatch index in V0
    callsys                          # kernel trap
    move r1, _errno                  # errno = return status
    return
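On Linux the same dispatch-by-number idea is visible from C through the generic syscall() library function, which takes the system call number as its first argument. This sketch is Linux-specific and assumes glibc's default feature macros; raw_getpid is a hypothetical name.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke getpid through the generic syscall() entry point instead of its
 * named libc stub. SYS_getpid is the dispatch index for this system call. */
long raw_getpid(void) {
    return syscall(SYS_getpid);
}
```

The result matches what the ordinary getpid() stub returns, since both end in the same kernel trap.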
Representing a File On Disk [Figure: an “inode” holds the file attributes and a block map. The block map is indexed by logical block number; its entries point to logical block 0 (“once upo”, “n a time”, “/nin a l”), logical block 1 (“and far”, “far away”, “,/nlived t”), and logical block 2 (“he wise”, “and sage”, “wizard.”).] File attributes may include owner, access control list, time of create/modify/access, etc. Physical block pointers in the block map are sector IDs or physical block numbers.
Post-note • The following slides were presented in the next class (on Android) as intro to motivate Android. • Android keeps the Unix (Linux) kernel, but replaces the entire application framework. • Shell is gone. App execution is controlled by trusted system-wide server process, which is part of the system TCB. • Pipes are gone. Apps interact through system events (intents) and service bindings (binder RPC). • There is only one user, but each app has its own userID. • Each app has at most one instance, with its private files. • Terminals are gone: user opens screens (activities) to interact with apps. The system keeps an activity stack with a “back” button. • foreground and background activities? • System launches app components and reclaims them at suitable times. They don’t “exit”.
Unix, looking backward: UI+IPC • Conceived around keystrokes and byte streams • User-visible environment is centered on a text-based command shell. • Limited view of how programs interact • files: byte streams in a shared name space • pipes: byte streams between pairs of sibling processes
Unix, looking backward: upcalls • Limited view of how programs interact with the OS. • The kernel directs control flow into the user process at a fixed entry point: e.g., entry for exec() is _crt0 or “main”. • Process may also register signal handlers for events relating to the process, (generally) signalled by the kernel. • Process lives until it exits voluntarily or fails • “receives an unhandled signal that is fatal by default”.
X Windows (1985) • Big change: GUI. • Windows • Window server • App events • Widget toolkit
Unix, looking backward: security • Presumes multiple users sharing a machine. • Each user has a userID. • UserID owns all files created by all programs user runs. • Any program can access any file owned by userID. • Each user trusts all programs it chooses to run. • We “deputize” every program. • Some deputies get confused. • Result: decades of confused deputy security problems. • Contrary view: give programs the privileges they need, and nothing more. • Principle of Least Privilege