Advanced Operating Systems

Advanced Operating Systems Lecture 14: Distributed File System University of Tehran Dept. of EE and Computer Engineering By: Dr. Nasser Yazdani Distributed Operating Systems

How to design a system • How to share data in a distributed system (DS) • References • Chapter 10 of the textbook • “Google file system”

Outline • What is a file? General problems • Network File System (NFS) • Andrew File System (AFS) • Google File System

What Is a File? • A file is a collection of data organized by the user. Its contents are not necessarily meaningful to the operating system. • A file system is responsible for managing files, typically on persistent storage. • Name files in meaningful ways. • Access files: create, destroy, read, write, etc. • Physical allocation. • Security and protection. • Resource administration: quotas, priorities. • What does a DFS do? The same things, in a distributed environment. • Transparency is important here.

Why a Distributed File System? • More storage than can fit on a single system • More fault tolerance than can be achieved if "all of the eggs are in one basket." • The user is "distributed" and needs to access the file system from many places

How to Build a DFS? • Grafting a name space into the tree • Mounting, e.g. via entries in "/etc/vfstab". Solaris can also bind remote directories to local mount points "on demand" through the "automounter". This allows files to be spread out across multiple servers and/or replicated. AFS implements read-only replication. Coda supports read-write replication. • Implementing operations • Typically done via the virtual file system (VFS) interface.

How to Build a DFS (2)? • Unit of transit • How much do we move? Whole files or blocks. • Andrew File System (AFS) versions 1 and 2 and Coda use whole-file semantics. NFS and AFS version 3 implement block-level semantics. None of them implements byte-level semantics. Whole-file transfer makes opening and closing a file slow and uses the cache less efficiently. • Reads, writes and coherency • If multiple users write to the same file, the final result is the file as seen by one of them, not a combination of both. • In UNIX, files are not protected from conflicting writes.

How to Build a DFS (3)? • Caching and server state • Latency is high, so we need caching. What happens if two different users want to read the same file? What happens if one of them writes to the file? How do the clients holding cached copies find out? • Ostrich principle: live with the inconsistency if it occurs. • Periodically validate the cache by checking with the server: “checksum [blah]?” or “timestamp [blah]?” • Keep track of the users by issuing a callback promise to each client as it fetches a file/block. The callback-based approach is optimistic and saves network overhead, but it complicates the server and reduces its robustness.
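
The "timestamp [blah]?" validation step can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `ToyFileServer`, `ValidatingClient`, and the logical clock used as a modification time are hypothetical names invented for illustration, not part of NFS, AFS, or Coda.

```python
class ToyFileServer:
    """Hypothetical server: stores data plus a logical modification time."""

    def __init__(self):
        self.files = {}   # name -> (data, mtime)
        self.clock = 0    # logical clock standing in for a real timestamp

    def write(self, name, data):
        self.clock += 1
        self.files[name] = (data, self.clock)

    def getattr(self, name):
        # Cheap call: returns only the modification time.
        return self.files[name][1]

    def read(self, name):
        # Expensive call: returns the data together with its mtime.
        return self.files[name]


class ValidatingClient:
    """Client cache that asks the server for the timestamp before each use."""

    def __init__(self, server):
        self.server = server
        self.cache = {}    # name -> (data, mtime)
        self.fetches = 0   # number of full transfers, for illustration

    def read(self, name):
        cached = self.cache.get(name)
        if cached is not None and cached[1] == self.server.getattr(name):
            return cached[0]               # cached copy is still valid
        self.fetches += 1
        self.cache[name] = self.server.read(name)
        return self.cache[name][0]
```

The client still pays one small round trip per read to validate, which is the overhead the callback-based approach tries to avoid.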

File Sharing Semantics • Unix semantics • Read after write returns value written • System enforces absolute time ordering on all operations • Always returns most recent value • Changes immediately visible to all processes • Difficult to enforce in distributed file systems unless all accesses occur at the server (with no client caching) • Session semantics • Local changes only visible to process that opened file • File close => changes made visible to all processes • Allows local caching of file at client • Two nearly simultaneous file closes => one overwrites other?
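
Session semantics, including the "one close overwrites the other" case, can be sketched as follows. The classes `SessionServer` and `SessionFile` are hypothetical and model whole-file copies in memory; real systems differ in many details.

```python
class SessionServer:
    """Hypothetical server holding the globally visible file contents."""

    def __init__(self):
        self.files = {}   # name -> bytes


class SessionFile:
    """An open file under session semantics (illustrative model only)."""

    def __init__(self, server, name):
        self.server = server
        self.name = name
        # Opening takes a private copy; later writes touch only this copy.
        self.local = bytearray(server.files.get(name, b""))

    def write(self, data):
        self.local += data

    def read(self):
        return bytes(self.local)

    def close(self):
        # Changes become visible to everyone only at close time.
        self.server.files[self.name] = bytes(self.local)
```

If two sessions open the same file, write, and close one after the other, the server ends up with the copy that was closed last; the earlier writer's changes are silently lost.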

Other File Sharing Semantics • Immutable files • Create/delete only; no modifications allowed • Delete file in use by another process • Atomic transactions • Access to files protected by transactions • Serializable access, costly to implement

NFS • Network File System • Provide distributed filing by remote access • With a high degree of transparency • Method of providing highly transparent access to remote files • Developed by Sun

NFS Characteristics • Volume-level access • RPC-based • Stateless remote file access • Uses XDR for transferring files • Location (not name) transparent • Implementation for many systems • All interoperate, even non-Unix ones • Currently based on VFS

VFS/Vnode Review • VFS—Virtual File System • Common interface allowing multiple file system implementations on one system • Plugged in below user level • Files represented by vnodes

NFS Diagram • [Figure: directory trees of an NFS client and an NFS server; labels shown: /, /tmp, /mnt, /home, /bin, x, y, foo, bar]

File Handles • On the client side, files are represented by vnodes • The client NFS implementation internally represents remote files as handles • Opaque to client • But meaningful to server • To name remote file, provide handle to server

NFS Architecture (1) • The remote access model. • The upload/download model

NFS Architecture (2) • The basic NFS architecture for UNIX systems.

Communication • Reading data from a file in NFS version 3. • Reading data using a compound procedure in version 4.

NFS Handle Diagram • [Figure: client side: user process, file descriptor, vnode (VFS level), handle (NFS level); server side: NFS server, handle, vnode (VFS level), inode (UFS)]

How to make this work? • Could integrate it into the kernel • Non-portable, non-distributable • Instead, use existing features to do the work • VFS for common interface • RPC for data transport

Using RPC for NFS • Must have some process at server that answers the RPC requests • Continuously running daemon process • Somehow, must perform mounts over machine boundaries • A second daemon process for this

NFS Processes • nfsd daemons—server daemons that accept RPC calls for NFS • rpc.mountd daemons—server daemons that handle mount requests • biod daemons—optional client daemons that can improve performance

NFS from the Client’s Side • User issues a normal file operation • Like read() • Passes through vnode interface to client-side NFS implementation • Client-side NFS implementation formats and sends an RPC packet to perform operation • Single client blocks until NFS RPC returns

NFS RPC Procedures • 16 RPC procedures to implement NFS • Some for files, some for file systems • Including directory ops, link ops, read, write, etc. • Lookup() is the key operation • Because it fetches handles • Other NFS file operations use the handle
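
Why Lookup() is the key operation becomes clearer in a toy model where a path is resolved one component at a time, each step turning a directory handle and a name into a new handle. Everything here (`ToyNFSServer`, integer handles, `resolve`) is a hypothetical illustration, not the real NFS procedure set or wire format.

```python
import itertools


class ToyNFSServer:
    """Hypothetical server; handles are plain integers the client never interprets."""

    def __init__(self):
        self._next = itertools.count(1)
        self.root = next(self._next)
        self.dirs = {self.root: {}}   # directory handle -> {name: handle}
        self.data = {}                # file handle -> bytes

    def mkdir(self, parent, name):
        handle = next(self._next)
        self.dirs[parent][name] = handle
        self.dirs[handle] = {}
        return handle

    def create(self, parent, name, content):
        handle = next(self._next)
        self.dirs[parent][name] = handle
        self.data[handle] = content
        return handle

    def lookup(self, dir_handle, name):
        # The only way for a client to obtain a new handle.
        return self.dirs[dir_handle][name]

    def read(self, handle, offset, count):
        # Other operations take a handle, never a path.
        return self.data[handle][offset:offset + count]


def resolve(server, path):
    """Client-side path walk: one lookup call per path component."""
    handle = server.root
    for component in (c for c in path.split("/") if c):
        handle = server.lookup(handle, component)
    return handle
```

For example, resolving "/home/foo" costs two lookup calls, after which reads are addressed purely by the returned handle.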

Naming (1) • Mounting (part of) a remote file system in NFS.

Naming (2) • Mounting nested directories from multiple servers in NFS.

Automounting (1) • A simple automounter for NFS.

Automounting (2) • Using symbolic links with automounting.

File Attributes (1) • Some general mandatory file attributes in NFS. • NFS is modeled on Unix-like file systems • Implementing NFS on other file systems (Windows) is difficult • NFS v4 enhances compatibility by using mandatory and recommended attributes

File Attributes (2) • Some general recommended file attributes.

Semantics of File Sharing (1) • On a single processor, when a read follows a write, the value returned by the read is the value just written. • In a distributed system with caching, obsolete values may be returned.

Semantics of File Sharing (2) • Four ways of dealing with the shared files in a distributed system. • NFS implements session semantics • Can use the remote access model to provide UNIX semantics (expensive) • Most implementations use local caches for performance and provide session semantics

Implications of Statelessness • NFS RPC requests must completely describe operations • NFS requests should be idempotent • NFS should use a stateless transport protocol (e.g., UDP) • Servers don’t worry about client crashes • Server crashes won’t leave junk lying around
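
The idempotency requirement can be made concrete by comparing a write that carries its own offset with an append that depends on server-side position. `StatelessStore` is a hypothetical in-memory stand-in for a server; the `append` method is included only as a counter-example.

```python
class StatelessStore:
    """Hypothetical block store used to contrast two request styles."""

    def __init__(self):
        self.blocks = {}   # handle -> bytearray

    def write(self, handle, offset, data):
        """Self-describing request: handle, offset, and data are all supplied."""
        buf = self.blocks.setdefault(handle, bytearray())
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))   # zero-fill any gap
        buf[offset:offset + len(data)] = data
        return len(data)

    def append(self, handle, data):
        """Counter-example: the result depends on what the server already holds."""
        self.blocks.setdefault(handle, bytearray()).extend(data)
```

If a reply is lost and the client retransmits, repeating `write(h, 0, b"abc")` leaves the file unchanged, whereas repeating `append(h, b"abc")` duplicates the data.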

An Important Implication of Statelessness • Servers don’t know what files clients think are open • Unlike in UFS, LFS, most local VFS file systems • Makes it much harder to provide certain semantics • Also scales nicely, though • NFS works hard to provide identical semantics to local UFS operations • Some of this is tricky • Especially given statelessness of server • E.g., how do you avoid discarding pages of unlinked file a client has open?

Sleazy NFS Tricks • Used to provide desired semantics despite statelessness of the server • E.g., if client unlinks open file, send rename to server rather than remove • Perform actual remove when file is closed • Won’t work if file removed on server • Won’t work with cooperating clients
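
The rename-instead-of-remove trick can be sketched in a few lines. This is a simplified model: `ToyServer`, `RenamingClient`, and the `.nfsNNNN` hidden-name scheme are assumptions made for illustration, and only one open per file is tracked.

```python
class ToyServer:
    """Hypothetical stateless server: a flat name -> data map."""

    def __init__(self):
        self.files = {}

    def rename(self, old, new):
        self.files[new] = self.files.pop(old)

    def remove(self, name):
        del self.files[name]


class OpenFile:
    def __init__(self, name):
        self.name = name               # name the server currently knows
        self.remove_on_close = False


class RenamingClient:
    """Client that hides an unlinked-but-open file instead of removing it."""

    def __init__(self, server):
        self.server = server
        self.open_files = {}           # server-side name -> OpenFile
        self._seq = 0

    def open(self, name):
        return self.open_files.setdefault(name, OpenFile(name))

    def read(self, f):
        return self.server.files[f.name]

    def unlink(self, name):
        f = self.open_files.pop(name, None)
        if f is None:
            self.server.remove(name)   # not open locally: really remove it
            return
        self._seq += 1
        hidden = f".nfs{self._seq:04d}"   # hypothetical hidden-name scheme
        self.server.rename(name, hidden)
        f.name, f.remove_on_close = hidden, True
        self.open_files[hidden] = f

    def close(self, f):
        self.open_files.pop(f.name, None)
        if f.remove_on_close:
            self.server.remove(f.name)    # the deferred remove happens now
```

The sketch also shows both limitations listed above: another client, or a process on the server itself, can still remove the hidden file, because only this one client knows the file is open.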

File Handles • Method clients use to identify files • Created by the server on the file lookup • Must be unique mappings of server file identifier to universal identifier • File handles become invalid when server frees or reuses inode • Inode generation number in handle shows when stale
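
How a generation number exposes a stale handle can be shown with a toy inode table. The field layout of `FileHandle` and the names `InodeTable` and `StaleHandle` are assumptions for illustration; real handles are opaque byte strings whose layout is server-specific.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    fsid: int         # which file system on the server
    inode: int        # inode number within that file system
    generation: int   # bumped each time the inode is reused


class StaleHandle(Exception):
    pass


class InodeTable:
    """Hypothetical server-side table tracking inode reuse."""

    def __init__(self, fsid=1):
        self.fsid = fsid
        self.generation = {}   # inode -> current generation number
        self.in_use = set()

    def allocate(self, inode):
        self.generation[inode] = self.generation.get(inode, 0) + 1
        self.in_use.add(inode)
        return FileHandle(self.fsid, inode, self.generation[inode])

    def free(self, inode):
        self.in_use.discard(inode)

    def check(self, handle):
        """Raise StaleHandle unless the handle names the inode's current incarnation."""
        if (handle.fsid != self.fsid
                or handle.inode not in self.in_use
                or self.generation[handle.inode] != handle.generation):
            raise StaleHandle(handle)
```

A client that kept a handle across a delete and re-create of the same inode number gets an error rather than silently reading the new, unrelated file.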

rpc.lockd Daemon • NFS server is stateless, so it does not handle file locking • rpc.lockd provides locking • Runs on both client and server • Client side catches request, forwards to server daemon • rpc.lockd handles lock recovery when server crashes

rpc.statd Daemon • Also runs on both client and server • Used to check status of a machine • Server’s rpc.lockd asks rpc.statd to store permanent lock information (in file system) • And to monitor status of locking machine • If client crashes, clear its locks from server

Recovering Locks After a Crash • If server crashes and recovers, its rpc.lockd contacts clients to reestablish locks • If client crashes, rpc.statd contacts client when it becomes available again • Client has short grace period to revalidate locks • Then they’re cleared

File Locking in NFS (1) • NFS version 4 operations related to file locking. • Applications can use locks to ensure consistency • Through version 3, locking was not part of the NFS protocol itself; it was handled by a separate lock manager (rpc.lockd) • NFS v4 supports locking as part of the protocol (see the table above)

File Locking in NFS (2) • The result of an open operation with share reservations in NFS. • When the client requests shared access given the current denial state. • When the client requests a denial state given the current file access state.
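
The two checks described above (requested access against the current denial state, and requested denial against the current access state) can be sketched with bit masks. The constants and the `ShareState` class are illustrative assumptions; they mirror the read/write/both idea but are not the protocol's actual encoding or API.

```python
NONE = 0
READ = 1
WRITE = 2
BOTH = READ | WRITE


class ShareState:
    """Hypothetical per-file record of the share reservations currently held."""

    def __init__(self):
        self.opens = []   # list of (access, deny) pairs for existing opens

    def try_open(self, access, deny):
        """Grant the open only if it conflicts with no existing open."""
        for cur_access, cur_deny in self.opens:
            # Fail if the requested access is denied by an existing open,
            # or the requested denial would cut off access already granted.
            if (access & cur_deny) or (deny & cur_access):
                return False
        self.opens.append((access, deny))
        return True
```

For instance, once a file is open for reading with writes denied, a second read-only open succeeds, while an open for writing, or an open that tries to deny reading, fails.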

Caching in NFS • What can you cache at NFS clients? • How do you handle invalid client caches? • Data blocks read ahead by biod daemon • Cached in normal file system cache area • File attributes • Specially cached by NFS • Directory attributes handled a little differently than file attributes • Especially important because many programs get and set attributes frequently

Client Caching (1) • Client-side caching is left to the implementation (NFS does not prohibit it) • Different implementations use different caching policies • Sun: allow cached data to be stale for up to 30 seconds
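
A bounded-staleness policy like the 30-second rule can be sketched as a time-to-live cache. `TTLCache`, its `fetch` callback, and the injectable `clock` are hypothetical names chosen so the example can be exercised without waiting; this is not Sun's implementation.

```python
import time


class TTLCache:
    """Cache that serves an entry without revalidation until it is `ttl` seconds old."""

    def __init__(self, fetch, ttl=30.0, clock=time.monotonic):
        self.fetch = fetch     # function that gets fresh data from the server
        self.ttl = ttl
        self.clock = clock     # injectable so tests can control time
        self.entries = {}      # name -> (value, time fetched)

    def get(self, name):
        now = self.clock()
        entry = self.entries.get(name)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]    # may be stale, but by less than ttl seconds
        value = self.fetch(name)
        self.entries[name] = (value, now)
        return value
```

The trade-off is explicit: no server traffic at all inside the window, at the price of possibly returning data that another client has already overwritten.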

Client Caching (2) • NFS v4 supports open delegation • Server delegates local open and close requests to the NFS client • Uses a callback mechanism to recall file delegation
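
The interplay of delegation and recall can be sketched as below. This is a heavily simplified model with invented names (`DelegatingServer`, `DelegationClient`, `recall`); it ignores read versus write delegations, leases, and everything else the real protocol specifies.

```python
class DelegatingServer:
    """Hypothetical server granting at most one open delegation per file."""

    def __init__(self):
        self.delegations = {}   # file name -> client holding the delegation
        self.shared = set()     # files already opened by more than one client

    def open(self, client, name):
        """Return True if the caller is granted a delegation for `name`."""
        holder = self.delegations.get(name)
        if holder is None and name not in self.shared:
            self.delegations[name] = client
            return True
        if holder is not None and holder is not client:
            holder.recall(name)            # callback to the current holder
            del self.delegations[name]
            self.shared.add(name)
        return False


class DelegationClient:
    def __init__(self, server):
        self.server = server
        self.delegated = set()   # files this client may open locally
        self.server_opens = 0    # opens that needed a server round trip

    def open(self, name):
        if name in self.delegated:
            return "local"       # handled without contacting the server
        self.server_opens += 1
        if self.server.open(self, name):
            self.delegated.add(name)
        return "server"

    def recall(self, name):
        # Invoked by the server when another client wants the file.
        self.delegated.discard(name)
```

The callback forces the server to remember which client holds which delegation, so it is no longer stateless.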

RPC Failures • Three situations for handling retransmissions. • The request is still in progress • The reply has just been returned • The reply was sent some time ago, but was lost.
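
The three retransmission cases can be handled with a server-side cache keyed by transaction identifier. The sketch below assumes a hypothetical `DuplicateRequestCache` with a configurable "recent" window; the window length and the returned action names are illustrative choices, not taken from any NFS implementation.

```python
import time


class DuplicateRequestCache:
    """Decides what to do with an incoming request, based on its transaction id (xid)."""

    def __init__(self, recent_window=1.0, clock=time.monotonic):
        self.recent_window = recent_window   # how long a reply counts as "just sent"
        self.clock = clock
        self.entries = {}                    # xid -> state record

    def on_request(self, xid):
        entry = self.entries.get(xid)
        if entry is None:
            self.entries[xid] = {"state": "in_progress"}
            return ("execute", None)         # first time we see this request
        if entry["state"] == "in_progress":
            return ("ignore", None)          # case 1: still being processed
        if self.clock() - entry["done_at"] < self.recent_window:
            return ("ignore", None)          # case 2: reply was just returned
        return ("resend", entry["reply"])    # case 3: reply was probably lost

    def on_reply(self, xid, reply):
        self.entries[xid] = {"state": "done", "reply": reply,
                             "done_at": self.clock()}
```

Resending the cached reply instead of re-executing the request matters for operations that are not naturally idempotent, such as removing a file.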

Security • The NFS security architecture. • Simplest case: user ID, group ID authentication only

Secure RPCs • Secure RPC in NFS version 4.

Access Control • The classification of operations recognized by NFS with respect to access control.

Andrew Model • Files are stored permanently at file server machines • Users work from workstation machines • With their own private namespace • Andrew provides mechanisms to cache user’s files from shared namespace

User Model of AFS Use • Sit down at any AFS workstation anywhere • Log in and authenticate who I am • Access all files without regard to which workstation I’m using
