
File Systems IV


Presentation Transcript


  1. File Systems IV CS 423, Fall 2007 Klara Nahrstedt/Sam King

  2. Administrative • MP3 deadline, November 5, 2007 • Today's discussion • Basic Concepts • RPC, Reliability/Failure, State, Replication • Examples of Distributed File Systems • NFS, AFS, Google

  3. Remote Procedure Call • RPC servers call arbitrary functions in a dll or exe, with arguments passed over the network and return values sent back over the network • Example exchange: the client sends foo.dll,bar(4, 10, “hello”) and the server returns “returned_string”; the client sends foo.dll,baz(42) and the server returns err: no such function …
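A minimal sketch of this request/response shape, with a dispatch table standing in for real dll/exe loading (all names and payloads here are illustrative):

```python
# Toy RPC dispatch: the server maps (module, function) names to
# callables, applies the decoded arguments, and returns either a
# result or an error.
dispatch = {
    ("foo.dll", "bar"): lambda a, b, s: "returned_string",
}

def handle_request(module, func, args):
    target = dispatch.get((module, func))
    if target is None:
        return {"error": "no such function"}   # e.g. foo.dll,baz(42)
    return {"result": target(*args)}

print(handle_request("foo.dll", "bar", (4, 10, "hello")))  # result
print(handle_request("foo.dll", "baz", (42,)))             # error
```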

  4. Possible Interfaces • RPC can be used with two basic interfaces: synchronous and asynchronous • Synchronous RPC is a “remote function call” – the client blocks and waits for the return value • Asynchronous RPC is a “remote thread spawn” (see the sketch after slide 7)

  5. Synchronous RPC

  6. Asynchronous RPC

  7. Asynchronous RPC 2: Callbacks
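Slides 5–7 show these flows as diagrams; here is a minimal sketch of both styles plus the callback variant, using a Python thread to simulate the remote side (the `remote_call` and `async_call` helpers are hypothetical stand-ins for a real transport):

```python
import threading

def remote_call(func, *args):
    # Stand-in for the wire exchange: a real stub would marshal
    # the args, send them, and unmarshal the reply.
    return func(*args)

# Synchronous RPC: a "remote function call" -- the caller blocks
# until the return value comes back.
print(remote_call(len, "hello"))  # 5

# Asynchronous RPC: a "remote thread spawn" -- the caller keeps
# running; an optional callback fires when the result arrives.
def async_call(func, *args, callback=None):
    def worker():
        value = remote_call(func, *args)
        if callback is not None:
            callback(value)          # callback variant (slide 7)
    t = threading.Thread(target=worker)
    t.start()
    return t

t = async_call(len, "world", callback=print)  # prints 5 when done
t.join()
```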

  8. Wrapper Functions • Writing rpc_call(foo.dll, bar, arg0, arg1..) is poor form • Confusing code • Breaks abstraction • A wrapper “stub” function makes the code cleaner: bar(arg0, arg1); // programmer writes this; makes the RPC “under the hood”
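A minimal sketch of such a stub, with a local table simulating the remote side (`rpc_call` here is a hypothetical transport helper):

```python
def rpc_call(module, func, *args):
    # Stand-in for the real transport (marshal, send, await reply);
    # simulated with a local table so the sketch runs.
    fake_server = {("foo.dll", "bar"): lambda a, b: a + b}
    return fake_server[(module, func)](*args)

def bar(arg0, arg1):
    # The wrapper "stub": callers see an ordinary local function,
    # and the RPC happens under the hood.
    return rpc_call("foo.dll", "bar", arg0, arg1)

print(bar(4, 10))  # 14 -- looks local, runs "remotely"
```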

  9. More Design Considerations • Who can call RPC functions? Anybody? • How do you handle multiple versions of a function? • Need to marshal objects • How do you handle error conditions? • Numerous protocols: DCOM, CORBA, Java RMI…
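One way to marshal a call, sketched with JSON; a real protocol suite would use its own encoding (e.g., XDR under NFS), and the version field shows one way to handle multiple versions of a function:

```python
import json

def marshal(module, func, args, version=1):
    # Flatten the call into bytes for the wire; the version field
    # lets the server dispatch among versions of the same function.
    return json.dumps({"module": module, "func": func,
                       "version": version, "args": args}).encode()

def unmarshal(payload):
    msg = json.loads(payload.decode())
    return msg["module"], msg["func"], msg["version"], msg["args"]

wire = marshal("foo.dll", "bar", [4, 10, "hello"])
print(unmarshal(wire))  # ('foo.dll', 'bar', 1, [4, 10, 'hello'])
```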

  10. Characteristics of Reliable DFS • Fault-tolerant • Highly available • Recoverable • Consistent • Scalable • Predictable performance • Secure

  11. Failures • Hardware failures • Happen less frequently now • Software failures • Software bugs account for an estimated 25-35% of unplanned downtime • Residual bugs in mature systems • Heisenbug – disappears or alters its characteristics when observed or researched • Bohrbug – does not disappear or alter its characteristics when researched – manifests itself under well-defined conditions

  12. Specific Failures in DFS • Halting failures • Fail-stop • Omission failures • Network failures • Network partition failures • Timing failures • Byzantine failures

  13. 8 Fallacies • Network is reliable • Latency is 0 • Bandwidth is infinite • Network is secure • Topology does not change • There is one administrator • Transport cost is 0 • Network is homogeneous

  14. Stateful versus Stateless Service • Stateful • Server records which client is accessing which file • What are the advantages and disadvantages? • UNIX is stateful • Stateless • Each request is independent of previous requests (it carries its own state info) • What are the advantages and disadvantages? • NFS is stateless
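A minimal sketch of the contrast, assuming toy read handlers (`stateful_read` and `stateless_read` are illustrative, not any real server's interface):

```python
# Stateful server: remembers each client's position in each file.
offsets = {}  # (client_id, path) -> current offset

def stateful_read(client_id, path, nbytes):
    pos = offsets.get((client_id, path), 0)
    with open(path, "rb") as f:
        f.seek(pos)
        data = f.read(nbytes)
    offsets[(client_id, path)] = pos + len(data)  # lost on a crash
    return data

# Stateless server (NFS-style): every request carries its own state,
# so a server crash and restart are invisible to clients.
def stateless_read(path, offset, nbytes):
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(nbytes)
```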

  15. File Replication • Replicas of the same file reside on failure-independent machines • Improves availability • Replicas should be invisible to users, yet distinguished at lower levels • Updates must be applied to all replicas – needs exactly-once semantics • Demand replication – builds a cache of the whole file

  16. Network File System (NFS) • Arbitrary collection of clients and servers share a common file system • Multiple file system trees on different machines can be mounted together • Mount procedure • OS is given the name of the device and the location within the file structure at which to attach the file system (called ‘mount point’ ) • OS verifies that the device contains a valid file system • OS notes in its directory structure that a file system is mounted at the specified mount point

  17. Examples of remote mounted file systems

  18. Major Layers of NFS Architecture • vnode – network-wide unique (like an inode, but for a network) • RPC and NFS Service layer – NFS Protocol • Path-name lookup (past the mount point) requires one RPC per name • Client cache of remote vnodes for remote directory names • Can a client access another server through a server?

  19. NFS Layer Structure

  20. NFS Protocols (2 client-server protocols) • The first NFS protocol handles mounting • A client can send a path name to a server and request permission to mount that directory somewhere in its directory hierarchy • The place where it is to be mounted is not contained in the message, as the server does not care where it is mounted • If the path name is legal and the specified directory has been exported, the server returns a file handle to the client • The file handle contains fields uniquely identifying the file system type, the disk, the i-node number of the directory, and security information
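A sketch of the fields such a file handle might carry, following the slide's list (the exact layout is version- and implementation-specific):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileHandle:
    # Opaque to the client; only the server interprets the fields.
    fs_type: int      # file system type
    device: int       # disk / device identifier
    inode: int        # i-node number of the directory or file
    generation: int   # security info; detects recycled i-node numbers
```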

  21. NFS Protocols • The second NFS protocol is for directory and file access • Clients send messages to servers to manipulate directories and to read and write files • Clients can access file attributes • NFS ‘read’ operation • Lookup operation – returns a file handle • Read operation – uses the file handle to read the file • Advantage: stateless server!
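A sketch of the lookup-then-read pattern, reusing the hypothetical FileHandle above with a toy in-memory server; the point is that neither call leaves open-file state behind on the server:

```python
# Toy server state: i-node -> file bytes, plus one directory mapping
# names to i-nodes. There is no per-client state anywhere.
inodes = {7: b"hello, stateless world"}
directory = {(2, "greeting.txt"): 7}   # (dir i-node, name) -> i-node

def nfs_lookup(dir_handle, name):
    # LOOKUP returns a handle, not a server-side open file.
    inode = directory[(dir_handle.inode, name)]
    return FileHandle(dir_handle.fs_type, dir_handle.device, inode, 1)

def nfs_read(handle, offset, nbytes):
    # READ is self-describing (handle + offset + count), so a server
    # restart between LOOKUP and READ goes unnoticed by the client.
    return inodes[handle.inode][offset:offset + nbytes]

root = FileHandle(fs_type=1, device=0, inode=2, generation=1)
fh = nfs_lookup(root, "greeting.txt")
print(nfs_read(fh, 0, 5))  # b'hello'
```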

  22. NFS Caching • File-block and file-attribute caches • Attributes are used only if up to date; discarded after 60 seconds • Read-ahead and delayed-write techniques are used • Delayed write is used even for concurrent access (not UNIX semantics) • New files may not be visible for 30 seconds • Updated files may not be visible for a while to systems that have the file open for reading
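A sketch of the 60-second attribute-cache rule (cache structure and helper names are illustrative):

```python
import time

ATTR_TTL = 60     # seconds, per the slide's attribute-cache rule
attr_cache = {}   # path -> (time fetched, attributes)

def get_attributes(path, fetch_from_server):
    # Serve cached attributes only while fresh; after 60 s they are
    # discarded and re-fetched from the server.
    entry = attr_cache.get(path)
    now = time.time()
    if entry is not None and now - entry[0] < ATTR_TTL:
        return entry[1]
    attrs = fetch_from_server(path)
    attr_cache[path] = (now, attrs)
    return attrs

print(get_attributes("/export/f", lambda p: {"size": 42}))  # fetches
print(get_attributes("/export/f", lambda p: {"size": 43}))  # cached: 42
```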

  23. SUN Network File System • Uses the UDP/IP protocol and a stateless server • A remote file system is mounted over a local file system directory • The local file system directory is no longer visible • The mount command uses the name of the remote machine • No concurrency control mechanisms; to avoid problems, modified data must be committed to the server's disk before the request returns to the client • Works on heterogeneous machines by using a machine-independent RPC

  24. Andrew File System Architecture (diagram: desktop computers with local caches talk to admin servers holding volumes; the shared AFS namespace contains paths such as /afs/hq.firm/User/alice, Solaris/bin, Group/research, and Transarc.com/pub)

  25. AFS (1) • AFS – Andrew File System • Workstations are grouped into cells • Note the position of Venus (the client cache manager) and Vice (the server) • Client's view

  26. AFS (2) • Aimed at scalability • Clients are not servers • Local name space and shared name space • The local name space is the root file system • Whole-file caching • Clients may access files from any workstation using the same name space
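A sketch of whole-file caching at open/close time, with hypothetical fetch/store callbacks standing in for the Venus-to-Vice traffic (paths and cache location are illustrative):

```python
import os

CACHE_DIR = "/tmp/afs-cache"   # illustrative local cache location

def afs_open(path, fetch_whole_file):
    # On open, the client (Venus) fetches the entire file into its
    # local cache; reads and writes then happen purely locally.
    os.makedirs(CACHE_DIR, exist_ok=True)
    local = os.path.join(CACHE_DIR, path.replace("/", "_"))
    with open(local, "wb") as f:
        f.write(fetch_whole_file(path))
    return local

def afs_close(path, local, store_whole_file):
    # On close, the (possibly modified) whole file is shipped back;
    # servers are not contacted between open and close.
    with open(local, "rb") as f:
        store_whole_file(path, f.read())
```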

  27. AFS (3) • Security is imposed at the server interfaces – no client programs run on servers • Access lists for files • The client workstation interacts with servers only when opening and closing files • Reading and writing bytes are performed by the kernel • AFS is used by the NCSA Web server over FDDI

  28. Google File System (GFS) • Design Assumptions • Component failures are norm rather than the exception • Inexpensive commodity components • Files are huge by traditional standards • Multi-GB files are common • High sustained bandwidth is more important than low latency "The Google File System", SOSP 2003

  29. GFS (Design Assumptions) • Most files are mutated by appending new data rather than overwriting existing data • Once written, files are only read, and often only sequentially • Two types of reads – large streaming reads and small random reads • Efficient implementation • Allow multiple clients to read the same file • Atomicity with minimal synchronization overhead is essential

  30. GFS Architecture

  31. GFS Design Parameters • Client and chunkserver can run on the same machine • Files are divided into fixed-size chunks (64 MB), each identified by a chunk handle • What are the tradeoffs of the 64 MB chunk size? • Chunkservers store chunks on local disks as Linux files • Chunks are replicated on multiple chunkservers • The master maintains all file system metadata – stateful (namespace, access control, mapping from files to chunks, current chunk locations)
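A sketch of how a client turns a byte offset into a chunk index and handle, using the slide's 64 MB figure (the master's file-to-chunk map is simulated):

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB fixed-size chunks

# Master metadata, simulated: file name -> ordered chunk handles.
chunk_map = {"/logs/web.log": ["handle-a", "handle-b", "handle-c"]}

def locate_chunk(path, offset):
    # The client converts (file, byte offset) into (chunk handle,
    # offset within chunk) before contacting any chunkserver.
    index = offset // CHUNK_SIZE
    return chunk_map[path][index], offset % CHUNK_SIZE

print(locate_chunk("/logs/web.log", 130 * 1024 * 1024))
# ('handle-c', 2097152): byte 130 MB lands 2 MB into the third chunk
```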

  32. GFS Design Parameters • The client's code implements the GFS API and communicates with the master and chunkservers to read and write • The client communicates with the master for metadata • The client communicates with chunkservers for data over TCP/IP • No client-side data caching!
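A sketch of that read path: metadata from the master, data straight from a chunkserver, and no client-side data cache (all classes and values here are simulated stand-ins):

```python
class FakeMaster:
    def lookup(self, path, offset):
        # Metadata only: which chunk holds this offset, and where
        # its replicas live. (Simulated values.)
        return "handle-a", offset, ["cs1", "cs2"]

class FakeChunkserver:
    def __init__(self, data):
        self.data = data
    def read(self, handle, offset, nbytes):
        return self.data[offset:offset + nbytes]

def gfs_read(path, offset, nbytes, master, chunkservers):
    # Step 1: ask the master for metadata; step 2: fetch the bytes
    # directly from a replica over TCP/IP. Nothing is cached.
    handle, chunk_off, replicas = master.lookup(path, offset)
    return chunkservers[replicas[0]].read(handle, chunk_off, nbytes)

servers = {"cs1": FakeChunkserver(b"chunk bytes here")}
print(gfs_read("/logs/web.log", 6, 5, FakeMaster(), servers))  # b'bytes'
```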

  33. GFS Design Parameters • Single Master • Sophisticated chunk placement • Replication decisions using global knowledge • Minimal involvement in reads and writes • Metadata: file and chunk namespaces, mapping from files to chunks and locations of each chunk’s replicas. • Heartbeat protocol between master and chunkservers

  34. GFS Consistency Model • Relaxed consistency model • File namespace mutations (e.g., file creation) are atomic (handled by the master) • Data mutations (e.g., writes or record appends) are atomic • Each mutation is performed at all of a chunk's replicas • Stale replica detection • Uses the chunk version number to distinguish up-to-date replicas from stale ones
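A sketch of version-number stale-replica detection (a simplification: in the real system the master bumps the version when granting a new mutation lease; names here are illustrative):

```python
# Master-side view: current version number per chunk, incremented
# each time a new round of mutations is authorized.
current_version = {"handle-a": 5}

def is_stale(handle, replica_version):
    # A replica that missed mutations (e.g. while its chunkserver
    # was down) reports a version older than the master's.
    return replica_version < current_version[handle]

print(is_stale("handle-a", 4))  # True: stale, missed a mutation
print(is_stale("handle-a", 5))  # False: up to date
```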
