Distributed File System. Lovekeshkumar Desai CSC 8320 Date: 03/28/2006.
Outline • History • Terminology • Distributed File System Design • The File Service Interface • Two types of File Services • The Directory Server Interface • Naming Transparency • Two-Level Naming • Semantics of File Sharing • The Future of DFS (most important features that a future DFS should support) • Current DFS Systems • References
History of DFS • Most of the major work on DFS was done after the mid-1980s. • In the beginning there was only the Datacomputer DFS, which supported an FTP-like service for clients that did not have a large amount of local storage. It started running on a PDP-10 at the end of 1973. • The Interim File Server (IFS) was created two years later at Xerox PARC. It organized public and personal files in a hierarchical directory tree. • The same research center produced the Woodstock File Server (SMB79). This system allowed access to single pages of a file, which made possible the use of diskless clients and virtual memory over the network. • In the following years, many different systems, such as XDFS, LOCUS, SWALLOW, ACORN, and CMU's VICE, were designed and implemented.
Terminology • File service: The file service specifies the file system's interface to its clients. • File server: A file server is a process that runs on some machine and helps implement the file service.
Distributed File System Design • A DFS typically has two reasonably distinct components: • The true file service: concerned with operations on individual files, such as reading, writing, and appending. • The directory service: concerned with creating and managing directories, adding and deleting files from directories, and so on.
The File Service Interface • A file can be structured as a sequence of records. • A file can have attributes such as owner, size, creation date, and access permissions. • Files may be immutable on the server, which eliminates all the problems of having to update every copy of a file whenever it changes. • Protection in distributed file systems: a capability specifies which kinds of access are permitted; an access control list associates with each file a list of users who may access the file and how.
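The difference between the two protection schemes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the names ACL, Capability, acl_allows and capability_allows are hypothetical, and the sketch leaves out what makes real capabilities safe, namely protection against forgery (for example, a cryptographic check field).

```python
from dataclasses import dataclass

# Hypothetical access control list kept by the server: file -> {user: rights}.
ACL = {
    "report.txt": {"alice": {"read", "write"}, "bob": {"read"}},
}

def acl_allows(user, filename, right):
    """ACL check: the server looks up the file's list of users and their rights."""
    return right in ACL.get(filename, {}).get(user, set())

@dataclass(frozen=True)
class Capability:
    """Hypothetical capability: a ticket held by the client naming a file and rights.
    Real systems also protect it against forgery; this sketch does not."""
    filename: str
    rights: frozenset

def capability_allows(cap, filename, right):
    # The server inspects only the presented ticket; it keeps no per-user list.
    return cap.filename == filename and right in cap.rights
```

With an ACL the server must know who the caller is; with a capability it only needs to trust the ticket, which is why capabilities need forgery protection.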
Two types of File Services • The Upload/Download Model • The file service provides only two major operations: read file and write file. • The conceptual model is that whole files are moved between client and server.
Two types of File Services (Continued) • Advantages: • Conceptual simplicity • Whole-file transfer is highly efficient • Disadvantages: • Enough storage is required on the client side • If only a fraction of a file is needed, moving the entire file is wasteful.
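A minimal Python sketch of the upload/download model, assuming a hypothetical in-memory server class (UploadDownloadServer, read_file, write_file are illustrative names, not a real API). It counts bytes crossing the "network" to show the second disadvantage: reading 10 bytes still costs the whole file.

```python
class UploadDownloadServer:
    """Hypothetical server that offers only whole-file read and write."""

    def __init__(self):
        self.files = {}              # file name -> bytes
        self.bytes_transferred = 0   # bytes sent between client and server

    def read_file(self, name):
        data = self.files[name]
        self.bytes_transferred += len(data)  # the whole file is downloaded
        return data

    def write_file(self, name, data):
        self.bytes_transferred += len(data)  # the whole file is uploaded
        self.files[name] = data

server = UploadDownloadServer()
server.files["log.txt"] = b"x" * 10_000
# The client needs only the first 10 bytes but must download all 10,000
# and keep them in local storage.
local_copy = server.read_file("log.txt")
first_ten = local_copy[:10]
```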
Two types of File Services (Continued) • The Remote Access Model • The file service provides a large number of operations for opening and closing files, reading and writing parts of files, moving around within files (LSEEK), and examining and changing file attributes. • This model overcomes the disadvantages of the upload/download model: it needs no storage space on the client side and does not move entire files when only small pieces are needed.
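For contrast, here is a minimal sketch of the remote access model in Python. The class and method names (RemoteAccessServer, open, lseek, read, write, close) are hypothetical and only mirror the operations listed on the slide; lseek supports absolute offsets only, for brevity. Only the requested bytes are transferred.

```python
class RemoteAccessServer:
    """Hypothetical server offering open/lseek/read/write/close on parts of files."""

    def __init__(self):
        self.files = {}        # file name -> bytearray with the file's contents
        self.handles = {}      # handle -> [file name, current offset]
        self.next_handle = 0
        self.bytes_transferred = 0

    def open(self, name):
        self.next_handle += 1
        self.handles[self.next_handle] = [name, 0]
        return self.next_handle

    def lseek(self, fd, offset):
        self.handles[fd][1] = offset          # absolute positioning only

    def read(self, fd, count):
        name, pos = self.handles[fd]
        data = bytes(self.files[name][pos:pos + count])
        self.handles[fd][1] = pos + len(data)
        self.bytes_transferred += len(data)   # only the requested bytes move
        return data

    def write(self, fd, data):
        name, pos = self.handles[fd]
        self.files[name][pos:pos + len(data)] = data
        self.handles[fd][1] = pos + len(data)
        self.bytes_transferred += len(data)

    def close(self, fd):
        del self.handles[fd]

server = RemoteAccessServer()
server.files["log.txt"] = bytearray(b"x" * 10_000)
fd = server.open("log.txt")
server.lseek(fd, 5_000)
chunk = server.read(fd, 10)   # 10 bytes cross the network, not 10,000
server.close(fd)
```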
The Directory Server Interface • Basic functionality includes operations for creating and deleting directories, naming and renaming files, and moving files from one directory to another. • Defines an alphabet and syntax for composing file names; a file extension or attribute describes the file type. • Hierarchical file system. • Allows links (pointers) to arbitrary directories to be created, so the directory structure can be either a tree or a general graph. • The rules for removing links to directories differ between the two structures.
The Directory Server Interface (Continued) • In a graph-structured directory service, removing a link can create orphan directories that are no longer reachable from the root. • A major dilemma in DFS directory service design is whether all clients should see a single view of the directory hierarchy or each client should have its own view. • With a global root directory, path names take the form "/server/path".
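The orphan-directory problem can be shown with a small Python sketch, assuming a hypothetical in-memory Directory class. Directory B links back to A, forming a cycle; after the only link from the root to A is removed, A and B each still have an incoming link (so a simple link count would keep them alive), yet neither is reachable from the root.

```python
class Directory:
    """Hypothetical directory node; entries map names to other directories."""

    def __init__(self, name):
        self.name = name
        self.entries = {}

def link(parent, entry_name, target):
    parent.entries[entry_name] = target

def unlink(parent, entry_name):
    del parent.entries[entry_name]

def reachable(root):
    """Names of all directories reachable from root (graph traversal)."""
    seen, stack = set(), [root]
    while stack:
        d = stack.pop()
        if d in seen:
            continue          # already visited: graphs may contain cycles
        seen.add(d)
        stack.extend(d.entries.values())
    return {d.name for d in seen}

root, a, b = Directory("/"), Directory("A"), Directory("B")
link(root, "A", a)
link(a, "B", b)
link(b, "back", a)     # B links back to A: a graph, not a tree
unlink(root, "A")      # remove the only link from the root
# A and B still point at each other, but neither is reachable from the root.
orphans = {"A", "B"} - reachable(root)
```

This is one reason some systems restrict directory links so the structure stays a tree, or fall back on garbage collection by reachability rather than link counts.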
Naming Transparency • Location transparency: the path name gives no hint of where the file is located. For example, /server1/dir1/dir2/x tells that x is located on server1, but it does not tell where that server is located. The server is free to move anywhere in the network without the path name having to be changed. • The disadvantage is that x cannot be moved to another server, such as server2, without its path name changing. • A system in which files can be moved without their names changing is said to have location independence. • A distributed system that embeds machine or server names in path names is not location independent, and neither is one based on remote mounting (drive mounting).
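The distinction between location transparency and location independence can be sketched in Python. The address table SERVER_ADDRESSES and the functions parse_path, resolve and move_to_server are hypothetical; the point is that changing where server1 lives does not change the name, but moving x to server2 does.

```python
# Hypothetical table mapping server names to network addresses.
# The path name itself never says where server1 is.
SERVER_ADDRESSES = {"server1": "10.0.0.5", "server2": "10.0.0.7"}

def parse_path(path):
    """Split a '/server/path' name into (server, path on that server)."""
    _, server, rest = path.split("/", 2)
    return server, "/" + rest

def resolve(path):
    server, local = parse_path(path)
    return SERVER_ADDRESSES[server], local

def move_to_server(path, new_server):
    """Return the file's new name after a move: the server name is part of it."""
    _, local = parse_path(path)
    return "/" + new_server + local

name = "/server1/dir1/dir2/x"
before = resolve(name)                    # ('10.0.0.5', '/dir1/dir2/x')
SERVER_ADDRESSES["server1"] = "10.0.0.9"  # server1 moves; the name still works
after = resolve(name)                     # ('10.0.0.9', '/dir1/dir2/x')
moved = move_to_server(name, "server2")   # '/server2/dir1/dir2/x': the name changed
```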
Two-Level Naming • Most distributed systems use some form of two-level naming. Files have symbolic names such as prog.c, but they also have internal, binary names for use by the system. • A mapping function, such as hashing, maps between the two kinds of names. Sometimes the binary names are visible to users. • A binary name can indicate both a server and a specific file on that server. An alternative that is sometimes preferred is a symbolic link: a directory entry that maps onto a (server, file name) string, which can be looked up on that server to find the binary name.
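A minimal sketch of two-level naming in Python, under stated assumptions: each server's directory is a Python dict (a hash table, in the spirit of the slide's hashing example), a binary name is a hypothetical (server, file_id) pair, and a symbolic link is a (server, filename) pair that is looked up again on the named server. BinaryName, SymLink, DIRECTORIES and lookup are illustrative names only.

```python
from typing import NamedTuple

class BinaryName(NamedTuple):
    server: str
    file_id: int      # hypothetical internal identifier on that server

class SymLink(NamedTuple):
    server: str
    filename: str     # symbolic name to look up again on that server

# Hypothetical per-server directories: symbolic name -> BinaryName or SymLink.
DIRECTORIES = {
    "server1": {"prog.c": BinaryName("server1", 42),
                "lib.c": SymLink("server2", "lib.c")},
    "server2": {"lib.c": BinaryName("server2", 7)},
}

def lookup(server, symbolic_name):
    """Map a symbolic name to its binary name, following symbolic links."""
    entry = DIRECTORIES[server][symbolic_name]
    if isinstance(entry, SymLink):
        return lookup(entry.server, entry.filename)
    return entry
```

For example, lookup("server1", "lib.c") follows the link and returns the binary name held by server2.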
Semantics of File Sharing • When files are shared, the semantics of reading and writing them must be defined precisely. • UNIX semantics guarantees read-write synchronization: a READ that follows a WRITE returns the value just written. • In a distributed system, UNIX semantics is achieved if there is only one file server and clients do not cache files. • If clients are allowed to keep local copies of files and one client modifies a cached file locally, another client that shortly afterwards reads the same file from the server will get an obsolete copy.
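The stale-read scenario in the last bullet can be reproduced with a minimal Python sketch. FileServer and CachingClient are hypothetical; the client caches whole files and, as an assumption of this sketch, writes only to its cache without ever propagating changes to the server.

```python
class FileServer:
    def __init__(self):
        self.files = {}   # file name -> contents

class CachingClient:
    """Hypothetical client that caches files and modifies only its local copy."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.files[name]  # fetch once, then reuse
        return self.cache[name]

    def write(self, name, data):
        self.cache[name] = data   # change stays local; the server is not updated

server = FileServer()
server.files["f"] = "version 1"
a, b = CachingClient(server), CachingClient(server)
a.write("f", "version 2")
seen_by_b = b.read("f")   # B gets "version 1", an obsolete copy
```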
Semantics of File Sharing (Continued) • Session semantics: changes to an open file are initially visible only to the machine that modified the file. Only when the file is closed are the changes made visible to other machines. • What happens if two clients simultaneously cache and modify the same file? • Session semantics violates UNIX semantics, because not every READ returns the value most recently written.
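Session semantics can be sketched in Python with a hypothetical Session class: open takes a private snapshot, writes go to that snapshot, and close publishes it. The slide leaves open what happens when two clients modify the same file; this sketch simply assumes that each close overwrites the server copy, so the last client to close wins.

```python
class SessionServer:
    def __init__(self):
        self.files = {}   # file name -> contents visible to everyone

class Session:
    """Hypothetical open file under session semantics."""

    def __init__(self, server, name):
        self.server, self.name = server, name
        self.copy = server.files[name]   # private snapshot taken at open

    def read(self):
        return self.copy

    def write(self, data):
        self.copy = data                 # visible only to this session

    def close(self):
        # Changes become visible to others only now; in this sketch a later
        # close simply overwrites an earlier one (last close wins).
        self.server.files[self.name] = self.copy

server = SessionServer()
server.files["f"] = "v0"
s1, s2 = Session(server, "f"), Session(server, "f")
s1.write("from client 1")
s2.write("from client 2")
before_close = server.files["f"]   # still "v0"
s1.close()
s2.close()                         # client 2's version ends up on the server
```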
Semantics of File Sharing (Continued) • A completely different approach is to make all files immutable: the only allowed operations are CREATE and READ. A file can still be "updated" by creating an entirely new file and entering it into the directory under the name of the existing file, replacing it. • What if two machines try to replace the same file with two different new files at the same time? • The final option is to use atomic transactions.
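Immutable files can be sketched in Python with a hypothetical ImmutableStore: file contents are only ever created and read, and "replacing" a file means creating a new one and rebinding the directory name to it. The old file is untouched, so anyone still holding its id reads the old contents.

```python
import itertools

class ImmutableStore:
    """Hypothetical store: files are only created and read, never modified."""

    def __init__(self):
        self.contents = {}              # file id -> bytes (never changed)
        self.directory = {}             # symbolic name -> current file id
        self._ids = itertools.count(1)  # source of fresh file ids

    def create(self, data):
        fid = next(self._ids)
        self.contents[fid] = data
        return fid

    def read(self, fid):
        return self.contents[fid]

    def replace(self, name, data):
        """'Update' = CREATE a new file, then enter it under the old name."""
        fid = self.create(data)
        self.directory[name] = fid
        return fid

store = ImmutableStore()
old = store.replace("notes", b"first draft")
new = store.replace("notes", b"second draft")
```

In this sketch, two concurrent replacements of the same name simply race and the later rebinding wins; detecting the conflict would need extra machinery that the slide does not specify.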
Semantics of File Sharing • UNIX semantics: Every operation on a file is instantly visible to all processes. • Session semantics: No changes are visible to other processes until the file is closed. • Immutable files: No updates are possible; simplifies sharing and replication • Transactions: All changes have the all or nothing property.
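The all-or-nothing property of transactions can be illustrated with a small Python sketch. TransactionalStore is hypothetical; it takes a snapshot before applying a batch of updates and restores it if any update fails. It models only atomicity in a single process, not concurrency control or durability.

```python
class TransactionalStore:
    """Hypothetical single-process store with all-or-nothing batch updates."""

    def __init__(self):
        self.files = {}   # file name -> contents

    def transaction(self, updates):
        """Apply every (name, data) update, or none of them if any fails."""
        snapshot = dict(self.files)          # copy kept for rollback
        try:
            for name, data in updates:
                if data is None:             # stand-in for any failed operation
                    raise ValueError(f"invalid update for {name}")
                self.files[name] = data
        except Exception:
            self.files = snapshot            # roll back: nothing is applied
            raise

store = TransactionalStore()
store.transaction([("a", b"1"), ("b", b"2")])     # both updates applied
try:
    store.transaction([("a", b"9"), ("c", None)])  # second update fails
except ValueError:
    pass                                           # "a" is still b"1"
```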
Current DFS Systems • SUN NFS: The Network File System developed by Sun Microsystems is the most widely used DFS in the UNIX world. In 1985 Sun made the NFS protocol specification public. The protocol defines an RPC interface using the External Data Representation (XDR). • ANDREW: The ANDREW project started at Carnegie Mellon University in 1983 with IBM's support. Its goal was to design and implement an ideal distributed file system for the academic environment, one that would allow thousands of client machines to share a common directory structure. • CODA: The CODA system was implemented in the early 1990s. Its main goal was to provide access to a distributed file system from portable computers. CODA also implements automatic replication mechanisms not present in AFS (the Andrew File System).
Current DFS Systems (Continued) • SPRITE: The SPRITE network operating system has been developed at the University of California, Berkeley, since the mid-1980s. SPRITE has a transparently distributed file system and a process migration mechanism, which can be transparent both to the process and to its user. • ZEBRA: The ZEBRA system has been developed at the University of California, Berkeley, since 1990. It combines two efficient concepts: Redundant Arrays of Inexpensive Disks (RAID) and log-structured file systems. • HARP: HARP (Highly Available, Reliable, Persistent file system) is an experimental system developed at MIT in the early 1990s. It offers a highly fault-tolerant file service by adopting a primary-backup replication scheme.
Current DFS Systems (Continued) • ECHO: The ECHO distributed file system is an ambitious project carried out since 1988 at the DEC Systems Research Center. Two important features: • Integration of the local naming service with the Internet Domain Name System. • Some files on a file server, called junctions, contain pointers to directories on other servers and the protocol specification for accessing that particular subtree. • FROLIC: FROLIC was developed in the 1990s at the University of Toronto. It is based on the assumption that file sharing on networks may be much more extensive than the AFS team expected. The system is divided into clusters. Clients communicate only within their own cluster, with local servers, using NFS. Servers can perform inter-cluster communication in order to serve client requests.
The Future (most important features that a future DFS should support) • Transparency: the file system should ease user interaction and programmers' efforts by hiding aspects such as data location, network type, operating system type, and failure occurrences. • UNIX semantics: provides a simple tool for synchronization and data sharing among processes on different machines. • Automatic replication: not only increases system availability and reliability but can also offer the user more than one server, allowing a choice based on efficiency and workload distribution. • Striping: evenly distributes the workload and allows parallel transfers of single files.
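Striping can be illustrated with a minimal round-robin sketch in Python. The functions stripe and unstripe and the stripe_unit parameter are hypothetical; real striped file systems also handle metadata, failures and parity, none of which is modeled here. Because consecutive units land on different servers, they could be fetched in parallel.

```python
def stripe(data, num_servers, stripe_unit):
    """Round-robin striping: unit i goes to server i % num_servers."""
    servers = [[] for _ in range(num_servers)]
    for i in range(0, len(data), stripe_unit):
        servers[(i // stripe_unit) % num_servers].append(data[i:i + stripe_unit])
    return servers

def unstripe(servers):
    """Reassemble the file by reading units back in the same round-robin order."""
    out = []
    for round_no in range(max(len(s) for s in servers)):
        for s in servers:
            if round_no < len(s):
                out.append(s[round_no])
    return b"".join(out)

layout = stripe(b"abcdefghij", num_servers=3, stripe_unit=2)
# layout == [[b"ab", b"gh"], [b"cd", b"ij"], [b"ef"]]
```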
The Future (Continued) • Aggressive caching: increases system performance by making the cache hit ratio as high as possible. • Automatic compression: by making use of idle CPU cycles, reduces storage space requirements and data transfer times. • Adaptation: by monitoring the workload, the system should try to make future accesses as efficient as possible. • Multimedia support: multimedia applications deal with huge amounts of information, such as terabytes of data and transfer rates of megabytes per second. The file system must decrease latency, increase transfer speed, and support larger file sizes.
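To make "cache hit ratio" concrete, here is a minimal Python sketch of a client-side block cache. The least-recently-used eviction policy and the names LRUBlockCache, get and fetch are assumptions for illustration; the slide does not prescribe a particular policy.

```python
from collections import OrderedDict

class LRUBlockCache:
    """Hypothetical client-side block cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()     # block number -> data, oldest first
        self.hits = self.misses = 0

    def get(self, block_no, fetch):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)        # mark most recently used
        else:
            self.misses += 1
            self.blocks[block_no] = fetch(block_no)  # go to the server
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)      # evict least recently used
        return self.blocks[block_no]

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

cache = LRUBlockCache(capacity=2)
for block in [1, 2, 1, 3, 1, 2]:
    cache.get(block, fetch=lambda n: f"block {n}")
# 2 hits and 4 misses: a larger cache or better policy would raise the ratio.
```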
References • Mary Baker, Fast Crash Recovery in Distributed File Systems, PhD thesis, University of California, Berkeley, 1996. • Andrew D. Birrell, Andy Hisgen, The Echo Distributed File System, DEC Systems Research Center, Palo Alto, CA, 1998. • John H. Hartman, Michael D. Kupfer, Measurements of a Distributed File System, 13th Symposium on Operating Systems Principles. • B. Grönvall, I. Marsh, S. Pink, A Multicast-Based Distributed File System for the Internet, Proceedings of the 8th ACM European SIGOPS Workshop, 1996. • J. H. Howard, M. L. Kazar, S. G. Menees, D. A. Nichols, M. Satyanarayanan, R. N. Sidebotham, M. J. West, Scale and Performance in a Distributed File System, ACM Transactions on Computer Systems, 6(1), Feb. 1988. • T. Ballardie, P. Francis, J. Crowcroft, Core Based Trees (CBT), Proceedings of ACM SIGCOMM, 1999. • Andrew S. Tanenbaum, Distributed Operating Systems (textbook).