Farsite: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment. Microsoft Research, appeared in OSDI’02.
Design Assumption • 100,000 machines in a large corporation or university, interconnected by a high-bandwidth, low-latency network • Allow large-scale read-only sharing • Allow small-scale read/write sharing • A small fraction of users misbehave
Enabling Technology Trends • Large amount of unused disk space enables the use of replication for reliability • Relatively low cost of strong cryptography enables distributed security
Problems • Namespace roots • A file system is a hierarchical directory namespace originating at a root • Farsite allows multiple roots, each of which can be regarded as a virtual file server • A root corresponds to a set of participating machines • Trust and certification • The security of any distributed system is a matter of trust • Trust is managed using public-key cryptographic certificates • A namespace certificate • A user certificate • A machine certificate
Basic System • Each machine performs three roles: a client, a member of a directory group, and a file host • A directory group: a set of machines that collectively manage file information using a Byzantine-fault-tolerant protocol • A file host: a machine used to store file data replicas
Performance Considerations • Problems • All file-system metadata operations involve a Byzantine-fault-tolerant (BFT) protocol • BFT is expensive • Solutions • Local caching improves read performance (via content leases) • Batched, logged updates (write-back caching, because many writes are deleted or overwritten shortly after they occur)
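To make the write-back idea concrete, here is a minimal sketch of a client-side log that coalesces updates before sending one batch to the directory group. The class name `WriteBackLog` and its methods are hypothetical and are not taken from Farsite; the sketch only illustrates why batching pays off when writes are soon overwritten or deleted.

```python
class WriteBackLog:
    """Hypothetical client-side log; not Farsite's actual data structure."""

    def __init__(self):
        self.pending = {}     # path -> ("write", data) or ("delete", None)
        self.created = set()  # paths first created inside this log

    def create(self, path, data):
        self.created.add(path)
        self.pending[path] = ("write", data)

    def write(self, path, data):
        # A later write replaces any earlier pending operation on the same path.
        self.pending[path] = ("write", data)

    def delete(self, path):
        if path in self.created:
            # Created and deleted locally: nothing needs to reach the directory group.
            self.created.discard(path)
            self.pending.pop(path, None)
        else:
            self.pending[path] = ("delete", None)

    def flush(self):
        # One batch means one round of the expensive BFT protocol instead of many.
        batch = sorted(self.pending.items())
        self.pending.clear()
        self.created.clear()
        return batch


log = WriteBackLog()
log.create("/tmp/a", b"1")
log.write("/tmp/a", b"2")
log.delete("/tmp/a")        # the temporary file never leaves the client
log.write("/doc", b"x")
log.write("/doc", b"y")     # only the last write survives
log.delete("/old")
batch = log.flush()
```

In this run seven local operations shrink to two entries in the batch: the final write to `/doc` and the deletion of `/old`.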
Security • Access control by ACL • Privacy • Convergent encryption to protect the file data • Exclusive encryption to protect directory or file names • Integrity by a Merkle hash tree
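The two data-protection ideas above can be sketched briefly. In convergent encryption the key is derived from the file content itself, so identical files produce identical ciphertext; a Merkle hash tree reduces a file's blocks to a single root hash, so a change to any block is detectable. The following is an illustration under stated assumptions, not Farsite's implementation: the keystream built from SHA-256 is a toy stand-in for a real cipher, and all function names are hypothetical.

```python
import hashlib


def _keystream(key: bytes, length: int) -> bytes:
    # Toy keystream (SHA-256 over key + counter). A stand-in for a real cipher,
    # used only to keep the sketch within the standard library.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(plaintext: bytes):
    # The key is the hash of the content, so equal plaintexts give equal ciphertexts.
    key = hashlib.sha256(plaintext).digest()
    stream = _keystream(key, len(plaintext))
    return key, bytes(p ^ k for p, k in zip(plaintext, stream))


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    stream = _keystream(key, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))


def merkle_root(blocks):
    # Leaves and inner nodes are hashed with different prefixes to keep them distinct.
    if not blocks:
        raise ValueError("need at least one block")
    level = [hashlib.sha256(b"\x00" + b).digest() for b in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


key1, ct1 = convergent_encrypt(b"same file contents")
key2, ct2 = convergent_encrypt(b"same file contents")
root = merkle_root([b"block0", b"block1", b"block2"])
tampered = merkle_root([b"block0", b"blockX", b"block2"])
```

Here `ct1` and `ct2` are equal even though the two encryptions were performed independently, and `root` differs from `tampered` because one block changed.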
Scalability • When a directory group becomes overloaded, it can delegate part of its namespace to another group • To open a file or directory with a particular pathname, a client needs to determine which group of machines is responsible for that name • Hint-based pathname translation (caching), as in Sprite
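Hint-based translation can be sketched as a client cache that maps pathname prefixes to directory groups. The client starts at the group named by its longest matching hint and follows delegations from there, recording each one as a new hint. The names `HintCache`, `resolve`, and the `delegations` table are assumptions made for this illustration; the sketch also ignores stale hints, which a real system would have to detect and discard.

```python
def covers(prefix, path):
    # "/users" covers "/users/alice" but not "/usersx".
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


class HintCache:
    def __init__(self, root_group):
        self.hints = {"/": root_group}  # the root group is always known

    def lookup(self, path):
        # Longest cached prefix that covers the path.
        best = max((p for p in self.hints if covers(p, path)), key=len)
        return best, self.hints[best]

    def learn(self, prefix, group):
        self.hints[prefix] = group


def resolve(path, cache, delegations):
    """Return (responsible group, number of groups contacted)."""
    prefix, group = cache.lookup(path)
    hops = 1
    while True:
        # The contacted group reports any deeper delegation it has made for this path.
        deeper = [p for p in delegations.get(group, {})
                  if covers(p, path) and len(p) > len(prefix)]
        if not deeper:
            return group, hops
        prefix = max(deeper, key=len)
        group = delegations[group][prefix]
        cache.learn(prefix, group)
        hops += 1


# Hypothetical delegation tables: G0 delegated /users to G1, G1 delegated /users/alice to G2.
delegations = {"G0": {"/users": "G1"}, "G1": {"/users/alice": "G2"}}
cache = HintCache("G0")
first = resolve("/users/alice/notes.txt", cache, delegations)
second = resolve("/users/alice/notes.txt", cache, delegations)
```

The first lookup contacts three groups while it learns the delegations; the second goes straight to `G2` because the hint is cached.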
Taming aggressive replication in the Pangaea wide-area file system HP Labs
Design Goals • Speed: hide the wide-area networking latency • Availability and autonomy • Network economy: transfer data between nodes in physical proximity, thereby reducing latency and bandwidth use
Structure of a file system • Gold replicas • The directory entry of a file lists the file’s gold replicas • Form a clique • Bronze replicas
Replica set management • Pervasive replication: a replica is created whenever a file is accessed by a user • File creation • Replica addition: the new replica S must be added to the graph (with m edges) • S adds an edge to a random gold replica (from a different region than S) • S asks a random gold replica P to pick the replica closest to S among P’s immediate graph neighbors • S asks P to choose m-2 random replicas using random walks • Name-space containment
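The replica-addition steps can be sketched on a small in-memory graph. This is an illustration of the three steps as listed, not Pangaea's code: the function names, the `region` and `distance` inputs, the fixed walk length, and the retry limit are all assumptions. If two steps happen to pick the same replica, the sketch simply runs further random walks until it has m distinct neighbours or gives up.

```python
import random


def random_walk(graph, start, steps, rng):
    node = start
    for _ in range(steps):
        neighbours = sorted(graph[node])  # sorted for reproducible choices
        if not neighbours:
            break
        node = rng.choice(neighbours)
    return node


def add_replica(graph, gold, region, distance, s, m, rng, walk_steps=3):
    chosen = set()
    # Step 1: an edge to a random gold replica, preferably in a different region than s.
    far_gold = [g for g in gold if region[g] != region[s]] or list(gold)
    chosen.add(rng.choice(far_gold))
    # Step 2: a random gold replica p picks its immediate neighbour closest to s.
    p = rng.choice(list(gold))
    chosen.add(min(sorted(graph[p]) or [p], key=lambda n: distance(s, n)))
    # Step 3: p supplies the remaining neighbours through random walks.
    for _ in range(10 * m):  # bounded retries, since small graphs may have fewer than m nodes
        if len(chosen) >= m:
            break
        chosen.add(random_walk(graph, p, walk_steps, rng))
    graph[s] = set()
    for n in chosen:
        graph[s].add(n)
        graph[n].add(s)  # edges are undirected
    return chosen


# Hypothetical topology: three gold replicas in a clique plus two bronze replicas.
gold = ["g1", "g2", "g3"]
graph = {"g1": {"g2", "g3", "b1"}, "g2": {"g1", "g3", "b1", "b2"},
         "g3": {"g1", "g2", "b2"}, "b1": {"g1", "g2"}, "b2": {"g2", "g3"}}
region = {"g1": "us", "g2": "eu", "g3": "asia", "b1": "us", "b2": "eu", "s": "us"}
pos = {"g1": 0, "g2": 50, "g3": 90, "b1": 5, "b2": 55, "s": 3}
chosen = add_replica(graph, gold, region, lambda a, b: abs(pos[a] - pos[b]),
                     "s", m=3, rng=random.Random(0))
```

After the call, `s` is linked to at most three replicas, at least one of which is a gold replica outside the "us" region.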
Propagating updates • Efficient and reliable update propagation • Delta propagation, harbingers, and a spanning tree that exploits the physical topology • Conflict resolution: combining version vectors and last-writer-wins rules • No strong consistency guarantees: consistency is achieved eventually
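One plausible way to combine the two conflict-resolution rules is sketched below: version vectors decide whether one replica's state strictly supersedes the other, and a last-writer-wins timestamp breaks the tie only when the two states are concurrent. The record layout (`vv`, `ts`, `data`) and the function names are assumptions for illustration and may differ from what Pangaea actually does.

```python
def compare(a, b):
    """Compare two version vectors (dicts mapping node id to counter)."""
    nodes = set(a) | set(b)
    a_ge = all(a.get(n, 0) >= b.get(n, 0) for n in nodes)
    b_ge = all(b.get(n, 0) >= a.get(n, 0) for n in nodes)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a_newer"
    if b_ge:
        return "b_newer"
    return "concurrent"


def merge(local, remote):
    order = compare(local["vv"], remote["vv"])
    if order in ("equal", "a_newer"):
        winner = local
    elif order == "b_newer":
        winner = remote
    else:
        # Concurrent updates: fall back to last-writer-wins on (timestamp, node id).
        winner = max(local, remote, key=lambda r: r["ts"])
    nodes = set(local["vv"]) | set(remote["vv"])
    merged_vv = {n: max(local["vv"].get(n, 0), remote["vv"].get(n, 0)) for n in nodes}
    return {"vv": merged_vv, "ts": winner["ts"], "data": winner["data"]}


a = {"vv": {"A": 2, "B": 1}, "ts": (10, "A"), "data": "x"}
b = {"vv": {"A": 1, "B": 1}, "ts": (5, "B"), "data": "y"}
c = {"vv": {"A": 1, "B": 2}, "ts": (12, "B"), "data": "z"}
superseded = merge(a, b)   # a's vector dominates b's, so a wins outright
conflict = merge(a, c)     # neither dominates, so the later timestamp wins
```

In the second merge the result carries `c`'s data and the element-wise maximum of both vectors, so every replica that sees both updates converges on the same state.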
Questions? • Graph-based replica set for each file: too much metadata to maintain? • Similar to a multicast-based file system: updates are propagated using multicast
Discussion • Metadata and data management in a distributed file system • Either mutable, in which case some machines must be trusted, as in xFS, or as in Farsite, which uses a Byzantine-fault-tolerant protocol so that only part of the machines need to be trusted to serialize updates • Or immutable, using logged updates, which relies on each individual user to form the image of the file system • Should the replication factors of metadata and data differ according to their usage?