
Farsite: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment

Farsite: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment. Microsoft Research, appeared in OSDI '02. Design assumption: 100,000 machines in a large corporation or university, interconnected by a high-bandwidth, low-latency network.


Presentation Transcript


  1. Farsite: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment. Microsoft Research, appeared in OSDI '02

  2. Design Assumptions
     • 100,000 machines in a large corporation or university, interconnected by a high-bandwidth, low-latency network
     • Allow large-scale read-only sharing
     • Allow small-scale read/write sharing
     • A small fraction of users misbehave

  3. Enabling Technology Trends
     • A large amount of unused disk space enables the use of replication for reliability
     • The relatively low cost of strong cryptography enables distributed security

  4. Problems
     • Namespace roots
       • A file system is a hierarchical directory namespace originating at a root
       • Farsite allows multiple roots, each of which can be regarded as a virtual file server
       • A root corresponds to a set of participating machines
     • Trust and certification
       • The security of any distributed system is a matter of trust
       • Trust is managed using public-key cryptographic certificates
         • A namespace certificate
         • A user certificate
         • A machine certificate

  5. Basic System
     • Each machine performs three roles: a client, a member of a directory group, and a file host
     • A directory group: a set of machines that collectively manage file information using a Byzantine-fault-tolerant protocol
     • A file host: a machine used to store file data replicas
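The slide does not say how large a directory group is. As a rough guide, classic Byzantine-fault-tolerant protocols such as PBFT need at least 3f + 1 members to tolerate f misbehaving ones. The helper below (a hypothetical name, not part of Farsite) just turns that bound around.

```python
def max_byzantine_faults(group_size: int) -> int:
    """Largest f such that group_size >= 3f + 1 (the classic BFT bound)."""
    if group_size < 1:
        raise ValueError("a directory group needs at least one member")
    return (group_size - 1) // 3


if __name__ == "__main__":
    for n in (4, 7, 10):
        print(n, "members tolerate", max_byzantine_faults(n), "faulty")
```

Under this bound a group of 4 tolerates one faulty member, and adding members only helps once the next multiple of three is passed.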

  6. Performance Considerations
     • Problems
       • All file system metadata operations involve the Byzantine-fault-tolerant (BFT) protocol
       • BFT is expensive
     • Solution
       • Local caching improves read performance (using content leases)
       • Batch logged updates (write-back caching pays off because many writes are deleted or overwritten shortly after they occur)
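The batching idea can be sketched as a client-side log that keeps only the latest pending state per path, so short-lived writes never reach the directory group. The class and method names are hypothetical, and the real Farsite log format is not described on the slide.

```python
from typing import Dict, List, Optional, Tuple


class WriteBackLog:
    """Hypothetical client-side log of updates not yet sent to the directory group."""

    def __init__(self) -> None:
        # path -> latest pending content; None records a pending delete
        self._pending: Dict[str, Optional[bytes]] = {}
        self.logged_ops = 0  # how many operations the client performed locally

    def write(self, path: str, data: bytes) -> None:
        self.logged_ops += 1
        self._pending[path] = data  # replaces any earlier unflushed write

    def delete(self, path: str) -> None:
        self.logged_ops += 1
        self._pending[path] = None

    def flush(self) -> List[Tuple[str, Optional[bytes]]]:
        """Return one coalesced update per path and clear the log."""
        batch = sorted(self._pending.items(), key=lambda item: item[0])
        self._pending.clear()
        return batch


if __name__ == "__main__":
    log = WriteBackLog()
    log.write("/a", b"v1")
    log.write("/a", b"v2")      # overwrites v1 before it is ever sent
    log.write("/tmp", b"scratch")
    log.delete("/tmp")          # the scratch data is never sent
    print(log.logged_ops, "local ops ->", len(log.flush()), "updates sent")
```

Four local operations collapse into two updates here, which is the saving the slide attributes to write-back caching.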

  7. Security
     • Access control by ACL
     • Privacy
       • Convergent encryption to protect the file data
       • Exclusive encryption to protect directory or file names
     • Integrity by a Merkle hash tree
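Two of these mechanisms can be illustrated with the standard library alone. In convergent encryption the key is derived from the file content, so identical files encrypt to identical ciphertext. A Merkle hash tree reduces a list of blocks to one root hash that changes if any block changes. This is a minimal sketch, not Farsite's implementation: the SHA-256 counter keystream is a toy stand-in for a real cipher and must not be used for actual protection, and all function names are hypothetical.

```python
import hashlib
from typing import List, Tuple


def _keystream(key: bytes, length: int) -> bytes:
    # Toy keystream (SHA-256 in counter mode); illustration only, not secure practice.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(plaintext: bytes) -> Tuple[bytes, bytes]:
    """Derive the key from the content itself, then encrypt with it."""
    key = hashlib.sha256(plaintext).digest()
    stream = _keystream(key, len(plaintext))
    return key, bytes(p ^ s for p, s in zip(plaintext, stream))


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    stream = _keystream(key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


def merkle_root(blocks: List[bytes]) -> bytes:
    """Root of a binary hash tree over the blocks (odd levels duplicate the last node)."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [hashlib.sha256(b"\x00" + b).digest() for b in blocks]  # leaf hashes
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


if __name__ == "__main__":
    key_a, ct_a = convergent_encrypt(b"same file contents")
    key_b, ct_b = convergent_encrypt(b"same file contents")
    print("identical ciphertexts:", ct_a == ct_b)
    print("round trip ok:", convergent_decrypt(key_a, ct_a) == b"same file contents")
    print("root:", merkle_root([b"block0", b"block1", b"block2"]).hex()[:16])
```

A reader who holds the root hash can verify any single block with a logarithmic number of sibling hashes instead of rehashing the whole file.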

  8. Scalability
     • When a directory group becomes overloaded, it can delegate part of its namespace to another group
     • To open a file or directory with a particular pathname, a client must determine which group of machines is responsible for that name
     • Hint-based pathname translation (caching), as in Sprite
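Hint-based translation can be pictured as a longest-prefix lookup in a client-side cache that maps pathname prefixes to directory groups. The sketch below assumes absolute, "/"-separated paths and uses made-up group identifiers. How Farsite corrects a stale hint is not covered by the slide, so that step is left out.

```python
from typing import Dict


class HintCache:
    """Hypothetical client cache: pathname prefix -> responsible directory group."""

    def __init__(self, root_group: str) -> None:
        # The root group is always known, so every lookup terminates.
        self._hints: Dict[str, str] = {"/": root_group}

    def add_hint(self, prefix: str, group: str) -> None:
        self._hints[prefix.rstrip("/") or "/"] = group

    def lookup(self, path: str) -> str:
        """Return the group for the longest cached prefix of path."""
        if not path.startswith("/"):
            raise ValueError("expected an absolute path")
        current = path.rstrip("/") or "/"
        while current not in self._hints:
            # Drop the last path component and try the parent directory.
            current = current.rsplit("/", 1)[0] or "/"
        return self._hints[current]


if __name__ == "__main__":
    cache = HintCache("group-0")
    cache.add_hint("/home/alice", "group-7")  # namespace delegated to another group
    print(cache.lookup("/home/alice/notes.txt"))
    print(cache.lookup("/etc/passwd"))
```

Walking up component by component, rather than matching raw string prefixes, keeps `/home/alicex` from being routed to the group that owns `/home/alice`.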

  9. Taming Aggressive Replication in the Pangaea Wide-Area File System. HP Labs

  10. Design Goals
      • Speed: hide wide-area networking latency
      • Availability and autonomy
      • Network economy: transfer data between nodes in physical proximity, reducing latency and bandwidth use

  11. Structure of a File System
      • Gold replicas
        • The directory entry of a file lists the file's gold replicas
        • Gold replicas form a clique
      • Bronze replicas

  12. Replica Set Management
      • Pervasive replication: a replica is created whenever a file is accessed by a user
      • File creation
      • Replica addition: the new replica S must be added to the graph with m edges
        • Adds an edge to a random gold replica (from a different region than S)
        • Asks a random gold replica P to pick the replica closest to S among P's immediate graph neighbors
        • Asks P to choose m-2 random replicas using random walks
      • Name-space containment
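The three replica-addition steps can be sketched on a small undirected graph. Several details are assumptions made only so the example runs: the graph is a dictionary of neighbor sets, `distance` is a caller-supplied function, P itself counts as a candidate in the second step, the walk length is fixed, and the code falls back to any gold replica when none lies in another region.

```python
import random
from typing import Callable, Dict, List, Set

Graph = Dict[str, Set[str]]


def add_edge(graph: Graph, a: str, b: str) -> None:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def random_walk(graph: Graph, start: str, steps: int, rng: random.Random) -> str:
    node = start
    for _ in range(steps):
        neighbors = sorted(graph[node])
        if not neighbors:
            break
        node = rng.choice(neighbors)
    return node


def add_replica(graph: Graph, new: str, gold: List[str], region: Dict[str, str],
                distance: Callable[[str, str], float], m: int,
                rng: random.Random, walk_steps: int = 3, max_tries: int = 50) -> None:
    """Connect replica `new` to the file's replica graph with up to m edges."""
    if m < 2:
        raise ValueError("m must be at least 2")
    graph.setdefault(new, set())
    # Step 1: edge to a random gold replica, preferably in a different region.
    far_gold = [g for g in gold if region[g] != region[new]] or list(gold)
    add_edge(graph, new, rng.choice(far_gold))
    # Step 2: a random gold replica P picks the candidate closest to the new replica.
    p = rng.choice(list(gold))
    candidates = sorted((graph[p] | {p}) - {new})
    add_edge(graph, new, min(candidates, key=lambda n: distance(new, n)))
    # Step 3: P finds the remaining peers by random walks starting from itself.
    tries = 0
    while len(graph[new]) < m and tries < max_tries:
        tries += 1
        peer = random_walk(graph, p, walk_steps, rng)
        if peer != new:
            add_edge(graph, new, peer)


if __name__ == "__main__":
    rng = random.Random(0)
    graph: Graph = {}
    for a, b in [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g1", "b1")]:
        add_edge(graph, a, b)  # g1..g3 form the gold clique, b1 is a bronze replica
    region = {"g1": "eu", "g2": "us", "g3": "us", "b1": "eu", "s": "eu"}
    pos = {"g1": 0.0, "g2": 5.0, "g3": 6.0, "b1": 1.0, "s": 1.5}
    add_replica(graph, "s", ["g1", "g2", "g3"], region,
                lambda a, b: abs(pos[a] - pos[b]), m=3, rng=rng)
    print(sorted(graph["s"]))
```

The first edge gives the new replica a distant gold neighbor, the second gives it a nearby one, and the random edges spread the remaining links across the graph.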

  13. Propagating Updates
      • Efficient and reliable update propagation
      • Delta propagation, harbingers, and a spanning tree to exploit physical topology
      • Conflict resolution: combining version vectors and last-writer-wins rules
      • No strong consistency guarantees: consistency is achieved eventually
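The conflict-resolution bullet combines two standard techniques. Version vectors detect whether one update causally follows another, and a last-writer-wins rule picks a winner only when the updates are concurrent. The sketch below assumes each version carries a wall-clock timestamp and a replica name for tie-breaking. Those fields and the function names are illustrative and are not taken from Pangaea.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Version:
    data: bytes
    vector: Dict[str, int]   # replica name -> number of updates seen from it
    timestamp: float         # assumed wall-clock time of the write
    replica: str             # writer, used only to break timestamp ties


def compare(a: Dict[str, int], b: Dict[str, int]) -> str:
    """Classify a relative to b: 'equal', 'before', 'after', or 'concurrent'."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def resolve(a: Version, b: Version) -> Version:
    order = compare(a.vector, b.vector)
    if order in ("after", "equal"):
        return a
    if order == "before":
        return b
    # Concurrent updates: the last writer wins, and the vectors are merged
    # so that the result dominates both inputs.
    winner = max(a, b, key=lambda v: (v.timestamp, v.replica))
    merged = {k: max(a.vector.get(k, 0), b.vector.get(k, 0))
              for k in set(a.vector) | set(b.vector)}
    return Version(winner.data, merged, winner.timestamp, winner.replica)


if __name__ == "__main__":
    a = Version(b"A", {"x": 2, "y": 1}, 10.0, "x")
    c = Version(b"C", {"x": 1, "y": 2}, 20.0, "y")
    print(compare(a.vector, c.vector), resolve(a, c).data)
```

Because the timestamp is consulted only for concurrent updates, a causally newer write is never discarded just because its clock reads earlier.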

  14. Questions?
      • A graph of replicas for each file: is that too much metadata to maintain?
      • It resembles a multicast-based file system: updates are propagated using multicast

  15. Discussion
      • Metadata and data management in a distributed file system
        • Either mutable, which requires trusting some machines, as in xFS, or as in Farsite, which uses a Byzantine-fault-tolerant protocol so that only part of the machines must be trusted to serialize updates
        • Or immutable, using logged updates, which relies on each individual user to form the image of the file system
      • Might the replication factors of metadata and data differ according to their usage?
