
Petal and Frangipani


Presentation Transcript


  1. Petal and Frangipani

  2. [Diagram: Petal/Frangipani positioned by analogy with “NAS” and “SAN” storage: Frangipani fills the “NAS” (file service) role, as NFS does, while Petal fills the “SAN” (block storage) role.]

  3. What each layer of the Petal/Frangipani stack provides:
  • NFS: untrusted clients; OS-agnostic access.
  • Frangipani: FS semantics; sharing/coordination.
  • Petal: disk aggregation (“bricks”); filesystem-agnostic; recovery and reconfiguration; load balancing; chained declustering; snapshots. Petal does not control sharing.
  Each “cloud” may resize or reconfigure independently. What indirection is required to make this happen, and where is it?

  4. Remaining Slides
  • The following slides were borrowed from the Petal and Frangipani presentations, which were available on the Web until Compaq SRC dissolved. This material is owned by Ed Lee, Chandu Thekkath, and the other authors of the work. The Frangipani material is still available through Chandu Thekkath’s site at www.thekkath.org.
  • For CPS 212, several issues are important:
  • Understand the role of each layer in the previous slides, and the strengths and limitations of each layer as a basis for innovating behind its interface (NAS/SAN).
  • Understand the concepts of virtual disks and a cluster file system embodied in Petal and Frangipani.
  • Understand the similarities and differences between Petal and the other reconfigurable cluster services we have studied: DDS and Porcupine.
  • Understand how the features of Petal simplify the design of a scalable cluster file system (Frangipani) above it.
  • Understand the nature, purpose, and role of the three key design elements added for Frangipani: leased locks, a write-ownership consistent caching protocol, and server logging for recovery.

  5. Petal: Distributed Virtual Disks. Edward K. Lee and Chandramohan A. Thekkath, Systems Research Center, Digital Equipment Corporation.

  6. [Figure: Logical system view. Client machines running PC FS, UFS, AdvFS, and NT FS see virtual disks /dev/vdisk1 through /dev/vdisk5 across a scalable network, all backed by Petal.]

  7. [Figure: Physical system view. A parallel database or cluster file system shares a single virtual disk (/dev/shared1) across a scalable network served by multiple Petal servers.]

  8. Virtual Disks
  • Each virtual disk provides a 2^64-byte address space.
  • Virtual disks are created and destroyed on demand.
  • Physical disk storage is allocated on demand.
  • Snapshots via copy-on-write (sketched below).
  • Online incremental reconfiguration.
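The snapshot bullet is the interesting one. Here is a minimal sketch of copy-on-write snapshotting for a virtual disk, assuming a toy block map and a naive single-node allocator; all names are illustrative and this is not Petal’s actual code:

```python
# A minimal copy-on-write snapshot sketch. Hypothetical structures, not
# Petal's implementation: a real system coordinates allocation across
# servers and keeps these maps durable on disk.

class VirtualDisk:
    def __init__(self):
        self.block_map = {}   # virtual block number -> physical block number
        self.next_phys = 0    # naive, single-node physical allocator

    def write(self, vblock, data, storage):
        # Allocate-on-write: a snapshot's mapping is never mutated, so any
        # write after a snapshot lands in a freshly allocated physical block.
        pblock = self.next_phys
        self.next_phys += 1
        storage[pblock] = data
        self.block_map[vblock] = pblock

    def read(self, vblock, storage):
        return storage.get(self.block_map.get(vblock))

    def snapshot(self):
        # A snapshot copies only the (small) mapping; the data blocks
        # themselves are shared until overwritten.
        snap = VirtualDisk()
        snap.block_map = dict(self.block_map)
        snap.next_phys = self.next_phys
        return snap

storage = {}
vd = VirtualDisk()
vd.write(0, b"v1", storage)
snap = vd.snapshot()
vd.write(0, b"v2", storage)              # new physical block; snapshot untouched
assert snap.read(0, storage) == b"v1"
assert vd.read(0, storage) == b"v2"
```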

  9. [Figure: Virtual-to-physical translation. A request (vdiskID, offset) is looked up in the Virtual Disk Directory to find the virtual disk’s global map (GMap), which names the responsible server (Server 0 through Server 3); that server’s physical map (PMap0 through PMap3) then yields (disk, diskOffset), giving (server, disk, diskOffset) overall.]
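In code, the figure’s translation path looks roughly like the following sketch. The GMap and PMap contents are invented (a trivial striping function and one hand-filled entry); only the two-level lookup structure is the point:

```python
# Sketch of Petal-style two-level address translation (illustrative names).
# A (vdiskID, offset) pair resolves to (server, disk, diskOffset):
#   1. The Virtual Disk Directory maps vdiskID to its global map (GMap).
#   2. The GMap maps the offset to the server responsible for it.
#   3. That server's physical map (PMap) maps it to (disk, diskOffset).

def translate(vdisk_id, offset, vdir, pmaps):
    gmap = vdir[vdisk_id]                 # step 1: find the disk's GMap
    server = gmap(offset)                 # step 2: which server owns this offset
    disk, disk_offset = pmaps[server][(vdisk_id, offset)]  # step 3: local map
    return server, disk, disk_offset

# Toy instance: offsets striped across 4 servers in 64 KB units.
vdir = {5: lambda off: (off // 65536) % 4}
pmaps = {s: {} for s in range(4)}
pmaps[1][(5, 65536)] = (0, 8192)          # server 1, disk 0, byte 8192
print(translate(5, 65536, vdir, pmaps))   # -> (1, 0, 8192)
```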

  10. Global State Management
  • Based on Leslie Lamport’s Paxos algorithm.
  • Global state is replicated across all servers.
  • Consistent in the face of server and network failures.
  • A majority is needed to update global state (sketched below).
  • Any server can be added or removed in the presence of failed servers.
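As a sketch of the majority-update bullet alone, with a hypothetical per-replica accept() call. This is not Paxos: proposal numbers, the two-phase protocol, and recovery of lagging replicas are all omitted:

```python
# Illustrates only the quorum rule: a global-state update commits iff a
# strict majority of replicas acknowledge it, so the state survives a
# minority of server failures.

class Replica:
    def __init__(self, up=True):
        self.up, self.state = up, []

    def accept(self, update):            # hypothetical accept call
        if not self.up:
            raise ConnectionError("server down")
        self.state.append(update)

def _try_accept(replica, update):
    try:
        replica.accept(update)
        return True
    except ConnectionError:
        return False                     # failed/unreachable server

def apply_update(replicas, update):
    acks = sum(1 for r in replicas if _try_accept(r, update))
    return acks * 2 > len(replicas)      # strict majority acknowledged?

replicas = [Replica(), Replica(up=False), Replica()]
assert apply_update(replicas, "create vdisk 7")   # 2 of 3 acks: committed
```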

  11. Fault-Tolerant Global Operations
  • Create/delete virtual disks.
  • Snapshot virtual disks.
  • Add/remove servers.
  • Reconfigure virtual disks.

  12. Data Placement & Redundancy
  • Supports non-redundant and chained-declustered virtual disks.
  • Parity can be supported if desired.
  • Chained declustering tolerates any single component failure.
  • Tolerates many common multiple failures.
  • Throughput scales linearly with additional servers.
  • Throughput degrades gracefully with failures.

  13. Chained Declustering: each block’s primary copy is stored on one server and its secondary copy on the next server around the chain.

      Server0   Server1   Server2   Server3
      D0        D1        D2        D3        (primaries)
      D3        D0        D1        D2        (secondaries)
      D4        D5        D6        D7        (primaries)
      D7        D4        D5        D6        (secondaries)

  14. Chained Declustering (repeats the layout of the previous slide; a placement sketch follows).
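The layout follows a simple rule: block i’s primary lives on server i mod N and its secondary on the next server around the chain. A sketch of that placement (our paraphrase, not Petal code):

```python
# Chained-declustered placement: primary on server (i mod N), secondary on
# the next server around the ring. Matches the table above for N = 4.

def placement(block, n_servers):
    primary = block % n_servers
    secondary = (primary + 1) % n_servers
    return primary, secondary

N = 4
for b in range(8):
    p, s = placement(b, N)
    print(f"D{b}: primary Server{p}, secondary Server{s}")

# If Server1 fails, D1 is still served from its secondary on Server2, and
# reads of D0 that might have used the copy on Server1 go to Server0 instead,
# so the failed server's load is split between its two neighbors. This is the
# graceful degradation claimed on slide 12.
```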

  15. The Prototype
  • Digital ATM network: 155 Mbit/s per link.
  • 8 AlphaStation Model 600s: 333 MHz Alpha running Digital Unix.
  • 72 RZ29 disks: 4.3 GB, 3.5-inch, fast SCSI (10 MB/s), 9 ms avg. seek, 6 MB/s sustained transfer rate.
  • Unix kernel device driver.
  • User-level Petal servers.

  16. [Figure: The prototype. Client workstations src-ss1 through src-ss8, each with /dev/vdisk1, connect over the Digital ATM network (AN2) to storage servers petal1 through petal8.]

  17. Throughput Scaling

  18. Virtual Disk Reconfiguration. [Chart: reconfiguration of a virtual disk with 1 GB of allocated storage, comparing 8-server and 6-server configurations under 8 KB reads & writes.]

  19. Frangipani: A Scalable Distributed File System. C. A. Thekkath, T. Mann, and E. K. Lee, Systems Research Center, Digital Equipment Corporation.

  20. Why Not an Old File System on Petal?
  • Traditional file systems (e.g., UFS, AdvFS) cannot share a block device.
  • The machine that runs the file system can become a bottleneck.

  21. Frangipani
  • Behaves like a local file system:
    - multiple machines cooperatively manage a Petal disk
    - users on any machine see a consistent view of data
  • Exhibits good performance, scaling, and load balancing.
  • Easy to administer.

  22. Ease of Administration
  • Frangipani machines are modular:
    - they can be added and deleted transparently
  • Common free space pool:
    - users don’t have to be moved
  • Automatically recovers from crashes.
  • Consistent backup without halting the system.

  23. Components of Frangipani
  • File system core:
    - implements the Digital Unix vnode interface
    - uses the Digital Unix Unified Buffer Cache
    - exploits Petal’s large virtual space
  • Locks with leases.
  • Write-ahead redo log.

  24. Locks
  • Multiple reader/single writer.
  • Locks are moderately coarse-grained:
    - a lock protects an entire file or directory
  • Dirty data is written to disk before a lock is given to another machine.
  • Each machine aggressively caches locks:
    - lease timeouts are used for lock recovery (see the sketch below)
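A minimal sketch of leased reader/writer locks, with hypothetical names and a made-up 30-second lease. Frangipani’s real lock service protocol is richer (for example, dirty data must be written back before a lock is surrendered), which this sketch does not model:

```python
import time

LEASE_SECS = 30.0   # hypothetical lease length, not Frangipani's actual value

class LeasedLock:
    """Multiple-reader/single-writer lock whose grants expire with a lease."""

    def __init__(self):
        self.holders = {}   # machine name -> (mode, lease expiry time)

    def _drop_expired(self):
        # A crashed holder never renews, so its lease simply lapses and the
        # lock becomes recoverable without talking to the dead machine.
        now = time.monotonic()
        self.holders = {mach: (mode, exp)
                        for mach, (mode, exp) in self.holders.items()
                        if exp > now}

    def acquire(self, machine, mode):
        self._drop_expired()
        others = {m: h for m, h in self.holders.items() if m != machine}
        if mode == "write" and others:
            return False    # a writer must be exclusive
        if mode == "read" and any(h[0] == "write" for h in others.values()):
            return False    # readers are blocked by an active writer
        self.holders[machine] = (mode, time.monotonic() + LEASE_SECS)
        return True

    renew = acquire         # renewing is just re-acquiring before expiry

lock = LeasedLock()
assert lock.acquire("machineA", "read")
assert lock.acquire("machineB", "read")         # readers share the lock
assert not lock.acquire("machineC", "write")    # the writer must wait
assert lock.renew("machineA", "read")           # renew before the lease lapses
```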

  25. Logging
  • Frangipani uses a write-ahead redo log for metadata:
    - log records are kept on Petal
  • Data is written to Petal:
    - on sync, fsync, or every 30 seconds
    - on lock revocation or when the log wraps
  • Each machine has a separate log:
    - reduces contention
    - allows independent recovery (see the sketch below)
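A sketch of the write-ahead rule with per-machine logs, using an invented record format; in Frangipani both the logs and the metadata live on Petal, which this toy version only gestures at:

```python
# Write-ahead redo logging, sketched. Record format and names are invented;
# the invariant is what matters: the redo record is appended to this
# machine's log before the metadata change is applied in place.

class Machine:
    def __init__(self):
        self.log = []        # stands in for this machine's log region on Petal
        self.metadata = {}   # in-place metadata (ultimately also on Petal)

    def set_metadata(self, key, value):
        self.log.append(("set", key, value))   # 1. log the redo record first
        self.metadata[key] = value             # 2. then apply the change

# Separate logs per machine: no cross-machine contention on log appends,
# and each machine's log can be replayed independently after a crash.
m1, m2 = Machine(), Machine()
m1.set_metadata("inode17.size", 4096)
m2.set_metadata("dir5.entries", ("a", "b"))
```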

  26. Recovery
  • Recovery is initiated by the lock service.
  • Recovery can be carried out on any machine:
    - the log is distributed and available via Petal (see the sketch below)
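And a matching recovery sketch, reusing the invented record format from the logging sketch above. Because a crashed machine’s log is on Petal, any surviving machine can replay it; redo records are assumed idempotent, so re-applying operations that had already completed before the crash is harmless:

```python
# Replay a crashed machine's redo log, in append order. Assumes idempotent
# redo records: an operation that had already reached the metadata before
# the crash is simply applied again with no ill effect.

def recover(log_records, metadata):
    for op, key, value in log_records:
        if op == "set":
            metadata[key] = value

meta = {}
recover([("set", "inode17.size", 4096)], meta)
assert meta["inode17.size"] == 4096
```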
