
DISTRIBUTED FILE SYSTEMS


Presentation Transcript


  1. DISTRIBUTED FILE SYSTEMS Computer Engineering Department Distributed Systems Course Asst. Prof. Dr. Ahmet Sayar • Kocaeli University - Fall 2013

  2. File System Basics • File: named collection of logically related data • Unix file: an uninterpreted sequence of bytes • File system: • Provides a logical view of data and storage functions • User-friendly interface • Provides facility to create, modify, organize, and delete files • Provides sharing among users in a controlled manner • Provides protection

  3. Client-Server Architectures (1) • Remote access model – work done at the server • Stateful server • Consistent sharing (+) • Server may become a bottleneck (-) • Every access requires communication (-) • Upload/download model – work done at the client • Stateless server • Simple functionality (+) • Moves whole files, so clients need local storage (-)

  4. System Structure: Server Type • Stateless server • No information is kept at the server between client requests • All information needed to service a request must be provided by the client with each request (what information? see the sketch below) • More tolerant to server crashes • Stateful server • Server maintains information about client accesses • Shorter request messages • Better performance • Idempotency easier • Consistency is easier to achieve
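
As a minimal sketch of the stateless idea (hypothetical names and request format, not the real NFS interface): every read request carries the file handle, offset, and byte count, so the server can serve it without remembering anything about the client between requests.

    from dataclasses import dataclass

    # Hypothetical request type: everything the server needs is inside the request.
    @dataclass
    class ReadRequest:
        file_handle: str   # which file (server-issued handle)
        offset: int        # where to start reading
        count: int         # how many bytes to return

    class StatelessServer:
        def __init__(self, files):
            self.files = files   # handle -> contents; no per-client or per-session table

        def read(self, req: ReadRequest) -> bytes:
            data = self.files[req.file_handle]
            return data[req.offset:req.offset + req.count]

    server = StatelessServer({"fh42": b"hello distributed file systems"})
    print(server.read(ReadRequest("fh42", offset=6, count=11)))   # b'distributed'

A stateful server would instead keep something like an open-file table per client, which is what makes its request messages shorter.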

  5. NFS Architecture • Sun’s Network File System (NFS) – widely used distributed file system • Uses the virtual file system layer to handle local and remote files

  6. NFS Interface I

  7. NFS Interface II

  8. Communication • (a) Reading data from a file in NFS version 3. (b) Reading data using a compound procedure in version 4. • Both versions use ONC RPC: one RPC per operation in v3; multiple operations combined into a single compound procedure in v4 (see the sketch below)
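
A toy sketch of the round-trip difference (invented names, not the actual ONC RPC or NFS procedures): the v3 style issues one call per operation, while the v4 style packs a list of operations into a single compound call that the server executes in order.

    class FakeServer:
        """Stand-in for an NFS server; each method call models one network round trip."""
        def __init__(self, files):
            self.files = files                      # path -> contents

        def lookup(self, path):
            return path                             # the "file handle" is just the path here

        def read(self, fh, offset, count):
            return self.files[fh][offset:offset + count]

        def compound(self, ops):
            """Execute a list of (operation, args) in order, threading the current handle."""
            fh, result = None, None
            for op, args in ops:
                if op == "lookup":
                    fh = self.lookup(*args)
                elif op == "read":
                    result = self.read(fh, *args)
            return result

    server = FakeServer({"/home/a.txt": b"compound procedures save round trips"})

    # v3 style: two separate round trips (LOOKUP, then READ).
    fh = server.lookup("/home/a.txt")
    print(server.read(fh, 0, 8))                    # b'compound'

    # v4 style: one round trip carrying both operations.
    print(server.compound([("lookup", ("/home/a.txt",)), ("read", (0, 8))]))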

  9. Naming: Mount Protocol • NFS uses the mount protocol to access remote files • The mount protocol establishes a local name for remote files • Users access remote files using local names; the OS takes care of the mapping (see the sketch below)
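
A sketch of what a mount establishes on the client (hypothetical table and server name, not the mount protocol messages themselves): a local prefix maps to a server and a remote directory, and path resolution rewrites local names transparently.

    # Client-side mount table: local prefix -> (server, exported remote directory).
    mount_table = {
        "/mnt/docs": ("fileserver.example.com", "/export/docs"),
    }

    def resolve(local_path):
        """Return (server, remote_path) if the path lies under a mount point, else None."""
        for prefix, (server, remote_dir) in mount_table.items():
            if local_path == prefix or local_path.startswith(prefix + "/"):
                return server, remote_dir + local_path[len(prefix):]
        return None   # plain local path

    print(resolve("/mnt/docs/report.txt"))
    # ('fileserver.example.com', '/export/docs/report.txt')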

  10. Naming: Crossing Mount Points • NFS does not allow transitive mounts

  11. Automounting • Automounting: mount on demand

  12. Automounting • The problem with automounting and its solution • The automounter needs to handle all reads and writes to keep the system transparent • Even for file systems that have already been mounted • This may become a performance bottleneck • Solution: • Use the automounter only for mounting and unmounting • Mount directories in a special directory and refer to them with symbolic links

  13. File Sharing • When files are shared, synchronization, consistency, and performance become concerns • It is necessary to define the semantics of reading and writing precisely to avoid problems

  14. Semantics of File Sharing I • On a single processor, when a read follows a write, the value returned by the read is the value just written.

  15. Semantics of File Sharing II • In a distributed system with caching, obsolete values may be returned.

  16. Semantics of File Sharing III • Four ways of dealing with shared files in a distributed system • NFS implements session semantics • The remote access model can be used to provide UNIX semantics (expensive) • Most implementations use local caches for performance and provide session semantics

  17. Semantics of File Sharing IV • Unix semantics • Can be achieved easily if there is only one file server and clients do not cache files • All reads and writes go directly to the server • Session semantics • Most widely used • Follows the upload/download model rather than the remote access model • Makes use of a local cache; changes are propagated when the file is closed (see the sketch below) • Immutable files • There is no way to open a file for writing • How to write then? A new file is created and replaces the old one under the same name • Transactions • To access a file or a group of files: begin_transaction … end_transaction • Jobs run indivisibly • If two or more transactions start at the same time, the system ensures that the final result is correct
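
A minimal sketch of session semantics under the upload/download model (hypothetical classes, not an NFS API): the file is downloaded at open, changes stay in the local copy, and only the close uploads them, so a session opened before that close keeps seeing the old contents.

    class Server:
        def __init__(self):
            self.files = {"f": b"v1"}

    class Session:
        def __init__(self, server, name):
            self.server, self.name = server, name
            self.local = server.files[name]              # download on open

        def write(self, data):
            self.local = data                            # changes stay in the local copy

        def close(self):
            self.server.files[self.name] = self.local    # upload (propagate) on close

    srv = Server()
    a = Session(srv, "f")
    b = Session(srv, "f")
    a.write(b"v2")
    print(b.local)                    # b'v1'  - b's session still sees the old version
    a.close()
    print(Session(srv, "f").local)    # b'v2'  - visible to sessions opened after the close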

  18. File Locking • NFS supports file locking • Applications can use locks to ensure consistency • Locking was not part of the NFS protocol itself until version 4; earlier versions relied on a separate lock manager protocol • NFS v4 supports locking as part of the protocol (a client-side example is sketched below)
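
From an application's point of view on a Unix client, locking a file that happens to live on an NFS mount usually goes through the ordinary POSIX advisory-lock interface; the sketch below uses Python's fcntl module and a hypothetical mount path. Whether the lock is served by the separate lock manager (v3) or by the NFS protocol itself (v4) depends on the mount.

    import fcntl, os

    # Hypothetical path on an NFS mount; the locking call itself is standard POSIX.
    fd = os.open("/mnt/nfs/shared/counter.txt", os.O_RDWR | os.O_CREAT)
    fcntl.lockf(fd, fcntl.LOCK_EX)            # block until we hold an exclusive lock
    try:
        value = int(os.read(fd, 64) or b"0")  # read the current counter (0 if empty)
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        os.write(fd, str(value + 1).encode()) # consistent read-modify-write under the lock
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)        # release the lock
        os.close(fd)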

  19. Client Caching • Client-side caching is left to the implementation (NFS does not prohibit it) • Different implementations use different caching policies • Sun's implementation allows cached data to be stale for up to 30 seconds (see the sketch below)
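
A sketch of that time-bounded policy (hypothetical structure, with the 30-second figure from the slide): a cached entry is reused without contacting the server while it is younger than the freshness interval, which is exactly why a read can return data that is up to 30 seconds stale.

    import time

    FRESHNESS_INTERVAL = 30.0      # seconds a cached entry may be reused without revalidation

    class CachingClient:
        def __init__(self, fetch_from_server):
            self.fetch = fetch_from_server      # function: name -> contents (the "server")
            self.cache = {}                     # name -> (data, time fetched)

        def read(self, name):
            entry = self.cache.get(name)
            if entry is not None:
                data, fetched_at = entry
                if time.monotonic() - fetched_at < FRESHNESS_INTERVAL:
                    return data                 # served from cache; may be up to 30 s stale
            data = self.fetch(name)             # revalidate by going to the server
            self.cache[name] = (data, time.monotonic())
            return data

    client = CachingClient(lambda name: b"contents of " + name.encode())
    print(client.read("notes.txt"))             # first read goes to the "server"
    print(client.read("notes.txt"))             # answered from the cache for the next 30 s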

  20. Client Caching: Delegation • NFS V4 supports open delegation • Server delegates local open and close requests to the NFS client • Uses a callback mechanism to recall file delegation.

  21. RPC Failures • Three situations for handling retransmissions: use a duplicate request cache • The request is still in progress • The reply has just been returned • The reply was returned some time ago, but was lost (a sketch of such a cache follows)
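
A sketch of a duplicate-request cache (hypothetical structure, not the actual NFS server code): retransmissions are recognized by their transaction identifier (XID); a request still in progress is simply dropped, and a request whose reply was already produced, whether just now or some time ago, is answered again from the cache instead of being re-executed.

    IN_PROGRESS = object()    # sentinel: the original request is still being executed

    class DuplicateRequestCache:
        def __init__(self, execute):
            self.execute = execute    # function that actually performs the operation
            self.cache = {}           # xid -> IN_PROGRESS or the saved reply

        def handle(self, xid, request):
            entry = self.cache.get(xid)
            if entry is IN_PROGRESS:
                return None                  # case 1: still in progress, drop the retransmission
            if entry is not None:
                return entry                 # cases 2 and 3: resend the cached reply
            self.cache[xid] = IN_PROGRESS
            reply = self.execute(request)
            self.cache[xid] = reply          # keep the reply for later retransmissions
            return reply

    server = DuplicateRequestCache(lambda req: f"result of {req}")
    print(server.handle(7, "read block 3"))  # executed once
    print(server.handle(7, "read block 3"))  # retransmission: answered from the cache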

  22. AFS - CODA

  23. AFS Background • Andrew File System - AFS • Alternative to NFS • Originally a research project at CMU • Later a commercial product from Transarc (acquired by IBM), now defunct; public versions are available • Goal: single namespace for global files (sound familiar? DNS? Web?) • Designed for wide-area file sharing and scalability • Example of a stateful file system design • Server keeps track of clients accessing files • Uses callback-based invalidation (sketched below) • Eliminates the need for timestamp checking, increases scalability • Complicates the design
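
A sketch of the callback idea (hypothetical names, far simpler than real AFS): the server records which clients hold a cached copy of each file and breaks those callbacks when the file is updated, so clients never have to poll timestamps.

    class Client:
        def __init__(self, name):
            self.name, self.cache = name, {}

        def invalidate(self, path):
            self.cache.pop(path, None)        # "callback break" received from the server

    class CallbackServer:
        def __init__(self):
            self.files = {}
            self.callbacks = {}               # path -> set of clients holding a cached copy

        def fetch(self, client, path):
            self.callbacks.setdefault(path, set()).add(client)   # register callback promise
            client.cache[path] = self.files[path]
            return client.cache[path]

        def store(self, path, data):
            self.files[path] = data
            for c in self.callbacks.pop(path, set()):
                c.invalidate(path)            # notify exactly the clients with a copy

    srv = CallbackServer()
    srv.files["/afs/doc.txt"] = b"v1"
    c1 = Client("c1")
    srv.fetch(c1, "/afs/doc.txt")
    srv.store("/afs/doc.txt", b"v2")
    print(c1.cache)                           # {} - the cached copy was invalidated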

  24. CODA Overview • A distributed file system designed for mobile clients; an offshoot of AFS • A nice model for mobile clients, who are often disconnected • Uses a file cache to make disconnection transparent • At home, on the road, away from a network connection • Coda supplements the AFS file cache with user preferences • E.g., always keep this file in the cache • Supplemented by the system learning user behavior • How to keep cached copies on disjoint hosts consistent? • In a mobile environment, simultaneous writes can be separated by hours/days/weeks

  25. Disconnected Operation • The state-transition diagram of a Coda client with respect to a volume • Uses hoarding to provide file access during disconnection • Prefetch all files that may be accessed and cache (hoard) them locally • Emulation makes it look as if the client were still connected to the file server • Reintegration: client and server file systems are synchronized once the connection is restored (a sketch of hoarding follows)
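
A sketch of hoarding (hypothetical structure, not the Coda client): files named in a hoard database are prefetched while connected, so that during disconnection they can still be opened from the local cache; reintegration would later replay any local updates to the server.

    class HoardingClient:
        def __init__(self, server_files, hoard_db):
            self.server = server_files    # path -> contents, reachable only while connected
            self.hoard_db = hoard_db      # paths the user wants available offline
            self.cache = {}
            self.connected = True

        def hoard_walk(self):
            """Prefetch everything in the hoard database (done periodically while connected)."""
            if self.connected:
                for path in self.hoard_db:
                    self.cache[path] = self.server[path]

        def open(self, path):
            if self.connected:
                return self.server[path]
            if path in self.cache:
                return self.cache[path]   # emulation: served from the hoarded copy
            raise FileNotFoundError(f"{path} is not hoarded and the client is disconnected")

    client = HoardingClient({"/coda/thesis.tex": b"\\documentclass{article} ..."},
                            hoard_db=["/coda/thesis.tex"])
    client.hoard_walk()
    client.connected = False              # e.g. the laptop leaves the network
    print(client.open("/coda/thesis.tex"))   # still readable while disconnected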

  26. The RPC2 Subsystem in CODA • RPC2 is a reliable RPC layer on top of (unreliable) UDP • Supports side effects in communication, e.g., video transfer requires isochronous communication and an application-specific protocol

  27. The RPC2 Subsystem in CODA • When a file is modified at the server, invalidation messages are sent to the clients • If there is more than one client, should the invalidations be sent one by one or in parallel?

  28. xFS

  29. xFS Summary • Distributes data storage across disks using software RAID and log-based network striping • RAID == Redundant Array of Independent Disks • Dynamically distributes control processing across all servers on a per-file granularity • Utilizes a serverless management scheme • Eliminates central server caching by using cooperative caching • Harvests portions of client memory as a large, global file cache

  30. RAID Overview • Basic idea: files are "striped" across multiple disks • Redundancy yields high data availability • Availability: service still provided to user, even if some components failed • Disks will still fail • Contents reconstructed from data redundantly stored in the array • Capacity penalty to store redundant info • Bandwidth penalty to update redundant info

  31. Problem with Striped Partitioning • If there is a failure (if any one disk fails) • Some parts of every file are lost • All of the data is affected • THE SOLUTION IS USING RAID • If any disk fails, the data is still available through RAID • RAID can be implemented in hardware or software

  32. Array Reliability • MTTF of N disks = MTTF of one disk ÷ N • 50,000 hours ÷ 70 disks ≈ 700 hours • That is, if any one disk fails, the whole array fails • Disk system MTTF drops from about 6 years to about 1 month! • Arrays without redundancy are too unreliable to be useful!

  33. Redundant Arrays of Inexpensive Disks - RAID 1: Disk Mirroring/Shadowing • There are a couple of alternatives – here is the easiest one • Each disk is fully duplicated onto its “mirror” • Very high availability can be achieved • Bandwidth sacrifice on write: • Logical write = two physical writes • Reads may be optimized • Most expensive solution: 100% capacity overhead

  34. Inspiration for RAID 5: RAID 4 • Use parity for redundancy • D0 xor D1 xor D2 xor D3 = P • If any disk fails, reconstruct the lost block using parity: • E.g. D0 = D1 xor D2 xor D3 xor P (see the sketch below) • RAID 4: all parity blocks are stored on the same disk • Small writes are still limited by the parity disk • The parity disk becomes a bottleneck
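
The parity relation on the slide is plain bytewise XOR; the sketch below computes the parity block and then rebuilds a lost data block from the surviving blocks, exactly as in the D0 example.

    def xor_blocks(*blocks):
        """Bytewise XOR of equally sized blocks."""
        result = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                result[i] ^= byte
        return bytes(result)

    d0, d1, d2, d3 = b"AAAA", b"BBBB", b"CCCC", b"DDDD"
    p = xor_blocks(d0, d1, d2, d3)        # P = D0 xor D1 xor D2 xor D3

    # The disk holding D0 fails: rebuild it from the remaining data blocks and parity.
    recovered = xor_blocks(d1, d2, d3, p)
    print(recovered == d0)                # True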

  35. Redundant Arrays of Inexpensive Disks - RAID 5: High I/O Rate Interleaved Parity • Independent writes are possible because of interleaved parity (see the sketch below) • Example: a write to D0 and D5 uses disks 0, 1, 3, 4
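
A sketch of the interleaving itself (one possible rotation; real layouts differ in the exact pattern): the parity block of each stripe sits on a different disk, so small writes to blocks in different stripes can update different parity disks and proceed independently.

    NUM_DISKS = 5

    def parity_disk(stripe):
        """Disk holding the parity block of a stripe; the parity position rotates."""
        return (NUM_DISKS - 1 - stripe) % NUM_DISKS

    for stripe in range(4):
        data = [d for d in range(NUM_DISKS) if d != parity_disk(stripe)]
        print(f"stripe {stripe}: data on disks {data}, parity on disk {parity_disk(stripe)}")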
