Scale and Performance in a Distributed File System

Presentation Transcript

  1. Jinyong Yoon, 2010. 10. 18. Scale and Performance in a Distributed File System

  2. Outline • Andrew File System • The Prototype • Changes for Performance • Effect of Changes for Performance • Comparison with A Remote-Open File System • Conclusion

  3. Andrew File System
  • Developed at Carnegie Mellon University
  • A distributed file system designed with scale in mind
  • Exploits locality of file references
  • Presents a homogeneous, location-transparent file name space to all client workstations
  • Built on 4.2 BSD
  • Servers
    • A set of trusted servers – Vice
  • Clients
    • User-level processes – Venus
    • File system call hooking
    • Contacts servers only on opens and closes, for whole-file transfer
    • Caches files from Vice
    • Stores modified copies of files back on the servers
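The whole-file-transfer model on the slide above can be sketched in a few lines. This is a hypothetical illustration, not the real Vice/Venus API: the class names, `fetch`/`store` methods, and in-memory dictionaries all stand in for the actual server protocol and on-disk cache.

```python
# Hedged sketch of AFS-style whole-file caching: Venus fetches the entire
# file from Vice on open and writes the modified copy back on close, so
# reads and writes in between never touch the server.

class Vice:
    """Stand-in for a trusted file server."""
    def __init__(self):
        self.files = {}          # path -> bytes

    def fetch(self, path):
        return self.files[path]

    def store(self, path, data):
        self.files[path] = data


class Venus:
    """Stand-in for the client-side cache manager."""
    def __init__(self, server):
        self.server = server
        self.cache = {}          # path -> bytes (models the local disk cache)

    def open(self, path):
        # Whole-file transfer: only opens and closes contact the server.
        if path not in self.cache:
            self.cache[path] = self.server.fetch(path)
        return bytearray(self.cache[path])

    def close(self, path, data):
        # Modifications are reflected back to Vice when the file is closed.
        self.cache[path] = bytes(data)
        self.server.store(path, self.cache[path])


server = Vice()
server.files["/vice/readme"] = b"v1"
client = Venus(server)

buf = client.open("/vice/readme")   # whole file fetched into the cache
buf[:] = b"v2"                      # local edits, no server traffic
client.close("/vice/readme", buf)   # modified copy stored back on the server
```

The key design point the slide makes is visible here: the server is only involved at `open` and `close`, never per read or write.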

  4. Andrew File System - Overview
  • [Diagram: client workstations, each running user programs and Venus on a Unix kernel with a local disk, connected over the network to Vice servers.]

  5. The Prototype - Description
  • Venus on the client, with a dedicated persistent process per client on the server
  • Each server stored the directory hierarchy, mirroring the structure of the Vice files
    • .admin directory – Vice file status info
    • Stub directory – location database
  • The Vice-Venus interface named files by their full pathname
    • No notion of a low-level name such as an inode
  • Before using a cached file, Venus verifies its timestamp
    • Each open of a file thus resulted in at least one interaction with a server, even if the file was already in the cache and up to date
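The prototype's validation-on-open behavior can be made concrete with a small sketch. All names here are illustrative assumptions (there was no `get_timestamp` RPC by that name); the point is that even a cache hit costs one server round trip.

```python
# Minimal sketch of the prototype's cache validation: Venus keeps a
# timestamp with each cached file and checks it against the server before
# every use, so every open involves at least one server interaction.

class Server:
    def __init__(self):
        self.files = {}          # path -> (timestamp, data)
        self.calls = 0           # counts server interactions

    def get_timestamp(self, path):
        self.calls += 1
        return self.files[path][0]

    def fetch(self, path):
        self.calls += 1
        return self.files[path]


class VenusPrototype:
    def __init__(self, server):
        self.server = server
        self.cache = {}          # path -> (timestamp, data)

    def open(self, path):
        ts = self.server.get_timestamp(path)   # unavoidable round trip
        cached = self.cache.get(path)
        if cached is None or cached[0] != ts:
            self.cache[path] = self.server.fetch(path)
        return self.cache[path][1]


srv = Server()
srv.files["/vice/f"] = (1, b"data")
venus = VenusPrototype(srv)
venus.open("/vice/f")    # miss: timestamp check + fetch -> 2 calls
venus.open("/vice/f")    # hit, but still a timestamp check -> 1 call
print(srv.calls)         # 3
```

This is exactly the traffic that the callback mechanism in the later slides was designed to eliminate.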

  6. The Prototype – Qualitative Observations
  • stat primitive
    • Used to test for the presence of files and to obtain status information before opening them
    • Each stat call involved a cache validity check
    • Increased total running time and the load on servers
  • Dedicated processes
    • Excessive context-switching overhead
    • Exceeded critical resource limits
    • High virtual-memory paging demands

  7. The Prototype – Qualitative Observations
  • Remote Procedure Call (RPC)
    • Simplified the implementation
    • Caused network-related resources in the kernel to be exceeded
  • Location database
    • Made it difficult to move users' directories between servers
  • Etc.
    • Vice files usable without recompilation or relinking

  8. The Prototype - Benchmark
  • Benchmark
    • Command scripts that operate on a collection of files
    • 70 files (the source code of an application program), about 200 KB in total
  • Stand-alone benchmark with 5 phases

  9. The Prototype - Benchmark • Skewed distribution of Vice calls • TestAuth – Validate cache entries • GetFileStat – Obtain status information about files absent from the cache

  10. The Prototype - Benchmark • Load unit • The load placed on a server by a single client workstation running this benchmark • One load unit corresponds to about 5 Andrew users

  11. The Prototype - Benchmark • CPU/disk utilization profiling • The performance bottleneck is the CPU • Frequent context switches • Time spent by the servers traversing full pathnames

  12. Changes for Performance
  • Cache management
    • Previously: status cache (in virtual memory) and data cache (on local disk)
    • Interception of opening/closing operations only
    • Modifications to a cached file are reflected back to Vice when the file is closed
  • Callback – the server promises to notify the client before allowing a modification
    • Reduces cache-validation traffic
    • Both client and server must maintain callback state information
    • There is a potential for inconsistency
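The callback mechanism can be sketched as follows. This is an illustrative model, not the real Vice protocol: the class names, the `break_callback` method, and the in-memory callback sets are all invented for the example.

```python
# Hedged sketch of callback-based validation: the server promises to notify
# each caching client before accepting another client's update, so a cached
# file with an unbroken callback can be reused with no validation traffic.

class CallbackServer:
    def __init__(self):
        self.files = {}              # path -> bytes
        self.callbacks = {}          # path -> set of clients holding callbacks

    def fetch(self, path, client):
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, path, data, writer):
        self.files[path] = data
        # Break callbacks held by everyone except the writer.
        for other in self.callbacks.get(path, set()) - {writer}:
            other.break_callback(path)
        self.callbacks[path] = {writer}


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}              # path -> bytes
        self.callback = set()        # paths with a valid callback

    def break_callback(self, path):
        self.callback.discard(path)

    def open(self, path):
        if path in self.cache and path in self.callback:
            return self.cache[path]  # no server traffic on this open
        self.cache[path] = self.server.fetch(path, self)
        self.callback.add(path)
        return self.cache[path]

    def close(self, path, data):
        self.cache[path] = data
        self.server.store(path, data, self)


srv = CallbackServer()
srv.files["/vice/f"] = b"v1"
a, b = Client(srv), Client(srv)
a.open("/vice/f")                    # a now holds a callback
b.open("/vice/f")
b.close("/vice/f", b"v2")            # server breaks a's callback
print(a.open("/vice/f"))             # a's callback is gone, so a refetches
```

The slide's two caveats show up directly: both sides carry callback state (`callbacks` on the server, `callback` on the client), and a lost break message would leave them inconsistent.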

  13. Changes for Performance
  • Name resolution
    • Previously:
      • inode – unique, fixed-length
      • pathname – one or more per file, variable-length
      • namei routine – maps a pathname to an inode
    • Each Vice pathname involved an implicit namei operation
      • CPU overhead on the servers
  • fid – unique, fixed-length, two-level name
    • Venus maps each component of a pathname to a fid
    • Three 32-bit fields: Volume number, Vnode number, Uniquifier
      • Volume number: identifies a volume on one server
      • Vnode number: index into a file storage information array
      • Uniquifier: allows reuse of Vnode numbers
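The fid's fixed-length, three-field layout can be shown in a short sketch. The `Fid` type and pack/unpack helpers are illustrative, not the actual AFS data structures.

```python
# Illustrative sketch of the two-level fid that replaced pathname lookups:
# three 32-bit fields, none of which encodes a server location, so a volume
# can move between servers without renaming any files.

import struct
from collections import namedtuple

Fid = namedtuple("Fid", ["volume", "vnode", "uniquifier"])

def pack_fid(fid):
    # Fixed-length 96-bit representation: three unsigned 32-bit integers.
    return struct.pack(">III", fid.volume, fid.vnode, fid.uniquifier)

def unpack_fid(raw):
    return Fid(*struct.unpack(">III", raw))

fid = Fid(volume=7, vnode=42, uniquifier=3)  # uniquifier allows vnode reuse
raw = pack_fid(fid)
print(len(raw))                 # 12 bytes: fixed length, unlike a pathname
print(unpack_fid(raw) == fid)   # True: lossless round trip
```

Because the fid is fixed-length, the server can resolve it with array indexing instead of the variable-cost `namei` pathname traversal the slide identifies as the CPU bottleneck.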

  14. Changes for Performance
  • Communication and server process structure
    • Uses Lightweight Processes (LWPs) instead of one dedicated process per client
    • An LWP is bound to a particular client only for the duration of a single server operation
  • Low-level storage representation
    • Access files by their inodes
      • vnode on the servers
      • inode on the clients
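The LWP change can be illustrated by rough analogy with a modern thread pool. Python threads stand in for LWPs here, and the pool size and names are assumptions for the example; the structural point is that a small fixed pool multiplexes many clients, with each worker bound to a client only for one operation.

```python
# Rough analogy for the revised server structure: a small pool of
# lightweight workers serves requests from many clients, instead of one
# dedicated (heavyweight) process per client as in the prototype.

from concurrent.futures import ThreadPoolExecutor

def serve(request):
    client, op = request
    # The worker handles exactly this one operation for this client,
    # then returns to the pool to serve any other client.
    return f"{client}:{op}:done"

requests = [(f"client{i}", "open") for i in range(20)]

# 5 workers multiplex 20 clients, rather than 20 dedicated processes.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(serve, requests))

print(len(results))        # 20
```

This is what removes the context-switching and paging costs the earlier slides attribute to the process-per-client prototype.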

  15. Change for Performance – Overall Design
  • On an open, Venus distinguishes three cases for a file D:
    • D is in the cache and has a callback on it
    • D is in the cache but has no callback on it
    • D is not in the cache
  • [Diagram: a user program issues Unix file system calls; the Unix kernel forwards non-local file operations to Venus, which uses the Unix file system and local disk for its cache.]

  16. Change for Performance – Effect • Scalability • Only 19% slower than a stand-alone workstation • The prototype is 70% slower

  17. Change for Performance – Effect • Scalability

  18. Comparison with A Remote-Open File System • Remote open • The data in a file are not fetched en masse • Instead, the remote site potentially participates in each individual read and write operation • The file is actually opened on the remote site rather than the local site • Example: NFS

  19. Comparison with A Remote-Open File System

  20. Comparison with A Remote-Open File System • Advantage of a remote-open file system • Low latency

  21. Conclusion • Scale impacts Andrew in areas besides performance and operability