
PVFS (Parallel Virtual File System)


Presentation Transcript


  1. PVFS (Parallel Virtual File System) Bohao She, CSS 534, Spring 2014

  2. Background
  • First developed in 1993 by Walt Ligon and Eric Blumer.
  • Development was a joint effort between the Parallel Architecture Research Laboratory at Clemson University and the Mathematics and Computer Science Division at Argonne National Laboratory.
  • Funded by NASA Goddard Space Flight Center Code 930 and the National Computational Science Alliance through the National Science Foundation's Partnerships for Advanced Computational Infrastructure.
  • Based on Vesta, a parallel file system developed at IBM.

  3. Features
  • Object-based design
    • All PVFS server requests operate on objects called dataspaces.
    • A dataspace can be used to hold file data, file metadata, directory metadata, directory entries, or symbolic links.
    • Every dataspace in the file system has a unique handle.
    • A dataspace has two components, sketched in the code below: a bytestream, typically used to hold file data, and a set of key/value pairs, typically used to hold metadata.
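  As a concrete illustration of that layout, here is a minimal C sketch of a dataspace as a handle plus a bytestream and a key/value set. All type and field names are hypothetical; this is not taken from the PVFS source.

    /* Hypothetical sketch of a dataspace: a unique handle plus a
     * bytestream and a set of key/value pairs. Names are illustrative
     * only and do not come from the PVFS source tree. */
    #include <stdint.h>
    #include <stddef.h>

    typedef uint64_t pvfs_handle_t;      /* unique per dataspace */

    struct keyval {                      /* one metadata key/value pair */
        const char *key;                 /* e.g. "owner" or "distribution" */
        void       *value;
        size_t      value_len;
    };

    struct dataspace {
        pvfs_handle_t  handle;           /* unique within the file system */
        unsigned char *bytestream;       /* typically holds file data */
        size_t         bytestream_len;
        struct keyval *keyvals;          /* typically holds metadata */
        size_t         keyval_count;
    };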

  4. Features (cont.)
  • Separation of data and metadata
    • A client contacts a metadata server once, then accesses the data servers without further interaction with the metadata server.
    • This removes a critical bottleneck from the system.
  • MPI-based requests
    • When a client program requests data from PVFS, it can supply a description of the data based on MPI datatypes, as in the sketch below.
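  The following MPI-IO sketch shows the general style of describing a noncontiguous request with one MPI datatype and issuing it in a single collective call. The path /mnt/pvfs/bigfile and the block and stride sizes are invented for the example; this demonstrates the MPI-IO interface rather than any PVFS-specific call.

    /* Hedged sketch: each rank reads every nprocs-th block of 1024 ints
     * from a shared file, describing the whole pattern with one MPI
     * datatype so the I/O layer sees a single noncontiguous request. */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* 64 blocks of 1024 ints, spaced nprocs blocks apart. */
        MPI_Datatype filetype;
        MPI_Type_vector(64, 1024, 1024 * nprocs, MPI_INT, &filetype);
        MPI_Type_commit(&filetype);

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "/mnt/pvfs/bigfile",
                      MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

        /* Shift each rank by one block so the ranks interleave. */
        MPI_Offset disp = (MPI_Offset)rank * 1024 * sizeof(int);
        MPI_File_set_view(fh, disp, MPI_INT, filetype, "native",
                          MPI_INFO_NULL);

        int *buf = malloc(64 * 1024 * sizeof(int));
        MPI_File_read_all(fh, buf, 64 * 1024, MPI_INT, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Type_free(&filetype);
        free(buf);
        MPI_Finalize();
        return 0;
    }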

  5. Features (cont.)
  • Multiple network support
    • PVFS uses a networking layer named BMI (Buffered Message Interface), which provides a non-blocking message interface designed specifically for file systems; the post/test usage pattern is sketched after this slide.
    • BMI has implementation modules for a number of networks used in high-performance computing, including TCP/IP, Myrinet, InfiniBand, and Portals.
  • Stateless
    • Servers do not share any state with each other or with clients, so if a server crashes, another can be started in its place.
    • Updates are performed without using locks.
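  To show what a non-blocking message layer looks like from the caller's side, here is a toy C sketch of the post/test pattern. The "network" is simulated in memory so the example is self-contained, and none of these function names are the real BMI API.

    /* Toy illustration of the post/test pattern used by non-blocking
     * message layers such as BMI; completion is simulated in memory. */
    #include <stdio.h>
    #include <string.h>

    #define MAX_OPS 8

    struct op {
        int  done;
        char data[64];
    };

    static struct op ops[MAX_OPS];
    static int next_id = 0;

    /* Post a send and return immediately with an operation id. */
    static int post_send(const char *msg, int *op_id)
    {
        int id = next_id++ % MAX_OPS;
        strncpy(ops[id].data, msg, sizeof ops[id].data - 1);
        ops[id].data[sizeof ops[id].data - 1] = '\0';
        ops[id].done = 0;
        *op_id = id;
        return 0;
    }

    /* Poll for completion; the caller can service other requests
     * between polls, which lets a server keep many operations in
     * flight without blocking on any single client. */
    static int test_op(int op_id, int *done)
    {
        ops[op_id].done = 1;   /* simulated instant completion */
        *done = ops[op_id].done;
        return 0;
    }

    int main(void)
    {
        int id, done = 0;
        post_send("read request for handle 42", &id);
        while (!done)
            test_op(id, &done);
        printf("operation %d completed: %s\n", id, ops[id].data);
        return 0;
    }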

  6. Features (cont.)
  • User-level implementation
    • Clients and servers run at user level.
    • An optional kernel module allows a PVFS file system to be mounted like any other file system (see the example after this slide).
    • Alternatively, programs can be linked directly against user-level interfaces such as MPI-IO or a POSIX-like interface.
    • This makes PVFS easy to install and less prone to causing system crashes.
  • System-level interface
    • The PVFS interface is designed to integrate at the system level.
    • It exposes many features of the underlying file system so that higher-level interfaces can take advantage of them if desired.
    • It is similar to the Linux VFS, making it easy to implement as a mountable file system.
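  Once the kernel module has a PVFS volume mounted, unmodified POSIX I/O works against it. The short sketch below assumes an illustrative mount point /mnt/pvfs.

    /* Minimal sketch: unmodified POSIX I/O against a mounted PVFS
     * volume. The mount point and file name are assumptions. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[4096];

        int fd = open("/mnt/pvfs/data.bin", O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* The kernel module and pvfsd forward this read to the
         * PVFS servers; the application is unaware of that. */
        ssize_t n = read(fd, buf, sizeof buf);
        printf("read %zd bytes\n", n);

        close(fd);
        return 0;
    }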

  7. Architecture
  • There are four major components to the PVFS system:
    • Metadata server (MGR)
    • I/O server (ION or iod)
    • PVFS native API (libpvfs, on the compute nodes)
    • PVFS Linux kernel support
  Figure courtesy of P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System for Linux Clusters," Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327.

  8. Architecture (cont.)
  • The metadata server, MGR, manages all file metadata for PVFS files. Metadata is the information that describes a file, such as its name, its place in the directory hierarchy, its owner, and how it is distributed across nodes in the system.
  • Because a single daemon operates atomically on file metadata, PVFS avoids many of the shortcomings of storage-area-network approaches, which must implement complex locking schemes to keep metadata consistent in the face of multiple accesses.
  Figures courtesy of P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System for Linux Clusters," Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327.

  9. Architecture (cont.)
  • The I/O server, ION or iod, handles storing and retrieving file data on local disks connected to the node.
  • These servers create ordinary files on an existing file system on the local node and use the traditional read(), write(), and mmap() calls to access them (see the sketch after this slide).
  • This means any local file system can be used to store the data; software or hardware RAID on the node can even be used to tolerate disk failures transparently and to create extremely large file systems.
  Figure courtesy of P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System for Linux Clusters," Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327.
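  The sketch below mimics how an I/O daemon might write one incoming stripe into its local backing file with plain POSIX calls. The /pvfs-data directory, the f<inode> naming scheme, and the function name are assumptions for illustration.

    /* Hypothetical sketch of an I/O daemon writing one stripe of a PVFS
     * file into its local backing file with ordinary POSIX calls. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Write `len` bytes at `local_off` in the backing file that this
     * ION keeps for PVFS inode `ino`. */
    static int store_stripe(long ino, const void *buf, size_t len,
                            off_t local_off)
    {
        char path[256];
        snprintf(path, sizeof path, "/pvfs-data/f%ld", ino);

        int fd = open(path, O_WRONLY | O_CREAT, 0644);
        if (fd < 0)
            return -1;

        /* pwrite lets the daemon serve many requests on one descriptor
         * without tracking a shared file offset. */
        ssize_t n = pwrite(fd, buf, len, local_off);
        close(fd);
        return n == (ssize_t)len ? 0 : -1;
    }

    int main(void)
    {
        char stripe[65536] = {0};                 /* 64 KB dummy stripe */
        return store_stripe(1092157504, stripe, sizeof stripe, 0);
    }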

  10. Architecture (cont.)
  • The PVFS native API, libpvfs, provides user-space access to the PVFS servers. It handles the scatter/gather operations necessary to move data between user buffers and PVFS servers, keeping these operations transparent to the user.
  • For metadata operations, applications communicate through the library with the metadata server.
  • For data access, the metadata server is eliminated from the access path and the I/O servers are contacted directly (this control flow is sketched after this slide).
  Figures courtesy of P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System for Linux Clusters," Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327.
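  To make that access path concrete, the sketch below stubs out the two steps: one metadata lookup followed by data requests that go straight to the I/O servers. Every type and function name here is invented for illustration and the "servers" are faked so the example runs stand-alone; this is not the libpvfs API.

    /* Invented sketch of the client-side access path. */
    #include <stdio.h>
    #include <string.h>

    struct file_meta {            /* what the metadata lookup returns */
        long inode;
        int  base_ion;            /* first ION holding the file */
        int  ion_count;           /* how many IONs it is striped over */
        long stripe_size;
    };

    /* Step 1: contact the metadata server once. (Stubbed.) */
    static struct file_meta lookup_metadata(const char *path)
    {
        (void)path;
        struct file_meta m = { 1092157504, 2, 3, 65536 };
        return m;
    }

    /* Step 2: contact an I/O server directly. (Stubbed.) */
    static long read_from_ion(int ion, long inode, long local_off,
                              void *buf, long len)
    {
        (void)inode; (void)local_off;
        memset(buf, 0, (size_t)len);
        printf("requested %ld bytes from ION %d\n", len, ion);
        return len;
    }

    int main(void)
    {
        struct file_meta m = lookup_metadata("/pvfs/bigfile");  /* once */
        char buf[65536];

        /* All subsequent data requests bypass the metadata server. */
        read_from_ion(m.base_ion,     m.inode, 0, buf, m.stripe_size);
        read_from_ion(m.base_ion + 1, m.inode, 0, buf, m.stripe_size);
        return 0;
    }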

  11. Architecture (cont.)
  • The PVFS Linux kernel support provides the functionality necessary to mount PVFS file systems on Linux nodes. It includes a loadable module, an optional kernel patch that eliminates a memory copy, and a daemon (pvfsd) that accesses the PVFS file system on behalf of applications.
  • This allows existing programs to access PVFS files without any modification. This support is not necessary for applications to use PVFS, but it provides an extremely convenient means of interacting with the system.
  Figure courtesy of P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System for Linux Clusters," Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327.

  12. File striping and partitioning
  • Though there are six IONs in this example, the file is striped across only three of them, starting from node 2, because the metadata file specifies such a striping (this offset-to-ION mapping is sketched after this slide).
  • Each I/O daemon stores its portion of the PVFS file in a file on its ION's local file system.
  • The name of this local file is based on the inode number that the manager assigned to the PVFS file; it is 1092157504 in this example.
  Left images courtesy of P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System for Linux Clusters," Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327. Right image courtesy of Munehiro Fukuda, "CSS 534 Parallel Programming in Grid and Cloud, Lecture 11: File Management," UW Bothell, 2014, p. 17.
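  As an illustration of the striping arithmetic behind this layout, the sketch below maps a logical file offset to an ION index and an offset within that ION's local file, assuming simple round-robin striping with the base node and stripe count from the example; the 64 KB stripe size and the function names are assumptions.

    /* Hypothetical sketch of round-robin stripe mapping: logical file
     * offset -> (ION index, offset in that ION's local file). */
    #include <stdio.h>

    struct stripe_loc {
        int  ion;          /* which I/O node holds this byte */
        long local_off;    /* offset inside that ION's local file */
    };

    static struct stripe_loc map_offset(long offset, int base_node,
                                        int pcount, long ssize)
    {
        long stripe = offset / ssize;          /* which stripe overall */
        struct stripe_loc loc;
        loc.ion       = base_node + (int)(stripe % pcount);
        loc.local_off = (stripe / pcount) * ssize + offset % ssize;
        return loc;
    }

    int main(void)
    {
        /* Byte 200000 of a file striped over 3 IONs starting at node 2,
         * with an assumed 64 KB stripe size. */
        struct stripe_loc loc = map_offset(200000, 2, 3, 65536);
        printf("ION %d, local offset %ld\n", loc.ion, loc.local_off);
        return 0;
    }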

  13. Performance over Ethernet
  Data courtesy of P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System for Linux Clusters," Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327.

  14. Performance over Myrinet
  Data courtesy of P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System for Linux Clusters," Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327.

  15. Pros and Cons
  • Pros:
    • Higher cluster performance than NFS.
    • Many hard drives act as one large logical drive.
    • Works with existing software.
    • Best when reading or writing large amounts of data.
  • Cons:
    • Multiple points of failure.
    • Poor performance when using the kernel module.
    • Not as good for "interactive" work.

  16. Questions?
