
Massive High-Performance Global File Systems for Grid Computing


Presentation Transcript


1. Massive High-Performance Global File Systems for Grid Computing. By Phil Andrews, Patricia Kovatch, and Christopher Jordan. Presented by Han S Kim.

2. Outline: I. Introduction; II. GFS via Hardware Assist: SC’02; III. Native WAN-GFS: SC’03; IV. True Grid Prototype: SC’04; V. Production Facility: 2005; VI. Future Work.

3. I Introduction

4. Introduction - The Original Mode of Operation for Grid Computing • The user’s job is submitted to the ubiquitous grid. • The job would run on the most appropriate computational platform available. • Any data required for the computation would be moved to the chosen compute facility’s local disk. • Output data would be written to the same disk. • The normal utility used for the data transfer would be GridFTP (a staging sketch follows this slide).
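To make the staging model concrete, here is a minimal, hypothetical sketch of the stage-in/compute/stage-out cycle driven from Python. It assumes the standard globus-url-copy GridFTP client is installed; the gsiftp:// URLs and local paths are placeholders, not endpoints from the paper.

```python
import subprocess

# Hypothetical endpoints; replace with real GridFTP URLs and scratch paths.
REMOTE_INPUT = "gsiftp://storage.example.org/data/input.dat"
LOCAL_SCRATCH = "file:///scratch/job123/input.dat"
LOCAL_OUTPUT = "file:///scratch/job123/output.dat"
REMOTE_OUTPUT = "gsiftp://storage.example.org/data/output.dat"

def gridftp_copy(src: str, dst: str) -> None:
    """Stage a file with the standard GridFTP client (globus-url-copy)."""
    subprocess.run(["globus-url-copy", src, dst], check=True)

# 1. Stage input data to the compute site's local disk.
gridftp_copy(REMOTE_INPUT, LOCAL_SCRATCH)
# 2. Run the job against the local copy (details omitted).
# 3. Stage the output back: both input and output traverse the WAN in full.
gridftp_copy(LOCAL_OUTPUT, REMOTE_OUTPUT)
```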

5. Introduction - In Grid Supercomputing • The data sets used are very large. • The National Virtual Observatory dataset consists of approximately 50 Terabytes and is used as input by several applications. • Some applications write very large amounts of data: the Southern California Earthquake Center simulation writes close to 250 Terabytes in a single run. • Other applications require extremely high I/O rates: the Enzo application, an AMR cosmological simulation code, routinely writes and reads multiple Terabytes per hour (a rough bandwidth estimate follows this slide).
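As a rough back-of-the-envelope check (my arithmetic, not the paper's), a write rate of a few Terabytes per hour already implies several Gb/s of continuous bandwidth:

```python
# Convert a Terabytes-per-hour I/O rate into sustained Gb/s.
# Decimal units (1 TB = 1e12 bytes); the exact convention is an assumption.
def tb_per_hour_to_gbps(tb_per_hour: float) -> float:
    bytes_per_s = tb_per_hour * 1e12 / 3600
    return bytes_per_s * 8 / 1e9  # bits per second -> Gb/s

for rate in (1, 2, 4):  # illustrative rates, not figures from the paper
    print(f"{rate} TB/hour ~= {tb_per_hour_to_gbps(rate):.1f} Gb/s sustained")
# 1 TB/hour ~= 2.2 Gb/s, 2 TB/hour ~= 4.4 Gb/s, 4 TB/hour ~= 8.9 Gb/s
```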

6. Introduction - Concerns about Grid Supercomputing • The normal approach of moving data back and forth may not translate well to a supercomputing grid, mostly because of the very large size of the data sets used. • These sizes and the required transfer rates are not conducive to routine wholesale migration of input and output data between grid sites. • The compute system may not have enough room for a required dataset or for the output data. • The necessary transfer rates may not be achievable.

7. Introduction - In this paper • The authors show how a Global File System, in which direct file I/O operations can be performed across a WAN, can obviate these concerns. • They demonstrate this through a series of large-scale demonstrations.

8. II GFS via Hardware Assist: SC’02

9. 2. GFS via Hardware Assist: SC’02 - At That Time • Global File Systems were still at the concept stage. • There were two concerns: the latencies involved in a widespread network such as the TeraGrid, and the fact that file systems could not yet be exported across a WAN.

10. 2. GFS via Hardware Assist: SC’02 - Approach • Used hardware capable of encoding Fibre Channel frames within IP packets (FCIP). • FCIP is an Internet Protocol-based storage networking technology developed by the IETF. • FCIP mechanisms enable the transmission of Fibre Channel information by tunneling data between storage area network facilities over IP networks (a simplified sketch of the idea follows this slide).
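The following is a deliberately simplified sketch of the tunneling idea only: a Fibre Channel frame is treated as an opaque payload and carried inside a TCP/IP stream between two gateways. A real FCIP gateway (RFC 3821) uses the standard FC Frame Encapsulation format on TCP port 3225; the tiny length-prefix framing here is invented purely for illustration.

```python
import socket
import struct

def send_fc_frame(sock: socket.socket, fc_frame: bytes) -> None:
    """Wrap an opaque FC frame and push it into the IP-transported stream."""
    header = struct.pack("!I", len(fc_frame))      # 4-byte length prefix (illustrative)
    sock.sendall(header + fc_frame)                # frame rides inside TCP/IP

def _read_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("tunnel closed")
        buf += chunk
    return buf

def recv_fc_frame(sock: socket.socket) -> bytes:
    """Unwrap the next FC frame and hand it back to the local SAN fabric."""
    (length,) = struct.unpack("!I", _read_exact(sock, 4))
    return _read_exact(sock, length)
```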

11. 2. GFS via Hardware Assist: SC’02 - The Goal of This Demo • That year, the annual Supercomputing conference was held in Baltimore. • The distance between the show floor and San Diego is greater than any within the TeraGrid. • This was the perfect opportunity to test whether latency effects would eliminate any chance of a successful GFS at that distance.

12. 2. GFS via Hardware Assist: SC’02 - Hardware Configuration between San Diego and Baltimore. [Diagram: at each site, a Force 10 GbE switch, a Nishan 4000, and a Brocade 12000 Fibre Channel switch, linked by two 4 GbE channels at each hop; the Nishan 4000 units encoded and decoded Fibre Channel frames into IP packets for transmission and reception. The San Diego side attaches a Sun SF6800, a 17 TB FC disk cache, and 6 PB of silos and tape drives. The two sites are connected over the TeraGrid backbone and SCinet 10 Gb/s WAN.]

13. 2. GFS via Hardware Assist: SC’02 - GFS Performance between SDSC and Baltimore • 720 MB/s over the 80 ms round-trip SDSC-Baltimore path. • Demonstrated that a GFS could provide some of the most efficient data transfers possible over TCP/IP (a bandwidth-delay estimate follows this slide).
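To see why that result is notable, a quick bandwidth-delay calculation (my arithmetic, not the paper's) shows how much data must be kept in flight to sustain 720 MB/s over an 80 ms round trip:

```python
# Bandwidth-delay product: data that must be "in flight" to keep the pipe full.
throughput_mb_s = 720      # observed transfer rate, MB/s
rtt_s = 0.080              # SDSC-Baltimore round-trip time, seconds

in_flight_mb = throughput_mb_s * rtt_s
print(f"~{in_flight_mb:.0f} MB in flight")   # ~58 MB
# Roughly 58 MB of outstanding data -- far beyond default TCP window sizes,
# so sustaining this rate across a WAN is a nontrivial achievement.
```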

14. III Native WAN-GFS: SC’03

15. 3. Native WAN-GFS: SC’03 - Issue and Approach • Issue: whether Global File Systems were possible without hardware FCIP encoding. • SC’03 was the chance to use pre-release software from IBM’s General Parallel File System (GPFS), a true wide-area-enabled file system. • GPFS uses a shared-disk architecture: files are striped across all disks in the file system, with parallel access to file data and metadata (a striping sketch follows this slide).
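The following is a minimal sketch of the striping idea only, not of GPFS internals: a file is divided into fixed-size blocks distributed round-robin across the disks, so reads and writes can proceed on many disks (and the servers exporting them) in parallel. The block size and disk count are illustrative assumptions.

```python
# Round-robin striping of a file's blocks across N disks (illustrative only;
# real GPFS block allocation is more sophisticated).
BLOCK_SIZE = 1 << 20   # 1 MiB blocks, an assumed value
NUM_DISKS = 8          # assumed disk count

def block_location(offset: int) -> tuple[int, int]:
    """Map a byte offset in the file to (disk index, block index on that disk)."""
    block = offset // BLOCK_SIZE
    return block % NUM_DISKS, block // NUM_DISKS

# Consecutive blocks land on different disks, so a large sequential read
# keeps every disk busy at once.
for off in range(0, 4 * BLOCK_SIZE, BLOCK_SIZE):
    print(off, "->", block_location(off))
```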

16. 3. Native WAN-GFS: SC’03 - WAN-GPFS Demonstration. The central GFS consisted of 40 two-processor IA64 nodes, providing sufficient bandwidth to saturate the 10 GbE link to the TeraGrid; each server had a single FC HBA and a GbE connection. It served the file system across the WAN to SDSC and NCSA. The mode of operation was to copy data produced at SDSC across the WAN to the disk systems on the show floor, and then to visualize it at both SDSC and NCSA.

17. 3. Native WAN-GFS: SC’03 - Bandwidth Results at SC’03. The visualization application terminated normally when it ran out of data and was restarted.

18. 3. Native WAN-GFS: SC’03 - Bandwidth Results at SC’03 • Over a link with a maximum bandwidth of 10 Gb/s, the peak transfer rate was almost 9 Gb/s, and more than 1 GB/s (8 Gb/s) was easily sustained.

19. IV True Grid Prototype: SC’04

20. 4. True Grid Prototype: SC’04 - The Goal of This Demonstration • To implement a true grid prototype of what a GFS node on the TeraGrid would look like. • One possible dominant mode of operation for grid supercomputing: output of a very large dataset to a central GFS repository, followed by its examination and visualization at several sites, some of which may not have the resources to ingest the dataset whole. • The Enzo application writes on the order of a Terabyte per hour, enough to make good use of the 30 Gb/s TeraGrid connection. • With the post-processing visualization, they could check how quickly the GFS could provide data in such a scenario. • Enzo ran at SDSC, writing its output directly to the GPFS disks in Pittsburgh (a sketch of this write-once, read-anywhere pattern follows this slide).
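Conceptually, this is what a global file system buys: the simulation writes through an ordinary POSIX path while the blocks physically land on remote GPFS disks, and a visualization job at another site reads the same path with no explicit transfer step. The mount point and file names below are hypothetical placeholders, not paths from the demonstration.

```python
from pathlib import Path

# Hypothetical WAN-mounted GPFS path; in the SC'04 demo the disks were in
# Pittsburgh while the application ran at SDSC.
GLOBAL_FS = Path("/gpfs-wan/enzo_run_001")

def write_snapshot(step: int, data: bytes) -> None:
    """Simulation side: ordinary file I/O, no staging required."""
    (GLOBAL_FS / f"snapshot_{step:05d}.dat").write_bytes(data)

def read_snapshot(step: int) -> bytes:
    """Visualization side (possibly another site): reads the same global path."""
    return (GLOBAL_FS / f"snapshot_{step:05d}.dat").read_bytes()
```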

21. 4. True Grid Prototype: SC’04 - Prototype Grid Supercomputing at SC’04. [Diagram of the prototype configuration, with link capacities of 40 Gb/s, 40 Gb/s, and 30 Gb/s.]

22. 4. True Grid Prototype: SC’04 - Transfer Rates • The aggregate performance: 24 Gb/s. • The momentary peak: over 27 Gb/s. • The rates were remarkably constant across the three 10 Gb/s connections between the show floor and the TeraGrid backbone.

23. V Production Facility: 2005

24. 5. Production Facility: 2005 - The Need for Large Disk • By this time, the size of datasets had become very large. • The NVO dataset was 50 Terabytes per location, a noticeable strain on storage resources. • If a single, central site could maintain the dataset, this would be extremely helpful to all the sites that could then access it efficiently. • Therefore, a very large amount of spinning disk would be required. • SDSC acquired approximately 0.5 Petabytes of Serial ATA disk drives.

25. 5. Production Facility: 2005 - Network Organization • The Network Shared Disk (NSD) servers: 64 two-way IBM IA64 systems, each with a single GbE interface and a 2 Gb/s Fibre Channel Host Bus Adapter, serving the file system to remote sites such as NCSA and ANL. • The disks: 32 IBM FAStT100 DS4100 RAID systems with 67 250 GB drives in each. • The total raw storage is 32 x 67 x 250 GB = 536 TB, roughly 0.5 Petabytes of FAStT100 disk.

26. 5. Production Facility: 2005 - Serial ATA Disk Arrangement. [Diagram: each RAID system attaches through two 2 Gb/s FC connections; the drives are organized into 8+P RAID arrays.] A capacity sanity check follows this slide.
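A small sanity check on the capacity numbers, under the assumption that "8+P" means eight data drives plus one parity drive per array (so roughly 8/9 of raw capacity is usable, before any hot spares or file-system overhead):

```python
# Raw and approximate usable capacity of the SATA installation.
systems = 32            # FAStT100 / DS4100 RAID systems
drives_per_system = 67
drive_tb = 0.25         # 250 GB drives, decimal units

raw_tb = systems * drives_per_system * drive_tb
usable_tb = raw_tb * 8 / 9            # 8 data + 1 parity (8+P), an assumption
print(f"raw    ~= {raw_tb:.0f} TB")    # ~= 536 TB, matching the slide
print(f"usable ~= {usable_tb:.0f} TB") # ~= 476 TB under the 8+P assumption
```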

27. 5. Production Facility: 2005 - Performance Scaling. [Chart: aggregate bandwidth versus the number of remote nodes; maximum of almost 6 GB/s out of a theoretical maximum of 8 GB/s.]

28. 5. Production Facility: 2005 - Performance Scaling • The observed discrepancy between read and write rates is not yet understood. • However, the dominant usage of the GFS is expected to be remote reads (a note on the theoretical maximum follows this slide).
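The 8 GB/s theoretical maximum is consistent with the server configuration on the Network Organization slide: 64 NSD servers, each limited by a single GbE interface. This is my reading of the numbers, not a statement from the paper:

```python
# Aggregate server-side network ceiling (rough estimate).
nsd_servers = 64
gbe_per_server_gbps = 1            # one GbE interface per NSD server

aggregate_gbps = nsd_servers * gbe_per_server_gbps   # 64 Gb/s
aggregate_gbytes_s = aggregate_gbps / 8               # = 8 GB/s theoretical peak
print(aggregate_gbytes_s, "GB/s")
# The measured ~6 GB/s read rate is about 75% of this network ceiling.
```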

29. VI Future Work

30. 6. Future Work • Next year (2006), the authors hope to connect to the DEISA computational Grid in Europe, which is planning a similar approach to Grid computing, allowing them to unite the TeraGrid and DEISA Global File Systems in a multi-continent system. • The key contribution of this approach is a paradigm: at least in the supercomputing regime, data movement and access mechanisms will be the most important delivered capability of Grid computing, outweighing even the sharing or combination of compute resources.

31. Thank you! Concurrent Systems Architecture Group
