Explore MPI, MPI-IO, NetCDF, and HDF5 for high-performance I/O. Discuss testbeds, PVFS opportunities, scalability, and visualization with Jumpshot. Learn to handle TB datasets efficiently with Chiba City Scalability Testbed.
The Parallel and Grid I/O Perspective • MPI, MPI-IO, NetCDF, and HDF5 are in common use • Multi-TB datasets are also common • Testbeds are needed for software at scale
Topics for Discussion • NetCDF • Other applications (leverage) • Read (parallel analysis tools) • PVFS opportunities • TB in your office • Scalability Testbed • Where do you test at scale? • Application log files • Real application log files at >1 GB • Other • Can we quantify applications' needs?
PVFS Peak Write Performance • Using compute nodes for storage in these tests • Peak at around 25-30 Mbytes/sec per I/O server • Clients cannot maintain this to disk
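As a back-of-the-envelope reading of the per-server figure above, the sketch below estimates aggregate write bandwidth and the time to write a dataset. It assumes, optimistically, that throughput scales linearly with the number of I/O servers; the function names and the 16-server, 1 TB example are hypothetical and are not taken from these tests.

```python
def aggregate_mb_per_s(n_servers, per_server_mb_per_s):
    """Ideal aggregate bandwidth, assuming perfect linear scaling across servers."""
    return n_servers * per_server_mb_per_s


def hours_to_write(dataset_mb, n_servers, per_server_mb_per_s):
    """Hours needed to write dataset_mb megabytes at the ideal aggregate rate."""
    seconds = dataset_mb / aggregate_mb_per_s(n_servers, per_server_mb_per_s)
    return seconds / 3600.0


if __name__ == "__main__":
    one_tb_in_mb = 1_000_000  # 1 TB expressed in decimal megabytes
    for rate in (25, 30):     # the per-server range quoted on the slide
        print(rate, aggregate_mb_per_s(16, rate),
              round(hours_to_write(one_tb_in_mb, 16, rate), 2))
```

Real systems fall short of this ideal, so the numbers are best read as an upper bound on bandwidth and a lower bound on time.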
Performance Visualization with Jumpshot (figure labels: Processes, Logfile, Jumpshot, Display) • For detailed analysis of parallel program behavior, timestamped events are collected into a log file during the run. • A separate display program (Jumpshot) aids the user in conducting a post-mortem analysis of program behavior. • Log files can become large (>1 GB), making it impossible to inspect the entire program at once. • The FLASH Project motivated an indexed file format (SLOG) that uses a preview to select a time of interest and quickly display an interval. • We collaborated with IBM and LLNL to collect SLOG files directly from AIX trace records and display traces from multithreaded programs.
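The idea of an indexed log with a preview can be illustrated with a toy in-memory structure. This is not the SLOG format itself: the class `IndexedLog`, its fixed-size frames, and its method names are assumptions made only for illustration. Events are grouped into frames, a small index records each frame's start time, and an interval query touches only the frames that can overlap the requested time range.

```python
import bisect


class IndexedLog:
    """Toy time-indexed event log (hypothetical; not the real SLOG layout)."""

    def __init__(self, events, frame_size=4):
        # events: iterable of (timestamp, name) pairs
        self.events = sorted(events)
        self.frame_size = frame_size
        # Index: timestamp of the first event in each frame.
        self.frame_starts = [self.events[i][0]
                             for i in range(0, len(self.events), frame_size)]

    def preview(self):
        """Coarse summary: (frame start time, number of events) per frame."""
        n = len(self.events)
        return [(t, min(self.frame_size, n - i * self.frame_size))
                for i, t in enumerate(self.frame_starts)]

    def interval(self, t0, t1):
        """Events with t0 <= timestamp <= t1, scanning only overlapping frames."""
        if not self.events or t1 < t0:
            return []
        # Last frame that starts strictly before t0 (it may still contain t0).
        first = max(bisect.bisect_left(self.frame_starts, t0) - 1, 0)
        # Frames starting after t1 cannot contain matching events.
        last = bisect.bisect_right(self.frame_starts, t1)
        lo = first * self.frame_size
        hi = min(last * self.frame_size, len(self.events))
        return [e for e in self.events[lo:hi] if t0 <= e[0] <= t1]


if __name__ == "__main__":
    log = IndexedLog([(float(t), "event%d" % t) for t in range(10)])
    print(log.preview())
    print(log.interval(3, 5))
```

In a file-based version the index would hold byte offsets instead of list positions, so that a viewer could seek directly to the frames of interest rather than reading the whole log.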
Chiba City Scalability Testbed http://www.mcs.anl.gov/chiba/
Notes • FAQ on Parallel I/O • Include performance graphs, tutorial links • Interaction with P2 (Data Mining and Access Pattern Discovery) • Parallel NetCDF (P2 as an application group) • Managing datasets of NetCDF files • Collect log files of application I/O • Explore use of WAN FTP for Grid I/O • Remote I/O through MPI-IO interface • PVFS Clusters for TB dataset experimentation • Close with John Drake on parallel NetCDF for Climate
SC02 Demo • Use Parallel NetCDF over MPI-IO over PVFS to access dataset • Extract time series from collection of files • Parallel reads as well as writes • New feature: handle dynamically changing datasets • Observe progress of running application • Perform data analysis and visualization • Contrast with nonparallel approach • Prototype on Chiba scalability testbed at ANL • Bonus: collect log files of I/O behavior and show analysis and visualizations of log files
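The time-series extraction step can be sketched without any parallel library, purely to show how the work might be divided. In this sketch plain dictionaries stand in for NetCDF files, a serial loop stands in for the parallel processes, and list concatenation stands in for a gather; the function names and the variable name `temperature` are hypothetical. The actual demo uses Parallel NetCDF over MPI-IO over PVFS, which this sketch does not call.

```python
def block_range(n_items, n_ranks, rank):
    """Contiguous [start, stop) block of items owned by one rank."""
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def extract_time_series(files, variable, n_ranks):
    """Pull one variable from each per-time-step 'file', split across ranks."""
    pieces = []
    for rank in range(n_ranks):  # serial stand-in for parallel ranks
        start, stop = block_range(len(files), n_ranks, rank)
        pieces.append([files[i][variable] for i in range(start, stop)])
    # Concatenate in rank order (stand-in for a gather), preserving time order.
    return [value for piece in pieces for value in piece]


if __name__ == "__main__":
    # Ten hypothetical time steps, one dictionary per file.
    files = [{"temperature": 280.0 + t, "pressure": 1000.0 - t} for t in range(10)]
    print(extract_time_series(files, "temperature", n_ranks=4))
```

A block decomposition keeps each rank's files contiguous in time, so the gathered result is already in time order.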
Demo Steps • Select variable from collection of files, write a new NetCDF file • Illustrates fast I/O • (address open performance for collections of files) • Perform PCA • Illustrates algorithmically efficient methods • Visualize at each time step
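For the PCA step, a minimal sketch of finding the first principal component is given below. It uses power iteration on the sample covariance matrix, which is a technique chosen here for brevity and is not necessarily the method used in the demo; the function name and the tiny data set are hypothetical. It assumes at least two samples and that the starting vector is not orthogonal to the leading component.

```python
import math


def first_principal_component(rows, iterations=200):
    """Leading principal direction and its variance, via power iteration."""
    n, d = len(rows), len(rows[0])
    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Unit-length starting vector; assumed not orthogonal to the answer.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all-zero covariance: nothing to find
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance along v.
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                   for i in range(d))
    return v, variance


if __name__ == "__main__":
    # Hypothetical data: four samples of two variables lying on a line.
    direction, variance = first_principal_component(
        [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
    print(direction, variance)
```

For large datasets the covariance matrix would not be formed explicitly like this; the sketch only shows what the step computes.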
Vision for the Future • Databases and parallel I/O integration • Data representations for standard file formats that provide better performance for typical access patterns (post NetCDF/HDF) • Transparent parallel I/O to/from everywhere (grid transparent, file system hierarchy transparent)