Explore how the Gordon supercomputer was integrated into the CMS workflow to accelerate LHC data analysis, addressing the delay in scientific output caused by data that was "parked" during collection for later processing. The slides describe shipping data to SDSC, running CMS jobs on Gordon, and transferring results back to FNAL, along with the system specifications and software components used in the collaboration. Key insights come from the project's successful completion in early 2013: OSG and XSEDE technologies proved compatible, and several areas for closer integration were identified.
Using Gordon to Accelerate LHC Science Rick Wagner San Diego Supercomputer Center Brian Bockelman University of Nebraska-Lincoln XSEDE 13 July 22-25, 2013 San Diego, CA
Coauthors Mahidhar Tatineni Eva Hocks Kenneth Yoshimoto Scott Sakai Michael L. Norman Igor Sfiligoi (UCSD) Matevz Tadel (UCSD) James Letts (UCSD) Frank Würthwein (UCSD) Lothar A. Bauerdick (FNAL)
Overview • 2012 LHC data collection rates higher than first planned (1000 Hz vs. 150 Hz) • Additional data was “parked” to be reduced during the 2-year shutdown • Delays the science from data at the end
Overview • Frank Würthwein (UCSD, CMS Tier II lead) approaches Mike Norman (Director of SDSC) regarding analysis delay • A rough plan emerges: • Ship data at the tail of the analysis chain to SDSC • Attach Gordon to CMS workflow • Ship results back to FNAL • From CMS perspective, Gordon becomes a compute resource • From SDSC perspective, CMS jobs run like a gateway
Gordon Overview • 1,024 2S Xeon E5 (Sandy Bridge) nodes • 16 cores, 64 GB/node • Intel Jefferson Pass mobo • PCI Gen3 • 300 GB Intel 710 eMLC SSDs • 300 TB aggregate • 64, 2S Westmere I/O nodes • 12 core, 48 GB/node • 4 LSI controllers • 16 SSDs • Dual 10GbE • SuperMicro mobo • PCI Gen2 • 3D Torus • Dual rail QDR • Large Memory vSMP Supernodes • 2TB DRAM • 10 TB Flash • “Data Oasis” Lustre PFS 100 GB/sec, 4 PB
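As a rough cross-check of the figures on this slide, the sketch below multiplies out the per-node numbers. It assumes (the slide does not say so explicitly) that the 300 GB Intel 710 drives are the 16 SSDs housed in each of the 64 I/O nodes, and it uses decimal units (1 TB = 1000 GB). The function name `gordon_totals` is only illustrative.

```python
# Per-node figures quoted on the slide.
COMPUTE_NODES = 1024
CORES_PER_COMPUTE_NODE = 16
DRAM_GB_PER_COMPUTE_NODE = 64
IO_NODES = 64
SSDS_PER_IO_NODE = 16
# Assumption: each of the 16 SSDs in an I/O node is a 300 GB Intel 710 drive.
SSD_CAPACITY_GB = 300


def gordon_totals():
    """Return aggregate totals derived from the per-node figures (decimal TB)."""
    return {
        "compute_cores": COMPUTE_NODES * CORES_PER_COMPUTE_NODE,
        "compute_dram_tb": COMPUTE_NODES * DRAM_GB_PER_COMPUTE_NODE / 1000,
        "flash_tb": IO_NODES * SSDS_PER_IO_NODE * SSD_CAPACITY_GB / 1000,
    }


if __name__ == "__main__":
    for name, value in gordon_totals().items():
        print(f"{name}: {value}")
```

Under that assumption the flash total comes to about 307 TB, which is consistent with the "300 TB aggregate" quoted on the slide.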
CMS Components • CMSSW: Base software components, NFS exported from IO node • OSG worker node client: CA certs, CRLs • Squid proxy: cache calibration data needed for each job, running on IO node • glideinWMS: worker node manager pulls down CMS jobs • BOSCO: GSI-SSH capable batch job submission tool • PhEDEx: data transfer management
Results • Work completed in February to March 2013 • 400 million collision events • 125 TB in, ~150 TB out • ~2 million SUs • Good experience regarding OSG-XSEDE compatibility
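To put these totals in per-event terms, the following back-of-the-envelope sketch divides the slide's figures by the event count. Two things here are assumptions rather than statements from the slides: that one SU corresponds to one core-hour, and that TB means 10^12 bytes. The "~" figures are approximate, so the results are order-of-magnitude estimates only, and `per_event_costs` is a hypothetical helper name.

```python
# Totals quoted on the Results slide.
EVENTS = 400e6          # collision events processed
DATA_IN_TB = 125        # data shipped to SDSC
DATA_OUT_TB = 150       # approximate data shipped back
SERVICE_UNITS = 2e6     # approximate SUs consumed
# Assumption: 1 SU = 1 core-hour (not stated on the slides).
CORE_HOURS_PER_SU = 1.0

BYTES_PER_TB = 1e12     # decimal terabytes (assumption)
BYTES_PER_KB = 1e3


def per_event_costs():
    """Return approximate per-event input size, output size, and CPU time."""
    return {
        "input_kb_per_event": DATA_IN_TB * BYTES_PER_TB / EVENTS / BYTES_PER_KB,
        "output_kb_per_event": DATA_OUT_TB * BYTES_PER_TB / EVENTS / BYTES_PER_KB,
        "core_seconds_per_event": SERVICE_UNITS * CORE_HOURS_PER_SU * 3600 / EVENTS,
    }


if __name__ == "__main__":
    for name, value in per_event_costs().items():
        print(f"{name}: {value:.1f}")
```

With these inputs the estimate is roughly 310 kB read and 375 kB written per event, at about 18 core-seconds per event.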
Thoughts & Conclusions • OSG & XSEDE technologies very similar • GridFTP • GSI authentication • Batch systems, etc. • Staff at both ends speak the same language • Some things would make a repeat easier: • CVMFS (FUSE-based file system for CMS tools) • Common runtime profile for OSG & XSEDE • Common SU and data accounting