The PHENIX collaboration comprises more than 400 collaborators on three continents plus Brazil and handles hundreds of terabytes of complex data annually. To optimize data flow and management, the team focuses on automating data exports, managing replicas, and analyzing simulated data across multiple sites. Current efforts involve integrating diverse job management systems, enhancing remote data transfer to international sites, and creating a centralized analysis framework that allows for efficient processing and retrieval of data. The goal is to streamline operations and facilitate effective data science in particle physics.
PHENIX and the data grid
• >400 collaborators
• Active on 3 continents + Brazil
• 100's of TB of data per year
• Complex data with multiple disparate physics goals
Grid use that would help PHENIX
• Data management
  • Replica management to/from remote sites
  • Management of simulated data
  • Replica management within RCF
• Job management
  • Simulated event generation and analysis
  • Centralized analysis of summary data at remote sites
Replica management: export to remote sites
• Export of PHENIX data
  • Send data by network or "FedEx net" to Japan, France (IN2P3), and US collaborator sites
  • Network to Japan via APAN using bbftp (right?)
  • Network to France using bbftp (right?)
  • Network within the US using bbftp and globus-url-copy
  • Currently transfers are initiated and logged by hand
  • Most transfers use disk as a buffer
• Goals
  • Automate data export and its logging into the replica catalog (sketched below)
  • Allow transfer of data from the most convenient site, rather than only from the central repository at RCF
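A minimal sketch of what that automation could look like, assuming a simple SQL-backed replica catalog; the table layout, file names, and remote URL below are invented for illustration, not PHENIX's actual catalog or transfer bookkeeping:

    #!/usr/bin/env python
    """Sketch: export one file with globus-url-copy and record the new
    replica in a catalog, instead of initiating and logging by hand.
    The catalog schema and all paths/URLs are hypothetical."""
    import sqlite3      # stand-in for the real catalog backend
    import subprocess
    import time

    def export_file(lfn, local_pfn, remote_pfn, catalog="replicas.db"):
        # Drive the wide-area transfer (bbftp could be wrapped the same way).
        subprocess.check_call(["globus-url-copy",
                               "file://" + local_pfn, remote_pfn])
        # On success, log the replica so the catalog, not a person,
        # knows where copies live.
        db = sqlite3.connect(catalog)
        db.execute("CREATE TABLE IF NOT EXISTS replicas "
                   "(lfn TEXT, pfn TEXT, created REAL)")
        db.execute("INSERT INTO replicas VALUES (?, ?, ?)",
                   (lfn, remote_pfn, time.time()))
        db.commit()
        db.close()

    export_file("run2/dst/DST_evt_0001.root",
                "/phenix/data/DST_evt_0001.root",
                "gsiftp://remote.site.example/phenix/import/DST_evt_0001.root")

Once every transfer is registered this way, "transfer from the most convenient site" reduces to a catalog lookup rather than a trip to the central repository at RCF.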
Simulated data management
• Simulations are performed at
  • CC-J (RIKEN/Wako), Vanderbilt, UNM, LLNL, USB
  • Will add other sites, including IN2P3 for Run 3
• Simulated hits data were imported to RCF
  • For detector response, reconstruction, analysis
• Simulation projects managed by C. Maguire
  • Actual simulation jobs run by an expert at each site
  • Data transfers initiated by scripts or by hand
• Goals
  • Automate importation and archiving of simulated data (sketched below)
  • Ideally by merging with a centralized job submission utility
  • Export PHENIX software effectively to allow remote-site detector response and reconstruction
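As an illustration of the importation goal, a site-side script could ship finished hits files to a central import area and register them under the owning project; the host, paths, and manifest format below are assumptions, not an existing PHENIX tool:

    #!/usr/bin/env python
    """Sketch: a simulation site pushes hits files back to RCF and tags
    them with project/site metadata so import and archiving can be
    driven automatically. Hostnames and paths are invented."""
    import subprocess

    RCF_IMPORT = "gsiftp://rcf.example.org/phenix/sim/import/"

    def register_import(project, site, pfn):
        # Minimal stand-in for a catalog call: append to a manifest
        # that the RCF side can poll to drive archiving.
        with open("import_manifest.txt", "a") as m:
            m.write("%s %s %s\n" % (project, site, pfn))

    def ship_hits(project, site, hits_files):
        for pfn in hits_files:
            subprocess.check_call(["globus-url-copy",
                                   "file://" + pfn,
                                   RCF_IMPORT + project + "/"])
            register_import(project, site, pfn)

    ship_hits("run3-hijing", "CC-J",
              ["/scratch/sim/hits_0001.root", "/scratch/sim/hits_0002.root"])

The same registration hook is a natural place to later attach the centralized job submission utility, so that shipping results becomes the final step of the job itself.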
Replica management within RCF
• VERY important short-term goal!
• PHENIX tools have been developed
  • Replica catalog, including DAQ/production/QA info
  • Lightweight PostgreSQL version as well as Objectivity
  • Logical/physical filename translator (sketched below)
• Goals
  • Use and optimize existing tools at RCF
  • Investigate merging with Globus middleware
    • Relation to GDMP?
    • Different from Magda – carries more file info (?)
  • Integrate into job management/submission
  • Can we collect statistics for optimization and scheduling?
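The core of the logical/physical translator is a one-to-many lookup. A hedged sketch against a lightweight PostgreSQL catalog might look like the following; the table, columns, and host name are guesses, not the actual PHENIX schema:

    #!/usr/bin/env python
    """Sketch: translate a logical filename into its physical replicas
    using a lightweight PostgreSQL catalog. Schema and host are
    hypothetical."""
    import psycopg2

    def physical_names(lfn):
        db = psycopg2.connect(dbname="filecatalog", host="db.example.org")
        cur = db.cursor()
        # One logical file can have several physical copies (HPSS,
        # local disk, remote site); return all of them so a job can
        # choose the most convenient one.
        cur.execute("SELECT pfn, site FROM replicas WHERE lfn = %s", (lfn,))
        rows = cur.fetchall()
        db.close()
        return rows

    for pfn, site in physical_names("run2/dst/DST_evt_0001.root"):
        print(site, pfn)

Logging each lookup in the same database would also yield the access statistics asked for above, as raw material for optimization and scheduling.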
Job management
• Currently use scripts and batch queues at each site
• Have two kinds of jobs we should manage better
  • Simulations
  • User analysis jobs
Requirements for simulation jobs
• Job specifications (a hypothetical example follows this list)
  • Conditions & particle types to simulate
  • Number of events
  • May need embedding into real events (multiplicity effects)
• I/O requirements
  • Input: database access for run-number ranges, detector geometry
  • Output: the big requirement
    • Send files to RCF for further processing
    • Eventually can reduce to DST volume for RCF import
• Job sequence requirements
  • Initially rather small; the only inter-job interaction is the random-number seed
  • Eventually: hits generation -> response -> reconstruction
• Site selection criteria
  • CPU cycles! Also buffer disk space & access for an expert
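These requirements could be collected into a declarative job description for a centralized submission utility to consume. The record below is a hypothetical illustration; every field name is invented, not an existing PHENIX format:

    # Hypothetical simulation-job description; all field names are
    # invented to illustrate the requirements listed above.
    job = {
        "project":        "run3-single-pions",
        "particle":       "pi-",           # conditions & particle type
        "n_events":       50000,
        "embed":          False,           # True for multiplicity studies
        "run_range":      (92000, 92500),  # drives geometry/DB lookups
        "random_seed":    1234567,         # initially the only inter-job coupling
        "chain":          ["hits"],        # later: ["hits", "response", "reco"]
        "output_dest":    "gsiftp://rcf.example.org/phenix/sim/import/",
        "min_scratch_gb": 50,              # buffer disk needed at the site
    }

A scheduler could then match fields like "chain", "n_events", and "min_scratch_gb" against each site's CPU cycles and buffer disk when deciding where to run the job.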