The STAR project currently operates across three grid sites: BNL, LBNL, and São Paulo. At present it runs Monte Carlo simulation grid jobs, but the file transfer process using globus-url-copy is not scalable and couples worker nodes to file transfers. To address this, we propose a scalable method using a Disk Resource Manager (DRM) to separate file transfers from job execution. The two-step transfer process caches outputs locally, then transfers them to their final destinations via srm-copy, keeping compute resources available while transfers complete.
STAR SRM use-case
Eric Hjort, LBNL
TG-storage/SRM meeting, September 13-15, 2005
STAR SRM use-case
• At present STAR has 3 grid sites: BNL, LBNL, and São Paulo
• Grid jobs are simulation (Monte Carlo) for now
• Output files are transferred to their final destination with globus-url-copy:
  • Not scalable
  • Couples the worker node to the file transfer
• Use a DRM to manage output files from grid jobs:
  • Scalable
  • Decouples the worker node from the file transfer
• Method: two-step transfer (see the sketch after this list)
  • Put output into a local DRM cache (same site as job execution)
  • Transfer output to the final destination
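For concreteness, a minimal sketch of the two transfer styles described above; the host name, paths, and exact command arguments are illustrative assumptions, not STAR's actual configuration:

    # One-step transfer (current): the worker node is tied up until the
    # WAN copy to the final destination completes.
    globus-url-copy file://$TMP_WN/output.root \
        gsiftp://destination.example.org/star/output.root

    # Two-step transfer (proposed): hand the file to the local DRM cache,
    # then let srm-copy drive the WAN transfer out of the cache.
    srm-put  <local output file>  <Site B DRM cache>       # DRM assigns an LFN
    srm-copy <LFN in Site B cache>  <final destination at Site A>

The worker node only pays for the fast local put; the WAN transfer out of the cache no longer occupies a batch slot.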
Use-case steps
• Grid submission at Site A (SUMS/Condor-G)
• Local batch submission at Site B (LSF/SGE)
• Job execution: output files go to local scratch disk ($TMP_WN) on the Site B worker node
• Call srm-put from the worker node (see the worker-node sketch after this list):
  • Moves output files to the DRM cache on the Site B gatekeeper
  • Assigns LFNs
• Call srm-copy from the worker node:
  • Moves output from the Site B DRM cache back to Site A
  • Files are referenced by LFN
  • The srm-copy client contacts the Site A gatekeeper from the Site B worker node
  • Requires that outgoing connections be allowed on Site B worker nodes
• Job completes without waiting for the srm-copy callback:
  • Compute resources are released
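A sketch of how these steps might look in a Site B worker-node wrapper; srm-put and srm-copy are named in the slides, but the argument forms, file names, and angle-bracket placeholders here are assumptions rather than the actual client syntax:

    #!/bin/sh
    # Illustrative worker-node wrapper (not STAR's production script).

    # Job execution: the simulation writes its output to local scratch.
    OUTPUT=$TMP_WN/mcrun.root            # hypothetical output file name
    run-star-simulation -o $OUTPUT       # placeholder for the real MC executable

    # srm-put: move the output into the DRM cache on the Site B gatekeeper;
    # the DRM assigns an LFN to the cached file.
    srm-put file://$OUTPUT <Site B DRM cache SURL>

    # srm-copy: from this worker node, contact the Site A gatekeeper and
    # request the cache-to-Site-A transfer, referencing the file by LFN.
    # Outgoing connections must be allowed from the worker node.
    srm-copy <LFN> <Site A destination SURL>

    # With the no-callback option the job exits without waiting for the
    # transfer to finish, so the LSF/SGE batch slot is released immediately.
    exit 0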
Present Status
• New SRM version (1.2.8) released July 2005:
  • LFN included
  • No-callback option
• Initial testing/debugging completed
• Pre-production testing
• Put into production:
  • Incorporate into SUMS
  • Call RRS to catalog files