STORK is a pivotal tool for managing Data Placement (DaP) activities in heterogeneous grid environments. It transforms data management from a secondary role into a prioritized task, akin to computational job scheduling. By queuing, scheduling, monitoring, and managing DaP jobs, STORK ensures efficient and reliable data transfers. Our case study on transferring 3TB of DPOSS data demonstrates STORK's effective integration with systems like SRB and UniTree, showcasing automatic failure recovery and modular capabilities. Future improvements include intelligent scheduling and enhanced data management methodologies.
STORK: A Scheduler for Data Placement Activities in Grid. Tevfik Kosar, University of Wisconsin-Madison, kosart@cs.wisc.edu
Some Remarkable Numbers: Characteristics of four physics experiments targeted by GriPhyN (table not reproduced here). Source: GriPhyN Proposal, 2000
Even More Remarkable… “…the data volume of CMS is expected to subsequently increase rapidly, so that the accumulated data volume will reach 1 Exabyte (1 million Terabytes) by around 2015.” Source: PPDG Deliverables to CMS
Other Data Intensive Applications • Genomic information processing applications • Biomedical Informatics Research Network (BIRN) applications • Cosmology applications (MADCAP) • Methods for modeling large molecular systems • Coupled climate modeling applications • Real-time observatories, applications, and data-management (ROADNet)
Need to Deal with Data Placement • Data need to be moved, staged, replicated, cached, and removed; storage space for data needs to be allocated and de-allocated. • We refer to all of these data-related activities in the Grid as Data Placement (DaP) activities.
State of the Art • Data placement activities in the Grid are performed either manually or by simple scripts. • Data placement activities are simply regarded as “second class citizens” of the computation-dominated Grid world.
Our Goal • Our goal is to make data placement activities “first class citizens” in the Grid just like the computational jobs! • They need to be queued, scheduled, monitored and managed, and even checkpointed.
Outline • Introduction • Grid Challenges • Stork Solutions • Case Study: SRB-UniTree Data Pipeline • Conclusions & Future Work
Grid Challenges • Heterogeneous Resources • Limited Resources • Network/Server/Software Failures • Different Job Requirements • Scheduling of Data & CPU together
Stork • Intelligently & reliably schedules, runs, monitors, and manages Data Placement (DaP) jobs in a heterogeneous Grid environment, and ensures that they complete. • What Condor is for computational jobs, Stork is for DaP jobs. • Just submit a bunch of DaP jobs and then relax…
Stork Solutions to Grid Challenges • Specialized in Data Management • Modularity & Extendibility • Failure Recovery • Global & Job Level Policies • Interaction with Higher Level Planners/Schedulers
Already Supported URLs • file:/ -> Local File • ftp:// -> FTP • gsiftp:// -> GridFTP • nest:// -> NeST (chirp) protocol • srb:// -> SRB (Storage Resource Broker) • srm:// -> SRM (Storage Resource Manager) • unitree:// -> UniTree server • diskrouter:// -> UW DiskRouter
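All of these protocols are driven through the same declarative submit-file interface: the URL schemes alone tell Stork which transfer modules to use, so a cross-protocol transfer needs no extra glue code. A minimal sketch follows, with hypothetical host names and paths, mirroring the attribute set of the sample submit file shown later in this deck:
  [
    Type     = "Transfer";
    Src_Url  = "gsiftp://gridftp.example.edu/data/y.dat";
    Dest_Url = "srm://srm.example.edu/cache/y.dat";
    Max_Retry = 5;
  ]
Here Stork would pick its GridFTP module for the source URL and its SRM module for the destination URL; supporting a new protocol amounts to plugging in a module for its URL scheme, which is what the modularity and extendibility claims on the next slide refer to.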
[Architecture diagram: Higher Level Planners sit above DAGMan, which drives Condor-G for compute jobs and Stork for DaP jobs; the underlying services shown include the Gate Keeper, StartD, RFT, GridFTP, SRM, SRB, and NeST]
Interaction with DAGMan
[Diagram: DAGMan reads the DAG and routes computational jobs (e.g. A, B, C, D) to the Condor job queue and DaP jobs (e.g. X, Y) to the Stork job queue]
DAG file excerpt:
  Job A A.submit
  DaP X X.submit
  Job C C.submit
  Parent A child C, X
  Parent X child B
  …..
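For illustration only, a complete DAG for the nodes named above might look like the sketch below; the .submit file names and the dependencies beyond the excerpt are hypothetical, and the keyword spelling follows the excerpt on this slide rather than any particular DAGMan release:
  # Computational nodes are routed to the Condor job queue
  Job A A.submit
  Job B B.submit
  Job C C.submit
  Job D D.submit
  # Data placement nodes are routed to the Stork job queue
  DaP X X.submit
  DaP Y Y.submit
  # Dependencies (those beyond the excerpt are invented for illustration)
  Parent A child C, X
  Parent X child B
  Parent C child Y
  Parent Y child D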
Sample Stork submit file:
  [
    Type = "Transfer";
    Src_Url = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
    Dest_Url = "nest://turkey.cs.wisc.edu/kosart/x.dat";
    ……
    ……
    Max_Retry = 10;
    Restart_in = "2 hours";
  ]
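In this sample, Max_Retry = 10 presumably caps how many times a failed transfer is retried, and Restart_in = "2 hours" presumably tells Stork to restart a transfer that has not completed within two hours; the exact semantics depend on the Stork version, but these per-job attributes are what let DaP jobs ride out transient failures without operator intervention.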
Case Study: SRB-UniTree Data Pipeline • We have transferred ~3 TB of DPOSS data (2611 files of 1.1 GB each, i.e. roughly 2.9 TB in total) from SRB to UniTree using 3 different pipeline configurations. • The pipelines were built using Condor and Stork scheduling technologies, and the whole process was managed by DAGMan.
[Pipeline configuration 1: Submit Site orchestrates SRB Server -> (SRB get) -> NCSA Cache -> (UniTree put) -> UniTree Server]
[Pipeline configuration 2: Submit Site orchestrates SRB Server -> (SRB get) -> SDSC Cache -> (GridFTP) -> NCSA Cache -> (UniTree put) -> UniTree Server]
[Pipeline configuration 3: Submit Site orchestrates SRB Server -> (SRB get) -> SDSC Cache -> (DiskRouter) -> NCSA Cache -> (UniTree put) -> UniTree Server]
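As a sketch of how one file's trip through the third pipeline could be expressed as chained DaP jobs in a DAG (node names and submit-file names are hypothetical illustrations, not taken from the actual study):
  # Hop 1: SRB get, SRB Server -> SDSC Cache
  DaP get_x get_x.submit
  # Hop 2: DiskRouter transfer, SDSC Cache -> NCSA Cache
  DaP move_x move_x.submit
  # Hop 3: UniTree put, NCSA Cache -> UniTree Server
  DaP put_x put_x.submit
  # Each hop may start only after the previous one completes
  Parent get_x child move_x
  Parent move_x child put_x
Repeating such a chain per file and handing the whole DAG to DAGMan is, at this level of abstraction, how a multi-terabyte transfer can be kept hands-off.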
Outcomes of the Study 1. Stork interacted easily and successfully with different underlying systems: SRB, UniTree, GridFTP, and DiskRouter.
Outcomes of the Study (2) 2. We had the chance to compare different pipeline topologies and configurations.
Outcomes of the Study (3) 3. Almost all possible network, server, and software failures were recovered automatically.
Failure Recovery: failures encountered and recovered automatically during the transfers included a DiskRouter reconfiguration and restart, UniTree not responding, an SDSC cache reboot & UW CS network outage, and SRB server maintenance.
For more information on the results of this study, please check: http://www.cs.wisc.edu/condor/stork/
Conclusions • Stork makes data placement a “first class citizen” in the Grid. • Stork is the Condor of the data placement world. • Stork is fault tolerant, easy to use, modular, extendible, and very flexible.
Future Work • More intelligent scheduling • Data level management instead of file level management • Checkpointing for transfers • Security
You don’t have to FedEx your data anymore… Stork delivers it for you! • For more information: • Drop by my office anytime: Room 3361, Computer Science & Stats. Bldg. • Email: kosart@cs.wisc.edu