Bulk Data Copy Generalization
• Some DMI/JSDL overlap (this might indeed be out of scope for JSDL).
• Extensibility options, and possibly some new requirements, for recursive file/directory copying between multiple sources and sinks?
In-Scope
• Job Submission Description Language (JSDL)
  • An activity description language for generic compute applications.
• OGSA Data Movement Interface (DMI)
  • Low-level schema for defining the transfer of bytes between a single source and sink.
• JSDL HPC File Staging Profile (HPCFS)
  • Designed to address file staging, not bulk copying.
• OGSA Basic Execution Service (BES)
  • Defines a basic framework for defining and interacting with generic compute activities: JSDL + extensible state and information models.
• Others that I am sure I have missed! (… ByteIO)
• None of these fully captures our requirements (not a criticism; each is designed to address its own use cases, which only partially overlap with the requirements for our bulk data copy activity).
Other
• Condor Stork: based on Condor ClassAds.
• Not sure whether Globus has or intends a similar definition in its new developments (e.g. SaaS). Anyone? I believe Ravi was originally supportive of a DMI for data transfers between multiple sources/sinks.
Stork: Condor ClassAds
Example of a Stork job request:
  [
    dest_url = "gsiftp://eric1.loni.org/scratch/user/";
    arguments = "-p 4 dbg -vb";
    src_url = "file:///home/user/test/";
    dap_type = "transfer";
    verify_checksum = true;
    verify_filesize = true;
    set_permission = "755";
    recursive_copy = true;
    network_check = true;
    checkpoint_transfer = true;
    output = "user.out";
    err = "user.err";
    log = "userjob.log";
  ]
• Purportedly the first batch scheduler for data placement and data movement in a heterogeneous environment. Developed with respect to Condor.
• Uses Condor's ClassAd job description language and is designed to understand the semantics and characteristics of data placement tasks.
• Recent NSF funding to develop it as a production service.
JSDL Data Staging 1 and the HPC File Staging Profile
  <jsdl:DataStaging>
    <jsdl:FileName>fileA</jsdl:FileName>
    <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
    <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
    <jsdl:Source>
      <jsdl:URI>gsiftp://griddata1.dl.ac.uk:2811/myhome/fileA</jsdl:URI>
    </jsdl:Source>
    <jsdl:Target>
      <jsdl:URI>ftp://ngs.oerc.ox.ac.uk:2811/myhome/fileA</jsdl:URI>
    </jsdl:Target>
    <Credentials> … </Credentials>
  </jsdl:DataStaging>
• Defines both the source and the target within the same <DataStaging/> element, which is permitted in JSDL.
• The HPC File Staging Profile (Wasson et al. 2008) limits the use of credentials to a single credential definition within a data staging element, but different credentials will be required for the source and the target.
• Maybe profile the use of credentials within the JSDL Source and Target elements?
JSDL Data Staging 2
  <jsdl:DataStaging>
    <jsdl:FileName>fileA</jsdl:FileName>
    <jsdl:FilesystemName>MY_SCRATCH_DIR</jsdl:FilesystemName>
    <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
    <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
    <jsdl:Source>
      <jsdl:URI>gsiftp://griddata1.dl.ac.uk:2811/myhome/fileA</jsdl:URI>
    </jsdl:Source>
    <Credentials> e.g. MyProxyToken </Credentials>
  </jsdl:DataStaging>
  <jsdl:DataStaging>
    <jsdl:FileName>fileA</jsdl:FileName>
    <jsdl:FilesystemName>MY_SCRATCH_DIR</jsdl:FilesystemName>
    <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
    <jsdl:Target>
      <jsdl:URI>ftp://ngs.oerc.ox.ac.uk:2811/myhome/fileA</jsdl:URI>
    </jsdl:Target>
    <Credentials> e.g. wsa:Username/password token </Credentials>
  </jsdl:DataStaging>
• Coupled staging elements: a source data staging element for fileA and a corresponding target element for staging out the same file. By specifying that the input file is deleted after the job has executed, this example simulates the effect of a data copy from one location to another through the staging host.
• No multiple data locations (alternative sources and sinks; we think these are rather useful).
• Some more (proprietary?) elements are required (e.g. DMI transfer requirements, file selectors, URI connection properties).
OGSA DMI
• The OGSA Data Movement Interface (DMI) (Antonioletti et al. 2008) defines a number of elements for describing and interacting with a data transfer activity.
• The data source and destination are each described separately with a Data End Point Reference (DEPR), which is a specialized form of WS-Addressing element (Box et al. 2004).
• In contrast to the JSDL data staging model, a DEPR allows one or more <Data/> elements to be defined within a <DataLocations/> element. This is used to define alternative locations for the data source and/or sink.
• An implementation can select between its supported protocols and retry different source/sink combinations from the available list (this improves resilience and the likelihood of a successful copy).
• There are some limitations:
  • DMI is intended to describe only a single data copy operation between one source and one sink. To do several transfers, multiple invocations of a DMI service factory would be required to create multiple DMI service instances. We require a single (atomic) message packet that wraps multiple transfers (e.g. for routing through a message broker).
  • Some of the existing constructs require extension or slight modification.
• Therefore: DMI/JSDL discussion at OGF to canvass possible new extensions. Maybe build on DMI, and/or integrate more closely with JSDL data staging, to describe a bulk copy activity.
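The retry behaviour described above (select among supported protocols, then try source/sink combinations until one works) could be sketched roughly as follows. This is only an illustration: `Location`, `copy_with_alternatives`, the `transfer` callback, and the `max_attempts` default are hypothetical names chosen here, not part of DMI.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Set, Tuple

# Hypothetical type: a location is a (protocol URI, data URL) pair,
# loosely mirroring one <Data/> entry inside <DataLocations/>.
Location = Tuple[str, str]


def copy_with_alternatives(
    sources: Iterable[Location],
    sinks: Iterable[Location],
    supported_protocols: Set[str],
    transfer: Callable[[Location, Location], bool],
    max_attempts: int = 3,
) -> Optional[Tuple[Location, Location]]:
    """Try source/sink pairs in order until one succeeds.

    Returns the pair that worked, or None if every usable pair failed
    or the attempt budget ran out.
    """
    # Keep only the locations whose protocol this implementation supports.
    usable_sources = [s for s in sources if s[0] in supported_protocols]
    usable_sinks = [k for k in sinks if k[0] in supported_protocols]

    attempts = 0
    for source, sink in product(usable_sources, usable_sinks):
        if attempts >= max_attempts:
            break
        attempts += 1
        if transfer(source, sink):  # caller-supplied copy function
            return source, sink
    return None
```

The `transfer` callback stands in for whatever protocol client actually moves the bytes; a real service would probably also want back-off between attempts and per-pair fault reporting.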
Bulk DMI Draft
• A pseudo-example
• Some overlap with JSDL data staging
  <other:BulkDataCopy>
    <other:DataCopy id="transfer1"> +
      <dmi:SourceDataEPR>
        <wsa:Address>http://www.ogf.org/ogsa/2007/08/addressing/none</wsa:Address>
        <wsa:Metadata>
          <dmi:DataLocations>
            <dmi:Data ProtocolUri="http://www.ogf.org/ogsadmi/2006/03/im/protocol/gridftp-v20"
                      DataUrl="gsiftp://example.org/name/of/the/dir/">
              <dmi:Credentials><other:MyProxyToken/></dmi:Credentials>
              <other:stuff/>
            </dmi:Data>
            <dmi:Data ProtocolUri="urn:my-project:srm"
                      DataUrl="srm://example.org/name/of/the/dir/">
              <dmi:Credentials><wsse:UsernameToken/></dmi:Credentials>
              <other:stuff/>
            </dmi:Data>
          </dmi:DataLocations>
        </wsa:Metadata>
      </dmi:SourceDataEPR>
      <dmi:SinkDataEPR> . . . Sink Details. . . </dmi:SinkDataEPR>
    </other:DataCopy>
    <dmi:TransferRequirements>
      <dmi:StartNotBefore/> ?
      <dmi:EndNoLaterThan/> ?
      <dmi:StayAliveTime/> ?
      <dmi:MaxAttempts/> ?
    </dmi:TransferRequirements>
  </other:BulkDataCopy>
Annotations:
• <dmi:SourceDataEPR>: Source (wsa:EndpointReference type)
• <dmi:SinkDataEPR>: Sink (wsa:EndpointReference type)
• A DEPR defines alternative locations for the data source and/or sink, and each <Data/> nests its own credentials.
• <dmi:TransferRequirements>: Transfer Requirements (needs extending)
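As a rough illustration of how a client might assemble such a document with several transfers in one message, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The `other` namespace is hypothetical (as in the slide), the DMI namespace URI used here is a placeholder assumption, and the helper names (`q`, `data_epr`, `bulk_data_copy`) are invented for this sketch. Credentials and the `<other:stuff/>` extension point are left out.

```python
import xml.etree.ElementTree as ET

# WSA is the W3C WS-Addressing namespace. The DMI and OTHER URIs are
# placeholders assumed for this sketch, not values taken from a spec.
WSA = "http://www.w3.org/2005/08/addressing"
DMI = "urn:example:dmi"
OTHER = "urn:example:bulk-data-copy"

for prefix, uri in (("wsa", WSA), ("dmi", DMI), ("other", OTHER)):
    ET.register_namespace(prefix, uri)


def q(ns, tag):
    """Build a namespace-qualified tag name in ElementTree's {uri}tag form."""
    return f"{{{ns}}}{tag}"


def data_epr(parent, tag, locations):
    """Append a source or sink DEPR listing alternative locations."""
    epr = ET.SubElement(parent, q(DMI, tag))
    ET.SubElement(epr, q(WSA, "Address")).text = (
        "http://www.ogf.org/ogsa/2007/08/addressing/none")
    metadata = ET.SubElement(epr, q(WSA, "Metadata"))
    locs = ET.SubElement(metadata, q(DMI, "DataLocations"))
    # One <Data/> element per alternative (protocol URI, data URL) pair.
    for protocol_uri, data_url in locations:
        ET.SubElement(locs, q(DMI, "Data"),
                      {"ProtocolUri": protocol_uri, "DataUrl": data_url})
    return epr


def bulk_data_copy(transfers, max_attempts=None):
    """transfers: iterable of (id, source_locations, sink_locations)."""
    root = ET.Element(q(OTHER, "BulkDataCopy"))
    for transfer_id, sources, sinks in transfers:
        copy = ET.SubElement(root, q(OTHER, "DataCopy"), {"id": transfer_id})
        data_epr(copy, "SourceDataEPR", sources)
        data_epr(copy, "SinkDataEPR", sinks)
    reqs = ET.SubElement(root, q(DMI, "TransferRequirements"))
    if max_attempts is not None:
        ET.SubElement(reqs, q(DMI, "MaxAttempts")).text = str(max_attempts)
    return root
```

Calling `ET.tostring(bulk_data_copy(...), encoding="unicode")` would then yield a single serialized message wrapping all the transfers, which is the property needed for routing through a message broker.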
Bulk Data Copy and JSDL Integration ?
  <jsdl:JobDefinition>
    <jsdl:JobDescription>
      <jsdl:JobIdentification ... />
      <jsdl:Application>
        <!-- Possibility? a) Embed BulkDataCopy document -->
        <other:BulkDataCopy ... />
        <!-- Possibility? b) Stage BulkDataCopy doc and name copy agent -->
        <jsdl-hpcpa:HPCProfileApplication>
          <jsdl-hpcpa:Executable>/usr/bin/datacopyagent.sh</jsdl-hpcpa:Executable>
          <jsdl-hpcpa:Argument>myBulkDataCopyDoc.xml</jsdl-hpcpa:Argument>
          . . .
        </jsdl-hpcpa:HPCProfileApplication>
      </jsdl:Application>
      <jsdl:Resources>
        <jsdl:DataStaging>
          <jsdl:FileName>myBulkDataCopyDoc.xml</jsdl:FileName>
          . . .
        </jsdl:DataStaging>
      </jsdl:Resources>
    </jsdl:JobDescription>
  </jsdl:JobDefinition>
Some (sketchy) integration options?
• Possible options for integrating the proposed <BulkDataCopy/> document within JSDL: a) nesting it within the <jsdl:Application/> element, or b) staging in a <BulkDataCopy/> document as input for the named executable? (ideas, advice…)
Or New staging requirements ?
• JSDL is intended to be a generic compute activity description language.
• Rather than use a separate document to describe a bulk data copy activity, is it better to suggest some JSDL extensions to cater for bulk copying? (ideas, advice…)
• Potentially a better route to more widespread adoption (e.g. existing BES implementations).
• Other thoughts: orchestration of copy activities / DAG?
BES and DMI sub-state specialisations ?
[State diagram: BES states Pending, Running, Finished, Failed and Cancelled; DMI-based sub-states Running:Transferring, Running:Suspended, Failed:Clean, Failed:Unclean and Failed:Unknown; transitions labelled Suspend() request, Resume() request and Cancel().]
• Profile the OGSA BES state model for DMI sub-state specializations.
• Adds optional DMI sub-state specializations. A client/service may recognize only the main BES states if necessary.
• Suspend, resume, cancel.
• Add DMI fault types?
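One way to picture the layering of optional DMI sub-states over the main BES states is the sketch below. The state and sub-state names come from the diagram; the transition table, the `State` enum, and `apply_request` are assumptions made for illustration, since the slide does not spell out which requests are valid from which states.

```python
from enum import Enum


class State(Enum):
    # value = (main BES state, optional DMI sub-state)
    PENDING = ("Pending", None)
    TRANSFERRING = ("Running", "Transferring")
    SUSPENDED = ("Running", "Suspended")
    FINISHED = ("Finished", None)
    CANCELLED = ("Cancelled", None)
    FAILED_CLEAN = ("Failed", "Clean")
    FAILED_UNCLEAN = ("Failed", "Unclean")
    FAILED_UNKNOWN = ("Failed", "Unknown")

    def label(self, with_substates=True):
        """Render the state; clients that only know BES drop the sub-state."""
        main, sub = self.value
        return f"{main}:{sub}" if with_substates and sub else main


# Assumed transitions for the three control requests (not from a spec).
TRANSITIONS = {
    "suspend": {State.TRANSFERRING: State.SUSPENDED},
    "resume": {State.SUSPENDED: State.TRANSFERRING},
    "cancel": {
        s: State.CANCELLED
        for s in (State.PENDING, State.TRANSFERRING, State.SUSPENDED)
    },
}


def apply_request(state, request):
    """Return the next state, or raise ValueError if the request is invalid."""
    try:
        return TRANSITIONS[request][state]
    except KeyError:
        raise ValueError(
            f"{request!r} is not valid in state {state.label()}") from None
```

A client that understands only BES would call `label(with_substates=False)` and see `Running` for both the transferring and suspended cases, which matches the idea that the sub-states are optional.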
Message Model Requirements
• Document Message
  • Bulk Data Copy Activity description
    • Capture all information required to connect to each source URI and sink URI and subsequently enact the data copy activity.
    • Transfer requirements, e.g. additional URI properties, file selectors (regular expressions), scheduling parameters to define a batch window, retry count, source/sink alternatives, checksums?, sequential ordering?, DAG?
    • Serialized user credential definitions for each source and sink.
• Control Messages
  • Interact with a state/lifecycle model (e.g. stop, resume, cancel)
• Event Messages
  • Standard fault types and status updates
• Information Model
  • To advertise the service capabilities / properties / supported protocols
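A few of the transfer requirements listed above (batch window, retry count, regular-expression file selector) might be modelled along these lines. This is a sketch under assumptions: the class and field names are hypothetical, loosely echoing the `StartNotBefore`, `EndNoLaterThan` and `MaxAttempts` elements from the draft, and the selector is assumed to match whole file names.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class TransferRequirements:
    """Hypothetical container for a subset of the transfer requirements."""
    start_not_before: Optional[datetime] = None   # batch window opens
    end_no_later_than: Optional[datetime] = None  # batch window closes
    max_attempts: int = 1                         # retry count
    file_selector: Optional[str] = None           # regex on file names

    def in_batch_window(self, now: datetime) -> bool:
        """True if `now` falls inside the (possibly open-ended) window."""
        if self.start_not_before and now < self.start_not_before:
            return False
        if self.end_no_later_than and now > self.end_no_later_than:
            return False
        return True

    def select(self, names: Iterable[str]) -> List[str]:
        """Keep the file names that fully match the selector, if one is set."""
        if self.file_selector is None:
            return list(names)
        pattern = re.compile(self.file_selector)
        return [n for n in names if pattern.fullmatch(n)]
```

For instance, a requirement with `file_selector=r".*\.dat"` would pick out only the `.dat` files from a directory listing before the copy is enacted.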