SRM-Lite: overcoming the firewall barrier for data movement
Arie Shoshani, Alex Sim, Viji Natarajan
Lawrence Berkeley National Laboratory
SDM Center All-Hands Meeting, November 2007
Outline
• What are Storage Resource Managers (SRMs)
• Requirements for using SRMs behind firewalls
• Satisfying the requirements
• Architecture
• Potential uses
Storage Resource Managers
SRMs are middleware components whose function is to provide dynamic space allocation and file management in spaces for storage components on the local or wide-area network.
[Figure: Examples of storage systems currently supported by SRMs. Labels: client/user applications; "Based on a common standard"; SRM (DPM), SRM (StoRM), SRM/dCache, SRM/CASTOR, SRM (BeStMan); dCache, CASTOR, GPFS, Unix-based disk pools; CCLRC RAL.]
Storage Resource Managers: Main concepts
• Non-interference with local policies
• Advance space reservations
• Dynamic space management
• Pinning files in spaces
• Support for the abstract concept of a file name: Site URL (SURL)
• Temporary assignment of file names for transfer: Transfer URL (TURL)
• Directory management and ACLs
• Multi-file requests (srmRequestToPut, srmRequestToGet, srmCopy)
• Transfer protocol negotiation
• Peer-to-peer request support
• Support for asynchronous multi-file requests
• Support for abort, suspend, and resume operations
• SRM relies on other services for data movement (GridFTP, HTTPS, SCP, …)
Concepts: Site URL and Transfer URL
• Provide: Site URL (SURL)
  • URL known externally, e.g. in replica catalogs
  • e.g. srm://ibm.cnaf.infn.it:8444/dteam/test.10193
• Get back: Transfer URL (TURL)
  • Path can be different from the SURL (SRM internal mapping)
  • Protocol chosen by SRM based on the request's protocol preference
  • e.g. gsiftp://ibm139.cnaf.infn.it:2811//gpfs/dteam/test.10193
• One SURL can have many TURLs
  • Files can be replicated in multiple storage components
  • Files may be in near-line and/or on-line storage
• In a light-weight SRM (a single file system on disk)
  • SURL can be the same as the TURL except for the protocol
• File sharing is possible
  • Same physical file, but many requests
  • Needs to be managed by SRM
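The SURL-to-TURL relationship can be illustrated with a small sketch. This is not the SRM interface: the class `ToySrm` and its `register` and `resolve` methods are hypothetical names made up for illustration, and only the two example URLs come from the slide. It shows one SURL holding several TURLs and a TURL being picked by the client's protocol preference.

```python
from dataclasses import dataclass, field


@dataclass
class ToySrm:
    """Hypothetical in-memory stand-in for an SRM's SURL -> TURL mapping."""

    # One SURL may map to several TURLs (replicas, different protocols).
    replicas: dict[str, list[str]] = field(default_factory=dict)

    def register(self, surl: str, turl: str) -> None:
        """Record that `turl` is one physical location for `surl`."""
        self.replicas.setdefault(surl, []).append(turl)

    def resolve(self, surl: str, preferred_protocols: list[str]) -> str:
        """Return the first TURL matching the client's protocol preference order."""
        turls = self.replicas[surl]
        for protocol in preferred_protocols:
            for turl in turls:
                if turl.split("://", 1)[0] == protocol:
                    return turl
        raise LookupError(f"no TURL for {surl} using {preferred_protocols}")


srm = ToySrm()
surl = "srm://ibm.cnaf.infn.it:8444/dteam/test.10193"
srm.register(surl, "gsiftp://ibm139.cnaf.infn.it:2811//gpfs/dteam/test.10193")
turl = srm.resolve(surl, ["gsiftp"])
```

The client only ever stores the SURL; the TURL is handed out per request and may differ in host, path, and protocol.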
Earth System Grid Analysis Environment (in production for 4 years)
• >5000 users
• 160 TB managed
• SRMs are used and inter-communicate at several sites
[Figure: ESG deployment across LBNL, ANL, NCAR, LLNL, ORNL, and ISI. Component labels: HRM and DRM (Storage Resource Management), gridFTP server, gridFTP striped server, openDAPg server, Tomcat servlet engine, MyProxy server and client, CAS (Community Authorization Services) and CAS client, MCS (Metadata Cataloguing Services) and MCS client, RLS (Replica Location Services) and RLS client, GRAM gatekeeper, HPSS (High Performance Storage System), MSS (Mass Storage System), disk. Link labels: SOAP, RMI, gridFTP.]
Robust Data Movement provided by SRMs and DataMover
• Problem: move thousands of files robustly
  • Takes many hours
  • Need error recovery
    • Mass storage system failures
    • Network failures
• Solution: use Storage Resource Managers (SRMs)
  • File streaming paradigm
  • By reserving and releasing storage space automatically
• Problem: too slow
• Solution:
  • In GridFTP
    • Use parallel streams
    • Use large FTP windows
  • Pre-stage files from MSS
  • Use concurrent transfers
[Figure: Example setup for Earth System Grid (ESG). Labels: DataMover (anywhere); get list of files; SRM-COPY (thousands of files); SRM-GET (one file at a time); NCAR; LBNL; SRM (performs reads); SRM (performs writes); GridFTP GET (pull mode); network transfer; MSS; stage files; archive files; disk cache (at each site).]
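The two ideas on this slide, recovering from transient failures and running transfers concurrently, can be sketched as follows. This is a minimal illustration and not DataMover's code: `transfer_with_retry`, `move_files`, the fake transfer function, and the file names are all assumptions made up for the example, and a real transfer would call GridFTP or another protocol instead.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def transfer_with_retry(transfer, name, max_attempts=3, delay_s=0.0):
    """Call transfer(name) until it succeeds or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            transfer(name)
            return name, attempt
        except OSError:
            if attempt == max_attempts:
                raise  # give up: the failure was not transient
            time.sleep(delay_s)  # wait before retrying


def move_files(transfer, names, workers=4):
    """Run several transfers at once; return {file name: attempts used}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(transfer_with_retry, transfer, n) for n in names]
        return dict(f.result() for f in futures)


# Demo: a fake transfer that fails the first time each file is tried.
_seen = set()
_lock = threading.Lock()


def flaky_transfer(name):
    with _lock:
        first_time = name not in _seen
        _seen.add(name)
    if first_time:
        raise OSError("simulated transient network failure")


attempts = move_files(flaky_transfer, ["file_1", "file_2", "file_3"])
```

Each file needs a second attempt in this demo, which mirrors the slide's point that a long multi-file move must tolerate individual failures without restarting the whole request.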
[Figure: File tracking shows recovery from transient failures. Total: 45 GB]
Requirements for SRM-Lite
• Run SRM behind a firewall
  • Cannot have third-party transfers (source or target is local)
• May not be able to run GridFTP
  • Remote site may not support it
  • Some communities choose not to use GSI
• Need support for multi-file transfers
  • Or an entire directory
• Need support for asynchronous requests
  • Also support for intermediate status of a request
• Need to support concurrent file transfers
Satisfying the Requirements: SRM-Lite
• Run SRM behind a firewall
  • Must have a client tool (SRM-Lite)
• May not be able to run GridFTP
  • Support high-performance SCP: use HPN-SSH from Pittsburgh Supercomputing Center
  • But also use other transfer protocols (GridFTP, bbcp, https, …)
• Need support for multi-file transfers
  • Manage queues for large requests
• Need support for asynchronous requests
  • SRM-Lite returns a “request token”; the token can be used for “request status”
• Need to support concurrent file transfers
  • Use multi-threading to manage concurrent transfers
  • Monitor transfers and recover from mid-transfer interruptions
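The request-token pattern described above can be sketched like this. It is an illustration under stated assumptions, not SRM-Lite's implementation: the class `ToyRequestManager`, its method names, the status strings, and the fake transfer function are all hypothetical. The sketch queues a multi-file request on a thread pool, returns a token immediately, and lets the caller poll per-file status with that token.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class ToyRequestManager:
    """Hypothetical sketch: queue a multi-file request, return a token, report status."""

    def __init__(self, transfer, workers=2):
        self._transfer = transfer
        self._pool = ThreadPoolExecutor(max_workers=workers)  # concurrent transfers
        self._requests = {}  # token -> {file name: Future}

    def submit(self, files):
        """Queue every file and return a request token without waiting."""
        token = uuid.uuid4().hex
        self._requests[token] = {f: self._pool.submit(self._transfer, f) for f in files}
        return token

    def status(self, token):
        """Intermediate status per file: 'pending', 'done', or 'failed'."""
        report = {}
        for name, future in self._requests[token].items():
            if not future.done():
                report[name] = "pending"  # still queued or running
            elif future.exception() is not None:
                report[name] = "failed"
            else:
                report[name] = "done"
        return report

    def wait(self, token):
        """Block until every file in the request has finished, then report."""
        for future in self._requests[token].values():
            future.exception()  # blocks until finished; does not re-raise
        return self.status(token)

    def close(self):
        self._pool.shutdown()


def fake_transfer(name):
    if name == "bad_file":
        raise OSError("simulated failure")


manager = ToyRequestManager(fake_transfer)
token = manager.submit(["file_a", "file_b", "bad_file"])
final = manager.wait(token)
manager.close()
```

A failed file is reported in the status rather than aborting the request, so the caller can decide which files to resubmit.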
Scenario A: firewall at one site
Process steps
• Log in to ORNL using OTP
• At ORNL, invoke SRM-Lite
• User composes an XML input file, srmlite.xml, for selected files/directories to copy from/to another site
  • Or, user gives a command-line option for a selected file/directory
• SRM-Lite uses srmlite.xml or command-line input to automatically push/pull files to/from NERSC
• Use multiple threads for concurrent transfers
[Figure labels: OTP login; ORNL; NERSC; SRM-Lite; srmlite.xml; SSH channel (SCP); SSH server; local commands and protocols; GridFTP/FTP/BBCP/HTTP transfers; disk cache (at each site).]
Put example:
  Source: file:////my_directory/file_foo
  Target: scp://host/target_dir/file_foo
Get example:
  Source: GridFTP://host/target_dir/file_foo
  Target: file:////my_directory/file_foo
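The put and get examples differ only in which end is the local file: URL. A minimal sketch of that distinction, assuming a hypothetical helper `transfer_direction` (not part of SRM-Lite), uses Python's standard URL parser to classify a source/target pair and to reject pairs where neither or both ends are local, since third-party transfers are not possible from behind the firewall.

```python
from urllib.parse import urlsplit


def transfer_direction(source: str, target: str) -> str:
    """Classify a transfer as 'put' (local -> remote) or 'get' (remote -> local)."""
    source_is_local = urlsplit(source).scheme == "file"
    target_is_local = urlsplit(target).scheme == "file"
    if source_is_local and not target_is_local:
        return "put"
    if target_is_local and not source_is_local:
        return "get"
    # Both remote would be a third-party transfer; both local is not a site-to-site move.
    raise ValueError("exactly one of source and target must be a local file: URL")


put = transfer_direction("file:////my_directory/file_foo",
                         "scp://host/target_dir/file_foo")
get = transfer_direction("GridFTP://host/target_dir/file_foo",
                         "file:////my_directory/file_foo")
```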
Scenario B: one end has a firewall, the other end has SRM
[Figure labels: OTP login; ORNL; NERSC; SRM-Lite; srmlite.txt; SRM request; SRM; GridFTP/FTP/SCP transfers; disk cache (at each site); HPSS.]
Put example:
  Source: file:////my_directory/file_foo
  Target: srm://host/target_dir/file_foo
Scenario C: firewalls at both ends
Process steps
• Log in to site1 using OTP
• At site1, invoke SRM-Lite
• SRM-Lite at site1 uses SSH to invoke SRM-Lite at site2
• Use the SSH channel for SCP
• Same as before:
  • User composes an XML input file, srmlite.xml, for selected files/directories to copy from/to another site
  • Or, user gives a command-line option for a selected file/directory
[Figure labels: OTP login; site1; site2; SRM-Lite (at each site); srmlite.xml; SSH channel (SCP); SSH server; disk cache (at each site).]
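The step where one SRM-Lite starts another over SSH can be sketched as building an `ssh destination command` invocation. The helper name `build_ssh_argv` is hypothetical, and the remote command shown is a placeholder: the slides do not give the actual SRM-Lite command name or its options, so none are assumed here. The sketch only builds the argument list and does not run it.

```python
import shlex


def build_ssh_argv(host: str, remote_command: list[str]) -> list[str]:
    """Build an argv for running `remote_command` on `host` through ssh.

    The remote command is quoted into one string because ssh passes it to
    the remote shell as a single command line.
    """
    return ["ssh", host, shlex.join(remote_command)]


# "remote-srmlite-placeholder" stands in for the real launcher, which is not
# named in the slides; srmlite.xml is the input file the slides mention.
argv = build_ssh_argv("site2", ["remote-srmlite-placeholder", "srmlite.xml"])
```

The resulting list could be handed to `subprocess.run(argv)` on a machine that already has an authenticated SSH path to site2.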
Scenario C: SRM-Lite manages MSS access
[Figure labels: OTP login; site1; site2; SRM-Lite (at each site); srmlite.xml; SSH channel (SCP); SSH server; disk cache (at each site); HPSS (at each site).]
GUI for SRM-Lite
• Called DataMover-Lite
• Versions exist for Linux, PC, Mac
• Used in ESG
  • Special version for data movement to user workstations
Usage
• Combustion project
  • The Applied Partial Differential Equations Center (APDEC)
  • John Bell
  • Efficient, robust data movement from sites behind firewalls
  • At DoE and DoD sites
• Kepler-SRM-Lite actor
  • To be used for managing multi-file transfers from sites behind firewalls
  • Launch SRM-Lite remotely through SSH
  • Initial version, with help from NCSU: Pierre Mouallem
  • Two modes
    • Entire request
    • Streaming file requests
  • To be used in CPES workflows first, with Norbert’s help