CMS: T1 Disk/Tape separation

CMS: T1 Disk/Tape separation. Nicolò Magini, CERN IT/SDC; Oliver Gutsche, FNAL. November 11th 2013. Outline: motivation (gains in operations), impact on data federation, progress and technical issues, changes in operations and procedures.


Presentation Transcript


  1. CMS: T1 Disk/Tape separation
     Nicolò Magini, CERN IT/SDC
     Oliver Gutsche, FNAL
     November 11th 2013

  2. Outline
     • Motivation: gains in operations
     • Impact on data federation
     • Progress and technical issues
     • Changes in operations and procedures
     WLCG Workshop: Disk/Tape separation

  3. Introduction
     • CMS asked the Tier-1 sites to change their storage setup to gain more flexibility and control of the available disk and tape resources
     • Old setup:
       • One MSS system controlling both disk and tape
       • Automatic migration of new files to tape
       • Disk pool automatically purges unpopular files to make room for more popular files
       • Automatic recall of files from tape when accessing files without disk copy
     • Several disadvantages:
       • Pre-staging needed for organized processing, not 100% efficient because the system was still allowed to automatically purge files if needed
       • User analysis was not allowed at Tier-1 sites to protect the tape drives from chaotic user access patterns

  4. Disk/Tape separation
     • CMS asked the Tier-1 sites to separate disk and tape and base the management of both on PhEDEx
     • Sites were asked to deploy two independent [*] PhEDEx endpoints
       • “Large” [**] persistent disk
       • Tape archive with “small” [**] disk buffer
     • All file access will be restricted to the disk endpoint
     • All processing will write only on the disk endpoint
     [*] Can write/delete a file on disk-only, or on tape-only, or on both simultaneously
     [**] “small” ~ 10% of “large”, but can be sized according to expected rates to tape
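
The two sizing rules in footnote [**] can be compared with a little arithmetic. The sketch below is only an illustration: the function names, the retention window and the example numbers are assumptions, not values from the presentation.

```python
def buffer_from_disk_fraction(disk_tb: float, fraction: float = 0.10) -> float:
    """Rule of thumb from the slide: tape buffer is about 10% of the persistent disk."""
    return disk_tb * fraction


def buffer_from_rate(rate_mb_per_s: float, retention_hours: float) -> float:
    """Alternative sizing (assumed model): hold everything written at the expected
    rate to tape for a chosen time window before it is flushed."""
    bytes_held = rate_mb_per_s * 1e6 * retention_hours * 3600
    return bytes_held / 1e12  # convert bytes to terabytes


if __name__ == "__main__":
    # Hypothetical site: 3000 TB of persistent disk, 500 MB/s to tape, 48 h window.
    print(buffer_from_disk_fraction(3000.0))
    print(buffer_from_rate(500.0, 48.0))
```

A site would presumably pick whichever estimate better matches its expected write rate to tape.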

  5. Motivation
     • Increase flexibility for Tier-1 processing
     • Enable user analysis at Tier-1s
     • Enable remote access of Tier-1 data

  6. Processing at Tier-1s: Location independence
     • Use case:
       • Organized processing needs to access input samples stored custodially on tape at one of the Tier-1 sites
     • Old model:
       • Jobs needed to run close to the tape endpoint hosting input and output data (custodial location)
     • New model:
       • Jobs can run against any disk endpoint, not necessarily close to the tape endpoint hosting input or output data
     • Benefit of new model:
       • Custodial distribution optimizes tape space utilization, taking into account the processing capacities of the Tier-1 sites
       • Not all data is accessed at the same time, which causes uneven processing resource utilization
       • Location independence makes it possible to use both tape and processing resources efficiently at the same time

  7. Processing at Tier-1s: Pre-staging and Pinning
     • Use case:
       • Staging and pinning input files to local disk for organized processing is required to optimize CPU efficiency
       • Input files need to be released from disk when processing is done
     • Old model:
       • Pre-staging via SRM or Savannah tickets was used to convince the MSS to have input files available on disk
       • Release of input relied on automatic purge within the MSS
     • New model:
       • CMS will centrally subscribe and therefore pre-stage input files to have them available on disk before jobs start
       • CMS will permanently keep input files on disk for regular activities
     • Benefit of new model:
       • CMS is in control of what is on disk at the Tier-1 sites and can optimize disk utilization (CMS will have to actively manage the disk space through PhEDEx)
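
The difference between the old and new models on this slide is that nothing leaves the disk endpoint unless CMS deletes it. A toy model of that behaviour, assuming a hypothetical `DiskEndpoint` class and made-up dataset names (this is not the PhEDEx API), might look like this:

```python
from typing import Dict


class DiskEndpoint:
    """Toy model of a centrally managed disk endpoint: no automatic purge."""

    def __init__(self, capacity_tb: float) -> None:
        self.capacity_tb = capacity_tb
        self.datasets: Dict[str, float] = {}  # dataset name -> size in TB

    def used_tb(self) -> float:
        return sum(self.datasets.values())

    def subscribe(self, dataset: str, size_tb: float) -> bool:
        """Pre-stage a dataset. If it does not fit, refuse rather than evict."""
        if dataset in self.datasets:
            return True  # already on disk, nothing to do
        if self.used_tb() + size_tb > self.capacity_tb:
            return False
        self.datasets[dataset] = size_tb
        return True

    def release(self, dataset: str) -> None:
        """Explicit deletion once processing is done."""
        self.datasets.pop(dataset, None)


if __name__ == "__main__":
    disk = DiskEndpoint(capacity_tb=100.0)
    print(disk.subscribe("/HypotheticalPrimary/Run-A/RAW", 60.0))
    print(disk.subscribe("/HypotheticalPrimary/Run-B/RAW", 60.0))
    disk.release("/HypotheticalPrimary/Run-A/RAW")
    print(disk.subscribe("/HypotheticalPrimary/Run-B/RAW", 60.0))
```

Refusing a subscription instead of evicting is the design choice that keeps pre-staged input available for the whole processing campaign; the price is that free space has to be managed actively.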

  8. Processing at Tier-1s: Output from central processing
     • Use case:
       • Central processing produces output which needs to be archived on tape
     • Old model:
       • Output of individual workflows could only be produced at one site, the site of the custodial location
     • New model:
       • Output can be produced at one or more disk endpoints, then migrated to tape only at a single final custodial location
     • Benefit of new model:
       • CMS can optimize processing resource utilization
       • Tier-1s with no free tape are no longer idle
       • CMS can validate data before final tape migration, reducing unnecessary tape usage
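
The new output model (produce on any disk endpoint, validate, then archive at one custodial tape location) can be sketched as a simple filter. The `OutputBlock` type, the function name and the node names below are illustrative placeholders, not taken from the presentation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OutputBlock:
    name: str
    produced_at: str  # disk endpoint where the block was written
    validated: bool   # set after CMS has checked the output


def archival_requests(blocks: List[OutputBlock],
                      custodial_tape: str) -> List[Tuple[str, str, str]]:
    """Return (block, source disk, destination tape) for validated blocks only.

    Unvalidated output stays on disk and never consumes tape.
    """
    return [(b.name, b.produced_at, custodial_tape) for b in blocks if b.validated]


if __name__ == "__main__":
    blocks = [
        OutputBlock("/Hypothetical/Output#1", "SITE_A_Disk", True),
        OutputBlock("/Hypothetical/Output#2", "SITE_B_Disk", False),
    ]
    for request in archival_requests(blocks, "SITE_C_Tape"):
        print(request)
```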

  9. Impact on data federation
     • CMS would like to benefit from a fully deployed CMS data federation
       • Tier-1s need to publish files on the disk endpoints in the Xrootd federation
       • Eventually, all popular data will be accessible through the federation
     • Benefits:
       • Further optimize processing resource utilization by processing input files without the need to relocate samples through PhEDEx
       • Enables processing not only on remote Tier-1 sites through the LHCOPN but also at Tier-2 sites

  10. Technical implementation
      • Sites and storage providers free to choose implementation
      • Two possibilities identified in practice:
        • Two independent storage endpoints
          • CERN, FNAL
        • Single storage endpoint with two different trees in the namespace
          • RAL, KIT, CNAF, CCIN2P3, PIC
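
From the point of view of a logical file name, the two implementations differ only in the prefix used to build the physical path. A minimal sketch, assuming invented hostnames, paths and a hypothetical `lfn_to_pfn` helper (real sites define their own mapping rules):

```python
def lfn_to_pfn(lfn: str, endpoint_prefix: str) -> str:
    """Join a logical file name onto an endpoint prefix with exactly one slash."""
    return endpoint_prefix.rstrip("/") + "/" + lfn.lstrip("/")


# Option 1: two independent storage endpoints (hypothetical hosts).
TWO_ENDPOINTS = {
    "disk": "srm://disk.example.org/cms",
    "tape": "srm://tape.example.org/cms",
}

# Option 2: one storage endpoint with two trees in the namespace (hypothetical paths).
ONE_ENDPOINT_TWO_TREES = {
    "disk": "srm://se.example.org/cms/disk",
    "tape": "srm://se.example.org/cms/tape",
}

if __name__ == "__main__":
    lfn = "/store/data/example/file.root"  # made-up logical file name
    for layout in (TWO_ENDPOINTS, ONE_ENDPOINT_TWO_TREES):
        for kind, prefix in layout.items():
            print(kind, lfn_to_pfn(lfn, prefix))
```

Either way, the same logical file resolves to two distinct physical locations, which is what lets the disk and tape copies be written and deleted independently.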

  11. Internal transfers
      • Currently using standard tools for disk-tape buffer transfers at all sites
        • e.g. FTS, xrdcp
        • No bottleneck seen so far
      • If needed, internal optimizations are possible with a single endpoint
        • e.g. on a single dCache endpoint, internal data flow can be delegated to the pools

  12. Site concerns
      • Main site concern has been duplication of space used between disk and tape buffer
        • Should not be a big effect given the “small” size of the buffer in front of tape
      • For dCache, a solution is planned:
        • “flush-on-demand” command creating a hard link in tape namespace instead of copy
        • development schedule will depend on need, for now gather experience with current version
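
Why a hard link avoids the duplication can be shown on an ordinary local filesystem. This is only an analogy using Python's standard library, not a description of how dCache implements or will implement the planned command; the directory and file names are made up.

```python
import os
import tempfile
from typing import Tuple


def link_instead_of_copy() -> Tuple[bool, int]:
    """Create a file in a 'disk' tree and hard-link it into a 'tape' tree.

    Returns (both names refer to the same file, link count of that file).
    """
    with tempfile.TemporaryDirectory() as root:
        disk_dir = os.path.join(root, "disk")
        tape_dir = os.path.join(root, "tape")
        os.makedirs(disk_dir)
        os.makedirs(tape_dir)

        src = os.path.join(disk_dir, "file.root")
        with open(src, "wb") as handle:
            handle.write(b"x" * 1024)

        dst = os.path.join(tape_dir, "file.root")
        os.link(src, dst)  # second name for the same data, no second copy

        return os.path.samefile(src, dst), os.stat(src).st_nlink


if __name__ == "__main__":
    print(link_instead_of_copy())
```

Both paths point at the same stored data, so the file occupies space once even though it is visible in both trees.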

  13. Current status
      • DONE
        • RAL, CNAF
        • KIT (in commissioning last week)
      • ~ DONE
        • CERN (except for Tier-0 streamers and user)
      • IN PROGRESS
        • PIC, CCIN2P3, FNAL

  14. Issues
      • At sites
        • No blocking technical issues
        • Not stress-tested yet: challenge in 2014?
      • In CMS software
        • Minor update needed in PhEDEx to handle disk-tape moves
        • Need to settle data location for job matching
          • PhEDEx node vs. SE…
          • CMS internal, in progress

  15. Changes in operations and procedures
      • The Tier-1 disk endpoint is a central space
        • CMS will manage subscriptions and deletions on disk
      • Tape endpoint subscriptions are subject to approval by Tier-1 data managers (a role held by site-local colleagues)
      • CMS would like to auto-approve disk subscription and deletion requests in order to reduce latencies
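
The approval policy CMS proposes on this slide amounts to a small rule keyed on the endpoint type. The sketch below is a hypothetical model of that rule (the `Request` type and state names are assumptions), not PhEDEx code:

```python
from dataclasses import dataclass


@dataclass
class Request:
    dataset: str
    endpoint: str  # "disk" or "tape"
    action: str    # "subscribe" or "delete"


def initial_state(request: Request) -> str:
    """Disk requests are auto-approved; tape requests wait for the site data manager."""
    if request.endpoint == "disk":
        return "approved"
    if request.endpoint == "tape":
        return "pending"
    raise ValueError(f"unknown endpoint type: {request.endpoint!r}")


if __name__ == "__main__":
    print(initial_state(Request("/Hypothetical/Dataset", "disk", "delete")))
    print(initial_state(Request("/Hypothetical/Dataset", "tape", "subscribe")))
```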

  16. Changes in operations and procedures
      • Tape families:
        • Together with the Tier-1 sites, CMS optimized placement of files on tape for reading by requesting tape families
        • In the old model, tape family requests needed to be made before processing started, which could lead to complications if forgotten
        • The new model allows processing on disk endpoints without the need for tape families
        • A PhEDEx subscription archives the output to tape: it needs to be approved by the site-local data manager
        • Tape family requests by CMS are not needed anymore; sites can create tape families before approving archival PhEDEx subscriptions
        • CMS is available to help the sites optimize rules for tape family creation
        • CMS would like to evolve the tape family procedure from requesting individual families to a dialogue with the sites defining tape family setups and rules

  17. Changes in site readiness
      • Site readiness metrics for Tier-1s will evolve taking into account separated disk and tape PhEDEx endpoints
        • SAM tests only on CEs close to disk
        • SAM tests for SRM both on disk and on tape endpoints
      • More links to monitor:
        • disk-WAN
        • tape-WAN
        • disk-tape

  18. Conclusions
      • Hosting Tier-1 data on disk will increase flexibility in all computing workflows
      • Technical solutions identified for all sites
      • Deployment in progress with no blocking issues, expecting completion at all sites by beginning of 2014
      • For more details:
        • https://twiki.cern.ch/twiki/bin/view/CMSPublic/CompProjDiskTape
        • https://indico.cern.ch/conferenceDisplay.py?confId=249032
