
Dynamic Federations






Presentation Transcript


  1. Dynamic Federations • Seamless aggregation of standard-protocol-based storage endpoints • Fabrizio Furano • Patrick Fuhrmann • Paul Millar • Daniel Becker • Adrien Devresse • Oliver Keeble (Presenter) • Ricardo Brito da Rocha • Alejandro Alvarez • Credits to ShuTing Liao (ASGC) • EMI INFSO-RI-261611

  2. Dynamic HTTP Federations • Federation • Simplicity, redundancy, storage/network efficiency, elasticity, performance • HTTP • Standard clients everywhere • One protocol for everything • Single protocol for WAN and LAN • Transparent redirection • Use cases • Easy, direct job/user data access, WAN friendly • Access missing files after the job starts • Friend sites can share storage • Diskless sites • Cache integration (future)

  3. Storage federations • What's the goal? • Make different storage clusters be seen as one • Make global file-based data access seamless • How should this be done? • Dynamically • easy to set up and maintain • no complex metadata persistency • no DB babysitting (keep it for the experiment's metadata) • no replica catalogue inconsistencies, by design • Light constraints on participating storage • Using standards • No strange APIs, everything looks familiar • Global access to global data

  4. What is federated? • We federate (meta)data repositories that are 'compatible' • HTTP interface • Name space (modulo simple prefixes) • Including catalogues • Permissions (they don't contradict across sites) • Content (same key or filename means same file [modulo translations]) • Dynamically and transparently discovering metadata • looks like a unique, very fast file metadata system • properly presenting the aggregated metadata views • redirecting clients to the geographically closest endpoint • The local SE is preferred • The system can also load a "Geo" plugin
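
The slide mentions redirecting clients to the geographically closest endpoint via a "Geo" plugin. The sketch below is a minimal, hypothetical illustration of that idea, not the UGR plugin API: given approximate coordinates for the client and for each endpoint holding a replica, pick the nearest one. The URLs and coordinates are made up for illustration.

```python
# Hypothetical sketch of geography-aware replica selection (NOT the UGR "Geo"
# plugin API): pick, for a given client position, the replica served by the
# nearest endpoint. Coordinates and URLs below are made up for illustration.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def pick_closest_replica(client_pos, replicas):
    """replicas: list of (redirect_url, (lat, lon)); return the nearest URL."""
    return min(replicas, key=lambda r: haversine_km(*client_pos, *r[1]))[0]

replicas = [("https://se.desy.example/path/file", (53.57, 9.88)),    # DESY
            ("https://se.asgc.example/path/file", (25.04, 121.61))]  # ASGC
print(pick_closest_replica((46.2, 6.1), replicas))  # a client in Europe -> DESY replica
```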

  5. What is federated? • Technically, TODAY we can aggregate: • SEs with DAV/HTTP interfaces • dCache, DPM • Future: Xrootd? EOS? StoRM? • Catalogues with DAV/HTTP interfaces • LFC supported • Future: experiment catalogues could be integrated • Cloud DAV/HTTP/S3 services • Anything else that happens to have an HTTP interface… • Caches • Native LFC and DPM databases

  6. Why HTTP/DAV? • It's everywhere • A very widely adopted technology • It has the right features • Redirection, WAN friendly • Convergence • Transfers and data access • No other protocols required • We (humans) like browsers, they give an experience of simplicity • Integrated web apps

  7. DPM/HTTP • DPM has invested significantly in HTTP as part of the EMI project • New HTTP/DAV interface • Parallel WAN transfers • 3rd-party copy • Solutions for replica fallback • "Global access" and metalink • Performance evaluations • Experiment analyses • HammerCloud • Synthetic tests • ROOT tests

  8. DPM i/o interface comparison • DPM and random I/O • Longstanding issue with ATLAS DPM usage • Bad RFIO performance forced 'download first' • We're doing a thorough evaluation of the current alternatives to RFIO • XROOT vs HTTP vs RFIO • NFS to be included when it's writable • In collaboration with ASGC • Results will make you happy • (plots marked PRELIMINARY)

  9. DPM i/o interface comparison (continued) • Results plots, marked PRELIMINARY

  10. Demo • We have set up a stable demo testbed using HTTP/DAV • Head node at DESY: http://federation.desy.de/myfed/ • a DPM instance at CERN • a DPM instance at ASGC (Taiwan) • a dCache instance at DESY • a cloud storage account from Deutsche Telekom • The feeling it gives is surprising • Metadata performance is on average higher than contacting the endpoints directly • We see the directories as merged, as if it were only one system • There's one test file at 3 sites, i.e. 3 replicas: • /myfed/atlas/fabrizio/hand-shake.JPG • Clients in Europe get the one from DESY/DT/CERN • Clients in Asia get the one from ASGC
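
As a concrete illustration of what the demo offers, the snippet below fetches the 3-replica test file through the federation head node with a plain HTTP client and prints the redirection chain. The head-node URL and file path are taken from the slide; the testbed was a demo setup and may no longer be reachable, so treat this purely as an illustration.

```python
# Fetching the 3-replica test file through the federation head node with a
# plain HTTP client. URL and path are taken from the slide; the demo testbed
# may no longer be online.
import requests

FED = "http://federation.desy.de/myfed"

r = requests.get(FED + "/atlas/fabrizio/hand-shake.JPG",
                 allow_redirects=True, timeout=30)
for hop in r.history:                       # the transparent redirection chain
    print(hop.status_code, "->", hop.headers.get("Location"))
print("served from:", r.url, "status:", r.status_code)
```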

  11. The basic idea • Diagram: Storage/MD endpoint 1 holds /dir1/file1 and /dir1/file2; Storage/MD endpoint 2 holds /dir1/file2 and /dir1/file3 • The aggregation presents what we see: a single /dir1 containing /dir1/file1, /dir1/file2 (with 2 replicas) and /dir1/file3 • All the metadata interactions are hidden • NO persistency needed here, just efficiency and parallelism
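
A minimal sketch of the aggregation shown in the diagram (not the UGR implementation): each endpoint contributes its partial listing of /dir1, the federation exposes the union, and a path reported by several endpoints simply has several replicas.

```python
# Minimal sketch of the aggregation in the diagram (not the UGR code): the
# federated view of /dir1 is simply the union of the per-endpoint listings,
# and a path reported by several endpoints has several replicas.
from collections import defaultdict

endpoint_listings = {
    "Storage/MD endpoint 1": ["/dir1/file1", "/dir1/file2"],
    "Storage/MD endpoint 2": ["/dir1/file2", "/dir1/file3"],
}

def aggregate(listings):
    """Map each federated path to the endpoints that hold a replica of it."""
    view = defaultdict(list)
    for endpoint, paths in listings.items():
        for path in paths:
            view[path].append(endpoint)
    return dict(view)

for path, where in sorted(aggregate(endpoint_listings).items()):
    print(f"{path}  ({len(where)} replica{'s' if len(where) > 1 else ''})")
# -> /dir1/file1 (1 replica), /dir1/file2 (2 replicas), /dir1/file3 (1 replica)
```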

  12. Example • Architecture diagram: the client talks plain DAV/HTTP to the frontend (Apache2 + DMLite) • Behind it, the aggregator (UGR) loads plugins: a DMLite plugin towards an LFC (or its DB), and DAV/HTTP and HTTP plugins towards groups of SEs, which are reached over plain DAV/HTTP
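
To make the plugin layering in the diagram concrete, here is an illustrative abstraction. The real UGR plugins are C++ shared libraries with a different interface; the class names and methods below are assumptions, chosen only to show how a DAV/HTTP storage element and a catalogue can sit behind the same "location plugin" contract.

```python
# Illustrative abstraction of the plugin layering in the diagram; the names
# and methods below are assumptions, not the real UGR C++ plugin interface.
from abc import ABC, abstractmethod
import requests

class LocationPlugin(ABC):
    @abstractmethod
    def stat(self, path):
        """Return metadata for `path`, or None if this endpoint lacks it."""

    @abstractmethod
    def locate(self, path):
        """Return the replica URLs this endpoint can serve for `path`."""

class DavHttpPlugin(LocationPlugin):
    """Talks plain DAV/HTTP to a storage element (dCache, DPM, cloud storage)."""
    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def stat(self, path):
        r = requests.head(self.base_url + path, timeout=10)
        return {"size": r.headers.get("Content-Length")} if r.ok else None

    def locate(self, path):
        return [self.base_url + path]

class CataloguePlugin(LocationPlugin):
    """Stand-in for the DMLite plugin that asks an LFC (or its DB) directly."""
    def __init__(self, entries):               # entries: {path: [replica URLs]}
        self.entries = entries

    def stat(self, path):
        replicas = self.entries.get(path)
        return {"nreplicas": len(replicas)} if replicas else None

    def locate(self, path):
        return self.entries.get(path, [])
```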

  13. Server architecture • Clients arrive and are distributed across: • different machines (DNS alias) • different processes (Apache configuration) • Clients are served by the UGR: they can browse/stat or be redirected for action • The architecture is multi/manycore friendly and uses a fast parallel caching scheme
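
The "different machines (DNS alias)" point can be illustrated with a small resolver check: an alias that maps to several front-end hosts resolves to several addresses, and each client simply ends up on one of them. The hostname below is a placeholder, not part of the original setup.

```python
# Small illustration of the "different machines (DNS alias)" point: an alias
# mapping to several front-end hosts resolves to several addresses, and each
# client ends up on one of them. The hostname below is a placeholder.
import random
import socket

def pick_frontend(alias, port=443):
    """Resolve a DNS alias and pick one of the returned addresses at random."""
    infos = socket.getaddrinfo(alias, port, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    return random.choice(addresses)

print(pick_frontend("federation.example.org"))
```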

  14. Name translation • A sophisticated name translation scheme is key to federating almost any source of metadata • UGR implements algorithmic translations and can accommodate non-algorithmic ones as well • A plugin could also query an external service (e.g. an LFC or a private DB)
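
A minimal sketch of the algorithmic (prefix-based) part of name translation follows. The federated prefix and the endpoint-local prefixes are made-up examples; non-algorithmic cases would be delegated to a plugin querying an external service such as an LFC or a private DB.

```python
# Minimal sketch of algorithmic (prefix-based) name translation. The prefixes
# below are made-up examples, not the configuration of any real endpoint.
def translate(path, fed_prefix, endpoint_prefix):
    """Map a federated path onto an endpoint-local path by swapping prefixes."""
    if not path.startswith(fed_prefix):
        raise ValueError(f"{path!r} is outside the federated namespace")
    return endpoint_prefix + path[len(fed_prefix):]

fed_path = "/myfed/atlas/fabrizio/hand-shake.JPG"
print(translate(fed_path, "/myfed", "/dpm/cern.ch/home"))    # hypothetical DPM prefix
print(translate(fed_path, "/myfed", "/pnfs/desy.de/data"))   # hypothetical dCache prefix
```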

  15. Design and performance • Full parallelism • Composes the aggregated metadata views on the fly by managing parallel information-location tasks • Never stacks up latencies! • The endpoints are treated in a completely independent way • No limit to the number of outstanding clients/tasks • No global locks/serialisations! • Thread pools and producer/consumer queues are used extensively (e.g. to stat N items in M endpoints while X clients wait for some items) • Aggressive metadata caching • The metadata caching keeps the performance high • Peak raw cache performance is ~500K-1M hits/s per core • A relaxed, hash-based, in-memory partial name space • Juggles information so that it always contains what's needed • Entries are kept in an LRU fashion, giving a fast first-level namespace cache • Clients are stalled only for the minimum time needed to assemble their information
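
A simplified sketch of the two central ideas on this slide: stat the same item on all endpoints in parallel so that latencies never stack up, and keep the merged answers in a bounded, LRU-evicted in-memory cache. The endpoint objects are assumed to expose a stat(path) call (for instance the illustrative plugins sketched earlier); the pool size and cache capacity are arbitrary.

```python
# Simplified sketch of parallel metadata location plus an LRU metadata cache;
# not the UGR implementation. Endpoints are assumed to expose stat(path).
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor

class MetadataCache:
    def __init__(self, capacity=100_000):
        self.capacity, self.entries = capacity, OrderedDict()

    def get(self, path):
        if path in self.entries:
            self.entries.move_to_end(path)       # refresh the LRU position
            return self.entries[path]
        return None

    def put(self, path, value):
        self.entries[path] = value
        self.entries.move_to_end(path)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)     # evict the oldest entry

pool = ThreadPoolExecutor(max_workers=64)        # shared worker pool
cache = MetadataCache()

def federated_stat(path, endpoints):
    """Ask every endpoint in parallel; cache and return the non-None answers."""
    cached = cache.get(path)
    if cached is not None:
        return cached
    answers = pool.map(lambda endpoint: endpoint.stat(path), endpoints)
    merged = [a for a in answers if a is not None]
    cache.put(path, merged)
    return merged
```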

  16. Design and performance • Horizontally scalable deployment • Multithreaded • DNS balanceable • High-performance DAV client implementation • Wraps DAV calls into a POSIX-like API, saving the caller from composing requests/responses by hand • Performance is privileged: uses libneon with session caching • Compound list/stat operations are supported • Loaded by the core as a "location" plugin
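
The real DAV client is a C++ implementation built on libneon; purely as an illustration of "wrapping DAV calls into a POSIX-like API", here is a simplified Python equivalent that turns a Depth: 0 PROPFIND with standard DAV: properties into a stat-like dictionary.

```python
# Simplified illustration (not the libneon-based client): wrap a WebDAV
# Depth: 0 PROPFIND into a POSIX-like stat() call.
import requests
import xml.etree.ElementTree as ET

DAV = "{DAV:}"   # WebDAV XML namespace, in ElementTree notation

def dav_stat(url):
    """Return a small stat-like dict for `url`, or None if it does not exist."""
    body = ('<?xml version="1.0"?>'
            '<propfind xmlns="DAV:"><prop>'
            '<getcontentlength/><getlastmodified/><resourcetype/>'
            '</prop></propfind>')
    r = requests.request("PROPFIND", url, data=body, timeout=30,
                         headers={"Depth": "0",
                                  "Content-Type": "application/xml"})
    if r.status_code == 404:
        return None
    r.raise_for_status()                 # 207 Multi-Status is not an error
    prop = ET.fromstring(r.content).find(f".//{DAV}prop")
    is_dir = prop.find(f"{DAV}resourcetype/{DAV}collection") is not None
    length = prop.findtext(f"{DAV}getcontentlength")
    return {"is_dir": is_dir,
            "size": int(length) if length else 0,
            "mtime": prop.findtext(f"{DAV}getlastmodified")}
```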

  17. A performance test • Two endpoints: DESY and CERN (a modest VM) • One UGR at DESY • 10K files in a 4-level-deep directory tree • Files exist on both endpoints • The test (written in C++) stats each file only once, using many parallel clients issuing stat() at maximum pace from 3 machines
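
The original test is written in C++ and ran from 3 machines; the sketch below only mirrors its shape from a single machine so the setup is easier to picture: a pool of parallel clients stats every file exactly once, as fast as possible, against the federation head node. The URL, directory layout and client count are placeholders, not the values used in the real measurement.

```python
# Single-machine approximation of the test's shape (the real test is C++ and
# ran from 3 machines). URL, layout and client count are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor
import requests

HEAD_NODE = "http://federation.example.org/myfed"
paths = [f"/testdir/l1/l2/l3/file{i:05d}" for i in range(10_000)]  # 10K files

def stat_once(path):
    t0 = time.monotonic()
    r = requests.head(HEAD_NODE + path, allow_redirects=False, timeout=30)
    return r.status_code, time.monotonic() - t0

t_start = time.monotonic()
with ThreadPoolExecutor(max_workers=100) as clients:   # 100 parallel clients
    results = list(clients.map(stat_once, paths))
elapsed = time.monotonic() - t_start
print(f"{len(results)} stats in {elapsed:.1f}s "
      f"-> {len(results) / elapsed:.0f} stat/s")
```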

  18. The result, WAN access (results plot)

  19. Get started • Get it here: https://svnweb.cern.ch/trac/lcgdm/wiki/Dynafeds • What you can do with it: • Easy, direct job/user data access, WAN friendly • Access missing files after job starts • Friend sites can share storage • Diskless sites • Federating catalogues • Combining catalogue-based and catalogue-free data

  20. Dynamic Feds cf XROOTD feds • XROOTD federations are focused on the "redirection" concept • Very light at the meta-manager: just redirect clients away as soon as possible • If that is not possible, the penalty is 5 seconds per jump • Global listing is implemented in the client: slowish and hiccup-prone • Some details do not yet match well with quick geography-aware redirection • Dynamic Federations support both the "redirection" and the "browsing" concept by design • Much more centred on the meta-manager • We can't touch the clients • Metadata is cached for the clients, in memory • Designed for scalability, performance and features • Extendable plugin architecture, geography-aware redirection • Can speak any protocol; our focus is on HTTP-based things

  21. Next steps • Release our beta, as the nightlies are good • Larger-scale tests, with many endpoints, possibly distant ones • We are now looking for partners • Precise performance measurements • Refine the handling of the 'death' of endpoints • Immediate sensing of changes in the endpoints' content, e.g. add, delete • SEMsg in EMI2 SYNCAT would be the right thing in the right place • Some more practical experience (getting used to the idea, using SQUIDs, CVMFS, EOS, clouds, ... <put your item here>)

  22. HTTP for xrootd • HTTP or XROOTD? HTTP and XROOTD? • An XROOTD federation gives the goodies/hooks of the XROOTD framework • This also involves many other components and groups of people • Monitoring of the FAX is a perfect example • IT-GT (CERN) will produce an HTTP plugin for XROOTD • Double-headed data access • Discussions have started • The effort will be scoped in early October • Will also involve xrootd framework enhancements • Federate the same clusters also via HTTP • Pure HTTP/DAV endpoints can join normally • Let users enjoy

  23. References • Wiki page and packages • https://svnweb.cern.ch/trac/lcgdm/wiki/Dynafeds • CHEP papers • Federation • http://cdsweb.cern.ch/record/1460525?ln=en • DPM & dmlite • https://cdsweb.cern.ch/record/1458022?ln=en • HTTP/dav • https://cdsweb.cern.ch/record/1457962?ln=en

  24. Conclusions • Dynamic Federations: an efficient, persistency-free, easily manageable approach to federating remote storage endpoints • HTTP: standard, WAN and cloud friendly • Interoperating with and augmenting the XROOTD federations is desirable and productive • Work in progress; the status is very advanced: demoable, installable, documented

  25. Thank you • Questions? • EMI is partially funded by the European Commission under Grant Agreement INFSO-RI-261611
