Dynamic Federations • Seamless aggregation of standard-protocol-based storage endpoints • Fabrizio Furano • Patrick Fuhrmann • Paul Millar • Daniel Becker • Adrien Devresse • Oliver Keeble (Presenter) • Ricardo Brito da Rocha • Alejandro Alvarez • Credits to ShuTing Liao (ASGC) EMI INFSO-RI-261611 1
Dynamic HTTP Federations • Federation • Simplicity, redundancy, storage/network efficiency, elasticity, performance • HTTP • Standard clients everywhere • One protocol for everything • Single protocol for WAN and LAN • Transparent redirection • Use cases • Easy, direct job/user data access, WAN friendly • Access missing files after job starts • Friend sites can share storage • Diskless sites • Cache integration (future)
Storage federations • What’s the goal? • Make different storage clusters be seen as one • Make global file-based data access seamless • How should this be done? • Dynamically • easy to set up/maintain • no complex metadata persistency • no DB babysitting (keep it for the experiment’s metadata) • no replica catalogue inconsistencies, by design • Light constraints on participating storage • Using standards • No strange APIs, everything looks familiar • Global access to global data
What is federated? • We federate (meta)data repositories that are ‘compatible’ • HTTP interface • Name space (modulo simple prefixes) • Including catalogues • Permissions (they don’t contradict across sites) • Content (same key or filename means same file [modulo translations]) • Dynamically and transparently discovering metadata • looks like a unique, very fast file metadata system • properly presenting the aggregated metadata views • redirecting clients to the geographically closest endpoint • The local SE is preferred • The system can also load a “Geo” plugin
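As a rough illustration of what a “Geo” plugin does, the sketch below picks the geographically closest replica for a client. The endpoint URLs, coordinates, and the great-circle heuristic are all illustrative assumptions, not the real UGR plugin interface:

```python
# Hypothetical sketch of geo-aware replica selection.
# Endpoint names and coordinates are illustrative, not from any real config.
import math

ENDPOINTS = {
    "https://desy.example/dav": (53.57, 9.88),    # Hamburg
    "https://cern.example/dav": (46.23, 6.05),    # Geneva
    "https://asgc.example/dav": (25.03, 121.56),  # Taipei
}

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) pairs in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def closest_replica(client_pos, replicas):
    """Among the endpoints that hold a replica, redirect to the nearest one."""
    return min(replicas, key=lambda url: haversine_km(client_pos, ENDPOINTS[url]))
```

A client near Taipei would be sent to the ASGC endpoint; a client in central Europe to DESY or CERN, matching the behaviour of the demo testbed described later.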
What is federated? • Technically TODAY we can aggregate: • SEs with DAV/HTTP interfaces • dCache, DPM • Future: Xrootd? EOS? Storm? • Catalogues with DAV/HTTP interfaces • LFC supported • Future: Experiment catalogues could be integrated • Cloud DAV/HTTP/S3 services • Anything else that happens to have an HTTP interface… • Caches • Native LFC and DPM databases
Why HTTP/DAV? • It’s everywhere • A very widely adopted technology • It has the right features • Redirection, WAN friendly • Convergence • Transfers and data access • No other protocols required • We (humans) like browsers; they give an experience of simplicity • Integrated web apps
DPM/HTTP • DPM has invested significantly in HTTP as part of the EMI project • New HTTP/DAV interface • Parallel WAN transfers • 3rd party copy • Solutions for replica fallback • “Global access” and metalink • Performance evaluations • Experiment analyses • Hammercloud • Synthetic tests • ROOT tests
DPM I/O interface comparison • DPM and random I/O • Longstanding issue with ATLAS DPM usage • Bad RFIO performance forced ‘download first’ • We’re doing a thorough evaluation of the current alternatives to RFIO • XROOT vs HTTP vs RFIO • NFS to be included when it’s writable • In collaboration with ASGC • Results will make you happy (preliminary)
Demo • We have set up a stable demo testbed, using HTTP/DAV • Head node at DESY: http://federation.desy.de/myfed/ • a DPM instance at CERN • a DPM instance at ASGC (Taiwan) • a dCache instance at DESY • a cloud storage account from Deutsche Telekom • The feeling it gives is surprising • Metadata performance is on average higher than contacting the endpoints directly • We see the directories as merged, as if it were only one system • There’s one test file in 3 sites, i.e. 3 replicas • /myfed/atlas/fabrizio/hand-shake.JPG • Clients in EU get the one from DESY/DT/CERN • Clients in Asia get the one from ASGC
The basic idea • Storage/MD endpoint 1 holds /dir1/file1 and /dir1/file2; endpoint 2 holds /dir1/file2 and /dir1/file3 • The aggregation presents a single /dir1 containing /dir1/file1, /dir1/file2 (with 2 replicas) and /dir1/file3: that is what we see • All the metadata interactions are hidden • NO persistency needed here, just efficiency and parallelism
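The merge itself can be sketched in a few lines. This is only the aggregation idea from the slide (endpoint names and contents mirror the figure), not the actual UGR code, which does the same thing with parallel queries and caching:

```python
# Sketch of the aggregation idea: merge per-endpoint listings into one view,
# keeping track of which endpoints hold each file (its replicas).
from collections import defaultdict

def aggregate(listings):
    """Map each path to the list of endpoints that hold it."""
    view = defaultdict(list)
    for endpoint, paths in listings.items():
        for path in paths:
            view[path].append(endpoint)
    return dict(view)

# Contents as in the figure: file2 exists on both endpoints (2 replicas).
listings = {
    "endpoint1": ["/dir1/file1", "/dir1/file2"],
    "endpoint2": ["/dir1/file2", "/dir1/file3"],
}
merged = aggregate(listings)
```

The client only ever sees the merged `/dir1` with three entries; nothing about this view needs to be persisted.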
Example • The client speaks plain DAV/HTTP to the frontend (Apache2 + DMLite) • The aggregator (UGR) loads plugins: a DAV/HTTP plugin towards DAV/HTTP SEs, an HTTP plugin towards other SEs, and a DMLite plugin towards an LFC (or its DB) • 08 May 2012
Server architecture • Clients come in and are distributed across: • different machines (DNS alias) • different processes (Apache config) • Clients are served by the UGR; they can browse/stat or be redirected for action • The architecture is multi/manycore friendly and uses a fast parallel caching scheme
Name translation • A sophisticated scheme of name translation is key to federating almost any source of metadata • UGR implements algorithmic translations and can accommodate non-algorithmic ones as well • A plugin could also query an external service (e.g. an LFC or a private DB)
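An algorithmic translation can be as simple as a prefix rewrite per endpoint. The mapping below is a hypothetical sketch (the prefixes and endpoint URLs are invented for illustration; UGR’s real configuration syntax differs):

```python
# Sketch of algorithmic prefix translation: rewrite the federation's
# namespace into each endpoint's local path layout. Prefixes are illustrative.
PREFIX_MAP = {
    "https://desy.example/dav": ("/myfed", "/pnfs/desy.de/data"),
    "https://cern.example/dav": ("/myfed", "/dpm/cern.ch/home"),
}

def translate(endpoint, fed_path):
    """Rewrite a federated path into the endpoint's local path, or None."""
    fed_prefix, local_prefix = PREFIX_MAP[endpoint]
    if not fed_path.startswith(fed_prefix + "/"):
        return None  # path lies outside the federated namespace
    return local_prefix + fed_path[len(fed_prefix):]
```

Non-algorithmic cases (e.g. a catalogue lookup) would replace this pure string rewrite with a query to an external service, as the slide notes.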
Design and performance • Full parallelism • Composes the aggregated metadata views on the fly by managing parallel information-location tasks • Never stacks up latencies! • The endpoints are treated in a completely independent way • No limit to the number of outstanding clients/tasks • No global locks/serialisations! • Thread pools and producer/consumer queues are used extensively (e.g. to stat N items in M endpoints while X clients wait for some items) • Aggressive metadata caching • The metadata caching keeps the performance high • Peak raw cache performance is ~500K to 1M hits/s per core • A relaxed, hash-based, in-memory partial name space • Juggles info in order to always contain what’s needed • Entries are kept in LRU fashion, giving a fast 1st-level namespace cache • Stalls clients only for the minimum time necessary to assemble their information
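The parallelism pattern described above can be sketched as follows. The per-endpoint `stat` callables here are stubs standing in for real DAV PROPFIND calls, and the stub data is invented; the point is only that all endpoints are queried at once, so total latency is that of the slowest endpoint rather than the sum, with an LRU cache in front:

```python
# Sketch: stat one item on all endpoints in parallel, never stacking
# latencies, with a small in-memory LRU metadata cache in front.
from concurrent.futures import ThreadPoolExecutor, as_completed
from functools import lru_cache

def make_federation(endpoint_stats, max_workers=8):
    """Build a federated stat() over per-endpoint stat callables."""
    pool = ThreadPoolExecutor(max_workers=max_workers)

    @lru_cache(maxsize=100_000)  # 1st-level in-memory namespace cache
    def fed_stat(path):
        # Submit one task per endpoint; wait for all in parallel.
        futures = [pool.submit(stat, path) for stat in endpoint_stats]
        results = [f.result() for f in as_completed(futures)]
        hits = [r for r in results if r is not None]
        return hits[0] if hits else None  # any endpoint that knows the file

    return fed_stat

# Stub endpoints standing in for real DAV calls (illustrative file sizes).
ep1 = {"/dir1/file1": 100, "/dir1/file2": 200}.get
ep2 = {"/dir1/file2": 200, "/dir1/file3": 300}.get
fed_stat = make_federation([ep1, ep2])
```

Repeated `fed_stat` calls for the same path are served from the cache without touching any endpoint, which is what keeps the aggregated metadata view fast.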
Design and performance • Horizontally scalable deployment • Multithreaded • DNS balanceable • High-performance DAV client implementation • Wraps DAV calls into a POSIX-like API, saving callers from composing raw requests/responses • Performance is prioritized: uses libneon with session caching • Compound list/stat operations are supported • Loaded by the core as a “location” plugin
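The shape of such a POSIX-like wrapper might look like the sketch below. The real plugin sits on libneon; here a pluggable fake transport stands in so the sketch is self-contained, and the entry format is an assumption for illustration:

```python
# Sketch of wrapping DAV calls in a POSIX-like API (stat/listdir),
# as the UGR location plugin does. The transport is a stand-in for libneon.
class FakeTransport:
    """Fake DAV transport: maps a path to its PROPFIND entries."""
    def __init__(self, tree):
        self.tree = tree  # path -> list of entry dicts (collection first)

    def propfind(self, path, depth):
        entries = self.tree.get(path, [])
        return entries[:1] if depth == 0 else entries

class DavPosix:
    def __init__(self, transport):
        self.transport = transport

    def stat(self, path):
        """Return the resource's own metadata entry, or None."""
        entries = self.transport.propfind(path, depth=0)
        return entries[0] if entries else None

    def listdir(self, path):
        """Return child names, like os.listdir."""
        entries = self.transport.propfind(path, depth=1)
        return [e["name"] for e in entries[1:]]  # skip the collection itself

tree = {"/dir1": [{"name": "dir1", "is_dir": True},
                  {"name": "file1"}, {"name": "file2"}]}
fs = DavPosix(FakeTransport(tree))
```

Callers then use `fs.stat()` and `fs.listdir()` without ever seeing a PROPFIND request or its XML response.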
A performance test • Two endpoints: DESY and CERN (poor VM) • One UGR at DESY • 10K files in a 4-level-deep directory tree • Files exist on both endpoints • The test (written in C++) invokes stat() only once per file, with many parallel clients running at maximum pace from 3 machines
The result, WAN access
Get started • Get it here: https://svnweb.cern.ch/trac/lcgdm/wiki/Dynafeds • What you can do with it: • Easy, direct job/user data access, WAN friendly • Access missing files after job starts • Friend sites can share storage • Diskless sites • Federating catalogues • Combining catalogue-based and catalogue-free data
Dynamic Feds cf XROOTD feds • XROOTD federations are focused on the “redirection” concept • Very light at the meta-manager: just redirect clients away as soon as possible • If not possible, the penalty is 5 seconds per jump • Global listing is implemented in the client; it is slow and hiccup-prone • Some details do not yet fit well with quick geography-aware redirection • Dynamic Federations support both the “redirection” concept and the “browsing” concept by design • Much more centred on the meta-manager • We can’t touch the clients • Cache metadata for the clients, in-memory • Designed for scalability, performance and features • Extendable plugin architecture, geography-aware redirection • Can speak any protocol; our focus is on HTTP-based things
Next steps • Release our beta, as the nightlies are good • More massive tests, with many endpoints, possibly distant • We are now looking for partners • Precise performance measurements • Refine the handling of endpoint ‘death’ • Immediate sensing of changes in the endpoints’ content, e.g. add, delete • SEMsg in EMI2 SYNCAT would be the right thing in the right place • Some more practical experience (getting used to the idea, using SQUIDs, CVMFS, EOS, clouds,... <put your item here> )
HTTP for xrootd: HTTP or XROOTD? HTTP and XROOTD? • An XROOTD federation gives the goodies/hooks of the XROOTD framework • This also involves many other components and groups of people • Monitoring of the FAX is a perfect example • IT-GT (CERN) will produce an HTTP plugin for XROOTD • Double-headed data access • Discussions started • Effort will be scoped early Oct • Will involve xrootd framework enhancements too • Federate the same clusters also via HTTP • Pure HTTP/DAV endpoints can join normally • Let users enjoy
References • Wiki page and packages • https://svnweb.cern.ch/trac/lcgdm/wiki/Dynafeds • CHEP papers • Federation • http://cdsweb.cern.ch/record/1460525?ln=en • DPM & dmlite • https://cdsweb.cern.ch/record/1458022?ln=en • HTTP/dav • https://cdsweb.cern.ch/record/1457962?ln=en
Conclusions • Dynamic Federations: an efficient, persistency-free, easily manageable approach to federating remote storage endpoints • HTTP: standard, WAN and cloud friendly • Interoperating with and augmenting the XROOTD federations is desirable and productive • Work in progress; the status is very advanced: demoable, installable, documented
Thank you • Questions? • EMI is partially funded by the European Commission under Grant Agreement INFSO-RI-261611