
HDFS and S3 plugins

Andrea Manzi, Martin Hellmich. 13/12/2013. Plugin functionalities covered: namespace management, pool management, pool driver and I/O, with Legacy DPM, MySQL, Oracle, HDFS and S3 implementations.


Presentation Transcript


  1. HDFS and S3 plugins. Andrea Manzi, Martin Hellmich. 13/12/2013

  2. Plugin functionalities
     [Diagram labels]
     Frontends: HTTP/DAV, XROOT, GridFTP, NFS, RFIO
     Functionalities: Namespace Management, Pool Management, Pool Driver, I/O
     Implementations: Legacy DPM, MySQL, Oracle, HDFS, S3, Memcache
     DPM Workshop

  3. HDFS plugin
     • dmlite plugin implementing I/O, pool driver and namespace functionalities through Apache Hadoop HDFS, ensuring:
       • Automatic data replication
       • Fault tolerance for client reads (death of a Datanode or Namenode)
       • Scalability

  4. Deployment with lcgdm-dav
     [Diagram labels: DPM Head Node; lcgdm-dav + dmlite HDFS plugin (shown twice); HDFS Namenode; HDFS Datanode(s)]

  5. Some details
     • The HDFS C API (libhdfs) does not implement functions to retrieve the available datanodes (LIVE nodes)
       • Patch implemented and submitted to Hadoop
       • hadoop-libhdfs rpm available from our repo
     • A first version for Puppet installation is available
       • To be adapted to recent dav/dmlite module changes

  6. Ongoing issues
     • Tested with the new dmlite-based GridFTP plugin
       • Same deployment model as the HTTP/DAV frontend, or a single node writing to HDFS
     • But HDFS does not support multiple write streams or random writes:
       • OSG developed in-memory stream reordering in GridFTP to avoid this limitation (gridftp-hdfs DSI, also available in the Globus Toolkit)
       • Integration still to be tested and understood
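The stream-reordering idea mentioned on this slide can be sketched as follows. This is a minimal illustration, not the gridftp-hdfs implementation: the class name `ReorderingWriter` and its methods are hypothetical, and the sink is assumed to be any append-only object with a `write(bytes)` method. Blocks arriving out of order from parallel streams are held in memory until they are contiguous with what has already been written.

```python
import io


class ReorderingWriter:
    """Hypothetical sketch: buffer out-of-order blocks, write them in offset order."""

    def __init__(self, sink):
        self.sink = sink        # append-only target exposing write(bytes)
        self.next_offset = 0    # offset of the next byte the sink expects
        self.pending = {}       # offset -> block held in memory until its turn

    def write_block(self, offset, data):
        if offset < self.next_offset or offset in self.pending:
            raise ValueError(f"duplicate or overlapping block at offset {offset}")
        self.pending[offset] = data
        # Flush every block that is now contiguous with the data already written.
        while self.next_offset in self.pending:
            block = self.pending.pop(self.next_offset)
            self.sink.write(block)
            self.next_offset += len(block)

    def close(self):
        # Any block still pending means a gap was never filled.
        if self.pending:
            raise IOError(f"missing data before offsets {sorted(self.pending)}")


sink = io.BytesIO()
writer = ReorderingWriter(sink)
writer.write_block(4, b"efgh")   # arrives early, stays in memory
writer.write_block(0, b"abcd")   # fills the gap, both blocks are flushed in order
writer.close()
```

The memory cost grows with how far ahead the fastest stream runs, which is one reason the integration needs testing under real transfer patterns.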

  7. Ongoing issues
     • The SRM frontend does not speak dmlite
       • SRM calls through the old dpm daemons do not properly handle new pool types (such as HDFS)
       • A patch to the dpm daemon is to be implemented

  8. Future steps
     • Distribution:
       • Need to understand how to distribute the plugin
       • HDFS client only in Fedora 20 and Rawhide
       • https://apps.fedoraproject.org/packages/libhdfs
     • Support for security-enabled HDFS clusters (Kerberos)

  9. Performance
     Tests through lcgdm-dav:
     • HDFS namespace
       • stat rate (stat/s) is half that of the MySQL plugin namespace
       • To be optimized with Memcached in front
     • ROOT analysis with massive vector I/O and TTreeCache
       • Performance comparable to standard disk pools
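Placing a cache in front of the namespace, as proposed on this slide, amounts to a read-through cache for stat calls. The sketch below illustrates the pattern only: it uses an in-process dictionary as a stand-in for Memcached, and `CachedNamespace`, `backend_stat` and the TTL value are hypothetical names and parameters, not part of dmlite.

```python
import time


class CachedNamespace:
    """Hypothetical read-through cache for stat results (dict stands in for Memcached)."""

    def __init__(self, backend_stat, ttl_seconds=60.0, clock=time.monotonic):
        self.backend_stat = backend_stat   # slow lookup, e.g. a namespace query
        self.ttl = ttl_seconds             # how long a cached entry stays valid
        self.clock = clock                 # injectable clock, useful for testing
        self.cache = {}                    # path -> (expiry time, stat result)
        self.backend_calls = 0             # counts how often the backend was hit

    def stat(self, path):
        now = self.clock()
        hit = self.cache.get(path)
        if hit is not None and hit[0] > now:
            return hit[1]                  # fresh entry: backend is not contacted
        self.backend_calls += 1
        result = self.backend_stat(path)
        self.cache[path] = (now + self.ttl, result)
        return result
```

Repeated stats of the same path within the TTL are served from memory, so the gain depends on how often clients revisit the same entries and on how stale an entry is allowed to become.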

  10. S3 plugin

  11. Key facts
      • Data goes directly to the cloud
      • HTTP/HTTPS only
      • DPM provides the namespace

  12. Data in the cloud
      [Diagram labels: GET, REDIRECT, GET, DATA]
      • No data passes through DPM
      • Inherits all capabilities from the S3 provider:
        • Amazon: range header, no multi-range, multi-stream download only, no third-party copy, HTTP access only
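Since the provider described on this slide accepts a range header but not multi-range requests, a client that wants several byte ranges has to issue one request per range. The sketch below shows one way to prepare those requests; the function names `coalesce` and `single_range_headers` and the `max_gap` parameter are hypothetical, and only the `Range: bytes=first-last` header syntax is standard HTTP.

```python
def coalesce(ranges, max_gap=0):
    """Merge inclusive (first, last) ranges that overlap or lie within max_gap bytes."""
    merged = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1] + 1 + max_gap:
            # Extend the previous range instead of issuing another request.
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged


def single_range_headers(ranges):
    """Build one single-range HTTP Range header per inclusive (first, last) pair."""
    headers = []
    for first, last in ranges:
        if first < 0 or last < first:
            raise ValueError(f"invalid byte range {first}-{last}")
        headers.append({"Range": f"bytes={first}-{last}"})
    return headers


wanted = [(0, 9), (10, 19), (100, 199)]
requests_to_send = single_range_headers(coalesce(wanted))
```

Raising `max_gap` trades some unneeded bytes for fewer round trips, which matters for vector reads that touch many small, nearby regions.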

  13. How to install an S3 pool
      yum install dmlite-plugins-s3
      dmlite-shell
      > pooladd poolaws s3
      > poolmodify poolaws bucketsalt xFVlsrg
      > poolmodify poolaws s3accesskeyid <ID>
      > poolmodify poolaws s3secretaccesskey <SK>
      <create an s3 bucket on your storage>

  14. More info
      • HDFS plugin
        • https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Dmlite/Plugins/HDFS
      • S3 plugin
        • https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Dmlite/Plugins/S3

  15. Thanks! Questions?
