Research and developments: DPM overview, Jean-Philippe Baud DPM overview - 1
Outline • Why did we develop a DPM? • DPM architecture • DPM implementation • An example: PrepareToGet of a DICOM file • Operations • Current usage DPM overview - 2
History • A Storage Element solution was needed for Tier2s • Possible candidates: • Classic SEs • Castor 1 • EDG SE • dCache • new development DPM overview - 3
Goals • Provide a scalable solution to replace the Classic Storage Elements at Tier2s • Focus on manageability • Easy to install • Easy to configure • Easy to add/remove resources • Low effort for maintenance and support • Integrated security (authentication/authorization) DPM overview - 4
DPM architecture • Clients (CLI, C API, SRM-enabled clients, etc.) talk to the DPM head node; data transfers go directly from/to the DPM disk servers (no bottleneck) • DPM head node: • DPM Name Server: namespace, authorization, physical file locations • DPM Server: request queuing and processing, space management • SRM servers (v1.1, v2.1, v2.2) • DPMCOPY node • DPM disk servers: physical files DPM overview - 5
Simple design (1) • DPM architecture is database centric • Only 2 DBs • Fairly simple schema • No complex query (mostly key access) • Use of bind variables, indices, transactions and integrity constraints • Support for MySQL and Oracle • Automatic reconnection to the DB (allows transparent failover when using Oracle) DPM overview - 6
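To make the "bind variables, mostly key access" point concrete, here is a minimal sketch of what such a keyed lookup could look like against the MySQL backend; the table and column names (Cns_file_metadata, filesize) are illustrative guesses rather than the exact DPM schema, and error handling is reduced to the bare minimum.
#include <mysql/mysql.h>
#include <string.h>

/* Sketch only: fetch the size of one file by primary key, using a prepared
 * statement with a bind variable instead of pasting values into the SQL text.
 * Table and column names are illustrative, not necessarily the real DPM schema. */
int get_file_size(MYSQL *conn, unsigned long long fileid, unsigned long long *size)
{
    const char *sql = "SELECT filesize FROM Cns_file_metadata WHERE fileid = ?";
    MYSQL_STMT *stmt = mysql_stmt_init(conn);
    MYSQL_BIND in[1], out[1];
    int rc = -1;

    if (!stmt || mysql_stmt_prepare(stmt, sql, strlen(sql)))
        goto done;

    memset(in, 0, sizeof(in));
    in[0].buffer_type = MYSQL_TYPE_LONGLONG;     /* bind variable for the key */
    in[0].buffer = &fileid;
    in[0].is_unsigned = 1;

    memset(out, 0, sizeof(out));
    out[0].buffer_type = MYSQL_TYPE_LONGLONG;
    out[0].buffer = size;
    out[0].is_unsigned = 1;

    if (mysql_stmt_bind_param(stmt, in) == 0 &&
        mysql_stmt_execute(stmt) == 0 &&
        mysql_stmt_bind_result(stmt, out) == 0 &&
        mysql_stmt_fetch(stmt) == 0)             /* single row expected: key access */
        rc = 0;
done:
    if (stmt)
        mysql_stmt_close(stmt);
    return rc;
}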
Simple design (2) • Few daemons • Mainly communicating through the DB • Stateless • Configuration is kept in DB • A given daemon can be restarted on a different server • Scalability and high availability • All servers (except the DPM one) can be replicated if needed (DNS load balancing) • Daemons can be restarted independently • Automatic retries in clients DPM overview - 7
Security (1) • All control and I/O services have security built-in (GSI) • The entries in the name space can be protected by POSIX Access Control Lists • All privileged operations can only be done with a Host Certificate on a trusted host • VOMS integration: groups, sub-groups and roles are supported • The DNs and VOMS FQANs are mapped to virtual IDs (no pool accounts) • All the groups present in the proxy are used for authorization in the namespace • Only the primary group/role is used in disk pool selection DPM overview - 8
Security (2) • Virtual IDs are normally created on the fly, the first time the system receives a request for a given DN or VOMS FQAN (no pool accounts) • DNs are mapped to virtual UIDs • VOMS FQANs are mapped to virtual GIDs • A given user normally has one DN and several FQANs (voms-proxy-init) • For the legacy case (grid-proxy-init), a grid-mapfile is used (single group per user): /opt/lcg/etc/lcgdm-mapfile • Virtual IDs can be assigned manually by the Storage Element administrator • To create VO groups in advance • To keep the same uid when a DN changes • To get the same uid for a DN and a Kerberos principal DPM overview - 9
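As a rough illustration of the on-the-fly mapping, the sketch below uses a static array as a stand-in for the Cns_userinfo table shown later in the schema slide; in DPM the lookup and insert are of course DB operations, and the starting uid value here is arbitrary.
#include <stdio.h>
#include <string.h>

/* Sketch: map a DN to a virtual UID, creating the entry the first time this DN
 * is seen.  The array stands in for the Cns_userinfo table; the real code does
 * a SELECT and, if nothing is found, an INSERT. */
struct userinfo { unsigned uid; char dn[256]; };

static struct userinfo users[1000];
static unsigned nusers, next_uid = 100;          /* first uid value is arbitrary here */

unsigned get_virtual_uid(const char *dn)
{
    unsigned i;
    for (i = 0; i < nusers; i++)                 /* ~ SELECT userid WHERE username = ? */
        if (strcmp(users[i].dn, dn) == 0)
            return users[i].uid;
    users[nusers].uid = next_uid++;              /* first request for this DN: create it */
    snprintf(users[nusers].dn, sizeof(users[nusers].dn), "%s", dn);
    return users[nusers++].uid;
}

int main(void)
{
    const char *dn = "/C=CH/O=CERN/OU=GRID/CN=Jean-Philippe Baud 7183";
    printf("%u\n", get_virtual_uid(dn));         /* created on the fly */
    printf("%u\n", get_virtual_uid(dn));         /* same DN, same virtual uid */
    return 0;
}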
Security (3) • Integration with LCAS is still to be done, for blacklisting of users • Integration with LCMAPS will not be done (it implies real Unix ids instead of virtual ones) DPM overview - 10
Access Control Lists • LFC and DPM support POSIX ACLs based on Virtual IDs • Access Control Lists on files and directories • Default Access Control Lists on directories: they are inherited by the sub-directories and files under the directory • Example:
dpns-mkdir /dpm/cern.ch/home/dteam/jpb
dpns-setacl -m d:u::7,d:g::7,d:o:5 /dpm/cern.ch/home/dteam/jpb
dpns-getacl /dpm/cern.ch/home/dteam/jpb
# file: /dpm/cern.ch/home/dteam/jpb
# owner: /C=CH/O=CERN/OU=GRID/CN=Jean-Philippe Baud 7183
# group: dteam
user::rwx
group::r-x    #effective:r-x
other::r-x
default:user::rwx
default:group::rwx
default:other::r-x
DPM overview - 11
File properties • Retention policies • Based on the quality of the disks, the admin defines the quality of service • Replica, Output, Custodial • Access latency • Online, Nearline • Nearline is used for the BIOMED DICOM integration • File storage type • Volatile (user-defined lifetime), Permanent (infinite lifetime) • File pinning • Set the TURL lifetime (srmPrepareToGet, srmPrepareToPut) • Set the file lifetime in the space (srmBringOnline) DPM overview - 12
Space reservation (1) • Implicit (rfio, gsiftp, …) • Explicit • Per file (PrepareToPut) • Bulk (ReserveSpace) DPM overview - 13
Space reservation (2) • Static space reservation (admin):
$ dpm-reservespace --gspace 20G --lifetime Inf --group atlas --token_desc Atlas_ESD
$ dpm-reservespace --gspace 100M --lifetime 1h --group dteam/Role=lcgadmin --token_desc LcgAd
$ dpm-updatespace --token_desc myspace --gspace 5G
$ dpm-releasespace --token_desc myspace
• Dynamic space reservation (user) • Defined by the user on request • dpm-reservespace • srmReserveSpace • Limitation on duration and size of the space reserved DPM overview - 14
Handling of space tokens (1) • Space tokens are honored in PrepareToPut • Space tokens are ignored in PrepareToGet/BringOnline for files already online • Space tokens will be honored in PrepareToGet/BringOnline when bringing a nearline file online • Disk pool definitions: • Disk pools can offer the same types of Retention Policy as srmReserveSpace: Replica, Output and Custodial. Default is Replica. • The pool space type should normally be set to '-' (hyphen) to mean 'any' • srmReserveSpace may grant a better Retention Policy than the one requested (see the sketch below) DPM overview - 15
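A small sketch of the 'better Retention Policy than requested' rule; the ordering Replica < Output < Custodial and the selection loop are illustrative only, since the real matching is done with SQL against the pool definitions.
#include <stdio.h>

/* Sketch: pick a pool that offers at least the requested retention policy.
 * 'R' = Replica, 'O' = Output, 'C' = Custodial, '-' = any (the recommended
 * pool setting); the ordering used here is an assumption for the example. */
struct pool { const char *name; char ret_policy; };

static int rank(char p)
{
    switch (p) { case 'R': return 0; case 'O': return 1; case 'C': return 2; }
    return -1;
}

const struct pool *select_pool(const struct pool *pools, int n, char requested)
{
    int i;
    for (i = 0; i < n; i++)
        if (pools[i].ret_policy == '-' || rank(pools[i].ret_policy) >= rank(requested))
            return &pools[i];                    /* equal or "better" policy is fine */
    return NULL;
}

int main(void)
{
    struct pool pools[] = { { "scratch", 'R' }, { "perm", 'C' } };
    const struct pool *p = select_pool(pools, 2, 'O');      /* Output requested */
    printf("selected pool: %s\n", p ? p->name : "(none)");  /* -> perm (Custodial) */
    return 0;
}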
Handling of space tokens (2) • Disk pools can be dedicated to a list of groups (VOMS FQAN) • Spaces can be dedicated to a user (DN) or to a single group (VOMS FQAN) • Files can be stored without specifying a space token (compatibility with SRM v1) DPM overview - 16
Interfaces • Control interfaces • Socket (CLI, C API and Python) • SRM v1.0, v2.1 and v2.2 • Data access protocols • Secure RFIO • gsiftp • http/https • xroot DPM overview - 17
Implementation • Name Server • DPM • SRM • gridFTP • DICOM backend DPM overview - 18
DPNS (1) • DPNS stands for DPM Name Server • Multi-threaded daemon using a pool of threads • Each thread • connects to the DB backend when used for the first time • keeps the connection to the DB between requests • automatically reconnects if necessary • The client connection is closed after every request except for • listings (directories, replicas, links, expired files) • sessions • transactions DPM overview - 19
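The connection handling described above can be sketched as follows, assuming a MySQL backend; the host name, credentials and the per-thread storage are illustrative, and the request handling itself is reduced to a dummy query.
#include <mysql/mysql.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch of DPNS-style DB handling: each worker thread connects lazily on
 * first use, keeps its connection between requests, and reconnects if the
 * connection has dropped.  Connection parameters are illustrative. */
static __thread MYSQL *conn;            /* one connection per thread (gcc __thread) */

static MYSQL *get_connection(void)
{
    if (conn && mysql_ping(conn) == 0)          /* connection still alive? */
        return conn;
    if (conn)
        mysql_close(conn);                      /* dropped: reconnect automatically */
    conn = mysql_init(NULL);
    if (!mysql_real_connect(conn, "dbhost", "dpns", "secret", "cns_db", 0, NULL, 0)) {
        fprintf(stderr, "DB connect failed: %s\n", mysql_error(conn));
        mysql_close(conn);
        conn = NULL;
    }
    return conn;
}

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        /* accept a client request here; the client socket is closed after each
         * request (except listings, sessions and transactions) */
        MYSQL *db = get_connection();           /* connect only when first needed */
        if (db && mysql_query(db, "SELECT 1") == 0) {   /* stand-in for the real work */
            MYSQL_RES *res = mysql_store_result(db);
            if (res)
                mysql_free_result(res);
        }
        sleep(1);                               /* stand-in for waiting on the next client */
    }
    return NULL;
}

int main(void)
{
    pthread_t pool[20];                         /* DPNS uses a pool of threads */
    int i;
    mysql_library_init(0, NULL, NULL);
    for (i = 0; i < 20; i++)
        pthread_create(&pool[i], NULL, worker, NULL);
    for (i = 0; i < 20; i++)
        pthread_join(pool[i], NULL);
    return 0;
}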
DPNS (2) • DB schema • File Metadata: file name (SFN), fileid, system metadata (ownership, size, checksum, ACL) • User Metadata: user-defined metadata • File Replica: StorageFileName, StorageHost • Symlinks: LinkName • Cns_userinfo: Virtual UID, user name (DN/principal) • Cns_groupinfo: Virtual GID, group name DPM overview - 20
DPNS (3) • File replica table
struct Cns_file_replica {
    u_signed64 fileid;
    u_signed64 nbaccesses;
    time_t ctime;       /* replica creation time */
    time_t atime;       /* last access to replica */
    time_t ptime;       /* replica pin time */
    time_t ltime;       /* replica lifetime */
    char r_type;        /* 'P' for Primary, 'S' for Secondary */
    char status;        /* '-' for Online, 'N' for Nearline */
    char f_type;        /* 'V' for Volatile, 'P' for Permanent */
    char setname[37];   /* space token */
    char poolname[CA_MAXPOOLNAMELEN+1];
    char host[CA_MAXHOSTNAMELEN+1];
    char fs[80];
    char sfn[CA_MAXSFNLEN+1];
};
DPM overview - 21
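For illustration, two tiny helpers built on the struct above, showing how the flags drive the behaviour described later (online vs nearline, volatile vs permanent); treating ltime as an absolute expiration time is an assumption made for this sketch.
#include <time.h>

/* Sketch only: interpret a replica row from the table above. */
int replica_is_online(const struct Cns_file_replica *r)
{
    return r->status == '-';        /* 'N' (nearline) would need a recall first */
}

int replica_usable(const struct Cns_file_replica *r, time_t now)
{
    /* an expired volatile replica is a garbage-collection candidate,
     * a permanent one never expires (infinite lifetime) */
    return replica_is_online(r) && (r->f_type == 'P' || r->ltime > now);
}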
DPM (1) • Multi-threaded daemon using 2 pools of threads • One pool of threads to handle synchronous (fast) operations • One pool of threads to handle asynchronous (slow) operations • One thread per disk pool to handle garbage collection • One thread to remove expired puts and expired spaces • Configuration is kept in the DB • Information about disk free space is kept in memory shared by the threads • History of asynchronous requests is kept in DB for a certain amount of time settable by the SE admin DPM overview - 22
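One point worth a sketch is the free-space information kept in memory and shared by the threads; the structure, locking and reservation logic below are illustrative only (the real code also synchronises with the DB and the disk servers).
#include <pthread.h>

/* Sketch: cached free-space bookkeeping shared by the DPM threads.
 * Field sizes and names are illustrative; the table would be filled from the
 * configuration kept in the DB at daemon startup. */
struct fs_info {
    char      poolname[16];
    char      server[64];
    char      fs[80];
    long long free_bytes;                 /* cached free space on this filesystem */
};

static struct fs_info filesystems[256];   /* filled at startup from the DPM config */
static int nfs;
static pthread_mutex_t fs_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called by the request-processing threads before scheduling a put,
 * hence the lock around the shared counters. */
int reserve_on_fs(int idx, long long size)
{
    int ok = 0;
    pthread_mutex_lock(&fs_lock);
    if (idx >= 0 && idx < nfs && filesystems[idx].free_bytes >= size) {
        filesystems[idx].free_bytes -= size;
        ok = 1;
    }
    pthread_mutex_unlock(&fs_lock);
    return ok;
}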
DPM (2) • Head node architecture (diagram showing the client, the SRM v1 and v2 daemons, the DPNS daemon, the DPM daemon threads for fast and slow operations, the DICOM backend, the DPM tables and the DPNS tables) DPM overview - 23
DPM (3) • Policies • Defined per disk pool for • File system selection (currently round robin) • Garbage collection (can be enabled/disabled, thresholds can be set) • Request selection (currently FIFO, based on timestamp and request ordinal) • Migration • Garbage collection and request selection policies are implemented as SQL queries • Policies are not pluggable yet DPM overview - 24
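Since the slide says these policies are implemented as SQL queries, here is a guess at what the FIFO request-selection query might look like, written the way a C daemon would embed it; the table and column names are invented for the example and the SQL is MySQL-flavoured.
/* Sketch: FIFO request selection expressed as SQL, as the policies above are.
 * dpm_pending_req and its columns are illustrative names only. */
static const char select_next_request[] =
    "SELECT r_token, r_ordinal"
    "  FROM dpm_pending_req"
    " WHERE status = 'PENDING'"
    " ORDER BY ctime, r_ordinal"          /* FIFO: oldest timestamp first, then ordinal */
    " LIMIT 1";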
SRM • Support 3 versions: v1.0, v2.1 and v2.2 • One daemon per SRM version, each listening on a different port • Talk directly to DPNS for Directory and Permission functions • Talk to the DPM for Space Management functions and srmRm • Store data transfer requests in the DPM request DB (PrepareToGet, PrepareToPut, BringOnline, Copy) • Query the DPM DB for status requests DPM overview - 25
gridFTP • 2 versions supported: • gridFTP 1 on SLC3 (heavily patched Globus 2 code) • gridFTP 2 on SL4 and SL5 (native Globus 4.0 code distributed by VDT, plus a DPM-specific plugin) • Runs with authlevel 0 to support virtual IDs • 2 modes supported: • The user provides a TURL (physical file name) • The user provides an SFN (/dpm/…) • If the gridFTP daemon does not run on the machine where the data resides, a memory-to-memory copy is done with RFIO (this should be avoided) DPM overview - 26
DICOM backend • The goal is to be able to recall nearline files from a DICOM server with on-the-fly encryption • The single-threaded backend server • Loads a plugin to avoid putting dependencies on DICOM libraries in DPM • Polls the DB for work • Forks one process per user request, but the request can be multi-file • Copies the file to the DPM disk cache • Updates the DB • We had to use processes and not threads to be able to pass different credentials to other components like Hydra • The number of concurrent workers is configurable DPM overview - 27
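A compressed sketch of that main loop: the plugin is loaded with dlopen() so DPM itself carries no DICOM dependencies, the DB is polled for recall work, and one child process is forked per user request up to a configurable limit. The plugin name, its entry point and poll_recall_request() are all illustrative.
#include <dlfcn.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*recall_fn)(const char *sfn, const char *turl);

/* Stand-in for polling the DB for a pending recall request; returns -1 when
 * there is nothing to do. */
static int poll_recall_request(char *sfn, char *turl)
{ (void)sfn; (void)turl; return -1; }

int main(void)
{
    int max_workers = 5, active = 0;                      /* number of workers is configurable */
    void *handle = dlopen("libdicomplugin.so", RTLD_NOW); /* illustrative plugin name */
    recall_fn recall = handle ? (recall_fn)dlsym(handle, "dicom_recall") : NULL;
    char sfn[1104], turl[1104];

    if (!recall)
        return 1;

    for (;;) {
        while (waitpid(-1, NULL, WNOHANG) > 0)            /* reap finished workers */
            active--;

        if (active >= max_workers || poll_recall_request(sfn, turl) < 0) {
            sleep(5);                                     /* poll the DB periodically */
            continue;
        }

        pid_t pid = fork();                               /* one process per user request */
        if (pid == 0)
            _exit(recall(sfn, turl));                     /* child runs the DICOM plugin */
        if (pid > 0)
            active++;
    }
}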
PrepareToGet of a DICOM file • Client sends request to the SRM v2.2 server • SRM server stores request in the DPM request DB • DPM daemon “slow” thread • picks the request and marks it active • checks file existence and permissions • if file is online • selects the best replica • sets the PFN in the DB • sets the file status to DPM_READY • pins the file • (there is no callback to the user) • If file is nearline • Stores a recall request in the DB DPM overview - 28
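In code form, the decision made by the slow thread for one file of the request could look roughly like this; the status values and the three helpers are stand-ins for the real DB operations, kept as stubs so the sketch stays self-contained.
/* Sketch of the online/nearline branch taken by the "slow" DPM thread for one
 * file of a prepareToGet request.  Helper bodies are stubs, status values are
 * illustrative. */
enum { DPM_QUEUED, DPM_READY, DPM_RECALL_PENDING };

struct ptg_file {
    char sfn[1104];       /* file requested by the client */
    char pfn[1104];       /* physical file name, filled when a replica is chosen */
    int  status;
};

/* stand-ins for the real DB operations */
static int best_online_replica(const char *sfn, char *pfn, unsigned len)
{ (void)sfn; (void)pfn; (void)len; return -1; }          /* -1: no online replica */
static int queue_recall_request(const char *sfn) { (void)sfn; return 0; }
static int pin_replica(const char *pfn) { (void)pfn; return 0; }

int process_ptg_file(struct ptg_file *f)
{
    if (best_online_replica(f->sfn, f->pfn, sizeof(f->pfn)) == 0) {
        f->status = DPM_READY;           /* PFN set; the client polls, then reads */
        return pin_replica(f->pfn);      /* hold the replica for the TURL lifetime */
    }
    f->status = DPM_RECALL_PENDING;      /* nearline: hand the file to the DICOM backend */
    return queue_recall_request(f->sfn); /* no callback: the client keeps polling */
}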
Recall of a DICOM file • Dicomcopyd daemon forks a worker child which • Picks the request and marks it active • Gets space on disk to store the file • Executes the DICOM plugin which • Gets encryption key from Hydra • Copies the file from the DICOM server to the DPM disk cache, encrypting it on the fly • Issues a dpm_putdone request • Sets the file status in the DB to DPM_READY • Tells the DPM to pin the file DPM overview - 29
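The worker's part can be sketched the same way; the three helpers below are stand-ins (their names and signatures are not the real Hydra, DICOM or DPM APIs), only the order of operations follows the slide.
#include <stdio.h>

/* Stand-ins for the real calls; only the sequence mirrors the slide. */
static int hydra_get_key(const char *lfn, char *key, int keylen)
{ (void)lfn; snprintf(key, keylen, "stub-key"); return 0; }
static int dicom_copy_encrypted(const char *lfn, const char *turl, const char *key)
{ (void)lfn; (void)turl; (void)key; return 0; }
static int dpm_mark_ready_and_pin(const char *turl)
{ (void)turl; return 0; }                        /* ~ dpm_putdone + status READY + pin */

int recall_one_file(const char *lfn, const char *turl)
{
    char key[64];
    if (hydra_get_key(lfn, key, sizeof(key)))    /* 1. encryption key from Hydra */
        return -1;
    if (dicom_copy_encrypted(lfn, turl, key))    /* 2. copy from the DICOM server, */
        return -1;                               /*    encrypting on the fly */
    return dpm_mark_ready_and_pin(turl);         /* 3. putdone, status READY, pin */
}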
PrepareToGet of a DICOM file, step by step (slides 30-42 animate this sequence on the head-node diagram, one step per slide): (1) the client sends the request to the SRM server; (2) the SRM server stores the request in the request DB; (3) the DPM slow thread picks the request and marks it active; (4) it checks existence and permissions and, if the file is online, gets the best replica; (5) sets the PFN and status READY in the DB; (6) pins the file; (7) if the file is nearline, stores a recall request in the DB; (8) dicomcopyd picks the recall request and marks it active; (9) gets space on disk to store the file; (10) the dicomcopyd plugin copies the file to the DPM disk cache; (11) issues a dpm_putdone request; (12) sets the PFN and status READY in the DB; (13) tells the DPM to pin the file.
Operations • Common logging format with timestamps and user identity • DPNS/SRM upgrades are transparent if there is no DB schema change and multiple front-ends are used • We limit DB schema updates to about once a year • The DPM databases do not need to run on the same machine as the front-end server DPM overview - 43
Installation and configuration • RPMs produced for SL3, SL4 (32 and 64 bits) and SL5 • Builds on Ubuntu and Debian • Almost builds on MacOSX (ppc and i386) • Can be configured using Yaim or Quattor (LAL) DPM overview - 44
Current usage • 131 DPMs deployed in LCG/EGEE (the largest holds 300 TB) DPM overview - 45
Need help? • DPM online documentation: https://twiki.cern.ch/twiki/bin/view/LCG/DataManagementDocumentation • Reference man pages • Administrator guide • Troubleshooting • Support: helpdesk@ggus.org • General questions: hep-service-dpm@cern.ch DPM overview - 46