Advanced Data Movement and Management features of SRB By Michael Wan

Presentation Transcript


  1. Advanced Data Movement and Management features of SRB
     By Michael Wan, SDSC/UCSD

  2. Sput – upload files to SRB
     Sput [-fprabvsmMkKV] [-c container] [-D dataType] [-n replNum] [-N numThreads] [-S resourceName] [-P pathName] [-R retry_count] localFileName|localDirectory ... TargetName
     Uploads one or more local files and/or directories. The default mode is sequential.
     Example:
       Sput -v /tmp/srb/lfile
       LOCAL:/tmp/srb/lfile->SRB:lfile | 84.315 MB | 13.219 MB/s | 6.38 s | 2005.07.29 21:41:56
       Sls -l lfile
       fedsrbbrick8 0 demoResc 84314624 2005-07-29-15.18 % lfile

  3. Sput – serial mode
     [Diagram: Sput on the client calls srbObjCreate and srbObjWrite on SRB server1, which consults MCAT (1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control) and makes a peer-to-peer request to SRB server2. Each server spawns an SRB agent, and the data passes through both servers (two hops) on its way to resource R.]

  4. Serial mode data transfer
     Simple to implement and use: a Unix-like API (srbObjCreate, srbObjWrite).
     Performance issues:
       Two-hop data transfer
       A single data stream
       One file at a time, so overhead is relatively high for small files (MCAT interaction for query and registration; small buffer transfers)
     Improvements:
       Large files: single hop, multiple data streams
       Small files: single hop, multiple files at a time

  5. Parallel mode data transfer
     For large file transfers: multiple data streams and single-hop data transfer.
     Two modes: server initiated, and client initiated (for clients behind a firewall).
     Up to 5 times speedup over a WAN.
     Two simple APIs: srbObjPut and srbObjGet.
     Use the -m (server initiated) or -M (client initiated) option.
     Available in all Scommands that involve data transfer: Sput, Sget, Srsync, Sreplicate, Scp, Sbkupsrb, SsyncD, Ssyncont.
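The idea of splitting one large transfer across several streams can be illustrated locally. The sketch below is only an analogy, not SRB's srbObjPut/srbObjGet protocol: it divides a file into contiguous byte ranges and copies each range on its own thread, with each thread's own file handles standing in for a separate data stream. The helper names `split_ranges` and `parallel_copy` and the 1 MiB buffer size are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size: int, num_threads: int) -> list[tuple[int, int]]:
    """Split [0, size) into at most num_threads contiguous (offset, length) ranges."""
    if size == 0:
        return []
    num_threads = max(1, min(num_threads, size))
    base, extra = divmod(size, num_threads)
    ranges, offset = [], 0
    for i in range(num_threads):
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


def copy_range(src: str, dst: str, offset: int, length: int, bufsize: int = 1 << 20) -> None:
    """Copy one byte range; each call opens its own handles, like a separate stream."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        remaining = length
        while remaining > 0:
            chunk = fin.read(min(bufsize, remaining))
            if not chunk:
                break
            fout.write(chunk)
            remaining -= len(chunk)


def parallel_copy(src: str, dst: str, num_threads: int = 4) -> None:
    """Copy src to dst using up to num_threads concurrent range copies."""
    size = os.path.getsize(src)
    # Pre-size the destination so every stream can write at its own offset.
    with open(dst, "wb") as f:
        f.truncate(size)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(copy_range, src, dst, off, n)
                   for off, n in split_ranges(size, num_threads)]
        for fut in futures:
            fut.result()  # re-raise any error from a worker thread
```

For example, `parallel_copy("big.dat", "copy.dat", num_threads=4)` copies a file with four concurrent streams. Pre-sizing the destination lets each stream write at its own offset without coordinating with the others.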

  6. Parallel mode data transfer – server initiated
     [Diagram: Sput -m calls srbObjPut on SRB server1, passing the client's socket address, port and cookie. Server1 consults MCAT (logical-to-physical mapping, identification of replicas, access and audit control) and makes a peer-to-peer request to SRB server2; the SRB agent on server2 connects to the client, and the data is transferred directly between the client and resource R.]

  7. Parallel mode data transfer – client initiated
     [Diagram: Sput -M calls srbObjPut on SRB server1, which consults MCAT and makes a peer-to-peer request to SRB server2. A socket address, port and cookie are returned to the client, which then connects to the server and transfers the data directly to resource R, so no inbound connection to the client is needed.]

  8. Small files data transfer (bulk operation)
     Uploading or downloading a large number of small files one file at a time has relatively high overhead: MCAT interaction and small buffer transfers cost up to 0.5 s per file on a LAN and more than 1 s per file on a WAN.
     Bulk operation:
       Bulk data transfer: many files are transferred in a single large buffer (8 MB)
       Bulk registration: a large number of files (1,000) registered in a single call
       Multiple threads for transfer and registration
       Single hop
     3-10 times speedup. Specify -b in Sput/Sget.
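The two batching ideas on this slide (many files per transfer buffer, many files per registration call) can be sketched as simple grouping functions. This is a hypothetical illustration, not SRB client code: `pack_into_buffers` and `registration_batches` are made-up names, the buffer is assumed to be 8 MiB, and files are packed greedily in the order given.

```python
from typing import Iterable, Iterator

BULK_BUFFER_BYTES = 8 * 1024 * 1024  # assumed 8 MiB bulk transfer buffer
REGISTRATION_BATCH = 1000            # files registered per call, per the slide


def pack_into_buffers(files: Iterable[tuple[str, int]],
                      capacity: int = BULK_BUFFER_BYTES) -> Iterator[list[tuple[str, int]]]:
    """Group (name, size) pairs, in order, into batches whose total size fits one buffer.

    A file larger than the buffer gets a batch of its own (in practice such a
    file would be a better fit for a parallel transfer).
    """
    batch: list[tuple[str, int]] = []
    used = 0
    for name, size in files:
        if batch and used + size > capacity:
            yield batch
            batch, used = [], 0
        batch.append((name, size))
        used += size
    if batch:
        yield batch


def registration_batches(names: list[str],
                         batch_size: int = REGISTRATION_BATCH) -> Iterator[list[str]]:
    """Split file names into chunks, one bulk-registration call per chunk."""
    for i in range(0, len(names), batch_size):
        yield names[i:i + batch_size]
```

With the slide's figure of up to 0.5 s of overhead per file on a LAN, 1,000 small files sent one at a time spend up to about 500 s on per-file work; batching amortizes that cost over one buffer transfer or one registration call.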

  9. Bulk load operation
     [Diagram: Sput -b queries SRB server1 for the resource, and server1 returns the resource location (from MCAT). A bulk data transfer thread then sends 8 MB buffers directly to SRB server2, whose SRB agent stores the data in a temp file and unfolds it into the individual files on resource R, while bulk registration threads register the files in MCAT.]

  10. Container - Archival of small files
      Storing or retrieving a large number of small files to or from tape performs poorly.
      Container design: a physical grouping of small files.
        Implemented with a logical resource:
          a pool of cache resources for the front end
          an archival resource for the back end
        The entire container is stored on tape as a single file.
      Bulk operations with containers are faster.
      Container-specific commands: Smkcont, Srmcont, Ssyncont, Slscont, Sreplcont
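A container groups many small files into one physical file. The toy model below shows one common way to do that: concatenate the members and keep an index of each member's offset and length. The index layout and the `Container` class are assumptions for illustration; they do not describe SRB's actual container format.

```python
import io
from dataclasses import dataclass


@dataclass
class Member:
    offset: int  # where the member starts inside the container
    length: int  # member size in bytes


class Container:
    """Toy container: small files concatenated into one byte stream plus an index."""

    def __init__(self) -> None:
        self._data = io.BytesIO()
        self.index: dict[str, Member] = {}

    def add(self, name: str, payload: bytes) -> None:
        """Append a member at the end of the container and record its position."""
        if name in self.index:
            raise KeyError(f"{name} already in container")
        offset = self._data.seek(0, io.SEEK_END)
        self._data.write(payload)
        self.index[name] = Member(offset, len(payload))

    def read(self, name: str) -> bytes:
        """Retrieve one member by seeking to its offset."""
        m = self.index[name]
        self._data.seek(m.offset)
        return self._data.read(m.length)

    def image(self) -> bytes:
        """The whole container as one blob, i.e. what would be written to tape as a single file."""
        return self._data.getvalue()
```

Reading a single member only needs the index and one seek, while the archival back end sees just one large file, which is why tape handles it far better than thousands of separate small files.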

  11. Summary of data transfer modes
      Serial: default mode
      Parallel: for large files
      Bulk: for large numbers of small files
      Container: archiving small files (to tape)
      Container + bulk: faster archival of small files

  12. Sput (cont.)
      -m  parallel, server-initiated connection
      -M  parallel, client-initiated connection (for the client-behind-a-firewall problem)
      -r  recursive
      -b  bulk (directories of small files); compare:
          time Sput -r /tmp/srb/d200 d200a
          time Sput -b /tmp/srb/d200 d200b
      -k  register a checksum computed by the client
          Sput -kv /tmp/srb/mfile
          Schksum -l mfile
      -K  checksum verification: the client computes a checksum, and the server independently computes one by reading back the uploaded file
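The -K check compares two independently computed checksums: one from the client's local file and one from reading back the stored copy. The sketch below models that comparison with two local files. MD5 is a stand-in chosen for illustration (the slides do not say which algorithm Sput uses), and `file_checksum` and `verify_upload` are hypothetical names.

```python
import hashlib


def file_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so large files need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_upload(local_path: str, stored_copy_path: str) -> bool:
    """-K style check: the client checksums its file, the 'server' re-reads
    the stored copy and checksums it independently, and the values are compared."""
    client_sum = file_checksum(local_path)
    server_sum = file_checksum(stored_copy_path)
    return client_sum == server_sum
```

Reading the stored copy back, rather than trusting the bytes that were sent, is what lets this kind of check catch corruption that happens on the way to or on the storage side.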

  13. Sget – download files from SRB
      Sget [-n n] [-N numThreads] [-pbfrvsmMV] [-T ticketFile | -t ticket] [-A condition] [-R retry_count] [-k] srbObj|Collection ... [localFile|
      Downloads one or more files from SRB to the local file system.
      -r recursive, -b bulk, -m parallel (server initiated), -M parallel (client initiated, with -N numThreads), -k, -n replica number

  14. Types of data transfer
      Local to SRB: Sput, Srsync
      SRB to local: Sget, Srsync
      SRB to SRB: Scp, Sreplicate, Sbkupsrb, Srsync, Sphymove
        Third-party transfer: server-to-server data transfer, client not involved
        Parallel I/O

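  15. Third party data transfer
      [Diagram: Scp calls srbObjCopy on an SRB server, which consults MCAT. The SRB agents exchange a dataPut request carrying a socket address, port and cookie, a connection is made to SRB server2, and the data is transferred directly between the two servers' resources (R) without passing through the client.]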

  16. Sreplicate [-n replicaNum] [-pr] [-S resourceName] [-P pathName] srbFile|collection ...
      Makes a replica of SRB files or collections.
      Replicas have the same path but a different replica number.
      Uses third-party parallel transfer.

  17. Sreplicate/Sbkupsrb
      Sreplicate -S demoResc1 mfile
      Sls -l mfile
        fedsrbbrick8 0 demoResc 3029449 2005-07-29-15.37 % mfile
        fedsrbbrick8 1 demoResc1 3029449 2005-07-29-21.28 % mfile
      Sget -vn1 mfile
      Sreplicate -rS demoResc1 testdir
      Sls -lr testdir
      Sbkupsrb: similar to Sreplicate, but won't make a copy if a good copy already exists in the target resource
      Sbkupsrb -S demoResc1 mfile
      Sls -l mfile
      Sbkupsrb -S demoResc2 mfile
      Sls -l mfile
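The Sls -l listing above shows one line per replica, and Sbkupsrb only copies when the target resource has no good copy yet. The sketch below parses lines in the layout shown on this slide and applies that rule. The field names are assumptions (the slide does not label the columns), the "%" column is kept as an opaque flag, and every listed replica is treated as a good copy because the listing does not show replica health.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    owner: str        # assumed meaning of the first column
    replica_num: int
    resource: str
    size: int
    timestamp: str
    flag: str         # the '%' column, meaning not stated on the slide
    name: str


def parse_sls_line(line: str) -> Replica:
    """Parse one line of `Sls -l` output in the layout shown on the slide."""
    owner, num, resource, size, stamp, flag, name = line.split(maxsplit=6)
    return Replica(owner, int(num), resource, int(size), stamp, flag, name)


def needs_backup(replicas: list[Replica], target_resource: str) -> bool:
    """Sbkupsrb-style decision: copy only if no replica already lives in the target.
    Every listed replica is assumed to be a good copy."""
    return not any(r.resource == target_resource for r in replicas)
```

Applied to the two mfile replicas above, a backup to demoResc1 would be skipped, while a backup to demoResc2 would create a new replica, matching the difference between the two Sbkupsrb examples.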

  18. Sphymove – move files to another resource
      Moves files to another resource without making another replica.
      Normally used by admins to move files around; used by the BBSRC project.
      Sphymove [-b|r] [-c container] [-S targetResource] [-s sourceResource] srbFile|srbCollection ...
      -b bulk, -r recursive (for collections)
      -c container: move files into the container
      -S targetResource: move files to this resource, if specified
      -s sourceResource: if specified, move only files stored in sourceResource to targetResource; otherwise, move all files that are not in targetResource

  19. Sphymove (cont.)
      Sphymove -b -s demoResc -S demoResc1 testdir
        Bulk-move all files in the 'testdir' collection that are stored in demoResc to demoResc1
      Sphymove -b -c myContainer testdir
        Bulk-move all files in the 'testdir' collection into the 'myContainer' container
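The -s/-S selection rule from the previous slide can be written as a small filter. This is a hypothetical model of which files get moved, not Sphymove's implementation: `StoredFile`, `select_for_move` and `apply_move` are made-up names, and a move is represented by changing a file's resource in place, since Sphymove does not create an extra replica.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredFile:
    path: str
    resource: str


def select_for_move(files: list[StoredFile], target: str,
                    source: Optional[str] = None) -> list[StoredFile]:
    """With a source resource, pick only files stored there; otherwise pick
    every file that is not already in the target resource."""
    if source is not None:
        return [f for f in files if f.resource == source]
    return [f for f in files if f.resource != target]


def apply_move(selected: list[StoredFile], target: str) -> None:
    """Move in place: each selected file's resource changes, no copy is added."""
    for f in selected:
        f.resource = target
```

With files in demoResc, demoResc1 and demoResc2, `select_for_move(files, "demoResc1", source="demoResc")` picks only the demoResc files, like the first example above, while omitting `source` picks everything outside demoResc1.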

  20. Scp [-n n] [-fpra] [-c container] [-S newResourceName] [-P newPathName] srcFile|srcCollection ... targFile|targCollection
      SRB-to-SRB copy from one SRB path to another
      Uses third-party parallel transfer
      Supports cross-zone copies
      -a write to all resources, -r recursive, -b bulk
      Scp -S demoResc1 -r testdir testdir1
      Scp -S xyzResc /z1/a/b/c /Z2/x/y/z   (cross zone)

  21. Data synchronization
      Srsync [-S resource] [-t tmpInxDir] [-rvamMls] sourceFile|sourceDirectory [....] targetFile|targetDirectory
      Similar to Unix rsync
      Modes: local to SRB, SRB to local, SRB to SRB
      Uses checksum values for synchronization: data is transferred only when the checksums differ

  22. Srsync (cont.)
      Srsync -vMr /tmp/srb/testdir s:testdir2
        /tmp/srb/testdir/./Sget.c 11514 55818 N
        LOCAL:/tmp/srb/testdir/./Sget.c->SRB:Sget.c&REG_CHKSUM=55818 | 0.012 MB | 0.026 MB/s | 0.45 s | 2005.07.29 22:13:40 | 1 thr .
      Sls -lr testdir2
      Change /tmp/srb/testdir/SgetD.c with an editor, then run again:
      Srsync -vMr /tmp/srb/testdir s:testdir2     (local to SRB)
      Srsync -vMr s:testdir2 testdir2             (SRB to local)
      Srsync -vMr s:testdir2 s:testdir3           (SRB to SRB)
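The rule behind Srsync, transferring a file only when its checksum differs from the target's, can be sketched for two local directories. This is an illustration under stated assumptions, not Srsync's code: MD5 stands in for SRB's checksum, both sides are ordinary directories, and `rsync_by_checksum` is a hypothetical name.

```python
import hashlib
import os
import shutil


def checksum(path: str) -> str:
    """MD5 of a file (a stand-in; the SRB checksum algorithm is not given here)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def rsync_by_checksum(src_dir: str, dst_dir: str) -> list[str]:
    """Recursively copy files from src_dir to dst_dir, but only those whose
    destination copy is missing or has a different checksum.
    Returns the relative paths that were transferred."""
    copied = []
    for root, _dirs, names in os.walk(src_dir):
        for name in sorted(names):
            src = os.path.join(root, name)
            rel = os.path.relpath(src, src_dir)
            dst = os.path.join(dst_dir, rel)
            if os.path.exists(dst) and checksum(dst) == checksum(src):
                continue  # checksums match: skip the transfer
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copyfile(src, dst)
            copied.append(rel)
    return copied
```

As in the walkthrough above, a first run copies everything, an immediate second run copies nothing, and after editing one file only that file is transferred again.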

  23. Schksum – checksum utility
      Schksum [-f|l|c] [-n replNum] [-rv] srbFile|collection ...
      Computes and lists the checksum values of SRB files.
      -f force recompute, -c verification mode, -l list mode, -r recursive, -v verbose mode, -n replNum, -s verify integrity based on size
      Examples:
        Schksum -lr testdir
        Schksum -crv testdir2

  24. SRB proxy operation
      Performs operations on the server on behalf of the user, where the data is located: file format conversion, MD5 checksums, subsetting and filtering, etc.
      Two types of proxy operations:
        Proxy commands: invoked by the client using Spcommand; the server forks and execs an executable/script in bin/commands on the server and pipes the output back to the client
        Proxy functions: functions built into the server, with a well-defined framework for writing proxy functions

  25. Spcommand – proxy command
      Spcommand [-hc] [-H hostAddr | -d srbPath] command
      command: an executable/script in bin/commands on the server
      -H  the server host address where the command should be executed
      -d  execute on the host where this srbPath is located
      Example:
        Spcommand "hello mike"
        Hello mike from SRB world
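The proxy-command model (run a program from a fixed commands directory and pipe its output back) can be sketched with Python's subprocess module. This only mimics the behaviour described on the previous slide on a single machine; `run_proxy_command` is a hypothetical name, and the restriction to plain names inside the commands directory is a safety assumption added for the sketch.

```python
import os
import shlex
import subprocess


def run_proxy_command(commands_dir: str, command_line: str) -> str:
    """Run an executable from commands_dir with the given arguments and return
    its standard output, like a proxy command piping output back to the client."""
    argv = shlex.split(command_line)
    if not argv:
        raise ValueError("empty command")
    name, args = argv[0], argv[1:]
    # Only programs that live directly in commands_dir may be run.
    if os.path.basename(name) != name or name in (".", ".."):
        raise PermissionError(f"{name!r} must be a plain name inside {commands_dir}")
    path = os.path.join(commands_dir, name)
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        raise FileNotFoundError(path)
    result = subprocess.run([path, *args], capture_output=True, text=True, check=True)
    return result.stdout
```

If the commands directory holds an executable `hello` script such as `echo "Hello $1 from SRB world"`, then `run_proxy_command(commands_dir, "hello mike")` returns `Hello mike from SRB world` followed by a newline, mirroring the Spcommand example.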

  26. SRB shell
      Ssh [-v] [-c command]
      Puts the client into an SRB shell: makes a one-time connection to SRB, keeps the connection open, and uses it for all subsequent Scommands.
      Examples:
        Ssh: enter an interactive Ssh shell, where Scommands and UNIX commands can be issued (the UNIX shell environment is not supported)
        Ssh -c bash: create an interactive bash session
        Ssh -c myShellScript