
SHARING DATA USING THE STORAGE RESOURCE BROKER (SRB)




  1. SHARING DATA USING THE STORAGE RESOURCE BROKER (SRB) Ken Wong The Applied Research Laboratory (ARL) and The Department of Computer Science Washington University in St. Louis kenw@arl.wustl.edu, http://www.arl.wustl.edu/~kenw

  2. OUTLINE OF TALK • SRB and HPSS Overview • SRB Concepts and Examples • Alternatives to SRB • Other SRB Projects • Our Experience

  3. WU DATA CACHE AND THE SRB

  4. WU DATA CACHE • 1.4 TB DEC Storage Works RAID (Level 5) • 2-processor Sun Enterprise 450, 1 GB main memory • 622 Mbps ATM interface, 10/100 Mbps Ethernet interface • 1.7 TB (raw) = 48 x 9 + 24 x 18 + 24 x 36 GB • Backups • Incremental: Tue, Wed, Thu • Full: Mon, Fri, Sat • Data Volume • Used: 560 GB • Burn Rate: 7.0 GB/week (This Year); 5.5 GB/week (Lifetime)
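The capacity and burn-rate figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes that the 1.4 TB RAID figure is the usable capacity and that 1 TB = 1000 GB; neither assumption is stated on the slide, and the variable names are illustrative only.

```python
# Disk groups as listed on the slide: (number of disks, size of each in GB).
disk_groups = [(48, 9), (24, 18), (24, 36)]
raw_gb = sum(count * size for count, size in disk_groups)

# Assumption: 1.4 TB is the usable RAID capacity and 1 TB = 1000 GB.
usable_gb = 1400
used_gb = 560
burn_rate_gb_per_week = 7.0  # "This Year" rate from the slide

# Weeks until the cache fills if the current burn rate holds.
weeks_left = (usable_gb - used_gb) / burn_rate_gb_per_week

print(f"raw capacity: {raw_gb} GB")
print(f"weeks until full: {weeks_left:.0f}")
```

Under these assumptions the raw total agrees with the 1.7 TB quoted on the slide, and the remaining space lasts a little over two years at the current rate.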

  5. INSTALLATION HISTORY • Jun/Jul 98: Sun host and then 432 GB RAID • 3 year extended warranty and 3 year maintenance on controllers • Sep 98: SRB • Aug 99: 24 x 18.2 GB disks • 3 year maintenance upgrade on controllers • Dec 99: 24 x 36.4 GB disks

  6. BRAINMAP DATA GROWTH

  7. BRAINMAP DISK USAGE

  8. STORAGE RESOURCE BROKER (SRB)

  9. HIGH-PERFORMANCE STORAGE SYSTEM

  10. HIGH-PERFORMANCE STORAGE SYSTEM • Current Usage • 150 TB (terabytes; 10^12 bytes) • 15 million files • Current Capacity: 500 TB of data (assuming a compression ratio of 1.5) • Projected Capacity: 1 PB (10^15 bytes) within a year
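A rough sketch of what the HPSS usage figures imply per file. It assumes decimal units (1 TB = 10^12 bytes) and reads the 500 TB figure as capacity for compressed data, so the native media capacity would be 500 / 1.5 TB; that reading is an interpretation of the slide, not something it states.

```python
TB = 10**12  # decimal terabyte, as on the slide

used_bytes = 150 * TB
file_count = 15_000_000
capacity_bytes = 500 * TB
compression_ratio = 1.5

# Average size of a stored file, in megabytes.
avg_file_mb = used_bytes / file_count / 10**6

# Fraction of the quoted capacity currently in use.
fraction_used = used_bytes / capacity_bytes

# Assumption: quoted capacity already includes compression.
native_tb = capacity_bytes / compression_ratio / TB

print(f"average file size: {avg_file_mb:.0f} MB")
print(f"capacity in use: {fraction_used:.0%}")
print(f"implied native capacity: {native_tb:.0f} TB")
```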

  11. SRB CONCEPTS • SRB Server: Responds to SRB requests from clients • MCAT (Metadata Catalogue) • Information about data sets and collections (Oracle DB) • SRB Client • SRB Resource: A logical storage resource • Example: HPSS storage and container cache • Data Set: A file registered with the SRB • Collection: Group of registered data sets/collections • Container: Data sets stored as one physical unit • Container cache can be remote from HPSS

  12. SRB SYSTEM CAPABILITIES • Collection-based management of data sets • Persistent identifiers for data sets • Management of data sets (copies or replicas) • Containers for aggregating data sets before archiving • Support for grid security infrastructure authentication • Uses public key certificates • Support for integrating data set collections across file systems, archives, and databases

  13. SRB INTERFACES • Scommands (Unix commands) • Sinit/Sexit, Sput/Sget, Smkdir/Srmdir, Sls/Srm • Smkcont/Ssyncont, Slscont/Srmcont • SgetR/SgetU/SgetD • C-Programming API • Browser

  14. PUBLISHING A DATA SET
      • Define the SRB environment (.srb/.MdasEnv file)
          mdasCollectionHome '/home/kenw.neurodb'
          mdasDomainHome 'neurodb'
          srbUser 'kenw'
          srbHost 'ghidorah.sdsc.edu'
          defaultResource 'cont-sdsc'
      • Interact with SRB server
          % Sinit                            # Connect to SRB server
          % Sls                              # See what is in my collection
          % Sput ./mydata brain043           # Copy file to SRB space
          % Schmod r public npaci brain043   # Give read access
          % SgetD -a brain043                # Check access permissions
          % Sexit                            # Disconnect from SRB server
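The environment file shown on this slide is a list of `name 'value'` pairs. As a minimal sketch, the Python below parses text in that shape into a dictionary. The function name `parse_mdas_env` is hypothetical, the skipping of blank and `#` lines is an assumption, and the real SRB client may accept syntax that this parser does not.

```python
import shlex

# Sample content copied from the slide.
SAMPLE_ENV = """\
mdasCollectionHome '/home/kenw.neurodb'
mdasDomainHome 'neurodb'
srbUser 'kenw'
srbHost 'ghidorah.sdsc.edu'
defaultResource 'cont-sdsc'
"""

def parse_mdas_env(text):
    """Parse lines of the form  name 'value'  into a dict (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)  # strips the single quotes around the value
        if len(parts) != 2:
            raise ValueError(f"expected name and quoted value, got: {line!r}")
        name, value = parts
        settings[name] = value
    return settings

settings = parse_mdas_env(SAMPLE_ENV)
print(settings["srbHost"], settings["defaultResource"])
```

A script that wraps the Scommands could use such a dictionary to confirm which host and default resource a session will use before calling Sinit.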

  15. GETTING A DATA SET (SCOMMANDS)
          % Sinit
          % Scd /home/colin.neurodb                 # Go to Colin's collection
          % Sls -l                                  # See what is there
          % Sget colin_avg20_1.0mm_at0.5mm.mnc .    # Copy to this directory
          % Sexit

  16. JINGHUA ZHOU'S WORK • Experiments • Test SRB functionality • Measure performance of basic SRB functions • Archiving (Perl Scripts) • Archive an arbitrary Unix directory to HPSS • Verify files were archived • Recover files from archival storage

  17. RETRIEVAL EXPERIMENTS • Load 100 MB container with 1 MB files • Measure time required to retrieve N files • Divide time by N to get average time for each file • Repeat after container has been moved to tape • Repeat above steps for 10 MB container (instead of 100 MB)
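The measurement procedure (time N retrievals, divide by N) can be sketched as a small timing harness. This is not the script used in the experiments; `average_retrieval_time` and its `retrieve` argument are hypothetical names, and the caller supplies whatever actually fetches a file (for example, a wrapper that runs Sget).

```python
import time

def average_retrieval_time(retrieve, names, clock=time.perf_counter):
    """Return the mean wall-clock seconds per file over all names.

    retrieve: callable taking one file name (supplied by the caller).
    clock:    injectable time source, useful for testing.
    """
    if not names:
        raise ValueError("need at least one file name")
    start = clock()
    for name in names:
        retrieve(name)
    elapsed = clock() - start
    return elapsed / len(names)

# Example with a stand-in retrieve function that only sleeps briefly.
if __name__ == "__main__":
    files = [f"file{i:03d}" for i in range(5)]
    avg = average_retrieval_time(lambda name: time.sleep(0.01), files)
    print(f"average per file: {avg:.3f} s")
```

Timing the whole batch and dividing, as the slide describes, folds any one-time connection cost into the per-file average, so small values of N will look slower per file than large ones.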

  18. AVERAGE RETRIEVAL TIME (OLD FILES)

  19. AVERAGE RETRIEVAL TIME (FRESH FILES)

  20. COMMENTS • SRB Overhead Per Object (File) • 5-7 seconds (Early Measurements) • 2-4 seconds (Recent Measurements) • Tape Overhead Per Object (File): 100 seconds • TCP Connection Needs Tuning • Asymmetric routing, bottleneck, ... • snoop and tcptrace analysis • Max Sget effective bandwidth is 8 Mbps • Max Sput effective bandwidth is 4 Mbps • Goal is 32 Mbps
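To see how these overheads interact with bandwidth, the sketch below uses a simple additive model: time per file = SRB overhead + tape overhead + transfer time. The model and the function names are assumptions made for illustration, not the analysis behind the slide; the input numbers are taken from the bullets above, with 3 seconds chosen as the midpoint of the recent 2-4 second range.

```python
def per_file_seconds(size_mb, bandwidth_mbps, srb_overhead_s, tape_overhead_s=0.0):
    """Estimated seconds to fetch one file under an additive overhead model."""
    transfer_s = size_mb * 8 / bandwidth_mbps  # megabytes -> megabits
    return srb_overhead_s + tape_overhead_s + transfer_s

def effective_mbps(size_mb, seconds):
    """Throughput actually seen by the user for one file."""
    return size_mb * 8 / seconds

# 1 MB file via Sget at 8 Mbps with 3 s of SRB overhead.
disk_s = per_file_seconds(1, 8, 3)
# Same file when the container has been moved to tape (100 s overhead).
tape_s = per_file_seconds(1, 8, 3, tape_overhead_s=100)

print(f"from disk cache: {disk_s:.1f} s, {effective_mbps(1, disk_s):.2f} Mbps effective")
print(f"from tape:       {tape_s:.1f} s, {effective_mbps(1, tape_s):.3f} Mbps effective")
```

Under this model the fixed per-object overhead dominates for 1 MB files, which suggests that raising link bandwidth alone would help small-file retrieval much less than reducing the overhead.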

  21. ARCHIVING • Reflect Unix directory structure in SRB collection structure • archiver NPACI/Unix account • Look for inactive files within a directory • Multiple versions handled by appending modification date to file name • Log all archival requests
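Two of the ideas on this slide, finding inactive files and versioning by modification date, can be sketched in Python. The original archiver was written as Perl scripts; this is not that code. The timestamp format, the idle threshold, and both function names are assumptions chosen for illustration.

```python
import os
import time
from datetime import datetime, timezone

def versioned_name(filename, mtime):
    """Append the modification time (UTC) to a file name.

    Assumption: a YYYYMMDDHHMMSS suffix; the slide does not give the format.
    """
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{filename}.{stamp}"

def inactive_files(directory, min_idle_days, now=None):
    """List regular files in directory not modified for min_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400
    paths = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            paths.append(entry.path)
    return sorted(paths)

if __name__ == "__main__":
    # Show the archive name each idle file in the current directory would get.
    for path in inactive_files(".", min_idle_days=90):
        mtime = os.stat(path).st_mtime
        print(path, "->", versioned_name(os.path.basename(path), mtime))
```

Because the suffix is derived from the modification time, re-archiving an unchanged file produces the same name, while an edited file gets a new one.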

  22. CURRENT WORK • TCP Tuning and SRB 1.1.7 Performance • Enhance Archival Scripts • Improve usability • Resilience to HPSS Blackouts • Parallel Archiving

  23. RECENT SRB DEVELOPMENTS • Data Cutter • GSI authentication • Uses X.509 certificates • Container redesign • To handle multiple archival and cache resources • Remote proxy (Spcommand) • Textual annotation stored in MCAT

  24. ALTERNATIVES TO SRB • Distributed Database • Does not deal with file data => Requires other means of accessing files • A heavyweight solution; i.e., expense (money, expertise) • Need instances running wherever you want to have storage • If it is only meta-data, then a case can be made but ... • Tied to a particular vendor at all sites • Have to cross link all the databases • AFS (Andrew File System) • Doesn't have concept of application metadata • SRB has some metadata facilities now and more to come • Comments, annotations, user-controlled metadata • SRB provides a uniform authentication and authorization system

  25. TOP SRB PROJECTS (SUMMARY) • 2-Micron All Sky Survey • 10 TB of data from Caltech • 5 million images sorted into 130,000 containers • Digital Embryo Project (NLM funded) • Digitizing existing slides for storage in HPSS • Particle Physics Data Grid (DOE funded) • Data mining • Information Power Grid (NASA funded) • Data Visualization Corridor (DOE funded) • Handles terabyte sized data sets for interactive viewing • Neuroscience Data Set Federation

  26. TOP SRB PROJECTS • 2-Micron All Sky Survey (2MASS) • 10 TB of data from Caltech (3 TB done) • 5 million images sorted into 130,000 containers • SRB container technology used to manage the aggregation process on a disk cache • Replicate Caltech data • Digital Embryo Project (NLM funded) • Digitizing existing slides for storage in HPSS • SRB used to manage data movement, aggregation into containers, and metadata catalog • Queries against the collection • Particle Physics Data Grid (DOE funded) • Replicate data sets that are pulled into local disk caches
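The 2MASS figures give a sense of why containers are used for aggregation. The arithmetic below assumes decimal units (1 TB = 10^12 bytes) and that the images are spread evenly across containers, which the slide does not claim.

```python
TB = 10**12

total_bytes = 10 * TB
image_count = 5_000_000
container_count = 130_000

# Averages implied by the slide's totals (assuming an even spread).
images_per_container = image_count / container_count
avg_image_mb = total_bytes / image_count / 10**6
avg_container_mb = total_bytes / container_count / 10**6

print(f"images per container: {images_per_container:.1f}")
print(f"average image size:   {avg_image_mb:.0f} MB")
print(f"average container:    {avg_container_mb:.0f} MB")
```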

  27. TOP SRB PROJECTS • Information Power Grid (NASA funded) • SRB used to support data mining against a distributed data set collection • Data transmission rate: 58 Mbps from SDSC to NASA Ames • Put collection management in front of storage archives through use of the MCAT • Data Visualization Corridor (DOE funded) • SRB has been integrated with the Data Cutter system • For remote manipulation of data sets • Handles terabyte sized data sets for interactive viewing • Neuroscience Data Set Federation

  28. CONCLUDING REMARKS • Documentation • http://www.sdsc.edu/DICE/SRB/index.html • http://www.arl.wustl.edu/kenw/npaci/index.html • Software • Follow SRB link • Get PGP key from SDSC • Can install subset (e.g., client only) • Applications?

  29. WU DATA CACHE [Network diagram. Labels: vBNS; 45 Mbps ATM; sdsc.edu; wustl.edu; 622 Mbps; 155 Mbps; ghidorah (MCAT); hpss; brainmap (1.3 TB); stp, v1 (SUMS); petsun-23 (Scanners); 12 major users: UCSD, UCLA, Johns Hopkins, U. Montana, Caltech]
