This presentation covers the SRB (Storage Resource Broker) as a robust data grid solution. Presented by Arun Jagatheesan from the San Diego Supercomputer Center at the NCHC workshop in Taiwan (August 2004), it highlights critical elements such as problem statements, project history, team structure, and the architecture of SRB. The talk elaborates on why people are using SRB, what specific problems it has addressed, and its applications across multiple disciplines, including managing large datasets in distributed environments.
SRB as data grid solution (Chinese version) Arun Jagatheesan arun@sdsc.edu San Diego Supercomputer Center SRB Workshop National Center for High-performance Computing (NCHC) Taiwan, August 3, 2004
SRB? SRB = Storage Resource Broker
More Chinese: Oops, I don't know any more Chinese to continue.
SRB as data grid solution (English Version) Arun Jagatheesan arun@sdsc.edu San Diego Supercomputer Center
Talk Outline • Introduction to Problem statement(s) • How SRB is the solution • SRB Project History • SRB Team • SRB Architecture (from the Architect himself)
What problem, why SRB solution? • Why are people using SRB? • What problems did it solve for them? • Who are these people? • Did they use it because they liked Arun?
Southern California Earthquake Center • Build community digital library • Manage simulation and observational data • Anelastic wave propagation output • 10 TB, 1.5 million files • Provide web-based interface • Support standard services on digital library • Manage data distributed across multiple sites • USC, SDSC, UCSB, SDSU, SIO • Provide standard metadata • Community-based descriptive metadata • Administrative metadata • Application-specific metadata
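Attaching descriptive, administrative and application-specific metadata to each file is what makes a collection searchable. The following is a minimal sketch of attribute-based discovery, assuming a simple in-memory catalog. `DataObject`, `find`, the logical paths and the attribute names are all hypothetical. The sketch is not SRB's actual metadata catalog (MCAT) schema or client API.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """Hypothetical logical data object carrying the three metadata classes listed above."""
    logical_name: str
    descriptive: dict = field(default_factory=dict)     # community-based descriptive metadata
    administrative: dict = field(default_factory=dict)  # e.g. owner, creation date
    application: dict = field(default_factory=dict)     # application-specific attributes


def find(objects, category, **conditions):
    """Return the logical names whose `category` metadata matches every condition."""
    return [
        obj.logical_name
        for obj in objects
        if all(getattr(obj, category).get(key) == value
               for key, value in conditions.items())
    ]


# Hypothetical catalog entries; paths and attribute names are illustrative only.
catalog = [
    DataObject("/scec/sims/run01/wave.bin",
               descriptive={"region": "Southern California"},
               administrative={"owner": "usc"},
               application={"kind": "anelastic wave propagation"}),
    DataObject("/scec/obs/station42/trace.dat",
               descriptive={"region": "Southern California"},
               administrative={"owner": "sdsu"},
               application={"kind": "observational"}),
]

print(find(catalog, "application", kind="anelastic wave propagation"))
# ['/scec/sims/run01/wave.bin']
print(find(catalog, "administrative", owner="sdsu"))
# ['/scec/obs/station42/trace.dat']
```

Keeping the three metadata classes separate lets a community schema evolve without touching the administrative attributes that the data grid itself maintains.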
SCEC Data Management Technologies • Portals • Knowledge interface to the library, presenting a coherent view of the services • Knowledge Management Systems • Organize relationships between SCEC concepts and semantic labels • Process management systems • Data processing pipelines to create derived data products • Web services • Uniform capabilities provided across SCEC collections • Data grid • Management of collections of distributed data • Computational grid • Access to distributed compute resources • Persistent archive • Management of technology evolution
NASA Data Grids • NASA Information Power Grid • NASA Ames, NASA Goddard • Distributed data collection using the SRB • ESIP federation • Led by Joseph JaJa (U Md) • Federation of ESIP data resources using the SRB • NASA Goddard Data Management System • Storage repository virtualization (Unix file system, Unitree archive, DMF archive) using the SRB • NASA EOS Petabyte store • Storage repository virtualization for EMC persistent store using the Nirvana version of SRB
Figure: TeraGrid network diagram. TeraGrid: 13.6 TF, 6.8 TB memory, 900 TB network disk, 10 PB archive. Sites: ANL (1 TF, 0.25 TB memory, 25 TB disk); Caltech (0.5 TF, 0.4 TB memory, 86 TB disk); NCSA (6+2 TF, 4 TB memory, 400 TB disk); SDSC (4.1 TF, 2 TB memory, 500 TB SAN). Sites are joined through DTF core switch/routers in Chicago and LA, with external links to vBNS, Abilene, MREN, ESnet, Calren and Starlight.
NIH BIRN SRB Data Grid • Biomedical Informatics Research Network • Access and analyze biomedical image data • Data resources distributed throughout the country • Medical schools and research centers across the US • Stable, high-performance, grid-based environment • Coordinate data sharing • Federate collections • Support data mining and analysis
SDSC SRB User Community (Major US) • National Science Digital Library (NSDL) • National Optical Astronomy Observatory (NOAO) • ROADNet • Purdue University • SCCOOS, USA • Scientific Rich Media Archive • Salk Institute • Strand Map Service, USA • UC Berkeley Library • UCSD Library • University of Houston • Persistent Archives Test bed • University of Wisconsin, Madison • WebBase, Stanford University • Yale University Library • BaBar, Stanford Linear Accelerator Center (SLAC) • California Digital Library (CDL) • Center for Integrated Space Weather Modeling (CISM) • CVC, Visualization Portal • LDC Data Storage • NIH Bio Informatics Research Network (BIRN) • NSF Southern California Earthquake Center (SCEC) • National Archives and Records Administration (NARA) • National Aeronautics and Space Administration Centers (NASA) • National Virtual Observatory (NVO) • Npackage, NSF Middleware Initiative (NMI)
SDSC SRB User Community • Academia Sinica, Taiwan • Australian National University • Bio-Lab, University of Genoa, Italy • Council for the Central Laboratory of the Research Councils (CCLRC), UK • CC-IN2P3, France • Distributed Framework, Singapore • Distributed Aircraft Maintenance Environment (DAME), UK • eMinerals Project, UK • eScience, Belfast Center • Fraunhofer ITWM, Germany • High Energy Accelerator Organization, KEK, Japan • K* Grid Computing, Korea • KEK Computing Center, Japan • Lyon, France • NorGrid, Norway • Nanyang Data Grid, Singapore • Queensland University of Technology (QUT), Australia • Rutherford Appleton Laboratory (RAL), UK • T-Systems, Germany • UK eScience Project, UK • UniGrid, Poland • UMK, Poland • Virtual Laboratory for eScience, Netherlands
What problem, why SRB solution? • Why are people using SRB? • What problems did it solve for them? • Who are these people? • Did they use it because they liked Arun?
Why do they use SRB? • Distributed unstructured data management • Data Grids, Digital Libraries, Persistent Archives, Workflow/dataflow Pipelines, Knowledge Generation • Distributed data storage provisioning • Common logical namespace for data and storage • Data publication • Browsing and discovery of data in collections • Data Preservation • Management of technology evolution
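With a common logical namespace, users name and browse data by logical path, and the system records which storage resource holds each copy. The sketch below illustrates that idea under these assumptions:

* `LogicalNamespace`, its methods, the resource names `unix-vault`/`windows-vault` and the `/home/demo` paths are hypothetical.
* Only the two physical paths come from the data grid transparency diagram in this talk.
* The sketch is not SRB's real catalog or client interface.

```python
class LogicalNamespace:
    """Hypothetical logical namespace mapping logical paths to physical replicas."""

    def __init__(self):
        # logical path -> list of (storage resource name, physical path) pairs
        self._entries = {}

    def register(self, logical_path, resource, physical_path):
        """Record that a copy of `logical_path` is stored at `physical_path` on `resource`."""
        self._entries.setdefault(logical_path, []).append((resource, physical_path))

    def resolve(self, logical_path):
        """Return every physical location registered for a logical path."""
        if logical_path not in self._entries:
            raise KeyError(f"no such logical object: {logical_path}")
        return list(self._entries[logical_path])

    def list_collection(self, collection):
        """Browse one level of a collection, independent of where the data is stored."""
        prefix = collection.rstrip("/") + "/"
        children = set()
        for path in self._entries:
            if path.startswith(prefix):
                # keep only the next path component below the collection
                children.add(prefix + path[len(prefix):].split("/")[0])
        return sorted(children)


ns = LogicalNamespace()
# The same logical object held on two different (hypothetical) storage resources.
ns.register("/home/demo/images/image.jpg", "unix-vault", "/users/srbVault/image.jpg")
ns.register("/home/demo/images/image.jpg", "windows-vault", r"E:\srbVault\image.jpg")
ns.register("/home/demo/images/notes.txt", "unix-vault", "/users/srbVault/notes.txt")

print(ns.list_collection("/home/demo"))         # ['/home/demo/images']
print(ns.list_collection("/home/demo/images"))  # ['/home/demo/images/image.jpg', '/home/demo/images/notes.txt']
print(ns.resolve("/home/demo/images/image.jpg"))
```

Browsing and resolving both go through the logical name only. Adding a replica on a third resource would not change how clients name the object.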
Total data brokered by SDSC SRB 358 TB 324 TB 682 TB
Looking back… • 1995: MDAS Project by DARPA • 1998: SRB Releases • 2000: Arun joins SRB • Only after that did SRB become a hit (lucky guy, just kidding) • 2000 ++: Multiple client interfaces, many more functionalities, multiple projects across the world • 2004: NCHC and its end users in Taiwan show significant interest in SRB (through this workshop)
Physical Layer (Real World) • Distributed digital entities • Heterogeneous and distributed storage resources • Autonomous Organizations • Distributed Users, distributed authentication • Heterogeneous authorization schemes • Users; sub-organizations; organizations/enterprises; virtual organizations
Data Grid Transparencies/Virtualizations (bits, data, information, ...) • Inter-organizational Information Storage Management • Virtual Data Transparency • Data Replica Transparency • Data Identifier Transparency • Storage Location Transparency • Storage Resource Transparency • Example names from the diagram: collections myActiveNeuroCollection and patientRecordsCollection; image.cgi, image.wsdl, image.sql; image_0.jpg…image_100.jpg; physical copies E:\srbVault\image.jpg and /users/srbVault/image.jpg; a catalog query Select … from srb.mdas.td where...
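Data replica transparency and storage resource transparency mean that a client asks for an object by logical name and the data grid picks a reachable copy. That copy may sit in a Windows vault (E:\srbVault\image.jpg) or a Unix vault (/users/srbVault/image.jpg). The following is a minimal sketch of that selection step, assuming a "first reachable replica, optional preferred resource" policy. `pick_replica` and the resource names are hypothetical, and SRB's own replica selection may work differently.

```python
def pick_replica(replicas, available, preferred=None):
    """Hypothetical replica selection for reading one logical object.

    replicas:  list of (resource, physical_path) pairs for the logical object
    available: set of storage resource names that are currently reachable
    preferred: optional resource to try first (for example, one near the client)
    """
    # sorted() is stable: the preferred resource moves to the front and the
    # remaining replicas keep their registration order.
    ordered = sorted(replicas, key=lambda replica: replica[0] != preferred)
    for resource, physical_path in ordered:
        if resource in available:
            return resource, physical_path
    raise OSError("no replica of this object is reachable")


replicas = [
    ("unix-vault", "/users/srbVault/image.jpg"),
    ("windows-vault", r"E:\srbVault\image.jpg"),
]

# The caller asks for "the object"; which vault actually serves it stays hidden.
print(pick_replica(replicas, {"unix-vault", "windows-vault"}))
print(pick_replica(replicas, {"windows-vault"}))  # unix-vault is down: fail over
print(pick_replica(replicas, {"unix-vault", "windows-vault"}, preferred="windows-vault"))
```

Returning the first reachable copy keeps the sketch short. A production broker could also weigh load, network distance or storage class when ordering replicas.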
We are SRB: Arun is here! (Shameless self-promotion) Not in picture: many students