Distributed Data Management at DKRZ. Wolfgang Sell, Hartmut Fichtel, Deutsches Klimarechenzentrum GmbH, sell@dkrz.de, fichtel@dkrz.de. CAS2003, Annecy, France, WFS
Table of Contents • DKRZ - a German HPC Center • HPC System Architecture suited for Earth System Modeling • The HLRE Implementation at DKRZ • Implementing IA64/Linux based Distributed Data Management • Some Results • Summary
DKRZ - a German HPCC • Mission of DKRZ • DKRZ and its Organization • DKRZ Services • Model and Data Services
Mission of DKRZ: DKRZ was founded in 1987 with the mission to • provide state-of-the-art supercomputing and data services to the German scientific community for top-of-the-line Earth system and climate modelling • provide associated services, including high-level visualization
DKRZ and its Organization (1) Deutsches KlimaRechenZentrum = DKRZ = German Climate Computer Center • organised under private law (GmbH) with 4 shareholders • investments funded by the federal government, operations funded by the shareholders • usage: 50 % shareholders, 50 % community
DKRZ and its Organization (2) DKRZ internal structure • 3 departments for • systems and networks • visualisation and consulting • administration • 20 staff in total • until the restructuring at the end of 1999, a fourth department supported climate model applications and climate data management
DKRZ Services • operations center: DKRZ • technical organization of computational resources (compute, data and network services, infrastructure) • advanced visualisation • assistance for parallel architectures (consulting and training)
Model & Data Services • competence center: Model & Data • professional handling of community models • specific scenario runs • scientific data handling • Model & Data Group external to DKRZ, administered by MPI for Meteorology, funded by BMBF
HPC System Architecture suited for Earth System Modeling • Principal HPC System Configuration • Links between Different Services • The Data Problem
Principal HPC System Configuration
Link between Compute Power and Non-Computing Services • Functionality and Performance Requirements for Data Service • Transparent Access to Migrated Data • High Bandwidth for Data Transfer • Shared Filesystem • Possibility for Adaptation in Upgrade Steps due to Changes in Usage Profile
Compute server power
Adaptation Problem for Data Server
Pros of Shared Filesystem Coupling • High Bandwidth between the Coupled Servers • Scalability supported by Operating System • No Need for Multiple Copies • Record Level Access to Data with High Performance • Minimized Data Transfers
Cons of Shared Filesystem Coupling • Proprietary Software needed • Standardisation still missing • Limited Number of Vendors whose Systems can be connected
HLRE Implementation at DKRZ • HLRE = HöchstLeistungsRechnersystem für die Erdsystemforschung (High Performance Computer System for Earth System Research) • Principal HLRE System Configuration • HLRE Installation Phases • IA64/Linux based Data Services • Final HLRE Configuration
Principal HLRE System Configuration
HLRE Phases
Date                                                        Feb 2002   4Q 2002   3Q 2003
Nodes                                                       8          16        24
CPUs                                                        64         128       192
Expected Sustained Performance [Gflops]                     ca. 200    ca. 350   ca. 500
Expected Increase in Throughput compared to CRAY C916       ca. 40     ca. 75    ca. 100
Main Memory [Tbytes]                                        0.5        1.0       1.5
Disk Capacity [Tbytes]                                      ca. 30     ca. 50    ca. 60
Mass Storage Capacity [Tbytes]                              >720       >1400     >3400
DS phase 1: basic structure • CS performance increase: f = 37, F = f^(3/4) = 15 • minimal component performance indicated in diagram • explicit user access (ftp, scp ...) • CS disks with local copies • DS disks for cache • physically distributed DS • NAS architecture [Diagram labels: CS, client(s), other clients, DS, GE; capacities 11 TB, 16.5 TB and ~ PB; link rates 180 MB/s, 45 MB/s, 150 MB/s and 375 MB/s]
Adaptation Option for Data Server
DS phases 2,3: basic structure • CS performance increase: f = 63/100, F = f^(3/4) = 22.4/31.6 • minimal component performance indicated in diagram • implicit user access (local UFS commands) • CS disks with local copies • shared disks (GFS) • DS disks for IO buffer cache • Intel/Linux platforms • homogeneous HW • technological challenge [Diagram labels: CS, client(s), other clients, DS, FC, GE; capacities 11 TB, 25/30 TB, 16.5 TB and ~ PB; link rates 270/325 MB/s, 560/675 MB/s, 70/80 MB/s and 225/270 MB/s]
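The scaling figures on the two data-server slides (f = 37 giving F = 15 for phase 1, f = 63 and 100 giving F = 22.4 and 31.6 for phases 2 and 3) follow from the stated rule F = f^(3/4). The short Python sketch below reproduces them. Reading F as the factor by which the data service has to grow when the compute server grows by f is an interpretation of the slides, and the function name is hypothetical.

```python
def data_service_factor(f: float, exponent: float = 0.75) -> float:
    """Return F = f**exponent, the rule quoted on the slides (F = f^(3/4)).

    Interpreting F as the required growth of the data service is an
    assumption; the slides only state the formula and the numbers.
    """
    return f ** exponent


# CS performance increase f per phase, as given on the slides.
phases = [("phase 1", 37), ("phase 2", 63), ("phase 3", 100)]

for name, f in phases:
    print(f"{name}: f = {f}, F = {data_service_factor(f):.1f}")
```

Because the exponent is below 1, the data-service factor grows more slowly than the compute factor: going from f = 37 to f = 100 raises F only from about 15 to about 32.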
Implementing IA64/Linux based Distributed Data Management • Overall Phase 1 Configurations • Introducing Linux based Distributed HSM • Introducing Linux based Distributed DBMS • Final Overall Phase 3 Configuration
Proposed final phase 3 configuration
[Diagram; recoverable labels:]
• Compute: SX-6 nodes, 24 nodes on IXS
• Switch: Silkworm 12000
• Local Disk FC-RAID: 0.28 TB x 20 = 5.6 TB (shown twice)
• GFS Disk (Polestar): 0.28 TB x 53 = 14.8 TB (shown twice)
• Local Disk (Polestar): 0.14 TB x 2 = 0.28 TB (shown twice)
• Disk Cache (DDN): 0.69 TB x 12 = 8.3 TB
• Disk Cache (Polestar): 0.57 TB x 15 = 8.5 TB
• Oracle DB (DDN): 2 TB x 4 = 8 TB
• Servers: AsAmA 4-way and 16-way (GFS/Client, Oracle), GFS/Server (UVDM), AzusA 16-way (GFS/Server), Sun 4-CPU Oracle Application Server (SQLNET), post-processing system
• Tape drives: 9940B x 20, 9840B x 0, 9840C x 5 (x 25 in total)
• Networks: Fibre Channel (FC), Gigabit Ethernet (GE, HS/MS LAN), FE x 2/node (for PolestarLite), the Internet
• Note: migration upon market availability of components
Some Results • Growth of the Data Archive • Growth of the Transfer Rate • Observed Transfer Rates for HLRE • FLOPS Rates
DS archive capacity [TB]
DS archive capacity (2001-2003)
DS transfer rates [GB/day]
DS transfer rates (2001-2003)
Observed Transfer Rates for HLRE
Link                                    Single Stream [MB/s]   Aggregate [MB/s]
CS -> DS via ftp (12.1 SUPER-UX)        13                     100
CS -> DS via ftp (12.2 SUPER-UX)        25                     200
CS -> local disk (12.1 SUPER-UX)        40 - 50                > 2000
CS -> GFS disk (13.1 SUPER-UX)          up to 90               3900
DS -> GFS disk (Linux)                  up to 80               500 per node
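As a rough illustration of what the aggregate rates above mean in practice, the following Python sketch converts them into the time needed to move a given data volume. The rates are copied from the slide. The 1 TB example volume, the decimal unit convention and the function name are assumptions made only for this illustration.

```python
# Observed aggregate transfer rates in MB/s, copied from the slide.
AGGREGATE_MB_PER_S = {
    "CS -> DS via ftp (12.1 SUPER-UX)": 100,
    "CS -> DS via ftp (12.2 SUPER-UX)": 200,
    "CS -> local disk (12.1 SUPER-UX)": 2000,  # slide gives "> 2000", a lower bound
    "CS -> GFS disk (13.1 SUPER-UX)": 3900,
}


def hours_to_move(volume_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move volume_tb terabytes at rate_mb_per_s.

    Assumes decimal units (1 TB = 10**6 MB) and a constant sustained rate.
    """
    megabytes = volume_tb * 1_000_000
    return megabytes / rate_mb_per_s / 3600


# Example volume of 1 TB (hypothetical, not a figure from the slides).
for link, rate in AGGREGATE_MB_PER_S.items():
    print(f"{link}: {hours_to_move(1.0, rate):.2f} h per TB")
```

On these numbers, moving 1 TB over ftp takes hours, while the shared GFS disks bring it down to a few minutes, which is consistent with the case the slides make for shared filesystem coupling.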
Observed FLOPS Rates for HLRE • 4-node performance > approx. 100 GFLOPS (about 40 % efficiency) for • ECHAM (70-75) • MOM • Radar Reflection on Sea Ice • 24-node performance for Turbulence Code about 470 GFLOPS (30+ % efficiency)
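The quoted efficiencies can be checked against peak performance. The Python sketch below does this under one assumption that is not stated on the slides: a peak of 8 GFLOPS per SX-6 CPU. The figure of 8 CPUs per node follows from the HLRE phases table (64 CPUs on 8 nodes). The function name is hypothetical.

```python
PEAK_GFLOPS_PER_CPU = 8.0  # assumption: SX-6 per-CPU peak, not given on the slides
CPUS_PER_NODE = 8          # from the HLRE phases table: 64 CPUs on 8 nodes


def efficiency(sustained_gflops: float, nodes: int) -> float:
    """Sustained performance as a fraction of the assumed peak for `nodes` nodes."""
    peak_gflops = nodes * CPUS_PER_NODE * PEAK_GFLOPS_PER_CPU
    return sustained_gflops / peak_gflops


# Sustained rates from the slide; the 4-node value is quoted as "> approx. 100".
for label, gflops, nodes in [("4 nodes", 100.0, 4), ("24 nodes", 470.0, 24)]:
    print(f"{label}: {efficiency(gflops, nodes):.1%} of assumed peak")
```

With that assumed peak, 100 GFLOPS on 4 nodes comes out at roughly 39 % and 470 GFLOPS on 24 nodes at roughly 31 %, in line with the "about 40 %" and "30+ %" stated on the slide.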
Summary • DKRZ provides computing resources for climate research in Germany at an internationally competitive level • The HLRE system architecture is suited to cope with a data-intensive usage profile • Shared filesystems are operational today in heterogeneous system environments • Standardisation efforts for shared filesystems are needed
Thank you for your attention!
Tape transfer rates (2001-2003)
DS transfer requests (2001-2003)
DS archive capacity (2001-2003)