
Status of WDC Climate and long-term archiving at DKRZ

Status of WDC Climate and long-term archiving at DKRZ. Michael Lautenschlager, Hannes Thiemann, Frank Toussaint WDC Climate / Max-Planck-Institute for Meteorology, Hamburg Joachim Biercamp, Ulf Garternicht, Stephan Kindermann, Wolfgang Stahl German Climate Computing Centre (DKRZ) Hamburg



Presentation Transcript


  1. Status of WDC Climate and long-term archiving at DKRZ Michael Lautenschlager, Hannes Thiemann, Frank Toussaint WDC Climate / Max-Planck-Institute for Meteorology, Hamburg Joachim Biercamp, Ulf Garternicht, Stephan Kindermann, Wolfgang Stahl German Climate Computing Centre (DKRZ) Hamburg CAS2K9 September 13th – 16th, 2009 in Annecy, France

  2. HLRE-II Architecture (overview diagram) • StorageTek silos: total capacity 60,000 tapes, approx. 60 PB (LTO and Titan) • HPSS (10 PByte/a), tape name spaces /hpss/arch /hpss/doku /dxul/ut /dxul/utf /dxul/utd; archive access via pftp or xtape: „get /hpss/arch/<prjid>/<myfile>“ (sftp xtape.dkrz.de) • IBM Power6 „blizzard“ (ssh blizzard): 2 x login, 250 x compute nodes, 150 TFlops peak • GPFS (3 PByte) with file systems /work /pf /scratch
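The xtape access path shown on the slide can be sketched as a small helper that builds the sftp batch command. The project ID and file name below are placeholders standing in for the slide's <prjid>/<myfile>, and the actual transfer call is left commented out because it would contact the real server.

```python
import subprocess  # only needed if the command is actually executed

def hpss_get(prjid: str, myfile: str, host: str = "xtape.dkrz.de"):
    """Build the sftp invocation and batch script that fetch one archived
    file, following the slide's "get /hpss/arch/<prjid>/<myfile>" pattern."""
    remote = f"/hpss/arch/{prjid}/{myfile}"
    cmd = ["sftp", "-b", "-", host]  # "-b -": read batch commands from stdin
    batch = f"get {remote}\n"
    return cmd, batch

# Hypothetical project ID and file name, for illustration only:
cmd, batch = hpss_get("xy1234", "exp01_tsurf.nc")
# subprocess.run(cmd, input=batch.encode(), check=True)  # real transfer
```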

  3. Development of long-term data archive • Data production on IBM-P6: 50-60 PB/year • Limit for long-term archiving: 10 PB/year • A complete data catalogue entry in WDCC (metadata) is required, but the decision procedure for transition into the long-term archive is not yet finalised (data storage policy). • Limit for field-based data access: 1 PB/year • Oracle BLOB tables are being replaced by the CERA container file infrastructure developed by DKRZ/M&D

  4. Development of long-term file archive (growth chart, Oct. 2008 to mid-2009; mid-2009: 10 PB)

  5. Development of WDCC (field-based data access) (growth chart, Oct. 2008 to mid-2009; mid-2009: 400 TB)

  6. Transfer of long-term file archive from DXUL/UniTree to HPSS

  7. HLRE2 Data System: HPSS (information on the DKRZ web server) DXUL/UniTree will be replaced by HPSS (High Performance Storage System), and the existing DXUL-administered data, about 9 PetaByte, will be transferred. 6 robot-operated silos with 60,000 slots for T10000 A/B, LTO4, 9940B and 9840C magnetic cartridges provide a primary capacity of 60 PetaByte with 75 tape drives. The average bandwidth of the data server is at least 3 GigaByte/s for simultaneous reading and writing, with peak rates of up to 5 GigaByte/s. 390 TB of Oracle BLOB data have been transferred into CERA container files.
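As a rough consistency check of the quoted figures (decimal units assumed throughout, and pure arithmetic on the slide's numbers rather than anything DKRZ published): the 3 GigaByte/s average bandwidth and the 10 PB/year archiving limit imply the rates below. Note that reading 9 PB off tape in practice is limited by drive count and mount overhead, which is why the migration slides estimate 3-5 years for a full copy rather than weeks.

```python
PB = 10**15  # decimal petabyte, in bytes
GB = 10**9

# Time to stream 9 PB at the average data-server bandwidth of 3 GB/s:
io_seconds = 9 * PB / (3 * GB)
io_days = io_seconds / 86_400          # ~35 days of pure I/O time

# Sustained rate implied by the 10 PB/year long-term archiving limit:
year_s = 365 * 86_400
mb_per_s = 10 * PB / year_s / 10**6    # ~317 MB/s on average
```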

  8. Migration from DXUL to HPSS 9 PB of DXUL/UniTree data had to be transferred to HPSS without copying the data • The 9 PB of DXUL data are stored on 25,000 cartridges in 25 * 10**6 files • It was not feasible to run two systems in parallel for the 3-5 years estimated for copying from DXUL/UniTree to HPSS at DKRZ Challenges of the physical move from Powderhorn (UniTree) into SL8500 (HPSS): • Technical aspects • Legal aspects • Quality assurance
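The slide's migration numbers imply the following averages (a back-of-the-envelope sketch with decimal units assumed; these derived figures are not stated on the slide itself):

```python
PB = 10**15
total_bytes = 9 * PB        # 9 PB of DXUL data
cartridges = 25_000
files = 25 * 10**6

avg_file_mb = total_bytes / files / 10**6        # 360 MB per file
avg_cart_gb = total_bytes / cartridges / 10**9   # 360 GB per cartridge
files_per_cart = files // cartridges             # 1000 files per cartridge
```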

  9. Migration from DXUL to HPSS Challenges of the physical move from Powderhorn (UniTree) into SL8500 (HPSS): • Technical aspects: In principle HPSS can read UniTree cartridges, but this had only been tested on old systems with less complex name spaces (17 name spaces on 3 servers had to be consolidated into 1 HPSS name space) • Legal aspects: An unexpected license problem arose from the proprietary UniTree library data format. The solution was to write the library data information, after consolidation, into one large text file (10 GB). • Quality assurance: complete comparison of metadata and checksum comparison of a 1% subset of the data files The transfer to HPSS has been completed successfully; the new system is up and running with the old data.
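The quality-assurance step (checksum comparison of a 1% sample of the files) could look roughly like the sketch below. The hash choice, the pairing of old and new file lists, and the function names are assumptions for illustration, since the slide does not specify the tooling actually used at DKRZ.

```python
import hashlib
import random

def checksum(path: str) -> str:
    """Checksum one file in 1 MB chunks (MD5 chosen only for illustration)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def sample_and_verify(old_files, new_files, fraction=0.01, seed=0):
    """Pick ~fraction of (old, new) file pairs at random and return
    the pairs whose checksums disagree (empty list means the sample passed)."""
    pairs = list(zip(old_files, new_files))
    k = max(1, int(len(pairs) * fraction))
    random.seed(seed)
    return [(o, n) for o, n in random.sample(pairs, k)
            if checksum(o) != checksum(n)]
```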

  10. 3 of 6 StorageTek SL8500 silos under construction Room for 10,000 magnetic cartridges in each silo

  11. WDCC and HLRE2 • The CERA-2 data model is left unchanged • Metadata model modifications are planned depending on the outcome of the EU project METAFOR and the CIM (Common Information Model) • WDCC metadata still reside in Oracle database tables, which form the searchable data catalogue

  12. CERA-2 Data Model (block diagram, unchanged since 1999): Entry with Contact, Coverage, Reference, Status, Parameter, Spatial Reference, Distribution, Local Adm., Data Org, Data Access • METAFOR / CIM: • Data provenance information • Searchable Earth system model description

  13. WDCC and HLRE2 • Field-based data access is changing from Oracle BLOB data tables to CERA container files for two reasons: • Financial aspect: Oracle license costs for an Internet-accessible database system of PB size are beyond DKRZ's scope.

  14. WDCC and HLRE2 • Technical aspect: The BLOB data concept in the TB and PB range requires seamless data transition between disk and tape in order to keep the RDBMS restartable. This worked for Oracle and UniTree, but it could not be guaranteed for the future by either Oracle or HPSS. • Requirement for the BLOB data replacement: the transfer to CERA container files has to be transparent for CERA-2 and for user data access.

  15. The Earth System Model data matrix is covered by CERA Container Files (matrix: model variables vs. model run time; each column is one data table in CERA-2; 2D: small LOBs (180 KB), 3D: large LOBs (3 MB)) • CERA Container Files • are LOBs plus an index for random data access • are transparent for field-based data access in WDCC • include the basic security mechanisms of Oracle BLOBs
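A minimal sketch of the container-file idea, i.e. "LOBs plus an index for random data access": many small binary objects packed into one file, with a trailing index mapping a key to (offset, length) so a single field can be read back without scanning the whole file. The on-disk layout below is invented for illustration; the real CERA container format is internal to DKRZ/M&D.

```python
import json
import struct

def write_container(path, records):
    """records: dict mapping key -> bytes (one LOB per model field)."""
    index = {}
    with open(path, "wb") as f:
        for key, blob in records.items():
            index[key] = (f.tell(), len(blob))  # remember offset and length
            f.write(blob)
        # Append the index, then its length, so a reader can locate both.
        idx = json.dumps(index).encode()
        f.write(idx)
        f.write(struct.pack(">Q", len(idx)))

def read_field(path, key):
    """Random access to one LOB via the trailing index."""
    with open(path, "rb") as f:
        f.seek(-8, 2)                              # last 8 bytes: index length
        (idx_len,) = struct.unpack(">Q", f.read(8))
        f.seek(-8 - idx_len, 2)                    # jump to the index itself
        index = json.loads(f.read(idx_len))
        offset, length = index[key]
        f.seek(offset)
        return f.read(length)
```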

  16. WDCC and HLRE2 • Motivated by the long-term archive strategy and by scientific applications like CMIP5/AR5, WDCC data access is extended: • CERA Container Files: transparent field-based data access from tapes and disks (substitution of Oracle BLOB data tables) • Oracle BFILEs: transparent file-based data access from disks and tapes • THREDDS Data Server: field-based data access from files on disks (CMIP5/AR5) • Non-transparent data access: URLs provide links to data which are not directly/transparently accessible by WDCC/CERA (e.g. remote data archives)

  17. WDCC Field-based Data Access (architecture diagram): mid-tier with application server, LobServer and a TDS (or the like); storage @ DKRZ: HPSS holding the file archive and the LOB containers; DB layer: CERA catalogue (who, what, when, where, how)

  18. Summary Three major decisions were made in connection with long-term archiving in the transition to HLRE2 and HPSS: • Limitation of annual growth rates • File archive: 10 PB/year • CERA Container Files: 1 PB/year • Development of the CERA Container File infrastructure with emphasis on field-based data access from tapes • Integration of transparent file-based data access into WDCC/CERA in addition to the traditional field-based data access
