
Long-term Archiving of Climate Model Data at WDC Climate and DKRZ

Michael Lautenschlager, WDC Climate / Max-Planck-Institute for Meteorology, Hamburg. Data Management Workshop (Köln, 29.-30.09.09).


Presentation Transcript


  1. Long-term Archiving of Climate Model Data at WDC Climate and DKRZ. Michael Lautenschlager, WDC Climate / Max-Planck-Institute for Meteorology, Hamburg. Data Management Workshop (Köln, 29.-30.09.09)

  2. Structure 2009 • DKRZ: • Earth system model development • Simulations of past, present and future climate • WDC Climate: • Long-term data archiving • Inter-disciplinary data dissemination

  3. Diagram of Climate System

  4. Diagram of the Hamburg IPCC-Climate Model ECHAM5/MPI-OM

  5. Forcing of Climate Projections for IPCC AR4

  6. Near-surface temperature change for the scenarios A1B and B1. Shown is the difference between the 30-year means 2071-2100 and 1961-1990.

  7. Comparison of the present-day sea ice cover in March and September (top) with the climate projection for the scenario A1B (bottom) in 2100. Snow cover over land is also shown.

  8. HLRE-II Architecture (http://www.dkrz.de/dkrz/about/hardware) • StorageTek silos: total capacity 60000 tapes, approx. 60 PB (LTO and Titan) • HPSS (10 PByte/a) • IBM Power6 (blizzard): 2 x login, 250 x compute, 150 TFlops peak • GPFS (3 PByte) • Access: ssh blizzard; pftp; xtape: "get /hpss/arch/<prjid>/<myfile>" (sftp xtape.dkrz.de) • tape: /hpss/arch /hpss/doku /dxul/ut /dxul/utf /dxul/utd • blizzard: /work /pf /scratch

  9. Development of data archive at DKRZ (German Climate Computing Centre) • Data production on IBM-P6: 50 PB/year • Limit for mass storage archive (HPSS): 10 PB/year • Scientific project data archive with expiration date • Limit for long-term data archive (WDCC): 1 PB/year • A complete data catalogue entry in WDCC (metadata) is required • The decision procedure for the transition to the long-term archive is not yet fully implemented (data storage policy). • Accessible via WDCC infrastructure • Searchable data catalogue (GUI) • Field-based and file-based data access (Internet) • Storage time period: at least 10 years (no expiration date)
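
The three rates on this slide imply a steep funnel from production to long-term storage. The following is a minimal sketch of that arithmetic, using only the figures quoted on the slide; the dictionary keys and the function name are illustrative and not part of any DKRZ tooling.

```python
# Annual data rates quoted on the slide, in PB/year.
RATES_PB_PER_YEAR = {
    "production (IBM-P6)": 50.0,
    "mass storage archive (HPSS)": 10.0,
    "long-term archive (WDCC)": 1.0,
}


def share_of_production(rates):
    """Return each tier's rate as a fraction of the production rate."""
    produced = rates["production (IBM-P6)"]
    return {tier: rate / produced for tier, rate in rates.items()}


shares = share_of_production(RATES_PB_PER_YEAR)
for tier, share in shares.items():
    print(f"{tier}: {share:.0%} of produced volume")
```

Under these limits, roughly one fifth of the produced volume can enter the HPSS archive and about one fiftieth can enter the long-term archive, which is why a decision procedure for the transition matters.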

  10. Development of mass storage archive (chart; labels: Oct. 2008; mid-2009: 10 PB)

  11. Data documentation requirements are met by using the WDCC infrastructure • CERA-2 metadata model developed in 1999 • Catalogue interface: cera.wdc-climate.de • Input interface: input.wdc-climate.de • CERA-2 metadata content is complete with respect to browsing, discovering and using climate data which are stored in the database system or outside it in flat files • The WDCC matches international description standards like ISO 19115, Dublin Core or GCMD and is integrated in international data federations • The data storage structure provides field-based storage of climate time series per variable in database tables. This allows for web-based data catalogue search and data access in small data granules.

  12. CERA Data Model (diagram; blocks: Contact, Coverage, Reference, Entry, Status, Parameter, Spatial Reference, Distribution, Local Adm., Data Org, Data Access)

  13. Coloured columns correspond to BLOB data tables in WDCC. Collections of matrix rows represent storage in model raw data files (complete model output, storage time step by storage time step).
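
The row/column picture on this slide can be sketched as a transpose. The example below is a toy illustration only: the variable names and values are made up, and it says nothing about how the WDCC actually loads its Oracle BLOB tables. It shows how records written per time step (file-based, rows) regroup into one time series per variable (field-based, columns).

```python
# Hypothetical toy model output: one row per storage time step,
# one column per variable (names and values are made up for illustration).
VARIABLES = ["temperature", "precipitation", "sea_ice"]
raw_files = [
    [280.1, 2.3, 0.15],  # time step 0: one raw file holds all variables
    [280.4, 1.9, 0.14],  # time step 1
    [280.9, 2.7, 0.12],  # time step 2
]


def to_field_based(rows, variables):
    """Regroup row-wise (per time step) records into per-variable time series."""
    columns = zip(*rows)  # transpose: rows become columns
    return {name: list(series) for name, series in zip(variables, columns)}


tables = to_field_based(raw_files, VARIABLES)
print(tables["sea_ice"])  # [0.15, 0.14, 0.12]
```

With the field-based layout, a request for one variable touches only that variable's series instead of every raw file.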

  14. WDCC Development. Future annual growth rate: 1 PB/year

  15. WDCC Users (authorised for data download) 2008

  16. WDCC Data Downloads in 2008

  17. WDCC / CERA: General Statistics at 30-09-2009 00:00:10 • Database Size (TByte): 404 • Number of blobs: 8194476663 (8.2 billion) • Number of experiments: 1378 • Number of datasets: 165376 • Total size divided by number of BLOBs gives the average size of data access granules: about 50 kB/BLOB (field-based data access)
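
The granule size can be checked from the two figures on the slide. This is a small sketch of the division, assuming decimal units (1 TByte = 10^12 bytes, 1 kB = 1000 bytes); the slide does not say whether decimal or binary units were meant, and the function name is illustrative.

```python
DATABASE_SIZE_TBYTE = 404          # from the slide
NUMBER_OF_BLOBS = 8_194_476_663    # from the slide


def average_blob_size_kb(size_tbyte, n_blobs, bytes_per_tbyte=10**12):
    """Average BLOB size in kB (1 kB = 1000 bytes, an assumption)."""
    return size_tbyte * bytes_per_tbyte / n_blobs / 1000


avg_kb = average_blob_size_kb(DATABASE_SIZE_TBYTE, NUMBER_OF_BLOBS)
print(f"{avg_kb:.1f} kB per BLOB")
```

The result is a little under 50 kB, consistent with the rounded figure quoted on the slide.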

  18. WDCC Content: Data from Earth System Modelling and Related Observations (diagram; labels: IPCC, WOCE, GEBCO, BALTEX, HOAPS, CEOP, COSMOS, CARIBIC, Regional Climate Scenarios IPCC-AR4 (CCLM + REMO), EH5/MPI-OM IPCC-AR4, ERA15/40, ERA40, NCEP, Simulations @ MPI, GKSS,…)

  19. Oracle BLOB-DB: data access via HTTP and Java API

  20. WDCC Catalogue search and data access interface (URL: cera.wdc-climate.de). Access to 97 model experiments

  21. WDCC Project-based Data Access (IPCC AR4 Hamburg, Results from Introduction)

  22. WDCC major accomplishments • Offering many TB of data through a standard web-browser interface and a Java API for direct data download. • Entering the interdisciplinary e-science environment through the primary data publication service. • Independent data entities of more general interest are placed in library catalogues so that they are searchable with, and citable in, classical scientific literature. • WDCC has more than 50 data entities registered in TIB-Order, which are connected to approx. 1.5 TB of data volume. • Networking with other topic-related WDCs and long-term data archives. • German WDC Cluster Earth System Research (WDC MARE, WDC RSAT and WDCC) • Data sharing with British Atmospheric Data Centre (BADC) • Offering data management services to scientific research projects for long-term archiving and dissemination of research results

  23. Primary data publication service • Following the STD-DOI concept (Scientific and Technical Data – Digital Object Identifier, URL: www.std-doi.de) • Important aspects of the publication process are • The identification of independent data entities which are suitable for publication at the level of scientific literature, • The execution of an elaborate review process for metadata and climate data (quality control), • The assignment of additional metadata for electronic publication (ISO 690-2) and of persistent identifiers (DOI / URN) and • The integration of publication metadata and persistent identifiers into the TIB-Order library catalogue (German National Library of Science and Technology, Hannover) so that primary data entities are searchable and citable together with scientific literature. • The quality characteristic is presently "approved by author"; it could become "peer reviewed" with ESSD (Earth System Science Data Journal). • Published data entities can no longer be modified. • They are freely available via the Internet.

  24. TIB WDCC

  25. Data infrastructure integrates data stewardship in the long-term archive • Bit-stream preservation • Quality assurance • Usability enabling

  26. Long-term archive data stewardship • Bit-stream preservation • Secondary tape copies on different tapes and technology at a separate location • Copy to new tapes after the maximum number of tape accesses is reached (refreshment) • Quality assurance • Semantic examinations: behaviour of a numerical model compared to observations and to other models, part of the scientific evaluation process • Syntactic examinations: formal aspects of data archiving and assurance that data archiving is free of errors as far as possible • Consistency between metadata and climate data • Completeness of climate data • Standard range of values • Spatial and temporal data arrangement
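
The syntactic examinations listed on this slide can be pictured as simple formal checks on one variable's time series. The sketch below is an assumption about what such checks might look like, not the WDCC's quality-control procedure; the function name, the parameters and the sample values are hypothetical.

```python
import math


def syntactic_checks(values, expected_count, valid_range):
    """Run simple formal checks on one variable's time series.

    values          -- the archived data values
    expected_count  -- number of values the metadata says should exist
    valid_range     -- (low, high) standard range for this variable
    """
    low, high = valid_range
    # Treat None and NaN as missing values.
    present = [v for v in values if v is not None and not math.isnan(v)]
    return {
        "consistent_with_metadata": len(values) == expected_count,
        "complete": len(present) == len(values),
        "within_standard_range": all(low <= v <= high for v in present),
    }


# Hypothetical near-surface temperatures in kelvin; one value is out of range.
report = syntactic_checks([271.3, 285.0, 999.0], expected_count=3,
                          valid_range=(150.0, 350.0))
print(report)
```

Checks of this kind are purely formal; the semantic examinations on the slide (comparison with observations and other models) remain part of the scientific evaluation.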

  27. Long-term archive data stewardship (continued) • Usability enabling • Complete and searchable documentation of climate data entities (database tables and flat files) in the catalogue system of the WDCC • WDCC offers web-based data access to small data granules (individual entries in BLOB DB tables) • Archive technology transfer must be backward compatible to keep old data technically readable • Data processing tools and data format access libraries must be migrated to new architectures

  28. Summary of long-term archiving services at WDCC/DKRZ: • Long-term data storage at WDCC/DKRZ is thematically focused on Earth system research (modelling and related observations) • WDCC provides a fully documented data archive including a web-based searchable data catalogue and web-based data access • WDCC supports field-based data access including server-side data processing (extraction of geographical regions and single time steps, format conversion) • WDCC is integrated in national (WDC-Cluster Germany, C3-Grid) and international data federations (IPCC AR5). • WDCC/DKRZ offer, within the existing infrastructure, long-term data storage for topic-related external data entities on a net-cost basis.
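
The server-side processing mentioned in the summary (extraction of a geographical region and of a single time step) can be illustrated with a small subsetting routine. This is only a sketch under stated assumptions: the grid, the values and the function name are hypothetical, and it does not reflect the WDCC's actual server code or API.

```python
def extract_region(field, lats, lons, lat_bounds, lon_bounds):
    """Cut a rectangular region out of one 2-D field (rows = lats, cols = lons)."""
    lat_min, lat_max = lat_bounds
    lon_min, lon_max = lon_bounds
    rows = [i for i, lat in enumerate(lats) if lat_min <= lat <= lat_max]
    cols = [j for j, lon in enumerate(lons) if lon_min <= lon <= lon_max]
    return [[field[i][j] for j in cols] for i in rows]


# Hypothetical data: 2 time steps on a 3 x 4 latitude/longitude grid.
lats = [40.0, 50.0, 60.0]
lons = [0.0, 10.0, 20.0, 30.0]
time_series = [
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]],
]

single_step = time_series[1]  # select a single time step
subset = extract_region(single_step, lats, lons, (45.0, 65.0), (5.0, 25.0))
print(subset)  # [[18, 19], [22, 23]]
```

Doing this on the server means only the requested granule travels over the network, which fits the small-granule access model described in the earlier slides.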
