World Data Center Climate: Status and Portal Integration. Michael Lautenschlager, Hannes Thiemann and Frank Toussaint ICSU World Data Center Climate Model and Data / Max-Planck-Institute for Meteorology Hamburg, Germany. GO-ESSP at LLNL Livermore, June 19th – 21st, 2006.
WDCC Home: www.wdcc-climate.de / WDCC Contact: data@dkrz.de
Content:
• WDCC Status
• CERA Concept
• Portal Integration
WDCC Content June 2006: 590 experiments / 79,000 data sets
Data from Earth system modelling and related observations
Projects and data collections: IPCC, WOCE, GEBCO, BALTEX, HOAPS, CEOP, COSMOS, CARIBIC, EH5/MPI-OM IPCC-AR4, ERA15/40, NCEP, simulations at MPI, GKSS, …
Start: Approved in January 2003
Maintenance: Model and Data (M&D/MPI-M) and German Climate Computing Centre (DKRZ)
Data Export from WDC Climate: corresponds to 2 – 10 TB/month
Geographical Distribution of WDCC Users
Total number of registered users: 750 (May 2006)
Data Import into WDC Climate
• 6 × 10^9 BLOBs
• ECHAM5/MPI-OM IPCC AR4 scenarios (approx. 110 TB)
CERA Concept: Semantic Data Management
(CERA: Climate and Environmental data Retrieval and Archiving)
(I) Data catalogue and pointer to Unix files
• Enables search and identification of data
• Allows access to data as they are (coarse-granularity raw data files)
(II) Application-oriented data storage in BLOB tables
• Time series of individual variables are stored as BLOB entries in DB tables (fine-granularity data products)
• Allows fast and selective data access
• Storage in standard data formats (GRIB, NetCDF/CF)
• Allows application of standard data processing routines (PINGOs, CDOs)
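The two storage levels can be sketched in a few lines. This is only an illustration under stated assumptions: SQLite stands in for the CERA database, the table names, column names and file path are hypothetical, and plain packed floats stand in for the GRIB or NetCDF/CF records the slide mentions.

```python
import sqlite3
import struct


def pack_field(values):
    """Serialise one flattened 2D field as little-endian 32-bit floats."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_field(blob):
    """Inverse of pack_field: bytes back to a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


conn = sqlite3.connect(":memory:")

# Level (I): catalogue entry that points at a raw file (hypothetical schema).
conn.execute(
    "CREATE TABLE dataset (dataset_id INTEGER PRIMARY KEY, name TEXT, raw_file TEXT)"
)
# Level (II): one BLOB table per variable, one row per time step.
conn.execute(
    "CREATE TABLE blob_temp2 ("
    "dataset_id INTEGER, time_step INTEGER, field BLOB, "
    "PRIMARY KEY (dataset_id, time_step))"
)

conn.execute(
    "INSERT INTO dataset VALUES (1, 'EXAMPLE_TEMP2', '/archive/example/run1.grb')"
)
for step in range(4):
    field = [float(step)] * 6  # stand-in for a tiny 2x3 grid
    conn.execute(
        "INSERT INTO blob_temp2 VALUES (?, ?, ?)", (1, step, pack_field(field))
    )


def read_steps(connection, dataset_id, first, last):
    """Selective access: fetch only the requested time steps of one variable."""
    rows = connection.execute(
        "SELECT time_step, field FROM blob_temp2 "
        "WHERE dataset_id = ? AND time_step BETWEEN ? AND ? "
        "ORDER BY time_step",
        (dataset_id, first, last),
    )
    return [(t, unpack_field(b)) for t, b in rows]


selected = read_steps(conn, 1, 1, 2)
```

Reading time steps 1 to 2 touches only two rows of one table, whereas the raw-file route would mean retrieving and scanning a whole model output file.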
WDCC Data Topology
• Experiment Description: pointer to Unix files
• Dataset 1 Description … Dataset n Description: each with its BLOB Data Table
Level 1 interface: metadata entries (XML, ASCII) + data files
Level 2 interface: separate files containing BLOB table data in an application-adapted structure (time series of single variables)
A BLOB DB table corresponds to a scalable, virtual file at the operating system level.
CERA Data Model
Blocks: Contact, Coverage, Reference, Entry, Status, Parameter, Spatial Reference, Distribution, Local Adm., Data Org, Data Access
Data matrix of model experiment
Axes: model variables, model run time
Raw data file in DKRZ archive: direct model output (1.3 – 16.2 GB)
2D: small BLOBs (180 KB)
3D: large BLOBs (3 MB)
Each column is one BLOB table in CERA-DB
Climate Model Data Structures
Original data structure (4-D); application-related data structure (2-D)
Preferred DB storage structure for web-based access:
• single variable
• single level
• time series of 2D gridded data records
• Formats: GRIB-1 – NetCDF/CF (- GRIB-2)
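The step from the original 4-D model output to the preferred storage structure amounts to a regrouping: one time series of 2D grids per variable and level. A minimal sketch, assuming the raw output is available as one nested mapping per time step (the function name, variable names and toy grids are hypothetical):

```python
from collections import defaultdict


def split_into_time_series(raw_steps):
    """Regroup per-time-step model output into per-variable, per-level series.

    raw_steps: list with one entry per time step, each of the form
               {variable: {level: 2D grid}}
    returns:   {(variable, level): [grid at t0, grid at t1, ...]}
    """
    series = defaultdict(list)
    for step in raw_steps:
        for variable, levels in step.items():
            for level, grid in levels.items():
                series[(variable, level)].append(grid)
    return dict(series)


# Toy input: 3 time steps, two variables, 2x2 grids.
raw_steps = [
    {
        "temp": {
            850: [[t, t], [t, t]],
            500: [[t + 10, t + 10], [t + 10, t + 10]],
        },
        "precip": {0: [[t * 2, t * 2], [t * 2, t * 2]]},
    }
    for t in range(3)
]

series = split_into_time_series(raw_steps)
```

Each key of `series` would map to one time series of 2D records, so a request for a single variable on a single level never has to read the other variables or levels.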
DKRZ Architecture TX7: Intel Itanium-2 with Linux
Portal Integration
Two strategies:
• One-way integration: discovery and use metadata are integrated in a central data portal in one step
  Example: C3Grid data catalogue (refer to the presentation by Heinrich Widmann)
• Two-way integration: discovery metadata are integrated in a central data portal; use metadata are extracted from the remote archive when they are needed for data download and processing
  Examples: Primary data publication in TIB library catalogue (STD-DOI); WDCC integration in NDG (NERC Data Grid)
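The difference between the two strategies is when the use metadata reach the portal. The sketch below is purely illustrative: the class names, method names and the stand-in archive lookup are assumptions, not part of C3Grid, STD-DOI or NDG.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CatalogueEntry:
    discovery: dict             # what a user searches on
    use: Optional[dict] = None  # what is needed for download and processing


class OneWayPortal:
    """Discovery and use metadata are both ingested in one step."""

    def __init__(self) -> None:
        self.entries: Dict[str, CatalogueEntry] = {}

    def ingest(self, dataset_id: str, discovery: dict, use: dict) -> None:
        self.entries[dataset_id] = CatalogueEntry(discovery, use)

    def use_metadata(self, dataset_id: str) -> dict:
        return self.entries[dataset_id].use


class TwoWayPortal:
    """Only discovery metadata are ingested; use metadata are requested
    from the remote archive at the moment they are needed."""

    def __init__(self, fetch_use_from_archive: Callable[[str], dict]) -> None:
        self.entries: Dict[str, CatalogueEntry] = {}
        self._fetch = fetch_use_from_archive

    def ingest(self, dataset_id: str, discovery: dict) -> None:
        self.entries[dataset_id] = CatalogueEntry(discovery)

    def use_metadata(self, dataset_id: str) -> dict:
        return self._fetch(dataset_id)


archive_calls = []


def fake_archive_lookup(dataset_id: str) -> dict:
    """Stand-in for a call back to the remote archive (hypothetical)."""
    archive_calls.append(dataset_id)
    return {"format": "GRIB-1", "access": "download after login"}


one_way = OneWayPortal()
one_way.ingest("exp1", {"title": "Example run"}, fake_archive_lookup("exp1"))
archive_calls.clear()  # count only calls made by the two-way portal below

two_way = TwoWayPortal(fake_archive_lookup)
two_way.ingest("exp1", {"title": "Example run"})
calls_after_ingest = list(archive_calls)
use = two_way.use_metadata("exp1")
```

In the two-way case the portal stays small and the archive remains the authority for access details, at the cost of one extra round trip per download.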
Primary Data Publication (STD-DOI)
URL: http://www.std-doi.de/
Primary Data Publication Process
Data Review
ISO 690-2: Metadata for citation of electronic media
DOI URN
Ident.-DOI
The data retrieval procedure is given at the end (user identification is required)
WDCC Metadata and OAI-PMH
Open Archives Initiative Protocol for Metadata Harvesting
WDCC OAI server at:
• http://uranus.dkrz.de:8080/oai/provider
• (Software: dlese (www.dlese.org) + apache-tomcat 5.5.12 + Java 1.5)
• 35 IPCC experiments with more than 11,000 datasets
  Metadata format: ISO 19115
  C3Grid (http://gsphere.awi.de:8080/gridsphere/gridsphere)
• 40 STD-DOI experiments with more than 1,700 datasets
  Metadata format: DIF
  GO-ESSP (NDG, http://ndg.badc.rl.ac.uk/)
NDG OAI Harvesting (Pull or Notification)
[Diagram labels: DIF XMLs, WDCC OAI Server (Software: dlese) at WDCC; DIF XMLs, OAI Server 2 at Provider 2; OAI Server n; OAI Client NDG (dlese); Catalog NDG, record 1...n; Discovery Portal NDG; Process; Delivery]
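The pull side of such a harvest can be sketched with the standard OAI-PMH `ListRecords` verb. The base URL is the WDCC provider address given on the previous slide; the `metadataPrefix` value `dif`, the record identifiers and the sample response are assumptions for illustration, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
BASE_URL = "http://uranus.dkrz.de:8080/oai/provider"


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH the resumptionToken is an exclusive argument: a follow-up
    request carries only the verb and the token.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hand-written sample response (not real WDCC output).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url(BASE_URL, metadata_prefix="dif")
identifiers, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url(BASE_URL, resumption_token=token)
```

A harvester would repeat the request with each returned token until the response carries an empty or missing `resumptionToken`, then load the collected records into its catalogue.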
URL: http://glue.badc.rl.ac.uk/discovery/ Keyword: ECHAM4