Explore the essential components of the C3Grid project within the German Collaborative Climate Community Data and Processing Grid. Learn about data analysis workflows, architecture, metadata, discovery techniques, and access methods. Discover how the C3Grid infrastructure supports climate analysis workflows through stepwise integration of grid technologies.
Data Discovery and Basic Processing within the German Collaborative Climate Community Data and Processing Grid (C3Grid) Project
Heinrich Widmann and Stephan Kindermann
Model and Data / DKRZ / Max-Planck-Institute for Meteorology, Hamburg, Germany
GO-ESSP at LLNL, Livermore, June 19th – 21st, 2006
C3Grid Home: www.c3grid.de
Overview
• C3Grid Background
• Data Analysis Workflows
• C3Grid Architecture and Interfaces
• Data Discovery and Metadata in C3Grid
• Data Information Service with Lucene
• Data Access and Preprocessing
• Summary
C3Grid Background
• C3Grid
  • Status: month 10 of 36 (phase 1)
  • is the earth system science community grid within the German D-Grid initiative
  • D-Grid includes five further community grid projects (AstroGrid, HEP-Grid, InGrid, MediGrid, TextGrid)
  • is a community-driven grid
• The goal is to develop a grid infrastructure appropriate for typical climate analysis workflows
• Stepwise introduction and integration
C3Grid Data Analysis Workflow
[Figure: workflow requirements set against the grid technologies used; labels listed in slide order]
Requirements:
• Metadata
• Discovery
• Data access (+ preprocessing)
• Security
• Scheduling
• Complex processing
Grid technologies:
• ISO 19115 / ISO 19139
• OAI-PMH + Lucene
• Community webservice
• Shibboleth
• Globus Toolkit 4
• WS-GRAM
C3Grid Architecture and Interfaces
[Figure: architecture diagram with two highlighted areas: Data Discovery; Data Access and Basic Processing]
C3Grid Data Discovery and Data Access
[Figure: data discovery and data access diagram; recoverable labels follow]
• Portal: discovery, workflow composition
• C3 metadata catalog: ISO 19115 / 19139, OAI harvester, OAI-PMH
• Data request, passed on through Scheduling and the Data Management Service, carrying:
  • oids
  • time/space constraints
  • processing constraints
• Resource provider: web server / OAI provider (metadata; Prop. Xml, Prop. Rel.), Data Access Web Service, preprocessing, DB, files
• Providers named: World Data Centers (Climate, Mare, RSAT), DWD, PIK, IFM-Geomar, ..
• Grid infrastructure: job submission via WS-GRAM, analysis job, data, workspaces
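The harvesting step named on this slide can be sketched as follows. This is a minimal illustration of how an OAI-PMH `ListRecords` request URL is put together; it is not the C3Grid harvester. The provider URL and the `iso19139` metadata prefix are assumptions (each repository defines its own prefixes), and the function name is hypothetical. No network access takes place.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_list_records_url(base_url, metadata_prefix, from_date=None,
                           until_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument: it replaces
        # metadataPrefix and the date filters when continuing a partial list.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date    # selective harvesting: lower bound
        if until_date:
            params["until"] = until_date  # selective harvesting: upper bound
    return base_url + "?" + urlencode(params)


# Assumed provider endpoint and prefix, for illustration only.
EXAMPLE_URL = build_list_records_url(
    "https://provider.example.org/oai", "iso19139", from_date="2006-01-01"
)
```

A harvester would fetch such a URL, store the returned records, and repeat the request with the returned resumption token until the provider sends no further token.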
C3 ISO 19139 Metadata “Profile”
[Figure: metadata profile diagram; recoverable labels follow]
Data items (gridded data), described by a metadata record of the form:
  <MD_Metadata … "http://www.isotc211.org/xxx">
    <fileIdentifier ../>
    <resourceConstraints ../>
    <extent … spatial+temporal bounding box .. />
    <contentInfo ..> <attributeDescription ../>
    <distributionInfo ..>
    <DS_Series> <composed_of> <composed_of>
  </MD_Metadata>
• Raw experiment data: 3D, multi-variable files (archive, <MD_Metadata …. >)
• Post-processing
• Postprocessed experiment data: 2D, single-variable time series (database, <MD_Metadata …. >)
• Metadata database; “implicit” metadata
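To show how a catalog might read the fields named above, here is a minimal sketch that pulls the file identifier and the geographic bounding box out of a record. The sample record is a hand-written, heavily simplified fragment using the common ISO 19139 `gmd`/`gco` namespaces; it is an assumption for illustration and not the C3Grid profile, and the identifier and coordinates are made up.

```python
import xml.etree.ElementTree as ET

# Namespaces commonly used by ISO 19139 encodings.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical, simplified record (not a validated ISO 19139 document).
SAMPLE_RECORD = """\
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier>
    <gco:CharacterString>example.org:experiment-001</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:identificationInfo><gmd:MD_DataIdentification><gmd:extent><gmd:EX_Extent>
    <gmd:geographicElement><gmd:EX_GeographicBoundingBox>
      <gmd:westBoundLongitude><gco:Decimal>-10.0</gco:Decimal></gmd:westBoundLongitude>
      <gmd:eastBoundLongitude><gco:Decimal>30.0</gco:Decimal></gmd:eastBoundLongitude>
      <gmd:southBoundLatitude><gco:Decimal>35.0</gco:Decimal></gmd:southBoundLatitude>
      <gmd:northBoundLatitude><gco:Decimal>70.0</gco:Decimal></gmd:northBoundLatitude>
    </gmd:EX_GeographicBoundingBox></gmd:geographicElement>
  </gmd:EX_Extent></gmd:extent></gmd:MD_DataIdentification></gmd:identificationInfo>
</gmd:MD_Metadata>
"""


def extract_summary(xml_text):
    """Return the file identifier and bounding box of one metadata record."""
    root = ET.fromstring(xml_text)
    identifier = root.findtext(
        "gmd:fileIdentifier/gco:CharacterString", default="", namespaces=NS
    ).strip()
    bbox = None
    box = root.find(".//gmd:EX_GeographicBoundingBox", NS)
    if box is not None:
        # Map short keys to the four bounding-box element names.
        edges = {
            "west": "westBoundLongitude",
            "east": "eastBoundLongitude",
            "south": "southBoundLatitude",
            "north": "northBoundLatitude",
        }
        bbox = {
            key: float(box.findtext(f"gmd:{tag}/gco:Decimal", namespaces=NS))
            for key, tag in edges.items()
        }
    return {"file_identifier": identifier, "bbox": bbox}


SUMMARY = extract_summary(SAMPLE_RECORD)
```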
C3Grid Data Information Service with Lucene
[Figure: Data Information Service (DIS) diagram; recoverable labels follow. Credit: T. Langhammer, ZIB, Berlin]
• Portal
• Webserver: Apache Axis + servlet container
• DIS: web service frontend, harvesting backend
• Apache Lucene: inverted index, indexing of selected fields, full-text index
• Cache for ISO 19139 documents (<MD_Metadata>...</MD_Metadata>)
• OAI-PMH harvesting from the archives: Pangaea, CERA
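The inverted index named on this slide can be illustrated with a small plain-Python sketch. Apache Lucene provides this internally with far more machinery (analyzers, scoring, on-disk segments); the code below does not use Lucene's API and only shows the underlying idea of indexing selected fields. The record contents, field names, and identifiers are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(records, fields):
    """Map each term found in the selected fields to the set of record ids."""
    index = defaultdict(set)
    for doc_id, record in records.items():
        for field in fields:
            for term in tokenize(record.get(field, "")):
                index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of records containing every term of the query (AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical harvested records, reduced to two text fields.
RECORDS = {
    "oid-1": {"title": "Monthly mean air temperature",
              "abstract": "2D single variable time series"},
    "oid-2": {"title": "Raw experiment output",
              "abstract": "3D multi variable files including air temperature"},
}
INDEX = build_inverted_index(RECORDS, fields=["title", "abstract"])
```

With this data, a query for `air temperature` matches both records, while `monthly temperature` matches only `oid-1`.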
C3Grid Data Access and Preprocessing
• Data access interface
  • Community-specific webservice (WSDL)
  • The existing solutions of the individual institutes will be adapted to support the webservice
    • e.g. triggering of local data processing tools
  • Supports database and file-based storage types
• More detailed usage metadata will be delivered together with the data during the extraction process
C3Grid Data Access/Preprocessing Interface
The stage file webservice request contains:
• ObjectList of OIDs requested
• CFList of standard names
• Space constraints
• Time constraints
• Target directory
• File format, e.g. netCDF or GRIB
• …
[Figure: request flow; labels: SOAP-XML StageFile request; Data Access Web service; mapping of CF standard names to local variable names; constraints determine the necessary processing; DB access; CDO processing; files; data]
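The request contents listed above can be sketched as a small data structure. The slides do not show the WSDL, so every element and attribute name below is hypothetical, and the code serializes only a plain XML payload rather than a full SOAP envelope. The OIDs, the target directory, and the default file format are likewise assumptions.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class StageFileRequest:
    """Hypothetical model of the fields named on the slide."""
    object_ids: list          # ObjectList of OIDs requested
    cf_standard_names: list   # CFList of standard names
    bbox: tuple               # space constraints: (west, south, east, north)
    time_range: tuple         # time constraints: (start, end), ISO 8601 dates
    target_directory: str
    file_format: str = "netCDF"

    def to_xml(self):
        """Serialize the request to XML (element names are assumptions)."""
        root = ET.Element("StageFileRequest")
        object_list = ET.SubElement(root, "ObjectList")
        for oid in self.object_ids:
            ET.SubElement(object_list, "OID").text = oid
        cf_list = ET.SubElement(root, "CFList")
        for name in self.cf_standard_names:
            ET.SubElement(cf_list, "StandardName").text = name
        space = ET.SubElement(root, "SpaceConstraints")
        for key, value in zip(("west", "south", "east", "north"), self.bbox):
            space.set(key, str(value))
        ET.SubElement(root, "TimeConstraints",
                      start=self.time_range[0], end=self.time_range[1])
        ET.SubElement(root, "TargetDirectory").text = self.target_directory
        ET.SubElement(root, "FileFormat").text = self.file_format
        return ET.tostring(root, encoding="unicode")


# Example request with made-up identifiers and workspace path.
EXAMPLE_REQUEST = StageFileRequest(
    object_ids=["oid-1", "oid-2"],
    cf_standard_names=["air_temperature"],
    bbox=(-10.0, 35.0, 30.0, 70.0),
    time_range=("2000-01-01", "2000-12-31"),
    target_directory="/workspace/job-42",
)
```

On the provider side, a service receiving such a request would map the CF standard names to its local variable names and derive the necessary processing from the constraints before writing files to the target directory.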
Summary
• Grid development is application-driven
• Discovery is based on
  • an ISO 19115/19139 based metadata catalog
  • a hierarchical, two-level metadata scheme
  • text-based search in the catalog
• Data access is implemented by
  • a proprietary C3Grid data access interface (webservice)
  • part of the usage metadata is provided along with the extracted data
C3Grid Architecture
[Figure: overall architecture diagram; recoverable labels follow]
• Distributed Grid Infrastructure
  • GT4 based
  • new Metadata-Service
• User; User Interface: API (Web Services), GUI; Monitoring, Job Submission, Search
• Workflow Scheduler, DMS (global), Matchmaking, DIS, Resource Information Service
• Staging, Data Transfer Service, Harvesting, Task Execution
• Site (C3Grid components): OAI / WS, File Management, DMS (local), Resource Scheduler, Archive Interface, Grid Workspace, Available Resources
• Data: Base Data & Meta Data, Pre-Proc Data, Job Meta Data, DBMS/File
• Distributed Data Archives; Distributed Processing Resources