GFDL Data Portal

GFDL Data Portal: Current Status, Achievements and Future Development. K. Dixon, V. Balaji, S. Nikonov, GFDL, Princeton. NOAATECH-2006. History: the Data Portal was launched in 1995 as a simple FTP server. The idea and the term “Data Portal” arose 3 years ago.


Presentation Transcript


  1. GFDL Data Portal: Current Status, Achievements and Future Development
     K. Dixon, V. Balaji, S. Nikonov
     GFDL, Princeton
     NOAATECH-2006

  2. History
     • The Data Portal was launched in 1995 as a simple FTP server.
     • The idea and the term “Data Portal” arose 3 years ago.
     • Originally it served data in response to occasional requests.
     • Now the main assets are IPCC data.

  3. Common technical characteristics: Software
     • Red Hat Linux
     • Apache Web Server
     • DODS Aggregation Server
     • THREDDS
     • LAS Server
     • GrADS-DODS

  4. Hardware
     • Dell PowerEdge 2650 machine
     • Dual-processor Intel Xeon, 2.4 GHz
     • 3 GB RAM
     • 7 Dell PowerVault 220S enclosures with 14 hard drives each, 19 TB total (expansion to 35 TB pending)
     • Network bandwidth: Internet 9 Mbit/s, Internet2 100 Mbit/s

  5. Web Site Structure

  6. Basic Metadata
     • Model description
     • Experiment description
     • Institution
     • Extra metadata for handling tripolar grids (including Ferret scripts for their visualization)
     • Metadata complies with the CF standard
     • Metadata accompanies each data file

  7. Basic features of the GFDL LAS server
     • Dynamic data presentation chosen by the user
     • Spatial/time subsampling with metadata included
     • Defining new variables on the fly, calculated by a given formula
     • Ferret visualization

  8. General Statistics, 01-Oct-2004 to 01-Oct-2005
     • Total amount of CM2 Climate Model Data: 12 TB
     • More than 10,000 NetCDF files; average file size: 1 GB
     • Successful requests: ~62,000
     • Average successful requests per day: ~200
     • Distinct files requested: 5,000
     • Distinct hosts served: ~850
     • Data transferred: 15 TB
     • Average data transferred per day: ~42 GB
     • Number of journal articles submitted that include analyses of GFDL CM2 model output: > 100
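The per-day averages on this slide can be cross-checked against the yearly totals. The snippet below is a minimal sketch of that arithmetic; the 365-day period and the binary unit convention (1 TB = 1024 GB) are assumptions, since the slide states neither, and the variable names are illustrative only.

```python
# Cross-check of the per-day averages quoted for 01-Oct-2004 to 01-Oct-2005.
# Assumptions (not stated in the slides): 365 days, 1 TB = 1024 GB.
DAYS = 365
GB_PER_TB = 1024

total_requests = 62_000      # "Successful requests: ~62,000"
data_transferred_tb = 15     # "Data transferred: 15 TB"
archive_tb = 12              # "Total amount of CM2 Climate Model Data: 12 TB"
file_count = 10_000          # "More than 10,000 NetCDF files"

requests_per_day = total_requests / DAYS
gb_per_day = data_transferred_tb * GB_PER_TB / DAYS
avg_file_gb = archive_tb * GB_PER_TB / file_count

print(f"requests/day: {requests_per_day:.0f}")        # requests/day: 170
print(f"GB/day: {gb_per_day:.1f}")                    # GB/day: 42.1
print(f"average file size: {avg_file_gb:.2f} GB")     # average file size: 1.23 GB
```

Under these assumptions the transfer rate agrees with the quoted ~42 GB per day and the mean file size with the quoted 1 GB. The request rate comes out nearer 170 per day than the quoted ~200, so that figure is presumably a coarser rounding or was computed over a different number of days.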

  9. Current standard procedure of publishing data
     • Climate Model Output Rewriter (CMOR) processing
         • configured manually for different models, experiments, and variables
         • triggered manually
     • Quality Control
         • performed by a scientist; includes checking metadata, time ranges, value ranges, etc.
     • Splitting CMORized, QC-ed data into small (<2 GB) NetCDF files and pushing them outside the firewall to the Data Portal
         • the scripts that do this are configured manually
         • the scripts are started manually
     • Preparing a checksum report on the Data Portal
         • done by a cron-started script
     • Configuring the Aggregation Server and LAS
         • done manually
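Two of the steps above, splitting output into files under 2 GB and preparing a checksum report, lend themselves to a short illustration. The following is a minimal sketch, not the GFDL scripts: the function names, the choice to split along the time axis, the `*.nc` file pattern, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib
from pathlib import Path

MAX_FILE_BYTES = 2 * 1024**3  # the "<2 GB" limit from the slide


def plan_time_chunks(n_steps: int, bytes_per_step: int,
                     max_bytes: int = MAX_FILE_BYTES) -> list[tuple[int, int]]:
    """Return half-open (start, stop) time-index ranges, each smaller than max_bytes."""
    if bytes_per_step <= 0:
        raise ValueError("bytes_per_step must be positive")
    if bytes_per_step >= max_bytes:
        raise ValueError("a single time step already reaches the size limit")
    # Largest whole number of steps whose total size stays strictly below the limit.
    steps_per_file = (max_bytes - 1) // bytes_per_step
    return [(start, min(start + steps_per_file, n_steps))
            for start in range(0, n_steps, steps_per_file)]


def checksum_report(directory: Path, pattern: str = "*.nc") -> dict[str, str]:
    """Map each matching file name to its SHA-256 digest (hypothetical report format)."""
    report = {}
    for path in sorted(directory.glob(pattern)):
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in 1 MiB blocks so large files are never loaded whole.
            for block in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(block)
        report[path.name] = digest.hexdigest()
    return report


# Example: 1200 time steps of 50 MiB each fit 40 steps per file, giving 30 files.
chunks = plan_time_chunks(1200, 50 * 1024**2)
print(len(chunks), chunks[0], chunks[-1])  # 30 (0, 40) (1160, 1200)
```

A cron-started job could call something like `checksum_report` on the portal side and compare the result with digests computed before the transfer.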

  10. Current Data Portal workflow

  11. Desirable Features of the Data Portal
     • Relational database storing metadata that describes
         • model components and model configuration
         • scenarios
         • postprocessing (model output and CMOR)
         • experiments
         • variables
         • formalized rules of Quality Control
         • data locations in the archive
         • task scheduler
         • user and group accounts
     • XML as the data exchange format
         • for compliance with the FMS Runtime Environment (FRE)
         • working format of existing third-party software
         • well suited to hierarchical metadata description
         • widely used, so easy to exchange with other Data Portals
     • Publisher Control Center (PCC)
         • controls the CMOR subsystem
         • controls the Data Publisher Manager
         • controls data quality (QAC)
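The point about XML suiting hierarchical metadata can be made concrete with a small round trip. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the element and attribute names (`experiment`, `scenario`, `model`, `component`, `variables`, `variable`) and the sample values are hypothetical and do not reflect the FRE or Curator formats.

```python
import xml.etree.ElementTree as ET


def experiment_to_xml(meta: dict) -> str:
    """Serialize a nested experiment description to an XML string."""
    root = ET.Element("experiment", name=meta["name"])
    ET.SubElement(root, "scenario").text = meta["scenario"]
    model = ET.SubElement(root, "model", name=meta["model"]["name"])
    for component in meta["model"]["components"]:
        ET.SubElement(model, "component").text = component
    variables = ET.SubElement(root, "variables")
    for var in meta["variables"]:
        ET.SubElement(variables, "variable", name=var)
    return ET.tostring(root, encoding="unicode")


def xml_to_experiment(text: str) -> dict:
    """Parse the XML produced by experiment_to_xml back into a dict."""
    root = ET.fromstring(text)
    model = root.find("model")
    return {
        "name": root.get("name"),
        "scenario": root.findtext("scenario"),
        "model": {
            "name": model.get("name"),
            "components": [c.text for c in model.findall("component")],
        },
        "variables": [v.get("name") for v in root.find("variables").findall("variable")],
    }


# Illustrative values only; not taken from the presentation.
example = {
    "name": "example_experiment",
    "scenario": "example_scenario",
    "model": {"name": "example_model", "components": ["atmosphere", "ocean"]},
    "variables": ["var_a", "var_b"],
}
xml_text = experiment_to_xml(example)
assert xml_to_experiment(xml_text) == example
```

The nesting of `component` under `model` mirrors how a coupled model is assembled from components, which is the kind of hierarchy a flat key-value format handles less naturally.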

  12. Desirable Features of the Data Portal (continued)
     • Climate Model Output Rewriter (CMOR) subsystem
         • prepares data consistently with specific project requirements
     • Data Publisher Manager
         • transfers data to the target destination in accordance with settings from the DB
     • Front-end Data Portal Software Package
         • Configuration Manager (configures the Aggregation Server and the Data Portal Interface)
         • Search Catalog Engine
         • Data Subsampling Engine
         • Data Computation Engine
         • Data Visualization
         • Data Delivery Manager

  13. Proposed functionality schema of ‘GFDL Data Factory’

  14. Standard scenario of a functioning Model Data Factory (ideal picture)
     • A scientist builds a model in the existing GFDL FMS Runtime Environment (FRE) system using available model components, datasets, and a forcing scenario.
     • FRE puts metadata about the built model, scenario, and experiment into the “curator” DB and runs the experiment.
     • The postprocessing subsystem extracts metadata about the postprocessing plan from the “curator” DB, executes it, and on completion puts metadata about the processed experiment back into the DB.
     • The Data Publisher (DP) regularly checks the “curator” DB for new experiments marked as “public” and, if it finds any, invokes CMOR.
     • CMOR reads metadata from the “curator” DB and processes the needed data following the metadata instructions.
     • DP calls QAC and then transfers the data to Data Portal storage.
     • The Configuration Manager configures the Aggregation Server and the Data Portal Interface and records the new public data in the “curator” DB.
     • End of process; the data is ready to go.
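The Data Publisher part of this scenario (poll the DB, invoke CMOR, call QAC, transfer, configure) can be sketched as a single polling pass. This is an illustration under stated assumptions, not the proposed implementation: it uses an in-memory SQLite database, the `visibility` and `published` columns are hypothetical, and CMOR, QAC, transfer, and configuration are passed in as stand-in callables.

```python
import sqlite3


def publish_new_experiments(db, run_cmor, run_qac, transfer, configure):
    """One polling pass of a hypothetical Data Publisher; returns published names."""
    rows = db.execute(
        "SELECT id, name FROM Experiments "
        "WHERE visibility = 'public' AND published = 0 ORDER BY id"
    ).fetchall()
    published = []
    for exp_id, name in rows:
        files = run_cmor(name)      # CMOR rewrites the model output
        if not run_qac(files):      # quality gate: failed data is not transferred
            continue
        transfer(files)             # push files to Data Portal storage
        configure(name)             # Configuration Manager step
        db.execute("UPDATE Experiments SET published = 1 WHERE id = ?", (exp_id,))
        published.append(name)
    db.commit()
    return published


# Hypothetical table layout; only the table name "Experiments" appears in the slides.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE Experiments (id INTEGER PRIMARY KEY, name TEXT, "
    "visibility TEXT, published INTEGER DEFAULT 0)"
)
db.executemany(
    "INSERT INTO Experiments (name, visibility) VALUES (?, ?)",
    [("exp_a", "public"), ("exp_b", "private"), ("exp_c", "public")],
)

transferred = []
stubs = dict(
    run_cmor=lambda name: [f"{name}.nc"],
    run_qac=lambda files: True,
    transfer=transferred.extend,
    configure=lambda name: None,
)
first_pass = publish_new_experiments(db, **stubs)
second_pass = publish_new_experiments(db, **stubs)
print(first_pass, second_pass)  # ['exp_a', 'exp_c'] []
```

Marking each experiment as published inside the same pass is what makes repeated polling safe: the second pass finds nothing new to do.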

  15. Database ‘curator’ design: Database Compartments
     • Model Metadata Compartment: contains model descriptions and allows building a coupled model of the needed configuration
     • Variables Compartment: list of all related physical variables
     • Workflow Compartment: contains scenarios, experiments, institutions, projects, and user info
     • Postprocessing Compartment: defines the postprocessing plan for an experiment
     • Data Portal Compartment: contains info about experiment data

  16. Interaction between compartments

  17. Model Metadata Compartment (in development)
     [Diagram] Tables shown: Coupled_Models, Model_List, Component_Medias, Models, Variables (Variables Compartment), Experiments (Workflow Compartment)

  18. Data Samples from Model Compartment
     [Tables shown: Components_Medias, Coupled_Models, Model_List, Models]

  19. Variables Compartment
     [Diagram] Tables shown: Variables, Variable_Bundles, Variable_Lists, Variable_List_Contents, Projects, Proj_Var_Names. Linked compartment: Workflow Compartment

  20. Data Sample from Variables Compartment
     [Tables shown: Proj_Var_Names, Variables, Variable_List_Contents, Variable_Lists, Variable_Bundles]

  21. Workflow Compartment
     [Diagram] Tables shown: GFDL_USERS, Institutions, Experiment_Status, Realization, Projects, Experiments, Scenarios

  22. Data Samples from Workflow Compartment
     [Tables shown: Scenarios, Experiments]

  23. Postprocessing Compartment
     [Diagram] Tables shown: Post_Proc, PP_Units, Coupled_Models, Projects, GFDL_USERS, PP_Content, Average_Periods, Variable_Lists
     Data Samples from Postprocessing Compartment: PP_Content, PP_Units

  24. Data Portal Compartment
     [Diagram] Tables shown: Data_Files, Data_Grids, Variables, MissedData_Descriptors, Experiments, Coupled_Models, Variable_Bundles

  25. Data Samples from Data Portal Compartment
     [Tables shown: Data_Files, MissedData_Descriptors, Data_Grids]

  26. Curator DB on Data Portal stream
     • The Curator DB is already in use on the GFDL Data Portal.
     • JSP technology with servlets on the back end was applied.
     • New data transferred onto the Data Portal is automatically registered in the Curator DB with all accompanying metadata.
     • It turned out to be the fastest way to search for data on the Data Portal: CM2.0, CM2.1
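Searching registered data through the database amounts to a join across the metadata tables. The sketch below illustrates the idea with SQLite; the table names Coupled_Models, Experiments, and Data_Files come from the slides, but every column, the sample rows, and the `find_files` helper are assumptions, and the portal itself used JSP and servlets rather than Python.

```python
import sqlite3

# Hypothetical columns; only the three table names appear in the presentation.
SCHEMA = """
CREATE TABLE Coupled_Models (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Experiments (
    id INTEGER PRIMARY KEY, name TEXT,
    model_id INTEGER REFERENCES Coupled_Models(id));
CREATE TABLE Data_Files (
    id INTEGER PRIMARY KEY, path TEXT, variable TEXT,
    experiment_id INTEGER REFERENCES Experiments(id));
"""


def find_files(db, model, variable):
    """Return paths of files for one variable across all experiments of a model."""
    query = (
        "SELECT f.path FROM Data_Files AS f "
        "JOIN Experiments AS e ON e.id = f.experiment_id "
        "JOIN Coupled_Models AS m ON m.id = e.model_id "
        "WHERE m.name = ? AND f.variable = ? ORDER BY f.path"
    )
    return [row[0] for row in db.execute(query, (model, variable))]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany("INSERT INTO Coupled_Models VALUES (?, ?)",
               [(1, "CM2.0"), (2, "CM2.1")])
db.executemany("INSERT INTO Experiments VALUES (?, ?, ?)",
               [(1, "exp_a", 1), (2, "exp_b", 2)])
# Placeholder paths and variable names, for illustration only.
db.executemany("INSERT INTO Data_Files (path, variable, experiment_id) VALUES (?, ?, ?)",
               [("exp_a/var_x_1.nc", "var_x", 1),
                ("exp_b/var_x_1.nc", "var_x", 2),
                ("exp_b/var_x_2.nc", "var_x", 2),
                ("exp_b/var_y_1.nc", "var_y", 2)])

print(find_files(db, "CM2.1", "var_x"))  # ['exp_b/var_x_1.nc', 'exp_b/var_x_2.nc']
```

Because files are registered with their metadata at transfer time, a query like this avoids walking the file system or opening NetCDF headers at search time.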

  27. Other Aspects of Future Development
     • Establish model metadata schema standards in the scientific community and develop an SQL metadata schema.
     • Populate Curator with real metadata extracted from GFDL models.
     • Integrate the Curator DB with the GFDL FMS modeling system.
     • Customize the LAS server to use the Curator DB.
     • Design user interfaces.

  28. END. Questions? Thanks!