SCD Research Data Archives; Availability Through the CDP

nelsonbrian

Presentation Transcript


  1. SCD Research Data Archives; Availability Through the CDP
     • About 500 distinct datasets, 12 TB
     • Diverse in type, size, and format
     • Serving 900 different investigators per year

  2. Enhanced Service through the CDP
     • What data is best for the CDP?
       • Datasets that are needed by the largest group of scientists.
       • Datasets that are typically large (tens of gigabytes) and from which spatial, temporal, and parameter subsets are normally preferred.
       • Other relevant datasets that are often required to support research using the datasets defined above.
     Global Atmospheric Reanalyses

  3. CDP project, NCEP Reanalysis-2
     • About Reanalysis-2
       • Proper full name: NCEP/DOE AMIP-II Reanalysis
       • Experimental follow-on to the popular NCEP/NCAR Global Atmospheric Reanalysis
     • For the CDP we have chosen one popular product
       • “Pressure stack”: global 2.5°, 7 variables on 17 pressure levels, 4x daily, and a few surface-only grids.
       • There are other products, e.g. surface flux fields, climatologies
     • Using a one-year sample for the CDP study
       • 1460 files, 2.2 Gbytes
     • We have data for 1979-1999, continuing.
       • Total pressure stack data is 45 Gbytes, and growing
     Data provided by M. Kanamitsu, NCEP
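The figures on slide 3 can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes one file per analysis time (which would explain 1460 files for a non-leap year at 4x daily) and that every year is about the same size as the 2.2 Gbyte sample. The function names are hypothetical and not part of the CDP.

```python
# Figures taken from the slide.
ANALYSES_PER_DAY = 4      # "4x daily"
SAMPLE_YEAR_GBYTES = 2.2  # size of the one-year sample


def files_per_year(days_in_year=365):
    """Number of files in one year, assuming one file per analysis time."""
    return ANALYSES_PER_DAY * days_in_year


def archive_gbytes(first_year, last_year, gbytes_per_year=SAMPLE_YEAR_GBYTES):
    """Rough archive size, assuming every year matches the sample year."""
    n_years = last_year - first_year + 1  # inclusive range
    return n_years * gbytes_per_year


if __name__ == "__main__":
    print(files_per_year())             # file count for a non-leap year
    print(archive_gbytes(1979, 1999))   # rough total for the 21 years held
```

Under these assumptions the 21 years from 1979 to 1999 come to roughly 46 Gbytes, which is in line with the 45 Gbytes quoted on the slide.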

  4. Successes and outlook
     • It works, we can do it!
     • Access is based on LAS, NCL (NCAR Command Language), and a local file system.
     • The important key was NCL
       • NCL can read many file formats (netCDF, GRIB, HDF)
       • The native format produced at the weather centers (NCEP and ECMWF) is GRIB, a WMO standard.

  5. Outlook
     • NCL can do much more!
     • It is a powerful analysis tool
       • 50+ computational math functions
       • 10+ routines for scalar and vector regridding
       • Many atmospheric-model-specific functions (Spherepack, etc.)
     • We control the development of NCL, so important functionality can be added
     • Through NCL we could offer more analysis capability as part of the CDP

  6. Outlook
     • Challenges
       • How can we sensibly scale this system up to handle 100-gigabyte datasets and multiple users?
       • A certainty: users will request large subsets, and some will be orthogonal to whatever file structure is chosen
       • Result: long computational run times and large output data files
       • The requester may not know this in advance
       • This type of unexpected result leads to unsatisfactory service

  7. Outlook
     • Enhancements to avoid unexpected results
       • Construct algorithms to estimate the run time and output data volume.
       • For large output files or long-running requests
         • offer delayed service through standard FTP procedures
         • E.g., write the data to an FTP server and notify the user when it is ready.
       • Some requests will be too large for convenient FTP transfer.
         • In this case the requester should be referred to the SCD/DSS staff for assistance.
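One way the estimation and routing described on slide 7 might look is sketched below. This is not the CDP's actual algorithm. It assumes one file per analysis time, output written as unpacked 32-bit floats, and a fixed read cost per file; the thresholds, the `SubsetRequest` type, and all function names are hypothetical placeholders.

```python
import math
from dataclasses import dataclass

GRID_STEP_DEG = 2.5                  # from the slides: global 2.5-degree grid
BYTES_PER_VALUE = 4                  # assumption: unpacked 32-bit floats
SECONDS_PER_FILE = 0.5               # hypothetical cost to open and read one file
INTERACTIVE_MAX_BYTES = 100 * 10**6  # hypothetical limit for immediate delivery
INTERACTIVE_MAX_SECONDS = 60.0       # hypothetical limit for immediate delivery
FTP_MAX_BYTES = 2 * 10**9            # hypothetical limit for FTP delivery


@dataclass(frozen=True)
class SubsetRequest:
    n_times: int      # number of analysis times requested
    n_levels: int     # number of pressure levels requested
    n_variables: int  # number of variables requested
    lat_south: float
    lat_north: float
    lon_west: float
    lon_east: float


def points_along(lo, hi, step=GRID_STEP_DEG):
    """Grid points between lo and hi inclusive on a regular grid."""
    return int(math.floor((hi - lo) / step + 1e-9)) + 1


def estimate_output_bytes(req):
    """Output volume: values requested times bytes per value."""
    n_lat = points_along(req.lat_south, req.lat_north)
    n_lon = points_along(req.lon_west, req.lon_east)
    n_values = req.n_times * req.n_levels * req.n_variables * n_lat * n_lon
    return n_values * BYTES_PER_VALUE


def estimate_run_seconds(req):
    """Run time: every requested time step means one more file to read."""
    return req.n_times * SECONDS_PER_FILE


def choose_delivery(req):
    """Route a request to one of the three service levels on the slide."""
    size = estimate_output_bytes(req)
    seconds = estimate_run_seconds(req)
    if size > FTP_MAX_BYTES:
        return "refer to SCD/DSS staff"
    if size > INTERACTIVE_MAX_BYTES or seconds > INTERACTIVE_MAX_SECONDS:
        return "delayed FTP"
    return "interactive"
```

With these placeholder numbers, a one-year time series at a single grid point produces only a few kilobytes of output but still touches 1460 files, so it is routed to delayed FTP. That is the kind of request slide 6 calls orthogonal to the file structure. Real limits would have to be measured on the actual system.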

  8. Outlook
     • Need to enhance the interface to ensure complete metadata access
     • A wealth of critical metadata
       • Model descriptions
       • Input data sources
       • Publications
       • Associated studies and derived datasets
       • Many related URLs
     • Clear links throughout the CDP so users can find the metadata and get assistance, e.g. the SCD/DSS information server.
     • Need mechanisms to get user feedback

  9. Outlook
     • May need restriction and authentication procedures for some datasets
       • Redistribution of some data is restricted, e.g. ECMWF analyses.
       • With simple registration we are able to provide these data to UCAR members in North America.
       • All others are excluded.

  10. Wrap-up
      • We have encouraging results so far and will continue the development
      • Measure of success: user satisfaction!
      • Public availability at the CDP will be announced on the SCD URL: scd.ucar.edu
      • Reanalysis-2 is available now from the MSS or through the SCD/DSS, see dss.ucar.edu/datasets/ds091.0
      • Details about the model runs are at: wesley.wwb.noaa.gov/reanalysis2
