The Research Data Archive (RDA) at NCAR, maintained by the Data Support Section founded in 1965, has evolved to support diverse data needs across atmospheric and geosciences research. The archive now holds over 600 terabytes, including core datasets for climate, oceanographic, and renewable energy research, and its infrastructure and management tools are being updated to improve access. With over 7,000 registered users who expect immediate data access, investments in IT infrastructure and improved archive management tools are critical to supporting both researchers and data generators.
The Research Data Archive at NCAR
Doug Schuster and Steve Worley, NCAR
Topic Outline
• Introduction/History
• Core Data Categories/Featured Datasets
• Archive Management/Tools
• New Supporting IT Infrastructure
• Future Possibilities
AMS 2011
Introduction/History
• Data Support Section (founded 1965)
• Paper -> punch cards -> tapes -> CDs/DVDs -> hard drives -> network-based storage and transfer
• KB of observations -> terabytes of model-generated data (total archive volume over 600 TB)
• Weeks or months for a user to get data -> users want data access now (over 7,000 registered users)
• Pay for data -> free and open access to all datasets not subject to source restrictions
Introduction/History
• How do we evolve to support the growing needs of data users and generators?
  • Stay aware of current research uses
  • Strengthen datasets supporting core research data categories
  • Update archive management tools
  • Rebuild/augment IT infrastructure
  • Educate supporting staff
Core Data Categories
• Content to support atmospheric and geosciences research
• Some research examples:
  • Climate
  • Oceanographic
  • Hydrologic
  • Weather prediction
  • Renewable energy (wind/solar)
Core Data Categories
• Operational and reanalysis model outputs
• Meteorological and oceanographic observations
• Remote sensing observations
• Topography/bathymetry, vegetation, land use
Featured Datasets: Global Platform Observations (timeline figure spanning 1662 to 2011)
Featured Datasets: Analysis and Forecast Model Data (timeline figure spanning 1850 to 2011)
Featured Datasets: High Resolution Reanalysis (timeline figure spanning 1870 to 2011)
Archive Management
How can we support an archive that continuously grows in volume and complexity with a fixed number of supporting staff?
Archive Management
• Common data management tools
  • Functionality requirements:
    • Scalable
    • Integrated: one call does all
    • Automatable
Archive Management
• Common data management tools
  • Task completion requirements:
    • Data acquisition: get data (daily or irregularly)
    • Data archival: archive to disk and tape
    • Metadata collection: collect metadata; update metadata databases
    • Metadata publishing: update web server pages; update internal metadata access points
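To make the "integrated: one call does all" and "automatable" requirements concrete, here is a minimal Python sketch that chains the four tasks (acquisition, archival, metadata collection, metadata publishing) behind a single entry point. Everything in it is hypothetical: the function names, the in-memory `ArchiveState`, and the dataset ID `ds999.9` stand in for the real servers, databases, and tools, which the slides do not describe at code level.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: none of these names come from the RDA tools
# (dsupdt, dsarch); they only illustrate chaining the four tasks.


@dataclass
class ArchiveState:
    """In-memory stand-in for archive storage, metadata DB, and web listing."""
    archived: dict = field(default_factory=dict)     # file name -> bytes
    metadata: dict = field(default_factory=dict)     # file name -> attributes
    web_listing: list = field(default_factory=list)  # published file names


def acquire(source: dict) -> dict:
    """Task 1, data acquisition: copy new files from an in-memory source."""
    return dict(source)


def archive(files: dict, state: ArchiveState) -> None:
    """Task 2, data archival: store the files (stand-in for disk and tape)."""
    state.archived.update(files)


def collect_metadata(files: dict, state: ArchiveState, dataset: str) -> None:
    """Task 3, metadata collection: record simple per-file attributes."""
    for name, data in files.items():
        state.metadata[name] = {"dataset": dataset, "size": len(data)}


def publish(state: ArchiveState) -> None:
    """Task 4, metadata publishing: rebuild the public file list."""
    state.web_listing = sorted(state.metadata)


def run_pipeline(source: dict, state: ArchiveState, dataset: str) -> None:
    """'One call does all': a scheduler (e.g. cron) would invoke only this."""
    files = acquire(source)
    archive(files, state)
    collect_metadata(files, state, dataset)
    publish(state)


state = ArchiveState()
run_pipeline({"b.grb2": b"\x00" * 4, "a.grb2": b"\x00" * 10}, state, "ds999.9")
print(state.web_listing)  # ['a.grb2', 'b.grb2']
```

Because each stage only adds or overwrites entries keyed by file name, rerunning the pipeline on the same input leaves the state unchanged, which is a useful property for unattended, repeatable runs.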
Step 1: Get Data (Integrated Archival Tools)
• Data arrive on the RDA/CISL servers either automatically (dsupdt) or manually (tape, FTP, etc.)
• Model generated data: GRIB, NetCDF
• Observational data: BUFR, ASCII, etc.
• Remote sensing data: binary
• Topography: vector, image, binary, etc.
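The automated path (dsupdt) has to decide, on each daily or irregular run, which files are new. The sketch below shows one simple way to do that for a daily product: generate the expected file names from a date range and drop the ones already archived. The file-name template and the function names are assumptions for illustration; they are not dsupdt's actual configuration or interface.

```python
from datetime import date, timedelta

# Hypothetical naming template for a daily model product; real datasets
# and dsupdt configurations use their own conventions.
TEMPLATE = "model.{:%Y%m%d}.grb2"


def expected_files(start: date, end: date) -> list[str]:
    """Daily file names expected from start to end, inclusive."""
    n_days = (end - start).days
    return [TEMPLATE.format(start + timedelta(days=i)) for i in range(n_days + 1)]


def files_to_fetch(start: date, end: date, archived: set[str]) -> list[str]:
    """Expected files that are not yet archived, i.e. the next download list."""
    return [name for name in expected_files(start, end) if name not in archived]


already = {"model.20110101.grb2", "model.20110102.grb2"}
print(files_to_fetch(date(2011, 1, 1), date(2011, 1, 4), already))
# ['model.20110103.grb2', 'model.20110104.grb2']
```

Deriving the download list from the expected names, rather than from whatever happens to be on the remote server, also makes gaps in the record visible: a file that never arrives keeps reappearing in the list.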
Step 2: Archive Data (Integrated Archival Tools, RDA/CISL Servers)
• Inputs: model generated data (GRIB, NetCDF), observational data (BUFR, ASCII, etc.), remote sensing data (binary), topography (vector, image, binary, etc.)
• dsarch writes each file, e.g. a model generated GRIB-2 file, to online DISK and to the HPSS tape archive
• File attribute metadata (Name, Dataset, Location, Format) is recorded in the RDA Database
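The file attribute metadata recorded at this step (name, dataset, location, format) maps naturally onto a relational table. The following sketch uses Python's built-in `sqlite3` module as a stand-in for the RDA Database; the table name, column names, and composite key are hypothetical, chosen so that one file can have both a DISK copy and an HPSS copy.

```python
import sqlite3

# Hypothetical schema standing in for the RDA Database: one row per stored
# copy of a file, keyed by (name, location) so DISK and HPSS copies coexist.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE file_attrs (
           name     TEXT NOT NULL,
           dataset  TEXT NOT NULL,
           location TEXT NOT NULL CHECK (location IN ('DISK', 'HPSS')),
           format   TEXT NOT NULL,
           PRIMARY KEY (name, location)
       )"""
)


def record_file(conn: sqlite3.Connection, name: str, dataset: str,
                location: str, fmt: str) -> None:
    """Insert (or refresh) the attribute row for one archived copy."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO file_attrs VALUES (?, ?, ?, ?)",
            (name, dataset, location, fmt),
        )


# An archiving run that writes the same GRIB-2 file to disk and to tape.
for loc in ("DISK", "HPSS"):
    record_file(conn, "model.20110101.grb2", "ds999.9", loc, "GRIB2")

rows = conn.execute(
    "SELECT location, format FROM file_attrs WHERE name = ? ORDER BY location",
    ("model.20110101.grb2",),
).fetchall()
print(rows)  # [('DISK', 'GRIB2'), ('HPSS', 'GRIB2')]
```

`INSERT OR REPLACE` keeps the operation idempotent: re-archiving a file refreshes its row instead of creating a duplicate.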
Step 3: Collect File Content Metadata/Check Integrity (Integrated Archival Tools, RDA/CISL Servers)
• Input: a model generated file in GRIB-2 format containing Temperature, Humidity, Vorticity, Visibility, and Precip Rate fields, each identified by (Center, Date, Time, Level, Location)
• "Gather Metadata" scans the file and stores file content metadata in the RDA DB: T(C,D,T,L,L), RH(C,D,T,L,L), Vort(C,D,T,L,L), Vis(C,D,T,L,L), PcpR(C,D,T,L,L)
• The RDA DB also holds the file attribute metadata: Name, Dataset, Location, Format
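This step reduces every record in a file to a parameter plus its (Center, Date, Time, Level, Location) descriptors. A minimal sketch of that reduction, plus a checksum-based integrity check, might look like the following. The record tuples are made-up stand-ins for decoded GRIB-2 headers (no GRIB decoder is used), and MD5 is an assumed choice of checksum, not necessarily what the RDA tools use.

```python
import hashlib
from collections import defaultdict

# Made-up stand-ins for decoded GRIB-2 record headers:
# (parameter, center, date, time, level, location). No GRIB decoding here.
records = [
    ("T",  "NCEP", "20110101", "00", "500 hPa", "global 0.5 deg"),
    ("T",  "NCEP", "20110101", "00", "850 hPa", "global 0.5 deg"),
    ("RH", "NCEP", "20110101", "00", "850 hPa", "global 0.5 deg"),
    ("T",  "NCEP", "20110101", "06", "500 hPa", "global 0.5 deg"),
]


def summarize(records):
    """Collapse per-record headers into per-parameter content metadata."""
    summary = defaultdict(lambda: {"centers": set(), "times": set(),
                                   "levels": set(), "locations": set()})
    for param, center, ymd, hh, level, loc in records:
        entry = summary[param]
        entry["centers"].add(center)
        entry["times"].add(ymd + hh)  # valid time as YYYYMMDDHH
        entry["levels"].add(level)
        entry["locations"].add(loc)
    return dict(summary)


def md5sum(data: bytes) -> str:
    """Hex MD5 digest of a file's bytes."""
    return hashlib.md5(data).hexdigest()


def copy_is_intact(original: bytes, archived: bytes) -> bool:
    """Integrity check: the archived copy must hash identically."""
    return md5sum(original) == md5sum(archived)


meta = summarize(records)
print(sorted(meta["T"]["levels"]))  # ['500 hPa', '850 hPa']
print(sorted(meta["T"]["times"]))   # ['2011010100', '2011010106']
print(copy_is_intact(b"example bytes", b"example bytes"))  # True
```

Storing sets per parameter, rather than one row per record, is what lets a compact summary such as T(C,D,T,L,L) describe a file that may contain thousands of individual records.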
Step 4: Publish Metadata and Data (Integrated Archival Tools, RDA/CISL Servers)
• The RDA DB holds file attribute metadata (Name, Dataset, Location, Format) and file content metadata (T, RH, Vort, Vis, PcpR, each with (C,D,T,L,L))
• RDA Web Server: dynamic file lists, data search tools, detailed content metadata, data subsetting interfaces
• CISL Computational Node: detailed metadata for files on disk, data subsetting
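Dynamic file lists and data search tools amount to queries against the published content metadata. The sketch below filters a small in-memory catalog by parameter and time window; the `FileEntry` fields, the catalog contents, and the fixed-width YYYYMMDDHH time strings (which compare correctly as plain strings) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical catalog row combining file attribute and content metadata."""
    name: str
    dataset: str
    params: frozenset  # parameter codes present in the file, e.g. {"T", "RH"}
    start: str         # first valid time, YYYYMMDDHH
    end: str           # last valid time, YYYYMMDDHH


CATALOG = [
    FileEntry("model.20110101.grb2", "ds999.9", frozenset({"T", "RH"}), "2011010100", "2011010118"),
    FileEntry("model.20110102.grb2", "ds999.9", frozenset({"T", "RH"}), "2011010200", "2011010218"),
    FileEntry("sfc.20110101.grb2", "ds999.9", frozenset({"Vis", "PcpR"}), "2011010100", "2011010118"),
]


def search(catalog, param, start, end):
    """Names of files that contain `param` and overlap [start, end].

    Fixed-width YYYYMMDDHH strings sort chronologically, so plain string
    comparison is enough for the interval-overlap test.
    """
    return [f.name for f in catalog
            if param in f.params and f.start <= end and f.end >= start]


print(search(CATALOG, "T", "2011010112", "2011010206"))
# ['model.20110101.grb2', 'model.20110102.grb2']
print(search(CATALOG, "Vis", "2011010200", "2011010300"))
# []
```

In a real deployment the same filter would be a database query, but the overlap condition (file start no later than the window end, file end no earlier than the window start) is the same.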
New Supporting IT/Infrastructure
• Online disk upgrades
  • Larger disk (450 TB)
  • Common disk interfaces (web server and compute nodes)
• Tape archive upgrades
  • High Performance Storage System (HPSS)
• Computing power upgrades
  • Additional and more powerful servers
New Supporting IT/Infrastructure
NCAR User Community
  Pros:
  - Access to full RDA
  - Fast computing
  Cons:
  - No access to online data
  - Forced to use the MSS as a file server: access is too slow
  - No direct access to RDA metadata
Complete User Community
  Pros:
  - Fast access to online data
  - Access to all RDA metadata
  - Access to RDA data processing services
  Cons:
  - Small fraction of RDA online
  - Slow access to offline data
  - Data processing requests take a long time to finish
New Supporting IT/Infrastructure
Complete User Community Improvements:
  - Faster access to full RDA
  - Expanded data processing services available
  - Faster turnaround on data processing requests
NCAR User Community Improvements:
  - Faster access to full RDA
  - Direct access to all RDA metadata
Future Possibilities
• Leverage new IT infrastructure
  • Server-side parameter and spatial subsetting across multiple datasets
    • Model or in-situ observations
  • Data provided in multiple output formats
  • Web-services-based requests (REST, etc.)
• Addition of large and diverse datasets to the RDA
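Server-side parameter and spatial subsetting means returning only the requested variables inside a requested region instead of whole files. A minimal pure-Python sketch on a regular latitude/longitude grid follows; the grid, the function name `subset_request`, and the inclusive bounding-box convention are hypothetical, and the sketch ignores longitude wrap-around (e.g. a box spanning 350°E to 10°E) and treats only gridded model-style fields, not in-situ observations.

```python
def subset_request(fields, lats, lons, params, bbox):
    """Parameter + spatial subsetting on a regular lat/lon grid.

    fields: dict of parameter -> 2-D list indexed [lat][lon]
    bbox:   (lat_min, lat_max, lon_min, lon_max), bounds inclusive;
            longitude wrap-around is not handled.
    """
    missing = [p for p in params if p not in fields]
    if missing:
        raise KeyError(f"unavailable parameters: {missing}")
    lat_min, lat_max, lon_min, lon_max = bbox
    # Compute the index sets once so every returned field shares one sub-grid.
    rows = [i for i, lat in enumerate(lats) if lat_min <= lat <= lat_max]
    cols = [j for j, lon in enumerate(lons) if lon_min <= lon <= lon_max]
    out = {p: [[fields[p][i][j] for j in cols] for i in rows] for p in params}
    return [lats[i] for i in rows], [lons[j] for j in cols], out


# Toy 4 x 4 grid with two parameters; only "T" is requested.
lats = [-10.0, 0.0, 10.0, 20.0]
lons = [0.0, 10.0, 20.0, 30.0]
fields = {
    "T": [[10 * i + j for j in range(4)] for i in range(4)],
    "RH": [[50 + i] * 4 for i in range(4)],
}
sub_lats, sub_lons, out = subset_request(fields, lats, lons, ["T"], (0, 10, 10, 20))
print(sub_lats, sub_lons)  # [0.0, 10.0] [10.0, 20.0]
print(out)                 # {'T': [[11, 12], [21, 22]]}
```

Doing this on the server means only the cropped values cross the network, which is the main benefit when users need a few variables over a small region from a large archive.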
http://dss.ucar.edu