Enhancing Climate Data Analysis through NcML Aggregations in GHRSST
This document provides an overview of NcML aggregations within the GHRSST framework, focusing on the process of creating virtual datasets for climate-related studies. It highlights that users globally can access and generate aggregations, which can then be shared, enhancing collaborative research. Key insights discuss the importance of time granularity, performance challenges, and the utility of tools like nccopy for managing data. Additionally, the document emphasizes the integration of heterogeneous datasets, supporting diverse disciplines and improving the representation of vector data for GIS users.
Presentation Transcript
GHRSST Aggregations Using NcML
Upendra Dadi
Aggregation Process
[Diagram: the ghrsst/ directory trees served at http://data.nodc.noaa.gov/opendap/ and http://dods.jpl.nasa.gov/opendap/, each containing L4/, /L2P_Gridded and L2/, are combined through ghrsst_combined.xml into a single view with L4/, /L2P_Gridded and L2/. (L3 will be added in GDS v2.)]
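The combining step in the diagram can be sketched as a small script that writes an NcML file. This is only one possible reading of the slide: it assumes the datasets are joined along an existing time dimension (NcML's joinExisting aggregation), and the two granule file names are hypothetical placeholders, not real files on either server.

```python
import xml.etree.ElementTree as ET

# Namespace used by NcML 2.2 documents.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def build_join_existing(locations, dim_name="time"):
    """Return an NcML string that joins the given datasets along dim_name."""
    ET.register_namespace("", NCML_NS)
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    aggregation = ET.SubElement(
        root,
        f"{{{NCML_NS}}}aggregation",
        {"dimName": dim_name, "type": "joinExisting"},
    )
    # One nested <netcdf> element per member dataset, in time order.
    for url in locations:
        ET.SubElement(aggregation, f"{{{NCML_NS}}}netcdf", {"location": url})
    return ET.tostring(root, encoding="unicode")


# Hypothetical granule names; only the server roots and ghrsst/L4/ come from the slide.
locations = [
    "http://data.nodc.noaa.gov/opendap/ghrsst/L4/example_day1.nc",
    "http://dods.jpl.nasa.gov/opendap/ghrsst/L4/example_day2.nc",
]
ncml = build_join_existing(locations)
print(ncml)
```

Because the member datasets are referenced by URL, a file like this can be written by anyone who can reach both servers.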
Lessons Learned
The time granularity of the originator's data is not necessarily the same as the granularity required for a data analysis task. NcML aggregations could help here, which makes them ideal for climate-related studies.
[Figure: time-scale axis (not to any scale): hourly, daily, weekly, monthly, seasonal, annual, decadal, centennial, millennial]
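As a rough illustration of moving between granularities, the sketch below averages daily values into monthly means on the client side. It is not part of NcML itself; it assumes the daily values have already been read from an aggregated time axis, and the numbers are made up.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_means(daily):
    """Average (date, value) pairs into one mean per (year, month)."""
    buckets = defaultdict(list)
    for day, value in daily:
        buckets[(day.year, day.month)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


# Made-up daily values spanning a month boundary:
# Jan 30 -> 20.0, Jan 31 -> 21.0, Feb 1 -> 22.0, Feb 2 -> 23.0
start = date(2010, 1, 30)
daily = [(start + timedelta(days=i), 20.0 + i) for i in range(4)]

result = monthly_means(daily)
print(result)  # {(2010, 1): 20.5, (2010, 2): 22.5}
```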
Anyone with web access can create aggregations; one doesn't have to be inside NODC. Aggregations created by one user can be used by others. A shared repository of NcML files could be useful.
Performance is the biggest shortcoming. A large amount of time is spent decompressing the data; NetCDF-4 could help. Tools like nccopy are useful to the end user. Tools that update the local physical version of the dataset when the NcML changes would also be useful. Retrieving a time series at a point over a two-month period for an L4 product took 90 sec. Repeating the same query for another point over the same time period took 2 sec.
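One way an end user might materialize an aggregation as a local NetCDF-4 file is to call nccopy. The sketch below only assembles the command line and does not run it. The source URL and output file name are hypothetical, and it assumes the server exposes the aggregation at an address nccopy can read. The -k option selects the output format and -d sets the deflate (compression) level.

```python
import shlex


def nccopy_command(source, target, deflate_level=1):
    """Build an nccopy argument list that writes a compressed NetCDF-4 copy."""
    if not 0 <= deflate_level <= 9:
        raise ValueError("deflate_level must be between 0 and 9")
    return [
        "nccopy",
        "-k", "netCDF-4",          # output format
        "-d", str(deflate_level),  # deflate level, 0 (none) to 9 (maximum)
        source,
        target,
    ]


# Hypothetical source URL and output file name.
cmd = nccopy_command(
    "http://example.invalid/opendap/ghrsst_combined.ncml",
    "ghrsst_local_copy.nc",
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run once the URL points at a real dataset.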
Issues with caching. It would be useful to have elements in the NcML that update individual NcMLs in the cache periodically, instead of the entire cache.
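The per-entry refresh described above could look something like the following sketch. It is a hypothetical illustration, not a description of how any existing server caches NcML: the class name, the loader callback, and the refresh intervals are all invented for the example.

```python
import time


class NcmlCache:
    """Cache keyed by NcML path, where each entry is refreshed on its own schedule."""

    def __init__(self, loader, clock=time.monotonic):
        self._loader = loader  # hypothetical function that builds a dataset from a path
        self._clock = clock
        self._entries = {}     # path -> (loaded_at, value)

    def get(self, path, refresh_seconds):
        now = self._clock()
        entry = self._entries.get(path)
        # Reload only this entry when it is missing or older than its own interval.
        if entry is None or now - entry[0] >= refresh_seconds:
            value = self._loader(path)
            self._entries[path] = (now, value)
            return value
        return entry[1]


calls = []


def fake_loader(path):
    calls.append(path)
    return f"aggregation built from {path}"


fake_now = [0.0]
cache = NcmlCache(fake_loader, clock=lambda: fake_now[0])
cache.get("a.ncml", refresh_seconds=60)
cache.get("b.ncml", refresh_seconds=600)
fake_now[0] = 120.0
cache.get("a.ncml", refresh_seconds=60)   # stale, so it is reloaded
cache.get("b.ncml", refresh_seconds=600)  # still fresh, served from the cache
print(calls)  # ['a.ncml', 'b.ncml', 'a.ncml']
```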
Several interesting possibilities. NcML allows data from heterogeneous sources to be integrated over the web to create virtual datasets. Datasets from different disciplines could be integrated. The ability to represent vector data using netCDF would make such integration more attractive to mainstream GIS users.
NODC has a lot of in-situ (observational) data. The ability to aggregate not just 2D arrays but also individual profiles and trajectories into multi-profiles and multi-trajectories would be very useful.
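To make the multi-profile idea concrete, the sketch below stacks individual profiles of different lengths into flat arrays plus a per-profile count, in the spirit of a contiguous ragged array layout. It is an assumption about how such an aggregation might be represented, not something the slides specify, and the profile values are made up.

```python
def stack_profiles(profiles):
    """Concatenate per-profile observations into flat lists plus a row-size index."""
    depth, temperature, row_size = [], [], []
    for profile in profiles:
        if len(profile["depth"]) != len(profile["temperature"]):
            raise ValueError("depth and temperature must have equal length")
        depth.extend(profile["depth"])
        temperature.extend(profile["temperature"])
        row_size.append(len(profile["depth"]))
    return {"depth": depth, "temperature": temperature, "row_size": row_size}


def profile_slice(stacked, index):
    """Recover one profile's (depth, temperature) lists from the stacked form."""
    start = sum(stacked["row_size"][:index])
    stop = start + stacked["row_size"][index]
    return stacked["depth"][start:stop], stacked["temperature"][start:stop]


# Two made-up profiles with different numbers of depth levels.
profiles = [
    {"depth": [0.0, 10.0, 20.0], "temperature": [18.5, 17.9, 16.2]},
    {"depth": [0.0, 50.0], "temperature": [19.0, 15.5]},
]
stacked = stack_profiles(profiles)
print(stacked["row_size"])        # [3, 2]
print(profile_slice(stacked, 1))  # ([0.0, 50.0], [19.0, 15.5])
```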
Similar to the ETL tools used in data warehousing. There are equivalents in the relational world, but the data is more complex than most relational databases can handle.