The NERC DataGrid: Making data interdisciplinary!
Bryan Lawrence (BADC), Roy Lowry (BODC), Kerstin Kleese van Dam, Kevin O’Neill, Andrew Woolf (CLRC), Dean Williams (PCMDI) and the rest of the NDG team.
Outline
• Brief intro to the NDG goals:
  • Fat clients and interdisciplinarity …
• Evolution: from BODC & BADC to the wider community
• Modular design
• Keeping track of the metadata
• Harvesting
• Where we are now … (we only started in September 2002)
• What lies ahead?
ESG/VCDAT: Example of a Client Application
• We will:
  • Provide Python-based classes for our observational data to complement access to 3D gridded data.
• It will be possible to overlay model and observational data using grid tools.
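As a rough illustration of overlaying observations on gridded model output, the sketch below samples a regular latitude-longitude field at each observation site by bilinear interpolation, so model and observed values can be compared point by point. It is a minimal stand-alone sketch, not the NDG or CDAT API: the `Observation` class and the `sample_grid` and `model_minus_obs` functions are hypothetical, and it assumes a regular grid with ascending axes and no longitude wrap-around.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Observation:
    """A single point observation (hypothetical class, for illustration only)."""
    lat: float
    lon: float
    value: float


def _bracket(axis, x):
    """Return (i, t) with axis[i] <= x <= axis[i + 1] and t the fractional offset."""
    if not axis[0] <= x <= axis[-1]:
        raise ValueError(f"{x} lies outside the grid axis")
    # Clamp so that x == axis[-1] still uses the last grid cell.
    i = min(bisect_right(axis, x) - 1, len(axis) - 2)
    t = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, t


def sample_grid(lats, lons, field, lat, lon):
    """Bilinearly interpolate field (indexed [lat][lon]) to the point (lat, lon)."""
    j, u = _bracket(lats, lat)
    i, t = _bracket(lons, lon)
    f00, f01 = field[j][i], field[j][i + 1]
    f10, f11 = field[j + 1][i], field[j + 1][i + 1]
    return ((1 - u) * ((1 - t) * f00 + t * f01)
            + u * ((1 - t) * f10 + t * f11))


def model_minus_obs(lats, lons, field, observations):
    """Pair each observation with (model value - observed value) at its location."""
    return [(ob, sample_grid(lats, lons, field, ob.lat, ob.lon) - ob.value)
            for ob in observations]
```

For example, on a 2 × 2 grid with `field = [[0.0, 10.0], [20.0, 30.0]]` and both axes `[0.0, 10.0]`, sampling at (5, 5) gives 15.0, the average of the four corners. A real client would read the grid and its axes from the gridded dataset rather than from Python lists.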
NDG: Required “Data” Metadata
• Need a tool to generate B!
Using Globus GridFTP
Statement: GridFTP is a “faster-stronger” FTP. Is it?
Yes: tests between DL and RAL, and between RAL and POL, suggest that GridFTP is on average about twice as fast as FTP for large file transfers (although peak FTP rates can reach ¾ of GridFTP rates).
And: at 2 Mbit/s, network reliability and bandwidth are not good enough for sustained large file transfers without GridFTP (a 500 Mbyte file takes 40 minutes with GridFTP, compared with 80 minutes without)!
But: 2 Mbit/s is too slow to deal with the file sizes of interest! This is a big problem for NERC … The same 40-minute file would take under a minute between DL and RAL.
• This means a different trade-off between client and server processing:
• Fat clients for fat pipes, thin clients for thin pipes.
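The transfer figures above follow from simple arithmetic. The sketch below, assuming 1 Mbyte = 8 Mbit and ignoring protocol overhead, computes the ideal time for a 500 Mbyte file on a 2 Mbit/s link, the effective rates implied by the quoted 40- and 80-minute transfers, and the rate needed to move the same file in under a minute. The `choose_client` rule and its 10-minute threshold are hypothetical, included only to make the "fat clients for fat pipes" trade-off concrete.

```python
MBIT_PER_MBYTE = 8  # assumption: decimal megabytes, no protocol overhead


def ideal_minutes(size_mbyte: float, link_mbit_s: float) -> float:
    """Minutes needed to move a file at the full nominal link rate."""
    return size_mbyte * MBIT_PER_MBYTE / link_mbit_s / 60


def effective_mbit_s(size_mbyte: float, minutes: float) -> float:
    """Throughput actually achieved if the transfer took `minutes`."""
    return size_mbyte * MBIT_PER_MBYTE / (minutes * 60)


def choose_client(size_mbyte: float, link_mbit_s: float,
                  max_wait_minutes: float = 10.0) -> str:
    """Hypothetical rule: ship data to a fat client only when the pipe is fat enough."""
    if ideal_minutes(size_mbyte, link_mbit_s) <= max_wait_minutes:
        return "fat"   # pull the data and process it client-side
    return "thin"      # process on the server and return only the results


if __name__ == "__main__":
    size = 500  # Mbyte
    print(f"ideal at 2 Mbit/s: {ideal_minutes(size, 2):.1f} min")
    print(f"40 min transfer:   {effective_mbit_s(size, 40):.2f} Mbit/s")
    print(f"80 min transfer:   {effective_mbit_s(size, 80):.2f} Mbit/s")
    print(f"needed for 1 min:  {effective_mbit_s(size, 1):.1f} Mbit/s")
    print(choose_client(size, 2), choose_client(size, 100))
```

Run as a script, it reports an ideal time of about 33 minutes at 2 Mbit/s. The quoted 40-minute transfer therefore corresponds to roughly 1.7 Mbit/s of effective throughput and the 80-minute transfer to roughly 0.8 Mbit/s, while moving the file in under a minute needs about 67 Mbit/s.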
Under the hood: the power of XML …
ECMWF ERA40
• Many TB in spectral format
• Double that in NetCDF!
• Want to avoid using tape drives!
We have:
• Implemented a new caching system based on CDAT to do “on-demand” conversion from spectral format to NetCDF.
• Information in the LAS database drives the CDAT back end: existing NetCDF files are used where possible, otherwise the data are converted on the fly.
Next:
• Will link this and other datasets to our intermediate schema.
• Need to add Met Office data (working on new drivers).
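The "use existing NetCDF if possible, otherwise convert on the fly" logic can be sketched as a bounded cache. This is a minimal sketch, not the NDG/CDAT implementation: a plain dictionary stands in for the LAS database, `convert` is a hypothetical callable standing in for the spectral-to-NetCDF conversion, and the least-recently-used eviction policy and entry limit are assumptions, chosen because keeping NetCDF copies of everything would double the storage.

```python
from collections import OrderedDict
from typing import Callable, Dict


class OnDemandNetCDFCache:
    """Serve NetCDF files, converting from spectral data only on a cache miss.

    `catalogue` maps dataset keys to existing NetCDF paths (a stand-in for
    the LAS database); `convert` is a hypothetical spectral-to-NetCDF
    converter that returns the path of the file it wrote.
    """

    def __init__(self, catalogue: Dict[str, str],
                 convert: Callable[[str], str], max_entries: int = 100):
        self.catalogue = catalogue      # permanent NetCDF holdings
        self.convert = convert
        self.max_entries = max_entries  # bound on converted files kept
        self._cache = OrderedDict()     # key -> converted path, oldest first
        self.conversions = 0

    def get(self, key: str) -> str:
        # 1. Prefer a NetCDF file that already exists in the archive.
        if key in self.catalogue:
            return self.catalogue[key]
        # 2. Reuse a previously converted file, marking it most recently used.
        if key in self._cache:
            self._cache.move_to_end(key)
            return self._cache[key]
        # 3. Otherwise convert on the fly and remember the result.
        path = self.convert(key)
        self.conversions += 1
        self._cache[key] = path
        if len(self._cache) > self.max_entries:
            # Evict the least recently used entry; a real cache would also
            # delete the evicted file from disk.
            self._cache.popitem(last=False)
        return path
```

Keeping the catalogue lookup first means permanent NetCDF holdings are never converted or evicted; only the converted copies compete for the bounded cache space.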
Problems ahead?
• Access control: technical issues, policy, social habits, trust.
• Quality of existing metadata: how do we collect what we need?
• Joining it all together, joining all of us together …
• Making sure we are OGC- and ISO-compliant.
• Making it robust … (ten times as much effort as making it work: Tony Hey, June 2003)