Climate-SDM (1)
• Climate analysis use case
• Described by: Marcia Branstetter
• Use case description
  • Data obtained from ESG
  • Analysis uses a sequence of steps, each running scripts in Ferret, CDAT, MATLAB, …, etc.
  • Need to run the same sequence of steps over many files
    • Sometimes changing the scripts
    • Sometimes adding/removing a step
• Problem: need a workflow to run and track the analysis process
  • Need to collect provenance
  • Provenance should be rich enough for another person to run the same analysis
  • Analysis scripts can use various codes, such as Ferret, CDAT, MATLAB
  • Need to keep an audit trail, including interaction with external tools
• Task: build a workflow of steps that records software versions, scripts, input files, etc.
• Goal: construct workflows that can be run repeatedly. Each workflow run writes a record of itself into a database, so anyone can reproduce the results or add to them, not necessarily on the same machine.
• Tools to be used
  • Kepler – for composing workflows and writing provenance to a database
  • VisTrails – for keeping track of the evolution of workflows and associated provenance data
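The goal on this slide, running the same steps over many files and writing a record of each run to a database, could be sketched roughly as follows. This is a minimal illustration in plain Python with SQLite, not Kepler or VisTrails; the table layout, the `run_workflow` function, and the step tuples are all hypothetical, and simple string functions stand in for Ferret/CDAT/MATLAB scripts.

```python
import hashlib
import json
import sqlite3
import sys
import time


def sha256_of(text):
    """Return a hex digest identifying the exact content of a script or input."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run_workflow(conn, steps, inputs):
    """Apply every step, in order, to every input and log one record per step run.

    steps  : list of (name, version, function) tuples; hypothetical stand-ins
             for real analysis scripts
    inputs : dict mapping an input name to its content (a string here)
    """
    # Assumed schema: one row per (input, step) execution.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS provenance ("
        "run_id TEXT, input_name TEXT, step_order INTEGER, step_name TEXT, "
        "step_version TEXT, input_hash TEXT, output_hash TEXT, "
        "python_version TEXT, started REAL)"
    )
    # Identify this run by its step list plus the start time.
    step_ids = [[name, version] for name, version, _ in steps]
    run_id = sha256_of(json.dumps(step_ids) + str(time.time()))[:12]
    results = {}
    for input_name, data in inputs.items():
        for order, (step_name, version, func) in enumerate(steps):
            started = time.time()
            output = func(data)
            conn.execute(
                "INSERT INTO provenance VALUES (?,?,?,?,?,?,?,?,?)",
                (run_id, input_name, order, step_name, version,
                 sha256_of(data), sha256_of(output),
                 sys.version.split()[0], started),
            )
            data = output  # the next step consumes this step's output
        results[input_name] = data
    conn.commit()
    return run_id, results


# Example: two toy steps applied to two toy "files".
conn = sqlite3.connect(":memory:")
steps = [("strip", "1.0", str.strip), ("upper", "1.1", str.upper)]
run_id, results = run_workflow(conn, steps, {"a.txt": " tas ", "b.txt": "pr"})
```

Recording content hashes and versions per step is one way to let another person check that they reran the same analysis; adding or removing a step changes only the `steps` list.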
Climate-SDM (2)
• Scaling analysis process
• Described by: Marcia Branstetter
• Use case description
  • Need to analyze 6-hourly data over 100 years for the atmosphere component
  • At T85 grid resolution, total volume is in the 10-100 TB range
  • Data resides on HPSS, order of 12,00 files, a few GBs each
  • A few TBs for the limited number of variables needed in the analysis
• Problem: extracting one or a few of the variables from HPSS
  • Can this process be automated?
  • Task (longer term): automate the process using workflow tools
• Problem: parallelize analysis of large data
  • Task: use parallel statistics tools
  • Goal: use Parallel R for such jobs
  • Task already in progress
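One common pattern behind parallel statistics tools is to summarize each chunk of data independently and then merge the summaries. The sketch below illustrates that pattern in Python rather than Parallel R; the function names are hypothetical, in-memory lists stand in for per-file variable data, and a thread pool is used only to keep the example self-contained (a real job would more likely use processes or MPI).

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partial_stats(values):
    """Summarize one chunk (e.g. one file's worth of a variable) as (n, sum, sum of squares)."""
    n = len(values)
    s = sum(values)
    ss = sum(v * v for v in values)
    return n, s, ss


def combine(parts):
    """Merge per-chunk summaries into a global mean and population standard deviation."""
    n = sum(p[0] for p in parts)
    s = sum(p[1] for p in parts)
    ss = sum(p[2] for p in parts)
    mean = s / n
    # Clamp at zero to guard against tiny negative values from rounding.
    variance = max(ss / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)


def parallel_mean_std(chunks, workers=4):
    """Compute the chunk summaries concurrently, then combine them once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(partial_stats, chunks))
    return combine(parts)


mean, std = parallel_mean_std([[1.0, 2.0], [3.0, 4.0], [5.0]])
```

Because each chunk is reduced to three numbers, only the summaries need to move between workers. The sum-of-squares formula can lose precision on large, tightly clustered data; a pairwise-merge variance update is a more stable alternative.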
Climate-SDM (3)
• Earth System Grid
• Described by: Dean Williams and Don Middleton
• Use case description
  • 2 modes of getting data to users
    • Sets of files (using DataMover-Lite (DML))
    • Using tools that perform aggregation on the server side (OPeNDAP, CDAT, GrADS, LAS)
  • Currently only simple statistics are needed on the server side
  • Aggregation: hiding file structures on gateway searches is essential
• Future needs as data scales
  • Composite product across multiple data nodes
  • Aggregation over multiple data nodes
  • Compare model runs from different sites
  • Tracking of precise provenance of how data was generated is needed
• Task: using PnetCDF
  • CCSM4 on top of PnetCDF (already taking place)
  • netCDF4 has new extended features, which may require similar features to be supported in PnetCDF
  • PnetCDF for post-processing (users still to be identified)
  • Other I/O bound groups?
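To illustrate what "hiding file structures" through aggregation can mean, the sketch below presents several per-file time chunks as a single indexable series. It is a toy model only, not how OPeNDAP, CDAT, GrADS, or LAS implement aggregation; the `AggregatedSeries` class and the file names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


class AggregatedSeries:
    """Present several per-file time chunks as one logical, indexable series."""

    def __init__(self, chunks):
        # chunks: list of (file_name, values) pairs, already ordered in time
        self._chunks = chunks
        # _ends[i] is the global index one past the last element of chunk i
        self._ends = list(accumulate(len(values) for _, values in chunks))

    def __len__(self):
        return self._ends[-1] if self._ends else 0

    def _locate(self, index):
        """Map a global index to (chunk_number, local_index)."""
        if not 0 <= index < len(self):
            raise IndexError(index)
        i = bisect_right(self._ends, index)
        start = self._ends[i - 1] if i else 0
        return i, index - start

    def source_of(self, index):
        """Return (file_name, local_index) for a global index."""
        i, local = self._locate(index)
        return self._chunks[i][0], local

    def __getitem__(self, index):
        i, local = self._locate(index)
        return self._chunks[i][1][local]


series = AggregatedSeries([("run_0001.nc", [10, 11, 12]), ("run_0002.nc", [13, 14])])
value = series[3]                # the caller never names a file
origin = series.source_of(3)     # the file mapping is still recoverable
```

The same lookup idea extends to chunks held on different data nodes, although network access and metadata consistency would then dominate the design.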
Climate-SDM (4)
• Earth System Grid (cont’d)
• Described by: Dean Williams and Don Middleton
• Tasks: improve DML + SRM
  • Improve DML interface
  • Use of GridFTP-ssh in DML to speed transfers to client
  • Explore use of GridFTP-ssh for SRMs
• Potential task: value-based searches
  • Very large communities performing impact studies
  • New community yet to be introduced to ESG
  • E.g. number of days with temperature > 120 F in some region
  • Currently they use GIS tools on highly summarized data
  • Potential need to perform value-based searches on the server side as data scales
• Potential task: compare simulated to observed data
  • Currently, ARM data is being converted to be CF (Climate and Forecast) convention compliant in order to be added to ESG holdings
  • The need to move data to a single site for comparison will require large-scale automated data movement
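The value-based search example on this slide (number of days with temperature above 120 F in some region) could be expressed as a small query function. The sketch below is a hypothetical illustration: the data layout, the `days_exceeding` name, and the choice to count a day when any grid cell in the region exceeds the threshold are all assumptions, since the slide does not define them.

```python
def days_exceeding(daily_grids, region, threshold):
    """Count days on which any grid cell inside the region exceeds the threshold.

    daily_grids : list of days; each day is a list of (lat, lon, value) cells
    region      : (lat_min, lat_max, lon_min, lon_max), bounds inclusive
    threshold   : value in the same units as the data (e.g. degrees F)
    """
    lat_min, lat_max, lon_min, lon_max = region
    count = 0
    for cells in daily_grids:
        if any(
            lat_min <= lat <= lat_max
            and lon_min <= lon <= lon_max
            and value > threshold
            for lat, lon, value in cells
        ):
            count += 1
    return count


# Three toy days of (lat, lon, temperature in F) cells.
days = [
    [(35, -115, 121.0), (50, -100, 80.0)],
    [(35, -115, 119.0), (50, -100, 125.0)],  # the 125.0 cell lies outside the region
    [(36, -114, 120.5)],
]
hot_days = days_exceeding(days, region=(30, 40, -120, -110), threshold=120.0)  # → 2
```

Running such a query on the server side returns a single number instead of the underlying files, which is the appeal as data volumes grow.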