Best Practices for Environmental Data Sharing
This guide provides step-by-step instructions on preparing, validating, and manipulating environmental data for sharing and archival purposes. Learn about quality assurance techniques, data manipulation, analysis, and reproducibility methods.
Data Organization: Quality Assurance and Transformations
Data Validation
Hook et al. 2010. Best Practices for Preparing Environmental Data Sets to Share and Archive. Available online: http://daac.ornl.gov/PI/BestPractices-2010.pdf
• Check for missing, impossible, anomalous values
• Plotting
• Mapping
• Examine summary statistics
• Verify data transfers from notebooks to digital files
• Verify data conversion from one file format to another
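The first and fourth checks above can be sketched in a few lines of code. The slides recommend R; the example below uses Python purely for illustration. The valid range, the missing-value codes, and the function name `validate` are assumptions made for this sketch and are not taken from Hook et al.; pick limits that suit your own instrument and site.

```python
import math
import statistics

# Assumed plausible range for water temperature in degrees Celsius.
VALID_RANGE = (-2.0, 40.0)
# Assumed markers that a data logger might use for "no reading".
MISSING_CODES = {"", "NA", "-9999"}


def validate(values, valid_range=VALID_RANGE, z_limit=3.0):
    """Classify raw string readings as missing, impossible, or anomalous.

    Returns a dict of index lists plus summary statistics for the
    readings that passed the range check.
    """
    report = {"missing": [], "impossible": [], "anomalous": [], "summary": None}
    numeric = {}
    for i, raw in enumerate(values):
        text = raw.strip()
        if text in MISSING_CODES:
            report["missing"].append(i)
            continue
        try:
            x = float(text)
        except ValueError:
            # Not a number at all, e.g. a transcription error.
            report["impossible"].append(i)
            continue
        if math.isnan(x) or not (valid_range[0] <= x <= valid_range[1]):
            report["impossible"].append(i)
        else:
            numeric[i] = x

    if len(numeric) >= 2:
        mean = statistics.mean(numeric.values())
        sd = statistics.stdev(numeric.values())
        report["summary"] = {
            "n": len(numeric),
            "min": min(numeric.values()),
            "max": max(numeric.values()),
            "mean": mean,
            "sd": sd,
        }
        if sd > 0:
            # Flag in-range values far from the mean for manual review.
            report["anomalous"] = [
                i for i, x in numeric.items() if abs(x - mean) > z_limit * sd
            ]
    return report


report = validate(["12.1", "12.4", "NA", "11.9", "99.0", "12.2"])
print(report["missing"], report["impossible"], report["summary"])
```

A z-score screen is only one possible definition of "anomalous" and is weak on short series; plotting and mapping, as the slide suggests, remain the more reliable way to spot odd values.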
Preserve & Record Information
Keep Original (Raw) File
• Do not include transformations, interpolations, etc.
• Make the raw data “read-only”
Processing Script (R)
Save as a new file
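One way to follow this pattern in a script is shown below: a minimal sketch in Python (the slide itself names R), assuming a plain-text raw file. The helper names `protect_raw` and `write_processed` are hypothetical and exist only for this illustration.

```python
import stat
from pathlib import Path


def protect_raw(raw_path):
    """Remove write permission from the raw file for user, group, and others."""
    raw_path = Path(raw_path)
    mode = raw_path.stat().st_mode
    raw_path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def write_processed(raw_path, processed_path, transform):
    """Apply `transform` to each raw line and write the result to a NEW file."""
    raw_path, processed_path = Path(raw_path), Path(processed_path)
    if raw_path.resolve() == processed_path.resolve():
        # The raw file is never the output target.
        raise ValueError("refusing to overwrite the raw file")
    with raw_path.open() as src, processed_path.open("w") as dst:
        for line in src:
            dst.write(transform(line))
```

Calling `protect_raw("raw.csv")` once after data collection, and routing every later change through `write_processed`, keeps the original untouched while the script itself records what was done. On Windows, `chmod` only toggles the read-only flag, which is sufficient for this purpose.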
Data Manipulation
• You will need to repeat reduction and analysis procedures many times
• You need to have a workflow that recognizes this
• Scripted languages can help capture the workflow
• You could just document all steps by hand; after the 20th iteration through your data set, however, you may feel more fondly towards scripted languages
• Learn the analytical tools of your field
• Talk to colleagues, etc. and choose at least one tool to master
Preserve Processing Information
Workflow: Temperature data (T) and Salinity data (S) → Data import into R → Data in R format → Quality control & data cleaning → “Clean” T & S data → Analysis → Summary statistics and Graph production
• Scripts used in file cleaning
• Programs / algorithms
• Document workflows or data file transformations
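The workflow on this slide can be captured as a single script in which each stage is a named function. The sketch below is in Python rather than R, and it makes several assumptions: the input is CSV text with columns named `T` and `S`, the acceptance ranges are illustrative, and graph production is left out. All function names are hypothetical.

```python
import csv
import io
import statistics


def import_data(csv_text):
    """Stage 1: import raw T and S readings as a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def clean(records, column, low, high):
    """Stage 2: quality control; keep numeric values within [low, high]."""
    kept = []
    for row in records:
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue  # missing column, empty field, or non-numeric entry
        if low <= value <= high:
            kept.append(value)
    return kept


def summarize(values):
    """Stage 3: summary statistics for the cleaned values."""
    if not values:
        return {"n": 0}
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


def run_workflow(csv_text):
    """Run every stage in order so the whole analysis can be re-executed."""
    records = import_data(csv_text)
    clean_t = clean(records, "T", -2.0, 40.0)  # assumed range, degrees Celsius
    clean_s = clean(records, "S", 0.0, 42.0)   # assumed range, practical salinity
    return {"T": summarize(clean_t), "S": summarize(clean_s)}


sample = "T,S\n12.0,35.0\n14.0,34.0\nNA,36.0\n99.0,35.0\n"
print(run_workflow(sample))
```

Because the cleaning rules live in the script, archiving the script alongside the raw files documents the transformation from raw to “clean” data without a separate written log.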
Preserving: Scripted Notes
• Use a scripted language to process data
• R statistical package (free, powerful)
• SAS
• MATLAB
• Processing scripts record processing
• Steps are recorded in textual format
• Can be easily revised and re-executed
• Easy to document
• GUI-based analysis may be easier, but is harder to reproduce
Reproducibility Methods
• Do use version control
• Do document software environment
• Only save what cannot be reconstructed from original data + code
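Documenting the software environment can itself be scripted. The following is a minimal sketch for a Python-based analysis (an R user would typically record the output of `sessionInfo()` instead); the function name `describe_environment` and the choice of fields are assumptions made for illustration.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages=()):
    """Collect interpreter, OS, and package versions into a plain dict."""
    env = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            env["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly rather than silently skipping it.
            env["packages"][name] = None
    return env


# List the packages your own analysis imports; "numpy" is only an example.
print(json.dumps(describe_environment(["numpy"]), indent=2))
```

Saving this output next to the processing scripts, under version control, records which software produced a given result.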