Three Flavors of Data • Science Data • Simulations and Sensor Readings • Catalog Data • Metadata; descriptors of datasets, data products and other processing artifacts. • Active Data • Data associated with logging, monitoring and scheduling compute tasks.
Three Flavors of Data (1) • Science Data • Simulation Data: Solutions to partial differential equations governing the physics of the Columbia River Estuary • Sensor Data: measurements of the physical characteristics used to guide and validate simulations • Wanted: • Simple means for specifying new data products from these raw data and computing them efficiently • Approach: • Data manipulation language based on a GridField data model.
Three Flavors of Data (2) • Catalog Data • Explicit metadata to describe system artifacts • Wanted: • Tools to locate artifacts given descriptors (query) • A metadata collection facility that tolerates change • The metadata we wish to collect may change (e.g., new product ‘lines’ are developed) • The source of the metadata may change (e.g., file naming conventions or directory structures evolve) • Approach: • Generic database; custom collection scripts
Three Flavors of Data (3) • Active Data • Data describing past, current, and future compute tasks. • Wanted: • Tools for scheduling, monitoring, and managing... • individual tasks (e.g., a single data product derivation) • groups of interdependent tasks (e.g., a daily forecast run) • campaigns (e.g., a series of calibration runs followed by a re-computation of the runs of 2002 with a different implicitness) • Approach: • undecided
Simulation Data: GridFields • The data product suite exhibits recurring processing idioms • larger grids reduced to smaller grids Ex: ‘estuary’ data products vs. ‘far’ data products • grids mapped to other grids Ex: 3D grid mapped to a 2D slice • grids combined Ex: 1D depth grid ‘crossed’ with a 2D horizontal grid.
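The three idioms above can be illustrated with a minimal Python sketch. It is not the GridField implementation: it models a grid only as a list of nodes with one data value bound to each node, and ignores cells and topology entirely. The function names `cross`, `restrict`, and `slice_at_depth` are hypothetical, chosen for this illustration.

```python
from itertools import product


def cross(horizontal_nodes, depth_levels):
    """Cross a 2D horizontal grid with a 1D depth grid.

    Each resulting node is a (horizontal_node, depth) pair, so H horizontal
    nodes and D depth levels give H * D nodes.
    """
    return list(product(horizontal_nodes, depth_levels))


def restrict(nodes, values, predicate):
    """Reduce a larger grid to a smaller one.

    Keeps only the nodes that satisfy `predicate`, together with the data
    value bound to each kept node.
    """
    kept = [(n, v) for n, v in zip(nodes, values) if predicate(n)]
    return [n for n, _ in kept], [v for _, v in kept]


def slice_at_depth(nodes3d, values, depth):
    """Map a 3D grid to a 2D slice by fixing the depth coordinate."""
    kept_nodes, kept_values = restrict(nodes3d, values, lambda n: n[1] == depth)
    # Drop the depth component so the result lives on the horizontal grid.
    return [h for h, _ in kept_nodes], kept_values


# Usage: three horizontal nodes crossed with two depth levels.
horizontal = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
grid3d = cross(horizontal, [0.0, 5.0])
data = list(range(len(grid3d)))          # one made-up value per 3D node
slice_nodes, slice_data = slice_at_depth(grid3d, data, 5.0)
```

Here the slice is written in terms of the restriction, which hints at why a small set of operators can cover many data products.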
Simulation Data: GridFields (2) • We’re expressing these idioms as operators over a grid-based data model. Advantages: • Simpler recipes • 5 ops for all the data products (plus helper functions) • Flexible model; fewer maintenance troubles • N dimensions • uniform handling of space and time (maybe more...) • Any cell type • segments, triangles, quadrangles, arbitrary polytopes • Optimization opportunities • operators prescribe semantics, but not implementation • topological equivalences exposed and exploited
Simulation Data: GridFields (3) Status: • Core operators functional • Simple examples hooked to XMVIS for viewing • Todo: • Examples hooked to VTK • Write/Test examples from the current product suite • Support GridFields too large for memory • Expose a nice syntax for writing recipes
Catalog Data: Collection • Where is the metadata? • File path: /forecasts/2003-184/run/images/isosal_estuary7/anim-sal_estuary_7.gif • File name: 1_salt.63 • File content: Version: 1.04, Variable: salt, ... • Other files?
Collection scripts • For each file type the meta-data collection mechanism is different. • gifs • binary output • Param.in • Use a script for each file type that will emit meta-data for that type of file. • Only these simple scripts need change as the system evolves
Example: gif animation • /forecasts/2003-184/.../isosal_estuary7/anim-sal_estuary_7.gif • CorieDate = “2003-184” • Type = “Animation” • Region = “Estuary” • Depth = “7” • Variable = “Salinity” • Product line = “isoline” • Lat = xxxx • Long = xxxx • Here, a script can just parse the path and file name
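A path-parsing collection script of this kind might look like the Python sketch below. It is only an illustration: the actual collection scripts are not shown in these slides, and the naming convention encoded in `DIR_RE` and the lookup tables (`iso` for isoline, `sal` for Salinity, `anim` for Animation) is inferred from the single example path. The function name `collect_gif_metadata` is hypothetical. Lat and Long are left out because the slide does not say where they come from.

```python
import re
from pathlib import PurePosixPath

# Assumed convention, inferred from one example: <line><var>_<region><depth>
DIR_RE = re.compile(
    r"^(?P<line>[a-z]{3})(?P<var>[a-z]+)_(?P<region>[a-z]+)(?P<depth>\d+)$"
)

# Hypothetical lookup tables from abbreviations to descriptor values.
PRODUCT_LINES = {"iso": "isoline"}
VARIABLES = {"sal": "Salinity"}
KINDS = {"anim": "Animation"}


def collect_gif_metadata(path):
    """Derive descriptors for a gif animation from its path and file name."""
    p = PurePosixPath(path)
    parts = p.parts
    # The run date is the directory directly under "forecasts".
    corie_date = parts[parts.index("forecasts") + 1]

    match = DIR_RE.match(p.parent.name)
    if match is None:
        raise ValueError(f"unrecognized product directory: {p.parent.name}")

    kind = p.stem.split("-", 1)[0]  # "anim" in "anim-sal_estuary_7"
    return {
        "CorieDate": corie_date,
        "Type": KINDS.get(kind, kind),
        "Region": match["region"].capitalize(),
        "Depth": match["depth"],
        "Variable": VARIABLES.get(match["var"], match["var"]),
        "ProductLine": PRODUCT_LINES.get(match["line"], match["line"]),
    }


meta = collect_gif_metadata(
    "/forecasts/2003-184/run/images/isosal_estuary7/anim-sal_estuary_7.gif"
)
```

If the directory naming convention changes, only the pattern and the lookup tables in this one script need to change.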
Example: Binary output • /forecasts/2003-184/run/1_salt.63 • From the file name: Variable = “Salinity” • What about number of nodes? Mean sea level? • These are in the file’s content: nodes: 55817, msl: 4285, ... • We need to access the file’s content • This needs a different mechanism than for gif animations; it might be convenient to implement it in a different script.
Architecture • Diagram: the Reflector invokes a Collection Script; meta-data flows to XML and the DB • Reflector creates an XML file containing the meta-data for each file and also stores the meta-data in the database • Reflector determines the file type (based on regular expressions) and calls the appropriate collection script • Collection script uses an “AddItem” Perl function to return the meta-data to the reflector
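The dispatch described on this slide can be sketched in Python as follows. This is not the actual Reflector, whose collection scripts use a Perl `AddItem` function. The class name `Reflector`, the callback `add_item`, the XML element names, and the two example scripts are all assumptions made for illustration, and the database store is omitted.

```python
import re
import xml.etree.ElementTree as ET


class Reflector:
    """Chooses a collection script by matching the file path against regexes."""

    def __init__(self):
        self._rules = []  # list of (compiled regex, collection script)

    def register(self, pattern, script):
        self._rules.append((re.compile(pattern), script))

    def reflect(self, path):
        """Run the first matching script and return the items it reported."""
        for regex, script in self._rules:
            if regex.search(path):
                items = []

                # Stand-in for AddItem(Name, Value, Notes, Type); defaults are assumed.
                def add_item(name, value, notes="", type_="string"):
                    items.append({"name": name, "value": str(value),
                                  "notes": notes, "type": type_})

                script(path, add_item)
                return items
        return []  # no script registered for this file type


def to_xml(path, items):
    """Serialize collected items to an XML string (hypothetical schema)."""
    root = ET.Element("artifact", {"path": path})
    for item in items:
        el = ET.SubElement(root, "item", {"name": item["name"],
                                          "type": item["type"],
                                          "notes": item["notes"]})
        el.text = item["value"]
    return ET.tostring(root, encoding="unicode")


def gif_script(path, add_item):
    add_item("type", "animation", "from file extension")


def binary_script(path, add_item):
    file_name = path.rsplit("/", 1)[-1]                      # e.g. "1_salt.63"
    variable = file_name.split("_", 1)[1].split(".", 1)[0]   # e.g. "salt"
    add_item("variable", variable, "parsed from file name", "string")


reflector = Reflector()
reflector.register(r"\.gif$", gif_script)
reflector.register(r"\.\d+$", binary_script)   # numeric extension such as ".63"
items = reflector.reflect("/forecasts/2003-184/run/1_salt.63")
xml_text = to_xml("/forecasts/2003-184/run/1_salt.63", items)
```

Passing the callback into each script keeps the scripts unaware of how the meta-data is stored, so the storage side can change without touching them.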
Metadata in XML and DB? • These XML files give you filesystem-based access to the metadata for an artifact • Use “info” to present the XML in a readable form: /../run> info 1_salt.63 variable: salt version: 1.04 msl: 4285 nodes: 55817 • Also useful if DB is inaccessible.
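As a rough illustration of what an “info”-style viewer does, the Python sketch below turns a metadata XML document into readable `name: value` lines. The XML layout in `SAMPLE` is a made-up schema, since the slides do not show the real file format; only the four values are taken from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the values are those shown for 1_salt.63.
SAMPLE = """<artifact path="1_salt.63">
  <item name="variable">salt</item>
  <item name="version">1.04</item>
  <item name="msl">4285</item>
  <item name="nodes">55817</item>
</artifact>"""


def info(xml_text):
    """Return one 'name: value' line per metadata item, in document order."""
    root = ET.fromstring(xml_text)
    return [
        f"{item.get('name')}: {(item.text or '').strip()}"
        for item in root.iter("item")
    ]


for line in info(SAMPLE):
    print(line)
```

Because this reads only the XML file, it keeps working when the database cannot be reached.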
Minor Technical Change • Previously we had suggested that the collection scripts should emit metadata on standard output • Instead, we now provide a Perl function AddItem(Name,Value,Notes,Type) that the scripts call to report each item
How does this help? • Find artifacts via descriptors (query) • ‘find animations showing the estuary where we used a constant bottom friction coefficient’ • where region = “estuary” and type = “animation” and ntau = “0” • Write robust metadata-driven programs • Chris’ low bandwidth zoom web app • Stay-Fresh PowerPoint Slides
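The descriptor query above could be run against a generic name/value table along the lines of this Python sketch. The slides do not give the real database schema: the `metadata(artifact, name, value)` table, the helper names, and the demo rows are assumptions, and SQLite merely stands in for the actual database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a generic name/value metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata (artifact TEXT, name TEXT, value TEXT, "
        "PRIMARY KEY (artifact, name))"
    )
    rows = [  # made-up artifacts for illustration
        ("a.gif", "region", "estuary"), ("a.gif", "type", "animation"), ("a.gif", "ntau", "0"),
        ("b.gif", "region", "estuary"), ("b.gif", "type", "animation"), ("b.gif", "ntau", "1"),
        ("c.63", "region", "estuary"), ("c.63", "type", "binary"), ("c.63", "ntau", "0"),
    ]
    conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)
    return conn


def find_artifacts(conn, **descriptors):
    """Return the artifacts whose metadata matches every given descriptor."""
    if not descriptors:
        return []
    clauses = " OR ".join("(name = ? AND value = ?)" for _ in descriptors)
    params = [x for pair in descriptors.items() for x in pair]
    # An artifact qualifies only if all requested descriptors matched.
    sql = (
        f"SELECT artifact FROM metadata WHERE {clauses} "
        "GROUP BY artifact HAVING COUNT(DISTINCT name) = ? ORDER BY artifact"
    )
    return [row[0] for row in conn.execute(sql, params + [len(descriptors)])]


conn = build_demo_db()
matches = find_artifacts(conn, region="estuary", type="animation", ntau="0")
```

A name/value layout like this accepts new descriptors without schema changes, at the cost of a slightly more involved query.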