
ESM post processing workflow



  1. ESM post processing workflow

  2. ESM post processing (PP) tables on http://esm.zmaw.de
     • These tables will contain all the information necessary for post processing (PP):
       • What (which code, …) will be filled into the CERA data base (and archived (= DISK?))?
       • Which PP of the output is necessary for which code?
       • What are the corresponding meta data (are CF names available?, …)?
       • See also: cdo -t <partab> <file> ? (a usage sketch follows below)
     => Unify and complete the PP tables for the different models (see the following slides)
     => Create or adapt the corresponding CERA code tables (see below: new table 'millenium' or old 'echam5', … ???)
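
     The cdo -t <partab> call mentioned above attaches variable names and units from a parameter table when reading GRIB data. A minimal usage sketch; the table name 'echam5' is built into CDO, while the file names are placeholders:

        # Inspect a raw GRIB file with names taken from CDO's built-in ECHAM5 table:
        cdo -t echam5 infov expid_atm_raw.grb

        # Convert to netCDF, attaching meta data from a table file exported from
        # the PP tables on esm.zmaw.de (hypothetical file name):
        cdo -t pp_table.txt -f nc copy expid_atm_raw.grb expid_atm_raw.nc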

  3. ESM post processing tables on http://esm.zmaw.de
     • Fields necessary for PP and/or CERA (an example table entry follows below):
       • Code: has to be unique (within a model)
       • Short name: name in the model; used for temporary post processing files
       • Long name: same as the CF standard name, if one exists (goal: find CF names for all variables!?)
       • Unit, dimension
       • Raw = model output; PP = postprocessed; CERA = in the DB !! (difference between PP and CERA??)
       • ATTRB, DISC ??
       • + midrange/long-term ARCHIVE ??
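
     A minimal sketch of how one row of such a table could be written in CDO's parameter table format, so that the same meta data serve both the PP scripts and the file headers. The entry (code 167 = temp2, "2m temperature", unit K) follows the standard ECHAM table; the file names are placeholders:

        # Contents of pp_table.txt (CDO's namelist-style parameter table format):
        &parameter
          code      = 167
          name      = temp2
          long_name = "2m temperature"
          units     = "K"
        /

        # Use the table via CDO's -t option:
        cdo -t pp_table.txt infov expid_atm_raw.grb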

  4. Post processing steps (a script sketch follows below)
     • Write the raw output of the component models to the work space (WS) / disk array (DA) (by expid.run)
     • Postprocessing on the work space WS (by expid.post):
       • Regrid, pressure levels, … (afterburner)
       • Convert the output to GRIB format, if necessary (CDOs)
       • tar and szip the files (prepare for archiving)
       • Further postprocessing, e.g. monthly means, … (CDOs)
       • Split the raw files into (time series of) codes (for dbfill)
     • Archive the output (by expid.arch):
       • Keep midrange storage data on the disk DA
       • Long-term storage data on the archive AR
     • Fill the postprocessed output into CERA (DB) (by expid.dbfill)
     • Quality assurance, controlling, clean up etc.
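
     A minimal sketch of these WS steps as a shell sequence, assuming GRIB output; the file names, the experiment id and the szip call are placeholders, not the production expid.post script:

        #!/bin/sh
        set -e
        EXPID=xyz0001

        cdo -f grb copy ${EXPID}_raw.ext ${EXPID}_raw.grb   # convert to GRIB, if necessary
        cdo monmean ${EXPID}_raw.grb ${EXPID}_mm.grb        # further PP, e.g. monthly means
        cdo splitcode ${EXPID}_mm.grb ${EXPID}_code         # split into codes (for dbfill)
        tar cf ${EXPID}_pp.tar ${EXPID}_code*               # prepare for archiving ...
        szip ${EXPID}_pp.tar                                # ... and compress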

  5. General scheme (diagram): the experiment (model run, expid.run) writes the raw files (multicode) to DA (mr_out/) and the work space WS; expid.post does the postprocessing, sends pp. data to the archive AR (long term) and transfers the DB files; expid.dbfill fills the pp. data into the CERA DB.

  6. Questions to the modellers
     • Fill out the PP and code tables!
     • Which output has to be stored where?
       • Archive raw (and postprocessed?) files as tar/szip/grib files on midrange (DA) / long-term (/ut) storage? -> 'DISC'?
       • Store time series / monthly means of which codes in the CERA data base? -> see the tables
       => Which temporary files can be (re)moved, and when?
     • Does this change for different experiments?
     • Is there further information the PP (esp. the DB fill) has to 'know'?

  7. Action items for IMDI (SRE)
     • Create SRE scripts for each component model (called by expid.post; a dispatch sketch follows below):
       • expid.echam.post (more or less ready?)
       • expid.jsbach.post (open; as echam?)
       • expid.mpiom.post (open)
       • expid.hamocc.post (open; as mpiom?)
     • Trigger automatic data base filling
     • Quality assurance, monitoring, controlling, clean up etc.
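
     A sketch of how expid.post could dispatch to these component scripts; only the script names come from the slide, the loop itself is an assumption:

        #!/bin/sh
        # Hypothetical dispatcher: run the per-component post scripts in turn
        # and stop at the first error.
        set -e
        for comp in echam jsbach mpiom hamocc; do
            ./expid.${comp}.post "$@"
        done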

  8. ECHAM5 (diagram; a script sketch follows below): expid.run writes the raw files (monthly multicode grib files) to DA; expid.echam.post runs the afterburner on WS to produce ATM.expid.grb and BOT.expid.grb, splits them into one-code time series (code1, code2, …), tars them and transfers them to the DB; expid.dbfill fills the pp. data into the DB; the raw files go to the archive AR.
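
     A minimal sketch of the afterburner and split steps, using CDO's 'after' operator, which reads an afterburner selection namelist from standard input; the type/code/level values are illustrative and must be checked against the production settings:

        # Contents of select.nml (afterburner selection namelist; values are
        # examples only):
         &select
           type  = 20,
           code  = 130, 131, 132,
           level = 100000, 85000, 50000,
         &end

        # Afterburner: spectral raw output -> gridpoint data on pressure levels:
        cdo after ${EXPID}_echam5_raw.grb ATM.${EXPID}.grb < select.nml

        # Split into one-code time series for the data base fill:
        cdo splitcode ATM.${EXPID}.grb ${EXPID}_code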

  9. JSBACH: expid.jsbach.post (same scheme as ECHAM?)

  10. MPI-OM (diagram; a script sketch follows below): expid.run writes the raw files (run-period multicode files in grib and 'extra' format, monthly multicode files) to DA; expid.mpiom.post on WS converts to grib (conv2grib), concatenates and szips the run-period files (grib-szip) for the archive AR, and splits the monthly grib files into one-code time series (code1, code2, …), which are transferred to the DB; expid.dbfill fills the pp. data into the DB.
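
     A sketch of the MPI-OM steps, assuming conv2grib wraps a CDO format conversion from the 'extra' format; all file names and the szip call are placeholders:

        # 'extra' -> GRIB conversion (presumably what conv2grib does):
        cdo -f grb copy ${EXPID}_mpiom_mon.ext ${EXPID}_mpiom_mon.grb

        # Concatenate the monthly GRIB files to a run-period file and
        # compress it for the archive:
        cat ${EXPID}_mpiom_mon_??.grb > ${EXPID}_mpiom_runper.grb
        szip ${EXPID}_mpiom_runper.grb

        # Split the monthly files into one-code time series for the DB:
        cdo splitcode ${EXPID}_mpiom_mon.grb ${EXPID}_code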

  11. HAMOCC: expid.hamocc.post (same scheme as MPI-OM??); raw files in netCDF, monthly?

  12. Monitoring and error handling (a check sketch follows below)
     • Check (automatically) file sizes, … after each PP step (tar files as well)
     • If the output of step m has been checked =>
       • write the status ('ok' or 'error') to the corresponding log file
     • If an error occurs =>
       • make sure that errors are detected in time and communicated to the responsible persons!
       • what are the necessary actions (stop step m-1?, … ??)
     • Assure a 'restart' of the workflow once the status is set to 'ok' again
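
     A minimal sketch of such an automatic check after a PP step; the size threshold, the log file name and the notification hook are assumptions:

        #!/bin/sh
        # Record the status of a PP step in a log file; later steps
        # should only start if the status is 'ok'.
        check_step () {
            step=$1; file=$2; minsize=$3
            size=$(wc -c < "$file" 2>/dev/null || echo 0)
            if [ "$size" -ge "$minsize" ]; then
                echo "$step ok"    >> ${EXPID}.pp.log
            else
                echo "$step error" >> ${EXPID}.pp.log   # notify the responsible person here
                return 1                                # caller must stop dependent steps
            fi
        }
        check_step monmean ${EXPID}_mm.grb 1000000      # threshold is an example value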

  13. Synchronous workflow (diagram): expid.run, expid.post and expid.dbfill proceed in parallel along the model time axis [months]; each interval is marked 'ok' per step, and an ERROR in one step (here in expid.post) blocks the dependent steps.

  14. (Visions of the future ??) ESM post processing workflow

  15. Post processing steps (vision 1)
     • Write the model output (time series of codes, means, …) directly from the model run (expid.run) into the data base (DB)
     • Optional: archive the output on disc/tape (AR)

  16. General scheme (vision 1, diagram): the experiment (model run, expid.run) calls write_sql to write the output directly from the model run into the DB, in the form in which it should be stored there (e.g. time series of single codes, monthly means); archiving to AR is optional.

  17. Vision 1: counter arguments
     • The actual postprocessing (esp. the afterburner: converting spectral to regular grid, pressure levels, 'merging' of codes etc.) has to be implemented in the model itself
     • What happens if the data base filling 'hangs'?
     • A data base (Oracle) interface must exist on the compute server
     • …

  18. Post processing steps (vision 2)
     • Write the model raw output (multicode and multilevel, model-specifically formatted and gridded, …) to the work space (WS) (expid.run)
     • Postprocess the data on the DA and prepare it for archiving and data base filling (by expid.post)
     • Write this postprocessed data from WS / DA into the data base (DB) (in expid.post)
     • Optional: archive the output on disc/tape (AR) (in expid.post)

  19. General scheme (vision 2, diagram): expid.run (experiment / model run) writes the raw files (multicode, model-specific) to DA and the work space WS; expid.post postprocesses them, sends pp. data to the archive AR and fills the pp. data directly into the DB.

  20. Vision 2: preconditions
     • The work space DA must be mounted on a system accessible by the data base,
     • or in other words: the files should be written directly from the WS into the DB
     • Performance?: write and read on the same disc (but the same problem exists with file transfer?)
     • … further counter arguments ??
