This documentation outlines strategies for improving forecasting systems through enhanced reliability, fault tolerance, and recovery mechanisms. It highlights the importance of quality measurement and change/configuration management, ensuring clear documentation and standards. The text discusses the performance monitoring of forecasting systems, collaboration with CORIE researchers, and rigorous on-call rotation practices to ensure quick response times to issues. Additionally, it delves into various forecasting environments, from production to experimental, and emphasizes the necessity for regular updates, error analysis, and system improvements.
Forecast Revision Goals
• Improve Reliability, Fault Tolerance, Recovery
• Measure and Improve Quality
• Change Management, Configuration Management, Standards, Documentation
• Performance
• Flexibility
• System Monitoring, Maintenance
• Facilitate Collaboration with CORIE Researchers
Towards Reliable Forecasts
• Forecast monitoring team
  • Arun, Ethan, Paul
  • Science, systems, software
  • Team members cross-train in specialty
  • Oncall rotation
  • Monitoring and Alerting
  • Big Brother
  • Oversee Change Management
Managing Change
• Change and Configuration Management
  • Development, production environments
  • Deploy products from development to production
  • Version control using CVS
• Standards
  • Perl, C coding standards
  • CORIE.pm
  • libelio.a
• Documentation
Oncall
• 24/7
• Weekly rotation
• Respond to alerts received via e-mail and pagers, and resolve problems, whatever it takes
• Oncall procedures page
Monitoring and Alerting
• In control and processing scripts
  • Problems with model forcings
  • Run fails to complete
  • Processing problems
• Big Brother
  • Monitors network connectivity, ping
  • Network protocols, e.g. HTTP, SSH
  • Disk, CPU
  • Specific processes, e.g. master_process.pl
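The disk and CPU checks listed above come down to comparing a measured value against warning and panic thresholds. The sketch below illustrates that idea only; it is not Big Brother's own code. The `Thresholds` class, the function names, the 90/95 percent values, and the rule that only red pages the on-call person are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    warn: float   # percent at which status turns yellow (hypothetical value)
    panic: float  # percent at which status turns red (hypothetical value)


def classify(value: float, t: Thresholds) -> str:
    """Map a measured percentage to a green/yellow/red status."""
    if value >= t.panic:
        return "red"
    if value >= t.warn:
        return "yellow"
    return "green"


def should_page(status: str) -> bool:
    """Assumed policy: page only on red; yellow is left for e-mail."""
    return status == "red"


# Hypothetical disk thresholds, in percent used.
DISK = Thresholds(warn=90.0, panic=95.0)

if __name__ == "__main__":
    status = classify(97.2, DISK)
    print(status, should_page(status))  # prints: red True
```

Keeping the thresholds in one small object makes them easy to adjust without touching the classification logic.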
Measure and Improve Quality
• Error analysis
  • 3 and 7 day error analysis (model-data comparisons using database)
  • Summarized values (averaged over all stations) to quantify forecast skill
  • Comparisons with external forcings (river, wind (TBD))
• Comparisons (TBD)
  • Between forecasts
  • With near term hindcast
  • With field exercises
• Comparisons with verified data
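A per-station model-data comparison that is then averaged over all stations could be sketched as follows. The slides do not say which error measures are used, so bias and RMSE are assumptions here, as are the function names and the in-memory dictionary standing in for the database query.

```python
import math
from statistics import mean


def station_errors(model, observed):
    """Return (bias, rmse) for one station from paired model/observed values."""
    if len(model) != len(observed) or not model:
        raise ValueError("need equal-length, non-empty series")
    diffs = [m - o for m, o in zip(model, observed)]
    bias = mean(diffs)
    rmse = math.sqrt(mean(d * d for d in diffs))
    return bias, rmse


def summarize(per_station):
    """Average bias and RMSE over all stations.

    per_station maps a station name to a (model, observed) pair of lists.
    """
    results = [station_errors(m, o) for m, o in per_station.values()]
    return mean(b for b, _ in results), mean(r for _, r in results)


if __name__ == "__main__":
    # Hypothetical station names and values, for illustration only.
    data = {
        "station_a": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
        "station_b": ([2.0, 4.0], [1.0, 3.0]),
    }
    print(summarize(data))  # prints: (0.5, 0.5)
```

Averaging per-station scores weights every station equally regardless of how many samples it has; pooling all samples before computing the error would be a different, equally defensible choice.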
Databases
• PostgreSQL
  • amb105: production DB server
  • amb104: backup production DB server
  • amb36: development DB server
• Ease of access via Perl DBI
• Automatic archiving of external data
  • Telemetry (parallel with process on amb24)
  • Verified data (TBD)
• Performance issues
Forecasts
• Reference (AKA Production)
• Experimental
• Development
• Near term hindcast
Reference Forecast
• Runs every day
• Controlled, infrequent changes
• Failure rate minimal, most stable forecast
• Atmospheric forcings from eta+osu
• Hosted on amb1018
Experimental Forecast
• Runs like production mode
• Changes are managed but allowed more frequently than for the reference forecast
• Failure rate can be higher
• Failed forecasts need to be updated
• Atmospheric forcings from eta only
• Hosted on amb1017
Development Forecast
• Does not run in production mode
• Minimal results stored (3 days)
• Test changes to be incorporated in ref/exp forecasts, e.g. model forcings
• Development environment for new products and scripts
• Hosted on amb1019
Hindcasts
• Runs once a week for past week
• Parameter files based on previously set database (currently database06)
• Runs based on week number
• River forcings from relational database
• Atmospheric forcings from locally stored NOAA archive
• Hosted on amb1020
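Since hindcasts run by week number, the run needs a mapping from a week number to the dates it covers. The slides do not state which week-numbering convention is used; the sketch below assumes ISO 8601 weeks (Monday to Sunday), and both function names are hypothetical.

```python
from datetime import date, timedelta


def hindcast_window(year: int, week: int):
    """Return (first_day, last_day) of the given ISO week (assumed convention)."""
    start = date.fromisocalendar(year, week, 1)  # Monday of that ISO week
    return start, start + timedelta(days=6)


def last_complete_week(today: date):
    """Return (iso_year, iso_week) of the most recent fully elapsed ISO week."""
    prev = today - timedelta(days=7)
    iso = prev.isocalendar()
    return iso[0], iso[1]


if __name__ == "__main__":
    year, week = last_complete_week(date(2005, 3, 16))
    print(year, week, hindcast_window(year, week))
```

`date.fromisocalendar` requires Python 3.8 or later. Using the ISO year returned by `isocalendar()` rather than the calendar year avoids off-by-one errors in the weeks around New Year.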
Forecast Forcings
• River forcings (amb1020, daily)
  • 7:45, 10:45, 13:45, 16:45  getforcings.pl (to DB)
• Atmospheric forcings (amb103, daily)
  • 00:05  get_eta.csh (to NFS)
  • 00:10  get_gfs_air.csh
• Atmospheric forcings (amb104, daily)
  • 02:00  get_avn.csh (to NFS)
  • 04:00  get_mrf.csh
  • 09:30  get_osu.csh
Forecast execution
• On each forecast system, daily:
  • 00:10  simlink.pl on local directory
  • 00:10  simlink.pl on NFS directory
  • 09:00  do_error_analysis.pl (processing)
  • 11:00  place_hdf_files_new.csh
  • 11:25  prep.pl
  • 11:35  checkinputs.pl
  • 12:00  start.pl
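The schedule above places an input check between preparation and the model start. The slides do not describe what checkinputs.pl actually verifies; as an assumption, the sketch below shows one plausible pre-flight check that flags input files which are missing, empty, or older than a cutoff. The function name and the 24-hour default are hypothetical.

```python
import os
import time


def stale_inputs(paths, max_age_hours=24.0, now=None):
    """Return the paths that are missing, empty, or older than max_age_hours."""
    now = time.time() if now is None else now
    problems = []
    for path in paths:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            problems.append(path)  # input was never downloaded or linked
            continue
        age_hours = (now - st.st_mtime) / 3600.0
        if st.st_size == 0 or age_hours > max_age_hours:
            problems.append(path)  # truncated download or yesterday's file
    return problems
```

A caller would pass the list of forcing files expected for the day and refuse to start the run, or raise an alert, if the returned list is non-empty.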
Forecast processing
• Master process, runs continuously as a daemon. Executes on local disk, looping over:
  • do_isolines.pl
  • do_ll_isolines.pl
  • do_transects.pl
  • do_hab_isolines.pl
  • do_plumevol.pl
  • do_intrusionlength.pl
  • extract_station_ADP.pl (from DB)
  • extract_station_CTD.pl (from DB)
  • do_stationextraction.pl
  • do_stationplots.pl
  • rsync to NFS
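The loop structure of such a master process can be sketched as below. This is not the actual master_process.pl: the function names, the pause between passes, and the choice to continue with the remaining steps after a failure are assumptions. The final rsync to NFS is omitted because the slides give no source or destination paths.

```python
import subprocess
import time

# Script names taken from the slide; how each one is invoked is an assumption.
STEPS = [
    "do_isolines.pl", "do_ll_isolines.pl", "do_transects.pl",
    "do_hab_isolines.pl", "do_plumevol.pl", "do_intrusionlength.pl",
    "extract_station_ADP.pl", "extract_station_CTD.pl",
    "do_stationextraction.pl", "do_stationplots.pl",
]


def run_pass(steps, run=None):
    """Run each step once, in order; return the names of steps that failed."""
    if run is None:
        run = lambda step: subprocess.run(["perl", step]).returncode
    failed = []
    for step in steps:
        try:
            code = run(step)
        except OSError:
            code = -1  # e.g. interpreter not found
        if code != 0:
            failed.append(step)  # assumed policy: note it and keep going
    return failed


def main_loop(steps, pause_seconds=60.0, max_passes=None, run=None):
    """Repeat passes forever, or max_passes times when given (useful in tests)."""
    passes = 0
    while max_passes is None or passes < max_passes:
        failed = run_pass(steps, run)
        if failed:
            print("failed steps:", ", ".join(failed))
        passes += 1
        time.sleep(pause_seconds)
    return passes
```

Passing `run` as a parameter lets the loop be exercised without executing any scripts, for example `run_pass(STEPS, run=lambda step: 0)`.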
Hindcast Processing
• Uses same scripts as forecasts
• Remove differences between hindcast and forecast processing (2 vs 7 days)
• Some plot parameter file differences
Develop and Deploy
• Check out module from CVS
• Modify or add code in a local copy
• CVS commit
• Deploy to development environment
• Deploy to experimental environment
• Deploy to reference environment
• Development web page
Going Forward
• Improve monitoring in processing codes
• Failover for forcings, climatology
• Revise relational databases (per Bill H.)
• Tune BB thresholds and start paging
• Review current products
• Document procedures and products
• Migrate to new grid, quadrangles
• Forecast/forecast and forecast/hindcast comparisons using verified data
• Comparisons with external forcings