Training session 2: Advanced training course on modipsl and libIGCM, November 14th 2013, MdS

  1. Training session 2: Advanced training course on modipsl and libIGCM, November 14th 2013, MdS

  2. Outline • IPSL climate modelling centre (ICMC) presentation • IPSLCM history and perspective • Mini how to use modipsl/libIGCM • Post-processing with libIGCM • Monitoring a simulation • Hands-on

  3. IPSL climate modelling centre (ICMC): http://icmc.ipsl.fr

  4. ICMC organisation
     PI: J-L Dufresne; Office: L. Bopp, MA Foujols, J. Mignot
     Steering committee:
     • Modeling platform (IPSL-ESM): Arnaud Caubel (LSCE) - Marie-Alice Foujols (IPSL)
     • Current and future climate changes: Jean-Louis Dufresne (LMD) - Olivier Boucher (LMD)
     • Atmospheric and surface physics and dynamics (LMDZ): Frédéric Hourdin (LMD) - Laurent Fairhead (LMD)
     • Paleoclimate and last millennium: Pascale Braconnot - Masa Kageyama (LSCE)
     • Ocean and sea ice physics and dynamics (NEMO, LIM): C. Ethé (IPSL) - Claire Lévy - Gurvan Madec (LOCEAN)
     • "Near-term" prediction (seasonal to decadal): Eric Guilyardi (LOCEAN) - Juliette Mignot (LOCEAN)
     • Atmosphere and ocean interactions (IPSL-CM, different resolutions): Sébastien Masson (LOCEAN) - Olivier Marti (LSCE)
     • Regional climates: Robert Vautard (LSCE), Laurent Li (LMD)
     • Atmospheric chemistry and aerosols (INCA, INCA_aer, Reprobus): Anne Cozic (LSCE) - M. Marchand (LATMOS)
     • Biogeochemical cycles (PISCES): Laurent Bopp (LSCE) - Patricia Cadule (IPSL)
     • Evaluation of the models, present-day and future climate change analysis: Sandrine Bony (LMD) - Patricia Cadule (IPSL) - Marion Marchand (LATMOS) - Juliette Mignot (LOCEAN) - Jérôme Servonnat (LSCE)
     • Data archive and access requirements: Sébastien Denvil (IPSL) - Karim Ramage (IPSL)
     • Continental processes (ORCHIDEE): Philippe Peylin (LSCE) - Josefine Ghattas (IPSL)

  5. Outline • IPSL climate modelling centre (ICMC) presentation • IPSLCM history and perspective • Mini how to use modipsl/libIGCM • Post-processing with libIGCM • Monitoring a simulation • Hands-on

  6. IPSLCM history

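  7. IPSLCM history and scientific articles
     (Figure: timeline of the IPSL coupled model versions IPSL-CM1, IPSL-CM2, IPSL-CM4, IPSL-CM5 and IPSL-CM6, set against the IPCC reports (FAR 1990, SAR 1995, TAR 2001, AR4 2007, AR5 2013) and the CMIP projects (CMIP 1 & 2, CMIP3, CMIP5); the number of scientific articles per model version grows from a few to 30+.)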

  8. LMDZ: atmospheric component, http://lmdz.lmd.jussieu.fr/?set_language=en
     Next LMDZ training session: 9-11 December 2013, registration before 15th November: http://studs.unistra.fr/studs.php?sondage=1wgk8t9v44nsml27

  9. Introduction to LMDZ

  10. NEMO: oceanic component, http://www.nemo-ocean.eu

  11. Short history of the IPSL model: http://icmc.ipsl.fr/index.php/icmc-models

  12. 1979: first LINPACK performance list, 80 Mflops

  13. Supercomputer timeline (top500.org): performance multiplied by 10 every 4 years
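The "×10 every 4 years" rule of thumb can be restated as an annual growth factor and a doubling time. A quick Python check of that arithmetic; the function names are illustrative only, and the default arguments simply encode the factor 10 and the 4-year span from the slide.

```python
import math


def growth_per_year(factor: float = 10.0, years: float = 4.0) -> float:
    """Annual growth factor implied by a gain of `factor` every `years` years."""
    return factor ** (1.0 / years)


def doubling_time(factor: float = 10.0, years: float = 4.0) -> float:
    """Number of years needed to double performance at that rate."""
    return years * math.log(2.0) / math.log(factor)


if __name__ == "__main__":
    print(f"x{growth_per_year():.3f} per year")           # x1.778 per year
    print(f"doubling every {doubling_time():.2f} years")  # doubling every 1.20 years
```

In other words, the trend corresponds to performance roughly doubling every 14 to 15 months.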

  14. Complexity and resolution of models (IPCC AR4, WG1, Chapter 1, Figures 1.2 and 1.4)

  15. top500.org: number of CPUs/cores (Figure: number of CPUs/cores on a logarithmic axis from 10 to 100 000, over 1993-2013.)

  16. Technical challenges: HPC
     • More parallelism in each component:
       • MPI: message-passing programming
       • hybrid, i.e. MPI/OpenMP: directives and shared memory
     • More parallelism in the coupled model:
       • 2 or 3 executables
       • each with MPI or MPI/OpenMP
       • more executables with XIOS: I/O servers
     • Huge amount of data produced, to be analysed

  17. On the road to IPSL-CM6
     • New physical package: LMDZ, NEMO, ORCHIDEE
     • Increased horizontal and vertical resolutions
     • Ensembles of simulations
     • Longer simulations: paleo
     • More complexity: INCA chemistry added
     • More processors used in parallel
     • New dynamical core: DYNAMICO
     • Optimization of I/O and coupling
     • Improvement and reliability of libIGCM

  18. Outline • IPSL climate modelling centre (ICMC) presentation • IPSLCM history and perspective • Mini how to use modipsl/libIGCM • Post-processing with libIGCM • Monitoring a simulation • Hands-on

  19. Summary: extract, compile and launch a simulation of a _v5 configuration
     • Download MODIPSL:
         svn co http://forge.ipsl.jussieu.fr/igcmg/svn/modipsl/trunk modipsl
     • Extract a configuration (e.g. IPSLCM5_v5):
         cd modipsl/util ; ./model IPSLCM5_v5
     • Compile:
         cd modipsl/config/IPSLCM5_v5 ; gmake [resol]
     • Create the submission directory:
         cp EXPERIMENT/IPSLCM5/piControl/config.card .
         vi config.card        ### modify at least JobName=MYEXP
         ../../util/ins_job    ### creates MYEXP as a copy of the piControl directory, with COMP, DRIVER, PARAM
     • Launch the simulation:
         cd modipsl/config/IPSLCM5_v5/MYEXP ; ccc_msub Job_MYEXP    (curie)
         cd modipsl/config/IPSLCM5_v5/MYEXP ; llsubmit Job_MYEXP    (ada)
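The slide edits config.card by hand with vi. If you prefer to script that step (for instance when preparing several experiments), a line-based edit keeps the card's comments and layout intact, which rewriting the file with Python's configparser would not. A minimal sketch: the helper name is hypothetical, and placing JobName in a [UserChoices] section is an assumption about the usual config.card layout, so check your own card first.

```python
import re


def set_card_option(text: str, section: str, key: str, value: str) -> str:
    """Return `text` with `key=value` set inside `[section]`.

    Works line by line so that comments (#D- ...) and layout are kept.
    Raises KeyError if the key is not found in that section.
    """
    lines = text.splitlines(keepends=True)
    # Only spaces/tabs around '=' so an empty value never swallows the newline.
    key_re = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*)(.*?)(\r?\n)?$")
    current = None
    for i, line in enumerate(lines):
        header = re.match(r"^\s*\[([^\]]+)\]", line)
        if header:
            current = header.group(1).strip()
            continue
        if current == section:
            match = key_re.match(line)
            if match:
                lines[i] = match.group(1) + value + (match.group(3) or "")
                return "".join(lines)
    raise KeyError(f"{key} not found in [{section}]")


if __name__ == "__main__":
    card = "[UserChoices]\n#D- Job name\nJobName=piControl\n"
    print(set_card_option(card, "UserChoices", "JobName", "MYEXP"))
```

Only the first matching key in the requested section is changed; keys with the same name in other sections are left alone.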

  20. (Diagram: from the IPSL component sources on cvs/svn servers to the front end: connection, download of a specific configuration and compilation with modipsl; simulation set-up with libIGCM (choice and set-up of the physical package, job set-up and submission); then computing.)

  21. Generic job: AA_Job (PeriodLength)

  22. libIGCM library: schematic description (Diagram: experiment directory EXP00 with its driver in EXP00/DRIVER and its component cards in EXP00/COMP.)

  23. Computing and post-processing jobs (Diagram):
     • Computing jobs: a chain of Job_EXP00 jobs
     • Post-processing jobs:
       • every PackFrequency: pack_debug, pack_restart, pack_output
       • every RebuildFrequency: rebuild
       • every TimeSeriesFrequency: create_ts, followed by monitoring
       • every SeasonalFrequency: create_se, followed by atlas
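To see which post-processing jobs a given set of frequencies would trigger during a run, the schematic can be turned into a small schedule. This is a simplified Python sketch, not libIGCM's logic: it assumes every frequency is a whole number of months (M or Y units only) and a multiple of PeriodLength, and it ignores job dependencies and submission details. The function names and the TRIGGERS table are illustrative.

```python
from typing import Dict, List


def to_months(freq: str) -> int:
    """Convert a frequency such as '1M' or '10Y' to months (M and Y only)."""
    value, unit = int(freq[:-1]), freq[-1].upper()
    if unit == "M":
        return value
    if unit == "Y":
        return 12 * value
    raise ValueError(f"unsupported unit in {freq!r}")


# Jobs triggered by each frequency, following the slide's schematic
# (monitoring follows create_ts, atlas follows create_se).
TRIGGERS: Dict[str, List[str]] = {
    "RebuildFrequency": ["rebuild"],
    "PackFrequency": ["pack_debug", "pack_restart", "pack_output"],
    "TimeSeriesFrequency": ["create_ts", "monitoring"],
    "SeasonalFrequency": ["create_se", "atlas"],
}


def post_jobs(period_length: str, freqs: Dict[str, str],
              n_periods: int) -> Dict[int, List[str]]:
    """Map each computing period (1..n_periods) that ends on a multiple of a
    frequency to the post-processing jobs it would trigger."""
    step = to_months(period_length)
    schedule: Dict[int, List[str]] = {}
    for k in range(1, n_periods + 1):
        elapsed = k * step
        jobs: List[str] = []
        for name, freq in freqs.items():
            if freq.upper() == "NONE":  # this post-processing is switched off
                continue
            if elapsed % to_months(freq) == 0:
                jobs.extend(TRIGGERS[name])
        if jobs:
            schedule[k] = jobs
    return schedule


if __name__ == "__main__":
    freqs = {"PackFrequency": "5Y", "TimeSeriesFrequency": "10Y",
             "SeasonalFrequency": "10Y"}
    for period, jobs in post_jobs("1Y", freqs, 10).items():
        print(period, jobs)
```

With PeriodLength=1Y, packing happens at the end of periods 5 and 10, and time series and seasonal means only at period 10.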

  24. TGCC computers and file system in a nutshell (October 2013)
     Computers:
     • login: airain front-end, curie front-end
     • compute: airain nodes, curie thin nodes (-q standard), curie large nodes (-q xlarge), curie hybrid nodes (-q hybrid)
     File system:
     • $HOME and $CCCWORKDIR (saved space, small precious files): sources, small results; IGCM_OUT: MONITORING/ATLAS, copied to dods/work with dods_cp
     • $SCRATCHDIR (temporary, non-saved space): REBUILD, IGCM_OUT files to be packed, outputs of post-processing jobs
     • $CCCSTOREDIR (space on HPSS robotic tapes, files retrieved with ccc_hsm get): IGCM_OUT packed results (Output, Analyse SE and TS), copied to dods/store with dods_cp
     • dods/work and dods/store: visible from the web

  25. TGCC: computing and post-processing flow
     • Compute (curie): successive Job_EXP00 jobs, each covering PeriodLength, write to $SCRATCHDIR/IGCM_OUT/XXX/Output, $SCRATCHDIR/IGCM_OUT/XXX/Restart, Debug and $SCRATCHDIR/IGCM_OUT/.../REBUILD
     • Post (curie), every RebuildFrequency: rebuild
     • Post (curie), every PackFrequency: pack_restart and pack_debug (tar) to $CCCSTOREDIR/IGCM_OUT/.../RESTART and DEBUG; pack_output (ncrcat) to $CCCSTOREDIR/IGCM_OUT/XXX/Output
     • Post (curie), every TimeSeriesFrequency / SeasonalFrequency: create_ts and create_se, then monitoring and atlas
     • TS and SE: $CCCSTOREDIR/IGCM_OUT/…, copied to dods/store; MONITORING and ATLAS: $CCCWORKDIR, copied to dods/work (DodsCopy=TRUE/FALSE)

  26. IDRIS computers and file system in a nutshell (October 2013)
     Computers:
     • login: turing front-end, adapp front-end
     • compute: turing compute nodes, adapp compute nodes, ada compute nodes
     File system:
     • $HOME (saved space, small precious files): sources, small results
     • $WORKDIR (non-saved space): temporary REBUILD, IGCM_OUT files to be packed, outputs of post-processing jobs
     • $TMPDIR on each machine (temporary space)
     • gaya (robotic tapes, accessed with mfput/mfget): IGCM_OUT Output and Analyse
     • dods (visible from the web; dmput/dmget): MONITORING/ATLAS copied with dods_cp

  27. IDRIS: computing and post-processing flow
     • Compute (ada): successive Job_EXP00 jobs, each covering PeriodLength, write to $WORKDIR/IGCM_OUT/XXX/Output, $WORKDIR/IGCM_OUT/XXX/Restart, Debug and $WORKDIR/IGCM_OUT/.../REBUILD
     • Post (adapp), every RebuildFrequency: rebuild
     • Post (adapp), every PackFrequency: pack_restart and pack_debug (tar) to gaya:IGCM_OUT/.../RESTART and DEBUG; pack_output (ncrcat) to gaya:IGCM_OUT/XXX/Output
     • Post (adapp), every TimeSeriesFrequency / SeasonalFrequency: create_ts and create_se, then monitoring and atlas
     • Results in gaya:IGCM_OUT/… are made visible on dods.idris.fr (DodsCopy=TRUE/FALSE)

  28. Outline • IPSL climate modelling centre (ICMC) presentation • IPSLCM history and perspective • Mini how to use modipsl/libIGCM • Post-processing with libIGCM • Monitoring a simulation • Hands-on

  29. Time Series: create_ts.job
     • A Time Series is a file that contains a single variable over the whole simulation period (ChunckJob2D = NONE), or over a shorter period for 2D (ChunckJob2D = 100Y) or 3D (ChunckJob3D = 50Y) variables.
     • The write frequency is defined in the config.card file: TimeSeriesFrequency=10Y means that the time series are written every 10 years, each covering a 10-year period.
     • The Time Series are set in the COMP/*.card files with the TimeSeriesVars2D and TimeSeriesVars3D options.
     • The Time Series built from monthly (or daily) output files are stored on the file server in the IGCM_OUT/TagName/[SpaceName]/[ExperimentName]/JobName/Component/Analyse/TS_MO and TS_DA directories.
     • Bonus: TS_MO_YE (annual mean time series) are produced for all TS_MO variables.
     • You can add or remove variables in the TimeSeries lists according to your needs.
     config.card:
       [Post]
       ...
       #D- If you want to produce time series, this flag determines
       #D- frequency of post-processing submission (NONE if you don't want)
       TimeSeriesFrequency=10Y
     COMP/lmdz.card:
       [OutputFiles]
       List= (histmth.nc, ${R_OUT_ATM_O_M}/${PREFIX}_1M_histmth.nc, Post_1M_histmth),\
       ...
       [Post_1M_histmth]
       Patches= ()
       GatherWithInternal = (lon, lat, presnivs, time_counter, time_counter_bnds, aire)
       TimeSeriesVars2D = (bils, cldh, ...
       ...
       ChunckJob2D = NONE
       TimeSeriesVars3D = (upwd, lwcon, ...
       ...
       ChunckJob3D = OFF
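Two parts of the time-series logic are easy to check by hand: which windows successive create_ts submissions cover, and what an annual-mean (TS_MO_YE-like) series is. A Python sketch under stated assumptions: windows are consecutive blocks of TimeSeriesFrequency years starting at the first simulated year, and the annual mean is a plain unweighted mean of 12 monthly values (the real post-processing may weight months by their length). The function names are illustrative, not part of libIGCM.

```python
from typing import List, Sequence, Tuple


def ts_periods(first_year: int, last_year: int,
               freq_years: int) -> List[Tuple[int, int]]:
    """(first, last) years, inclusive, of each complete time-series window,
    assuming one create_ts submission per `freq_years` simulated years."""
    periods = []
    start = first_year
    while start + freq_years - 1 <= last_year:
        periods.append((start, start + freq_years - 1))
        start += freq_years
    return periods


def annual_means(monthly: Sequence[float]) -> List[float]:
    """Unweighted mean of each block of 12 monthly values.

    Assumes the series starts in January and covers whole years.
    """
    if len(monthly) % 12:
        raise ValueError("expected a whole number of years of monthly data")
    return [sum(monthly[i:i + 12]) / 12 for i in range(0, len(monthly), 12)]


if __name__ == "__main__":
    print(ts_periods(1850, 1879, 10))  # three 10-year windows
    print(annual_means([1.0] * 12 + [2.0] * 12))
```

An incomplete final window (for example the last 6 years of a run) produces no window in this sketch.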

  30. MONITORING: dods

  31. Intermonitoring: http://webservices.ipsl.jussieu.fr/monitoring/ (more details in the Appendix)

  32. How to add a new variable in MONITORING
     • You can add or change the monitored variables by editing the monitoring configuration files. Default files are provided for each component.
     • The monitoring is defined here: ~shared_account/atlas. For example, for LMDZ on curie: ~p86ipsl/monitoring01_lmdz_LMD9695.cfg; for LMDZ on adapp: ~rpsl035/monitoring01_lmdz_LMD9695.cfg
     • You can change the monitoring by creating a POST directory in your configuration: copy a .cfg file into it and modify it as you want.
     • Operations are written in the Ferret language.
     • You can monitor variables produced as time series and stored in TS_MO.
     • More information (in French): wiki.ipsl.jussieu.fr/IGCMG/Outils/ferret/Monitoring
     POST/monitoring01_lmdz_LMD9695.cfg:
       #--------------------------------------------------------------------------------------------------------
       # field | files patterns | files additional | operations | title | units | calculation of area
       #--------------------------------------------------------------------------------------------------------
       nettop_global | "tops topl" | LMDZ4.0_9695_grid.nc | "(tops[d=1]-topl[d=2])" | "TOA. total heat flux (GLOBAL)" | "W/m^2" | "aire[d=3]"
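Before adding a line to your own copy of a monitoring .cfg file, it can help to check that it has the seven columns listed in the header comment. A Python sketch, assuming '|' separates the columns and never appears inside a field, and that double quotes only wrap a whole field; the class and function names are illustrative and not part of libIGCM or Ferret.

```python
from dataclasses import dataclass
from typing import List, Optional

COLUMNS = ("field", "files_patterns", "files_additional",
           "operations", "title", "units", "area")


@dataclass
class MonitoringEntry:
    field: str
    files_patterns: List[str]
    files_additional: str
    operations: str
    title: str
    units: str
    area: str


def _unquote(s: str) -> str:
    """Strip surrounding whitespace and one pair of enclosing double quotes."""
    s = s.strip()
    if len(s) >= 2 and s[0] == s[-1] == '"':
        return s[1:-1]
    return s


def parse_monitoring_line(line: str) -> Optional[MonitoringEntry]:
    """Parse one data line; return None for comments and blank lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    parts = [_unquote(p) for p in stripped.split("|")]
    if len(parts) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(parts)}")
    values = dict(zip(COLUMNS, parts))
    values["files_patterns"] = values["files_patterns"].split()  # "tops topl"
    return MonitoringEntry(**values)
```

For example, parsing the nettop_global line from the slide gives files_patterns == ["tops", "topl"] and operations == "(tops[d=1]-topl[d=2])".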

  33. Seasonal mean: create_se.job
     • Seasonal mean files (SE) contain the average of each month of the year (Jan, Feb, ...) over a period defined in the config.card file:
       • SeasonalFrequency=10Y: the seasonal means are computed every 10 years.
       • SeasonalFrequencyOffset=0: the number of years to skip before computing seasonal means.
     • All files with a requested Post (Seasonal=ON in COMP/*.card) are averaged with ncra before being stored in the directory IGCM_OUT/IPSLCM5A/DEVT/pdControl/MyExp/ATM/Analyse/SE. There is one file per SeasonalFrequency period.
     • The ATLAS are launched by create_se. ATLAS sources are in: ~rpsl035, ~p86ipsl/atlas
     • More information (in French): wiki.ipsl.jussieu.fr/IGCMG/Outils/ferret/Atlas
     config.card:
       #========================================================================
       #D-- Post -
       [Post]
       ...
       #D- If you want to produce seasonal average, this flag determines
       #D- the period of this average (NONE if you don't want)
       SeasonalFrequency=10Y
       #D- Offset for seasonal average first start dates ; same unit as SeasonalFrequency
       #D- Useful if you do not want to consider the first X simulation's years
       SeasonalFrequencyOffset=0
     COMP/lmdz.card:
       [OutputFiles]
       List=(histmth.nc, ${R_OUT_ATM_O_M}/${PREFIX}_1M_histmth.nc, Post_1M_histmth),\
       ...
       [Post_1M_histmth]
       ...
       Seasonal=ON
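Conceptually, an SE file averages each calendar month across the years of one SeasonalFrequency window. A Python sketch of that arithmetic on a plain list of monthly values, assuming the series starts in January, covers whole years, and that the offset simply drops the first years as the slide describes. libIGCM does this on NetCDF files with ncra, so the function below (a hypothetical name) only illustrates the averaging.

```python
from typing import List, Sequence


def seasonal_means(monthly: Sequence[float], offset_years: int = 0) -> List[float]:
    """Mean of each calendar month (Jan..Dec) across the years of a monthly
    series, after skipping the first `offset_years` years."""
    if len(monthly) % 12:
        raise ValueError("expected a whole number of years of monthly data")
    data = list(monthly[12 * offset_years:])
    n_years = len(data) // 12
    if n_years == 0:
        raise ValueError("no years left after the offset")
    # data[m::12] picks month m of every year: Jan values, then Feb values, ...
    return [sum(data[m::12]) / n_years for m in range(12)]


if __name__ == "__main__":
    two_years = [float(v) for v in range(24)]
    print(seasonal_means(two_years))     # 12 values, one per calendar month
    print(seasonal_means(two_years, 1))  # second year only
```

The result always has 12 values, one per calendar month, whatever the window length.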

  34. Outline • IPSL climate modelling centre (ICMC) presentation • IPSLCM history and perspective • Mini how to use modipsl/libIGCM • Post-processing with libIGCM • Monitoring a simulation • Hands-on

  35. Monitoring the simulation: verification and correction

  36. Monitoring a simulation
     • We strongly encourage you to check your simulation frequently while it runs. First of all, check the job status: ccc_mstat (curie) or llq (ada).
     • Real time limit exceeded: on ada, jobs are killed without any message.
     • RunChecker.job: this tool, provided with libIGCM, lets you find out your simulations' status.
     • One historical simulation of 156 years (1850-2005) consists of 50 computing jobs and 1000 post-processing jobs.
     Documentation: http://forge.ipsl.jussieu.fr/igcmg/wiki/platform/en/documentation

  37. RunChecker.job • RunChecker.job helps you to monitor all the jobs produced by libIGCM for a simulation

  38. RunChecker.job OK

  39. RunChecker.job KO

  40. RunChecker.job: usage and options
     This script can be launched from anywhere.
     Usage:
       path/to/libIGCM/RunChecker.job [-u user] [-q] [-j n] [-s] job_name
       path/to/libIGCM/RunChecker.job [-u user] [-q] [-j n] -p config.card_path
       path/to/libIGCM/RunChecker.job [-u user] [-q] [-j n] -r
     Options:
       -h       : print this help and exit
       -u user  : owner of the job
       -q       : quiet
       -j n     : print n post-processing jobs (default is 20)
       -s       : search for a new job in $WORKDIR and fill in the catalog before printing information
       -p path  : give the absolute path to the directory containing the config.card instead of the job name (needed only once)
       -r       : check all running simulations
     Examples:
       1) path/to/libIGCM/RunChecker.job -p $CCCWORKDIR/CURIE/CMIP5/R1414/IPSLCM5A_20120731/modipsl/config/IPSLCM5A/v5.rcp45CMR2
       2) path/to/libIGCM/RunChecker.job v5.rcp45CMR2
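If you call RunChecker.job from your own scripts, the three usage forms can be encoded once so that an invalid combination of options fails early. A Python sketch of such a wrapper; the function name is hypothetical, and it only builds the argument list from the options documented on the slide, without running anything.

```python
from typing import List, Optional


def runchecker_command(libigcm: str, job_name: Optional[str] = None, *,
                       config_path: Optional[str] = None,
                       all_running: bool = False, user: Optional[str] = None,
                       quiet: bool = False, n_jobs: Optional[int] = None,
                       search: bool = False) -> List[str]:
    """Build the argument list for one of the three RunChecker.job forms:
    job name (optionally with -s), -p <config.card directory>, or -r."""
    forms = sum([job_name is not None, config_path is not None, all_running])
    if forms != 1:
        raise ValueError("give exactly one of job_name, config_path, all_running")
    if search and job_name is None:
        raise ValueError("-s is only used together with a job name")
    cmd = [libigcm.rstrip("/") + "/RunChecker.job"]
    if user:
        cmd += ["-u", user]
    if quiet:
        cmd.append("-q")
    if n_jobs is not None:
        cmd += ["-j", str(n_jobs)]
    if job_name is not None:
        if search:
            cmd.append("-s")
        cmd.append(job_name)
    elif config_path is not None:
        cmd += ["-p", config_path]
    else:
        cmd.append("-r")
    return cmd


if __name__ == "__main__":
    print(" ".join(runchecker_command("path/to/libIGCM", "v5.rcp45CMR2", n_jobs=5)))
```

The resulting list can then be passed to subprocess.run(cmd, check=True) on the front end.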

  41. Monitoring a simulation: mail
     • You receive a message at the end of the simulation.
     • The simulation may have completed or failed.
     Mail (completed):
       From: rpsl003@idris.fr
       Subject: COURSNIV2 completed
       Date: 22 October 2013 18:29:24 UTC+02:00
       To: rpsl003@idris.fr
       Dear rpsl003,
       Simulation COURSNIV2 completed on supercomputer ada027
       Simulation started : 20000101
       Simulation ended : 20000102
       Output files are available in /u/rech/psl/rpsl003/IGCM_OUT/IPSLCM5A/DEVT/pdControl/COURSNIV2
       Files to be rebuild are temporarily available in /workgpfs/rech/psl/rpsl003/IGCM_OUT/IPSLCM5A/DEVT/pdControl/COURSNIV2/REBUILD
       Pre-packed files are temporarily available in /workgpfs/rech/psl/rpsl003/IGCM_OUT/IPSLCM5A/DEVT/pdControl/COURSNIV2
       Script files, Script Outputs and Debug files (if necessary) are available in /gpfs5r/workgpfs/rech/psl/rpsl003/ADA/COURS/NIV2/IPSLCM5_v5/modipsl/config/IPSLCM5_v5/COURSNIV2
       Greetings!
       Check this out for more information : https://forge.ipsl.jussieu.fr/igcmg/wiki/platform/documentation
     Begin forwarded message (failed):
       From: rpsl003@idris.fr
       Subject: MyJobTest failed
       Date: 22 October 2013 17:17:41 UTC+02:00
       To: rpsl003@idris.fr
       Dear rpsl003,

  42. Monitoring a simulation: run.card
     • When the simulation starts, libIGCM creates the file run.card from the template run.card.init.
     • run.card contains information about the current run period and the previous periods already finished.
     • libIGCM updates this file at each run period.
     • It records the time consumption of each period.
     • The status of the job is set to OnQueue, Running, Completed or Fatal.
     run.card:
       [Configuration]
       #last PREFIX
       OldPrefix= COURSNIV2_20000103
       #Compute date of loop
       PeriodDateBegin= 2000-01-04
       PeriodDateEnd= 2000-01-04
       CumulPeriod= 4
       # State of Job "Start", "Running", "OnQueue", "Completed"
       PeriodState= Completed
       SubmitPath= /gpfs5r/workgpfs/rech/psl/rpsl003/ADA/COURS/NIV2/IPSLCM5_v5/modipsl/config/IPSLCM5_v5/COURSNIV2
       #========================================================================
       [PostProcessing]
       TimeSeriesRunning=n
       TimeSeriesCompleted=
       #========================================================================
       [Log]
       # Executables Size
       LastExeSize= ( 88011086, 0, 0, 19956686, 0, 0, 1523952 )
       #-----------------------------------------------------------------------------------------------------------------------------------
       # CumulPeriod | PeriodDateBegin | PeriodDateEnd | RunDateBegin | RunDateEnd | RealCpuTime | UserCpuTime |
       #-----------------------------------------------------------------------------------------------------------------------------------
       # 1 | 20000101 | 20000101 | 2013-10-22T17:53:48 | 2013-10-22T17:55:10 | 82.01000 | 4.21000 |
       # 2 | 20000102 | 20000102 | 2013-10-22T18:28:03 | 2013-10-22T18:29:17 | 74.19000 | 4.09000 |
       # 3 | 20000103 | 20000103 | 2013-10-23T17:28:50 | 2013-10-23T17:30:26 | 95.21000 | 4.30000 |

  43. Verification and correction 1/6
     • Where did the problem occur?
     • One "failed" email: main computation job
       => Has gaya stopped at IDRIS? A hardware problem? Check Script_output_xxxx.
       => Once gaya has restarted, or if there is no clear error message, try relaunching after a clean_month:
          path/to/libIGCM/clean_month.job
          ccc_msub (llsubmit) Job_...

  44. Verification and correction 2/6
     • Where did the problem occur?
     • One "failed" email: main computation job: analyse Script_output_xxxx
       #######################################
       #       ANOTHER GREAT SIMULATION      #
       #######################################
       Part 1 (copying the input files)
       #######################################
       #      DIR BEFORE RUN EXECUTION       #
       #######################################
       Part 2 (running the model)
       #######################################
       #       DIR AFTER RUN EXECUTION       #
       #######################################
       Part 3 (post-processing)
       #######################################
     http://forge.ipsl.jussieu.fr/igcmg/wiki/platform/en/documentation/G_suivi#AnalyzingtheJoboutput:Script_Output
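When a Script_output file is long, it can help to look only at the part where the job stopped. A Python sketch that splits the file on the two banner lines shown on the slide; it assumes the banner wording is exactly DIR BEFORE RUN EXECUTION and DIR AFTER RUN EXECUTION (ignoring '#' characters and spacing), which may differ between libIGCM versions. The function name is illustrative.

```python
from typing import Dict, List

# Banner words from the slide, mapped to the part that starts after them.
MARKERS = {"DIR BEFORE RUN EXECUTION": 2, "DIR AFTER RUN EXECUTION": 3}


def split_script_output(text: str) -> Dict[int, List[str]]:
    """Split a Script_output file into its three parts:
    1 copying the input files, 2 running the model, 3 post-processing."""
    parts: Dict[int, List[str]] = {1: [], 2: [], 3: []}
    current = 1
    for line in text.splitlines():
        # Drop the '#' frame and normalise spacing before comparing.
        normalized = " ".join(line.strip("# ").split())
        if normalized in MARKERS:
            current = MARKERS[normalized]
            continue
        parts[current].append(line)
    return parts
```

If part 3 is empty, the job most likely stopped while the model was running, so part 2 is where to look first.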

  45. Verification and correction 3/6
     --> Analyse Script_output_xxxx: in general, if your simulation stops, look for the keyword "IGCM_debug_Exit" or "ERROR" in this file. The keyword comes right after a line explaining the error you are experiencing.
       =====================================================================
       EXECUTION of : /usr/bin/time ccc_mprun -E-K1 -f ./run_file
       Return code of executable : 153
       IGCM_debug_Exit : EXECUTABLE
       !!!!!!!!!!!!!!!!!!!!!!!!!!
       !!   ERROR TRIGGERED    !!
       !!   EXIT FLAG SET      !!
       !------------------------!
       IGCM_sys_Mkdir : …/modipsl/config/IPSLCM5_v5/COURSNIV2KO/Debug
       IGCM_sys_Cp : out_execution …/modipsl/config/IPSLCM5_v5/COURSNIV2KO/Debug/COURSNIV2KO_20050401_20050430_out_execution_error
       =====================================================================
     Updated 19/11/2013
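The search described on the slide (find IGCM_debug_Exit or ERROR, then read the line just before it) can be automated. A Python sketch with an illustrative function name; it reports each matching line together with the line that precedes it.

```python
from typing import List, Tuple

KEYWORDS = ("IGCM_debug_Exit", "ERROR")


def find_errors(text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, previous line, matching line) for every line of
    a Script_output file containing one of KEYWORDS. The previous line is
    kept because the explanation usually precedes the keyword."""
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if any(k in line for k in KEYWORDS):
            previous = lines[i - 1].strip() if i > 0 else ""
            hits.append((i + 1, previous, line.strip()))
    return hits
```

On the excerpt above, the first hit is the IGCM_debug_Exit line, paired with "Return code of executable : 153". From the shell, GNU grep gives a similar view with grep -n -B 1 -E 'IGCM_debug_Exit|ERROR' Script_output_xxxx.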

  46. Verification and correction 4/6
     --> Check the Debug subdirectory closely (if it exists).
     Check the file xxxxx_error in Debug/:
     • it contains the LMDZ standard output. LMDZ often fails in hgardfou, with the message "Stopping in hgardfou".
     • it contains the abends (abnormal terminations / exceptions) of each and every component.
     Check the standard outputs of NEMO, ORCHIDEE, INCA, OASIS:
     • Debug/xxxx_ocean.output
     • Debug/xxxx_output_orchidee
     • Debug/xxxx_inca.out
     • Debug/xxxx_cplout

  47. Debug examples
     • Segmentation fault: check the file xxxxx_error in Debug for information on the model that crashes.
       forrtl: severe (174): SIGSEGV, segmentation fault occurred
       Image              PC                Routine  Line     Source
       p25mpava_lmdz.x_2  0000000000EF005B  Unknown  Unknown  Unknown
       p25mpava_lmdz.x_2  00000000006F293D  Unknown  Unknown  Unknown
       p25mpava_lmdz.x_2  00000000006BB58F  Unknown  Unknown  Unknown
       p25mpava_lmdz.x_2  0000000000477A6F  Unknown  Unknown  Unknown
       p25mpava_lmdz.x_2  0000000000457C99  Unknown  Unknown  Unknown
       p25mpava_lmdz.x_2  00000000004568BC  Unknown  Unknown  Unknown
       libc.so.6          00000034AB81ECDD  Unknown  Unknown  Unknown
       p25mpava_lmdz.x_2  00000000004567B9  Unknown  Unknown  Unknown
     • Compilation and run in "debug mode"

  48. Debug examples
     • Compilation in "debug mode"
       • Default mode = "prod mode" (i.e. optimized mode for production runs)
       • Help from the compiler: compiler options can help to find problems:
         • -traceback (to get stack details)
         • -check bounds (to check array bounds, ...)
         • -fp-stack-check (checks the floating-point stack after each function call; useful when tracking NaN, ...)
         • -g (to use a debugger)
         • others: see the compiler documentation...
     • Where should these options be added? It depends on the model:
       • ORCHIDEE and IOIPSL: "modipsl/util/AA_make.gdef" (then run the ins_make command)
           #-Q- curie F_O = -DCPP_PARA -xHost -O3 -g -traceback -fp-stack-check $(F_D) $(F_P) -I$(MODDIR) -module $(MODDIR)
       • LMDZ and INCA: "Makefile" in config/xxx/, by adding "-debug" or "-dev" to the compilation line:
           (cd ../../modeles/INCA3; ./makeinca_fcm -debug -chimie CH4 -resol (...) ../../bin/inca.dat ; )
           (cd ../../modeles/LMDZ; ./makelmdz_fcm -cpp ORCHIDEE_NOOPENMP -debug -d (..) ../../bin/gcm.e;)
       • NEMO: "Makefile" in "modeles/NEMO/WORK/Makefile"
           F_O = -O3 -i4 -r8 -xHost -traceback -module $(MODDIR)/oce -I$(MODDIR) -I$(MODDIR)/oce -I$(NCDF_INC) $(USER_INC)
     => Work in progress to make this easier!

  49. Debug examples
     • Strange values (or values not as expected) in output files, or other problems...
     • Runtime, 1st debug level: output files always available (not migrated) in the output directory IGCM_OUT/...
       • SpaceName=TEST in config.card (i.e. no packing, everything stays on $SCRATCHDIR (curie) or $WORKDIR (ada)).
       • Set RebuildFrequency to 1 period (e.g. 1M) in config.card.
     • Runtime, 2nd debug level: get output files quickly
       • SpaceName=TEST in config.card (i.e. no packing, everything stays on $SCRATCHDIR (curie) or $WORKDIR (ada)).
       • Set RebuildFrequency to 1 period (e.g. 1M) in config.card.
       • On curie: use the "test" queue (limits: 2 jobs per user, 8 nodes and 1800 s per job):
           #MSUB -T 1800   # Time limit
           #MSUB -Q test   # test queue
       • On ada: no "test" queue (not needed, since there is no waiting time so far).
       • (No rebuild (expert level!): remove output files in the cards.)
     • Runtime, 3rd debug level: use a debugger
       • Compilation option "-g" (Intel compiler on curie and ada)
       • Use the "test" queue on curie
       • See the IDRIS or TGCC documentation on the use of "ddt" or "totalview"
       • Use "statistics" of variables, breakpoints, ...
