
Running Jobs on Jacquard






Presentation Transcript


  1. Running Jobs on Jacquard: An overview of interactive and batch computing, with comparisons to Seaborg. David Turner, NUG Meeting, 3 Oct 2005

  2. Topics • Interactive • Serial • Parallel • Limits • Batch • Serial • Parallel • Queues and Policies • Charging • Comparison with Seaborg

  3. Execution Environment • Four login nodes • Serial jobs only • CPU limit: 60 minutes • Memory limit: 64 MB • 320 compute nodes • “Interactive” parallel jobs • Batch serial and parallel jobs • Scheduled by PBSPro • Queue limits and policies established to meet system objectives • User input is critical!

  4. Interactive Jobs
  • Serial jobs run on login nodes
    • cd, ls, pathf90, etc.
    • ./a.out
  • Parallel jobs run on compute nodes
    • Controlled by PBSPro
    • Launched with: mpirun -np 16 ./a.out
  • Example session in the interactive queue:
      qsub -I -q interactive -l nodes=8:ppn=2
      % cd $PBS_O_WORKDIR
      % mpirun -np 16 ./a.out
  • Interactive use of the batch queue:
      qsub -I -q batch -l nodes=32:ppn=2,walltime=18:00:00

  5. PBSPro • Marketed by Altair Engineering • Based on open source Portable Batch System developed for NASA • Also installed on DaVinci • Batch scripts contain directives: #PBS -o myjob.out • Directives may also appear as command-line options: qsub -o myjob.out …

  6. Simple Batch Script
      #PBS -l nodes=8:ppn=2,walltime=00:30:00
      #PBS -N myjob
      #PBS -o myjob.out
      #PBS -e myjob.err
      #PBS -A mp999
      #PBS -q debug
      #PBS -V
      cd $PBS_O_WORKDIR
      mpirun -np 16 ./a.out
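  Since directives may also appear as command-line options (slide 5), the same job could in principle be submitted with the options given on the qsub line instead of in the script. A minimal sketch, assuming the script file myjob contains only the cd and mpirun lines:

      qsub -l nodes=8:ppn=2,walltime=00:30:00 -N myjob -o myjob.out -e myjob.err -A mp999 -q debug -V myjob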

  7. Useful PBS Options (1)
  • -A repo: Charge this job to repository repo (default: your default repository)
  • -N jobname: Provide a name for the job; up to 15 printable, non-whitespace characters (default: name of the batch script)
  • -q qname: Submit the job to batch queue qname (default: batch)

  8. Useful PBS Options (2)
  • -S shell: Specify shell as the scripting language (default: your login shell)
  • -V: Export current environment variables into the batch job environment (default: do not export)

  9. Useful PBS Options (3)
  • -o outfile: Write STDOUT to outfile (default: <jobname>.o<jobid>)
  • -e errfile: Write STDERR to errfile (default: <jobname>.e<jobid>)
  • -j [oe|eo]: Join STDOUT and STDERR on STDOUT (oe) or STDERR (eo) (default: do not join)

  10. Useful PBS Options (4)
  • -m [a|b|e|n]: E-mail notification
    • a = send mail when job is aborted by the system
    • b = send mail when job begins
    • e = send mail when job ends
    • n = do not send mail
    • Options a, b, and e may be combined (default: a)
  A script combining several of these options is sketched below.
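  As a hedged illustration of how these options fit together, here is a sketch of a batch script using several of them. The node count, walltime, job name mytest, and repository mp999 are placeholders (mp999 is borrowed from the earlier example), not recommended values:

      #PBS -l nodes=4:ppn=2,walltime=01:00:00
      #PBS -A mp999
      #PBS -N mytest
      #PBS -q batch
      #PBS -j oe
      #PBS -o mytest.out
      #PBS -m be
      #PBS -V
      cd $PBS_O_WORKDIR
      mpirun -np 8 ./a.out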

  11. Batch Queues

  12. Batch Queue Policies • Each user may have: • One running interactive job • One running debug job • Four jobs running over entire system • Only one batch128 job is allowed to run at a time. • The batch256 queue usually has a run limit of zero. NERSC staff will arrange to run jobs of this size.

  13. Submitting Batch Jobs
      % qsub myjob
      93935.jacin03
      %
  • Record the jobid for tracking!

  14. Deleting Batch Jobs
      % qdel 93935.jacin03
      %
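  As a small convenience sketch (not from the slides), the jobid that qsub prints can be captured in a shell variable so the same job can later be deleted without retyping it; this assumes a POSIX-style shell:

      # capture the jobid printed by qsub (e.g. 93935.jacin03)
      JOBID=$(qsub myjob)
      echo "Submitted $JOBID"
      # ...later, if the job must be cancelled:
      qdel $JOBID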

  15. Monitoring Batch Jobs (1)
  • PBS command qstat:
      % qstat
      Job id            Name             User       Time Use S Queue
      ----------------  ---------------  ---------  -------- - -------
      93295.jacin03-ib  job5             einstein   00:00:00 R batch16
      93894.jacin03     EV80fl02_3       legendre   0        H batch16
      93330.jacin03     test.script      laplace    00:00:23 R batch32
      93897.jacin03     runlu8x8         rasputin   0        Q batch32
      93334.jacin03-m   mtp_mg_3wat_o2a  fibonacci  00:00:11 R batch16
      ...
  • Use the -u option for single-user output:
      % qstat -u einstein
      Job id            Name             User       Time Use S Queue
      ----------------  ---------------  ---------  -------- - -------
      93295.jacin03-ib  job5             einstein   00:00:00 R batch16
      %

  16. Monitoring Batch Jobs (2)
  • NERSC command qs:
      % qs
      JOBID  ST USER      NAME        NDS REQ      USED     SUBMIT
      93939  R  gauss     STDIN       1   00:30:00 00:10:43 Oct 2 16:47:00
      93891  R  einstein  runlu4x8    16  01:00:00 00:38:48 Oct 2 15:23:36
      93918  R  inewton   r4_16       8   01:00:00 00:10:37 Oct 2 15:36:35
      ...
      93785  Q  inewton   r4_64       32  01:00:00 -        Oct 2 08:42:36
      93828  Q  rasputin  nodemove    64  00:05:00 -        Oct 2 12:00:11
      93897  Q  einstein  runlu8x8    32  01:00:00 -        Oct 2 15:24:27
      ...
      93893  H  legendre  EV80fl02_2  4   03:00:00 -        Oct 2 15:24:23
      93894  H  legendre  EV80fl02_3  4   03:00:00 -        Oct 2 15:24:24
      93917  H  legendre  EV80fl98_5  4   03:00:00 -        Oct 2 15:26:06
      ...
  • Also provides the -u option
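  A common pattern is to poll one of these commands from a shell loop until a job leaves the queue. A minimal sketch, assuming a POSIX shell and the jobid from slide 13 (qstat for an unknown job is expected to exit nonzero):

      # wait until job 93935.jacin03 no longer appears in qstat output
      while qstat 93935.jacin03 > /dev/null 2>&1
      do
          sleep 60
      done
      echo "Job 93935.jacin03 has finished (or was deleted)"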

  17. Monitoring Batch Jobs (3) • The NERSC website has a current queue listing: http://www.nersc.gov/nusers/status/jacquard/qstat • It also has a completed jobs list: http://www.nersc.gov/nusers/status/jacquard/pbs_summary • Numerous filtering options are available • Owner • Account • Queue • Jobid

  18. Charging • Machine charge factor (cf) = 4 • Based on benchmarks and user applications • Currently under review • Serial interactive • Charge = cf × cputime • Always charged to default repository • All parallel • Charge = cf × 2 × nodes × walltime • Charged to default repo unless -A specified
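  As a worked illustration (the job size and walltime are chosen only for the example, and the accounting unit is whatever the charge factor is defined against): a parallel job that uses 16 nodes for 3 hours of walltime would be charged cf × 2 × nodes × walltime = 4 × 2 × 16 × 3 = 384 charge units against the chosen repository.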

  19. Things To Look Out For (1) • Do not set group write permission on your home directory; it will prevent PBS from running your jobs. • Library modules must be loaded at run time as well as at link time (see the sketch below). • Propagation of environment variables to remote processes is incomplete; contact NERSC consulting for help.
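  A hedged sketch of what loading a library module at run time can look like inside a batch script; the module name acml is only an illustration, and the correct name is whichever module your code was linked against:

      #PBS -l nodes=2:ppn=2,walltime=00:30:00
      #PBS -V
      # load the same library module that was used at link time (name is illustrative)
      module load acml
      cd $PBS_O_WORKDIR
      mpirun -np 4 ./a.out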

  20. Things To Look Out For (2)
  • Do not run more than one MPI program in a single batch script.
  • If your login shell is bash, you may see:
      accept: Resource temporarily unavailable
      done.
    In this case, specify a different shell using the -S directive, such as:
      #PBS -S /usr/bin/ksh

  21. Things To Look Out For (3)
  • Batch jobs always start in $HOME. To get to the directory where the job was submitted:
      cd $PBS_O_WORKDIR
    For jobs that work with large files:
      cd $SCRATCH/some_subdirectory
  • PBS buffers output and error files until the job completes. To view the files (in your home directory) while the job is running, submit with:
      -k oe
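  Putting this slide's advice together, here is a hedged sketch of a script that keeps output and error files visible in $HOME while running and works in a scratch area; the subdirectory name is a placeholder:

      #PBS -l nodes=8:ppn=2,walltime=00:30:00
      #PBS -k oe
      # the job starts in $HOME; move to the scratch area for large files
      cd $SCRATCH/some_subdirectory
      mpirun -np 16 ./a.out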

  22. Things To Look Out For (4)
  • The following is just a warning and can be ignored:
      Warning: no access to tty (Bad file descriptor).
      Thus no job control in this shell.

  23. LoadLeveler vs. PBS

  24. Resources
  • NERSC Website
      http://www.nersc.gov/nusers/resources/jacquard/running_jobs.php
      http://www.nersc.gov/vendor_docs/altair/PBSPro_7.0_User_Guide.pdf
  • NERSC Consulting
      1-800-66-NERSC, menu option 3, 8 am - 5 pm Pacific time
      (510) 486-8600, menu option 3, 8 am - 5 pm Pacific time
      consult@nersc.gov
      http://help.nersc.gov/
