
Flux for PBS Users HPC 105


Presentation Transcript


  1. Flux for PBS Users HPC 105 Dr. Charles J Antonelli, LSAIT ARS, August 2013

  2. Flux • Flux is a university-wide shared computational discovery / high-performance computing service. • Interdisciplinary • Provided by Advanced Research Computing at U-M (ARC) • Operated by CAEN HPC • Hardware procurement, software licensing, billing support by U-M ITS • Used across campus • Collaborative since 2010 • Advanced Research Computing at U-M (ARC) • College of Engineering’s IT Group (CAEN) • Information and Technology Services • Medical School • College of Literature, Science, and the Arts • School of Information http://arc.research.umich.edu/resources-services/flux/ cja 2013

  3. The Flux cluster: Login nodes, Compute nodes, Data transfer node, Storage … cja 2013

  4. Flux node: 48 GB RAM, 12 Intel cores, local disk, Ethernet, InfiniBand cja 2013

  5. Flux Large Memory node: 1 TB RAM, 40 Intel cores, local disk, Ethernet, InfiniBand cja 2013

  6. Flux hardware • 8,016 Intel cores, 200 Intel Large Memory cores • 632 Flux nodes, 5 Flux Large Memory nodes • 48/64 GB RAM/node, 1 TB RAM/Large Memory node • 4 GB RAM/core (allocated), 25 GB RAM/Large Memory core • 4X InfiniBand network (interconnects all nodes) • 40 Gbps, <2 us latency • Latency an order of magnitude less than Ethernet • Lustre Filesystem • Scalable, high-performance, open • Supports MPI-IO for MPI jobs • Mounted on all login and compute nodes ES13

  7. Flux software • Licensed software • http://cac.engin.umich.edu/resources/software/flux-software et al. • Compilers & Libraries: • Intel, PGI, GNU • OpenMP • OpenMPI cja 2013

  8. Using Flux • Three basic requirements to use Flux: • A Flux account • An MToken (or a Software Token) • A Flux allocation cja 2013

  9. Using Flux • A Flux account • Allows login to the Flux login nodes • Develop, compile, and test code • Available to members of U-M community, free • Get an account by visiting https://www.engin.umich.edu/form/cacaccountapplication cja 2013

  10. Flux Account Policies To qualify for a Flux account: • You must have an active institutional role • On the Ann Arbor campus • Not a Retiree or Alumni role • Your uniqname must have a strong identity type • Not a friend account • You must be able to receive email sent to uniqname@umich.edu • You must have run a job in the last 13 months http://cac.engin.umich.edu/resources/systems/user-accounts cja 2013

  11. Using Flux • An MToken (or a Software Token) • Required for access to the login nodes • Improves cluster security by requiring a second means of proving your identity • You can use either an MToken or an application for your mobile device (called a Software Token) for this • Information on obtaining and using these tokens at http://cac.engin.umich.edu/resources/login-nodes/tfa cja 2013

  12. Using Flux • A Flux allocation • Allows you to run jobs on the compute nodes • Current rates: (through June 30, 2016) • $18 per core-month for Standard Flux • $24.35 per core-month for Large Memory Flux • $8 cost-share per core-month for LSA, Engineering, and Medical School • Details at http://arc.research.umich.edu/resources-services/flux/flux-pricing/ • To inquire about Flux allocations please email flux-support@umich.edu cja 2013

  13. Flux Allocations • To request an allocation send email to flux-support@umich.edu with • the type of allocation desired • Regular or Large-Memory • the number of cores needed • the start date and number of months for the allocation • the shortcode for the funding source • the list of people who should have access to the allocation • the list of people who can change the user list and augment or end the allocations http://arc.research.umich.edu/resources-services/flux/managing-a-flux-project/ cja 2013

  14. Flux Allocations • An allocation specifies resources that are consumed by running jobs • Explicit core count • Implicit memory usage (4 or 25 GB per core) • When any resource is fully in use, new jobs are blocked • An allocation may be ended early • On the monthly anniversary • You may have multiple active allocations • Jobs draw resources from all active allocations cja 2013

  15. lsa_flux Allocation • LSA funds a shared allocation named lsa_flux • Usable by anyone in the College • 60 cores • For testing, experimentation, exploration • Not for production runs • Each user limited to 30 concurrent jobs https://sites.google.com/a/umich.edu/flux-support/support-for-users/lsa_flux cja 2013

  16. Monitoring Allocations • Visit https://mreports.umich.edu/mreports/pages/Flux.aspx • Select your allocation from the list at upper left • You’ll see all allocations you can submit jobs against • Four sets of outputs • Allocation details (start & end date, cores, shortcode) • Financial overview (cores allocated vs. used, by month) • Usage summary table (core-months by user and month) • Drill down for individual job run data • Usage charts (by user) • Details & screenshots: http://arc.research.umich.edu/resources-services/flux/check-my-flux-allocation/ cja 2013

  17. Storing data on Flux • Lustre filesystem mounted on /scratch on all login, compute, and transfer nodes • 640 TB of short-term storage for batch jobs • Pathname depends on your allocation and uniqname • e.g., /scratch/lsa_flux/cja • Can share through UNIX groups • Large, fast, short-term • Data deleted 60 days after allocation expires • http://cac.engin.umich.edu/resources/storage/flux-high-performance-storage-scratch • NFS filesystems mounted on /home and /home2 on all nodes • 80 GB of storage per user for development & testing • Small, slow, long-term cja 2013

  18. Storing data on Flux • Flux does not provide large, long-term storage • Alternatives: • LSA Research Storage • ITS Value Storage • Departmental server • CAEN HPC can mount your storage on the login nodes • Issue the df -kh command on a login node to see what other groups have mounted cja 2013

  19. Storing data on Flux LSA Research Storage • 2 TB of secure, replicated data storage • Available to each LSA faculty member at no cost • Additional storage available at $30/TB/yr • Turn in existing storage hardware for additional storage • Request by visiting https://sharepoint.lsait.lsa.umich.edu/Lists/Research%20Storage%20Space/NewForm.aspx?RootFolder= • Authenticate with Kerberos login and password • Select NFS as the method for connecting to your storage cja 2013

  20. Copying data to Flux • Using the transfer host: rsync -avz /your/cluster1/directory flux-xfer.engin.umich.edu:newdirname or rsync -avz /your/cluster1/directory flux-xfer.engin.umich.edu:/scratch/youralloc/youruniqname • Or use scp, sftp, WinSCP, Cyberduck, FileZilla http://cac.engin.umich.edu/resources/login-nodes/transfer-hosts cja 2013
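  A minimal scp alternative to the rsync commands above; the local path, allocation name, and uniqname are placeholders to replace with your own:

      # Recursively copy a local directory to your /scratch space via the transfer host
      scp -r /your/cluster1/directory \
          flux-xfer.engin.umich.edu:/scratch/youralloc/youruniqname/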

  21. Globus Online • Features • High-speed data transfer, much faster than SCP or SFTP • Reliable & persistent • Minimal client software: Mac OS X, Linux, Windows • GridFTP Endpoints • Gateways through which data flow • Exist for XSEDE, OSG, … • UMich: umich#flux, umich#nyx • Add your own server endpoint: contact flux-support@umich.edu • Add your own client endpoint! • More information • http://cac.engin.umich.edu/resources/login-nodes/globus-gridftp cja 2013

  22. Connecting to Flux • ssh flux-login.engin.umich.edu • Login with token code, uniqname, and Kerberos password • You will be randomly connected to a Flux login node • Currently flux-login1 or flux-login2 • Do not run compute- or I/O-intensive jobs here • Processes killed automatically after 30 minutes • Firewalls restrict access to flux-login. To connect successfully, either • Physically connect your ssh client platform to the U-M campus wired or MWireless network, or • Use VPN software on your client platform, or • Use ssh to login to an ITS login node (login.itd.umich.edu), and ssh to flux-login from there cja 2013
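  From off campus without the VPN, the two-hop login described above looks roughly like this sketch (substitute your own uniqname):

      # First hop: ITS login node, inside the campus firewall
      ssh uniqname@login.itd.umich.edu
      # Second hop, from the ITS node: the Flux login pool
      ssh uniqname@flux-login.engin.umich.edu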

  23. Lab 1 Task: Use the multicore package The multicore package allows you to use multiple cores on the same node • module load R • Copy sample code to your login directory: cd; cp ~cja/hpc-sample-code.tar.gz .; tar -zxvf hpc-sample-code.tar.gz; cd ./hpc-sample-code • Examine Rmulti.pbs and Rmulti.R • Edit Rmulti.pbs with your favorite Linux editor • Change the #PBS -M email address to your own cja 2013
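  If you have not seen a PBS script for an R job before, here is a minimal sketch of what a file like Rmulti.pbs can look like; the directive values shown are illustrative, not the actual contents of the lab file:

      #!/bin/bash
      #PBS -N Rmulti
      #PBS -M your_uniqname@umich.edu    # the mail address Lab 1 asks you to change
      #PBS -m abe                        # mail on abort, begin, end
      #PBS -A FluxTraining_flux          # allocation to charge
      #PBS -q flux
      #PBS -l qos=flux
      #PBS -l nodes=1:ppn=4,walltime=00:30:00,pmem=1gb

      module load R
      cd $PBS_O_WORKDIR                  # run from the directory you submitted from
      R CMD BATCH --no-save Rmulti.R Rmulti.out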

  24. Lab 1 Task: Use the multicore package • Submit your job to Flux: qsub Rmulti.pbs • Watch the progress of your job: qstat -u uniqname, where uniqname is your own uniqname • When complete, look at the job’s output: less Rmulti.out cja 2013

  25. Lab 2 • Task: Run an MPI job on 8 cores • Compile c_ex05: cd ~/cac-intro-code; make c_ex05 • Edit the file run with your favorite Linux editor • Change the #PBS -M address to your own • I don’t want Brock to get your email! • Change the #PBS -A allocation to FluxTraining_flux, or to your own allocation, if desired • Change the #PBS -l allocation to flux • Submit your job: qsub run cja 2013

  26. PBS resources (1) • A resource (-l) can specify: • Request wallclock (that is, running) time: -l walltime=HH:MM:SS • Request C MB of memory per core: -l pmem=Cmb • Request T MB of memory for the entire job: -l mem=Tmb • Request M cores on arbitrary node(s): -l procs=M • Request a token to use licensed software: -l gres=stata:1, -l gres=matlab, -l gres=matlab%Communication_toolbox cja 2013
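  Several of these requests are usually combined in a batch script header; a sketch with placeholder values:

      #PBS -l walltime=04:00:00    # 4 hours of wallclock time
      #PBS -l procs=12             # 12 cores on arbitrary node(s)
      #PBS -l pmem=2000mb          # 2000 MB of memory per core
      #PBS -l gres=matlab          # one MATLAB license token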

  27. PBS resources (2) • A resource (-l) can specify, for multithreaded code: • Request M nodes with at least N cores per node: -l nodes=M:ppn=N • Request M cores with exactly N cores per node (note the difference vis-à-vis ppn syntax and semantics!): -l nodes=M,tpn=N (you’ll only use this for specific algorithms) cja 2013
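  For concreteness, a sketch with example numbers (illustrative only):

      # Multithreaded (e.g., OpenMP) code usually wants all its cores on one node:
      #PBS -l nodes=1:ppn=8        # one node with at least 8 cores
      # The tpn form asks for a total core count at an exact per-node rate:
      #PBS -l nodes=16,tpn=4       # 16 cores total, exactly 4 per node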

  28. Interactive jobs • You can submit jobs interactively: qsub -I -V -l procs=2 -l walltime=15:00 -A youralloc_flux -l qos=flux -q flux • This queues a job as usual • Your terminal session will be blocked until the job runs • When it runs, you will be connected to one of your nodes • Invoked serial commands will run on that node • Invoked parallel commands (e.g., via mpirun) will run on all of your nodes • When you exit the terminal session your job is deleted • Interactive jobs allow you to • Test your code on cluster node(s) • Execute GUI tools on a cluster node with output on your local platform’s X server • Utilize a parallel debugger interactively cja 2013
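  A typical interactive session might look like this sketch; the allocation name and program are placeholders:

      # Request a 2-core, 15-minute interactive session against your allocation
      qsub -I -V -l procs=2 -l walltime=15:00 -A youralloc_flux -l qos=flux -q flux
      # ...the terminal blocks until the job starts, then you land on a compute node:
      cd $PBS_O_WORKDIR       # back to the directory you submitted from
      mpirun -np 2 ./c_ex01   # parallel commands run across all cores in the job
      exit                    # leaving the shell ends (deletes) the job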

  29. Lab 3 Task: compile and execute an MPI program on a compute node • Copy sample code to your login directory: cd; cp ~brockp/cac-intro-code.tar.gz .; tar -xvzf cac-intro-code.tar.gz; cd ./cac-intro-code • Start an interactive PBS session: qsub -I -V -l procs=2 -l walltime=30:00 -A FluxTraining_flux -l qos=flux -q flux • On the compute node, compile & execute MPI parallel code: cd $PBS_O_WORKDIR; mpicc -O3 -ipo -no-prec-div -xHost -o c_ex01 c_ex01.c; mpirun -np 2 ./c_ex01 cja 2013
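  The same run can be wrapped in a batch script instead of an interactive session; a sketch, assuming the binary has already been compiled (the script name and directive values are illustrative):

      #!/bin/bash
      #PBS -N c_ex01
      #PBS -A FluxTraining_flux
      #PBS -q flux
      #PBS -l qos=flux
      #PBS -l procs=2,walltime=00:30:00
      #PBS -M your_uniqname@umich.edu
      #PBS -m abe
      #PBS -V

      cd $PBS_O_WORKDIR
      mpirun -np 2 ./c_ex01

  Submit it with qsub run_c_ex01.pbs (or whatever you name the file).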

  30. Lab 4 Task: Run Matlab interactively • module load matlab • Start an interactive PBS session: qsub -I -V -l procs=2 -l walltime=30:00 -A FluxTraining_flux -l qos=flux -q flux • Run Matlab in the interactive PBS session: matlab -nodisplay cja 2013
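  The non-interactive equivalent is to run a Matlab script from a batch job; a sketch, where mycode.m is a hypothetical script of your own:

      #!/bin/bash
      #PBS -N matlab_test
      #PBS -A FluxTraining_flux
      #PBS -q flux
      #PBS -l qos=flux
      #PBS -l procs=1,walltime=00:30:00

      module load matlab
      cd $PBS_O_WORKDIR
      # -r runs the named script (the hypothetical mycode.m) and then quits
      matlab -nodisplay -nosplash -r "mycode; exit"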

  31. The Scheduler (1/3) • Flux scheduling policies: • The job’s queue determines the set of nodes you run on • flux, fluxm • The job’s account determines the allocation to be charged • If you specify an inactive allocation, your job will never run • The job’s resource requirements help determine when the job becomes eligible to run • If you ask for unavailable resources, your job will wait until they become free • There is no pre-emption cja 2013

  32. The Scheduler (2/3) • Flux scheduling policies: • If there is competition for resources among eligible jobs in the allocation or in the cluster, two things help determine when you run: • How long you have waited for the resource • How much of the resource you have used so far • This is called “fairshare” • The scheduler will reserve nodes for a job with sufficient priority • This is intended to prevent starving jobs with large resource requirements cja 2013

  33. The Scheduler (3/3) • Flux scheduling policies: • If there is room for shorter jobs in the gaps of the schedule, the scheduler will fit smaller jobs in those gaps • This is called “backfill” [Diagram: jobs packed into schedule gaps; axes are cores vs. time] cja 2013

  34. Job monitoring • There are several commands you can run to get some insight into your jobs’ execution: • freenodes : shows the number of free nodes and cores currently available • mdiag -a youralloc_name : shows resources defined for your allocation and who can run against it • showq -w acct=yourallocname : shows jobs using your allocation (running/idle/blocked) • checkjob jobid : can show why your job might not be starting • showstart -e all jobid : gives you a coarse estimate of job start time; use the smallest value returned cja 2013

  35. Job Arrays • Submit copies of identical jobs • Invoked via qsub -t: qsub -t array-spec pbsbatch.txt, where array-spec can be m-n, or a,b,c, or m-n%slotlimit • e.g. qsub -t 1-50%10 submits fifty jobs, numbered 1 through 50, only ten of which can run simultaneously • $PBS_ARRAYID records the array identifier cja 2013
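  A minimal array-job script sketch; the program name and input-file naming are placeholders:

      #!/bin/bash
      #PBS -N array_example
      #PBS -A youralloc_flux
      #PBS -q flux
      #PBS -l qos=flux
      #PBS -l procs=1,walltime=01:00:00
      #PBS -t 1-50%10                # fifty copies, at most ten running at once

      cd $PBS_O_WORKDIR
      # Each copy processes a different input file, selected by its array index
      ./my_program input_${PBS_ARRAYID}.dat > output_${PBS_ARRAYID}.txt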

  36. Dependent scheduling • Submit jobs whose execution scheduling depends on other jobs • Invoked via qsub -W: qsub -W depend=type:jobid[:jobid]… where type can be: after (schedule after jobids have started), afterok (schedule after jobids have finished, only if no errors), afternotok (schedule after jobids have finished, only if errors), afterany (schedule after jobids have finished, regardless of status) • Inverted semantics for before, beforeok, beforenotok, beforeany cja 2013
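  A two-stage pipeline sketch; the script names are placeholders:

      # Submit a preprocessing job and capture its job id from qsub's output
      JOBID=$(qsub preprocess.pbs)
      # Run the analysis job only if the preprocessing job finishes without errors
      qsub -W depend=afterok:$JOBID analysis.pbs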

  37. Some Flux Resources • http://arc.research.umich.edu/resources-services/flux/ • U-M Advanced Research Computing Flux pages • http://cac.engin.umich.edu/ • CAEN HPC Flux pages • http://www.youtube.com/user/UMCoECAC • CAEN HPC YouTube channel • For assistance: flux-support@umich.edu • Read by a team of people including unit support staff • Cannot help with programming questions, but can help with operational Flux and basic usage questions cja 2013

  38. Any Questions? • Charles J. Antonelli, LSAIT Advocacy and Research Support • cja@umich.edu • http://www.umich.edu/~cja • 734 763 0607 cja 2013
