This document collects the slides from “Introduction to the T3E”, presented by NERSC at the ERSUG Training in Argonne, IL, on 28 April 1999. Highlights include an overview of the T3E hardware configuration, the programming environment, planning runs, monitoring execution, and user accounting. It discusses the DEC Alpha EV-5 superscalar processor, programming options in Fortran and C/C++, and the libraries and tools available for effective parallel computing. Additional resources and practical execution models for batch and interactive jobs are also covered.
Introduction to the T3E • Mark Durst, NERSC/USG • ERSUG Training, Argonne, IL • 28 April 1999
Outline • Hardware and Configuration • Programming Environment • Planning Runs • Monitoring Execution • Accounting • Additional Resources • Elvis Impression
NERSC T3E Configuration • Commodity DEC Alpha EV-5 superscalar processor • 450 MHz clock • 900 Mflops/PE peak (only 5-10% typically achieved) • Theoretical peak performance: 575 Gflops • 256 MB memory per PE • 692 PEs in 3 flavors • 644 Application • 33 Command (ideally) • 15 OS • Access via telnet, ssh, FTP • Connect to NERSC mass storage, AFS
Interactive Environment • UNICOS/mk • Available shells: sh/ksh, csh, tcsh • csh: no file completion • tcsh not Cray-supported • Home directories • 2 GB file quota (with possible data migration) • 3,500-inode quota • /usr/tmp • Used both for batch and temporary user space • 75 GB quota, 6,000-inode quota • Fastest transfer rates
modules • modules manages the user environment • Paths • Environment variables • Aliases • Cray’s PrgEnv is modules-driven • Provided startup files are critical! • Add to them, don’t clobber them • Add to paths, don’t set them • If you clobber them, you lose compilers and other tools • Largely automatic
More Fun with modules • module list (tells you what’s loaded) • module avail (lists them all) • Other module subcommands • load • unload • switch • help • Roll back compilers • Test new versions • http://home.nersc.gov/software/os/modules.html
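A few sample module commands, combining the subcommands above (the PrgEnv version name in the last line is hypothetical):

    module list                        # show what is currently loaded
    module avail                       # list all available modules
    module load KCC                    # add the KAI C++ compiler to the environment
    module switch PrgEnv PrgEnv.old    # roll back to an earlier programming environment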
Other modules • imsl (loads by default) • nag (loads by default) • scalapack (1.5) • GNU (prepends) and GNU.tools (appends) • tools (tcsh, bash) • netcdf • KCC (KAI C++ compiler) • USG • tedi
Programming Environment • f90 • cc/CC • cam (assembler) • cld (loader; usually unneeded) • pghpf • KCC (“module load KCC”) • totalview (debugger) • pat, apprentice (performance analysis)
f90 • Conforms to the Fortran 90 standard • Much “standard” f77 code wasn’t actually standard • User-defined and abstract types • Array syntax • Allocatable objects and pointers • Additional intrinsics • cpp-like preprocessor
Important f90 options • -f: source form (fixed or free) • Defaults: .f fixed, .f90 free • -c: Compile only • -oname: Name executable • Overrides -c (use -bname instead) • -g, -G0, -G1: Debugging levels • -O[0-3]: General optimization level • -Ra, -Rb: Argument/bounds checking • -dp: Map DOUBLE PRECISION to 64-bit single precision • -i 32 / -s default32: 32-bit integers / 32-bit default numeric types • -ev: Static memory allocation
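For instance, a debugging build might combine several of these options (file and program names are illustrative):

    f90 -f free -g -Ra -Rb -o myprog.debug myprog.f90   # free form, debuggable, argument- and bounds-checked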
Executables: Malleable or Fixed • -Xnpes (e.g., -X64) creates “fixed” executable • Always runs on same number of (application) processors • Type ./a.out to run • -Xm or no -X option creates “malleable” executable • ./a.out will run on command PE • mpprun -n npes ./a.out runs on npes APP PEs
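For example (PE counts illustrative):

    f90 -X64 -o fixed.x prog.f90    # fixed: always runs on 64 application PEs
    ./fixed.x
    f90 -o mall.x prog.f90          # no -X option: malleable
    mpprun -n 64 ./mall.x           # run on 64 application PEs; plain ./mall.x would use a command PE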
Execution Model • In F90, C, and C++, all processors execute the same program • Each can ask for: • Its process number (from zero up) • MY_PE() (F90) • _my_pe() (C/C++) • The total number of PEs • NUM_PES() (F90) • _num_pes() (C/C++) • These are used to establish “master/slave” relationships • Libraries are still needed for communication
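A minimal F90 sketch of this model, using only the intrinsics named above:

    program model
      integer :: me, npes
      me   = MY_PE()      ! this PE's number, counting from zero
      npes = NUM_PES()    ! total number of PEs
      if (me == 0) then   ! PE 0 takes the master role
        print *, 'master sees', npes, 'PEs'
      end if
    end program model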
Libraries • MPI (Message-Passing Interface) • PVM (Parallel Virtual Machine) • SHMEM (SHared MEMory; non-portable) • BLACS (Basic Linear Algebra Communication Subprograms) • ScaLAPACK (SCAlable [parts of] LAPACK) • NetCDF (NETwork Common Data Format) • HDF (Hierarchical Data Format) • LIBSCI (including parallel FFTs), NAG, IMSL
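To illustrate the library route, the same master/slave setup in standard MPI (a sketch; error checking omitted):

    program mpimodel
      include 'mpif.h'
      integer :: me, npes, ierr
      call MPI_INIT(ierr)                            ! start MPI
      call MPI_COMM_RANK(MPI_COMM_WORLD, me, ierr)   ! this process's rank
      call MPI_COMM_SIZE(MPI_COMM_WORLD, npes, ierr) ! total number of processes
      if (me == 0) print *, 'MPI master sees', npes, 'PEs'
      call MPI_FINALIZE(ierr)                        ! shut down MPI
    end program mpimodel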
Archival Storage in HPSS • High-Performance Storage System • Designed for scalability & hierarchies • User storage quotas exist • Access via ftp or the new hsi utility • Two systems: • hpss.nersc.gov (hsi hpss) • archive.nersc.gov (hsi or hsi archive); contains old CFS files • A merger is planned
Networking Issues • AFS • Accounts must be requested • Tiny local quotas • Available on Crays through the NFS/AFS gateway • Non-trivial latencies • Remote logins • .rhosts access not permitted; no incoming “r-commands” • ssh available • xterm works only “backwards” (from the T3E out to your local display)
Execution modes • Interactive serial • < 60 minutes • on command PEs • slightly reduced memory • Interactive parallel • < 30 minutes • < 64 processors • Batch
Batch queues on mcurie.nersc.gov • To see them: qstat -b • pe16 through pe512 • 4 hours “on the torus” • Routine parallel jobs • serial_short: 4 hours on a single command PE • debug_small: ½ hour, up to 32 PEs • long128, gc128, gc256: 12-hour queues • 64 PEs • gc queues restricted • Largest queues shuffled in at night • Other jobs checkpointed out • Subject to change
Batch submission • Jobs are shell scripts • cqsub submits and returns a task ID; cqdel deletes • cqstatl/qstat gets status (many options) • NQS parameters determine the queue • #QSUB -l mpp_p=… (number of PEs) • #QSUB -l mpp_t=… (“parallel” time) • For serial jobs: • use #QSUB -q serial • not #QSUB -l mpp_p=1
Pipe Queues • You submit to pipe queues, not batch queues • Use only pipe names in directives, like: #QSUB -q serial • Group batch queues: • serial = serial_short • debug = debug_small • production = pe128 through pe512 • long = long128, gc128, gc256 • 3 jobs per user in production + long • 3 in serial, one in debug • To see them: qstat -p
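Pulling the last two slides together, a minimal sketch of a batch script (queue choice, PE count, time limit, and repo name are all illustrative):

    #QSUB -q production      # pipe queue name, not a batch queue name
    #QSUB -l mpp_p=128       # application PEs requested
    #QSUB -l mpp_t=3600      # parallel time limit (assumed seconds here)
    #QSUB -A myrepo          # repo to charge (see Accounting below)
    mpprun -n 128 ./a.out

Submit it with cqsub script, watch it with cqstatl, and remove it with cqdel and the task ID that cqsub printed.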
Scheduling Information • Lots of NQS-related limits • Queue run limits • Queue “complex” run limits • Global Resource Manager • Fits jobs into contiguous sets of PEs • Once started, jobs run to completion (mostly) • First-fit algorithm lets small jobs trample big ones • grmview shows PE status, waiting jobs
Scheduling Information (cont’d) • pslist gives a summary of GRM data • No man page; use pslist -h instead • Checkpointing • For system maintenance • To run test and “grand challenge” jobs • Shows “Hop” in qstat/cqstatl (held by operator) • mppview gives a more nuts-and-bolts view
Accounting and allocations • T3E allocations are in node-minutes • setcub view repo=reponame • setcub view user=username • newacct reponame switches repos interactively • One login name per user; multiple repos • #QSUB -A reponame charges batch jobs • Charging updated daily; enforcement manual
On-line Resources • T3E pages under “Computers” at home.nersc.gov • Read the overview once, check “Changes” monthly • Docs in the Cray on-line system • http://www.cray.com/swpubs/ • Follow “Topics” to the T3E collection • Many other docs (e.g., F90, C manual sets) • Cray Web site, www.cray.com • Technical documents, additional on-line docs • NERSC T3E tutorials • Under “Training”, see “NERSC Tutorials”
More on-line resources • Other NERSC tutorials • Using the Cray f90 compiler at NERSC • Introduction to make • NQE: Using the batch system • Look over NERSC Web generally
man pages • cqsub • cqstatl • f90 • cc • CC