
High Performance Computing

John Zaitseff, September 2014



  1. High Performance Computing
     John Zaitseff, September 2014

  2. High Performance Computing architecture
     Massively Parallel Distributed Computational Cluster
     • Many individual servers (“nodes”): dozens to thousands
     • Multiple processors per node: between 8 and 64 cores
     • Interconnected by fast networks
     • Almost always run Linux
     • In our case: Rocks Linux Distribution on top of CentOS 6.x
     Image: The Trentino cluster. Image credit: John Zaitseff, UNSW

  3. High Performance Computing architecture
     [Diagram: the Internet, a head node, a storage node and an internal network switch, with compute nodes 1 to n and chassis 1 to m, each chassis holding its own compute nodes (labelled 1-1 to m-n)]

  4. The Newton cluster: newton.mech.unsw.edu.au
     • 10 × Dell R415 server nodes
         • Head node: newton
         • Compute nodes: newton01 to newton09
     • 160 × AMD Opteron 4386 3.1GHz processor cores
         • Two physical processors per node
         • Eight CPU cores per processor
         • Only four floating-point units per processor
     • 320 GB of main memory (32 GB per node)
     • 12 TB of storage: 6 × 3 TB drives in RAID 6
     • 1Gb Ethernet network interconnect
     http://cfdlab.unsw.wikispaces.net/
     Image: The Newton cluster. Image credit: John Zaitseff, UNSW

  5. The Trentino cluster: trentino.mech.unsw.edu.au
     • 16 × Dell R815 server nodes
         • Head node: trentino
         • Compute nodes: trentino01 to trentino15
     • 1024 × AMD Opteron 6272 2.1GHz processor cores
         • Four physical processors per node
         • Sixteen CPU cores per processor
         • Only eight floating-point units per processor
     • 2048 GB of main memory (128 GB per node)
     • 30 TB of storage: 12 × 3 TB drives in RAID 6
     • 4×1Gb Ethernet network interconnect
     http://cfdlab.unsw.wikispaces.net/
     Image: The back of the Trentino cluster. Image credit: John Zaitseff, UNSW

  6. The Leonardi cluster: leonardi.eng.unsw.edu.au
     • 7 × HP BladeSystem c7000 blade enclosures
         • 1 × HP ProLiant DL385 G7 server: leonardi
         • 56 × HP BL685c G7 compute nodes
         • Compute nodes: ec01b01 to ec07b08
     • 2944 × AMD Opteron 6174 2.2GHz and Opteron 6276 2.3GHz processor cores
         • Four physical processors per node
         • Twelve or sixteen CPU cores per processor
     • 5888 GB of main memory (96 or 128 GB per node)
     • 95 TB of storage: 60 × 2 TB drives in RAID 60
     • 2×10Gb Ethernet network interconnect
     http://leonardi.unsw.wikispaces.net/
     Image: Nodes in the Leonardi cluster. Image credit: John Zaitseff, UNSW

  7. The Raijin cluster: raijin.nci.org.au
     • 3592 × Fujitsu blade server nodes
         • Multiple login nodes
         • Multiple management nodes
     • 57,472 Intel Xeon E5-2670 2.60GHz processor cores (two eight-core processors per node)
     • 160 TB of main memory
     • 10 PB of storage using the Lustre distributed file system
     • InfiniBand FDR network interconnect (14Gb per lane)
     http://nci.org.au/nci-systems/national-facility/peak-system/raijin/
     Image credit: National Computational Infrastructure

  8. High Performance Computing architecture
     [Diagram: the same cluster layout as before (Internet, head node, storage node, internal network switch, compute nodes and chassis), annotated with the warning “Do not run your jobs here!”]

  9. Connecting to an HPC system
     • Use the Secure Shell protocol (SSH)
         • Under Linux: ssh username@hpcsystemname
         • Under Windows: PuTTY (Start » All Programs » PuTTY » PuTTY)
         • Can install Cygwin: “that Linux feeling under Windows”
     • Command line prompt
         • Will look something like: z9693022@newton:~ $
         • May be different in different systems; may be customised
     • Try it now: PuTTY, Host name newton.mech.unsw.edu.au
         • RSA2 fingerprint: 69:7e:64:75:57:67:ad:4c:21:8e:90:7d:8e:97:70:ce
         • User name: your zID; Password: your zPass
     • To exit: exit

  10. Simple Linux commands
     • List files in a directory: ls [pathname ...]
         • [] indicates optional parameters, ... indicates one or more parameters
         • Italic fixed-width font indicates replaceable parameters
     • To show the current directory: pwd
     • To change directories: cd directory
         • ~ is the home directory
         • .. is the directory above the current one
         • ~user is the home directory of user user
     • Try it now:
         cd ~z9693022/src/trader-7.6
         ls              # List files in current directory
         cd src
         pwd; ls         # More than one command at a time!
         cd ..; pwd      # You don’t have to enter the comments...

  11. Directories and files: paths and pathnames
     • Files and directories are organised into a hierarchical tree structure
     • The top of the tree is called the root directory (or simply root), and is denoted as / (slash)
     • The root directory contains directories, which in turn contain files and directories of their own
     [Diagram: example directory tree]

  12. Absolute pathnames
     • Any file or directory can be represented as an absolute pathname, which:
         • gives the full name of the file or directory
         • starts with the root “/”
         • lists each directory along the way
         • has a “/” to separate each path (or pathname) component
     • For example: the directory /share/apps/ansys/15.0

  13. Relative pathnames
     • Second way of denoting a file or directory (a pathname)
     • Relative to the current working directory
     • Does not start with the root directory “/”
     • Path components are still separated with slashes “/”
     • Current directory is denoted by “.” (dot)
     • Going up a level is denoted by “..” (dot-dot)
     • Often just contains a filename with no directories listed
     • Examples: assume the current directory is /home/z9693022/src/trader-7.6:
         README               → /home/z9693022/src/trader-7.6/README
         src/trader.c         → /home/z9693022/src/trader-7.6/src/trader.c
         ../trader-7.6.tar.xz → /home/z9693022/src/trader-7.6.tar.xz
         src/.././README      → /home/z9693022/src/trader-7.6/README
         ./README             → /home/z9693022/src/trader-7.6/README
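The resolutions above can be checked by hand. This is a minimal sketch, assuming a throwaway directory created by mktemp stands in for /home/z9693022; the variable names (home, resolved, parent, same) are illustrative only.

```shell
#!/bin/bash
# Build a throwaway copy of the layout used in the examples above.
home=$(cd "$(mktemp -d)" && pwd -P)      # stand-in for /home/z9693022
mkdir -p "$home/src/trader-7.6/src"
touch "$home/src/trader-7.6/README" "$home/src/trader-7.6/src/trader.c"

cd "$home/src/trader-7.6"

# "src/../." goes down into src, back up, then stays put.
resolved=$(cd src/../. && pwd -P)
resolved_rel=${resolved#"$home"}          # strip the stand-in prefix
echo "src/../.  resolves to  $resolved_rel"

# ".." is the directory above the current one.
parent=$(cd .. && pwd -P)
parent_rel=${parent#"$home"}
echo "..        resolves to  $parent_rel"

# A relative and an absolute pathname can name the same file.
if [ ./README -ef "$home/src/trader-7.6/README" ]; then
    same=yes
else
    same=no
fi
echo "./README and the absolute pathname are the same file: $same"
```

Running cd in a $(...) subshell leaves the current directory of the main shell unchanged, which makes it a convenient way to ask "where would this path take me?".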

  14. Important directories
     • Home directory: /home/user (e.g., /home/z9693022)
     • Scratch directory for temporary files: /share/scratch/user (but not available on Newton!)
     • Binary directories for utility programs:
         • /bin: for essential utilities
         • /usr/bin: for other utilities and some applications
         • /usr/local/bin: for local utilities and applications
         • /home/user/bin: for your own utilities
     • On our clusters, applications: /share/apps
     • On our clusters, module files: /share/apps/Modules
     • Note synonyms: path, pathname, filename

  15. More with pathnames
     • To change directories: cd dir
     • To change to your home directory: cd ~ or cd $HOME or cd (by itself)
     • To get current working directory: pwd
     • To show the directory tree structure: tree, tree -d (directories only)
     • To view a file page by page: less filename, “q” to quit, “h” for help
     • Try it now:
         cd /home/z9693022/src/trader-7.6
         tree -d
         less README
         less src/trader.c
         cd src; pwd
         less README
         less ../README      # Different from README!

  16. Getting help
     • Many commands have a myriad of command line options
     • For a brief summary of command line options, try command --help
     • For a full explanation, try man command
     • For some commands, try pinfo command
     • To search for a keyword in the manual: man -k keyword
     • Remember, “Google is your friend”
     • Try it now:
         ls --help
         cd --help           # Does this work?
         man ls              # See “See Also” section at end
         pinfo coreutils     # “q” to quit
         man less            # 1571 lines!
         man cd              # What is “BASH_BUILTINS”?

  17. The Bourne Again (Bash) shell
     • Official manual page entry:
         Bash is an sh-compatible command language interpreter that executes commands read from the standard input or from a file. Bash also incorporates useful features from the Korn and C shells (ksh and csh).
         Bash is intended to be a conformant implementation of the Shell and Utilities portion of the IEEE POSIX specification (IEEE Standard 1003.1). Bash can be configured to be POSIX-conformant by default.
     • Interprets your typed commands and executes them
     • Just another Linux program: nothing special about it!
     • Started by the system when you log in
     • You can then start another shell, if you like (e.g., ksh, tcsh, even python)
     • You can start a subshell by running bash
     • To exit a subshell (or the main shell): exit

  18. Some features of Bash
     • Powerful command line facilities (shortcuts):
         • Tab completion (press the TAB key to complete commands and pathnames, TAB TAB to list all possibilities)
         • Command line editing: try ↑ (Up-Arrow) to recall previous commands, CTRL-R (C-R or ^R) to search for previous commands, ← and → to move along current command line
     • A full programming and scripting language:
         • Variables and arrays
         • Loops (for; while; until), control statements (if ... then ... else; case)
         • Functions and coprocesses
         • Text processing (“expansion” and “parameter substitution”)
         • Simple arithmetic calculations
         • Input/output redirection (e.g., redirect output to different files)
     • Much, much more! (The man page runs to over 5,300 lines)

  19. Trying out some features of Bash
     • Try it now:
         • cd ~z9693022/src/trader-7.6/src
         • Type “less”, then space, but do not press ENTER yet
         • Press TAB once: nothing appears
         • Press TAB a second time: all relevant completions appear
         • Type “f”, then press TAB: the filename is completed to “fileio.”
         • Press TAB TAB again: two files are listed
         • Type “h” to select the second file, then press ENTER (and “q” to quit)
     • Try it now:
         • Press CTRL-R, then type “ls” (but do not press ENTER): previous commands with “ls” in them are listed
         • Press CTRL-R again a few times: will even list “pinfo coreutils”
         • Press ENTER when you get to the command you wish to execute
         • Press CTRL-C if you do not wish to execute any command

  20. Listing files and directories
     • Already know the ls command: List directory contents
     • In full: ls [options] [pathname ...]
     • Some options:
         • “-a” for all files (including those starting with “.”)
         • “-l” for long (detailed) listing
     • Options sometimes can be combined: “-alF”
     • Try it now: ls -laF or dir (an alias to “ls -laF”); ll (“ls -lF”)
     • Example of a line in a long listing:
         -rw-r--r-- 1 z9693022 unsw 1266 May 24 07:59 README
     • The columns of information are: file permissions, number of links (usually 1 for files, 2 or more for directories), file owner, group owner, size in bytes (here, 1266), date last modified, the actual filename (README), with perhaps a trailing “*” for executable files and “/” for directories.
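The column layout can be made concrete by pulling fields out of the sample line with awk. This is a small sketch only: it works on the literal line quoted above rather than on live ls output, and the variable names are illustrative.

```shell
#!/bin/bash
# The sample line from the long listing above.
line='-rw-r--r-- 1 z9693022 unsw 1266 May 24 07:59 README'

# awk splits on whitespace and numbers the fields from 1.
perms=$(echo "$line" | awk '{print $1}')    # file permissions
owner=$(echo "$line" | awk '{print $3}')    # file owner
group=$(echo "$line" | awk '{print $4}')    # group owner
size=$(echo "$line"  | awk '{print $5}')    # size in bytes
name=$(echo "$line"  | awk '{print $NF}')   # NF is the last field

echo "permissions=$perms owner=$owner group=$group size=$size name=$name"
```

Note that the size is the fifth whitespace-separated field, which matters when sorting a long listing by size. Taking the last field as the filename only works for names without spaces.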

  21. File and directory patterns
     • The Bash shell interprets certain characters in the command line by replacing them with matching pathnames
     • Called pathname expansion, pattern matching, wildcards or globbing
     • For existing pathnames: “*” matches any string, “?” matches any single character, “[...]” matches any one of the enclosed characters
     • Try it now:
         cd ~z9693022/src/trader-7.6/src; echo 1 2 3
         echo *c             # All filenames ending in “c”: “.” is not special
         echo ????.c         # All filenames six characters long (4 + “.c”)
         echo M*m            # All filenames starting with “M” and ending with “m”
         echo [it]*          # All filenames starting with either “i” or “t”
         echo ../lib/uni*    # All filenames in ../lib starting with “uni”
         echo ../*/*.c
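For readers without access to the trader-7.6 source tree, the same patterns can be tried on a known set of files. This is a sketch under assumptions: the filenames below are made up for illustration and are not the actual contents of that directory.

```shell
#!/bin/bash
# Work in a throwaway directory so the patterns see a known set of files.
cd "$(mktemp -d)"
touch main.c io.c intf.c trader.c trader.h Makefile.am README

all_c=$(echo *.c)      # every name ending in ".c"
four=$(echo ????.c)    # exactly four characters, then ".c"
it=$(echo [it]*)       # names starting with "i" or "t"
mm=$(echo M*m)         # names starting with "M" and ending with "m"

echo "*.c    matches: $all_c"
echo "????.c matches: $four"
echo "[it]*  matches: $it"
echo "M*m    matches: $mm"
```

Using echo rather than ls shows exactly what the shell passes to a command after expansion, with the matches in sorted order.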

  22. More file and directory patterns
     • Glob patterns “*”, “?” and “[...]” only match existing pathnames
     • Even for pathnames that do not exist: “{alt1,alt2,...}” lists alternatives, “{n..m}” lists all numbers between n and m, “{n..m..s}” in steps of s
     • Technically called brace expansion
     • Try it now:
         ls test-*                        # “No such file or directory”
         echo test-*                      # What happens?
         echo test-{one,two,three}
         echo newdir/{one,two,three}
         echo test-{1..100}
         echo test-{001..100}             # Zero-padding
         echo test-{1..100..3}            # By steps of three
         echo test-{100..1..-3}           # By steps of negative three
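A shorter version of the same experiment, with results small enough to read at a glance. This is a sketch that assumes Bash 4 or later (needed for zero-padding and step values); the ranges are chosen for illustration.

```shell
#!/bin/bash
# Brace expansion is plain text generation: no files need to exist.
alts=$(echo test-{one,two,three})
padded=$(echo test-{08..10})       # zero-padded to the width of "08"
stepped=$(echo test-{1..10..3})    # from 1 to 10 in steps of three
down=$(echo test-{10..1..-3})      # counting down in steps of three

# A glob that matches nothing is, by default, passed through unchanged.
cd "$(mktemp -d)"                  # an empty directory
unmatched=$(echo test-*)

echo "$alts"
echo "$padded"
echo "$stepped"
echo "$down"
echo "$unmatched"
```

The last line is why ls test-* reports “No such file or directory”: ls is handed the literal string test-* as a filename.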

  23. Naming files and directories
     • Linux allows any characters in filenames except “/” and the NUL byte
     • You may create filenames with “weird” characters in them:
         • spaces and tabs
         • starting with “-”: conflicts with command line options
         • question marks “?”, asterisks “*”, brackets and braces
         • other characters with special meanings: “!”, “$”, “&”, “#”, “"”, etc.
     • Just because you can does not mean you should!
     • To match such files: use the glob characters “*” and “?”
     • Linux file systems are case-sensitive: README.TXT is different from readme.txt, which is different from Readme.txt and ReadMe.txt!
     • File type suffixes (e.g., “.txt”) are optional but recommended
     • Filenames starting with “.” are usually hidden from globs and ls output.
     • Recommendation: Use “a” to “z”, “A” to “Z”, “0” to “9”, “-”, “_” and “.” only.

  24. Managing directories
     • To create a directory: mkdir dir...
     • To create parent directories as well: mkdir -p dir...
     • To remove an empty directory: rmdir dir...
     • Try it now:
         cd ~; ls
         mkdir gsoe9400/dir{1,2,3}        # Why does this fail?
         mkdir -p gsoe9400/dir{1,2,3,99} gsoe9400/x
         ls gsoe9400
         rmdir gsoe9400/dir?
         ls gsoe9400                      # Should list dir99 and x only
         rmdir gsoe9400/*                 # Be careful...

  25. Managing files
     • To output the contents of one or more files: cat filename...
     • To view one or more files page by page: less filename...
     • To copy one file: cp source destination
     • To copy one or more files to a directory: cp filename... dir
     • To preserve the “last modified” time-stamp: cp -p
     • To copy recursively: cp -pr source destination
     • To move one or more files to a different directory: mv filename... dir
     • To rename a file: mv oldname newname
     • To remove files: rm filename...
     • Recommendation: use “ls filename...” before rm or mv: what happens if you accidentally type “rm *”? or “rm * .c”? (note the space!)
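The danger of a stray space can be seen safely by letting echo show what rm would be given. This is a minimal sketch in a throwaway directory; the filenames are made up for illustration.

```shell
#!/bin/bash
cd "$(mktemp -d)"
touch notes.txt main.c util.c

# Intended command: remove only the C files.
intended=$(echo *.c)

# With a stray space, "*" matches every file and ".c" becomes a
# separate (non-existent) filename.
typo=$(echo * .c)

echo "rm *.c   would remove: $intended"
echo "rm * .c  would remove: $typo"
```

In the second case rm would delete everything in the directory and then complain that “.c” does not exist, by which time it is too late.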

  26. Managing files and directories, continued
     • To copy whole directory trees: cp -pr filename... destination
     • To copy to and from another Linux system (e.g., from Leonardi to Trentino), use Secure Copy: scp [-p -r] source... destination
     • Either source or destination (but not both) can contain a remote system identifier followed by a colon: [user@]system:
     • Can also use rsync or insync: insync [-d] source destination
     • Examples:
         cp -pr ~z9693022/src/trader-7.6 .
         scp -p ~/file1.txt leonardi:file2.txt
         scp -p john@zap.org.au:src/README .
         mkdir dir1; insync ~/orig dir1
         insync /share/scratch/$USER/data1 $HOME/data1
         insync leonardi:/share/scratch/$USER/data2 .

  27. Managing files and directories, continued
     • Try it now:
         cd ~/gsoe9400
         cp -pr ~z9693022/src/trader-7.6 .; ls
         cd trader-7.6; pwd
         cat build-aux/bootstrap
         ls */*.c
         rm */*.c; ls */*.c               # What is the output of ls?
         insync ~z9693022/src/trader-7.6 .
         mkdir ../new; cp src/trader.c ../new
         cd ../new; ls
         mv trader.c new.c; rm new.c
         cp -p ../trader-7.6/src/trader.* .
         cp trader.c new.c
         ls -l trader.c new.c             # What is the difference between these files?

  28. Transferring files
     • To copy files to another Linux system: use scp, rsync or insync
     • To copy files to and from a Windows machine: use WinSCP, or scp, rsync or insync under Cygwin
     • Try it now:
         • Start WinSCP (Start » All Programs » WinSCP » WinSCP)
         • Host name newton.mech.unsw.edu.au
         • RSA2 fingerprint: 69:7e:64:75:57:67:ad:4c:21:8e:90:7d:8e:97:70:ce
         • User name: your zID; Password: your zPass
         • Copy ~/gsoe9400/new/new.c to the Windows desktop
         • Rename it to newnew.c (using the usual Windows right-click or F2)
         • Copy it back
         • Under PuTTY: ls newnew.c

  29. More Linux commands
     • What machine am I on? hostname
     • What is the date and time? date
     • Who is logged in? who
     • But who is user z1234567? finger [username...]
     • What is the user name for someone? finger part-of-name
     • Which files contain a particular string? grep 'pattern' filename...
     • What is the difference between two files? diff [-u] file1 file2
     • How do I rename multiple files at once? rename or prename
     • Where is a file named filename? find dir... -name filename
     • How big is a file or directory? du -h [filename...]
     • How much space is available in a directory? df -h [dir...]
     • How much disk quota do I have? quota -s
         • “Blocks” is how many disk blocks you are using, in chunks of 1 kB
         • On Newton: “limit” is 10240M = 10 GB

  30. Redirecting input and output
     • The terminal is treated as just another file (/dev/tty); use CTRL-D to signify the end of file
     • Other special files: /dev/null (an empty file), /dev/zero (an infinite number of binary zeros; can use up your quota in a hurry!)
     • Input and output from a program can be redirected to a file or even piped to another program
     • To redirect output to filename, use “>filename”
     • To append output to filename, use “>>filename”
     • To redirect input from filename, use “<filename”
     • To connect the output from one program to the input of another (pipes), use “program1 | program2”
     • Multiple pipes are allowed: “program1 | program2 | ... | programn”
     • Many utility programs are designed to be used in this way, as filters
     • Output can be substituted into a command line: $(commandline)
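Each of the operators above can be exercised in a few lines. This is a sketch in a throwaway directory; the file name out.txt and its contents are made up for illustration.

```shell
#!/bin/bash
cd "$(mktemp -d)"

echo "first line"  >  out.txt     # ">" creates or overwrites the file
echo "second line" >> out.txt     # ">>" appends to it
echo "third line"  >> out.txt

count=$(wc -l < out.txt)          # "<" feeds the file to standard input
count=$(( count ))                # drop any padding wc may have added

# A pipeline of filters: keep lines containing "ir", sort them in
# reverse order, then keep only the first.
top=$(grep 'ir' < out.txt | sort -r | head -n 1)

empty=$(wc -c < /dev/null)        # /dev/null reads as an empty file
empty=$(( empty ))

echo "lines: $count, top: $top, bytes in /dev/null: $empty"
```

The $(...) form used throughout is the command substitution from the last bullet: the output of the enclosed command line replaces the expression.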

  31. Redirecting input and output, continued
     • Try it now:
         cd ~/gsoe9400/trader-7.6
         ls > ../dir-list1
         cat ../dir-list1
         cat ../dir-list1 | wc -l                  # How many lines in ../dir-list1?
         ls ~/gsoe9400/trader-7.6 | wc -l          # Same as above
         rm ../dir-list1
         ls -l | grep May                          # How many files were last modified in May?
         ls -l | grep May | sort -nk5              # Same, but sort by file size (5th field)
         who | awk '{print $1}'                    # Just list first field of “who” output
         finger $(who | awk '{print $1}')          # Full details of who is logged in
         finger $(who | awk '{print $1}') | less   # One page at a time

  32. Simple scripting
     • Shell scripts are just files containing a list of commands to be executed
     • First line (“magic identifier”) must be #!/bin/bash
     • Comments are introduced with “#”
     • The script file must be made executable: chmod a+x filename
     • Variables:
         • To set a variable, use varname=value (no spaces!)
         • To use a variable, use $varname or ${varname}
         • Variable names start with a letter, may contain letters, numbers and “_”
         • Variable names are case-sensitive (as with most things Linux)
     • Functions (parameters are accessed using $1, $2, ...):
         funcname() {
             body of function
         }
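Variables and functions fit together as in this short sketch. The function name node_name is hypothetical; the cluster name and node count are taken from the Newton slide.

```shell
#!/bin/bash
# Variables: no spaces around "=".
cluster=newton
nodes=9

# A function; its parameters arrive as $1, $2, ...
node_name() {
    # printf pads the node number to two digits: newton01, newton02, ...
    printf '%s%02d\n' "$1" "$2"
}

first=$(node_name "$cluster" 1)
last=$(node_name "${cluster}" "${nodes}")
echo "Compute nodes run from ${first} to ${last}"
```

The braces in ${cluster} are optional here; they become necessary when the name is followed directly by letters, digits or “_”.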

  33. Simple scripting, continued
     • For loops:
         for varname in list...; do
             process using ${varname}
         done
     • Control statements (multiple “elif” allowed; “elif” and “else” clauses are optional):
         if [ comparison ]; then
             if-true statements
         elif [ second-comparison ]; then
             if-second-true statements
         else
             if-false statements
         fi
     • Example of comparisons: string1 = string2 (is equal)
     • See the manual page for test (“man test”) for more information
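A filled-in version of both templates may help. This sketch loops over the three UNSW cluster names and picks out each one's interconnect as listed on the earlier slides; the variable names are illustrative.

```shell
#!/bin/bash
summary=""
for host in newton trentino leonardi; do
    # String comparison with "="; spaces around the brackets are required.
    if [ "$host" = "newton" ]; then
        net="1Gb"
    elif [ "$host" = "trentino" ]; then
        net="4x1Gb"
    else
        net="2x10Gb"
    fi
    summary="${summary}${host}:${net} "
done
echo "$summary"
```

Quoting "$host" inside the brackets keeps the comparison valid even if the variable is empty or contains spaces.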

  34. Simple scripting, continued
     • While loops:
         while [ comparison ]; do
             while-true statements
         done
     • Until loops:
         until [ comparison ]; do
             while-false statements
         done
     • Many, many other programming features available!
     • Read the manual page: man bash
     • Some books:
         • Cameron Newham, Learning the bash Shell, 3rd Edition, O’Reilly Media, March 2005. ISBN 9780596009656, 9780596158965
         • William E. Shotts Jr., The Linux Command Line, No Starch Press, January 2012. ISBN 9781593273897, 9781593274269
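The two loop forms differ only in the sense of the test, as this sketch shows. It uses the numeric comparisons -le and -ge, which are described in “man test”; the calculations themselves are arbitrary examples.

```shell
#!/bin/bash
# while: repeat as long as the comparison is true.
n=1
total=0
while [ "$n" -le 5 ]; do
    total=$(( total + n ))
    n=$(( n + 1 ))
done
echo "sum of 1 to 5 = $total"

# until: repeat as long as the comparison is false.
power=1
steps=0
until [ "$power" -ge 64 ]; do
    power=$(( power * 2 ))
    steps=$(( steps + 1 ))
done
echo "doublings needed to reach 64 = $steps"
```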

  35. Editing files under Linux
     • Use an editor to edit text files
     • Many choices, leading to “religious wars”!
     • Some options: GNU Emacs, Vim, Nano
     • Nano is very simple to use: nano filename
         • CTRL-X to exit (you will be asked to save any changes)
     • GNU Emacs and Vim are highly customisable and programmable
         • For example, see the file ~z9693022/.emacs
         • Debra Cameron et al., Learning GNU Emacs, 3rd Edition, O’Reilly Media, December 2004. ISBN 9780596006488, 9780596104184
         • Arnold Robbins et al., Learning the vi and Vim Editors, 7th Edition, O’Reilly Media, July 2008. ISBN 9780596529833, 9780596159351
     • Try it now: cd ~/gsoe9400; nano script1

  36. Creating a simple script file
     • Try it now, continued: Enter the following text:
         #!/bin/bash
         # How much disk quota am I using?
         # (We want only the last line of "quota" output:
         # use the "tail" utility)
         blocks_used=$(quota | tail -n 1 | awk '{print $1}')
         blocks_limit=$(quota | tail -n 1 | awk '{print $3}')
         percent=$(( ${blocks_used} * 100 / ${blocks_limit} ))
         echo "I am using ${blocks_used} blocks (${percent}%)"
     • Save the file and exit the editor, then:
         chmod a+x ./script1
         ./script1           # Execute the script! (Note the use of “./”)
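The arithmetic in the script can be tried without a quota system. This is a sketch under assumptions: the sample line below is made up (blocks used, soft quota, hard limit, in 1 kB blocks) and only mimics the three numeric columns that the script reads.

```shell
#!/bin/bash
# Made-up stand-in for the last line of "quota" output.
sample='5242880 8388608 10485760'

blocks_used=$(echo "$sample"  | awk '{print $1}')
blocks_limit=$(echo "$sample" | awk '{print $3}')

# Bash arithmetic is integer-only, so multiply before dividing.
percent=$(( blocks_used * 100 / blocks_limit ))

# Dividing first truncates the ratio to zero.
wrong=$(( blocks_used / blocks_limit * 100 ))

echo "I am using ${blocks_used} blocks (${percent}%)"
echo "Dividing first would have given ${wrong}%"
```

The limit of 10485760 blocks of 1 kB corresponds to the 10 GB quoted for Newton.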

  37. Creating a script with loops
     • Try it now:
         • Create and run the file script2, containing the following. What is the output? (Hint: remember “chmod a+x ./script2; ./script2”)
             #!/bin/bash
             module load matlab/2014a
             for n in {01..10}; do
                 echo "n = $n;" >script${n}.m
                 echo "sqrtn = sqrt(n);" >>script${n}.m
                 echo "save('data${n}.txt', 'sqrtn', '-ascii');" \
                     >>script${n}.m
                 echo "quit" >>script${n}.m
                 matlab -nojvm -r script${n} >/dev/null
                 cat data${n}.txt
             done

  38. Applications on the cluster
     • Applications are managed using the module system
     • Applications are stored in /share/apps
     • Module files are stored in /share/apps/Modules
     • Module files set shell environment variables such as PATH
     • PATH controls where applications are searched (the search path)
     • Try it now: echo $PATH
     • To see all available applications: module avail
     • To see currently loaded applications: module list
     • To load an application: module load application[/version]
     • To unload an application: module unload application[/version]
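The effect of a module file on PATH can be imitated by hand. This is a rough sketch, not how the module command is implemented: the application gsoe9400-demo-app and its directory are hypothetical stand-ins for something installed under /share/apps.

```shell
#!/bin/bash
# Hypothetical application directory with one tiny executable in it.
appdir=$(mktemp -d)/demo-app/1.0/bin
mkdir -p "$appdir"
printf '#!/bin/sh\necho "demo-app 1.0"\n' > "$appdir/gsoe9400-demo-app"
chmod a+x "$appdir/gsoe9400-demo-app"

# Before: the shell cannot find the command on the search path.
before=$(command -v gsoe9400-demo-app || echo "not found")

# Roughly what loading a module does: add the directory to PATH.
PATH="$appdir:$PATH"
export PATH

# After: the command is found and can be run by name alone.
after=$(command -v gsoe9400-demo-app)
output=$(gsoe9400-demo-app)

echo "before: $before"
echo "after:  $after"
echo "output: $output"
```

Unloading a module reverses the change, which is why module load and module unload are preferable to editing PATH directly.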

  39. Submitting jobs to the cluster
     • So far, everything has been run on the head node: a very bad idea!
     • To submit a job to the cluster compute nodes:
         • Create a shell script file as per normal
         • Add #PBS directives as required directly after “#!/bin/bash”
         • Add “cd $PBS_O_WORKDIR”
         • Execute qsub ./scriptfile
         • Wait for the job to run, checking its status as required
     • Common #PBS directives (“man qsub” for full details):
         #PBS -N scriptname              Set a name for the script
         #PBS -M email                   Send notifications to an email address
         #PBS -m abe                     What notifications to send
         #PBS -l walltime=hh:mm:ss       How much time is required
         #PBS -l vmem=sizegb             How much memory is required (GB)
         #PBS -l nodes=1:ppn=n           Request n processors on one node
         #PBS -q queuename               Which queue to submit to
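Put together, the steps above give a job script along these lines. This is a minimal sketch: the job name demo-job, the resource values and the placeholder calculation are all made up, and the fallback to “.” is an addition so that the script can also be tried by hand outside PBS.

```shell
#!/bin/bash
#PBS -N demo-job
#PBS -l walltime=00:05:00
#PBS -l vmem=1gb
#PBS -l nodes=1:ppn=1

# PBS sets PBS_O_WORKDIR to the directory from which qsub was run.
# Fall back to the current directory when the variable is not set.
cd "${PBS_O_WORKDIR:-.}"

# Placeholder workload (hypothetical): replace with the real computation.
result=$(( 6 * 7 ))
echo "result = $result"
```

To the shell the #PBS lines are ordinary comments; they are read by qsub when the script is submitted.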

  40. Checking your job status
     • Submit your jobs using “qsub”
         • You will be given a job number in the form jobnumber.systemname
     • Check job status: qstat [jobnumber]
     • Another way: showq
     • Yet another way: pestat or pestat | less -S
         • Use ← and → keys to scroll left and right (or expand your terminal!)
     • Show which nodes are reserved: showres -n | less -S
     • Get overall information about the cluster: visit http://systemname/ganglia/
         • e.g., http://newton.mech.unsw.edu.au/ganglia/
         • Currently only available within UNSW
     • Try it now: view the Ganglia page for the Newton cluster.
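In a script, the job identifier printed by qsub can be captured and split into its two parts. This is a sketch under assumptions: the identifier below is a made-up sample rather than real qsub output, and the exact form of the system name may differ between clusters.

```shell
#!/bin/bash
# Sample value only. On a cluster this would be:  jobid=$(qsub ./job1)
jobid="12345.newton.mech.unsw.edu.au"

jobnumber=${jobid%%.*}     # drop everything from the first "." onwards
systemname=${jobid#*.}     # drop everything up to the first "."

echo "job number:  $jobnumber"
echo "system name: $systemname"
# The number can then be passed on, for example:  qstat "$jobnumber"
```

The ${var%%pattern} and ${var#pattern} forms are examples of the parameter substitution mentioned among the features of Bash.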

  41. Managing your jobs
     • To see which nodes exist on the cluster: rocks list host or pestat
     • To see jobs belonging to you: qstat | grep $USER
     • To see when your job will start: showstart jobnumber
     • For more detailed information: checkjob jobnumber
     • To delete a queued job (whether running or not): qdel jobnumber...
     • To place a job on hold: qhold jobnumber...
     • To release a job currently on hold: qrls jobnumber...
     • To rerun a job (kill it and then restart it): qrerun jobnumber...
     • To move a job from one queue to another: qmove destqueue jobnumber...

  42. Submitting and checking a job
     • Try it now:
         • Create and change to the directory ~/gsoe9400/job1:
             mkdir ~/gsoe9400/job1; cd ~/gsoe9400/job1
         • Copy the previously created script file script2:
             cp ../script2 job1
         • Edit the file job1 and add the following lines just after “#!/bin/bash”:
             #PBS -N job1
             #PBS -M J.Zaitseff@unsw.edu.au   # Do not replace: used to
             #PBS -m abe                      # assess you for this class!
             #PBS -l walltime=00:10:00
             #PBS -l vmem=2gb
             #PBS -l nodes=1:ppn=1
             cd $PBS_O_WORKDIR
         • Submit the script: qsub ./job1

  43. Conclusion
     You have begun your journey to using High Performance Computing clusters effectively. Well done!
     John Zaitseff, J.Zaitseff@unsw.edu.au
     Available for consultations on Tuesdays 9:30am–4pm by appointment only.
     http://www.engineering.unsw.edu.au/hpc
     Image credit: John Zaitseff, UNSW
