Bioinformatics Programs at PSC: Accessing High-Performance Computing Resources
This document outlines the computing resources available for biomedical research at the Pittsburgh Supercomputing Center (PSC). It describes the two machines reserved for biomedical use, the Opteron cluster and the SMP machine JONAS, as well as PSC's general-use systems. It also explains how to obtain access through a grant process, gives guidelines for choosing secure passwords, introduces the SLURM commands used to submit and manage jobs, and covers the basic UNIX commands and text editors needed for sequence analysis.
Introduction to PSC Computing Systems
Alex Ropelewski (ropelews@psc.edu)
MARC: Developing Bioinformatics Programs, July 17-28, 2006

Computers Available for Biomedical Use
• PSC operates two platforms exclusively for biomedical use:
  • A 20-compute-node Opteron cluster
    • Each node has dual 1.4 GHz AMD Opteron CPUs
    • 4 Gbytes of memory per node
    • Two front ends: CODON and BIOINFORMATICS
  • A 64-processor SMP machine called JONAS
    • 64 1.15 GHz EV7 processors
    • 256 Gbytes of shared memory
National Resource for Biomedical Supercomputing - An NIH Supported Research Center

Computers Available for General Use (Including Biomedical)
• bigben, a Cray XT3 MPP machine with 2068 compute processors
• lemieux, an HP AlphaServer cluster comprising 750 4-processor compute nodes
• rachel, an SMP machine with 64 1.15 GHz EV7 processors and 256 Gbytes of shared memory
• ben, an HP AlphaServer cluster comprising 64 4-processor, 4-Gbyte compute nodes
• Front-end machines running Linux and VMS
• A file archiver

Access to PSC High Performance Computing Systems
• Access for academic research and coursework use is through a grant process.
• To apply for a grant, visit:
  • http://www.psc.edu/nrbsc/resources/
• One grant per project
• Additional users can be added to a grant

Consulting
• All active PSC users have access to PSC consulting resources:
  • Phone: 800-221-1641
    • Phones are staffed Monday through Friday, 9 a.m. to 8 p.m., and Saturday, 9 a.m. to 4 p.m. (EST).
    • For best service, call for critical problems.
  • Email: remarks@psc.edu
• Documentation is also available at www.psc.edu

General Policies
• The PSC has policies on computing-related topics such as:
  • Passwords
  • File retention after grant expiration
  • Email addresses
• To review these policies, see:
  • http://www.psc.edu/general/policies/policyoverview.html

Passwords
• Computer security depends heavily on keeping passwords secret.
• Most machines use a common Kerberos password, which:
  • Must be at least 6 characters long
  • Should not exceed 8 characters, because longer passwords can prevent you from logging in to certain machines

Selecting Secure Passwords
• Do NOT:
  • simply add numbers to words that can be found in a dictionary, such as "helper01", "amoeba1", or "1license"
  • simply substitute "1" for "l", "0" for "o", or "1" for "i" in common words to get passwords like "he1per", "am0eba", or "11cense"
• Creating good passwords:
  • use the first letter of each word in an uncommon sentence or phrase that you can easily remember:
    • I married Sandie on July 2nd in Greentree (ImSoJ2iG)
    • My 4th grade teacher was Sister Cyrilla (M4gtwSC)

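The password rules on the last two slides can be sketched in Python. This is a minimal illustration and not PSC's actual password checker. The helper names (`phrase_to_password`, `disguised_words`, `problems`) are hypothetical, and the word list is a tiny stand-in for a real dictionary. The weak-pattern test covers only the two tricks named above: padding a word with digits, and swapping "1"/"0" for look-alike letters. The 6 to 8 character window is read directly from the Kerberos rules.

```python
import itertools
import re
import string

# Length rule from the slides: Kerberos passwords need at least 6 characters,
# and more than 8 can block logins on some machines.
MIN_LEN, MAX_LEN = 6, 8

def phrase_to_password(phrase: str) -> str:
    """First character of each word: '2nd' contributes '2', 'July' gives 'J'."""
    return "".join(word[0] for word in re.findall(r"[A-Za-z0-9]+", phrase))

def disguised_words(password: str):
    """Yield the plain words a weak password might be hiding."""
    lowered = password.lower()
    # Undo digits padded onto a word: "helper01" -> "helper".
    yield lowered.strip(string.digits)
    # Undo look-alike swaps: "1" may stand for "l" or "i", "0" for "o".
    choices = [("l", "i") if c == "1" else ("o",) if c == "0" else (c,)
               for c in lowered]
    for combo in itertools.product(*choices):
        yield "".join(combo)

def problems(password: str, dictionary: set) -> list:
    """List the rule violations found in a candidate password."""
    found = []
    if len(password) < MIN_LEN:
        found.append("shorter than 6 characters")
    if len(password) > MAX_LEN:
        found.append("longer than 8 characters")
    if any(word in dictionary for word in disguised_words(password)):
        found.append("disguised dictionary word")
    return found

# Hypothetical stand-in word list; a real check would load a full dictionary.
WORDS = {"helper", "amoeba", "license"}

print(phrase_to_password("I married Sandie on July 2nd in Greentree"))  # ImSoJ2iG
print(problems("he1per", WORDS))    # ['disguised dictionary word']
print(problems("ImSoJ2iG", WORDS))  # []
```

Trying every "1"/"0" substitution is exponential in the number of such digits. That is harmless here because passwords are capped at 8 characters.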
Connecting and Transferring Files
• Connect to the PSC machines using ssh:
  • http://www.psc.edu/general/net/ssh/ssh.html
• Transfer files between PSC and your home institution using kftp, scp, or sftp

Opteron Cluster
• Contains bioinformatics software and databases
• To log in to the cluster, ssh to:
  • bioinformatics.psc.edu
  • codon.psc.edu
• The cluster uses a UNIX operating system
• SLURM is used to run serial and parallel programs on the cluster's nodes

SLURM Scripts
• A SLURM script is a file containing a series of instructions for the computer
• SLURM scripts are submitted by the user and run when the system has the resources available to run them
• SLURM scripts can run parallel or serial programs
• For sequence analysis codes, running the program makseq creates a SLURM script for you

SLURM Commands
• srun - submit a script file to the SLURM scheduling queue
• squeue - show the status of the SLURM scheduling queue
• scancel - remove a running job

SLURM - srun
    % srun -b -o test.log test.d
    srun: jobid 3197 submitted
    % squeue
    JOBID PARTITION     NAME     USER ST       TIME NODES NODELIST(REASON)
     2773       all pgy347_t   jshen3  R 3-02:13:43     1 operon20
     3194       all   test.a ropelews  R       2:21     1 operon11
     3195       all   test.b ropelews  R       2:21     1 operon13
     3196       all   test.c ropelews  R       2:21     1 operon14
     3197       all   test.d ropelews  R       2:10     1 operon16

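To script against the queue, you can parse the squeue listing above by splitting each row on whitespace. This Python sketch uses hypothetical helper names (`parse_squeue`, `elapsed_seconds`). It assumes that no field contains spaces, which holds for this listing but is not guaranteed for every squeue output. The TIME column appears here in two forms, days-hours:minutes:seconds and plain minutes:seconds, and the sketch handles both.

```python
def elapsed_seconds(field: str) -> int:
    """Convert an squeue TIME value such as '3-02:13:43' or '2:21' to seconds."""
    days, _, clock = field.rpartition("-")       # days is '' when absent
    parts = [int(p) for p in clock.split(":")]
    while len(parts) < 3:                        # pad missing hours (and minutes)
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((int(days or 0) * 24 + hours) * 60 + minutes) * 60 + seconds

def parse_squeue(text: str) -> list:
    """Turn an squeue table into one dict per job, keyed by the header names."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]

# Listing copied from the slide.
SNAPSHOT = """\
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2773 all pgy347_t jshen3 R 3-02:13:43 1 operon20
3194 all test.a ropelews R 2:21 1 operon11
3195 all test.b ropelews R 2:21 1 operon13
3196 all test.c ropelews R 2:21 1 operon14
3197 all test.d ropelews R 2:10 1 operon16
"""

jobs = parse_squeue(SNAPSHOT)
mine = [job["JOBID"] for job in jobs if job["USER"] == "ropelews"]
print(mine)  # ['3194', '3195', '3196', '3197']
longest = max(jobs, key=lambda job: elapsed_seconds(job["TIME"]))
print(longest["JOBID"], elapsed_seconds(longest["TIME"]))  # 2773 267223
```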
SLURM - scancel
    % squeue
    JOBID PARTITION     NAME     USER ST       TIME NODES NODELIST(REASON)
     2773       all pgy347_t   jshen3  R 3-02:13:43     1 operon20
     3194       all   test.a ropelews  R       2:21     1 operon11
     3195       all   test.b ropelews  R       2:21     1 operon13
     3196       all   test.c ropelews  R       2:21     1 operon14
     3197       all   test.d ropelews  R       2:10     1 operon16
    % scancel 3195
    % squeue
    JOBID PARTITION     NAME     USER ST       TIME NODES NODELIST(REASON)
     2773       all pgy347_t   jshen3  R 3-02:14:35     1 operon20
     3194       all   test.a ropelews  R       3:13     1 operon11
     3196       all   test.c ropelews  R       3:13     1 operon14
     3197       all   test.d ropelews  R       3:02     1 operon16

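The scancel session works by comparing two squeue listings: the cancelled job is the one whose ID disappears. The Python sketch below does that comparison on the two snapshots from the slide. The helper name `job_ids` is hypothetical. Note that a set difference reports every job that vanished between snapshots, so a job that simply finished in the meantime would show up too.

```python
def job_ids(squeue_text: str) -> set:
    """Collect the JOBID column (first field) of an squeue listing."""
    rows = squeue_text.strip().splitlines()[1:]      # skip the header row
    return {row.split()[0] for row in rows if row.strip()}

BEFORE = """\
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2773 all pgy347_t jshen3 R 3-02:13:43 1 operon20
3194 all test.a ropelews R 2:21 1 operon11
3195 all test.b ropelews R 2:21 1 operon13
3196 all test.c ropelews R 2:21 1 operon14
3197 all test.d ropelews R 2:10 1 operon16
"""

AFTER = """\
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2773 all pgy347_t jshen3 R 3-02:14:35 1 operon20
3194 all test.a ropelews R 3:13 1 operon11
3196 all test.c ropelews R 3:13 1 operon14
3197 all test.d ropelews R 3:02 1 operon16
"""

gone = sorted(job_ids(BEFORE) - job_ids(AFTER))
print(gone)  # ['3195']
```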
UNIX
• To use UNIX for sequence analysis, one needs to become familiar with three basic areas:
  • General information on UNIX
  • UNIX commands and syntax
  • A text editor (such as vi, emacs, or pico)
• This talk presents the minimum that one needs to know in those areas

General Information
• Commands are entered through a command interpreter called a "shell":
  • sh, csh, ksh, tcsh
• Shells can have different commands and different command syntax
• Core UNIX commands work the same regardless of shell
• Commands are case sensitive
• General command syntax is: command -options parameters
• Commands can be placed in special startup files, such as .login, .cshrc, and .profile, which the shell executes automatically when you log in or start a shell

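The "command -options parameters" pattern can be illustrated with Python's `shlex` module, which tokenizes a line using POSIX shell quoting rules. This is a simplified sketch with a hypothetical helper name. It treats every word starting with "-" as an option, which ignores options that take an argument (in `srun -o test.log`, `test.log` belongs to `-o`). It also does not expand wildcards or variables the way a real shell does.

```python
import shlex

def split_command(line: str):
    """Split 'command -options parameters' into its three parts (simplified)."""
    words = shlex.split(line)                 # tokenize with POSIX shell quoting rules
    command, rest = words[0], words[1:]
    options = [w for w in rest if w.startswith("-")]
    parameters = [w for w in rest if not w.startswith("-")]
    return command, options, parameters

print(split_command("rm -i x.dat xcopy.dat"))   # ('rm', ['-i'], ['x.dat', 'xcopy.dat'])
print(split_command('ls -l -a "my notes"'))      # ('ls', ['-l', '-a'], ['my notes'])
```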
UNIX File and Directory Structure
• Hierarchical (absolute)
• No special filename format
• Filenames are case sensitive
• A single dot . refers to the current directory
• Double dots .. refer to the parent directory
• $HOME refers to your login directory

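The meaning of `.`, `..`, `~`, and `$HOME` can be made concrete with a small path resolver. This Python sketch uses a hypothetical helper `resolve`. It takes the home and current directories from the `pwd` output on the directory-navigation slide as example values. It collapses `.` and `..` textually with `posixpath.normpath`, which can differ from what the system does when a symbolic link is involved. It does not handle `~user`.

```python
import posixpath

def resolve(path: str, cwd: str, home: str) -> str:
    """Resolve ~, $HOME, . and .. into an absolute path (textual sketch)."""
    if path == "~" or path.startswith("~/"):
        path = home + path[1:]
    elif path == "$HOME" or path.startswith("$HOME/"):
        path = home + path[len("$HOME"):]
    if not path.startswith("/"):              # relative path: start from cwd
        path = posixpath.join(cwd, path)
    return posixpath.normpath(path)           # collapse . and .. components

home = "/usr/ue/2/ropelews"                   # example values from the slides
cwd = home + "/sub1"
print(resolve("..", cwd, home))               # /usr/ue/2/ropelews
print(resolve("./x.dat", cwd, home))          # /usr/ue/2/ropelews/sub1/x.dat
print(resolve("$HOME/sub2", cwd, home))       # /usr/ue/2/ropelews/sub2
print(resolve("~/sub2/../sub1", cwd, home))   # /usr/ue/2/ropelews/sub1
```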
Special Characters
• Wildcard characters: * ? [letters]
• Home directory: ~ (yours) or ~user (another user's)
• I/O redirection: < stdin; > stdout; >& stdout+stderr
• Append output to a file: >>
• Place a job in the background: &
• Pipe (send the output of one command as input to another command): |
• Stop (suspend) a job: [control] z
• Interrupt the running command: [control] c

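Python's `fnmatch.fnmatchcase` uses the same `*`, `?`, and `[letters]` notation, so it is a convenient way to try wildcard patterns. The file list below is hypothetical and reuses names from the ls, cp, and rm slides. `fnmatchcase` is not identical to shell globbing: its `*` also matches names beginning with a dot (hidden files), which a shell `*` skips.

```python
from fnmatch import fnmatchcase

# Hypothetical directory contents, borrowed from the ls/cp/rm slides.
files = ["a.doc", "a.cpr", "a.out", "FILE", "x.dat", "xcopy.dat", "z.file"]

def expand(pattern: str) -> list:
    """Return the names a wildcard pattern matches (case-sensitive)."""
    return [name for name in files if fnmatchcase(name, pattern)]

print(expand("*.dat"))     # ['x.dat', 'xcopy.dat']   * matches any run of characters
print(expand("a.???"))     # ['a.doc', 'a.cpr', 'a.out']   ? matches exactly one
print(expand("a.[co]*"))   # ['a.cpr', 'a.out']   [co] matches one of c or o
print(expand("file"))      # []   filenames are case sensitive
```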
Basic UNIX Commands
• kpasswd (passwd) - Change your password
• ls - List files in a directory
• more - Display the contents of a file
• cp - Duplicate files
• rm, rmdir - Remove a file or directory
• mkdir - Create a directory
• cd - Change directory
• pwd - Show the current directory
• man - Find UNIX command usage information

Basic UNIX commands - kpasswd
• kpasswd (passwd) - Change your Kerberos password
    % kpasswd
    ropelews@PSC.EDU's Password:
    New password:
    Verifying password - New password:
    %

Basic UNIX commands - ls
• ls - List files in a directory
  • -l Long format
  • -a Show hidden files
  • -F Tag directories with "/", executables with "*", and symbolic links with "@"
    % ls
    a.doc  a.cpr  a.out  FILE

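Here is a rough Python emulation of what `ls -F` and `ls -a` add to a listing. The helper names are hypothetical, and only the three tags from the slide are modeled. Real `ls -F` also marks other file types, such as "|" for named pipes and "=" for sockets. Real `ls -a` also lists `.` and `..`, which `os.listdir` omits. `sorted()` orders names by character code, which resembles `ls` in the C locale. The demo builds its files in a throwaway temporary directory.

```python
import os
import stat
import tempfile

def ls_F_suffix(path: str) -> str:
    """Tag a path like `ls -F`: '@' symlink, '/' directory, '*' executable."""
    mode = os.lstat(path).st_mode                  # lstat: do not follow symlinks
    if stat.S_ISLNK(mode):
        return "@"
    if stat.S_ISDIR(mode):
        return "/"
    if stat.S_ISREG(mode) and mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        return "*"
    return ""

def ls_F(directory: str, show_hidden: bool = False) -> list:
    """List a directory like `ls -F` (or `ls -aF` when show_hidden is True)."""
    names = sorted(os.listdir(directory))
    if not show_hidden:
        names = [n for n in names if not n.startswith(".")]
    return [n + ls_F_suffix(os.path.join(directory, n)) for n in names]

def demo(show_hidden: bool = False) -> list:
    """Build a scratch directory with one file of each kind and list it."""
    with tempfile.TemporaryDirectory() as d:
        for name in ["a.doc", "a.out", ".profile"]:
            open(os.path.join(d, name), "w").close()
        os.chmod(os.path.join(d, "a.out"), 0o755)  # make a.out executable
        os.mkdir(os.path.join(d, "sub1"))
        os.symlink("a.doc", os.path.join(d, "link"))
        return ls_F(d, show_hidden)

print(demo())      # ['a.doc', 'a.out*', 'link@', 'sub1/']
print(demo(True))  # ['.profile', 'a.doc', 'a.out*', 'link@', 'sub1/']
```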
Basic UNIX commands - more
• more - View the contents of a file one page at a time
    % more file.f
    program intro
    integer I, J, K
    real rr,vv,cc
    parameter (I = 5)
    :
    :

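The idea behind `more` is simply to show a file one screenful at a time. This Python sketch splits a list of lines into pages using a hypothetical `pages` helper. The default page size of 23 is an assumption: a 24-line vt100 screen with one line reserved for the prompt. The demo uses a small page size so the Fortran lines from the slide span two pages. The "--More--" line only imitates the prompt and does not reproduce its exact format.

```python
def pages(lines: list, page_size: int = 23):
    """Yield successive screen-sized chunks of lines, as a pager displays them."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    for start in range(0, len(lines), page_size):
        yield lines[start:start + page_size]

listing = ["program intro", "integer I, J, K", "real rr,vv,cc", "parameter (I = 5)"]
for number, page in enumerate(pages(listing, page_size=3), start=1):
    print("\n".join(page))
    print(f"--More-- (page {number})")   # a real pager waits for a key here
```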
Basic UNIX commands - cp
• cp - Duplicate files
    % ls
    a.dat  x.dat
    % cp x.dat xcopy.dat
    % ls
    a.dat  x.dat  xcopy.dat

Basic UNIX commands - rm
• rm, rmdir - Remove a file or a directory
  • -i Inquire before removing
  • -r Recursive remove
    % ls
    x.dat  xcopy.dat  z.file
    % rm *.dat
    % ls
    z.file

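In `rm *.dat`, the shell expands `*.dat` into a list of matching names before rm runs, so rm only ever receives explicit filenames. This Python sketch makes the two steps visible. `glob.glob` does the expansion and `os.remove` deletes each match. The helper names are hypothetical, and the demo works in a throwaway temporary directory so nothing real is deleted.

```python
import glob
import os
import tempfile

def rm_pattern(directory: str, pattern: str) -> list:
    """Mimic `rm pattern`: expand the wildcard, then remove each match."""
    # Step 1 (the shell's job): turn the pattern into a list of names.
    matches = sorted(glob.glob(os.path.join(glob.escape(directory), pattern)))
    # Step 2 (rm's job): delete each named file.
    for path in matches:
        os.remove(path)
    return [os.path.basename(p) for p in matches]

def demo():
    """Recreate the slide's files, run `rm *.dat`, and report what is left."""
    with tempfile.TemporaryDirectory() as d:
        for name in ["x.dat", "xcopy.dat", "z.file"]:
            open(os.path.join(d, name), "w").close()
        removed = rm_pattern(d, "*.dat")
        return removed, sorted(os.listdir(d))

print(demo())  # (['x.dat', 'xcopy.dat'], ['z.file'])
```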
Basic UNIX commands - directory
• Directory navigation commands:
  • mkdir - Create a directory
  • cd - Change directory
  • pwd - Show the current directory
    % mkdir sub1
    % mkdir $HOME/sub2
    % cd sub1
    % pwd
    /usr/ue/2/ropelews/sub1
    % cd $HOME/sub2
    % pwd
    /usr/ue/2/ropelews/sub2

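The same session can be replayed in Python with `os.mkdir`, `os.chdir`, and `os.getcwd`, using a temporary directory as a stand-in for $HOME. The sketch also shows why `cd` has to be built into the shell. Every process has its own current directory, and `os.chdir` here changes only the Python process's directory, not that of the terminal it was started from. The helper name `demo` is hypothetical.

```python
import os
import tempfile

def demo() -> list:
    """Replay the slide's mkdir/cd/pwd session in a throwaway 'home' directory."""
    start = os.getcwd()
    seen = []
    with tempfile.TemporaryDirectory() as tmp:
        home = os.path.realpath(tmp)               # resolve symlinks such as /tmp on macOS
        try:
            os.chdir(home)
            os.mkdir("sub1")                       # % mkdir sub1   (relative to cwd)
            os.mkdir(os.path.join(home, "sub2"))   # % mkdir $HOME/sub2
            os.chdir("sub1")                       # % cd sub1
            seen.append(os.path.relpath(os.getcwd(), home))   # % pwd
            os.chdir(os.path.join(home, "sub2"))   # % cd $HOME/sub2
            seen.append(os.path.relpath(os.getcwd(), home))   # % pwd
        finally:
            os.chdir(start)                        # step out before the directory is deleted
    return seen

print(demo())  # ['sub1', 'sub2']
```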
Basic UNIX commands - man
• man - Find UNIX command information
  • man -k <keyword> - Find available topics
  • man <command> - Show command information
    % man -k directory
    mkdir (1) - make directories
    rm (1) - remove files or directories
    rmdir (1) - remove empty directories
    % man rmdir
    RMDIR(1)          User Commands          RMDIR(1)
    NAME
        rmdir - remove empty directories
    SYNOPSIS
        rmdir [OPTION]... DIRECTORY...
    DESCRIPTION
        Remove the DIRECTORY(ies), if they are empty.
    :

UNIX Text Editors
• emacs - GNU UNIX editor
• vi - Traditional UNIX editor
• pico - A simple editor
• To use full-screen capabilities, the terminal type usually needs to be set to "vt100":
    setenv TERM vt100; tset vt100

Which Editor Should You Use?
• Use the editor that you are most familiar with!
• emacs:
  • Powerful, works on UNIX and some non-UNIX systems
  • Moderately easy to master
• vi:
  • Powerful, available on every UNIX system
  • Not intuitive, fairly difficult to master
• pico:
  • Simple, intuitive, easy to learn

emacs
• To edit a file named <filename>, enter:
  • emacs <filename>
• To navigate:
  • <arrow keys> - Move cursor 1 space
  • <delete> - Delete character
• To quit with or without saving:
  • <control> X <control> C
  • Then answer Y or N
• For more information, see:
  • http://www.gnu.org/software/emacs/

vi
• To edit a file named <filename>, enter:
  • vi <filename>
• vi has two modes: "navigation" mode (the default) and "insertion" mode
• To insert text, you must be in insertion mode. Several keys (i, a, o) place you in insertion mode.
• To leave insertion mode, press the [esc] key.

vi (continued)
• Commonly used vi keys:
  • Moving and inserting:
    • [arrows] - Move cursor
    • h - Move cursor left
    • l - Move cursor right
    • k - Move cursor up
    • j - Move cursor down
    • i - Insert at cursor
    • a - Insert after cursor
    • o - Insert below line
  • Editing and quitting:
    • dd - Delete line
    • dl - Delete letter
    • dw - Delete word
    • [esc] - Stop insertion
    • :wq - Write, then quit
    • :q! - Quit without saving

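The key point about vi's two modes is that the same key means different things depending on the mode: in navigation mode "h" moves the cursor, while in insertion mode it types the letter h. This Python sketch is a toy single-line model to make that concrete. It is not how vi is implemented. The function name `run_vi` is hypothetical, and only i, a, h, l, and [esc] are modeled. Any other key pressed in navigation mode is ignored.

```python
ESC = "\x1b"   # the [esc] key

def run_vi(keys: str, text: str = "") -> str:
    """Replay keystrokes on a one-line, vi-like modal editor and return the text."""
    mode, cursor = "navigation", 0
    for key in keys:
        if mode == "insertion":
            if key == ESC:                      # [esc]: back to navigation mode
                mode = "navigation"
                cursor = max(cursor - 1, 0)     # vi steps back onto the last character typed
            else:                               # any other key is literal text
                text = text[:cursor] + key + text[cursor:]
                cursor += 1
        elif key == "i":                        # insert at (before) the cursor
            mode = "insertion"
        elif key == "a":                        # insert after the cursor
            mode = "insertion"
            cursor = min(cursor + 1, len(text))
        elif key == "h":                        # move left
            cursor = max(cursor - 1, 0)
        elif key == "l":                        # move right, staying on the text
            cursor = min(cursor + 1, max(len(text) - 1, 0))
    return text

print(run_vi("ae" + ESC, "hllo"))   # hello
print(run_vi("hih" + ESC))          # h   (the first h moves, the second is typed)
```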
pico
• Based on the editor in the Pine email program
• To edit a file named <filename>, enter:
  • pico <filename>
• To navigate:
  • <arrow keys> - Move cursor 1 space
  • <delete> - Delete character
• To quit with or without saving:
  • <control> X
  • Then answer Y or N