High Performance Computing

MPI and C-Language Seminars 2010

Seminar Plan
• Week 1 – Introduction, Data Types, Control Flow, Pointers
• Week 2 – Arrays, Structures, Enums, I/O, Memory
• Week 3 – Compiler Options and Debugging
• Week 4 – MPI in C and Using the HPSG Cluster
• Week 5 – “How to Build a Performance Model”
• Week 6-9 – Coursework Troubleshooting (seminar tutors available in their office)

MPI in C

Introduction to MPI
• MPI (Message Passing Interface) is a standardised library interface, usable from C, that allows processors to communicate with each other.
• No need for a shared memory space: all data passes via messages.
• Every processor can send to every other processor, but data must be explicitly received.
• Processors can be kept synchronised by barriers.

MPI Hello World (1/2)
• The most basic of MPI programs:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);               /* starts MPI */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* get current process id */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* get processor count */
        printf("Hello world from process %d of %d\n", rank, size);
        MPI_Finalize();                       /* shuts MPI down */
        return 0;
    }

MPI Hello World (2/2)
• The MPI environment is established via the MPI_Init call.
• MPI_COMM_WORLD is the default communicator.
  • A communicator is defined as a group of processors.
• MPI_Comm_size returns the number of processors in a communicator; for MPI_COMM_WORLD this is all the processors.
• MPI_Comm_rank returns the position (rank) of the calling processor within the communicator provided, numbered from 0 to size - 1.

Compiling MPI
• MPI implementations provide compiler wrappers for several languages; we only need the C one.
  • mpicc – C wrapper (for us, around GCC).
  • mpiCC / mpicxx / mpic++ – C++.
  • mpif90 / mpif77 – Fortran.
• Compiling is done in the same way as for ordinary C:

    mpicc -o myprogram helloworld.c

Running MPI
• Once compiled, an MPI program must be run with mpirun:

    mpirun -np 2 myprogram

• Here 2 is the number of processors to run on.
• As there is no synchronisation in the program, the order of the print statements is non-deterministic.
• Note: killing MPI jobs without letting them call MPI_Finalize may leave stray processes behind.

Environment Variables
• MPI and GCC are installed remotely, and their paths need to be added to your environment variables.
• The Module package allows you to quickly load and unload working environments.
• Module is installed on the cluster (Deep Thought).
  • ‘module avail’ – Lists all available modules.
  • ‘module load gnu/openmpi’ – Loads gcc-4.3 and openmpi.
  • ‘module list’ – Shows currently loaded modules.
  • ‘module unload gnu/openmpi’ – Unloads the module.

Message Passing in MPI

MPI_Send
• MPI_Send – Basic method of passing data.
• Each MPI_Send must have a matching MPI_Recv.
• MPI_Send(message, length, data type, destination, tag, communicator);
  • Message – Pointer to the data to send.
  • Length – Number of elements in the message.
  • Data Type – The MPI data type of each element in the message.
  • Destination – Rank of the processor to receive the message.
  • Tag – Integer identifier used to tell multiple messages apart.
  • Communicator – Processor group (MPI_COMM_WORLD).

MPI_Recv
• Required to receive the data sent by each MPI_Send.
• MPI_Recv(message, length, data type, source, tag, communicator, status);
  • Message – Pointer to the memory in which to store the data.
  • Length – Maximum number of elements the receive buffer can hold.
  • Data Type – The MPI data type of each element in the message.
  • Source – Rank of the processor sending the message.
  • Tag – Integer identifier used to tell multiple messages apart; it must match the tag of the send.
  • Communicator – Processor group (MPI_COMM_WORLD).
  • Status – A structure that holds information about the received message (such as its source and tag).

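Message Passing Example
• Sending data from process 0 to process 1 (run with at least 2 processes):

    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int size, rank, tag = 0;
        int myarray[3];
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* process 0 fills the array and sends it to process 1 */
            myarray[0] = 1;
            myarray[1] = 2;
            myarray[2] = 3;
            MPI_Send(myarray, 3, MPI_INT, 1, tag, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* only process 1 receives; any other rank would wait forever */
            MPI_Recv(myarray, 3, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
        }

        MPI_Finalize();
        return 0;
    }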

Process Synchronisation
• Sometimes we need to ensure that all processes are at the same point of execution.
• Synchronisation can be explicit (barriers) or implicit (blocking communications).
• MPI_Barrier(MPI_COMM_WORLD);
  • Waits for all processors in the communicator to reach the barrier before any continue.
• MPI_Send / MPI_Recv
  • MPI_Recv waits until the message has arrived. MPI_Send waits until the send buffer is safe to reuse, which may not be until the matching receive has been posted.

Non-Blocking Communication
• MPI_Isend / MPI_Irecv instead of MPI_Send / MPI_Recv.
• ‘I’ stands for immediate: the call returns immediately, regardless of the status of the actual operation.
• MPI_Isend – Allows you to continue processing while the send happens; do not modify the send buffer until the send has completed.
• MPI_Irecv – You must check that the data has arrived (for example with MPI_Wait or MPI_Test) before using it.

Accessing the Cluster

Deepthought – IBM Cluster
• 42 nodes.
• 2 cores per node (Pentium III – 1.4 GHz).
• 2 GB RAM per node.
• Myrinet fibre-optic interconnect.
• Logging in:

    ssh hpc06XXXXX@deepthought.dcs.warwick.ac.uk

• Copying files across:

    scp ./karman.tar.gz hpc06XXXXX@deepthought.dcs.warwick.ac.uk:/path/

• Headnode (Frankie) – Not to be used for running jobs.
  • All MPI jobs on Frankie will be killed.

PBS (1/3)
• We use Torque (OpenPBS) and MAUI (scheduler).
• Listing jobs in the queue:

    fjp@frankie:~$ qstat -a

    frankie:
                                                            Req'd  Req'd   Elap
    Job ID         Username Queue    Jobname SessID NDS TSK Memory Time  S Time
    -------------- -------- -------- ------- ------ --- --- ------ ----- - -----
    27613.frankie  sdh      hpsg     octave   11363   1  --     -- 3000: R 68:44
    27614.frankie  sdh      hpsg     octave   11434   1  --     -- 3000: R 68:41

• Status flags:
  • Q – Queued.
  • R – Running.
  • E – Ending (staging out of files). NOT Error!
  • C – Complete.

PBS (2/3)
• Submitting a job:
  • From file: qsub -V -N <name> -l nodes=x:ppn=y submit.pbs
  • An interactive job: qsub -V -N <name> -l nodes=x:ppn=y -I
• Submit files:

    #!/bin/bash
    #PBS -V
    cd $PBS_O_WORKDIR
    mpirun ./myprog

• Deleting a job: qdel <jobid>

PBS (3/3)
• Node information:

    fjp@frankie:~$ pbsnodes -a
    vogon0.deepthought.hpsg.dcs.warwick.ac.uk
         state = job-exclusive
         np = 2
         properties = vogon
         ntype = cluster
         jobs = 0/27613.frankie, 1/27614.frankie
         status = .........

• Standard output and error:
  • For interactive jobs, these behave as normal.
  • For batch jobs:
    • Output file – <jobname/ submit file name>.o<jobid>
    • Error file – <jobname/ submit file name>.e<jobid>
• File I/O takes place as usual.
  • Concurrent writes to the same file can be problematic – avoid them.

Queues
• Different queues give access to different resources with different priorities.
• Debug queue – High priority, low core count (~4). Must be requested explicitly: qsub -q debug ....
• Interactive queue – High priority, medium core count (~8). No need to specify a queue.
• Batch queue – Normal priority, high core count (~64).

Warning
• The cluster is a shared resource – don’t leave your work until the last minute.
• The queue can get very busy.
• Don’t leave interactive jobs running when not in use.
• Once again – do not run jobs on Frankie!
