The OSCAR Cluster System

The OSCAR Cluster System Tarik Booker CS 370

Topics • Introduction • OSCAR Basics • Introduction to LAM • LAM Commands • MPI Basics • Timing • Examples

Welcome to OSCAR! • Welcome to the free Linux-based clustered system • Use multiple computers to create one powerful multi-processor system

Account Setup • Fill in the sign-in sheet • Receive account password (paper slip) • Log in using SSH only, to: oscar.calstatela.edu

SSH (Secure Shell) Log In • Use cs370studentxx as your account (where xx = your account number) • Example: account number 30 logs in as cs370student30

Environment (LAM) Setup • LAM (Local Area Multicomputer) is the implementation of MPI (Message Passing Interface) used on this cluster. • To run your parallel programs, you need to have this running.

Environment (LAM) Setup (2) • After logging in to your account, type (assume ‘>’ is prompt): • >ls • You should have two files: hello.c and hosts • We need to run LAM. Do this by typing: • >lamboot hosts • Note: to see more in-depth loading, type: • >lamboot -v hosts • Both methods are perfectly fine.

Environment (LAM) Setup (3) • LAM should have taken a while to load. (We are starting a LAM process daemon on each node) • When it finishes, verify that LAM is running by typing: • >ps -ux • This is merely a list of the processes running under your account. • LAM is now set up and running on your account. (Look for the running lam process in the list.)

LAM Troubleshooting • If anything happens with your LAM process (i.e. LAM no longer shows up on your process list) use the previous steps to start your LAM process again. • If something is wrong with your LAM process (i.e. LAM is loaded, in the process list, but refuses to run, or runs indefinitely), use the “lamhalt” command, simply: • >lamhalt

Compiling a Parallel Program • Included in your account is the ‘hello.c’ program. We’ll use this as a test program for LAM/MPI. • We will be using the MPI C compiler. Use the command: • >mpicc hello.c • This will compile your parallel program. • To specify the output file, type: • >mpicc hello.c -o hello • This compiles hello.c into the executable called ‘hello.’

Running Your Parallel Program • Running a program through MPI is a bit different than other interfaces. You must use the ‘mpirun’ command and specify the number of nodes used. • The typical usage is: • >mpirun N hello • ‘hello’ is the previous executable from the last slide. The ‘N’ (UPPERCASE!) says to use all nodes. • Note that we don’t have to use all nodes. Try typing: • >mpirun n0-5 hello • (this uses only the nodes between 0 and 5) • (Also try >mpirun n4,5,6 hello)

The MPI Program • Let’s look at hello.c • The two most important functions are: • MPI_Init(&argc, &argv); • MPI_Finalize(); • These functions initialize and close the parallel environment (respectively).

LAM Commands • LAM is our specific implementation of MPI • LAM comes with additional non-MPI commands (for node management) • Most not necessary, but useful

lamboot • Usage: lamboot hostfile • Starts the LAM environment • Use the -v flag for a verbose boot • >lamboot -v hosts

lamhalt • Shuts down lam environment • >lamhalt

mpirun • Runs an mpi program • >mpirun N hello

lamclean • If your program terminates “badly,” use lamclean to delete old processes and allocated resources. • >lamclean

wipe • Stronger version of lamhalt that kills LAM on every node • >wipe

laminfo • Detailed information list for LAM environment • >laminfo

lamnodes • List all nodes in the LAM environment • >lamnodes

lamshrink • Remove a node from the LAM environment (without rebooting) • Ex: >lamshrink n3 • (Note: This also invalidates node n3, or leaves an empty slot in its place)

lamgrow • Add a node to the LAM environment (without rebooting) • >lamgrow oscarnode3 • Also: >lamgrow -n 3 oscarnode3 • (Adds oscarnode3 to the previously empty n3 slot. Note the space between n and 3!)

lamexec • Run a non-MPI program in the LAM environment • >lamexec {non-MPI program}

Termination Order of Bad Programs • In the event of a bad termination (program takes up too much memory, doesn’t stop, etc.) use this order of termination: • >lamclean (good) • >lamhalt (better) • >wipe (severe) • >kill -9 [process_number] (nuclear)

Basic MPI Functions • We Covered MPI_Init and MPI_Finalize • MPI_Send • MPI_Recv • MPI_Bcast • MPI_Reduce • MPI_Barrier

Note: • MPI is a Message Passing Interface • It does not necessarily use shared memory • Instead, information is passed between nodes as messages

MPI_Send • Send a variable to another node • MPI_Send(variable, number of variables to send, MPI Data type, node that receives message, MPI Tag, group communicator) • Ex: • MPI_Send(&value, 1, MPI_INT, 2, 0, MPI_COMM_WORLD)

MPI_Recv • Receive a variable from another node • MPI_Recv(variable, number of variables to receive, MPI Data type, node sending message, message tag, group communicator, MPI status indicator) • Ex: • MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status) • (Note: You must create an MPI_Status variable when using this function)

MPI_Bcast • Broadcasts a variable to all nodes • MPI_Bcast(variable, number of variables, Data type of variable, root node that sends the broadcast, communicator whose nodes receive it) • Ex: • MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD) • (Every node, including the root, makes the same call)

MPI_Reduce • Collect data at a node • Converge information with a specific operation • MPI_Reduce(variable to send, variable that receives, number of values to receive, MPI Data type, reduction operation, node receiving data, communicator to use) • Ex: • MPI_Reduce(&nodePi, &pi, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD) • There are many types of reduction operators (not only summation); you can even create your own
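To make the effect of a sum reduction concrete, here is a minimal serial model in plain C. It makes no MPI calls: `model_reduce_sum` and the `contributions` array are hypothetical names used only for illustration, with `contributions[i]` standing in for the value node i would pass as its send variable.

```c
/* Serial model of a sum reduction (what MPI_SUM delivers to the root).
 * model_reduce_sum is a hypothetical helper, not an MPI function.
 * contributions[i] plays the role of node i's send variable. */
double model_reduce_sum(const double *contributions, int size)
{
    double result = 0.0;   /* what the receiving node's variable would hold */
    int node;

    for (node = 0; node < size; ++node)
        result += contributions[node];

    return result;
}
```

In a real run each node supplies only its own value and only the receiving node gets the result; the order in which MPI adds the values is up to the implementation.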

MPI_Barrier • Blocks each node until every node in the communicator has reached the barrier • MPI_Barrier(communicator) • Ex: • MPI_Barrier(MPI_COMM_WORLD)

Timing in MPI • Introduction • Timing functions • What to do

Timing Intro • MPI has timing features • They measure not computational (CPU) time but “wall time” (elapsed real time) • Timer resolution is expressed in clock ticks

Timing Functions • Wall time function • double MPI_Wtime(void) • Clock Tick Function • double MPI_Wtick(void)

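What to do with timing • Select starting point and store time • Select end point and store time • Subtract start from end to get the elapsed wall time in seconds • MPI_Wtick returns the clock resolution (seconds per tick), i.e. the smallest interval the timer can resolve • Use “%.30lf” in printf to display time instance

Code Example start_time = MPI_Wtime(); /* Code that does something */ elapsed_time = MPI_Wtime() - start_time; /* Elapsed wall time in seconds */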

Programming Examples • Ring • Arctan (Using Gregory’s formula) • Pi (Using Euler’s formula) • Music Program • Mandelbrot Program

The Ring • Pass a variable, one node at a time, to each node in the universe (environment)

Code Example • #include <stdio.h> • #include <mpi.h> • int main(int argc, char** argv) • { • int size, node; • int value; • MPI_Status status; • MPI_Init(&argc, &argv); • MPI_Comm_size(MPI_COMM_WORLD, &size); • MPI_Comm_rank(MPI_COMM_WORLD, &node); • if(node == 0) • { • printf("Value:"); • scanf("%d", &value); • MPI_Send(&value, 1, MPI_INT, node+1, 0, MPI_COMM_WORLD); • } • else • { • MPI_Recv(&value, 1, MPI_INT, node-1, 0, MPI_COMM_WORLD, &status); • if(node < size - 1) • { • MPI_Send(&value, 1, MPI_INT, node+1, 0, MPI_COMM_WORLD); • } • } • printf("Node %d has %d in value. \n", node, value); • MPI_Finalize(); • return 0; • }

Ring Code Example (2)
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char** argv)
    {
        int size, node;
        int value;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &node);

Ring Code Example (3)
        if(node == 0)   /* Parent node */
        {
            printf("Value:");
            scanf("%d", &value);
            MPI_Send(&value, 1, MPI_INT, node+1, 0, MPI_COMM_WORLD);
        }
        else            /* Everyone else receives */
        {
            MPI_Recv(&value, 1, MPI_INT, node-1, 0, MPI_COMM_WORLD, &status);
            if(node < size - 1)   /* All nodes but parent and last node send */
                MPI_Send(&value, 1, MPI_INT, node+1, 0, MPI_COMM_WORLD);
        }
        printf("Node %d has %d in value. \n", node, value);
        MPI_Finalize();
        return 0;
    }

Let’s Run Ring example…

Computing arctan ($\tan^{-1}$) of x • Using Gregory’s Formula • $\arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \frac{x^9}{9} - \dots$ • Let’s use MPI to program this formula
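Before parallelizing, it can help to see the series summed serially. The following is a minimal sketch in plain C with no MPI calls; `gregory_arctan` is a hypothetical helper name, and counting terms from k = 1 is an assumption of this sketch.

```c
/* Serial sketch of Gregory's series:
 *   arctan(x) = x - x^3/3 + x^5/5 - x^7/7 + ...
 * gregory_arctan is a hypothetical helper, not part of MPI or LAM.
 * Term k (k = 1, 2, 3, ...) is (-1)^(k+1) * x^(2k-1) / (2k-1). */
double gregory_arctan(double x, int terms)
{
    double sum = 0.0;
    double power = x;              /* holds x^(2k-1), starting at x^1 */
    int k;

    for (k = 1; k <= terms; ++k) {
        double term = power / (2 * k - 1);
        sum += (k % 2 == 0) ? -term : term;   /* even-numbered terms are negative */
        power *= x * x;            /* advance to the next odd power */
    }
    return sum;
}
```

The series converges quickly for small |x| and very slowly near |x| = 1, which is one reason the number of terms is left as a parameter.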

Arctan code
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char** argv)
    {
        int size, node;              // MPI variable placeholders
        int i, x;                    // Loop counters
        double init_value;
        double angle = 0.0;
        double sum = 0.0;
        int terms;                   // Number of terms processed
        double finished_sum = 0.0;
        MPI_Status status;

        MPI_Init(&argc, &argv);                  // Start MPI environment
        MPI_Comm_size(MPI_COMM_WORLD, &size);    // Get MPI size
        MPI_Comm_rank(MPI_COMM_WORLD, &node);    // Get this node number

Arctan code (2)
        if(node == 0)
        {
            printf("Angle:");
            scanf("%lf", &angle);
            printf("Number of arctan terms:");
            scanf("%d", &terms);
        }
        MPI_Bcast(&terms, 1, MPI_INT, 0, MPI_COMM_WORLD);
        MPI_Bcast(&angle, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

Arctan code (3)
        // Start processing arctan
        // Node 0 only collects; nodes 1..size-1 share the terms round-robin.
        // Run with at least two nodes so that the stride (size-1) is nonzero.
        init_value = angle;
        double middle_sum;
        if(node != 0)
        {
            for(x=node; x<=terms; x=x+size-1)
            {
                double index = 2.0 * x - 1.0;    // Exponent and divisor of term x
                double temp = init_value;
                for(i = 0; i<(int)index - 1; ++i)
                    temp = temp * init_value;    // temp = init_value^index
                middle_sum = temp / index;
                if(x % 2 == 0)
                    middle_sum = middle_sum * -1.0;   // Even-numbered terms are negative
                sum = sum + middle_sum;
            }
        }
        MPI_Reduce(&sum, &finished_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

Arctan code (4)
        MPI_Barrier(MPI_COMM_WORLD);   // Wait for all processes
        if(node == 0)
            printf(" Arctan of %lf = %.20lf\n", angle, finished_sum);
        MPI_Finalize();
        return 0;
    }
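The arctan program hands out series terms round-robin: node 0 only collects, and each other node starts at its own node number and steps by the number of working nodes. The sketch below models that split in plain C, without MPI, to show that the shares add up to the full series. The helper names (`gregory_term`, `worker_partial_sum`, `combined_sum`) are hypothetical, and the sketch assumes terms are numbered 1 through `terms` and that `size` is at least 2.

```c
/* One series term: (-1)^(k+1) * x^(2k-1) / (2k-1), for k = 1, 2, 3, ... */
static double gregory_term(double x, int k)
{
    double power = x;
    int i;

    for (i = 1; i < 2 * k - 1; ++i)
        power *= x;                        /* build x^(2k-1) */
    return (k % 2 == 0 ? -power : power) / (2 * k - 1);
}

/* Share of the series owned by one working node (node >= 1).
 * Node 0 is the collector and owns no terms. Assumes size >= 2. */
double worker_partial_sum(double x, int terms, int node, int size)
{
    double sum = 0.0;
    int k;

    for (k = node; k <= terms; k += size - 1)   /* stride = number of workers */
        sum += gregory_term(x, k);
    return sum;
}

/* Serial stand-in for the sum reduction: add up every worker's share. */
double combined_sum(double x, int terms, int size)
{
    double total = 0.0;
    int node;

    for (node = 1; node < size; ++node)
        total += worker_partial_sum(x, terms, node, size);
    return total;
}
```

For example, `combined_sum(0.5, 20, 4)` and `combined_sum(0.5, 20, 2)` should agree up to floating-point rounding, since changing the number of nodes only changes who computes each term.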

Let’s run arctan example

Computing Pi • Using Euler’s formula • Pi/4 = arctan(1/2) + arctan(1/3) • Let’s use MPI to compute this value

Computing Pi (2) • Arctan Code is the same • Run twice • Set barrier, then compute 4 * [arctan(1/2) + arctan(1/3)]
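As a serial check of the arithmetic, the sketch below evaluates Euler's formula in plain C without MPI. `arctan_series` and `euler_pi` are hypothetical helper names; in the parallel version each arctan value would come from a run of the arctan code instead.

```c
/* Gregory's series for arctan(x), summed serially over `terms` terms. */
static double arctan_series(double x, int terms)
{
    double sum = 0.0;
    double power = x;              /* holds x^(2k-1) */
    int k;

    for (k = 1; k <= terms; ++k) {
        sum += (k % 2 == 0 ? -power : power) / (2 * k - 1);
        power *= x * x;
    }
    return sum;
}

/* Euler's formula: pi/4 = arctan(1/2) + arctan(1/3). */
double euler_pi(int terms)
{
    return 4.0 * (arctan_series(1.0 / 2.0, terms) +
                  arctan_series(1.0 / 3.0, terms));
}
```

Because both arguments are well below 1, a few dozen terms are already enough to reach the precision of a double.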
