
The OSCAR Cluster System


Presentation Transcript


  1. The OSCAR Cluster System Tarik Booker CS 370

  2. Topics • Introduction • OSCAR Basics • Introduction to LAM • LAM Commands • MPI Basics • Timing • Examples

  3. Welcome to OSCAR! • Welcome to the free Linux-based clustered system • Use multiple computers to create one powerful multi-processor system

  4. Account Setup • Fill in the sign-in sheet • Receive your account password (on a paper slip) • Log in using SSH only, to: oscar.calstatela.edu

  5. SSH (Secure Shell) Log In • Use cs370studentxx as your account name (where xx is your account number) • Example for account 30: >ssh cs370student30@oscar.calstatela.edu

  6. Environment (LAM) Setup • LAM (Local Area Multicomputer) is an implementation of MPI (Message Passing Interface). • LAM must be running before you can run your parallel programs.

  7. Environment (LAM) Setup (2) • After logging in to your account, type (assume ‘>’ is the prompt): • >ls • You should see two files: hello.c and hosts (hosts lists the cluster nodes LAM will start on) • Start LAM by typing: • >lamboot hosts • Note: to see more detailed output while LAM loads, type: • >lamboot -v hosts • Either form works.

  8. Environment (LAM) Setup (3) • LAM may take a while to load, because it starts a LAM daemon process on each node. • When it finishes, verify that LAM is running by typing: • >ps ux • This lists the processes running under your account; the LAM daemon (lamd) should appear in the list. • LAM is now set up and running on your account.

  9. LAM Troubleshooting • If something happens to your LAM process (e.g. LAM no longer shows up in your process list), repeat the previous steps to start LAM again. • If something is wrong with your LAM process (e.g. LAM is loaded and in the process list, but programs refuse to run or run indefinitely), use the “lamhalt” command: • >lamhalt

  10. Compiling a Parallel Program • Your account includes the ‘hello.c’ program. We’ll use it as a test program for LAM/MPI. • We will use the MPI C compiler. Use the command: • >mpicc hello.c • This compiles your parallel program into the default executable, a.out. • To specify the output file, type: • >mpicc hello.c -o hello • This compiles hello.c into the executable called ‘hello’.

  11. Running Your Parallel Program • Running a program through MPI is a bit different from running an ordinary program: you must use the ‘mpirun’ command and specify which nodes to use. • The typical usage is: • >mpirun N hello • ‘hello’ is the executable from the previous slide. The ‘N’ (UPPERCASE!) says to use all nodes. • You don’t have to use all nodes. Try typing: • >mpirun n0-5 hello • (this uses only nodes 0 through 5) • (Also try >mpirun n4,5,6 hello, which uses nodes 4, 5, and 6)
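To make the node-list syntax concrete, here is a small, hypothetical C helper (not part of LAM) that expands the two forms shown above, a range such as n0-5 and a list such as n4,5,6, into node numbers. It assumes ranges are inclusive, as the slide describes; LAM's own parser may accept more forms than this sketch handles.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of LAM: expand "n0-5" or "n4,5,6" into
 * node numbers. Ranges are treated as inclusive. Returns the number of
 * ids written to out (at most max), or -1 if the spec is malformed. */
int expand_node_spec(const char *spec, int *out, int max)
{
    if (spec == NULL || spec[0] != 'n')
        return -1;
    const char *p = spec + 1;                 /* skip the leading 'n' */
    int count = 0;
    while (*p != '\0') {
        char *end;
        if (*p < '0' || *p > '9')
            return -1;                        /* expected a node number */
        long lo = strtol(p, &end, 10);
        long hi = lo;
        p = end;
        if (*p == '-') {                      /* range such as 0-5 */
            ++p;
            if (*p < '0' || *p > '9')
                return -1;
            hi = strtol(p, &end, 10);
            if (hi < lo)
                return -1;
            p = end;
        }
        for (long id = lo; id <= hi; ++id) {  /* inclusive on both ends */
            if (count == max)
                return -1;                    /* output buffer too small */
            out[count++] = (int)id;
        }
        if (*p == ',') {
            ++p;
            if (*p == '\0')
                return -1;                    /* trailing comma */
        } else if (*p != '\0') {
            return -1;                        /* unexpected character */
        }
    }
    return count;
}
```

For example, expanding "n0-5" yields six node ids, 0 through 5, and "n4,5,6" yields three.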

  12. The MPI Program • Let’s look at hello.c • The two most important functions are: • MPI_Init(&argc, &argv); • MPI_Finalize(); • These functions initialize and shut down the parallel environment, respectively.

  13. LAM Commands • LAM is the MPI implementation we use • LAM comes with additional non-MPI commands (for node management) • Most are not strictly necessary, but they are useful

  14. lamboot • lamboot [hostfile] • Starts the LAM environment on the nodes listed in the host file • Use the -v flag for a verbose boot • >lamboot -v hosts

  15. lamhalt • Shuts down the LAM environment • >lamhalt

  16. mpirun • Runs an MPI program • >mpirun N hello

  17. lamclean • If your program terminates “badly,” use lamclean to delete old processes and allocated resources. • >lamclean

  18. wipe • A stronger version of lamhalt that kills the LAM daemon on every node • >wipe

  19. laminfo • Displays detailed information about the LAM installation • >laminfo

  20. lamnodes • List all nodes in the LAM environment • >lamnodes

  21. lamshrink • Remove a node from the LAM environment (without rebooting) • Ex: >lamshrink n3 • (Note: this invalidates node n3, leaving an empty slot in its place)

  22. lamgrow • Add a node to the LAM environment (without rebooting) • >lamgrow oscarnode3 • Also: >lamgrow -n 3 oscarnode3 • (This adds oscarnode3 to the previously empty n3 slot. Note the space between -n and 3!)

  23. lamexec • Run a non-MPI program in the LAM environment • >lamexec {non-MPI program}

  24. Termination Order of Bad Programs • If a program terminates badly (takes up too much memory, doesn’t stop, etc.), escalate in this order: • >lamclean (good) • >lamhalt (better) • >wipe (severe) • >kill -9 [process_id] (nuclear)

  25. Basic MPI Functions • We have covered MPI_Init and MPI_Finalize • MPI_Send • MPI_Recv • MPI_Bcast • MPI_Reduce • MPI_Barrier

  26. Note: • MPI is a Message Passing Interface • It does not necessarily use shared memory • Instead, information is passed between nodes as messages

  27. MPI_Send • Send a variable to another node • MPI_Send(address of the data, number of values to send, MPI data type, rank of the receiving node, message tag, communicator) • Ex: • MPI_Send(&value, 1, MPI_INT, 2, 0, MPI_COMM_WORLD);

  28. MPI_Recv • Receive a variable from another node • MPI_Recv(address of the receive buffer, maximum number of values to receive, MPI data type, rank of the sending node, message tag, communicator, address of an MPI_Status variable) • Ex: • MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status); • (Note: you must declare an MPI_Status variable, e.g. MPI_Status status;, when using this function)

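  29. MPI_Bcast • Broadcast a variable from one node to all nodes • MPI_Bcast(address of the data, number of values, MPI data type, rank of the broadcasting (root) node, communicator of the nodes that take part) • Ex: • MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD); • (Every node, including the root, makes the same call: the root sends the data and the other nodes receive it)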

  30. MPI_Reduce • Collect data at one node • Combine the values from all nodes with a specific operation • MPI_Reduce(address of the value to send, address of the variable that receives the result, number of values, MPI data type, reduction operation, rank of the receiving node, communicator) • Ex: • MPI_Reduce(&nodePi, &pi, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD); • There are many reduction operators besides summation (e.g. MPI_MAX, MPI_MIN, MPI_PROD); you can even create your own
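Conceptually, MPI_Reduce folds one value from each process into a single result at the root. The following serial C model is only an illustration: it makes no MPI calls, the array stands in for the per-process values, and op_sum/op_max are hypothetical stand-ins for MPI_SUM and MPI_MAX. In real MPI, a custom operator is created with MPI_Op_create, whose callback has a different, array-based signature.

```c
#include <stddef.h>

/* Serial model of what MPI_Reduce delivers to the root: per_rank[r] is the
 * value contributed by rank r, and op folds the values into one result. */
typedef double (*combine_fn)(double a, double b);

static double op_sum(double a, double b) { return a + b; }         /* like MPI_SUM */
static double op_max(double a, double b) { return a > b ? a : b; } /* like MPI_MAX */

/* Assumes nranks >= 1. MPI itself may combine the values in a different
 * order, so floating-point sums can differ in the last bits. */
double model_reduce(const double *per_rank, size_t nranks, combine_fn op)
{
    double acc = per_rank[0];
    for (size_t r = 1; r < nranks; ++r)
        acc = op(acc, per_rank[r]);
    return acc;
}
```

Any operator you supply should be associative, since an MPI implementation is free to combine partial results in whatever grouping suits the network.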

  31. MPI_Barrier • Make every process wait until all processes in the communicator have reached the barrier • MPI_Barrier(communicator) • Ex: • MPI_Barrier(MPI_COMM_WORLD);

  32. Timing in MPI • Introduction • Timing functions • What to do

  33. Timing Intro • MPI has built-in timing functions • They measure elapsed wall-clock time (“wall time”), not CPU (computational) time • Ticks give the timer’s resolution

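  34. Timing Functions • Wall time function (returns seconds since an arbitrary point in the past): • double MPI_Wtime(void) • Clock tick function (returns the resolution of MPI_Wtime, in seconds): • double MPI_Wtick(void)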

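  35. What to do with timing • Select a starting point and store the time • Select an end point and store the time • Subtract the start time from the end time; the result is already in seconds, so do not multiply by the tick • Digits beyond the resolution reported by MPI_Wtick() are not meaningful, so a format such as “%f” in printf is enough to display the time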

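  36. Code Example
      double start_time, elapsed;
      start_time = MPI_Wtime();
      /* Code that does something */
      elapsed = MPI_Wtime() - start_time;   /* elapsed wall-clock time, in seconds */
      printf("Elapsed time: %f seconds\n", elapsed);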
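MPI_Wtime() itself needs a running MPI program. For timing serial code in the same start/stop style, a comparable sketch uses POSIX clock_gettime() with CLOCK_MONOTONIC instead; this is a swapped-in technique, not part of MPI, and wall_seconds() and wall_tick() are hypothetical names chosen to mirror MPI_Wtime() and MPI_Wtick().

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Wall-clock seconds from a monotonic clock: a POSIX stand-in for
 * MPI_Wtime(). Like MPI_Wtime(), only differences between readings matter. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Resolution of that clock in seconds, analogous to MPI_Wtick(). */
double wall_tick(void)
{
    struct timespec res;
    clock_getres(CLOCK_MONOTONIC, &res);
    return (double)res.tv_sec + (double)res.tv_nsec * 1e-9;
}
```

As with MPI_Wtime(), take end minus start; the difference is already in seconds.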

  37. Programming Examples • Ring • Arctan (Using Gregory’s formula) • Pi (Using Euler’s formula) • Music Program • Mandelbrot Program

  38. The Ring • Pass a value, one node at a time, to each node in the universe (the LAM environment)

  39. Code Example
      #include <stdio.h>
      #include <mpi.h>
      int main(int argc, char** argv)
      {
          int size, node;
          int value;
          MPI_Status status;
          MPI_Init(&argc, &argv);
          MPI_Comm_size(MPI_COMM_WORLD, &size);
          MPI_Comm_rank(MPI_COMM_WORLD, &node);
          if (node == 0)
          {
              printf("Value:");
              scanf("%d", &value);
              MPI_Send(&value, 1, MPI_INT, node+1, 0, MPI_COMM_WORLD);
          }
          else
          {
              MPI_Recv(&value, 1, MPI_INT, node-1, 0, MPI_COMM_WORLD, &status);
              if (node < size - 1)
              {
                  MPI_Send(&value, 1, MPI_INT, node+1, 0, MPI_COMM_WORLD);
              }
          }
          printf("Node %d has %d in value. \n", node, value);
          MPI_Finalize();
          return 0;
      }

  40. Ring Code Example (2)
      #include <stdio.h>
      #include <mpi.h>
      int main(int argc, char** argv)
      {
          int size, node;
          int value;
          MPI_Status status;
          MPI_Init(&argc, &argv);
          MPI_Comm_size(MPI_COMM_WORLD, &size);
          MPI_Comm_rank(MPI_COMM_WORLD, &node);

  41. Ring Code Example (3)
          if (node == 0) {                  /* Parent node */
              printf("Value:");
              scanf("%d", &value);
              MPI_Send(&value, 1, MPI_INT, node+1, 0, MPI_COMM_WORLD);
          } else {                          /* Everyone else receives */
              MPI_Recv(&value, 1, MPI_INT, node-1, 0, MPI_COMM_WORLD, &status);
              if (node < size - 1)          /* All nodes but parent and last node send */
                  MPI_Send(&value, 1, MPI_INT, node+1, 0, MPI_COMM_WORLD);
          }
          printf("Node %d has %d in value. \n", node, value);
          MPI_Finalize();
          return 0;
      }
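The message pattern in the ring code can be checked without a cluster. This serial sketch (hypothetical helpers ring_partners and ring_simulate, no MPI calls) records which node each node receives from and sends to, assuming at least two processes as the original code does. As written, the pattern is a chain: the last node does not send the value back to node 0.

```c
/* For one node, report its receive source and send destination in the
 * ring example; -1 means "none". Assumes size >= 2. */
void ring_partners(int node, int size, int *recv_from, int *send_to)
{
    *recv_from = (node == 0) ? -1 : node - 1;       /* node 0 reads stdin instead */
    *send_to   = (node < size - 1) ? node + 1 : -1; /* last node only receives */
}

/* Simulate the pass in node order, filling values[0..size-1] with what each
 * node ends up holding. Returns how many MPI_Send calls the chain makes. */
int ring_simulate(int value, int size, int *values)
{
    int sends = 0;
    for (int node = 0; node < size; ++node) {
        int from, to;
        ring_partners(node, size, &from, &to);
        values[node] = (from == -1) ? value : values[from];
        if (to != -1)
            ++sends;
    }
    return sends;
}
```

With size nodes, the value crosses size - 1 messages, one per link in the chain.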

  42. Let’s Run Ring example…

  43. Computing arctan (tan⁻¹) of x • Using Gregory’s series: • $\arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \frac{x^9}{9} - \cdots$ (converges for $|x| \le 1$) • Let’s use MPI to program this formula
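Before parallelizing, it helps to have a serial reference for the series. This sketch (the function name gregory_arctan is hypothetical) sums the first `terms` terms, updating the odd power of x incrementally instead of recomputing it for every term.

```c
/* Partial sum of Gregory's series: x - x^3/3 + x^5/5 - ...
 * using terms k = 1..terms. Meaningful for |x| <= 1. */
double gregory_arctan(double x, int terms)
{
    double sum = 0.0;
    double power = x;            /* x^(2k-1) for the current term k */
    double x2 = x * x;
    for (int k = 1; k <= terms; ++k) {
        double term = power / (2.0 * k - 1.0);
        sum += (k % 2 == 1) ? term : -term;   /* odd k: add, even k: subtract */
        power *= x2;                          /* advance to x^(2k+1) */
    }
    return sum;
}
```

For instance, gregory_arctan(0.5, 40) agrees with arctan(0.5) ≈ 0.4636476090 to well within 1e-12; for x near 1 the series converges very slowly.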

  44. Arctan code
      #include <stdio.h>
      #include <mpi.h>
      int main(int argc, char** argv)
      {
          int size, node;              // MPI variable placeholders
          int i, x;                    // Loop counters
          double init_value;
          double angle = 0.0;
          double sum = 0.0;
          int terms;                   // Number of terms processed
          double finished_sum = 0.0;
          MPI_Status status;
          MPI_Init(&argc, &argv);                  // Start MPI environment
          MPI_Comm_size(MPI_COMM_WORLD, &size);    // Get MPI size
          MPI_Comm_rank(MPI_COMM_WORLD, &node);    // Get this node number

  45. Arctan code (2)
          if (node == 0) {
              printf("Angle:");
              scanf("%lf", &angle);
              printf("Number of arctan terms:");
              scanf("%d", &terms);
          }
          // Share the inputs read on node 0 with every node
          MPI_Bcast(&terms, 1, MPI_INT, 0, MPI_COMM_WORLD);
          MPI_Bcast(&angle, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

  46. Arctan code (3)
          // Start processing arctan: worker nodes 1..size-1 share the terms,
          // node 0 only collects (requires at least 2 processes)
          init_value = angle;
          if (node != 0) {
              for (x = node; x <= terms; x = x + size - 1) {   // terms x = 1..terms
                  double index = 2.0 * x - 1.0;                // odd exponent 2x-1
                  double temp = init_value;
                  for (i = 0; i < (int)index - 1; ++i)         // temp = angle^(2x-1)
                      temp = temp * init_value;
                  double middle_sum = temp / index;
                  if (x % 2 == 0)                              // even terms are subtracted
                      middle_sum = middle_sum * -1.0;
                  sum = sum + middle_sum;
              }
          }
          MPI_Reduce(&sum, &finished_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

  47. Arctan code (4)
          MPI_Barrier(MPI_COMM_WORLD);   // Wait for all processes
          if (node == 0)
              printf(" Arctan of %lf = %.20lf\n", angle, finished_sum);
          MPI_Finalize();
          return 0;
      }
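The work split in the arctan code is a cyclic (round-robin) distribution over the worker nodes 1 through size-1, with node 0 only collecting. This serial sketch (hypothetical function terms_for_node) lists the term indices each node handles, counting terms 1 through `terms` inclusive and assuming at least two processes, so you can check that every term is assigned exactly once.

```c
/* Term indices (1-based) handled by `node` under a cyclic split over
 * workers 1..size-1: worker k takes k, k+(size-1), k+2(size-1), ...
 * Node 0 gets no terms. Writes indices to out and returns the count.
 * Assumes size >= 2 so the stride size-1 is positive. */
int terms_for_node(int node, int size, int terms, int *out)
{
    int n = 0;
    if (node == 0)
        return 0;                        /* root only collects the sum */
    for (int x = node; x <= terms; x += size - 1)
        out[n++] = x;
    return n;
}
```

With 4 processes and 10 terms, for example, node 1 handles terms 1, 4, 7, 10; node 2 handles 2, 5, 8; and node 3 handles 3, 6, 9.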

  48. Let’s run arctan example

  49. Computing Pi • Using Euler’s formula: • $\pi/4 = \arctan(1/2) + \arctan(1/3)$ • Let’s use MPI to compute this value

  50. Computing Pi (2) • The arctan code is the same • Run it twice, once with x = 1/2 and once with x = 1/3 • Set a barrier, then compute $\pi = 4 \cdot [\arctan(1/2) + \arctan(1/3)]$
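A serial sketch of the same plan can serve as a reference value when testing the MPI version. Here arctan_series repeats the Gregory partial sum and euler_pi (a hypothetical name) combines the two arctangents. Arguments 1/2 and 1/3 converge far faster than evaluating arctan(1) = π/4 directly, which is the point of using this identity.

```c
/* Gregory partial sum for arctan(x), terms k = 1..terms. */
static double arctan_series(double x, int terms)
{
    double sum = 0.0, power = x, x2 = x * x;
    for (int k = 1; k <= terms; ++k) {
        double term = power / (2.0 * k - 1.0);
        sum += (k % 2 == 1) ? term : -term;   /* alternate signs */
        power *= x2;                          /* next odd power of x */
    }
    return sum;
}

/* pi = 4 * [arctan(1/2) + arctan(1/3)], each arctan from `terms` terms. */
double euler_pi(int terms)
{
    return 4.0 * (arctan_series(1.0 / 2.0, terms) + arctan_series(1.0 / 3.0, terms));
}
```

With 30 terms per arctangent, the result matches π to within about 1e-12.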
