CS 179: GPU Programming Lab 7 Recitation: The MPI/CUDA Wave Equation Solver
MPI/CUDA – Wave Equation • Big idea: Divide our data array between n processes!
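One way to picture the split is the minimal sketch below, written in Python rather than the lab's C/CUDA. The helper name `block_range`, and the choice to hand any leftover points to the lowest ranks, are assumptions made for illustration; the slides do not say how an uneven division should be handled.

```python
def block_range(n_points, n_procs, rank):
    """Return (start, count) of the contiguous slice owned by `rank`."""
    base, extra = divmod(n_points, n_procs)
    # The first `extra` ranks each take one additional point.
    count = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return start, count

# 10 points over 4 processes
ranges = [block_range(10, 4, r) for r in range(4)]
print(ranges)  # [(0, 3), (3, 3), (6, 2), (8, 2)]
```

Every point is owned by exactly one process, and each process only ever stores its own slice.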
MPI/CUDA – Wave Equation • Problem if we’re at the boundary of a process! • (Figure: grid of position x against time levels t-1, t, t+1.) • Where do we get the value from across the boundary? (It’s outside our process!)
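The dependency behind this problem fits in a few lines. The sketch assumes the standard second-order finite-difference update for the 1D wave equation, which these slides do not spell out; `step_interior` and `courant_sq` are illustrative names, not the lab's.

```python
def step_interior(old, cur, courant_sq):
    """Advance interior points by one timestep; the end points stay 0.0."""
    n = len(cur)
    new = [0.0] * n
    for i in range(1, n - 1):
        # new[i] reads cur[i - 1] and cur[i + 1]: both spatial neighbors.
        new[i] = (2.0 * cur[i] - old[i]
                  + courant_sq * (cur[i + 1] - 2.0 * cur[i] + cur[i - 1]))
    return new

old = [0.0, 0.0, 0.0, 0.0, 0.0]
cur = [0.0, 0.0, 1.0, 0.0, 0.0]
new = step_interior(old, cur, 0.25)
print(new)  # [0.0, 0.25, 1.5, 0.25, 0.0]
```

Because each new value reads both neighbors at the current time, the first and last points of a process's slice need a value that lives in the adjacent process.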
Wave Equation – Simple Solution • After every time-step, each process gives its leftmost and rightmost piece of “current” data to its neighbor processes! • (Figure: Proc0, Proc1, Proc2, Proc3, Proc4.)
Wave Equation – Simple Solution • Pieces of data to communicate: • (Figure: Proc0, Proc1, Proc2, Proc3, Proc4.)
Wave Equation – Simple Solution
• Can do this with MPI_Irecv, MPI_Isend, MPI_Wait:
• Suppose process has rank r:
  • If we’re not the rightmost process:
    • Send data to process r+1
    • Receive data from process r+1
  • If we’re not the leftmost process:
    • Send data to process r-1
    • Receive data from process r-1
  • Wait on requests
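The exchange pattern above can be sketched without MPI by simulating all ranks in one Python process. This is not MPI code: the list assignments stand in for the MPI_Isend / MPI_Irecv pairs, and taking a snapshot of the outgoing values first plays the role of posting every request before waiting. The layout with one ghost cell per side and the name `exchange_halos` are assumptions for illustration.

```python
def exchange_halos(arrays):
    """Each entry is [ghost_left, owned..., ghost_right] for one rank."""
    size = len(arrays)
    # Snapshot what every rank would send before anyone "receives".
    send_left = [a[1] for a in arrays]     # leftmost owned value
    send_right = [a[-2] for a in arrays]   # rightmost owned value
    for r, a in enumerate(arrays):
        if r < size - 1:                   # not the rightmost process
            a[-1] = send_left[r + 1]       # receive from r+1
        if r > 0:                          # not the leftmost process
            a[0] = send_right[r - 1]       # receive from r-1
    return arrays

ranks = [[None, 1, 2, None], [None, 3, 4, None], [None, 5, 6, None]]
exchange_halos(ranks)
print(ranks)  # [[None, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, None]]
```

The outermost ghost cells stay empty, which is where the boundary conditions come in.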
Wave Equation – Simple Solution • Boundary conditions: • Use MPI_Comm_rank and MPI_Comm_size • Rank 0 process will set leftmost condition • Rank (size-1) process will set rightmost condition
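A small sketch of that rank test follows. In the real program `rank` and `size` would come from MPI_Comm_rank and MPI_Comm_size; here they are plain arguments. `apply_boundary` and the sample values are hypothetical.

```python
def apply_boundary(owned, rank, size, left_value, right_value):
    """Overwrite the global edge points; interior ranks are left untouched."""
    if rank == 0:                 # owns the global left edge
        owned[0] = left_value
    if rank == size - 1:          # owns the global right edge
        owned[-1] = right_value
    return owned

size = 3
slices = [apply_boundary([9.0, 9.0, 9.0], r, size, 1.0, 0.0) for r in range(size)]
single = apply_boundary([9.0, 9.0, 9.0], 0, 1, 1.0, 0.0)
print(slices)  # [[1.0, 9.0, 9.0], [9.0, 9.0, 9.0], [9.0, 9.0, 0.0]]
print(single)  # [1.0, 9.0, 0.0]
```

Using two independent `if` tests rather than `if`/`else` means a single-process run sets both conditions.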
Simple Solution – Problems • Communication can be expensive! • Sending a single value every timestep is a costly way to communicate! • Better solution: Send some m values every m timesteps!
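A quick count shows what batching changes. The sketch tallies only the “current” values sent to one neighbor, and the run length of 1000 timesteps is an arbitrary illustrative number; `traffic` is a hypothetical helper.

```python
def traffic(timesteps, interval):
    """Return (messages, values) sent to ONE neighbor over the whole run."""
    messages = timesteps // interval   # one exchange per interval
    values = messages * interval       # m values per exchange
    return messages, values

every_step = traffic(1000, 1)
batched = traffic(1000, 10)
print(every_step)  # (1000, 1000)
print(batched)     # (100, 1000)
```

The amount of data is the same either way; what drops is the number of messages, and with it the fixed overhead paid per message.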
Possible Implementation • Initial setup: (Assume 3 processes) • (Figure: arrays for Proc0, Proc1, Proc2.)
Possible Implementation • Give each array “redundant regions” • (Assume communication interval = 3) • (Figure: arrays for Proc0, Proc1, Proc2.)
Possible Implementation • Every (3) timesteps, send some of your data to neighbor processes!
Possible Implementation • Send “current” data (current at time of communication) • (Figure: arrays for Proc0, Proc1, Proc2.)
Possible Implementation • Then send “old” data • (Figure: arrays for Proc0, Proc1, Proc2.)
Then… • Do our calculation as normal at every point that isn’t at either end of our array • That means our entire array, including the redundant regions!
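The whole scheme can be checked end to end with a single-process simulation. This is a sketch under stated assumptions, not the lab solution: all ranks are simulated as Python lists, NaN stands in for garbage, the update is the standard finite-difference stencil, both global boundaries are held at 0.0 and re-applied after every timestep, and the array length divides evenly among the ranks. All function names are hypothetical.

```python
NAN = float("nan")  # stands in for garbage values

def step(old, cur, c2):
    """One timestep over a whole array; the two end cells cannot be computed."""
    new = [NAN] * len(cur)
    for i in range(1, len(cur) - 1):
        new[i] = 2.0 * cur[i] - old[i] + c2 * (cur[i + 1] - 2.0 * cur[i] + cur[i - 1])
    return new

def run_serial(y0, steps, c2):
    old, cur = y0[:], y0[:]
    for _ in range(steps):
        new = step(old, cur, c2)
        new[0] = new[-1] = 0.0               # fixed boundary conditions
        old, cur = cur, new
    return cur

def exchange(arrays, m, count):
    """Refresh redundant regions; layout is [m halo | count owned | m halo]."""
    for r in range(len(arrays) - 1):
        left, right = arrays[r], arrays[r + 1]
        left[m + count:] = right[m:2 * m]    # r gets r+1's leftmost owned cells
        right[:m] = left[count:count + m]    # r+1 gets r's rightmost owned cells

def run_parallel(y0, steps, c2, size, m):
    count = len(y0) // size                  # assumes an even split and count >= m
    def make_local(r):
        return [NAN] * m + y0[r * count:(r + 1) * count] + [NAN] * m
    old = [make_local(r) for r in range(size)]
    cur = [make_local(r) for r in range(size)]
    for t in range(steps):
        if t % m == 0:                       # every m timesteps...
            exchange(cur, m, count)          # ...send "current" data,
            exchange(old, m, count)          # ...then "old" data
        for r in range(size):
            new = step(old[r], cur[r], c2)   # entire array, redundancies included
            if r == 0:
                new[m] = 0.0                 # global left boundary
            if r == size - 1:
                new[m + count - 1] = 0.0     # global right boundary
            old[r], cur[r] = cur[r], new
    return [v for r in range(size) for v in cur[r][m:m + count]]

y0 = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
serial = run_serial(y0, 7, 0.25)
parallel = run_parallel(y0, 7, 0.25, size=3, m=2)
print(parallel == serial)
```

Since NaN never compares equal to anything, the comparison can only succeed if no garbage ever reached an owned cell.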
What about corruption? • Suppose we’ve just copied our data… (assume a non-boundary process) • . = valid • ? = garbage • ~ = doesn’t matter • (Recall that only 3 spaces exist: old, current, and new. Gray areas are nonexistent at our current time.)
What about corruption? • Calculate new data… • Value unknown!
What about corruption? • Time t+1: • Current -> old, new -> current (and space for old is overwritten by new…)
What about corruption? • More garbage data! • “Garbage in, garbage out!”
What about corruption? • Time t+2…
What about corruption? • Even more garbage!
What about corruption? • Time t+3… • Core data region - corruption imminent!?
What about corruption? • Saved! • Data exchange occurs after communication interval has passed!
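The spread of garbage can be traced by tracking only which cells are valid. The sketch assumes a three-point stencil (each new value reads its two neighbors at the current time plus its own old value), a non-boundary process, a communication interval of 3, and a core of 4 cells; the core size and function names are illustrative.

```python
def step_valid(old_ok, cur_ok):
    """A new cell is valid only if every value it reads is valid."""
    n = len(cur_ok)
    new_ok = [False] * n                  # end cells are never computed
    for i in range(1, n - 1):
        new_ok[i] = old_ok[i] and cur_ok[i - 1] and cur_ok[i] and cur_ok[i + 1]
    return new_ok

def show(mask):
    return "".join("." if ok else "?" for ok in mask)

def validity_history(m, core, steps):
    ok = [True] * (core + 2 * m)          # right after an exchange: all valid
    old_ok, cur_ok = ok[:], ok[:]
    rows = [show(cur_ok)]
    for _ in range(steps):
        old_ok, cur_ok = cur_ok, step_valid(old_ok, cur_ok)
        rows.append(show(cur_ok))
    return rows

rows = validity_history(m=3, core=4, steps=4)
for row in rows:
    print(row)
# ..........
# ?........?
# ??......??
# ???....???
# ????..????
```

Garbage advances one cell per timestep from each end, so after 3 steps it fills exactly the redundant regions. The fourth row shows what would happen to the core if the exchange were skipped.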
Boundary Conditions • Applied only at the leftmost and rightmost process!
Boundary corruption? • Examine left-most process: • We never copy to it, so left redundant region is garbage! (B = boundary condition set)
Boundary corruption? • Calculation brings garbage into non-redundant region!
Boundary corruption? • …but boundary condition is set at every interval!
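The same validity bookkeeping can be applied to the left-most process. The sketch assumes the boundary value is re-applied after each timestep, which is one reading of “at every interval”; the `reset_boundary=False` run shows the contamination that would occur otherwise. The sizes (interval 3, core of 4) and function names are illustrative.

```python
def step_valid(old_ok, cur_ok):
    """A new cell is valid only if every value it reads is valid."""
    n = len(cur_ok)
    new_ok = [False] * n                          # end cells are never computed
    for i in range(1, n - 1):
        new_ok[i] = old_ok[i] and cur_ok[i - 1] and cur_ok[i] and cur_ok[i + 1]
    return new_ok

def leftmost_history(m, core, reset_boundary):
    """Validity of the leftmost process after each of m timesteps."""
    start = [False] * m + [True] * (core + m)     # left redundant region is garbage
    old_ok, cur_ok = start[:], start[:]
    rows = []
    for _ in range(m):
        new_ok = step_valid(old_ok, cur_ok)
        if reset_boundary:
            new_ok[m] = True                      # boundary condition re-applied
        old_ok, cur_ok = cur_ok, new_ok
        row = ["." if ok else "?" for ok in cur_ok]
        if reset_boundary:
            row[m] = "B"
        rows.append("".join(row))
    return rows

with_reset = leftmost_history(3, 4, True)
without_reset = leftmost_history(3, 4, False)
print(with_reset)     # ['???B.....?', '???B....??', '???B...???']
print(without_reset)  # ['????.....?', '?????...??', '??????.???']
```

With the boundary cell refreshed, garbage from the left redundant region never gets past B. Without it, the garbage walks one cell further into the core on every timestep.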
Other details
• To run programs with MPI, use the “mpirun” command, e.g.
    mpirun -np (number of processes) (your program and arguments)
• CMS machines: Add this to your .bashrc file:
    alias mpirun=/cs/courses/cs179/openmpi-1.6.4/bin/mpirun
Common bugs (and likely causes)
• Lock-up (it seems like nothing’s happening):
  • Often an MPI issue: locks up on MPI_Wait because some request wasn’t fulfilled
  • Check that all sends have corresponding receives
• Your wave looks weird:
  • Likely cause 1: Garbage data is being passed between processes
  • Likely cause 2: Redundant regions aren’t being refreshed and/or are contaminating non-redundant regions
Common bugs (and likely causes) • Your wave is flat-zero: • Left boundary condition isn’t being initialized and/or isn’t propagating • Same likely causes as for a weird-looking wave
Common bugs (and likely causes) • General debugging tips: • Run MPI with the number of processes set to 1 or 2 • Set the kernel to write a constant value