Performance Analysis Tools

Performance Analysis Tools

Performance Analysis Goal • Once we have a working parallel program, we want to tune it to run faster • Hot spot – An area of code that uses a significant amount of CPU time • Bottleneck – An area of code that uses resources inefficiently and slows the program down (e.g. communication)

Timers • One way to identify hot spots and bottlenecks is to use timers. • We’ve used it to measure the elapsed time of the entire algorithm, but this can be used to measure time spent on different parts of the algorithm

Timers

Time command • Usage: time mpirun –np # command • Result real 0m1.071s user 0m0.177s sys 0m0.045s

Time command • Meaning • Real time: the total wall clock (start to finish) time your program took to load, execute, and exit. • User time: the total amount of CPU time your program took to execute. • System time: the amount of CPU time spent on operating system calls in executing your program.

gettimeofday • gettimeofday is a system call that returns a structure that gives the time since Epoch (January 1 1970) int gettimeofday(struct timeval *tv, struct timezone *tz); • The timeval structure has seconds and microseconds: struct timeval { time_t tv_sec; /* seconds */ suseconds_t tv_usec; /* microseconds */ };

gettimeofday • Usage: #include <sys/time.h> struct timeval tv1, tv2; ... gettimeofday(&tv1, NULL); ... // Work to be timed gettimeofday(&tv2, NULL); // Convert time to seconds elapsed_time = (tv2.tv_sec - tv1.tv_sec) + ((tv2.tv_usec - tv1.tv_usec) / 1000000.0);

MPI_Wtime • Returns a single double-precision value that is the number of seconds since some time in the past (most likely Epoch) • MPI also provides a MPI_Wtick() routine that provides the resolution (most likely microseconds)

MPI_Wtime • Usage: #include "mpi.h" ... double start,end,resolution; MPI_Init(&argc, &argv); start = MPI_Wtime(); /* start time */ ... // Work to be timed end = MPI_Wtime(); /* end time */ resolution = MPI_Wtick(); printf("elapsed= %e resolution= %e\n", end-start, resolution);

MPI_Wtime • Sample output: Wallclock times(secs): start= 1384864000.067529 end= 1384864000.074005 elapsed= 6.475925e-03 resolution= 1.000000e-06 Elapsed time (seconds) Accurate to microseconds

read_real_time • read_real_time is a system call that returns a structure that gives the time since Epoch (January 1 1970) int read_real_time(timebasestruct_t *t, size_t size_of_timebasestruct_t); • Designed to measure time accurate to nanoseconds • Guarantee correct time units across different IBM RS/6000 architectures.

read_real_time #include <sys/time.h> ... timebasestruct_t start, finish; int secs, n_secs; read_real_time(&start, TIMEBASE_SZ); /* do some work */ read_real_time(&finish, TIMEBASE_SZ); /* Make sure both values are in seconds and nanoseconds */ time_base_to_time(&start, TIMEBASE_SZ); time_base_to_time(&finish, TIMEBASE_SZ);

read_real_time • Usage continued: ... /* Subtract the starting time from the ending time */ secs = finish.tb_high - start.tb_high; n_secs = finish.tb_low - start.tb_low; /* Fix carry from low-order to high-order during the measurement */ if (n_secs < 0) { secs--; n_secs += 1000000000; } printf("Sample time was %d seconds %d nanoseconds\n", secs, n_secs);

Profilers

prof • Compile your program with the –p option: • gcc –p <program>.c –o <program> • Run the program • Profile file created called mon.out • Run: • prof –m mon.out

Sample Output from prof

gprof • Compile your program with the –p option: • gcc –gp <program>.c –o <program> • Run the program • Profile file created called gmon.out • Run: • gprof <program> gmon.out

Sample Output from gprof ngranularity: Each sample hit covers 4 bytes. Time: 1.17 seconds called/total parents index %time self descendents called+self name index called/total children 0.44 0.72 1/1 .__start [2] [1] 99.1 0.44 0.72 1 .main [1] 0.59 0.13 1024/1024 .fft [3] 0.00 0.00 256/256 .cos [6] 0.00 0.00 256/256 .sin [7] 0.00 0.00 8/8 .gettimeofday [11] 0.00 0.00 7/7 .printf [16] 0.00 0.00 1/1 .atoi [31] 0.00 0.00 1/1 .exit [33]

xprofiler • X Windows profiler based on gprof • Compile and run the program as you would with gprof • Run: • xprofiler <program> gmon.out • Provides a graphical representation of the program execution

Library View

Function View

mpiP • Compile an MPI program with –g: • mpcc -g <program>.c –o <program> -L/usr/local/tools/mpiP/lib -lmpiP -lbfd • Run the MPI program as usual • A file is created called <program>.N.XXXXX.mpiP • Where N is the number of processors and XXXXX is the collector task processor id

Sample output from mpiP @ mpiP @ Command : sphot @ Version : 0.9 @ Build date : Mar 8 2001, 16:22:46 @ Start time : 2001 04 11 16:04:23 @ Stop time : 2001 04 11 16:04:51 @ Number of tasks : 4 @ Collector Rank : 0 @ Collector PID : 30088 @ Event Buffer Size : 100000 @ Final Trace Dir : . @ Local Trace Dir : /usr/tmp @ Task Map : 0 blue333.pacific.llnl.gov 0 @ Task Map : 1 blue334.pacific.llnl.gov 0 @ Task Map : 2 blue335.pacific.llnl.gov 0 @ Task Map : 3 blue336.pacific.llnl.gov 0

Sample output from mpiP ---------------------------------------------------------------- @--- MPI Time (seconds) ---------------------------------------- ---------------------------------------------------------------- Task AppTime MPITime MPI% 0 27.9 7.18 25.73 1 27.9 7.5 26.89 2 27.9 7.78 27.90 3 27.9 7.73 27.72 * 112 30.2 27.06

Sample output from mpiP ---------------------------------------------------------- @--- Callsites: 38 --------------------------------------- ---------------------------------------------------------- ID MPICall ParentFunction Filename Line PC 1 Barrier copyglob copyglob.f 65 10000b9c 2 Barrier copypriv@OL@1 copypriv.f 195 10001cd4 3 Barrier copypriv@OL@2 copypriv.f 237 1000213c 4 Barrier copypriv@OL@3 copypriv.f 279 10002624 5 Barrier copypriv@OL@4 copypriv.f 324 10002b04 6 Barrier sphot sphot.f 269 10008f2c 7 Bcast rdopac rdopac.f 49 10008638 8 Comm_rank copyglob copyglob.f 13 100003a8 9 Comm_rank copypriv copypriv.f 75 10000c38 10 Comm_rank genxsec genxsec.f 37 1000503c 11 Comm_rank rdinput rdinput.f 17 100071d4 …

Sample output from mpiP ----------------------------------------------------------------- @--- Aggregate Time (top twenty, descending, milliseconds) ------ ----------------------------------------------------------------- Call Site Time App% MPI% Bcast 7 1.54e+04 13.79 50.95 Barrier 1 1.42e+04 12.73 47.03 Barrier 2 563 0.50 1.87 Waitall 34 25.7 0.02 0.09 Reduce 25 7.4 0.01 0.02 Barrier 5 2.54 0.00 0.01 Barrier 6 1.55 0.00 0.01 Barrier 4 1.44 0.00 0.00 Comm_rank 13 1.22 0.00 0.00 Barrier 3 1.01 0.00 0.00 Comm_rank 9 0.967 0.00 0.00 …

Sample output from mpiP ------------------------------------------------------------------------- @--- Callsite statistics (all, milliseconds): 102 ----------------------- ------------------------------------------------------------------------- Name Site Rank Count Max Mean Min App% MPI% Barrier 1 0 1 0.087 0.087 0.087 0.00 0.00 Barrier 1 1 1 12.7 12.7 12.7 0.05 0.17 Barrier 1 2 1 7.09e+03 7.09e+03 7.09e+03 25.44 91.17 Barrier 1 3 1 7.09e+03 7.09e+03 7.09e+03 25.44 91.75 Barrier 1 * 4 7.09e+03 3.55e+03 0.087 12.73 47.03 Barrier 2 0 1 0.12 0.12 0.12 0.00 0.00 Barrier 2 1 1 0.29 0.29 0.29 0.00 0.00 Barrier 2 2 1 307 307 307 1.10 3.95 Barrier 2 3 1 255 255 255 0.92 3.31 Barrier 2 * 4 307 141 0.12 0.50 1.87 Send 31 1 1 0.169 0.169 0.169 0.00 0.00 Send 31 2 1 0.341 0.341 0.341 0.00 0.00 Send 31 3 1 0.184 0.184 0.184 0.00 0.00 Send 31 * 3 0.341 0.231 0.169 0.00 0.00 ...

Questions

Performance Analysis Tools

Performance Analysis Tools

Presentation Transcript

Windows Performance Analysis: Using Windows Performance Tools

Performance Analysis and Monitoring Facilities in CPN Tools

Analysis Tools

Performance Analysis

SQL Server Performance and Analysis Tools

Performance Analysis

Performance Tools

Performance Tools

Analysis Tools

Analysis Tools

Analysis Tools

OpenGL Performance Tools

Analysis Tools

Performance Analysis and Debugging Tools

Traditional Performance Tools

Analysis Tools

.NET Profiler Tools for Web Performance Analysis

Performance Analysis Tools

OpenGL Performance Tools

Analysis Tools

Performance Tools