Performance Measurement

Performance Measurement CSE, POSTECH

Program Performance • Recall that program performance is the amount of computer memory and time needed to run a program. • Performance can be determined in two ways: • analytically (performance analysis) • experimentally (performance measurement) • The performance of a program depends on • the number and type of operations performed, and • the memory access pattern for the data and instructions

Performance Analysis • Paper and pencil. • Does NOT need a working computer program or even a computer.

Some Uses of Performance Analysis • Why do we want to do a performance analysis of algorithms? • To determine the practicality of an algorithm • To predict run time on large instances • To compare two algorithms that have different asymptotic complexity, e.g., O(n) and O(n^2)

Limitations of Performance Analysis • Does NOT account for constant factors. • But constant factors may dominate: 1000n vs. n^2, especially if we are interested only in n < 1000 • Modern computers have a hierarchical memory organization with different access times for memory at different levels of the hierarchy.
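As a short worked step for the 1000n vs. n^2 comparison (n is assumed to be a positive instance size):

```latex
\[
1000n < n^2 \iff n > 1000,
\qquad\text{e.g. } n = 100:\; 1000n = 10^5 \;>\; n^2 = 10^4 .
\]
```

So for every n < 1000 the O(n^2) cost n^2 is actually the smaller of the two, even though O(n) is asymptotically better.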

Memory Hierarchy • [Figure: ALU, then R: 8-32, 1C; L1: 32KB, 2C; L2: 512KB, 10C; MAIN: 512MB, 100C] • C = CPU cycle • Read Sections 4.5.1 & 4.5.2

Limitations of Performance Analysis • Performance analysis does not account for this difference in memory access times. • Programs that do more work may take less time than those that do less work. • e.g., a program with a large operation count and a small number of accesses to slow memory may take less time than a program with a small operation count and a large number of accesses to slow memory
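A minimal sketch of how the memory access pattern can matter, assuming a square matrix stored in row-major order in a std::vector. Both functions perform exactly the same n*n additions; only the order in which memory is visited differs. The function names and the timing helper millisForOneCall are hypothetical, introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <vector>

// Sum an n x n row-major matrix row by row (consecutive addresses).
long long sumRowOrder(const std::vector<int>& a, std::size_t n) {
    assert(a.size() == n * n);
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            sum += a[i * n + j];
    return sum;
}

// Same additions, but column by column (each access jumps n elements ahead).
long long sumColumnOrder(const std::vector<int>& a, std::size_t n) {
    assert(a.size() == n * n);
    long long sum = 0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            sum += a[i * n + j];
    return sum;
}

// Elapsed CPU time, in milliseconds, of one call to sumFunction(a, n).
// The computed sum is returned through `result` so the work is not discarded.
double millisForOneCall(long long (*sumFunction)(const std::vector<int>&, std::size_t),
                        const std::vector<int>& a, std::size_t n, long long& result) {
    clock_t startTime = clock();
    result = sumFunction(a, n);
    return (clock() - startTime) / (double(CLOCKS_PER_SEC) / 1000);
}
```

Calling millisForOneCall with each function on a matrix too large to fit in cache (say n of a few thousand) will often show the column-order version taking longer, although the operation counts are identical. The actual numbers depend on the computer, compiler, and options.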

Performance Measurement • Concerned with obtaining the actual space and time requirements of a program • Actual space and time are dependent on • Compiler and options • Specific computer • We do not generally consider run-time space requirements (read the reasons on page 122)

Performance Measurement Needs (1) • programming language • working program • computer • compiler and options to use, e.g., g++ -O, -O2, -O3 (see manual pages for g++)

Performance Measurement Needs (2) • data to use for measurement • worst-case data • best-case data • average-case data • What are the worst-case, best-case, and average-case data for insertionSort, and how do you generate them? • timing mechanism --- clock
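One possible way to generate such data, as a sketch under stated assumptions: the sort is ascending, keys are distinct integers, and the insertion sort below is a plain textbook version rather than the book's exact Program. The generator names (bestCaseData, worstCaseData, randomData, averageShifts) are hypothetical. The sort returns the number of element shifts so the three cases can be told apart.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Ascending insertion sort; returns how many elements were shifted right.
long long insertionSort(std::vector<int>& a) {
    long long shifts = 0;
    for (std::size_t i = 1; i < a.size(); ++i) {
        int t = a[i];
        std::size_t j = i;
        while (j > 0 && t < a[j - 1]) {
            a[j] = a[j - 1];
            --j;
            ++shifts;
        }
        a[j] = t;
    }
    return shifts;
}

// Best case: already sorted, so no element is ever shifted.
std::vector<int> bestCaseData(std::size_t n) {
    std::vector<int> a(n);
    std::iota(a.begin(), a.end(), 0);
    return a;
}

// Worst case: reverse sorted, so element i is shifted past i others.
std::vector<int> worstCaseData(std::size_t n) {
    std::vector<int> a = bestCaseData(n);
    std::reverse(a.begin(), a.end());
    return a;
}

// One random permutation; a single permutation is only a sample.
std::vector<int> randomData(std::size_t n, unsigned seed) {
    std::vector<int> a = bestCaseData(n);
    std::mt19937 gen(seed);
    std::shuffle(a.begin(), a.end(), gen);
    return a;
}

// Approximate the average case by averaging over `trials` random permutations.
double averageShifts(std::size_t n, int trials) {
    assert(trials > 0);
    long long total = 0;
    for (int t = 0; t < trials; ++t) {
        std::vector<int> a = randomData(n, static_cast<unsigned>(t));
        total += insertionSort(a);
    }
    return double(total) / trials;
}
```

For timing, the same idea applies: time the sort on worstCaseData(n), on bestCaseData(n), and averaged over several randomData(n, seed) instances.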

Choosing Instance Size • We decide on which values of instance size (n) to use according to two factors: • the amount of timing we want to perform • what we expect to do with the times • In practice, we generally need the times for more than three values of n (read the reasons on page 123)

Timing in C++
    #include <ctime>  // clock(), clock_t, CLOCKS_PER_SEC

    double clocksPerMillis = double(CLOCKS_PER_SEC) / 1000;  // clock ticks per millisecond
    clock_t startTime = clock();
    // code to be timed comes here
    double elapsedMillis = (clock() - startTime) / clocksPerMillis;  // elapsed time in milliseconds

Shortcoming • See Program 4.1 and its execution times in Figure 4.1 (what is wrong with these execution times?) → the time needed for the worst-case sorts is too small for clock() to measure • Clock accuracy • assume the clock is accurate to within 100 ticks • If clock() returns a time of t, the actual time lies between max{0, t-100} and t+100 • For Figure 4.1, the actual time could be anywhere between 0 and 100 ticks

Shortcoming • Repeat work many times to bring total time to be >= 1000 ticks • See Program 4.2 • What is the difference between Prog 4.1 & 4.2? • See Figures 4.2 & 4.3 • See Figure 4.4 (overhead measurement)

Accurate Timing
    clock_t startTime = clock();
    long numberOfRepetitions = 0;  // must start at 0 before counting
    do {
        numberOfRepetitions++;
        doSomething();
    } while (clock() - startTime < 1000);  // repeat until at least 1000 ticks have elapsed
    double elapsedMillis = (clock() - startTime) / clocksPerMillis;
    double timeForCode = elapsedMillis / numberOfRepetitions;  // milliseconds per call
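The same repeat-until-enough-ticks idea can be packaged as a reusable helper. This is only a sketch: TimingResult, timePerCall, and the minTicks parameter are hypothetical names introduced here, and the reported time still includes the overhead of the loop and of the clock() calls.

```cpp
#include <cassert>
#include <ctime>

// Outcome of one timing run.
struct TimingResult {
    long repetitions;      // how many times the work was executed
    double millisPerCall;  // average elapsed CPU milliseconds per execution
};

// Run `work` repeatedly until at least minTicks clock ticks have elapsed,
// then report the average time per call. The clock is read once before the
// loop, so the clock error is incurred once, not once per iteration.
template <class Work>
TimingResult timePerCall(Work work, clock_t minTicks = 1000) {
    assert(minTicks > 0);
    const double clocksPerMillis = double(CLOCKS_PER_SEC) / 1000;
    long repetitions = 0;
    clock_t startTime = clock();
    do {
        ++repetitions;
        work();
    } while (clock() - startTime < minTicks);
    double elapsedMillis = (clock() - startTime) / clocksPerMillis;
    return {repetitions, elapsedMillis / repetitions};
}
```

A call such as timePerCall([&] { doSomething(); }) would then return both the repetition count and the per-call time. Note that the duration of a tick depends on CLOCKS_PER_SEC for the platform, so 1000 ticks is not the same wall-clock interval everywhere.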

Accuracy • Now accuracy is 10%. • First reading may be just about to change to startTime + 100 • Second reading (final value of clock()) may have just changed to finishTime • so finishTime - startTime is off by 100 ticks

Accuracy • First reading may have just changed to startTime • Second reading may be about to change to finishTime + 100 • so finishTime - startTime is off by 100 ticks

Accuracy • Examining the remaining cases, we get trueElapsedTime = finishTime - startTime ± 100 ticks • To ensure 10% accuracy, require elapsedTime = finishTime - startTime >= 1000 ticks
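The 10% figure follows from the 100-tick bound. As a short worked step, writing T_meas for the measured finishTime - startTime and T_true for the true elapsed time (both symbols are introduced here only for this derivation):

```latex
\[
\frac{\lvert T_{\text{true}} - T_{\text{meas}} \rvert}{T_{\text{meas}}}
\;\le\; \frac{100}{T_{\text{meas}}}
\;\le\; \frac{100}{1000} = 10\%
\qquad \text{whenever } T_{\text{meas}} \ge 1000 \text{ ticks}.
\]
```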

What is wrong with the following measurement?
    // Program 4.3
    long numberOfRepetitions = 0;
    clock_t elapsedTime = 0;
    do {
        numberOfRepetitions++;
        clock_t startTime = clock();
        doSomething();
        elapsedTime += clock() - startTime;
    } while (elapsedTime < 1000);  // repeat until enough time has elapsed

Answer to Ch. 4, Exercise 1 • In each iteration of the do-while loop, the amount added to elapsedTime may deviate from the actual run time of doSomething by up to 100 ms (or 100 ticks). This error is additive over the iterations and so does not decline as a fraction of total time. For example, suppose that doSomething takes almost 100 ms to execute. In the worst case, the clock reading will change just before each execution of the assignment startTime = clock(), and the amount added to elapsedTime is zero on each iteration of the do-while loop; the do-while loop does not terminate. • How do we fix this?

Time Measurement in Time Shared Systems • UNIX • time MyProgram • See man pages for time • Do Exercise 4.2 • Read Chapter 4
