Algorithm Analysis

Algorithm Analysis • We will now see how we can examine code to determine the run time • We will look at the run time of: • operator statements • +, -, =, +=, ++, etc. • control statements • for, if, while, do-while, switch • functions • recursive functions

Algorithm Analysis • Programming languages such as Fortran, Pascal, Matlab, Java, and C# are sometimes called high-level programming languages • The term high refers to the distance away from the actual architecture of the computer

Algorithm Analysis • Because there is a direct one-to-one translation between assembly instructions and machine instructions, assembly language is termed a low-level programming language • Again, the term low suggests that the language is close to the architecture of the computer

Algorithm Analysis • One property of high-level programming languages is that they contain statements which are not immediately translatable into a fixed number of machine instructions • For example, integer exponentiation is (usually) not implemented as a single machine instruction on RISC processors
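As a rough illustration of a statement that does not map to a fixed number of machine instructions, the sketch below computes an integer power by repeated multiplication. The function int_pow and its multiplications counter are hypothetical names introduced here, not part of the slides, and a practical implementation would likely use a faster scheme such as repeated squaring.

```cpp
// Hypothetical helper (not from the slides): computes base^exponent by
// repeated multiplication and records how many multiplications were needed.
// Assumes the result fits in a long long.
long long int_pow( long long base, unsigned int exponent, unsigned int &multiplications ) {
    long long result = 1;
    multiplications = 0;

    // One multiplication per unit of the exponent, so the amount of work
    // grows with the exponent rather than being a fixed number of instructions.
    for ( unsigned int i = 0; i < exponent; ++i ) {
        result *= base;
        ++multiplications;
    }

    return result;
}
```

Under these assumptions, computing 2 to the power 10 takes ten multiplications, while an expression such as a + b takes one addition regardless of the operands.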

Algorithm Analysis • The C programming language is called a medium-level programming language, as it was designed with machine instructions in mind • basically, it is a front end to assembly language programming • There is, generally, an onto relationship between operations in C (+, -, =, ++, etc.) and machine instructions

Algorithm Analysis • A programmer competent at both C and assembly language can often determine the corresponding assembly instructions which a statement will be translated to • You will be able, with some limitations, to do this after this course

Analysis of Operations • Because each machine instruction can be executed in a fixed number of cycles, we may assume each operation requires a fixed number of cycles

Analysis of Operations • Because each instruction can be executed in a constant amount of time, we can say that T_op(n) = Θ(1)

Analysis of Operations • For example, each of the following operations is Θ(1): • retrieving or storing variables from memory • integer operations + - * / % ++ -- • logical operations && || ! • bitwise operations & | ^ ~ • relational operations == != < <= >= > • function calls and returns • object creation and destruction new delete

Analysis of Operations • Of these, however, the slowest is object creation, an operation on average 100× slower than, say, bitwise and (&), which translates to a single machine instruction • The new operator requires a call to the operating system (OS) for new memory • The delete operator also requires a call to the OS; however, it is slightly quicker

Analysis of Operations • Consequently, if we are running a fixed number of statements, all of which simply use operators, the run time is Θ(1)

Analysis of Control Statements • We will look at: • conditional statements if statements • repetition statements for loops while loops

Analysis of Control Statements • Given if ( condition ) { // true body } else { // false body } • The run time of a conditional statement is: • the run time of the condition (the test), plus • the run time of the body which is run

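Analysis of Control Statements • In most cases, the run time of the condition is Θ(1) • Thus, the run time of the conditional statement is Θ(1) plus the run time of whichever body is run, and is therefore at most Θ(1) + max( T_true body, T_false body )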

Analysis of Control Statements • In some cases, it is easy to determine which statement must be run: int factorial ( int n ) { if ( n == 0 ) { return 1; } else { return n * factorial ( n - 1 ); } }

Analysis of Control Statements • In others, it is less obvious • Suppose we are attempting to find the maximum entry in an array: int find_max( int * array, int n ) { int max = array[0]; for ( int i = 1; i < n; ++i ) { if ( array[i] > max ) { max = array[i]; } } return max; }

Analysis of Statements • In this case, we don’t know how often the body of the conditional statement is run • If we had information about the distribution of the entries of the array, we may be able to determine it • if the list is sorted (ascending), it will be run on every iteration • if the list is sorted (descending), it will never be run, as max is assigned only once at initialization • if the list is uniformly randomly distributed, then???
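To make the dependence on the input concrete, the following sketch counts how often the assignment inside the conditional of find_max is executed. The function count_max_updates is a hypothetical instrumented variant introduced here for illustration; it takes a std::vector rather than a pointer and a length, and it assumes the array is non-empty.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical instrumented variant of find_max (not from the slides):
// returns how many times the assignment inside the if statement is executed.
// The initial assignment max = array[0] is not counted.
// Assumes the array contains at least one entry.
std::size_t count_max_updates( const std::vector<int> &array ) {
    int max = array[0];
    std::size_t updates = 0;

    for ( std::size_t i = 1; i < array.size(); ++i ) {
        if ( array[i] > max ) {
            max = array[i];
            ++updates;
        }
    }

    return updates;
}
```

An ascending array of five distinct entries gives four updates and a descending one gives none. For distinct entries in uniformly random order, a standard result is that the expected number of updates is H_n - 1, where H_n is the n-th harmonic number, which grows like ln(n).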

Analysis of Repetition Statements • Next, we will look at for loops • We will look at a few cases: • the repetition statements are all Θ(1) • the body does not depend on the variable • the body depends on the variable • the repetition statements are not Θ(1)

Analysis of Repetition Statements • The initialization, condition, and increment statements are usually Θ(1) • For example, for ( int i = 0; i < n; ++i ) { // ... } • Thus, the run time is Ω(1), that is, at least the initialization and one condition must occur

Analysis of Repetition Statements • If the body does not depend on the variable (in this example, i), then the run time of for ( int i = 0; i < n; ++i ) { // code which is Theta(f(n)) } is Θ( 1 + n(1 + f(n)) ) • If the body is O(f(n)), then the run time of the loop is O( 1 + n(1 + f(n)) )

Analysis of Repetition Statements • For example, int sum = 0; for ( int i = 0; i < n; ++i ) { sum += 1; // Theta(1) } • This code has run time Θ( 1 + n(1 + 1) ) = Θ(n)

Analysis of Repetition Statements • As another example, int sum = 0; for ( int i = 0; i < n; ++i ) { for ( int j = 0; j < n; ++j ) { sum += 1; // Theta(1) } } • The previous example showed that the inner loop is Θ(n), thus the outer loop is Θ( 1 + n(1 + n) ) = Θ(1 + n + n^2) = Θ(n^2)

Analysis of Repetition Statements • Suppose with each iteration, we search a sorted array of size m: for ( int i = 0; i < n; ++i ) { binary_search( i, array, m ); } • A binary search is O(ln(m)), thus the loop is O( 1 + n(1 + ln(m)) ) = O(n ln(m)) • With a linear search, which is O(m), the loop would instead be O( 1 + n(1 + m) ) = O(nm)
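The slide does not show the body of binary_search, so the following is only a sketch of what such a function might look like, instrumented to report how many array entries it inspects. The name binary_search_counted and the probes parameter are hypothetical, and the array is assumed to be sorted in ascending order.

```cpp
#include <cstddef>

// Hypothetical stand-in for the binary_search call on the slide: searches the
// sorted range array[0], ..., array[m - 1] for value, and stores in probes the
// number of array entries that were inspected.
bool binary_search_counted( int value, const int *array, std::size_t m, std::size_t &probes ) {
    std::size_t lo = 0;
    std::size_t hi = m;    // the search range is the half-open interval [lo, hi)
    probes = 0;

    while ( lo < hi ) {
        std::size_t mid = lo + (hi - lo)/2;
        ++probes;

        if ( array[mid] == value ) {
            return true;
        } else if ( array[mid] < value ) {
            lo = mid + 1;  // discard the lower half, including mid
        } else {
            hi = mid;      // discard the upper half, including mid
        }
    }

    return false;
}
```

Each probe at least halves the remaining range, so this version inspects at most floor(log2(m)) + 1 entries: a search of 1024 entries never needs more than 11 probes, whereas a linear search could need all 1024.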

Analysis of Repetition Statements • Whenever we make a statement such as: the run time is O(nm) • the assumption is that n and m are non-zero

Analysis of Repetition Statements • If the body does depend on the variable (in this example, i), then the run time of for ( int i = 0; i < n; ++i ) { // code which is Theta(f(i,n)) } is Θ( 1 + Σ_{i=0}^{n-1} (1 + f(i, n)) ) and if the body is O(f(i, n)), the result is O( 1 + Σ_{i=0}^{n-1} (1 + f(i, n)) )

Analysis of Repetition Statements • For example, int sum = 0; for ( int i = 0; i < n; ++i ) { for ( int j = 0; j < i; ++j ) { sum += i + j; } } • The inner loop is Θ( 1 + i(1 + 1) ) = Θ(i), hence the outer is Θ( 1 + Σ_{i=0}^{n-1} (1 + i) ) = Θ( 1 + n + n(n - 1)/2 ) = Θ(n^2)

Analysis of Repetition Statements • As another example: int sum = 0; for ( int i = 0; i < n; ++i ) { for ( int j = 0; j < i; ++j ) { for ( int k = 0; k < j; ++k ) { sum += i + j + k; } } } • From inside to out: Θ(1), Θ(j), Θ(i^2), Θ(n^3)
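One way to check such a bound is to count how often the innermost statement runs. The sketch below does this for the triple loop above; count_triple_loop is a hypothetical name introduced here, and the body has been replaced by a counter.

```cpp
// Hypothetical helper (not from the slides): counts how many times the
// innermost statement of the triple loop is executed for a given n.
unsigned long long count_triple_loop( unsigned int n ) {
    unsigned long long count = 0;

    for ( unsigned int i = 0; i < n; ++i ) {
        for ( unsigned int j = 0; j < i; ++j ) {
            for ( unsigned int k = 0; k < j; ++k ) {
                ++count;  // stands in for sum += i + j + k
            }
        }
    }

    return count;
}
```

The innermost statement runs once for each triple with 0 <= k < j < i < n, that is, n(n - 1)(n - 2)/6 times, which is Θ(n^3). For n = 10 this gives 120.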

Analysis of Repetition Statements • If, however, any of the: • initialization, • condition, or • increment steps are not Θ(1), then we must do a little more work...

Analysis of Repetition Statements • Given the loop for ( O(f_init); O(f_cond); O(f_incr) ) { O(g) } which runs n times, the run time is: O( f_init + (n + 1)f_cond + n(f_incr + g) )

Analysis of Repetition Statements • The justification for O( f_init + (n + 1)f_cond + n(f_incr + g) ) is • the initialization f_init occurs only once • the test must be performed n + 1 times (returning false on the last test), and • the increment and body are run n times
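A common case where the condition is not constant time is a loop that calls std::strlen in its test. The sketch below counts how often that test is evaluated; count_condition_tests is a hypothetical name introduced here, and the comma operator is used only to increment the counter each time the condition is checked.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical example (not from the slides): returns how many times the loop
// condition is evaluated when iterating over a C string of length n.
// The condition calls std::strlen, which walks the whole string, so each
// evaluation of the condition is Theta(n) rather than Theta(1).
std::size_t count_condition_tests( const char *str ) {
    std::size_t tests = 0;

    // The comma operator increments tests and then evaluates the comparison.
    for ( std::size_t i = 0; ( ++tests, i < std::strlen( str ) ); ++i ) {
        // Theta(1) body
    }

    return tests;
}
```

For a string of length n the condition is tested n + 1 times. With f_init = 1, f_cond = n, f_incr = 1 and g = 1, the formula gives O( 1 + (n + 1)n + n(1 + 1) ) = O(n^2), assuming the compiler does not move the call to std::strlen out of the loop.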

Analysis of Serial Operations • Suppose we run one block of code followed by another block of code • Such code is said to be run serially • If the first block of code is O(f(n)) and the second is O(g(n)), then the run time of two blocks of code is O( f(n) + g(n) ) which usually (for algorithms not including function calls) simplifies to one or the other

Analysis of Functions • A function (or subroutine) is code which has been separated out, either to: • perform repeated operations • e.g., mathematical functions • or group related tasks • e.g., initialization

Analysis of Functions • Because a subroutine (function) can be called from anywhere, we must: • prepare the appropriate environment • deal with arguments (parameters) • jump to the subroutine • execute the subroutine • deal with the return value • clean up

Analysis of Functions • Fortunately, this is such a common task for processors that all processors of today have instructions which perform most of these steps with a single instruction • Thus, we will assume that the overhead required to make a function call and to return is Θ(1) • We will discuss this later

Analysis of Functions • Because any function requires the overhead of a function call and return, we will always assume that T_f = Ω(1) • That is, it is impossible for any function call to have a zero run time

Analysis of Functions • Thus, given a function f(n) (the run time of which depends on n) we will denote the run time of f(n) by some function T_f(n) • We may abbreviate this to T(n) • Because the run time of any function is Ω(1), we will include the time required to both call and return from the function in the run time

Analysis of Functions • Thus, if we have the function int f( int n ) { g( n ); for ( int i = 0; i < n; ++i ) { h(n); } if ( k() ) { m(n); } return 0; }

Analysis of Functions • For the function int f( int n ) { g( n ); for ( int i = 0; i < n; ++i ) { h(n); } if ( k() ) { m(n); } return 0; } the run time would be: T_f(n) = T_g(n) + n T_h(n) + T_k + p_{k() = true} T_m(n) • Here, p_{k() = true} is the probability that k() returns true

Recursive Functions • A function is relatively simple (and boring) if it simply performs operations and calls other functions • Most interesting functions designed to solve problems usually end up calling themselves • Such a function is said to be recursive

Recursive Functions • As an example, we could implement the factorial function recursively: int factorial( int n ) { if ( n <= 1 ) { return 1; } else { return n * factorial( n - 1 ); } } • Or, more compactly: int factorial( int n ) { return (n <= 1) ? 1 : n * factorial( n - 1 ); }

Recursive Functions • Thus, we may analyze the run time of this function as follows: • We don’t have to worry about the time of the conditional (Θ(1)) nor is there a probability involved with the conditional statement

Recursive Functions • The analysis of the run time of this function yields a recurrence relation: T_!(n) = T_!(n - 1) + Θ(1), T_!(1) = Θ(1) • In your calculus courses, you have seen recurrence relations; however, you did not use Landau symbols

Recursive Functions • Fortunately, we can replace each Landau symbol with a representative from that equivalence class • The behaviour of that representative will be indicative of the behaviour of all functions in that equivalence class

Recursive Functions • Thus, we replace Θ(1) with 1, and if we had Θ(n) we could replace it with n • The asymptotic behaviour of the function would be big-Θ of the result of the recurrence relation • If any of the times were big-O or big-Ω (exclusively one or the other) then the run time would be big-O or big-Ω of the result, respectively

Recursive Functions • Thus, to find the run time of the factorial function, we need to solve T_!(n) = T_!(n - 1) + 1, T_!(1) = 1 • The easy way to solve this is with Maple: > rsolve( {T(n) = T(n - 1) + 1, T(1) = 1}, T(n) ); n • Thus, T_!(n) = Θ(n)

Recursive Functions • Unfortunately, you don’t have Maple on the examination; instead, we can examine the first few steps: T_!(n) = T_!(n - 1) + 1 = T_!(n - 2) + 1 + 1 = T_!(n - 2) + 2 = T_!(n - 3) + 3 • From this, we see a pattern: T_!(n) = T_!(n - k) + k

Recursive Functions • If k = n - 1 then T_!(n) = T_!(n - (n - 1)) + n - 1 = T_!(1) + n - 1 = 1 + n - 1 = n • Thus, T_!(n) = Θ(n)
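The result of the recurrence can be checked by counting invocations directly. The sketch below is an instrumented version of the recursive factorial; factorial_counted and its calls parameter are hypothetical names introduced here, and the return type is widened to long long so that results up to 20! fit.

```cpp
// Hypothetical instrumented factorial (not from the slides): calls is
// incremented once per invocation, so after the outermost call returns it
// holds the total number of calls made.
// Assumes n <= 20 so that the result fits in a long long.
long long factorial_counted( int n, int &calls ) {
    ++calls;
    return ( n <= 1 ) ? 1 : n * factorial_counted( n - 1, calls );
}
```

Starting with calls set to zero, computing 5! makes five calls and computing 10! makes ten, matching the solution T(n) = n for n >= 1.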

Recursive Functions • Suppose we want to sort an array of n items • We could: • go through the array and find the largest item • swap the last entry in the array with that largest item • then, go on and sort the rest of the array

Recursive Functions
    void sort( int * array, int n ) {
        if ( n <= 1 ) {
            return;  // special case: 0 or 1 items are always sorted
        }

        int posn = 0;  // assume the first entry is the largest
        int max = array[posn];

        for ( int i = 1; i < n; ++i ) {  // search through the remaining entries
            if ( array[i] > max ) {      // if a larger one is found
                posn = i;                // update both the position and value
                max = array[posn];
            }
        }

        int tmp = array[n - 1];  // swap the largest entry with the last
        array[n - 1] = array[posn];
        array[posn] = tmp;

        sort( array, n - 1 );  // sort everything else
    }
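The same unrolling technique used for the factorial function can be applied to this sort. The derivation below is a sketch under the assumption that the loop is represented by n, the remaining statements are absorbed into that term, and the base case is T_sort(1) = 1; the name T_sort is introduced here for illustration.

```latex
\begin{align*}
T_{\text{sort}}(n) &= T_{\text{sort}}(n - 1) + n \\
                   &= T_{\text{sort}}(n - 2) + (n - 1) + n \\
                   &= T_{\text{sort}}(n - k) + \sum_{j = 0}^{k - 1} (n - j) \\
                   &= T_{\text{sort}}(1) + \sum_{j = 2}^{n} j
                      \qquad \text{(taking } k = n - 1\text{)} \\
                   &= \frac{n(n + 1)}{2} = \Theta(n^2)
\end{align*}
```

Under these assumptions the run time is quadratic, in contrast to the linear run time of the recursive factorial, because each call does Θ(n) work before recursing rather than Θ(1).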
