## Algorithm Analysis


**Algorithm Analysis**

- We will now see how we can examine code to determine its run time
- We will look at the run time of:
  - operator statements: `+`, `-`, `=`, `+=`, `++`, etc.
  - control statements: `for`, `if`, `while`, `do-while`, `switch`
  - functions
  - recursive functions

**Algorithm Analysis**

- Programming languages such as Fortran, Pascal, Matlab, Java, and C# are sometimes called high-level programming languages
- The term *high* refers to the distance away from the actual architecture of the computer

**Algorithm Analysis**

- Assembly language, because there is a direct one-to-one translation between assembly instructions and machine instructions, is termed a low-level programming language
- Again, the term *low* suggests that the language is close to the architecture of the computer

**Algorithm Analysis**

- One property of high-level programming languages is that they contain statements which are not immediately translatable into a fixed number of machine instructions
- For example, integer exponentiation is (usually) not implemented as a single machine instruction on RISC processors

**Algorithm Analysis**

- The C programming language is called a medium-level programming language, as it was designed with machine instructions in mind
  - basically, it is a front end to assembly language programming
- There is, generally, an onto relationship between operations in C (`+`, `-`, `=`, `++`, etc.) and machine instructions

**Algorithm Analysis**

- A programmer competent at both C and assembly language can often determine the assembly instructions to which a statement will be translated
- You will be able, with some limitations, to do this after this course

**Analysis of Operations**

- Because each machine instruction executes in a fixed number of cycles, we may assume each operation requires a fixed number of cycles

**Analysis of Operations**

- Because each instruction executes in a constant amount of time, we can say that T_op(n) = Θ(1)

**Analysis of Operations**

- For example, any of the following statements are Θ(1):
  - retrieving or storing variables from memory
  - integer operations: `+  -  *  /  %  ++  --`
  - logical operations: `&&  ||  !`
  - bitwise operations: `&  |  ^  ~`
  - relational operations: `==  !=  <  <=  >=  >`
  - function calls and returns
  - object creation and destruction: `new`, `delete`

**Analysis of Operations**

- Of these, however, the slowest is object creation: on average roughly 100× slower than, say, bitwise AND (`&`), which translates to a single machine instruction
- The `new` operator requires a call to the operating system (OS) for new memory
- The `delete` operator also requires a call to the OS; however, it is slightly quicker

**Analysis of Operations**

- Consequently, if we run a fixed number of statements, all of which simply use operators, the run time is Θ(1)

**Analysis of Control Statements**

- We will look at:
  - conditional statements: `if` statements
  - repetition statements: `for` loops, `while` loops

**Analysis of Control Statements**

- Given

```cpp
if ( condition ) {
    // true body
} else {
    // false body
}
```

- The run time of a conditional statement is:
  - the run time of the condition (the test), plus
  - the run time of the body which is run

**Analysis of Control Statements**

- In most cases, the run time of the condition is Θ(1)
- Thus, the run time of the conditional statement is Θ(1) plus the run time of whichever body is actually run

**Analysis of Control Statements**

- In some cases, it is easy to determine which body must be run:

```cpp
int factorial( int n ) {
    if ( n == 0 ) {
        return 1;
    } else {
        return n * factorial( n - 1 );
    }
}
```

**Analysis of Control Statements**

- In others, it is less obvious
- Suppose we are attempting to find the maximum entry in an array:

```cpp
int find_max( int *array, int n ) {
    int max = array[0];

    for ( int i = 1; i < n; ++i ) {
        if ( array[i] > max ) {
            max = array[i];
        }
    }

    return max;
}
```

**Analysis of Statements**

- In this case, we don't know how often the true body is run
- If we had information about the distribution of the entries of the array, we may be able to determine it:
  - if the list is sorted (ascending), it will always be run
  - if the list is sorted (descending), it will be run once
  - if the list is uniformly randomly distributed, then???

**Analysis of Repetition Statements**

- Next, we will look at `for` loops
- We will look at a few cases:
  - the repetition statements are all Θ(1), and
    - the body does not depend on the loop variable, or
    - the body depends on the loop variable
  - the repetition statements are not Θ(1)

**Analysis of Repetition Statements**

- The initialization, condition, and increment statements are usually Θ(1)
- For example,

```cpp
for ( int i = 0; i < n; ++i ) {
    // ...
}
```

- Thus, the run time is Ω(1); that is, at least the initialization and one test of the condition must occur

**Analysis of Repetition Statements**

- If the body does not depend on the variable (in this example, `i`), then the run time of

```cpp
for ( int i = 0; i < n; ++i ) {
    // code which is Theta(f(n))
}
```

  is Θ( 1 + n(1 + f(n)) )
- If the body is O(f(n)), then the run time of the loop is O( 1 + n(1 + f(n)) )

**Analysis of Repetition Statements**

- For example,

```cpp
int sum = 0;

for ( int i = 0; i < n; ++i ) {
    sum += 1;    // Theta(1)
}
```

- This code has run time Θ( 1 + n(1 + 1) ) = Θ(n)

**Analysis of Repetition Statements**

- Another example:

```cpp
int sum = 0;

for ( int i = 0; i < n; ++i ) {
    for ( int j = 0; j < n; ++j ) {
        sum += 1;    // Theta(1)
    }
}
```

- The previous example showed that the inner loop is Θ(n); thus the outer loop is Θ( 1 + n(1 + n) ) = Θ(1 + n + n²) = Θ(n²)

**Analysis of Repetition Statements**

- Suppose with each pass through the loop, we perform a linear search of an array of size m:

```cpp
for ( int i = 0; i < n; ++i ) {
    search( i, array, m );
}
```

- The search is O(m) and thus the loop is O( 1 + n(1 + m) ) = O(nm)

**Analysis of Repetition Statements**

- Whenever we simplify a bound such as O( 1 + n(1 + m) ) to O(nm), the assumption is that n and m are non-zero

**Analysis of Repetition Statements**

- If the body does depend on the variable (in this example, `i`), then the run time of

```cpp
for ( int i = 0; i < n; ++i ) {
    // code which is Theta(f(i, n))
}
```

  is Θ( 1 + Σ_{i=0}^{n-1} (1 + f(i, n)) ), and if the body is O(f(i, n)), the result is O( 1 + Σ_{i=0}^{n-1} (1 + f(i, n)) )

**Analysis of Repetition Statements**

- For example,

```cpp
int sum = 0;

for ( int i = 0; i < n; ++i ) {
    for ( int j = 0; j < i; ++j ) {
        sum += i + j;
    }
}
```

- The inner loop is Θ( 1 + i(1 + 1) ) = Θ(i); hence the outer loop is Θ( 1 + Σ_{i=0}^{n-1} (1 + i) ) = Θ(n²)

**Analysis of Repetition Statements**

- As another example:

```cpp
int sum = 0;

for ( int i = 0; i < n; ++i ) {
    for ( int j = 0; j < i; ++j ) {
        for ( int k = 0; k < j; ++k ) {
            sum += i + j + k;
        }
    }
}
```

- From the inside out, the run times are: Θ(1) for the body, Θ(j) for the `k` loop, Θ(i²) for the `j` loop, and Θ(n³) for the `i` loop

**Analysis of Repetition Statements**

- If, however, any of the initialization, condition, or increment steps are not Θ(1), then we must do a little more work...

**Analysis of Repetition Statements**

- Given the loop

```cpp
for ( O(f_init); O(f_cond); O(f_incr) ) {
    // O(g)
}
```

  which runs n times, the run time is O( f_init + (n + 1)·f_cond + n·(f_incr + g) )

**Analysis of Repetition Statements**

- The justification for O( f_init + (n + 1)·f_cond + n·(f_incr + g) ) is:
  - the initialization f_init occurs only once,
  - the test f_cond must be performed n + 1 times (returning false on the last iteration), and
  - the increment and body are run n times

**Analysis of Serial Operations**

- Suppose we run one block of code followed by another block of code
- Such code is said to be run serially
- If the first block of code is O(f(n)) and the second is O(g(n)), then the run time of the two blocks is O( f(n) + g(n) ), which usually (for algorithms not including function calls) simplifies to one or the other

**Analysis of Functions**

- A function (or subroutine) is code which has been separated out, either to:
  - avoid repeated operations (e.g., mathematical functions), or
  - group related tasks (e.g., initialization)

**Analysis of Functions**

- Because a subroutine (function) can be called from anywhere, we must:
  - prepare the appropriate environment
  - deal with arguments (parameters)
  - jump to the subroutine
  - execute the subroutine
  - deal with the return value
  - clean up

**Analysis of Functions**

- Fortunately, this is such a common task that today's processors have instructions which perform most of these steps in a single instruction
- Thus, we will assume that the overhead required to make a function call and to return is Θ(1)
- We will discuss this later

**Analysis of Functions**

- Because any function requires the overhead of a function call and return, we will always assume that T_f = Ω(1)
- That is, it is impossible for any function call to have a zero run time

**Analysis of Functions**

- Thus, given a function f(n) (the run time of which depends on n), we will denote the run time of f(n) by some function T_f(n)
- We may abbreviate this to T(n)
- Because the run time of any function is at least Ω(1), we will include the time required to both call and return from the function in the run time

**Analysis of Functions**

- Thus, if we have the function

```cpp
int f( int n ) {
    g( n );

    for ( int i = 0; i < n; ++i ) {
        h( n );
    }

    if ( k() ) {
        m( n );
    }

    return 0;
}
```

**Analysis of Functions**

- The run time would be T_f(n) = T_g(n) + n·T_h(n) + T_k + p·T_m(n), where p is the probability that `k()` returns true

**Recursive Functions**

- A function is relatively simple (and boring) if it simply performs operations and calls other functions
- Most interesting functions designed to solve problems usually end up calling themselves
- Such a function is said to be recursive

**Recursive Functions**

- As an example, we could implement the factorial function recursively:

```cpp
int factorial( int n ) {
    if ( n <= 1 ) {
        return 1;
    } else {
        return n * factorial( n - 1 );
    }
}
```

  or, equivalently,

```cpp
int factorial( int n ) {
    return (n <= 0) ? 1 : n * factorial( n - 1 );
}
```

**Recursive Functions**

- Thus, we may analyze the run time of this function as follows:
  - we don't have to worry about the run time of the conditional (Θ(1)), nor is there a probability involved with the conditional statement

**Recursive Functions**

- The analysis of the run time of this function yields a recurrence relation:
  - T!(n) = T!(n - 1) + Θ(1)
  - T!(1) = Θ(1)
- In your calculus courses, you have seen recurrence relations; however, you did not use Landau symbols

**Recursive Functions**

- Fortunately, we can replace each Landau symbol with a representative from that equivalence class
- The behaviour of that representative will be indicative of the behaviour of all functions in that equivalence class

**Recursive Functions**

- Thus, we replace Θ(1) with 1, and if we had Θ(n), we could replace it with n
- The asymptotic behaviour of the function would be big-Θ of the result of the recurrence relation
- If any of the times were big-O or big-Ω (exclusively one or the other), then the run time would be big-O or big-Ω of the result, respectively

**Recursive Functions**

- Thus, to find the run time of the factorial function, we need to solve
  - T!(n) = T!(n - 1) + 1
  - T!(1) = 1
- The easy way to solve this is with Maple:

```
> rsolve( {T(n) = T(n - 1) + 1, T(1) = 1}, T(n) );
                               n
```

- Thus, T!(n) = Θ(n)

**Recursive Functions**

- Unfortunately, you don't have Maple on the examination; thus, we can examine the first few steps:
  - T!(n) = T!(n - 1) + 1 = T!(n - 2) + 1 + 1 = T!(n - 2) + 2 = T!(n - 3) + 3
- From this, we see a pattern: T!(n) = T!(n - k) + k

**Recursive Functions**

- If k = n - 1, then T!(n) = T!(n - (n - 1)) + n - 1 = T!(1) + n - 1 = 1 + n - 1 = n
- Thus, T!(n) = Θ(n)

**Recursive Functions**

- Suppose we want to sort an array of n items
- We could:
  - go through the list and find the largest item
  - swap the last entry in the list with that largest item
  - then, go on and sort the rest of the array

**Recursive Functions**

```cpp
void sort( int *array, int n ) {
    if ( n <= 1 ) {
        return;                // special case: 0 or 1 items are always sorted
    }

    int posn = 0;              // assume the first entry is the largest
    int max = array[posn];

    for ( int i = 1; i < n; ++i ) {    // search through the remaining entries
        if ( array[i] > max ) {        // if a larger one is found
            posn = i;                  // update both the position and value
            max = array[posn];
        }
    }

    int tmp = array[n - 1];    // swap the largest entry with the last
    array[n - 1] = array[posn];
    array[posn] = tmp;

    sort( array, n - 1 );      // sort everything else
}
```
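This recursive sort can be analyzed with the same recurrence technique as the factorial function: each call performs n - 1 comparisons and then recurses on n - 1 items, so T(n) = T(n - 1) + Θ(n), which unrolls to Θ(n²). As a sketch of how to check this count empirically, here is an instrumented copy of the sort (the counter argument and the wrapper names `comparisons_for` and `sorts_correctly` are our own additions, not part of the original code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Instrumented copy of the recursive sort: 'comparisons' counts executions
// of the test array[i] > max.  Each call of size n performs n - 1 such
// tests, so the total satisfies C(n) = C(n - 1) + (n - 1), C(1) = 0,
// which solves to exactly n(n - 1)/2 comparisons, i.e., Theta(n^2).
void sort_counted( int *array, int n, long &comparisons ) {
    if ( n <= 1 ) {
        return;
    }

    int posn = 0;
    int max = array[posn];

    for ( int i = 1; i < n; ++i ) {
        ++comparisons;
        if ( array[i] > max ) {
            posn = i;
            max = array[posn];
        }
    }

    int tmp = array[n - 1];
    array[n - 1] = array[posn];
    array[posn] = tmp;

    sort_counted( array, n - 1, comparisons );
}

// Run the sort on a copy and return the comparison count.
long comparisons_for( std::vector<int> a ) {
    long c = 0;
    sort_counted( a.data(), static_cast<int>( a.size() ), c );
    return c;
}

// Run the sort on a copy and verify the result is ascending.
bool sorts_correctly( std::vector<int> a ) {
    long c = 0;
    sort_counted( a.data(), static_cast<int>( a.size() ), c );
    for ( std::size_t i = 1; i < a.size(); ++i ) {
        if ( a[i - 1] > a[i] ) {
            return false;
        }
    }
    return true;
}
```

Note that the comparison count is n(n - 1)/2 regardless of the input ordering; unlike the `if`-body question earlier, the Θ(n²) run time of this sort has no best or worst case.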
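The summation formulas for loops whose bodies depend on the loop variable can also be checked by direct counting. The following sketch (our own illustration, not from the slides) counts how many times the innermost statement of the doubly and triply nested example loops executes; the counts are Σ_{i=0}^{n-1} i = n(n - 1)/2 and C(n, 3) = n(n - 1)(n - 2)/6, matching the Θ(n²) and Θ(n³) conclusions:

```cpp
#include <cassert>

// Body executions of: for i in [0,n): for j in [0,i): ...
// This is sum_{i=0}^{n-1} i = n(n - 1)/2, i.e., Theta(n^2).
long inner_count_2( int n ) {
    long count = 0;
    for ( int i = 0; i < n; ++i ) {
        for ( int j = 0; j < i; ++j ) {
            ++count;
        }
    }
    return count;
}

// Body executions of the triply nested version:
// sum over 0 <= k < j < i < n of 1, which is C(n, 3), i.e., Theta(n^3).
long inner_count_3( int n ) {
    long count = 0;
    for ( int i = 0; i < n; ++i ) {
        for ( int j = 0; j < i; ++j ) {
            for ( int k = 0; k < j; ++k ) {
                ++count;
            }
        }
    }
    return count;
}
```

For example, with n = 10 the inner statements execute 45 and 120 times, respectively.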
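Similarly, the solution T!(n) = n of the factorial recurrence says the number of function invocations grows linearly in n. A minimal sketch (the counter parameter and wrapper names are hypothetical additions of ours) that counts the invocations:

```cpp
#include <cassert>

// Factorial with an invocation counter: each call adds one to 'calls'
// and recurses on n - 1, mirroring T!(n) = T!(n - 1) + 1, T!(1) = 1.
int factorial_counted( int n, int &calls ) {
    ++calls;
    if ( n <= 1 ) {
        return 1;
    }
    return n * factorial_counted( n - 1, calls );
}

// The value of n!.
int factorial_value( int n ) {
    int calls = 0;
    return factorial_counted( n, calls );
}

// The number of invocations needed to compute n!: exactly n for n >= 1.
int factorial_calls( int n ) {
    int calls = 0;
    factorial_counted( n, calls );
    return calls;
}
```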
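Returning to the open question about `find_max` on a uniformly randomly distributed array: a known result (not stated in the slides) is that the expected number of times the update `max = array[i]` runs is H_n - 1, where H_n = 1 + 1/2 + ... + 1/n is the n-th harmonic number, so the update runs Θ(ln n) times on average. A small sketch that verifies this exactly for tiny n by enumerating every permutation (the helper names are our own):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Count how many times find_max's update (max = array[i]) runs on one array.
int count_updates( const std::vector<int> &a ) {
    int max = a[0];
    int updates = 0;
    for ( std::size_t i = 1; i < a.size(); ++i ) {
        if ( a[i] > max ) {
            max = a[i];
            ++updates;
        }
    }
    return updates;
}

// Total updates over all n! permutations of {0, 1, ..., n - 1}.
// Dividing by n! gives the average, which equals H_n - 1.
long total_updates_over_permutations( int n ) {
    std::vector<int> a( n );
    for ( int i = 0; i < n; ++i ) {
        a[i] = i;
    }

    long total = 0;
    do {
        total += count_updates( a );
    } while ( std::next_permutation( a.begin(), a.end() ) );

    return total;
}
```

For n = 3 the total over all 3! = 6 permutations is 6·(H_3 - 1) = 5, and for n = 4 it is 24·(H_4 - 1) = 26.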