Tonga Institute of Higher Education. Design and Analysis of Algorithms. IT 254 Lecture 2: Mathematical Foundations.
Asymptotic Performance Review
• Asymptotic performance: how do algorithms behave as the problem size gets very large?
  • Running time
  • Memory/storage requirements
• We want to know the number of primitive steps (like MIPS instructions) that are executed.
• Except for the time to execute a function call, most statements take roughly the same amount of time.
• We can be more exact if need be.
• Worst case vs. average case
Doing Analysis (Asymptotic Notations)
• Simplifications
  • Ignore as many details as possible to keep the analysis simple. In particular, we can drop constant factors and constant terms, because they do not affect how fast the function grows.
• Order of growth is what we want to measure.
• The highest-order term is what counts.
  • As the input size grows, the highest-order term dominates, so the lower-order terms can be thrown out.
  • Example: $2n^4 + 193n^2 + 159304 \Rightarrow n^4$
  • When $n$ becomes a big number, $n^4$ is so much bigger than the other terms that they become unimportant.
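To see why the lower-order terms can be dropped, here is a small sketch that evaluates the example polynomial and divides it by $n^4$. The function names `f` and `ratio_to_leading_term` are made up for this illustration. If $n^4$ really dominates, the ratio should settle toward the leading coefficient 2.

```cpp
#include <cassert>

// f(n) = 2n^4 + 193n^2 + 159304, the example polynomial from the slide.
double f(double n) {
    return 2.0 * n * n * n * n + 193.0 * n * n + 159304.0;
}

// Ratio f(n) / n^4. If n^4 dominates, this approaches the constant 2
// as n grows, which is why only the n^4 term matters asymptotically.
double ratio_to_leading_term(double n) {
    assert(n > 0.0);
    return f(n) / (n * n * n * n);
}
```

At n = 10 the ratio is still about 19.9, because the constant 159304 is large compared with $10^4$. At n = 1000 it is already about 2.0002.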
Upper Bound Notation: Big-O
• We can say an algorithm's running time is $O(n^2)$.
• Properly, we should say the running time is in $O(n^2)$.
• Read O as "Big-O" (you'll also hear it as "order", as in "it runs in order $n^2$").
• Big-O means:
  • A function $f(n)$ is $O(g(n))$ if there exist positive constants $c$ and $n_0$ such that $f(n) \le c \cdot g(n)$ for all $n \ge n_0$.
• Formally (using math notation):
  • $O(g(n)) = \{\, f(n) : \exists \text{ positive constants } c \text{ and } n_0 \text{ such that } f(n) \le c \cdot g(n) \ \forall n \ge n_0 \,\}$
Upper Bound (Big-O)
• A function $f(n)$ is $O(g(n))$ if there exist positive constants $c$ and $n_0$ such that $f(n) \le c \cdot g(n)$ for all $n \ge n_0$.
• So Big-O says that there is a function $g(n)$ that, possibly after multiplying it by a constant $c$, is at least as big as your function $f(n)$ for every $n$ from $n_0$ onward.
[Figure: $c \cdot g(n)$, the bounding function, lies above $f(n)$, your function, for all $n \ge n_0$.]
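Big-O Information
• Let's look at an example of a Big-O bound.
• If $f(n)$ is a polynomial of degree $k$, then $f(n)$ is $O(n^k)$.
• Proof:
  • Suppose $f(n) = b_k n^k + b_{k-1} n^{k-1} + \dots + b_1 n + b_0$.
  • Let $a_i = |b_i|$.
  • Then $f(n) \le a_k n^k + a_{k-1} n^{k-1} + \dots + a_1 n + a_0$.
  • For $n \ge 1$, every power $n^i$ with $i \le k$ is at most $n^k$, so $f(n) \le (a_k + a_{k-1} + \dots + a_0)\, n^k$. This is the definition of $O(n^k)$ with $c = a_k + \dots + a_0$ and $n_0 = 1$.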
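The polynomial argument can be checked numerically. The sketch below assumes the usual choice of constant for this proof, $c = \sum |b_i|$ with $n_0 = 1$; the helper names (`eval_poly`, `witness_constant`, `bound_holds`) are hypothetical and exist only for this example. Coefficients are stored lowest degree first.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Evaluate f(n) = b[0] + b[1]*n + ... + b[k]*n^k using Horner's rule.
double eval_poly(const std::vector<double>& b, double n) {
    double value = 0.0;
    for (auto it = b.rbegin(); it != b.rend(); ++it) {
        value = value * n + *it;
    }
    return value;
}

// Candidate constant c for the Big-O definition: the sum of |b_i|.
double witness_constant(const std::vector<double>& b) {
    double c = 0.0;
    for (double coeff : b) c += std::fabs(coeff);
    return c;
}

// Check f(n) <= c * n^k for every integer n in [1, limit].
// This is a finite spot check, not a proof.
bool bound_holds(const std::vector<double>& b, int limit) {
    assert(!b.empty());
    const double c = witness_constant(b);
    const double k = static_cast<double>(b.size() - 1);
    for (int n = 1; n <= limit; ++n) {
        if (eval_poly(b, n) > c * std::pow(n, k)) return false;
    }
    return true;
}
```

For the polynomial $2n^4 + 193n^2 + 159304$ the constant works out to $c = 159499$, and the bound is tight only at $n = 1$.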
Lower Bound Notation
• As well as an upper bound, we can also have a lower bound, written $\Omega(g(n))$.
• This is called omega notation, and it says:
  • $f(n)$ is $\Omega(g(n))$ if there exist positive constants $c$ and $n_0$ such that $0 \le c \cdot g(n) \le f(n)$ for all $n \ge n_0$.
• Example proof:
  • Suppose $a$ and $b$ are constants and the running time is $f(n) = an + b$.
  • Assume $a$ and $b$ are positive.
  • Then $an \le an + b$ for all $n \ge 1$.
  • Taking $c = a$ and $g(n) = n$ in the definition, $f(n)$ is $\Omega(n)$.
Asymptotic Tight Bound
• This is a way to bound the function $f(n)$ from above and below by the same function $g(n)$ (Theta notation).
• A function $f(n)$ is $\Theta(g(n))$ if there exist positive constants $c_1$, $c_2$ and $n_0$ such that $c_1 g(n) \le f(n) \le c_2 g(n)$ for all $n \ge n_0$.
• Theorem
  • $f(n)$ is $\Theta(g(n))$ if and only if $f(n)$ is both $O(g(n))$ and $\Omega(g(n))$.
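Other Notations
• A function $f(n)$ is $o(g(n))$ if for every positive constant $c$ there exists an $n_0 > 0$ such that $f(n) < c \cdot g(n)$ for all $n \ge n_0$.
• A function $f(n)$ is $\omega(g(n))$ if for every positive constant $c$ there exists an $n_0 > 0$ such that $c \cdot g(n) < f(n)$ for all $n \ge n_0$.
• So we can say that:
  • $\omega()$ is like $>$
  • $\Omega()$ is like $\ge$
  • $\Theta()$ is like $=$
  • $o()$ is like $<$
  • $O()$ is like $\le$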
Illustrating Complexity
• We can see how different functions behave on large data sets.
[Figure: the first picture plots several functions for x = 0..20 and y = 0..250.]
Order of Growth Questions
• What is the Big-O for:
  • $F(x) = x^2 + 0.0034x^5 + 1934x^3$
  • $F(a) = a! + 2^a$
• What is the Omega notation for:
  • $F(t) = \frac{t(3t + 15)}{t}$
• If we look at the following:
  • $F(x) = x^2 + 0.001x^{10} + 200x^5$
  • Which Big-O is correct: $O(x^2)$, $O(x^5)$ or $O(x^{10})$?
Other Math Notations
• There are some other mathematical notations and formulas we need to know in order to discuss running times.
• Logarithms:
  • $\lg n = \log_2 n$, the binary logarithm
  • $\ln n = \log_e n$, the natural logarithm
  • $\log_b n$, the logarithm of $n$ with base $b$
  • $a = b^{\log_b a}$
  • $\lg(ab) = \lg a + \lg b$
  • $\lg(a/b) = \lg a - \lg b$
  • $\log_b a^n = n \log_b a$
  • $\log_b 1 = 0$
  • $\log_b (1/a) = -\log_b a$
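The logarithm rules can be spot-checked numerically. This is only a sketch: `log_base` and `nearly_equal` are helper names invented here, and the comparison uses a small tolerance because floating-point logarithms are not exact.

```cpp
#include <cassert>
#include <cmath>

// Logarithm of x in an arbitrary base b, via log_b x = ln x / ln b.
double log_base(double b, double x) {
    assert(b > 0.0 && b != 1.0 && x > 0.0);
    return std::log(x) / std::log(b);
}

// True when two doubles agree up to a small tolerance
// (needed because of floating-point rounding).
bool nearly_equal(double x, double y) {
    return std::fabs(x - y) < 1e-9;
}
```

For example, `nearly_equal(std::log2(8.0 * 4.0), std::log2(8.0) + std::log2(4.0))` checks the product rule, and `nearly_equal(std::pow(3.0, log_base(3.0, 7.0)), 7.0)` checks $a = b^{\log_b a}$.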
Other Math Notations
• Factorial:
  • The factorial of a number multiplies it by every positive integer below it.
  • Example: $5! = 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 120$
  • $(n-1)! = (n-1)(n-2)(n-3) \cdots (2)(1)$
• Conditional notation
  • In math, if we want an if/else we write it like this:
    $f(x) = \begin{cases} x^2 & \text{if } x \le 2 \\ x^3 & \text{if } x > 2 \end{cases}$
  • This says that if $x \le 2$ the value is $x^2$, and if $x > 2$ the value is $x^3$.
Summations
• If a function has a while loop or a for loop, the time it takes is the sum of the time spent in each pass through the loop.
• Example:
    for (int k = 0; k < 20; k++) {
      // do something that takes 10 seconds
    }
• If each pass through the loop takes 10 seconds and the loop runs 20 times, then the loop takes 10*20 = 200 seconds.
• We can write this as a summation, like: $\sum_{k=0}^{19} 10 = 10 \cdot 20 = 200$
• The number on the bottom says where to start (like a loop variable). The number at the top says where to stop. The function being summed is on the right.
Summations
• Example:
  • $F(a) = a^1 + a^2 + a^3 + a^4$
  • $F(a) = \sum_{i=1}^{4} a^i$
  • $F(x) = 5x + 10x^2 + 15x^3 + 20x^4 + \dots$
  • $F(x) = \sum_{i=1}^{\infty} 5i\,x^i$
Arithmetic and Geometric Series
• There are a few special summations whose closed forms have been known for a long time. It is good to be able to recognize them.
• Arithmetic series: $\sum_{k=1}^{n} k = 1 + 2 + \dots + n = \frac{n(n+1)}{2}$
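Geometric Series
• The geometric series is similar to the arithmetic series, except that each term is a constant multiple of the previous term instead of a constant amount larger:
  $\sum_{k=0}^{n} x^k = 1 + x + x^2 + \dots + x^n = \frac{x^{n+1} - 1}{x - 1}$ for $x \ne 1$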
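As a sanity check, the standard closed forms $\sum_{k=1}^{n} k = n(n+1)/2$ and $\sum_{k=0}^{n} x^k = (x^{n+1} - 1)/(x - 1)$ can be compared against plain loops. The function names below are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// 1 + 2 + ... + n, added up term by term.
long long arithmetic_sum_loop(long long n) {
    assert(n >= 0);
    long long total = 0;
    for (long long k = 1; k <= n; ++k) total += k;
    return total;
}

// Closed form n(n+1)/2 for the arithmetic series.
long long arithmetic_sum_formula(long long n) {
    return n * (n + 1) / 2;
}

// x^0 + x^1 + ... + x^n, added up term by term.
double geometric_sum_loop(double x, int n) {
    double total = 0.0;
    double term = 1.0;  // x^0
    for (int k = 0; k <= n; ++k) {
        total += term;
        term *= x;
    }
    return total;
}

// Closed form (x^(n+1) - 1) / (x - 1), valid only for x != 1.
double geometric_sum_formula(double x, int n) {
    assert(x != 1.0);
    return (std::pow(x, n + 1) - 1.0) / (x - 1.0);
}
```

The loop versions take time proportional to n, while the formulas are constant time, which is exactly why recognizing these series is useful when analyzing loops.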
Looking at Algorithms
• With summations we have tools to analyze loops and other conditions.
• When we analyze an algorithm, we want to find the total number of primitive operations: the total number of adds, multiplies and so on.
• Many times we will not know exactly how many there are, because the input size is unknown.
• In that case, we use the variable "n" to represent the input size.
• When we have basic statements, we will often say they run in time O(c), which means they take constant time (this is the same as O(1)).
Looking at Algorithms
• Example (n is assumed to be an input):
    1: int main() {
    2:   int x = 60*200;
    3:   for (int k = 0; k < n; k++) {
    4:     x = x*x;
    5:   }
    6: }
• We want to look at each piece of the algorithm to find the total (usually Big-O) running time.
• Lines 1, 2, 5 and 6 are all O(c) because they take constant time.
• The interesting lines are 3 and 4. How many times do lines 3 and 4 execute?
Looking at Algorithms
• We can see that lines 3 and 4 execute n times (and we assume n is an input).
• So we say the total running time is the sum of all the different running times.
    3:   for (int k = 0; k < n; k++) {
    4:     x = x*x;
    5:   }
• T(n) = O(c) + O(n)*O(c) = O(c) + O(n) = O(n)
• Remember, we are allowed to throw out the constants, because they do not change the order of growth.
Looking at Algorithms
• We should also see that this is a summation.
    for (int k = 0; k < n; k++) {
      x = x*x;
    }
• The for loop does a constant-time expression n times. In summation form, that is: $\sum_{k=0}^{n-1} c = cn$
Looking at Algorithms
• We also need to be careful about what happens when there is something special inside a loop.
• In many sorting algorithms, there are loops inside of loops.
• Example:
    for (int k = 0; k < n; k++) {
      for (int j = n; j > 0; j--) {   // inner counter renamed so it does not shadow k
        x = x*2;
      }
    }
• What is the running time of this loop?
• $T(n) = n \cdot (n \cdot c) = cn^2 = O(n^2)$
• What is the summation?
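One way to convince yourself of the $n^2$ count is to replace the constant-time statement with a counter. This is a sketch; `count_inner_steps` is a made-up name, and the inner loop variable is called `j` here to keep it distinct from `k`.

```cpp
#include <cassert>

// Count how many times the innermost statement of the nested loop runs.
long long count_inner_steps(int n) {
    assert(n >= 0);
    long long steps = 0;
    for (int k = 0; k < n; ++k) {        // outer loop: n passes
        for (int j = n; j > 0; --j) {    // inner loop: n passes per outer pass
            ++steps;                     // stands in for the O(c) statement x = x*2
        }
    }
    return steps;
}
```

For n = 10 the counter ends at 100, and for n = 250 it ends at 62500, matching $n \cdot n$.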
Looking at Algorithms
• It is very important not to confuse the operations inside the loop with the number of times the loop runs.
• Example:
    for (int k = 0; k < a; k++) {
      n = n*n;   // n is an ordinary variable declared earlier
    }
• Here the number of iterations depends on a, so the loop is O(a). It has nothing to do with n, which is just a variable in this example.
Can We Analyze Algorithms?
• What is the Big-O running time of the following algorithms (a and b are assumed to be inputs)?
    #include <iostream>
    using namespace std;

    int main() {
      for (int k = b; k > 0; k--) {   // loop condition tests k, the variable being decremented
        if (k % 2 == 0) {
          cout << 0 << endl;
        } else {
          cout << 1 << endl;
        }
      }
    }

    int main() {
      for (int k = 0; k < a; k++) {
        int i = a;
        while (i > 0) {
          int z = i*30+73+k*k*a;
          i--;
        }
      }
    }
Recurrences
• When an algorithm contains a call to itself, it is called a recursive algorithm.
• When that happens, there is a special way to analyze the algorithm, using what is called a recurrence.
• There are generally three methods used to solve recurrences: substitution, iteration and the master method.
• An important note: a recursive function must end at some point. We must make use of that stopping point (the base case) in our recurrence, otherwise our analysis will go on forever, just like the function would.
Recurrences
• The expression
  $T(n) = \begin{cases} c & \text{if } n = 1 \\ 2T(n/2) + cn & \text{if } n > 1 \end{cases}$
  is a recurrence.
• It says that the function $T(n)$ is equal to $2T(n/2) + cn$ if $n$ is greater than 1. If $n = 1$, then $T(n) = c$ only.
• What is happening here is that the input size $n$ is being cut in half each time. Thus, we will eventually get to the point where $n$ is equal to 1. Then we know to stop.
Recursion Algorithm and Recurrence
• Example:
    1: int myFunction(int a) {
    2:   for (int k = 0; k < a; k++) {
    3:     myFunction(a/2);
    4:   }
    5:   return 1;
    6: }
• When we want to analyze this function, we look at one piece at a time.
• In line 2, we see a loop that runs O(a) times, but there is something important inside the loop (a recursive call).
Recurrences
• Example:
    int myFunction(int a) {
      for (int k = 0; k < a; k++) {
        myFunction(a/2);
      }
      return 1;
    }
• In order to analyze this we can write it like:
• T(a) = a*T(a/2)
• The T(a/2) stands for the recursive call, which gets half the input. We multiply it by a because the loop runs "a" times.
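To watch the recurrence in action, the function can be instrumented to count calls. This is only a sketch: `count_calls` is a hypothetical variant of `myFunction` that returns the number of calls made (itself included) instead of the constant 1, so the count follows $C(a) = 1 + a \cdot C(a/2)$, which matches $T(a) = a \cdot T(a/2)$ up to the constant work per call.

```cpp
#include <cassert>

// Same loop-and-recurse shape as myFunction, but returns how many calls
// were made in total: this call plus every recursive call below it.
long long count_calls(int a) {
    assert(a >= 0);
    long long calls = 1;                  // the current call
    for (int k = 0; k < a; ++k) {
        calls += count_calls(a / 2);      // one recursive call per loop pass
    }
    return calls;
}
```

With integer division, `count_calls(0)` is 1, `count_calls(1)` is 2, `count_calls(2)` is 5, `count_calls(4)` is 21 and `count_calls(8)` is 169. The count multiplies by roughly `a` each time `a` doubles.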
Solving Recurrences
• There are three ways to solve recurrences, so that we can determine the correct running time of a function:
  • The substitution method: guess the form of the answer, then substitute the guess into the recurrence to check it.
  • The iterative method: use algebra and summations to expand the recursion until a pattern can be found.
  • The master method: use known formulas to find the solution.
Substitution Method
• The substitution method involves guessing the answer and then using something called "mathematical induction" to prove it.
• This is a good method, but it can only be used when you think you know the answer.
• "Mathematical induction" is a formal way to prove something in math.
• Induction says that if you can prove something for a starting case (like n = 1), and you can prove that whenever it is true for n it is also true for n+1, then it is true for all cases from the starting case onward.
Induction
• Example: Fibonacci sequence
• Claim: $F(n) < 2^n$
• Proof: We will show our claim is correct by induction.
• Base case:
  • $F(1) = 1 < 2 = 2^1$
  • $F(2) = 2 < 4 = 2^2$
• Induction step (here's the hard part):
  • $F(n) = F(n-1) + F(n-2)$. We also know $n-1 < n$ and $n-2 < n$.
  • So, by using the "inductive hypothesis", we can say that $F(n) = F(n-1) + F(n-2) < 2^{n-1} + 2^{n-2} < 2^{n-1} + 2^{n-1} = 2 \cdot 2^{n-1} = 2^n$.
• The inductive hypothesis allows us to substitute what we want to be true for the smaller cases.
• We assumed that $F(m) < 2^m$ for every $m < n$, so we could substitute $2^{n-1}$ for $F(n-1)$ and $2^{n-2}$ for $F(n-2)$.
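An induction proof covers every n, but a quick numeric check of the claim $F(n) < 2^n$ is a useful way to catch a wrong guess early. The sketch below assumes the slide's starting values $F(1) = 1$, $F(2) = 2$; the names `fib` and `claim_holds_up_to` are invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Fibonacci numbers with the starting values F(1) = 1 and F(2) = 2.
std::uint64_t fib(int n) {
    assert(n >= 1);
    std::uint64_t prev = 1;   // F(1)
    std::uint64_t curr = 2;   // F(2)
    if (n == 1) return prev;
    for (int i = 3; i <= n; ++i) {
        std::uint64_t next = prev + curr;   // F(i) = F(i-2) + F(i-1)
        prev = curr;
        curr = next;
    }
    return curr;
}

// Check F(n) < 2^n for every n in [1, limit]. The limit is capped at 62
// so that 2^n fits comfortably in 64 bits.
bool claim_holds_up_to(int limit) {
    assert(limit >= 1 && limit <= 62);
    for (int n = 1; n <= limit; ++n) {
        if (fib(n) >= (std::uint64_t{1} << n)) return false;
    }
    return true;
}
```

A check like this is not a proof. It only says the claim survives the first 62 cases, and the induction argument is what extends it to all n.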
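Induction: Example II
• What if we want to prove the following: $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$
• Proof: by induction
• Base case: if $n = 1$, then $1 = 1 \cdot (1+1)/2 = 2/2 = 1$
• Induction: assume the claim is true for $n-1$.
• We can then say that $\sum_{i=1}^{n} i = n + \sum_{i=1}^{n-1} i$
• And by the induction hypothesis, we can say $\sum_{i=1}^{n} i = n + \frac{(n-1)n}{2}$
• Which equals: $\frac{2n + n^2 - n}{2} = \frac{n^2 + n}{2} = \frac{n(n+1)}{2}$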
Substitution Example
• Now that we know what induction is about, we can do substitution.
• Example: $T(n) = 2T(n/2) + n$
• Remember, the hard part about substitution is guessing a good answer. Because this is an example, the answer is known to be $O(n \lg n)$, so we will guess that.
• So our claim is that $T(n) \le cn \lg n$ for a suitable constant $c$.
• To start the substitution, we assume the claim holds for $n/2$ and substitute $c(n/2)\lg(n/2)$ for $T(n/2)$ in the recurrence.
Substitution
• Example: $T(n) = 2T(n/2) + n$
  • $T(n) \le 2\left(c\,\frac{n}{2} \lg \frac{n}{2}\right) + n$   (substitute here)
  • $T(n) \le cn(\lg n - \lg 2) + n$   (simplify $2 \cdot c(n/2)$ and split the lg)
  • $T(n) \le cn \lg n - cn \lg 2 + n$   (distribute $cn$)
  • $T(n) \le cn \lg n - cn + n$   ($\lg 2 = 1$)
  • $T(n) \le cn \lg n$   (whenever $-cn + n \le 0$)
• So this will be true as long as $c \ge 1$, but Big-O also needs us to show that there is an $n_0$ so that the bound holds for all $n \ge n_0$.
• So we can take, for example, $n_0 = 4$ and $c = 2$ and see what happens. The recurrence has to be unrolled down to its base case, so pay attention to what happens (we also assume $T(1) = 1$ to make things easier).
• $T(4) = 2T(4/2) + 4 = 2(2T(2/2) + 2) + 4 = 2(2 \cdot 1 + 2) + 4 = 12$
• So $T(4) = 12$, and we need $T(4) \le 2 \cdot 4 \lg 4 = 8 \lg 4 = 8 \cdot 2 = 16$. True!
• This will actually be true for any $n > 3$ and any $c \ge 2$ (this check plays the role of the base case). It cannot start at $n = 1$, because $\lg 1 = 0$ makes the bound 0 while $T(1) = 1$.
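The guess can also be tested by evaluating the recurrence directly. The sketch below assumes $T(1) = 1$ and restricts n to powers of two so that n/2 is exact; `T_linear_work` and `guess_holds` are names made up for this illustration.

```cpp
#include <cassert>
#include <cmath>

// T(n) = 2T(n/2) + n with T(1) = 1, evaluated straight from the recurrence.
// Intended for n that is a power of two.
long long T_linear_work(long long n) {
    assert(n >= 1);
    if (n == 1) return 1;
    return 2 * T_linear_work(n / 2) + n;
}

// Check the guess T(n) <= c * n * lg n for n = n0, 2*n0, 4*n0, ... <= limit.
bool guess_holds(double c, long long n0, long long limit) {
    for (long long n = n0; n <= limit; n *= 2) {
        const double bound = c * n * std::log2(static_cast<double>(n));
        if (static_cast<double>(T_linear_work(n)) > bound) return false;
    }
    return true;
}
```

With c = 2 the check passes when started at n0 = 2 or n0 = 4, but fails when started at n0 = 1, because $\lg 1 = 0$ makes the bound zero. That failure is the reason the definition of Big-O lets us pick $n_0$.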
Substitution Example: II
• Show that $T(n) = T(n/2) + 1$ is $O(\lg n)$.
• So we need to show $T(n) \le c \lg n$.
• Assume that it works for $n/2$ and substitute:
  • $T(n) \le c \lg(n/2) + 1$
  • $T(n) \le c \lg n - c \lg 2 + 1$
  • $T(n) \le c \lg n - c \cdot 1 + 1$
  • $T(n) \le c \lg n - c + 1 \le c \lg n$ whenever $c \ge 1$
• So if the bound holds for $n/2$, it also holds for $n$. Can we find a $c$ and an $n_0$ so that it is true for any $n \ge n_0$?
Substitution Example: II
• $T(n) = T(n/2) + 1$ is $O(\lg n)$
• So if we say $c = 2$ and $n_0 = 2$ (and assume $T(1) = 1$):
  • $T(2) = T(2/2) + 1 \le 2 \lg 2$
  • $T(2) = T(1) + 1 \le 2 \cdot 1$
  • $T(2) = 1 + 1 = 2 \le 2$ … True!
• What if we try something bigger, say $n = 64$?
  • $T(64) = T(64/2) + 1 \le 2 \lg 64$
  • $T(64) = T(32) + 1 = (T(16) + 1) + 1 = ((T(8) + 1) + 1) + 1 = \dots$
  • We should notice a pattern: each halving adds 1. The result is that $T(64) = 7$.
  • And $2 \lg 64 = 2 \cdot 6 = 12$. So it is still true!
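The same kind of check works for this recurrence. The sketch assumes $T(1) = 1$ and tests only powers of two, $n = 2^e$, where $\lg n = e$ exactly; `T_halving` and `log_bound_holds` are illustrative names.

```cpp
#include <cassert>

// T(n) = T(n/2) + 1 with T(1) = 1.
long long T_halving(long long n) {
    assert(n >= 1);
    if (n == 1) return 1;
    return T_halving(n / 2) + 1;
}

// For n = 2^e with e = 1..max_exp, check T(n) <= c * lg n = c * e.
bool log_bound_holds(long long c, int max_exp) {
    assert(max_exp >= 1 && max_exp <= 62);
    for (int e = 1; e <= max_exp; ++e) {
        if (T_halving(1LL << e) > c * e) return false;
    }
    return true;
}
```

Here `T_halving(64)` is 7, which is below $2 \lg 64 = 12$, and `log_bound_holds(2, 40)` confirms the bound with c = 2 up to $n = 2^{40}$.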
Iterative Method
• The iterative method is another way to solve recurrences.
• It does not rely so much on proofs, but instead on recognizing recursive patterns.
• Example: $s(n) = c + s(n-1)$
    1. $s(n) = c + s(n-1)$
    2. $= c + c + s(n-2)$        (iterate through steps)
    3. $= 2c + s(n-2)$           (continue iterating)
    4. $= 2c + c + s(n-3)$
    5. $= 3c + s(n-3)$           (iterate until you see a pattern)
    …
    $= kc + s(n-k) = ck + s(n-k)$   (here we found a pattern)
Iterative Method
• So now that we found a pattern, we use the base case to find the upper bound (Big-O).
• So our pattern so far is:
  • $s(n) = ck + s(n-k)$
• What happens if $k = n$? Then, taking the base case $s(0) = 0$:
  • $s(n) = cn + s(0) = cn$
• Thus, in general,
  • $s(n) = cn$ and $s(n) = O(n)$
• Much easier than substitution. The only hard part is identifying the pattern.
Iterative Method Example: II
$s(n) = n + s(n-1)$
$= n + (n-1) + s(n-2) = 2n - 1 + s(n-2)$
$= 2n - 1 + (n-2) + s(n-3) = 3n - 3 + s(n-3)$
$= 3n - 3 + (n-3) + s(n-4) = 4n - 6 + s(n-4)$
$= 4n - 6 + (n-4) + s(n-5) = 5n - 10 + s(n-5)$
What do we have at the $k$-th step?
$s(n) = kn - \left[\frac{k(k+1)}{2} - k\right] + s(n-k)$
We use the arithmetic series to get the subtracted constants: $0 + 1 + \dots + (k-1) = \frac{k(k+1)}{2} - k$. Moving the $k$ out of the bracket gives
$s(n) = k(n+1) - \frac{k(k+1)}{2} + s(n-k)$
Iterative Method Example
$s(n) = k(n+1) - \frac{k(k+1)}{2} + s(n-k)$
So what if we say that $k = n$ (taking $s(0) = 0$)?
$s(n) = n(n+1) - \frac{n(n+1)}{2} + s(0)$
$s(n) = n^2 + n - \frac{n^2}{2} - \frac{n}{2}$
$s(n) = \frac{n^2}{2} + \frac{n}{2}$
We can drop the constant factor of 1/2 and the lower-order term, and we are left with
$s(n) = O(n^2)$
This gives the running time of the recurrence.
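Because the hard part of the iterative method is getting the pattern right, it is worth testing the pattern for every k before plugging in k = n. The sketch below assumes $s(0) = 0$ and writes the pattern as $s(n) = kn - k(k-1)/2 + s(n-k)$, which is the same quantity as $kn - [k(k+1)/2 - k] + s(n-k)$. The helper names are hypothetical.

```cpp
#include <cassert>

// s(n) = n + s(n-1) with s(0) = 0, evaluated straight from the recurrence.
long long s(long long n) {
    assert(n >= 0);
    return n == 0 ? 0 : n + s(n - 1);
}

// Right-hand side of the pattern after k unrolling steps:
// k*n - k(k-1)/2 + s(n-k).
long long unrolled(long long n, long long k) {
    assert(0 <= k && k <= n);
    return k * n - k * (k - 1) / 2 + s(n - k);
}

// True when the pattern agrees with s(n) for every k from 0 to n.
bool pattern_matches(long long n) {
    for (long long k = 0; k <= n; ++k) {
        if (unrolled(n, k) != s(n)) return false;
    }
    return true;
}
```

Setting k = n leaves $n^2 - n(n-1)/2 = n(n+1)/2$, so for example `unrolled(10, 10)` is 55, the same as `s(10)`.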
Iterative Method Example
$T(n) = 2T(n/2) + c$
$= 2(2T(n/2/2) + c) + c$
$= 2^2 T(n/2^2) + 2c + c$
$= 2^2 (2T(n/2^2/2) + c) + 3c$
$= 2^3 T(n/2^3) + 4c + 3c$
$= 2^3 T(n/2^3) + 7c$
$= 2^3 (2T(n/2^3/2) + c) + 7c$
$= 2^4 T(n/2^4) + 15c$
…
$= 2^k T(n/2^k) + (2^k - 1)c$
Iterative Method Example
• So far, for $n \ge 2^k$, we have $T(n) = 2^k T(n/2^k) + c(2^k - 1)$
• What if $k = \lg n$? Taking the base case $T(1) = c$:
  $T(n) = 2^{\lg n}\, T(n/2^{\lg n}) + c(2^{\lg n} - 1)$   (now we use our log rules: $2^{\lg n} = n$)
  $= n\,T(n/n) + c(n - 1)$
  $= n\,T(1) + c(n - 1)$
  $= nc + c(n - 1)$
  $= c(2n - 1)$
  so $T(n) = O(n)$
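The closed form $c(2n - 1)$ can be compared against the recurrence itself. This sketch assumes the base case $T(1) = c$ and n a power of two; `T_const_work` and `closed_form` are names chosen only for this example.

```cpp
#include <cassert>

// T(n) = 2T(n/2) + c with T(1) = c, for n a power of two.
long long T_const_work(long long n, long long c) {
    assert(n >= 1);
    if (n == 1) return c;
    return 2 * T_const_work(n / 2, c) + c;
}

// Closed form c(2n - 1) obtained by setting k = lg n in the pattern.
long long closed_form(long long n, long long c) {
    return c * (2 * n - 1);
}
```

For instance, with c = 3 and n = 1024 both functions give 6141.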
General Example: Iterative
$T(n) = aT(n/b) + cn$
$= a(aT(n/b/b) + cn/b) + cn$
$= a^2 T(n/b^2) + cn\,a/b + cn$
$= a^2 T(n/b^2) + cn(a/b + 1)$
$= a^2 (aT(n/b^2/b) + cn/b^2) + cn(a/b + 1)$
$= a^3 T(n/b^3) + cn(a^2/b^2) + cn(a/b + 1)$
$= a^3 T(n/b^3) + cn(a^2/b^2 + a/b + 1)$
…
$= a^k T(n/b^k) + cn(a^{k-1}/b^{k-1} + a^{k-2}/b^{k-2} + \dots + a^2/b^2 + a/b + 1)$
General Example: Iterative
• So we have
• $T(n) = a^k T(n/b^k) + cn(a^{k-1}/b^{k-1} + \dots + a^2/b^2 + a/b + 1)$
• What if we assume that $k = \log_b n$ (to make things easier)?
• Then that would mean $n = b^k$. Do we know why?
• With the base case $T(1) = c$:
  $T(n) = a^k T(1) + cn(a^{k-1}/b^{k-1} + \dots + a^2/b^2 + a/b + 1)$
  $= a^k c + cn(a^{k-1}/b^{k-1} + \dots + a^2/b^2 + a/b + 1)$
  $= c\,a^k + cn(a^{k-1}/b^{k-1} + \dots + a^2/b^2 + a/b + 1)$
  $= cn\,a^k/b^k + cn(a^{k-1}/b^{k-1} + \dots + a^2/b^2 + a/b + 1)$   (because $n/b^k = 1$)
  $= cn(a^k/b^k + \dots + a^2/b^2 + a/b + 1)$
General Example: Iterative
• So with $k = \log_b n$
• $T(n) = cn(a^k/b^k + \dots + a^2/b^2 + a/b + 1)$
• Now we need another "what if".
• What if $a = b$? Then every term in the parentheses is 1, and there are $k + 1$ of them:
  $T(n) = cn(k + 1)$
  $= cn(\log_b n + 1)$   (put back our old $k = \log_b n$)
  $= \Theta(n \log n)$
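General Example: Iterative
• So with $k = \log_b n$
• $T(n) = cn(a^k/b^k + \dots + a^2/b^2 + a/b + 1)$
• What if $a < b$? Recall that $x^k + x^{k-1} + \dots + x + 1 = \frac{x^{k+1} - 1}{x - 1}$
• With $x = a/b < 1$ the sum equals $\frac{1 - x^{k+1}}{1 - x}$, which is at least 1 and less than the constant $\frac{1}{1 - x}$, so it is $\Theta(1)$:
  $T(n) = cn \cdot \Theta(1) = \Theta(n)$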
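Both cases can be checked by evaluating the general recurrence for concrete values. The sketch below assumes the base case $T(1) = c$ and n an exact power of b; `T_general` is a name invented for this illustration.

```cpp
#include <cassert>

// T(n) = a*T(n/b) + c*n with T(1) = c, for n an exact power of b.
long long T_general(long long n, long long a, long long b, long long c) {
    assert(n >= 1 && a >= 1 && b >= 2);
    if (n == 1) return c;
    return a * T_general(n / b, a, b, c) + c * n;
}
```

With c = 1 and n = 1024, the case a = b = 2 gives 11264, which is $n(\lg n + 1) = 1024 \cdot 11$, in line with $\Theta(n \log n)$. The case a = 1, b = 2 gives 2047, which stays below $2n$, in line with $\Theta(n)$.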