1 Data Structures and Algorithm Analysis

Programs = Algorithms + Data Structures

• Algorithms:
  • Describe how to solve a problem step by step
  • Good characteristics:
    • robustness
    • efficiency
    • correctness
• Data Structures:
  • Provide a logical basis for organizing data
  • Examples: linked lists, queues, stacks, trees, graphs

2 The maximum contiguous subsequence sum problem

Given integers A_1, A_2, ..., A_n, find the contiguous subsequence A_i, ..., A_j that maximizes the sum $\sum_{k=i}^{j} A_k$.
The maximum sum is defined to be zero if all the integers are negative.

Example 1: -2 11 -4 13 -5 2
The sum of the sequence from the 2nd to the 4th number (11 -4 13) is 20, which is the largest possible value.

Example 2: 1 -3 4 -2 -1 6
The sum of the sequence from the 3rd to the 6th number (4 -2 -1 6) is 7, which is the largest possible value.

3 Algorithm 1: conduct an exhaustive search (a brute-force algorithm)

It computes the sum of every contiguous subsequence: the sequences of 1 number, of 2 numbers, and so on, starting at the first position, then at the 2nd position, and so on until the last position.

    max = 0
    for (i = 0; i < length; i++)
        for (j = i; j < length; j++)
            sum = 0
            for (k = i; k <= j; k++)
                sum += a[k]
            if (sum > max)
                max = sum
                start = i
                end = j

Time complexity: O(n^3)

Example: the array a holds the numbers -2 11 -4 13 -5

Trace:
(1) i=0
    j=0  sum = -2
    j=1  sum = -2+11 = 9
    j=2  sum = -2+11-4 = 5
    j=3  sum = -2+11-4+13 = 18
    j=4  sum = -2+11-4+13-5 = 13
    Result: max = 18, i=0, j=3

4 Cont. trace

(2) i=1
    j=1  sum = 11
    j=2  sum = 11-4 = 7
    j=3  sum = 11-4+13 = 20
    j=4  sum = 11-4+13-5 = 15
    Result: max = 20, i=1, j=3
(3) i=2
    j=2  sum = -4
    j=3  sum = -4+13 = 9
    j=4  sum = -4+13-5 = 4
    max, i and j remain unchanged
(4) i=3
    j=3  sum = 13
    j=4  sum = 13-5 = 8
    max, i and j remain unchanged
(5) i=4
    j=4  sum = -5
    max, i and j remain unchanged
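The exhaustive search traced above could be written in C++ roughly as follows. This is a minimal sketch, not a definitive implementation: the function name `maxSubCubic`, the `long long` accumulator, and the convention of reporting `start = end = -1` when the best sum is 0 are assumptions made for illustration.

```cpp
#include <vector>

// Algorithm 1 (exhaustive search), O(n^3).
// Returns the best sum. start/end receive 0-based inclusive indices,
// or -1 when the best sum stays 0 (for example, all elements negative).
// The name and the -1 convention are illustrative assumptions.
long long maxSubCubic(const std::vector<int>& a, int& start, int& end) {
    long long best = 0;
    start = end = -1;
    const int n = static_cast<int>(a.size());
    for (int i = 0; i < n; ++i) {
        for (int j = i; j < n; ++j) {
            long long sum = 0;
            for (int k = i; k <= j; ++k) {
                sum += a[k];              // recompute a[i..j] from scratch
            }
            if (sum > best) {
                best = sum;
                start = i;
                end = j;
            }
        }
    }
    return best;
}
```

Called on the array -2 11 -4 13 -5, the function returns 20 with start = 1 and end = 3, matching the trace.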

5 Algorithm 2

Observation: $\sum_{k=i}^{j} A_k = A_j + \sum_{k=i}^{j-1} A_k$

Sequence: -2 11 -4 13 -5
Example: once we have calculated -2+11-4+13 = 18, only one more addition is needed to obtain the sum of the entire sequence: 18-5 = 13.

The algorithm:

    max = 0
    for (i = 0; i < length; i++)
        sum = 0
        for (j = i; j < length; j++)
            sum += a[j]
            if (sum > max)
                max = sum; start = i; end = j

Time complexity: O(n^2)
The statement sum += a[j] adds one more number to the running sum, which removes the redundant work of recomputing each sum from scratch.
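A possible C++ rendering of Algorithm 2 is sketched below. The function name `maxSubQuadratic`, the `long long` accumulator, and the `-1` index convention for a zero result are assumptions for illustration only.

```cpp
#include <vector>

// Algorithm 2, O(n^2): extend the running sum by one element per step
// instead of recomputing it. Names and the -1 convention are assumptions.
long long maxSubQuadratic(const std::vector<int>& a, int& start, int& end) {
    long long best = 0;
    start = end = -1;
    const int n = static_cast<int>(a.size());
    for (int i = 0; i < n; ++i) {
        long long sum = 0;                // sum of a[i..j], grown incrementally
        for (int j = i; j < n; ++j) {
            sum += a[j];
            if (sum > best) {
                best = sum;
                start = i;
                end = j;
            }
        }
    }
    return best;
}
```

The only change from the exhaustive search is that the innermost loop disappears, which is where the factor of n is saved.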

6 Algorithm 3

Observation: a negative number at the beginning or at the end of a subsequence never contributes to the maximum.
Let A_{i,j} be the sequence from position i to position j, and let S_{i,j} be its sum. If S_{i,j} < 0 and q > j, then A_{i,q} is not the maximum subsequence.

Example: 2 -1 -3 6
A_{i,j} = A_{0,2} and S_{0,2} = -2, so A_{0,3} is not the maximum.

Improvement: when the running sum becomes negative, advance the index i to j+1. In the example above, i would be changed to 3.

7 Cont. algorithm 3

    i = 0; max = 0; sum = 0
    for (j = 0; j < length; j++)
        sum += a[j]
        if (sum > max)
            max = sum
            start = i; end = j
        else if (sum < 0)
            i = j + 1
            sum = 0

Time complexity: O(n)
Basic idea: as long as the running sum is not negative, add another number to it. Otherwise, start over from the next position.

Trace for the sequence -2 11 -4 13 -5:
(1) j=0  a[j] = -2  i=0  sum = -2  max = 0   reset: i=1, sum=0
(2) j=1  a[j] = 11  i=1  sum = 11  max = 11  start=1  end=1
(3) j=2  a[j] = -4       sum = 7   max = 11
(4) j=3  a[j] = 13       sum = 20  max = 20  start=1  end=3
(5) j=4  a[j] = -5       sum = 15  max = 20  start=1  end=3
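The linear-time idea can be sketched in C++ as below. The function name `maxSubLinear`, the `long long` accumulator, and the `-1` index convention for a zero result are illustrative assumptions; the control flow follows the pseudocode of Algorithm 3.

```cpp
#include <vector>

// Algorithm 3, O(n): a single pass that restarts whenever the running
// sum drops below zero. Names and the -1 convention are assumptions.
long long maxSubLinear(const std::vector<int>& a, int& start, int& end) {
    long long best = 0;
    long long sum = 0;
    int i = 0;                            // start of the current candidate
    start = end = -1;
    const int n = static_cast<int>(a.size());
    for (int j = 0; j < n; ++j) {
        sum += a[j];
        if (sum > best) {
            best = sum;
            start = i;
            end = j;
        } else if (sum < 0) {
            i = j + 1;                    // a negative prefix can never help
            sum = 0;
        }
    }
    return best;
}
```

On the sequence 2 -1 -3 6 the running sum becomes negative at j = 2, so i moves to 3 and the result is the single element 6.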

8 Summary

Algorithm 1 is O(n^3), Algorithm 2 is O(n^2), and Algorithm 3 is O(n); all three are correct algorithms.
One important issue in the design of algorithms is efficiency.

9 System life cycle

• Specification
  • input, output, expected performance, features, and a sample of execution
• System Analysis
  • choose top-down design or OOD
• Design
  • create ADTs
  • algorithm design
• Refinement and Coding
  • choose representation and implementation
• Verification
  • correctness analysis and testing
• Documentation
  • manual and program comments

10 Algorithm Definition

Definition: A specification of
(1) the sequence of steps required to perform a task, and
(2) the data objects used in performing each step

Properties:
(1) finite number of steps
(2) must terminate
(3) each step should be precise in meaning
(4) has generality

Representations:
(1) flow chart
(2) pseudocode: between English and a programming language

11 Algorithm Style

Problem: compute a company's weekly payroll

    total = 0
    read employee_id
    while employee_id do
        read rate
        read hours
        if hours <= 40 then
            wage = hours * rate
        else
            wage = (40 * rate) + (hours - 40) * 1.5 * rate
        total += wage
        read employee_id
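The payroll pseudocode could be translated to C++ along the following lines. This is only a sketch: the `TimeCard` record and the function names are hypothetical, and input is passed in as a vector rather than read interactively.

```cpp
#include <vector>

// Hypothetical record holding one employee's weekly data.
struct TimeCard {
    double rate;   // hourly rate
    double hours;  // hours worked this week
};

// Wage rule from the pseudocode: hours beyond 40 are paid at 1.5 times the rate.
double weeklyWage(double rate, double hours) {
    if (hours <= 40.0) {
        return hours * rate;
    }
    return 40.0 * rate + (hours - 40.0) * 1.5 * rate;
}

// Sums the wages of all employees, mirroring the "total += wage" step.
double totalPayroll(const std::vector<TimeCard>& cards) {
    double total = 0.0;
    for (const TimeCard& c : cards) {
        total += weeklyWage(c.rate, c.hours);
    }
    return total;
}
```

Separating the wage rule from the accumulation loop keeps each step precise in meaning, which is one of the algorithm properties listed earlier.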

12 Algorithm Complexity Analysis

• Objectives:
  • estimate the running time of an algorithm
  • improve the efficiency
• Definition of the big-O notation:
  T(n) = O(f(n)) if there are positive constants C and n0 such that
  T(n) <= C * f(n) when n >= n0
• Discussion:
  • T(n) - execution time
  • n - input data size
  • f(n) - bounding function (the estimated limit on the time)
• The big-O notation is used to measure the growth of functions

13 Big-O Cont.

Example: T(n) = n^2 + 4n, f(n) = n^2
Prove: T(n) = O(f(n))
Choose the constants C = 2 and n0 = 4.
When n >= 4 we have 4n <= n^2, so n^2 + 4n <= 2n^2.
Conclusion: n^2 + 4n = O(n^2)

14 Remarks on the big-O

• When considering the running time versus the size of the input, there are the average time, the best time and the worst time
• In algorithm analysis, we are interested in the worst-time analysis

The big-O arithmetic

Rule 1: If T(n) = O(f(n)) and f(n) = O(g(n)), then T(n) = O(g(n))

Proof: Given
    T(n) <= C1 * f(n) for all n > n0'
    f(n) <= C2 * g(n) for all n > n0''
substituting the bound on f(n) gives
    T(n) <= C1 * C2 * g(n) = C3 * g(n) for all n > max(n0', n0''), where C3 = C1 * C2
Conclusion: T(n) = O(g(n))

15 Cont. big-O Arithmetic

Rule 2: If T1(n) = O(f(n)) and T2(n) = O(g(n)), then

(a) T1(n) + T2(n) = O(max(f(n), g(n)))
    Example: T1(n) = 2n^2 + 4n = O(n^2), so f(n) = n^2
             T2(n) = 4n^3 + 2n = O(n^3), so g(n) = n^3
             T1(n) + T2(n) = 2n^2 + 4n + 4n^3 + 2n = O(n^3)

(b) T1(n) * T2(n) = O(f(n) * g(n))
    Example: T1(n) = 3n^2 = O(n^2), so f(n) = n^2
             T2(n) = 4n^5 = O(n^5), so g(n) = n^5
             T1(n) * T2(n) = 3n^2 * 4n^5 = 12n^7 = O(n^7)

16 Cont. of big-O rules

• Rule 3: If T(n) is a polynomial of degree x, then T(n) = O(n^x),
  where n is the variable and x is the highest power
• Examples: n^5 + 4n^4 + 8n^2 + 9 = O(n^5)
            7n^3 + 2n + 19 = O(n^3)
• Remarks:
  • 2n^2 = O(n^3) and 2n^2 = O(n^2) are both correct, but O(n^2) is the tighter (more accurate) bound
  • The following notations should not be used, because constant factors and lower-order terms are dropped:
    - O(2n^2) (write O(n^2))
    - O(5) (write O(1))
    - O(n^2 + 10n) (write O(n^2))
  • The big-O notation is an upper bound

17 Big-O comparison

Determine the relative growth rates by computing $\lim_{n \to \infty} f(n)/g(n)$:
(1) if the limit is 0, then f(n) = O(g(n))
(2) if the limit is a constant C ≠ 0, then f(n) = O(g(n)) and g(n) = O(f(n))
(3) if the limit is ∞, then g(n) = O(f(n))
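As a worked illustration of the three cases of the limit test, consider the following pairs of functions. The pairs are chosen here for illustration and do not come from the slides.

```latex
% Case (1): limit 0, so n = O(n^2)
\lim_{n \to \infty} \frac{n}{n^2} = \lim_{n \to \infty} \frac{1}{n} = 0

% Case (2): nonzero constant, so n^2 + 4n = O(n^2) and n^2 = O(n^2 + 4n)
\lim_{n \to \infty} \frac{n^2 + 4n}{n^2} = \lim_{n \to \infty} \left(1 + \frac{4}{n}\right) = 1

% Case (3): limit infinity, so n^2 = O(2^n)
\lim_{n \to \infty} \frac{2^n}{n^2} = \infty
```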

O(1) < O(log n) < O(n) < O(n log n) < O(n^2) < O(2^n) < O(n!) < O(2^(2^n))

Example: TSP has O(n!) time complexity
    5! = 120              10! = 3,628,800
    15! = 1.3 * 10^12     20! = 2.43 * 10^18
    25! = 1.55 * 10^25    30! = 2.65 * 10^32

18 Execution time comparison

    Size        10             30              50
    O(log n)    0.00004 sec    0.00009 sec     0.00012 sec
    O(n)        0.0001 sec     0.0003 sec      0.0005 sec
    O(n^2)      0.01 sec       0.09 sec        0.25 sec
    O(n^5)      0.1 sec        24.3 sec        5.2 minutes
    O(2^n)      0.001 sec      17.9 minutes    35.7 years
    O(3^n)      0.059 sec      6.5 years       2 * 10^8 centuries

An algorithm is useless in practice (the problem is effectively unsolvable) if
- it takes too long to compute
- it requires too much memory

19 General Rules for Computing Time Complexity

(1) O(1) if the execution time is not determined by the input data
(2) Apply addition if the code segments are placed consecutively
(3) Apply multiplication if the code segments are nested
(4) O(log n) if the number of executions is reduced by half repeatedly
(5) O(2^n) if the number of executions is doubled repeatedly
(6) Always consider the worst case
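Rules (3) and (4) can be checked empirically by counting loop iterations. The two helper functions below are hypothetical and exist only to make the counts visible; they are a sketch, not part of the original material.

```cpp
// Rule (3): nested loops multiply. An n-by-n loop nest runs n * n times.
long long nestedIterations(int n) {
    long long count = 0;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            ++count;
        }
    }
    return count;
}

// Rule (4): halving the remaining work repeatedly gives about log2(n) steps.
int halvingIterations(int n) {
    int count = 0;
    while (n > 1) {
        n /= 2;        // integer division: the problem size is cut in half
        ++count;
    }
    return count;
}
```

For n = 1024 the nested loop executes 1,048,576 times while the halving loop executes only 10 times, which is the gap between O(n^2) and O(log n).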

20 Big-O examples

(1) cin >> n;
    for (k = 1; k <= n; k++)
        for (j = 10; j <= k; j++)
            for (m = 1; m <= n + 10; m++)
                a = 10;
    Time complexity: O(n^3)

(2) cin >> n;
    for (j = 0; j <= myfunc(n); j++) {
        cout << j;
        for (k = 10; k <= n + 9; k++)
            cout << j << k;
    }
    Time complexity (counting loop iterations and assuming each call to myfunc takes constant time):
    (a) O(n^2) if myfunc(n) returns n
    (b) O(n log n) if myfunc(n) returns log n

21 Big-O examples (2)

(3) cin >> n;
    total = 0;
    for (j = 1; j < n; j++) {
        cin >> x;
        total = x + total;
        if ((total % 2) == 1)
            cout << total;
        else
            for (k = 1; k <= j; k++)
                cout << k;
    }
    Time complexity: O(n^2) (worst case: the else branch is taken on every iteration)

22 Big-O examples (3)

(4) float f(float n)
    {
        float sum = 0;
        for (int j = 1; j < n; j++)
            sum = sum + j;
        return sum;
    }
    Time complexity: O(n)

(5) float g(float n)
    {
        float sum = 0;
        for (int j = 0; j < n; j++)
            sum = sum + f(j);   // each call to f(j) costs O(j)
        return sum;
    }
    Time complexity: O(n^2)
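To see where the O(n^2) bound for g comes from, the loop iterations can be counted directly. The sketch below uses hypothetical counting functions that mirror the loop structure of f and g, with int parameters instead of float (an assumption made to keep the counts exact).

```cpp
// Number of times the loop body of f(n) runs: j = 1 .. n-1.
long long fIterations(int n) {
    long long count = 0;
    for (int j = 1; j < n; ++j) {
        ++count;
    }
    return count;
}

// Total iterations of f's loop over all the calls f(0), f(1), ..., f(n-1)
// that g(n) makes.
long long gIterations(int n) {
    long long count = 0;
    for (int j = 0; j < n; ++j) {
        count += fIterations(j);
    }
    return count;
}
```

The total is 0 + 0 + 1 + 2 + ... + (n-2) = (n-1)(n-2)/2, a polynomial of degree 2, so g is O(n^2).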

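23 Big-O examples (4)

(6) void h(int n)
    {
        for (int j = 0; j < n; j++)
            cout << g(n) + f(n);   // g(n) is O(n^2), f(n) is O(n)
    }
    Time complexity: O(n^3)

(7) cin >> n;
    power = 1;
    for (j = 1; j < n; j++)
        power = power * 2;
    for (k = 1; k < power; k++)
        cout << k;
    Time complexity: O(2^n)

    Note: if the variables are declared as int j, n, k, power; then power overflows for large n (with a 32-bit int, once n reaches 32), so the second loop no longer behaves as analyzed.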

24 Big-O examples (5)

(8) cin >> n;
    x = 1;
    while (x < n)
        x = 2 * x;
    Time complexity: O(log n)

Exercises:
(1) Prove n^3 + 2n = O(n^3)
    The method: look for suitable values for C and n0

    Try 1: C = 1
      n     n^3 + 2n    n^3
      1     3           1
      5     135         125
      10    1020        1000
    ==> fail

25 Exercise cont.

    Try 2: C = 2
      n     n^3 + 2n    2n^3
      1     3           2
      2     12          16
      5     135         250
    Therefore C = 2 and n0 = 2 work, since 2n <= n^3 whenever n >= 2.

(2) Prove that 3^n is not O(2^n)
    Use the method of proof by contradiction.
    The general approach:
    (a) assume the theorem to be proved is false
    (b) show that this assumption implies that some known property is false
    (c) conclude that the original assumption is wrong

26 The proof

Assume 3^n is O(2^n).
Then there must exist two constants C and n0 such that
    3^n <= C * 2^n for all n >= n0
that is,
    (3/2)^n <= C for all n >= n0
But (3/2)^n grows without bound as n increases, so no such constant C can be found.
Conclusion: 3^n is not O(2^n)
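The contradiction can be made explicit by solving the assumed inequality for n. The derivation below is a sketch of that step; the logarithm base 3/2 is introduced here only for illustration.

```latex
3^n \le C \cdot 2^n
\iff \left(\tfrac{3}{2}\right)^n \le C
\iff n \le \log_{3/2} C = \frac{\ln C}{\ln(3/2)}

% Any n > \max(n_0, \log_{3/2} C) violates the inequality,
% so no pair (C, n_0) can satisfy the definition of big-O.
```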
