Mastering Data Structures with Vishnu Kotrajaras, PhD

Data Structures www.cp.eng.chula.ac.th/~vishnu Vishnu Kotrajaras, PhD.

Introduction
• Why study data structures?
• You can understand code written by others.
• You can choose the right data structure for any task.

Example: storing 5 numbers
[Figure: the numbers 1 to 5 stored as a linked list and as a tree (binary search tree), each reached through a pointer P.]

Choosing how to store
[Figure: the numbers 5, 4, 3, 2, 1 stored as a heap.]
• If we always want to retrieve the maximum value, a heap is the best choice.
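As a rough illustration of the heap idea, the sketch below uses the standard java.util.PriorityQueue with a reversed comparator, so the largest value is always removed first. The class name MaxHeapDemo and the helper drainDescending are made up for this example; the slides do not prescribe a particular heap implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class MaxHeapDemo {
    // Hypothetical helper: puts the values into a max-heap, then removes them one by one.
    static List<Integer> drainDescending(int[] values) {
        // PriorityQueue is a min-heap by default; reverseOrder() makes it behave as a max-heap.
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (int v : values) {
            heap.add(v);                 // O(log n) per insertion
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll());     // removes the current maximum, O(log n)
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(drainDescending(new int[] {2, 1, 4, 3, 5})); // prints [5, 4, 3, 2, 1]
    }
}
```

Reading the maximum without removing it (heap.peek()) takes constant time, which is the property the slide is pointing at.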

Estimating program speed
• Big O: T(N) = O(f(N)) if T(N) <= c*f(N),
• where c and N0 are constants and N >= N0.
• <see asymtotic.pdf page 2>
• This tells us how the running time of the program grows as the input grows.

Find the speed of the following code
    int sigmaOfSquare(int n) {          // calculate 1*1 + 2*2 + ... + n*n
        int tempSum;                    // 1 unit (declaration only)
        tempSum = 0;                    // 1 unit (assignment)
        for (int i = 1; i <= n; i++)    // 1 unit (i = 1), n+1 units (i <= n), n units (i++)
            tempSum += i * i;           // multiply, add, and assign, n times each: 3n units
        return tempSum;                 // 1 unit (return)
    }
• Total time is 5n + 5 units.

We want to have a simple picture
• It is better to use an approximate running time. That is Big O.
• In the example, the running time of the loop is dominant (the other running times become insignificant).
• The loop is performed n times.
• Therefore, Big O = O(n).
• The detailed time is 5n + 5, which matches O(n) -> (5n + 5 <= 6n for all n >= 5).
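The 5n + 5 figure can be checked mechanically. The sketch below re-runs the same computation while tallying cost units with the accounting used on the slide (one unit per declaration, assignment, test, increment, multiply, add, and return). The class StepCount and the units counter are names invented for this illustration.

```java
class StepCount {
    // Returns the number of cost units sigmaOfSquare(n) uses under the slide's accounting.
    static long units(int n) {
        long units = 0;
        int tempSum;          units += 1;   // declaration
        tempSum = 0;          units += 1;   // assignment
        int i = 1;            units += 1;   // loop initialisation
        while (true) {
            units += 1;                     // loop test i <= n (executed n + 1 times)
            if (!(i <= n)) break;
            tempSum += i * i; units += 3;   // multiply, add, assign
            i++;              units += 1;   // increment
        }
        units += 1;                         // return
        return units;
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 5, 10, 100}) {
            System.out.println("n=" + n + " units=" + units(n) + " 5n+5=" + (5L * n + 5));
        }
    }
}
```

For n = 10 the tally is 55 units, which is below 6n = 60, in line with the bound 5n + 5 <= 6n once n reaches 5.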

Finding Big O from various loops
• For loop -> its Big O is the number of repetitions (multiplied by the cost of the loop body).
• Nested loop:
    for (i = 1; i <= n; i++)        // n times
        for (j = 1; j <= n; j++)    // n times
            statements;
• Big O is O(n^2).

Finding Big O from various loops (cont 2.)
• Consecutive statements:
    for (i = 0; i <= n; i++)        // O(n)
        statement1;
    for (j = 0; j <= n; j++)        // O(n^2)
        for (k = 0; k <= n; k++)
            statement2;
• The answer is their max -> O(n^2).

Finding Big O from various loops (cont 3.)
• Big O definition for consecutive statements:
• If T1(N) = O(f(N)) and T2(N) = O(g(N)), then
• T1(N) + T2(N) = max(O(f(N)), O(g(N))).
• From the previous page -> f(n) = n and g(n) = n^2.
• The answer is therefore O(n^2).
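To see the "take the max" rule in numbers, the sketch below counts how often statement1 and statement2 from the consecutive-loops slide would run. The loops go from 0 to n inclusive, so the counts are n + 1 and (n + 1)^2. The class LoopCount and its counters are illustrative names only.

```java
class LoopCount {
    // Returns {times statement1 runs, times statement2 runs} for a given n.
    static long[] count(int n) {
        long first = 0, second = 0;
        for (int i = 0; i <= n; i++) {
            first++;                        // stands in for statement1
        }
        for (int j = 0; j <= n; j++) {
            for (int k = 0; k <= n; k++) {
                second++;                   // stands in for statement2
            }
        }
        return new long[] {first, second};
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            long[] c = count(n);
            System.out.println("n=" + n + " statement1=" + c[0] + " statement2=" + c[1]);
        }
    }
}
```

As n grows, the nested loop's count swamps the single loop's count, which is why the sum is reported as O(n^2).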

Finding Big O from various loops (cont 4.)
• Conditional statement:
    if (condition)
        Statement1      // O(f(n))
    else
        Statement2      // O(g(n))
• Use the max -> max(O(f(n)), O(g(n))), plus the cost of evaluating the condition.

Finding Big O from recursion
    int mymethod(int n) {
        if (n == 1) {
            return 1;
        } else {
            return 2 * mymethod(n - 1) + 1;
        }
    }
• The method is called n times, so Big O = O(n).
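A quick way to confirm the "n times" claim is to count the calls. The sketch below adds a counter to the same method; the class RecursionCount, the calls field, and the callsFor helper are additions made for this illustration.

```java
class RecursionCount {
    static int calls = 0;   // hypothetical counter, not part of the slide's code

    static int mymethod(int n) {
        calls++;                          // one unit of work per call
        if (n == 1) {
            return 1;
        } else {
            return 2 * mymethod(n - 1) + 1;
        }
    }

    // Resets the counter, runs mymethod(n), and reports how many calls were made.
    static int callsFor(int n) {
        calls = 0;
        mymethod(n);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("mymethod(5) = " + mymethod(5));      // prints mymethod(5) = 31
        System.out.println("calls for n=20: " + callsFor(20));   // prints calls for n=20: 20
    }
}
```

Each call does a constant amount of work and makes exactly one further call, so the number of calls equals n.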

Maximum Subsequence Sum: the algorithm does matter
• For integers A1, A2, …, An,
• the Maximum Subsequence Sum is the sum Ai + … + Aj (with 1 <= i <= j <= n) that gives the maximum value. It is the consecutive run of elements that gives the highest total.
• Example: -2, 11, -6, 16, -5, 7
• The sum of 11, -6, 16 is 21. But the maximum subsequence is 11, -6, 16, -5, 7 -> the sum is 23.
• 23 is the maximum subsequence sum.

Solving max sub sum: 1st method
    int maxSubSum01(int[] a) {
        int maxSum = 0;
        for (int i = 0; i < a.length; i++) {          // first index
            for (int j = i; j < a.length; j++) {      // last index
                int theSum = 0;
                for (int k = i; k <= j; k++) {        // sum from first to last
                    theSum += a[k];
                }
                if (theSum > maxSum) {                // store the max value
                    maxSum = theSum;
                }
            }
        }
        return maxSum;                                // return only after all loops finish
    }

Solving max sub sum: 1st method (cont.)
• This first method has Big O = O(n^3).
• Not good enough: there are too many redundant calculations.
• If we have already added the elements from index 0 to 2, then when we add the elements from index 0 to 3 we should not start the addition from scratch.

Solving max sub sum: 2nd method
    int maxSubSum02(int[] a) {
        int maxSum = 0;
        for (int i = 0; i < a.length; i++) {          // starting position
            int theSum = 0;
            for (int j = i; j < a.length; j++) {
                theSum += a[j];                       // do the addition from the starting position
                if (theSum > maxSum) {                // and collect the result
                    maxSum = theSum;
                }
            }
        }
        return maxSum;
    }
• Big O = O(n^2)

Solving max sub sum: 2nd method (cont.)
• Example array: -2, 11, -6, 4
• When i=0, j=0: theSum = -2, maxSum = 0.
• When i=0, j=1: theSum = -2 + 11 = 9, maxSum becomes 9.
• When i=0, j=2: theSum = 9 + (-6) = 3, maxSum is still 9.
• When i=0, j=3: theSum = 3 + 4 = 7, maxSum is still 9.

Solving max sub sum: 3rd method
• Use divide and conquer.
• The resulting subsequence may be
• in the left half of the array, or
• in the right half, or
• spanning the left half and the right half (it then contains the last element of the left half and the first element of the right half).

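Solving max sub sum: 3rd method (cont.)
[Figure: an example array split into a left half whose last element is -6 and a right half whose first element is 2.]
• Max sub sum within the left half is 7.
• Max sub sum within the right half is 10.
• Max sum on the left that includes its last element (-6) is 1.
• Max sum on the right that includes its first element (2) is 10.
• The max sub sum that spans the left side and the right side is therefore 1 + 10 = 11 (this is the final answer).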

Solving max sub sum: 3rd method (cont 2.)
    int maxSumDivideConquer(int[] array, int leftindex, int rightindex) {           // T(n)
        // assume that the array can be divided evenly.
        if (leftindex == rightindex) {      // base case
            if (array[leftindex] > 0)
                return array[leftindex];
            else
                return 0;                   // min value of maxSubSum
        }
        int centerindex = (leftindex + rightindex) / 2;
        int maxsumleft = maxSumDivideConquer(array, leftindex, centerindex);        // T(n/2)
        int maxsumright = maxSumDivideConquer(array, centerindex + 1, rightindex);  // T(n/2)

Solving max sub sum: 3rd method (cont 3.)
        int maxlefthalfSum = 0, lefthalfSum = 0;
        // max sum from the last element of the left side back to its first element.
        for (int i = centerindex; i >= leftindex; i--) {         // O(n/2)
            lefthalfSum = lefthalfSum + array[i];
            if (lefthalfSum > maxlefthalfSum) {
                maxlefthalfSum = lefthalfSum;
            }
        }

Solving max sub sum: 3rd method (cont 4.)
        int maxrighthalfSum = 0, righthalfSum = 0;
        // max sum from the first element of the right side to its last element.
        for (int i = centerindex + 1; i <= rightindex; i++) {    // O(n/2)
            righthalfSum = righthalfSum + array[i];
            if (righthalfSum > maxrighthalfSum) {
                maxrighthalfSum = righthalfSum;
            }
        }

Solving max sub sum: 3rd method (cont 5.)
        // finally, find the max of the three. This part takes constant time, so we can ignore it.
        return max3(maxsumleft, maxsumright, maxlefthalfSum + maxrighthalfSum);
    }
• Therefore the total time is T(n) = 2T(n/2) + 2O(n/2).
• Solving this equation with the Master method gives Big O = O(n log n).
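Because the 3rd method is spread over several slides, it may help to see it assembled in one place. The sketch below follows the slide code step by step; max3 is called on the slides but never shown, so the version here is an assumed helper, and the class name and the maxSubSum wrapper are likewise additions for this example.

```java
class MaxSubSumDivideConquer {
    // Assumed helper: the slides call max3 without defining it.
    static int max3(int a, int b, int c) {
        return Math.max(a, Math.max(b, c));
    }

    static int maxSumDivideConquer(int[] array, int leftindex, int rightindex) {
        if (leftindex == rightindex) {                 // base case: one element
            return Math.max(array[leftindex], 0);      // 0 is the minimum max sub sum
        }
        int centerindex = (leftindex + rightindex) / 2;
        int maxsumleft = maxSumDivideConquer(array, leftindex, centerindex);
        int maxsumright = maxSumDivideConquer(array, centerindex + 1, rightindex);

        // Best sum that ends at the last element of the left half.
        int maxlefthalfSum = 0, lefthalfSum = 0;
        for (int i = centerindex; i >= leftindex; i--) {
            lefthalfSum += array[i];
            maxlefthalfSum = Math.max(maxlefthalfSum, lefthalfSum);
        }
        // Best sum that starts at the first element of the right half.
        int maxrighthalfSum = 0, righthalfSum = 0;
        for (int i = centerindex + 1; i <= rightindex; i++) {
            righthalfSum += array[i];
            maxrighthalfSum = Math.max(maxrighthalfSum, righthalfSum);
        }
        return max3(maxsumleft, maxsumright, maxlefthalfSum + maxrighthalfSum);
    }

    // Hypothetical convenience wrapper for a whole array; an empty array gives 0.
    static int maxSubSum(int[] array) {
        return array.length == 0 ? 0 : maxSumDivideConquer(array, 0, array.length - 1);
    }

    public static void main(String[] args) {
        System.out.println(maxSubSum(new int[] {-2, 11, -6, 16, -5, 7})); // prints 23
    }
}
```

On the example array from the definition slide, the left half contributes 11, the right half 18, and the spanning case 5 + 18 = 23, so the spanning case wins.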

Solving max sub sum: 3rd method (cont 6.)
• We find the total Big O:
    T(n) = 2T(n/2) + 2O(n/2)
         = 2T(n/2) + O(n)
         = 2T(n/2) + cn          (O(n) <= c*n according to the definition)
• Dividing everything by n, we get:
    T(n)/n = T(n/2)/(n/2) + c                (1)

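Solving max sub sum: 3rd method (cont 7.)
• Applying the same recurrence, T(n) = 2T(n/2) + cn, to smaller and smaller sizes and dividing each by its size, we can create a series of equations:
    T(n/2)/(n/2) = T(n/4)/(n/4) + c          (2)
    T(n/4)/(n/4) = T(n/8)/(n/8) + c          (3)
    ...
    T(2)/2 = T(1)/1 + c                      (x)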

Solving max sub sum: 3rd method (cont 8.)
• Adding (1) + (2) + (3) + … + (x), we get:
    T(n)/n = T(1)/1 + c*log_2 n
• The terms that appear on both the left-hand and right-hand sides cancel each other out, and c is added log_2 n times.
• Multiplying both sides by n, we get:
    T(n) = n*T(1) + c*n*log_2 n
• Because T(1) is constant, we can conclude that
• Big O = O(n log n).

Solving max sub sum: 4th method
• We improve on the 2nd method, with two points to note:
• First, the first element of any maximum subsequence cannot be negative.
• For example: 3, -5, 1, 4, 7, -4
• -5 cannot be the first element of our result. It can only make the total smaller. Any single positive number gives a better result anyway.

Solving max sub sum: 4th method (cont.)
• Second, any subsequence whose sum is negative cannot begin the maximum subsequence.
• Suppose we are in the middle of the loop. Let i be the index of the first element of a subsequence and j be the index of its last element.
• Let the last element, a[j], be the one that makes the sum of this subsequence negative.
• Let p be any index between i+1 and j.
[Figure: a subsequence from index i to index j, with p in between.]

Solving max sub sum: 4th method (cont 2.)
• The next step of this loop -> increment j by one.
• If the new a[j] is negative, we will not get a better max sub sum. The max sub sum value will not change.
• If the new a[j] is positive, a[i]+…+a[j] will be greater than a[i]+…+a[j-1]. However, because a[i]+…+a[j-1] is negative, the new sum cannot even match a[j] alone, so a subsequence that starts at i cannot be the best one.
• Therefore, if we have a negative subsequence, we should not keep moving j. We should move i instead.

Solving max sub sum: 4th method (cont 3.)
• Should we increment i by only one, or by more?
• From our assumption, a[j] is the element that makes a[i]+…+a[j] negative, so every shorter sum a[i]+…+a[p-1] is not negative (p is any index between i+1 and j).
• Moving i forward to p removes that non-negative part, so a[p]+…+a[j] can only be smaller than or equal to a[i]+…+a[j]. It is still negative.
• If we want to get a larger max sub sum, we must start our subsequence from position j+1. Therefore i should be moved to j+1.
[Figure: a subsequence from index i to index j, with p in between.]

Solving max sub sum: 4th method (cont 4.)
    int maxsubsumOptimum(int[] array) {
        int maxSum = 0, theSum = 0;
        for (int j = 0; j < array.length; j++) {
            theSum = theSum + array[j];
            if (theSum > maxSum) {
                maxSum = theSum;
            } else if (theSum < 0) {    // if a[j] makes the sequence negative,
                theSum = 0;             // start again from position j+1.
            }
        }
        return maxSum;
    }
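To watch the restart rule in action, the sketch below runs the same single loop with an optional trace of theSum and maxSum after each element. The class name MaxSubSumTrace and the trace flag are additions for this illustration; the logic inside the loop is the one on the slide.

```java
class MaxSubSumTrace {
    // Same loop as the 4th method, with an optional printout after each element.
    static int maxsubsumOptimum(int[] array, boolean trace) {
        int maxSum = 0, theSum = 0;
        for (int j = 0; j < array.length; j++) {
            theSum = theSum + array[j];
            if (theSum > maxSum) {
                maxSum = theSum;
            } else if (theSum < 0) {
                theSum = 0;              // discard the negative run, restart at j + 1
            }
            if (trace) {
                System.out.println("j=" + j + " a[j]=" + array[j]
                        + " theSum=" + theSum + " maxSum=" + maxSum);
            }
        }
        return maxSum;
    }

    public static void main(String[] args) {
        int[] example = {3, -5, 1, 4, 7, -4};
        System.out.println("result = " + maxsubsumOptimum(example, true)); // last line: result = 12
    }
}
```

In the trace for 3, -5, 1, 4, 7, -4, theSum drops to -2 at j=1 and is reset to 0, after which the run 1, 4, 7 builds up to the answer 12. The array is scanned once, so the running time is O(n).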

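Logarithm in Big O
• If an algorithm spends constant time (O(1)) to cut a problem down to a fraction of its size (usually one half) and then continues with only that part, it has Big O = O(log n).
• The 3rd method of the maximum subsequence sum problem divides its array into equal subproblems in the same way, but it solves both halves and spends O(n) combining them, which is why it is O(n log n) rather than O(log n).
• Usually, we assume that all the data is already in the system. Otherwise, reading the data in takes O(n).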

Example: O(log n)
• Finding 5 in a sorted array.
• If we start from the first array member, it takes O(n) to find a number.
• But we know that the array is sorted:
• So we can look at the middle of the array and continue the search in either the left or the right half, depending on the value of that middle element.
• We keep searching by looking at the middle element of the subarray we are left with, and so on.
• This is called -> Binary Search.

    int binarySearch(int[] a, int x) {
        int left = 0, right = a.length - 1;
        while (left <= right) {
            int mid = (left + right) / 2;
            if (a[mid] < x) {
                left = mid + 1;
            } else if (a[mid] > x) {
                right = mid - 1;
            } else {
                return mid;
            }
        }
        return -1; // reaching this point means -> not found.
    }
• Big O = O(log_2 n)
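The logarithmic bound can be observed by counting how many middle elements the search inspects. The sketch below is the same binary search with a probe counter; the class BinarySearchProbes and the probes field are illustrative additions, and the midpoint is computed as left + (right - left) / 2, a common variant that avoids integer overflow on very large arrays.

```java
class BinarySearchProbes {
    static int probes;   // hypothetical counter: middle elements inspected by the last search

    static int binarySearch(int[] a, int x) {
        probes = 0;
        int left = 0, right = a.length - 1;
        while (left <= right) {
            probes++;
            int mid = left + (right - left) / 2;   // same midpoint, written to avoid overflow
            if (a[mid] < x) {
                left = mid + 1;
            } else if (a[mid] > x) {
                right = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;   // not found
    }

    public static void main(String[] args) {
        int[] a = new int[1024];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;                              // sorted array 0..1023
        }
        int index = binarySearch(a, 1023);
        System.out.println("index=" + index + " probes=" + probes);
    }
}
```

With 1024 elements, no search needs more than 11 probes, which is log_2 1024 + 1, compared with up to 1024 comparisons for a scan from the first element.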

Example: O(log n) (cont.)
• Greatest common divisor
    long gcd(long m, long n) {
        while (n != 0) {
            long rem = m % n;
            m = n;
            n = rem;
        }
        return m;
    }
• How do we find Big O? The rate at which the remainder shrinks tells us the Big O.
• In this program, the remainder decreases without any specific pattern.

Big O of gcd
• We use the following property:
• If M > N, then M mod N < M/2.
• Proof:
• If N <= M/2: the remainder of M mod N must be less than N, so it must also be less than M/2.
• If N > M/2: M divided by N gives a quotient of 1 and a remainder of M - N. Because N > M/2, the remainder M - N is less than M/2.
• If we look at the code for gcd:
• The remainder from the xth iteration is used as m in the (x+2)th iteration.
• Therefore the remainder from the (x+2)th iteration must be less than half the remainder from the xth iteration.
• Meaning -> after every 2 iterations, the remainder is reduced by at least half.
• The loop therefore runs at most about 2*log_2 n times, so Big O = O(log n).

Example: gcd(2564, 1988)
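The sequence of remainders for this example can be reproduced with a few lines of code. The sketch below runs the gcd loop from the earlier slide and records each remainder; the class GcdTrace and the remainders helper are names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class GcdTrace {
    // Same loop as gcd, but collects the remainder produced in each iteration.
    static List<Long> remainders(long m, long n) {
        List<Long> rems = new ArrayList<>();
        while (n != 0) {
            long rem = m % n;
            rems.add(rem);
            m = n;
            n = rem;
        }
        return rems;
    }

    static long gcd(long m, long n) {
        while (n != 0) {
            long rem = m % n;
            m = n;
            n = rem;
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.println(remainders(2564, 1988)); // prints [576, 260, 56, 36, 20, 16, 4, 0]
        System.out.println(gcd(2564, 1988));        // prints 4
    }
}
```

Looking two steps apart in the printed list (576 to 56, 260 to 36, 56 to 20, and so on), each remainder is less than half of the one two iterations earlier, even though consecutive remainders follow no fixed pattern.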

Example: O(log n) (cont 2.)
• Calculate x^n by divide and conquer.
    long power(long x, int n) {
        if (n == 0)
            return 1;
        if (n % 2 == 0)                         // n is even
            return power(x * x, n / 2);
        else
            return power(x * x, n / 2) * x;
    }
• Big O = O(log_2 n)
• The original problem is divided in half in each method call.
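Counting the method calls makes the halving visible. The sketch below is the same power method with a call counter; the class PowerCalls, the calls field, and the callsFor helper are additions for this example. Results are only meaningful while x^n fits in a long.

```java
class PowerCalls {
    static int calls = 0;   // hypothetical counter, not part of the slide's code

    static long power(long x, int n) {
        calls++;
        if (n == 0) {
            return 1;
        }
        if (n % 2 == 0) {
            return power(x * x, n / 2);         // even exponent: square the base, halve n
        } else {
            return power(x * x, n / 2) * x;     // odd exponent: one extra multiplication
        }
    }

    // Resets the counter, computes x^n, and reports how many calls were made.
    static int callsFor(long x, int n) {
        calls = 0;
        power(x, n);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(power(2, 10));       // prints 1024
        System.out.println(callsFor(2, 10));    // prints 5
    }
}
```

For n = 10 the exponent goes 10, 5, 2, 1, 0, so five calls are made instead of ten multiplications in a simple loop.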

O(log n) definition
• (log n)^k = O(n) when k is a constant.
• This tells us that a logarithmic function has a small growth rate.
• f(n) = log_a n has Big O = O(log_b n), where a and b are constants greater than 1.
• Any two logarithmic functions have the same growth rate.

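Any two logarithmic functions have the same growth rate: a proof
• Let x = log_a n and y = log_b n.
• Then a^x = n = b^y.
• Taking log_b of both sides gives x * log_b a = y, so log_a n = (1 / log_b a) * log_b n.
• 1 / log_b a is a constant, so log_a n = O(log_b n).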

Runtime: small (top) to large (bottom)
• c
• log n
• (log n)^k
• n
• n log n
• n^2
• n^3
• 2^n

Definitions other than Big O
• Big Omega (Ω): T(N) = Ω(g(N)) if there exist constants C and N0 such that
• T(N) >= C*g(N), where N >= N0.
• From the definition, if f(N) = Ω(N^2), then f(N) = Ω(N) and f(N) = Ω(N^(1/2)) as well.
• We should choose the most realistic (tightest) answer.

Definitions other than Big O (cont.)
• Big Theta (Θ)
• T(N) = Θ(h(N)) if T(N) = O(h(N)) and T(N) = Ω(h(N)).
• That is, there exist c1, c2, and N0 such that c1*h(N) <= T(N) <= c2*h(N), where N >= N0.

Definitions other than Big O (cont 2.)
• Small o
• T(N) = o(p(N)) if T(N) = O(p(N)) but T(N) ≠ Θ(p(N)).

Notes from the definitions
• T(N) = O(f(N)) has the same meaning as f(N) = Ω(T(N)).
• We can say f(N) is an “upper bound” of T(N), and T(N) is a “lower bound” of f(N).
• f(N) = N^2 and g(N) = 2N^2 have the same Big O and Big Ω. That is, f(N) = Θ(g(N)).
• f(N) = N^2 can have several Big O -> (O(N^3), O(N^4)), but the best value is O(N^2).
• We can write f(N) = Θ(N^2) to say that this value is the best Big O.

Thus, we have the latest definition:
• If T(N) is a polynomial of degree k, then T(N) = Θ(N^k).
• From here,
• if T(N) = 5N^4 + 4N^3 + N, we know that T(N) = Θ(N^4).

Best case, Worst case, Average case
• Worst case = the maximum possible running time.
• Best case = the minimum possible running time.
• Average case?
• For each input, see how long the program runs.
• Average case running time = total time over every input divided by the number of inputs.

Average case
• The average case definition is based on the assumption that:
• Each input has an equal chance of occurring.
• If we do not want that assumption,
• we must take the probability of each input into account.
• Average case = Σ (probability of input i * running time on input i), summed over all inputs i.

Example: Finding the average case
• Let’s say we want to find x in an array of size n.
• Best case: x is found in the first array slot.
• Worst case: x is in the array’s last slot, or x is not in the array at all.
• Average case:
• Assume each array slot has an equal chance of holding x.
• Therefore, the chance of x being in any given slot is 1/n.

Example: Finding the average case (cont.)
• Average case running time = 1/n * (steps used when finding x in the first slot) + 1/n * (steps used when finding x in the second slot) + ... + 1/n * (steps used when finding x in the last slot, or not finding x at all)
• = (1 + 2 + … + n) / n = (n+1)/2
• = O(n), the same Big O as the worst case.
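The (n+1)/2 average can be checked by brute force. The sketch below searches for every element of an array once with a plain sequential search, counts the slots examined, and averages the counts. The class AverageCase and its method names are made up for this example, and it assumes x is always present and equally likely to be in any slot.

```java
class AverageCase {
    // Number of slots a sequential search examines to find x (a.length if x is absent).
    static int steps(int[] a, int x) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == x) {
                return i + 1;          // found after examining i + 1 slots
            }
        }
        return a.length;               // not found: every slot was examined
    }

    // Average number of steps when each of the n slots is equally likely to hold x.
    static double averageSteps(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;                  // distinct values, so x = i sits in slot i
        }
        long total = 0;
        for (int x = 0; x < n; x++) {
            total += steps(a, x);      // 1 + 2 + ... + n in total
        }
        return (double) total / n;
    }

    public static void main(String[] args) {
        System.out.println(averageSteps(9));    // prints 5.0
        System.out.println(averageSteps(100));  // prints 50.5
    }
}
```

Both results match (n+1)/2, which grows linearly with n just like the worst case of n steps.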
