Créer une présentation
Télécharger la présentation

Télécharger la présentation
## Computer Science 212

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Computer Science 212**• Title: Data Structures and Algorithms • Instructor: Harry Plantinga • Text: Algorithm Design: Foundations, Analysis, and Internet Examples by Goodrich and Tamassia**Computer Science 212Data Structures and Algorithms**• The Heart of Computer Science • Data structures • Study of important algorithms • Algorithm analysis • Algorithm design techniquesPlus • Intelligent systems • Course Information: http://cs.calvin.edu/curriculum/cs/212/Grades on moodle**Why study DS & Algorithms?**• Some problems are difficult to solve and good solutions are known • Some “solutions” don’t always work • Some simple algorithms don’t scale well • Data structures and algorithms make good tools for addressing new problems • Fun! Joy! Esthetic beauty!**Place in the curriculum**• The study of interesting areas of computer science: upper-level electives • 108, 112: basic tools (like algebra to mathematics) • 212: More advanced tools. Introduction to the science of computing. (A little more mathematical than other core CS courses, but not as rigorous as a real math course.)**Example: Search and Replace**• Goal: replace one string with another • Which version is better? #!/usr/bin/perl my $a = shift; my $b = shift; my $input = ""; while (<STDIN>) { $input .= $_; } while ($input =~ m/$a/) { $input =~ s/$a/$b/; } #!/usr/bin/perl my $a = shift; my $b = shift; my $input = ""; while (<STDIN>) { $input .= $_; } $input =~ s/$a/$b/g;**Empirical Analysis**• Implement • Gather runtime data for various inputs • Analyze runtime vs. size • Example: suppose we test swap1 • Input sizes: 1, 2, 4, 8, 16, 32, 64, 128, 256 • Runtimes: 70, 110, 250, 560, 810, 1750, 6510, 25680, 102050**Note that slope gives growth rate of runtimeNote that first**few points are inflated by startup costs**Empirical runtime analysis**• What can we say about the runtimes of these two algorithms? • Could give runtime as a function of input size, for a particular implementation • But would that apply on different hardware? • What about a different implementation in C++ instead of Perl? • Is there anything we can say that is true in general? • Is version 2 better than version 1 in general? Why?**Empirical analysis: conclusions**• Empirical analyses are implementation-dependent • You have to use representative sample data • You can learn something about the growth rate from the slope or curve of the runtime graph • Bad algorithms can’t be fixed with faster computers • Big companies (e.g. Microsoft) are not immune to bad algorithms**What’s the runtime?**fori 0 ton 1 do forj 0 ton 1 do ifA[i] A[j] then A[j] A[i]**Theoretical Analysis**• Uses a high-level description of the algorithm instead of an implementation • Characterizes running time as a function of the input size, n. • Takes into account all possible inputs, often analyzing the worst case • Allows us to evaluate the speed of an algorithm independent of the hardware/software environment**Example: find max element of an array**AlgorithmarrayMax(A, n) Inputarray A of n integers Outputmaximum element of A currentMaxA[0] fori1ton 1do ifA[i] currentMaxthen currentMaxA[i] returncurrentMax Pseudocode (§1.1) • High-level description of an algorithm • More structured than English prose • Less detailed than a program • Preferred notation for describing algorithms • Hides program design issues**Control flow**if…then…[else …] while … do … repeat…until… for…do… Indentation replaces braces Method declaration Algorithm method (arg [, arg…]) Input… Output… Method call var.method (arg [, arg…]) Return value returnexpression Expressions Assignment(like in Java) Equality testing(like in Java) n2 Superscripts and other mathematical formatting allowed Pseudocode Details**Questions…**• Can a program be asymptotically faster on one type of CPU vs another? • Do all CPU instructions take equally long?**2**1 0 The Random Access Machine (RAM) Model • A CPU • An potentially unbounded bank of memory cells, each of which can hold an arbitrary number or character • Memory cells are numbered and accessing any cell in memory takes unit time.**Basic computations performed by an algorithm**Identifiable in pseudocode Largely independent from any programming language Exact definition not important (constant number of machine cycles per statement) Assumed to take a constant amount of time in the RAM model Examples: Evaluating an expression Assigning a value to a variable Indexing into an array Calling a method Returning from a method Primitive Operations**By inspecting the pseudocode, we can determine the maximum**number of primitive operations executed by an algorithm, as a function of the input size AlgorithmarrayMax(A, n) # operations currentMaxA[0] 2 fori1ton 1do 2+n ifA[i] currentMaxthen 2(n 1) currentMaxA[i] 2(n 1) { increment counter i } 2(n 1) returncurrentMax 1 Total 7n 1 Counting Primitive Operations (§1.1)**Algorithm arrayMax executes 7n 1 primitive operations in**the worst case. Define: a = Time taken by the fastest primitive operation b = Time taken by the slowest primitive operation Let T(n) be worst-case time of arrayMax.Thena (7n 1) T(n)b (7n 1) Hence, the running time T(n) is bounded by two linear functions Estimating Running Time**Theoretical analysis: conclusion**• We can count the number of RAM-equivalent statements executed as a function of input size • All remaining implementation dependencies amount to multiplicative constants, which we will ignore • (But watch out for statements that take more than constant time, such as $input =~ s/$a/$b/g) • Linear, quadratic, etc. runtime is an intrinsic property of an algorithm**What’s the runtime?**int n; cin >> n; for (int i=0; i<n; i++) for (int j=0; j<n; j++) for (int k=0; k<n; k++) { cout << "Hello, "; cout << "greetings, "; cout << "bonjour, "; cout << "guten tag, "; cout << "Здравствуйте, "; cout << "你好 "; cout << " world!"; }**What’s the runtime?**int n; cin >> n; if (n<1000) for (int i=0; i<n; i++) for (int j=0; j<n; j++) for (int k=0; k<n; k++) cout << "Hello\n"; else for (int j=0; j<n; j++) for (int k=0; k<n; k++) cout << "world!\n";**Function Growth Rates**• Growth rates of functions: • Linear n • Quadratic n2 • Cubic n3 • In a log-log chart, the slope of the line corresponds to the growth rate of the function**Constant factors**• The growth rate is not affected by • constant factors or • lower-order terms • Examples • 102n+105is a linear function • 105n2+ 108nis a quadratic function**Asymptotic (big-O) Notation (§1.2)**• Given functions f(n) and g(n), we say that f(n) is O(g(n))if there are positive constantsc and n0 such that f(n)cg(n) for n n0 • Example: 2n+10 is O(n) • 2n+10cn • (c 2) n 10 • n 10/(c 2) • Pick c = 3 and n0 = 10**Example**• Example: the function n2is not O(n) • n2cn • n c • The above inequality cannot be satisfied since c must be a constant**More Big-O Examples**• 7n-2 7n-2 is O(n) need c > 0 and n0 1 such that 7n-2 c•n for n n0 this is true for c = 7 and n0 = 1 • 3n3 + 20n2 + 5 3n3 + 20n2 + 5 is O(n3) need c > 0 and n0 1 such that 3n3 + 20n2 + 5 c•n3 for n n0 this is true for c = 4 and n0 = 21 • 3 log n + log log n 3 log n + log log n is O(log n) need c > 0 and n0 1 such that 3 log n + log log n c•log n for n n0 this is true for c = 4 and n0 = 2**Asymptotic analysis of functions**• Asymptotic analysis is equivalent to • ignoring multiplicative constants • ignoring lower-order terms • “for large enough inputs” • Big-O and growth rate • Big-O gives an upper bound on the growth rate of a function • Think of it as <= [asymptotically speaking]**Big-O Rules**• If is f(n) a polynomial of degree d, then f(n) is O(nd), i.e., • Drop lower-order terms • Drop constant factors • Use the smallest possible class of functions, if possible • Say “2n is O(n)”instead of “2n is O(n2)” • (The former is a stronger statement) • Use the simplest expression of the class • Say “3n+5 is O(n)”instead of “3n+5 is O(3n)”**Asymptotic Algorithm Analysis**• Asymptotic analysis: determine the runtime in big-O notation • To perform the asymptotic analysis • Find the worst-case number of primitive operations executed as a function of the input size • Express this function with big-O notation • Example: • We determine that algorithm arrayMax executes at most 7n 1 primitive operations • We say that algorithm arrayMax“runs in O(n) time” • Since constant factors and lower-order terms are eventually dropped anyway, we can disregard them when counting primitive operations**Computing Prefix Averages**• Runtime analysis example: Two algorithms for prefix averages • The i-th prefix average of an array X is average of the first (i+ 1) elements of X: A[i]= (X[0] +X[1] +… +X[i])/(i+1) • Computing the array A of prefix averages of another array X has applications to financial analysis**Prefix Averages (Quadratic)**• The following algorithm computes prefix averages in quadratic time by applying the definition AlgorithmprefixAverages1(X, n) Inputarray X of n integers Outputarray A of prefix averages of X #operations A new array of n integers n fori0ton 1do n sX[0] n forj1toido 1 + 2 + …+ (n 1) ss+X[j] 1 + 2 + …+ (n 1) A[i]s/(i+ 1)n returnA 1**Prefix Averages (Linear)**• The following algorithm computes prefix averages in linear time by keeping a running sum AlgorithmprefixAverages2(X, n) Inputarray X of n integers Outputarray A of prefix averages of X #operations A new array of n integers n s 0 1 fori0ton 1do n ss+X[i] n A[i]s/(i+ 1)n returnA 1 • Algorithm prefixAverages2 runs in O(n) time**The running time of prefixAverages1 isO(1 + 2 + …+ n)**The sum of the first n integers is n(n+ 1) / 2 There is a simple visual proof of this fact Thus, algorithm prefixAverages1 runs in O(n2) time Arithmetic Progression**What’s the runtime?**int n; cin >> n; for (int i=0; i<n; i++) for (int j=0; j<n; j++) for (int k=0; k<n; k++) cout << “Hello world!\n”; 2n3+n2+n+2? O(n3) runtime What if the last line is replaced by: string *s=new string(“Hello world!\n”); O(n3) time and space**What’s the runtime?**int n; cin >> n; for (int i=0; i<n; i++) for (int j=0; j<n; j++) for (int k=0; k<n; k++) cout << “Hello world!\n”; for (int i=0; i<n; i++) for (int j=0; j<n; j++) for (int k=0; k<n; k++) cout << “Hello world!\n”; O(n3) + O(n3) = O(n3) Statements or blocks in sequence: add**What’s the runtime?**int n; cin >> n; for (int i=0; i<n; i++) for (int j=n; j>1; j/=2) cout << “Hello world!\n”; Loops: add up cost of each iteration (multiply loop cost by number of iterations if they all take the same time) log n iterations of n steps O(n log n)**What’s the runtime?**int n; cin >> n; for (int i=1; i<=n; i++) for (int j=1; j<=i; j++) cout << “Hello world!\n”; Loops: add up cost of each iteration 1 + 2 + 3 + … + n = n(n+1)/2 = Q(n2)**What’s the runtime?**template <class Item> void insert(Item a[], int l, int r) { int i; for (i=r; i>l; i--) compexch(a[i-1],a[i]); for (i=l+2; i<=r; i++) { int j=i; Item v=a[i]; while (v<a[j-1]) { a[j] = a[j-1]; j--; } a[j] = v; } }**Math you need to Review**• properties of logarithms: logb(xy) = logbx + logby logb (x/y) = logbx - logby Logb xa = a logb x logba = logxa/logxb • properties of exponentials: a(b+c) = aba c abc = (ab)c ab /ac = a(b-c) b = a logab bc = a c*logab • Summations (Sec. 1.3.1) • Logarithms and Exponents (Sec. 1.3.2) • Proof techniques (Sec. 1.3.3) • Basic probability (Sec. 1.3.4)**Relatives of Big-Oh**• big-Omega • f(n) is (g(n)) if there is a constant c > 0 and an integer constant n0 1 such that f(n) c•g(n) for n n0 • big-Theta • f(n) is (g(n)) if there are constants c’ > 0 and c’’ > 0 and an integer constant n0 1 such that c’•g(n) f(n) c’’•g(n) for n n0 • little-o • f(n) is o(g(n)) if, for any constant c > 0, there is an integer constant n0 0 such that f(n) c•g(n) for n n0 • little-omega • f(n) is (g(n)) if, for any constant c > 0, there is an integer constant n0 0 such that f(n) c•g(n) for n n0**Intuition for Asymptotic Notation**Big-Oh • f(n) is O(g(n)) if f(n) is asymptotically less than or equal to g(n) big-Omega • f(n) is (g(n)) if f(n) is asymptotically greater than or equal to g(n) big-Theta • f(n) is (g(n)) if f(n) is asymptotically equal to g(n) little-oh • f(n) is o(g(n)) if f(n) is asymptotically strictly less than g(n) little-omega • f(n) is (g(n)) if is asymptotically strictly greater than g(n)**Example Uses of the Relatives of Big-Oh**• 5n2 is (n2) f(n) is (g(n)) if there is a constant c > 0 and an integer constant n0 1 such that f(n) c•g(n) for n n0 let c = 5 and n0 = 1 • 5n2 is (n) f(n) is (g(n)) if there is a constant c > 0 and an integer constant n0 1 such that f(n) c•g(n) for n n0 let c = 1 and n0 = 1 • 5n2 is (n) f(n) is (g(n)) if, for any constant c > 0, there is an integer constant n0 0 such that f(n) c•g(n) for n n0 need 5n02 c•n0 given c, the n0 that satisfies this is n0 c/5 0**Asymptotic Analysis: Review**• What does it mean to say that an algorithm has runtime O(n log n)? • n: Problem size • Big-O: upper bound over all inputs of size n • “Ignore constant factor” (why?) • “as n grows large” O: like <= for functions (asymptotically speaking) W: like >= Q: like =**Asymptotic notation: examples**• Asymptotic runtime, in terms of O, W, Q? • Suppose the runtime for a function is • n2 + 2n log n + 40 • 0.0000001 n2+ 1000000n1.999 • n3 + n2 log n • n2.0001 + n2 log n • 2n+ 100 n2 • 1.00001n+ 100 n97**Asymptotic comparisons**• 0.0000001 n2 = O(1000000n1.999 )? • n1.000001 = O(n log n)? • 1.0001n = O(n943)? • lg n = Q(ln n)? (Evaluate the limit of the quotient of the functions) No – a polynomial with a higher power dominates one with a lower power No – all polynomials (n.000001) dominate any polylog (log n) No – all exponentials dominate any polynomial Yes – different bases are just a constant factor difference**Estimate the runtime**• Suppose an algorithm has runtime Q(n3) • suppose solving a problem of size 1000 takes 10 seconds. How long to solve a problem of size 10000? • Suppose an algorithm has runtime Q(n log n) • suppose solving a problem of size 1000 takes 10 seconds. How long to solve a problem of size 10000? runtime 10-8 n3; if n=10000, runtime 10000s = 2.7hr runtime 10-3 n lg n; if n=10000, runtime 133 secs