
CSE 326: Data Structures Introduction & Part One: Complexity



  1. CSE 326: Data Structures Introduction & Part One: Complexity Henry Kautz Autumn Quarter 2002

  2. Overview of the Quarter • Part One: Complexity • inductive proofs of program correctness • empirical and asymptotic complexity • order of magnitude notation; logs & series • analyzing recursive programs • Part Two: List-like data structures • Part Three: Sorting • Part Four: Search Trees • Part Five: Hash Tables • Part Six: Heaps and Union/Find • Part Seven: Graph Algorithms • Part Eight: Advanced Topics

  3. Material for Part One • Weiss Chapters 1 and 2 • Additional material • Graphical analysis • Amortized analysis • Stretchy arrays • Any questions on course organization?

  4. Program Analysis • Correctness • Testing • Proofs of correctness • Efficiency • How to define? • Asymptotic complexity - how running time scales as function of size of input

  5. Proving Programs Correct • Often takes the form of an inductive proof • Example: summing an array

    int sum(int v[], int n) {
        if (n == 0) return 0;
        else return v[n-1] + sum(v, n-1);
    }

What are the parts of an inductive proof?

  6. Inductive Proof of Correctness

    int sum(int v[], int n) {
        if (n == 0) return 0;
        else return v[n-1] + sum(v, n-1);
    }

Theorem: sum(v,n) correctly returns the sum of the first n elements of array v, for any n. Basis Step: Program is correct for n=0: it returns 0. ✓ Inductive Hypothesis (n=k): Assume sum(v,k) returns the sum of the first k elements of v. Inductive Step (n=k+1): sum(v,k+1) returns v[k]+sum(v,k), which is the sum of the first k+1 elements of v. ✓
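
To sanity-check sum() concretely, a minimal test harness (repeating the function so the snippet stands alone; the array contents are just an illustration):

    #include <stdio.h>
    #include <assert.h>

    int sum(int v[], int n) {
        if (n == 0) return 0;
        else return v[n-1] + sum(v, n-1);
    }

    int main(void) {
        int v[] = {3, 1, 4, 1, 5};
        assert(sum(v, 0) == 0);    /* basis step: empty prefix sums to 0 */
        assert(sum(v, 5) == 14);   /* 3+1+4+1+5 */
        printf("all tests passed\n");
        return 0;
    }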

  7. Proof by Contradiction • Assume negation of goal, show this leads to a contradiction • Example: there is no program that solves the “halting problem” • Determines if any other program runs forever or not Alan Turing, 1937

  8. Program NonConformist (Program P)

    If ( HALT(P) = "never halts" ) Then
        Halt
    Else
        Do While (1 > 0)
            Print "Hello!"
        End While
    End If
    End Program

• Does NonConformist(NonConformist) halt? • Yes? That means HALT(NonConformist) = "never halts" • No? That means HALT(NonConformist) = "halts" Contradiction!

  9. Defining Efficiency • Asymptotic Complexity - how running time scales as function of size of input • Why is this a reasonable definition?

  10. Defining Efficiency • Asymptotic Complexity - how running time scales as function of size of input • Why is this a reasonable definition? • Many kinds of small problems can be solved in practice by almost any approach • E.g., exhaustive enumeration of possible solutions • Want to focus efficiency concerns on larger problems • Definition is independent of any possible advances in computer technology

  11. Technology-Dependent Efficiency • Drum Computers: Popular technology of the early 1960s • Transistors were too costly to use for RAM, so memory was kept on a revolving magnetic drum • An efficient program scattered instructions around the drum so that the next instruction to execute was under the read head just when it was needed • Minimized the number of full revolutions of the drum during execution

  12. The Apocalyptic Laptop • Speed ∝ Energy Consumption • E = mc^2: 1 kg of matter = 25 million megawatt-hours • Quantum mechanics: minimum switching time = π ħ / (2 × Energy), where ħ is the (reduced) Planck constant • Result: 5.4 × 10^50 operations per second Seth Lloyd, Nature, 31 Aug 2000
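
As a check on the arithmetic above, here is a small sketch applying the Margolus-Levitin bound (maximum operations per second = 2E / (π ħ)) to 1 kg of mass-energy; the physical constants are standard values, not taken from the slides:

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        const double m    = 1.0;         /* laptop mass, kg              */
        const double c    = 2.998e8;     /* speed of light, m/s          */
        const double hbar = 1.0546e-34;  /* reduced Planck constant, J*s */

        double E   = m * c * c;                /* mass-energy, joules    */
        double ops = 2.0 * E / (M_PI * hbar);  /* Margolus-Levitin limit */

        printf("E   = %.3g J (= %.3g megawatt-hours)\n", E, E / 3.6e9);
        printf("ops = %.3g per second\n", ops);
        return 0;
    }

This prints roughly 2.5e7 megawatt-hours and 5.4e50 operations per second, matching the figures on the slide.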

  13. [Bar chart comparing total operation counts: the Ultimate Laptop running for 1 year vs. 1 second, and a 1000 MIPS machine running since the Big Bang vs. for 1 day]

  14. Defining Efficiency • Asymptotic Complexity - how running time scales as function of size of input • What is “size”? • Often: length (in characters) of input • Sometimes: value of input (if input is a number) • Which inputs? • Worst case • Advantages / disadvantages ? • Best case • Why?

  15. Average Case Analysis • More realistic analysis, first attempt: • Assume inputs are randomly distributed according to some "realistic" distribution • Compute expected running time • Drawbacks • Often hard to define a realistic random distribution • Usually hard to perform the math
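
A concrete instance (my example, not from the slides): for a successful linear search where the target is equally likely to be at any of the n positions, the expected number of comparisons is

    \[ E[\text{comparisons}] \;=\; \sum_{i=1}^{n} \frac{1}{n}\, i \;=\; \frac{1}{n}\cdot\frac{n(n+1)}{2} \;=\; \frac{n+1}{2} \]

so the average case, like the worst case, is still linear in n.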

  16. Amortized Analysis • Instead of a single input, consider a sequence of inputs • Choose worst possible sequence • Determine average running time on this sequence • Advantages • Often less pessimistic than simple worst-case analysis • Guaranteed results - no assumed distribution • Usually mathematically easier than average case analysis
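
The classic example, and the "stretchy arrays" topic listed in the course material, is a growable array: doubling the capacity on overflow means any sequence of n pushes does at most about 2n element copies in total, i.e. amortized O(1) per push, even though an individual push can cost O(n). A minimal sketch (the names are mine, not from the course):

    #include <stdlib.h>

    typedef struct {
        int *data;
        int size;      /* elements in use */
        int capacity;  /* allocated slots */
    } Stretchy;

    /* Append x, doubling capacity when full.  Each doubling copies
       size elements, but doublings happen only at sizes 1, 2, 4, 8, ...
       so n pushes do at most ~2n copies total. */
    void push(Stretchy *a, int x) {
        if (a->size == a->capacity) {
            a->capacity = (a->capacity == 0) ? 1 : 2 * a->capacity;
            a->data = realloc(a->data, a->capacity * sizeof(int));
            /* error handling omitted in this sketch */
        }
        a->data[a->size++] = x;
    }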

  17. Comparing Runtimes • Program A is asymptotically less efficient than program B iff the runtime of A dominates the runtime of B, as the size of the input goes to infinity • Note: RunTime can be “worst case”, “best case”, “average case”, “amortized case”
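
One common way to make "dominates" precise (an assumption here; the slide does not give a formal definition):

    \[ \mathit{RunTime}_A \text{ dominates } \mathit{RunTime}_B \iff \lim_{n\to\infty} \frac{\mathit{RunTime}_B(n)}{\mathit{RunTime}_A(n)} = 0 \]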

  18. Which Function Dominates? n^3 + 2n^2 vs. 100n^2 + 1000; n^0.1 vs. log n; n + 100n^0.1 vs. 2n + 10 log n; 5n^5 vs. n!; n^-15 · 2^(n/100) vs. 1000n^15; 8^(2 log n) vs. 3n^7 + 7n

  19. Race I n^3 + 2n^2 vs. 100n^2 + 1000

  20. Race II n^0.1 vs. log n

  21. Race III n + 100n^0.1 vs. 2n + 10 log n

  22. Race IV 5n^5 vs. n!

  23. Race V n^-15 · 2^(n/100) vs. 1000n^15

  24. Race VI 8^(2 log n) vs. 3n^7 + 7n
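
How the races end, asymptotically: Race I goes to n^3 + 2n^2; Race II to n^0.1 (any positive power eventually beats a logarithm); Race III is a tie in order-of-magnitude terms, since both sides are Θ(n); Race IV to n!; Race V to the exponential n^-15 · 2^(n/100); and Race VI to 3n^7 + 7n, since with base-2 logs 8^(2 log n) = 2^(6 log n) = n^6. One can also watch the races numerically; a sketch for two of them, working in the log domain so n! doesn't overflow (lgamma(n+1) = log(n!)):

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        for (double n = 10; n <= 1e5; n *= 10) {
            double i_left   = log(pow(n, 3) + 2 * pow(n, 2));
            double i_right  = log(100 * pow(n, 2) + 1000);
            double iv_left  = log(5) + 5 * log(n);    /* log(5n^5) */
            double iv_right = lgamma(n + 1);          /* log(n!)   */
            printf("n=%8.0f   I: %6.1f vs %6.1f   IV: %6.1f vs %6.1f\n",
                   n, i_left, i_right, iv_left, iv_right);
        }
        return 0;
    }

The side with the larger logarithm is the larger function: in Race I the leader changes near n = 100, while in Race IV n! is already ahead by n = 10 and never looks back.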

  25. Order of Magnitude Notation (big O) • Asymptotic Complexity - how running time scales as function of size of input • We usually only care about order of magnitude of scaling • Why?

  26. Order of Magnitude Notation (big O) • Asymptotic Complexity - how running time scales as function of size of input • We usually only care about order of magnitude of scaling • Why? • As we saw, some functions overwhelm other functions • So if running time is a sum of terms, can drop dominated terms • “True” constant factors depend on details of compiler and hardware • Might as well make constant factor 1

  27. Eliminate low-order terms • Eliminate constant coefficients • e.g., 3n^2 + 17n + 4 → 3n^2 → n^2

  28. Common Names Slowest growth → fastest: • constant: O(1) • logarithmic: O(log n) • linear: O(n) • log-linear: O(n log n) • superlinear: O(n^c) (c is a constant > 1) • quadratic: O(n^2) • polynomial: O(n^c) (c is a constant > 0) • exponential: O(c^n) (c is a constant > 1)

  29. Summary • Proofs by induction and contradiction • Asymptotic complexity • Worst case, best case, average case, and amortized asymptotic complexity • Dominance of functions • Order of magnitude notation • Next: • Part One: Complexity, continued • Read Chapters 1 and 2

  30. Part One: Complexity, continued Friday, October 4th, 2002

  31. Determining the Complexity of an Algorithm • Empirical measurement • Formal analysis (i.e., proofs) • Question: what are the likely advantages and drawbacks of each approach?

  32. Determining the Complexity of an Algorithm • Empirical measurement • Formal analysis (i.e., proofs) • Question: what are the likely advantages and drawbacks of each approach? • Empirical: • pro: discovers whether constant factors are significant • con: may be running on the "wrong" inputs • Formal: • pro: no interference from implementation/hardware details • con: one can make a mistake in a proof! In theory, theory is the same as practice, but not in practice.

  33. Measuring Empirical Complexity: Linear vs. Binary Search • Find an item in a sorted array of length N • Binary search algorithm:

  34. My C Code

    void bfind(int x, int a[], int n) {
        if (n == 0) return;                 /* not found: empty range */
        int m = n / 2;
        if (x == a[m]) return;              /* found                  */
        if (x < a[m]) bfind(x, a, m);       /* search left half       */
        else bfind(x, &a[m+1], n - m - 1);  /* search right half      */
    }

    void lfind(int x, int a[], int n) {
        for (int i = 0; i < n; i++)
            if (a[i] == x) return;
    }

    /* timing driver: search for every element of a sorted array */
    for (int i = 0; i < n; i++) a[i] = i;
    for (int i = 0; i < n; i++) lfind(i, a, n);   /* or bfind(i, a, n) */

  35. Graphical Analysis

  36. Graphical Analysis

  37. [Log/log plot of the measurements: the lfind curve has slope ≈ 2, the bfind curve slope ≈ 1]

  38. Property of Log/Log Plots • On a linear plot, a linear function is a straight line • On a log/log plot, any polynomial function is a straight line! • The slope Δ(log y) / Δ(log x) is the same as the exponent: if y = c · x^k, then log y = log c + k · log x, a line with slope k
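
This yields a quick empirical trick: estimate the exponent directly from two timing measurements. A sketch (the timing numbers in main are made up for illustration):

    #include <stdio.h>
    #include <math.h>

    /* Estimate the exponent k of t(n) ~ c * n^k from two (n, time)
       measurements: k = (log t2 - log t1) / (log n2 - log n1),
       i.e. the slope between the two points on a log/log plot.   */
    double estimate_exponent(double n1, double t1, double n2, double t2) {
        return (log(t2) - log(t1)) / (log(n2) - log(n1));
    }

    int main(void) {
        /* hypothetical timings of the lfind driver loop */
        printf("k = %.2f\n", estimate_exponent(1000, 0.5, 10000, 50.0));
        return 0;
    }

Here the running time grows 100x for a 10x larger input, so it prints k = 2.00, the signature of a quadratic algorithm.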

  39. Why does O(n log n) look like a straight line? slope ≈ 1 • Because log(n log n) = log n + log(log n), and log(log n) grows so slowly that the curve is nearly straight, with slope just above 1

  40. Summary • Empirical and formal analyses of runtime scaling are both important techniques in algorithm development • Large data sets may be required to gain an accurate empirical picture • Log/log plots provide a fast and simple visual tool for estimating the exponent of a polynomial function

  41. Formal Asymptotic Analysis • In order to prove complexity results, we must make the notion of “order of magnitude” more precise • Asymptotic bounds on runtime • Upper bound • Lower bound

  42. Definition of Order Notation • Upper bound: T(n) = O(f(n)) ("Big-O") There exist constants c and n' such that T(n) ≤ c f(n) for all n ≥ n' • Lower bound: T(n) = Ω(g(n)) ("Omega") There exist constants c and n' such that T(n) ≥ c g(n) for all n ≥ n' • Tight bound: T(n) = θ(f(n)) ("Theta") When both hold: T(n) = O(f(n)) and T(n) = Ω(f(n))

  43. Example: Upper Bound
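
The worked example itself did not survive in the transcript; a representative one (the function n^2 + 100n is my choice, not necessarily the original) in the style of the definition above:

    \[ T(n) = n^2 + 100n = O(n^2): \quad \text{for } n \ge 100,\ 100n \le n^2, \text{ so } T(n) \le 2n^2. \quad \text{Take } c = 2,\ n' = 100. \]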

  44. Using a Different Pair of Constants
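
Again the slide body is missing; continuing the example above, the same bound holds with a different pair of constants, showing that the choice of c and n' is not unique:

    \[ n^2 + 100n \;\le\; n^2 + 100n^2 \;=\; 101\,n^2 \quad \text{for all } n \ge 1, \qquad \text{so } c = 101,\ n' = 1 \text{ also works.} \]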

  45. Example: Lower Bound
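
A matching lower-bound sketch (same caveat about the specific function):

    \[ T(n) = n^2 + 100n = \Omega(n^2): \quad n^2 + 100n \ge n^2 \text{ for all } n \ge 1, \quad \text{so } c = 1,\ n' = 1. \]

Together with the upper bound, this gives T(n) = θ(n^2).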

  46. Conventions of Order Notation

  47. Upper/Lower vs. Worst/Best • Worst case upper bound is f(n) • Guarantee that the run time is no more than c f(n) • Best case upper bound is f(n) • If you are lucky, the run time is no more than c f(n) • Worst case lower bound is g(n) • If you are unlucky, the run time is at least c g(n) • Best case lower bound is g(n) • Guarantee that the run time is at least c g(n)

  48. Analyzing Code • primitive operations • consecutive statements • function calls • conditionals • loops • recursive functions
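
The per-construct rules did not survive in the transcript, but the flavor is: constant cost for primitive operations, add the costs of consecutive statements, and multiply a loop body's cost by its iteration count. A small illustration (my own example, not from the slides):

    /* Counting duplicate pairs: cost analysis construct by construct. */
    int count_pairs(int a[], int n) {
        int count = 0;                       /* O(1)                     */
        for (int i = 0; i < n; i++)          /* outer: n iterations      */
            for (int j = i + 1; j < n; j++)  /* inner: n-i-1 iterations  */
                if (a[i] == a[j])            /* O(1) per iteration       */
                    count++;
        return count;                        /* total: n(n-1)/2 = O(n^2) */
    }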
