Key Ideas in Computer Science: Divide-and-Conquer, Recursion, Greedy Algorithms, Caching

Today: • Five Key Ideas in Computer Science, John Keyser • Upcoming Events/Important Dates

5 Key Ideas • There are certain ideas that pop up repeatedly in computing • Knowing these ideas and recognizing when they can be applied will help you when developing software or approaching problems

1. Divide-and-conquer • General strategy for solving problems • An aspect of Computational Thinking • Break problems into smaller sub-problems, which might be easier to solve (if decoupled) • Example: chess moves • if I could just capture the opponent’s queen, then I could.... • Or break large datasets into smaller pieces

Example: mergesort • Given a list of numbers in random order • Split the list into 2 halves • Sort each half independently • Merge the two sub-lists (interleave)
  1 18 22 17 6 13 9 10 7 15 14 4      ← here is a list to sort
  1 18 22 17 6 13 | 9 10 7 15 14 4    ← divide into 2 sub-lists
  1 6 13 17 18 22 | 4 7 9 10 14 15    ← sort each separately
  1 4 6 7 9 10 13 14 15 17 18 22      ← merge them back together
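A minimal Python sketch of the four steps above, run on the slide's list. The names `merge` and `split_sort_merge` are our own; to keep the single divide step visible, each half is sorted here with Python's built-in `sorted` rather than recursively.

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    result = []
    i = j = 0
    # Repeatedly take the smaller front element of the two lists
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One list is exhausted; append whatever remains of the other
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def split_sort_merge(numbers):
    """One level of divide-and-conquer: split, sort each half, merge."""
    mid = len(numbers) // 2
    left = sorted(numbers[:mid])    # sort each half independently
    right = sorted(numbers[mid:])
    return merge(left, right)


data = [1, 18, 22, 17, 6, 13, 9, 10, 7, 15, 14, 4]
print(split_sort_merge(data))
# [1, 4, 6, 7, 9, 10, 13, 14, 15, 17, 18, 22]
```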

Divide-and-Conquer • Tree structures • Top-down decomposition of a problem • Used to enable search/organization of data • Expressing hierarchies in data/relationships • Binary search • Repeatedly halve the range being searched • Etc.
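A sketch of the binary-search bullet, assuming the data is already sorted (here, the output of the mergesort example): each comparison discards half of the remaining range. `binary_search` is an illustrative name; Python's standard `bisect` module offers a library version of the same idea.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1    # target can only be in the upper half
        else:
            hi = mid - 1    # target can only be in the lower half
    return -1


sorted_data = [1, 4, 6, 7, 9, 10, 13, 14, 15, 17, 18, 22]
print(binary_search(sorted_data, 13))  # 6
print(binary_search(sorted_data, 5))   # -1
```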


2. Recursion • A form of divide-and-conquer • Write functions that call themselves • Example: factorial n! = 1 x 2 x 3 x ... x n = n x (n-1)!
    def fact(n):
        if n <= 1:
            return 1          # base case
        return n * fact(n - 1)
  call trace: fact(3) => fact(2) => fact(1); returns: fact(1) = 1, fact(2) = 2*1 = 2, fact(3) = 3*2 = 6
• Example: mergesort • When you divide the list into 2 halves, how do you sort each half? By calling mergesort, of course!
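Following the mergesort bullet, a minimal recursive Python sketch: the two halves are sorted by calling `mergesort` itself, and a list of length 0 or 1 is the base case. This is one straightforward way to write it, not the only one.

```python
def mergesort(numbers):
    """Return a sorted copy of numbers using recursive divide-and-conquer."""
    if len(numbers) <= 1:                # base case: nothing left to split
        return list(numbers)
    mid = len(numbers) // 2
    left = mergesort(numbers[:mid])      # sort each half by calling mergesort
    right = mergesort(numbers[mid:])
    # Merge step: repeatedly take the smaller front element of the two halves
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])              # at most one of these is non-empty
    merged.extend(right[j:])
    return merged


print(mergesort([1, 18, 22, 17, 6, 13, 9, 10, 7, 15, 14, 4]))
# [1, 4, 6, 7, 9, 10, 13, 14, 15, 17, 18, 22]
```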

Recursion • Used to process things set up as divide-and-conquer • Algorithms can be expressed recursively, analyzed recursively • Used to define Fractals • Key to functional languages’ usefulness

You always need a base case…
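To see why, compare the slide's `fact` with a hypothetical `fact_no_base` that omits the base case. In Python the runaway version does not loop forever: the interpreter stops it with a `RecursionError` once its recursion limit is exceeded.

```python
def fact(n):
    if n <= 1:                 # base case: stops the recursion
        return 1
    return n * fact(n - 1)


def fact_no_base(n):
    # Hypothetical broken version: no base case, so every call makes another call
    return n * fact_no_base(n - 1)


print(fact(5))  # 120

hit_limit = False
try:
    fact_no_base(5)
except RecursionError:         # raised once the recursion limit is exceeded
    hit_limit = True
print(hit_limit)  # True
```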

3. Greedy algorithms • Most implementations involve making tradeoffs • We know NP-complete problems are hard and probably cannot be solved in polynomial time • Use a heuristic/shortcut – might get a pretty good solution (but not optimal) in faster time • Greedy methods do not guarantee an optimal solution • However, in many cases, a near-optimal solution can be good enough • It is important to know when a heuristic will NOT produce an optimal solution, and to know how sub-optimal it is (i.e. an “error bound”)

Examples of greedy algorithms • Navigation, packet routing, shortest path in graph, robot motion planning • Choose the “closest” neighbor in the direction of the destination • Document comparison (e.g. diff) • Start by aligning the longest matching substrings • Knapsack packing • Choose the item with the highest value/weight ratio first • Scheduling • Schedule the longest job first (or the one with the most constraints)
(Figure: excerpts from two versions of a document, as compared by diff)
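A sketch of the knapsack heuristic for the 0/1 case (each item taken whole or not at all), with a brute-force search for comparison. The item values and weights are made-up numbers chosen to show the greedy choice falling short; `greedy_knapsack` and `best_knapsack` are illustrative names.

```python
from itertools import combinations


def greedy_knapsack(items, capacity):
    """Heuristic: take (value, weight) items in order of value/weight ratio."""
    chosen = []
    total_value = 0
    remaining = capacity
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if weight <= remaining:          # take the item if it still fits
            chosen.append((value, weight))
            total_value += value
            remaining -= weight
    return total_value, chosen


def best_knapsack(items, capacity):
    """Exact answer by trying every subset (exponential; fine for tiny inputs)."""
    best = 0
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            if sum(w for _, w in combo) <= capacity:
                best = max(best, sum(v for v, _ in combo))
    return best


items = [(60, 10), (100, 20), (120, 30)]   # (value, weight), hypothetical
print(greedy_knapsack(items, 50))  # (160, [(60, 10), (100, 20)])
print(best_knapsack(items, 50))    # 220
```

Here the greedy answer is 160 against an optimum of 220. If items may be split into fractions, taking them by value/weight ratio is optimal, which is one reason the heuristic is popular.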

Greedy: Optimal or Not? (scheduling as many non-overlapping jobs as possible) • Sometimes Greedy is NOT optimal: • Scheduling: schedule the job that starts first • Sometimes Greedy IS optimal: • Scheduling: schedule the job that ends first
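A sketch of the scheduling comparison, assuming the goal is to fit in as many non-overlapping (start, end) jobs as possible. The same greedy loop runs with two different choice rules; `greedy_schedule` is an illustrative name and the job list is invented to expose the difference.

```python
def greedy_schedule(jobs, key):
    """Pick non-overlapping (start, end) jobs greedily, in the order given by key."""
    chosen = []
    last_end = float("-inf")
    for start, end in sorted(jobs, key=key):
        if start >= last_end:        # compatible with every job chosen so far
            chosen.append((start, end))
            last_end = end
    return chosen


jobs = [(0, 10), (1, 2), (3, 4), (5, 6)]
first_to_start = greedy_schedule(jobs, key=lambda job: job[0])
first_to_end = greedy_schedule(jobs, key=lambda job: job[1])
print(first_to_start)  # [(0, 10)]
print(first_to_end)    # [(1, 2), (3, 4), (5, 6)]
```

Picking the earliest start commits to the long job (0, 10) and blocks everything else; picking the earliest end leaves the most room for later jobs.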

4. Caching • One way to improve the efficiency of many programs is to use caching – saving intermediate results in memory that will get used multiple times • Why calculate the same thing multiple times? • Might require designing a special data structure (e.g. a hash table) to store/retrieve these efficiently • Amortization: the cost of calculating something gets divided over all the times it is used

Calculating Fibonacci numbers • F(n) = F(n-1) + F(n-2) • Base cases: F(1) = F(2) = 1 • This sequence of numbers arises in several patterns in nature, as well as the stock market • 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144...
• F(3) = F(2) + F(1) = 1 + 1 = 2
• F(4) = F(3) + F(2) = F(2) + F(1) + 1 = 1 + 1 + 1 = 3
• F(5) = F(4) + F(3) = F(3) + F(2) + F(2) + F(1) = F(2) + F(1) + 1 + 1 + 1 = 1 + 1 + 1 + 1 + 1 = 5
• F(6) = F(5) + F(4) = F(4) + F(3) + F(3) + F(2) = F(3) + F(2) + F(2) + F(1) + F(2) + F(1) + 1 = F(2) + F(1) + 1 + 1 + 1 + 1 + 1 + 1 = 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 = 8
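The expansions above recompute the same values again and again. A small Python sketch that counts calls makes this concrete; `fib_with_call_count` is an illustrative name.

```python
def fib_with_call_count(n):
    """Plain recursive Fibonacci (F(1) = F(2) = 1); also returns the number of calls."""
    calls = 0

    def fib(k):
        nonlocal calls
        calls += 1                 # count every call, including repeated ones
        if k <= 2:                 # base cases F(1) = F(2) = 1
            return 1
        return fib(k - 1) + fib(k - 2)

    value = fib(n)
    return value, calls


print(fib_with_call_count(6))   # (8, 15)
print(fib_with_call_count(20))  # (6765, 13529)
```

For F(6), the 15 calls are the 8 ones in the last line of the expansion plus the 7 calls that produced them; in general the count is 2F(n) - 1, so it grows as fast as the Fibonacci numbers themselves.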

Fibonacci Numbers • If you just use pure recursion, calculating Fibonacci(x) will take exponential time • Specifically, it takes O(2^x) time (more precisely, the number of calls grows like φ^x, with φ ≈ 1.618) • But we can cache the results of previous computations • Then we can compute Fibonacci(x) in O(x) steps!
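A minimal sketch of the cached version, assuming a plain dictionary as the cache; `fib_cached` is an illustrative name.

```python
def fib_cached(n, cache=None):
    """Fibonacci with F(1) = F(2) = 1, caching every value it computes."""
    if cache is None:
        cache = {}                     # maps k -> F(k)
    if n in cache:
        return cache[n]                # already computed: just look it up
    if n <= 2:
        result = 1
    else:
        result = fib_cached(n - 1, cache) + fib_cached(n - 2, cache)
    cache[n] = result                  # save for later calls
    return result


print(fib_cached(6))    # 8
print(fib_cached(50))   # 12586269025
```

Each F(k) is computed once and then looked up, so the work is linear in x. Python's `functools.lru_cache` decorator can attach the same kind of cache to the plain recursive function. For very large x the recursion depth, not the running time, becomes the limit, and a loop that keeps only the last two values avoids it.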

Caching also applies to hardware design • Memory hierarchy • Constants or global variables that are used frequently belong in a register or the L1 cache • Analogous to a “staging area” • Variables used infrequently can stay in RAM • Very large datasets can be swapped out to disk
(Figure: Intel 80486 processor, with its on-chip cache labeled)

(Figure: weighted graph on nodes A, B, C, D)
• An important example of caching is Dynamic Programming • Suppose our goal is to compute the minimum travel distance between A and E • Build up a table of smaller results for a subgraph

(Figure: the same graph with node E added)
• Extend the table for larger results • Add a row/column for E • E connects to the network at B (edge weight 19) and C (edge weight 26) • Compute d(X,E) from d(X,B) and d(X,C):
  d(D,E) = min[d(D,B) + 19, d(D,C) + 26] = min(47 + 19, 25 + 26) = min(66, 51) = 51
  d(A,E) = min[d(A,B) + 19, d(A,C) + 26] = min(18 + 19, 20 + 26) = min(37, 46) = 37
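A Python sketch of the table-extension step. `add_node` is an illustrative name: it fills in the new row/column with the rule above (a shortest path to the new node ends with one of its edges), then checks whether the new node shortens any existing entry. The edges among A through D are our reading of the figure (A-B 18, A-C 20, B-C 22, C-D 25, A-D 32), an assumption; they reproduce the table values used above: d(A,B) = 18, d(A,C) = 20, d(D,B) = 47, d(D,C) = 25.

```python
import math


def add_node(dist, new, edges):
    """Extend the all-pairs shortest-distance table `dist` (a dict of dicts)
    with node `new`, whose edges to existing nodes are {neighbor: weight}."""
    old = list(dist)
    dist[new] = {new: 0}
    # New row/column: a shortest path from x to `new` ends with one of new's edges
    for x in old:
        best = min((dist[x][n] + w for n, w in edges.items()), default=math.inf)
        dist[x][new] = best
        dist[new][x] = best
    # Existing entries: a path through `new` might now be shorter
    for x in old:
        for y in old:
            through_new = dist[x][new] + dist[new][y]
            if through_new < dist[x][y]:
                dist[x][y] = through_new
    return dist


# Build the table one node at a time (edge weights among A-D are assumed)
dist = {}
add_node(dist, "A", {})
add_node(dist, "B", {"A": 18})
add_node(dist, "C", {"A": 20, "B": 22})
add_node(dist, "D", {"C": 25, "A": 32})
print(dist["D"]["B"], dist["D"]["C"])  # 47 25

# Extend the table with E, which connects at B (19) and C (26)
add_node(dist, "E", {"B": 19, "C": 26})
print(dist["D"]["E"], dist["A"]["E"])  # 51 37
```

Each new row is computed only from entries already in the table, which is the caching idea: no shortest path among A through D is recomputed when E arrives.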

5. Abstraction and Reuse • Abstraction is the key to becoming a good programmer • Don’t reinvent the wheel • More importantly, reuse things that have already been tested and debugged • This is the basis of Object-Oriented Programming

Many large software projects are built by plugging components together • Write a small amount of (“glue”) code that makes things work together • Example: creating a web browser out of: a) An HTML text parser b) A display engine (graphics, windows) c) URL query/retrieval network functions d) Plug-ins

Examples of Abstraction • Making a function out of things you do repeatedly • Parameterize the function so the function can be applied to a wider range of inputs

Here is output for scores of TAMU basketball games:
  histo([82,91,68,75,79,88,67,52,74,73,52,41,63,69,57,75,72,51,55,52,36,72])
  50 ****
  55 **
  60 *
  65 ***
  70 ****
  75 ***
  80 *
  85 *
  90 *
  95
Here is code for printing a histogram of basketball scores (which typically range between 50 and 100 points):
    def histo(Scores):
        i = 50
        while i < 100:
            c = 0
            for s in Scores:
                if s >= i and s < i + 5:    # count scores in the bin [i, i+5)
                    c += 1
            print(i, '*' * c)
            i += 5

Here is output for scores of TAMU football games:
  histo([52,65,42,42,45,41,41,56,57,51,10,21,52], Low=0, High=70, Step=10)
  0
  10 *
  20 *
  30
  40 *****
  50 *****
  60 *
Suppose we want to generalize this code to print a histogram of football scores too, which span a different range. Add parameters for the lower and upper bounds of the histogram (Low and High) and the step size (Step):
    def histo(Scores, Low=50, High=100, Step=5):   # defaults reproduce the basketball version
        i = Low
        while i < High:
            c = 0
            for s in Scores:
                if s >= i and s < i + Step:         # count scores in the bin [i, i+Step)
                    c += 1
            print(i, '*' * c)
            i += Step

Object-oriented classes • Encapsulation – define internal representation of data • Interface – define methods, services • Good design – make the external operations independent of the internal representation (helps decouple code) • Example: a Complex number is a ‘thing’ that can be added/subtracted, multiplied (by another Complex or a scalar), conjugated, viewed as (a+bi)

Here is an example of a class definition of Complex Numbers in C++
    class Complex {
        double re, im;   // internal variables
    public:
        // constructor (initialization)
        Complex(double x, double y) { re = x; im = y; }
        void conjugate() { im *= -1; }
        double magnitude() { double z = re*re + im*im; return sqrt(z); }
        void print() { cout << "(" << re << "+" << im << "i)"; }
    };
A Complex object representing 1+2i has two member variables for holding the real and imaginary components: re=1.0, im=2.0

    #include <iostream>
    #include <iomanip>
    #include <cmath>
    using namespace std;

    class Complex { ...from previous slide... };

    int main() {
        Complex p = Complex(1, 2);
        cout << "p = "; p.print(); cout << "\n";
        cout << "|p| = " << p.magnitude() << "\n";
    }
Note how we get the magnitude of a Complex object by invoking a method on it, p.magnitude(). The calculation is done internally to the object.
Output:
    > g++ complex.cpp -o complex -lm
    > ./complex
    p = (1+2i)
    |p| = 2.23607
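To illustrate the earlier point that external operations should not depend on the internal representation, here is a Python sketch of a hypothetical `PolarComplex` class. It offers the same operations as the C++ `Complex` class above but stores a magnitude and an angle instead of `re` and `im`; the class and its `real`/`imag` accessors are our own illustration.

```python
import math


class PolarComplex:
    """Same operations as the C++ Complex class (construct from x and y,
    conjugate, magnitude, print as (a+bi)), but stored in polar form."""

    def __init__(self, x, y):
        self._r = math.hypot(x, y)       # distance from the origin
        self._theta = math.atan2(y, x)   # angle from the real axis, in radians

    def conjugate(self):
        self._theta = -self._theta       # reflect across the real axis

    def magnitude(self):
        return self._r                   # stored directly, nothing to compute

    def real(self):
        return self._r * math.cos(self._theta)

    def imag(self):
        return self._r * math.sin(self._theta)

    def __str__(self):
        # Same (a+bi) layout as the C++ print(); :g rounds to 6 significant
        # digits, which hides floating-point noise from the polar conversion
        return f"({self.real():g}+{self.imag():g}i)"


p = PolarComplex(1, 2)
print(p)              # (1+2i)
print(p.magnitude())
```

Code that only calls `conjugate`, `magnitude`, and printing cannot tell the two representations apart; the tradeoff is that magnitude becomes free while addition would need a conversion back to rectangular form.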

Templates in C++ • If you can sort a list of integers, why not generalize it to sort lists of any data type that can be pairwise-compared (a total order)?
(Figure: insertion sort in progress: 1 3 5 2 6 4 9 8 7 → 1 2 3 5 6 4 9 8 7 → 1 2 3 4 5 6 9 8 7)
    void insertionSort(int a[], int n) {
        for (int i = 1; i < n; i++) {
            int temp = a[i];
            int j;   // declared outside the inner loop so a[j] = temp can use it
            for (j = i; j > 0; j--)
                if (temp < a[j-1]) a[j] = a[j-1];   // shift larger element right
                else break;
            a[j] = temp;
        }
    }

Templates in C++ • We can use the same algorithm to sort any type T, as long as elements can be compared with the '<' operator • Works on floats, characters, strings... ('<' is defined for all of them)
    template <class T>
    void insertionSort(T a[], int n) {
        for (int i = 1; i < n; i++) {
            T temp = a[i];
            int j;   // declared outside the inner loop so a[j] = temp can use it
            for (j = i; j > 0; j--)
                if (temp < a[j-1]) a[j] = a[j-1];
                else break;
            a[j] = temp;
        }
    }
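For comparison, a Python sketch of the same insertion sort. Python has no templates; the function is generic through duck typing instead, so it accepts any list whose elements support `<` with each other. `insertion_sort` is an illustrative name.

```python
def insertion_sort(a):
    """Sort list a in place (and return it); works for any elements supporting '<'."""
    for i in range(1, len(a)):
        temp = a[i]
        j = i
        # Shift larger elements one slot to the right
        while j > 0 and temp < a[j - 1]:
            a[j] = a[j - 1]
            j -= 1
        a[j] = temp
    return a


print(insertion_sort([1, 3, 5, 2, 6, 4, 9, 8, 7]))
print(insertion_sort([2.5, -1.0, 3.25]))
print(insertion_sort(["pear", "apple", "fig"]))
```

Unlike the C++ template, a type mismatch (say, mixing numbers and strings) is only detected at run time, when `<` raises a `TypeError`.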

API design: Application Programming Interface • A coherent, complete, logical system of functions and data formats • Example: OCR (optical character recognition) • You (probably) don’t want to implement font- and scale-independent, feature-based character recognition yourself • The interface defines the input (e.g. scanned TIFF images) and the output (e.g. ASCII strings): String* OCRscan(TiffImage* input_image) • Will you report the coordinates where each word was found on the page? • Can the user load different character sets (alphabets)?
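One hypothetical way to answer the two open questions in the interface itself, sketched in Python: return each word with its page coordinates, and take the character set as a parameter. `WordMatch`, `OCRScanner`, `FakeScanner`, and `page_text` are invented names, not a real OCR library; the fake scanner stands in for a real engine so that client code can be tested against the interface alone.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class WordMatch:
    text: str
    x: int      # hypothetical: pixel coordinates of the word's top-left corner
    y: int


class OCRScanner(Protocol):
    """Hypothetical interface: scanned TIFF bytes in, located words out."""
    def scan(self, tiff_bytes: bytes, charset: str = "latin") -> List[WordMatch]:
        ...


class FakeScanner:
    """Stand-in implementation returning canned results, for testing clients."""
    def __init__(self, words: List[WordMatch]):
        self._words = words

    def scan(self, tiff_bytes: bytes, charset: str = "latin") -> List[WordMatch]:
        return list(self._words)


def page_text(scanner: OCRScanner, tiff_bytes: bytes) -> str:
    # Client code depends only on the interface, not on any implementation
    return " ".join(match.text for match in scanner.scan(tiff_bytes))


scanner = FakeScanner([WordMatch("Hello", 10, 20), WordMatch("world", 60, 20)])
print(page_text(scanner, b""))  # Hello world
```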

Abstraction • In my opinion, it’s the most important idea in all of computer science • Helps us manage complexity in design • Of computers • Of operating systems • Of algorithms • Of data structures • Of programs/functions • Of software projects/applications

Engineering Principles in Software Engineering A summary of the key ideas we talked about... 1. Divide-and-conquer 2. Recursion 3. Greedy algorithms, tradeoffs 4. Caching and dynamic programming 5. Abstraction and reuse

Upcoming Events/Important Dates • Next Class • Thursday 10/3: Prof. James Caverlee • Data-Driven Analytics
