Computer Science Background for Biologists

dalila

Presentation Transcript


  1. Computer Science Background for Biologists

  2. What is an algorithm? • A well-defined computational procedure that takes some values as input and produces some values as output. • We are interested in the correctness and efficiency of computer algorithms. • We seek to extract clean, well-defined problems from the typically messy “real” problem to gain insight into it.

  3. Example of an algorithm: sorting • Input: A sequence of n numbers (a1, a2, …, an). • Output: A permutation (a’1, a’2, …, a’n) of the input sequence such that a’1 ≤ a’2 ≤ … ≤ a’n.
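
One simple procedure that meets this input/output specification is insertion sort. The sketch below is only an illustration in Python; the slides do not name a particular sorting algorithm, and the function name `insertion_sort` is an assumption made for this example.

```python
def insertion_sort(values):
    """Return a new list holding the elements of `values` in nondecreasing order."""
    result = list(values)  # copy, so the input sequence is left unchanged
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        # Shift larger elements one slot to the right to make room for `key`.
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result
```

For example, `insertion_sort([5, 2, 4, 6, 1, 3])` returns `[1, 2, 3, 4, 5, 6]`, which is a permutation of the input in nondecreasing order.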

  4. Exact String Matching • Input: A text string T, where |T| = n, and a pattern string P, where |P| = m. • Output: An index i such that T[i+k-1] = P[k] for all 1 ≤ k ≤ m, i.e. showing that P is a substring of T starting at position i.

  5. Exact String Matching • Brute force search algorithm:
       for i = 1 to n-m+1 do
         j = 1;
         while (j <= m) and (T[i+j-1] == P[j]) do j = j+1;
         if (j > m) then print “pattern at position ”, i;
     (The bound j <= m is tested first so that P[j] is never read past the end of the pattern.)
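
The brute force search can be written as a short runnable Python function. This is a sketch, not the slides' own code: the name `brute_force_match` is an assumption, and positions are 0-based here (Python's convention) rather than 1-based as in the pseudocode.

```python
def brute_force_match(text, pattern):
    """Return every 0-based start position where `pattern` occurs in `text`."""
    n, m = len(text), len(pattern)
    positions = []
    for i in range(n - m + 1):          # each possible alignment of the pattern
        j = 0
        # Check the bound first, then compare characters left to right.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:                      # all m characters matched
            positions.append(i)
    return positions
```

For example, `brute_force_match("abracadabra", "abra")` returns `[0, 7]`.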

  6. Algorithm Efficiency • Time efficiency of algorithms • Space efficiency of algorithms

  7. Machine Independent Analysis • We assume that every basic operation takes constant time. • Example basic operations: addition, subtraction, multiplication, memory access. • The time efficiency of an algorithm is the number of basic operations it performs. • We do not distinguish between the basic operations.

  8. Time efficiency • In fact, we will not worry about the exact values, but will look at “broad classes” of values. • Let there be n inputs. • If one algorithm needs n basic operations and another needs 2n basic operations, we consider them to be in the same efficiency category. • However, we do distinguish between exp(n), n, and log(n).

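  9. Example: Time Complexity • The brute force string matching algorithm might use only about n steps if we are lucky (every alignment fails at its first comparison). • We might need about n*m steps if we are unlucky (nearly every alignment matches most of the pattern before failing).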
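
The gap between the lucky and unlucky cases can be seen by counting character comparisons in a brute force matcher. The sketch below is illustrative only; `count_comparisons` and the test strings are assumptions chosen for this example.

```python
def count_comparisons(text, pattern):
    """Count the character comparisons a brute force matcher performs."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[i + j] != pattern[j]:
                break                   # mismatch: move on to the next alignment
            j += 1
    return comparisons


lucky = count_comparisons("a" * 10, "bbb")    # every alignment fails at once
unlucky = count_comparisons("a" * 10, "aab")  # every alignment fails on the last character
```

With n = 10 and m = 3 there are n - m + 1 = 8 alignments, so `lucky` is 8 (one comparison each) and `unlucky` is 24 (m comparisons each).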

  10. Order of Increase • We worry about how quickly the running time of our algorithms grows as the input size increases. [Figure: growth curves of exp(n), n, and log n]

  11. Function Orders • A function f(n) is O(g(n)) if f(n) grows no faster than g(n). • Formally, f(n) is O(g(n)) if there exist a number n0 and a positive constant c such that for all n ≥ n0, 0 ≤ f(n) ≤ c·g(n). • If the limit of f(n)/g(n) as n → ∞ exists and is finite, then f(n) is O(g(n)).
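
As a worked instance of the definition, take f(n) = 3n + 5 and g(n) = n. The functions and the constants c = 4, n0 = 5 are chosen here purely for illustration; they do not come from the slides.

```latex
\begin{align*}
f(n) &= 3n + 5, \qquad g(n) = n \\
0 \le 3n + 5 &\le 4n \quad \text{for all } n \ge 5
  \qquad (c = 4,\ n_0 = 5) \\
\lim_{n \to \infty} \frac{3n + 5}{n} &= 3 < \infty
\end{align*}
```

Both the explicit constants and the finite limit show that 3n + 5 is O(n).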

  12. Implication of Big-O notation • Big-O notation gives an upper bound on the number of steps that an algorithm takes in the worst case. • Suppose we know that our algorithm uses O(f(n)) basic steps on any n inputs. Then, for all sufficiently large n, it terminates after executing at most a constant times f(n) basic steps. • We know that a basic step takes constant time on a machine. • Hence, our algorithm terminates within a constant times f(n) units of time, for all large n.

  13. Algorithm Complexity • Thus the brute force string matching algorithm is O(mn), which is quadratic time when m is proportional to n. • A quadratic-time algorithm is usually fast enough for small problems, but not big ones. • An exponential-time algorithm can only be fast enough for tiny problems.

  14. Any improvement based on brute force search? • Some of these comparisons are wasted work! • By being more clever, we can reduce the worst-case running time to O(n+m). • The Knuth-Morris-Pratt string matching algorithm achieves this bound.
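
A minimal Python sketch of Knuth-Morris-Pratt matching follows. The slides only name the algorithm, so the function names `failure_table` and `kmp_match`, the 0-based positions, and the handling of an empty pattern are assumptions of this example. The idea is to precompute, for each prefix of the pattern, how much of a partial match can be reused after a mismatch, so the text is never re-read.

```python
def failure_table(pattern):
    """fail[j] = length of the longest proper prefix of pattern[:j+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = fail[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        fail[j] = k
    return fail


def kmp_match(text, pattern):
    """Return every 0-based start position of `pattern` in `text`."""
    if not pattern:
        return list(range(len(text) + 1))
    fail = failure_table(pattern)
    positions = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, fall back in the pattern instead of rewinding the text.
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            positions.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return positions
```

For example, `kmp_match("abracadabra", "abra")` returns `[0, 7]`, the same positions a brute force search finds. Building the table takes O(m) steps and the scan takes O(n) steps.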

  15. NP, NP-hard, NP-complete Problems • A problem belongs to the class NP if a proposed solution to it can be verified in polynomial time. • A problem is NP-hard if an algorithm for solving it can be translated, in polynomial time, into one for solving any NP problem. • NP-hard therefore means "at least as hard as any NP problem." • NP-complete: the problem is both in NP and NP-hard.

  16. NP-Completeness • Unfortunately, for many problems, there is no known polynomial-time algorithm. • Even worse, many of these problems can be proven NP-complete, meaning that no polynomial-time algorithm exists for them unless P = NP. • In practice we turn to heuristics and approximation algorithms.

  17. Shortest Common Superstring • Input: A set S = {s1, s2, …, sm} of text strings over some alphabet Σ. • Output: The shortest possible string T such that each si is a substring of T. • This problem arises in DNA sequencing.
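
Although finding a shortest superstring is hard, checking a proposed answer is easy, which is what places the decision version of the problem ("is there a common superstring of length at most L?") in NP. The sketch below is a hypothetical verifier; the name `verify_superstring` and its parameters are assumptions for illustration.

```python
def verify_superstring(candidate, strings, max_length):
    """Check a proposed answer: `candidate` must be no longer than
    `max_length` and must contain every input string as a substring."""
    return len(candidate) <= max_length and all(s in candidate for s in strings)
```

For example, `verify_superstring("ABCD", ["ABC", "BCD"], 4)` is `True`, while lowering `max_length` to 3 makes it `False`. The check runs in time polynomial in the total input length.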

  18. Shortest common superstring

  19. Shortest common superstring • Finding the shortest common superstring is NP-hard (its decision version is NP-complete). • Can you suggest an algorithm to find the shortest common superstring? • Greedy heuristic: approximates the optimal solution.

  20. Greedy Heuristic • We always merge the two strings with the longest overlap. • Put the combined string back. • Repeat until only one string remains. • GREEDY is conjectured to find a superstring of length at most twice optimal; the approximation factors proven so far are larger constants.
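
The greedy steps can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the slides' own code: the names `overlap` and `greedy_superstring` are hypothetical, ties between equally long overlaps are broken by whichever pair is found first, and input strings contained in another input string are dropped up front.

```python
def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for length in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_superstring(strings):
    """Repeatedly merge the pair of strings with the longest overlap."""
    unique = list(dict.fromkeys(strings))  # drop duplicates, keep order
    # A string contained in another one is covered automatically.
    pool = [s for s in unique if not any(s != t and s in t for t in unique)]
    while len(pool) > 1:
        best_len, best_i, best_j = -1, 0, 1
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    length = overlap(a, b)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        # Merge: append the part of the second string that does not overlap.
        merged = pool[best_i] + pool[best_j][best_len:]
        pool = [s for idx, s in enumerate(pool) if idx not in (best_i, best_j)]
        pool.append(merged)  # put the combined string back
    return pool[0] if pool else ""
```

For example, `greedy_superstring(["ABC", "BCD"])` returns `"ABCD"`. The result always contains every input string, but it is not guaranteed to be the shortest possible superstring.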

  21. Time complexity of the greedy heuristic • We assume n strings, each of length k. • At most n rounds (n - 1 merges). • O(n^2) string comparisons per round. • Each string comparison takes about k^2 steps. • Under these assumptions the total is roughly O(n^3 k^2) steps.
