
Chapter Six in R Book Chapter 5 in Mount

Chapter Six in R Book Chapter 5 in Mount. M.M. Dalkilic. Lecture VI. Outline. Algorithm vs. Heuristic Statistics FASTA & BLAST brief introduction. Algorithm. General, well-specified sequence of instructions capable of being run on a Turing-complete computing device or formalism.



Presentation Transcript


  1. Chapter Six in R Book; Chapter 5 in Mount • M.M. Dalkilic • Lecture VI

  2. Outline • Algorithm vs. Heuristic • Statistics • FASTA & BLAST brief introduction

  3. Algorithm • A general, well-specified sequence of instructions capable of being run on a Turing-complete computing device or formalism. • You’ve already used bottom-up (BU) dynamic programming to find the minimal path from s to t in a graph structure. [Figure: graph with nodes s, w, x, y, z, t]
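As a reminder of the bottom-up approach, here is a minimal Python sketch. The edges and weights are hypothetical (the slide's figure is not reproduced in this transcript); only the node names s, w, x, y, z, t come from the slide. It assumes a directed acyclic graph, a known topological order, and a reachable target.

```python
# Hypothetical weighted DAG: node names follow the slide, weights are made up.
EDGES = {
    "s": {"w": 2, "y": 4},
    "w": {"x": 7, "z": 3},
    "y": {"x": 1, "z": 5},
    "x": {"t": 2},
    "z": {"t": 6},
    "t": {},
}
ORDER = ["s", "w", "y", "x", "z", "t"]  # one topological order of the nodes


def min_path_dp(edges, order, source, target):
    """Bottom-up DP: best[v] is the minimal cost of reaching v from source."""
    best = {v: float("inf") for v in order}
    prev = {}
    best[source] = 0
    for u in order:                       # visit nodes in topological order
        for v, weight in edges[u].items():
            if best[u] + weight < best[v]:
                best[v] = best[u] + weight
                prev[v] = u               # remember how we reached v
    # Walk predecessors back from the target (assumes target is reachable).
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[target]


print(min_path_dp(EDGES, ORDER, "s", "t"))
```

The function's signature mirrors the "input (list of weights), output (path + score)" idea: weights in, a path and its score out.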

  4. Algorithm • Formal parameters (signature): input (list of weights), output (path + score) • Did you make your program generic? • Error checking? • Two other solutions follow. [Figure: graph with nodes s, w, x, y, z, t]

  5. Algorithm • Formal parameters (signature): input (list of weights), output (path + score) • Solution (I): enumerate every (path + score) pair • (p1 + s1), (p2 + s2), … • Linearly search for the pi with the minimal score si
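Solution (I) can be sketched in Python as follows. The graph and its weights are hypothetical stand-ins for the slide's figure, and the function names are illustrative only.

```python
# Hypothetical weighted DAG: node names follow the slide, weights are made up.
EDGES = {
    "s": {"w": 2, "y": 4},
    "w": {"x": 7, "z": 3},
    "y": {"x": 1, "z": 5},
    "x": {"t": 2},
    "z": {"t": 6},
    "t": {},
}


def enumerate_paths(edges, node, target, path=None, score=0):
    """Yield every (path, score) pair from node to target, depth first."""
    path = (path or []) + [node]
    if node == target:
        yield path, score
        return
    for nxt, weight in edges[node].items():
        yield from enumerate_paths(edges, nxt, target, path, score + weight)


def min_path_exhaustive(edges, source, target):
    """Linear search over all enumerated (p_i, s_i) pairs for the minimal s_i."""
    best_path, best_score = None, float("inf")
    for path, score in enumerate_paths(edges, source, target):
        if score < best_score:
            best_path, best_score = path, score
    return best_path, best_score


print(min_path_exhaustive(EDGES, "s", "t"))
```

Unlike the heuristic that follows, this always returns the true minimum, but the number of paths it enumerates can grow exponentially with the size of the graph.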

  6. Heuristic • Formal parameters (signature): input (list of weights, threshold t, fitness function f, error e), output (path + score) • Solution (II): Genetic Program (a sketch, so you can work out some of the details yourselves) • (1) Encode a solution in binary form sb • (2) Randomly change bits in sb to create a family of solutions S = {sb, sb1, sb2, … , sbk} • (3) Form S’ = {sbi | f(sbi) > t} • (4) Limit ← max(f(S’)) • (5) If |Previous Limit - Current Limit| < e, return the sbi with maximal f • (6) Form S” by randomly swapping bits BETWEEN solutions in S’ • (7) In S”, change a few bits randomly in a few solutions • (8) GOTO 3, with S” in place of S
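One possible reading of the sketch above, in Python. Several details are assumptions, not part of the slide: the per-step weight table is made up, fitness is taken as the negated path cost (the sketch maximises f while the problem minimises cost), the whole family is kept when no solution exceeds t, and an iteration cap is added. It is a simple genetic-algorithm style search, not a definitive implementation.

```python
import random

# Hypothetical weights: WEIGHTS[i][b] is the cost of direction b
# (1 = up, 0 = down) at step i. These numbers are made up.
WEIGHTS = [(3, 1), (2, 5), (4, 2), (1, 6)]


def fitness(bits):
    """Higher is better, so use the negated path cost (an assumption)."""
    return -sum(WEIGHTS[i][b] for i, b in enumerate(bits))


def mutate(bits, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


def crossover(a, b, rng):
    """Swap bits BETWEEN two solutions at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def genetic_search(start, t, e, k=20, max_iter=100, seed=0):
    rng = random.Random(seed)             # seeded so a run is repeatable
    family = [start] + [mutate(start, rng, 0.5) for _ in range(k)]  # step 2
    previous_limit = None
    best = start
    for _ in range(max_iter):             # cap: convergence is not guaranteed
        # Step 3; falling back to the whole family if nothing beats t
        # is an assumption of this sketch.
        survivors = [s for s in family if fitness(s) > t] or family
        best = max(survivors, key=fitness)
        limit = fitness(best)                                       # step 4
        if previous_limit is not None and abs(previous_limit - limit) < e:
            break                                                   # step 5
        previous_limit = limit
        family = [crossover(rng.choice(survivors), rng.choice(survivors), rng)
                  for _ in range(k)]                                # step 6
        family = [mutate(s, rng, 0.05) for s in family]             # step 7
    return best, -fitness(best)           # path bits and their cost


print(genetic_search([0, 0, 0, 0], t=-15, e=0.5))
```

Changing `seed` may change the answer, and nothing guarantees the returned cost is the true minimum (6 for this made-up table).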

  7. Heuristic • Formal parameters (signature): input (list of weights, threshold t, fitness function f, error e) • The fitness function measures the “goodness” of a solution • The error is the degree to which you’re willing to differ from the actual solution, were it to exist (think about that)

  8. Heuristic • Problems • Local optima (in this case convex areas) • Difficult to search the entire space (kangaroos in the mist), so you must sometimes make “leaps of faith” • Not guaranteed to converge, so you need to keep track of iterations • Does not necessarily produce the same output given the same input


  10. More Statistics • Recall that the F test is the ratio of the population variance estimated from the sample means to the population variance estimated as the average of the sample variances • A large F value indicates a difference between groups; a small one indicates no difference. “Large” can be tied to a P value: the uncertainty you’re willing to accept in assuming the F value truly reflects the population.
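The ratio described above can be computed directly. This is a minimal Python sketch that assumes equal-sized groups; the function name and the data are made up for illustration.

```python
from statistics import mean, variance


def f_statistic(groups):
    """One-way F ratio for equal-sized groups (an assumption of this sketch)."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    # Population variance estimated from the sample means.
    between = n * variance([mean(g) for g in groups])
    # Population variance estimated as the average of the sample variances.
    within = mean([variance(g) for g in groups])
    return between / within


# Made-up data: the third group sits well away from the other two.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
print(f_statistic(groups))
```

Turning the F value into a P value additionally needs the F distribution with the appropriate degrees of freedom, which this sketch does not cover.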

  11. Handout (Matlab and P values) for comparing two variances

  12. t Test • Most often used test • Most often incorrectly used test • Cannot run a series of pairwise comparisons between groups without taking into account all parameters that affect the P value • t = ratio of the difference of the sample means to the standard error of the difference of the sample means • When there are two samples, F = t^2
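The relationship between F and t for two samples can be checked numerically. The Python sketch below assumes two independent, equal-sized samples and the pooled-variance (Student) form of t; the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Pooled-variance t: difference of means over its standard error."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error


def f_two_groups(a, b):
    """One-way F ratio for two equal-sized groups (assumed equal sizes)."""
    n = len(a)
    between = n * variance([mean(a), mean(b)])   # from the sample means
    within = (variance(a) + variance(b)) / 2     # average sample variance
    return between / within


# Made-up samples for illustration.
a = [1.0, 2.0, 3.0]
b = [2.0, 3.0, 4.0]
t = t_statistic(a, b)
print(t * t, f_two_groups(a, b))  # the two values agree up to rounding
```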

  13. t Test

  14. t Test • What if samples differ?

  15. To Do’s, Due Next Friday • Pick a disease of unknown etiology and begin accumulating papers on it (minimum 10) • Rewrite your solution to the BU DP problem using Sol (I) and Sol (II). • Create a 2D plot in R of the solutions you generate from the above. The abscissa is a number created by prefixing the starting node’s label (the nodes on the leftmost side of the graph are labelled 1, 2, 3, 4, 5 from top to bottom) to the base-10 value of the path’s sequence of 1’s and 0’s, for up and down respectively. The bottommost path would be 5 followed by (1010)_2 = 10, giving 510. This is paired with the value of the path, 17, so you would have a point at (510, 17). Plot Sol (I) in RED and Sol (II) in BLUE. Interpret the graph with respect to the search space and solution. • Problem 1, page 222 in Mount • What does TFIIIA bind to? Using BLAST, what orthologues do you find? What is its function? • You have three groups: Control, Group A, Group B. What do you conclude about the groups from the data given next?
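For the plotting task above, the abscissa encoding can be sketched in Python before moving the points into R. The function name is hypothetical; the rule (prefix the starting node's label to the base-10 value of the up/down bit string) follows the worked example on the slide.

```python
def abscissa(start_node, bits):
    """Prefix the start-node label to the base-10 value of the up/down bits."""
    return int(f"{start_node}{int(bits, 2)}")


# The slide's example: start node 5, path 1010 (1 = up, 0 = down).
print(abscissa(5, "1010"))  # 510
```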
