
Lior Pachter & Bernd Sturmfels

Algebraic Statistics for Computational Biology. Lior Pachter & Bernd Sturmfels. What is Biology? The study of living organisms. What is Statistics? The science concerned with the collection, organization, analysis and interpretation of data. What is Algebra?


Presentation Transcript


  1. Algebraic Statistics for Computational Biology Lior Pachter & Bernd Sturmfels

  2. What is Biology? The study of living organisms. What is Statistics? The science concerned with the collection, organization, analysis and interpretation of data. What is Algebra? The part of mathematics that deals with generalized arithmetic.

  3. What is Algebraic Statistics?

  4. Customers who bought this book might also buy… What is Algebraic Statistics? There is no dictionary definition yet. The term was coined by European statisticians interested in applying Gröbner bases to the design of experiments. Their book is: G. Pistone, E. Riccomagno and H. Wynn, “Algebraic Statistics: Computational Algebra in Statistics”. CRC Press, 2000.

  5. Algebraic Statistics for Computational Biology, edited by Lior Pachter and Bernd Sturmfels

     Table of Contents

     Part I - Introduction to the four themes
       1. Statistics
       2. Computation
       3. Algebra
       4. Biology

     Part II - Studies on the four themes
       5. Parametric Inference
       6. Polytope Propagation on Graphs
       7. Parametric Sequence Alignment
       8. Bounds for Optimal Sequence Alignment
       9. Inference Functions
       10. Geometry of Markov Chains
       11. Equations Defining Hidden Markov Models
       12. The EM Algorithm for Hidden Markov Models
       13. Homology Mapping with Markov Random Fields
       14. Mutagenic Tree Models
       15. Catalog of Small Trees
       16. The Strand Symmetric Model
       17. Extending Tree Models to Split Networks
       18. Small Trees and Generalized Neighbor Joining
       19. Tree Construction Using Singular Value Decomposition
       20. Application of Interval Methods to Phylogenetics
       21. Analysis of Point Mutations in Vertebrate Genomes
       22. Ultra-Conserved Elements in Vertebrate and Fly Genomes

     New book: Algebraic Statistics for Computational Biology, edited by Lior Pachter and Bernd Sturmfels. Cambridge University Press, Summer 2005.

  6. Who is this girl? Her name is DiaNA. She makes DNA sequences. TAGAGACGGGGGTTTCACAATGTTGGCCA

  7. The human genome
     - Consists of 2.8 billion DNA bases.
     - Sequenced in 2001 and finished in 2004.
     - Contains genes:
       - these are subsequences which code for protein.
       - estimated number of genes: 20,000-25,000.
       - genes make up less than 5% of the genome.
     - Example: Breast-ovarian cancer susceptibility gene (BRCA1)

  8. The human genome

  9. The human genome

  10. >hg17_dna range=chr17:38464686-38473085 5'pad=0 3'pad=0 revComp=FALSE strand=? repeatMasking=noneATCCAGAAGTCTAGTATACATCTCAAAATTCATGCATCTGGCCGGGCACAGTGGCTCACACCTGCAATCCCAGCACTTTGGGAGGCCGAGGTGGGTGGATTACCTGAGGTCAGGAGTTTAAGACCAGCCTGGCCAACATGGTAAAACCCCATCTCTACTAAAAATACAAGTATTAGCCAGGCATTGTGGCAGGTGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAAAATCACTTGAACCGGGAGGCGGAGGTTGGAGTGAGCTGAGATCGTGCTACCGCACTCCATGCACTCTAGCCTGGGCAACAGAACGAGATGCTGTCACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAAATTCTCACATCTAAAACAGAGTTCCTGGTTCCATTCCTGCTTCCTGCCTTTCCCACTCCCCCATATTCCCTACCATGCCTTCTTCATCTAATTTAATATTACTAACAAGATCTATTGTTCAAGCCAAAACCCAAGTGTCACTCCTTCAATTTCTCTTTACCTTATCCTCCAAATTTAATCCATTAGCAAGTCCTCTCTTCAAACCCATCCCAAACCAACCTTGTTTTTAACCATCTCCACACCACCAATTACCACAAGGATAAAATCTGAATTCCTTACCACCAAATACTATGTGATCTGGCCCTCATCTATGACCTTCTCCCATTCCTTGTGTAATCTCTGCCTCCACACATAATTTGCAAATTACTCCAGCTACACTGGCCTATTATTATTATTATTATTATTTTTGAGACGGAGTCTTGCTCTTTCGCCCAGCCTGGAGTGCAGTGGCGCAATCTCAGCTCACTGCAATCTCCGCCTCCTGGGTTCAAGCGATTCTCCTGCCCCAGCCTCCCAAGTAGCTGTGATTACAGGCACATGCCACCATTCCCAGCTAATTTTTTTTTGTTTTTGAGATGGAGTTTCACTCTTGTTGCCCAGGCTGGAGTGCAATGGTGCGATCTCAGCTCACCACAACCTCCACCTCCCGGGTTGATGAAGTGATTCTCTTGTCTCAGCCTCCCGTGTAGCTGGGATTAGAGGCACGCGCCACCACGCTGGGCAAATTTTTGTATTTTTAGTAGAGACAGGGTTTCTACCTCAGTGATCTGTCCGCCTTGACCTCCCAAAGTGCTGGGATTACAGGAATGAGCCACCACACCCAGCCGTGCCCAGCTAATTTTTGCATTTTTTAGTAGAGATGGGGTTTTGCCACGTTGGCCAGGCTGGTCTCAAACTCCTGACCTCAGGGGATCTGCCTGCCTCGGCCTCCTAGAGTGCTGGAATTACAGGTGTGAGCCACTGTGCCCGAACCTTTTATCATTATTATTTCTTGAGACAGGAGTCTTGCTCTGTCGTTCAGGCTGGAGTGCAGTGATGCGATCTTGGCTCACTGTAACTCCTACCTTTCGGTTCAAGTGATTCTCCTGCCTCAGCCTCTGGAGTAGCTGGGATTACAGGCACTGGGATTACAGGCACACACCACCACACCATGCTAGTTTTTTGTATTTTTAGTAGAGATGGGGTTTCACCATGTTGGCCAGGCTGGTCTCGAACTCCTGACCTCAAGTGATTTGCCTGCCTTGGCTTCCCAAAGTGCTGGGATTATAGGCACGAGCCACCACACACGACCAACATTGGCCTATCTTTTAAAAAATAAACCAAGCTCTGGCCGGGCACAGTGGCTCACACCTGTGATCCCAGCACTTTGGGAGGTTGAGGTGGTTGGATCACTTGAGTTCAGGAGTTTGAGACCAGCCTGACCAACGTGGTAAAACCCCATCTCTACTAAAAATAAAAACTAGTCGGGTGTGGTAGCACGCGTGCCTGTAATACCAGCTACTCAGGAGGCCAAGGCAGGAGAAT
TGCTTGAACCCAGGAGACAGAGTTTGCAGTGAGCCAAGATTGTGCCACTGCACTCCAGCCTGGGGGATAGAGGGAGACACCATCTCAAAAAAACCAAAATACAGAAATCAAAAAACCACACTCATTATTACCTCAAGACCTTTATGTTTGCTATTCCTCTGCCTATAAGATGCATTCCCTTCATTTTTCAAGGACAATTATTTCTTGTTATTTAGGTCTCAGCTCAATTTTTTCAGAAAGGCTTTCCCTGGCCTCCTTAAACGAAAGTAATCAACAACCTTTGACAGCTAATACTATTCCACTGTTCTGTATATTTCTCCATAGCATTTATTGTTATCTTAAATTCATCTTTATTGTGTATCTCCCCTCGACAGAACCTGAATCCTACCAGGGACTTAGTTAGTCTTATTTACTGTTGCATTCCTAGTGCCCAGAACACAGTAGGCTCCCAATAAATAGCCACTGAATAAAAGTTAAAACCAACAAAAATAATCATTTAATTAATTATGAATACATCGAATTGTGCACAATAGTTTATAAAATTACTTTTTTTTTTTTTTTAAGACAGGGTCTCATTCTGTCTCACAGGCTGGAGTGCAGTGGTGCAATCTAGGCTCACTGCAACCTCCGCCTCCCGGGTTCAAGTGATTCTCCTGCCTCAGCCTCCCCAGCAGCTAGGATTACAGGCACATGCCACCACGCTCGACTAATTTTTTTGTGTTTTTAGTAGAGACAAGGTTTCACCATGTTGACCAGGCTGGTCTCGAACTCCTGACCTCAAGTGATCCACCTGCCTTGGCCACTCAAAGTGCTGGGATTATAGGCATGAGCCACCACGCCTGGCCTATAAAATTACTTTCACATTTCATTTTGCCTGATCTGTTGTCACAGAAGTTCTCAGATGGCTGTTCTGAAATTATTCCTCCTCCTACACTCTATCTTATTTACTTCTCACTGTTCTCAGTATCATAAAGTGCAACATCTTTTTGAAGCAATCTGAATTATAAACAGATACATTTGCATGTATATATATGTATATATGCATATGCACACACACACTTTTTTTTTTTTAAGAGACAGGGTCTTGCTCTGTGCAAGTGCAAGAGTGCAATGGTATGATCATAGCTCACTGCAGCCTTGAACTCCTGGGCTCAAGTGATTCTTCTGGCTTAGCTTCCTCAGTAGCTAAGACTACAGAAGCACACTGCCATGCCCGGCTAATTAAAAAAAAATTTTGTGGAGACAGAGTCTCACTATGTTGCCCAGGCTGGTTTCAAACTCCTGGCCTCAAGTAATCTTCCTGTCTCAGCCTCCCAAAGGGCTGAGATTATAAGTGTGAGCCACTGCATCTGGACTGCATATTAATATGAAGAGCTTTTCTTCAACAACAGTGAACAGTTTTCTACAAAGGTATATGCAAGTGGGCCCACTTCTTGTTCTTATGAATCTTTTCTTTCCTTTTATAAAACTCCTTTTCCTTTCTCTTTTCCCCAAAGAAAGGACTGTTTCTTTTGAAATCTAGAACAAATGAGAACAGAGGATATCCTGGTTTGCGCTGCAAAATTTTTTTTTTTTTTAAGACGGAGTCTCGCTCTGTTGCCAGGTTGGAGTGCAGTGGCACGATCTTGGCTCATTGCAACCTCCACCTCCCGGGTTCAAGAGATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGAACTAAAGGCGCATGCCACCACGCTGAGTAATTTTTTGTATTTTAGTAGAGACAGGGTTTCACCATGTTGCCCAGGCTGATCTCGAACTCCTGAGCTCAGGCAATCTGCCTGTCTTGGCCTCCCACAGTGTTAGGATTACAGGCATGAGCCACTGCACCCGATTTTTTTTTTCTTTTGATGGAGTTTTGCTCTTGTTGCCCAGGTTAGAGTGCAATGATGCGATCTCAGCTCACTGCAACCCCCGCCTCCCAGGTTCAAGTGATTCTCCTGCCTCAGCCTCCCGAGTAGCTGGAATTACAGGCAAGT
GCCACCAAGCCCGGCTAATTTTGTATTTTTAGTAGAAACGGGGTTTCTCCATGTTGGTCAGGCTGGTCTTGAACTCCCGACATCAGGTGATCCAAGCGCCTCAGCCTCCCAAAGCGCTGGGATTATAGGTATGAGCCACAGTGCAGGCCTGCATAATTCTTGATGATCCTCATTATCATGGAAAATTTGTGCATTGTTAAGGAAAGTGGTGCATTGATGGAAGGAAGCAAATACATTTTTAACTATATGACTGAATGAATATCTCTGGTTAGTTTGTAACATCAAGTACTTACCTCATTCAGCATTTTTCTTTCTTTAATAGACTGGGTCACCCCTAAAGAGATCATAGAAAAGACAGGTTACATACAGCAGAAGAACGTGCTCTTTTCACGGAGATAGAGAGGTCAGCGATTCACAAAAGAGCACAGGAAGAATGACAGAGGAGAGGTCCTTCCCTCTAAAGCCACAGCCCTTTAATAAGGCTTGTAGCAGCAGTTTCCTTCTGGAGACAGAGTTGATGTTTAATTTAAACATTATAAGTTTGCCTGCTGCACATGGATTCCTGCCGACTATTAAATAAATCCCTAGCTCATATGCTAACATTGCTAGGAGCAGATTAGGTCCTATTAGTTATAAAAGAGACCCATTTTCCCAGCATCACCAGCTTATCTGAACAAAGTGATATTAAAGATAAAAGTAGTTTAGTATTACAATTAAAGACCTTTTGGTAACTCAGACTCAGCATCAGCAAAAACCTTAGGTGTTAAACGTTAGGTGTAAAAATGCAATTCTGAGGTGTTAAAGGGAGGAGGGGAGAAATAGTATTATACTTACAGAAATAGCTAACTACCCATTTTCCTCCCGCAATTCCTAGAAAATATTTCAGTGTCCGTTCACACACAAACTCAGCATCTGCAGAATGAAAAACACTCAAAGGATTAGAAGTTGAAAACAAAATCAGGAAGTGCTGTCCTAAGAAGCTAAAGAGCCTCAGTTTTTTACACTCCCAAGATCAATCTGGATTTATGATTCTAAAACCCCTGGTGACAGAATCAGAGGCTGAAAACACCACTAATTATAACCAGCAGGTATGGATATTTGGAAGTCTAGGGGAGGCTGATATGAAGTTAAGACCAGAGGAAATATCTGTCCACTCCCTCTTCTCAACACCCATCTTCTAGACGCCAAGGCTAGCTATAGATCTCCATTATAGTGTTCAAGGAATTAGGAATTATCCATGTCAATAGTTTTGATTAATGTGGACGGAGAACATCTATATTACTAGATGGCAATATGTGAAAGAAGAAAACAGTATTGTTGAAAACCTAAATCTGAAATGTCAATGTAATGACAAATTTTCACCCCTAGAATGTCTACCTGGGGAGTCCTAACCCTCTAATATTCCCCTGAGAGGGATGGGAGAATACAGTGCAGAGCTTTTATATAAGTATTTCAGAAAGCAGTAGCTAAAGAATCACTTGTTTATTTCCCAGTGTTTCAAAGGCCCTTCTGAAGAACTAAGCAAACTAAGGAAAGACCATTTAGTTTTAAACAGGAGAAATGTATTTAACTAAATCCTAAACACAGCAGGCTATCTGCAAGCAGCAGCAGCAGCAGCAGCCATGCTCCCTCACAGAATCCTTACAATTTTTGAAGTTTTTTGTTTAACTGCTACAAAAGCCGATTTAGTAACATTTATTACACTTAAAAACTTCAGTTCATTTGTAGTTCAAAGCAAATGTATTGGCTTTGAGTTTAAAGACTGAACTACTTTAGATTTGATTTGCATTTTTTTTTTTTTTTTTTTTTGAGATGCAGTCTTGCTCTGTCAGCCAGGCTGGAGTGCAGTGGCTGGATCTCAGCTCACGGCAAGCTCTGCCTCCTGGGTTCATGCCATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGGACTACAGATGCCCGCCACCATGCCCGGCTAATTTTTTGTATTTTTACTAGAGATGGGGTTTCACCGTGTTAGC
CAGGATGGTCTCGATCTCCTGACCTCGTGATCTGCCCGCCTTGGCCCCCCAAAGCGCTGGGATTACAGGCCTGAGCCACCACGCTTGGCATCTTTTTACCTTTCATTAACTTTGATGCAAACCTATAGCTTAAGGTATCTTAAACTTTAATGACATTTTTCTCTAAAATAGTAGTTTGTAATAACTTGTTCTGGCACCTGGCTCCAATGAACACTACCCTCTGACCCTGTGGTATAATTTTCATGAGTAAGTGGAAACCTAAGATCTTAGAAGTTCAACGGCAATGTGTCCAAGGGGTTTAGATCCTCTCCTTAAGTGCCTGTATCTCTGTGAAAAGAATCATCATAGGCTAGGCGCGATGGCTCACACCTGTAATCCCAGCACTTTGGGAGGCCGAGGTAGGTGGATCACCTGAGGTCGGGAGTCCAAGACCAGCCTGACTGACATGGAAAAACCCTGTCTCTACTAAAAATACAAAATTAGGTATGGTGGTGCATTCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGGAGGGGGAGGTTGCAGCAAGCCAAGATCGTGCCATTGCACTCCAGCAGCCTGGGCAACAAGAGTGAAAAACTACACCTCAAAAACAAAAACAAAAACAAAAGAATCATCATCAAGTGAACTGGAACACATCCAGAGAACTAATTTTGTTAGAAAGATTTTAGAGTTGAGCCACACAATCTGCATCTTCTGCGTCCTCCATGCACTCGTCTGCTTTCTGGAGCCCCATGAGTGAGTCTTAATCCTGTTCCAGATAACAGTTCTCTTCCGGGTAACGGTTCTTCAGATACTTGAAGACAGTGTCTTATTTCCTTAAATCTTCTCATTTCTTCTTCAAAAGACAGTATTTCAAGTTACTTTTATGTATCTTTACCATCTACCTCTGGATAAACACTCTCCAATTTGTCAGTGACCATGTTAAAAACCAAGCACGGTGCTTAAAACTGACATCATCTTTCAGGCAATCACTCCATTGGAGAATACAGTGGGGCTCTGGATCTGTACTTCACTTGCTCCAGAGCCTCTGCTTGTGTTAATACGGCCCAGTTTCAAATAAGCATTTTTAGCAGCCCTGAAATGTGTACTCAGATTTAGTTTATAGTCAACTAAAAACACCCAGAGGTCTCCTGTATTACACAAGTTATAATTAAAACCTTAAAAGAGAAAGGTATAGGACAAATGATCTGTCTCCTCCCTTTTTTGCTTTTTCATATGTTAAGACTATCTCGGAGCTGTTATCAGACTTTTTTCCTGAAAAACTCTCAACAATACTCAAACTAGGTGTTACATGAAGCTGGGGTCTCCAGGTTTTGCCTCACTTGTTCTTTCTTTTGTTGTTGTTGAGACAGAGTCTCACTCTGTCGCCAGGCTGGAGTGCAGTGGCAGGATCTCAGCTGACTGCAACCTCAGCCTCCAGAGTTCAAGCAATTCTTCTGTGTCAGCCTCCCAAGTAGCTGGGATTACAGGTGCACACCACCACGCCCAGCCA

  11. Another example of annotation. INPUT: ..t..r…o..p..i..c..a..a..l...g..e..e..t..r..y.. OUTPUT: ..t..r…o..p..i..c..a..a..l...g..e..e..t..r..y.. Annotation is the labeling of the input sequence, in this case with 3 colors: keep, change, delete.

  12. tctctggttagtttgtaacatcaagtacttacctcattcagcatttttctttctttaatagactgggtcacccctaaagag tccgggattagtctgtatgaggtacccaccacactcagaagttttctttcttggatagacttgatcacccctgaagagaag

  13. Data summary tctctggttagtttgtaacatcaagtacttacctcattcagcatttttctttctttaatagactgggtcacccctaaagag tccgggattagtctgtatgaggtacccaccacactcagaagttttctttcttggatagacttgatcacccctgaagagaag

  14. Statistics question: Are the two sequences independent? Algebra question: Is the 4x4 matrix close to rank 1?
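One way to make the two questions on this slide concrete is sketched below. It assumes the two sequences are compared position by position, tabulates the 4x4 matrix of nucleotide pair counts, and compares it with the rank-1 matrix that has the same row and column sums (the expected counts under independence). The toy strings `seq1` and `seq2` and all function names are hypothetical, chosen for illustration; they are not the sequences or the matrix from the slides.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

def count_matrix(seq1, seq2):
    """4x4 table: u[i][j] = number of positions showing nucleotide i in seq1 and j in seq2."""
    index = {c: k for k, c in enumerate(NUCLEOTIDES)}
    u = [[0] * 4 for _ in range(4)]
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x in index and y in index:  # skip gaps and ambiguous symbols
            u[index[x]][index[y]] += 1
    return u

def independence_fit(u):
    """Rank-1 matrix with the same row and column sums as u."""
    n = sum(map(sum, u))
    rows = [sum(r) for r in u]
    cols = [sum(u[i][j] for i in range(4)) for j in range(4)]
    return [[rows[i] * cols[j] / n for j in range(4)] for i in range(4)]

def chi_square(u):
    """Pearson statistic: small values mean u is close to its rank-1 fit."""
    e = independence_fit(u)
    return sum((u[i][j] - e[i][j]) ** 2 / e[i][j]
               for i, j in product(range(4), repeat=2) if e[i][j] > 0)

# Toy data (an assumption for illustration only).
seq1 = "ACGTACGTACGTAACC"
seq2 = "ACGTACGAACGTTACC"
u = count_matrix(seq1, seq2)
print(u)
print(round(chi_square(u), 3))
```

A large statistic suggests the count matrix is far from rank 1, that is, the two sequences are far from independent. The Pearson statistic is only one possible measure of distance to rank 1; the singular values of the matrix would be another.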

  15. The independence model
      • $m = 16$ observable states $\{A,C,G,T\}^2$
      • $d = 6$ unknown parameters
      • $\theta = (a_A, a_C, a_G, a_T, b_A, b_C, b_G, b_T)$, where $a_A + a_C + a_G + a_T = b_A + b_C + b_G + b_T = 1$
      Independence means probabilities factor: $p_{AG} = \mathrm{prob}(A,G) = a_A b_G$

  16. The independence model
      • $m = 16$ observable states $\{A,C,G,T\}^2$
      • $d = 6$ unknown parameters
      • $\theta = (a_A, a_C, a_G, a_T, b_A, b_C, b_G, b_T)$, where $a_A + a_C + a_G + a_T = b_A + b_C + b_G + b_T = 1$
      Independence means probabilities factor: $p_{AG} = \mathrm{prob}(A,G) = a_A b_G$
      The model is the polynomial map
      • $(a, b) \mapsto a^{T} b$
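The polynomial map on this slide can be written out directly. The sketch below, with hypothetical parameter values and function names, builds the 4x4 matrix with entries a_i b_j and checks two properties: the 16 entries sum to 1, and every 2x2 minor vanishes, which is the algebraic condition for rank at most 1.

```python
from itertools import combinations

def independence_map(a, b):
    """Polynomial map (a, b) -> a^T b: the matrix with entries p[i][j] = a[i] * b[j]."""
    return [[ai * bj for bj in b] for ai in a]

def max_abs_minor(p):
    """Largest absolute value of a 2x2 minor; it is 0 exactly when rank(p) <= 1."""
    return max(abs(p[i][j] * p[k][l] - p[i][l] * p[k][j])
               for i, k in combinations(range(len(p)), 2)
               for j, l in combinations(range(len(p[0])), 2))

# Hypothetical parameter values. Each vector sums to 1, so only 3 + 3 = 6 are free.
a = [0.1, 0.2, 0.3, 0.4]    # (a_A, a_C, a_G, a_T)
b = [0.25, 0.25, 0.4, 0.1]  # (b_A, b_C, b_G, b_T)
p = independence_map(a, b)

print(p[0][2])           # p_AG = a_A * b_G
print(sum(map(sum, p)))  # total probability, 1 up to floating-point rounding
print(max_abs_minor(p))  # 0 up to floating-point rounding
```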

  17. Models for discrete data. A statistical model is a parameterized family of probability distributions, with $\Theta \subset \mathbb{R}^d$ and $\Delta \subset \mathbb{R}^m$, where
      d = number of parameters
      m = number of observable states
      Θ = the parameter space
      Δ = probability simplex on the m states

  18. The geometry of maximum likelihood estimation [Figure labels: parameter space, data, probability simplex]

  19. Observed data tctctggttagtttgtaacatcaagtacttacctcattcagcatttttctttctttaatagactgggtcacccctaaagagatc tccgggattagtctgtatgaggtacccaccacactcagaagttttctttcttggatagacttgatcacccctgaagagaag

  20. Hidden data: tctctggttagtttgtaacatcaagtacttacCTCATTCAGCATTTTTCTTTCTTTAATAGACTGGGTCACCCctaaagagatc tccgggattagtctgt---atgaggtacccacCACACTCAGAAGTTTTCTTTCTTGGATAGACTTGATCACCCctgaagagaag ** * ***** *** ** * **** *** ** **** * *********** ******* * ******** ******

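  21. Example: n=5, m=4. Alignment: gttta- over gt--gc (** marks the matching columns). [Figure: alignment graph with one axis labeled g, t, t, t, a, the other labeled g, t, g, c, and vertices marked start and finish.] The alignment problem is to find the shortest path in the alignment graph. This is solved with dynamic programming and is known in computational biology as the Needleman-Wunsch algorithm.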
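The shortest-path formulation can be sketched in a few lines. The version below assumes unit costs (match 0, mismatch 1, gap 1); the slide does not state its edge lengths, so these values and the function name are illustrative assumptions. Because the optimal path depends on the costs chosen, the alignment returned here need not coincide with the one drawn on the slide.

```python
def needleman_wunsch(s, t, mismatch=1, gap=1):
    """Global alignment of s and t minimizing total cost (a match costs 0)."""
    n, m = len(s), len(t)
    # cost[i][j] = length of the shortest path from the start vertex (0, 0) to (i, j)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (0 if s[i - 1] == t[j - 1] else mismatch)
            cost[i][j] = min(diag, cost[i - 1][j] + gap, cost[i][j - 1] + gap)

    # Trace one shortest path back from the finish vertex (n, m) to the start.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = 0 if (i > 0 and j > 0 and s[i - 1] == t[j - 1]) else mismatch
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + step:
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return cost[n][m], "".join(reversed(top)), "".join(reversed(bottom))

total, row1, row2 = needleman_wunsch("gttta", "gtgc")
print(total)
print(row1)
print(row2)
```

The table has (n+1)(m+1) entries and each is filled in constant time, which is what makes dynamic programming practical even though the number of paths in the alignment graph grows exponentially.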

  22. The algebraic statistical model for sequence alignment, known as the pair hidden Markov model, is the image of a map whose coordinates are polynomials with one term for each path in the alignment graph. The logarithms of the 33 parameters give the edge lengths for the shortest path problem on the alignment graph.
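The link between the polynomial (one term per path) and the shortest path problem can be illustrated by running a single recursion in two different arithmetics. This is a simplified sketch, not the pair hidden Markov model itself: it uses three hypothetical weights (`match`, `mismatch`, `gap`) instead of the 33 parameters, and the weights are not normalized into a probability distribution.

```python
import math

def alignment_dp(s, t, weight, add, mul, zero, one):
    """Evaluate the alignment-graph recursion in the semiring (add, mul, zero, one)."""
    n, m = len(s), len(t)
    v = [[zero] * (m + 1) for _ in range(n + 1)]
    v[0][0] = one
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = zero
            if i > 0 and j > 0:  # diagonal edge: match or mismatch
                w = weight["match"] if s[i - 1] == t[j - 1] else weight["mismatch"]
                total = add(total, mul(v[i - 1][j - 1], w))
            if i > 0:            # gap in t
                total = add(total, mul(v[i - 1][j], weight["gap"]))
            if j > 0:            # gap in s
                total = add(total, mul(v[i][j - 1], weight["gap"]))
            v[i][j] = total
    return v[n][m]

# Hypothetical weights, and their negative logarithms as edge lengths.
theta = {"match": 0.6, "mismatch": 0.1, "gap": 0.05}
lengths = {k: -math.log(p) for k, p in theta.items()}

# Ordinary arithmetic (+, *): a polynomial in theta with one term per path.
poly = alignment_dp("gttta", "gtgc", theta,
                    lambda x, y: x + y, lambda x, y: x * y, 0.0, 1.0)
# Tropical arithmetic (min, +): the length of the shortest path.
shortest = alignment_dp("gttta", "gtgc", lengths,
                        min, lambda x, y: x + y, math.inf, 0.0)

print(poly)
print(shortest)
print(math.exp(-shortest))  # weight of the single best path, one term of the polynomial
```

Setting every weight to 1 in ordinary arithmetic counts the paths in the alignment graph, which is a quick way to see how many terms the polynomial has.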

  23. General Mathematical Framework
      • Statistical models are algebraic varieties.
      • Algebraic varieties can be tropicalized.
      • Tropicalized models are useful for MAP inference in statistics.

      L. Pachter and B. Sturmfels, Tropical Geometry of Statistical Models, Proceedings of the National Academy of Sciences, Volume 101:46 (2004), p 16132--16137.
      L. Pachter and B. Sturmfels, Parametric Inference for Biological Sequence Analysis, Proceedings of the National Academy of Sciences, Volume 101:46 (2004), p 16138--16143.

  24. 2.1. Tropical arithmetic and dynamic programming In tropical algebraic geometry, varieties are piecewise linear…
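For reference, tropical arithmetic in the min-plus convention can be stated as follows. The convention is an assumption here, since the slide text is truncated. Addition is replaced by taking the minimum and multiplication by ordinary addition, so a polynomial becomes a minimum of linear forms, which is piecewise linear. The polynomial $f$ below is an illustrative example, not one taken from the slides.

```latex
\[
  x \oplus y = \min(x, y), \qquad x \odot y = x + y,
  \qquad \text{e.g. } 3 \oplus 5 = 3, \quad 3 \odot 5 = 8 .
\]
\[
  f(x, y) = x^2 + xy + y^2
  \quad\longmapsto\quad
  \operatorname{trop}(f)(x, y) = \min(2x,\; x + y,\; 2y).
\]
```

A dynamic programming recursion written with $\oplus$ and $\odot$ in place of $+$ and $\times$ therefore computes a shortest path instead of a sum over all paths.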

  25. Human tctctggttagtttgtaacatcaagtacttacCTCATTCAGCATTTTTCTTTCTTTAATAGACTGGGTCA Chimp tctctggttagtttgtaacatcaagtacttacCTCATTCAGCATTTTTCTTTCTTTAATAGACTGGGTCA Mouse tcccagatcagttcgt---atcaggtacccacCACATTCAGAAGTCTTCTTTCTTGGATAGACCGGACCA Rat tccgggattagtctgt---atgaggtacccacCACACTCAGAAGTTTTCTTTCTTGGATAGACTTGATCA Dog tttctgattcgtttgtaacattgagtacctacCTCATCTAGTATCTTTCTTTCTTTAATAGACTGGGTTA * * * ** ** ** **** *** ** ** * ********* ****** * * Comparative Genomics A phylogenetic tree on 5 taxa.

  26. Human tctctggttagtttgtaacatcaagtacttacCTCATTCAGCATTTTTCTTTCTTTAATAGACTGGGTCA Chimp tctctggttagtttgtaacatcaagtacttacCTCATTCAGCATTTTTCTTTCTTTAATAGACTGGGTCA Mouse tcccagatcagttcgt---atcaggtacccacCACATTCAGAAGTCTTCTTTCTTGGATAGACCGGACCA Rat tccgggattagtctgt---atgaggtacccacCACACTCAGAAGTTTTCTTTCTTGGATAGACTTGATCA Dog tttctgattcgtttgtaacattgagtacctacCTCATCTAGTATCTTTCTTTCTTTAATAGACTGGGTTA * * * ** ** ** **** *** ** ** * ********* ****** * * Comparative Genomics Petersen graph parametrizes trees on 5 taxa.
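The statement about the Petersen graph can be checked by enumeration. An unrooted binary tree on 5 taxa has two internal edges, each separating two taxa from the other three, and two such splits fit in one tree exactly when their 2-element sides are disjoint. Taking the 10 two-element subsets as vertices and the disjoint pairs as edges gives the Petersen graph, whose 15 edges correspond to the 15 trees. The variable names below are illustrative.

```python
from itertools import combinations

taxa = [1, 2, 3, 4, 5]

# Vertices: the 10 splits of type 2|3, each recorded by its 2-element side.
splits = [frozenset(c) for c in combinations(taxa, 2)]

# Edges: pairs of compatible splits, i.e. disjoint 2-element sets.
# Each edge is one unrooted binary tree on the 5 taxa.
trees = [(s, t) for s, t in combinations(splits, 2) if not (s & t)]

# Every vertex of the Petersen graph has degree 3.
degree = {s: sum(s in e for e in trees) for s in splits}

print(len(splits), len(trees))
print(sorted(set(degree.values())))
for s, t in trees[:3]:
    print(sorted(s), sorted(t))  # the two cherries of one tree; the fifth taxon sits in the middle
```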

  27. Trees are Ubiquitous in Biology. Fig. 1 of: Antonio Bernardo Carvalho and Andrew G. Clark, "Y Chromosome of D. pseudoobscura Is Not Homologous to the Ancestral Drosophila Y", Science, January 7, 2005.

  28. [Figure: trees on 5 taxa, with leaf labels 1 5 2 4 3, 1 5 4 2 3, 1 4 2 5 3 and 1 3 2 4 5]

  29. Conclusion: Algebra, discrete mathematics and statistics are relevant for genomics. [Figure labels: Organ system (digestive), Organ (liver), Tissue (liver sinusoid), Cell (hepatocyte), Organelle (nucleus), Molecule (DNA): TAGAGACGGGGGTTTCACAATGTTGGCCA]

  30. Algebraic Statistics for Computational Biology Group, Department of Mathematics, U.C. Berkeley. http://math.berkeley.edu/~lpachter/ascb_book/ Photo courtesy of Robert Fisher, Lawrence Hall of Science, March 7th, 2005.
