CS369 Computational Biology

Presentation Transcript


  1. CS369 Computational Biology. University of Auckland. Course coordinator: Alexei Drummond

  2. CS369 Computational Biology
     • This course provides an overview of algorithms and scientific computing techniques in computational biology and bioinformatics.
     • A hands-on introduction to topics including:
       • Dynamic programming and string algorithms
       • Markov models and hidden Markov models
       • Heuristic search algorithms
       • Pairwise and multiple sequence alignment
       • Phylogenetic reconstruction
       • Gene finding and secondary structure prediction
     CS369 2007

  3. Your lecturers
     • Alexei Drummond: Coordinator
     • Michael Dinneen: Algorithmics
     • Patricia Riddle: Learning and pattern discovery

  4. Recommended Texts
     • Bioinformatics: Sequence and Genome Analysis, 1st & 2nd Editions, by David W. Mount (2001; 2004)
       • 1st edition is available as an e-book in the library
       • 2nd edition is available on short-term loan
     • Biological Sequence Analysis, by Durbin, Eddy, Krogh and Mitchison (1998)
       • Available on short-term loan
     • Algorithm Design, by J. Kleinberg and E. Tardos (2006)

  5. Assessment
     • 2 x 2-hour labs (3rd April and 29th May): 5% each
     • 1 assignment (handed out 1st May): 20%
     • 1 mid-semester test: 10%
     • 1 exam: 60%

  6. Lecture schedule - first half

  7. Lecture schedule - second half

  8. What is computational biology?
     • Computers and biology?
     • The organization and analysis of biological data with computers?
     • The new biology of the 21st century?

  9. What is bioinformatics?
     • Rapidly growing biological databases contain information about:
       • Cellular and molecular biology
       • Ecology and evolutionary biology
       • Microbiology
       • Genomics
       • Proteomics
     • The application of computational biology to understand this data.

  10. What is computational biology?
     • Computational biology combines the tools and techniques of mathematics, statistics, computer science and biology in order to understand the masses of biological data.

  11. Computational biology and the tree of life

  12. CS369 2007

  13. The human 18S gene

  14. The frog 18S gene

  15. Pairwise alignment of human and frog: humans and frogs are 94% similar!

  16. Pairwise sequence alignment
     Sequences:
       x  = a c g g t s
       y  = a w g c c t t
     Alignment:
       x′ = a – c g g – t s
       y′ = a w – g c c t t
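
A minimal sketch of how a similarity figure such as the "94%" on the previous slide could be computed from an alignment like the one on slide 16. The function name `percent_identity` and the choice of denominator (total number of alignment columns) are assumptions for illustration; other conventions exist, and the slides do not say which one was used. The gap symbol is written as an ASCII "-" here.

```python
def percent_identity(x_aln: str, y_aln: str, gap: str = "-") -> float:
    """Percentage of alignment columns in which both rows hold the same residue."""
    if len(x_aln) != len(y_aln):
        raise ValueError("aligned sequences must have equal length")
    if not x_aln:
        raise ValueError("alignment is empty")
    # A column counts as identical only if both symbols match and neither is a gap.
    identical = sum(1 for a, b in zip(x_aln, y_aln) if a == b and a != gap)
    return 100.0 * identical / len(x_aln)


# The alignment from slide 16, with "-" for gaps.
x_aln = "a-cgg-ts"
y_aln = "aw-gcctt"
print(percent_identity(x_aln, y_aln))  # 3 identical columns out of 8 -> 37.5
```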

  17. Dynamic programming algorithm
     • Computation is carried out bottom-up
       • Store solutions to subproblems in a table
       • All possible subproblems are solved once each, beginning with the smallest subproblems
       • Work up to the original problem instance
     • Only optimal solutions to subproblems are used to compute the solution to the problem at the next level
     • DO NOT carry out the computation in a recursive, top-down manner
       • The same subproblems would be solved many times
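
The bottom-up recipe above can be sketched for global pairwise alignment. This is an illustration in the style of the Needleman-Wunsch algorithm, not code from the course; the function name `global_alignment_score` and the scores (match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def global_alignment_score(x: str, y: str,
                           match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of x and y, filled in bottom-up."""
    n, m = len(x), len(y)
    # table[i][j] holds the best score for aligning x[:i] with y[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    # Smallest subproblems: aligning a prefix against the empty string costs only gaps.
    for i in range(1, n + 1):
        table[i][0] = i * gap
    for j in range(1, m + 1):
        table[0][j] = j * gap
    # Each larger subproblem uses only the optimal scores of smaller ones.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = table[i - 1][j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            up = table[i - 1][j] + gap      # x[i-1] aligned against a gap
            left = table[i][j - 1] + gap    # y[j-1] aligned against a gap
            table[i][j] = max(diagonal, up, left)
    return table[n][m]


# The two sequences from slide 16.
print(global_alignment_score("acggts", "awgcctt"))
```

Each of the (n+1) x (m+1) table cells is computed once, which is what distinguishes this from a naive top-down recursion that revisits the same prefixes repeatedly.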

  18. Principle of Optimality (figure labels: Auckland, Te Kuiti, Wellington)

  19. Scoring
     • Numeric score associated with each column
     • Total score = sum of column scores
     • Column types:
       (1) Identical (+ve)
       (2) Conservative (+ve)
       (3) Non-conservative (-ve)
       (4) Gap (-ve)
     x′ = a – c g g – t s
     y′ = a w – g c c t t
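
A small sketch of the column-sum scoring described on this slide. The numeric scores and the `conservative_pairs` set are hypothetical values chosen for illustration; the slide only gives the sign of each column type. Gaps are written as ASCII "-".

```python
def alignment_score(x_aln: str, y_aln: str,
                    identical: int = 2, conservative: int = 1,
                    non_conservative: int = -1, gap_penalty: int = -2,
                    conservative_pairs: frozenset = frozenset(),
                    gap: str = "-") -> int:
    """Total score = sum of per-column scores over the four column types."""
    if len(x_aln) != len(y_aln):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(x_aln, y_aln):
        if a == gap or b == gap:
            total += gap_penalty            # (4) gap
        elif a == b:
            total += identical              # (1) identical
        elif frozenset((a, b)) in conservative_pairs:
            total += conservative           # (2) conservative substitution
        else:
            total += non_conservative       # (3) non-conservative substitution
    return total


x_aln = "a-cgg-ts"
y_aln = "aw-gcctt"
# 3 identical, 3 gap and 2 non-conservative columns: 6 - 6 - 2 = -2
print(alignment_score(x_aln, y_aln))
# Treating s/t as a conservative pair (an assumption): 6 - 6 - 1 + 1 = 0
print(alignment_score(x_aln, y_aln, conservative_pairs=frozenset({frozenset("st")})))
```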

  20. Comparing many species: this is called a “multiple sequence alignment”

  21. CS369 2007

  22. Tree of 3000 species

  23. Molecular tracking of the Southern Saltmarsh Mosquito (figure labels: multiply, extract, decode, align, build tree; example sequence A C G T T A C C G T A A G T G)

  24. Bioinformatics and genome evolution

  25. Computational biology and disease
     • Bacterial infection can cause serious illnesses, e.g.
       • Necrotising fasciitis (“flesh-eating disease”)
       • Toxic Shock Syndrome (TSS) (e.g. the illness Lana Cockroft got from the cut in her foot during “Celebrity Treasure Island”)
     (Images: top, Staphylococcus aureus; bottom, Streptococcus pyogenes)

  26. Computational biology and disease: necrotizing fasciitis

  27. Computational biology and disease: identify the protein structure of superantigens, so that potential drug targets can be uncovered.

  28. Computational biology and sequencing technology
     • “Old” technology (1977): small scale; slow; length: long, ≈ 700 bp
     • New technology, 454 (2003): large scale, whole genome; fast; length: short, ≈ 100 bp
     • http://www.454.com

  29. Bring the mammoth back
     • Using this new sequencing technology, and bioinformatics to compare the sequences to the African elephant genome, scientists sequenced 28 million nucleotides of the mammoth genome from frozen bones!
     (Figure labels: African, Mammoth, Indian)

  30. Where does the DNA come from?
     • Scientists sequence DNA of woolly mammoth (extinct 10,000 years ago)
     • http://www.science.psu.edu/alert/Schuster12-2005.htm

  31. Conclusion
     • Computational biology and bioinformatics are fast becoming central to the biological sciences:
       • Ecology
       • Evolutionary biology (tree of life)
       • Health and disease (flesh-eating bacteria)
       • Molecular biology and cell biology
       • Personalized medicine
     • Computational biology is heavily algorithmic and often statistical, drawing from the ‘hard-as-in-rock’ sciences like mathematics, computer science and statistics. This type of knowledge is becoming increasingly important in biology.

  32. Reading
     • Chapter 1 of “Biological Sequence Analysis”, Durbin et al.
     • Chapter 1 of Mount
     • Don’t worry about any biological terms that you don’t immediately understand! This chapter is more to give you a sense of where we are going to go. The pertinent details will be explained along the way.

  33. Biomolecules and the Central Dogma of Molecular Biology

  34. The central dogma of molecular biology
     DNA → RNA → Protein
     • Replication: DNA → DNA
     • Transcription: DNA → RNA
     • Translation: RNA → Protein

  35. What is a macromolecule?
     • A chain molecule made up of a large number of repeating units.
     • Homo-polymers are made up of identical units.
     • Hetero-polymers are made up of a number of different units.
     • Nylon, polyester, DNA (poly-nucleotide) and protein (poly-amino acid) are examples.
     • Macromolecular 3D structure is a question of conformational change rather than of breaking and forming strong bonds.

  36. Biomolecular sequences
     DNA:     5’-ACGATCGACTGGTATATCGATGCT-3’
     RNA:     5’-ACGAUCGACUGGUAUAUCGAUGCU-3’
     Protein: MFINRWLFSTNHKDIGTLYLLFGAW
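
The DNA and RNA strings on this slide differ only in that every T is written as U. A minimal sketch of that relationship, assuming the DNA shown is the coding strand (the helper name `transcribe` is hypothetical):

```python
def transcribe(dna: str) -> str:
    """Write a coding-strand DNA sequence as RNA by replacing T with U."""
    return dna.upper().replace("T", "U")


dna = "ACGATCGACTGGTATATCGATGCT"
rna = transcribe(dna)
print(rna)
# Matches the RNA sequence shown on the slide.
assert rna == "ACGAUCGACUGGUAUAUCGAUGCU"
```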

  37. DNA
     5’-ACGATCGACTGGTATATCGATGCT-3’
     3’-TGCTAGCTGACCATATAGCTACGA-5’
     • In nature DNA exists in a double helix of two complementary sequences.
     • Watson-Crick base pairing: A - T, G - C
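
Watson-Crick pairing makes the second strand fully determined by the first. A short sketch, with hypothetical helper names `complement` and `reverse_complement`, that reproduces the lower strand shown on the slide:

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna: str) -> str:
    """Base-by-base complement, read in the same left-to-right order as the input."""
    return dna.upper().translate(COMPLEMENT)


def reverse_complement(dna: str) -> str:
    """Complementary strand written in its own 5'-to-3' direction."""
    return complement(dna)[::-1]


top = "ACGATCGACTGGTATATCGATGCT"
# The slide writes the lower strand 3'-to-5', i.e. as the plain complement.
assert complement(top) == "TGCTAGCTGACCATATAGCTACGA"
print(reverse_complement(top))
```

The slide lists the lower strand from 3’ to 5’ so that paired bases line up; `reverse_complement` gives the same strand in the conventional 5’-to-3’ orientation.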

  38. RNA
     5’-ACGAUCGACUGGUAUAUCGAUGCU-3’
     • Base pairing: G - C (3 H bonds), A - U (2 H bonds), G - U (1 H bond)
     (Figure labels: secondary structure, 3D structure)

  39. Proteins
     • Proteins are hetero-polymers made up of 20 standard amino acids and are “coded for” by genes.
     • They form 3D molecular structures that do work in the cell.
     • An open problem is to determine structure computationally from the primary sequence of amino acids.
     (Figure: a cartoon of the 3D structure of the myoglobin protein)

  40. The genetic code: amino acids

  41. The genetic code

  42. CS369 2007

  43. CS369 2007
