BCB 444/544

Presentation Transcript


  1. BCB 444/544 Lecture 33 Genomics #33_Nov09 BCB 444/544 F07 ISU Dobbs#33 - Genomics

  2. Required Reading (before lecture)
     √ Mon Nov 5 - Lecture 31 Phylogenetics – Parsimony and ML
       • Chp 11 - pp 142 – 169
     √ Wed Nov 7 - Lecture 32 Machine Learning
     Fri Nov 9 - Lecture 33 Functional and Comparative Genomics
       • Chp 17 and Chp 18

  3. Assignments & Announcements
     Fri Nov 9 - HW#6 (will be posted this weekend)
     HW#6 - More fun with Machine Learning!!
     Due: Fri Nov 16 (or sometime before Mon Nov 26)

  4. Seminars this Week
     BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html
     • Nov 7 Wed - BBMB Seminar, 4:10 in 1414 MBB
       • Sharon Roth Dent, MD Anderson Cancer Center
       • Role of chromatin and chromatin modifying proteins in regulating gene expression
     • Nov 8 Thurs - BBMB Seminar, 4:10 in 1414 MBB
       • Jianzhi George Zhang, U. Michigan
       • Evolution of new functions for proteins
     • Nov 9 Fri - BCB Faculty Seminar, 2:10 in 102 SciI
       • Amy Andreotti, ISU
       • T cell signaling: insights from protein NMR spectroscopy

  5. Chp 11 – Phylogenetic Tree Construction Methods and Programs
     SECTION IV MOLECULAR PHYLOGENETICS
     Xiong: Chp 11 Phylogenetic Tree Construction Methods and Programs
     • Distance-Based Methods
     • Character-Based Methods
     • Phylogenetic Tree Evaluation
     • Phylogenetic Programs

  6. Machine Learning
     • What is learning?
     • What is machine learning?
     • Learning algorithms
     • Machine learning applied to bioinformatics and computational biology
     • Some slides adapted from Dr. Vasant Honavar and Dr. Byron Olson

  7. Examples of Machine Learning Algorithms
     • Naïve Bayes (NB)
       • Bayes Theorem
     • Neural network (NN) or Artificial Neural Net (ANN)
       • Perceptrons
     • Support Vector Machine (SVM)
       • Kernel functions
     Lab - WEKA: Decision Trees (DT), NB, SVM

  8. An Application: Predicting RNA Binding Sites in Proteins
     • Problem: Given an amino acid sequence, classify each residue as RNA binding or non-RNA binding
     • Input to the classifier is a string of amino acid identities
     • Output from the classifier is a class label, either binding or not
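One common way to present "a string of amino acid identities" to a per-residue classifier is a fixed-width window centered on the residue being classified. The sketch below assumes that setup; the function name `residue_windows`, the `half_width` parameter, and the `-` padding character are illustrative choices, not something the slides specify.

```python
def residue_windows(sequence, half_width=5, pad="-"):
    """Return one window per residue, centered on that residue.

    Positions that fall off either end of the sequence are filled with
    the pad character so every window has the same length.
    """
    padded = pad * half_width + sequence + pad * half_width
    width = 2 * half_width + 1
    return [padded[i:i + width] for i in range(len(sequence))]


# Example sequence taken from the ARG 6 slide; half_width=2 keeps it readable.
windows = residue_windows("TSKKKRQRGSR", half_width=2)
first_window = windows[0]   # window for the first residue (T)
arg6_window = windows[5]    # window for the sixth residue (R)
```

Each window would then be paired with a label (binding or not) for the central residue when building a training set.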

  9. Bayes Theorem Applied to RNA Binding Site Prediction

  10. Naïve Bayes for Binary Classification
      Assign c = 1 if
      $$\frac{p(X_1 = x_1 \mid c = 1)\, p(X_2 = x_2 \mid c = 1) \cdots}{p(X_1 = x_1 \mid c = 0)\, p(X_2 = x_2 \mid c = 0) \cdots} \ge \theta$$
      Otherwise, assign c = 0

  11. Example: Is ARG 6 RNA-binding or not?
      ARG 6: T S K K K R Q R G S R
      $$\frac{p(X_1 = \mathrm{T} \mid c = 1)\, p(X_2 = \mathrm{S} \mid c = 1) \cdots}{p(X_1 = \mathrm{T} \mid c = 0)\, p(X_2 = \mathrm{S} \mid c = 0) \cdots} \ge \theta$$
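The likelihood-ratio test in the ARG 6 example can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the classifier used in the lecture: the training windows and labels are made up, the add-one pseudocount is an assumption to avoid zero probabilities, class priors are treated as absorbed into θ, and the names `train`, `likelihood_ratio`, and `classify` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train(windows, labels, pseudocount=1.0):
    """Estimate p(X_i = a | c) for every window position i and class c."""
    width = len(windows[0])
    counts = {c: [Counter() for _ in range(width)] for c in (0, 1)}
    totals = Counter(labels)
    for window, c in zip(windows, labels):
        for i, aa in enumerate(window):
            counts[c][i][aa] += 1

    def prob(c, i, aa):
        # Smoothed estimate so unseen residues never get probability zero.
        return (counts[c][i][aa] + pseudocount) / (
            totals[c] + pseudocount * len(AMINO_ACIDS))

    return prob


def likelihood_ratio(prob, window):
    """Product of p(X_i | c=1) over product of p(X_i | c=0), via logs."""
    log_ratio = sum(math.log(prob(1, i, aa)) - math.log(prob(0, i, aa))
                    for i, aa in enumerate(window))
    return math.exp(log_ratio)


def classify(prob, window, theta=1.0):
    """Assign c = 1 if the likelihood ratio is at least theta, else c = 0."""
    return 1 if likelihood_ratio(prob, window) >= theta else 0


# Hypothetical toy training set: 3-residue windows, 1 = binding, 0 = not.
train_windows = ["KRK", "RKR", "KKR", "AVL", "LGA", "GAV"]
train_labels = [1, 1, 1, 0, 0, 0]
prob = train(train_windows, train_labels)

basic_ratio = likelihood_ratio(prob, "RKK")
basic_label = classify(prob, "RKK")
hydrophobic_label = classify(prob, "AVL")
```

Raising θ makes the classifier more conservative about calling a residue RNA-binding; lowering it does the opposite.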

  12. Predicted vs Actual RNA Binding for Ribosomal protein L15 (PDB ID 1JJ2:K)
      Figure panels: Predicted, Actual

  13. Artificial Neural Networks (ANNs or NNs)
      • Neural networks classify “input vectors” or “examples” into categories (2 or more)
      • They are loosely based on biological neurons
      • Some of the most successful methods for predicting secondary structure are based on neural networks:
        • Neural networks are trained to recognize amino acid patterns corresponding to known secondary structure elements; these patterns are used to predict secondary structure type for aa sequences in proteins of unknown structure

  14. Biological Neurons “Sum” Input Signals & Generate Output Signal
      Dendrites receive inputs, Axon sends output
      Image from Christos Stergiou and Dimitrios Siganos http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html

  15. Simple Neuron = “Perceptron”
      Perceptron is “Simplest ANN” = feed-forward NN = linear classifier
      Image from Christos Stergiou and Dimitrios Siganos http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html

  16. The Perceptron
      Diagram labels: Input X (X1, X2, …, XN), Weights W (w1, w2, …, wN), Summation S, Threshold T, Output F
      A perceptron combines the inputs X1…XN, compares the “sum” S with a threshold T, and generates an output class label: either 1 or 0.
      If the weights W and threshold T are not known in advance, the perceptron must be trained. Ideally, the perceptron is trained to return the correct answer for all training examples and to perform well on test examples it has never seen.
      The training set must contain both classes of data (i.e., with “1” and “0” output).

  17. Perceptron “Sums” Inputs by Computing Dot Product S = X·W
      • Input is a vector X; the weights are another vector W
      • Perceptron Summation S computes the dot product, S = X·W
      • Perceptron Output F is a function of S: it is often discrete (1 or 0), in which case the function is a step function
      • For continuous output, a sigmoidal function is often used, e.g. the logistic function $F(S) = \frac{1}{1 + e^{-S}}$, which rises from 0 to 1 and equals 1/2 at S = 0
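The dot product, threshold comparison, and the two output functions described above can be sketched as follows. The function names and the hand-picked AND weights are illustrative assumptions, and so is the choice of "greater than or equal to" at the threshold; the sigmoid shown is the standard logistic function.

```python
import math


def dot(x, w):
    """Summation S: dot product of input vector x and weight vector w."""
    return sum(xi * wi for xi, wi in zip(x, w))


def step_output(x, w, threshold):
    """Discrete output F: 1 if S reaches the threshold T, otherwise 0."""
    return 1 if dot(x, w) >= threshold else 0


def sigmoid_output(x, w, threshold=0.0):
    """Continuous output F in (0, 1): logistic function of S - T."""
    return 1.0 / (1.0 + math.exp(-(dot(x, w) - threshold)))


# Hand-picked (not trained) weights that make the perceptron compute AND.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_weights = (1.0, 1.0)
and_outputs = [step_output(x, and_weights, threshold=1.5) for x in inputs]
midpoint = sigmoid_output((0, 0), and_weights)  # S = 0, T = 0
```

With the step function the perceptron commits to a class label; with the sigmoid it returns a graded value that can be thresholded later.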

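  18. Training a Perceptron
      Find the weights W that minimize the error function E:
      $$E = \sum_{i=1}^{P} \left( F(W \cdot X_i) - t(X_i) \right)^2$$
      P: number of training examples
      Xi: training vectors
      F(W·Xi): output of perceptron
      t(Xi): target value for Xi
      Use steepest descent:
      - compute gradient: $\nabla E = \left( \frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \ldots, \frac{\partial E}{\partial w_N} \right)$
      - update weight vector: $W_{\mathrm{new}} = W_{\mathrm{old}} - \epsilon \nabla E$
      - iterate ($\epsilon$: learning rate)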
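A minimal sketch of steepest-descent training, assuming a sum-of-squared-errors error function and a logistic (sigmoid) output so that the gradient exists everywhere. The AND training set, the learning rate of 0.5, the iteration count, and the use of a constant first input as a bias term are all illustrative assumptions.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def output(weights, x):
    """Perceptron output F(W·X) with a sigmoid activation."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def error(weights, examples):
    """Assumed error function: sum of squared (output - target) over examples."""
    return sum((output(weights, x) - t) ** 2 for x, t in examples)


def gradient(weights, examples):
    """Partial derivative of the error with respect to each weight."""
    grad = [0.0] * len(weights)
    for x, t in examples:
        f = output(weights, x)
        # Chain rule: d/dw_j (f - t)^2 = 2 (f - t) * f (1 - f) * x_j
        scale = 2.0 * (f - t) * f * (1.0 - f)
        for j, xj in enumerate(x):
            grad[j] += scale * xj
    return grad


def train(examples, learning_rate=0.5, iterations=5000):
    """Repeatedly step the weights against the gradient of the error."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(iterations):
        grad = gradient(weights, examples)
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights


# Hypothetical training set: logical AND. The leading 1.0 in each input is a
# constant bias term, so its weight plays the role of the negative threshold.
examples = [((1.0, 0.0, 0.0), 0), ((1.0, 0.0, 1.0), 0),
            ((1.0, 1.0, 0.0), 0), ((1.0, 1.0, 1.0), 1)]
initial_error = error([0.0, 0.0, 0.0], examples)
weights = train(examples)
final_error = error(weights, examples)
predictions = [round(output(weights, x)) for x, _ in examples]
```

Note that the training set contains both classes, as the perceptron slide requires; with only one class present the weights would simply drift toward always predicting it.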

  19. Artificial Neural Network (ANN)
      • Artificial neural network
        • Set of perceptrons, interconnected such that outputs of some units become inputs of other units
      • Many topologies are possible!
      • Can have multiple layers
      Neural networks are trained in the same way perceptrons are trained, by minimizing an error function.
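To illustrate "outputs of some units become inputs of other units", here is a small two-layer network built from step-function perceptrons. It computes XOR, which no single perceptron can do because XOR is not linearly separable. The weights and thresholds are hand-picked for illustration rather than learned, and the unit names are hypothetical.

```python
def perceptron(inputs, weights, threshold):
    """Step-function perceptron: 1 if the weighted sum reaches the threshold."""
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s >= threshold else 0


def xor_network(x1, x2):
    # Hidden layer: two perceptrons that both see the raw inputs.
    h_or = perceptron((x1, x2), (1.0, 1.0), threshold=0.5)    # x1 OR x2
    h_and = perceptron((x1, x2), (1.0, 1.0), threshold=1.5)   # x1 AND x2
    # Output layer: takes the hidden outputs as its inputs.
    return perceptron((h_or, h_and), (1.0, -1.0), threshold=0.5)


xor_outputs = [xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

In practice the weights of such a network would be found by minimizing an error function over training examples rather than set by hand.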

  20. Support Vector Machines - SVMs
      Image from http://en.wikipedia.org/wiki/Support_vector_machine

  21. SVM Finds Maximum-Margin Hyperplane (i.e., hyperplane that provides maximum separation between two classes of instances in dataset)
      Image from http://en.wikipedia.org/wiki/Support_vector_machine
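The idea of "maximum separation" can be made concrete by measuring the margin of a candidate hyperplane: the smallest distance from any training point to the hyperplane, counted as negative if the point is on the wrong side. The sketch below only evaluates given hyperplanes; it does not solve the SVM optimization problem. The toy points, the ±1 label convention, and the function name are assumptions for illustration.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w·x + b) / ||w|| over all points.

    Labels are +1 or -1. A positive result means the hyperplane
    w·x + b = 0 separates the two classes.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, y in zip(points, labels))


# Hypothetical 2-D data: negatives on the line x = 0, positives on x = 2.
points = [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
labels = [-1, -1, 1, 1]

centered = geometric_margin((1.0, 0.0), -1.0, points, labels)   # plane x = 1
off_center = geometric_margin((1.0, 0.0), -0.5, points, labels) # plane x = 0.5
```

Both hyperplanes classify every point correctly, but the centered one leaves twice the margin, which is the one an SVM would prefer for this data.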

  22. Kernel “Trick”

  23. Kernel Function
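The slide content for the kernel "trick" is not preserved here, so the following is a generic illustration rather than the lecture's own example. It uses the degree-2 polynomial kernel K(x, z) = (x·z)², for which an explicit feature map is known, and checks that the kernel gives the same value as a dot product in the mapped space without ever building the mapped vectors. The function names are hypothetical.

```python
import math


def dot(x, z):
    return sum(xi * zi for xi, zi in zip(x, z))


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return dot(x, z) ** 2


def poly2_features(x):
    """Explicit feature map for the same kernel on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


x, z = (1.0, 2.0), (3.0, 4.0)
via_kernel = poly2_kernel(x, z)
via_features = dot(poly2_features(x), poly2_features(z))
```

The two values agree, which is why an SVM can work with a linear separator in a higher-dimensional feature space while only ever evaluating the kernel on the original inputs.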

  24. Take Home Messages
      • Must consider how to set up the learning problem (supervised or unsupervised, generative or discriminative, classification or regression, etc.)
      • Lots of algorithms out there
      • No algorithm performs best on all problems

  25. Genomics - for excellent overview lectures, see these posted by NHGRI & Pevsner:
      1- Genomic sequencing
         Mapping and Sequencing, CTGA2005Lecture1.pdf, Eric Green, NHGRI
      2- Human genome project
         The Human Genome, 2005-10-19_ch17.pdf, Jonathan Pevsner, Kennedy Krieger Institute
      3- SNPs
         Studying Genetic Variation II: Computational Techniques, Jim Mullikin, NHGRI, TGA2005Lecture13.pdf
      4- Comparative Genomics
         Comparative Sequence Analysis, Elliott Margulies, NHGRI, CTGA2005Lecture8.pdf

  26. 1- Genomic sequencing
      Many thanks to: Eric Green, NHGRI, for the following slides extracted from his lecture on: Mapping and Sequencing, CTGA2005Lecture1.pdf

  27. Genomic Sequencing - Brief Review E Green 2005

  28. Comparison of Sequenced Genome Sizes E Green 2005

  29. Comparison of Genetic & Physical Maps E Green 2005

  30. STSs: Provide common markers for "linking" genetic & physical maps E Green 2005

  31. With complete genomes (now), why bother to generate physical maps? E Green 2005

  32. Genomic sequencing requires assembly of sequences obtained from cloned DNA E Green 2005

  33. Human Genome Sequencing
      • Two approaches:
        • Public (government) - International Consortium (6 countries, NIH-funded in US)
          • "Hierarchical" cloning & BAC-by-BAC sequencing
          • Map-based assembly
        • Private (industry) - Celera (Craig Venter)
          • Whole genome random "shotgun" sequencing
          • Computational assembly (took advantage of public maps & sequences, too)
      • Guess which human genome Celera sequenced?

  34. NIH: "Hierarchical" BAC-by-BAC Sequencing E Green 2005

  35. "Hierarchical" Subcloning Strategy E Green 2005

  36. Celera: Whole-Genome "Shotgun" Sequencing E Green 2005

  37. "Shotgun" Sequencing Strategy E Green 2005
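The computational assembly step behind shotgun sequencing can be illustrated with a toy greedy overlap assembler: repeatedly merge the two reads with the longest suffix-to-prefix overlap. This is a teaching sketch only. It assumes error-free reads from one strand and no repeats, the reads and the `min_length` cutoff are made up, and it is not a description of the assembler Celera or the public consortium actually used.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_length=3):
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_length)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # remaining reads do not overlap; leave them as contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads sampled from the 15-base sequence ATGGCGTACGTTAGC.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
contigs = greedy_assemble(reads)
```

Real genomes contain long repeats and sequencing errors, which is one reason map information and a separate "finishing" stage matter for both strategies.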

  38. Either Strategy: Sequence "Finishing" = Hardest part !! E Green 2005

  39. Advances in DNA Sequencing Technology E Green 2005

  40. Sequencing Method #1: Maxam-Gilbert "Chemical Degradation" E Green 2005

  41. Sequencing Method #2: Sanger "Di-deoxy Chain Termination" E Green 2005

  42. Automated Sequencing for Genome Projects: Sanger method - with improvements
      Another “recent” improvement: rapid & high resolution separation of fragments in capillaries instead of gels (E Yeung, Ames Lab, ISU)
      E Green 2005

  43. Recent technologies? Pyro- & 454 Sequencing

  44. 1st Eukaryotic Genome Sequence: S. cerevisiae E Green 2005

  45. 1st Animal Genome Sequence: C. elegans E Green 2005

  46. Timetable for Human Genome Sequencing: Faster than expected! E Green 2005

  47. 1st Draft Human Genome: "Complete" in 2001 E Green 2005

  48. Public Sequencing - International Consortium E Green 2005

  49. "Finishing" the Human Genome - continues… E Green 2005

  50. After "Complete" Human Genome Sequence: What next? E Green 2005
