
Functional Annotation & Comparative Genomics

Functional Annotation & Comparative Genomics. Lavanya Rishishwar, February 26th, 2014. Outline. Functional annotation: What is functional annotation? Importance of functional annotation. Approaches to functional annotation. Pros/cons of available approaches. Comparative genomics.



Presentation Transcript


  1. Functional Annotation & Comparative Genomics Lavanya Rishishwar February 26th, 2014

  2. Outline Functional annotation • What is functional annotation? • Importance of functional annotation • Approaches to functional annotation • Pros/cons of available approaches Comparative genomics • What is comparative genomics? • Importance of comparative genomics • Approaches and tools

  3. Functional Annotation The ‘what?’

  4. Genome Assembly Assemble the Pieces Right

  5. Gene Prediction When on board HMS Beagle, as naturalist, I was much struck with certain facts in the distribution of the inhabitants of South America, and in the geological relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of species - that mystery of mysteries, as it has been called by one of our greatest philosophers. Identify the words

  6. Functional Annotation When on board HMS Beagle, as naturalist, I was much struck with certain facts in the distribution of the inhabitants of South America, and in the geological relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of species - that mystery of mysteries, as it has been called by one of our greatest philosophers. Identify the function (i.e., meaning) of each word DATABASES PROFILES • nat·u·ral·ist [nach-er-uh-list, nach-ruh-] noun 1. a person who studies or is an expert in natural history, especially a zoologist or botanist. 2. an adherent of naturalism in literature or art. Origin: 1580–90; natural + -ist • Origin of Species, The noun (On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life) a treatise (1859) by Charles Darwin setting forth his theory of evolution.

  7. Comparative Genomics When on board RMS Titanic, as painter, I was much struck with certain facts in the distribution of the inhabitants of United Kingdom, and in the socioeconomical relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of capitalism - that mystery of mysteries, as it has been called by one of our greatest philosophers. When on board HMS Beagle, as naturalist, I was much struck with certain facts in the distribution of the inhabitants of South America, and in the geological relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of species - that mystery of mysteries, as it has been called by one of our greatest philosophers.

  8. Not just Newtonian The gravity of the annotation process

  9. Function “Ultimately, one wishes to determine how genes - and the proteins they encode - function in the intact organism.” Alberts B, et al. (2002) Molecular Biology of the Cell. New York: Garland Science.

  10. Function? What is it? • To a cell biologist, function might refer to the network of interactions in which the protein participates or to its localization to a certain cellular compartment. • To a biochemist, function refers to the metabolic process in which a protein is involved or to the reaction catalyzed by an enzyme.

  11. Functional Annotation Functional annotation consists of attaching biological information to genomic elements. • Biochemical function • Biological function • Involved regulation and interactions • Expression

  12. What needs to be annotated? • Proteins: • Domains/Motifs • Signal peptides • Transmembrane regions • Coding and non-coding RNAs • Operons

  13. Domain/Motif • Domain: A discrete structural unit that is assumed to fold independently of the rest of the protein and to have its own function. ~20-100 aa • Motif: A short, conserved region, frequently the most conserved region of a domain. Motifs are critical for the domain to function.
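
A short, conserved motif of the kind described above can be located with a simple pattern search. The sketch below is only an illustration: it uses the PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P} written as a regular expression, and the function name `find_motif` and the toy sequence are assumptions made for this example, not part of the original slides.

```python
import re

# PROSITE-style pattern N-{P}-[ST]-{P}: Asn, anything but Pro, Ser or Thr,
# anything but Pro. The lookahead allows overlapping matches to be reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")

def find_motif(sequence, pattern=N_GLYCOSYLATION):
    """Return (1-based start, matched residues) for each motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]

# Hypothetical toy protein sequence, for illustration only.
motif_hits = find_motif("MKNGSALNPTQNASV")
print(motif_hits)
```

Dedicated tools such as InterProScan rely on curated profiles and signatures rather than a single regular expression, so this only conveys the basic idea of motif matching.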

  14. Coding and non-coding RNAs • Protein coding: Enzymes, Structural, Regulatory, Signal transduction, Receptors, Toxins, Virulence factors, Membrane/Transmembrane • Non-coding: Riboswitches, CRISPR, sRNAs • Pathway prediction

  15. How does a gene perform its function? Operon • Operon: Several genes with related functions that are regulated together, because one piece of mRNA codes for several related proteins. • Polycistronic mRNA, an mRNA coding for more than one polypeptide, is found predominantly in prokaryotes
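
One crude way to propose operon candidates from the definition above is to group neighbouring genes that lie on the same strand with only a short gap between them. This is a minimal sketch under stated assumptions: the `Gene` record, the 50 bp `max_gap` default and the example coordinates are all hypothetical, and real operon predictors use more evidence (promoters, terminators, functional relatedness).

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    start: int   # 1-based start coordinate
    end: int     # 1-based end coordinate
    strand: str  # "+" or "-"

def group_into_operons(genes, max_gap=50):
    """Group adjacent same-strand genes separated by at most max_gap bases."""
    operons = []
    for gene in sorted(genes, key=lambda g: g.start):
        previous = operons[-1][-1] if operons else None
        if (previous is not None
                and gene.strand == previous.strand
                and gene.start - previous.end <= max_gap):
            operons[-1].append(gene)   # extend the current candidate operon
        else:
            operons.append([gene])     # start a new candidate operon
    return operons

# Hypothetical gene coordinates, for illustration only.
genes = [
    Gene("geneA", 1, 900, "+"),
    Gene("geneB", 920, 1800, "+"),
    Gene("geneC", 1830, 2500, "+"),
    Gene("geneD", 3200, 4000, "+"),
    Gene("geneE", 4010, 4500, "-"),
]
operon_names = [[g.name for g in operon] for operon in group_into_operons(genes)]
print(operon_names)
```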

  16. An overview Approaches to Functional Annotation

  17. Approaches to functional annotation • Ab initio: based on intrinsic characteristics of gene/protein features • Signal peptides (SignalP, LipoP) • Transmembrane domains (TMHMM) • Homology based: information transfer from an experimentally characterized system • BLAST • InterPro

  18. Ab initio approaches • Fairly standard: TM regions and signal peptides have a distinct pattern of sequence composition • TM proteins are membrane-bound receptors and channels that are of particular pharmacological relevance (therapeutic or vaccine targets) • Signal peptides direct proteins to their proper cellular or extracellular location
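
The "distinct pattern of sequence composition" of transmembrane regions can be illustrated with a sliding-window hydropathy scan. The sketch below uses the Kyte-Doolittle hydropathy scale; it is not how TMHMM works (TMHMM is HMM-based), and the window of 19 residues, the 1.6 threshold, the function names and the toy sequence are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_windows(sequence, window=19):
    """Mean hydropathy of every window; raises KeyError on non-standard residues."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """1-based start positions of windows whose mean hydropathy reaches the threshold."""
    return [i + 1 for i, mean in enumerate(hydropathy_windows(sequence, window))
            if mean >= threshold]

# Hypothetical toy sequence: a hydrophobic stretch flanked by charged residues.
toy_protein = "D" * 20 + "L" * 19 + "K" * 20
print(candidate_tm_windows(toy_protein))
```

A run of consecutive high-scoring windows suggests a candidate membrane-spanning segment that would still need confirmation with a dedicated predictor.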

  19. Homology based approaches • Significant sequence similarity implies homology (shared ancestry), which often leads to shared function • Assumptions: • Genes/proteins that evolved to perform some function will retain that function • Deleterious mutations will be weeded out by purifying selection • Evolution is mostly dominated by divergence • Homology will thus entail a high chance of shared function
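
In practice, homology-based transfer often starts from a table of similarity search hits. The sketch below assumes BLAST tabular output with the standard 12 columns (`-outfmt 6`) and keeps the highest bit-score hit per query below an E-value cutoff; the function name `best_hits`, the 1e-5 cutoff and the example rows are hypothetical choices for illustration.

```python
import csv
import io

# Column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def best_hits(handle, max_evalue=1e-5):
    """Map each query to the subject of its best-scoring hit under the cutoff."""
    best = {}
    for row in csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t"):
        evalue, bitscore = float(row["evalue"]), float(row["bitscore"])
        if evalue > max_evalue:
            continue  # too weak to support any transfer of annotation
        current = best.get(row["qseqid"])
        if current is None or bitscore > current[1]:
            best[row["qseqid"]] = (row["sseqid"], bitscore)
    return {query: subject for query, (subject, _) in best.items()}

# Hypothetical hits, for illustration only.
rows = [
    ["gene1", "refA", "45.0", "200", "110", "2", "1", "200", "5", "204", "1e-20", "150"],
    ["gene1", "refB", "90.0", "210", "21", "0", "1", "210", "1", "210", "1e-80", "400"],
    ["gene2", "refC", "30.0", "50", "35", "1", "10", "59", "3", "52", "0.5", "30"],
]
example = "\n".join("\t".join(row) for row in rows)
annotation_source = best_hits(io.StringIO(example))
print(annotation_source)
```

Queries without any hit under the cutoff are simply absent from the result, which leaves the decision about how to name them to a later step.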

  20. Homology based approaches • Databases: • NCBI • GenBank • RefSeq • EBI • SwissProt • UniProt • DDBJ • KEGG • Tools • BLAST • InterProScan • GO-based

  21. Databases

  22. Primary vs derivative sequence databases [Diagram labels: Sequence Data from Sequencing Labs; GenBank; RefSeq; UniGene; Genomes; Assemblies; PGAAP; Curators; Computational Algorithms]

  23. Database choices • RefSeq, SwissProt and UniProt are all • Very reliable • High level of annotation • Minimal redundancy • Integration with other databases

  24. Gene Ontology Shulaev, V., Sargent, D. J., Crowhurst, R. N., Mockler, T. C., Folkerts, O., Delcher, A. L., ... & Salama, D. Y. (2010). The genome of woodland strawberry (Fragaria vesca). Nature genetics, 43(2), 109-116.

  25. Analysis Tools - BLAST God help you if you do this here.

  26. Analysis Tools - InterProScan

  27. Analysis Tools - InterProScan

  28. Analysis Tools – GO Based • Blast2GO • GOMiner • …?

  29. Criteria for selecting methods • Currently being maintained • Applicable to Prokaryotic sequences • Could be installed locally (support batch jobs if GUI) OR Could be included in a pipeline i.e., have a command-line interface

  30. Gene naming • You need to have a clear logic and support for assigning names to the predicted proteins • A generally accepted scheme is as follows: • High confidence matches – function and annotation can be transferred • Multiple high confidence matches – assign a less specific name e.g. ABC transporter • Low confidence matches – assign function as putative • Match to a hypothetical protein – conserved hypothetical protein • No match in the database – hypothetical protein • How high is high? Depends on your data.
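
The naming scheme above can be written down as a small decision function. This is a sketch under loud assumptions: percent identity stands in for "confidence", the 90% and 30% cutoffs are placeholders (the slide itself notes that "how high is high" depends on the data), and the shared leading words of several strong hits are used as the "less specific name". None of these choices come from the original slides.

```python
def common_word_prefix(names):
    """Leading words shared by all names, e.g. 'ABC transporter'."""
    prefix = []
    for words in zip(*(name.split() for name in names)):
        if len(set(words)) != 1:
            break
        prefix.append(words[0])
    return " ".join(prefix)

def assign_name(hits, high_identity=90.0, low_identity=30.0):
    """hits: list of (description, percent_identity); thresholds are hypothetical."""
    usable = sorted((h for h in hits if h[1] >= low_identity),
                    key=lambda h: h[1], reverse=True)
    if not usable:
        return "hypothetical protein"            # no match in the database
    informative = [h for h in usable if "hypothetical" not in h[0].lower()]
    if not informative:
        return "conserved hypothetical protein"  # matches only hypothetical proteins
    strong = [desc for desc, identity in informative if identity >= high_identity]
    if len(set(strong)) == 1:
        return strong[0]                         # one high-confidence name: transfer it
    if len(set(strong)) > 1:
        shared = common_word_prefix(sorted(set(strong)))
        return shared or f"putative {strong[0]}"  # fall back to a less specific name
    return f"putative {informative[0][0]}"       # low-confidence match only

print(assign_name([("ABC transporter permease", 95.0),
                   ("ABC transporter ATP-binding protein", 93.0)]))
```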

  31. Automated Pipelines These take in a whole genome assembly and output annotations. E.g.: • PGAAP – Prokaryotic Genome Automatic Annotation Pipeline • CG-Pipeline – Computational Genomics Pipeline • RAST – Rapid Annotation using Subsystem Technology • KEGG – Kyoto Encyclopedia of Genes and Genomes • More come out each year, each with a specific focus and capability

  32. Choosing The Right Function Prediction Tool Caution! Pros and Cons of Conventional Approaches

  33. “Perutz et al. showed in 1960 that myoglobin and hemoglobin, the first two protein structures to be solved at atomic resolution using X-ray crystallography, have similar structures even though their sequences differ.”

  34. Pros and Cons: There are no free lunches! • Homology is useful, but it is not the same as “same” function • It simply implies common ancestry Punta, M., & Ofran, Y. (2008). The rough guide to in silico function prediction, or how to use sequence and structure information to predict protein function. PLoS computational biology, 4(10), e1000160.

  35. Pros and Cons: There are no free lunches! Punta, M., & Ofran, Y. (2008). The rough guide to in silico function prediction, or how to use sequence and structure information to predict protein function. PLoS computational biology, 4(10), e1000160.

  36. Pros and Cons: There are no free lunches! • Again: the quality of a prediction is only as good as the quality of annotation in the database • A function predictor built for eukaryotes cannot be used for prokaryotes, and vice versa • Building pan-genomes is a good strategy for finding more confident matches

  37. Comparative Genomics

  38. Comparative Genomics Ciccarelli, F. D., Doerks, T., Von Mering, C., Creevey, C. J., Snel, B., & Bork, P. (2006). Toward automatic reconstruction of a highly resolved tree of life.Science, 311(5765), 1283-1287.

  39. Comparative Genomics • In a nutshell – it’s comparing similarities and differences in genomes (proteins/genes/SNPs) of multiple organisms from same or different species. • Helps in answering – • Present: lifestyle - virulent vs avirulent; horizontally acquired segments • Past: Evolution

  40. Comparative Genomics • Biological questions of general interest: • Are there rearrangements? • Is the region (or regions) of interest syntenic across species? • Are there gene gain/loss events leading to a specific trait? • Which organisms are most similar? Which are most distant? • What factors confer virulence to the genome? • In our case: capsule switching? What, why and how?
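
Questions about gene gain and loss can be framed as set operations once each genome has been reduced to the set of gene families it contains. The sketch below assumes that reduction has already been done (for example by clustering homologous proteins); the function name, strain names and family identifiers are hypothetical.

```python
def compare_gene_content(genomes):
    """genomes: dict mapping strain name -> set of gene family identifiers."""
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)   # families present in every genome
    pan = set().union(*gene_sets)         # families present in at least one genome
    unique = {}
    for strain, families in genomes.items():
        others = set().union(*(g for s, g in genomes.items() if s != strain))
        unique[strain] = families - others  # families found only in this strain
    return core, pan, unique

# Hypothetical gene family content, for illustration only.
genomes = {
    "strain1": {"famA", "famB", "cps1"},
    "strain2": {"famA", "famB", "cps2"},
    "strain3": {"famA", "famC"},
}
core, pan, unique = compare_gene_content(genomes)
print(sorted(core), len(pan), unique)
```

Strain-specific families are natural starting points when looking for the genes behind a trait that only some strains show.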

  41. Comparative Genomics Genomic Rearrangement Darling, Aaron E., István Miklós, and Mark A. Ragan. "Dynamics of genome rearrangement in bacterial populations." PLoS Genetics 4.7 (2008): e1000128.

  42. Comparative Genomics Synteny Krause, A., Ramakumar, A., Bartels, D., Battistoni, F., Bekel, T., Boch, J., ... & Goesmann, A. (2006). Complete genome of the mutualistic, N2-fixing grass endophyte Azoarcus sp. strain BH72. Nature biotechnology, 24(11).

  43. Comparative Genomics Horizontal Gene Transfer http://textbookofbacteriology.net/HorizontalTransfer.gif

  44. Comparative Genomics • You are going to hear more about your specific goals next week • Remember: The focus here is not about the tools but (1) identification of the biological question, (2) your approach to answering the question and (3) your results with interpretation

  45. Databases • As before, there are a number of sequence databases available • You need to decide what subset of a database you should take into consideration • E.g.: what organism/serogroup/sequence type should your database be focused on? • If we are also looking for virulence factors: VFDB • If we are interested in pathways: KEGG, Pathway Tools

  46. Analysis Tools • Homology Based – BLAST, Protein Clusters, Pathway Analysis • Phylogenetics – MEGA, T-Coffee • Virulence - VFDB • Horizontal/Lateral Gene Transfer – Dark Horse, Alien Hunter, Phylogeny Based • Visualization

  47. Phylogenetic Analysis • There are a number of ways you can compare organisms/genomes: • 16S rRNA tree • MLST based methods • ANI based methods • All three can be visualized as a tree to assess the relatedness between the organisms • ANI has been shown to correlate well with DDH by Konstantinidis et al. [Slide label: More traditional] References: Konstantinidis, K. T., Ramette, A., & Tiedje, J. M. (2006). The bacterial species definition in the genomic era. Philosophical Transactions of the Royal Society B: Biological Sciences, 361(1475), 1929-1940. Goris, J., Konstantinidis, K. T., Klappenbach, J. A., Coenye, T., Vandamme, P., & Tiedje, J. M. (2007). DNA–DNA hybridization values and their relationship to whole-genome sequence similarities. International journal of systematic and evolutionary microbiology, 57(1), 81-91.
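
As a rough illustration of the ANI idea, the sketch below averages the identity of genome fragments that found an acceptable match in the other genome. It assumes the fragment searches have already been run and summarized as (percent identity, aligned fraction) pairs; the cutoffs loosely follow those described by Goris et al. (2007) but should be treated as assumptions here, and the function name and numbers are hypothetical.

```python
def average_nucleotide_identity(fragment_hits, min_identity=30.0,
                                min_aligned_fraction=0.7):
    """Mean percent identity of fragments passing both filters, or None.

    fragment_hits: iterable of (percent_identity, aligned_fraction) pairs,
    one per query-genome fragment that produced a best hit.
    """
    kept = [identity for identity, aligned in fragment_hits
            if identity > min_identity and aligned >= min_aligned_fraction]
    if not kept:
        return None  # no fragment was comparable between the two genomes
    return sum(kept) / len(kept)

# Hypothetical fragment results, for illustration only.
fragment_hits = [(98.0, 0.95), (96.0, 0.90), (99.0, 0.50), (25.0, 0.80)]
ani = average_nucleotide_identity(fragment_hits)
print(ani)
```

Pairwise ANI values computed this way for a set of genomes can be converted into a distance matrix (for example 100 minus ANI) and clustered to draw the tree mentioned above.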

  48. Different phenotype, same evolutionary lineages • Phenotypic concordance need not imply the same ancestral lineage • Species are sometimes observed to gain a certain set of mutations, in the same or different gene(s), that leads to the same phenotype • Acquiring antibiotic resistance is one such example • Such cases have to be investigated case by case, with underlying reasons varying from SNPs, gene gain/loss and indels to plasmid uptake, etc.

  49. Visualization • It’s more important than you think • A plethora of visualization tools is available today for various purposes • E.g.: • Circos • CGView • BRIG • Artemis • IGV • Mauve • VISTA, etc.
