
Modeling Functional Genomics Datasets CVM8890-101

Modeling Functional Genomics Datasets CVM8890-101. Lesson 2 13 June 2007 Teresia Buza. Lesson 2: Introduction to functional annotation. Orthologs and homologs; clusters of orthologous genes (COGs) and the gene ontology (GO); and how to find what functional annotation is available.


Presentation Transcript


  1. Modeling Functional Genomics Datasets CVM8890-101 Lesson 2 13 June 2007 Teresia Buza

  2. Lesson 2: Introduction to functional annotation. Orthologs and homologs; clusters of orthologous genes (COGs) and the gene ontology (GO); and how to find what functional annotation is available.

  3. Introduction to Functional Annotation

  4. Where are we? • Central Dogma: Gene (ATGTCCTATCCATGTCGTACAGATTGACGAGAT) → mRNA transcript → Protein • Genomic hypothesis: Genome, Transcriptome, Proteome • New technology: Genome sequencing, Transcript profiling, Protein quantification • What is all this? Structural annotation • What next? Functional annotation

  5. Genome Annotation • Biologists use the term for both the annotation of the genome and the functional annotation of gene products: • “Structural” annotation • “Functional” annotation

  6. Structural & Functional Annotation • Structural annotation: identification of genomic elements (ORFs predicted during genome assembly; location of ORFs; gene structure; coding regions; location of regulatory motifs, etc.) • Functional annotation: attaching biological information to genomic elements (biochemical function; biological function; regulation and interactions involved; expression, etc.) • These steps may involve both biological experiments and in silico analysis. http://en.wikipedia.org/wiki/Genome_annotation#Genome_annotation (with modifications)

  7. Why Functional Annotation? Enables you to take large “laundry lists” of genes/proteins and turn them into a biologically useful model

  8. Functional Annotation • Annotation of gene products = Gene Ontology (GO) annotation • Initially, predicted ORFs have no functional literature, so GO annotation relies on computational methods (rapid, but raises the question of quantity vs. quality) • Functional literature exists for many genes/proteins prior to genome sequencing; curating it is slow but provides high-quality annotations • GO annotation does not rely on a completed genome sequence!

  9. Types of Functional Annotation • Based on direct experimental evidence of function (experiments in the same organism), for example: enzyme assays; binding experiments; pathway analysis; synthetic lethals; functional complementation; gene mutations; RNAi; 2-hybrid interactions, etc. • Based on indirect evidence of function: expression analysis; structure analysis; sequence analysis

  10. Functional Annotation • Problem: many genes/proteins have no annotation; some have unknown functions • Challenge: we want to get the maximum functional annotation for modeling our data • Solution: read papers (PubMed etc.); search for homologs/orthologs of known function • Homologs and orthologs help assign function…

  11. 2. Finding Function: orthologs and homologs

  12. What are Homologs, Orthologs, Paralogs? • Homolog: homology is the relationship between genes that share a common ancestor and were separated by a speciation event or a gene duplication. • Ortholog: orthologs are homologous genes in different species that evolved from a common ancestral gene by speciation. Normally (but not always), orthologs retain the same function in the course of evolution. Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes. • Paralog: paralogs are homologous genes related by duplication within a genome. Paralogs tend to evolve new functions, even if these are related to the original one. http://homepage.usask.ca/~ctl271/857/def_homolog.shtml

  13. Orthologs & Paralogs [Figure: example gene tree with orthologs and paralogs labeled] http://www.ensembl.org/info/data/compara/tree_example1.jpg

  14. How to search for orthologs? • BLAST: http://www.ncbi.nlm.nih.gov/BLAST/ (sequence alignment search tool; uses a heuristic algorithm) • MPsrch: http://www.ebi.ac.uk/MPsrch/ (sequence comparison tool; implements the Smith & Waterman algorithm, which is exhaustive) • Domain analysis: http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml (analysis of regions of sequence homology among sets of proteins that are not all full-length homologs; homology domains often, but not always, correspond to recognizable protein folding domains) • Protein family databases (e.g. COGs & KOGs) • Superfamily: complete set of proteins having sequence homology over essentially their full length. • Subfamilies: incomplete sets of homologous proteins which nevertheless encompass proteins of diverse function
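
The exhaustive Smith & Waterman approach mentioned for MPsrch can be illustrated with a minimal sketch. It computes only the best local alignment score, and it assumes a simple scoring scheme (match = 2, mismatch = -1, linear gap = -2) chosen for illustration; these values and the function name are not taken from MPsrch, which scores proteins with substitution matrices.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Illustrative scoring only: the match/mismatch/gap values are assumptions.
    """
    # prev holds the previous row of the dynamic-programming matrix.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in sequence b
            left = curr[j - 1] + gap  # gap in sequence a
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


# Every cell of the matrix is filled, which is what makes the search exhaustive.
print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Heuristic tools such as BLAST avoid filling the whole matrix, trading some sensitivity for speed.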

  15. Systems for Functional Annotation 1. Clusters of Orthologous Groups (COGs) → Prokaryotes 2. euKaryote Orthologous Groups (KOGs) → Eukaryotes 3. Gene Ontology (GO)

  16. COGs & KOGs • Both are based on orthology. • Genes are assigned to broad functional categories (A–Z) • Each COG or KOG corresponds to an ancient conserved domain • COGs: prokaryotes • KOGs: eukaryotes

  17. Clusters of Orthologous Groups (COGs) http://www.ncbi.nlm.nih.gov/COG/ Text search: COGs have 25 functional categories (A – Z) in four broad groups: • Information storage and processing • Cellular processes and signaling • Metabolism • Poorly characterized

  18. COGs Categories • INFORMATION STORAGE AND PROCESSING: [J] Translation, ribosomal structure and biogenesis; [A] RNA processing and modification; [K] Transcription; [L] Replication, recombination and repair; [B] Chromatin structure and dynamics • CELLULAR PROCESSES AND SIGNALING: [D] Cell cycle control, cell division, chromosome partitioning; [Y] Nuclear structure; [V] Defense mechanisms; [T] Signal transduction mechanisms; [M] Cell wall/membrane/envelope biogenesis; [N] Cell motility; [Z] Cytoskeleton; [W] Extracellular structures; [U] Intracellular trafficking, secretion, and vesicular transport; [O] Posttranslational modification, protein turnover, chaperones ftp://ftp.ncbi.nih.gov/pub/COG/COG/fun.txt

  19. COGs Categories • METABOLISM: [C] Energy production and conversion; [G] Carbohydrate transport and metabolism; [E] Amino acid transport and metabolism; [F] Nucleotide transport and metabolism; [H] Coenzyme transport and metabolism; [I] Lipid transport and metabolism; [P] Inorganic ion transport and metabolism; [Q] Secondary metabolites biosynthesis, transport and catabolism • POORLY CHARACTERIZED: [R] General function prediction only; [S] Function unknown ftp://ftp.ncbi.nih.gov/pub/COG/COG/fun.txt
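
The category letters listed on these two slides can be turned into a small lookup for summarizing a gene list by COG category. The sketch below is a minimal illustration: the gene names and their assignments are made up, and the function name `summarize` is hypothetical.

```python
from collections import Counter

# The four broad groups and their category letters, as listed on the slides.
COG_GROUPS = {
    "Information storage and processing": "JAKLB",
    "Cellular processes and signaling": "DYVTMNZWUO",
    "Metabolism": "CGEFHIPQ",
    "Poorly characterized": "RS",
}
LETTER_TO_GROUP = {
    letter: group for group, letters in COG_GROUPS.items() for letter in letters
}


def summarize(gene_to_cog):
    """Count genes per COG letter and per broad group.

    A gene assigned several letters (e.g. "EH") is counted once per letter,
    so it can contribute more than once to the same broad group.
    """
    per_letter = Counter()
    per_group = Counter()
    for letters in gene_to_cog.values():
        for letter in letters:
            per_letter[letter] += 1
            per_group[LETTER_TO_GROUP[letter]] += 1
    return per_letter, per_group


# Made-up example assignments, for illustration only.
genes = {"geneA": "J", "geneB": "K", "geneC": "EH", "geneD": "S"}
per_letter, per_group = summarize(genes)
print(dict(per_letter))
print(dict(per_group))
```

Counts of this kind are what a per-category bar chart of a gene list would plot.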

  20. Example 1 Classification of COGs by functional categories Tatusov et al., 2000: The COG database: a tool for genome-scale analysis of protein functions and evolution

  21. Example 2 [Figure: three bar-chart panels (AMX, CTC, ENR), each plotting Decrease and Increase against COG categories (-, C through V); y-axis from 0 to 40] Effects of Antibiotics on Pasteurella multocida transcriptome Nanduri et al 2006

  22. The Gene Ontology (GO) • The Gene Ontology (GO) is the de facto standard for functional annotation • GO functional annotation is based on orthology AND direct experimental evidence • GO terms allow much more detailed functional analysis (> 24,000 terms) than COGs & KOGs (25 broad categories) • GO is a controlled vocabulary of terms split into three related ontologies covering basic areas of molecular biology: • molecular function: 8,123 terms • biological process: 13,960 terms • cellular component: 2,071 terms (GO Report 2007-04)

  23. Example 3 Functional Annotation of Chicken Proteomic data [Figure: Cellular Component]

  24. Use GO for… • Modeling function in high-throughput datasets (arrays!), an approach started by the Fly, Yeast and Mouse databases (Ashburner et al 2000, 2001) • Grouping gene products by biological function • Determining which classes of gene products are over-represented or under-represented • Focusing on particular biological pathways and functions (hypothesis-driven) • Relating a protein’s location to its function
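
One common way to decide whether a GO class is over-represented in a gene list is a hypergeometric test. The slides do not name a specific test, so the sketch below is an assumption about method, and the function name and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a list of n,
    drawn from a background of N genes of which K carry the GO term."""
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when asked to choose more items than are available,
    # so impossible combinations drop out of the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical numbers: 1000 background genes, 50 annotated to the term,
# a study list of 40 genes of which 8 are annotated to the term.
p = hypergeom_upper_tail(k=8, n=40, K=50, N=1000)
print(f"p = {p:.3g}")
```

Under-representation can be assessed the same way with the lower tail, P(X <= k). When many GO terms are tested at once, the resulting p-values need a multiple-testing correction.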

  25. Annotating to the GO • Need to show type of evidence of function • Literature curation: read and interpret reviewed literature (IDA, IGI, IMP, IPI, IGC) (TAS, NAS) • Computational analysis (RCA, ISS, IEA) http://www.geneontology.org/GO.evidence.shtml
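
The grouping of evidence codes on this slide can be written as a lookup table, which is handy when tallying how much of a dataset rests on curated versus computational evidence. This is a minimal sketch that follows the slide's grouping; IEP is added from the flowchart on slide 28, and the function names and the example list are hypothetical.

```python
from collections import Counter

# Grouping as given on the slide; IEP comes from slide 28's list of
# manual-annotation codes. The group labels are the slide's wording.
EVIDENCE_GROUPS = {
    "literature curation": {"IDA", "IGI", "IMP", "IPI", "IGC", "IEP", "TAS", "NAS"},
    "computational analysis": {"RCA", "ISS", "IEA"},
}


def evidence_group(code):
    """Return the group a GO evidence code belongs to, or 'unknown'."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return "unknown"


def tally_evidence(codes):
    """Count annotations per evidence group."""
    return Counter(evidence_group(code) for code in codes)


# Made-up list of evidence codes, for illustration only.
print(tally_evidence(["IDA", "IEA", "IEA", "ISS", "TAS"]))
```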

  26. 4. How to find functional annotation for your species

  27. How to find functional annotation • For a quick search you need to know: • Name of your species (e.g. Sus scrofa, Aspergillus flavus) • Taxonomy ID (e.g. 9823 for S. scrofa, 5059 for A. flavus) • Database to look in (e.g. NCBI, UniProt, EBI-GOA, GOC, AgBase) • Not all functional annotation for a species will be in one database! • Few species have broad coverage of GO annotation… BUT do not worry • Searching for their homologs might help • You may rely on manual annotation from literature (refer to the manual annotation course by Fiona McCarthy)
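
Once an annotation file has been downloaded from one of these databases, the taxonomy ID is what lets you pull out the records for your species. The sketch below assumes the tab-delimited GO gene association layout (column 2 = object ID, column 5 = GO ID, column 7 = evidence code, column 13 = taxon). The two records are made up for illustration, including the object IDs and the placeholder reference.

```python
def parse_gaf_line(line):
    """Return (object_id, go_id, evidence, taxon_ids) for one annotation line,
    or None for comment lines (starting with '!') and blank lines."""
    if line.startswith("!") or not line.strip():
        return None
    cols = line.rstrip("\n").split("\t")
    # Column 13 holds one or more 'taxon:<id>' values separated by '|'.
    taxa = [t.replace("taxon:", "") for t in cols[12].split("|")]
    return cols[1], cols[4], cols[6], taxa


def annotations_for_taxon(lines, taxon_id):
    """Collect (object_id, go_id, evidence) for records matching taxon_id."""
    found = []
    for line in lines:
        parsed = parse_gaf_line(line)
        if parsed and taxon_id in parsed[3]:
            found.append(parsed[:3])
    return found


# Made-up records (hypothetical IDs, placeholder reference), 15 columns each.
records = [
    "!gaf-version: 2.0",
    "\t".join(["UniProtKB", "FAKE0001", "geneA", "", "GO:0005634", "PMID:0",
               "IDA", "", "C", "", "", "protein", "taxon:9823", "20070613", "AgBase"]),
    "\t".join(["UniProtKB", "FAKE0002", "geneB", "", "GO:0003677", "PMID:0",
               "IEA", "", "F", "", "", "protein", "taxon:5059", "20070613", "GOA"]),
]
print(annotations_for_taxon(records, "9823"))
```

Because no single database holds all annotation for a species, the same filter would be run over files from each source and the results merged.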

  28. [Flowchart: finding and submitting functional annotation] • Are the genes/proteins in GenBank? Check by Taxon ID (NM_, NP_; UniParc/IPI; UniProtKB) • Is the functional annotation known (Yes/No)? Is there GO annotation (Yes/No)? • No GO: annotate by structural/sequence similarity → ORTHOLOGS (ISS code), or make manual annotations from literature (IDA, IMP, IPI, IGI, IEP codes) • GOA makes GO annotations (IEA) using automated methods • Fill in the GO association file • Submit to AgBase (agricultural species) • GOA collects all GO annotations and submits them to GOC • GOC maintains annotation files (unfiltered GOA, filtered GOA); GOA maintains an annotation file; AgBase maintains an annotation file

  29. Demonstration
