Functional Annotation Background + Strategy The Group
Outline • What is Functional Annotation • The Importance of Functional Annotation • The Biology of H. haemolyticus • Background for Functional Annotation • Pros/Cons of Available Approaches • Planned Approach • Breadth • Depth
Functional Annotation The ‘what?’
Genome Assembly Assemble the Pieces Right
Gene Prediction “When on board HMS Beagle, as naturalist, I was much struck with certain facts in the distribution of the inhabitants of South America, and in the geological relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of species - that mystery of mysteries, as it has been called by one of our greatest philosophers.” Identify the words
Functional Annotation “When on board HMS Beagle, as naturalist, I was much struck with certain facts in the distribution of the inhabitants of South America, and in the geological relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of species - that mystery of mysteries, as it has been called by one of our greatest philosophers.” nat·u·ral·ist [nach-er-uh-list, nach-ruh-] noun 1. a person who studies or is an expert in natural history, especially a zoologist or botanist. 2. an adherent of naturalism in literature or art. Origin: 1580–90; natural + -ist Identify the function (i.e., meaning) of each word DATABASES PROFILES Origin of Species, The noun (On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life) a treatise (1859) by Charles Darwin setting forth his theory of evolution.
Not just Newtonian The gravity of the annotation process
Alberts B, et al. (2002) Molecular Biology of the Cell. New York: Garland Science. function “Ultimately, one wishes to determine how genes—and the proteins they encode—function in the intact organism.”
Function? What is it? • To a cell biologist, function might refer to the network of interactions in which the protein participates or to its localization to a certain cellular compartment. • To a biochemist, function refers to the metabolic process in which a protein is involved or to the reaction catalyzed by an enzyme.
Functional Annotation Functional annotation consists of attaching biological information to genomic elements. • Biochemical function • Biological function • Regulation and interactions involved • Expression
Whatever happened to wet-lab? “Experimentally annotating one complete bacterial genome varies from organism to organism. Roughly speaking, it could take as much as $25,000 and a period of 6-12 months for completing the process” - Alejandro Caro
The Naked Truth [Figure: number of genomes in KEGG. Source: KEGG Genome, release update of Jan 2012]
How Do Genes Perform Their Function? Operon • Operon: Several genes with related functions that are regulated together, because one piece of mRNA codes for several related proteins. • Polycistronic mRNA, i.e., mRNA coding for more than one polypeptide, is found mainly in prokaryotes
Coding and Non-coding RNAs • Protein coding: Enzymes, Structural, Regulatory, Signal Transduction, Receptors, Toxins, Virulence Factors, Membrane/Transmembrane • Non-coding: Riboswitches, CRISPR, sRNAs • Pathway Prediction
Domain/Motif • Domain: A discrete structural unit that is assumed to fold independently of the rest of the protein and to have its own function. ~20-100 aa • Motif: Short, conserved regions that are frequently the most conserved parts of domains. Motifs are critical for the domain to function.
Understanding the Target Haemophilus haemolyticus - The Biography
Haemophilus haemolyticus • Gram-negative • Facultative anaerobe • Known to colonize the human respiratory tract. • Of the 8 Haemophilus species found to colonize the respiratory tract, H. influenzae and H. haemolyticus are the most prevalent. • H. haemolyticus is an emerging pathogen • 5 cases of invasive disease were reported between 2009 and 2010.
Strains of H. haemolyticus • fucK: encoding fuculose kinase. fucK deletion has been observed in some H. influenzae (Hi) isolates • hpd: encoding lipoprotein protein D
Phylogeny Nørskov-Lauritsen, N., et al. (2005). Multilocus sequence phylogenetic study of the genus Haemophilus with description of Haemophilus pittmaniae sp. nov. International Journal of Systematic and Evolutionary Microbiology, 55, 449–456
Ontology • An ontology is a “formal, explicit specification of a shared conceptualization” • Two major formal ontology schemes: • EC – Enzyme Commission Number • GO – Gene Ontology
Enzyme Commission (EC) • A large-scale, comprehensive attempt to organize and classify enzymes according to their function • For an enzyme to be included in the list, direct experimental evidence must be provided for its claimed activity • Organizes the list of enzymes in four levels of hierarchy, starting with the 6 top-level classes: • Oxidoreductases • Transferases • Hydrolases • Lyases • Isomerases • Ligases
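The four-level hierarchy can be illustrated by splitting an EC number into its parts. This is only a minimal sketch: the function name `parse_ec` and the returned dictionary keys are assumptions made for illustration, and the class table contains just the six classes named on the slide.

```python
# Top-level EC classes as listed on the slide (first field of an EC number).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec(ec_number):
    """Split an EC number such as '2.7.1.51' into its four hierarchy levels.

    A '-' in any position marks a level that is left unspecified and is
    returned as None. (Hypothetical helper, not part of any EC tool.)
    """
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated fields: " + ec_number)
    levels = [None if p == "-" else int(p) for p in parts]
    return {
        "class": levels[0],
        "class_name": EC_CLASSES.get(levels[0], "unknown"),
        "subclass": levels[1],
        "sub_subclass": levels[2],
        "serial": levels[3],
    }


full = parse_ec("2.7.1.51")       # fully specified enzyme entry
partial = parse_ec("EC 3.1.-.-")  # only class and subclass known
print(full["class_name"], partial["class_name"])
```

A partially specified number is what a predictor can still report when it is confident about the enzyme class but not the exact reaction.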
Chronology: Enzyme Commission (EC) • Cons of EC: • Hierarchy only provides parent-to-child relationships • Specific to enzymes only (does not cover all proteins)
Chronology: Gene Ontology (GO) Or in other words "give this protein a name and stick to it!!"
What is the GO? • Molecular Function • Biological Process • Cellular Component • Relations between the terms • ‘is_a’ • ‘part_of’, ‘has_part’ • ‘regulates’
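Because GO terms are linked by typed relations, the ontology behaves like a directed graph that can be walked from a specific term up to more general ones. The sketch below assumes a tiny hand-made graph; the term names and the `ancestors` helper are illustrative only and are not real GO identifiers or an existing API.

```python
# Toy ontology: child term -> list of (relation, parent term).
# Term names are illustrative placeholders, not real GO identifiers.
EDGES = {
    "inner membrane": [("is_a", "membrane"), ("part_of", "cell envelope")],
    "membrane": [("is_a", "cellular_component")],
    "cell envelope": [("part_of", "cell")],
    "cell": [("is_a", "cellular_component")],
}


def ancestors(term, edges, relations=("is_a", "part_of")):
    """Collect every term reachable from `term` by following the given relations."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for relation, parent in edges.get(current, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Following both relations reaches every more general term.
all_parents = ancestors("inner membrane", EDGES)
# Restricting to 'is_a' gives only the strict type hierarchy.
type_parents = ancestors("inner membrane", EDGES, relations=("is_a",))
print(sorted(all_parents))
print(sorted(type_parents))
```

Choosing which relations to follow matters: a gene annotated to a term is usually also treated as annotated to its `is_a` and `part_of` ancestors, whereas `regulates` edges are not propagated in the same way.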
Structure of GO du Plessis L, Skunca N, Dessimoz C (2011). The what, where, how and why of gene ontology–a primer for bioinformaticians. Brief Bioinform. doi: 10.1093/bib/bbr002
Where Do Annotations Come From? • Inferred from experiment • Most reliable • Basis for computational methods • Inferred from computational method • Sequence similarity, structural similarity, etc. • Inferred from author statement • Curator statement and obsolete evidence codes
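In practice, the evidence behind an annotation is recorded as a short GO evidence code, and annotations can be filtered by how much that evidence is trusted. The sketch below groups a subset of commonly used codes along the lines of the slide. The grouping is a simplification (GO treats the automatically assigned IEA code as its own category), and the annotation records and helper names are hypothetical.

```python
# A subset of GO evidence codes, grouped roughly as on the slide.
# Simplification: IEA (electronic annotation) is placed with the
# computational codes here, although GO lists it as its own category.
EVIDENCE_GROUPS = {
    "experimental": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational": {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA", "IEA"},
    "author_statement": {"TAS", "NAS"},
    "curator_statement": {"IC", "ND"},
}


def evidence_group(code):
    """Return the group name for an evidence code, or 'unknown'."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return "unknown"


def filter_annotations(annotations, allowed_groups):
    """Keep (gene, term, code) records whose code falls in an allowed group."""
    return [a for a in annotations if evidence_group(a[2]) in allowed_groups]


# Hypothetical records: (gene, term, evidence code).
annotations = [
    ("gene1", "term A", "IDA"),
    ("gene2", "term B", "IEA"),
    ("gene3", "term C", "TAS"),
]
trusted = filter_annotations(annotations, {"experimental"})
print(trusted)
```

Restricting a reference set to experimentally supported annotations trades coverage for reliability, which is the same trade-off the slide points to.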
Why use the GO? • The ‘GO Consortium’ consists of a number of large databases working together to define standardized ontologies and provide annotations to the GO. • Search for interacting genes • Reason across the relations • Analyze the results of high-throughput experiments • Infer the function of unannotated genes and infer protein-protein interactions.
Choosing The Right Function Prediction Tool Caution! Pros and Cons of Conventional Approaches
“Perutz et al. showed in 1960 that myoglobin and hemoglobin, the first two protein structures to be solved at atomic resolution using X-ray crystallography, have similar structures even though their sequences differ.”
Pros and Cons: There are no free lunches! • Homology is useful, but it is not the same as “same” function • It simply implies common ancestry
Pros and Cons: There are no free lunches! • The quality of a prediction is only as good as the quality of annotation in the database • A eukaryotic function predictor cannot be used for prokaryotes, and vice versa
A Snapshot of the Iceberg Named Functional Annotation Breadth and Depth of the analysis
Spectrum of Methods Selected Breadth
Criteria for selecting methods • Currently being maintained • Applicable to prokaryotic sequences • Can be installed locally (supporting batch jobs if GUI-based) OR can be included in a pipeline, i.e., has a command-line interface
Categories of Approaches • Sequence similarity-based • Phylogenomics-based • Domain/pattern/profile-based • Domain-based • Pattern-based • Profile-based • Sequence clustering-based • Machine learning-based • Network-based
Level 1 The building blocks!
Pan-Genome Analysis • The pan-genome is the full complement of genes in a species. • It includes the core genome (genes present in all strains), the dispensable genome (genes present in two or more strains, but not all) and unique genes (genes specific to a single strain). • In this case, we will be using the pan-genome of Haemophilus influenzae. • This database will be used as the reference database in BLAST. • This method gives high-confidence annotations since the strains selected are very closely related to the organism in question.
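The three pan-genome categories can be derived from a simple presence/absence table. The sketch below is only an illustration under stated assumptions: the gene and strain names are made up, and `classify_pangenome` is a hypothetical helper rather than part of any pan-genome tool.

```python
def classify_pangenome(presence, strains):
    """Sort genes into core, dispensable and unique sets.

    presence: dict mapping gene name -> set of strains carrying the gene
    strains:  set of all strain names considered
    """
    total = len(strains)
    result = {"core": [], "dispensable": [], "unique": []}
    for gene, found_in in sorted(presence.items()):
        count = len(found_in & strains)
        if count == 0:
            continue  # gene not seen in any of the considered strains
        if count == total:
            result["core"].append(gene)         # present in all strains
        elif count == 1:
            result["unique"].append(gene)       # specific to one strain
        else:
            result["dispensable"].append(gene)  # two or more, but not all
    return result


# Made-up example data: three strains and three genes.
strains = {"strainA", "strainB", "strainC"}
presence = {
    "gene1": {"strainA", "strainB", "strainC"},
    "gene2": {"strainA", "strainB"},
    "gene3": {"strainC"},
}
groups = classify_pangenome(presence, strains)
print(groups)
```

In a real analysis, the presence/absence table would come from clustering orthologous genes across the sequenced strains, which is the harder part of the problem.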
BLAST: How does it work? • Divide the query sequence into short chunks called words • Look for exact word matches in the database sequences • In case of a hit, try extending the alignment in both directions
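The three steps above can be sketched as a toy seed-and-extend search. This is a deliberately simplified illustration, not BLAST itself: it uses exact word matches only, ungapped extension, a plain match/mismatch score and a simple drop-off cutoff, and all function names and parameter values are assumptions chosen for the example. Real BLAST additionally uses scoring matrices, gapped extension and statistical significance estimates.

```python
def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where a query word matches exactly."""
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, word_size=3,
                match=1, mismatch=-1, x_drop=2):
    """Extend a seed without gaps in both directions.

    Extension in a direction stops once the running score falls x_drop
    below the best score seen so far. Returns (best_score, query_start,
    query_end), with query_end exclusive.
    """
    best = score = word_size * match
    # Extend to the right of the seed.
    best_right = 0
    k = 0
    while i + word_size + k < len(query) and j + word_size + k < len(subject):
        same = query[i + word_size + k] == subject[j + word_size + k]
        score += match if same else mismatch
        k += 1
        if score > best:
            best, best_right = score, k
        elif best - score >= x_drop:
            break
    # Extend to the left, starting again from the best score so far.
    score = best
    best_left = 0
    k = 1
    while i - k >= 0 and j - k >= 0:
        score += match if query[i - k] == subject[j - k] else mismatch
        if score > best:
            best, best_left = score, k
        elif best - score >= x_drop:
            break
        k += 1
    return best, i - best_left, i + word_size + best_right


query = "GATTACA"
subject = "CCGATTACACC"
seeds = find_seeds(query, subject)
hits = [extend_seed(query, subject, i, j) for i, j in seeds]
print(seeds)
print(max(hits))
```

Indexing the words of the subject first is what makes the search fast: only positions that share a word with the query are ever examined by the more expensive extension step.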