BICH 489-500 - CACAO Biocurator Training Session
Plan for tonight • Pre-assessment survey • Syllabus • Review • Annotation synthesis • Practice!
Mutualistic Relationship • We want you to get experience with: • CRITICALLY reading scientific papers • Bioinformatics resources • Collaborating with other biocurators • Synthesizing functional annotations • We want to get high quality functional annotations to contribute back to the GO Consortium and other biological databases
Growing need for functional annotations • Advances in DNA sequencing mean lots of new genomes & metagenomes
Growing need for high-quality functional annotations • High-quality annotations let us infer the functions of genes • This lets us understand the capabilities of genomes and interpret patterns of gene expression
Classic MOD (model organism database) model • [Figure: literature, curators (rate limiting), database, datasets]
What does functional annotation have to do with this course? • Functional annotation is the process of attaching information from the scientific literature to proteins • CACAO will teach you to become a biocurator • You will be adding functional annotations to the biological database GONUTS (http://gowiki.tamu.edu)
How is CACAO scored? • Points for a complete annotation: • GO term (at the right level of specificity) • Reference (paper) • Evidence code • Where in the paper the evidence is • Refinements are used to steal points from incorrect and/or incomplete annotations: • Identify a problem • Suggest a correct alternative • Refinements can be entered by any team (including the original team)
How can you get the annotations required by Rubric #2? • Synthesize complete & correct annotations. • Correctly refine (challenge & correct) someone else’s annotation. • If your annotation gets challenged, offer the best correction.
Functional annotation with Gene Ontology • A controlled vocabulary with: • Term identifiers (e.g. GO:0000075) • Names (e.g. cell cycle checkpoint) • Definitions (e.g. "A point in the eukaryotic cell cycle where progress through the cycle can be halted until conditions are suitable for the cell to proceed to the next stage." [GOC:mah, ISBN:0815316194]) • Relationships (e.g. is_a GO:0000074 ! regulation of progression through cell cycle) • Terms are arranged in a Directed Acyclic Graph (DAG)
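The identifier, name, definition and is_a line on this slide are the fields of a term stanza in GO's OBO flat-file format. The sketch below parses the slide's example term from such a stanza. It is a simplified illustration, not an official GO parser: `GoTerm`, `parse_term_stanza` and `ASPECT` are hypothetical names, only a handful of tags are handled, and the `namespace: biological_process` line is an assumption added to show how the MF/BP/CC split appears in the file.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Map OBO namespaces to the MF / BP / CC abbreviations used in class.
ASPECT = {
    "molecular_function": "MF",
    "biological_process": "BP",
    "cellular_component": "CC",
}


@dataclass
class GoTerm:
    id: str
    name: str = ""
    namespace: str = ""
    definition: str = ""
    is_a: list[str] = field(default_factory=list)  # parent term IDs


def parse_term_stanza(text: str) -> GoTerm:
    """Parse one [Term] stanza (simplified: no escaped quotes, few tags)."""
    fields: dict[str, list[str]] = {}
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(": ")
        fields.setdefault(tag, []).append(value)

    definition = fields.get("def", [""])[0]
    if definition.startswith('"'):
        # def: "text" [dbxrefs]  ->  keep only the quoted text
        definition = definition[1:definition.index('"', 1)]

    return GoTerm(
        id=fields["id"][0],
        name=fields.get("name", [""])[0],
        namespace=fields.get("namespace", [""])[0],
        definition=definition,
        # is_a: GO:0000074 ! comment  ->  keep only the identifier
        is_a=[v.split(" ! ")[0].strip() for v in fields.get("is_a", [])],
    )


STANZA = """
[Term]
id: GO:0000075
name: cell cycle checkpoint
namespace: biological_process
def: "A point in the eukaryotic cell cycle where progress through the cycle can be halted until conditions are suitable for the cell to proceed to the next stage." [GOC:mah, ISBN:0815316194]
is_a: GO:0000074 ! regulation of progression through cell cycle
"""

term = parse_term_stanza(STANZA)
print(term.id, term.name, ASPECT[term.namespace], term.is_a)
# prints: GO:0000075 cell cycle checkpoint BP ['GO:0000074']
```

The text after `!` on an is_a line is only a human-readable comment; the relationship itself is carried by the identifier, which is why the sketch keeps just the ID.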
Why use ontologies? • Standardization • Facilitate comparison across systems • Facilitate computer-based reasoning systems • Good for data mining! • The leading functional annotation ontology is the Gene Ontology (GO)
What is GO? Who is the GO Consortium (GOC)? • GO = ~30,000 terms for gene product attributes • Molecular Function (enzyme activity) • Biological Process (pathways) • Cellular Component (parts of the cell) • GO Consortium - set of biological databases that are involved in developing GO and contributing GO annotations
Cellular Component • where a gene product acts
Molecular Function • activities or “jobs” of a gene product • Example: glucose-6-phosphate isomerase activity (figure from GO Consortium presentations)
Biological Process • a commonly recognized series of events • Example: cell division (figure from Nature Reviews Microbiology 6, 28-40, January 2008)
Which subontology (MF, BP or CC) would the following terms fit in? • GO:0001070 RNA binding transcription factor activity • GO:0003677 DNA binding • GO:0009254 peptidoglycan turnover • GO:0003918 DNA topoisomerase (ATP-hydrolyzing) activity • GO:0006835 dicarboxylic acid transport • GO:0009360 DNA polymerase III complex • GO:0005694 chromosome • GO:0008270 zinc ion binding • GO:0000901 translation repressor activity, non-nucleic acid binding
Where can we find GO terms? GONUTS http://gowiki.tamu.edu
Search for GO terms on GONUTS http://gowiki.tamu.edu
CHICK - AgBase (Gallus gallus) • dictyBase - dictyBase (Dictyostelium discoideum - slime mold) • FB - FlyBase (Drosophila melanogaster) • HUMAN - Reactome, BHF-UCL • MGI - Mouse Genome Informatics (Mus musculus - house mouse) • SGD - Saccharomyces Genome Database (Saccharomyces cerevisiae - yeast) • TAIR - The Arabidopsis Information Resource (Arabidopsis thaliana) • WB - WormBase (Caenorhabditis elegans) • ZFIN - Zebrafish model organism database (Danio rerio)
What do you actually need once you have found the correct term? GO:0004713
Practice http://gowiki.tamu.edu 1. What is the GO term for GO:0004713? 2. What is the GO identifier for mitosis? 3. How many results (ballpark) do you get when you search for cell division using the Go, Search or G buttons? 4. How many child terms are there for plasma membrane? How many grandchildren? 5. What term is the parent of GO:0006825?
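Questions 4 and 5 ask you to walk the GO DAG by hand on GONUTS. The sketch below shows the same idea in code: store each term's is_a parents, invert them to find children, and step down one level at a time for grandchildren. The term IDs are hypothetical placeholders, not real GO terms, and only is_a edges are modeled; GO also uses other relationship types (such as part_of) that a browser may display.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical toy ontology: child -> list of is_a parents.
# These IDs are placeholders, NOT real GO terms.
IS_A = {
    "T:B": ["T:A"],
    "T:C": ["T:A"],
    "T:D": ["T:B"],
    "T:E": ["T:B", "T:C"],  # a DAG allows a term to have more than one parent
    "T:F": ["T:E"],
}


def parents(term: str) -> set[str]:
    """Direct is_a parents of a term."""
    return set(IS_A.get(term, []))


def children_index(is_a: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert child -> parents edges into parent -> children."""
    kids: dict[str, set[str]] = defaultdict(set)
    for child, ps in is_a.items():
        for p in ps:
            kids[p].add(child)
    return kids


def descendants_at_depth(term: str, depth: int) -> set[str]:
    """Terms reachable by a path of exactly `depth` is_a steps downward
    (depth 1 = children, depth 2 = grandchildren)."""
    kids = children_index(IS_A)
    level = {term}
    for _ in range(depth):
        level = set().union(*(kids.get(t, set()) for t in level))
    return level


print(sorted(descendants_at_depth("T:A", 1)))  # ['T:B', 'T:C']
print(sorted(descendants_at_depth("T:A", 2)))  # ['T:D', 'T:E']
print(sorted(parents("T:E")))                  # ['T:B', 'T:C']
```

Because T:E has two parents, it is counted once among T:A's grandchildren even though two paths lead to it; returning a set rather than a list is what removes that duplicate.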
4 REQUIRED parts of EVERY GO annotation • Example page: http://gowiki.tamu.edu/wiki/index.php/SGD:ADA2 • GO term • ** I will cover this again!!
4 Required Parts of a GO annotation (cont) Evidence code
4 Required Parts of a GO annotation (cont) Reference Notes (about evidence)
2 other parts that may be required… • Qualifier • With/from
Where are we adding GO annotations? GONUTS http://gowiki.tamu.edu
What you must fill in (for every annotation) • GO term: GO:0004713 • Reference: PMID:1111 • Evidence code: IDA (Inferred from Direct Assay) • Notes: Figure 2a
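One way to picture the four required parts (plus the two that are sometimes required) is as a record with a basic completeness check. The sketch below is a hypothetical illustration, not how GONUTS stores or validates annotations: `GoAnnotation` and `problems` are made-up names, the evidence-code table lists only a few experimental codes, and the checks cover format and presence only, not whether the paper actually supports the term. Whether the qualifier or with/from field is required still comes from the competition guidelines.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

GO_ID = re.compile(r"GO:\d{7}")  # GO identifiers have seven digits
PMID = re.compile(r"PMID:\d+")

# A few experimental GO evidence codes, for illustration only (not the full list).
EVIDENCE_CODES = {
    "IDA": "Inferred from Direct Assay",
    "IMP": "Inferred from Mutant Phenotype",
    "IPI": "Inferred from Physical Interaction",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
}


@dataclass
class GoAnnotation:
    go_id: str        # 1. GO term
    reference: str    # 2. reference (PMID of the paper)
    evidence: str     # 3. evidence code
    notes: str        # 4. where in the paper the evidence is
    qualifier: str | None = None  # sometimes required
    with_from: str | None = None  # sometimes required

    def problems(self) -> list[str]:
        """List format problems; an empty list means the 4 required parts look complete."""
        issues = []
        if not GO_ID.fullmatch(self.go_id):
            issues.append(f"bad GO ID: {self.go_id!r}")
        if not PMID.fullmatch(self.reference):
            issues.append(f"reference should look like PMID:<digits>: {self.reference!r}")
        if self.evidence not in EVIDENCE_CODES:
            issues.append(f"unrecognized evidence code: {self.evidence!r}")
        if not self.notes.strip():
            issues.append("notes must say where the evidence is (e.g. a figure)")
        return issues


example = GoAnnotation("GO:0004713", "PMID:1111", "IDA", "Figure 2a")
print(example.problems())  # []
```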
What you might also have to fill in Not sure? Check the competition guidelines. Ask a coach (Jim, Debby, Adrienne or usually me)!
What do we know so far? Questions? 1. You will be making functional (GO) annotations using GO terms. 2. You can search for GO terms on GONUTS. 3. You will be adding your GO annotations to GONUTS. 4. There are 4 required parts & 2 parts that may be required in a GO annotation. 5. You have to base your annotation on an experiment published in a scientific paper.
What can you annotate? • Proteins. • Any protein with a record in UniProt (Universal Protein Resource - http://uniprot.org) • How can you find proteins to annotate? • Think of ways to identify a protein or paper to annotate
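Since the protein must have a UniProt record, it helps to recognize a UniProt accession when you see one in a paper or database. The sketch below checks the format against the accession pattern published in UniProt's documentation. The function name `looks_like_uniprot_accession` is hypothetical, and a well-formed accession is not proof that a record exists; you still have to look it up on http://uniprot.org.

```python
import re

# Accession format published in UniProt's documentation
# (6-character accessions, plus the newer 10-character form).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_uniprot_accession(s: str) -> bool:
    """True if the whole string matches the UniProt accession format."""
    return UNIPROT_ACCESSION.fullmatch(s.strip()) is not None


for candidate in ["P12345", "A0A023GPI8", "P1234", "ADA2"]:
    print(candidate, looks_like_uniprot_accession(candidate))
```

Note that a gene name such as ADA2 is not an accession; GONUTS page names like SGD:ADA2 identify the gene product differently from the UniProt accession.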
Think • Consult your neighbor(s)
Choosing a protein to annotate 1. randomly 2. topics of interest (e.g. efflux pump proteins, biofilms, marine biology) 3. papers you have come across while doing other things 4. methods you know or want to learn 5. phenotypes and mutants you are interested in 6. by author 7. by pathway or regulon 8. suggested by another source (proteins with a high ratio of IEA to manual annotations in GONUTS, or proteins mentioned in another class) 9. another gene product mentioned in your current paper 10. review papers (e.g. Annual Reviews are excellent sources) 11. UniProt, GONUTS, WikiPathways, PubMed searches 12. proteins annotated by other teams 13. ask a coach
Finding a scientific paper on a certain protein • It has to be a scientific paper with experimental data in it • Anything else is a valid reason to challenge! • PubMed, PubMed Central, Google Scholar… • No review articles • No books, textbooks, Wikipedia articles, class notes… • You will need the PMID (PubMed identifier)
Practice - searching PubMed http://pubmed.org • How many papers do you get when you search for “coli”? • How many of those papers are reviews? • What is the title of the oldest paper when you search for “coli AND RNA polymerase”? • How many results are there when you search for “GTPase activity and Gene Ontology”? • What is the PMID of the paper when you search for “Hu JC AND coli AND lysR AND 2010”?
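The same Boolean queries you type into PubMed can also be sent to NCBI's E-utilities ESearch service, which returns matching PMIDs. The sketch below only builds the request URL (it makes no network call); `pubmed_search_url` is a hypothetical helper name, while `db`, `term`, `retmode` and `retmax` are standard ESearch parameters. Treat it as a convenience for checking your answers, not a replacement for reading the papers.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for a PubMed query.

    `term` uses the same Boolean syntax (AND, OR, NOT) as the PubMed
    search box; urlencode turns spaces into '+'.
    """
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


print(pubmed_search_url("Hu JC AND coli AND lysR AND 2010"))
```

Opening such a URL in a browser returns JSON that reports a result count and a list of PMIDs. Adding a publication-type tag, for example `coli AND review[pt]`, is one way to count only the review articles.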
Why do we annotate on GONUTS? • UniProt (Universal Protein Resource) will not let us annotate protein records on their site. • They are a professionally-curated & closed database. • GONUTS will. • GONUTS pulls the info from the UniProt record when it makes a page for you to edit.