
BICH 489-500 - CACAO


tamas

Presentation Transcript


  1. BICH 489-500 - CACAO Biocurator Training Session

  2. Plan for tonight • Pre-assessment survey • Syllabus • Review • Annotation synthesis • Practice!

  3. Mutualistic Relationship • We want you to get experience with: • CRITICALLY reading scientific papers • Bioinformatics resources • Collaborating with other biocurators • Synthesizing functional annotations • We want to get high quality functional annotations to contribute back to the GO Consortium and other biological databases

  4. Growing need for functional annotations • Advances in DNA sequencing mean lots of new genomes & metagenomes

  5. Growing need for high quality functional annotations • High quality annotations allow us to infer the function of genes • Which allows us to understand the capabilities of genomes and understand the patterns of gene expression

  6. Classic MODel Literature Database Curators (rate limiting) Datasets

  7. What does a functional annotation have to do with this course? • Process of attaching information from the scientific literature to proteins • CACAO will teach you to become a biocurator • you will be adding functional annotations to the biological database GONUTS (http://gowiki.tamu.edu)

  8. How is CACAO scored? • Points for a complete annotation: GO term (right level of specificity); reference (paper); evidence code; where in the paper the evidence is • Refinements are used to steal points for incorrect and/or incomplete annotations: identify a problem; suggest a correct alternative • Refinements can be entered by any team (including the original team)

  9. How can you get the annotations required by Rubric #2? • Synthesize complete & correct annotations. • Correctly refine (challenge & correct) someone else’s annotation. • If your annotation gets challenged, offer the best correction.

  10. Functional annotation with Gene Ontology • Controlled vocabulary with: • Term identifiers (e.g. GO:0000075) • Name (e.g. cell cycle checkpoint) • Definitions (e.g. "A point in the eukaryotic cell cycle where progress through the cycle can be halted until conditions are suitable for the cell to proceed to the next stage." [GOC:mah, ISBN:0815316194]) • Relationships (e.g. is_a GO:0000074 ! regulation of progression through cell cycle) • Terms arranged in a Directed Acyclic Graph (DAG)
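
As a rough illustration of the term structure on this slide, the sketch below stores the two terms it mentions and follows `is_a` links upward through the graph. The `GOTerm` class, the `TERMS` dictionary and the `ancestors` helper are hypothetical names made up for this example. They are not part of GONUTS or of any GO software.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class GOTerm:
    """One node of the ontology: identifier, name, definition, parents."""
    go_id: str
    name: str
    definition: str = ""
    is_a: List[str] = field(default_factory=list)  # IDs of parent terms


# Only the two terms shown on the slide; the real ontology is far larger.
TERMS: Dict[str, GOTerm] = {
    "GO:0000074": GOTerm(
        "GO:0000074", "regulation of progression through cell cycle"
    ),
    "GO:0000075": GOTerm(
        "GO:0000075",
        "cell cycle checkpoint",
        definition=(
            "A point in the eukaryotic cell cycle where progress through "
            "the cycle can be halted until conditions are suitable for "
            "the cell to proceed to the next stage."
        ),
        is_a=["GO:0000074"],
    ),
}


def ancestors(go_id: str, terms: Dict[str, GOTerm]) -> Set[str]:
    """Collect every term ID reachable by following is_a links upward."""
    seen: Set[str] = set()
    stack = list(terms[go_id].is_a)
    while stack:
        parent = stack.pop()
        if parent in seen:
            continue  # a DAG can reach the same ancestor by several paths
        seen.add(parent)
        if parent in terms:
            stack.extend(terms[parent].is_a)
    return seen


print(ancestors("GO:0000075", TERMS))
```

Because a term in a DAG may have more than one parent, the helper tracks the IDs it has already visited instead of assuming a simple tree.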

  11. Why use Ontologies? • Standardization • facilitate comparison across systems • facilitate computer based reasoning systems • Good for data mining! • leading functional annotation ontology = Gene Ontology (GO)

  12. What is GO? Who is the GO Consortium (GOC)? • GO = ~30,000 terms for gene product attributes • Molecular Function (enzyme activity) • Biological Process (pathways) • Cellular Component (parts of the cell) • GO Consortium - set of biological databases that are involved in developing GO and contributing GO annotations

  13. Cellular Component • where a gene product acts

  14. Molecular Function • activities or “jobs” of a gene product • Example: glucose-6-phosphate isomerase activity (figure from GO Consortium presentations)

  15. Biological Process • a commonly recognized series of events • Example: cell division (figure from Nature Reviews Microbiology 6, 28-40 (January 2008))

  16. Which subontology (MF, BP or CC) would the following terms fit in? • GO:0001070 RNA binding transcription factor activity • GO:0003677 DNA binding • GO:0009254 Peptidoglycan turnover • GO:0003918 DNA topoisomerase (ATP-hydrolyzing) activity • GO:0006835 dicarboxylic acid transport • GO:0009360 DNA polymerase III complex • GO:0005694 Chromosome • GO:0008270 Zinc ion binding • GO:0000901 translation repressor activity, non-nucleic acid binding

  17. Part 1: Using GONUTS

  18. Where can we find GO terms? GONUTS http://gowiki.tamu.edu

  19. Search for GO terms on GONUTS http://gowiki.tamu.edu

  20. CHICK - AgBase (Gallus gallus) • dictyBase - dictyBase (Dictyostelium discoideum - slime mold) • FB - FlyBase (Drosophila melanogaster) • HUMAN - Reactome, BHF-UCL • MGI - Mouse Genome Informatics (Mus musculus - house mouse) • SGD - Saccharomyces Genome Database (Saccharomyces cerevisiae - yeast) • TAIR - The Arabidopsis Information Resource (Arabidopsis thaliana) • WB - WormBase (Caenorhabditis elegans) • ZFIN - Zebrafish model organism database (Danio rerio)

  21. What do you actually need once you have found the correct term? GO:0004713

  22. Practice http://gowiki.tamu.edu 1. What is the GO term for GO:0004713? 2. What is the GO identifier for mitosis? 3. How many results (ballpark) do you get when you search for cell division using the Go, Search or G buttons? 4. How many child terms are there for plasma membrane? How many grandchildren? 5. What term is the parent of GO:0006825?

  23. What does a GO annotation consist of?

  24. 4 REQUIRED parts of EVERY GO annotation http://gowiki.tamu.edu/wiki/index.php/SGD:ADA2 GO ** I will cover this again!!

  25. 4 Required Parts of a GO annotation (cont) Evidence code

  26. 4 Required Parts of a GO annotation (cont) Reference Notes (about evidence)

  27. 2 other parts that may be required… Qualifier With/from

  28. Where are we adding GO annotations? GONUTS http://gowiki.tamu.edu

  29. Where do you add an annotation? Add a row in the table.

  30. What you must fill in (for every annotation) • GO term: GO:0004713 • Reference: PMID:1111 • Evidence code: IDA: Inferred from direct assay • Notes (about evidence): Figure 2a
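
The four required parts can be pictured as one record with a few basic checks, sketched below. The `Annotation` class and the `problems` helper are hypothetical names for this example and are not how GONUTS stores or validates its table rows. The set of accepted evidence codes is also an assumption: it lists the GO experimental codes, and the competition guidelines decide which ones actually count.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: GO experimental evidence codes; check the competition
# guidelines for the codes that are actually accepted.
EXPERIMENTAL_CODES = {"IDA", "IMP", "IGI", "IPI", "IEP"}


@dataclass
class Annotation:
    go_id: str          # e.g. "GO:0004713"
    reference: str      # e.g. "PMID:1111"
    evidence_code: str  # e.g. "IDA"
    notes: str          # where in the paper the evidence is, e.g. "Figure 2a"


def problems(a: Annotation) -> List[str]:
    """Return a list of human-readable problems; empty means all four parts look filled in."""
    issues = []
    if not re.fullmatch(r"GO:\d{7}", a.go_id):
        issues.append("GO ID should look like GO:0004713")
    if not re.fullmatch(r"PMID:\d+", a.reference):
        issues.append("reference should look like PMID:1111")
    if a.evidence_code not in EXPERIMENTAL_CODES:
        issues.append("evidence code is not in the accepted set")
    if not a.notes.strip():
        issues.append("notes should say where the evidence is in the paper")
    return issues


example = Annotation("GO:0004713", "PMID:1111", "IDA", "Figure 2a")
print(problems(example))
```

These checks only catch missing or badly formatted fields. Whether the GO term is at the right level of specificity, or the figure really supports it, still takes a careful reading of the paper.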

  31. What you might also have to fill in Not sure? Check the competition guidelines. Ask a coach (Jim, Debby, Adrienne or usually me)!

  32. What do we know so far? Questions? 1. You will be making functional (GO) annotations using GO terms. 2. You can search for GO terms on GONUTS. 3. You will be adding your GO annotations to GONUTS. 4. There are 4 required parts & 2 parts that may be required in a GO annotation. 5. You have to base your annotation on an experiment published in a scientific paper.

  33. Part 2: Finding proteins to annotate

  34. What can you annotate? • Proteins. • Any protein with a record in UniProt (Universal Protein Resource - http://uniprot.org) • How can you find proteins to annotate? • Think of ways to identify a protein or paper to annotate

  35. Think • Consult your neighbor(s)

  36. Choosing a protein to annotate 1. randomly 2. topics of interest (e.g. efflux pump proteins, biofilms, marine biology) 3. papers you have come across while doing other stuff 4. methods you know or want to learn 5. phenotypes and mutants you are interested in 6. by author 7. by pathway or regulon 8. suggested by another: high ratio of IEA:manual annotations in GONUTS; mentioned in another class 9. current paper mentions another gene product 10. review papers (e.g. Annual Reviews are excellent sources) 11. UniProt, GONUTS, WikiPathways, PubMed searches 12. protein annotated by other teams 13. ask a coach

  37. Finding a scientific paper on a certain protein • Has to be a scientific paper with experimental data in it. • Anything else is a valid reason to challenge! • PubMed, PubMed Central, Google Scholar… • No review articles • No books, textbooks, Wikipedia articles, class notes… • You will need the PMID

  38. Practice - searching PubMed http://pubmed.org • How many papers do you get when you search for “coli”? • How many of those papers are reviews? • What is the title of the oldest paper when you search for “coli AND RNA polymerase”? • How many results are there when you search for “GTPase activity and Gene Ontology”? • What is the PMID of the paper when you search for “Hu JC AND coli AND lysR AND 2010”?
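
The same kinds of searches can also be scripted. The sketch below only builds a query URL for the NCBI E-utilities `esearch` endpoint and does not send any request; the slide itself uses the PubMed web page, so this is an optional aside. The `esearch_url` helper is a hypothetical name for this example.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, retmax: int = 20) -> str:
    """Build a PubMed search URL for the given query string."""
    params = {
        "db": "pubmed",     # search the PubMed database
        "term": term,       # same query syntax as the web search box
        "retmode": "json",  # ask for a JSON response
        "retmax": retmax,   # maximum number of PMIDs to return
    }
    return ESEARCH + "?" + urlencode(params)


print(esearch_url("coli AND RNA polymerase"))
```

Opening the printed URL in a browser returns the hit count and a list of PMIDs for the query.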

  39. Why do we annotate on GONUTS? • UniProt (Universal Protein Resource) will not let us annotate protein records on their site. • They are a professionally-curated & closed database. • GONUTS will. • GONUTS pulls the info from the UniProt record when it makes a page for you to edit.
