
Introduction to Genomics and Bioinformatics

Maureen J. Donlin, Departments of Molecular Microbiology & Immunology and Biochemistry & Molecular Biology, donlinmj@slu.edu, 6/3/2014.


Presentation Transcript


  1. Introduction to Genomics and Bioinformatics • Maureen J. Donlin • Departments of Molecular Microbiology & Immunology and Biochemistry & Molecular Biology • donlinmj@slu.edu • 6/3/2014

  2. Goals for the course • Find and use publicly available datasets and tools for genomics and bioinformatics • Use these tools and datasets in your research • Interpret the output from various analysis and prediction programs • Learn to write a results section for a manuscript

  3. Exercise format • Each exercise will consist of 2-4 sections which represent a biological question to be answered with bioinformatics tools/resources • You’ll provide the answer in the same format as you would write for the results section of a paper • Why did you do this experiment or analysis? • What did you actually do? • What did you observe? • What does it mean?

  4. Grading • Exercises 70% • Final exam 20% • Class attendance 10% • Grading policy handout • Details about late assignments and tests

  5. Logistics • Course website: • http://biochem.slu.edu/bchm628/ • Contact: • Phone: 977-8858 • Email: donlinmj@slu.edu • Office – DRC 507 • Call or email. • Usually at WashU on Wednesdays

  6. Lecture outline • Overview of theme for this course • Large datasets = long lists of genes • How to interrogate gene lists using publicly available data • Introduction to sequence databases • Quality control and annotation

  7. Host-pathogen interactions in model organisms • Caenorhabditis elegans will be the model organism • Various bacteria (S. aureus) and fungi (C. albicans) will be the pathogens • Examine data from microarrays, RNA sequencing and proteomic studies • Use various public databases and tools to interrogate and analyze the data

  8. Aspects of host-pathogen interactions • Pathogen virulence factors • High-throughput expression analysis of pathogens during infection • Genetic differences between closely related species that differ in their ability to infect & kill C. elegans • Host innate immune response • High-throughput expression analysis of host during infection • Comparison of host response to different pathogens • Factors that mediate infection • Screen for pathogen & host factors that affect virulence and susceptibility to infection

  9. Types of worm killing • Disease Models & Mech. (2008) 1:205 • Curr. Opin. Microbiol. (2008) 11:251 • App. & Env. Microbiol. (2012) 78:2075

  10. Dataset 1: Response to fungal infection • “Candida albicans Infection of Caenorhabditis elegans Induces Antifungal Immune Defenses” Pukkila-Worley R, Ausubel FM and Mylonakis E (2011) PLoS Pathogens 7:e1002074 PMID: 21731485 • Study innate immune response to C. albicans in a model host • Live yeast establish intestinal infection but heat-killed yeast are avirulent • Identified 313 genes differentially expressed (DE or DEG) with infection by C. albicans • 56% of those genes were also DE with heat-killed yeast • Not much overlap with genes DE in response to S. aureus or P. aeruginosa
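The overlap figures on this slide come from comparing lists of differentially expressed genes. A minimal sketch of that comparison follows; the gene identifiers are hypothetical placeholders rather than genes from the study, and `overlap_fraction` is a helper name introduced here only for illustration.

```python
def overlap_fraction(primary, other):
    """Return the fraction of genes in `primary` that also appear in `other`."""
    primary, other = set(primary), set(other)
    if not primary:
        raise ValueError("primary gene list is empty")
    return len(primary & other) / len(primary)


# Hypothetical placeholder identifiers, NOT genes from the published dataset.
live_yeast_de = {"gene_a", "gene_b", "gene_c", "gene_d"}
heat_killed_de = {"gene_b", "gene_d", "gene_x"}

# Genes DE under both conditions, and the share of the live-yeast list they represent.
shared = sorted(live_yeast_de & heat_killed_de)
fraction = overlap_fraction(live_yeast_de, heat_killed_de)

print(shared, f"{fraction:.0%}")
```

The fraction is taken relative to the first list, which matches how the slide phrases it ("56% of those genes were also DE with heat-killed yeast").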

  11. Starting point for Exercise 1 • Supplementary Table 3, which lists the >300 genes DE in response to C. albicans and also gives the overlap with the heat-killed C. albicans • Goals are to use NCBI to find information about a few genes from the list • Use Excel to bring additional data into your list of genes
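The exercise does this step in Excel, but the same idea (look up each gene in a second table and append the matching column) can be sketched in Python. Everything below is an assumption made for illustration: the column names, the gene identifiers, and the values are invented and do not come from Supplementary Table 3.

```python
import csv
import io

# Hypothetical stand-in for the DE gene list (invented values).
de_genes_csv = """gene_id,fold_change
gene_a,4.2
gene_b,0.3
gene_c,2.7
"""

# Hypothetical stand-in for a second table holding extra annotation.
annotations_csv = """gene_id,description
gene_a,hypothetical antimicrobial peptide
gene_c,hypothetical lectin
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


# Index the annotation table by gene_id so each lookup is a dictionary access.
descriptions = {row["gene_id"]: row["description"] for row in read_rows(annotations_csv)}

# Append the description to every DE gene; genes missing from the second table are flagged.
merged = [
    {**row, "description": descriptions.get(row["gene_id"], "not found")}
    for row in read_rows(de_genes_csv)
]

for row in merged:
    print(row["gene_id"], row["fold_change"], row["description"])
```

Flagging unmatched genes explicitly, rather than dropping them, keeps the merged list the same length as the original gene list, which makes missing annotation easy to spot.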

  12. Biological Databases • DNA -> RNA -> Protein • DNA archives – genomes, ESTs (GenBank/EMBL) • Annotated mRNAs/Genes (Gene) • RNA (miRNAs, snoRNA, structures) • Protein databases • Automated translation (GenPept/TrEMBL) • Curated (UniProt) • Structures (PDB)

  13. Biological databases • Store data in a form that allows users to search and retrieve • Use defined relationships between data to allow finding related records • Genome linked to genes • Genes linked to transcript isoforms • Each transcript linked to encoded protein • GenBank records include all cross-database records as active links
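The chain of relationships listed on this slide (genome to genes, gene to transcript isoforms, transcript to protein) can be pictured as linked records. The sketch below is only a toy model of that idea, not the schema of any NCBI database; all class names, accessions, and the organism label are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Protein:
    accession: str


@dataclass
class Transcript:
    accession: str
    protein: Protein  # each transcript links to the protein it encodes


@dataclass
class Gene:
    symbol: str
    transcripts: List[Transcript] = field(default_factory=list)  # isoforms


@dataclass
class Genome:
    organism: str
    genes: List[Gene] = field(default_factory=list)


def proteins_for_gene(genome, symbol):
    """Follow gene -> transcript -> protein links and return protein accessions."""
    return [
        transcript.protein.accession
        for gene in genome.genes
        if gene.symbol == symbol
        for transcript in gene.transcripts
    ]


# Hypothetical records used only to exercise the links.
genome = Genome(
    organism="example organism",
    genes=[
        Gene("gene_a", [
            Transcript("tx_a.1", Protein("prot_a.1")),
            Transcript("tx_a.2", Protein("prot_a.2")),
        ]),
        Gene("gene_b", [Transcript("tx_b.1", Protein("prot_b.1"))]),
    ],
)

print(proteins_for_gene(genome, "gene_a"))
```

Starting from a single gene symbol, the lookup reaches every related protein record without searching by sequence, which is the practical benefit of storing the relationships explicitly.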

  14. Quality control and annotation • GenBank – an archive • Users submit data and own exclusive rights for all updates to those records • All submissions reviewed/approved by NCBI

  15. Growth of GenBank

  16. NCBI

  17. Gene annotation • Assign or define: • Gene name • Gene structure • Molecular function • Biological process • Cellular component • Etc. • Ideally, this data is known experimentally • Curate: pull this data from the literature

  18. Annotation • Time consuming and costly • Not keeping pace with rate of genome sequencing • 2008: 2nd assembly of C. neoformans type A • 2013: Only 1st assembly in GenBank • 2014: Refined gene models using NGS data • Organism-specific databases often have better annotation • Curated databases aimed at a particular field • EuPath (Eukaryotic pathogens)

  19. Gene database • Gene – derived database • Curators at NCBI review submissions/literature and create annotated records of every gene and gene product for a subset of organisms • Currently: • ~244 million sequence records in GenBank • ~16 million records in the Gene database
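Besides the web interface, the Gene database can be queried through NCBI's E-utilities. The sketch below only builds an `esearch` request URL and does not contact NCBI. The gene symbol `abf-2` is an illustrative C. elegans symbol chosen here, not necessarily one from the course dataset, and the `[Gene Name]` and `[Organism]` field qualifiers are used as I understand Entrez query syntax; check them against the E-utilities documentation before relying on them.

```python
from urllib.parse import urlencode

# Base address of the E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_search_url(symbol, organism):
    """Build an esearch URL for one gene symbol in one organism (no network call)."""
    # Field qualifiers restrict the match to the gene name and organism fields (assumed syntax).
    term = f"{symbol}[Gene Name] AND {organism}[Organism]"
    query = urlencode({"db": "gene", "term": term, "retmode": "json"})
    return f"{ESEARCH}?{query}"


# Illustrative symbol only; substitute a gene from your own list.
url = gene_search_url("abf-2", "Caenorhabditis elegans")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the matching Gene record identifiers, which can then be looked up on the NCBI site.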

  20. Curation & annotation of all known proteins • Provide […] comprehensive, high-quality and freely accessible resource of protein sequence and functional information. • www.uniprot.org

  21. UniProt databases • ~545,000 reviewed (UniProtKB/Swiss-Prot) • ~56 million not yet reviewed (UniProtKB/TrEMBL)

  22. Other databases • Genome databases (Thursday's topic) • Organism-specific (yeast, Drosophila, C. elegans, etc.) • Expression patterns • Protein domains • Metabolic pathways • … • NAR Database issue: 1st issue of every year • See handout • http://nar.oxfordjournals.org/content/42/D1.toc
