Genomics and Personalized Health Care Databases

Bailee Ludwig, Quality Management

Presentation Transcript


  1. Genomics and Personalized Health Care Databases • Bailee Ludwig, Quality Management

  2. Molecular Biology Databases • An excellent means of storing a vast amount of information in a central, sharable location • Biological databases are designed specifically for storing, searching, and retrieving biological data • Keyword searches • Cross-referencing • 3D capabilities

  3. Database Categories • Nucleotide Sequence Databases • Gene Databases • Genome Databases • Protein Sequence Databases • Structure Databases • Metabolic and Signaling Pathways • Human Genes and Diseases • Microarray Data and Other Expression Databases • … • Each category holds a specific type of information • The categories are interrelated

  4. Nucleotide & Protein Sequence Databases

  5. National Center for Biotechnology Information (NCBI) • Created in 1988 as part of the National Library of Medicine • Establishes public databases • Performs research in computational biology • Develops software tools for sequence analysis • Disseminates biomedical information • Databases • Sequence, such as GenBank, RefSeq, dbSNP • Literature, such as PubMed, OMIM • Tools • Entrez, BLAST, Cn3D, etc.

  6. NCBI Homepage

  7. NCBI Site Map

  8. All Databases at NCBI:

  9. Let’s Check out NCBI • http://www.ncbi.nlm.nih.gov/sites/gquery?itool=toolbar

  10. Multiple ways to find Genes…

  11. Let’s Look at BRCA1
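The BRCA1 lookup can also be scripted. As a hedged sketch, the Python below builds a request URL for NCBI's E-utilities ESearch service, an interface these slides do not cover. The `nucleotide` database, the query `BRCA1[gene] AND human[organism]`, and the helper name `build_esearch_url` are illustrative assumptions; check the E-utilities documentation for current parameters and request-rate rules before relying on it.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (an assumption here: not described in the slides).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch request that finds record IDs matching `term`."""
    params = {
        "db": db,           # Entrez database to search, e.g. "nucleotide"
        "term": term,       # Entrez query; [gene] and [organism] are field tags
        "retmax": retmax,   # maximum number of IDs to return
        "retmode": "json",  # JSON output instead of the default XML
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_esearch_url("nucleotide", "BRCA1[gene] AND human[organism]")
print(url)
# Sending the request (for example with urllib.request.urlopen) needs network
# access and should follow NCBI's request-rate guidelines.
```

The function only builds the URL, so it can be inspected or logged before any network call is made.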

  12. GenBank http://www.ncbi.nlm.nih.gov/Genbank/

  13. GenBank • Nucleotide-only sequence database • Sources of GenBank data • Direct submissions of individual records (BankIt, Sequin) • Batch submissions via email (EST, GSS, STS) • FTP accounts established for sequencing centers • Data shared nightly among three collaborating databases: • GenBank • DNA Data Bank of Japan (DDBJ) • European Molecular Biology Laboratory Database (EMBL)

  14. GenBank Release 175.0 • ftp://ftp.ncbi.nih.gov/genbank/ • Full release every two months • Incremental and cumulative updates daily • Release 175.0 (12/15/2009) • 112,910,950 Sequences • 110,118,557,163 Bases
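The Release 175.0 totals also give the average record length. A minimal Python check, using only the two figures from this slide:

```python
# GenBank Release 175.0 (12/15/2009) totals, copied from the slide.
sequences = 112_910_950
bases = 110_118_557_163

# Average record length = total bases / total sequences.
average_length = bases / sequences
print(f"Average record length: {average_length:.1f} bases")
```

This works out to roughly 975 bases per record, an average over records of very different sizes rather than a typical record length.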

  15. NCBI Reference Sequences

  16. GenBank Record (Header)

  17. Summary

  18. GenBank Record (Sequence)
      ORIGIN
        1 aaaaagagaaactgttgggagaggaatcgtatctccatatttcttctttcagccccaatc
       61 caagggttgtagctggaactttccatcagttcttcctttctttttcctctctaagccttt
      121 gccttgctctgtcacagtgaagtcagccagagcagggctgttaaactctgtgaaatttgt
      181 cataagggtgtcaggtatttcttactggcttccaaagaaacatagataaagaaatctttc
      241 ctgtggcttcccttggcaggctgcattcagaaggtctctcagttgaagaaagagcttgga
      301 ggacaacagcacaacaggagagtaaaagatgccccagggctgaggcctccgctcaggcag
      361 ccgcatctggggtcaatcatactcaccttgcccgggccatgctccagcaaaatcaagctg
      421 ttttcttttgaaagttcaaactcatcaagattatgctgctcactcttatcattctgttgc
      481 cagtagtttcaaaatttagttttgttagtctctcagcaccgcagcactggagctgtcctg
      541 aaggtactctcgcaggaaatgggaattctacttgtgtgggtcctgcacccttcttaattt
      601 tctcccatggaaatagtatctttaggattgacacagaaggaaccaattatgagcaattgg
      661 tggtggatgctggtgtctcagtgatcatggattttcattataatgagaaaagaatctatt
      721 gggtggatttagaaagacaacttttgcaaagagtttttctgaatgggtcaaggcaagaga
      781 gagtatgtaatatagagaaaaatgtttctggaatggcaataaattggataaatgaagaag
      841 ttatttggtcaaatcaacaggaaggaatcattacagtaacagatatgaaaggaaataatt
      901 cccacattcttttaagtgctttaaaatatcctgcaaatgtagcagttgatccagtagaaa
      961 ggtttatattttggtcttcagaggtggctggaagcctttatagagcagatctcgatggtg
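The ORIGIN section stores the sequence as numbered lines, each starting with the 1-based position of its first base. A minimal Python sketch for collapsing such a section back into one continuous string, assuming the layout shown on this slide (a position number followed by bases, optionally split into 10-base blocks as in GenBank flat files, and a closing `//` line). The function name `parse_origin` is hypothetical; a full GenBank parser such as Biopython's `SeqIO` handles far more of the format.

```python
def parse_origin(origin_text: str) -> str:
    """Join the numbered lines of a GenBank ORIGIN section into one sequence string.

    Each data line is "<position> <bases> [<bases> ...]", where <position> is the
    1-based index of the line's first base. Positions are checked so that a
    missing or duplicated line is reported instead of silently ignored.
    """
    parts = []
    length = 0
    for raw_line in origin_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("ORIGIN") or line == "//":
            continue  # section keyword, blank lines, record terminator
        position, *chunks = line.split()
        if int(position) != length + 1:
            raise ValueError(f"expected position {length + 1}, found {position}")
        bases = "".join(chunks).lower()
        if not bases.isalpha():
            raise ValueError(f"non-letter characters in line starting at {position}")
        parts.append(bases)
        length += len(bases)
    return "".join(parts)


origin = """ORIGIN
        1 aaaaagagaa actgttggga gaggaatcgt atctccatat ttcttctttc agccccaatc
       61 caagggttgt agctggaact ttccatcagt tcttcctttc tttttcctct ctaagccttt
//"""
sequence = parse_origin(origin)
print(len(sequence))
```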

  19. FASTA: Sequence Format
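FASTA is a plain-text format: each record is a header line beginning with `>`, followed by one or more lines of sequence. The sketch below writes and reads that layout; the 60-character line width, the example header, and the helper names `to_fasta` and `read_fasta` are illustrative choices, not requirements of the format.

```python
from typing import Dict, Optional


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one record: a '>' header line, then the sequence wrapped at `width` characters."""
    lines = [f">{header}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining each record's wrapped lines.

    A repeated header overwrites the earlier record in this simple sketch.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            records[header] += line
    return records


fasta_text = to_fasta("example_1 hypothetical record", "ACGT" * 40)
print(fasta_text)
print(read_fasta(fasta_text))
```

Wrapping is purely cosmetic: the reader joins the lines, so the same sequence comes back regardless of line width.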

  20. Sequence Viewer Graphics

  21. RefSeq

  22. RefSeq • Database of reference sequences • http://www.ncbi.nlm.nih.gov/RefSeq/ • Curated • Many experimentally validated • Some partially validated via ESTs • Some computationally predicted • Non-redundant; one record for each gene, or each splice variant, from each organism represented

  23. Accession Numbers • DNA sequences and other molecular records are tagged with accession numbers, unique identifiers used to identify and retrieve a specific sequence or record • RefSeq provides an expertly curated accession number that corresponds to the most stable, agreed-upon “reference” version of a sequence • RefSeq identifiers include the following formats: • Complete chromosome: NC_###### • Genomic contig: NT_###### • mRNA (DNA format): NM_###### • Protein: NP_######

  24. Accession Numbers: More Examples
      Accession format   Molecule   Description
      AC_123456          Genomic    Alternate complete genomic
      AP_123456          Protein    Protein products; alternate
      NG_123456          Genomic    Incomplete genomic regions
      NR_123456          RNA        Non-coding transcripts
      NW_123456          Genomic    Genomic assemblies
      NZ_ABCD12345678    Genomic    Whole genome shotgun data
      XM_123456          mRNA       Transcript products
      XP_123456          Protein    Protein products
      XR_123456          RNA        Transcript products
      YP_123456          Protein    Protein products
      ZP_12345678        Protein    Protein products
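Because the prefix encodes the record type, an accession can be checked and classified mechanically. A hedged Python sketch, assuming the prefixes from the two accession slides are all that matter here (NCBI defines more) and that an accession may carry an optional `.version` suffix, as in `NM_123456.2`. The molecule type for NC and NT is inferred from their descriptions, and the helper names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Prefix -> (molecule type, description), transcribed from the two accession slides.
# NCBI defines additional prefixes that are not listed here.
REFSEQ_PREFIXES = {
    "NC": ("Genomic", "Complete chromosome"),
    "NT": ("Genomic", "Genomic contig"),
    "NM": ("mRNA", "mRNA (DNA format)"),
    "NP": ("Protein", "Protein"),
    "AC": ("Genomic", "Alternate complete genomic"),
    "AP": ("Protein", "Protein products; alternate"),
    "NG": ("Genomic", "Incomplete genomic regions"),
    "NR": ("RNA", "Non-coding transcripts"),
    "NW": ("Genomic", "Genomic assemblies"),
    "NZ": ("Genomic", "Whole genome shotgun data"),
    "XM": ("mRNA", "Transcript products"),
    "XP": ("Protein", "Protein products"),
    "XR": ("RNA", "Transcript products"),
    "YP": ("Protein", "Protein products"),
    "ZP": ("Protein", "Protein products"),
}

# Two capital letters, an underscore, an alphanumeric body, and an optional ".version".
ACCESSION_PATTERN = re.compile(r"([A-Z]{2})_([A-Z0-9]+)(?:\.(\d+))?")


class RefSeqAccession(NamedTuple):
    prefix: str
    body: str
    version: Optional[int]
    molecule: str
    description: str


def parse_refseq_accession(accession: str) -> RefSeqAccession:
    """Split a RefSeq-style accession into its parts and look up what the prefix denotes."""
    match = ACCESSION_PATTERN.fullmatch(accession.strip())
    if match is None:
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    prefix, body, version = match.groups()
    if prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"prefix {prefix!r} is not in this lookup table")
    molecule, description = REFSEQ_PREFIXES[prefix]
    return RefSeqAccession(prefix, body, int(version) if version else None, molecule, description)


print(parse_refseq_accession("NM_123456"))
print(parse_refseq_accession("NZ_ABCD12345678.2"))
```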

  25. EST

  26. EST • Expressed Sequence Tags database (dbEST) is a division of GenBank that contains sequence data and other information on "single-pass" cDNA sequences, or "Expressed Sequence Tags", from a number of organisms • http://www.ncbi.nlm.nih.gov/sites/entrez?db=nucest&cmd=search&term=

  27. EST • mRNA: transcribed from genomic regions that are actively expressed in the cell • cDNA (complementary DNA) • DNA copy of an mRNA, made using the mRNA as a template • Sequence is complementary to the mRNA • EST: Expressed Sequence Tag (a short sub-sequence of a transcribed cDNA sequence) • Partial cDNA sequence • Can be read from the 5’ or 3’ end • Typical size: 200–500 bp • Represents an mRNA actively transcribed in the cell • Used to identify • Genes • Alternative splicing • etc.
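Because cDNA is synthesized on the mRNA template, its sequence follows directly from the mRNA: pair every base (A with T, U with A, G with C, C with G) and reverse the result, since the two strands are antiparallel and sequences are conventionally written 5’ to 3’. A minimal Python sketch; the function name `first_strand_cdna` is hypothetical.

```python
# Base pairing from an mRNA base to the DNA base synthesized opposite it.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return the cDNA strand complementary to `mrna`, written 5'->3'.

    The cDNA is antiparallel to its template, so the complemented bases are
    reversed to put the result in the usual 5'->3' reading direction.
    """
    mrna = mrna.upper()
    unexpected = set(mrna) - set(RNA_TO_CDNA)
    if unexpected:
        raise ValueError(f"unexpected characters in mRNA: {sorted(unexpected)}")
    return "".join(RNA_TO_CDNA[base] for base in reversed(mrna))


print(first_strand_cdna("AUGGCUUAA"))
```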

  28. Access to dbEST Data • EST sequences are included in the EST division of GenBank and are available from NCBI by anonymous FTP and through Entrez • The nucleotide sequences can be searched with the BLAST server • The TBLASTN program, which compares an amino acid query sequence against six-frame translations of dbEST DNA sequences, is particularly useful • EST sequences are also available as FASTA-format flat files by anonymous FTP in the /repository/dbEST directory at ftp.ncbi.nih.gov
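TBLASTN compares a protein query against every reading frame of a nucleotide database. The sketch below shows only the translation step: it produces the six conceptual protein sequences (three frames on each strand) using the standard genetic code. It illustrates the idea rather than BLAST's implementation, omits the alignment and scoring that TBLASTN performs, and its helper names are hypothetical.

```python
from itertools import product
from typing import Dict, List

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate whole codons; '*' marks a stop codon, 'X' a codon with an unknown base."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> List[str]:
    """Return the three forward-strand frames, then the three reverse-strand frames."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    return [
        translate(strand[offset:])
        for strand in (dna, reverse_complement)
        for offset in range(3)
    ]


print(six_frame_translation("ATGGCC"))
```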

  29. UniGene

  30. UniGene • www.ncbi.nlm.nih.gov/UniGene • Each UniGene entry is a set of transcript sequences that appear to come from the same transcription locus (gene or expressed pseudogene) • In addition to sequences of well-characterized genes, hundreds of thousands of novel expressed sequence tag (EST) sequences have been included • UniGene can serve as a resource for gene discovery • Experimentalists have also used UniGene to select reagents for gene mapping projects and large-scale expression analysis
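UniGene's actual clustering pipeline is not described on these slides. As a toy illustration of the idea that transcript sequences from the same locus tend to overlap, the sketch below groups sequences that share an exact k-mer, directly or through a chain of other sequences, using a union-find structure. The k-mer criterion, the default k = 12, and all names are assumptions chosen for simplicity; the sketch ignores reverse-strand reads and sequencing errors and is not UniGene's method.

```python
from typing import Dict, List


def cluster_by_shared_kmers(seqs: Dict[str, str], k: int = 12) -> List[List[str]]:
    """Group sequences linked, directly or transitively, by at least one shared k-mer."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_owner: Dict[str, str] = {}  # k-mer -> first sequence seen containing it
    for name, seq in seqs.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in first_owner:
                union(first_owner[kmer], name)
            else:
                first_owner[kmer] = name

    clusters: Dict[str, List[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


example = {"est_a": "AAAACCCCGGGG", "est_b": "CCCCGGGGTTTT", "est_c": "ACACACACACAC"}
print(cluster_by_shared_kmers(example, k=4))
```

Union-find makes the grouping transitive: two sequences that never overlap each other still land in one cluster if a third sequence overlaps both.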

  31. Numbers of UniGene Entries • Bos taurus (cow) 42,843 • Canis lupus familiaris (dog) 27,853 • Equus caballus (horse) 8,348 • Homo sapiens (human) 123,396 • Mus musculus (mouse) 78,289 • Ovis aries (sheep) 18,814 • Rattus norvegicus (Norway rat) 63,434 • Sus scrofa (pig) 51,576 • Danio rerio (zebrafish) 51,481

  32. UniGene • UniGene is a useful tool for looking up information about expressed genes • UniGene displays information about the abundance of a transcript (expressed gene), as well as how its expression is distributed across tissues
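One simple way to express transcript abundance by tissue, assumed here for illustration rather than taken from the slides, is to count how many ESTs from each tissue fall in a cluster and normalize by the total number of ESTs sequenced from that tissue, reported per million. The input format (one `(cluster_id, tissue)` pair per EST), the cluster names, and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def est_expression_profile(
    est_annotations: Iterable[Tuple[str, str]], cluster_id: str
) -> Dict[str, float]:
    """Estimate a cluster's abundance per tissue as ESTs per million ESTs from that tissue.

    `est_annotations` holds one (cluster_id, tissue) pair per sequenced EST.
    """
    annotations = list(est_annotations)
    library_size = Counter(tissue for _, tissue in annotations)  # total ESTs per tissue
    hits = Counter(tissue for cid, tissue in annotations if cid == cluster_id)
    return {
        tissue: 1_000_000 * hits[tissue] / total
        for tissue, total in sorted(library_size.items())
    }


annotations = (
    [("cluster_A", "liver")] * 3
    + [("cluster_B", "liver")]
    + [("cluster_A", "brain")]
    + [("cluster_B", "brain")] * 3
)
print(est_expression_profile(annotations, "cluster_A"))
```

Normalizing by library size matters because tissues are sampled unevenly: a raw EST count mostly reflects how many ESTs were sequenced from that tissue.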

  33. Protein Structure

  34. Now… Let’s Take a Closer Look at These Databases with a Lab
