NCBI Molecular Biology Resources

NCBI Molecular Biology Resources. A Field Guide part 1. September 29, 2004 ICGEB. Types of Databases. Primary Databases Original submissions by experimentalists

miya

Presentation Transcript


  1. NCBI Molecular Biology Resources: A Field Guide, part 1. September 29, 2004, ICGEB

  2. Types of Databases
     • Primary Databases
       • Original submissions by experimentalists
       • Database staff may review and organize the data, but do not add to or modify the submitted information
       • Records are “owned” and updated by their authors
       • Examples: GenBank, SNP, GEO
     • Derivative Databases
       • Human-curated (compilation and correction of data)
         • Examples: Gene (LocusLink), Structure & Literature databases
       • Computationally-Derived
         • Example: UniGene
       • Combination
         • Examples: RefSeq, Genome Assembly, Domain databases

  3. NCBI’s Derivative Sequence Database GenBank genomes transcripts proteins

  4. • Forming the “best representative” sequence
     • Standardizing nomenclature and record structure
     • Adding annotation (references, sequence features)
     [Diagram labels: mRNAs; Genomes; Proteins]
     Release 6 is now available on the FTP site!

  5. RefSeq Curation Processes
     • Curated genomic DNA (NC, NT, NW)
     • Model mRNA (XM), (XR)
     • Model protein (XP)
     • Curated mRNA (NM), (NR)
     • Protein (NP)
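The accession prefixes on the curation slide can be read as a small lookup table. The sketch below is only an illustration of that idea: the function name `classify_accession`, the "transcript" label used for the NM/NR/XM/XR group, and the regular expression are assumptions made here, not part of the slides or of any NCBI tool.

```python
import re

# Prefix groups as listed on the slide. "transcript" is a label chosen
# here to cover the mRNA group (NM/NR and XM/XR); it is not NCBI wording.
REFSEQ_PREFIXES = {
    "NC": ("curated", "genomic DNA"),
    "NT": ("curated", "genomic DNA"),
    "NW": ("curated", "genomic DNA"),
    "NM": ("curated", "transcript"),
    "NR": ("curated", "transcript"),
    "NP": ("curated", "protein"),
    "XM": ("model", "transcript"),
    "XR": ("model", "transcript"),
    "XP": ("model", "protein"),
}

# Two-letter prefix, underscore, digits, optional ".version" suffix.
ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def classify_accession(accession):
    """Return (status, molecule, version) for a RefSeq-style accession.

    version is None when the accession carries no ".N" suffix.
    Raises ValueError for strings that do not match the pattern or
    whose prefix is not in the slide's list.
    """
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    prefix, _digits, version = match.groups()
    if prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"prefix {prefix!r} is not in the slide's list")
    status, molecule = REFSEQ_PREFIXES[prefix]
    return status, molecule, int(version) if version else None


if __name__ == "__main__":
    # Accessions taken from the E. coli record on the next slide.
    print(classify_accession("NC_000913.1"))
    print(classify_accession("NP_418591.1"))
```

A plain dictionary keeps the mapping easy to extend if further prefixes need to be recognized.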

  6. RefSeq Chromosomes: NC_

     LOCUS       NC_000913            4639221 bp    DNA     circular BCT 30-JUL-2003
     DEFINITION  Escherichia coli K12, complete genome.
     ACCESSION   NC_000913
     VERSION     NC_000913.1  GI:16127994
     KEYWORDS    .
     SOURCE      Escherichia coli K12.
       ORGANISM  Escherichia coli K12
                 Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
                 Enterobacteriaceae; Escherichia.
     REFERENCE   1  (bases 1 to 4639221)
       AUTHORS   Blattner,F.R., Plunkett,G. III, Bloch,C.A., Perna,N.T., Burland,V.,
                 Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F.,
                 Gregor,J., Davis,N.W., Kirkpatrick,H.A., Goeden,M.A., Rose,D.J.,
                 Mau,R. and Shao,Y.
       TITLE     The complete genome sequence of Escherichia coli K12.
       JOURNAL   Science 277 (5331), 1453-1474 (1997)
       MEDLINE   97426617
        PUBMED   9278503
     REFERENCE   2  (bases 1 to 4639221)
       AUTHORS   Blattner,F.R.
       TITLE     Direct submission
       JOURNAL   Submitted (16-JAN-1997) Guy Plunkett III, Laboratory of Genetics,
                 University of Wisconsin, 445 Henry Mall, Madison, WI 53706, USA.
                 E-mail ecoli@genetics.wisc.edu Phone: 608-262-2543 Fax:
     ...
          gene            3954631..3956478
                          /gene="mutL"
                          /locus_tag="b4170"
                          /note="synonym: mut-25"
          CDS             3954631..3956478
                          /gene="mutL"
                          /locus_tag="b4170"
                          /function="methyl-directed mismatch repair"
                          /codon_start=1
                          /transl_table=11
                          /product="MutL"
                          /protein_id="NP_418591.1"
                          /db_xref="GI:16131992"
                          /translation="MPIQVLPPQLANQIAAGEVVERPASVVKELVENSLDAGATRIDI
                          DIERGGAKLIRIRDNGCGIKKDELALALARHATSKIASLDDLEAIISLGFRGEALASI
                          SSVSRLTLTSRTAEQQEAWQAYAEGRDMNVTVKPAAHPVGTTLEVLDLFYNTPARRKF
                          LRTEKTEFNHIDEIIRRIALARFDVTINLSHNGKIVRQYRAVPEGGQKERRLGAICGT
                          AFLEQALAIEWQHGDLTLRGWVADPNHTTPALAEIQYCYVNGRMMRDRLINHAIRQAC
                          EDKLGADQQPAFVLYLEIDPHQVDVNVHPAKHEVRFHQSRLVHDFIYQGVLSVLQQQL
                          ETPLPLDDEPQPAPRSIPENRVAAGRNHFAEPAAREPVAPRYTPAPASGSRPAAPWPN
                          AQPGYQKQQGEVYRQLLQTPAPMQKLKAPEPQEPALAANSQSFGRVLTIVHSDCALLE
                          RDGNISLLSLPVAERWLRQAQLTPGEAPVCAQPLLIPLRLKVSAEEKSALEKAQSALA
                          ELGIDFQSDAQHVTIRAVPLPLRQQNLQILIPELIGYLAKQSVFEPGNIAQWIARNLM
                          SEHAQWSMAQAITLLADVERLCPQLVKTPPGGLLQSVDLHPAIKALKDE"
     ...
     BASE COUNT   978672 a 1011074 c  997153 g  974742 t
     ORIGIN
             1 cgtcttcatt gtcagacagc agaatttgta cgcgctgttc ggcttgttgt aatttggcct
            61 gcccctgacg tgccagctgc acgccgcgtt cgaactcgtt cagcgcctct tccagcggca
           121 ggtcgccact ttccagacgg gttacaatct gttccagctc gctcagcgcc ttttcaaagc
           181 tggcgggcgc ctcatttttc ttcggcataa tgaatgtctg actctcaata tttttcgccc
           241 cgtcatggta acggactcag ggcaaatagc aaataacgcg caatggtaag gtgatgtgca
           301 cagcaaagcg atgttagtgg tatacttccg cgcctggatg cagccgcagg tgtgggctgc
           361 tgtatttttc cctatacaag tcgcttaagg cttgccaacg aaccattgcc gccatgaagt
           421 ttatcattaa attgttcccg gaaatcacca tcaaaagcca atctgtgcgc ttgcgcttta
           481 taaaaatcct taccgggaac attcgtaacg ttttaaagca ctatgatgag acgctcgctg
           541 tcgtccgcca ctgggataac atcgaagttc gcgcaaaaga tgaaaaccag cgtctggcta
           601 ttcgcgacgc tctgacccgt attccgggta tccaccatat tctcgaagtc gaagacgtgc
           661 cgtttaccga catgcacgat attttcgaga aagcgttggt tcagtatcgc gatcagctgg
           721 aaggcaaaac cttctgcgta cgcgtgaagc gccgtggcaa acatgatttt agctcgattg
           781 atgtggaacg ttacgtcggc ggcggtttaa atcagcatat tgaatccgcg cgcgtgaagc
           841 tgaccaatcc ggatgtgact gtccatctgg aagtggaaga cgatcgtctc ctgctgatta
           901 aaggccgcta cgaaggtatt ggcggtttcc cgatcggcac ccaggaagat gtgctgtcgc
           961 tcatttccgg tggtttcgac tccggtgttt ccagttatat gttgatgcgt cgcggctgcc
     ...

     [Slide callouts: Annotation of Gene, CDS, and other features; Genome sequence]
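As a quick check on how feature locations relate to sequence length, the mutL location in the record above (3954631..3956478) can be turned into a span length and a codon count. This is a minimal sketch: the helper name `span_length` is hypothetical, and it handles only simple `start..end` locations, not the complement or join forms that GenBank records also use.

```python
import re


def span_length(location):
    """Length in bases of a simple 'start..end' feature location.

    Coordinates are 1-based and inclusive, so the length is
    end - start + 1. Only the plain 'N..M' form is handled.
    """
    match = re.fullmatch(r"(\d+)\.\.(\d+)", location)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if end < start:
        raise ValueError("end coordinate precedes start coordinate")
    return end - start + 1


if __name__ == "__main__":
    bases = span_length("3954631..3956478")  # mutL CDS from the slide
    codons, remainder = divmod(bases, 3)
    print(bases, codons, remainder)  # 1848 616 0
```

A span of 1848 bases is 616 codons with no remainder. Assuming the CDS ends in a stop codon, that leaves 615 translated residues.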

  7. RefSeq: NCBI’s Derivative Sequence Database
     RefSeq Benefits
     • Non-redundant
     • Explicitly linked nucleotide and protein sequences
     • Updated to reflect current sequence data and biology
     • Validated by hand
     • Format consistency
     • Distinct accession series
     • Stewardship by NCBI staff and collaborators
     ftp://ftp.ncbi.nih.gov/refseq/release

  8. Announcing! Genes: The Gene Summary Database
     Summary pages of curated information about genetic loci for organisms in the RefSeq project.
     ► Graphics
     ► Gene information
     ► Bibliography (PubMed links)
     ► General gene information
     ► NCBI Reference Sequences
     ► Related sequences
     ► Additional Links

  9. Entrez Gene

  10. NM/NP Records in Entrez Gene
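The slides show Entrez Gene through the web interface only. For readers who want to script the same kind of lookup, the sketch below builds a search URL for NCBI's E-utilities `esearch` endpoint. E-utilities is not mentioned in the slides, so the endpoint, the `db=gene` choice, and the field tags `[Gene Name]` and `[Organism]` are assumptions brought in for illustration. The helper `gene_search_url` is hypothetical. The code only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities ESearch service (assumption: not from the slides).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_search_url(symbol, organism, retmax=5):
    """Build an ESearch URL that looks up a gene symbol in one organism.

    The query combines two fielded terms with AND, in the same style
    as a query typed into the Entrez search box.
    """
    term = f"{symbol}[Gene Name] AND {organism}[Organism]"
    params = {
        "db": "gene",        # search the Gene database
        "term": term,        # fielded Entrez query
        "retmax": retmax,    # maximum number of IDs to return
        "retmode": "json",   # ask for a JSON response
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # PRNP is the example gene used later in the presentation.
    print(gene_search_url("PRNP", "Homo sapiens"))
```

Fetching the URL and parsing the response are left out on purpose, so the example stays runnable offline.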

  11. UniGene Clustering Expressed Sequences
      • Records are clusters of mRNAs and ESTs that ideally represent single genes
      • Records are created automatically by a modified BLAST algorithm
      • UniGene provides a means to identify an EST or unannotated mRNA
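The idea of grouping mRNAs and ESTs into clusters can be illustrated with a toy single-linkage grouping: any two sequences joined by a pairwise similarity hit end up in the same cluster. This is not UniGene's actual procedure, which the slide describes only as a modified BLAST algorithm. The function `cluster_sequences`, the sequence IDs, and the hit pairs are all made up for the example.

```python
def cluster_sequences(ids, hits):
    """Group sequence IDs into clusters using union-find.

    ids  : iterable of sequence identifiers
    hits : iterable of (id_a, id_b) pairs judged similar enough to
           belong to the same gene (for example, by an alignment step)
    Returns a sorted list of clusters, each a sorted list of IDs.
    """
    ids = list(ids)
    parent = {seq_id: seq_id for seq_id in ids}

    def find(x):
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for seq_id in ids:
        clusters.setdefault(find(seq_id), []).append(seq_id)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Hypothetical IDs: one mRNA, three ESTs; est_3 has no hits.
    ids = ["mrna_1", "est_1", "est_2", "est_3"]
    hits = [("mrna_1", "est_1"), ("est_1", "est_2")]
    print(cluster_sequences(ids, hits))
    # [['est_1', 'est_2', 'mrna_1'], ['est_3']]
```

Single linkage is the simplest choice, but it lets one spurious hit merge two unrelated clusters, which is one reason a production pipeline would need stricter criteria.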

  12. A Cluster of ESTs: Arabidopsis serine protease
      [Figure labels: query; 5’ EST hits; 3’ EST hits; Sequence & Expression]

  13. UniGene Collections (as of July 2004)
      • Chordata
        • Mammalia: Bos taurus (cow), Canis familiaris (dog), Homo sapiens (human), Mus musculus (mouse), Ovis aries (sheep), Rattus norvegicus (rat), Sus scrofa (pig)
        • Aves: Gallus gallus (chicken)
        • Amphibia: Xenopus laevis (African clawed frog), Xenopus tropicalis (western clawed frog)
        • Actinopterygii: Danio rerio (zebrafish), Oncorhynchus mykiss (rainbow trout), Oryzias latipes (Japanese rice fish), Salmo salar (salmon)
        • Ascidiacea: Ciona intestinalis (sea squirt)
      • Embryophyta
        • Cycadopsida: Pinus taeda (loblolly pine)
        • Bryopsida: Physcomitrella patens
        • Eudicotyledons: Arabidopsis thaliana (thale cress), Glycine max (soybean), Helianthus annuus (sunflower), Lactuca sativa (lettuce), Lotus corniculatus (lotus flower), Lycopersicon esculentum (tomato), Malus x domestica (apple), Medicago truncatula (barrel medic), Populus tremula/tremuloides (poplar), Solanum tuberosum (potato), Vitis vinifera (wine grape)
        • Liliopsida: Hordeum vulgare (barley), Oryza sativa (rice), Saccharum officinarum (noble cane), Sorghum bicolor (sorghum), Triticum aestivum (bread wheat), Zea mays (corn)
      • Arthropoda
        • Insecta: Anopheles gambiae (malaria mosquito), Apis mellifera (honeybee), Drosophila melanogaster (fruit fly), Bombyx mori (silkworm)
      • Echinodermata
        • Echinoidea: Strongylocentrotus purpuratus
      • Nematoda
        • Chromadorea: Caenorhabditis elegans
      • Platyhelminthes
        • Trematoda: Schistosoma mansoni
      • Mycetozoa
        • Dictyosteliida: Dictyostelium discoideum (slime mold)
      • Chlorophyta
        • Chlorophyceae: Chlamydomonas reinhardtii
      • Apicomplexa
        • Coccidia: Toxoplasma gondii

  14. Finding UniGene Clusters by link by Entrez search

  15. UniGene Cluster for PRNP

  16. Complete Genomes as of June 2004
      • Organelles:
        • Mitochondria (558)
        • Plastids (40)
        • Plasmids (626)
        • Nucleomorphs (3)
      • Viruses (1923)
      • Archaebacteria (44)
      • Eubacteria (176)
      • Eukaryotes (61)

  17. Simple Genomes
      • Full chromosomal sequences are provided
      • Genes are annotated
      • The annotation can be shown graphically and linked to sequence records

  18. mutL

  19. Complex Genomes
      • Sequences are either provided complete or assembled with NCBI's help
      • Heavy annotation: genes, transcript regions & ORFs, sequence variations & markers, clones, ESTs, etc.
      • The annotation can be shown graphically and linked to other databases using the Map Viewer

  20. Viewing Complex Genomes: NCBI Map Viewer
      • Map Viewer Home Page
        • Shows all supported organisms
        • Provides links to genomic BLAST
      • Genome Overview Page
        • Provides links to individual chromosomes
        • Shows hits on a genome graphically
      • Chromosome Viewing Page
        • Allows interactive views of annotation details
        • Provides numerous maps unique to each genome

  21. Map Viewer Home Page

  22. Genome Overview Page Search the maps Genomic BLAST Species-specific help!

  23. Search For Human PRNP PRNP

  24. Human PRNP on Genome View

  25. Chromosome Viewing Page Map Summary Add or remove maps Master Map with exploded content Genes UniGene Zooming Controls Clone

  26. Zooming in… Left click

  27. Map Viewer Analysis Tools
      [Figure labels: Evidence Viewer; Link to Protein; HomoloGene; Link to OMIM; Sequence Viewer; Download Sequence; ModelMaker]

  28. Homologene

  29. Homology Comparisons on Map Viewer

  30. Intermission
