Comprehensive Analysis of GenBank Records and Related Nucleotide Databases
This document is a guide to GenBank records, covering the flat file format and the feature table, with examples drawn from mouse (Mus musculus) REV1-like mRNA and bovine (Bos taurus) hemochromatosis (hfe) mRNA. It covers key header entries such as accession numbers, organism details, and sequence length, along with strategies for searching the Entrez Nucleotide database. It also touches on transcript variant queries and genomic resources such as UniGene.
Presentation Transcript
National Center for Biotechnology Information: A Field Guide, Part 2. UT-Health Science Center, February 14, 2006
GenBank Records: The Flatfile Format. Sections: Header, Feature Table, Sequence
A Typical GenBank Record
LOCUS       NM_019570    4279 bp    mRNA    linear   INV 28-OCT-2004
DEFINITION  Mus musculus REV1-like (S. cerevisiae) (Rev1l), mRNA
ACCESSION   NM_019570
VERSION     NM_019570.3  GI:50811869
KEYWORDS    .
DEFINITION = Title
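The header lines of the record above can be pulled apart with a few string operations. The following is a minimal sketch, not NCBI's parser: the function names (`parse_header`, `parse_locus`, `parse_version`) are hypothetical, the column spacing of the sample text is assumed, and the LOCUS line is split on whitespace rather than by fixed columns.

```python
# Header text as on the slide; the exact column spacing is an assumption.
HEADER = """\
LOCUS       NM_019570    4279 bp    mRNA    linear   INV 28-OCT-2004
DEFINITION  Mus musculus REV1-like (S. cerevisiae) (Rev1l), mRNA
ACCESSION   NM_019570
VERSION     NM_019570.3  GI:50811869
KEYWORDS    .
"""


def parse_header(text):
    """Map each header keyword to its value, joining continuation lines."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and key is not None:
            # Indented lines continue the previous keyword's value.
            fields[key] += " " + line.strip()
        else:
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
    return fields


def parse_locus(value):
    """Split the LOCUS value on whitespace (a simplification)."""
    parts = value.split()
    if len(parts) != 7 or parts[2] != "bp":
        raise ValueError("unexpected LOCUS layout: " + value)
    name, length, _, molecule, topology, division, date = parts
    return {
        "name": name,
        "length": int(length),
        "molecule": molecule,
        "topology": topology,
        "division": division,
        "date": date,
    }


def parse_version(value):
    """Split 'ACCESSION.N  GI:number' into its three parts."""
    versioned, gi = value.split()
    accession, _, number = versioned.rpartition(".")
    return {
        "accession": accession,
        "version": int(number),
        "gi": int(gi.partition(":")[2]),
    }


fields = parse_header(HEADER)
locus = parse_locus(fields["LOCUS"])
version = parse_version(fields["VERSION"])
print(locus["name"], locus["length"], locus["molecule"], locus["date"])
print(version["accession"], version["version"], version["gi"])
```

For real records, an established parser is a safer choice than hand-rolled splitting, since LOCUS lines vary across record types.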
GenBank Record: Feature Table, cont'd. GenPept identifier
GenBank Record: Sequence
Indexing for Nucleotide UID 59958365
Field tags: [accn] [orgn] [mdat] [prop]
Field                 Indexed Terms
[primary accession]   NM_001012399
[title]               Bos taurus hemochromatosis (hfe), mRNA.
[organism]            Bos taurus
[sequence length]     1168
[modification date]   2005/02/19
[properties]          biomol mrna
                      gbdiv mam
                      srcdb refseq
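One way to picture the indexing above is as a mapping from field to indexed terms, against which a fielded term either matches or does not. This is an illustrative sketch only: the `RECORD` layout, the `TAGS` mapping, and the `matches` helper are hypothetical, and the rule that titles are indexed word by word is an assumption, not a statement of how Entrez builds its index.

```python
import re

# Indexed terms for UID 59958365, copied from the slide.
RECORD = {
    "primary accession": ["NM_001012399"],
    "title": ["Bos taurus hemochromatosis (hfe), mRNA."],
    "organism": ["Bos taurus"],
    "sequence length": ["1168"],
    "modification date": ["2005/02/19"],
    "properties": ["biomol mrna", "gbdiv mam", "srcdb refseq"],
}

# Short tags from the slide mapped to the long field names.
TAGS = {
    "accn": "primary accession",
    "orgn": "organism",
    "mdat": "modification date",
    "prop": "properties",
    "title": "title",
}


def matches(record, term, tag):
    """Return True if the term is indexed under the tagged field."""
    field = TAGS[tag]
    wanted = term.lower()
    indexed = [value.lower() for value in record[field]]
    if field == "title":
        # Assumption: titles are matched word by word.
        words = set()
        for value in indexed:
            words.update(re.findall(r"[a-z0-9]+", value))
        return wanted in words
    return wanted in indexed


print(matches(RECORD, "hfe", "title"))
print(matches(RECORD, "srcdb refseq", "prop"))
print(matches(RECORD, "gbdiv pri", "prop"))
```

The last call fails to match because the bovine record is indexed under the mammalian division (`gbdiv mam`), not the primate one.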
Entrez Nucleotide: HFE. 137 records, including some that are not HFE. Field restriction: [Title]
Smarter Query: hfe[title] AND human[orgn]. 42 records. Curated HFE splice variants (11 total)
hfe[title] AND human[orgn] (cont'd): Primary data
Preview/Index: Properties, srcdb
Preview/Index: Properties, srcdb …AND srcdb refseq[Properties]
Preview/Index: Properties, srcdb …AND srcdb ddbj/embl/genbank[Properties]
Database Queries
Primate division: gbdiv pri[prop]
EST division: gbdiv est[prop]
#1  hfe                                      137
#2  hfe[title] AND human[orgn]                42
#3  #2 AND srcdb refseq[prop]                 11
#4  #2 AND srcdb ddbj/embl/genbank[prop]      31
#5  #4 AND gbdiv pri[prop]                    29
#6  #4 AND gbdiv est[prop]                     2
Molecule Queries
Genomic DNA: biomol genomic[prop]
cDNA: biomol mrna[prop]
#1  hfe                                      116
#2  hfe[title] AND human[orgn]                42
#3  #2 AND biomol mrna[prop]                  29
#4  #2 AND biomol genomic[prop]               13
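The counts in the query tables above are internally consistent: each refinement splits its parent result set into parts whose counts sum to the parent's count. A short sketch of that arithmetic follows. The `refine` and `is_partition` helpers are hypothetical, and treating the subsets as non-overlapping is an assumption that the slide's numbers support but do not prove.

```python
def refine(base, clause):
    """Narrow an existing query by AND-ing one more fielded clause."""
    return f"({base}) AND {clause}"


base = "hfe[title] AND human[orgn]"
refseq = refine(base, "srcdb refseq[prop]")
primary = refine(base, "srcdb ddbj/embl/genbank[prop]")
primate = refine(primary, "gbdiv pri[prop]")
est = refine(primary, "gbdiv est[prop]")
mrna = refine(base, "biomol mrna[prop]")
genomic = refine(base, "biomol genomic[prop]")

# Record counts as reported on the slides.
counts = {
    base: 42,
    refseq: 11,
    primary: 31,
    primate: 29,
    est: 2,
    mrna: 29,
    genomic: 13,
}


def is_partition(counts, parent, children):
    """Check that the child counts add up to the parent count."""
    return counts[parent] == sum(counts[child] for child in children)


print(is_partition(counts, base, [refseq, primary]))
print(is_partition(counts, primary, [primate, est]))
print(is_partition(counts, base, [mrna, genomic]))
```

Each check compares one row of a table with the rows that refine it, for example 11 + 31 = 42 for the source database split and 29 + 13 = 42 for the molecule type split.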
More Queries… Fields are database-specific
Entrez Nucleotide
Reviewed RefSeqs with transcript variants: srcdb refseq reviewed[prop] AND transcript[title] AND variant[title]
Entrez Gene
Topoisomerase genes from Archaea: topoisomerase[gene name] AND archaea[organism]
Genes on human chromosome 2 with OMIM links: 2[chromosome] AND human[organism] AND "gene omim"[filter]
Membrane proteins linked to cancer: "integral to plasma membrane"[gene ontology] AND cancer[dis]
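Queries like these follow a regular pattern: a term, a field tag in square brackets, and AND between clauses. The sketch below assembles them as plain strings. The helper names `fielded` and `all_of` are hypothetical, and nothing here contacts Entrez; it only shows how the query text is put together, with quoting applied only where the slide itself quotes a phrase.

```python
def fielded(term, field, phrase=False):
    """Format one clause such as hfe[title]; quote the term on request."""
    term = term.strip()
    if phrase:
        term = f'"{term}"'
    return f"{term}[{field}]"


def all_of(*clauses):
    """Join clauses with the AND operator."""
    return " AND ".join(clauses)


reviewed_variants = all_of(
    fielded("srcdb refseq reviewed", "prop"),
    fielded("transcript", "title"),
    fielded("variant", "title"),
)
archaeal_topoisomerases = all_of(
    fielded("topoisomerase", "gene name"),
    fielded("archaea", "organism"),
)
chr2_with_omim = all_of(
    fielded("2", "chromosome"),
    fielded("human", "organism"),
    fielded("gene omim", "filter", phrase=True),
)
membrane_cancer = all_of(
    fielded("integral to plasma membrane", "gene ontology", phrase=True),
    fielded("cancer", "dis"),
)

for query in (reviewed_variants, archaeal_topoisomerases,
              chr2_with_omim, membrane_cancer):
    print(query)
```

Because fields are database-specific, the first query is meant for Entrez Nucleotide and the other three for Entrez Gene.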
Genomic Biology: UniGene, E-PCR, Map Viewer, Trace Archive, Genome Resources
Genome Projects: microb. 13 Eukaryotic Genome Sequencing Projects Selected: Complete: 0, Assembly: 2, In Progress: 11
UniGene: Gene-oriented clusters of expressed sequences
• Automatic clustering using MegaBlast
• Each cluster represents a unique gene
• Informed by genome hits
• Information on tissue types and map locations
• Useful for gene discovery and selection of mapping reagents
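The idea of grouping expressed sequences into one cluster per gene can be illustrated with a toy example. The sketch below is not UniGene's pipeline: it assumes pairwise similarity hits (such as MegaBlast might report) are already available, uses invented sequence identifiers, and groups sequences by simple single-linkage with a union-find structure, ignoring genome hits and every other refinement.

```python
def cluster(ids, hits):
    """Group ids into clusters connected by pairwise hits (single linkage)."""
    parent = {name: name for name in ids}

    def find(name):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for left, right in hits:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    groups = {}
    for name in ids:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical EST identifiers and hits, for illustration only.
ests = ["est1", "est2", "est3", "est4", "est5", "est6"]
hits = [("est1", "est2"), ("est2", "est3"), ("est4", "est5")]

for members in cluster(ests, hits):
    print(members)
```

Here est1 and est3 land in the same cluster without a direct hit between them, because both hit est2; est6 has no hits and stays in a cluster of its own.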
A Cluster of ESTs (figure: query, 5’ EST hits, 3’ EST hits)
UniGene Collections Species UniGene
ftp://ftp.ncbi.nih.gov/repository/UniGene/Homo_sapiens/ Get Sequences web page
E-PCR Genomic sequence here