Online (DNA and amino acid) Sequence Information. Lecture 9.
Introduction • Annotation of genes • Basic bioinformatics Databases • NCBI home page • Query and return results • DNA sequence results page • Protein sequence results page
Bioinformatics Databases • Biological data generated by various labs are submitted to and stored in specific databases. • The data are nucleotide sequences (DNA and mRNA/cDNA) and protein sequences. • The main “primary” nucleotide sequence databases are: • United States: GenBank (NCBI) • Europe: EMBL Nucleotide Sequence Database (EMBL) • Japan: DNA Data Bank of Japan (DDBJ) • These databases also contain sequences related to: • Expressed sequence tags (ESTs): short (about 800 bp) sequences derived from mRNA that can be used to see which genes are expressed…
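Records from GenBank, EMBL and DDBJ can be downloaded as plain-text FASTA (a `>` header line followed by one or more sequence lines). The sketch below is a minimal FASTA reader plus a GC-content helper; the function names are hypothetical, and the record names and sequences are made-up toy data, not real database entries.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}; the header is the text after '>'."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:          # store the previous record
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())     # sequence may be wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Hypothetical toy records, for illustration only
example = """>toy_mRNA hypothetical example record
ATGGCC
GCTTAA
>toy_EST hypothetical example record
atgcat
"""
records = parse_fasta(example)
for name, seq in records.items():
    print(name, seq, round(gc_content(seq), 2))
```

If a file repeats a header, this sketch keeps only the last record with that header.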
Protein Databases • The main protein database is: • UniProt (Universal Protein Resource) • The UniProt Knowledgebase (UniProtKB) contains data from: • Swiss-Prot (manually curated, most up-to-date information) • TrEMBL (automatic translations of coding sequences) • PIR database • Both the nucleotide and protein databases contain much more detail than the sequences alone; this detail is referred to as annotation.
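UniProtKB can also be queried programmatically. The sketch below assumes UniProt's REST search endpoint `https://rest.uniprot.org/uniprotkb/search` and its `gene:`, `organism_id:` and `reviewed:` query fields, where `reviewed:true` selects Swiss-Prot entries and `reviewed:false` selects TrEMBL entries. The helper `uniprot_search_url` is hypothetical and only builds the URL; check the current UniProt API documentation before relying on these field names.

```python
from urllib.parse import urlencode

# Assumed UniProt REST endpoint (verify against current UniProt documentation)
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def uniprot_search_url(gene: str, taxon_id: int, reviewed: bool = True,
                       fmt: str = "fasta") -> str:
    """Build a UniProtKB search URL (hypothetical helper; no request is sent).

    reviewed=True  -> Swiss-Prot (manually curated entries)
    reviewed=False -> TrEMBL (automatically annotated entries)
    """
    query = f"gene:{gene} AND organism_id:{taxon_id} AND reviewed:{str(reviewed).lower()}"
    return UNIPROT_SEARCH + "?" + urlencode({"query": query, "format": fmt})


# 9606 is the NCBI taxonomy identifier for Homo sapiens
swissprot_url = uniprot_search_url("BTEB", 9606)
trembl_url = uniprot_search_url("BTEB", 9606, reviewed=False)
print(swissprot_url)
```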
Annotation of sequences • Once the gene sequences have been determined, the data must be annotated (Klug 2010): • Identify regulatory regions • Identify other sequences of interest: exons/introns, coding sequences (CDS), polyA signal • Protein annotations list the corresponding mRNA sequences • Other organisms in which the DNA/amino acid sequence is found • Journals/references describing where the data came from
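Exon coordinates in an annotation are what link the DNA record to the protein: joining the exons gives the coding sequence (CDS), which translates into the amino acid sequence. The sketch below assumes 1-based, inclusive coordinates (the convention used in GenBank feature tables) and the standard genetic code. The genomic sequence and exon positions are a hypothetical toy example whose intron follows the GT...AG rule; the helper names are also hypothetical.

```python
from typing import List, Tuple

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def splice(genomic: str, exons: List[Tuple[int, int]]) -> str:
    """Join exons given as 1-based inclusive (start, end) coordinates."""
    return "".join(genomic[start - 1:end] for start, end in exons)


def translate(cds: str) -> str:
    """Translate a CDS codon by codon, stopping at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")   # 'X' for unknown/ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy gene: exon 1 (1-6), intron (7-16, GT...AG), exon 2 (17-25)
genomic = "ATGGCC" + "GTAAGTCCAG" + "AAATGGTAA"
exons = [(1, 6), (17, 25)]
cds = splice(genomic, exons)
print(cds, translate(cds))
```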
Bioinformatics Databases • Bioinformatics databases contain information on various kinds of biological data. • To facilitate finding information, there are a number of specific search engines: • NCBI has Entrez • EMBL has SRS • Consider the following query: • What are the DNA and amino acid sequences of the human gene BTEB? • More detail on the terms can be found by looking at a sample record: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord
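The same BTEB query can be sent to Entrez through NCBI's E-utilities. This sketch only builds an `esearch.fcgi` URL and sends no request; it assumes the `[Gene Name]` and `[Organism]` field tags and JSON output (`retmode=json`). The helper name `esearch_url` is hypothetical.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an E-utilities esearch URL for an Entrez database (no request is sent)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


query = "BTEB[Gene Name] AND Homo sapiens[Organism]"
nucleotide_query = esearch_url("nucleotide", query)   # DNA/mRNA records
protein_query = esearch_url("protein", query)         # amino acid sequence records
print(nucleotide_query)
```

Searching `nucleotide` and `protein` separately mirrors the two halves of the question: the DNA sequence and the amino acid sequence.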
Coding section of gene • The exon/intron structure is also available in graphic form.
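The graphic view is drawn from the exon coordinates stored in the annotation. As a rough text-only analogue, the hypothetical helper below marks exons with `#` and introns with `-`, scaling the gene to a fixed width. It assumes 1-based inclusive coordinates and uses a toy 25 bp gene with exons at 1-6 and 17-25, not real BTEB data.

```python
from typing import List, Tuple


def draw_gene(length: int, exons: List[Tuple[int, int]], width: int = 50) -> str:
    """Render an exon/intron map: '#' = exon, '-' = intron (hypothetical helper)."""
    scale = width / length                 # characters per base
    row = ["-"] * width
    for start, end in exons:
        lo = int((start - 1) * scale)
        hi = max(lo + 1, int(end * scale)) # every exon gets at least one character
        for i in range(lo, min(hi, width)):
            row[i] = "#"
    return "".join(row)


toy_exons = [(1, 6), (17, 25)]
print(draw_gene(25, toy_exons, width=25))
print(draw_gene(25, toy_exons, width=50))
```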
Other databases • The nucleotide (GenBank and EMBL) and protein (UniProt) databases contain the “raw data” and are referred to as primary databases. • More specific databases derive their data from these and are referred to as secondary databases; examples include protein family and sequence similarity databases such as PROSITE and PRINTS. • Information about specific organisms such as E. coli can be found using the Genomes Online Database (GOLD).
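PROSITE describes protein families and sites with short patterns such as `N-{P}-[ST]-{P}` (an N-glycosylation site). In this syntax `x` is any residue, `[..]` lists allowed residues, `{..}` lists forbidden ones, `(n)` or `(n,m)` gives a repeat count, `-` separates positions, and `<` or `>` anchor the pattern to the N- or C-terminus. The sketch below translates these simple cases into a Python regular expression; it does not handle every PROSITE feature (for example `>` inside brackets), the helper name is hypothetical, and the zinc-finger-style pattern and toy protein are illustrative only.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):            # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):              # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split e.g. "x(2,4)" into the residue part "x" and the counts 2 and 4
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            core = "."                                 # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"             # any residue except these
        # "[...]" classes and single residues are already valid regex syntax
        if high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(prefix + core + suffix)
    return "".join(parts)


glyco = prosite_to_regex("N-{P}-[ST]-{P}")
zinc_finger = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
toy_protein = "MKCAKCGKSFSRSDELTRHIRTHTGE"   # hypothetical sequence
match = re.search(zinc_finger, toy_protein)
print(zinc_finger, match.group() if match else None)
```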
Other databases • Databases for specific types of sequences, such as those associated with promoters and other regulatory elements. • Structural databases such as the Protein Data Bank (PDB). • Online Mendelian Inheritance in Man (OMIM), which contains information on human genes and genetic disorders.
Bioinformatics Search Engines • The Entrez (NCBI) search engine retrieves information from NCBI databases and can be used to obtain other information, including publications (PubMed), 3D protein structures and Online Mendelian Inheritance in Man (OMIM)…. A tutorial can be found at: • Entrez: Making use of its power: • EMBL uses the ExPASy site, which utilises the open-source application Sequence Retrieval System (SRS); a tutorial can be found at: • SRS tutorial: quick tour
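A typical Entrez workflow has two steps: a search returns record identifiers, and those identifiers are then used to fetch the records. The sketch below assumes the JSON layout returned by E-utilities `esearch.fcgi` with `retmode=json` (identifiers under `esearchresult`, key `idlist`) and builds an `efetch.fcgi` URL requesting FASTA text. The sample response and its identifiers are made up, and the helper names are hypothetical; no request is sent.

```python
import json
from typing import List
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def ids_from_esearch(json_text: str) -> List[str]:
    """Return the record identifiers from an esearch JSON response."""
    return json.loads(json_text)["esearchresult"]["idlist"]


def efetch_url(db: str, ids: List[str], rettype: str = "fasta") -> str:
    """Build an efetch URL; 'fasta' returns sequences, 'gb' a GenBank nucleotide record."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)


# Made-up response; a real one contains more fields than shown here
sample_response = '{"esearchresult": {"count": "2", "idlist": ["111111", "222222"]}}'
ids = ids_from_esearch(sample_response)
fetch = efetch_url("protein", ids)
print(fetch)
```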
Other important information sources • PubMed: literature research: journal articles/conference proceedings/books etc. • Search under many fields: keyword, author… • Returns: journal articles/abstracts • Two types: general/review. • NCBI account: set up an NCBI account to manage previous searches… • A PubMed search for BTEB can be found at: • http://www.ncbi.nlm.nih.gov/pubmed?term=BTEB&cmd=DetailsSearch