Flowchart of Sequence Data Management: Primary & Secondary Databases in Bioinformatics

This lecture traces the flow of sequence data from laboratories and the literature into primary databases and on to secondary databases. It stresses that primary databases such as GenBank, EMBL, DDBJ, and SwissProt are only as reliable as the data submitted to them, which depends on the methods used to produce the sequences and is subject to sequencing errors. It also introduces secondary databases such as RefSeq and the Conserved Domain Database (CDD) within NCBI, and outlines the evolution of protein sequence databases, with SWISS-PROT and TrEMBL as examples.

Presentation Transcript


  1. Bioinformatics, Ayesha M. Khan, 22 Feb 2012, Lec-4

  2. Flowchart of sequence data from labs and literature to primary sequence databases and subsequent secondary databases
     • Sources: sequencing centers, literature, researchers
     • Primary sequence databases: nucleic acid (e.g. GenBank, EMBL, DDBJ) and amino acid (e.g. SwissProt and PIR)
     • Secondary sequence databases: protein domains & families, metabolic pathways (e.g. RefSeq and the Conserved Domain Database (CDD) within NCBI)

  3. Always remember that:
     • The data in a primary database are only as reliable as the data submitted.
     • Reliability depends primarily on the methods used to produce the data.
     • Regardless of who obtains the sequence data, nucleic acid and amino acid sequencing results are subject to errors.
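The warning that sequencing results are subject to errors matters because an error in a nucleotide record carries through to any protein sequence translated from it. The sketch below is a minimal illustration, assuming the standard genetic code (NCBI translation table 1); the 18-base coding sequence and the `translate` helper are hypothetical. It compares the error-free translation with a one-base substitution and a one-base insertion.

```python
# Minimal sketch: how a single sequencing error changes a translated protein.
# Assumes the standard genetic code; the sequence below is hypothetical.
from itertools import product

BASES = "TCAG"
# Amino acids for all 64 codons in the order TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate codon by codon from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):  # only complete codons are read
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


reference = "ATGGCCAAGTTTGGATAA"                      # hypothetical error-free CDS
substitution = reference[:11] + "A" + reference[12:]  # T -> A at base 12
insertion = reference[:4] + "A" + reference[4:]       # extra A after base 4

print(translate(reference))     # MAKFG
print(translate(substitution))  # MAKLG  (one residue changed: F -> L)
print(translate(insertion))     # MDQVWI (frameshift: all downstream codons change)
```

A substitution alters a single residue, whereas an insertion shifts the reading frame and changes every downstream residue; here the original stop codon is also missed.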

  4. Protein sequence databases
     • The protein sequence database was developed at the National Biomedical Research Foundation (NBRF) in the early 1960s by Margaret Dayhoff to investigate evolutionary relationships among proteins.
     • From 1988 onwards, it was maintained collectively by the Protein Information Resource (PIR) at NBRF, the International Protein Information Database of Japan (JIPID), and the Martinsried Institute for Protein Sequences (MIPS).

  5. Examples of molecular sequence types in NCBI records

  9. Protein sequence databases
     • SWISS-PROT: started in 1986 at the University of Geneva and EMBL; now maintained by the Swiss Institute of Bioinformatics (SIB) and EBI/EMBL.
     • TrEMBL: started in 1996; follows the SWISS-PROT format and contains translations of the coding sequences in EMBL. It also includes synthetic sequences, short amino acid fragments, and codon translations that do not encode real proteins.
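Because TrEMBL follows the SWISS-PROT format, one reader can handle entries from both. The sketch below assumes a simplified version of that line-oriented flat-file layout: a two-letter line code in columns 1-2, data from column 6, sequence lines after the `SQ` line, and `//` closing the entry. The entry is hypothetical and abbreviated (real entries carry many more line types, and a real `SQ` line also gives the molecular weight and a checksum); `parse_flat_entry` is an illustrative helper, not part of any library.

```python
# Minimal sketch of reading a SWISS-PROT-style flat-file entry.
# The entry below is hypothetical and abbreviated for illustration.
ENTRY = """\
ID   EXAMPLE_ENTRY              20 AA.
AC   X00001;
DE   Hypothetical example protein.
SQ   SEQUENCE   20 AA;
     MKTAYIAKQR QISFVKSHFS
//
"""


def parse_flat_entry(text: str) -> tuple[dict[str, list[str]], str]:
    """Return annotation lines grouped by line code, plus the joined sequence."""
    fields: dict[str, list[str]] = {}
    residues: list[str] = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):        # end-of-entry terminator
            break
        if in_sequence:                  # residues in space-separated blocks
            residues.append(line.replace(" ", ""))
            continue
        code, data = line[:2], line[5:]  # line code, then data from column 6
        fields.setdefault(code, []).append(data)
        if code == "SQ":                 # sequence data follows the SQ header line
            in_sequence = True
    return fields, "".join(residues)


fields, sequence = parse_flat_entry(ENTRY)
print(fields["AC"])  # ['X00001;']
print(sequence)      # MKTAYIAKQRQISFVKSHFS
# Sanity check: the length declared on the SQ line matches the parsed sequence
print(int(fields["SQ"][0].split()[1]) == len(sequence))  # True
```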

  10. Composite protein sequence databases
      • A composite database merges a variety of different primary sources.
      • It obviates the need to query multiple resources separately.
      • Depending on its redundancy criteria, it may eliminate only identical sequence copies, or both identical and highly similar sequences.
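The two levels of redundancy removal described for composite databases can be sketched in a few lines. This is a minimal illustration under stated assumptions: the source databases (`db_A`, `db_B`) and their records are hypothetical, and `difflib.SequenceMatcher.ratio()` from the Python standard library stands in for an alignment-based percent identity, which a production pipeline would more likely use. With no threshold, only identical copies are collapsed; with a threshold, highly similar sequences are also merged into the first representative they resemble (a simple greedy scheme).

```python
# Minimal sketch of building a composite, non-redundant sequence set.
# Source names, IDs and sequences are hypothetical; SequenceMatcher.ratio()
# is only a rough stand-in for alignment-based sequence identity.
from difflib import SequenceMatcher

SOURCES = {
    "db_A": [("A1", "MKTAYIAKQR"), ("A2", "MSEQNNTEMT")],
    "db_B": [("B1", "MKTAYIAKQR"), ("B2", "MKTAYIAKQK")],
}


def merge_sources(sources, similarity_threshold=None):
    """Map each kept sequence to the source IDs it represents."""
    # Step 1: pool all records and collapse identical sequences.
    by_sequence: dict[str, list[str]] = {}
    for db_name, records in sources.items():
        for record_id, seq in records:
            by_sequence.setdefault(seq, []).append(f"{db_name}:{record_id}")
    if similarity_threshold is None:
        return by_sequence  # identical copies removed only

    # Step 2: greedy clustering; each sequence joins the first representative
    # it is at least `similarity_threshold` similar to, else becomes one itself.
    representatives: dict[str, list[str]] = {}
    for seq, ids in by_sequence.items():
        for rep in representatives:
            if SequenceMatcher(None, rep, seq).ratio() >= similarity_threshold:
                representatives[rep].extend(ids)
                break
        else:
            representatives[seq] = list(ids)
    return representatives


print(len(merge_sources(SOURCES)))        # 3 (A1 and B1 are identical)
print(len(merge_sources(SOURCES, 0.85)))  # 2 (B2 differs from A1 by one residue)
```

Keeping every source ID on the surviving entry preserves the link back to the primary databases, so merging does not lose provenance.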
