Protein sequence retrieval AND other database information

wyanet

Presentation Transcript


  1. Protein sequence retrieval AND other database information

  2. Databases • Protein sequence (primary) • SWISS-PROT • PIR-International • Protein sequence (composite) • OWL • NRDB

  3. Protein sequence (secondary) • PROSITE • PRINTS • Pfam
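PROSITE stores many of its motifs as patterns: elements joined by "-", x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, and < or > for the N- or C-terminus. As a minimal sketch (assuming only that syntax, and handling < and > only at the ends of the whole pattern, not inside brackets), a pattern such as the N-glycosylation site PS00001, N-{P}-[ST]-{P}, can be translated into a Python regular expression and scanned against a sequence. The function name prosite_to_regex is hypothetical.

```python
import re

# One PROSITE element: a residue, x, [allowed] or {forbidden}, plus optional (n) / (n,m)
ELEMENT_RE = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+(?:,\d+)?)\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")  # PROSITE patterns end with a period
    prefix = suffix = ""
    if pattern.startswith("<"):  # anchored at the N-terminus
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):  # anchored at the C-terminus
        suffix, pattern = "$", pattern[:-1]
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT_RE.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, count = m.groups()
        if residue == "x":
            regex = "."  # any amino acid
        elif residue.startswith("{"):
            regex = "[^" + residue[1:-1] + "]"  # any residue except those listed
        else:
            regex = residue  # a single letter or [..] class is already valid regex
        if count:
            regex += "{" + count + "}"  # x(2) -> .{2}, x(2,4) -> .{2,4}
        parts.append(regex)
    return prefix + "".join(parts) + suffix


# Usage: find candidate N-glycosylation sites in a toy sequence
site = re.compile(prosite_to_regex("N-{P}-[ST]-{P}."))
print([m.start() for m in site.finditer("MKNGSAQNPTL")])
```

Note that re.finditer reports non-overlapping matches only; overlapping motif hits would need a lookahead-based search.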

  4. Macromolecular structures • Protein Data Bank (PDB) • Nucleic Acids Database (NDB) • HIV Protease Database • ReLiBase • PDBsum • CATH • SCOP • FSSP
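Entries in the Protein Data Bank are distributed (among other formats) as fixed-column PDB files, where each ATOM or HETATM record holds one atom's name, residue, chain and x/y/z coordinates in set column ranges. The sketch below reads those columns with plain string slicing, assuming the standard PDB coordinate-record layout; the Atom type and parse_atoms function are hypothetical names, and fields such as occupancy and B-factor are skipped for brevity.

```python
from typing import Iterable, List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines: Iterable[str]) -> List[Atom]:
    """Extract ATOM/HETATM records from PDB-format lines (0-based slices of the
    1-based PDB column ranges)."""
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip HEADER, REMARK, TER, etc.
        atoms.append(Atom(
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


# Usage with one illustrative ATOM record
record = "ATOM      1  N   MET A   1      38.198  19.582  28.998  1.00 66.63           N"
print(parse_atoms([record])[0])
```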

  5. Nucleotide sequences • GenBank • EMBL • DDBJ • Genome sequences • Entrez genomes • GeneCensus • COGs

  6. Integrated databases • InterPro • Sequence retrieval system (SRS) • Entrez
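Entrez records, including protein sequences, can also be retrieved programmatically through NCBI's E-utilities. A minimal sketch, assuming the EFetch endpoint with its db, id, rettype and retmode parameters, only builds the request URL; the helper name efetch_url is hypothetical, and NCBI's usage guidelines (tool/email parameters, request-rate limits) are left out for brevity.

```python
from typing import List
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_url(db: str, ids: List[str], rettype: str = "fasta", retmode: str = "text") -> str:
    """Build an EFetch URL that returns the given Entrez records (FASTA text by default)."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


# Usage (the accession is a placeholder; fetching needs network access):
url = efetch_url("protein", ["<accession>"])
# from urllib.request import urlopen
# fasta_text = urlopen(url).read().decode()
```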

  7. Protein Sequence Alignment and Database Searching • Alignment of Two Sequences (Pairwise Alignment) • The Scoring Schemes or Weight Matrices • Techniques of Alignment • DOTPLOT • Multiple Sequence Alignment (Alignment of > 2 Sequences) • Extending Dynamic Programming to More Sequences • Progressive Alignment (Tree or Hierarchical Methods) • Iterative Techniques • Stochastic Algorithms (SA, GA, HMM) • Non-stochastic Algorithms • Database Scanning • FASTA, BLAST, PSI-BLAST, ISS • Alignment of Whole Genomes • MUMmer (Maximal Unique Matches)
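A dot plot puts one sequence along each axis and marks every cell where the two sequences match, so similar regions show up as diagonal runs. A minimal sketch follows; the window and threshold parameters are an assumed (but common) noise filter, where a cell is marked only if at least `threshold` of the `window` residues starting there are identical, and the function names are hypothetical.

```python
from typing import List


def dot_plot(seq_a: str, seq_b: str, window: int = 1, threshold: int = 1) -> List[List[bool]]:
    """Cell (i, j) is True when seq_a[i:i+window] and seq_b[j:j+window]
    share at least `threshold` identical positions."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append(matches >= threshold)
        grid.append(row)
    return grid


def render(grid: List[List[bool]]) -> str:
    """Text rendering: '*' for a dot, '.' otherwise (seq_a down, seq_b across)."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)


# Usage: a window of 3 with full stringency highlights the shared diagonal
print(render(dot_plot("ACGTACGT", "TTACGTAC", window=3, threshold=3)))
```

With window=1 every single-residue identity is plotted, which for DNA's four-letter alphabet fills the plot with random dots; larger windows trade sensitivity for a cleaner diagonal.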

  8. An Overview of BLAST • Input query: amino acid sequence • blastp: compares against a protein sequence database • tblastn: compares against a translated nucleotide sequence database • Input query: DNA sequence • blastn: compares against a nucleotide sequence database • blastx: compares the translated query against a protein sequence database • tblastx: compares the translated query against a translated nucleotide sequence database
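The choice of BLAST program in the overview is a simple function of the query type and the database type. A minimal sketch of that decision table, with a hypothetical function name and a flag that picks tblastx over blastn when both nucleotide sequences should be compared at the protein level:

```python
def choose_blast_program(query: str, database: str, translate_both: bool = False) -> str:
    """Return the BLAST program for a query type and a database type,
    each given as "protein" or "nucleotide"."""
    if query == "protein" and database == "protein":
        return "blastp"   # protein query vs protein database
    if query == "protein" and database == "nucleotide":
        return "tblastn"  # protein query vs database translated in all six frames
    if query == "nucleotide" and database == "protein":
        return "blastx"   # query translated in all six frames vs protein database
    if query == "nucleotide" and database == "nucleotide":
        # tblastx translates both sides; blastn compares the DNA directly
        return "tblastx" if translate_both else "blastn"
    raise ValueError(f"unsupported combination: {query!r} vs {database!r}")


# Usage: a DNA query searched against a protein database
print(choose_blast_program("nucleotide", "protein"))
```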

  9. Comparison of Whole Genomes • MUMmer (Salzberg group, 1999, 2002) • Pairwise sequence alignment of genomes • Assumes that the sequences are closely related • Detects repeats, inverted repeats, and SNPs • Detects inserted/deleted domains • Identifies exact matches • How it works • Identify the maximal unique matches (MUMs) in the two genomes; a MUM is a subsequence that occurs exactly once in each genome and is not contained in any longer such match • Because the two genomes are similar, long MUMs are expected • Sort the MUMs and extract the longest set of matches that occur in the same order in both genomes (ordered MUMs) • A suffix tree is used to identify the MUMs • Close the gaps between ordered MUMs (SNPs, large inserts) • Align the regions between MUMs with Smith-Waterman
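The first two steps of the MUMmer procedure, finding MUMs and keeping the ones that appear in the same order in both genomes, can be illustrated on toy strings. This is a simplified sketch, not MUMmer itself: it swaps the suffix tree for a naive scan over all position pairs (quadratic, so only usable on tiny inputs), and it orders MUMs with a length-weighted longest-increasing-subsequence DP. All function names and the min_len parameter are hypothetical.

```python
from typing import List, Tuple

Mum = Tuple[int, int, int]  # (position in genome a, position in genome b, length)


def count_occurrences(text: str, pattern: str) -> int:
    """Count possibly overlapping occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a: str, b: str, min_len: int = 3) -> List[Mum]:
    """Naive MUM search: maximal exact matches that occur exactly once in each genome."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Left-maximal: the match must not extend one base to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            length = 0
            # Extend right until a mismatch or the end of either sequence.
            while i + length < len(a) and j + length < len(b) and a[i + length] == b[j + length]:
                length += 1
            if length < min_len:
                continue
            match = a[i:i + length]
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, length))
    return mums


def ordered_mums(mums: List[Mum]) -> List[Mum]:
    """Heaviest chain of non-overlapping MUMs in the same order in both genomes."""
    mums = sorted(mums)  # sort by position in genome a
    if not mums:
        return []
    best = [length for _, _, length in mums]  # best chain weight ending at each MUM
    prev = [-1] * len(mums)
    for k, (i, j, length) in enumerate(mums):
        for p in range(k):
            pi, pj, plen = mums[p]
            # MUM p may precede MUM k only if it ends before k starts in both genomes
            if pi + plen <= i and pj + plen <= j and best[p] + length > best[k]:
                best[k] = best[p] + length
                prev[k] = p
    k = max(range(len(mums)), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(mums[k])
        k = prev[k]
    return chain[::-1]


# Usage: two toy "genomes" sharing one unique block
print(find_mums("TTTTGATTACAGGGG", "CCCCGATTACAAAAA"))
```

In MUMmer the gaps left between consecutive ordered MUMs are then classified (SNPs, inserts) and aligned with Smith-Waterman, which this sketch does not attempt.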

  11. Secondary protein databases • SWISS-PROT (1986) • Best annotated, least redundant • PIR (Protein Information Resource) • More automated annotation • Collaborations with MIPS and JIPID • UniProt (2003) • UniProt (Universal Protein Resource) is a central repository of protein sequence and function data, created by joining the information contained in Swiss-Prot, TrEMBL, and PIR.
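Sequences retrieved from UniProtKB in FASTA format carry a structured header of the form >db|Accession|EntryName ProteinName OS=... OX=... GN=... PE=... SV=..., where db is sp for the reviewed Swiss-Prot section and tr for the unreviewed TrEMBL section. A minimal parsing sketch, assuming that layout (GN may be absent); parse_uniprot_header is a hypothetical name and the example header values are illustrative.

```python
import re

# >db|Accession|EntryName ProteinName OS=... OX=... [GN=...] PE=... SV=...
HEADER_RE = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s+(.*)$")
TAG_RE = re.compile(r"\b(OS|OX|GN|PE|SV)=")


def parse_uniprot_header(line: str) -> dict:
    """Split a UniProtKB FASTA header into its named fields."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a UniProtKB FASTA header: {line!r}")
    db, accession, entry_name, rest = m.groups()
    tags = list(TAG_RE.finditer(rest))
    # Everything before the first KEY= tag is the protein name.
    protein_name = rest[: tags[0].start()].strip() if tags else rest.strip()
    fields = {}
    for k, tag in enumerate(tags):
        end = tags[k + 1].start() if k + 1 < len(tags) else len(rest)
        fields[tag.group(1)] = rest[tag.end():end].strip()
    return {
        "reviewed": db == "sp",  # sp = Swiss-Prot (reviewed), tr = TrEMBL (unreviewed)
        "accession": accession,
        "entry_name": entry_name,
        "protein_name": protein_name,
        **fields,
    }


# Usage with an illustrative Swiss-Prot header
print(parse_uniprot_header(
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
))
```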

  12. Databases • Primary (archival): GenBank/EMBL/DDBJ, UniProt, PDB, Medline (PubMed), BIND • Secondary (curated): RefSeq, Taxon, UniProt, OMIM, SGD

  13. Organismal divisions (code: description; databases that use it)
     • BCT: Bacterial; DDBJ, GenBank
     • FUN: Fungal; EMBL
     • HUM: Homo sapiens; DDBJ, EMBL
     • INV: Invertebrate; all
     • MAM: Other mammalian; all
     • ORG: Organelle; EMBL
     • PHG: Phage; all
     • PLN: Plant; all
     • PRI: Primate (also see HUM); all (not the same data in all)
     • PRO: Prokaryotic; EMBL
     • ROD: Rodent; all
     • SYN: Synthetic and chimeric; all
     • VRL: Viral; all
     • VRT: Other vertebrate; all

  14. Functional divisions
     • PAT: Patent
     • EST: Expressed Sequence Tags
     • STS: Sequence Tagged Site
     • GSS: Genome Survey Sequence
     • HTG: High Throughput Genome (unfinished)
     • HTC: High Throughput cDNA (unfinished)
     • CON: Contig assembly instructions
     Organismal divisions: BCT, FUN, INV, MAM, PHG, PLN, PRI, ROD, SYN, VRL, VRT
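In a GenBank flat file the division code appears on the LOCUS line, immediately before the date. A minimal sketch, assuming that layout, pulls the code out and looks it up in a table built from the organismal and functional divisions listed above; locus_division is a hypothetical helper and the second LOCUS line in the usage is made up.

```python
from typing import Tuple

DIVISIONS = {
    # Organismal divisions
    "BCT": "Bacterial", "FUN": "Fungal", "HUM": "Homo sapiens", "INV": "Invertebrate",
    "MAM": "Other mammalian", "ORG": "Organelle", "PHG": "Phage", "PLN": "Plant",
    "PRI": "Primate", "PRO": "Prokaryotic", "ROD": "Rodent",
    "SYN": "Synthetic and chimeric", "VRL": "Viral", "VRT": "Other vertebrate",
    # Functional divisions
    "PAT": "Patent", "EST": "Expressed Sequence Tags", "STS": "Sequence Tagged Site",
    "GSS": "Genome Survey Sequence", "HTG": "High Throughput Genome (unfinished)",
    "HTC": "High Throughput cDNA (unfinished)", "CON": "Contig assembly instructions",
}


def locus_division(locus_line: str) -> Tuple[str, str]:
    """Return (division code, description) from a GenBank LOCUS line."""
    if not locus_line.startswith("LOCUS"):
        raise ValueError("expected a GenBank LOCUS line")
    code = locus_line.split()[-2]  # the division code precedes the date
    return code, DIVISIONS.get(code, "unknown division")


# Usage
print(locus_division("LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999"))
print(locus_division("LOCUS       TEST0001      100 bp    DNA     circular BCT 01-JAN-2000"))
```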

  15. EST: Expressed Sequence Tag Expressed Sequence Tags are short (300-500 bp) single-pass reads from mRNA (cDNA) that are produced in large numbers. They represent a snapshot of what is expressed in a given tissue at a given developmental stage. Also see: http://www.ncbi.nlm.nih.gov/dbEST/ http://www.ncbi.nlm.nih.gov/UniGene/
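Because an EST library samples the mRNA present in one tissue, counting how many ESTs fall in each gene gives a rough picture of that "snapshot". A minimal sketch, assuming a hypothetical input where each EST has already been assigned to a tissue library and a gene cluster; the tissue and gene names in the usage are placeholders, not real data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def expression_snapshot(est_records: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Tally ESTs per gene within each tissue library.

    est_records: (tissue, gene) pairs, a hypothetical pre-clustered input format."""
    profile: Dict[str, Counter] = defaultdict(Counter)
    for tissue, gene in est_records:
        profile[tissue][gene] += 1
    return profile


def relative_abundance(profile: Dict[str, Counter], tissue: str) -> Dict[str, float]:
    """Fraction of a tissue's ESTs that fall in each gene."""
    counts = profile.get(tissue, Counter())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n / total for gene, n in counts.items()}


# Usage with placeholder records
records = [("liver", "gene_A"), ("liver", "gene_A"), ("liver", "gene_B"), ("brain", "gene_C")]
profile = expression_snapshot(records)
print(relative_abundance(profile, "liver"))
```

Raw EST counts are only a coarse proxy: library normalization and sequencing depth differ between libraries, so comparisons across tissues need care.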

  16. STS: Sequence Tagged Site Sequence Tagged Sites (STSs) are operationally unique sequences that identify the combination of primer pairs used in a PCR assay to generate a mapping reagent that maps to a single position within the genome. Also see: http://www.ncbi.nlm.nih.gov/dbSTS/ http://www.ncbi.nlm.nih.gov/genemap/
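"Operationally unique" means that the primer pair yields exactly one PCR product from the genome. A simplified in-silico check in the spirit of an electronic PCR search is sketched below, under stated assumptions: exact primer matches only (real PCR tolerates mismatches), a product is a primer site followed within max_size bases by the reverse complement of the other primer, and both primer orientations are tried. All names and the toy sequences are hypothetical.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> List[int]:
    """Start positions of all (possibly overlapping) exact matches."""
    positions, start = [], text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def virtual_pcr(genome: str, forward: str, reverse: str, max_size: int = 1000) -> List[Tuple[int, int]]:
    """Return (start, end) of every predicted product on the given strand."""
    products = set()
    for left, right in ((forward, reverse), (reverse, forward)):
        right_site = reverse_complement(right)  # where the other primer anneals
        for start in find_all(genome, left):
            for site in find_all(genome, right_site):
                end = site + len(right_site)
                if site >= start + len(left) and end - start <= max_size:
                    products.add((start, end))
    return sorted(products)


def is_operationally_unique(genome: str, forward: str, reverse: str, max_size: int = 1000) -> bool:
    return len(virtual_pcr(genome, forward, reverse, max_size)) == 1


# Usage on a toy sequence with one primer-binding configuration
toy = "AAAA" + "GATTC" + "C" * 10 + "GTACG" + "TTTT"
print(virtual_pcr(toy, "GATTC", "CGTAC"))
```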

  17. GSS: Genome Survey Sequences Genome Survey Sequences are similar in nature to ESTs, except that their sequences are genomic in origin rather than cDNA (mRNA). The GSS division contains: • random "single pass read" genome survey sequences • single-pass reads from cosmid/BAC/YAC ends (these could be chromosome-specific, but need not be) • exon-trapped genomic sequences • Alu PCR sequences Also see: http://www.ncbi.nlm.nih.gov/dbGSS/

  18. HTG: High Throughput Genome High Throughput Genome Sequences (HTGS) are records from unfinished genome sequencing efforts. Unfinished records have gaps in the nucleotide sequence, low accuracy, and no annotation. Also see: http://www.ncbi.nlm.nih.gov/HTGS/ Ouellette and Boguski (1997) Genome Res. 7:952-955
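Working with an unfinished record usually starts by locating its gaps. A minimal sketch, assuming (as is common in unfinished sequence records) that gaps are written as runs of N; the function name and the min_gap parameter are hypothetical, and min_gap lets short ambiguous stretches stay inside a contig.

```python
import re
from typing import List, Tuple


def split_at_gaps(sequence: str, min_gap: int = 1) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Return (contigs, gaps), where gaps are (start, end) of N-runs of at least min_gap."""
    gaps = [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)
            if m.end() - m.start() >= min_gap]
    contigs, prev = [], 0
    for start, end in gaps:
        if start > prev:
            contigs.append(sequence[prev:start])  # sequenced piece before this gap
        prev = end
    if prev < len(sequence):
        contigs.append(sequence[prev:])  # trailing piece after the last gap
    return contigs, gaps


# Usage: only N-runs of 3 or more are treated as gaps
print(split_at_gaps("ACGTNNNNNGGCCNNTT", min_gap=3))
```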

  19. Which submission tool? mRNA Genomic STS/GSS EST Other Other HTGS dbEST Simple • Better control of annotations • pop/phylo • segmented sets Simple dbSTS/dbGSS Customized software or tbl2asn E-mail or FTP WWW BankIt Sequin or tbl2asn WWW BankIt E-mail or FTP E-mail or FTP E-mail
