Computational Structural Biology

Presentation Transcript


  1. Computational Structural Biology • T.A.: Naama Amir • E-mail: naamaamir@mail.tau.ac.il • Schreiber 10 • Homework (probably 4 assignments) • Office hours – very flexible. • (Available after the tutorial) • Most important: Ask a lot of questions!

  2. Know your protein Exercise 1: Databases presentation

  3. Presented Databases: • UniProt – main sequence database • SwissProt • TrEMBL • NCBI – lots of databases, including sequences and structures • RCSB – the Protein Data Bank – all deposited structures • PDBsum – combines structural & sequence data

  4. UniProt The Universal Protein Resource • The world's most comprehensive catalog of information on proteins • Sequence, function & more… • Comprised mainly of two databases: • SwissProt – 412525 last year, 538010 protein entries now – high-quality annotation, non-redundant & cross-referenced to many other databases; manually annotated and reviewed. • TrEMBL – 17651716 last year, 23994583 protein entries now – computer translation of the genetic information from the EMBL Nucleotide Sequence Database; many proteins are poorly annotated since only automatic annotation is generated

  5. UniProt • Annotation description includes: • Function(s) of the protein; • Post-translational modification(s) such as phosphorylation, acetylation and GPI-anchor; • Domains and sites, for example, calcium-binding regions, ATP-binding sites, zinc fingers, homeoboxes; • Secondary structure, e.g. alpha helix, beta sheet; • Quaternary structure, e.g. homodimer, heterotrimer, etc.; • Similarities to other proteins; • Diseases associated with any number of deficiencies in the protein; • Sequence annotation such as sequence conflicts, variants, etc.

  6. UniProt • Connected to many other databases (e.g. EC, PDBsum, PDB (to be discussed…)) • Each sequence has a unique 6-character accession made of letters and digits, e.g. P05102 (newer accessions may have 10 characters) • Entries in SwissProt also have IDs, which are usually readable (e.g. CADH1_HUMAN for a human cadherin) • Download the sequence in FASTA format
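
The difference between an accession and an ID can be checked mechanically. The sketch below uses the accession pattern given in the UniProt help pages; the entry-name pattern and the function name `classify_uniprot_identifier` are simplifying assumptions made for this illustration, not part of the course material.

```python
import re

# Accession pattern as given in the UniProt help pages (6 or 10 characters).
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

# Assumed simplification: an entry name ("ID") is a protein mnemonic,
# an underscore, and a species code of up to 5 characters.
ENTRY_NAME_RE = re.compile(r"^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$")


def classify_uniprot_identifier(text):
    """Return 'accession', 'entry name' or 'unknown' for a query string."""
    token = text.strip().upper()
    if ACCESSION_RE.match(token):
        return "accession"
    if ENTRY_NAME_RE.match(token):
        return "entry name"
    return "unknown"


if __name__ == "__main__":
    for query in ("P05102", "MTH1_HAEPH", "CADH1_HUMAN", "10mh"):
        print(query, "->", classify_uniprot_identifier(query))
```

A check like this is handy before querying NCBI, which expects the UniProt accession rather than the ID.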

  7. UniProt • FASTA format for protein sequences: >P05102|MTH1_HAEPH Modification methylase HhaI MIEIKDKQLTGLRFIDLFAGLGGFRLALESCGAECVYSNEWDKYAQEVYEMNFGEK EGDITQVNEKTIPDHDILCAGFPCQAFSISGKQKGFEDSRGTLFFDIARIVREKKPK VVFMENVKNFASHDNGNTLEVVKNTMNELDYSFHAKVLNALDYGIPQKRERIYMIC RNDLNIQNFQFPKPFELNTFVKDLLLPDSEVEHLVIDRKDLVMTNQEIEQTTPKTV LGIVGKGGQGERIYSTRGIAITLSAYGGGIFAKTGGYLVNGKTRKLHPRECARVMG PDSYKVHPSTSQAYK QFGNSVVINVLQYIAYNIGSSLNFKPY
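
A FASTA file is a header line starting with ">" followed by one or more sequence lines. As a minimal sketch, a parser for this layout could look as follows; `parse_fasta` is a hypothetical helper written for this illustration, and the example record is truncated to the first two sequence lines of the entry shown on the slide. Headers downloaded from UniProt today may be laid out differently from the slide, so the header is returned unsplit.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated; stray spaces are dropped.
            chunks.append(line.replace(" ", ""))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Truncated example: only the first two sequence lines of the slide's entry.
EXAMPLE = """>P05102|MTH1_HAEPH Modification methylase HhaI
MIEIKDKQLTGLRFIDLFAGLGGFRLALESCGAECVYSNEWDKYAQEVYEMNFGEK
EGDITQVNEKTIPDHDILCAGFPCQAFSISGKQKGFEDSRGTLFFDIARIVREKKPK
"""

if __name__ == "__main__":
    for header, sequence in parse_fasta(EXAMPLE):
        print(header)
        print(len(sequence), "residues (truncated example)")
```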

  8. UniProt http://www.uniprot.org/ Type the accession: P05102, or the ID: MTH1_HAEPH

  9. UniProt

  10. UniProt General data: name, origin, EC (enzymatic reaction)…

  11. UniProt Functional data, including the GO annotations Scroll down to find the sequence & download the FASTA

  12. UniProt Functional data, including the GO annotations Scroll down to find the sequence & download the FASTA

  13. UniProt Known sites, predicted/known secondary structures, Natural variation or mutagenesis

  14. UniProt The protein’s sequence in FASTA format Download Send to BLAST (will be discussed later on in the course)

  15. UniProt References for all info in the page- important to take a look…

  16. UniProt Connections to other databases • Other sequence databases, e.g. GenBank • Related structures in the PDB (if available) • Model structure in the ModBase database – automatically derived! • All sorts of domain/motif databases – the family related to the entry

  17. NCBI National Center for Biotechnology Information • Provides biomedical and genomic information. • Many sorts of databases, e.g. biomedical literature, sequences, and protein structures.

  18. NCBI • Public databases

  19. NCBI • Public databases • GenBank – the NIH genetic sequence database, a collection of all publicly available DNA sequences. • RefSeq – the Reference Sequence collection – comprehensive, integrated, non-redundant, well-annotated sequences: genomic DNA, transcripts, proteins. • ~6,413,124 protein entries • Protein – compiled from a variety of sources, including SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq.

  20. NCBI • http://www.ncbi.nlm.nih.gov/ e.g. UniProt accession The protein database

  21. NCBI

  22. NCBI • http://www.ncbi.nlm.nih.gov/ • GI – the unique ID given to each sequence • UniProt accession, not ID! • PDB entries

  23. RCSB – Protein Data Bank (pdb) • The main, comprehensive database for biological macromolecular structures • Each structure receives a PDB ID: a unique 4-character identifier (letters and digits, e.g. 10mh) • Search by author, PDB ID or any keyword. • Download structures
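
As a rough sketch of how a PDB ID can be recognised and turned into a download address: the pattern below assumes the classic 4-character form (a digit 1-9 followed by three letters or digits), and the URL layout `https://files.rcsb.org/download/<ID>.pdb` is the one used by the RCSB file server at the time of writing and may change. The helper names are hypothetical.

```python
import re

# Assumed classic PDB ID form: one digit (1-9) plus three alphanumerics.
PDB_ID_RE = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def is_pdb_id(text):
    """Return True if the text looks like a classic 4-character PDB ID."""
    return bool(PDB_ID_RE.match(text.strip()))


def pdb_download_url(pdb_id):
    """Build the download address of the PDB-format file for a PDB ID."""
    if not is_pdb_id(pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.strip().upper()}.pdb"


if __name__ == "__main__":
    print(pdb_download_url("10mh"))
```

The function only builds the address; fetching the file (for example with `urllib.request.urlopen`) is left out so that the example runs without a network connection.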

  24. RCSB – Protein Data Bank (pdb) • RCSB- Protein Databank http://www.rcsb.org PDB ID: 10mh

  25. RCSB – Protein Data Bank (pdb) • Protein Data Bank Download structure The paper describing the structure Display structure

  26. RCSB – Protein Data Bank (pdb) • Protein Data Bank Download structure The paper describing the structure Display structure

  27. RCSB – Protein Data Bank (pdb) • PDB files have a specific format: • HEADER – PDB code and deposition date • TITLE – the paper title • REMARK • JRNL – reference • HELIX, SHEET – secondary structure • ATOM – the actual protein/DNA/RNA chain • HETATM – additional atoms such as ligands, water, etc. • MODEL/ENDMDL • … http://www.wwpdb.org/documentation/format3.1-20080211.pdf

  28. RCSB – Protein Data Bank (pdb) PDB files have a specific format. Each ATOM/HETATM record lists, from left to right: atom number; atom name; residue or molecule name; chain (if it exists); residue number; coordinates X, Y, Z; occupancy; B-factor; element.
      ATOM      7  SD  MET A   1     -29.059  28.614  71.539  1.00 26.90           S
      ATOM      8  CE  MET A   1     -27.535  29.074  70.866  1.00 16.57           C
      ATOM      9  N   ILE A   2     -29.656  32.903  69.094  1.00 25.93           N
      ATOM     10  CA  ILE A   2     -30.077  33.171  67.730  1.00 25.49           C
      HETATM 3139  C6  SAH   328     -11.642  26.514  89.489  1.00 17.97           C
      HETATM 3140  N6  SAH   328     -10.474  26.661  90.103  1.00 14.50           N
      HETATM 3141  N1  SAH   328     -11.895  25.334  88.899  1.00 23.10           N
      HETATM 3142  C2  SAH   328     -13.079  25.090  88.350  1.00 16.93           C
      HETATM 3143  N3  SAH   328     -14.120  25.887  88.278  1.00 16.05           N
      HETATM 3144  C4  SAH   328     -13.832  27.092  88.861  1.00 14.31           C
      HETATM 3145  O   HOH   329     -29.525  42.890  90.934  1.00 24.84           O
      HETATM 3146  O   HOH   330     -28.213  42.867  93.588  1.00  8.11           O
      HETATM 3147  O   HOH   331     -24.619  35.287  96.173  1.00 17.96           O
      http://www.wwpdb.org/documentation/format33/sect9.html#ATOM
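
ATOM and HETATM records are fixed-width: each field sits in a set range of columns, so splitting on whitespace can fail (for example when the chain is missing). The following is a minimal sketch of a column-based reader, using the column ranges from the wwPDB format description linked above; `parse_atom_record` is a hypothetical helper, and the sample line is rebuilt from the values of atom 9 on the slide.

```python
def parse_atom_record(line):
    """Parse one ATOM/HETATM line into a dict, using fixed column ranges."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an ATOM/HETATM record: {line[:6]!r}")
    return {
        "record": record,                   # columns 1-6
        "serial": int(line[6:11]),          # columns 7-11, atom number
        "atom_name": line[12:16].strip(),   # columns 13-16
        "residue_name": line[17:20].strip(),  # columns 18-20
        "chain_id": line[21:22].strip(),    # column 22, may be blank
        "residue_number": int(line[22:26]),  # columns 23-26
        "x": float(line[30:38]),            # columns 31-38
        "y": float(line[38:46]),            # columns 39-46
        "z": float(line[46:54]),            # columns 47-54
        "occupancy": float(line[54:60]),    # columns 55-60
        "b_factor": float(line[60:66]),     # columns 61-66
        "element": line[76:78].strip(),     # columns 77-78
    }


# Sample line assembled field by field so that the column widths are explicit.
SAMPLE = (
    f"{'ATOM':<6}{9:>5} {' N':<4} {'ILE':>3} {'A'}{2:>4}    "
    f"{-29.656:8.3f}{32.903:8.3f}{69.094:8.3f}{1.00:6.2f}{25.93:6.2f}"
    f"{'':10}{'N':>2}"
)

if __name__ == "__main__":
    atom = parse_atom_record(SAMPLE)
    print(atom["residue_name"], atom["residue_number"], atom["atom_name"])
    print("B-factor:", atom["b_factor"])
```

For real work, an established parser such as the one in Biopython is usually a better choice than hand-written slicing.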

  29. RCSB – Protein Data Bank (pdb) Resolution: a measure of the quality of the underlying data, given in Å. High-resolution structures have low values. R-value: measures how well the atomic model agrees with the crystallographic data. Again, the lower the better. Typical values are about 0.20.
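
To make the R-value concrete: in its standard form it is the sum of the absolute differences between observed and calculated structure-factor amplitudes, divided by the sum of the observed amplitudes. The sketch below implements that ratio; the function name `r_value` and the amplitude values are invented for illustration, and the scaling between observed and calculated data that refinement programs apply is left out.

```python
def r_value(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over all reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


if __name__ == "__main__":
    # Made-up amplitudes for three reflections, for illustration only.
    observed = [100.0, 50.0, 50.0]
    calculated = [90.0, 60.0, 40.0]
    print(round(r_value(observed, calculated), 2))
```

A model that reproduced the observed amplitudes exactly would give 0, which is why lower values indicate a better fit.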

  30. RCSB – Protein Data Bank (pdb)

  31. PdbSum • A database providing an overview of all biological macromolecular structures • Connected to UniProt: find the sequence accession for a known PDB ID • Detailed description of many structure properties, e.g.: • EC number (Enzyme Commission number) • Chains & ligands and their interactions • Secondary structure • FASTA sequence of the structure… • …

  32. PdbSum http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/ PDB ID Free text Search by sequence

  33. PdbSum Useful tabs UniProt accession Chains & ligands

  34. PdbSum

  35. PdbSum Protein tab Secondary structure- from the PDB

  36. PdbSum Ligand tab The ligand’s structure

  37. Databases presentation • Summary • UniProt – search by UniProt accession or SwissProt ID. • NCBI – search by UniProt accession, SwissProt ID, GI (for NCBI) or free text. • RCSB – search by PDB ID (or by free text). • PDBsum – search by PDB ID, UniProt accession (or by free text…).

  38. Databases presentation • Buzzwords from this exercise • Protein – FASTA format, PDB file, resolution, ligand, chain, UniProt accession, SwissProt ID, GI number… • Databases – RCSB, PDB, PDBsum, UniProt, TrEMBL, SwissProt, NCBI

  39. Questions? GOOD LUCK!

  40. Swiss-Prot-Protein entries

  41. TrEMBL- Protein entries