An Overview of the RCSB Protein Data Bank

Presentation Transcript


  1. An Overview of the RCSB Protein Data Bank http://www.pdb.org/ • info@rcsb.org

  2. History of the PDB
     1970s • Community discussions about how to establish a PDB • Cold Spring Harbor meeting in protein crystallography • PDB established at Brookhaven (October 1971; 7 structures)
     1980s • Number of structures increases as technology improves • Community discussions about requiring depositions • IUCr guidelines established • Number of structures deposited increases • Independent biological databases established, e.g., the Nucleic Acid Database (NDB)
     1990s • mmCIF project completed • Structural genomics begins • PDB moves to RCSB
     2000s • RCSB PDB renewed • wwPDB established

  3. PDB Mission: To provide the most accurate, well-annotated data in the most timely and efficient way possible to facilitate new discoveries and advances in science.

  4. Structural Biology • Understand biological processes through structural analyses • Several methods (X-ray, NMR, cryo-electron microscopy)

  5. Number of Released Entries per Year (chart)

  6. Growth of Molecular Complexity

  7. Structural Genomics: “The next step beyond the human genome project.” From the NIH Request for Proposals for Structural Genomics Centers: “These studies should lead to an understanding of structure/function relationships and the ability to obtain structural models of all proteins identified by genomics. This project will require the determination of a large number of protein structures in a high-throughput mode.”

  8. The Rules Driving Structural Genomics • Much information can be derived from structure that is not available from sequence alone, yet there are 2-3 orders of magnitude more sequences than structures • If two sequences are similar, there is a high likelihood that they will have similar structures • Two dissimilar sequences can share a similar structure as a result of divergent or convergent evolution • Similar structures may confer similar functions
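The second rule presupposes a way to quantify how similar two sequences are. A minimal Python sketch, assuming the two sequences are already aligned with gaps written as `-`, computes percent identity over the columns where neither sequence has a gap. This is only one convention among several; real pipelines compute the alignment itself with tools such as BLAST, and no particular "similar enough" threshold is implied here. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are excluded from
    the denominator (one common convention, not the only one).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy fragments, not real PDB sequences: 7 of 8 ungapped columns match.
    print(percent_identity("MKT-AYIAK", "MKTQAYLAK"))
```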

  9. Challenges • Growth in number of structures • Increase in complexity of structures • New methods for structure determination • Demand for complex queries • Demand for more annotation • Integration with other genomic and proteomic information • Larger and more diverse community of users

  10. PDB Timeline
     Year                          | 1993 | 1998  | 2003   | 2008
     Total structures              | 1727 | 8942  | 23793  | 60000?
     Structures deposited per year | 792  | 2178  | 4831   | 9000?
     Average Web hits per day      | N/A  | 57000 | 180000 | ?
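The timeline figures can be turned into growth rates with simple arithmetic. The sketch below uses only the 1993, 1998 and 2003 columns, because the 2008 values are marked as projections. Growth is expressed both as a fold change and as a compound annual rate, (end / start)^(1 / years) - 1; the helper names are hypothetical.

```python
# Observed values from the PDB Timeline slide; the 2008 column is a
# projection ("?") and is deliberately left out.
TOTAL_STRUCTURES = {1993: 1727, 1998: 8942, 2003: 23793}
DEPOSITED_PER_YEAR = {1993: 792, 1998: 2178, 2003: 4831}


def fold_change(series: dict[int, int], start: int, end: int) -> float:
    """Ratio of the value in `end` to the value in `start`."""
    return series[end] / series[start]


def annual_growth_rate(series: dict[int, int], start: int, end: int) -> float:
    """Compound annual growth rate between two years."""
    years = end - start
    return fold_change(series, start, end) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    for name, series in [("total", TOTAL_STRUCTURES),
                         ("deposited/year", DEPOSITED_PER_YEAR)]:
        factor = fold_change(series, 1993, 2003)
        rate = annual_growth_rate(series, 1993, 2003)
        print(f"{name}: x{factor:.1f} over 10 years, ~{rate:.0%} per year")
```

Over 1993 to 2003 this works out to roughly 30% per year for total structures and roughly 20% per year for annual depositions.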

  11. The Data Pipeline

  12. Structure Determination (X-ray): Target Selection → Crystallomics (Isolation, Expression, Purification, Crystallization) → Data Collection → Structure Solution → Structure Refinement → PDB Deposition → Functional Annotation → Publication

  13. Data Processing Data Flow

  14. System for Data Collection and Archiving • Diagram components: Depositor • Validation • MAXIT • Data • ADIT (AutoDep Input Tool) • Database Loader • Reports • Final Files • Metadata Dictionaries • Data Views

  15. Data Processing System Features • Different dictionaries without software changes • Simple customization of both functionality and content • Automatically scales with changes in content • Can be distributed to multiple deposition sites • Reference data and standard nomenclature (ERFs)
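The first feature, supporting different dictionaries without software changes, amounts to keeping validation rules in data rather than in code. A minimal sketch of that idea follows. The `ItemDef` structure is a simplified, hypothetical stand-in for a real dictionary definition language (it is not DDL2), and although the item names follow mmCIF conventions, the type and enumeration rules shown are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ItemDef:
    """Simplified, hypothetical item definition (not DDL2)."""
    value_type: str                    # "float", "int" or "text"
    mandatory: bool = False
    enumeration: tuple[str, ...] = ()  # empty tuple means "any value"


# Illustrative, heavily trimmed dictionary; real dictionaries are far larger.
DICTIONARY = {
    "_struct.title": ItemDef("text", mandatory=True),
    "_exptl.method": ItemDef(
        "text", mandatory=True,
        enumeration=("X-RAY DIFFRACTION", "SOLUTION NMR", "ELECTRON MICROSCOPY")),
    "_refine.ls_d_res_high": ItemDef("float"),
}

CASTS = {"float": float, "int": int, "text": str}


def validate(entry: dict[str, str], dictionary: dict[str, ItemDef]) -> list[str]:
    """Check an entry's items against whatever dictionary is supplied."""
    errors = []
    for name, definition in dictionary.items():
        if name not in entry:
            if definition.mandatory:
                errors.append(f"{name}: missing mandatory item")
            continue
        value = entry[name]
        try:
            CASTS[definition.value_type](value)
        except ValueError:
            errors.append(f"{name}: {value!r} is not of type {definition.value_type}")
            continue
        if definition.enumeration and value not in definition.enumeration:
            errors.append(f"{name}: {value!r} is not an allowed value")
    for name in entry:
        if name not in dictionary:
            errors.append(f"{name}: item not defined in dictionary")
    return errors


if __name__ == "__main__":
    bad = {"_exptl.method": "XRAY", "_refine.ls_d_res_high": "high"}
    for message in validate(bad, DICTIONARY):
        print(message)
```

Adding a new item or tightening an enumeration means editing `DICTIONARY`; `validate` itself stays unchanged.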

  16. Data Content of Each PDB Entry
     1970s • Name, source, reference, resolution, sequence, secondary structure, crystal data, coordinates, unstructured remarks
     1990s • Name, source, reference, resolution, refinement details, data collection and processing details, symmetry details, biological unit information, missing residues, related entries, sequence, ligands and ions, secondary structure, crystal data, coordinates, few unstructured remarks

  17. Content Coverage

  18. Annotation and Validation • ADIT • Reviewing, adding, correcting entry information • Maxit • File format conversions • Blast Automation Tool results • Sequence discrepancies, protein names, synonyms, source info, EC number • Validation Server Reports • Format and nomenclature consistency • Sequence/coordinate mismatches • Geometrical checks (NUCheck, PROCHECK) • Experimental checks (SFCheck) • Ligand Depot, ChemDraw • RasMol for Visualization • PubMed, Citation Tracker, Citation Tool

  19. Data Uniformity • Sequence • Resolve anomalies relative to Swiss-Prot, GenBank • Resolve anomalies between sequence and atom • Atom nomenclature • Atom naming problems in 40% of structures • Redundant atom labels • Errors in chirality • Biologically active molecule described • Ligands • Names standardized • http://deposit.pdb.org/public-components-erf.cif • Biological assembly ftp://beta.rcsb.org/pub/pdb/uniformity/data/mmCIF/ The Protein Data Bank: Unifying the Archive. Nucleic Acids Research 2002, 30:245-248

  20. Additional Requirements of Structural Genomics • All data in Materials and Methods section of a journal should be captured • Tracking of all experiments must be publicly available

  21. Extending Data Dictionaries for Deposition • X-ray • Structure determination data items • http://deposit.pdb.org/mmcif/sg-data/xstal.html • NMR • Structure determination data items • http://deposit.pdb.org/mmcif/sg-data/nmr.html • Protein Production • http://deposit.pdb.org/mmcif/sg-data/protprod.html

  22. Current Integration Strategy • Collect bits of mmCIF output from each program step • Merge the mmCIF data from each step • Use ADIT deposition tool to enter remaining data and check results • Make all data files available in the representation of the exchange dictionary
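The first two steps of this strategy, collecting and merging mmCIF fragments, can be sketched as below. Assumptions: each program step's output has already been parsed into flat `item name -> value` pairs (real mmCIF also has loops, which this ignores), the step names and values are made up, the first value seen for an item is kept, and disagreements are flagged for review rather than resolved silently. Items still missing after the merge correspond to the data left for a tool such as ADIT. All function names are hypothetical.

```python
def merge_fragments(fragments: list[tuple[str, dict[str, str]]]):
    """Merge per-step items; keep the first value and report conflicts."""
    merged: dict[str, str] = {}
    origin: dict[str, str] = {}
    conflicts: list[str] = []
    for step, items in fragments:
        for name, value in items.items():
            if name not in merged:
                merged[name] = value
                origin[name] = step
            elif merged[name] != value:
                conflicts.append(f"{name}: {origin[name]} gave {merged[name]!r}, "
                                 f"{step} gave {value!r}")
    return merged, conflicts


def missing_items(merged: dict[str, str], required: list[str]) -> list[str]:
    """Required items no program step supplied (left for manual entry)."""
    return [name for name in required if name not in merged]


def to_cif(block_name: str, items: dict[str, str]) -> str:
    """Write items as a single CIF data block of key-value pairs."""
    lines = [f"data_{block_name}"]
    for name in sorted(items):
        value = items[name]
        if not value or any(ch.isspace() for ch in value):
            value = f"'{value}'"  # naive quoting; ignores embedded quotes
        lines.append(f"{name} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    fragments = [
        ("data_collection", {"_diffrn_radiation_wavelength.wavelength": "0.9795"}),
        ("scaling", {"_reflns.d_resolution_high": "1.80"}),
        ("refinement", {"_refine.ls_d_res_high": "1.80",
                        "_reflns.d_resolution_high": "1.85"}),
    ]
    merged, conflicts = merge_fragments(fragments)
    print(to_cif("merged", merged))
    print("conflicts:", conflicts)
    print("still needed:", missing_items(merged, ["_struct.title"]))
```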

  23. Target Registration Database (TargetDB) • http://targetdb.pdb.org/ • All targets downloadable in XML (~51,000 targets) • Targets downloaded from 18 centers weekly • Target search by: sequence (FASTA), project target ID, project site, status (selected, cloned, expressed, … in PDB), update date, protein name, source organism • Report output in HTML, FASTA, and XML • Integrates PDB entry sequences (~55,600 sequences) • Includes PDB pre-release sequence data • Provides links to related sequence databases • Open to all Structural Genomics projects • Summary reports of target or project progress
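A search by status, organism or site followed by FASTA report output can be sketched as a filter over target records. The `Target` record layout, the FASTA header format, and the example IDs are hypothetical and do not reproduce TargetDB's actual XML schema or report format; only status names listed on the slide are used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Target:
    """Hypothetical target record; not TargetDB's actual schema."""
    target_id: str
    site: str
    status: str      # e.g. "selected", "cloned", "expressed", "in PDB"
    organism: str
    sequence: str


def search(targets: list[Target], *, status: Optional[str] = None,
           organism: Optional[str] = None,
           site: Optional[str] = None) -> list[Target]:
    """Return targets matching every criterion that is not None."""
    criteria = {"status": status, "organism": organism, "site": site}
    return [
        t for t in targets
        if all(value is None or getattr(t, attr) == value
               for attr, value in criteria.items())
    ]


def to_fasta(targets: list[Target], width: int = 60) -> str:
    """Render targets as FASTA, wrapping sequences at `width` residues."""
    lines = []
    for t in targets:
        lines.append(f">{t.target_id} site={t.site} status={t.status} "
                     f"organism={t.organism}")
        for start in range(0, len(t.sequence), width):
            lines.append(t.sequence[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    targets = [
        Target("SITE1-0001", "SITE1", "cloned", "Thermotoga maritima", "M" + "A" * 69),
        Target("SITE2-0042", "SITE2", "in PDB", "Homo sapiens", "MKTAYIAKQR"),
    ]
    print(to_fasta(search(targets, status="cloned")), end="")
```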

  24. Beyond TargetDB: PepcDB • Protein Expression, Purification, and Crystallization Database • All information about targets, including the protocols for protein production

  25. Incremental Data Pipeline

  26. Current Query System
     WWW User Interfaces • SearchFields • SearchLite • Query Result Browser • Structure Explorer
     CGI Integration Layer
     DB Integration Layer
     Flat Files • Keyword Search • Derived Data • Core DB • BMCD • FTP tree (download) • POM • Sybase • Lucene
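The keyword-search component on this slide is backed by Lucene. Its underlying idea, an inverted index that maps each term to the entries containing it, can be sketched in a few lines of Python. This toy version is not Lucene and omits its analyzers, relevance scoring, and on-disk index format; the class name `KeywordIndex` is hypothetical, and the sample text reuses entry descriptions cited elsewhere in this presentation.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class KeywordIndex:
    """Toy inverted index: term -> set of entry IDs."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, entry_id: str, text: str) -> None:
        for term in tokenize(text):
            self._postings[term].add(entry_id)

    def search(self, query: str) -> set[str]:
        """Return entries containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get() avoids inserting empty postings for unknown terms.
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


if __name__ == "__main__":
    index = KeywordIndex()
    index.add("1AEW", "Horse Apoferritin")
    index.add("4HHB", "The crystal structure of human deoxyhaemoglobin")
    print(index.search("crystal structure"))
```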

  27. Biological Assembly View • Structure page • Tutorial at http://www.rcsb.org/pdb/biounit_tutorial.html • 1AEW Horse Apoferritin: Hempstead, P. D., Yewdall, S. J., Fernie, A. R., Lawson, D. M., Artymiuk, P. J., Rice, D. W., Ford, G. C., Harrison, P. M.: Comparison of the three-dimensional structures of recombinant human H and horse L ferritins at high resolution. J Mol Biol 268, p. 424 (1997)

  28. Structure Explorer Summary Page • Go to EC site • Search by EC number • Go to NCBI Taxonomy • Go to PubMed abstract • Search by author • Search for related citations • Search by chemical component

  29. Design of the New PDB Database • 3-tier architecture • Separates database, applications, and presentation • Supports high access rates on multiple machines • Serves very large data sets

  30. Navigation • Persistent search box • Integrated help (context-sensitive) • Getting Started • Persistent navigation bar • Hierarchical menu items • Site search

  31. Browsing • Gene Ontology • Enzyme Classification • Taxonomy • Disease • Ligands • CATH/SCOP

  32. Searching PubMed Abstracts

  33. Detailed Reports

  34. Molecular Visualization • Simple viewer built from the Molecular Biology Toolkit (http://mbt.sdsc.edu) • Envisioned as a future query interface, e.g., “What other structures contain this ligand?” • Molecular Biology Toolkit authors: John Moreland and Apostol Gramada • 4HHB: Fermi, G., Perutz, M. F., Shaanan, B., Fourme, R.: The crystal structure of human deoxyhaemoglobin at 1.74 Å resolution. J Mol Biol 175, p. 159 (1984)

  35. http://www.wwpdb.org/ • Worldwide PDB (wwPDB) • RCSB (Research Collaboratory for Structural Bioinformatics) • PDBj (Osaka University) • Macromolecular Structure Database (EBI) • To ensure that PDB files remain in a single archive to best serve the worldwide community of depositors and users

  36. http://www.pdb.org/ Operated by three members of the RCSB: Rutgers, The State University of New Jersey; San Diego Supercomputer Center at the University of California, San Diego; Center for Advanced Research in Biotechnology/UMBI/NIST. The RCSB PDB is supported by funds from the National Science Foundation (NSF), the National Institute of General Medical Sciences (NIGMS), the Office of Science, Department of Energy (DOE), the National Library of Medicine (NLM), the National Cancer Institute (NCI), the National Center for Research Resources (NCRR), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), and the National Institute of Neurological Disorders and Stroke (NINDS).

  37. RCSB PDB • MSD-EBI • PDBj at Osaka

  38. RCSB PDB Team: Ken Addess, Helen M. Berman, Wolfgang F. Bluhm, Phil Bourne, Kyle Burkhardt, Al Carlson, Li Chen, Sharon Cousin, Nita Deshpande, Shuchismita Dutta, Zukang Feng, Lew-Christiane Fernandez, Judith L. Flippen-Anderson, Gary Gilliland, Rachel Kramer Green, Vladimir Guranovic, Shri Jain, Jeff Merino-Ott, Rose Oughtred, Irina Persikova, Suzanne Richman, Melcoir Rosas, Kathryn Rosecrans, Bohdan Schneider, Wayne Townsend-Merino, Elizabeth Walker, John Westbrook, Huanwang Yang, Jasmin Yang, Christine Zardecki
