Automated Protein Structure Determination Pipeline at the Joint Center for Structural Genomics

Each project moves from target selection through publication along the Target Pipeline.

The Joint Center for Structural Genomics

The Role of the Structure Determination Core in the JCSG
1. Screen Crystals and Collect Data
• Use the Stanford AutoMounter to rapidly screen crystals without human intervention
• Control data collection using the Blu-Ice software interface
2. Automatically Process Data (Xsolve, by G. Wolf)
• Use common crystallographic programs running in parallel to accelerate structure determination: Autoindex, Integrate, Scale, Solve, Trace
3. Refine and Evaluate Structures
• Automate protein model completion (Xpleo, by van den Bedem)
• Develop automated quality control procedure
4. Publish Structures
• Partially automated structure note generation

Mission: To establish a robust and scalable protein structure determination pipeline that will form the foundation for a large-scale cost effective production center for

UCSD Bioinformatics Core: John Wooley, Adam Godzik, Slawomir Grzechnik, Lukasz Jaroszewski, Sri Krishna Subramanian, Andrew Morse, Tamara Astakhova, Lian Duan, Piotr Kozbial, Dana Weekes, Natasha Sefcovic, Prasad Burra, Josie Alaoen, Cindy Cook

GNF & TSRI Crystallomics Core: Scott Lesley, Mark Knuth, Dennis Carlton, Thomas Clayton, Kevin D. Murphy, Christina Trout, Marc Deller, Daniel McMullan, Heath Klock, Polat Abdubek, Claire Acosta, Linda M. Columbus, Julie Feuerhelm, Joanna C. Hale, Thamara Janaratne, Hope Johnson, Edward Nigoghossian, Linda Okach, Sebastian Sudek, Aprilfawn White, Ylva Elias, Glen Spraggon, Bernhard Geierstanger, Sanjay Agarwalla, Charlene Cho, Bi-Ying Yeh, Anna Grzechnik, Jessica Canseco, Mimmi Brown

Stanford/SSRL Structure Determination Core: Keith Hodgson, Ashley Deacon, Mitchell Miller, Herbert Axelrod, Hsiu-Ju (Jessica) Chiu, Kevin Jin, Christopher Rife, Qingping Xu, Silvya Oommachen, Henry van den Bedem, Scott Talafuse, Ronald Reyes, Abhinav Kumar, Christine Trame, Debanu Das

Scientific Advisory Board:
• Sir Tom Blundell, Univ. Cambridge
• Homme Hellinga, Duke University Medical Center
• James Naismith, The Scottish Structural Proteomics facility, Univ. St. Andrews
• James Paulson, Consortium for Functional Glycomics, The Scripps Research Institute
• Robert Stroud, Center for Structure of Membrane Proteins, Membrane Protein Expression Center, UC San Francisco
• Soichi Wakatsuki, Photon Factory, KEK, Japan
• James Wells, UC San Francisco
• Todd Yeates, UCLA-DOE, Inst. for Genomics and Proteomics

Exploratory Projects: Kurt Wüthrich, Reto Horst, Maggie Johnson, Amaranth Chatterjee, Michael Geralt, Wojtek Augustyniak, Pedro Serrano, Bill Pedrini, William Placzek

NIH Protein Structure Initiative Grant U54 GM074898

Comparative analysis of novel proteins from the CATH family of zinc peptidases
Debanu Das1,2, Abhinav Kumar1,2, Lukasz Jaroszewski1,3 and Ashley Deacon1,2
1Joint Center for Structural Genomics, 2Stanford Synchrotron Radiation Laboratory, Menlo Park, CA 94025, 3Burnham Institute, La Jolla, CA 92037

Biomedical theme: Central Machinery of Life - proteins conserved in all kingdoms of life
Biological theme: Complete coverage of Thermotoga maritima

III. General structure and biochemistry
These metallopeptidases show a high degree of structural conservation in the CATH domain, which has an α/β/α sandwich architecture. The active site usually comprises histidines and carboxylates interacting with two zinc ions. Despite the variety of molecular functions and substrate specificities of these proteins, catalysis most likely involves a hydroxide ion ligand making a nucleophilic attack. The full proteins often oligomerize and differ somewhat in their oligomerization state; however, the exact role of the oligomer in molecular function is still unclear. In some cases, dimer formation results in assembly of a productive catalytic site.
Figure of the representative CATH structure from http://cathwww.biochem.ucl.ac.uk/cgi-bin/cath/GotoCath.pl?cath=3.40.630.10

I. Introduction

II. Background and Significance
CATH 3.40.630.10 proteins belong to Pfam clan CL0035 (Peptidase MH/MC/MF) and to clan MH/MC/MF of metallopeptidases in the MEROPS peptidase database (peptidases are also termed proteases, proteinases or proteolytic enzymes). CL0035 has 7591 proteins in 8 Pfams. These proteins are involved in a variety of proteolytic activities, have a range of substrate specificities and are present in numerous microbial organisms, many of which are important human pathogens, such as S. aureus, S. typhimurium, T. vaginalis, M. tuberculosis, N. gonorrhoeae, N. meningitidis, C. trachomatis, G. intestinalis, and E. coli. Several of these proteins have been investigated for their therapeutic potential and disease roles (Canavan’s disease, cancer therapy and prohormone/propeptide processing).

IV. Progress of structure determination
As part of its mission to increase structural coverage of protein families, JCSG is targeting proteins from the large CATH homologous superfamily 3.40.630.10 of zinc peptidases, which belong to the phosphorylase/hydrolase-like fold in SCOP and comprise proteins from several Pfam families (the peptidase_MH clan). Hidden Markov Models from the CATH database were used to identify sequences in the JCSG genome pool. PSI-BLAST searches seeded with sequences of these CATH family members were used to find additional proteins. Together, these two sets contained 226 unique targets. After removing targets with more than 30% sequence identity to any PDB structure or to any crystallized target from a structural genomics center, 161 targets remained. Further clustering at 90% (to avoid nearly identical sequences) yielded a set of 137 targets. To date we have solved 7 structures from this CATH family, and 7 other targets have been crystallized. In addition, 16 structures have been solved by other structural genomics centers worldwide.
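The target-selection funnel described above (226 unique targets, 161 after removing anything more than 30% identical to a known structure, 137 after clustering at 90%) can be pictured as two successive filters. The following is a minimal sketch under stated assumptions, not the JCSG procedure: the poster does not name the clustering tool, so greedy representative selection is swapped in here; the 90% threshold is assumed to mean sequence identity; and `toy_identity`, `drop_known` and `greedy_cluster` are hypothetical names. A real pipeline would derive identity from sequence alignments rather than position-by-position matching.

```python
from typing import Callable, List, Sequence

Identity = Callable[[str, str], float]


def toy_identity(a: str, b: str) -> float:
    """Fraction of matching positions over the longer length.

    Illustration only: no alignment is performed (assumption).
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / longest


def drop_known(targets: Sequence[str], known: Sequence[str],
               identity: Identity, max_identity: float = 0.30) -> List[str]:
    """Keep targets that are at most max_identity identical to every known sequence."""
    return [t for t in targets
            if all(identity(t, k) <= max_identity for k in known)]


def greedy_cluster(targets: Sequence[str], identity: Identity,
                   threshold: float = 0.90) -> List[str]:
    """Keep one representative per cluster.

    A target is dropped when it is at least `threshold` identical to a
    representative that was already kept (greedy, order dependent).
    """
    representatives: List[str] = []
    for t in targets:
        if all(identity(t, r) < threshold for r in representatives):
            representatives.append(t)
    return representatives


# Toy sequences (hypothetical data, not JCSG targets).
candidates = ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCCC", "GGGGGGGGGG"]
known_structures = ["GGGGGGGGGT"]

novel = drop_known(candidates, known_structures, toy_identity)
selected = greedy_cluster(novel, toy_identity)
print(len(candidates), len(novel), len(selected))
```

Passing the identity function as a parameter keeps the two filters independent of how identity is measured, so an alignment-based measure could replace `toy_identity` without touching the filtering logic.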
We present our progress towards complete structural coverage of this family, highlighting common and variant structural features that support different molecular and cellular roles, focusing on active site residues, ligand binding, protein size and oligomerization state. This analysis may provide insights into structural themes that dictate protein function and also allows modeling of protein structures related by sequence. Our structures serve as a nucleation point for the design of further structure-based experiments to probe the biochemical and biomedical roles of these proteins.

Current status of 137 targets
Distribution of selected targets across Pfam families
All targets selected in March 2007
Targets assigned in PfamA
Targets unassigned in PfamA *
* Pfam assigned based on sequence homology detected with FFAS, http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl. There are 3 targets not assigned by PfamA or FFAS.
** 7 targets indicated show a significant FFAS match to both PF04389 and PF05450, and are possibly distant bacterial homologs of the eukaryotic nicastrin family.

V. Structures solved by JCSG
• 2QJ8.pdb (HP10622H), 2.0 Å, R/Rfree = 20.7/25.4%. Unknown function, PF04952. Homolog involved in Canavan’s disease.
• 2QVP.pdb (HP10645A), 2.0 Å, R/Rfree = 16.1/21.3%. Unknown function, PF04952. Structure suggests target may be closer in homology to PF00246 proteins.
• 2FVG.pdb (TM1049), 2.01 Å, R/Rfree = 20.3/24.4%. Endoglucanase, PF05343. 27 close homologs from important human pathogens.
• 3B2Y.pdb (HP10645E), 1.74 Å, R/Rfree = 17.45/21.51%. Unknown function, PF04952, Ni2+ bound. Structure suggests target may be closer in homology to PF00246 proteins.
• 2QYV.pdb (HP9625C), 2.11 Å, R/Rfree = 22.0/24.4%. Putative Xaa-His dipeptidase, PF01546, Zn2+ bound. 7 close homologs from important human pathogens.
• HP10625B, 2.3 Å, work in progress. PF01546. 50 close homologs from important human pathogens. Potential in cancer therapy.
• 2RB7.pdb (HP1666A), 1.6 Å, R/Rfree = 15.4/18.0%. Unknown function, PF01546. 48 close homologs from important human pathogens. Potential in cancer therapy.

VI. Phylogenetic tree and structure tree

VII. Comparison of two proteins with >30% sequence identity within the same Pfam
PF01546: 1CG2, 2RB7
1CG2: cleaves the C-terminal glutamate moiety from folic acid and its analogues, such as methotrexate
2RB7: Unknown function, JCSG
Common core ~290 aa, RMSD ~3.0 Å

X. Active site comparisons
2RB7 (cyan) and 1CG2, PF01546.
Functions of proteins with solved structures and >30% sequence identity include:
• dapE gene: succinyl-diaminopimelate desuccinylase activity in diaminopimelate biosynthesis (component of cell wall and lysine biosynthesis)
• Carboxypeptidase G2: cleaves the C-terminal glutamate moiety from folic acid and its analogues, such as methotrexate
• N-acetyl-L-citrulline deacetylase
• Peptidase T: tripeptidase, hydrolyzes tripeptides at their N-termini

XI. First structure of a dipeptidase in clan MH, 2QYV/HP9625C reveals a dimer
The 2QYV (PepD, MEROPS M20.007, clan MH, subfamily C) monomer is very similar in structure to the 1LFW monomer (PepV, MEROPS M20.004, subfamily A). Both are dipeptidases belonging to PF01546.
However, 1LFW is known to function as a monomer whose molecular structure mimics the dimer seen in most other proteins in this Pfam. PepD in E. coli and Prevotella albensis is seen to function as a dimer. 2QYV represents the first crystal structure of a PepD, and it is dimeric both in the crystal and by size exclusion chromatography. This novel structure serves as a starting point for further experiments to probe the effect of dimer formation on protein function.

Sequences with >30% identity within a particular Pfam also cluster together in structure space.
http://fatcat.burnham.org/POSA
http://www.phlogeny.fr

For structures that cluster together at the 30% level, structural conservation in the common core is highest. Generally only slight rearrangement of secondary structural elements is observed (within the domain).

Active site in 2RB7
The active site in 1CG2 is H112, D141, E200, E176, H385. Based on this, the putative active site in 2RB7 is H72, D99, D100, E138, E139, D162.
Hydrolysis of methotrexate by 1CG2 has implications in cancer and gene therapy.
Based on this information, it would now be possible to perform targeted experiments to determine the substrate, and so the function, of 2RB7, to perform structure-based site-directed mutagenesis experiments, and to explore the possibility of exploiting its therapeutic potential.

VIII. Proteins with <30% sequence identity within the same Pfam
PF01546: 2RB7, 2QYV (green). Common core ~250 aa, RMSD ~3.0 Å
PF04952: 2QJ8, 3B2Y (cyan). Common core ~190 aa, RMSD ~3.0 Å
Larger rearrangements and extensions of secondary structural elements. Inserts and novel features are more common.

IX. Comparison between different Pfams
2RB7, 1XJO (brick), 2QJ8 (gold). Common core ~100 aa, RMSD 3.69 Å
The common conserved core is smallest when structures from different Pfams in the same Pfam clan and CATH family are compared.

XII. Inferences and further work
• In the quest to increase structural coverage across protein families, it is expected that proteins similar in sequence within a protein family will be similar in structure. Increasing structural coverage provides better templates for modeling other proteins. The comparative structural analysis presented here provides experimental verification of the validity of this approach.
• The 7 structures presented here provide a basis for enhancing the modeling of 2177 out of 7591 proteins (~29%) belonging to this Pfam clan. Furthermore, 3 of these JCSG structures (2QYV, 2QJ8 and 3B2Y) provide the first examples of structures for proteins within a particular sequence cluster and thus provide the basis for modeling 384 unique proteins (10 from organisms listed as top human pathogens) belonging to these 3 clusters from 2 different Pfams (PF01546 and PF04952).
• 2QYV/HP9625C represents the first crystal structure of a PepD dipeptidase and shows a dimer.
• Further analysis will be performed to understand evolutionary relationships between these proteins based on sequence-based phylogenetic trees and structure-based trees.
• Attempts will be made to use these structures and their comparative analyses to understand the structural basis of enzyme function and substrate specificity through analysis of active site amino acids, and to exploit this information for therapeutic purposes.

Acknowledgements
The JCSG is funded by the Protein Structure Initiative of the National Institutes of Health, National Institute of General Medical Sciences. SSRL operations are funded by DOE BES, and the SSRL Structural Molecular Biology program by DOE BER, NIH NCRR BTP and NIH NIGMS.
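The common-core RMSD values quoted in the structure comparisons above (for example ~3.0 Å over ~250 residues, or 3.69 Å over ~100 residues) are root-mean-square deviations over paired atom positions. The sketch below only illustrates that formula. It assumes the two structures have already been superposed and that the residue pairing is given; the poster does not describe its own calculation beyond pointing to the POSA server, and `rmsd` and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root-mean-square deviation between paired 3D points.

    Assumes coords_a[i] is paired with coords_b[i] and that both sets
    are already in the same frame (no superposition is done here).
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates in Å (hypothetical): every pair is offset by 3 Å along z.
core_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
core_b = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0), (2.0, 0.0, 3.0)]
print(rmsd(core_a, core_b))
```

Because the value is averaged over the number of paired positions, an RMSD is only comparable between structure pairs when the size of the common core is reported alongside it, as the poster does.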
