
From crystals to pdb: building a high throughput crystallography pipeline for structural genomics

HT Pipeline Processes, Bottlenecks and Leaks. target selection. HT Expression. HT Purification. HT Crystallization. HT Imaging. struc. validation. struc. refinement. annotation. publication. xtal screening. bl xtal mounting. data collection. phasing. tracing. imaging.


Presentation Transcript


  1. From crystals to pdb: building a high throughput crystallography pipeline for structural genomics

Chiu HJ(1), Wolf G(1), West W(2), van den Bedem H(1), Miller MD(1), Zhang Z(1), Morse A(2), Wang X(2), Xu Q(1), Levin I(1), von Delft F(3), Elsliger MA(3), Godzik A(2), Grzechnik SK(2) and Deacon AM(1)

(1) Stanford Synchrotron Radiation Laboratory, 2575 Sand Hill Road, Menlo Park, CA 94025
(2) University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093
(3) The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037

The Structure Determination Core (SDC) of the Joint Center for Structural Genomics (JCSG) is dedicated to developing technologies that streamline all the steps in the structure determination process from crystals to PDB-ready atomic coordinates. Over the last year the JCSG production capacity has increased dramatically. SDC has screened more than 7000 crystals from 192 protein targets. A total of 232 datasets from 106 targets have been collected and 90 structures have been solved. In order to handle the rapidly growing flow of experimental data, we have developed a set of crystallographic and database tools to both track and streamline our workflow.

Crystal cassettes are shipped to SDC from the Crystallomics Core. All relevant crystal information is captured in the central JCSG database and is downloaded in a “Beamline Report”. Crystals are screened automatically using the Stanford Auto-Mounter and Blu-Ice software. The visual and diffraction properties of each crystal are recorded. A computer program, DISTIL, is under development to automatically analyze diffraction images and provide an objective screening evaluation for each crystal. The best crystals for each target are flagged for data collection.

A computer program, Xsolve, is used for automatic crystallographic data processing and structure solution. A model building tool providing crystallographers with the best possible initial model for refinement is under development. The results of the analysis are uploaded to a Structure Solution Tracking System. A Refinement Tracking System requests weekly updates and collects all the data necessary for a peer-review Quality Control step, before the coordinates are deposited to the Protein Data Bank.

The Joint Center for Structural Genomics
Structural Genomics of Thermotoga maritima

Mission: To establish a robust and scalable protein structure determination pipeline that will form the foundation for a large-scale cost effective production center for structural genomics.

[Figure: HT Pipeline Processes, Bottlenecks and Leaks. Stage labels: target selection, cloning, expression, purification, crystallization, imaging, harvesting, HT Expression, HT Purification, HT Crystallization, HT Imaging, xtal screening, bl xtal mounting, data collection, phasing, tracing, struc. refinement, struc. validation, annotation, publication, PDB. Panel labels: Target Selection, 1st Generation Hardware, 6th Generation Software, HT Data Collection, 1st Generation Prototype, 3rd Generation Software, HT Structure Determination, 2nd Generation, Structure Validation & Deposition, Autosubmission of electronic publication.]

JCSG production statistics (August 10, 2004)

Total crystals screened at SDC: 10778
    Unique targets represented: 356
    TM/non-TM targets: 299/57
Datasets collected: 394 (288 TM, 106 non-TM)
    Unique targets represented: 194
    TM/non-TM targets: 146/48
Structures solved: 155 (94 MAD; 51 MR; 3 SAD; 7 NMR) (125 TM; 30 non-TM)

• 22 targets: data collected, not yet solved
• 92 targets: diffraction better than 3.5 Å, not yet solved

Active crystal report
• Can be searched by:
  • Shipment ID
  • Dewar
  • Target ID
  • Cassette/puck
  • More to come…

Growing reliance on the JCSG DB
• Data flow parallels the experimental pipeline, harvesting ~300 parameters from 19 stages
• All relevant crystal information is captured in the central JCSG database in the form of a Beamline Report
• Diffraction properties: resolution, spot quality, diffraction strength
• 500 crystals and 8 structures per month
• 20 cassettes (2000 crystals) inventory
• 30-40 structures in refinement
• Beamline: 2.0 TB of diffraction images, 0.5 TB of processing files, >100,000 diffraction images
• All data uploaded to JCSG DB: screening, collection and structure solution
• Work closely with BIC on implementation and debugging
• Still more features needed to handle expanding production

Installation of a Microsource X-ray generator at 9-2
• Increased screening capacity during SSRL shutdown
• Leverage existing infrastructure
• X-ray MicroMax-002 generator installed June 2003
• SSRL automated screening system used
• >4200 crystals screened in 9 months

Automation of protein model completion: an inverse kinematics approach
• Manually Finalizing Model:
  • Labor intensive, time consuming
  • Existing aids are highly interactive
• Automatically Build Backbone Fragments:
  • Build candidate closing conformations using IK techniques (robotics)
  • Rank according to electron density fit and conformational likelihood
  • Subject top-ranking candidates to real-space, torsion angle SA refinement
• Results:
  • Closed missing fragments of up to 12 residues in length to within 0.6 Å all-atom RMSD in a 2.8 Å model
Lotan et al., submitted; van den Bedem et al., in preparation

Structure solution tracking
• Local SDC “dataset” database

Xsolve: automation of structure determination
• Main goals:
  • Handle majority cases
  • Organize data and workflow
  • Ease information flow to JCSG DB
  • Allow integration of new programs
  • Use parallel execution of jobs

Step         Program
Autoindex    Mosflm
Integrate    Mosflm
Scale        Scala
Solve        Solve
Trace        Resolve

[Figure: Solve jobs run side by side, one per candidate space group and number of molecules]
Solve  P4222  2 mols  3 . . .
Solve  P422   2 mols  3 . . .
Solve  P4122  1 mol   2 . . .
Solve  P4122  2 mol   3 . . .
Solve  P4222  1 mol   2 . . .
Solve  P422   1 mol   2 . . .
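
The grid of Solve jobs above, together with the goal of running jobs in parallel, suggests that each combination of candidate space group and number of molecules is an independent trial. The following is a minimal Python sketch of that idea only, not Xsolve's actual code: the function names (`enumerate_trials`, `run_trial`, `run_all`) are hypothetical, `run_trial` is a stub that does no crystallography, and the meaning of the trailing number in each job line of the figure is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Candidate space groups and molecule counts read off the poster's job grid.
SPACE_GROUPS = ["P422", "P4122", "P4222"]
MOLECULES = [1, 2]


def enumerate_trials(space_groups, molecule_counts):
    """Return every (space group, number of molecules) combination to try."""
    return list(product(space_groups, molecule_counts))


def run_trial(trial):
    """Hypothetical placeholder for one structure-solution job.

    A real pipeline would launch external programs here and parse their
    output; this stub only echoes its input so the sketch stays runnable.
    """
    space_group, n_mol = trial
    return {"space_group": space_group, "n_mol": n_mol, "status": "not run"}


def run_all(trials, max_workers=4):
    """Run the independent trials concurrently, keeping the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_trial, trials))


if __name__ == "__main__":
    for result in run_all(enumerate_trials(SPACE_GROUPS, MOLECULES)):
        print(result)
```

A thread pool is used here on the assumption that each job would mostly wait on an external program; distributing the same list of trials across a cluster would need a job scheduler instead.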
2004 developments
• Improve success rate: better autoindexing, determine optimal resolution for scaling sweeps
• More general: handle crystallographic details: re-indexing screw axes, merging sweeps
• More robust operation: catch timeouts, core dumps, infinite loops etc.
• Implement parallelization: develop tools to monitor and control processing on a Linux cluster
• New program support: HKL2000, SHARP, SHELXD (not completely tested)

• Average resolution of structures in PDB: 2.0 Å
• Average protein chain length: 260 aa
• Average number of residues in asu: 480 aa

Robust and automated crystal screening
• Target ID
• Crystallization condition
• Visual properties
• Initial design to production
• Large-scale capacity
• Shipping, storage and screening
• Used by JCSG since June 2002
• Implemented on all SSRL beamlines
• Integration with BLU-ICE
• Automated sample mounting
• Automated sample alignment
• Automated diffraction images
• Cassette kits distributed to PX user groups

[Panel: Refinement Tracking System]

T. maritima genome
• A system to test the pipeline
• Small bacterial genome
• 1877 gene products
• Proteins should express well in E. coli
• Proteins from a thermophile may be more stable
• Process entire genome
• Establish trends in process, e.g. crystallization

Category                      Number      %
Nucleic acid binding             170    9.2
DNA binding                      109    5.9
DNA repair                        11    0.5
DNA replication factor             3    0.1
Transcription factor              37    1.9
RNA binding                       43    2.3
Structural Ribosomal protein      52    2.8
Translation factor                12    0.6
Motor                              5    0.2
Enzyme                           600   32.4
Peptidase                         27    1.5
Protein Kinase                    17    0.9
Protein Phosphatase                8    0.4
Signal transducer                 32    1.7
Cell adhesion                      1    0.0
Structural Protein                61    3.3
Transporter                      202   10.9
Ion channel                        3    0.2
Ligand Binding or carrier        255   13.8
Electron transporter              52    2.8
Unknown or unclassified          713   38.5
Total                           1877    100

Scientific Advisory Board
• Carl-Ivar Brändén, Karolinska Inst., Stockholm (retired 2003)
• Elbert Branscomb, DOE Joint Genome Inst., Walnut Creek
• Stephen Cusack, EMBL Outstation Grenoble
• Leroy Hood, Inst. for Systems Biology, Seattle
• John Kuriyan, U.C. Berkeley
• Erkki Ruoslahti, The Burnham Institute
• James Wells, Sunesis Pharmaceuticals, Inc.
• Charles Cantor, Sequenom, Inc.
• Todd Yeates, UCLA-DOE, Inst. for Genomics and Proteomics
• James Paulson, Consortium for Functional Glycomics, The Scripps Research Institute

Acknowledgements

UCSD Bioinformatics Core: John Wooley, Adam Godzik, Susan Taylor, Slawomir Grzechnik, Bill West, Andrew Morse, Jie Quyang, Xianhong Wang, Jaume Canaves, Lukasz Jaroszewski, Robert Schwarzenbacher, Marc Robinson Rechavi, Chris Edwards, Olga Kirillova, Ray Bean, Josie Alaoen

GNF & TSRI Crystallomics Core: Ray Stevens, Scott Lesley, Rebbeca Page, Carina Grittini, Glen Spraggon, Andreas Kreusch, Michael DiDonato, Daniel McMullan, Heath Klock, Polat Abdubek, Eileen Ambing, Tanya Biorac, Joanna C. Hale, Justin Haugen, Mike Hornsby, Eric Koesema, Edward Nigoghossian, Kevin Quijano, Megan Wemmer, Aprilfawn White, Juli Vincent, Jeff Velasquez, Kin Moy, Vandana Sridhar, Bernard Collins, Thomas Clayton

Stanford/SSRL Structure Determination Core: Keith Hodgson, Ashley Deacon, Britt Hedman, Guenter Wolf, Mitch Miller, Henry van den Bedem, Qingping Xu, Herbert Axelrod, Christopher Rife, Inna Levin, R. Paul Phizackerley, Amanda Prado, John Kovarik, Ross Floyd, Irimpan Mathews, Michael Solits, Aina Cohen, Paul Ellis

TSRI Administrative Core: Ian Wilson, Peter Kuhn, Marc Elsliger, Frank von Delft, Tina Montgomery, Gye Won Han, Rong Chen, Angela Walker

Exploratory Projects: Kurt Wüthrich (NMR), Linda Columbus, Touraj Etezady-Esfarjani, Wolfgang Peti, Virgil Woods (DXMS)

NIH Protein Structure Initiative Grant P50 GM62411
