High-Throughput Protein Structure Determination using Shotgun Structural Proteomics

SHOTGUN STRUCTURAL PROTEOMICS
RAM SAMUDRALA, ASSOCIATE PROFESSOR, UNIVERSITY OF WASHINGTON

Given a heterogeneous mixture of proteins, how can we determine all their structures in a high-throughput and high-resolution manner?

MOTIVATION FOR DETERMINING PROTEIN STRUCTURE
The functions necessary for life are carried out by proteins, and protein function is mediated by protein three-dimensional structure. Knowing protein structure at high resolution will enable us to:
• Determine and understand molecular function.
• Understand substrate and ligand binding.
• Devise intelligent mutagenesis and biochemical experiments to understand biological function.
• Design therapeutics rationally.
• Design novel proteins.
Knowing the structures of all proteins encoded by an organism’s genome will enable us to understand complex pathways and systems, and ultimately organismal behaviour and evolution, with applications in medicine, nanotechnology, and biological computing.

HOW CAN WE DETERMINE STRUCTURE?
[Figure: accuracy (Cα RMSD, axis 0 to 6) of experiment (X-ray, NMR), computation (de novo), computation (template-based), and a hybrid approach (iterative Bayesian interpretation of noisy NMR data with structure simulations); annotations: one distance constraint for every six residues; one distance constraint for every ten residues.]

DISTANCE INFORMATION USING MASS SPECTROMETRY
• Add crosslinkers.
• MS: identify proteins with single crosslinks and fragment them.
• MS: identify crosslinked fragments (example fragments in the figure: VSKNT, KEVN, MKRS, LVKQ).
• Confirm sequence.
• Repeat using different crosslinkers and isotope labelling.

HOW AND WHY WILL THIS WORK?
Perform experiments to obtain a number of distance constraints (one for every six residues for medium- to high-resolution structures). Perform simulations based on high-confidence constraints, and use distance distributions from the resulting structures to iteratively reinterpret the spectra (without repeating the experiment) until we obtain a high-resolution structure. The computational aspects are largely complete. Components of the approach have been implemented by others in a limited way, but are assembled here in a robust and unique manner.
The method can handle:
• Impure protein preparations (e.g. structural genomics failures).
• Environment-dependent structures (e.g. chaperones + effectors).
• Partially disordered proteins.
• Several proteins simultaneously (large scale).
• No need for proteolytic digestion (which complicates the analysis).
The focus is on determining structures from noisy data, unlike X-ray diffraction and NMR.
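The iterative reinterpretation step can be illustrated with a small sketch. It assumes that each ambiguous peak has a set of candidate residue pairs with spectrum-only probabilities, that the current round of simulations supplies a distance for each pair in each model, and that a pair is "satisfied" when its distance is within a hypothetical crosslinker span (`max_span`). The names `satisfaction` and `reinterpret_peak` are illustrative, not part of any published software, and only one reweighting step of the loop is shown.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # hypothetical (residue_i, residue_j) indexing


def satisfaction(distances: List[float], max_span: float) -> float:
    """Fraction of simulated models in which a residue pair lies within the crosslinker span."""
    if not distances:
        raise ValueError("need at least one simulated model")
    return sum(d <= max_span for d in distances) / len(distances)


def reinterpret_peak(prior: Dict[Pair, float],
                     model_distances: Dict[Pair, List[float]],
                     max_span: float,
                     floor: float = 0.01) -> Dict[Pair, float]:
    """Reweight the candidate residue pairs that could explain one ambiguous peak.

    prior: probability of each candidate pair from the spectrum alone.
    model_distances: distances (in Å) for each pair across the current simulated models.
    floor: keeps a candidate alive even if no current model satisfies it.
    """
    weights = {pair: p * max(satisfaction(model_distances[pair], max_span), floor)
               for pair, p in prior.items()}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


# One reinterpretation step for a peak with two equally likely explanations.
prior = {(12, 48): 0.5, (12, 73): 0.5}
models = {(12, 48): [9.0, 10.5, 11.0, 25.0], (12, 73): [20.0, 22.0, 18.0, 11.0]}
posterior = reinterpret_peak(prior, models, max_span=12.0)
print(posterior)  # {(12, 48): 0.75, (12, 73): 0.25}
```

The `floor` term is a design choice: without it, a correct pair that early, low-accuracy models happen to violate would be discarded permanently instead of being recovered in a later iteration.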

PLAN OF ACTION
• Begin computational studies using simulated data (with noise) and develop software to prioritise experiments (e.g. crosslinker choices).
• Initial studies using the UW Mass Spectrometry Center, progressing from fairly pure mixtures >> not-so-pure mixtures >> 2-3 proteins >> a handful of proteins >> difficult proteins >> heterogeneous mixtures >> whole proteomes.
• Advice from Aebersold and Kelleher.
• Team of 10-20 personnel working on crosslinking technology, protein enrichment, mass spectrometry, structure calculation, and parameterisation.
• Dedicated instrumentation through the Pioneer Award, startup, and MRI.
• A Bayesian framework will be used to estimate accuracy/error:
  • Avoid repeating the past oversight with NMR.
  • Obtain an R-factor-like estimate, as in X-ray diffraction.
  • Compare spectra generated from models with actual spectra.
  • Iteratively reinterpret the experimental data.
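An R-factor-like estimate could compare spectra generated from models with the measured spectra. The sketch below borrows the crystallographic form R = Σ|F_obs - k·F_calc| / Σ|F_obs|, substituting binned peak intensities for structure-factor amplitudes. Binning both spectra onto common m/z bins, and the function name `spectral_r_factor`, are assumptions made for illustration.

```python
from typing import Sequence


def spectral_r_factor(observed: Sequence[float], calculated: Sequence[float]) -> float:
    """R-factor-like discrepancy between observed and model-generated spectra.

    Both spectra are assumed to be intensities on the same m/z bins. The model
    spectrum is first scaled to the observed total intensity, mirroring the
    scale factor applied to calculated amplitudes in crystallographic R-factors.
    """
    if len(observed) != len(calculated):
        raise ValueError("spectra must share the same m/z bins")
    scale = sum(observed) / sum(calculated)
    residual = sum(abs(o - scale * c) for o, c in zip(observed, calculated))
    return residual / sum(observed)


print(spectral_r_factor([10, 20, 30, 40], [1, 2, 3, 4]))  # 0.0 (agreement up to scale)
print(spectral_r_factor([10, 0, 30], [10, 10, 30]))        # 0.4
```

As in crystallography, 0 means perfect agreement; a model spectrum that predicts peaks absent from the data (the middle bin in the second example) raises R.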

RECENT SUCCESSES AND SUITABILITY
[Figure panels: protein inhibitor discovery; protein structure determination (PROTINFO structure for 1aye, 1.8 Å Cα RMSD for 70 residues); protein design/nanotechnology.]
http://protinfo.compbio.washington.edu
• Track record of notable successes (5 years).
• Excellent environment at UW/Seattle.
• Ability to unify components cohesively.
• Young and highly energetic.
• Right combination of computational skills and experimental design strategy to carry out the work.

OUTCOME AND EXPECTATIONS
Structural genomics projects aim to obtain a representative structure of every protein family using X-ray diffraction and NMR methods and employ computational methods to fill in the gaps. However, several families of proteins will not be accessible by these structure determination methodologies, and computational methods alone are far from capable of consistently producing high-resolution structures. Even in successful cases, the effect of the biological environment on protein structure is not accounted for. Our hybrid approach, which complements existing structural genomics efforts, will be used to rapidly obtain structures for entire proteomes in biologically relevant environments.

WHY ARE CURRENT METHODS NOT ADEQUATE?
The major bottleneck for both X-ray diffraction and NMR studies is producing sufficient quantities of the protein in a pure form to perform the experiments. Deviations from ideal behaviour in a protein sample result in slow and labour-intensive structure determination, if it is possible at all. These major structure determination techniques were developed at a time when our view of proteins was simple and did not account for environment-dependent structure formation, protein dynamics and conformational changes, or post-translational modifications. The vast majority of proteins will therefore be inaccessible to X-ray diffraction and NMR studies. Computational approaches lack both the resolution of experimental approaches and their consistency.

CROSSLINKING POSSIBILITIES
Seven chemical groups can be crosslinked: amines (2), carboxyls (3), and thiols (2). There are numerous distances for the ~42 (7 x 6) possible pairs of groups. For every 100 residues there may be up to ten members of each group, but typically only one crosslink is possible at a particular distance out of the ~100 (10 x 10) possible pairs. For every 100 residues the total number of groups is ~20-40, resulting in a potential yield of 400-1600 (20² to 40²) distance constraints if all crosslink possibilities can occur.
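The counts on this slide can be reproduced directly. The sketch assumes the slide's 400-1600 figure counts every group against every group (N² for N = 20 to 40); counting each unordered pair of different groups once gives N(N - 1)/2 instead. Either way the upper bound is well above the roughly 17 constraints per 100 residues implied by one constraint for every six residues. The helper names are hypothetical.

```python
import math

GROUP_TYPES = 7  # amines (2) + carboxyls (3) + thiols (2), as listed on the slide


def group_type_pairs(n_types: int) -> int:
    """Pairs of distinct group types, counted as on the slide (7 x 6)."""
    return n_types * (n_types - 1)


def member_pairs(members_a: int, members_b: int) -> int:
    """Possible pairs between the members of two groups (10 x 10 per 100 residues)."""
    return members_a * members_b


def slide_yield(n_groups: int) -> int:
    """Potential constraints as counted on the slide: every group with every group."""
    return n_groups ** 2


def distinct_residue_pairs(n_groups: int) -> int:
    """Stricter count: each unordered pair of different groups counted once."""
    return math.comb(n_groups, 2)


def constraints_needed(n_residues: int, residues_per_constraint: int = 6) -> int:
    """Constraints needed at one per six residues (the medium/high-resolution target)."""
    return math.ceil(n_residues / residues_per_constraint)


print(group_type_pairs(GROUP_TYPES))                            # 42
print(member_pairs(10, 10))                                     # 100
print(slide_yield(20), slide_yield(40))                         # 400 1600
print(distinct_residue_pairs(20), distinct_residue_pairs(40))   # 190 780
print(constraints_needed(100))                                  # 17
```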

DISTANCE INFORMATION USING KNOWN STRUCTURES
[Figure: the residue-specific all-atom probability discriminatory function (RAPDF). Atom-atom contacts in known structures are tallied in distance bins for all 167 x 167 atom-type pairs (AO, AN, AC, …, YOH), giving a score s(d_ab) for each pair and bin; the N x N atom-atom contacts of a candidate structure are then scored with these values.]
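RAPDF turns contact statistics from known structures into log-odds scores. A minimal sketch, assuming the form s(d_ab) = -ln[P(d_ab | C) / P(d_ab)], where P(d_ab | C) is the probability of distance bin d for atom types a and b in correct structures and P(d_ab) is the probability of that bin over all atom-type pairs. The toy counts, the 1 Å bins, and scoring unseen (pair, bin) combinations as 0 are assumptions; the real function uses 167 atom types and counts from a database of known structures.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

TypePair = Tuple[str, str]


def canonical(a: str, b: str) -> TypePair:
    """Atom-type pairs are unordered, so store them in sorted order."""
    return (a, b) if a <= b else (b, a)


def rapdf_table(counts: Dict[TypePair, Dict[int, float]]) -> Dict[Tuple[TypePair, int], float]:
    """Log-odds score s(d_ab) = -ln[P(d_ab | C) / P(d_ab)] for every observed (pair, bin).

    counts[(a, b)][k] is the number of contacts between atom types a and b in
    distance bin k, tallied over a set of known (correct) structures.
    """
    bin_totals: Dict[int, float] = defaultdict(float)  # contacts per bin, all type pairs
    for hist in counts.values():
        for k, n in hist.items():
            bin_totals[k] += n
    grand_total = sum(bin_totals.values())

    table = {}
    for pair, hist in counts.items():
        pair_total = sum(hist.values())
        for k, n in hist.items():
            if n > 0:
                p_given_pair = n / pair_total        # P(d_ab | C)
                p_bin = bin_totals[k] / grand_total  # P(d_ab)
                table[(canonical(*pair), k)] = -math.log(p_given_pair / p_bin)
    return table


def score_structure(table: Dict[Tuple[TypePair, int], float],
                    contacts: Iterable[Tuple[str, str, float]],
                    bin_width: float = 1.0) -> float:
    """Sum s(d_ab) over a candidate's atom-atom contacts (lower = more native-like)."""
    total = 0.0
    for a, b, d in contacts:
        total += table.get((canonical(a, b), int(d // bin_width)), 0.0)
    return total


# Toy counts for two atom-type pairs in two 1 Å bins (3-4 Å and 4-5 Å).
counts = {("AN", "YOH"): {3: 8, 4: 2}, ("AC", "AN"): {3: 2, 4: 8}}
table = rapdf_table(counts)
near = score_structure(table, [("YOH", "AN", 3.5)])
far = score_structure(table, [("YOH", "AN", 4.5)])
print(near < far)  # True: AN-YOH contacts are enriched at 3-4 Å in the toy counts
```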

STRUCTURES FROM SIMULATIONS USING RAPDF
[Figure panels: PROTINFO AB CASP6 prediction for T0281, 4.3 Å Cα RMSD for all 70 residues (continuous RAPDF produces a 2.1 Å RMSD structure); PROTINFO CM CASP6 prediction for T0271, 2.4 Å Cα RMSD for all 142 residues (46% ID).]
• Good correlation between RAPDF score and accuracy of structure.
• RAPDF is one of the first all-atom knowledge-based functions and is a standard by which other scoring functions are compared.
• RAPDF has contributed to our success at CASP when combined with our simulation protocols, which sample protein conformational space efficiently.
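The accuracies on these slides are quoted as Cα RMSD. A common way to compute this value is to superimpose the model onto the reference with the Kabsch algorithm and then take the root-mean-square deviation over matched Cα atoms; the sketch below does this with NumPy and assumes the two coordinate sets are already matched residue by residue. The example coordinates are made up.

```python
import numpy as np


def ca_rmsd(model, reference) -> float:
    """Cα RMSD after optimal rigid-body superposition (Kabsch algorithm).

    model, reference: (N, 3) arrays of matched Cα coordinates in Å.
    """
    P = np.asarray(model, dtype=float)
    Q = np.asarray(reference, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) coordinate arrays of equal length")
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)          # covariance matrix H = P^T Q
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # optimal rotation of the model
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


reference = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 3.8]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
model = reference @ rot_z.T + np.array([5.0, -2.0, 1.0])  # rotated and shifted copy
print(round(ca_rmsd(model, reference), 6))  # 0.0
```

A rotated and translated copy of the reference gives an RMSD of zero, which is the point of superimposing first: only differences in internal geometry are counted.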

DISTANCE INFORMATION USING NMR
Nuclei in proteins emit RF radiation, which is measured in the form of chemical shifts. The primary source of distance information between protons is the nuclear Overhauser effect (NOE). Steps: experiment (laborious), chemical shift assignment (automated), peak assignment (nontrivial), and structure determination (partially automated).
• Peak coordinates (H / HN / N): 1.235 / 9.738 / 130.97
• Protons with consistent chemical shifts:
  • 43 VAL HG1: 1.256
  • 8 ILE HN: 9.748 / 130.95
  • 59 LEU HB3: 1.242
Bayesian estimation of contact probabilities:
Contact | Prior | Posterior | Distance
43 VAL HG1 - 8 ILE HN | 0.038 | 0.75 | 4.6 Å
59 LEU HB3 - 8 ILE HN | 0.002 | 0.05 | 8.0 Å
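The Bayesian estimation step can be sketched as prior times likelihood, normalised over the candidate assignments of one peak. The sketch assumes the prior is a contact probability (for example from simulated structures), the likelihood is a Gaussian match between the peak coordinates and each candidate's chemical shifts with hypothetical tolerances, and the peak dimensions are ordered H, HN, N. It uses the slide's peak and candidate shifts, but the slide's likelihood model is not given, so the resulting numbers will not match the posteriors shown there.

```python
import math
from typing import Dict, Sequence, Tuple


def shift_likelihood(peak: Sequence[float], shifts: Sequence[float],
                     tolerances: Sequence[float]) -> float:
    """Gaussian likelihood that a candidate proton pair's chemical shifts produced the peak."""
    return math.prod(math.exp(-0.5 * ((obs - cand) / tol) ** 2)
                     for obs, cand, tol in zip(peak, shifts, tolerances))


def peak_posteriors(peak: Sequence[float],
                    candidates: Dict[str, Tuple[float, Sequence[float]]],
                    tolerances: Sequence[float],
                    noise_weight: float = 0.0) -> Dict[str, float]:
    """Posterior probability that each candidate contact explains the peak.

    candidates maps a label to (prior contact probability, shifts per peak dimension).
    noise_weight is the unnormalised weight of "the peak is noise or unexplained".
    """
    weights = {label: prior * shift_likelihood(peak, shifts, tolerances)
               for label, (prior, shifts) in candidates.items()}
    total = sum(weights.values()) + noise_weight
    return {label: w / total for label, w in weights.items()}


peak = (1.235, 9.738, 130.97)  # assumed order: H, HN, N (ppm)
candidates = {
    "43 VAL HG1 - 8 ILE HN": (0.038, (1.256, 9.748, 130.95)),
    "59 LEU HB3 - 8 ILE HN": (0.002, (1.242, 9.748, 130.95)),
}
post = peak_posteriors(peak, candidates, tolerances=(0.02, 0.02, 0.3))
print(post)
```

The optional `noise_weight` leaves some probability with "noise or unassigned", so the posteriors of the listed candidates need not sum to 1 (on the slide they sum to 0.8).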

STRUCTURES USING COMPUTATION AND EXPERIMENT
[Figure panels: PROTINFO NMR structure for mjnop, 3.5 Å Cα RMSD for 50 residues (required manual interpretation for several months); PROTINFO NMR structure for 1aye, 1.8 Å Cα RMSD for 70 residues.]
A Bayesian approach calculates the probability distribution of each NOE peak's contributions to proton-proton distances in a protein. The approach is assignment-free, fast, fully automated, and tolerant of noise, incompleteness, and ambiguity, and it enables iterative reinterpretation of the source experimental data based on simulated structures (90% complete).

DISTANCE INFORMATION USING MASS SPECTROMETRY
• Add labelled and unlabelled crosslinkers to a heterogeneous mixture of proteins.
• Enrich (LC, biotin) and acquire MS spectra.
• For each peak representing a protein with a single crosslinker: fragment, then acquire MS spectra.
• Identify peaks consistent with crosslinked fragments and obtain distance constraints.
• Repeat with different fragmentation resolution, crosslinker types, and isotope labelling.
[Figure: spectra at each MS stage, plotted as relative abundance vs mass/charge.]
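Identifying crosslinked fragments rests on matching observed m/z values to predicted ones. The sketch below computes the m/z of two peptides joined by a crosslinker and the spacing of the light/heavy doublet produced by labelled and unlabelled crosslinkers. It assumes a lysine-reactive crosslinker with the bridge mass commonly used for DSS/BS3 (138.06808 Da) and a deuterated (d12) analogue 12.07532 Da heavier; the slides do not name a specific reagent, and pairing VSKNT with KEVN is illustrative.

```python
MONO = {  # monoisotopic residue masses (Da)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

# Assumed crosslinker: bridge mass commonly used for DSS/BS3 and its d12 shift.
LINKER_LIGHT = 138.06808
ISOTOPE_SHIFT = 12.07532


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def crosslink_mz(pep1: str, pep2: str, linker_mass: float, charge: int) -> float:
    """m/z of two peptides joined by one crosslinker at the given charge state."""
    neutral = peptide_mass(pep1) + peptide_mass(pep2) + linker_mass
    return (neutral + charge * PROTON) / charge


light = crosslink_mz("VSKNT", "KEVN", LINKER_LIGHT, charge=2)
heavy = crosslink_mz("VSKNT", "KEVN", LINKER_LIGHT + ISOTOPE_SHIFT, charge=2)
print(f"{light:.4f} {heavy:.4f} {heavy - light:.4f}")  # spacing = shift / charge
```

Because the heavy and light forms differ only by the label, a genuine crosslinked species appears as a doublet separated by shift/charge on the m/z axis, which is the kind of precise shift that isotope labelling lets one look for.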

INTERPRETING MASS SPECTRA
…AKRS…LKYVT…SKL…ARKT… (4 x 3 = 12 possibilities, one true contact)
[Figure: spectra (relative abundance vs mass/charge) with candidate crosslinked-fragment peaks AKR-LK, ARK-KL, and AKRS-LKY.]
Ambiguous peaks in spectra are disambiguated (either eliminated or prioritised) using different fragmentation resolution, database preferences, and iterative reinterpretation after structure simulations.
[Figure: paired spectra (relative abundance vs mass/charge) with peaks AKR-LK and ARK-KL and a questionable peak AKR-SK?.]
Spurious peaks in spectra are eliminated using isotope labelling (look for precise shifts).
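The ambiguity and the isotope check on this slide can be sketched as follows, assuming lysine is the reactive residue in each of the four segments (which reproduces the slide's 4 x 3 = 12 ordered possibilities), each segment is an intact fragment with its own termini, and the crosslinker adds 138.06808 Da. Helper names are hypothetical, and the slide's other disambiguation strategies (fragmentation resolution, database preferences, reinterpretation after simulation) are not modelled.

```python
from itertools import permutations
from typing import List, Tuple

# Monoisotopic residue masses (Da) for the residues in the slide's example.
MONO = {"A": 71.03711, "K": 128.09496, "L": 113.08406, "R": 156.10111,
        "S": 87.03203, "T": 101.04768, "V": 99.06841, "Y": 163.06333}
WATER = 18.010565
LINKER = 138.06808  # assumed lysine-lysine crosslinker mass (DSS/BS3-like)


def fragment_mass(seq: str) -> float:
    return sum(MONO[aa] for aa in seq) + WATER


def candidate_pairs(fragments: List[str], residue: str = "K") -> List[Tuple[str, str]]:
    """Ordered pairs of distinct fragments that each carry a reactive residue."""
    reactive = [f for f in fragments if residue in f]
    return list(permutations(reactive, 2))


def matching_pairs(observed_mass: float, fragments: List[str],
                   linker_mass: float = LINKER, tol: float = 0.01) -> List[Tuple[str, str]]:
    """Candidate pairs whose crosslinked neutral mass matches an observed mass."""
    return [(a, b) for a, b in candidate_pairs(fragments)
            if abs(fragment_mass(a) + fragment_mass(b) + linker_mass - observed_mass) <= tol]


def has_isotope_partner(peaks: List[float], mz: float, shift: float,
                        charge: int, tol: float = 0.01) -> bool:
    """True if a heavy-label partner peak sits at mz + shift / charge."""
    target = mz + shift / charge
    return any(abs(p - target) <= tol for p in peaks)


fragments = ["AKRS", "LKYVT", "SKL", "ARKT"]
print(len(candidate_pairs(fragments)))  # 12 = 4 x 3
observed = fragment_mass("AKRS") + fragment_mass("SKL") + LINKER
print(matching_pairs(observed, fragments))  # both orderings of AKRS/SKL
```

The 12 ordered possibilities contain each unordered pair twice, so a mass match returns both orderings of what is chemically the same species. `has_isotope_partner` is one way to express the "look for precise shifts" test: a peak without its labelled partner is treated as spurious.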

DISTANCE INFORMATION USING FRET
This is analogous to the MS approach, but instead of peaks representing mass/charge ratios that identify two crosslinked residues (indirect distance information), we can obtain direct distance information. Express the protein in an in vitro system to ensure a single fluorophore donor/acceptor pair at two residues in the protein. Use a confocal microscopy setup to measure energy transfer for many donor/acceptor pairs. Distances, which depend on the donor/acceptor type, can be obtained for any pair of residues whose labelling does not cause loss of structure (determined by consistency across many pairs); a tangential benefit is identifying structurally important residues. This is ideal for measuring long-range distances and for large proteins.
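The conversion from measured energy transfer to distance can be sketched with the standard Förster relation E = 1 / (1 + (r/R0)^6), where R0 is the Förster radius set by the donor/acceptor type (the dependence on donor/acceptor type noted above). Efficiency is estimated here from donor intensity with and without acceptor, E = 1 - F_DA/F_D. The R0 value and intensities in the example are hypothetical.

```python
def fret_efficiency(donor_with_acceptor: float, donor_alone: float) -> float:
    """Transfer efficiency from donor intensities: E = 1 - F_DA / F_D."""
    return 1.0 - donor_with_acceptor / donor_alone


def fret_distance(efficiency: float, r0: float) -> float:
    """Donor-acceptor distance from E = 1 / (1 + (r / R0)**6), solved for r."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


R0 = 50.0  # hypothetical Förster radius (Å) for an assumed donor/acceptor pair
E = fret_efficiency(donor_with_acceptor=30.0, donor_alone=60.0)
print(E, fret_distance(E, R0))  # 0.5 50.0
```

Because r depends on the sixth root of (1/E - 1), distances are most precise near r ≈ R0 and degrade quickly as E approaches 0 or 1, which is one reason to choose donor/acceptor types whose R0 suits the distance range of interest.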
