Comparative modeling with MODELLER
http://salilab.org/modeller/
Ben Webb, Andrej Sali Lab, UC San Francisco
Maya Topf, Birkbeck College, London
Comparative modeling overview
• Why build comparative models?
  • Many more sequences are available than structures (millions vs. tens of thousands)
  • Many applications (e.g. determination of function) rely on structural information
  • Structure is often more conserved than sequence, since evolution tends to preserve function
  • Coverage is already broad: in 2005, 900K of 1.7M sequences were modeled
Comparative modeling overview
• How does it work?
  • Extract information from known structures (one or more templates), and use it to build the structure for the ‘target’ sequence
  • Should also consider information from other sources: physical force fields, statistics (e.g. PDB mining)
• Classes of methods for comparative modeling
  • Assembly of rigid bodies (core, loops, sidechains)
  • Segment matching
  • Satisfaction of spatial restraints
Comparative modeling by satisfaction of spatial restraints - MODELLER
A. Šali & T. Blundell. J. Mol. Biol. 234, 779, 1993.
J.P. Overington & A. Šali. Prot. Sci. 3, 1582, 1994.
A. Fiser, R. Do & A. Šali. Prot. Sci. 9, 1753, 2000.
1. Align sequence with structures
• First, must determine the template structures
  • Simplistically, try to align the target sequence against every known structure’s sequence
  • In practice, this is too slow, so heuristics are used (e.g. BLAST)
  • Profile or HMM searches are generally more sensitive in difficult cases (e.g. Modeller’s profile.build method, or PSI-BLAST)
  • Could also use threading or other web servers
• Alignment to templates generally uses global dynamic programming
  • Sequence-sequence: relies purely on a matrix of observed residue-residue mutation probabilities (‘align’)
  • Sequence-structure: gap insertion is penalized within secondary structure (helices etc.) (‘align2d’)
  • Other features and/or user-defined (‘salign’) or use an external program
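The global dynamic programming idea mentioned on this slide can be sketched in a few lines of plain Python. This is a generic Needleman-Wunsch illustration, not MODELLER's ‘align’ implementation: the function name `global_align`, the match/mismatch scores (standing in for a real substitution matrix) and the linear gap penalty are all assumptions made for the example.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Scores are illustrative assumptions; a real aligner would use a
    residue-residue substitution matrix instead of match/mismatch.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(global_align("ACGT", "AGT"))
```

A sequence-structure variant in the spirit of ‘align2d’ would make the gap penalty depend on position (for example, larger inside helices); that refinement is left out of this sketch.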
2. Extract spatial restraints
• Spatial restraints incorporate homology information, statistical preferences, and physical knowledge
  • Template Cα-Cα internal distances
  • Backbone dihedrals (φ/ψ)
  • Sidechain dihedrals given residue type of both target and template
  • Force field stereochemistry (bond, angle, dihedral)
  • Statistical potentials
  • Other experimental constraints
  • etc.
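As a rough illustration of the first restraint type, the sketch below transfers template Cα-Cα distances onto the aligned target positions. The data layout, the function name `ca_distance_restraints` and the fixed `sigma` are assumptions made for the example; they are not how MODELLER stores or weights its restraints.

```python
import math
from itertools import combinations


def ca_distance_restraints(template_ca, aligned_pairs, sigma=1.0):
    """Derive target distance restraints from template C-alpha coordinates.

    template_ca   : dict mapping template residue index -> (x, y, z)
    aligned_pairs : list of (target_index, template_index) from an alignment
    sigma         : assumed restraint width in Angstroms (hypothetical constant)

    Returns a list of (target_i, target_j, mean_distance, sigma) tuples.
    """
    restraints = []
    for (t1, s1), (t2, s2) in combinations(aligned_pairs, 2):
        d = math.dist(template_ca[s1], template_ca[s2])
        restraints.append((t1, t2, d, sigma))
    return restraints


if __name__ == "__main__":
    template = {0: (0.0, 0.0, 0.0), 1: (3.0, 4.0, 0.0), 2: (3.0, 4.0, 12.0)}
    pairs = [(10, 0), (11, 1), (12, 2)]  # target residue -> template residue
    for r in ca_distance_restraints(template, pairs):
        print(r)
```

Only aligned positions receive restraints here, which is why regions without a template need separate treatment.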
3. Satisfy spatial restraints
• All information is combined into a single objective function
  • Restraints and statistics are converted to an “energy” by taking the negative log
  • Force field (CHARMM 22) simply added in
• Function is optimized by conjugate gradients and simulated annealing molecular dynamics, starting from the target sequence threaded onto template structure(s)
• Multiple models are generally recommended; the ‘best’ model or cluster of models is chosen by simply taking the lowest objective function score, or by using a model assessment method such as Modeller’s own DOPE or GA341, fit to EM density, or external programs such as PROSA or DFIRE
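The negative-log conversion and the "lowest objective function" selection can be illustrated with a toy example. It assumes every restraint is a single Gaussian probability density over one measured feature; the names `gaussian_restraint_energy`, `objective` and `pick_best` are hypothetical, and no force-field term or optimizer is included.

```python
import math


def gaussian_restraint_energy(x, mean, sigma):
    """Negative log of a Gaussian pdf evaluated at x (the restraint 'energy')."""
    return 0.5 * ((x - mean) / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))


def objective(features, restraints):
    """Sum the energies of all restraints for one model.

    features   : dict feature_name -> value measured in the model
    restraints : list of (feature_name, mean, sigma)
    """
    return sum(gaussian_restraint_energy(features[name], mean, sigma)
               for name, mean, sigma in restraints)


def pick_best(models, restraints):
    """Return the name of the model with the lowest objective function value."""
    return min(models, key=lambda name: objective(models[name], restraints))


if __name__ == "__main__":
    restraints = [("d12", 5.2, 0.5)]          # one distance restraint
    models = {"m1": {"d12": 5.0}, "m2": {"d12": 7.0}}
    print(pick_best(models, restraints))
```

Because independent probabilities multiply, their negative logs add, which is what lets all restraints be combined into one sum.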
Typical errors in comparative models
[Figure: model, X-ray structure and template compared for each error type]
• Incorrect template
• Misalignment
• Region without a template
• Distortion/shifts in aligned regions
• Sidechain packing
Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.
Model Accuracy as a Function of Target-Template Sequence Identity
Sánchez, R., Šali, A. Proc Natl Acad Sci U S A. 95, pp. 13597-602 (1998).
Model accuracy
[Figure: X-ray structure vs. model for three examples]
• High accuracy: NM23, seq. id. 77%; Cα equiv 147/148, RMSD 0.41 Å. Scope for improvement: sidechains
• Medium accuracy: CRABP, seq. id. 41%; Cα equiv 122/137, RMSD 1.34 Å. Scope for improvement: sidechains, core backbone, loops
• Low accuracy: EDN, seq. id. 33%; Cα equiv 90/134, RMSD 1.17 Å. Scope for improvement: sidechains, core backbone, loops, alignment, fold assignment
Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.
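The two accuracy measures quoted on this slide, Cα RMSD and the number of equivalent Cα positions, can be computed as in the sketch below. It assumes the model and the X-ray structure are already superposed and listed residue by residue; the 3.5 Å cutoff for "equivalent" is an assumption of this example, as are the function names.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length coordinate lists.

    No superposition is performed; the inputs are assumed to be pre-aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def equivalent_positions(coords_a, coords_b, cutoff=3.5):
    """Count residue pairs whose C-alpha atoms lie within `cutoff` Angstroms."""
    return sum(1 for p, q in zip(coords_a, coords_b) if math.dist(p, q) <= cutoff)


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    xray = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 5.0)]
    print(ca_rmsd(model, xray), equivalent_positions(model, xray))
```

Reporting both numbers matters because an RMSD taken only over equivalent positions can look small even when few positions are equivalent.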
Applications of protein structure models
D. Baker & A. Sali. Science 294, 93, 2001.
Loop modeling
• Often, there are parts of the sequence which have no detectable templates (usually loops)
• “Mini folding problem”: these loops must be sampled to get improved conformations
• Database searches only complete for 4-6 residue loops
• Modeller uses conformational search with a custom energy function optimized for loop modeling (statistical potential derived from PDB)
  • Fiser/Melo protocol (‘loopmodel’)
  • Newer DOPE + GB/SA protocol (‘dope_loopmodel’)
Accuracy of loop models as a function of amount of optimization
Fitting Structural Models in cryoEM maps
• Problem: comparative models are often inaccurate.
  • Solution: use cryoEM maps to assess the models by rigid density fitting.
• Problem: the structures may exhibit conformational changes (induced fit, target-template differences).
  • Solution: use flexible fitting to refine the structures in the map.
• Problem: the resolution of the map can be too low for an unambiguous placement of a component.
  • Solution: use additional information to determine the assembly architecture.
[Figure label: ΔG refinement]
Topf & Sali. Curr Opin Struct Biol 2005.
Errors in Comparative Modeling vs. Resolution
Rigid fitting
[Figure: error types shown against a resolution scale marked at 20 Å, 10 Å and 2 Å: incorrect templates, rigid-body movements, misalignments, regions without a template, distortion and shifts of aligned regions, sidechain packing]
Rigid Density Fitting with MODELLER/Mod-EM
Rigid fitting
• LE - Local exhaustive search (rotations only or rotations + translations)
• MC - Monte Carlo in translation, with exhaustive rotation
• SMC - Scanning of the map to find regions with high CC; LE or MC search
[Figure: schematics of the LE, MC and SMC searches, showing probe and native positions with coordinates r and θ]
Topf, Baker, John, Chiu & Sali. J Struct Biol 2005.
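To make the CC-scored exhaustive search concrete, here is a one-dimensional toy version: a probe density is slid along a target map and the offset with the highest cross-correlation is kept. The normalized form of the CC used here (no mean subtraction), the 1D grids and the function names are assumptions for illustration only; Mod-EM itself works on 3D maps and also searches rotations.

```python
import math


def cross_correlation(map_a, map_b):
    """Normalized cross-correlation of two equal-length density grids.

    Uses sum(a*b) / sqrt(sum(a^2) * sum(b^2)); returns 0.0 for an empty grid.
    """
    if len(map_a) != len(map_b):
        raise ValueError("grids must have the same number of voxels")
    num = sum(a * b for a, b in zip(map_a, map_b))
    den = math.sqrt(sum(a * a for a in map_a) * sum(b * b for b in map_b))
    return num / den if den > 0 else 0.0


def best_translation(target, probe):
    """Try every offset of `probe` inside `target`; return (offset, cc) of the best."""
    best = None
    for offset in range(len(target) - len(probe) + 1):
        window = target[offset:offset + len(probe)]
        cc = cross_correlation(window, probe)
        if best is None or cc > best[1]:
            best = (offset, cc)
    return best


if __name__ == "__main__":
    target_map = [0, 0, 1, 3, 1, 0, 0, 0]   # toy 1D "cryoEM map"
    probe_map = [1, 3, 1]                   # toy density of the model
    print(best_translation(target_map, probe_map))
```

An exhaustive scan like this scales poorly with search volume, which is the motivation for Monte Carlo sampling or pre-scanning the map for high-CC regions.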
Quality of fit vs. quality of model
Rigid model-fitting
[Figure: fitting score (CC) plotted against structural overlap; R² = 0.6-0.7]
Ranking: native (1dxt): 1; best model: 2; template (1hbg): 132 (8 Å), 139 (12 Å)
Topf, Baker, John, Chiu & Sali. J Struct Biol 2005.
cryoEM Density Map Selects an Accurate Model
Rigid fitting
1cid:2rhe, 12% seq. identity, 10 Å resolution
[Figure: best-fitting model, most accurate model, native structure and template, each labeled with its structural overlap and its rank by CC. Values as extracted, in slide order: ranks (1), (101), (11), (0); structural overlaps 1.00, 0.69, 0.55, 0.73]
Iterative Alignment, Model Building and CC-based Assessment
Rigid fitting
8i1b:4fgf, 14% seq. identity, 10 Å resolution
• Initial model (A): Cα RMSD = 9.1 Å, structural overlap = 36%
• Final model (B): Cα RMSD = 5.2 Å, structural overlap = 62%
[Figure: X-ray structure, initial model (A) and final model (B)]
Topf, Baker, Marti-Renom, Chiu & Sali. J. Mol. Biol., 2006.
Modeling, Rigid and Flexible Fitting Protocol
Flexible fitting
• Density-based real-space refinement while maintaining correct stereochemistry
• 33% sequence identity; RMSD: 5.4 Å -> 3.5 Å
[Figure: initial model, final model and X-ray structure]
Arranging Components in a CryoEM Map of their Assembly
Assembly architecture
with Keren Lasker & Haim Wolfson
• Simultaneous optimization of multi-component assembly
• 20 Å resolution
[Figure: single component fitting result vs. multi-component optimization result]
Programs, servers and databases
• LS-SNP (Web Server), http://salilab.org/LS-SNP/: Predicts functional impact of residue substitution
• PIBASE (Database), http://salilab.org/pibase/: Contains structurally defined protein interfaces
• MODLOOP (Web Server), http://salilab.org/modloop/: Models loops in protein structures
• CCPR, Center for Computational Proteomics Research, http://www.ccpr.ucsf.edu
• MODBASE (Database), http://salilab.org/modbase/: Fold assignments, alignments, models, model assessments for all sequences related to a known structure
• DBALI (Database), http://salilab.org/DBAli/: Contains a comprehensive set of pairwise and multiple structure-based alignments
• MODWEB (Web Server), http://salilab.org/modweb/: Provides a web interface to MODPIPE
• MODELLER (Program), http://salilab.org/modeller/: Implements most operations in comparative modeling
• EVA (Web Server), http://salilab.org/eva/: Evaluates and ranks web servers for protein structure prediction
• ICEDB (Database/LIMS), http://nysgxrc.org: Tracks targets for structural genomics by NYSGXRC
• MODPIPE (Program): Automatically calculates comparative models of many protein sequences
• LIGBASE (Database): Ligand binding sites and inheritance (accessible through MODBASE)
External Resources: PDB, Uniprot, GENBANK, NR, PIR, INTERPRO, Kinase Resource, UCSC Genome Browser, CHIMERA, Pfam, SCOP, CATH
Useful resources • http://salilab.org/bioinformatics_resources.shtml
For further examples… • http://salilab.org/modeller/tutorial/
References
• Protein Structure Prediction:
  • Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.
  • Baker & Sali. Science 294, 93-96, 2001.
• Comparative Modeling:
  • Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.
  • Marti-Renom et al. Current Protocols in Protein Science 1, 2.9.1-2.9.22, 2002.
  • Shen & Sali. Protein Science 15, 2507-2524, 2006.
  • Eswar et al. Current Protocols in Bioinformatics, Supplement 15, 5.6.1-5.6.30, 2006.
  • Madhusudhan et al. The Proteomics Protocols Handbook. Humana Press Inc., 831-860, 2005.
• MODELLER:
  • Sali & Blundell. J. Mol. Biol. 234, 779-815, 1993.
• Density fitting:
  • Topf et al. J Struct Biol 2005.
  • Topf et al. J Mol Biol 2006.
  • Topf & Sali. Curr Opin Struct Biol 2005.
Acknowledgements
• Tel Aviv University: Haim Wolfson, Keren Lasker
• Baylor College: Wah Chiu, Matt Baker, Yao Cong, Irina Serysheva, Mike Schmid
• UCSF, Andrej Sali Lab: Narayanan Eswar, Ursula Pieper, M. S. Madhusudhan, Marc Marti-Renom, Roberto Sanchez (MSSM), Min-yi Shen, Andras Fiser (AECOM), David Eramian, Mark Peterson, Francisco Melo (Catholic U.), Ash Stuart (Ramapo Coll.), Eric Feyfant (GI), Valentin Ilyin (NE), Frank Alber, Bino John (Pittsburgh U.), Fred Davis, Andrea Rossi
• Tom Goddard (Chimera group)