
Structure and Motion

This research focuses on determining, classifying, and predicting 3D protein structures and modeling molecular energy through simulation. It also explores techniques for matching and scoring structural motifs, and uses adaptive bounding volume hierarchies and chain trees for efficient computation.


Presentation Transcript


  1. Structure and Motion • Jean-Claude Latombe, Computer Science Department, Stanford University • NSF-ITR Meeting, November 14, 2002

  2. Stanford’s Participants • PIs: L. Guibas, J.C. Latombe, M. Levitt • Research Associate: P. Koehl • Postdocs: F. Schwarzer, A. Zomorodian • Graduate students: S. Apaydin (EE), S. Ieong (CS), R. Kolodny (CS), I. Lotan (CS), A. Nguyen (Sc. Comp.), D. Russel (CS), R. Singh (CS), C. Varma (CS) • Undergraduate students: J. Greenberg (CS), E. Berger (CS) • Collaborating faculty: A. Brunger (Molecular & Cellular Physiology), D. Brutlag (Biochemistry), D. Donoho (Statistics), J. Milgram (Math), V. Pande (Chemistry)

  3. Problem Domains • Biological functions derive from the structures (shapes) that molecules achieve through motion → Determination, classification, and prediction of 3D protein structures → Modeling of molecular energy and simulation of folding and binding motion

  4. What’s New/Interesting for Computer Science? • Massive amount of experimental data • Importance of similarities • Multiple representations of structure • Continuous energy functions • Many objects forming deformable chains • Many degrees of freedom • Ensemble properties of pathways

  5. Importance of similarities • Segmentation/matching/scoring techniques • E.g.: Libraries of protein fragments [Kolodny, Koehl, Guibas, Levitt, JMB (2002)] [Figure labels: data set; clustered data; small library]

  6. Approximations (1tim) [Figure panels: real protein; complexity 2.26 (50 fragments of length 7), 2.7805 Å cRMS; complexity 10 (100 fragments of length 5), 0.9146 Å cRMS]

  7. Alignment of Structural Motifs [Singh and Saha; Kolodny and Linial] • Problem: Determine whether two structures share common motifs • Two (labelled) structures in R^3: A = {a_1, a_2, …, a_n}, B = {b_1, b_2, …, b_m} • Find subsequences s_a and s_b such that the substructures {a_{s_a(1)}, a_{s_a(2)}, …, a_{s_a(l)}} and {b_{s_b(1)}, b_{s_b(2)}, …, b_{s_b(l)}} are similar • Twofold problem: alignment and correspondence • Score → Approximation → Complexity

  8. Iterative Closest Point (Besl-McKay) for alignment → Score: RMSD [R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan. 2003.]
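Slide 8 names two ingredients: closest-point correspondence and an RMSD score. The sketch below illustrates only those two pieces on toy coordinates. It is not the authors' implementation: the rigid-transform fit that a full Iterative Closest Point loop repeats after each correspondence step is left out, and the names `closest_point_pairs`, `rmsd`, `motif_a`, and `motif_b` are made up for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def closest_point_pairs(a: List[Point], b: List[Point]) -> List[Tuple[int, int]]:
    """Correspondence step: pair each point of `a` with its closest point in `b`."""
    pairs = []
    for i, p in enumerate(a):
        j = min(range(len(b)), key=lambda k: math.dist(p, b[k]))
        pairs.append((i, j))
    return pairs


def rmsd(a: List[Point], b: List[Point], pairs: List[Tuple[int, int]]) -> float:
    """Score step: root-mean-square distance over the matched pairs."""
    total = sum(math.dist(a[i], b[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))


# Toy data (hypothetical): motif_b is motif_a shifted by 1 along z and reordered.
motif_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
motif_b = [(2.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

pairs = closest_point_pairs(motif_a, motif_b)
score = rmsd(motif_a, motif_b, pairs)
print(pairs, score)
```

In a full ICP loop, a best-fit rotation and translation would be estimated from `pairs`, applied to one point set, and the two steps repeated until the score stops improving.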

  9. [Singh and Saha, Pacific Symp. on Biocomputing, Jan. 2003] [Figure panels: trypsin; trypsin active site]

  10. Trypsin active site against 42 trypsin-like proteins [Singh and Saha, Pacific Symp. on Biocomputing, Jan. 2003]

  11. Multiple representations of structure • ProShape software [Koehl, Levitt (Stanford), Edelsbrunner (Duke)]

  12. Statistical potentials for proteins based on alpha complex [Guibas, Koehl, Zomorodian] • Decoys generated using “physical” potentials • Select best decoys using distance information

  13. Continuous energy function • Many objects in deformable chains • Many pairs of objects, but relatively few are close enough to interact → Data structures that capture proximity but undergo small or rare changes • During motion simulation: - detect steric clashes (self-collisions) - find pairs of atoms closer than a cutoff - find which energy terms can be reused

  14. Other application domains: • Modular reconfigurable robots • Reconstructive surgery

  15. Fixed bounding-volume (BV) hierarchies don’t work • Instead, exploit what doesn’t change: chain topology • Adaptive BV hierarchies [Guibas, Nguyen, Russel, Zhang] [Lotan, Schwarzer, Halperin, Latombe] (SoCG’02)

  16. Wrapped bounding sphere hierarchies (WBSH) [Guibas, Nguyen, Russel, Zhang] (SoCG 2002) • WBSH undergoes a small number of changes • Self-collision: O(n log n) in R^2; O(n^(2-2/d)) in R^d, d ≥ 3

  17. ChainTrees [Lotan, Schwarzer, Halperin, Latombe] (SoCG’02)

  18. ChainTrees [Lotan, Schwarzer, Halperin, Latombe] (SoCG’02) • Assumption: Few degrees of freedom change at each motion step (e.g., Monte Carlo simulation) • Updating: [bound not preserved in transcript] • Finding interacting pairs: [bound not preserved in transcript] (in practice, sublinear)

  19. ChainTrees: Application to MC simulation (comparison to grid method) [Figure panels: m = 1; m = 5. Series labels: (68), (144), (374), (755)]
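The slides do not describe the grid method used as the baseline on slide 19. A common version, assumed here, hashes atoms into cubic cells whose side equals the cutoff, so that each atom only needs to be tested against atoms in its own cell and the 26 adjacent cells. The function name `pairs_within_cutoff` and the coordinates are illustrative; this is the baseline technique, not the ChainTree.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def pairs_within_cutoff(atoms: List[Point], cutoff: float) -> List[Tuple[int, int]]:
    """Return all index pairs (i, j), i < j, whose distance is below `cutoff`."""
    # Hash every atom into a cubic cell of side `cutoff`.
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for idx, (x, y, z) in enumerate(atoms):
        cell = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        grid[cell].append(idx)

    found = []
    for (cx, cy, cz), members in list(grid.items()):
        # Two atoms closer than `cutoff` must lie in the same or adjacent cells.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbor = grid.get((cx + dx, cy + dy, cz + dz), [])
            for i in members:
                for j in neighbor:
                    # i < j reports each unordered pair exactly once.
                    if i < j and math.dist(atoms[i], atoms[j]) < cutoff:
                        found.append((i, j))
    return sorted(found)


# Toy coordinates (hypothetical).
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.6, 0.0, 0.0), (5.0, 5.0, 5.0)]
close = pairs_within_cutoff(atoms, cutoff=1.5)
print(close)
```

The grid is rebuilt from scratch at every step at a cost linear in the number of atoms, whereas slide 18 describes ChainTree pair finding as sublinear in practice when few degrees of freedom change per step.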

  20. Many degrees of freedom • Tools to explore high-dimensional conformational (structure) spaces: - Structure sampling [Kolodny, Levitt] - Finding nearest neighbors [Lotan, Schwarzer]

  21. Sampling structures by combining fragments [Kolodny, Levitt] • Library of protein fragments → Discrete set of candidate structures [Figure labels: fragments a, b, c, d; combinations cab, bbc]
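To see why a fragment library yields a discrete candidate set, the sketch below enumerates label strings such as "cab" and "bbc" from a four-letter library. It works on labels only. Turning a string into 3D coordinates by concatenating fragment geometry is not shown, and `candidate_structures` and `n_slots` are names invented for this illustration.

```python
from itertools import product
from typing import Iterator, Sequence


def candidate_structures(library: Sequence[str], n_slots: int) -> Iterator[str]:
    """Yield every string of `n_slots` fragment labels drawn from `library`."""
    for combo in product(library, repeat=n_slots):
        yield "".join(combo)


library = ["a", "b", "c", "d"]  # fragment labels as they appear on the slide
candidates = list(candidate_structures(library, n_slots=3))
print(len(candidates))  # 4**3 = 64 candidate label strings
```

The set grows as (library size)^(number of slots), so a small library keeps the search space enumerable for short chains but exhaustive enumeration quickly becomes impractical as chains lengthen.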

  22. Nearest neighbors in high-dimensional space [Lotan, Schwarzer] • Find the k nearest neighbors of a given protein conformation in a set of n conformations (cRMS, dRMS) • Idea: Cut the backbone into m equal subsequences [Figure labels: a_0 through a_m]
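A minimal sketch of the subsequence idea, under two assumptions the slide does not spell out: each of the m subsequences is replaced by the centroid of its atoms, and neighbors are ranked by dRMS on the reduced conformations. The search here is brute force rather than the kd-tree mentioned on the next slide, and all function names are illustrative.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float, float]


def average_subsequences(conf: List[Point], m: int) -> List[Point]:
    """Cut the backbone into m equal runs and replace each run by its centroid."""
    if len(conf) % m != 0:
        raise ValueError("backbone length must be divisible by m")
    size = len(conf) // m
    reduced = []
    for s in range(m):
        chunk = conf[s * size:(s + 1) * size]
        reduced.append(tuple(sum(p[k] for p in chunk) / size for k in range(3)))
    return reduced


def drms(p: List[Point], q: List[Point]) -> float:
    """Distance RMS: compares the intra-structure distance matrices of p and q."""
    pairs = list(combinations(range(len(p)), 2))
    total = sum((math.dist(p[i], p[j]) - math.dist(q[i], q[j])) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))


def k_nearest(query: List[Point], database: List[List[Point]], k: int, m: int) -> List[int]:
    """Indices of the k database conformations closest to `query` in reduced dRMS."""
    rq = average_subsequences(query, m)
    scored = [(drms(rq, average_subsequences(c, m)), idx) for idx, c in enumerate(database)]
    return [idx for _, idx in sorted(scored)[:k]]


def chain(spacing: float) -> List[Point]:
    """Toy straight 4-atom backbone along x (hypothetical test data)."""
    return [(i * spacing, 0.0, 0.0) for i in range(4)]


database = [chain(2.0), chain(1.0), chain(1.5)]
nearest = k_nearest(chain(1.0), database, k=2, m=2)
print(nearest)
```

Because dRMS needs no superposition, the reduced conformations can be compared directly. The price of the reduction is that some reported neighbors may not be true neighbors in the full representation.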

  23. Nearest neighbors in high-dimensional space [Lotan and Schwarzer] • 100,000 decoys of 1CTF (Park-Levitt set) • Computation of the 100 NNs of each conformation • ~80% of computed NNs are true NNs • kd-tree software from the ANN library (U. Maryland)

  24. Ensemble properties of pathways → Stochastic nature of molecular motion requires characterizing average properties of many pathways • Probabilistic conformational roadmaps • Applications to protein folding and ligand-protein binding [Apaydin, Brutlag, Guestrin, Hsu, Latombe]

  25. Example: Probability of Folding pfold [Du et al. ’98] • “We stress that we do not suggest using pfold as a transition coordinate for practical purposes as it is very computationally intensive.” Du, Pande, Grosberg, Tanaka, and Shakhnovich, “On the Transition Coordinate for Protein Folding,” Journal of Chemical Physics (1998). [Figure labels: HIV integrase; unfolded set; folded set; pfold; 1 - pfold]

  26. Probabilistic Roadmap [Apaydin, Brutlag, Hsu, Guestrin, Latombe] (RECOMB’02, ECCB’02) • Idea: Capture the stochastic nature of molecular motion by a network of randomly selected conformations and by assigning probabilities to edges [Figure labels: v_i, P_ij, v_j]
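The slide does not say how the edge probabilities are assigned. One plausible choice, assumed here and not taken from the slides, is a Metropolis-style rule: a move to a neighbor is proposed uniformly, accepted with probability min(1, exp(-ΔE/kT)), and the leftover probability becomes a self-loop. The names `transition_probabilities`, `energy`, `neighbors`, and `kT` are all illustrative.

```python
import math
from typing import Dict, List


def transition_probabilities(
    energy: List[float], neighbors: Dict[int, List[int]], kT: float
) -> List[Dict[int, float]]:
    """Build one sparse row of edge probabilities per roadmap node.

    Assumes `neighbors` has a key for every node index 0..n-1.
    """
    rows = []
    for i in range(len(energy)):
        nbrs = neighbors[i]
        row: Dict[int, float] = {}
        for j in nbrs:
            delta = energy[j] - energy[i]
            # Downhill moves are always accepted; uphill moves are penalized.
            accept = 1.0 if delta <= 0 else math.exp(-delta / kT)
            row[j] = accept / len(nbrs)
        # The self-loop absorbs rejected proposals so that the row sums to 1.
        row[i] = 1.0 - sum(row.values())
        rows.append(row)
    return rows


# Two-node toy roadmap (hypothetical energies in units of kT).
P = transition_probabilities(energy=[0.0, 1.0], neighbors={0: [1], 1: [0]}, kT=1.0)
print(P)
```

Each row is a probability distribution over the node's neighbors plus itself, which is the form the per-node linear equations on the next slide rely on.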

  27. Probabilistic Roadmap • One linear equation per node • Solution gives pfold for all nodes • No explicit simulation run • All pathways are taken into account • Sparse linear system • Let f_i = pfold(i). After one step: f_i = P_ii f_i + P_ij f_j + P_ik f_k + P_il f_l + P_im f_m [Figure labels: U: unfolded set; F: folded set; nodes i, j, k, l, m; edge probabilities P_ii, P_ij, P_ik, P_il, P_im; two terms annotated “= 1”]
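The one-step equation on slide 27 gives one linear equation per node, with f fixed to 1 on the folded set and 0 on the unfolded set. The sketch below solves that system by Gauss-Seidel sweeps on a toy four-node roadmap. The iterative solver, the function name `pfold`, and the toy probabilities are assumptions made for illustration; the slide only states that the system is sparse and linear, so a sparse direct solver would do equally well.

```python
from typing import Dict, List, Set


def pfold(
    P: List[Dict[int, float]],
    folded: Set[int],
    unfolded: Set[int],
    tol: float = 1e-12,
    max_sweeps: int = 10_000,
) -> List[float]:
    """Solve f_i = sum_j P_ij f_j with f = 1 on `folded` and f = 0 on `unfolded`.

    P[i] is a sparse row {j: P_ij}. Assumes no interior node has P_ii = 1.
    """
    n = len(P)
    f = [1.0 if i in folded else 0.0 for i in range(n)]
    interior = [i for i in range(n) if i not in folded and i not in unfolded]
    for _ in range(max_sweeps):
        change = 0.0
        for i in interior:
            # Move the self-loop term to the left-hand side: (1 - P_ii) f_i = sum over j != i.
            self_loop = P[i].get(i, 0.0)
            new = sum(p * f[j] for j, p in P[i].items() if j != i) / (1.0 - self_loop)
            change = max(change, abs(new - f[i]))
            f[i] = new
        if change < tol:
            break
    return f


# Toy roadmap (hypothetical): node 0 is unfolded, node 3 is folded.
P = [
    {},                        # row unused: boundary node in U
    {1: 0.2, 0: 0.4, 2: 0.4},  # interior node with a self-loop
    {1: 0.5, 3: 0.5},          # interior node
    {},                        # row unused: boundary node in F
]
f = pfold(P, folded={3}, unfolded={0})
print(f)  # close to [0, 1/3, 2/3, 1]
```

No trajectory is simulated: every pathway through the roadmap is accounted for implicitly by the linear system, which is the point the slide's bullets make.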

  28. Probabilistic Roadmap: Correlation with MC Approach • 1ROP (repressor of primer) • 2 α helices • 6 DOF

  29. Probabilistic Roadmap: Computation Times (1ROP) • Monte Carlo: over 10^6 energy computations; over 11 days of computer time; 49 conformations • Roadmap: ~15,000 energy computations; 1 to 1.5 hours of computer time; 5000 conformations • ~4 orders of magnitude speedup!
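The slide does not state the basis for the speedup figure. One reading that reproduces it, assumed here, is computer time per conformation for which pfold is obtained, using the upper end of the roadmap time range:

```latex
\begin{align*}
t_{\mathrm{MC}} &\approx \frac{11 \times 86{,}400\ \mathrm{s}}{49}
  \approx 1.9 \times 10^{4}\ \mathrm{s\ per\ conformation} \\
t_{\mathrm{roadmap}} &\approx \frac{1.5 \times 3{,}600\ \mathrm{s}}{5000}
  \approx 1.1\ \mathrm{s\ per\ conformation} \\
\frac{t_{\mathrm{MC}}}{t_{\mathrm{roadmap}}} &\approx 1.8 \times 10^{4}
\end{align*}
```

With the lower end of the range (1 hour) the ratio is about 2.7 × 10^4. Since the Monte Carlo time is quoted as "over 11 days", both values are lower bounds and are consistent with roughly four orders of magnitude.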

  30. Summary [Diagram labels: Interpretation of electron density maps; Statistical potential; Library of protein fragments; Self-collision and energy maintenance; Structure alignment; ProShape software; Tools for high-dimensional spaces; Probabilistic roadmaps; Biology; Structure determination; Modeling; Shape representation; Hierarchies; Algorithms; Deformation; Motion planning; Shape organization; Software; Alpha shapes]

  31. Future Work • Perform more substantial experiments. E.g., more realistic potentials in ChainTree and probabilistic roadmaps • Extend tools to solve more relevant problems. E.g., encode Molecular Dynamics into probabilistic roadmaps • Combine results. E.g., use library of fragments to sample probabilistic roadmaps • Develop new algorithms/data structures. E.g., sparse spanners to capture proximity information

  32. Our Future: The BioX – Clark Center June 2003
