This presentation describes atomistic protein folding simulations run on a worldwide distributed computing platform, Folding@Home. It outlines the motivation for studying protein dynamics and the limits of traditional computational techniques. The talk introduces ensemble dynamics as a method for efficient simulation that benefits from large-scale parallelization. Simulating folding events in full atomic detail is intended to improve the understanding of protein misfolding in disease and to inform work on nanotechnology.
Atomistic Protein Folding Simulations on the Submillisecond Timescale Using Worldwide Distributed Computing Qing Lu CMSC 838 Presentation
Overview • Overview of talk • Motivation • Challenge • Methods • Ensemble Dynamics • Folding@Home • Evaluation • Observations CMSC 838T – Presentation
Motivation • Atomistic simulation of protein folding • understand the dynamics of folding • simulate folding in full atomic detail on realistic timescales • large-scale parallelization methods • Benefits • protein folding & disease • proteins self-assemble to function • misfolded proteins are linked to diseases • nanotechnology • nanomachines • self-assembly on the nanoscale
Challenge • Difficulties • limited by current computational techniques • the fastest proteins fold in microseconds • one CPU: about 1 ns/day, roughly 30 years to reach folding timescales • 10,000-fold computational gap • 1,000 CPUs: 1 microsecond/day with perfect scaling • traditional parallelization scheme • hard to scale to a large number of processors • requires extremely fast communication • complexity of coordination • expensive supercomputers • cost • time-sharing
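The numbers on this slide can be checked with a few lines of arithmetic. The sketch below assumes a target of 10 microseconds of simulated time (an assumption chosen because it matches the 10,000-fold gap and the "30 years" figure; the slide does not state the target) and assumes perfect parallel scaling, which the slide itself says traditional schemes do not achieve.

```python
# Back-of-the-envelope check of the slide's figures.
NS_PER_DAY_ONE_CPU = 1.0   # slide: one CPU simulates about 1 ns per day
TARGET_NS = 10_000.0       # ASSUMPTION: 10 microseconds of simulated folding time

# Time for a single CPU to reach the assumed target
days_one_cpu = TARGET_NS / NS_PER_DAY_ONE_CPU
years_one_cpu = days_one_cpu / 365.25

# Ratio between the target and one day of single-CPU simulation
gap = TARGET_NS / NS_PER_DAY_ONE_CPU


def ns_per_day(cpus, efficiency=1.0):
    """Aggregate simulated ns/day; efficiency=1.0 means perfect scaling (an idealization)."""
    return cpus * NS_PER_DAY_ONE_CPU * efficiency


print(f"one CPU: {years_one_cpu:.1f} years for {TARGET_NS / 1000:.0f} microseconds")
print(f"computational gap: {gap:,.0f}-fold")
print(f"1,000 CPUs: {ns_per_day(1000) / 1000:.1f} microsecond(s) per day")
```

Lowering `efficiency` below 1.0 shows how quickly communication and coordination overhead erodes the ideal figure of 1 microsecond per day.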
Method • ensemble dynamics • a new simulation algorithm • parallel simulation • Folding@Home • heterogeneous network over the Internet • large-scale distributed platform
Simulation of Dynamics • free energy barrier • progress from one state to another: a transition • thermal fluctuations push the system over the free energy barrier • previous approaches: sampling • may get stuck in metastable free energy minima • sampling is computationally expensive
Ensemble Dynamics • application scenario • waiting time for transitions dominates total time • protein folding • transition: free energy barrier crossing • coupled simulations: transition coupling • Algorithm • run M independent simulations from the same initial condition • wait for the first simulation to cross the free energy barrier • the first crossing takes on average M times less than a single simulation's average crossing time • restart all M simulations from the new location after the transition • Near-linear speedup in the number of processors • exponential kinetics: f(t) = 1 - exp(-k t) • if k t is small, f(t) ≈ k t • M simulations yield about M f(t) ≈ M k t folding events
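The speedup claim can be illustrated numerically. The sketch below is not the Folding@Home code; it only models each simulation's barrier-crossing time as an exponential random variable with rate k (the slide's kinetic assumption) and compares the mean first-crossing time of an ensemble against a single run. The rate, ensemble size, and trial count are hypothetical values chosen for illustration.

```python
import math
import random


def fraction_folded(k, t):
    """Exponential kinetics from the slide: f(t) = 1 - exp(-k t)."""
    return 1.0 - math.exp(-k * t)


def first_crossing_time(m, k, rng):
    """Waiting time until the first of m independent runs crosses the barrier.

    Each run's crossing time is drawn from an exponential distribution with rate k.
    """
    return min(rng.expovariate(k) for _ in range(m))


def mean_first_crossing(m, k, trials, rng):
    """Average first-crossing time over many repeated ensembles."""
    return sum(first_crossing_time(m, k, rng) for _ in range(trials)) / trials


rng = random.Random(0)  # fixed seed so the sketch is repeatable
k = 1.0                 # hypothetical rate, in arbitrary inverse time units
single = mean_first_crossing(1, k, 5000, rng)
ensemble = mean_first_crossing(50, k, 5000, rng)
print(f"speedup with 50 runs: {single / ensemble:.1f}x")

# Short-time regime: when k*t is small, f(t) is close to k*t
t = 1e-3
print(fraction_folded(k, t), k * t)
```

Under the exponential assumption the minimum of M crossing times is itself exponential with rate M k, which is why the measured speedup comes out close to M.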
Limitations • barrier crossing probability • assumes exponential kinetics • correct transition detection • transition: free energy barrier crossing • detected as a large variance in energy, using a threshold • correct detection is not guaranteed • multiple possible transitions • not addressed • only the first transition is selected
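To make the detection limitation concrete, here is a minimal sketch of a variance-threshold detector. It is an illustration only, not the detector used by Folding@Home: the function name, the trailing-window scheme, the window length, and the threshold value are all assumptions.

```python
from statistics import pvariance


def detect_transition(energies, window, threshold):
    """Return the index of the first sample whose trailing window of energies
    has variance above threshold, or None if no window exceeds it.

    HYPOTHETICAL detector: window and threshold are tuning assumptions.
    """
    for end in range(window, len(energies) + 1):
        if pvariance(energies[end - window:end]) > threshold:
            return end - 1
    return None


# Toy energy trace: fluctuation around 1.0, then a jump to around 5.0
trace = [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 5.0, 5.1, 4.9, 5.0]
hit = detect_transition(trace, window=4, threshold=1.0)
print("transition flagged at sample", hit)

# A single noisy spike with no lasting change in state is flagged as well
spike = [1.0, 1.0, 1.0, 1.0, 6.0, 1.0, 1.0, 1.0]
false_hit = detect_transition(spike, window=4, threshold=1.0)
print("spike flagged at sample", false_hit)
```

The second trace shows why detection is not guaranteed to be correct: a threshold on energy variance cannot by itself distinguish a real barrier crossing from a transient fluctuation.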
Distributed Computing • Distributed simulations • M processors for each run • simulate folding in atomic detail on each processor • restart once a barrier-crossing event occurs • Implementation: Folding@Home • worldwide distributed computing over the Internet • started in October 2000 • more than 200,000 participants • 10,000 CPU-years in the first 12 months
Folding@Home
Folding@Home • client-server architecture • server assigns jobs (work units) to clients • client sends back results after computation • ~100K of data transferred between client and server • why is ensemble dynamics good for Folding@Home? • CPU-intensive jobs: a few hours, often days • connection speed: a modem is good enough • suitable for Folding@Home
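A rough estimate shows why a modem connection is sufficient. The sketch below assumes that "~100K" means about 100 kilobytes, that the modem runs at 56 kbit/s, and that a work unit takes 4 hours of computation; none of these exact values come from the slide.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Ideal transfer time, ignoring latency and protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


WORK_UNIT_BYTES = 100 * 1024  # ASSUMPTION: "~100K" read as about 100 kilobytes
MODEM_BPS = 56_000            # ASSUMPTION: 56 kbit/s dial-up modem
COMPUTE_SECONDS = 4 * 3600    # ASSUMPTION: 4 hours of CPU time per work unit

comm = transfer_seconds(WORK_UNIT_BYTES, MODEM_BPS)
fraction = comm / (comm + COMPUTE_SECONDS)

print(f"transfer time: {comm:.1f} s")
print(f"share of wall-clock time spent communicating: {fraction:.3%}")
```

With work units that run for days rather than hours, the communication share shrinks further, which is the property that makes the workload suitable for volunteer machines.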
Other@Home Work • SETI@Home • search for intelligent life outside Earth • data analysis of signals • FightAIDS@Home • find drug therapies for HIV • how drugs interact with various HIV mutations • distributed projects • divide-and-conquer • CPU-intensive jobs • small data transfers (kilobytes) • communication is not a major concern
Evaluation • Folding@Home • based on the Tinker molecular dynamics code • voluntary participants worldwide, over 400,000 CPUs • simulate folding and unfolding • folding rates • simulations on small proteins
Folding Rates
Folding & Unfolding
Observations • Sampling • too expensive to run for long timescales • wastes too much time lingering in local energy minima • Ensemble dynamics • speeds up simulations of dynamics • biological meaning of simulation results? • results on large protein folding? • limitations: correct transition detection, transition probability • Folding@Home • cheap way to achieve supercomputer-level computing power • huge distributed computing platform: over 400,000 CPUs • an efficient approach for CPU-intensive jobs • Complexity of problems and size of data increase rapidly • finding better algorithms is preferable to buying supercomputers