
Scalable Peer-to-peer Network for Biological Simulations



Presentation Transcript


  1. Scalable Peer-to-peer Network for Biological Simulations Shun-Yun Hu 2005/05/26

  2. Outline • Introduction • Voronoi-based Overlay Network (VON) • Protein Folding Problem • Conclusion

  3. A Look at Simulations • Simulations are important tools in scientific research • Larger scale and higher resolution (more accurate and detailed simulations) are constantly sought • However, computational resources can be limited

  4. An Untapped Potential • 300 million PCs on the Internet (2000 est.) • 80% to 90% of CPU cycles are wasted • Large supply of computing resources, growing rapidly

  5. An Example: SETI@Home • Search for Extraterrestrial Intelligence (SETI) • UC Berkeley Project launched in May 1999 • PC User downloads a screen saver • Calculations are done using idle CPU time • 2005/03 statistics (in 6 years) • 5.3 M world-wide participants • 2.2 M years of single-processor CPU • 54 teraflop machine (current top 3: 70.72, 51.87, 35.86)

  6. Simulation: Folding@Home • Stanford Project launched in Sept. 2000 • Seeks to determine protein’s 3D structure • Screensaver that downloads “work units” • 2002 Statistics: • 30,000 volunteers • 1 M days of single-processor CPU • Published 23 papers in: Science, Nature, Nature Structural Biology, PNAS, JMB, etc.

  7. The Grand Question • Can we build the ultimate simulator for large-scale simulation utilizing millions of computers world-wide? • Potential applications: • Nuclear reaction • Star clusters • Atomic-scale modeling in material science • Weather, earthquakes • Biology (protein, ecosystem, brain, ...)

  8. Current Limitations • Current methodology: • Client-server model (master & slaves) • clients request “work unit” to process • Communication is minimized • Clients do not communicate • Issues: • Only suitable for “embarrassingly parallel” simulations • Sophisticated server-side algorithm and management required • An alternative: peer-to-peer (P2P) computing

  9. What is Peer-to-Peer (P2P)? [Stoica et al. 2003] • Distributed systems without any centralized control or hierarchical organization • Runs software with equivalent functionality • Examples • File-sharing: Napster, Gnutella, eDonkey • VoIP: Skype • DHT: Chord, CAN, Pastry

  10. Peer-to-Peer Overlay A P2P overlay network source: [Keller & Simon 2003]

  11. Promise & Challenge of P2P • Promises • Growing resource, decentralized → Scalable • Commodity hardware → Affordable • Challenges • Topology maintenance → dynamic join/leave • Efficient content retrieval → no global knowledge

  12. A Simulation Scenario • How can we utilize P2P for simulation purposes? Answer: it depends on what you want to simulate • We observe that many simulations… • are spatially-oriented (i.e. based on coordinate systems) • run in discrete time-steps • require synchronization at each time-step • exhibit localized interaction (i.e. short-range interaction) • example: molecular dynamics (MD) simulation

  13. Scenario Defined for P2P • Many simulated entities (nodes) on a 2D plane (> 1,000) • Positions (coordinates) may change at each time-step • How to synchronize positions with those in Area of Interest (AOI)? [Figure label: Area of Interest]
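
To make the scenario concrete, here is a minimal Python sketch of the brute-force baseline: with global knowledge of every position, a node's AOI neighbors are simply the nodes within some distance of it. The circular AOI, the `radius` parameter, and the function name `aoi_neighbors` are assumptions for illustration, not part of the slides. A P2P design has to reach the same answer without the global `positions` table.

```python
import math

def aoi_neighbors(positions, node_id, radius):
    """Return the ids of all other nodes within `radius` of `node_id`.

    Assumes a circular Area of Interest and full knowledge of every
    position, which is exactly what a P2P overlay does not have.
    """
    x, y = positions[node_id]
    return sorted(
        other
        for other, (ox, oy) in positions.items()
        if other != node_id and math.hypot(ox - x, oy - y) <= radius
    )

# Hypothetical node positions on the 2D plane.
positions = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (10.0, 0.0), "d": (1.0, 1.0)}
neighbors_of_a = aoi_neighbors(positions, "a", radius=5.0)
print(neighbors_of_a)
```

Each call scans all nodes, so synchronizing every node at every time-step this way costs on the order of n² distance checks and requires a central view of the world.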

  14. P2P Design Goals • Observation: • the contents are information from AOI neighbors • P2P content discovery is a neighbor discovery problem • Solve the Neighbor Discovery Problem in a fully-distributed, message-efficient manner. • Specific goals: • Scalable → Limit & minimize message traffic • Fast → Direct connection with AOI neighbors

  15. Outline • Introduction • Voronoi-based Overlay Network (VON) • Protein Folding Problem • Conclusion

  16. Voronoi Diagram • 2D plane partitioned into regions by sites; each region contains all the points closest to its site • Can be used to find the k nearest neighbors easily [Figure labels: Neighbors, Region, Site]
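
The defining property of a Voronoi region can be sketched directly in Python: a point belongs to the region of whichever site is nearest to it. The function names `owning_site` and `k_nearest` and the sample coordinates are illustrative assumptions. This version scans all sites instead of building the diagram, so it shows the definition rather than an efficient construction.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def owning_site(sites, point):
    """Index of the site whose Voronoi region contains `point`."""
    return min(range(len(sites)), key=lambda i: squared_distance(sites[i], point))

def k_nearest(sites, point, k):
    """Indices of the k sites closest to `point`, nearest first."""
    order = sorted(range(len(sites)), key=lambda i: squared_distance(sites[i], point))
    return order[:k]

# Four hypothetical sites at the corners of a square.
sites = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
print(owning_site(sites, (1.0, 1.0)))
print(k_nearest(sites, (1.0, 0.5), 2))
```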

  17. Design Concepts Use Voronoi to solve the neighbor discovery problem • Identify enclosing and boundary neighbors • Each node constructs a Voronoi of all AOI neighbors • Enclosing neighbors are minimally maintained • Mutual collaboration in neighbor discovery
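
As a rough illustration of the first bullet, the sketch below finds each node's enclosing neighbors, read here as the nodes whose Voronoi regions share an edge with its own (equivalently, its Delaunay neighbors). That reading of the term, the function names, and the sample coordinates are assumptions. The method is a brute-force empty-circumcircle test, chosen for brevity and suitable only for a handful of nodes. It is not the construction used by VON.

```python
from itertools import combinations

def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2

def enclosing_neighbors(sites):
    """Map each site index to the set of its Voronoi (Delaunay) neighbors."""
    nbrs = {i: set() for i in range(len(sites))}
    for i, j, k in combinations(range(len(sites)), 3):
        circle = circumcircle(sites[i], sites[j], sites[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        # A triangle is Delaunay when no other site lies strictly inside
        # its circumcircle.
        empty = all(
            (x - ux) ** 2 + (y - uy) ** 2 >= r2 - 1e-9
            for m, (x, y) in enumerate(sites)
            if m not in (i, j, k)
        )
        if empty:
            for p, q in ((i, j), (j, k), (i, k)):
                nbrs[p].add(q)
                nbrs[q].add(p)
    return nbrs

# Hypothetical layout: node 0 is surrounded by nodes 1-4; node 5 lies further out.
sites = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0), (5.0, 0.5)]
nbrs = enclosing_neighbors(sites)
print(sorted(nbrs[0]))
```

In this layout node 5 is not an enclosing neighbor of node 0, because nodes 1 to 4 already surround node 0's region.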

  18. Procedure (JOIN) 1) Joining node sends coordinates to any existing node; join request is forwarded to the acceptor 2) Acceptor sends back its own neighbor list; joining node connects with other nodes on the list [Figure labels: Joining node, Acceptor’s region]
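
The two JOIN steps can be sketched in Python as follows. The slide only says the request "is forwarded" to the acceptor. The greedy rule used here (each node hands the request to whichever of its neighbors is closest to the joining coordinates, stopping when none is closer) is an assumption, as are the names `find_acceptor` and `join` and the four-node example. A fuller implementation would also rebuild the local Voronoi diagram after connecting and drop neighbors that turn out to be irrelevant.

```python
def dist2(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def find_acceptor(positions, neighbors, start, target):
    """Step 1: greedily forward the join request toward `target`.

    Stops at the first node with no neighbor closer to `target`; that
    node is treated as the acceptor. Returns (acceptor, hops).
    """
    current = start
    hops = [current]
    while True:
        best = min(
            neighbors[current],
            key=lambda n: dist2(positions[n], target),
            default=current,
        )
        if dist2(positions[best], target) >= dist2(positions[current], target):
            return current, hops
        current = best
        hops.append(current)

def join(positions, neighbors, new_id, new_pos, gateway):
    """Step 2: the acceptor shares its neighbor list and the joiner connects."""
    acceptor, hops = find_acceptor(positions, neighbors, gateway, new_pos)
    contacts = {acceptor} | set(neighbors[acceptor])
    positions[new_id] = new_pos
    neighbors[new_id] = set(contacts)
    for n in contacts:
        neighbors[n].add(new_id)
    return acceptor, hops

# Hypothetical overlay: four nodes in a row, each linked to its adjacent nodes.
positions = {0: (0.0, 0.0), 1: (2.0, 0.0), 2: (4.0, 0.0), 3: (6.0, 0.0)}
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
acceptor, hops = join(positions, neighbors, new_id=4, new_pos=(5.5, 0.2), gateway=0)
print(acceptor, hops, sorted(neighbors[4]))
```

Because each hop strictly reduces the distance to the target, the forwarding loop always terminates.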

  19. Procedure (MOVE) 1) Positions are sent to all neighbors, with messages to boundary neighbors (B.N.) marked; each B.N. checks for overlaps between the mover’s AOI and its own enclosing neighbors (E.N.) 2) Connect to new nodes upon notification by a B.N.; disconnect any non-overlapped neighbor [Figure labels: Boundary neighbors, Non-overlapped neighbors, New neighbors]

  20. Outline • Introduction • Voronoi-based Overlay Network (VON) • Protein Folding Problem • Conclusion

  21. Protein Folding Problem • Find native state (lowest free energy) 3D structure given a 1D sequence of amino acids • Timescale limitation of classical MD methods • Secondary structure folds in 0.1 ~ 10 µs • Small protein folds in tens of µs • Current record: 1 µs (villin headpiece) • Full-atomic simulation of 1 ns takes one CPU day • 1,000 ~ 10,000× gap (it might take decades)
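
The "decades" estimate follows from simple arithmetic, sketched below in Python. It takes the slide's figure of one CPU day per simulated nanosecond and its 1,000 to 10,000-fold gap. The single-CPU assumption and the constant and function names are mine.

```python
CPU_DAYS_PER_NS = 1.0   # from the slide: 1 ns of simulation takes one CPU day
DAYS_PER_YEAR = 365.25

def single_cpu_years(gap_factor):
    """Years one CPU needs to simulate `gap_factor` nanoseconds."""
    return gap_factor * CPU_DAYS_PER_NS / DAYS_PER_YEAR

low = single_cpu_years(1_000)     # lower end of the gap
high = single_cpu_years(10_000)   # upper end of the gap
print(f"{low:.1f} to {high:.1f} years on a single CPU")
```

At the upper end of the gap, a single processor would need roughly 27 years, which is the motivation for spreading the work across many machines.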

  22. Folding@Home Parallelization • Dynamics of complex system involves crossing of free energy barriers • Most time is spent in free energy minimum “waiting” • Possible to simulate using trajectories much shorter than folding time • “ensemble dynamics” (same coords, different velocities)
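
Why many short trajectories can stand in for one long one can be illustrated with a small Python sketch. It assumes barrier crossing behaves as a single-exponential (memoryless) process with rate k, which is an assumption added here rather than something the slide states. Under that assumption, the first crossing among M independent trajectories arrives on average M times sooner than in a single trajectory. Time units, function names, and parameter values are arbitrary.

```python
import math
import random

def p_any_crossing(rate, t, m):
    """Probability that at least one of m independent trajectories of
    length t crosses the barrier, assuming exponential waiting times."""
    return 1.0 - math.exp(-m * rate * t)

def mean_first_crossing(rate, m, trials=2000, seed=0):
    """Monte Carlo estimate of the mean time until the first of m
    independent trajectories crosses the barrier."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(rng.expovariate(rate) for _ in range(m))
    return total / trials

rate = 1.0                                   # mean waiting time of one trajectory = 1.0
single = p_any_crossing(rate, t=0.01, m=1)   # one short trajectory rarely crosses
ensemble = p_any_crossing(rate, t=0.01, m=1000)
estimate = mean_first_crossing(rate, m=50)   # expected to be close to 1 / 50
print(single, ensemble, estimate)
```

A single trajectory lasting 1% of the mean waiting time crosses about 1% of the time, while an ensemble of 1,000 such trajectories almost always contains at least one crossing.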

  23. Outline • Introduction • Voronoi-based Overlay Network (VON) • Protein Folding Problem • Conclusion

  24. Summary • Idle CPU and networks are untapped potential resources for large-scale simulation • Current approaches do not support simulations that require frequent synchronization / updates • A promising solution: Voronoi-based P2P Overlay • Leverage knowledge of each peer to maintain topology • Properties: scalable, efficient, fully-distributed • Enable simulations with frequent localized synchronization

  25. Acknowledgements • Dr. Jui-Fa Chen (陳瑞發老師) • Dr. Wei-Chuan Lin (林偉川老師) • Members of the Alpha Lab, TKU CS • Guan-Ming Liao (廖冠名) • Dr. Chin-Kun Hu (胡進錕老師) • LSCP, Institute of Physics, Academia Sinica • Joaquin Keller (France Telecom R&D, Solipsis) • Bart Whitebook (butterfly.net) • Jon Watte (there.com) • Dr. Wen-Bing Horng (洪文斌老師) • Dr. Jiung-yao Huang (黃俊堯老師)
