Inter-cluster Job Deployment by AgentTeamwork Sentinel Agents

Emory Horvath, CSS497, Spring 2006. Advisor: Dr. Munehiro Fukuda.

Presentation Transcript


  1. Inter-cluster Job Deployment by AgentTeamwork Sentinel Agents
     Emory Horvath, CSS497, Spring 2006
     Advisor: Dr. Munehiro Fukuda

  2. What is Grid Computing?
     • Grid computing seeks to pool together large numbers of computers, allowing unused CPU cycles to be shared for CPU-intensive tasks.
     • Examples:
         • Condor
         • SETI@home
     • Issues:
         • Job coordination
         • Security
         • Software installation and maintenance
         • Fault tolerance

  3. What is AgentTeamwork?
     • A portable, Java-based grid computing platform built on the mobile agent paradigm.
     • Decentralized architecture, with no central manager.
     • Easy installation and participation.
     • Designed with fault tolerance in mind.
     • Each participating computer runs a Java process (UWPlace).
     • Each UWPlace can host one or more mobile agents (UWAgents).
     • A central FTP server hosts the list of available computers.

  4. How AgentTeamwork Works
     [Figure: User A's and User B's processes, each running inside a user program wrapper that provides snapshot methods and GridTCP, are run by Sentinel Agents and communicate over TCP. Also shown: Commander, Resource, and Bookkeeper Agents, an FTP server, snapshots, and results returned to User A and User B.]
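The slide names snapshot methods inside a user program wrapper but does not show their interface. As a rough illustration only, the sketch below uses standard Java serialization (`ObjectOutputStream` and `ObjectInputStream`) to capture a program's state as a byte-array snapshot and rebuild it later. The classes `CounterState`, `SnapshotWrapper`, and `SnapshotDemo` are hypothetical and are not AgentTeamwork's actual API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical user-program state; the slides do not show the real wrapper API.
class CounterState implements Serializable {
    private static final long serialVersionUID = 1L;
    int iteration;
    double partialSum;
}

// Hypothetical wrapper: turns program state into a byte-array snapshot and back.
class SnapshotWrapper {
    // Serialize the state into bytes that could be shipped to another agent.
    static byte[] takeSnapshot(Serializable state) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        }
        return bytes.toByteArray();
    }

    // Rebuild the state from a snapshot, e.g. when a process is resumed elsewhere.
    static <T> T restore(byte[] snapshot, Class<T> type)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return type.cast(in.readObject());
        }
    }
}

class SnapshotDemo {
    public static void main(String[] args) throws Exception {
        CounterState state = new CounterState();
        byte[] latest = null;
        for (int i = 0; i < 10; i++) {
            state.iteration = i;
            state.partialSum += i;
            if (i % 4 == 0) {
                latest = SnapshotWrapper.takeSnapshot(state); // periodic snapshot
            }
        }
        // Pretend the process crashed after iteration 9: resume from the last snapshot.
        CounterState resumed = SnapshotWrapper.restore(latest, CounterState.class);
        System.out.println(resumed.iteration + " " + resumed.partialSum); // prints "8 36.0"
    }
}
```

Work done after the last snapshot (iteration 9 here) is lost and must be recomputed, which is why the snapshot interval trades checkpoint overhead against recovery cost.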

  5. Single-Cluster Hierarchy
     [Figure: the user submits a job to the Commander (id 0); the diagram also shows an XML query to the eXist database, agent spawning, and snapshots.]
     • Commander: id 0
     • Resource: id 1; Sensor: id 5
     • Sentinels (id, rank): (2, 0), (8, 1), (9, 2), (10, 3), (11, 4), (32, 5), (33, 6), (34, 7)
     • Bookkeepers (id, rank): (3, 0), (12, 1), (13, 2), (14, 3), (15, 4), (48, 5), (49, 6), (50, 7)
     id: agent ID; rank: MPI rank
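The agent ids in this figure follow a regular pattern: agent 2's children are 8 to 11, agent 3's are 12 to 15, agent 8's are 32 to 34, and agent 1 has child 5. That is consistent with each agent n numbering its children 4n to 4n + 3 (the Commander, id 0, using 1 to 3), with sentinels receiving MPI ranks in breadth-first order, and with each bookkeeper mirroring its sentinel's position in a subtree rooted at id 3. The sketch below reproduces the figure's numbers under those assumptions; the fan-out of 4 and the class `AgentIds` are inferred from the diagram, not a documented AgentTeamwork scheme.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch inferred from the ids in the slide; FANOUT and the formulas are assumptions.
class AgentIds {
    static final int FANOUT = 4; // assumption: each agent has at most 4 children

    // Children of agent `id` are FANOUT*id .. FANOUT*id + FANOUT - 1
    // (the root, id 0, skips itself and therefore gets 1..3).
    static List<Integer> children(int id) {
        List<Integer> result = new ArrayList<>();
        for (int k = 0; k < FANOUT; k++) {
            int child = FANOUT * id + k;
            if (child != id) result.add(child);
        }
        return result;
    }

    // Assign MPI ranks 0..n-1 to sentinels in breadth-first order from the
    // root sentinel; returns the sentinel ids indexed by rank.
    static int[] sentinelIdsByRank(int rootSentinel, int n) {
        int[] ids = new int[n];
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(rootSentinel);
        for (int rank = 0; rank < n; rank++) {
            int id = queue.poll();
            ids[rank] = id;
            queue.addAll(children(id)); // the id tree is unbounded, so the queue never empties
        }
        return ids;
    }

    // Map a sentinel to its bookkeeper by replaying the same child-index path
    // from the bookkeeper root instead of the sentinel root.
    static int bookkeeperFor(int sentinelId, int sentinelRoot, int bookkeeperRoot) {
        Deque<Integer> path = new ArrayDeque<>();
        int id = sentinelId;
        while (id > sentinelRoot) {
            path.push(id % FANOUT); // child index at this level
            id /= FANOUT;
        }
        if (id != sentinelRoot) {
            throw new IllegalArgumentException(sentinelId + " is not under " + sentinelRoot);
        }
        int result = bookkeeperRoot;
        while (!path.isEmpty()) result = FANOUT * result + path.pop();
        return result;
    }

    public static void main(String[] args) {
        int[] ids = sentinelIdsByRank(2, 8);
        for (int rank = 0; rank < ids.length; rank++) {
            System.out.println("rank " + rank + ": sentinel " + ids[rank]
                + ", bookkeeper " + bookkeeperFor(ids[rank], 2, 3));
        }
    }
}
```

The same child-id formula also matches the multiple-cluster figure (32 → 128 to 131, 33 → 132, 128 → 512, 132 → 528 to 531), although the ranks there are not assigned in breadth-first order.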

  6. Single-Cluster Job Resumption
     [Figure: Sentinels id 2, 8, 9, 10, and 11 (ranks 0 to 4) linked by MPI connections; Sentinel id 11 (rank 4) is replaced by a new Sentinel id 11 (rank 4), using snapshots held by Bookkeeper id 15 (rank 4).]
     (0) Send a new snapshot periodically.
     (1) Detect a ping error.
     (2) Search for the latest snapshot.
     (3) Retrieve the snapshot.
     (4) Send a new agent.
     (5) Notify about the restart.
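The numbered steps can be modeled in a few lines of Java. This is a hypothetical, single-process sketch: `Bookkeeper`, `Sentinel`, and `ResumptionDemo` are illustrative names, `ping()` stands in for a real network ping, and treating the failed sentinel's parent as the monitoring agent is an assumption, not something the slide states. The replacement keeps the failed agent's id and MPI rank, as the figure shows for Sentinel id 11, rank 4.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bookkeeper: keeps only the newest snapshot per MPI rank.
class Bookkeeper {
    private final Map<Integer, Long> latestVersion = new HashMap<>();
    private final Map<Integer, byte[]> latestSnapshot = new HashMap<>();

    // Step (0): a sentinel periodically sends a new snapshot; stale versions are ignored.
    void store(int rank, long version, byte[] snapshot) {
        Long current = latestVersion.get(rank);
        if (current == null || version > current) {
            latestVersion.put(rank, version);
            latestSnapshot.put(rank, snapshot.clone());
        }
    }

    // Steps (2) and (3): search for and retrieve the latest snapshot (null if none).
    byte[] latest(int rank) {
        byte[] snapshot = latestSnapshot.get(rank);
        return snapshot == null ? null : snapshot.clone();
    }
}

// Hypothetical sentinel; `alive` stands in for whether its host still answers pings.
class Sentinel {
    final int id;
    final int rank;
    final byte[] resumedFrom; // snapshot this agent was restarted from, or null
    boolean alive = true;

    Sentinel(int id, int rank, byte[] resumedFrom) {
        this.id = id;
        this.rank = rank;
        this.resumedFrom = resumedFrom;
    }

    boolean ping() {
        return alive;
    }
}

class ResumptionDemo {
    // Steps (1) to (5), run by the agent monitoring `child` (assumed to be its parent).
    static Sentinel checkAndResume(Sentinel child, Bookkeeper bookkeeper,
                                   List<String> notifications) {
        if (child.ping()) {
            return child; // (1) no ping error: nothing to do
        }
        byte[] snapshot = bookkeeper.latest(child.rank);                      // (2), (3)
        Sentinel replacement = new Sentinel(child.id, child.rank, snapshot);  // (4) same id and rank
        notifications.add("restarted id " + child.id + " rank " + child.rank); // (5)
        return replacement;
    }

    public static void main(String[] args) {
        Bookkeeper bookkeeper = new Bookkeeper();
        bookkeeper.store(4, 1, new byte[] {1});
        bookkeeper.store(4, 2, new byte[] {2});
        Sentinel sentinel = new Sentinel(11, 4, null);
        sentinel.alive = false; // simulate a crash
        List<String> notifications = new ArrayList<>();
        Sentinel resumed = checkAndResume(sentinel, bookkeeper, notifications);
        System.out.println(notifications.get(0) + ", snapshot v" + resumed.resumedFrom[0]);
    }
}
```

Keeping only the newest snapshot per rank is enough for steps (2) and (3); a real bookkeeper would also have to survive its own failure, which this sketch ignores.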

  7. Extending to Multiple Clusters
     • The existing AgentTeamwork system allows job deployment only within a single intranet cluster.
     • The primary focus of my project was to extend AgentTeamwork to allow job deployment and resumption across multiple clusters:
         • Rewrite and extend existing AgentTeamwork algorithms to support multiple clusters.
         • Rewrite job deployment code to deploy gateway tasks and remote-cluster jobs.
         • Integrate new gateway-enabled Java socket functionality.
         • Rewrite job-resumption code to resume failed remote clusters and remote compute nodes.

  8. Multiple-Cluster Hierarchy
     [Figure: the user's Commander (id 0) with Resource (id 1), Sentinel (id 2), and Bookkeeper (id 3, rank 0); sentinels are deployed to desktop computers and, through cluster gateways, to Clusters 0, 1, 2, and 3.]
     • Cluster gateway sentinels: id 8 (rank -8, cluster gateway 0); ids 33, 34, 35 (ranks -33, -34, -35, cluster gateways 1, 2, and 3)
     • Desktop-computer sentinels: id 9 (rank X); ids 36 to 39 (ranks X+1 to X+4)
     • Compute-node sentinels: id 32 (rank 0); ids 128 to 131 (ranks 1 to 4); id 512 (rank 5); id 132 (rank 6); ids 528 to 531 (ranks 7 to 10)
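In this figure each cluster-gateway sentinel has a rank equal to the negative of its agent id (id 8 → -8, id 33 → -33), while compute-node sentinels hold ordinary non-negative MPI ranks, so a negative rank by itself identifies a gateway. A minimal sketch of that convention follows. The deployment order is supplied by the caller because the figure does not say how it is chosen, desktop ranks (X, X+1, and so on) are not modeled, and `GatewayRanks` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the rank convention visible in the figure.
class GatewayRanks {
    enum Role { GATEWAY, COMPUTE }

    // Given sentinel ids and roles in deployment order, return id -> rank.
    // Gateways get -id (their ids are never 0, which belongs to the Commander);
    // compute nodes get consecutive MPI ranks starting at 0.
    static Map<Integer, Integer> assign(int[] ids, Role[] roles) {
        if (ids.length != roles.length) {
            throw new IllegalArgumentException("ids and roles must have the same length");
        }
        Map<Integer, Integer> ranks = new LinkedHashMap<>();
        int nextRank = 0;
        for (int i = 0; i < ids.length; i++) {
            ranks.put(ids[i], roles[i] == Role.GATEWAY ? -ids[i] : nextRank++);
        }
        return ranks;
    }

    // A negative rank marks a gateway, so it never collides with an MPI rank.
    static boolean isGateway(int rank) {
        return rank < 0;
    }

    public static void main(String[] args) {
        int[] ids = {8, 32, 33, 128, 129};
        Role[] roles = {Role.GATEWAY, Role.COMPUTE, Role.GATEWAY, Role.COMPUTE, Role.COMPUTE};
        System.out.println(assign(ids, roles)); // prints {8=-8, 32=0, 33=-33, 128=1, 129=2}
    }
}
```

Encoding the gateway role in the sign of the rank lets message-routing code tell gateways from compute nodes without a separate lookup table, at the cost of reserving all negative values.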

  9. Multiple-Cluster Job Resumption
     [Figure: the multiple-cluster hierarchy (Commander id 0, Resource id 1, Sentinel id 2, Bookkeeper id 3 rank 0) spanning desktop computers, Cluster 0, and Cluster 1; cluster gateway 0 is Sentinel id 8 (rank -8) and cluster gateway 1 is Sentinel id 33 (rank -33); compute-node Sentinels id 32 (rank 0), 128 to 131 (ranks 1 to 4), 512 (rank 5), 132 (rank 6), and 528 to 531 (ranks 7 to 10); an extra node, an extra cluster behind its own extra cluster gateway, and a new sentinel are also shown.]
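Slide 7 lists resuming both failed remote compute nodes and failed remote clusters, and this figure shows an extra node and an extra cluster as spare resources. One plausible placement policy, sketched below under that reading, restarts a failed compute node on a spare node in the same cluster and moves a lost cluster gateway's work to an extra cluster. `ResumptionPlanner` and the host names are hypothetical; the presentation does not describe the actual policy.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical placement policy for restarting a failed sentinel.
class ResumptionPlanner {
    enum FailedRole { COMPUTE_NODE, CLUSTER_GATEWAY }

    private final Map<String, Deque<String>> extraNodesByCluster = new HashMap<>();
    private final Deque<String> extraClusters = new ArrayDeque<>();

    // Register a spare node that sits behind the given cluster's gateway.
    void addExtraNode(String cluster, String host) {
        extraNodesByCluster.computeIfAbsent(cluster, c -> new ArrayDeque<>()).add(host);
    }

    // Register a spare cluster, identified here by its gateway host.
    void addExtraCluster(String gatewayHost) {
        extraClusters.add(gatewayHost);
    }

    // Returns where the new sentinel should start, or null if no suitable spare remains.
    String chooseTarget(FailedRole role, String cluster) {
        if (role == FailedRole.COMPUTE_NODE) {
            // Stay inside the same cluster so the replacement is reached via the same gateway.
            Deque<String> spares = extraNodesByCluster.get(cluster);
            return (spares == null) ? null : spares.poll();
        }
        // A lost gateway takes its whole cluster offline: move to an extra cluster.
        return extraClusters.poll();
    }

    public static void main(String[] args) {
        ResumptionPlanner planner = new ResumptionPlanner();
        planner.addExtraNode("cluster1", "spare-node-a");
        planner.addExtraCluster("extra-gateway");
        System.out.println(planner.chooseTarget(FailedRole.COMPUTE_NODE, "cluster1"));
        System.out.println(planner.chooseTarget(FailedRole.CLUSTER_GATEWAY, "cluster1"));
    }
}
```

Returning null when a cluster has no spare node keeps the sketch simple; a fuller policy might instead fall back to an extra cluster, at the cost of moving a single rank outside its original cluster.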

  10. Other Current & Ongoing Tasks
     • AgentTeamwork is an ongoing project, with parallel contributions by many other team members:
         • RMI-to-Java-socket enhancements, developed by Duncan Smith, were integrated.
         • Agent file I/O enhancements (Jumpei Miyauchi) and sensor agent enhancements (Jun Morisaki) were also integrated.
     • Although I am presenting now, I will continue working on the project over the summer:
         • Completion of inter-cluster fault tolerance and job redeployment.
         • Completion of inter-cluster performance tests.
         • Assisting Cuong Ngo as needed with the implementation of dynamic resource allocation.

  11. Acknowledgements
     • Professor Fukuda, my advisor.
     • NSF Middleware Initiative.
     • The UW-Bothell CSS Program.
     • Graphics and other slide content contributed by Prof. Fukuda from earlier AgentTeamwork presentations and papers.

  12. Questions?
