
Designing a Java-based Grid Scheduler using Commodity Services




Presentation Transcript


  1. Designing a Java-based Grid Scheduler using Commodity Services
     Patrick Wendel, Arnold Fung, Moustafa Ghanem, Yike Guo
     Discovery Net; InforSense, London; Department of Computing, Imperial College, London

  2. Outline
     • Discovery Net Project
     • Platform
     • Workflow Execution
     • Design
     • Deployment
     • Conclusions – Future Works
     All Hands Meeting, Nottingham

  3. Discovery Net
     • Multidisciplinary project funded by the EPSRC under the UK e-Science programme (started October 2002, ended March 2005)
     • Develop an infrastructure for integrating various types of data sources, software and hardware resources, targeted at e-Scientists.
     • Applications to:
       • Life Sciences: high-throughput genomics and proteomics
       • Real-time Environmental Monitoring: high-throughput dispersed air sensing technology
       • Geo-Hazard modelling: earthquake modelling through satellite imagery
     • The project covered many areas including infrastructure, applications and algorithms (e.g. text mining)
     • Produced the Discovery Net platform, which aims to integrate, compose, coordinate and deploy knowledge discovery services using a workflow technology.

  4. Discovery Net Overview
     [Diagram; labels as extracted]
     • Delivery: Web/Grid Service, Portal / Dashboard, Application, Business Process; Distribution to Scientists; Rapid Application Deployment
     • Workflow Environment: Integrative Analytics; Automation & Scheduling; Interactive Solution Building; Interactive Knowledge Discovery; Dynamic Data & App Integration
     • Components: Data Processing Tools, Analysis Services, Third Party Tools, Computation Services
     • Multiple data sources: Online Sources, Files, Web Services, Grid Services, SQL Databases, Excel

  5. Latest Applications
     • Rail Network Data Analysis
       • Collaboration between the London e-Science Centre and AEA Technology Rail, funded by DfT
       • Project showed how the large amounts of data available within the rail industry can be analysed using e-Science methods and Grid computing
     • Imaging Applications
       • Project using imageodesy algorithms
       • Medical imaging
     • Combinatorial Chemistry
       • TOPCOMBI (EU project), 22 partners

  6. SIMDAT
     [Diagram labels: End Users, Capability Providers, Grid Technologists]
     • EU-funded project
     • 4 years
     • Start date: September 1st, 2004
     • 26 partners
     • www.simdat.org
     • InforSense is technology champion for workflow systems
     • Pharma applications
     • Automotive applications
     • Knowledge services application

  7. SIMDAT
     • Work conducted within SIMDAT (EU-funded project)
     • Extended the workflow engine to support B2B use case scenarios in the automotive and pharmaceutical sectors
     • Integration with GRIA
     • Prototypes for coupled workflow engines
     • Prototypes for workflow engine interoperability

  8. [Platform component diagram; labels as extracted]
     • Interface: Web Portal, Workflow Client Tool, Web Service
     • Activities: Activity Definition, Activity Authorisation
     • Data Management: Data Access, Table Management, Intermediate Results, Persistent Results, Data Authorisation
     • Enactment/Execution: Submission, Verification, Execution/Optimisation, Monitoring, Interaction
     • Workflows: Workflow Storage, Workflow Authorisation, Workflow Execution History
     • Authorisation Modules

  9. Workflow?
     • Data-flow dependency graph
     • Workflow construction paradigm:
       • Visual graph construction (layout, annotation)
       • Aided construction through application-specific wizards
     • Using workflows provides:
       • A simple rapid application development environment
       • Visual representation of the process
       • Re-usable, maintainable and shared processes
       • Workflow-based knowledge management (provenance, audit, policies, warehousing)
       • Handling of basic parallel programming constructs (concurrent execution of branches, pipelining of executions for certain types of data and certain activities, interface for data-parallel activity implementations)
       • Coupling with data set management
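
The "data-flow dependency graph" and "concurrent execution of branches" points can be illustrated with a minimal sketch. It is not taken from the Discovery Net code base: the class `WorkflowGraph` and its methods are hypothetical names, and the grouping uses a standard level-by-level topological sort, which is only one possible way an engine could find branches that may run concurrently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical data-flow graph: an edge a -> b means b consumes the output of a. */
class WorkflowGraph {
    private final Map<String, List<String>> successors = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    void addActivity(String name) {
        successors.putIfAbsent(name, new ArrayList<>());
        inDegree.putIfAbsent(name, 0);
    }

    void addDependency(String from, String to) {
        addActivity(from);
        addActivity(to);
        successors.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
    }

    /** Groups activities into stages; activities in one stage do not depend on each other. */
    List<List<String>> stages() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        List<List<String>> result = new ArrayList<>();
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        int scheduled = 0;
        while (!ready.isEmpty()) {
            result.add(ready);
            scheduled += ready.size();
            List<String> next = new ArrayList<>();
            for (String done : ready) {
                for (String succ : successors.get(done)) {
                    // An activity becomes ready once all of its inputs are produced.
                    if (remaining.merge(succ, -1, Integer::sum) == 0) next.add(succ);
                }
            }
            ready = next;
        }
        if (scheduled != remaining.size()) {
            throw new IllegalStateException("workflow graph contains a cycle");
        }
        return result;
    }
}
```

For a workflow where `load` feeds `clean`, and `clean` feeds two branches that both feed `report`, `stages()` yields four stages, with the two branches sharing the third stage.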

  10. Interface
     • Client interface:
       • Workflow construction, verification, execution, monitoring
       • Supports visualisation and interactive activities (activities executed in the client)
       • Synchronised with activity repository (using JWS)
     • Web Portal and Web Services endpoint, for accessing workflows as services

  11. Server-side Architecture
     [Layered diagram; labels as extracted]
     • Presentation: JSP, Struts, Portlets (JSR 168)
     • HTTP Servlets: Data Transfer, Code Download, Cache/Results Access, Web Service
     • Services (Stateless, Stateful, CMP, Message-driven): Component Mgmt, Task Management, Job, DataMgmt, ExecutionHandler
     • Topics and Queues: Jobs queue, Jobs topic, Status topic
     • Generic Services: Container-Managed Persistence for EJB; Messaging (JMS), P2P and Publish/Subscribe; Security (JAAS); Logging (Log4J); Database Connectivity (JDBC); Naming Service (JNDI); Management Service (JMX); Plugin Framework (JPF); Software Delivery (JWS)
     • Repositories: User/Group, Workflows, Results, Interm. Results, Jobs, Jobs History, Activities

  12. Distributed Execution
     • Activity-level distributed computing:
       • SSH (data streaming), SGE, LSF…
       • Web Services, GRIA, HTTPClient (Groovy)
     • Workflow-level (scheduling of overall execution), depending on usage and type of workflow:
       • Developing prototype workflows: iterative refinement; caching and reuse of intermediate results within a user session
       • Stateless production workflow: entire workflow executed for different inputs/parameters; scheduling
       • Stateful production workflow as services: workflow executed following a process/guide; execution engine must be able to reuse cached results

  13. Granularity of an execution
     • Architecture based on the Java EE stack, which provides a hosting environment for the activities (context, security, logging, access to resources and application-level environment information)
     • Each workflow execution is handled by one or more threads running in a Java server, whereas tasks submitted to grid schedulers are usually OS processes.
     • Periodic monitoring information, generated by each activity (not only by the workflow engine), is sent back to the client tool or portal.
     • What’s the best way to handle task scheduling in that context?

  14. Requirements Summary
     • From the Discovery Net architecture:
       • Workflow execution and activities reliant on JEE services
       • Scheduling should depend on the need to reuse intermediate results for the workflow, and on their availability
     • Additional constraints:
       • Execution servers can be distributed over WAN
       • Based only on standard Grid infrastructure or JEE services
       • No direct communication between execution servers and client tool

  15. First attempt: Grid Scheduler
     • Submit execution to SGE
     • Issues:
       • Cannot start an instance of the server for each execution (only one instance of JBoss at a time, unless a new configuration is added for each execution).
       • Start-up cost of the server is not negligible for some workflows.
       • The execution server needs to connect back to the submission server and set up a two-way communication channel.
       • How is the client notified of new status?

  16. Second attempt: AS Clustering
     • Application server level clustering
     • CMP Entity bean clustering
     • Experiment with JBoss Clustering (based on JGroups)
     • Issues:
       • Application server clustering is not fully standardised; different issues arise on different application servers
       • Cluster configuration based on JGroups only supported static clusters (set of IPs) or a join protocol based on broadcast (may be better now?)
       • Modifications of the clustering code were required to ensure that a unique instance of the Entity bean representing the task is created and used throughout the execution
       • Not designed for long-running tasks

  17. Third Attempt: Using Java Services
     • Stateless Session Bean as entry point (Task Management Service): mapping to IIOP/RMI or SOAP/HTTP
     • Stateful CMP Entity Bean to represent the state of the task (workflow, cached results, monitoring information)
     • Job JMS Queue to submit requests
     • ExecutionHandler Message-Driven Bean to handle the requests
     • Job Topic to send control commands to the execution
     • Status Topic to send back information from the execution
     • Scheduling policy implemented by the JMS Queue service provider:
       • Default using round-robin
       • Integrated with SGE using simple scripts to find a potential execution server (extended to check whether the execution server is started or should be started)
       • Customised implementation to check for cached intermediate results of the workflow
     • Number of concurrent executions on each execution server defined by the size of the pool for the ExecutionHandler MDB
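
The scheduling policy options above (round-robin by default, with a customised check for cached intermediate results) can be sketched in plain Java. This is an illustration under assumptions, not the actual implementation: in the talk the policy lives inside the JMS queue provider, whereas here a hypothetical `ExecutionServerSelector` class stands in for it, and the server names and workflow identifiers are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the policy the slides place inside the JMS queue provider. */
class ExecutionServerSelector {
    private final List<String> servers;
    private final Map<String, Set<String>> cachedWorkflows = new HashMap<>();
    private int nextIndex = 0;

    ExecutionServerSelector(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no execution servers");
        this.servers = new ArrayList<>(servers);
    }

    /** Records that a server holds intermediate results for a workflow. */
    void recordCache(String server, String workflowId) {
        cachedWorkflows.computeIfAbsent(server, s -> new HashSet<>()).add(workflowId);
    }

    /** Prefers a server holding cached results for the workflow, else uses round-robin. */
    String select(String workflowId) {
        for (String server : servers) {
            Set<String> cached = cachedWorkflows.get(server);
            if (cached != null && cached.contains(workflowId)) return server;
        }
        String chosen = servers.get(nextIndex);
        nextIndex = (nextIndex + 1) % servers.size();
        return chosen;
    }
}
```

In this sketch a cache hit does not advance the round-robin pointer, so requests without cached results keep rotating evenly. Whether the real provider behaves that way is not stated in the slides.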

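  18. Design
     [Diagram; labels as extracted]
     • Web Portal and Client Tool: submit to the Task Management Service (Stateless)
     • Task Management Service: publish to the Messaging Service Provider; load/save through the Persistence Service
     • Execution Server 1 … Execution Server N: subscribe to the Messaging Service Provider
     • Services Provided: Messaging Service Provider, Persistence Service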

  19. Submission
     [Diagram; labels as extracted]
     • Client Tool: submits to the Task Management Service
     • Task Management Service: create JobEntity; JobEntity activated; publish to Job queue (load/save through the Persistence Provider for JobEntity)
     • Execution Server: subscribes to the Job queue; message-triggered ExecutionHandler receives notification
     • Services Provided: Job queue, Persistence Provider for JobEntity

  20. Control
     [Diagram; labels as extracted]
     • Client Tool: sends a control request (pause/resume/kill) to the Task Management Service
     • Task Management Service: publish control request on Job topic
     • Execution Server: subscribes to messages for the Job ID; JobEntity receives notification
     • Services Provided: Job topic, Persistence Provider for JobEntity
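
The three control commands named on this slide imply a small job life cycle. The sketch below is an assumption about how an execution could react to them; the `JobControl` class, its state names and the rule that a killed job ignores further commands are all hypothetical and are not described in the talk.

```java
/** Hypothetical job life cycle driven by the control commands named on the slide. */
class JobControl {
    enum State { RUNNING, PAUSED, KILLED }
    enum Command { PAUSE, RESUME, KILL }

    private State state = State.RUNNING;

    State state() {
        return state;
    }

    /** Applies a command; returns true if the state changed, false if it was ignored. */
    boolean apply(Command command) {
        State next = state;
        switch (command) {
            case PAUSE:
                if (state == State.RUNNING) next = State.PAUSED;
                break;
            case RESUME:
                if (state == State.PAUSED) next = State.RUNNING;
                break;
            case KILL:
                // Assumed to be terminal: nothing leaves the KILLED state.
                if (state != State.KILLED) next = State.KILLED;
                break;
        }
        boolean changed = next != state;
        state = next;
        return changed;
    }
}
```

Ignoring commands that do not apply (for example a second pause) keeps the handler idempotent, which is convenient when control requests arrive through a publish/subscribe topic and may be delivered more than once.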

  21. Monitoring
     [Diagram; labels as extracted]
     • Execution Server: update Job entity state; publish status update on the Status Topic
     • Client Tool: subscribes to messages for the Job ID
     • Task Management Service
     • Services Provided: Status Topic, Persistence Provider for JobEntity

  22. Management
     • Status update period: the ExecutionHandler regularly checks (starting from a base period) whether the monitoring information of the workflow has changed. If it has not, the period is increased, up to a maximum update period; if it has, the Status Topic is notified.
     • Failure detection: the server hosting the Task Management Service also checks for tasks whose time since the last update is significantly higher than the maximum update period.
     • Security context: each execution server can have a dedicated JAAS configuration. To avoid having to re-authenticate the user who submitted the workflow, execution servers use a customised login module to handle the delegation.
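
The status update period and the failure check can be sketched as follows. Several details are assumptions that the slides do not specify: the period is doubled on each unchanged check, it resets to the base period after a change, and "significantly higher" is modelled as a caller-supplied tolerance factor. The class name `StatusUpdatePolicy` is hypothetical.

```java
/** Hypothetical helper for the status update period and the failure check. */
class StatusUpdatePolicy {
    private final long basePeriodMs;
    private final long maxPeriodMs;
    private long currentPeriodMs;

    StatusUpdatePolicy(long basePeriodMs, long maxPeriodMs) {
        if (basePeriodMs <= 0 || maxPeriodMs < basePeriodMs) {
            throw new IllegalArgumentException("need 0 < base <= max");
        }
        this.basePeriodMs = basePeriodMs;
        this.maxPeriodMs = maxPeriodMs;
        this.currentPeriodMs = basePeriodMs;
    }

    /**
     * Called after each check; returns the delay before the next check.
     * Doubling and resetting are assumptions: the slides only say that the
     * period increases up to a maximum when nothing has changed.
     */
    long nextPeriod(boolean monitoringInfoChanged) {
        if (monitoringInfoChanged) {
            currentPeriodMs = basePeriodMs;
        } else {
            currentPeriodMs = Math.min(currentPeriodMs * 2, maxPeriodMs);
        }
        return currentPeriodMs;
    }

    /** Flags a task whose last update is older than maxPeriod * toleranceFactor (assumed rule). */
    boolean looksFailed(long nowMs, long lastUpdateMs, double toleranceFactor) {
        return (nowMs - lastUpdateMs) > maxPeriodMs * toleranceFactor;
    }
}
```

With a base period of 1 s and a maximum of 5 s, three unchanged checks give delays of 2 s, 4 s and 5 s, and a change brings the delay back to 1 s. Because updates may legitimately be as far apart as the maximum period, the failure threshold has to be a multiple of that maximum rather than of the base period.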

  23. Deployment
     • JBoss 3.2, JBossMQ, Hibernate
     • LAN: using faster native Java protocols (RMI/JRMP) and callbacks
     • WAN: using HTTP-based and polling-based protocols

  24. LAN Deployment
     [Diagram; labels as extracted]
     • Components: Client Tool, Task Management Service, Execution Server 1 … Execution Server N
     • Services Provided: Messaging Service Provider, Persistence Service
     • Link labels: Web Service/HTTP, IIOP, RMI/JRMP (four links), TCP (two links)

  25. WAN Deployment
     [Diagram; labels as extracted]
     • Components: Client Tool, Task Management Service, Execution Server 1 … Execution Server N, separated by Firewall/NAT boundaries
     • Services Provided: Messaging Service Provider, Persistence Service
     • Link labels: Web Service/HTTP, IIOP, HTTP (four links), TCP (two links)

  26. Evaluation
     • Reliability:
       • Dependent on reliability of CMP provider and JMS provider
       • Task Management Service is stateless
       • Execution servers do not hold the state of the task (only intermediate results)
     • LAN configuration used for running nightly regression test workflows over a heterogeneous cluster (Linux servers + desktop PCs)
     • Deployed on production clusters (with limited connectivity from the slaves to the outside network)
     • WAN configuration adds several seconds of delay, depending on the workflow:
       • Workflow submission is still synchronous RPC using tunnelled JRMP
       • Monitoring information uses Java serialisation as well

  27. Conclusion
     • Simple, scalable solution based on Java EE commodity services, instead of working around Grid submission APIs, yet customisable to use any command-line based scheduler, resource monitor or workflow-specific policies.
     • The implementation is not bound to any network protocol.
     • Issues:
       • Custom policies rely on the flexibility of the JMS provider
       • No software delivery mechanism for execution servers (unlike the client); the software has to be installed manually.
       • Reliance on the performance of JEE services?
       • Why bother about having a hosting environment for workflow execution?

  28. Future Works
     • Use the workflow structure to refine the scheduling algorithm, taking into account information about the workflow (such as the number of branches and pipelined activities)
     • User-defined rules/scripts to define workflow-level or activity-level scheduling policies.
