Community Services Project “Enabling communities of collaborating users and services on the Grid”

Jon B. Weissman • Distributed Computing Systems Group • Department of Computer Science • University of Minnesota

Outline • Motivation and Vision • Why network services and the Grid? • What are the technical challenges? • Project Details • System architecture • Service model • Middleware • Related Work • Conclusion and Future Work

Motivation: Why Services? • What is a network service? • Software that can be remotely executed across the network • High-end network services are crucial to scientific application communities • Putting network services on-line will increase collaboration and productivity • Other Benefits • Service provider maintains, tunes, and upgrades service automatically • User need not have high-end resources • User need not become an expert in high performance computing

Grid Network Service Benefits • Synergistic with the “Grid” • Ensemble of geographically-dispersed resources • Network services encapsulate Grid resources • physical resources (computers, storage, instruments) • soft resources (software components) • Make Grid resources “invisible” • Common access protocols • Emerging standard for Grid-based network services (OGSA) • dynamic Web service • Grid Services form the basis of the Virtual Organization (VO) concept • A VO is a “Grid-let” specific to a user community

Service Modalities • User view “out-sourcing” • I need to locate genomic, computing, and storage services • “I want to compare my source sequence library against all known target sets” • Clear separation between user and service providers • Service provider view “deployment” • I need to deploy my service in the Grid for community use and/or “personal use” (in-sourced) • Resource provider view “hosting” • I need to host services to support my user community, possibly justify my cost, generate revenue, etc • Community-based metric • Amount of “science” that can be done: all parties gain benefit • Other metrics could be applicable: cost, barter, etc.

CHALLENGE: Dynamism • assembling services, users, and resources may be performed without pre-planning • environments/VOs must be flexible and adaptive => dynamic service deployment

Dynamic Service Deployment • New services should be addable to the Grid while it is running … and remotely deployable in order to: • scale service deployment with demand • augment the capabilities of a VO by adding new services • deploy a Grid service on a newly added/discovered pool of resources • enable a new version of a Grid service to replace an old one • Service model must support adaptation at many levels • service architecture must adapt to new/replacement services • service must adapt to demand and resource availability • service and service architecture must adapt to faults

Issues • Scheduling • Where to deploy? When to re-deploy? How long to deploy? • Where to ship a service request? How many resources to grant it? • Fault Tolerance • How to enable self-managing services? How to mask failure?

Community Services Project • System Architecture • Seonho Kim, Byoung-Dai Lee • Middleware • Byoung-Dai Lee, Darin England, Anusha Iyer, Lakshman Rao Abburi • Testbeds • Seonho Kim • Applications • Byoung-Dai Lee, Darin England, Murali Sangubhatla

Grid Stack

Adaptive Grid Services • Divide services into system and application categories + expose (and support) adaptivity • System services have general utility • adaptive resource provider (ARP) • provide leased resource pools (CPUs, storage, etc) -> ACP, ASP, etc • “pre-installed” • Application services are more specific • high-end services: high resource requirements • parallel equation solver, gene sequence comparison • adaptive application grid service (AGS) • AGS has a front-end and a back-end • AGS back-end is hosted on an ARP

(AGS) Service Lifecycle • Packaging • leverage middleware • Install • decide on front-end location • Deploy • decide on back-end location • Initialize • Access • Teardown
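
The lifecycle stages above can be read as a small state machine. The following is a minimal sketch under that assumption, not the project's implementation: the class `ServiceLifecycle`, the `Stage` names, and the site arguments are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stage names mirroring the slide's lifecycle steps."""
    PACKAGED = "packaged"
    INSTALLED = "installed"
    DEPLOYED = "deployed"
    READY = "ready"
    TORN_DOWN = "torn down"


class ServiceLifecycle:
    """Tracks one service through install, deploy, initialize, access, teardown."""

    def __init__(self, package_name):
        self.package_name = package_name
        self.stage = Stage.PACKAGED
        self.front_end_site = None
        self.back_end_site = None
        self.requests_served = 0

    def _move(self, required, new_stage):
        # Reject out-of-order transitions instead of silently skipping a step.
        if self.stage is not required:
            raise RuntimeError(
                f"cannot become {new_stage.value} while {self.stage.value}")
        self.stage = new_stage

    def install(self, front_end_site):
        # Install: decide on the front-end location.
        self._move(Stage.PACKAGED, Stage.INSTALLED)
        self.front_end_site = front_end_site

    def deploy(self, back_end_site):
        # Deploy: decide on the back-end location.
        self._move(Stage.INSTALLED, Stage.DEPLOYED)
        self.back_end_site = back_end_site

    def initialize(self):
        self._move(Stage.DEPLOYED, Stage.READY)

    def access(self):
        # Access may repeat any number of times once the service is ready.
        if self.stage is not Stage.READY:
            raise RuntimeError("service is not ready")
        self.requests_served += 1
        return self.requests_served

    def teardown(self):
        self._move(Stage.READY, Stage.TORN_DOWN)
```

In this sketch a service must be installed before it is deployed, which matches the slide's ordering of the front-end and back-end placement decisions.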

Dynamic Service Architecture • [Figure: architecture diagram spanning a Home Site and a Remote Site on the Globus GT3 Grid Service Platform] • Components shown: Client, Information Service, Registry Service, AGS Repository, Service Installer, AGS Deployer, AGS Front-end, Request Manager, Resource Monitor, Runtime Prediction Service, Performance DB, Status DB, past workload, ARP, Deploy module, Lease Manager, Query module, Allocation module, AGS_factory, AGS back-end, AGSI instances, resources • Interactions shown: Request/Response, Service Instance Creation, Register/Query, lease, SOAP/HTTP

[Figure: dynamic deployment diagram across a Home Site and a Remote Site] • Components shown: Tomcat Servlet Engine, AXIS Framework (SOAP Engine), AGS Front-end, AGS Deployer, AGS Factory Instance, dynamically deployed AGS_factory web-app, Webapp Loader, Webapp Deployer, Tomcat Manager, ARP, ARP Host Node, Member Node, WAR files • Interactions shown: Request/Response, Service Instance Creation, Dynamic Loading, SOAP/HTTP

Component API • [Interface listing not recovered; surviving fragment: “, ARP // service-specific interfaces”] • Key decision-makers: • AGS_front_end decides where a service request will be sent • AGS_deployer decides where to deploy or re-deploy a service

AGS API • Key idea: the service must respond to resource fluctuation
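
One way to picture a service that responds to resource fluctuation is a pair of callbacks that re-plan work whenever the resource pool changes. This is a sketch only: the slide's actual AGS interface is not recoverable from the text, so `AdaptiveService`, `resources_added`, `resources_removed`, and the round-robin partitioning are all assumptions made for illustration.

```python
class AdaptiveService:
    """Hypothetical service that re-partitions its work when resources change."""

    def __init__(self, work_items):
        self.work_items = list(work_items)
        self.workers = []

    def resources_added(self, nodes):
        # Called when the hosting provider grants more nodes.
        self.workers.extend(nodes)

    def resources_removed(self, nodes):
        # Called when nodes are reclaimed or fail.
        gone = set(nodes)
        self.workers = [w for w in self.workers if w not in gone]

    def partition(self):
        # Round-robin assignment over whichever workers are currently held.
        if not self.workers:
            return {}
        plan = {w: [] for w in self.workers}
        for i, item in enumerate(self.work_items):
            plan[self.workers[i % len(self.workers)]].append(item)
        return plan
```

Calling `partition()` after each callback yields a plan that always reflects the current pool, so losing a node redistributes its items rather than dropping them.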

ARP API • Key idea: resources are allocated and leased in type-specific lots • ARP exposes its features
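
Leasing in type-specific lots can be sketched as follows. The names (`ResourceProvider`, `Lease`, `features`, `lease`, `expire`) and the round-up-to-whole-lots rule are assumptions for illustration; the slide does not give the real ARP interface.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    lease_id: int
    resource_type: str
    units: int
    expires_at: float


class ResourceProvider:
    """Hypothetical provider that hands out resources in fixed-size lots."""

    def __init__(self, capacity, lot_size):
        self.free = dict(capacity)        # e.g. {"cpu": 16}
        self.lot_size = dict(lot_size)    # e.g. {"cpu": 4}
        self.leases = {}
        self._ids = itertools.count(1)

    def features(self):
        # Expose what the provider currently offers, per resource type.
        return {t: {"free": self.free[t], "lot_size": self.lot_size[t]}
                for t in self.free}

    def lease(self, resource_type, units, now, duration):
        lot = self.lot_size[resource_type]
        granted = -(-units // lot) * lot  # round the request up to whole lots
        if granted > self.free[resource_type]:
            return None                   # not enough free capacity
        self.free[resource_type] -= granted
        new_lease = Lease(next(self._ids), resource_type, granted, now + duration)
        self.leases[new_lease.lease_id] = new_lease
        return new_lease

    def release(self, lease_id):
        old = self.leases.pop(lease_id)
        self.free[old.resource_type] += old.units

    def expire(self, now):
        # Reclaim every lease whose term has ended.
        for lease_id in [i for i, l in self.leases.items() if l.expires_at <= now]:
            self.release(lease_id)
```

With a lot size of 4 CPUs, a request for 5 CPUs is granted 8 in this sketch, which is the price of allocating in lots rather than single units.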

Community Service Testbed • [Figure: testbed diagram with sites connected over LAN and WAN] • Sites: Univ. Minnesota Supercomputing Institute (Solaris), Univ. Virginia (Solaris), Univ. Minnesota CS department (Linux/Solaris) • Elements shown: ARPs, AGS Front-ends, AGS Factories, Deployed Services, Clients

System Architecture Results

Deployment/Installation Cost • SOAP penalty is about a factor of 2 (WAN)

Impact of SOAP Buffer Size • SOAP buffers must be sized appropriately

Deployment Cost: Impact of the total library size • Reconfiguration cost is linear in the package size and is on the order of a few seconds • Transfer cost is linear in the package size and is on the order of a few seconds (WAN) for both install and deploy • Total cost is on the order of seconds

End-to-end Latency • [Figure: latency breakdown, time in msecs] • Client -> Front-end • Front-end -> Back-end • Instance creation • Handle returned to Front-end • Service latency

End-to-end Cost (Eigenvalue service)

Middleware • Common middleware inside service components • scheduling, performance prediction • Scheduling • Where to send a service request? How many resources to grant it? How many resources to lease to a service? • Where to deploy? • Performance prediction • Key to scheduling • Common = reusable • Observation: the best performance predictor and scheduling technique are highly dependent on the service and the Grid

Solution: “Mixture of Experts” • “Meta-level” algorithms: combinations of point algorithms • [Figure: AGS front-end and AGS back-end (Request Manager, service codes) with an adaptive code library of Scheduling Policies (Policy 1, Policy 2, … Policy M) and Run-Time Predictors (Predictor 1, Predictor 2, … Predictor N), plus a Run-Time History DB; Request in, Result out]
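
A meta-level predictor can be sketched as a selector over several point predictors. The selection rule used here (pick the expert with the lowest accumulated absolute error on past run times) is an assumption, as are the names `MetaPredictor`, `last_value`, `running_mean`, and `window_mean`; the slides do not state how the project combines its predictors.

```python
from statistics import mean


def last_value(history):
    return history[-1]


def running_mean(history):
    return mean(history)


def window_mean(history, k=3):
    return mean(history[-k:])


class MetaPredictor:
    """Hypothetical meta-level predictor: delegates to the best expert so far."""

    def __init__(self, experts):
        self.experts = dict(experts)
        self.error = {name: 0.0 for name in self.experts}
        self.history = []

    def predict(self):
        # No history yet means no expert can produce a prediction.
        if not self.history:
            return None
        best = min(self.error, key=self.error.get)
        return best, self.experts[best](self.history)

    def observe(self, runtime):
        # Score each expert on what it would have predicted, then record the run.
        if self.history:
            for name, expert in self.experts.items():
                self.error[name] += abs(expert(self.history) - runtime)
        self.history.append(runtime)
```

Because every expert is scored on every observation, the choice can shift over time as the workload changes, which is the property the "mixture of experts" framing is after.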

Middleware Results

Performance Prediction • Meta-level performance prediction

Scheduling: Where to send a request? • The best destination may depend on the service and the workload.

Scheduling: How many resources? • A service is leased a resource pool. • How many resources to give to each request? The answer depends on the workload. • Meta-level policies are the next step.

Current Work

Optimizations • Incremental Service Deployment • Service Caching

Stochastic Leasing Model • How many resources to lease to a service? • Tradeoffs • Holding resources has an associated cost proportional to the length of time they are held • Dynamic releasing/reacquiring may be more responsive to demand and availability, but the resources may cost more • Probabilistic demand • Random demand, random execution times • Developed a dynamic programming model • Models cost tradeoffs and provides an optimal leasing policy
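
The holding-versus-reacquiring tradeoff can be illustrated with a small finite-horizon dynamic program. This is not the project's model: the cost structure (a per-period holding cost, a per-unit acquisition cost, and a per-unit shortage penalty), the independent per-period demand distribution, and the function name `optimal_leasing` are all assumptions, and random execution times are ignored.

```python
def optimal_leasing(horizon, max_units, demand_pmf,
                    hold_cost, acquire_cost, shortage_cost):
    """Backward induction over (period, units currently held).

    demand_pmf maps a demand level to its probability in each period.
    Returns (value, policy): value[h] is the expected cost from period 0
    when starting with h units; policy[t][h] is the number of units to
    hold during period t when entering it with h units.
    """
    def stage_cost(target):
        expected_shortage = sum(p * max(d - target, 0)
                                for d, p in demand_pmf.items())
        return hold_cost * target + shortage_cost * expected_shortage

    value = [0.0] * (max_units + 1)  # no cost after the final period
    policy = []
    for _ in range(horizon):
        new_value, decisions = [], []
        for held in range(max_units + 1):
            # Releasing is free in this sketch; only acquiring extra units costs.
            cost, target = min(
                (acquire_cost * max(t - held, 0) + stage_cost(t) + value[t], t)
                for t in range(max_units + 1))
            new_value.append(cost)
            decisions.append(target)
        value = new_value
        policy.append(decisions)
    policy.reverse()  # the loop ran from the last period back to the first
    return value, policy
```

Raising `acquire_cost` relative to `hold_cost` makes the resulting policy hold on to resources through low-demand periods, while lowering it favors releasing and reacquiring.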

DP vs. Static Leasing • ~20% improvement • Less variance

Testbed 2004 • [Figure: testbed diagram spanning the CS department and MSI, connected over LAN and WAN] • Hosts and pools shown: beo cluster, denali cluster, katmai.cs, sitka.cs, beo1.cs, fairbanks, s1.msi, a1.msi, IBM machines, Linux machines, Solaris machines, SGI machines, ADCS windows • Components shown: Service Register, Information Service, Resource Monitor, ACPs, ASPs, GC AGS Front-end, N-body AGS Front-end, Solver AGS Front-end, Storage AGS Front-end, AGS Client, User Interface, Status Monitor, Demo (Laptop) • Interactions shown: Requests, SSH

Remote Storage Web Service (ASP) • Grid service to provide remote archival storage • Transparent but restricted access to cluster of storage disks in ADCS lab

Related Projects • Service environments and testbeds • NetSolve (Dongarra, U. Tennessee) • Ninf (Matsuoka, Tokyo Institute of Technology) • Open Grid Services Architecture (OGSA) • Component architecture and interface • XCAT (Gannon, Indiana U.) • H2O (Sunderam, Emory) • Composable Services (Karamcheti, NYU) • Internet Server environment • Sharc (Shenoy, U. Mass) • Muse (Chase, Duke)

Summary • Community Services Project • Dynamic Grid Infrastructure • Architecture, Middleware, Testbeds • Addressing Dynamics and Reuse • Service demand, Grid resources • Adaptation at several levels • Meta-level strategies to promote reuse • For more info: community-services.cs.umn.edu • Thanks to DOE and NSF

Future Work • Customization • how to expose and configure services to meet specific user needs: performance, fault tolerance, etc • Data-intensive Services • large amounts of distributed data • deploy services that can process/analyze this data • extend our middleware and system architecture • Multiple Services • applications may wish to use multiple services together: pipelines are common in high-end scientific applications • Customized environments • collections of customized services configured for specific applications

Questions?
