
Interoperability of Resource Description across Grid domain boundaries

  1. Interoperability of Resource Description across Grid domain boundaries John Brooke j.m.brooke@man.ac.uk E-Science NorthWest @ University of Manchester UK http://www.esnw.ac.uk

  2. Grid Interoperability • In European and Japanese Grid projects there are two major middleware systems deployed: Globus (US) and Unicore (Europe/Japan). • Globus is deployed mainly in physics-based projects, Unicore in projects with complex heterogeneous architectures (e.g. specialist HPC systems). • The FP5 project GRIP began looking at how resource requests could be handled from Unicore to Globus; its FP6 successor, UniGrids (slide 49), takes this work forward into the world of service-based architectures (e.g. OGSA).

  3. Starting point - GRIP • EU-funded FP5 project within the Information Society Technologies Programme (IST-2001-32257) • http://www.grid-interoperability.org/

  4. Grids as Virtual Organizations As used in the paper “The Anatomy of the Grid” (Foster, Kesselman, Tuecke): “… Grid concept is coordinated resource sharing in dynamic, multi-institutional virtual organizations …” The link to the power grid concept is that the power grid is resource sharing by producers to provide a unified service to consumers. A large unresolved question is how Virtual Organisations federate across security boundaries (e.g. firewalls) and organisational boundaries (resource allocation). There are important boundaries at site firewalls, leading to a concept of large-scale Grids as federations of smaller Grids, the MiniGrid structure (IEEE Grid Workshop, Bangalore, December 2000).

  5. Invariants in distributed computation • To draw an analogy with the current situation in Grid resource description, I refer to the status of physics in the 17th and 18th centuries. It was not clear what the invariant quantities were that persisted through changes in physical phenomena. • Gradually quantities such as momentum, energy and electric charge were isolated and their invariance expressed in the form of Conservation Laws. • Without Conservation Laws, a precise science of physics is inconceivable. We must have constraints and invariants or analysis is impossible. • In this talk I examine how an original request for resources can be translated into different resource description frameworks while still remaining the same request (invariance of computational resource).

  6. The analogy with a power grid The power grid delivers electrical power in the form of an alternating-current (AC) wave. The form of the wave can change over the grid, but there is a universal (scalar) measure of power: power = voltage × current. This universal measure facilitates the underlying economy of the power grid. Since it is indifferent to the way the power is produced (gas, coal, hydro, etc.), different production centres can all switch into the same grid. To define the abstractions necessary for a Computational Grid we MUST understand what we mean by computational resource.

  7. What is computational power? Is there an equivalent of voltage × current? Megaflops? Power is a rate of delivery of energy, so the analogue would be a rate of computation, e.g. floating-point operations per second. However, this is application dependent. Consider two different computations: SETI@home, where time factors are not important, and distributed collaborative working on a CFD problem with computation and visualization of results in multiple locations, where time and synchronicity are important! Both may use exactly the same number of floating-point operations. The request for resource originally comes from an application; we will call this the resource requestor space.

  8. An abstract space for job-costing • Define a job as a vector of computational resources (r1, r2, …, rn) • A Grid resource advertises a cost function over those resources, with a cost per unit of each (c1, c2, …, cn) • The cost function takes the job vector as its argument and produces a scalar job cost: r1*c1 + r2*c2 + … + rn*cn
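A minimal sketch of this costing model in Python (the resource names and prices are illustrative, not from the talk): the scalar job cost is the dot product of the job's resource vector with the provider's cost vector.

```python
# Sketch of the job-costing model on slide 8. A job is a vector of
# requested resources (r1, ..., rn); a provider advertises a cost per
# unit of each resource (c1, ..., cn); the scalar job cost is the dot
# product r1*c1 + ... + rn*cn. Names and prices are illustrative.

job = {"cpu_hours": 128, "memory_gb": 40, "disk_gb": 1000}
provider_costs = {"cpu_hours": 0.5, "memory_gb": 0.1, "disk_gb": 0.01}

def job_cost(job, costs):
    """Scalar cost in tokens: sum of r_i * c_i over requested resources."""
    return sum(amount * costs[name] for name, amount in job.items())

print(job_cost(job, provider_costs))  # 128*0.5 + 40*0.1 + 1000*0.01 = 78.0
```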

  9. A Dual Job-Space Thus we have a space of “requests”, defined as a vector space of the computational needs of users over a Grid. For many jobs most of the entries in the vector will be null. We have another space of “services” that can produce “cost vectors” for costing the user jobs (provided they can accommodate them). This is an example of a dual vector space. A strictly defined dual space is probably too rigid, but it can provide a basis for simulations. The abstract job requirements will need to be agreed. It may be a task for a broker to translate a job specification into a “user job” for a given Grid node. A Mini-Grid can help to investigate a given dual job-space with vectors of known length.
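A small extension of the previous sketch, with invented provider names and prices: a broker in the dual space costs the job against every service, treats a provider that cannot cost some requested dimension as unable to accommodate the job, and picks the cheapest valid offer.

```python
# Sketch of brokering in the dual job-space: each provider's cost vector
# must cover every requested dimension, otherwise it cannot accommodate
# the job. Provider names and prices are invented for illustration.

job = {"cpu_hours": 128, "memory_gb": 40}   # unrequested entries are null

providers = {
    "site_a": {"cpu_hours": 0.5, "memory_gb": 0.1},
    "site_b": {"cpu_hours": 0.4},           # cannot cost memory_gb
}

def offer(job, costs):
    """Scalar cost, or None if some requested resource is not costed."""
    if not set(job) <= set(costs):
        return None
    return sum(job[r] * costs[r] for r in job)

offers = {site: offer(job, c) for site, c in providers.items()}
valid = {site: c for site, c in offers.items() if c is not None}
print(min(valid, key=valid.get), valid)     # site_a {'site_a': 68.0}
```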

  10. Dual Space [Diagram: a user job vector (1) is costed against a provider's cost vector (2) to produce a scalar cost in tokens.]

  11. Computational resource Computational jobs ask questions about the internal structure of the provider of computational power in a manner that an electrically powered device does not. For example, do we require specific compilers, libraries, disk resource, visualization servers? If something goes wrong, do we get support? If we transfer data and methods of analysis over the Internet, is it secure? A resource broker for high-performance computation is of a different order of complexity from a broker for an electricity supplier.

  12. Emergent behaviour Given this complexity, self-sustaining global Grids are likely to emerge rather than be planned. Planned Grids can be important for specific tasks; the LCG is an example. They are not required to be self-sustaining, and questions of accounting and resource transfer are not of central interest. Self-sustaining (in a financial sense) Grids must have ways of accounting for and costing resource. This allows for trading mechanisms between suppliers of computational resource analogous to trading in energy and power grids.

  13. Fractal structure and complexity Grids are envisaged as having internal structure and also external links. Via the external links (WANs, intercontinental networks) Grids can be federated. The action of joining Grids raises interesting research questions: • 1. How do we conceptualise the joining of two Grids? • 2. Is there a minimum set of services that defines a Grid? • 3. Are there environments for distributed services and computing that are not Grids (e.g. that do not have domain boundaries)? • Here we examine the following problem: “How do we federate Grids that describe resources in different languages and via different abstractions?”

  14. Resource Requestor and Provider Spaces • Resource requestor (RR) space, in terms of what the user wants: e.g. Relocatable Weather Model, 10^6 points, 24 hours, full topography. • Resource provider (RP) space: 128 processors, Origin 3000 architecture, 40 Gigabytes memory, 1000 Gigabytes disk space, 100 Mb/s connection. • We may even forward requests from one resource provider to another: recasting the Origin 3000 job in terms of an IA64 cluster gives a different resource set. • Linkage and staging of the different stages of a workflow require environmental support, a hosting environment. A sketch of such an RR-to-RP mapping follows.
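A hedged sketch of the RR-to-RP mapping: an application "expert" function turns the user-level request into a provider-level resource set. The sizing rules and constants below are invented placeholders, not DWD's actual ones.

```python
# Sketch of translating a resource-requestor (RR) description into a
# resource-provider (RP) description. The sizing rules are invented
# placeholders; a real expert module encodes application knowledge.

rr_request = {
    "application": "relocatable_weather_model",
    "grid_points": 10**6,
    "forecast_hours": 24,
    "topography": "full",
}

def weather_model_expert(rr):
    """Hypothetical expert mapping an RR request into RP terms."""
    return {
        "processors": 128,
        "memory_gb": rr["grid_points"] // 25_000,   # -> 40 GB here
        "disk_gb": 1000,
        "network_mbps": 100,
    }

print(weather_model_expert(rr_request))
```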

  15. Relocatable local meteorological model • Derivation of • an initial data set and • lateral boundary data sets • for the LM (Local Model) from data from the Global Model (GME) of DWD • Size of data: 1-20 GB • Steps: • Extraction of GME data from the Oracle database • Interpolation of GME data onto the LM model grid

  16. RR and RP Spaces [Figure 1: a request from RR space at A is mapped into resource providers at B and C; C forwards a request, formulated in RR space, to the RP space at D (request referral). B and D synchronize at the end of the workflow before the results are returned to the initiator A.]

  17. Summary 1 • We have shown how some concepts from abstract vector spaces MAY be able to provide a definition of computational resource. • We do not yet know what conservation laws or constraints could apply to such an abstraction, or whether these would be useful in analysing distributed computing. • We believe we can show convincingly that simple scalar measures such as Megaflops are inadequate to the task. • This undermines the “league table” concept, such as the Top 500 computers. Computational resource will increasingly be judged by its utility within a given infrastructure.

  18. Resource request matching • We will examine the concept of conservation of a resource request by examining a Grid component specifically designed for this task, namely a resource broker. • The resource broker finds appropriate resources to run a user's application request across a federated Grid. • Important: we do not demand uniformity of resource description terms.

  19. Core Functions for Grids [Diagram omitted. Acknowledgements to Bill Johnston of LBL.]

  20. An example of a core function • The GGF document “Core Functions for Production Grids” is attempting to define Grids by the minimal set of functions a Grid must implement to be “usable”. • The GRIP project has contributed a document abstracting the functionality required for resource brokerage. • We examine this as a typical example of this approach, which provides an alternative viewpoint to the “Grid Services” (OGSA) approach. • More specifically, the resource brokerage function has an intimate connection with the concept of “computational resource”.

  21. Abstract Functions for a resource broker • Resource discovery, for workflows as well as single jobs. • Resource capability checking: do the offering sites have ALL the necessary capability and environmental support for instantiating the workflow? • Inclusion of Quality of Service policies in the offers. • Information necessary for the negotiation between client and provider, and mechanisms for ensuring contract compliance. • Document submitted to the GPA-RG group of the GGF. A sketch of this interface as code follows.
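The abstract functions above, written out as a Python interface. The method names are our own shorthand; the GRIP document defines the functions abstractly rather than as a concrete API.

```python
# Sketch of the abstract broker functions as an interface. Method names
# are assumptions, not the GRIP document's terminology.

from abc import ABC, abstractmethod

class AbstractBroker(ABC):
    @abstractmethod
    def discover(self, workflow):
        """Find candidate resources for a single job or a whole workflow."""

    @abstractmethod
    def check_capability(self, site, workflow):
        """True only if the site has ALL the capability and environmental
        support needed to instantiate the workflow."""

    @abstractmethod
    def make_offer(self, site, workflow):
        """Produce an offer that includes Quality of Service policies."""

    @abstractmethod
    def negotiate(self, offer):
        """Exchange the information needed to form a compliant contract."""
```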

  22. Brokers as VOs [Diagram: users reach compute resources through layered brokering: virtual organisation brokers, then organization firewalls, then system brokers.]

  23. Federated Brokering [Diagram: clients, broker services and resources arranged in three layers, a VO layer, a specialist layer and a site layer, with replication between broker services.]

  24. Brokering and OGSA Services [Diagram: persistent virtual environments: a broker site interacting with clients, other brokers, banking services, a metascheduling service, a policy manager, a workflow manager, a resource usage monitor, site feedback, and chargeable, schedulable Grid services.]

  25. Possible OGSA Broker [Diagram: interoperating OGSA services. A Unicore Client reaches a Resource Broker Service and Resource Database via the Unicore Gateway and Network Job Supervisor (NJS) with its TSI, alongside an OGSA server (GT3/4) with a user database.]

  26. Local Brokering Configurations [Diagram: normal EUROGRID/GRIP brokering (client, gateway, broker NJS with IDB, subordinate NJSes, TSI/host) contrasted with site-wide brokering, which adds R-GMA and a GT3 host.]

  27. Basic Broker-NJS Communication [Diagram: the NJS requests brokering and dispatches delegated brokering to the broker, then gets the results; the broker requests a local resource check through a Local Resource Checker and stub interface, retrieves system configuration information from the IDB service, and provides the result. In the longer term the NJS will delegate directly.]

  28. Site Configuration [Diagram: users contact NJSes through the gateway, or the broker in the case of site-wide brokering; the broker NJS delegates to the other NJSes (site-wide brokering only). Each NJS has a Local Resource Checker and an IDB, with the TSIs supplying dynamic data to the IDBs; there is potential to share (partial?) IDBs between NJSes (CSAR configuration?).]

  29. UoM Broker architecture [Diagram: the broker is hosted in the NJS. It looks up its signing identity in the UUDB (verifying delegated identities), configuration and static resources in the IDB, and dynamic resources via R-GMA. An AbstractBroker is specialised (inheritance relation) into a SingleVsiteBroker and a WholeUsiteBroker, the latter using R-GMA to provide information for all Usite component hosts, with a TicketManager to get and check signed tickets (contracts) and a Host-Vsite map. Resource checking is delegated to Grid architecture-specific engines (UnicoreRC, Globus2RC, Globus3RC) and resource-domain translation to application-domain expert code (DWDLMExpert, ICMExpert, others), which may use the Local Resource Checker; untranslatable resources are passed to the Unicore resource checker. A SimpleTranslator converts delegated UNICORE resource requests into LDAP search terms for the GT2 MDS, and a GT3 OntologicalTranslator converts UNICORE resource requests into XPath search terms for the GT3 Index Service, each returning a set of untranslatable resources for UNICORE's standard techniques; translations appropriate to the target Globus resource schema are looked up in an ontology. Compute resources are reached via the TSI and GRAM/MDS. Key: Ontology, UNICORE components, EUROGRID Broker, Globus components, GRIP Broker, Whole-Site Broker. To outside world.]

  30. Broker functions • A simple Resource Check request (“Can this job run here?”) checks static qualities such as software resources (e.g. Gaussian98) as well as capacity resources such as quotas (disk quotas, CPU, etc.). • A Quality of Service request returns a range of turnaround times, and a cost, as part of a Ticket. If the Ticket is presented (within its lifetime) with the job, the turnaround and cost estimates should be met.
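A sketch of the Quality of Service ticket just described: an offer of turnaround and cost that the site should honour if the ticket is presented with the job before it expires. The field names are our assumptions, not the broker's actual schema.

```python
# Sketch of a QoS ticket: turnaround and cost estimates that bind the
# provider only while the ticket is within its lifetime. Field names
# are illustrative.

import time
from dataclasses import dataclass

@dataclass
class Ticket:
    max_turnaround_s: int    # promised worst-case turnaround
    cost_tokens: float       # promised cost
    expires_at: float        # expiry time, seconds since the epoch

    def is_valid(self, now=None):
        """The offer only binds the provider within the ticket lifetime."""
        now = time.time() if now is None else now
        return now < self.expires_at

ticket = Ticket(max_turnaround_s=3600, cost_tokens=78.0,
                expires_at=time.time() + 600)    # offer stands for 10 min
print(ticket.is_valid())                         # True
```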

  31. Summary 2 • We address the resource description question by considering the design of a resource broker that can cross Grid domain boundaries. We have shown how such brokers can unify different mini-Grids into a coherent VO. • The federated structure of the VO leads naturally to a federated structure for the broker. • Since we do not assume that resource description mechanisms are uniform (federated mini-Grids may be controlled by different middleware), we must have translator modules in the broker architecture.

  32. Grid Resource Description Problem • Two independent Grid systems: • Unicore (http://www.unicore.org/) • Globus (http://www.globus.org/) • Both need to describe systems that run compute jobs • Very different description languages: • Unicore's resource model, part of the AJO framework • Globus's GLUE Schema (DataTAG, iVDGL) for GT2 and GT3 • For interoperability, we want to take a Unicore job and run it on Globus resources • Therefore, we need to translate the job's resource requirements between the two systems

  33. Two Types of Integration • Right here, Right now: • Integrate existing features such as the way Unicore and Globus currently describe hardware resources • Best done by evolution, preserving much of the character of the legacy system components • The future: • Integrating future features such as the way Unicore and Globus will describe Software Resources • Best done by revolution, introducing a new system, reached by consensus between the two teams of architects

  34. Methodology for translation [Diagram: the Unicore ontology and the GLUE ontology are each reconciled against a master ontology, and thereby reconciled with one another.]

  35. Methodology for translation • Develop an ontology for the Unicore resource terminology • Develop an ontology for the Globus resource terminology • Map concepts in the Unicore ontology to concepts in the Globus ontology • We assume a consensus between the concepts in Unicore and GLUE [Diagram as on slide 34: both ontologies reconciled against a master ontology.]

  36. Methodology for translation service • Address the data transformation issues for translating attributes • Find a technology that has these characteristics: • can model the two ontologies • has support for linking abstract concepts to code fragments • easily allows someone to update mappings • is appropriate for a video-conferencing setting • writes modelling information to a file format that can be used by other applications • Use the data files created by the application to run the translator service. A sketch of such a service follows.
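A hedged sketch of a translator service in the style of the SimpleTranslator from the broker architecture slide: translatable UNICORE-style resource terms become LDAP search clauses for the GT2 MDS, and anything untranslatable is handed back for UNICORE's standard checks. The term-to-attribute map and the MDS attribute names below are illustrative assumptions, not the real schema.

```python
# Sketch of a Unicore-to-MDS translator. TERM_MAP and the MDS attribute
# names are invented for illustration; a real translator would be driven
# by the ontology mapping files described above.

TERM_MAP = {
    "Processors": "Mds-Computer-Total-nodeCount",
    "MemoryMB":   "Mds-Memory-Ram-Total-sizeMB",
}

def translate(request):
    """Split a request into (LDAP filter, untranslatable terms)."""
    clauses, leftover = [], {}
    for term, value in request.items():
        if term in TERM_MAP:
            clauses.append(f"({TERM_MAP[term]}>={value})")
        else:
            leftover[term] = value
    return "(&" + "".join(clauses) + ")", leftover

ldap_filter, untranslated = translate(
    {"Processors": 128, "MemoryMB": 40960, "Software": "Gaussian98"})
print(ldap_filter)    # (&(Mds-Computer-Total-nodeCount>=128)(...>=40960))
print(untranslated)   # {'Software': 'Gaussian98'} -> UNICORE-side check
```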

  37. An Ontology Building Life-cycle [Diagram: identify purpose and scope, then knowledge acquisition, conceptualisation, integration of existing ontologies, encoding, consistency checking and evaluation, supported by choices of language and representation, available development tools and ontology learning.]

  38. Ontology evolution: from Architects to Software [Diagram: the Unicore and GLUE architects provide brief descriptions, inferring a lot from context; these evolve into syntactic, interface and invocation descriptions, with semantic mining, serving the service user, the Unicore service provider and Globus.]

  39. Unicore: Modelling Resources

  40. GLUE: Modelling resources

  41. GLUE: Marking up transcripts

  42. GLUE: Provenance Information

  43. GLUE: Container Classes • GLUE has container classes that include “Computing Element”, “Cluster”, “Subcluster” and “Host”. Under the heading “Representing Information”, the GLUE document indicates: “…hosts are composed into sub-clusters, sub-clusters are grouped into clusters, and then computing elements refer to one or more clusters.” • These container objects may hold any number of optional auxiliary classes that actually describe the Grid features.

  44. GLUE: Auxiliary Classes • The documentation provides few details about the nature of a Host other than that it is a “physical computing element”. Much of the meaning for Host has to be derived from what it might contain. Consider the following two valid definitions: • A Host is a physical computing element characterized by Main Memory, a Benchmark, a Network Adapter and an Operating System • A Host is a physical computing element characterized by an Architecture, a Processor and an Operating System.
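A sketch of the containment idea across these two slides: a Host is a container whose meaning comes from whichever optional auxiliary classes it happens to hold, and hosts compose into sub-clusters, clusters and computing elements. The class and field names paraphrase the GLUE schema; this is not its actual API.

```python
# Sketch of GLUE-style containment. Both "valid definitions" of a Host
# on this slide are just different subsets of optional auxiliary classes.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MainMemory:
    ram_mb: int

@dataclass
class Processor:
    model: str

@dataclass
class OperatingSystem:
    name: str

@dataclass
class Host:
    # All descriptive content is optional; meaning derives from contents.
    main_memory: Optional[MainMemory] = None
    processor: Optional[Processor] = None
    operating_system: Optional[OperatingSystem] = None

@dataclass
class SubCluster:
    hosts: List[Host] = field(default_factory=list)

@dataclass
class Cluster:
    subclusters: List[SubCluster] = field(default_factory=list)

@dataclass
class ComputingElement:
    clusters: List[Cluster] = field(default_factory=list)

host = Host(processor=Processor("IA64"),
            operating_system=OperatingSystem("Linux"))
ce = ComputingElement([Cluster([SubCluster([host])])])
print(ce)
```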

  45. Map concepts between ontologies • Unicore and GLUE have different philosophies for describing resources :-( • In Unicore, the resources are described in terms of resource requests • In GLUE, resources are described in terms of the availability of resources.
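A sketch of this mismatch: Unicore phrases resources as requests ("at least X is needed"), GLUE as availability ("Y is free"), so the translation maps request concepts onto availability concepts and then compares amounts. The concept names and numbers are illustrative.

```python
# Sketch of matching a request-oriented description (Unicore style)
# against an availability-oriented one (GLUE style). Names invented.

unicore_request = {"Processors": 64, "MemoryMB": 4096}    # what is needed
glue_advert = {"FreeCPUs": 128, "RAMAvailableMB": 8192}   # what is free

CONCEPT_MAP = {"Processors": "FreeCPUs", "MemoryMB": "RAMAvailableMB"}

def satisfiable(request, advert):
    """Satisfied when advertised availability covers each requested amount."""
    return all(advert.get(CONCEPT_MAP[k], 0) >= v
               for k, v in request.items())

print(satisfiable(unicore_request, glue_advert))  # True
```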

  46. Compatible Concepts

  47. Translation Service Prototype

  48. Conclusions • Interoperability of Grid resource requests is at the heart of the abstract idea of a computational resource that can cross Grid domain boundaries. • We wish to provide application users with seamless access to resources; they should not need to know the details of the machines on which they run. • High-level abstractions do not yet exist as standards, so we have to create ontologies that can translate differing modelling abstractions for Grid resources. • Our current translations lose much information in crossing between current middleware systems (e.g. Globus and Unicore).

  49. Continuation of interoperability research • Research Centre Jülich (project manager) • Consorzio Interuniversitario per il Calcolo Automatico dell'Italia Nord Orientale • Fujitsu Laboratories of Europe • University of Warsaw • Intel GmbH • University of Manchester • T-Systems SfR • http://www.unigrids.org
