Università degli Studi di Pisa Dipartimento di Informatica

SERVICE AND RESOURCE DISCOVERY SUPPORTS OVER P2P OVERLAYS
EMANUELE CARLINI, MASSIMO COPPOLA, DOMENICO LAFORENZA, PATRIZIO DAZZI, LAURA RICCI
International Conference on Ultra Modern Telecommunications, ICUMT
Saint Petersburg, October 12-14th, 2009
Università degli Studi di Pisa, Dipartimento di Informatica

INTRODUCTION Grid environments exploit a huge amount of geographically scattered computing resources. Main features of large computational grids: a dynamic environment; a huge amount of heterogeneous resources; complex middleware for accessing the resources. XtreemOS: a research project funded by the European Commission. Main goal: definition of an open source, Grid-enabled operating system offering scalable and transparent management of large computational platforms and federation of several virtual organizations. Users exploit the distributed system through a standard operating system interface.

SRDS: SERVICE AND RESOURCE DISCOVERY SRDS: a basic service of XtreemOS providing a highly distributed directory service. SRDS main features: enables resource look-up and exploitation in a multi-VO environment; hides the effect of scale when exploiting individual systems; may be exploited by different clients (other modules of XtreemOS, applications); supports different kinds of queries (key-based queries, multi-attribute range queries over dynamic attributes).

SRDS ARCHITECTURE SRDS exploits a set of P2P overlays, where each overlay includes nodes from different virtual organizations. The choice of the P2P model enables: scalability; low overhead; fault tolerance; management of information in a dynamic environment. SRDS services are exploited by different clients, each with different requirements. To cope with the diversity of these requirements, several P2P overlays characterized by different features have been defined (Distributed Hash Tables, structured overlays, ...).

SRDS: THE ARCHITECTURE Facade: an easy-to-extend layer supporting multiple interface protocols. Query Provider (QP): a set of modules for client query translation. Information Management Layer (IML): a common interface to DHT-like overlays. ADS (Application Directory Service) = Facade + QP + IML. RSS (Resource Selection Service): a P2P overlay allowing scalable resource location in large overlays. Scalaris, Overlay Weaver: DHTs with different characteristics.

SRDS MAIN MODULES: ADS AND RSS RSS (Resource Selection Service) supports resource discovery through queries on constant-valued attributes, for example: CPU = IA32, MEM ∈ [4GB, ∞), BANDWIDTH ∈ [512Kb/s, ∞), DISK ∈ [128GB, ∞), OS ∈ {Linux 2.6.19-1.2895, ..., Linux 2.6.20-1.2944}. ADS (Application Directory Service) supports complex queries over dynamic attributes. Example: the RSS selects a set of resources whose static attributes match the query constraints; the descriptors of these resources are stored in the ADS; the dynamic state of the resources (for instance, current free memory) is monitored through the ADS. RSS acts like a machete, while ADS acts like a scalpel.
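
The static-attribute filter performed by the RSS can be illustrated with a small sketch. This is not the RSS overlay protocol: it only shows the matching predicate applied to one node's descriptor. The constraint encoding (`eq`, `min`, `in`), the attribute names and the sample nodes are assumptions made for illustration, and only the two OS versions named on the slide are listed because the rest of the set is elided there.

```python
# Hypothetical encoding of the slide's query: attribute -> (kind, bound).
QUERY = {
    "cpu": ("eq", "IA32"),
    "mem_gb": ("min", 4),            # MEM in [4GB, inf)
    "bandwidth_kbps": ("min", 512),  # BANDWIDTH in [512Kb/s, inf)
    "disk_gb": ("min", 128),         # DISK in [128GB, inf)
    "os": ("in", {"Linux 2.6.19-1.2895", "Linux 2.6.20-1.2944"}),
}

def matches(resource, query):
    """Return True if every static attribute satisfies its constraint."""
    for attr, (kind, bound) in query.items():
        if attr not in resource:
            return False
        value = resource[attr]
        if kind == "eq" and value != bound:
            return False
        if kind == "min" and value < bound:
            return False
        if kind == "in" and value not in bound:
            return False
    return True

# Illustrative node descriptors (made up for this sketch).
NODES = [
    {"id": "n1", "cpu": "IA32", "mem_gb": 8, "bandwidth_kbps": 1024,
     "disk_gb": 256, "os": "Linux 2.6.20-1.2944"},
    {"id": "n2", "cpu": "IA32", "mem_gb": 2, "bandwidth_kbps": 1024,
     "disk_gb": 256, "os": "Linux 2.6.20-1.2944"},
]

selected = [n["id"] for n in NODES if matches(n, QUERY)]
```

In the flow described on the slide, the descriptors of the selected nodes would then be handed to the ADS, which tracks their dynamic state.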

RSS: RESOURCE SELECTION SERVICE Supports resource discovery through multi-attribute range queries over a set of static attributes, i.e. constant-valued attributes known at initialization time. RSS main features: each node represents its own attributes in the overlay; resource information is not delegated to other nodes, unlike in DHT-based approaches; this speeds up resource location.

ADS: THE QUERY PROVIDER (QP) Query Provider Layer: provides a set of modules devoted to query translation. It implements a set of algorithms for interpreting the queries of different SRDS clients. For instance, a job directory service is required to monitor the state of the jobs of an application/VO. When a new job is created, the client submits an AddJob to the SRDS. The AddJob operation is interpreted by a QP module, which translates it into a sequence of operations on the underlying DHT: check that a proper job directory service exists and, if it does not, request its creation; insert the job ID into the DHT; insert further information about the job under proper keys to support inverse queries. The QP makes all these steps transparent to the user.
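
A minimal sketch of how a QP module might expand AddJob into the three DHT steps listed above. `ToyDHT` is an in-memory stand-in, not the Scalaris or Overlay Weaver API, and the key layout (`dir`, `jobs`, `owner`) and the `owner` attribute used for the inverse query are assumptions chosen for illustration.

```python
class ToyDHT:
    """In-memory stand-in for a DHT exposing multi-valued put/get."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store.setdefault(key, set()).add(value)

    def get(self, key):
        return set(self._store.get(key, set()))


def add_job(dht, directory, job_id, owner):
    """Translate one AddJob request into a sequence of DHT operations."""
    # Step 1: check that the job directory exists, create it otherwise.
    if not dht.get(("dir", directory)):
        dht.put(("dir", directory), "created")
    # Step 2: insert the job ID into the directory.
    dht.put(("jobs", directory), job_id)
    # Step 3: insert extra information under a second key so that an
    # inverse query (here: jobs by owner) needs only one get.
    dht.put(("owner", directory, owner), job_id)


def request_job(dht, directory):
    """Read the job list back with a single DHT get."""
    return dht.get(("jobs", directory))


dht = ToyDHT()
add_job(dht, "app1", "job-7", "alice")
add_job(dht, "app1", "job-8", "bob")
```

The client only calls `add_job`; the put/get sequence stays hidden, which is the transparency the slide attributes to the QP.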

ADS: THE INFORMATION MANAGEMENT LAYER (IML) Namespaces define the context where a key is used, for instance a different namespace for each job directory. The ADS (Application Directory Service) provides an implementation of namespaces over the DHT: it receives from a QP module an abstract operation OP_QP = { op, keyM, valueM, NSpace, ClientType, ClientID } and generates an operation for the underlying DHT in the proper namespace, OP_DHT = { op, keyD, valueD, auxinfo }, where valueD generally equals valueM; keyD may differ from keyM because of the namespace implementation; auxinfo carries data expiration timeouts, user-defined secrets, ...
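
The translation from the abstract operation to the DHT operation can be sketched as follows. The field names mirror the two tuples on the slide; the prefixing rule used to derive keyD and the `ttl_seconds` entry in auxinfo are assumptions for illustration, not the documented IML behaviour.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OpQP:
    """Abstract operation produced by a QP module."""
    op: str
    key_m: str
    value_m: str
    nspace: str
    client_type: str
    client_id: str


@dataclass(frozen=True)
class OpDHT:
    """Concrete operation sent to the underlying DHT."""
    op: str
    key_d: str
    value_d: str
    auxinfo: dict = field(default_factory=dict)


def translate(op_qp, ttl_seconds=60):
    """Map an abstract operation into its namespace on the DHT.

    Assumption: the namespace is implemented by prefixing the key,
    and auxinfo carries only an expiration timeout.
    """
    key_d = f"{op_qp.nspace}:{op_qp.key_m}"
    return OpDHT(
        op=op_qp.op,
        key_d=key_d,
        value_d=op_qp.value_m,  # valueD generally equals valueM
        auxinfo={"ttl_seconds": ttl_seconds},
    )


abstract = OpQP("put", "job-7", "running", "jobs/app1", "JobDirectory", "c1")
concrete = translate(abstract)
```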

EXPLOITING NAMESPACES: AN EXAMPLE A network coordinates (NC) embedding system embeds latencies, such as round trip times among nodes, into a geometric space. Each node is assigned network coordinates in the geometric space. Unmeasured round trip times are estimated by computing the distance between two nodes in the geometric space. To support direct queries (given the IP of a node, return its network coordinates) and inverse queries (given the X/Y coordinates of a node, find the IPs of its nearest neighbours), the ADS exploits three different namespaces: IP, X, Y. Each namespace may be mapped onto a different DHT or onto the same DHT, and may have different characteristics.
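
A small sketch of the two query types in a two-dimensional coordinate space. The Euclidean metric, the sample coordinates and the single in-memory table (standing in for the IP, X and Y namespaces) are assumptions; the slide does not specify the embedding or how the inverse query is resolved on the DHT.

```python
import math

# Hypothetical coordinates, keyed by IP (the "direct query" table).
COORDS_BY_IP = {
    "10.0.0.1": (0.0, 0.0),
    "10.0.0.2": (3.0, 4.0),
    "10.0.0.3": (6.0, 8.0),
}


def direct_query(ip):
    """Given the IP of a node, return its network coordinates."""
    return COORDS_BY_IP[ip]


def estimated_rtt(ip_a, ip_b):
    """Estimate an unmeasured round trip time as a geometric distance."""
    return math.dist(COORDS_BY_IP[ip_a], COORDS_BY_IP[ip_b])


def inverse_query(x, y, k=1):
    """Given coordinates, return the IPs of the k nearest nodes."""
    ranked = sorted(COORDS_BY_IP, key=lambda ip: math.dist((x, y), COORDS_BY_IP[ip]))
    return ranked[:k]
```

A real deployment would answer the inverse query by range lookups in the X and Y namespaces rather than by scanning every node as this sketch does.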

NAMESPACE IMPLEMENTATION • Different choices for the implementation of the namespaces: • a different DHT for each namespace • a set of namespaces on the same DHT

NAMESPACE IMPLEMENTATION Single Ring Approach: the DHT key is prefixed by an identifier of the namespace. Main drawback: DHT features, like the replication strategy, the fault repair strategy, ..., cannot be tuned according to the namespace. Multiple Ring Approach: rings are created on demand; parameters and policies of each DHT ring are customized at ring set-up time. Some rings may always remain active and hold essential key spaces, for instance resource directories. Smaller rings may have a shorter lifespan: application rings, for instance the job directory for a given application, ...
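
The difference between the two approaches can be sketched with dictionaries standing in for DHT rings. The class names, the `replication` parameter and the `:` separator are assumptions used only to show where per-namespace tuning is possible and where it is not.

```python
from dataclasses import dataclass, field


@dataclass
class Ring:
    """Stand-in for one DHT ring with a policy fixed at set-up time."""
    replication: int
    store: dict = field(default_factory=dict)


class SingleRing:
    """All namespaces share one ring; the key carries a namespace prefix."""

    def __init__(self, replication):
        self.ring = Ring(replication)  # one policy for every namespace

    def put(self, nspace, key, value):
        self.ring.store[f"{nspace}:{key}"] = value

    def get(self, nspace, key):
        return self.ring.store.get(f"{nspace}:{key}")


class MultiRing:
    """One ring per namespace, created on demand with its own policy."""

    def __init__(self):
        self.rings = {}

    def create_ring(self, nspace, replication):
        self.rings[nspace] = Ring(replication)

    def drop_ring(self, nspace):
        # Short-lived application rings can be removed when no longer needed.
        self.rings.pop(nspace, None)

    def put(self, nspace, key, value):
        self.rings[nspace].store[key] = value

    def get(self, nspace, key):
        return self.rings[nspace].store.get(key)


single = SingleRing(replication=3)
single.put("resources", "node-1", "up")
single.put("jobs/app1", "node-1", "job-7")  # same key, different namespace

multi = MultiRing()
multi.create_ring("resources", replication=5)  # long-lived, heavily replicated
multi.create_ring("jobs/app1", replication=1)  # short-lived application ring
multi.put("jobs/app1", "node-1", "job-7")
```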

NAMESPACE IMPLEMENTATION The current version of the ADS exploits two different rings, based on two different DHTs: Scalaris and Overlay Weaver. Scalaris: a transaction-based DHT that provides consistent replication of data. Overlay Weaver: implements different DHTs (Chord, Pastry, CAN, ...) and defines a routing layer common to all of them. (Figure: the Overlay Weaver architecture.)

COMPLEX QUERIES ON DHT A DHT supports only basic key-value queries. More complex queries may be submitted by the SRDS clients: multidimensional range queries on dynamic attributes. Examples: exact match query: Arch='x86' and CPU-Speed='3GHz' and RAM='256MB'; partial match query: CPU-Speed='3GHz' and RAM='256MB' (and Arch=*); range query: 1GHz < CPU-Speed < 3GHz and 512MB < RAM < 1GB; similarity queries (or nearest neighbour queries), which require the definition of a metric in the attribute space. In a similarity query the user submits an exact match query, which defines a point P in the attribute space; P may not correspond to any resource. Output: the k resources nearest to P, according to the defined metric.
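
The four query types can be made concrete with a sketch that evaluates them over a local list of descriptors. It says nothing about how they are executed on a DHT; the sample resources, the attribute names and the unnormalised Euclidean metric used for the similarity query are assumptions.

```python
import math

# Illustrative resource descriptors (made up for this sketch).
RESOURCES = [
    {"id": "r1", "arch": "x86", "cpu_ghz": 3.0, "ram_mb": 256},
    {"id": "r2", "arch": "x86", "cpu_ghz": 2.0, "ram_mb": 768},
    {"id": "r3", "arch": "ppc", "cpu_ghz": 3.0, "ram_mb": 256},
]


def match_query(constraints):
    """Exact match if every attribute is given, partial match otherwise.

    An attribute left out of `constraints` acts as the wildcard '*'.
    """
    return [r["id"] for r in RESOURCES
            if all(r[a] == v for a, v in constraints.items())]


def range_query(bounds):
    """Open-interval range query: lo < attribute < hi for every bound."""
    return [r["id"] for r in RESOURCES
            if all(lo < r[a] < hi for a, (lo, hi) in bounds.items())]


def similarity_query(point, k):
    """Return the k resources nearest to the point P."""
    def distance(resource):
        return math.sqrt(sum((resource[a] - v) ** 2 for a, v in point.items()))
    return [r["id"] for r in sorted(RESOURCES, key=distance)[:k]]
```

For example, `range_query({"cpu_ghz": (1.0, 3.0), "ram_mb": (512, 1024)})` expresses the range query on the slide. With an unnormalised metric the attribute with the largest numeric scale (here RAM) dominates the distance, so a real metric would probably rescale each attribute first.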

RANGE QUERY SUPPORT SRDS supports multi-attribute range queries • an approach based on the MAAN proposal • exploits the Chord DHT • Resource publication • Each resource is described by k pairs $(a_i, v_i)$ • A locality-preserving hashing function maps the value of each attribute onto the DHT • $H(v_i) = (v_i - v_i^{\min}) \times (2^m - 1) / (v_i^{\max} - v_i^{\min})$ • $2^m$: size of the key space • The descriptor of each resource is published onto k DHT nodes
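
The locality-preserving hash on this slide is a linear rescaling of the attribute domain onto the key space, so larger values never map to smaller keys. A minimal sketch follows; the key-space exponent `M = 16`, the truncation to an integer key, the attribute domains and the dictionary standing in for the DHT are assumptions.

```python
M = 16  # assumed key-space exponent: keys lie in [0, 2**M - 1]

# Assumed attribute domains (v_min, v_max).
DOMAINS = {"cpu_mhz": (0, 4000), "ram_mb": (0, 4096)}


def lp_hash(value, vmin, vmax, m=M):
    """H(v) = (v - v_min) * (2^m - 1) / (v_max - v_min), truncated to an int."""
    if not vmin <= value <= vmax:
        raise ValueError("value outside the attribute domain")
    return int((value - vmin) * (2 ** m - 1) / (vmax - vmin))


def publish(dht, resource_id, attributes):
    """Store the descriptor once per attribute, i.e. under k keys."""
    for attr, value in attributes.items():
        vmin, vmax = DOMAINS[attr]
        key = lp_hash(value, vmin, vmax)
        dht.setdefault(key, []).append((attr, resource_id))


dht = {}
publish(dht, "r1", {"cpu_mhz": 3000, "ram_mb": 1024})
```

Because order is preserved, a value range maps to a contiguous arc of keys, which is the property the range query relies on.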

RANGE QUERY SUPPORT Consider a multi-attribute range query $a_1 \in [v_1^l, v_1^u], \ldots, a_k \in [v_k^l, v_k^u]$. The hashing function maps the range of each attribute onto a DHT range. Selectivity of an attribute: $S_i = 2^m / (H(v_i^u) - H(v_i^l))$. The dominant attribute $a_i \in [v_i^l, v_i^u]$, i.e. the one with the highest selectivity, is chosen. The query is sent to $H(v_i^l)$ and is propagated along a DHT arc A until it reaches $H(v_i^u)$. Each node on A checks whether the resources it stores satisfy all the query constraints. The results are collected along A and sent by $H(v_i^u)$ to the querying node.
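
The resolution steps on this slide can be sketched end to end: hash the bounds, pick the dominant attribute, walk the arc, and filter on all constraints. The ring is simulated with a dictionary keyed by (attribute, key), which is a simplification; the exponent `M = 8`, the domains and the sample resources are also assumptions.

```python
M = 8  # assumed key-space exponent
DOMAINS = {"cpu_ghz": (0.0, 4.0), "ram_mb": (0, 4096)}  # assumed domains


def lp_hash(attr, value):
    """Locality-preserving hash of one attribute value."""
    vmin, vmax = DOMAINS[attr]
    return int((value - vmin) * (2 ** M - 1) / (vmax - vmin))


def selectivity(attr, lo, hi):
    """S_i = 2^m / (H(v_u) - H(v_l)); narrow ranges score higher."""
    width = lp_hash(attr, hi) - lp_hash(attr, lo)
    return 2 ** M / max(width, 1)  # guard against a zero-width range


def dominant_attribute(query):
    """Pick the attribute with the highest selectivity."""
    return max(query, key=lambda a: selectivity(a, *query[a]))


def publish(index, resource):
    """Store the descriptor under one key per attribute."""
    for attr in DOMAINS:
        index.setdefault((attr, lp_hash(attr, resource[attr])), []).append(resource)


def range_query(index, query):
    """Walk the arc of the dominant attribute and filter on all constraints."""
    attr = dominant_attribute(query)
    lo, hi = query[attr]
    results = []
    for key in range(lp_hash(attr, lo), lp_hash(attr, hi) + 1):  # arc A
        for res in index.get((attr, key), []):
            if all(l <= res[a] <= h for a, (l, h) in query.items()):
                results.append(res["id"])
    return results


index = {}
for res in (
    {"id": "r1", "cpu_ghz": 3.0, "ram_mb": 512},
    {"id": "r2", "cpu_ghz": 2.0, "ram_mb": 2048},
    {"id": "r3", "cpu_ghz": 3.5, "ram_mb": 1024},
):
    publish(index, res)

query = {"cpu_ghz": (2.5, 4.0), "ram_mb": (256, 1100)}
```

With these numbers the RAM range covers the smaller share of the key space, so it is the dominant attribute and only that arc is visited.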

PUBLICATION OPTIMIZATION • SRDS optimizes the publication process of the resources defined by MAAN • Publication optimization: exploits a soft-state cache to store the routing results obtained during the publication process • Routing on the DHT is avoided if the routing path to a node is stored in the cache
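
A sketch of a soft-state routing cache under stated assumptions: entries expire after a fixed time-to-live, the cache maps a DHT key to the node found by the last routing step, and `route` is a hypothetical callable standing in for multi-hop DHT routing. The slide does not give the expiry policy or the cached data, so these choices are illustrative.

```python
import time


class RouteCache:
    """Soft-state cache: entries silently expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (node, expiry time)

    def lookup(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        node, expires = entry
        if self.clock() >= expires:
            del self.entries[key]  # stale entry, forget it
            return None
        return node

    def store(self, key, node):
        self.entries[key] = (node, self.clock() + self.ttl)


def resolve_target(key, cache, route):
    """Return the node responsible for key, routing only on a cache miss."""
    node = cache.lookup(key)
    if node is None:
        node = route(key)  # expensive multi-hop DHT routing
        cache.store(key, node)
    return node
```

Since resources are republished periodically under the same or nearby keys, repeated publications within the time-to-live skip routing entirely; an expired or wrong entry only costs one normal routing step.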

PUBLICATION OPTIMIZATION A second optimization is defined to avoid the publication of 'unpopular' attributes. Popularity of an attribute A = number of times A is chosen as dominant in a query; it depends on the query distribution. Descriptors associated with low-popularity attributes are updated with lower frequency. Popularity is dynamically refined in a distributed fashion: it is estimated at the target nodes receiving the queries and sent back to the publishing nodes in put-reply messages.
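
One possible way to turn popularity into an update frequency is sketched below. The slide defines popularity as a count; normalising it to a fraction of all queries, the linear mapping to a refresh period, and the 300-second upper bound are assumptions introduced here. The distributed exchange via put-reply messages is not modelled.

```python
from collections import Counter


class PopularityTracker:
    """Counts how often each attribute was the dominant one in a query."""

    def __init__(self):
        self.dominant_counts = Counter()

    def record_query(self, dominant_attr):
        self.dominant_counts[dominant_attr] += 1

    def popularity(self, attr):
        """Fraction of observed queries in which attr was dominant."""
        total = sum(self.dominant_counts.values())
        return self.dominant_counts[attr] / total if total else 0.0


def refresh_period(popularity, base_period=30.0, max_period=300.0):
    """Map popularity in [0, 1] to seconds between publications.

    Popularity 1 gives base_period, popularity 0 gives max_period,
    with linear interpolation in between.
    """
    return max_period - popularity * (max_period - base_period)


tracker = PopularityTracker()
for attr in ("cpu", "cpu", "cpu", "ram"):
    tracker.record_query(attr)
```

Here `cpu` would be republished more often than `ram`, and an attribute never chosen as dominant would fall back to the longest period.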

SRDS EVALUATION • testing environment: Grid 5000 platform; nodes belong to different Grid 5000 clusters • all nodes publish information every 30 s • a large fraction of nodes run queries every 100 ms

JOB DIRECTORY SERVICE EVALUATION • 20-120 nodes belonging to two clusters of the Grid 5000 platform • each node publishes over the DHT at a fixed rate of once every 30 seconds • time interval between successive requests: 200 milliseconds • The latencies of different operations are measured • AddJob: requires a set of put/get operations • RequestJob: a single DHT get

CONCLUSIONS SRDS: a service and resource discovery support developed for the XtreemOS distributed operating system. It provides scalable and customisable information query support over large platforms. Future work: testing SRDS on a large computing platform; dynamic definition of namespaces on different DHTs; definition of hierarchical namespaces; investigation of further strategies for range queries (multi-attribute range and neighbour queries).
