The NorduGrid project: Using Globus Toolkit for building Grid infrastructure
presented by Aleksandr Konstantinov
Mattias Ellert, Aleksandr Konstantinov, Balázs Kónya, Oxana Smirnova, Anders Wäänänen
Introduction
• Launched in spring 2001 with the aim of creating a Grid infrastructure in the Nordic countries.
• Partners from Denmark, Norway, Sweden, and Finland.
• Powered mainly by ATLAS groups (Lund, Copenhagen, Stockholm, Uppsala, Oslo).
• Relatively short-term project: ends in October 2002.
• Relies on very limited human resources (3 full-time researchers, a few part-time ones) with funding from NorduNet2.
• More info: http://www.nordugrid.org/
Introduction (cont.)
• The purpose of the project is to create and operate a functional testbed.
• Use approved tools => Globus Toolkit™ (developed at Argonne National Laboratory and the University of Southern California) and tools developed by the European Data Grid project.
• Aim at High Energy Physics applications: take them into account when choosing what to implement first.
• No temporary solutions (it is better not to implement something than to be forced to provide backward compatibility for a limited solution).
Globus Toolkit™ evaluation
• Widely accepted de facto standard for Grid computing.
• Provides a collection of (mostly) robust protocols, libraries and low-level services.
• Security built in.
• Continuously evolving (??).
• Missing a few important high-level services:
  • grid-level scheduler
  • job data stage-in/stage-out
  • user-friendly grid entry points (simple user interface, web portals, etc.)
  • grid-level authorization system
  • grid-level accounting and quotas
NorduGrid requirements
• No single point of failure
• No central sandbox (unlike EDG)
• Lightweight brokering integrated into the User Interface
• Job should not be Computing Element (cluster) specific
• Non-grid-aware jobs allowed ("grid functionality" is provided by middleware on the Computing Element)
• Job runs in as restrictive an environment as possible (do not expect network access on computing nodes)
• Minimal environment is provided on the Computing Element
• Adequate and sufficiently complete information provided by the InfoSystem
• Natural computing unit is the cluster
• Queue, job and user information
Information System
• NorduGrid operates an MDS-based, hierarchically distributed Information System:
  • new information model for clusters, queues, jobs, users, SE, RC
  • efficient providers
  • all the job monitoring, resource discovery, status monitoring and brokering are built exclusively on top of the MDS
  • MDS hierarchy with dynamic site registrations
Information System (example)
[Figure: example Information System entries: cluster entry, job entry, user entry, queue entry]
Grid Manager - cluster middleware
• Provides job control and data handling functionality (HEP application requirements are the first priority).
• The Grid Manager is based on Globus Toolkit™ libraries and services. The following parts of Globus are used:
  • GridFTP - fast and reliable data access for the Grid
  • GASS Copy interface - support for different data access protocols
  • Replica Catalog - metadata storage
  • GRAM - resource request
  • RSL - expandable Resource Specification Language
Grid Manager (features)
• Stage in input data and executables. Possible sources:
  • Job submission machine.
  • GridFTP (preferred), FTP, HTTP or HTTPS servers.
  • Files registered in Globus Replica Catalog. Secure authentication. Destination is chosen automatically or can be forced.
• Stage out output data. Possible destinations:
  • Keep on cluster till user downloads.
  • GridFTP, FTP, HTTP or HTTPS servers.
  • Files can be registered in Globus Replica Catalog. Destination and protocol are obtained from Location information.
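The list of possible stage-in sources can be read as a dispatch on the location string. The following is a minimal sketch, not the Grid Manager's actual code: it assumes that an empty location means the file comes from the job submission machine (as the ATLAS example in this talk suggests), that GridFTP locations use the `gsiftp` URL scheme, and that Replica Catalog entries use the `rc` scheme seen in that example. The function name and the labels are hypothetical.

```python
from urllib.parse import urlsplit

# Schemes corresponding to the remote sources listed on the slide.
# "gsiftp" (GridFTP) and "rc" (Replica Catalog) are assumptions of this
# sketch; the labels are purely illustrative.
REMOTE_SCHEMES = {
    "gsiftp": "GridFTP server",
    "ftp": "FTP server",
    "http": "HTTP server",
    "https": "HTTPS server",
    "rc": "Replica Catalog",
}


def classify_source(location: str) -> str:
    """Return a label describing where an input file would be fetched from."""
    if not location:
        # Assumption: no location means the user uploads the file.
        return "job submission machine"
    scheme = urlsplit(location).scheme.lower()
    if scheme not in REMOTE_SCHEMES:
        raise ValueError(f"unsupported location: {location!r}")
    return REMOTE_SCHEMES[scheme]
```

For example, `classify_source("rc:///gen0017_1.root")` selects the Replica Catalog branch, while `classify_source("")` selects the submission machine.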
Grid Manager (features, cont.)
• E-mail notification of job status changes.
• Support for software runtime environment configuration.
  • Jobs are started with the environment set up properly for the requested application.
• Customizable GridFTP server
  • local access through plugins
  • certificate-oriented local file system access plugin
  • job submission/access plugin - start job/upload input files/download output files through the same interface
• Limitation: Data is handled only at the beginning and end of the job. The user must provide information about input and output data.
Extensions to RSL (evaluation)
RSL stands for Resource Specification Language. It was introduced to communicate job requirements to the Globus Resource Allocation Manager (GRAM).
Useful features:
• Allows basic logical expressions
• Set of attributes is expandable
• Unknown attributes are passed through.
• Allows different parts to be processed at different levels.
• Can be used to assist in writing brokers or filters which refine an RSL specification
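The pass-through behaviour is what lets each level consume only the attributes it understands. A minimal sketch of the idea follows, with a request modelled as a plain dictionary rather than parsed RSL. Which level handles which attribute, the function name, and the sample values are all assumptions made for illustration; names are compared case-insensitively because the slides spell attributes in both mixed and lower case.

```python
def split_for_level(request, handled):
    """Separate the attributes one level understands from those it forwards.

    Attribute names are compared case-insensitively (an assumption based on
    the mixed spellings used in this talk).
    """
    known = {name.lower() for name in handled}
    consumed, forwarded = {}, {}
    for name, value in request.items():
        target = consumed if name.lower() in known else forwarded
        target[name] = value
    return consumed, forwarded


# Hypothetical request and hypothetical division of labour between levels.
request = {
    "executable": "/bin/echo",
    "arguments": "Hello World",
    "cluster": "grid.example.org",
    "inputFiles": [("data.in", "")],
}
# First level (e.g. a broker) consumes what it knows and forwards the rest.
ui_part, rest = split_for_level(request, {"cluster"})
# Next level does the same; whatever remains goes further down.
gm_part, gram_part = split_for_level(rest, {"inputFiles", "outputFiles"})
```

Here `gram_part` ends up holding only `executable` and `arguments`, which is the sense in which a filter refines a specification before handing it on.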
Extensions to RSL (new attributes)
To support additional features, new attributes were introduced. The most important are:
• inputFiles=(<file> [<location>]) ... - list of files to be transferred to the computing node from a given location.
• outputFiles=(<file> [<location>]) ... - list of files to be preserved after job completion and transferred to a given location.
• executables=<file1> <file2> ... - list of files to be given executable permissions.
• notify=<options> <email> ... - e-mail notification on job status change.
Extensions to RSL (new attributes, cont.)
• runTimeEnvironment=<string> ... - application-specific runtime environment (e.g., ATLAS-3.2.1)
• middleware=<string> - required middleware (e.g., NorduGrid-0.3.0)
• cluster=<string> - specific cluster request
• rerun=<number> - number of attempts to re-run the job
• lifeTime=<number> - maximum time for the session directory to remain on the execution node (cannot override local policy)
• ftpThreads=<number> - number of GridFTP threads to be used for file transfers
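To show how these attributes combine into one request, here is a minimal sketch that renders a dictionary as an xRSL conjunction in the `&(name=value)...` form used by the examples in this talk. It is not part of the NorduGrid toolkit: the helper names are hypothetical, every value is quoted for simplicity (the talk's examples quote only some), and embedded double quotes are rejected rather than escaped.

```python
def quote(value) -> str:
    """Wrap a value in double quotes; embedded quotes are out of scope here."""
    text = str(value)
    if '"' in text:
        raise ValueError("embedded double quotes are not handled in this sketch")
    return f'"{text}"'


def render(value) -> str:
    """Render a scalar, a list of scalars, or a list of (file, location) pairs."""
    if isinstance(value, (list, tuple)):
        parts = []
        for item in value:
            if isinstance(item, (list, tuple)):
                # A (file, location) pair becomes a parenthesised group.
                parts.append("(" + " ".join(quote(x) for x in item) + ")")
            else:
                parts.append(quote(item))
        return " ".join(parts)
    return quote(value)


def to_xrsl(attributes: dict) -> str:
    """Join all attributes into a single '&'-prefixed conjunction."""
    return "&" + "".join(
        f"({name}={render(value)})" for name, value in attributes.items()
    )


job = {
    "executable": "prod",
    "arguments": ["0002", "2", "100"],
    "inputFiles": [("atlas.kumac", ""), ("gen0017_1.root", "rc:///gen0017_1.root")],
    "runTimeEnvironment": "ATLAS-3.2.0",
}
print(to_xrsl(job))
```

The resulting string has the same shape as the argument passed to `ngsub` in the application examples of this talk.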
User Interface
The NorduGrid toolkit user interface consists of a set of commands that can be executed from the command line:
• ngsub - for job submission
• ngstat - to obtain the status of jobs and clusters
• ngcat - to display the stdout or stderr of a running job
• ngget - to retrieve the result from a finished job
• ngkill - to kill a running job
• ngclean - to delete a job from a remote cluster
• ngsync - to recreate local information about jobs
User Interface (cont.)
• Job request is made through xRSL
  • processes the user-level xRSL request and transforms it into one suitable for the GM
  • user-friendly values for some attributes
  • conditional submission and xRSL transformation
• Performs brokering
  • analyzes information about the different clusters obtained from the MDS servers
  • from all suitable queues one is chosen randomly, with a weight proportional to the amount of free computing resources
• Passes the modified job request to the GM through the GRAM or GridFTP interface and uploads input files.
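The brokering rule (a random choice among suitable queues, weighted by free resources) can be sketched in a few lines. This is an illustration under stated assumptions, not the User Interface's actual code: queues are modelled as dictionaries, `free_cpus` is a hypothetical field standing in for "the amount of free computing resources", and the list passed in is assumed to contain only queues that already match the job's requirements.

```python
import random


def choose_queue(queues, rng=random):
    """Pick one queue at random, weighted by its free resources."""
    # A queue with nothing free has zero weight and can never be chosen,
    # so it is dropped up front (this also avoids an all-zero weight list).
    candidates = [q for q in queues if q["free_cpus"] > 0]
    if not candidates:
        raise LookupError("no suitable queue has free resources")
    weights = [q["free_cpus"] for q in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical queue descriptions, as they might be built from MDS data.
queues = [
    {"name": "small", "free_cpus": 1},
    {"name": "large", "free_cpus": 9},
    {"name": "full", "free_cpus": 0},
]
chosen = choose_queue(queues)
```

With these numbers "large" is picked about nine times as often as "small". Randomising the choice rather than always taking the emptiest queue keeps independent users from all submitting to the same cluster at once.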
User Authentication Management
• Using Globus certificates
• NorduGrid Certification Authority established
• Access control through gridmapfiles
• User access control is delegated to Virtual Organization managers
• Gridmapfiles are generated automatically from the VO database
• GSI-enabled secure LDAP server
• contains the Subject Names of the users' certificates
• VO managers
• User Groups and Group Managers
• Local site administrators have total control over their gridmapfiles
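Generating a gridmapfile from a list of VO members amounts to writing one line per certificate Subject Name, in the usual Globus form of a quoted subject followed by a local account. The sketch below assumes the Subject Names have already been fetched from the VO database; the function name, the example subjects, and the local deny list (one possible way for a site administrator to keep control) are hypothetical.

```python
def make_gridmap(members, local_account, banned=frozenset()):
    """Return gridmapfile text mapping each certificate subject to an account."""
    lines = []
    # Duplicates are removed and entries sorted so the output is stable.
    for subject in sorted(set(members)):
        if subject in banned:
            continue  # assumption: local site policy overrides the VO list
        lines.append(f'"{subject}" {local_account}')
    return "\n".join(lines) + ("\n" if lines else "")


# Hypothetical Subject Names, as they might be read from the VO database.
vo_members = [
    "/O=Grid/O=NorduGrid/OU=example.org/CN=Jane Doe",
    "/O=Grid/O=NorduGrid/OU=example.org/CN=John Smith",
]
print(make_gridmap(vo_members, "griduser"), end="")
```

Each site can run such a step with its own account name and deny list, so the VO manager decides who is a member while the site decides who is actually mapped.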
Applications
It is possible to run any application with a predefined set of input and output data
• From as simple as "Hello World"
  ngsub '&(executable=/bin/echo)(arguments="Hello World")(stdout=out.txt)'
Applications (cont.)
• to as difficult as the ATLAS Data Challenge
  ngsub '&(executable = prod)(arguments = "0002" "2" "100")
    (stdout = atlas.0002.log)(join = yes)
    (replicacollection = ldap://grid.uio.no/lc=ATLAS,rc=NorduGrid,dc=nordugrid,dc=org)
    (inputfiles = ("atlsim.makefile" "")
                  ("atlas.kumac" "")
                  ("gen0017_1.root" "rc:///gen0017_1.root") )
    (outputfiles = ("atlas.0002.zebra" "rc:///results/atlas.0002.zebra")
                   ("atlas.0002.his" "") )
    (runtimeenvironment="ATLAS-3.2.0")
    (middleware="NorduGrid")'
Conclusions
• The minimal environment for Grid computing is established.
• Globus tools alone are not enough for convenient usage, but they provide a solid base.
• An additional layer of tools/services was developed to provide the required infrastructure.
• A lot remains to be done:
  • Runtime data handling.
  • Accounting.
  • Better support for different LRMS.
  • Enhanced Information System - more stability, access control, better and richer information providers, etc.
  • ...