
EGEE JRA1 Activity Frédéric Hemmer EGEE Middleware Manager



  1. EGEE JRA1 Activity
  Frédéric Hemmer, EGEE Middleware Manager
  DILIGENT Kickoff Meeting, 6-8 September 2004, Pisa, Italy
  www.eu-egee.org
  EGEE is a project funded by the European Union under contract INFSO-RI-508833

  2. Contents
  • JRA1 Activity
    • JRA1 in EGEE
    • Objectives/Organization
    • Deliverables
    • Timing
    • Processes
  • The gLite components
    • Architectural design
    • Current Implementations
  • From Prototype to Release
    • Current prototype status
    • Integration and testing
    • Release plan
  • Relationship to LCG-2
  • Answers to specific questions from Pasquale Pagano

  3. EGEE Project Structure
  24% Joint Research:
  • JRA1: Middleware Engineering and Integration - 17%
  • JRA2: Quality Assurance - 1.5%
  • JRA3: Security - 3%
  • JRA4: Network Services Development - 2.5%
  28% Networking:
  • NA1: Management
  • NA2: Dissemination and Outreach
  • NA3: User Training and Education
  • NA4: Application Identification and Support
  • NA5: Policy and International Cooperation
  48% Services:
  • SA1: Grid Operations, Support and Management
  • SA2: Network Resource Provision
  Emphasis in EGEE is on operating a production grid and supporting the end-users.

  4. The pilot applications
  • High Energy Physics with the LHC Computing Grid (www.cern.ch/lcg) relies on a Grid infrastructure to store and analyse petabytes of real and simulated data. LCG is a major source of resources and requirements, with hard deadlines and no conventional solution available.
  • In Biomedicine, several communities are facing equally daunting challenges to cope with the flood of bioinformatics and healthcare data. They need to access large, distributed, non-homogeneous data and have important on-demand computing requirements.

  5. EGEE Implementation
  [Diagram: evolution from Globus 2-based middleware (VDT, EDG, AliEn feeding LCG-1 and LCG-2) to Web-services-based middleware (EGEE-1, EGEE-2)]
  • From day 1 (1st April 2004): a production grid service based on the LCG infrastructure, running the LCG-2 grid middleware (SA)
    • LCG-2 will be maintained until the new generation has proven itself (fallback solution)
  • In parallel, develop a "next generation" grid facility
    • Produce a new set of grid services according to evolving standards (Web Services)
    • Run a development service providing early access for evaluation purposes
    • Will replace LCG-2 on the production facility in 2005

  6. EGEE Middleware Implementation
  • LCG-2
    • Current base for production services
    • Providing essential experience on operating and managing a global grid service
    • Evolved with certified new or improved services from the pre-production service
  • Pre-production Service
    • Early application access for new developments
    • Certification of selected components from gLite
    • Starts with LCG-2
    • Migrate to the new middleware in 2005
  • Organising a smooth, gradual transition from LCG-2 to gLite for production operations
  [Diagram: LCG-2 (=EGEE-0) as the 2004 product; gLite prototyping through 2004 becomes the 2005 product, LCG-3 (=EGEE-x?)]

  7. Objectives of the EGEE Middleware activity
  • Provide robust, supportable middleware components
    • Select, re-engineer, integrate identified Grid Services
    • Evolve towards a Service Oriented Architecture
    • Adopt emerging OGSI standards*
    • Multiple platforms
  • Selection of middleware based on the requirements of
    • The applications (Bio & HEP), in particular requirements from LCG's ARDA & HEPCAL II
    • The operations, e.g. deployment, updates, packaging, etc.
  • Support and evolve the middleware components
    • Evolution towards OGSI*
    • Define a re-engineering process
    • Address multiplatform, multiple-implementation and interoperability issues
    • Define defect-handling processes and responsibilities
  *: Now questioned given the WSRF announcement on January 20, 2004. The strategy is to use plain Web Services and review the situation towards the end of the year (GT4).

  8. EGEE Middleware Software Clusters
  • Hardening and re-engineering of existing middleware functionality, leveraging the experience of partners
  • Activity concentrated in a few major centers and organized in "software clusters"
  • Key services:
    • Data Management (CERN)
    • Information Collection (UK)
    • Resource Brokering, Accounting (Italy-Czech Republic)
    • Quality Assurance (France)
    • Grid Security (Northern Europe)
    • Middleware Integration (CERN)
    • Middleware Testing (CERN)

  9. JRA1 Partners' efforts

  10. Milestones and Deliverables for 1st Year

  11. Middleware & ARDA
  • ARDA: Architectural Roadmap towards Distributed Analysis
    • The ARDA report has considerably influenced the EGEE Middleware activity
    • Reference included in the Technical Annex
  • A group of middleware providers has met since December 2003
    • Monthly meetings (design & implementation)
    • Goal to define and provide the middleware components described in the ARDA report
    • Participants from experienced middleware providers: AliEn, EDG, VDT…
    • Started providing a prototype middleware with fast release cycles
  • The ARDA project has been established
    • A distinct project, focused on the usage of the middleware within the HEP LHC experiments
    • Providing resources to HEP to help deliver end-to-end analysis prototypes
    • Providing an organization to discuss and agree on middleware components

  12. Characteristics of the new middleware
  • Integrate, re-engineer and develop a lightweight stack of generic middleware useful to the LHC experiments and biomedical applications, based upon existing components
    • Biomedical applications have important security requirements (e.g. confidentiality) that need to be addressed
  • Focus is on re-engineering and hardening existing components
  • Early prototype and fast feedback
  • Use a service oriented approach
  A note on OGSI/WSRF/WS/…
  • Still discussing – nothing has settled yet
  • Need to take a step back
    • Focus on the service decomposition, semantics and interplay rather than the envelope
  • WS seems to provide a useful abstraction
    • Widely used in industry, Grid projects, Internet computing (Google, Amazon, …)
  • Need to follow standardization efforts to be able to adopt them once settled
    • WS-I seems to be the most obvious standard to adhere to for the time being

  13. Design Team
  • Formed in December 2003
  • Current members:
    • UK: Steve Fisher
    • IT/CZ: Francesco Prelz
    • Nordic: David Groep
    • VDT: Miron Livny
    • CERN: Predrag Buncic, Peter Kunszt, Frédéric Hemmer, Erwin Laure
  • Monthly meetings in the EU or USA
  • Ad-hoc participation of experts from Globus, POOL and Security as needed
  • Started service design based on the component breakdown defined by the LCG ARDA RTAG
    • Leverage experience and existing components from AliEn, VDT, and EDG
  • A working document covers the overall design & APIs: https://edms.cern.ch/document/458972

  14. Design Team Approach
  • Started intense technical discussion to
    • Break down the proposed architecture into real components
    • Identify critical components (and what existing software to use for the first instance of a prototype)
    • Define the semantics and interfaces of these components
  • Focus on the key services discussed; exploit existing components
  • Initially an ad-hoc prototype installation at CERN and Madison
    • First instance made available in May 2004
    • Open only to a small user community
    • Still debugging many (trivial – and non-trivial) problems
    • Expect frequent changes (also API changes) based on user feedback and integration of further services
    • Currently ~2 releases/month
  • Enter a rapid feedback cycle
    • Continue with the design of the remaining services
    • Enrich/harden existing services based on early user feedback
  • The prototype will be used as a vehicle to evolve the software towards what will be the EGEE Middleware
    • Based on operations and real user feedback from HEP & Biomedicine
    • With short release cycles
  • Plan is to deploy the prototype on the LCG pre-production service
    • Target is September 2004 for some components

  15. Integration
  • A master Software Configuration Management (SCM) Plan has been elaborated
    • Compliant with internationally agreed standards (ISO 10007-2003 E, IEEE SCM Guidelines series)
    • Most EGEE stakeholders have already been involved in the process, to make sure everybody is aware of, contributes to and uses the plan
  • Most middleware has evolved to be SCM compliant
  • An EGEE JRA1 Developer's Guide is being finalized in collaboration with JRA2 (Quality Assurance), based on the SCM Plan
  • SCM contents:
    • Configuration and Version Control
    • Build Systems
    • Release Process
    • Other Configuration and Change Control Procedures
  • https://edms.cern.ch/document/446241

  16. Testing
  • The 3 initial testing sites are CERN, NIKHEF and RAL
  • More sites can join the testing activity at a later stage!
    • Must fulfil site requirements
  • Testing activities will be driven by the test plan document (delivered last week: https://edms.cern.ch/file/473264/1/EGEE-JRA1-TEC-473264-Testplan-v1.0.pdf)
  • The test plan was developed based on user requirements documents:
    • Application requirements from NA4: HEPCAL I & II, AWG documents, bio-informatics requirements documents from EDG
    • Deployment requirements being discussed with SA1
    • ARDA working document for core Grid services
    • Security: work with JRA3 to design and plan security testing
  • The test plan is a live document: it will evolve to remain consistent with the evolution of the software
  • Coordination with NA4 testing and external groups (e.g. Globus) established

  17. Testing (II)

  18. EGEE Middleware Architecture

  19. Architecture Guiding Principles
  • Lightweight (existing) services
    • Easily and quickly deployable
    • Use existing services where possible as a basis for re-engineering
  • Interoperability
    • Allow for multiple implementations
  • Resilience and Fault Tolerance
  • Co-existence with deployed infrastructure
    • Run as an application (e.g. on LCG-2, Grid3)
    • Reduce requirements on site components: basically Globus and SRM
    • Co-existence (and convergence) with LCG-2 and Grid3 is essential for the EGEE Grid service
  • Service oriented approach
    • WSRF still being standardized
    • No mature WSRF implementations exist to date, and there is no clear picture of the impact of WSRF; hence, start with plain WS
    • WSRF compliance is not an immediate goal, but we follow the WSRF evolution
    • WS-I compliance is important

  20. gLite Services
  • Service decomposition based on the ARDA blueprint

  21. WMS (Workload Management System)

  22. CE (Computing Element)
  • Works in push and pull mode
  • Site policy enforcement
  Legend: CEA … Computing Element Acceptance; JC … Job Controller; MON … Monitoring; LRMS … Local Resource Management System
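The push/pull distinction above can be sketched as follows. This is a toy model, not the gLite CE interface: in push mode the workload manager sends a job and the CE may refuse it (site policy, no capacity); in pull mode an idle CE fetches work from a central task queue.

```python
from collections import deque

class ComputingElement:
    """Toy CE accepting jobs in push mode or fetching them in pull mode.
    Illustrative only -- the class and method names are invented."""

    def __init__(self, free_slots):
        self.free_slots = free_slots
        self.queue = deque()

    def push(self, job):
        """Push mode: the WMS sends a job; the CE may refuse it."""
        if self.free_slots == 0:
            return False          # policy/capacity refusal
        self.queue.append(job)
        self.free_slots -= 1
        return True

    def pull(self, task_queue):
        """Pull mode: the CE asks a central task queue for work when idle."""
        while self.free_slots > 0 and task_queue:
            self.queue.append(task_queue.popleft())
            self.free_slots -= 1

tq = deque(["job-a", "job-b", "job-c"])
ce = ComputingElement(free_slots=2)
ce.pull(tq)                # CE fetches up to 2 jobs
print(list(ce.queue))      # ['job-a', 'job-b']
print(ce.push("job-d"))    # False: no free slots left
```

The site-policy enforcement mentioned on the slide would live in the acceptance check that here is reduced to a simple free-slot test.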

  23. Data Management
  • Scheduled data transfers (like jobs)
  • SRM-based storage
  • Reliable file transfer

  24. Storage Element Interfaces
  • SRM interface
    • Management and control
    • SRM (with possible evolution)
  • POSIX-like File I/O
    • File access: open, read, write
    • Not real POSIX (like rfio)
  [Diagram: the user reaches the SE either through the SRM control interface or through a POSIX-like File I/O API; underneath sit access protocols (rfio, dcap, chirp, aio) and storage back-ends (dCache, NeST, Castor, plain disk)]
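"POSIX-like but not real POSIX" means the client sees familiar open/read/close calls, while underneath the library speaks a storage protocol (rfio, dcap, …) rather than touching the local filesystem. A minimal sketch of that idea, with all names hypothetical (this is not the gLite-I/O API):

```python
# Sketch of a POSIX-like file-access layer over remote storage.
# _FAKE_STORE stands in for a remote back-end (dCache, Castor, ...).

class GridFile:
    """A file handle returned by the grid I/O layer."""
    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def read(self, n: int) -> bytes:
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk

    def close(self):
        self._data = b""

_FAKE_STORE = {"lfn:/grid/demo/hits.dat": b"event-data"}

def grid_open(lfn: str) -> GridFile:
    """POSIX-like open: resolves a logical name, returns a handle."""
    return GridFile(_FAKE_STORE[lfn])

f = grid_open("lfn:/grid/demo/hits.dat")
print(f.read(5))   # b'event'
print(f.read(100)) # b'-data'
f.close()
```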

  25. Storage Element: strategic vs. tactical
  [Diagram: SEs placed on axes of QoS (strategic) versus portability (tactical)]
  • 'Strategic' SE
    • High QoS: reliable, safe, …
    • Usually has an MSS
    • A place to keep important data
    • Needs people to keep it running
    • Heavyweight
  • 'Tactical' SE
    • Volatile, 'lightweight' space
    • Enables sites to participate in an opportunistic manner
    • Best effort
  • Collaboration with LCG Deployment

  26. Catalogs
  [Diagram: an LFN in the File Catalog maps to a GUID; the Replica Catalog maps the GUID to one or more SURLs; the Metadata Catalog attaches metadata at the logical level]
  • File Catalog
    • Filesystem-like view on logical file names
  • Replica Catalog
    • Keeps track of replicas of the same file
  • (Metadata Catalog)
    • Attributes of files on the logical level
    • Boundary between generic middleware and the application layer
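The LFN → GUID → SURL chain above can be modelled in a few lines. The entries are invented examples; the real catalogs are services, not in-memory dictionaries:

```python
# Minimal model of the catalog layering: a logical file name (LFN) resolves
# to an immutable GUID, which resolves to physical replicas (SURLs).

file_catalog = {            # LFN -> GUID (File Catalog)
    "/grid/demo/run42/hits.dat": "guid-1234",
}
replica_catalog = {         # GUID -> SURLs (Replica Catalog)
    "guid-1234": [
        "srm://se.cern.ch/store/hits.dat",
        "srm://se.ral.ac.uk/store/hits.dat",
    ],
}

def replicas_of(lfn: str) -> list:
    """Resolve a logical file name to all its physical locations."""
    return replica_catalog[file_catalog[lfn]]

print(replicas_of("/grid/demo/run42/hits.dat"))
```

Renaming a file touches only the File Catalog, and adding a replica touches only the Replica Catalog; the GUID in the middle is what keeps the two independent.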

  27. Security
  • The PEP (Policy Enforcement Point) and PDP (Policy Decision Point) will in most cases be part of the secured service, rather than separate services

  28. Information and Monitoring
  • Exploits the GGF GMA (Grid Monitoring Architecture)
  • Simple producer/consumer model
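In the GMA pattern, producers register their existence with a directory; consumers look producers up there and then query them directly for data. A sketch of that three-party split (class names are illustrative, not the R-GMA API):

```python
# GMA producer/consumer sketch: the registry only mediates discovery,
# never the data itself.

class Registry:
    def __init__(self):
        self._producers = {}

    def register(self, topic, producer):
        self._producers.setdefault(topic, []).append(producer)

    def lookup(self, topic):
        return self._producers.get(topic, [])

class Producer:
    def __init__(self):
        self._rows = []

    def publish(self, row):
        self._rows.append(row)

    def query(self):
        return list(self._rows)

registry = Registry()
cpu_load = Producer()
registry.register("cpu_load", cpu_load)
cpu_load.publish({"host": "wn01", "load": 0.7})

# A consumer asks the registry *where* the data is, then queries directly:
rows = [r for p in registry.lookup("cpu_load") for r in p.query()]
print(rows)  # [{'host': 'wn01', 'load': 0.7}]
```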

  29. Current Implementations
  • WMS
    • AliEn TaskQueue
    • EDG WMS (plus new TaskQueue and Information Supermarket)
    • EDG L&B
  • CE
    • Globus Gatekeeper
    • Condor-G (Condor-C)
    • "Pull component"
    • AliEn CE
    • EGEE MON (to be implemented by the IT/CZ cluster)

  30. Current Implementations (cont'd)
  • SE
    • External SRM implementations: dCache, Castor, …
    • LCG disk pool manager (to be implemented)
    • AliEn aio (re-factored to gLite-I/O)
  • Catalogs
    • AliEn FileCatalog
    • RLS
    • Combined Catalog (to be implemented)
  • Data Scheduling
    • Stork
    • PhEDEx (concepts)
    • Data management interface and VO data scheduler (to be implemented)
  • Data Transfer
    • GridFTP
  • Metadata Catalog
    • Simple interface defined
    • Assumption that it will be managed by the experiments
  • Information & Monitoring
    • R-GMA
    • For application monitoring the aim is to plug in other systems as well

  31. Current Implementations (cont'd)
  • Security
    • VOMS as Attribute Authority and VO management
    • MyProxy as proxy store
    • gridmap file and GSI security as enforcement
    • Plan is to move to more fine-grained authorization (e.g. ACLs)
    • Plans with Globus to provide a set-uid service on the CE
  • Accounting
    • EDG HLR
  • User Interface
    • AliEn shell
    • CLIs and APIs
    • Move to auto-generated APIs from WSDL
  • GAS
    • New development
  • Package manager
    • Explore existing solutions

  32. Deployment considerations
  • Interoperability and co-existence
    • Exploit different service implementations, e.g. the Castor and dCache SRM implementations
  • Require minimal support from the development environment
    • Sites required to run Globus and SRM (might not be required for tactical storage)
  • Flexible service deployment
    • Multiple services running on the same physical machine (if possible)
  • Platform support
    • Goal is to have portable middleware
    • Building & integration on RHEL 3 and Windows
    • Initial testing (at least 3 sites) using different Linux flavors (including free distributions)
  • Service autonomy
    • Users may talk to services directly or through other services (like an access service)
  • Open source software license
    • Based on the EDG license

  33. From Prototype to Release
  • Prototype set up at 2 sites (CERN & Madison)
    • ~45 users registered
  • 2nd release mid-August
    • Many bug fixes
    • New functionality
  • More (and more powerful) nodes being added at CERN
  • EDG WMS added at CNAF
  • Being ported to SLC3 (by end of September)
  • Second VO being set up (core services at Wisconsin)
    • Will use Globus RLS

  34. From Prototype to Release
  • Continuous integration system set up
    • First results being put forward to the testing team
    • SLC3 version of the prototype provided
  • SCM convergence being reached
    • SCM plan: https://edms.cern.ch/document/446241
    • Common service configuration being worked out
    • Deployment scenarios being worked out
  • Testing sites (RAL, NIKHEF, CERN) up and running
    • Focus on prototype installation and testing
    • >50 bugs/problems/change requests confirmed
  • Plans being made for focused testing of gLite components for the pre-production service
    • Test plan being defined
    • Based upon the architecture document, release plan, and NA4 requirements
    • https://edms.cern.ch/document/477697/

  35. Release Plan
  • Incremental releases
    • Being tracked on a weekly basis
    • https://edms.cern.ch/document/468699
  • Priorities for testing and integration defined
  • Goal is to have core services by the end of this year

  36. gLite and LCG-2
  [Diagram: LCG-2 (=EGEE-0) as the 2004 product, evolving into the 2005 product LCG-3/EGEE-1 as gLite prototyping feeds in]
  LCG-2: focus on production, large-scale data handling
  • The service for the 2004 data challenges
  • Provides experience on operating and managing a global grid service
  • Development programme driven by data-challenge experience
    • Data handling
    • Strengthening the infrastructure
    • Operation, VO management
  • Evolves to LCG-3 as components are progressively replaced with new middleware – the target is to minimise the discontinuities of migration to the new generation
  • Aim for a migration plan by end of year
  gLite: focus on analysis
  • Developed by the EGEE project in collaboration with VDT (US)
  • LHC applications and users closely involved in prototyping & development (ARDA project)
  • Short development cycles
  • Co-existence with LCG-2
    • Profit as far as possible from LCG-2 infrastructure and experience → ease deployment, avoid separate hardware
    • As far as possible, completed components integrated in LCG-2 → improved testing, easier displacement of LCG-2
  (slide: Les Robertson, CERN-IT)

  37. Pre-Production Service
  • Independent components can move from the prototype to SA1 for deployment on the pre-production service after testing
    • Rule of thumb: release plan + 1 month
  • First components (tentatively end of September):
    • CE
    • File I/O

  38. Summary
  • Next generation middleware being designed and assembled
    • The prototype is the first tangible outcome
    • BUT this is a PROTOTYPE!
  • Architectural and design work well advanced
    • Architecture document sent to the EU; subject to changes
    • Design document (draft) exists: https://edms.cern.ch/document/487871/
    • Feedback welcomed
  • Incremental changes to the prototype
    • Feedback from applications and operations essential!
  • Detailed release plan worked out
    • https://edms.cern.ch/document/468699
    • First components for the pre-production service during autumn
  • Continuous integration and testing scheme defined and adopted
  • Technology risk
    • Will WS allow for all upcoming requirements?
    • Divergence from standards

  39. Links
  • JRA1 homepage: http://egee-jra1.web.cern.ch/egee-jra1/
  • Architecture document: https://edms.cern.ch/document/476451/
  • Release plan: https://edms.cern.ch/document/468699
  • Prototype installation: http://egee-jra1.web.cern.ch/egee-jra1/Prototype/testbed.htm
  • Test plan: https://edms.cern.ch/document/477697/
  • Design document: https://edms.cern.ch/document/487871/

  40. Questions from Pasquale Pagano/CNR

  41. Testbeds
  • 1.1: Will the public testbed (like the GILDA testbed) be equipped with gLite services, i.e. will the DILIGENT project have access to a grid infrastructure based on gLite services?
    • Presumably yes, once the middleware has stabilized
    • The plan is to first deploy gLite (or part of it) onto the SA1 pre-production service
    • DILIGENT will however have access to the development/prototype testbed if they wish
    • But this would need to be coordinated in some way (via NA4?)

  42. Testbeds (II)
  [Diagram: LCG-2 (=EGEE-0) as the 2004 product, evolving into the 2005 product LCG-3/EGEE-1]
  • 1.2: The gLite software version 1 is planned to be released at PM12. Will the software be installed on the 'EGEE production infrastructure' at that time? If not, when do you plan to perform this step?
    • Almost certainly not. The plan is to gradually integrate with the EGEE/LCG infrastructure during 2005.

  43. Testbeds (III)
  • 1.3: The deliverable 'EGEE Middleware Architecture' states that the gLite services can be deployed and used independently. Can we have more details? Can we install and use one CE without the other services needed for authentication and authorization?
    • Some of the services of course depend on each other. You cannot use a CE without authentication unless you modify the code. But you can use the authentication services without using anything else ;-)
    • Catalogs, for example, can be used independently from anything else; the same holds for Storage Elements.

  44. Testbeds (IV)
  • 1.4: Can we have more technical details on the hosting environment that will be used to host gLite services? What about relations with other Grid initiatives and tools, e.g. Globus Toolkit xy, Condor-G, Nimrod-G, etc.?
    • In general gLite will be available for SLC3 (RHEL 3), with gcc and icc, as well as Windows (compile or client only). Some services will require databases (likely MySQL, Oracle) and/or application servers (Tomcat/Axis). There are currently dependencies on many other packages, such as Globus 2.4 and Condor-G/C. A detailed list of external dependencies will be made available: http://www.glite.org/glite/documentation/external-dependencies.htm
    • For data management, the web services will run in application containers (Tomcat); other services will run on the operating system without a hosting environment (e.g. gLite-I/O). For data management and transfer services we work in close collaboration with other projects to interoperate with and use their tools: Globus (GridFTP, RLS), Condor (Stork) and SRM implementations (GGF/GSM).

  45. External dependencies

  46. VO (I)
  • 2.1: Do you provide any tool for VO creation and management?
    • Yes, VOMS and VOMS Admin in general. But admittedly, this needs to be improved, with a view to supporting even short-lived VOs.
  • 2.2: Can we have details on VOMS and its use in gLite?
    • VOMS allows a VO manager to handle the registration of users, put them into groups and assign them roles. Users can acquire a VOMS credential when starting their session. The extra information in this credential is to be used for authorization and accounting in gLite services (job submission, catalogs, CE, SE).
    • There is a pretty good presentation of VOMS in general at http://www.dma.unina.it/~murli/SummerSchool/presentations/GGFSchool-new.ppt
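How a service might consume the group/role attributes described above can be sketched as follows. The data structures and names are invented for illustration; real VOMS attributes travel inside an X.509 proxy certificate extension, not a Python object:

```python
# Toy authorization check against VOMS-style group membership.

from dataclasses import dataclass, field

@dataclass
class Credential:
    subject: str
    vo: str
    groups: list = field(default_factory=list)  # e.g. ["/diligent/production"]
    roles: list = field(default_factory=list)   # e.g. ["SoftwareManager"]

def may_submit(cred: Credential, required_group: str) -> bool:
    """A toy enforcement check: submission allowed only to group members."""
    return required_group in cred.groups

alice = Credential("CN=Alice", "diligent", ["/diligent/production"])
bob = Credential("CN=Bob", "diligent", ["/diligent/users"])

print(may_submit(alice, "/diligent/production"))  # True
print(may_submit(bob, "/diligent/production"))    # False
```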

  47. VO (II)
  • 2.3: The gLite services are organised as a Service Oriented Architecture. Will a service Registry exist? What about its scope? Who will manage and maintain it?
    • We currently exploit R-GMA as the VO service registry. Alternative solutions (like the LCG-2 LDAP-based infrastructure) are possible as well.
    • Services will register themselves in R-GMA
  • 2.4: Which is the process that a real organisation has to follow in order to add a resource to a specific VO?
    • The organization has to configure its CE and SE to accept requests from the VO, and has to register the resource in the VO's service registry.

  48. VO (III)
  • 2.5: What is the Policy Repository component? What are the VO-level policies that it can hold?
    • The global policy repository holds VO-related service policies, like:
    • 'The VO/Production group has priority over the VO/Users group' for job submission in the queue assigned to the VO
    • 'The VO/Users group has a quota of 30%; the VO/Production group has a 95% quota' of the SE space assigned to the VO
  • 2.6: Where are service-level policies stored?
    • Service-level policies have two aspects: VO-related and site- (owner-) related. The VO-related aspects are stored in a VO-level policy repository, while the site-specific part is stored locally on the site (or within the service as part of its configuration). Site-related policies override VO-related policies.
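The two-level lookup in the answer to 2.6 reduces to a simple precedence rule: use the VO-level policy unless the site defines its own. A minimal sketch, with invented keys and values:

```python
# Effective-policy resolution: site-related policies override VO-related ones.

vo_policy = {
    "se_quota/Users": "30%",
    "se_quota/Production": "95%",
}
site_policy = {
    "se_quota/Production": "80%",   # this site is stricter than the VO
}

def effective_policy(key: str) -> str:
    """Return the site's value if set, otherwise the VO-level default."""
    return site_policy.get(key, vo_policy[key])

print(effective_policy("se_quota/Users"))       # '30%': VO default applies
print(effective_policy("se_quota/Production"))  # '80%': site override wins
```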

  49. Technical aspects (I)
  • 3.1: Why do only some gLite services expose a Web Service interface?
    • Some services use already-established protocols (e.g. GridFTP and rfio) and we keep them for compatibility reasons (already deployed clients and servers).
    • Some services would just be too slow with Web Services (e.g. POSIX I/O)
  • 3.2: Will you provide any tool for writing new services (e.g. a service container, reusable classes or interfaces that implement standard protocols, deployment tools, etc.)? Will the DILIGENT team be able to extend the gLite framework with other services?
    • gLite is NOT a framework, rather a set of services
    • We use existing service containers, such as Tomcat (and Apache), and do not intend to write a new one. JRA3/Security will provide reusable components for extending these frameworks with grid-specific authentication (proxy certificates, delegation) and authorization (VOMS attributes, policies) tasks, which the DILIGENT team can reuse as-is.
    • We can also offer our experience and examples of patterns for building, deploying and configuring services and clients in various languages and environments, but they should not be treated as frameworks.

  50. Technical aspects (II)
  • 3.3: The DILIGENT project needs to dynamically deploy and activate services on the grid. We would like to adopt the mechanisms used for running jobs in order to achieve this, i.e. select the best CE, then move, deploy and run the service. What support does gLite offer?
    • Running web services from a WN could be problematic unless the right ports are open and both inbound and outbound connectivity is offered. It is expected that the client APIs will be used on a WN but that a service will not. However, there is nothing (other than firewalls and network policies) preventing it.
    • (For data management) we have not foreseen running services as jobs, therefore we didn't plan to provide any support for this.
    • However, application containers (e.g. Tomcat) offer the possibility of remote deployment and configuration through their own mechanisms, and one can build upon them to build dynamic services. We do not have anything special (apart from the common security configuration) in our (data management) services which would prevent this kind of usage.
    • DILIGENT services might be dynamically deployed in the same way as our schedD. This is certainly a topic that needs a more in-depth discussion with them.
