1 / 29

Overview of the gLite middleware

Overview of the gLite middleware. Yaodong Cheng, IHEP EUChinaGrid Tutorial Beijing, 25.11.2006. outline. Background and approach adopted Components Overview Status of Deployment Summary and Conclusions. Grid Aim. Grid Systems & Applications aim is to:. Integrate Virtualise Manage.

zody
Télécharger la présentation

Overview of the gLite middleware

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Overview of the gLite middleware Yaodong Cheng, IHEP EUChinaGrid Tutorial Beijing, 25.11.2006

  2. outline • Background and approach adopted • Components Overview • Status of Deployment • Summary and Conclusions

  3. Grid Aim • Grid Systems & Applications aim is to: • Integrate • Virtualise • Manage RESOURCEs and SERVICEs across different Vos VO: Virtual Organization – Individuals and/or Institutes having direct access to resources

  4. Many VOs need sharing of resource through services Accessing Allocating Monitoring Accounting Grid Middleware –Layer between servicesand physical resources gLite: Lightweight Middleware for Grid Computing Grid Middleware

  5. gLite Background • gLite (pronounced "gee-lite") : next generation Lightweight middleware for grid computing • Exploit experience and existing components from: • GGF(Global Grid Forum)/OGF(Open Grid Forum) • Open Grid Services Architecture - OGSA • VDT (Condor, globus) 、EDG/LCG、AliEN、… • gLite is a distribution that combines components from many different providers!

  6. gLite Architecture • The gLite Grid services follow a Service Oriented Architecture • facilitate interoperability among Grid services • allow easier compliance with upcoming standards • Pluggable components • Architecture is not bound to specific implementations • services are expected to work together • services can be deployed and used independently • Business friendly open source license • Plan to switch to Apache-2

  7. LCG-2 gLite 2004 prototyping prototyping product 2005 product 2006 gLite 3.0 The Release of gLite3.0 • Convergence of LCG 2.7.0 and gLite 1.5.0 in spring 2006 • Continuity on the production infrastructure ensured usability by applications • Initial focus on the new Job Management • Migration to the ETICS build system • ETICS project started in January • Reorganization of the work according to the new process • EGEE Technical Coordination Group and Task Forces • Start of the EGEE SA3 Activity for integration and certification • “Continuous release process” • No big-bang releases!

  8. gLite process Development Directives Error Fixing Software Serious problem Integration Certification Pre-Production Deployment Packages Testbed Deployment Problem Fail Production Infrastructure Pre-Production Deployment Fail Integration Tests Pass Functional Tests Pass Fail Installation Guide, Release Notes, etc Scalability Tests Release Pass

  9. TCG & EMT • The EGEE Technical Coordination Group (TCG) defines the priorities for middleware development and certification • Members from LHC experiments and other EGEE-NA4 applications, and form EGEE Technical activities • Collects requirements from the applications • Prioritizes the requirements • Approves the JRA1 and SA3 work plans (Focus on short term) • The Engineering Management Team (EMT) coordinates the production of gLite releases • Members from SA3, JRA1, SA1 and VDT • Decides what and when to release components and patches • Follows critical bugs fixing individually • Works according to TCG directives

  10. Summary of current activities • Security • Enabling glexec on Worker Nodes • Address user and security policy requirements in VOMS,VOMSAdmin • Proxy renewal library repackaged without WMS dependencies • Shibboleth short-lived credential service and interaction with VOMS • Job Management • Improvement in functionality and performance on WMS and LB • Preparation for the deployment of the DGAS accounting system • Development and test on the preview test-bed of the new components • ICE-CREAM, G-PBox including LCAS/LCMAPS plug-ins, Job Provenance • Data Management (mainly from LCG) • Adding support for SRM v2.2 in DPM, GFAL and FTS • Working on new Encrypted Data Storage based on GFAL/LFC • Improvements in LFC distributed service • FTS proxy renewal and delegation • Information • Improvements in R-GMA • Development for GLUE 1.3 (from LCG)

  11. Middleware in EGEE Applications • Applications have access both to Higher-level Grid Services and to Foundation Grid Middleware • Higher-Level Grid Services are supposed to help the users building their computing infrastructure but should not be mandatory • Foundation Grid Middleware will be deployed on the EGEE infrastructure • Must be complete and robust • Should allow interoperation with other major grid infrastructures • Should not assume the use of Higher-Level Grid Higher-Level Grid Services Workload Management Replica Management Visualization Workflows Grid economies etc. Foundation Grid Middleware Security model and infrastructure Computing (CE) & Storage Elements (SE) Accounting Information providers and monitoring

  12. Grid Foundation: Security • Authentication based on X.509 PKI infrastructure • Certificate Authorities (CA) issue (long lived) certificates identifying individuals (much like a passport) • Commonly used in web browsers to authenticate to sites • Trust between CAs and sites is established (offline) • In order to reduce vulnerability, on the Grid user identification is done by using (short lived) proxies of their certificates • Proxies can • Be delegated to a service such that it can act on the user’s behalf • Include additional attributes (like VO information via the VO Membership Service VOMS) • Be stored in an external proxy store (MyProxy) • Be renewed (in case they are about to expire)

  13. Bare certificates are not enough for defining user capabilities on the Grid Users belong to VO’s, to groups inside a VO and may have special roles VOMS provides a way to add attributes to a certificate proxy: mutual authentication of client and server VOMS produces a signed Attribute Certificate (AC) the client produces a new proxy that contains the attributes The attributes are used to provide the user with additional capabilities according to the VO policies. Authentication Request OK C=IT/O=INFN /L=CNAF/CN=Pinco Palla/CN=proxy Query AuthDB VOMSAC VOMSAC VOMS client Grid Foundation: VOMS

  14. 2171 LDAP 2172 LDAP 2173 LDAP Update DB & Modify DB Swap DBs 2170 Port Fwd 2170 Port Fwd Grid foundation: Information Systems • Generic Information Provider (GIP) • Provides LDIF information about a grid service in accordance to the GLUE Schema • BDII • LDAP database that is updated by a process • More than one DBs is used separate read and write • A port forwarder is used internally to select the correct DB GIP Cache Provider Plugin LDIF File Config File

  15. Grid foundation: Information Systems • R-GMA: provides a uniform method to access and publish distributed information and monitoring data • Used for job and infrastructure monitoring in gLite 3.0 Working to add authorization • Service Discovery: • Provides a standard set of methods for locating Grid services • Currently supports R-GMA, BDII and XML files as backends • Will add local cache of information • Used by some DM and WMS components in gLite 3.0

  16. Grid foundation: Computing Element • the service representing a computing resource • main functionality is job management (job submission, etc) • may be used by a generic client: an end-user interacting directly with the Computing Element, or the Workload Manager • work in push model or pull model for job submission • LCG-CE: based on GT2 GRAM • gLite-CE: based on GSI enabled Condor-C • CREAM: new lightweight web service CE • BLAH: interfaces the CE and the local batch system • CEMon: Web service to publish status of a computing resource to clients

  17. Grid foundation: Storage Element • Storage Element • Common interface: SRMv1,migrating to SRMv2 • Various implementation from LCG and other external projects • disk-based: DPM, dCache; tape-based: Castor, dCache • Posix-like file access: • Grid File Access Layer (GFAL) by LCG • Support for ACL in the SRM layer • Support for SRMv2.2 is added now. • gLite I/O • Support for ACLs from the file catalog and interfaced to Hydra for data encryption • Not certified in gLite 3.0. To be dismissed when all functionalities are in GFAL.

  18. Grid foundation: Accounting • APEL: Uses R-GMA to propagate and display job accounting information for infrastructure monitoring • Reads LRMS log files provided by LCG-CE and BLAH • Preparing an update for gLite 3.0 to use the files form BLAH • DGAS: Collects, stores and transfers accounting data. Compliant with privacy requirements • Reads LRMS log files provided by LCG-CE and BLAH. • Stores information in a site database (HLR) and optionally in a central HLR. Access granted to user, site and VO administrators • Not yet certified in gLite 3.0. Deployment plan: • certify and activate local sensors and site HLR in parallel with APEL • replace APEL sensors with DGAS (DGAS2APEL) • certify and activate central HLR. Test scalability to the PS scale

  19. High Level Services: Catalogues • File Catalogs • LFC from LCG • In June: interface to POOL • In the summer: LFC replication and backup • Fireman • Not certified in gLite 3.0. To be dismissed when all functionalities are in LFC • Hydra: stores keys for data encryption • Being interfaced to GFAL (finish by July) • Currently only one instance, but in future there will be 3 instances: at least 2 need to be available for decryption • Not yet certified in gLite 3.0. Certification will start soon • AMGA Metadata Catalog: generic metadata catalogue • Joint JRA1-NA4 (ARDA) development. Used mainly by Biomed • Not yet certified in gLite 3.0. Certification will start soon

  20. High Level Services: File transfer • FTS: Reliable, scalable and customizable file transfer • Manages transfers through channels • mono-directional network pipes between two sites • Web service interface • Automatic discovery of services • Support for different user and administrative roles • Adding support for pre-staging and new proxy renewal schema • In the medium term add support for SRMv2, delegation, VOMS-aware proxy renewal

  21. High Level Services: Workload management • WMS helps the user accessing computing resources • Resource brokering, management of job input/output, ... • LCG-RB: GT2 + Condor-G • To be replaced when the gLite WMS proves reliability • gLite WMS: Web service (WMProxy) + Condor-G • Management of complex workflows (DAGs) and compound jobs • bulk submission and shared input sandboxes • support for input files on different servers (scattered sandboxes) • Support for shallow resubmission of jobs • Job File Perusal: file peeking during job execution • Supports collection of information from CEMon, BDII, R-GMA and from DLI and StorageIndex data management interfaces • Support for parallel jobs (MPI) when the home dir is not shared • Deployed for the first time on the PS with gLite 3.0

  22. Direct Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs A Collection is a group of jobs with no dependencies basically a collection of JDL’s A Parametric job is a job having one or more attributes in the JDL that vary their values according to parameters Using compound jobs it is possible to have one shot submission of a (possibly very large, up to thousands) group of jobs Submission time reduction Single call to WMProxy server Single Authentication and Authorization process Sharing of files between jobs Availability of both a single Job Id to manage the group as a whole and an Id for each single job in the group nodeA nodeB nodeC nodeE nodeD High Level Services: Workflows

  23. Logging and Bookkeeping service Tracks jobs during their lifetime (in terms of events) LBProxy for fast access L&B API and CLI to query jobs Support for “CE reputability ranking“: maintains recent statistics of job failures at CE’s and feeds back to WMS to aid planning Job Provenance: stores long term job information Supports job rerun If deployed will also help unloading the L&B Not yet certified in gLite 3.0. Certification will start when requested by the TCG High Level Services: Job Information

  24. High Level Services: Job Priorities • GPBOX: Interface to define, store and propagate fine-grained VO policies • Based on VOMS groups and roles • Enforcement of policies at sites: sites may accept/reject policies • Not yet certified in gLite 3.0. Certification will start when requested by the TCG. • Current plans: test job prioritization without GPBOX: • Mapping of VOMS groups to batch system shares (via GIDs?) • Two queues (long/short) for ATLAS & CMS • Publish info on the share in the CE GLUE 1.2 schema (VOView) • The gLite WMS is being modified to support GLUE 1.2 • Testing with GPBOX if the “Service Class” is published • WMS match-making depending on submitter VOMS certificate • But no ranking of resources based on priority offered yet • Settings are not dynamic (via e-mail or CE updates)

  25. Status of gLite Deployment • gLite-3.0.0 was released on 4th May.Stage upgrade site from LCG 2.7 • The T1 centers should all have upgraded before 23rd May • The deadline for All other sites (in particular T2s) in SC4 is 31st May. • LCG resources (14/06/2006) • Active Sites: 182 • Available CPU: 30534 • Available Storage (TB): 9018

  26. gLite development plan • Complete migration to VDT 1.3.X and support for SL(C)4 and 64-bit • Complete migration to the ETICS build system • Work according to work plans available at: https://twiki.cern.ch/twiki/bin/view/EGEE/EGEEgLiteWorkPlans • In particular: • Continue work on making all services VOMS-aware • Including job priorities • Improve error reporting and logging of services • Improve performances, in particular WMS and LB • Support for all batch systems in the production infrastructure on the CE • Use the information pass-through by BLAH to control job execution on CE • Complete support to SRM v2.2 • Complete the new Encrypted Data Storage based on GFAL/LFC • Complete and test glexec on Worker Nodes • Standardization of usage records for accounting • Interoperation with other projects and adherence to standards • Collaboration with EUChinaGrid on IPv6 compliance

  27. Summary • gLite 3.0 is an important milestone in EGEE program • New components from gLite 1.X being deployed for the first time on the Production Infrastructure • Address requirements in terms of functionality and scalability • Components deployed for the first time need extensive testing! • New organization in EGEE II • New build and integration environment form ETICS • More controlled software process and certification • Development is client driven (TCG) • Development is continuing to provide increased robustness, usability and functionality • Collaboration with other projects for interoperability and definition/adoption of international standards

  28. Further information • EGEE http://cern.ch/egee • gLite http://cern.ch/glite • euchinagrid: http://www.euchinagrid.org

  29. www.glite.org

More Related