An overview of the EGEE infrastructure and middleware
Elena Slabospitskaya, IHEP, NA3 manager for Russia
EGEE is funded by the European Union under contract IST-2003-508833
Sources of information
- LCG-2 User Guide: https://edms.cern.ch/file/454439//LCG-2-UserGuide.html
- LCG Releases: http://grid-deployment.web.cern.ch/grid-deployment/cgi-bin/index.cgi?var=releases
- LCG-2 Install Notes (for administrators)
- LCG-2 Manual Installation Guide (for administrators): https://edms.cern.ch/file/434070//LCG2Install.html
- Site with EDG Tutorials: http://hep-proj-grid-tutorials.web.cern.ch/hep-proj-grid-tutorials/
Overview
1. GSI – Grid Security Infrastructure
2. Information System
3. Job Management
4. Data Management
5. Monitoring System
Conclusions
Main Logical Machine Types (1)
Distributed system: a collection of (probably heterogeneous) automata whose distribution is transparent to the user, so that the system appears as one local machine.
[Figure: grid sites (CERN; Moscow, SINP MSU; Moscow, ITEP; Dubna, JINR; Protvino, IHEP), each drawn with its machine types: UI, CE, SE, RB, RMS, BDII, PS]
Main Logical Machine Types (2)
UI – User Interface
CE – Computing Element: a Grid Gate (GG) and Worker Nodes
GG – Grid Gate: hosts the Globus Gatekeeper, the Globus Resource Allocation Manager, the master server of the Local Resource Management System and a local Logging and Bookkeeping server
SE – Storage Element. A classic SE is a GridFTP server. An SE may also control large disk arrays or a Mass Storage System (MSS). These storage resources are managed by a Storage Resource Manager (SRM), which interacts with the OS, with the MSS and with the protocols used to perform file transfer operations. As MSS, LCG-2 supports the dCache disk pool (GridFTP and rfio) and the tape archiving systems Castor (GridFTP and rfio) and Enstore (GridFTP).
RB – Resource Broker
RMS – Replica Management System
BDII – Berkeley DB Information Index
PS – proxy server
How do I log in to the Grid?
Two basic concepts:
- Authentication: Who am I? "Equivalent" to a passport, ID card, etc.
- Authorisation: What can I do? Certain permissions, duties, etc.
The Grid Security Infrastructure (GSI) in LCG-2 enables secure authentication and communication over an open network. GSI is based on public key encryption, X.509 certificates and the Secure Sockets Layer (SSL) communication protocol.
Information System
- Provides information about grid resources and their status
- GLUE (Grid Laboratory for a Uniform Environment) schema – a common conceptual data model for the CE, the SE and the CE-SE binding
- MDS (Monitoring and Discovery Service) from Globus has been adopted as the provider of the IS
- The IS implements the GLUE schema using OpenLDAP, an implementation of the Lightweight Directory Access Protocol
- GRIS – Grid Resource Information System – local, on each CE and SE
- GIIS – Grid Index Information Service – per site (on the CE)
- BDII – Berkeley DB Information Index
Directory Information Tree
An LDAP information system is based on entries. Each entry describes an object (a person, a computer, etc.) and has a unique Distinguished Name (DN). The kind of information that can be stored in each entry is specified in an LDAP schema. The Directory Information Tree (DIT) is a tree of directory entries.
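To make the relation between DNs and the DIT concrete, here is a minimal sketch in plain Python (no LDAP library). It assumes DNs without escaped commas, and the two entries are hypothetical: the host names are invented, and the attribute names merely follow the GLUE naming style.

```python
from typing import Dict, List


def parse_dn(dn: str) -> List[str]:
    """Split a DN into its relative DNs (RDNs), most specific first.

    Simplification: assumes no escaped commas inside attribute values.
    """
    return [rdn.strip() for rdn in dn.split(",") if rdn.strip()]


def build_dit(dns: List[str]) -> Dict[str, dict]:
    """Build a nested dict that mirrors the Directory Information Tree.

    The root of the tree is the last RDN of a DN, so each DN is walked
    in reverse order.
    """
    tree: Dict[str, dict] = {}
    for dn in dns:
        node = tree
        for rdn in reversed(parse_dn(dn)):
            node = node.setdefault(rdn, {})
    return tree


# Hypothetical entries, for illustration only.
entries = [
    "GlueCEUniqueID=ce.example.org:2119/jobmanager-pbs-short,mds-vo-name=local,o=grid",
    "GlueSEUniqueID=se.example.org,mds-vo-name=local,o=grid",
]
dit = build_dit(entries)
```

Both entries share the suffix `mds-vo-name=local,o=grid`, so they end up as siblings under the same branch of the tree.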
Job management
The Workload Management System (WMS) services usually run on the Resource Broker. They are:
- Network Server (NS), which accepts the incoming job requests from the UI and provides the job control functionality.
- Workload Manager, which is the core component of the system.
- Match-Maker (also called Resource Broker), whose duty is to find the resource that best matches the requirements of a job (the match-making process).
- Job Adapter, which prepares the environment for the job and its final description before passing it to the Job Control Service.
- Job Control Service (JCS), which performs the actual job management operations (job submission, removal...).
- Logging and Bookkeeping service (LB), which logs all job management Grid events. These can then be retrieved by users or system administrators for monitoring or troubleshooting.
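The match-making step can be sketched as "filter the resources that satisfy the job requirements, then rank the survivors". The sketch below is only an illustration of that idea, not the actual WMS algorithm: the class names, the attributes (`free_cpus`, `max_wall_minutes`, `supported_vos`) and the ranking by free CPUs are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ComputeElement:
    # Hypothetical attributes; a real CE publishes many more via the IS.
    name: str
    free_cpus: int
    max_wall_minutes: int
    supported_vos: List[str]


@dataclass
class JobRequest:
    # Hypothetical job requirements.
    vo: str
    wall_minutes: int


def matches(ce: ComputeElement, job: JobRequest) -> bool:
    """Requirement check: the CE must accept the VO and the job length."""
    return (
        job.vo in ce.supported_vos
        and ce.max_wall_minutes >= job.wall_minutes
        and ce.free_cpus > 0
    )


def match_make(ces: List[ComputeElement], job: JobRequest) -> Optional[ComputeElement]:
    """Return the matching CE with the most free CPUs, or None."""
    candidates = [ce for ce in ces if matches(ce, job)]
    if not candidates:
        return None
    return max(candidates, key=lambda ce: ce.free_cpus)


# Invented resources, for illustration only.
ces = [
    ComputeElement("ce-a.example.org", free_cpus=4, max_wall_minutes=60, supported_vos=["dteam"]),
    ComputeElement("ce-b.example.org", free_cpus=10, max_wall_minutes=600, supported_vos=["dteam"]),
    ComputeElement("ce-c.example.org", free_cpus=50, max_wall_minutes=600, supported_vos=["othervo"]),
]
best = match_make(ces, JobRequest(vo="dteam", wall_minutes=120))
```

Here `ce-a` is rejected because its wall-time limit is too short, and `ce-c` because it does not support the VO, which leaves `ce-b`.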
A Job Submission Example
[Figure sequence: the same diagram repeated over ten slides. Components shown: UI (with the JDL file), Resource Broker (RB), Replica Catalogue (RC), Information Service (IS), Storage Element (SE), Logging & Book-keeping (LB), Job Submission Service (JSS), Compute Element (CE). Each slide adds one job status, together with the data being exchanged: submitted (Input Sandbox, Job Submit Event), waiting, ready, scheduled (BrokerInfo), running (Input Sandbox), done, outputready (Output Sandbox), cleared.]
Possible Job States
SUBMITTED, WAITING, READY, SCHEDULED, RUNNING, DONE(ok), DONE(failed), DONE(cancelled), ABORTED, OUTPUTREADY, CLEARED
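The states can be written down as an enumeration. The sketch below also encodes the order in which the job submission example walks through them. Two things are assumptions here: that the "done" step of the example corresponds to DONE(ok), and that nothing is said about how the other states (ABORTED, DONE(failed), DONE(cancelled)) are reached, so they are deliberately left out of the path.

```python
from enum import Enum
from typing import List, Optional


class JobState(Enum):
    SUBMITTED = "SUBMITTED"
    WAITING = "WAITING"
    READY = "READY"
    SCHEDULED = "SCHEDULED"
    RUNNING = "RUNNING"
    DONE_OK = "DONE(ok)"
    DONE_FAILED = "DONE(failed)"
    DONE_CANCELLED = "DONE(cancelled)"
    ABORTED = "ABORTED"
    OUTPUTREADY = "OUTPUTREADY"
    CLEARED = "CLEARED"


# Order of states in the successful job submission example.
HAPPY_PATH: List[JobState] = [
    JobState.SUBMITTED,
    JobState.WAITING,
    JobState.READY,
    JobState.SCHEDULED,
    JobState.RUNNING,
    JobState.DONE_OK,
    JobState.OUTPUTREADY,
    JobState.CLEARED,
]


def next_state(state: JobState) -> Optional[JobState]:
    """Next state on the successful path, or None if there is none."""
    if state not in HAPPY_PATH:
        return None
    i = HAPPY_PATH.index(state)
    return HAPPY_PATH[i + 1] if i + 1 < len(HAPPY_PATH) else None


def follows_happy_path(history: List[JobState]) -> bool:
    """True if the observed history is a prefix of the successful path."""
    return history == HAPPY_PATH[: len(history)]
```

A history such as SUBMITTED, WAITING, READY passes the check, while one that skips WAITING does not.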
Data Management: Data Naming
SURL – Storage URL. An SURL is a locator for a physical file. It is often called a PFN (Physical File Name).
    srm://lxshare0282.cern.ch:8443/castor/cern.ch/home/dteam/generated/2004-02-11/filed8f59bcf-5c85-11d8-bbf3-c59c9bed1519
UUID – Universally Unique IDentifier. A UUID is a 128-bit number.
GUID – Grid Unique IDentifier. A UUID generated by the Replica Management System.
    guid:e4fbe9b0-5c85-11d8-bbf3-c59c9bed1519
LFN – Logical File Name. A Logical File Name is a user-defined alias to a GUID.
    lfn:anjita-demo0236-2004-11-02
TURL – Transport URL. A Transport URL is returned by an SRM in response to a request for a way to access an SURL.
    rfio://lxshare0282.cern.ch//data/dt/stage/filec0fabd63-5cba-11d8-ba4c-e2aa3666572b.4003
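The four kinds of names can be told apart by their prefix. The following sketch classifies the example values from this slide. It rests on two assumptions: an SURL always starts with `srm://` (the only form shown here), and any other `scheme://` name is a TURL. It also uses Python's standard `uuid` module to confirm that the GUID payload is a 128-bit value.

```python
import uuid


def classify(name: str) -> str:
    """Guess the kind of grid file name from its prefix (simplified)."""
    if name.startswith("lfn:"):
        return "LFN"
    if name.startswith("guid:"):
        return "GUID"
    if name.startswith("srm://"):  # assumption: SURLs use the srm scheme
        return "SURL"
    if "://" in name:  # assumption: any other scheme is a transport URL
        return "TURL"
    return "unknown"


def guid_to_uuid(guid: str) -> uuid.UUID:
    """Strip the 'guid:' prefix and parse the rest as a UUID."""
    if not guid.startswith("guid:"):
        raise ValueError("not a GUID: " + guid)
    return uuid.UUID(guid[len("guid:"):])


# Example values taken from the slide.
examples = [
    "srm://lxshare0282.cern.ch:8443/castor/cern.ch/home/dteam/generated/"
    "2004-02-11/filed8f59bcf-5c85-11d8-bbf3-c59c9bed1519",
    "guid:e4fbe9b0-5c85-11d8-bbf3-c59c9bed1519",
    "lfn:anjita-demo0236-2004-11-02",
    "rfio://lxshare0282.cern.ch//data/dt/stage/"
    "filec0fabd63-5cba-11d8-ba4c-e2aa3666572b.4003",
]
kinds = [classify(e) for e in examples]

# A UUID is stored in 16 bytes, i.e. 128 bits.
guid_bits = len(guid_to_uuid(examples[1]).bytes) * 8
```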
REPLICA MANAGEMENT SYSTEM (RMS)
The main services offered by the RMS are the Replica Location Service (RLS) and the Replica Metadata Catalog (RMC).
The RLS maintains information about the physical location of the replicas (their mapping to GUIDs). It is composed of several Local Replica Catalogs (LRCs), each of which holds the replica information for a single VO.
The RMC stores the mapping between GUIDs and the aliases (LFNs) associated with them, and maintains other metadata (sizes, dates, ownerships...).
The last component of the Data Management framework is the Replica Manager. It presents a single interface to the RMS for the user and interacts with the other services.
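The two mappings can be pictured as two dictionaries, with the Replica Manager chaining them: LFN to GUID in the RMC, then GUID to SURLs in an LRC. This is a toy in-memory sketch only. The class and method names are hypothetical and do not reproduce the real RMS interfaces, and the LFN, GUID and SURLs are invented.

```python
from typing import Dict, List


class ReplicaMetadataCatalog:
    """Toy RMC: maps LFN aliases to GUIDs."""

    def __init__(self) -> None:
        self._lfn_to_guid: Dict[str, str] = {}

    def add_alias(self, lfn: str, guid: str) -> None:
        self._lfn_to_guid[lfn] = guid

    def guid_for(self, lfn: str) -> str:
        return self._lfn_to_guid[lfn]


class LocalReplicaCatalog:
    """Toy LRC: maps GUIDs to the SURLs of their replicas."""

    def __init__(self) -> None:
        self._guid_to_surls: Dict[str, List[str]] = {}

    def add_replica(self, guid: str, surl: str) -> None:
        self._guid_to_surls.setdefault(guid, []).append(surl)

    def replicas(self, guid: str) -> List[str]:
        return list(self._guid_to_surls.get(guid, []))


class ReplicaManager:
    """Single entry point that hides the two catalogs from the user."""

    def __init__(self, rmc: ReplicaMetadataCatalog, lrc: LocalReplicaCatalog) -> None:
        self._rmc = rmc
        self._lrc = lrc

    def list_replicas(self, lfn: str) -> List[str]:
        return self._lrc.replicas(self._rmc.guid_for(lfn))


# Invented names, for illustration only.
rmc = ReplicaMetadataCatalog()
lrc = LocalReplicaCatalog()
guid = "guid:00000000-0000-0000-0000-000000000001"
rmc.add_alias("lfn:my-example-file", guid)
lrc.add_replica(guid, "srm://se1.example.org/data/file1")
lrc.add_replica(guid, "srm://se2.example.org/data/file1")

replicas = ReplicaManager(rmc, lrc).list_replicas("lfn:my-example-file")
```

The user only ever handles the LFN; the GUID stays internal to the lookup.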
CONCLUSIONS
The EGEE Grid requires resources, an infrastructure and middleware that allow for:
• Authentication and Authorization
• Information services
• Job and Data Management
• Monitoring and fault recovery
Appendix. Data Management Services
SRM – Storage Resource Manager. A high-level interface to a storage system.
RLS – Replica Location Service. The distributed service providing the mappings between GUIDs and SURLs. An RLS has two components: LRC and RLI.
LRC – Local Replica Catalog. The catalog storing GUID to SURL mappings, along with SURL attributes, for a given site or for a single Storage Resource Manager at a site.
RLI – Replica Location Index. The catalog storing information about which Local Replica Catalogs have GUID to SURL mappings for a particular GUID. It thus provides the link between different LRCs, allowing for distributed indexing and querying of the catalogs.
RMC – Replica Metadata Catalog. The catalog storing LFN aliases for GUIDs, as well as attributes on GUIDs and LFNs.
ROS – Replica Optimization Service. A service providing information to guide the selection between replicas located at different sites. This is based on network information collected from available network monitors.
MDS – Monitoring and Discovery Service
LCFG – Local ConFiguration System (Edinburgh)
http://lspitsky.home.cern.ch/lspitsky/
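The role of the RLI next to the LRCs can be illustrated with a small index: for each GUID it records which LRCs know about it, so that a query only has to contact those catalogs. The sketch below uses plain dictionaries. The site names, GUIDs and SURLs are invented, and the function names are hypothetical rather than part of any real RLS interface.

```python
from typing import Dict, List, Set

# One toy LRC per site: GUID -> list of SURLs (all values invented).
LRCS: Dict[str, Dict[str, List[str]]] = {
    "site-a": {"guid:0001": ["srm://se.site-a.example.org/f1"]},
    "site-b": {
        "guid:0001": ["srm://se.site-b.example.org/f1"],
        "guid:0002": ["srm://se.site-b.example.org/f2"],
    },
}


def build_rli(lrcs: Dict[str, Dict[str, List[str]]]) -> Dict[str, Set[str]]:
    """Toy RLI: for every GUID, the set of sites whose LRC has a mapping."""
    index: Dict[str, Set[str]] = {}
    for site, lrc in lrcs.items():
        for guid in lrc:
            index.setdefault(guid, set()).add(site)
    return index


def find_replicas(
    guid: str,
    lrcs: Dict[str, Dict[str, List[str]]],
    rli: Dict[str, Set[str]],
) -> List[str]:
    """Ask the RLI which LRCs to query, then collect their SURLs."""
    surls: List[str] = []
    for site in sorted(rli.get(guid, set())):
        surls.extend(lrcs[site][guid])
    return surls


rli = build_rli(LRCS)
found = find_replicas("guid:0001", LRCS, rli)
```

The index holds only "which catalog knows this GUID", not the SURLs themselves, which is what lets the LRCs stay local to their sites.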