EGEE Project and Middleware Overview

Presentation Transcript


  1. CYCLOPS • EGEE Project and Middleware Overview • Marco Verlato • CYCLOPS Second Training Workshop, 5-7 May 2008, Chania, Greece

  2. Outline • Introduction • The EGEE project • Infrastructure • Applications • Operations and Support • The EGEE Middleware: gLite • Grid access services • Security services • Information & Monitoring services • Data Management services • Job Management services • Further information

  3. What is a Grid? • “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities” (Ian Foster and Carl Kesselman, 1998) • “A grid is a combination of networked resources and the corresponding middleware, which provides services for the user” (Erwin Laure, EGEE Technical Director, ISSGC 2007) • The users of a Grid are divided into Virtual Organisations (VOs), abstract entities grouping users, institutions and resources, e.g. the 4 LHC experiments, the community of biomedical researchers, etc.

  4. What is a Grid? • It relies on advanced software, called middleware • Middleware automatically finds the data the scientist needs, and the computing power to analyse it • Middleware balances the load on different resources. It also handles security, accounting, monitoring and much more

  5. Enabling Grids for E-sciencE (EGEE) • Flagship Grid infrastructure project co-funded by the European Commission since April 2004, now entering its third phase • Application domains: Archaeology, Astronomy, Astrophysics, Civil Protection, Computational Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, … • Scale: >250 sites in 48 countries, >50,000 CPUs, >20 PetaBytes of storage, >10,000 users, >150 VOs, >150,000 jobs/day

  6. Disciplines and users • ~8000 users listed in registered VOs • Digital libraries, disaster recovery, computational sciences, etc. • http://cic.gridops.org/index.php?section=home&page=volist

  7. Types of applications • Simulation • LHC Monte Carlo simulations; fusion; WISDOM • Jobs needing significant processing power; large number of independent jobs; limited input data; significant output data • Bulk Processing • HEP; processing of satellite data • Distributed input data; large amount of input and output data; job management (WMS); metadata services; complex data structures • Parallel Jobs • Climate models, computational chemistry • Large number of independent but communicating jobs; need for simultaneous access to a large number of CPUs; MPI libraries • Short-response delays • Prototyping new applications; Grid monitoring; interactivity • Limited input & output data and processing needs, but fast response and quality of service required • Workflow • Medical imaging; flood analysis • Complex analysis algorithms; complex dependencies between jobs • Commercial Applications • Non-open-source software: Geocluster (seismic platform), FlexX (molecular docking), Matlab, Mathematica, IDL, … • License server associated with an application deployment model

  8. High Energy Physics Applications • LHC proton-proton collisions at √s = 14 TeV, at luminosities of L = 10^34 cm^-2 s^-1 or L = 2·10^32 cm^-2 s^-1 depending on the experiment • [Detector figure: muon chambers (chambres à muons), tracker (trajectographe), calorimeter (calorimètre)] • One experiment: 2.5 million collisions per second; LVL1: 10 kHz, LVL3: 50-100 Hz; 25 MB/s of digitized recording • Another: 40 million collisions per second; LVL1: 1 kHz, LVL3: 100 Hz; 0.1 to 1 GB/s of digitized recording

  9. In silico drug discovery • Diseases such as HIV/AIDS, SARS, bird flu etc. are a threat to public health due to worldwide exchanges and circulation of people • Grids open new perspectives to in silico drug discovery: reduced cost and an accelerating factor in the search for new drugs • International collaboration is required for: early detection, epidemiological watch, prevention, search for new drugs, search for vaccines • [Map: avian influenza bird casualties]

  10. Wide In Silico Docking On Malaria http://wisdom.healthgrid.org/

  11. Earth Sciences Applications • Flood forecasting on the Danube river: cascade of models (meteorology, hydraulic, hydrodynamic, …) – UISAV (SK) • Production and validation of 7 years of ozone profiles from GOME – ESA, UTV (IT), KNMI (NL), IPSL (FR) • Rapid earthquake analysis (mechanism and epicentre), 50-100 CPUs – IPGP (FR) • Geocluster for academia and industry – CGG (FR) • Data mining for meteorology & space weather – GCRAS (RU) • Data access studies, climate impacts on agriculture – DKRZ (DE) • Modelling seawater intrusion in coastal aquifers (SWIMED) – CRS4 (IT), INAT (TU), Univ. Neuchâtel (CH) • Specfem3D: seismic application, benchmark for MPI (2 to 2000 CPUs) – IPGP (FR) • Air pollution model – BAS (BG) • Mars atmosphere – CETP (FR)

  12. EGEE workload in 2007 • Data: 25 PB stored, 11 PB transferred • CPU: 114 million hours • Estimated cost if performed with Amazon’s EC2 and S3: €47,486,548 • http://gridview.cern.ch/GRIDVIEW/same_index.php • http://calculator.s3.amazonaws.com/calc5.html

  13. EGEE-II to EGEE-III • EGEE-III • To be co-funded under European Commission call INFRA-2007-1.2.3 • 32M€ EC funds compared to 36M€ for EGEE-II • Key objectives • Expand/optimise existing EGEE infrastructure, include more resources and user communities • Prepare migration from a project-based model to a sustainable federated infrastructure based on National Grid Initiatives • 2 year period – May 2008 to April 2010 • No gap between EGEE-II and EGEE-III (1 month extension to EGEE-II) • Similar consortium • Now structured on a national basis (National Grid Initiatives/Joint Research Units)

  14. European Grid Initiative (EGI) • Need to prepare a permanent, common Grid infrastructure • Ensure the long-term sustainability of the European e-Infrastructure, independent of short project funding cycles • Coordinate the integration and interaction between National Grid Infrastructures (NGIs) • Operate the production Grid infrastructure on a European level for a wide range of scientific disciplines • There must be no gap in the support of the production grid

  15. EGEE operations • Operations Coordination Centre (OCC): management and oversight of all operational and support activities • Regional Operations Centres (ROCs): provide the core of the support infrastructure, each supporting a number of Resource Centres within its region • Resource Centres (RCs): provide the resources (computing, storage, network, …) • Global Grid User Support (GGUS), at FZK: coordination and management of user support, single point of contact for users

  16. Monitoring Visualization

  17. The EGEE support infrastructure • [Diagram: the GGUS central system, reached through the CIC portal, connects the Regional Operations Centres (ROCs) and their Resource Centres (RCs), the VO support units and VO Ticket Process Managers (TPMs), the general TPM, the COD (operator on duty), the deployment, middleware and network support units, and other Grids]

  18. The GILDA t-Infrastructure (https://gilda.ct.infn.it) • 20 sites in 3 continents • > 11000 certificates issued, >20% renewed at least once • > 250 courses, training events, official university curricula • > 2,000,000 hits on the web site from >100 different countries • > 4.5 TB of training material downloaded from the web site

  19. e-Infrastructures adopting gLite • [Map legend: e-Infrastructures adopting gLite; e-Infrastructures interoperable, or in progress to be made interoperable, with gLite; other e-Infrastructure projects & Grids] • ~80 countries “linked” together!

  20. EGEE Middleware Distribution • [Timeline: LCG-2 and gLite prototyping in 2004-2005 converging into the gLite 3.0 product in 2006] • Combines components from different providers: Condor and Globus (via VDT), LCG (LHC Computing Grid), EDG (European Data Grid), others • After prototyping phases in 2004 and 2005, convergence with the LCG-2 distribution was reached in May 2006 • gLite 3.0 released in May 2006; current release is 3.1 • Develop a lightweight stack of generic middleware useful to EGEE applications • Pluggable components – cater for different implementations • Follow SOA approach, WS-I compliant where possible • Focus now is on re-engineering and hardening • Business-friendly open source licence: Apache 2.0

  21. The middleware structure • Applications have access both to Higher-level Grid Services and to Foundation Grid Middleware • Higher-level Grid Services are meant to help users build their computing infrastructure, but should not be mandatory • Foundation Grid Middleware will be deployed on the EGEE infrastructure • It must be complete and robust • It should allow interoperation with other major grid infrastructures • It should not assume the use of Higher-level Grid Services

  22. gLite services orchestration • [Diagram: the User Interface submits jobs to the Workload Management System and queries Logging & Bookkeeping to retrieve their status; the WMS discovers services and queries the Information System, updates the credential with the Authorization Service, consults the File and Replica Catalogs, and submits the job to a Computing Element at Site X; the Computing Element and the Storage Element publish their state to the Information System]

  23. gLite services decomposition • Access: CLI and API • Security Services: Authentication, Authorization, Auditing • Information & Monitoring Services: Information & Monitoring, Job Monitoring • Data Services: Metadata Catalog, File & Replica Catalog, Storage Element, Data Movement • Job Management Services: Job Provenance, Package Manager, Accounting, Workload Management, Computing Element • Overview paper: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf

  24. Grid Access • The access point to the EGEE Grid is the User Interface (UI) • It provides the CLI tools to access the functionalities offered by the gLite services • They allow the user to perform some basic Grid operations: • create the user proxy needed for authentication/authorization • retrieve the status of different resources from the Information System • copy, replicate and delete files from the Grid • list all the resources suitable to execute a given job • submit jobs for execution • cancel jobs • retrieve the output of finished jobs • show the status of submitted jobs • retrieve the logging and bookkeeping information of jobs • It provides the APIs to allow the development of Grid-enabled applications • A typical UI session is sketched below
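
A minimal sketch of such a UI session with the standard gLite command-line tools; the VO name (cyclops), the JDL file name and the job-identifier file are placeholders, and exact options can differ between gLite releases.

    # Create a short-lived VOMS proxy for the user's VO ("cyclops" is an example)
    voms-proxy-init --voms cyclops

    # Query the Information System for Computing Elements open to the VO
    lcg-infosites --vo cyclops ce

    # Submit a job described in myjob.jdl; -a delegates the proxy automatically,
    # -o stores the returned job identifier in a file for later use
    glite-wms-job-submit -a -o jobid.txt myjob.jdl

    # Follow the job status and, once it is Done, retrieve the output sandbox
    glite-wms-job-status -i jobid.txt
    glite-wms-job-output --dir ./job_output -i jobid.txt

    # Cancel the job if it is no longer needed
    glite-wms-job-cancel -i jobid.txt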

  25. Security Services • GSI authentication based on PKI X.509 SSL infrastructure • Certificate Authorities (CAs) issue (long-lived) certificates identifying individuals (much like a passport) • To reduce vulnerability, on the Grid user identification is done by using (short-lived) proxies of their certificates (they can be stored on MyProxy servers) • Users belong to VOs and to groups inside a VO, and may have special roles • VOMS provides a way to add attributes to a certificate proxy (see the proxy-handling sketch below)
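
As an illustration, assuming membership of a hypothetical VO named cyclops, a VOMS proxy carrying specific attributes can be created and inspected as follows; the group, role and MyProxy server names are examples only.

    # Plain proxy with the default VO attributes
    voms-proxy-init --voms cyclops

    # Proxy requesting a specific group and role (example FQAN)
    voms-proxy-init --voms cyclops:/cyclops/Role=SoftwareManager

    # Inspect the proxy: lifetime, identity and VOMS attributes (FQANs)
    voms-proxy-info --all

    # Store a long-lived credential on a MyProxy server so that long-running
    # jobs can have their short-lived proxies renewed (server name is an example)
    myproxy-init -s myproxy.example.org -d -n

    # Destroy the local proxy when done
    voms-proxy-destroy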

  26. Information & Monitoring Services /1 • BDII: Berkeley Database Information Index • [Diagram: resource-level BDIIs (MDS GRIS plus information providers) feed the site-level BDII, and the top-level BDII aggregates the site-level BDIIs every ~2 minutes; the top-level BDII is queried by the WMS, the UI, the FTS and the Worker Nodes] • Based on LDAP • Standardized information provider (GIP) • GLUE 1.3 schema • The top level is used with 230+ sites; roughly 60 instances in EGEE • An example query is sketched below
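
A BDII can be queried directly with standard LDAP tools, as in this sketch; the host name is a placeholder for a real top-level BDII, while port 2170 and the o=grid base are the usual gLite conventions.

    # List the Computing Elements published by a top-level BDII together with
    # the number of waiting jobs (GLUE 1.3 attributes)
    ldapsearch -x -LLL -H ldap://bdii.example.org:2170 -b o=grid \
        '(objectClass=GlueCE)' GlueCEUniqueID GlueCEStateWaitingJobs

    # Higher-level view of the same information from a gLite UI
    lcg-infosites --vo cyclops ce
    lcg-infosites --vo cyclops se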

  27. Information & Monitoring Services /2 • For users, R-GMA appears similar to a single relational database • Implementation of OGF’s Grid Monitoring Architecture (GMA) • Rich set of APIs (web browsers, Java, C/C++, Python) • Typical deployment consists of Producer and Consumer Services on a one-per-site basis (MON box), and a centralized Registry and Schema • [Diagram: a producer application publishes tuples with SQL “INSERT” through the Producer Service API, which registers itself in the Registry Service; a consumer application issues SQL “SELECT” queries through the Consumer Service API, which locates matching producers via the Registry, sends them the query and receives the tuples back; table definitions are created in the Schema Service with SQL “CREATE TABLE”] • An illustration follows below
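
The producer/consumer pattern can be sketched with the SQL-style statements accepted by the R-GMA APIs or by the rgma command-line tool shipped with gLite; the table and column names below are invented purely for illustration.

    # Start the interactive R-GMA command-line tool on a gLite UI (if installed)
    rgma

    # At the rgma prompt, producers and consumers exchange tuples through
    # ordinary-looking SQL ("siteLoad" and its columns are made-up names):
    #   CREATE TABLE siteLoad (siteName VARCHAR(255), runningJobs INTEGER)
    #   INSERT INTO siteLoad (siteName, runningJobs) VALUES ('SITE-A', 42)
    #   SELECT siteName, runningJobs FROM siteLoad WHERE runningJobs > 10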

  28. GridICE monitoring tool

  29. Data Services /1 • Need common interface to storage resources • Storage Resource Manager (SRM) • Need to keep track where data is stored • File and Replica Catalogs • Need scheduled, reliable file transfer • File transfer services • Need a way to describe files’ content and query them • Metadata catalog • Heterogeneity • Data is stored on different storage systems using different access technologies • Distribution • Data is stored in different locations – in most cases there is no shared file system or common namespace • Data needs to be moved between different locations • Data description • Data are stored as files: need a way to describe files and locate them according to their contents

  30. Data Services /2 • The Storage Resource Manager (SRM) interface is the basis for the gLite Storage Elements (SEs): it hides the storage system implementation, handles authorization based on VOMS credentials, and offers POSIX-like access via GFAL (Grid File Access Layer) • The LCG File Catalogue (LFC) keeps track of file replicas on the Grid • Logical File Name (LFN): an alias created by a user to refer to some item of data • Global Unique Identifier (GUID): a non-human-readable unique identifier for an item of data • Site URL (SURL): indicates at which place (Storage Element) the file is actually found; understood by the SRM interface • Transport URL (TURL): temporary locator of a replica plus access protocol, understood by the backend MSS • A data-management sketch follows below
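
The lcg-utils commands available on a gLite UI tie these names together; in this sketch the VO, the catalogue and storage element hosts and the file paths are all placeholders.

    # Tell the tools which LFC to use (host name is an example)
    export LFC_HOST=lfc.example.org

    # Register a local file on the Grid: copy it to a Storage Element and
    # record it in the LFC under a Logical File Name
    lcg-cr --vo cyclops -d se.example.org \
        -l lfn:/grid/cyclops/user/sample.dat file:/home/user/sample.dat

    # Resolve the LFN: show its GUID and the SURLs of all replicas
    lcg-lg --vo cyclops lfn:/grid/cyclops/user/sample.dat
    lcg-lr --vo cyclops lfn:/grid/cyclops/user/sample.dat

    # Replicate to a second SE and copy the file back to the local disk
    lcg-rep --vo cyclops -d se2.example.org lfn:/grid/cyclops/user/sample.dat
    lcg-cp  --vo cyclops lfn:/grid/cyclops/user/sample.dat file:/tmp/sample.dat

    # Turn a SURL (taken from the lcg-lr output; value below is an example)
    # into a gridftp TURL understood by the backend storage
    SURL="srm://se.example.org/dpm/example.org/home/cyclops/sample.dat"
    lcg-gt "$SURL" gsiftp

    # Browse the catalogue namespace
    lfc-ls -l /grid/cyclops/user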

  31. Job Management Services /1 • The Computing Element (CE) is the front-end to the local farm (cluster, batch system) • Several implementations: Torque/Maui, PBS, LSF, Condor, SGE • The CE is usually installed on the master node of the farm; the slave nodes run the Worker Node (WN) software • Typically the CE also runs the site BDII, providing information to the top-level BDII • Application software is installed on the CE in a shared area • The CE receives users’ jobs from the WMS • There are different queues with different priorities • Jobs are sent to the batch system, which executes them on the WNs • The output is then copied back to the WMS

  32. Job Management Services /2 • CREAM: a Web Service Computing Element • The CREAM WSDL allows defining custom user interfaces • A C++ CLI allows direct submission • Lightweight • Fast notification of job status changes via CEMon • Improved security: no “fork-scheduler” • Will support bulk jobs on the CE: optimization of the staging of input sandboxes for jobs with shared files • ICE (Interface to CREAM Environment) is being integrated in the WMS for submissions to CREAM • A direct-submission sketch follows below
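
Direct submission to a CREAM CE uses its own CLI; the CE endpoint, queue name and job identifier below are placeholders, and -a delegates the user proxy automatically.

    # Submit a JDL straight to a CREAM CE, bypassing the WMS
    # ("cream-ce.example.org:8443/cream-pbs-cyclops" is an example endpoint)
    glite-ce-job-submit -a -r cream-ce.example.org:8443/cream-pbs-cyclops myjob.jdl

    # Check the status of, and eventually cancel, the returned CREAM job ID
    glite-ce-job-status https://cream-ce.example.org:8443/CREAM123456789
    glite-ce-job-cancel https://cream-ce.example.org:8443/CREAM123456789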

  33. ENEA-Grid approach to provide access to AIX • A solution to currently known limitations: • gLite must be installed on each WN → only Intel/SL machines • gLite WNs must communicate with the RB → security/firewall management issues • The ENEA-Grid approach also works with NFS or GPFS, and with rsh or ssh • Invasiveness of the grid middleware and firewall requirements are minimized!

  34. Job Management Services /3 • WMS: resource brokering, workflow management, I/O data management • Web Service interface: WMProxy • Task Queue: keeps non-matched jobs • Information SuperMarket: optimized cache of the information system • Match Maker: assigns jobs to resources according to user requirements (possibly including data location) • Job submission & monitoring: Condor-G, ICE (to CREAM) • External interactions: Information System, Data Catalogs, Logging & Bookkeeping, policy management systems • A matchmaking check from the UI is sketched below
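
From the user's side the matchmaking step can be previewed before submitting anything; this sketch assumes a valid proxy and reuses the placeholder myjob.jdl and an example WMProxy endpoint.

    # Ask the WMS which Computing Elements currently satisfy the job's
    # Requirements expression, ranked by its Rank expression, without submitting
    glite-wms-job-list-match -a myjob.jdl

    # The same check against a specific WMProxy endpoint (host name is an example)
    glite-wms-job-list-match -a \
        -e https://wms.example.org:7443/glite_wms_wmproxy_server myjob.jdl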

  35. Advanced scheduling • A Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs • A Collection is a group of jobs with no dependencies, basically a collection of JDLs • [Figure: example DAG with nodes A to E] • A Parametric job is a job having one or more attributes in the JDL that vary their values according to parameters • Using compound jobs it is possible to have one-shot submission of a (possibly very large, up to thousands) group of jobs • Submission time reduction: single call to the WMProxy server, single authentication and authorization process, sharing of files between jobs • Availability of both a single job ID to manage the group as a whole and an ID for each single job in the group • A parametric-job sketch follows below
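
As an illustration of a compound job, a parametric JDL (file name params.jdl; contents and values chosen only as an example) and its one-shot submission might look like this; details may vary between WMS versions.

    [
      JobType        = "Parametric";
      Executable     = "/bin/echo";
      Arguments      = "processing chunk _PARAM_";   // _PARAM_ becomes 0,1,...,9
      Parameters     = 10;
      ParameterStart = 0;
      ParameterStep  = 1;
      StdOutput      = "out_PARAM_.txt";
      OutputSandbox  = {"out_PARAM_.txt"};
    ]

    # One call to WMProxy submits the whole group and returns a single
    # ID for the group plus an ID for each generated sub-job
    glite-wms-job-submit -a -o param_ids.txt params.jdl
    glite-wms-job-status -i param_ids.txt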

  36. Logging & Bookkeeping (LB) • Tracks jobs in terms of events gathered from various gLite components • Processes them to give a higher-level view of the job states • Provides interfaces for querying L&B and for registering for notifications • Often deployed on the same machine as the WMS, but can be remote • See the sketch below
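
The job-state history recorded by L&B can be browsed from the UI; the job-identifier file below is the placeholder used in the earlier sketches and the verbosity level is just an example.

    # Show the sequence of L&B events (transfer, match, run, done, ...) for a job;
    # -v sets the verbosity (0-3), -i reads the job ID from a file
    glite-wms-job-logging-info -v 2 -i jobid.txt

    # The current high-level job state derived from those events
    glite-wms-job-status -i jobid.txt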

  37. Job submission example • Command: glite-wms-job-submit myjob.jdl • myjob.jdl:
    Executable = "gridTest";
    StdError = "stderr.log";
    StdOutput = "stdout.log";
    InputSandbox = {"/home/joda/test/gridTest"};
    OutputSandbox = {"stderr.log", "stdout.log"};
    InputData = "lfn:testbed0-00019";
    DataAccessProtocol = "gridftp";
    Requirements = other.Architecture=="INTEL" && other.OpSys=="LINUX";
    Rank = "other.GlueHostBenchmarkSF00";
  • [Diagram: after voms-proxy-init, the User Interface sends the JDL and the input sandbox to the Resource Broker / Job Submission Service; the broker consults the Information Service, the Replica Catalog and the Authorization Service, logs the job submit event in Logging & Bookkeeping, and dispatches the job to a Computing Element; the running job accesses data on a Storage Element via GSI data access/transfer, and the output sandbox and job status flow back to the user]

  38. Further information • NEW: 2nd Iberian Grid Infrastructure Conference, 12-14 May 2008, Porto (Portugal), joint with the CYCLOPS Project Conference: www.ibergrid.eu/2008 • EGEE’08 Conference, 22-26 September 2008, Istanbul (Turkey): www.eu-egee.org/egee08 • EGEE digital library: egee.lib.ed.ac.uk (needs a certificate, GILDA or national CA, in the browser) • EGEE: www.eu-egee.org • gLite: www.glite.org • GILDA: https://gilda.ct.infn.it/ • LCG: lcg.web.cern.ch/LCG • Open Grid Forum: www.gridforum.org • Globus Alliance: www.globus.org • VDT: www.cs.wisc.edu/vdt/
