Network in EGEE: Building end-to-end network services for the Grid

Mathieu Goutelle – CNRS UREC, France. EGEE-II SA2 “Networking support”, mathieu.goutelle@urec.cnrs.fr.

Presentation Transcript


  1. Network in EGEE: Building end-to-end network services for the Grid
      Mathieu Goutelle – CNRS UREC, France
      EGEE-II SA2 “Networking support”
      mathieu.goutelle@urec.cnrs.fr

  2. Outline
      • Short presentation of EGEE
      • The network in EGEE:
        • Network services?
        • EGEE focus on end-to-end services in a multi-domain context.
      • Network services:
        • Resource reservation,
        • Service Level Agreement.
      • Operational services:
        • Monitoring,
        • EGEE Network Operational Centre.
      • Summary & conclusion
      GridNets 2006 – 2006-10-01 – San Jose, CA, USA

  3. EGEE in a nutshell…
      • EGEE:
        • 1 April 2004 – 31 March 2006
        • 71 partners in 27 countries, federated in regional Grids
      • EGEE-II:
        • 1 April 2006 – 31 March 2008
        • 91 partners in 32 countries
        • 13 Federations
      • Objectives:
        • Large-scale, production-quality infrastructure for e-Science
        • Attracting new resources and users from industry as well as science
        • Improving and maintaining “gLite” Grid middleware

  4. EGEE in a nutshell…
      • More than 20 applications from 7 domains:
        • Astrophysics:
          • MAGIC, Planck
        • Computational Chemistry
        • Earth Sciences:
          • Earth Observation, Solid Earth Physics, Hydrology, Climate
        • Financial Simulation:
          • E-GRID
        • Fusion
        • Geophysics:
          • EGEODE
        • High Energy Physics:
          • 4 LHC experiments (ALICE, ATLAS, CMS, LHCb)
          • BaBar, CDF, DØ, ZEUS
        • Life Sciences:
          • Bioinformatics (Drug Discovery, GPS@, Xmipp_MLrefine, etc.)
          • Medical imaging (GATE, CDSS, gPTM3D, SiMRI 3D, etc.)
        • Multimedia
        • Material Sciences
        • …

  5. EGEE Infrastructure
      Scale (June 2006):
      • ~200 sites in 40 countries
      • ~25,000 CPUs
      • > 10 PB storage
      • > 35,000 jobs per day
      • > 100 Virtual Organizations
      (Map legend: countries participating in EGEE)

  6. Network infrastructure
      • Connects 32 NRENs
      • Over 3M users

  7. Network infrastructure (cont.)

  8. End-to-end network services?
      • What type of services?
        • Network services available to the EGEE sites:
          • Premium IP and similar IP QoS services (e.g. QBSS),
          • “lightpath” or network resource reservation,
          • IPv6, multicast…
        • Operational services available to the EGEE sites:
          • Monitoring of the network (local & backbone),
          • Operational data (incidents, maintenance).
      • How to ensure service continuity along the path?
        • In the last mile?
        • In a multi-domain context?
      • What about service availability, interface standardization, inter-domain agreements, etc.?

  9. EGEE focus
      • Network services:
        • Network resource reservation:
          • Bandwidth Allocation and Reservation (BAR),
          • Dedicated talk on that subject (see session 1, “End to End Bandwidth Allocation and Reservation for Grid applications”).
        • Service Level Agreements (SLAs):
          • End-to-end SLAs?
      • Operational services:
        • Monitoring:
          • Network Performance Monitoring (NPM),
          • Dedicated talk on that subject (see session 2, “Federated Network Performance Monitoring for the Grid”).
        • Coordination of operational actions:
          • Concept of the EGEE Network Operational Centre (ENOC).

  10. Network resource reservation
      • Based on the framework currently being built by the GÉANT2 project, which:
        • Hides the multi-domain and multi-technology issues;
        • Provides at the Grid level:
          • A seamless interface for service requests at the “customer” layer;
          • A high-level view of the network, where requests specify characteristics rather than a particular service;
          • Reduced configuration lead-time;
          • A description of the service level.
      • Issues remain:
        • A component (BAR, see dedicated talk) gives access to these interfaces at the middleware layer, but the application layer is not yet ready;
        • Need for sub-management, at the Grid level, of the macroscopic reserved resource;
        • What about domains outside the GÉANT2 cloud?
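
The central idea of this slide is that the Grid asks for characteristics and leaves the choice of service or technology to the network. A minimal Python sketch of what such a request could carry, assuming a point-to-point bandwidth reservation over a time window; the class, its field names and the site identifiers are hypothetical and do not describe the actual GÉANT2 or BAR interface:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ReservationRequest:
    """Hypothetical Grid-level reservation request.

    It states what is needed (endpoints, bandwidth, time window) and
    deliberately carries no field naming a technology or service: that
    choice is left to the network side.
    """

    source_site: str
    destination_site: str
    bandwidth_mbps: int
    start: datetime
    end: datetime

    def __post_init__(self) -> None:
        # Reject requests that no network could sensibly honour.
        if self.bandwidth_mbps <= 0:
            raise ValueError("bandwidth must be positive")
        if self.end <= self.start:
            raise ValueError("end must be after start")
        if self.source_site == self.destination_site:
            raise ValueError("endpoints must differ")

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


request = ReservationRequest(
    source_site="SITE-A",
    destination_site="SITE-B",
    bandwidth_mbps=1000,
    start=datetime(2006, 10, 1, 9, 0),
    end=datetime(2006, 10, 1, 17, 0),
)
print(request.duration)  # 8:00:00
```

Whether the network then fulfils such a request with a lightpath, a Premium IP class or something else is invisible at this level, which is the property the slide attributes to the GÉANT2 framework.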

  11. Quick look at the BAR architecture
      • Clear demarcation between the Grid and the network:
        • The network is hidden from the Grid (technology, multi-domain issues…);
        • The Grid is hidden from the network (the network only knows one “EGEE” user);
      • Allows a two-stage process (reservation & activation) suitable in a Grid context.
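
A two-stage process presumably suits the Grid because a scheduler can book capacity when a job is planned and activate it only when the job actually runs. The following Python sketch models that lifecycle as a small state machine; the state names, the class and the allowed transitions are assumptions for illustration, not BAR's actual interface:

```python
from enum import Enum


class State(Enum):
    RESERVED = "reserved"  # stage 1 done: capacity booked, nothing configured yet
    ACTIVE = "active"      # stage 2 done: path configured and usable
    RELEASED = "released"  # resources handed back to the network


class Reservation:
    """Hypothetical handle the Grid keeps for one network reservation.

    The Grid sees only an opaque identifier and a state; which technology
    or domains the network uses behind it stays hidden.
    """

    _ALLOWED = {
        State.RESERVED: {State.ACTIVE, State.RELEASED},
        State.ACTIVE: {State.RELEASED},
        State.RELEASED: set(),
    }

    def __init__(self, reservation_id: str) -> None:
        self.reservation_id = reservation_id
        self.state = State.RESERVED  # created only after a successful reservation

    def _move(self, target: State) -> None:
        if target not in self._ALLOWED[self.state]:
            raise RuntimeError(f"cannot go from {self.state.value} to {target.value}")
        self.state = target

    def activate(self) -> None:
        # e.g. triggered when the scheduled job actually starts running
        self._move(State.ACTIVE)

    def release(self) -> None:
        self._move(State.RELEASED)


res = Reservation("r-0001")
res.activate()
res.release()
print(res.state)  # State.RELEASED
```

Rejecting illegal transitions (for example activating a released reservation) keeps the Grid side from assuming resources that the network has already reclaimed.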

  12. SLAs
      • “SLAs”?
        • Description of the characteristics of the service provided (e.g. after a successful resource reservation request);
        • Provided by each domain crossed by the data path;
        • Either filled in manually by a human, or automatically if the request is handled entirely by software.
      • Definition of templates in cooperation with GÉANT2:
        • Based on previous work inside EGEE and on GÉANT2's answers to some open issues (procedures, demarcation point…).
      • SLA template:
        • Administrative part (contact, duration, troubleshooting procedures);
        • SLS (Service Level Specification) part.
      • The end-to-end SLA is formed from the individual SLAs provided by all domains along the path.
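
The two-part template can be pictured as a simple record per domain. In the Python sketch below, the administrative fields follow the slide (contact, duration, troubleshooting procedure), while the SLS parameters (bandwidth, one-way delay, availability) are assumed examples, since the slide does not list them. All names and values are hypothetical:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AdministrativePart:
    contact: str
    valid_from: date
    valid_until: date
    troubleshooting_procedure: str


@dataclass
class ServiceLevelSpecification:
    # Illustrative SLS parameters; the slide does not list the template's fields.
    bandwidth_mbps: float
    one_way_delay_ms: float
    availability: float  # fraction of the validity period in service, 0..1

    def __post_init__(self) -> None:
        if not 0.0 <= self.availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")


@dataclass
class DomainSLA:
    domain: str
    admin: AdministrativePart
    sls: ServiceLevelSpecification
    filled_automatically: bool = False  # True when produced by software, not by hand

    def is_valid_on(self, day: date) -> bool:
        return self.admin.valid_from <= day <= self.admin.valid_until


sla = DomainSLA(
    domain="NREN-A",
    admin=AdministrativePart(
        contact="noc@nren-a.example",
        valid_from=date(2006, 10, 1),
        valid_until=date(2006, 10, 31),
        troubleshooting_procedure="Open a ticket with the NREN-A NOC",
    ),
    sls=ServiceLevelSpecification(bandwidth_mbps=1000, one_way_delay_ms=4.0, availability=0.9995),
)
print(sla.is_valid_on(date(2006, 10, 15)))  # True
```

Separating the administrative part from the SLS means the measurable commitments can be processed by software while the contact and procedure fields remain for humans.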

  13. SLAs (cont.)
      (Figure: border-to-border connectivity vs. end-to-end connectivity)
      • EGEE end-to-end SLA template:
        • Concatenation of the individual SLAs of each participating domain;
        • SLA between the borders of the NREN cloud (border-to-border SLA);
      • Difficulty in accommodating and taking into account the “last mile”:
        • If the “last-mile” network is not participating (no resource reservation system, no SLA, etc.);
        • Try to address this with static information on these networks to provide service characteristics to the user/application.
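
One way to read “concatenation of the individual SLAs” is as a composition of per-domain figures along the path. The Python sketch below assumes common path-composition rules that the slide does not itself state: the bottleneck bandwidth is the minimum, one-way delays add up, and availabilities multiply if domains fail independently. A last-mile segment described only by static information is flagged so that the end-to-end figures are not presented as guaranteed. All values are made up:

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Segment:
    domain: str
    bandwidth_mbps: float
    one_way_delay_ms: float
    availability: float    # 0..1
    static: bool = False   # True if the figures are static information, not an SLA


def concatenate(segments: list[Segment]) -> Segment:
    """Combine per-domain figures along a path into end-to-end figures.

    Assumed rules: the narrowest segment bounds the bandwidth, delays add up,
    and availabilities multiply (i.e. domains are assumed to fail independently).
    """
    if not segments:
        raise ValueError("a path needs at least one segment")
    return Segment(
        domain=" + ".join(s.domain for s in segments),
        bandwidth_mbps=min(s.bandwidth_mbps for s in segments),
        one_way_delay_ms=sum(s.one_way_delay_ms for s in segments),
        availability=prod(s.availability for s in segments),
        # If any segment is only described statically, the whole path is not guaranteed.
        static=any(s.static for s in segments),
    )


path = [
    Segment("campus-A", 1000, 0.5, 0.999, static=True),  # last mile, no SLA
    Segment("NREN-A", 2500, 4.0, 0.9995),
    Segment("backbone", 10000, 12.0, 0.9999),
    Segment("NREN-B", 1000, 5.0, 0.9995),
]
e2e = concatenate(path)
print(e2e.bandwidth_mbps, e2e.one_way_delay_ms, e2e.static)  # 1000 21.5 True
```

Dropping the campus segment from the list would give the border-to-border view from the figure. Keeping it yields the end-to-end view, flagged as partly static.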

  14. SLA institution
      • All domains involved in provisioning network services to EGEE, as part of the existing network infrastructure hierarchy, have to be categorized as one of:
        • Compliant with the Premium IP service,
        • Supportive of the Premium IP service,
        • Indifferent to the Premium IP service.
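
Once every domain carries one of the three labels, a path can be summarised by grouping its domains. The Python sketch below does that and, as an explicit assumption not stated on the slide, treats “indifferent” domains as the points where an end-to-end Premium IP service cannot be promised. The enum, functions and domain names are hypothetical:

```python
from enum import Enum


class PremiumIPStance(Enum):
    COMPLIANT = "compliant"
    SUPPORTIVE = "supportive"
    INDIFFERENT = "indifferent"


def path_report(path: dict[str, PremiumIPStance]) -> dict[PremiumIPStance, list[str]]:
    """Group the domains of one path (in path order) by their Premium IP category."""
    report: dict[PremiumIPStance, list[str]] = {stance: [] for stance in PremiumIPStance}
    for domain, stance in path.items():
        report[stance].append(domain)
    return report


def weak_links(path: dict[str, PremiumIPStance]) -> list[str]:
    # Assumption: an indifferent domain gives no assurance for Premium IP traffic,
    # so it is flagged as a point where the end-to-end service cannot be promised.
    return path_report(path)[PremiumIPStance.INDIFFERENT]


path = {
    "campus-A": PremiumIPStance.INDIFFERENT,
    "NREN-A": PremiumIPStance.COMPLIANT,
    "backbone": PremiumIPStance.COMPLIANT,
    "NREN-B": PremiumIPStance.SUPPORTIVE,
}
print(weak_links(path))  # ['campus-A']
```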

  15. EGEE focus
      • Network services:
        • Network resource reservation:
          • Bandwidth Allocation and Reservation (BAR),
          • Dedicated talk on that subject (see session 1, “End to End Bandwidth Allocation and Reservation for Grid applications”).
        • Service Level Agreements (SLAs):
          • End-to-end SLAs?
      • Operational services:
        • Monitoring:
          • Network Performance Monitoring (NPM),
          • Dedicated talk on that subject (see session 2, “Federated Network Performance Monitoring for the Grid”).
        • Operational interface with the network:
          • Concept of the EGEE Network Operational Centre (ENOC).

  16. Monitoring
      • Not Yet Another Monitoring Framework!
      • Acts as a mediator between the various monitoring frameworks and the various clients (diagnostic tools, middleware, etc.);
      • Network Performance Monitoring (NPM) gives access to data collected by existing monitoring frameworks (site, backbone);
      • Use of the NMWG interface to access those frameworks and republish their data;
      • Some middleware components have special requirements for faster access to data.
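
The mediator role can be illustrated with a minimal Python sketch. It assumes each wrapped framework answers a (metric, source, destination) query. The interface, the dictionary-backed stand-in and the metric names are hypothetical, and the sketch does not implement the NMWG schemas that the real NPM uses:

```python
from typing import Protocol


class MonitoringSource(Protocol):
    """What the mediator expects from any framework it wraps (hypothetical)."""

    name: str

    def query(self, metric: str, src: str, dst: str) -> list[float]: ...


class StaticSource:
    """Stand-in for a real site or backbone framework, backed by a dict."""

    def __init__(self, name: str, data: dict[tuple[str, str, str], list[float]]) -> None:
        self.name = name
        self.data = data

    def query(self, metric: str, src: str, dst: str) -> list[float]:
        return self.data[(metric, src, dst)]  # KeyError if this source lacks the data


class NetworkPerformanceMediator:
    """Forwards one client request to every wrapped framework and republishes
    the answers in a single shape, without storing measurements itself."""

    def __init__(self, sources: list[MonitoringSource]) -> None:
        self.sources = sources

    def query(self, metric: str, src: str, dst: str) -> dict[str, list[float]]:
        results: dict[str, list[float]] = {}
        for source in self.sources:
            try:
                values = source.query(metric, src, dst)
            except KeyError:
                continue  # this framework does not measure that metric on that path
            if values:
                results[source.name] = values
        return results


npm = NetworkPerformanceMediator([
    StaticSource("site-A", {("rtt_ms", "A", "B"): [12.1, 12.4]}),
    StaticSource("backbone", {("rtt_ms", "A", "B"): [10.0]}),
])
print(npm.query("rtt_ms", "A", "B"))  # {'site-A': [12.1, 12.4], 'backbone': [10.0]}
```

The mediator keeps no measurements of its own, which matches the “not yet another monitoring framework” stance. A cache in front of `query` would be one possible way to meet the faster-access needs of some middleware.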

  17. Operational Interface
      • The network infrastructure of EGEE is mainly provided by a set of NRENs via GÉANT2;
      • Need for an entity coordinating all the NOCs involved and the Grid Operations:
        • Concept of an end-to-end Coordination Unit (GÉANT2);
        • Providing end-to-end operational support.
      • A single point of contact as an operational interface between EGEE and GÉANT2/NRENs, dealing with:
        • Network problem troubleshooting,
        • Interactions with network providers and Grid sites,
        • Notifications from NRENs,
        • Network SLA installation and monitoring.
      • Two functional entities inside EGEE:
        • EGEE Network Operational Centre (ENOC);
        • A Network Trouble Ticket Manager – GGUS.

  18. ENOC
      (Figure: Users, GGUS, EGEE Network Support Units, ENOC, NRENs, GÉANT2)
      • From the EGEE point of view:
        • GGUS acts as the first-line support (interacts with the user);
        • Support units are the second-level support;
      • From the NRENs’ point of view:
        • EGEE (via the ENOC) is a single entity;
        • The ENOC is the only point of contact for the NRENs (submitter of the problem).

  19. ENOC (cont.)
      • Main challenges:
        • To create a network support structure inside EGEE;
        • To define the associated network operational procedures.
      • The ENOC is the user support for network failures:
        • End-to-end network problem troubleshooting;
        • Coordination of the actions of all the entities involved in a network incident;
        • Trying to build an overall view of the end-to-end service, gathering information from all the involved domains;
        • SLA management: installation and monitoring.
      • ENOC operational procedures were defined and validated during the first phase of EGEE;
      • EGEE-II will fully implement the ENOC.

  20. ENOC (cont.)
      • ENOC service:
        • Collect tickets from the NRENs that agree to provide them to the ENOC;
        • Forward to GGUS the ones that seem relevant (possible impact on the Grid infrastructure);
        • Receive tickets assigned to the ENOC by the GGUS first-level support;
        • Troubleshoot them with the help of monitoring tools;
        • Contact the identified faulty domains, or reassign the ticket to the associated site if there is no evidence of a backbone problem (e.g. a LAN issue).
      • Main issues:
        • Load on the ENOC team (amount of information, etc.);
        • Heterogeneity of the systems the ENOC has to deal with (languages, trouble ticket formats, monitoring, etc.).
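
The two decisions in this workflow can be sketched as small functions: whether an NREN ticket matters to the Grid, and what to do with a ticket GGUS has assigned to the ENOC. The Python sketch below is hypothetical. It assumes relevance is decided by matching the network elements named in a ticket against an inventory of elements on EGEE paths, which something like the Network Operational Database on the ENOC status slide could plausibly supply:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    ticket_id: str
    source: str                      # NREN or GGUS that raised the ticket
    affected_elements: set[str]      # links/routers named in the ticket
    backbone_evidence: bool = False  # set after checking the monitoring tools


def relevant_to_grid(ticket: Ticket, egee_elements: set[str]) -> bool:
    """Forward an NREN ticket to GGUS only if it touches an element used by EGEE sites."""
    return not ticket.affected_elements.isdisjoint(egee_elements)


def route_assigned_ticket(ticket: Ticket, faulty_domain: Optional[str], site: str) -> str:
    """Decide what the ENOC does with a ticket GGUS assigned to it."""
    if ticket.backbone_evidence and faulty_domain is not None:
        return f"contact {faulty_domain}"
    return f"reassign to {site}"  # no backbone evidence, e.g. a LAN issue at the site


egee_elements = {"link-7", "link-9"}  # hypothetical inventory of elements on EGEE paths
nren_ticket = Ticket("NREN-X-123", "NREN-X", {"link-7"})
print(relevant_to_grid(nren_ticket, egee_elements))  # True

ggus_ticket = Ticket("GGUS-456", "GGUS", {"site-A-lan"})
print(route_assigned_ticket(ggus_ticket, None, "SITE-A"))  # reassign to SITE-A
```

Automating the relevance filter is one way to address the load issue listed on the slide. The troubleshooting step that sets `backbone_evidence` still requires a human with monitoring data.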

  21. ENOC status
      • The ENOC team is ready!
        • 5 people (2 FTE), including one dedicated to it.
      • The ENOC receives operational information from GÉANT2 and 10 NRENs (more to come):
        • About 80% of all EGEE sites covered;
        • An average of 5 tickets handled per day;
        • 8 different languages.
      • Building tools to follow up or enhance the network support:
        • Network Operational Database (interconnection of administrative domains between the EGEE resource centres);
        • Trouble ticket (TT) parsing and filtering tool;
        • Dashboard presenting the overall status of the “EGEE network”.

  22. EGEE expectations
      • Towards a better solution to our “multi-domain” and “end-to-end” issues:
        • Seamless access to network monitoring data:
          • GÉANT2 will provide such access (perfSONAR), from multiple domains, aggregating data from multiple frameworks;
        • Network resource reservation:
          • Requests expressed not in terms of service but of characteristics;
          • The choice of the underlying technology to fulfil them is up to the network;
          • Answer to a request = SLA (depending on the current network status & load);
          • What about the last mile? The non-NREN domains?
        • Standardization of the operational interface:
          • Trouble ticket format (data schema and exchange format);
          • Access method.
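
A standardized trouble ticket format amounts to agreeing on a common schema and on one serialization. The Python sketch below maps one provider's ticket, with French field names to echo the multilingual inputs the ENOC handles, onto a common record and serializes it as JSON. The schema, the field mapping and the choice of JSON are all assumptions for illustration; the slide specifies none of them:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TroubleTicket:
    """Hypothetical common ticket schema; real fields would be agreed between EGEE and the NRENs."""

    ticket_id: str
    origin: str
    kind: str         # e.g. "incident" or "maintenance"
    start: str        # ISO 8601 timestamp, UTC
    end: str
    description: str


# Hypothetical per-provider field names, mapped onto the common schema.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "nren-fr": {
        "ticket_id": "id",
        "kind": "nature",
        "start": "debut",
        "end": "fin",
        "description": "resume",
    },
}


def normalize(origin: str, raw: dict[str, str]) -> TroubleTicket:
    """Translate one provider-specific ticket into the common schema."""
    mapping = FIELD_MAPS[origin]
    return TroubleTicket(origin=origin, **{ours: raw[theirs] for ours, theirs in mapping.items()})


def to_exchange_format(ticket: TroubleTicket) -> str:
    """Serialize to JSON, used here purely as an example exchange format."""
    return json.dumps(asdict(ticket), sort_keys=True)


raw = {
    "id": "FR-42",
    "nature": "maintenance",
    "debut": "2006-10-01T08:00:00Z",
    "fin": "2006-10-01T10:00:00Z",
    "resume": "Router upgrade",
}
print(to_exchange_format(normalize("nren-fr", raw)))
```

With a shared schema, each provider would need only one mapping (or none, if it adopted the schema natively), instead of the ENOC maintaining a separate parser per format and language.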

  23. Summary & conclusion
      • Focus on providing end-to-end services in a multi-domain context:
        • Hiding the network complexity from the Grid (users, middleware, Grid support);
        • Hiding the Grid complexity from the network (single point of contact, operational interface);
      • Many building blocks depend on the providers:
        • Resource reservation frameworks, SLA installation, backbone monitoring;
        • Fortunately, EGEE and GÉANT2 have built up a strong collaboration!
      • Many things remain pending:
        • Mainly on the operational side (homogenization of the network interface);
        • How to cope with domains outside the GÉANT2 cloud?
        • The two infrastructures need to collaborate on these aspects.

  24. Thank you for your attention!
