Status of the EGI O-E-12 Task: Coordination of Network Support for EGI



  1. Status of the EGI O-E-12 Task: Coordination of Network Support for EGI Mario Reale IGI / GARR mario.reale@garr.it

  2. Contents • O-E-12 definitions and goals • O-E-12 status • Wrap up of the migration (final phase of EGEE III) • Current task tools • Overview of networking support within individual NGIs • Summary of the EGEE III questionnaire for NGIs for Network Support • Next steps and challenges ahead

  3. O-E-12 Definition and Goals • O-E-12 is the coordination of the network support for EGI • Its goal is providing network support to EGI by • proposing useful synergies and promoting cooperation among EGI.eu, the national NGI efforts and the NRENs community • encouraging the definition and adoption of best practices • proposing common solutions and tools • liaising with the NRENs community and GEANT (DANTE) • Provisioned through the EGI-Inspire tasks TSA1.7 (Support Teams) and TSA1.4 (Grid Management Infrastructure) • Staffed with 0.5 FTE within the EGI-Inspire project, plus an additional contribution from IGI • Collaboration by NGIs and NRENs will be fundamental

  4. Summary of the original workplan • Perform an initial assessment of the adopted model for network support within each NGI • Follow up the developments of pS-Lite_TSS for on-demand troubleshooting and grid-specific tests on the network • Support its deployment on the EGI/NGIs infrastructure • Possibly exploit further monitoring tools • Define, jointly with the user community, a subset of the Grid sites belonging to the EGI global infrastructure to be periodically monitored • Excluding a priori a full mesh spanning all sites • Put in place a workflow for the exchange of information about network faults and scheduled downtimes • Organize the structure of a global PERT support for EGI

  5. Current Status of O-E-12: Summary of the EGEE to EGI transition phase • Transition from EGEE SA2 to EGI O-E-12 implied close collaboration and discussions, especially among GARR, EGEE ENOC in Lyon (CC IN2P3), CNRS UREC in Paris • We identified 2 main tools to keep among the ones provided by ENOC and SA2, plus an additional tool to keep following up for possible future adoption: • PerfSONAR-Lite_TSS • On-demand network monitoring and troubleshooting tool based on perfSONAR • The DownCollector • A central tool to check Grid services registered in the GOC DB on their specific TCP ports • The Grid Job based approach for network monitoring • A system not requiring any local deployment by sysadmins

  6. DownCollector • The DownCollector is a polling tool reporting on the reachability of the services registered in the GOC DB • Star-based architecture, central tool • All tests start from the same initial point • It checks that services are reachable on the corresponding TCP ports • Available at https://ccenoc.in2p3.fr/DownCollector/ • Migrated to https://perfsonarlitetss.dir.garr.it/DownCollector/ • It will be accessible through a new portal dedicated to the O-E-12 task, which will be available at the URL http://eginet.garr.it • This is NOT YET available. It will be set up in the coming days • High Availability currently not available • Might be implemented in future if operation proves HA useful • Originally developed by IN2P3 CC-Lyon within EGEE SA2 • In future, endorsed by GARR
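The kind of check described on this slide (one central host opening TCP connections to each registered service) can be sketched in a few lines of Python. This is only an illustration, not the DownCollector code: the `Service` type, the function names and the 3-second timeout are assumptions, and the real tool takes its service list from the GOC DB rather than from a hard-coded list.

```python
import socket
from typing import Dict, Iterable, NamedTuple


class Service(NamedTuple):
    """A registered service endpoint (hypothetical stand-in for a GOC DB entry)."""
    host: str
    port: int


def is_reachable(service: Service, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((service.host, service.port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def poll(services: Iterable[Service], timeout: float = 3.0) -> Dict[Service, bool]:
    """Star-based polling: every test starts from the host running this code."""
    return {service: is_reachable(service, timeout) for service in services}
```

Because all tests start from the same point, a failure only says that the service is unreachable from the central host; it does not locate the fault along the path.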

  7. perfSONAR-lite TroubleShooting Services [Diagram: Users, ENOC central server, Probe A at Site A, Probe B at Site B. Steps: 1 - Request, 2 - Request, 3 - e2e measurement, 4 - Result, 5 - Result] • Started in EGEE-III, entirely designed by SA2 • Developments led by DFN/Erlangen as an SA2 partner • Central server orchestrating on-demand e2e measurements between light probes hosted by Grid sites • EGEE-driven improvements of the standard perfSONAR framework • Authentication & Authorisation mapped from GOCDB’s roles

  8. PerfSONAR-Lite_TSS Focus on on-demand troubleshooting • Users: ENOC supervisor, ROC members, site administrator, through an authentication and authorization process • Launch test on demand from a Grid site under central server control: • Bandwidth measurements, DNS lookup, Traceroute, Port testing, Ping • Is easy to use for the Grid administrators • Can be used quickly by a site admin without the need to contact each time the remote site involved in the problem [Diagram: Grid site A and Grid site B, each with a local light perfSONAR probe; central ENOC monitoring server] Networking Support – Xavier Jeannin - EGEE-III First Review 23-24 June 2010
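As a rough illustration of how a central server could gate on-demand tests by role, here is a minimal Python sketch. The role names and the five test types come from the slide; everything else is an assumption: the `POLICY` table (in particular the restriction of bandwidth tests for site administrators), the `Request` type and the `authorize` function are hypothetical and do not describe the actual perfSONAR-Lite TSS authorization schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet


class Measurement(Enum):
    """The on-demand test types listed on the slide."""
    BANDWIDTH = "bandwidth"
    DNS_LOOKUP = "dns_lookup"
    TRACEROUTE = "traceroute"
    PORT_TEST = "port_test"
    PING = "ping"


ALL_TESTS: FrozenSet[Measurement] = frozenset(Measurement)
LIGHT_TESTS: FrozenSet[Measurement] = ALL_TESTS - {Measurement.BANDWIDTH}

# Hypothetical policy, for illustration only: the real mapping from
# GOCDB roles to permitted tests is not given in the slides.
POLICY: Dict[str, FrozenSet[Measurement]] = {
    "enoc_supervisor": ALL_TESTS,
    "roc_member": ALL_TESTS,
    "site_administrator": LIGHT_TESTS,
}


@dataclass(frozen=True)
class Request:
    role: str
    test: Measurement
    source_site: str
    target_site: str


def authorize(request: Request) -> bool:
    """Allow the request only if the role is known and permits the test."""
    return request.test in POLICY.get(request.role, frozenset())
```

Keeping the policy in one table on the central server means probes at the sites do not need to know about roles at all; they only accept requests from the server.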

  9. PerfSONAR-Lite_TSS • First version was released and installed on 6 sites • Installation guide and procedure: http://www.dfn.de/en/enhome/x-win/download-of-perfsonar-lite-tss/ • FAQ, tutorial, new features (users, sites, ROC management) • Software authorization schema was adapted to fit the hierarchical EGI/NGI model • Difficult to deploy the software during the transition phase toward EGI

  10. perfSONAR-lite TSS

  11. perfSONAR-lite TSS: outlook • Expected users: Sites, ROCs, ENOC... • Status: Tool basically ready, but missing a maturation phase • Suffered some staff movements and licensing issues • Not yet fully in production, but a distributed testbed is in place • First production release issued at the end of March • Future: • A wrap-up of the current status and an initial deployment strategy within EGI are required • O-E-12 will follow up and organize dedicated pre-production deployment campaigns in the coming weeks • Future developments to further improve security related to available bandwidth tests and simplify AA • May be followed and used outside EGI • DFN and CNRS declared their interest in following up the tool

  12. Grid Job based approach to monitoring • Within EGEE SA2 a development started to exploit an approach to Network Monitoring for the Grid based on Grid Jobs • “Monitor the Grid using the Grid” • The main advantage of this approach is that Grid site administrators don’t have to deploy anything • They only need to accept 2 jobs permanently running from a specific VO • This approach was conceived especially with the small and medium-size EGEE sites in mind, which have limited resources and manpower • EGEE SA2 produced a prototype deployed on a testbed of 8 sites in France and Italy • Main developers are Etienne Double / CNRS UREC and Alfredo Pagano / GARR • Structure, example, issues, options will be further described in another presentation by O-E-12

  13. Job-based Network monitoring for Grid [Diagram: Grid network monitoring jobs; monitoring server @ UREC CNRS with DB 1; front-end @ GARR with DB 2, answering www requests. Possible evolutions: monitoring servers @ ROC1 (Server A, Server B) with a ROC1 DB] • Frontend: Apache Tomcat, Ajax, Google Web Toolkit (GWT) • Backend: PostgreSQL • Implementation languages: Python, bash script

  14. Assessment of the current model for network support within the NGIs • EGEE SA2 contributed 3 questions to the Questionnaire for the NGIs (operations): • Do you expect to nominate a network representative who can be the contact point for the collaboration with the Network Support task at EGI level? • Could you briefly describe your current operational model for network related tickets and issues? • Have you contacted your NREN to participate in the Network Support task? (if yes, provide details) • As expected, the answers varied widely in content and in the amount of information provided

  15. First highlights from the Questionnaire • 32 organizations (31 NGIs + CERN) answered • As of today: • 13 provided the email address of a contact person/team for the Network Support task • 14 answered they will appoint someone (or will possibly do it) • 5 answered they will not, or they haven’t decided yet, or they did not answer • In 28 cases the NGI and the NREN are interconnected with already established workflows for network related issues (1 not applicable: CERN) • We will analyse the outcome in more detail and provide a summary document to SA1 and O-E-12/Network Support contacts

  16. Challenges ahead • Get the Network Support task fully supported by all NGIs to • Involve NGIs in a reasonable roadmap towards the achievement of the O-E-12/Network Support goals • The real task challenge is the Multi Domain/Cross domain e2e related network support • People should discuss, agree and act on common goals • We consider this the first major achievement for the O-E-12 task • What shall we focus on? • We proposed something: • Sharing of information on scheduled downtimes and observed faults • 3 tools to keep working on, exploiting them on a larger set of sites • Organize a general workflow for observed or perceived performance issues, organizing at the EGI level a unique entry point for PERT support, able to properly handle and route/escalate the issues • Define, together with the VRC/VOs, a subset of relevant sites for which to possibly set up periodic and systematic NM measurements

  17. The fundamental O-E-12 Trade Off • There is a general trade off to keep in mind: • Doing essentially nothing • “The network works… and anyhow, if I have a problem, I know whom to call.” • Doing too much, trying to provide too much information, which normally means eventually no useful information • Shooting iperf-like tests everywhere, across the full mesh of sites • Providing all tickets related to all NRENs in all possible languages to a unique unfortunate team, in charge of informing everyone that the Institute for Submarine Research of the University of Nowherecity, in the country of TheresEvenMe-Land, will possibly have an electric power cut next Thursday, after having been able to translate and understand the original ticket
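To make the cost of a full mesh concrete, the number of site pairs grows quadratically with the number of sites. The short Python sketch below just counts pairs; the function name and the figure of 300 sites used in the note after it are illustrative assumptions, not numbers from the slides.

```python
def full_mesh_pairs(n_sites: int, directed: bool = False) -> int:
    """Number of site pairs a full mesh of end-to-end tests must cover.

    Undirected: n * (n - 1) / 2 pairs.
    Directed (A to B and B to A measured separately): n * (n - 1) pairs.
    """
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    ordered_pairs = n_sites * (n_sites - 1)
    return ordered_pairs if directed else ordered_pairs // 2
```

For a hypothetical infrastructure of 300 sites, `full_mesh_pairs(300)` gives 44850 pairs, or 89700 if each direction is measured separately, which is why monitoring an agreed subset of sites is the more realistic option.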

  18. Challenges ahead / brainstorming • We would like to see useful tools agreed upon and adopted by the NGIs • We would like to provide useful information, only useful information, only when required, to essentially everyone in need of it • Can we envisage a general tool and the corresponding required level of standardization to be able to provide to “everyone” the binary (1/0) information about the network reachability of a specific Grid site? • Going beyond the “modelling specific workflows for specific Grid projects”? • In other words: would it be possible to provide Grid managers with information on possible network problems related to a specific site?
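One way to picture the binary (1/0) reachability information asked about here is as a reduction of per-service check results to a single flag per site. The Python sketch below is only a thought experiment under a stated assumption: a site counts as reachable if at least one of its registered services answered, and a site with no results counts as unreachable. The slides do not define the rule, and the function name and service labels are hypothetical.

```python
from typing import Mapping


def site_reachability(service_status: Mapping[str, bool]) -> int:
    """Collapse per-service reachability results into a single 1/0 flag.

    Assumed rule (not from the slides): the site is reachable (1) if any
    registered service answered; an empty result set yields 0.
    """
    return int(any(service_status.values()))
```

With this rule, `site_reachability({"ce": True, "se": False})` returns 1, since one answering service is enough to show that the network path to the site works. A stricter rule (all services must answer) would mix service faults with network faults.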

  19. What has been achieved so far • Migration plans successfully completed: • pS-Lite_TSS server, DownCollector, BugZilla already in place • Started liaising with GEANT3/DANTE to formalize the EGI-GEANT collaboration • Discussed at the GN3 MB on May 5, 2010 • Identified areas for collaboration: • Security • AAI • perfSONAR and interdomain tools • Got an initial set of contacts and a very sketchy, draft idea of what the various NGIs are internally doing w.r.t. network support • But this requires further work and peer-to-peer communication

  20. Still Missing / Next steps • Create a full-fledged portal for network support • Including contacts / wiki / documents / access to the tools • Maybe placing it under the domain netsup.egi.eu? • For the moment we will start with eginet.garr.it • Plan the further development and deployment strategies for • PerfSONAR-Lite_TSS • Grid Job based approach to Network Monitoring for Grids • Get new NGIs and new sites involved in them • Organize communication channels / fora between the NRENs and NGIs, aimed at defining an agreed strategy for the Multi Domain and the concrete tools / steps / workflows the NRENs will provide to EGI/NGIs: • NRENs & NGIs event? • Periodic video conferences involving NGIs and NRENs?

  21. Thank you. Questions ?
