
(10 Minute) European Update: EGEE-JRA4 and UK

(10 Minute) European Update: EGEE-JRA4 and UK. NM-WG, GGF15, Boston, 4th October 2005. M J Leese, CCLRC Daresbury Laboratory, m.j.leese@dl.ac.uk. Contents: Update on EGEE-JRA4; UK update; Provides good contrast with perfSONAR; More interface than infrastructure; GOCs/NOCs; Grid middleware


Presentation Transcript


  1. (10 Minute) European Update: EGEE-JRA4 and UK NM-WG, GGF15, Boston, 4th October 2005 M J Leese CCLRC Daresbury Laboratory m.j.leese@dl.ac.uk

  2. Contents
     • Update on EGEE-JRA4
     • UK update
     • Provides good contrast with perfSONAR
     • More interface than infrastructure
     • GOCs/NOCs
     • Grid middleware
     • Network intensive/dependent end users
     Mark Leese - Daresbury Laboratory

  3. EGEE-JRA4
     • EGEE = Europe’s latest Grid project, successor to EDG
     • Joint Research Activity 4 (JRA4) = group responsible for “Development of Network Services”, including Network Performance Monitoring (NPM)
     • Not the same as GN2-JRA1 et al.; these are GÉANT2 projects
     • There are various monitoring tools and frameworks available:
        • We (JRA4) are not building another one! The work is about standardising access to NPM data across multiple domains, and about using that data.
     • NPM activity includes:
        • Mediator (standardise access to NPM data)
        • Diagnostic Tool (human use of the data)
        • Publisher (machine (middleware) use of the data)

  4. NPM Status
     • Mediator deliverable DJRA4.2 was produced in PM9 (Dec ’04):
        • “Specification of Interfaces for Network Performance Monitoring” document
        • First software prototype
           • Proves we can harness (multi-domain) backbone and end-site tools together
           • Low level framework only
           • Could be throw-away: designed as a learning exercise to inform architecture and interfaces
        • For more info (and prototype design doc): https://edms.cern.ch/document/533215/
     • 2nd Mediator prototype (MJRA4.3) produced in PM12 (March ’05):
        • Adds certificate-based security
        • Strong focus on deployment of end-to-end monitoring infrastructure (i.e. WP7)
        • For more info: https://edms.cern.ch/document/575484/
     • Diagnostic Tool (deliverable MJRA4.6) delivered in PM18 (Sept ’05):
        • More later
     • Publisher: discussions ongoing, but work doesn’t start in earnest until October

  5. NPM Architecture (1)
     [Diagram labels: GOC/NOC Diagnostic Client; Some Client; NM-WG interfaces; End Site (Home grown); End Site (EDG WP7); Backbone (GN2); Backbone (Perfmonit); Backbone (piPEs)]

  6. NPM Architecture (2)
     [Diagram labels: GOC/NOC Diagnostic Client; Some Client; JRA4 NPM Mediator; NM-WG interfaces; End Site (Home grown); End Site (EDG WP7); Backbone (GN2); Backbone (perfmonit); Backbone (piPEs)]

  7. NPM Architecture (3)
     [Diagram labels: Client Application; Web Service; NPM Mediator (Discoverer, Aggregator, Response Cache); Web Service; Web Service; Network Monitoring Infrastructure; Network Monitoring Point]
     • Human and machine users interact via a client application, “speaking” NM-WG
     • Discoverer locates MP(s) or infrastructures that can answer the client’s query… currently a static list
     • Aggregator:
        • obtains query results from MP(s)
        • aggregates results (if necessary)
     • To improve performance and reduce loading, results of recent requests will be cached
     • Discovery, aggregation and caching are all big areas with wide application… but we need time for these, so maybe EGEE-II
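The Discoverer, Aggregator and Response Cache roles described on this slide can be illustrated with a minimal Python sketch. This is not the JRA4 implementation: the class name, the query tuple, the idea of modelling an MP as a plain callable, and the time-to-live value are all assumptions made for illustration, and no real NM-WG messages are exchanged.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types (not from the slides): a query is
# (metric, source, destination); an MP is a callable returning measurements.
Query = Tuple[str, str, str]
MonitoringPoint = Callable[[Query], List[float]]


class Mediator:
    """Toy mediator: static discovery, simple aggregation, time-based cache."""

    def __init__(self, static_mps: Dict[str, List[MonitoringPoint]],
                 ttl_seconds: float = 60.0) -> None:
        # Discoverer data: a static list of MPs, keyed here by metric name.
        self.static_mps = static_mps
        self.ttl_seconds = ttl_seconds  # assumed cache lifetime
        self._cache: Dict[Query, Tuple[float, List[float]]] = {}

    def discover(self, query: Query) -> List[MonitoringPoint]:
        """Return the MPs that can answer this query (static lookup)."""
        return self.static_mps.get(query[0], [])

    def aggregate(self, query: Query,
                  mps: List[MonitoringPoint]) -> List[float]:
        """Query each MP in turn and concatenate the results."""
        results: List[float] = []
        for mp in mps:
            results.extend(mp(query))
        return results

    def handle(self, query: Query, now: Optional[float] = None) -> List[float]:
        """Answer a query, reusing a recent cached response if one exists."""
        now = time.time() if now is None else now
        cached = self._cache.get(query)
        if cached is not None and now - cached[0] < self.ttl_seconds:
            return cached[1]
        results = self.aggregate(query, self.discover(query))
        self._cache[query] = (now, results)
        return results
```

A repeated query inside the time-to-live window is served from the cache without contacting any MP, which is the load reduction the slide refers to.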

  8. NPM Architecture (4): “Publisher” for Grid Middleware
     [Diagram labels: Grid Information System; JRA4 NPM Publisher; NM-WG interfaces; End Site (Home grown); End Site (EDG WP7); 1..n]
     • GIS holds data in summarised form suitable for middleware (e.g. network cost function)
     • Publisher has two components:
        • Registry: holds information about MPs to contact regularly for the latest data
        • Data Manager: gathers data from MPs and publishes it to the GIS in the correct format
     • Publisher is designed to give middleware efficient access to network performance data
     • Important: we’re mostly networking people, but this is the Global Grid Forum
     • Very relevant to NMA-RG
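As a rough illustration of the Registry and Data Manager split, the sketch below polls registered MPs and writes a summarised value into a stand-in GIS. Everything concrete here is an assumption: the GIS is a plain dictionary, an MP is a callable returning bandwidth samples, and the cost function (inverse of mean bandwidth) is hypothetical and only stands in for the "network cost function" mentioned on the slide.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical types (not from the slides).
Path = Tuple[str, str]                 # (source site, destination site)
Fetcher = Callable[[], List[float]]    # returns bandwidth samples in Mbit/s


def network_cost(samples_mbps: List[float]) -> float:
    """Assumed cost function: higher mean bandwidth gives a lower cost."""
    return 1.0 / mean(samples_mbps)


class Registry:
    """Holds the MPs that the Publisher should contact regularly."""

    def __init__(self) -> None:
        self._mps: Dict[Path, Fetcher] = {}

    def register(self, path: Path, fetch: Fetcher) -> None:
        self._mps[path] = fetch

    def entries(self) -> List[Tuple[Path, Fetcher]]:
        return list(self._mps.items())


class DataManager:
    """Gathers data from registered MPs and publishes summaries to the GIS."""

    def __init__(self, registry: Registry, gis: Dict[Path, float]) -> None:
        self.registry = registry
        self.gis = gis  # stand-in for a Grid Information System

    def publish_once(self) -> int:
        """Poll every registered MP once; return how many paths were published."""
        published = 0
        for path, fetch in self.registry.entries():
            samples = fetch()
            if samples:  # skip MPs that returned no data
                self.gis[path] = network_cost(samples)
                published += 1
        return published
```

In a real deployment `publish_once` would run on a schedule, so that middleware reads an already summarised value from the GIS instead of querying MPs directly.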

  9. Diagnostic Tool
     • Like perfSONAR, we want to make use of the collected data available via a unified interface. So we are creating a prototype Diagnostic Tool aimed at helping NOCs and GOCs detect and diagnose network problems
     • Initial requirements from:
        • joint EGEE JRA4-SA2 user requirements doc
        • UK GOSC and NREN, German NOC, UK projects
        • experience of group members
     • Requirements net not cast wider, as many groups were unsure of:
        • what was available
        • what they wanted/what metrics Grid applications are dependent on
        • what visualisations are possible/the most useful
     • So we need a prototype to solicit comments on: the blank piece of paper was staying blank
     • Of course, what we can achieve is dependent on available monitoring infrastructures (WP7, perfmonit etc.)
        • Do we collect all metrics of interest? e.g. traceroute tests have only just been added
        • Lack of on-demand tests could be a limiter, although DFN (German NREN) say tests every 5-15 mins are sufficient
        • However, the prototype can be seen as a proof of concept, with these other things coming later
     • The kinds of things to be provided are:
        • the usuals, e.g. historic plots of available bandwidth
     • Can’t/won’t provide:
        • real-time information on the load on a connection (this is about diagnosing faults, not 24/7 monitoring)
        • Network topology… although...

  10. NSAP
     • GÉANT2-SA3 group plan to deploy Network Service Access Points (NSAPs) in:
        • each domain of DANTE’s extended-QoS network
        • GÉANT2 and NRENs + QoS compliant regional, metropolitan and campus networks
     • NSAPs provide access to network services such as BAR
     • Will also provide a network topology database, via NIS (Network Information Service), for which GN2-SA3 will produce a reference implementation

  11. UK GridMon
     • “...design and deploy an infrastructure for network performance monitoring within the UK e-Science community” (June 2002)
     • MPs (Monitoring Points) at each UK e-Science Centre
     • Full mesh of tests
     • Human access (web interface) to monitor performance and find faults
     • Plans to add an NM-WG interface for requesting and publishing performance data, aimed at Grid middleware and applications rather than network operators
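A full mesh means every MP tests against every other MP, so the number of tests grows quadratically with the number of sites. The short Python sketch below enumerates the pairs; the site names are placeholders, and treating each direction as a separate test is an assumption rather than something the slide states.

```python
from itertools import permutations
from typing import List, Tuple


def full_mesh(sites: List[str]) -> List[Tuple[str, str]]:
    """Return every ordered (source, destination) pair of distinct sites.

    Assumes tests are directional, so n sites give n * (n - 1) tests.
    """
    return list(permutations(sites, 2))


# Placeholder site names, not real GridMon monitoring points.
example_pairs = full_mesh(["site-a", "site-b", "site-c"])
```

With 10 sites this yields 90 directed tests, which is one reason the per-site MP needs to be simple to deploy and maintain.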

  12. Current UK Work (1)
     • Well received and generated interest (e.g. UK HEP/PP community), but…
     • Version 1 infrastructure proved to be unsustainable
        • most institutions were helpful, but…
        • varying spec of machines, flavours of Linux, security rules etc.
     • V1 MP:
        • Ran tests
        • Stored data locally
        • Served data to human users using a web server running on the MP
        • Would have provided a WS interface using Tomcat running locally
        • Generated interest and was a useful learning exercise
     • V2 MP will:
        • Run tests
        • Write data back to a central DB at DL and one other
     • Revised web interface and WS interface will be provided by machines co-located with the DBs
     • MP is thus much simpler, and the brains of the operation are centralised at two, more accessible, sites
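The V2 design, where a thin MP runs a test and writes the result back to two central databases, might look roughly like the Python sketch below. It is illustrative only: the slides do not name the database engine or schema, so in-memory SQLite databases stand in for the two central DBs, and the table layout, function names and metric name are all hypothetical.

```python
import sqlite3
import time
from typing import Iterable, Optional

# Hypothetical schema; the real GridMon schema is not given in the slides.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS results "
    "(mp TEXT, target TEXT, metric TEXT, value REAL, ts REAL)"
)


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a stand-in for one central database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def report(mp: str, target: str, metric: str, value: float,
           databases: Iterable[sqlite3.Connection],
           ts: Optional[float] = None) -> int:
    """Write one test result to every central DB; return the number written."""
    ts = time.time() if ts is None else ts
    written = 0
    for conn in databases:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO results VALUES (?, ?, ?, ?, ?)",
                (mp, target, metric, value, ts),
            )
        written += 1
    return written
```

Because the MP only runs tests and reports them, the web and WS interfaces can live next to the databases, which matches the simplification described on the slide.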

  13. Current UK Work (2)
     • Revised Web Interface:
        • status map and graphs as before, human version of request interface
     • Useful contrast with JRA4 DT:
        • GridMon = UK only, but DB is co-located and accessed more natively (Perl DBI over TCP)
        • JRA4 DT = can access any NM-WG compliant infrastructure, but the WS interface is not exactly efficient for graph plotting
     • Happy to receive comments/suggestions on MPs and human interfaces
     • Lots of different approaches to deployment and dissemination are being used throughout the world, and they are not necessarily reinventing the wheel. Hopefully we’ll eventually see what’s best for each scenario (e.g. end users vs. network operators)
