This report summarizes the discussions and updates from the LHCOPN operational working group meeting. It includes topics such as the operational model, feedback, implementation status, tools, and next steps.
LHCOPN operational working group report
Guillaume Cessieux (FR-CCIN2P3 / EGEE networking support), on behalf of the Ops WG
LHCOPN meeting, 2009-01-15, Berlin
Background
• LHCOPN meeting in Copenhagen, 2008-10-16/17
  • Test procedures for backup paths agreed
  • Feedback requested
  • Roadmap for implementation needed
• One LHCOPN Ops meeting in mid-December
  • http://indico.cern.ch/conferenceDisplay.py?confId=44050
  • Very productive (20 actions, plus 15 actions for GGUS)
GCX - LHCOPN meeting - 2009-01-15
Agenda
• The operational model itself
  • Main feedback reviewed
  • Main changes to the model
  • Areas of weakness
• Implementation status and updates
  • Tools
  • Roadmap
  • Pending items and next steps
1 - The operational model
https://twiki.cern.ch/twiki/bin/view/LHCOPN/OperationalModel
Ops model in one slide
[Diagram: Site A and Site B, NRENs A, B and C, users, the LHCOPN TTS (GGUS) and all sites, connected by numbered steps 1 to 4]
• Federated, with key responsibilities on the T1s
• Built on top of what currently exists
• Information centralised (twiki and GGUS)
Overview of site feedback
Site feedback
• Fear of additional load for small events
  • Sensible thresholds: an event lasting more than 1 hour, or occurring more than 5 times in an hour
• Lack of precision in the Ops model on the twiki
  • Initially a high-level view
  • Tell us what is still not detailed enough
• Many details
  • Open tickets and then investigate, or the other way round
  • Flexible model
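The threshold rule above can be read as: open a ticket only if a single event lasts more than one hour, or if events recur more than five times within one hour. The following Python sketch encodes that reading. The `Event` type, the `exceeds_thresholds` name and the sliding one-hour window are illustrative assumptions, not part of any LHCOPN tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    # Hypothetical record of one incident (e.g. a link flap) on a link
    start: datetime
    end: datetime


def exceeds_thresholds(events, max_duration=timedelta(hours=1), max_per_hour=5):
    """Return True if a ticket would be warranted under the assumed reading
    of the thresholds: any event longer than max_duration, or more than
    max_per_hour events starting within any one-hour window."""
    # Rule 1: a single long-lasting event
    if any(e.end - e.start > max_duration for e in events):
        return True
    # Rule 2: too many events inside a sliding one-hour window
    starts = sorted(e.start for e in events)
    for i, s in enumerate(starts):
        in_window = sum(1 for t in starts[i:] if t < s + timedelta(hours=1))
        if in_window > max_per_hour:
            return True
    return False
```

Both comparisons are strict, matching the "> 1 hour" and "> 5 times" wording, so an outage of exactly one hour or exactly five flaps in an hour stays below the threshold.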
Network provider feedback
• Where is the E2ECU?
• Hard to understand the twiki
  • Balance between complexity and accuracy
• Low robustness
  • A federated model cannot work reliably in steady-state operation
  • Inappropriate way to operate such a network
  • Hot potatoes, cost, distributed ownership of problems
  • "You are not prepared for the worst"
• Responsibilities will be highlighted based on the cost model
Grid feedback
• Communication channel with the Grid to be studied
  • Different user communities to be targeted
  • Grid data contacts to be nominated
• Performance issues are very important
Changes to the model (1/2)
• Much vagueness removed
  • Terms such as "reasonable", "major", "suitable"...
• Notification: no longer sent to all sites, only to the affected ones
• Sample common use cases provided
  • https://twiki.cern.ch/twiki/bin/view/LHCOPN/OpsModelUseCases
• Quality assessment by CH-CERN
  • When?
  • Infrastructure and operation
  • Suitable data must be available
Changes to the model (2/2)
• Responsibilities highlighted
  • Outages on links between the T0 and the T1s are the responsibility of the T1s (which ordered the links)
  • Responsibility for outages on T1-T1 links is being studied (it should be mapped from existing contracts by studying the cost model: who pays for what, and where)
  • Responsibility for a GGUS ticket lies with the single site to which the ticket is assigned
• « You take responsibility for what you ordered »
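The responsibility rule for link outages can be sketched as a small lookup: a T0-T1 link belongs to the T1 that ordered it, while T1-T1 links have no owner yet because that rule is still under study. The `Endpoint`, `Link` and `outage_owner` names and the integer tier encoding are hypothetical, chosen only to illustrate the rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    site: str   # site name, e.g. "CH-CERN"
    tier: int   # 0 for the T0, 1 for a T1 (assumed encoding)


@dataclass(frozen=True)
class Link:
    link_id: str
    a: Endpoint
    b: Endpoint


def outage_owner(link: Link) -> Optional[str]:
    """Site responsible for an outage on `link`, or None if undecided."""
    if {link.a.tier, link.b.tier} == {0, 1}:
        # T0-T1 link: the T1 end ordered the link, so it owns the outage
        return link.a.site if link.a.tier == 1 else link.b.site
    # T1-T1 links: to be mapped from the cost model, so no owner yet
    return None
```

Returning `None` rather than guessing keeps the open question about T1-T1 links visible to whatever code calls this function.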
Areas of weakness
• Robustness to be genuinely ensured
  • Will sites play the game?
  • Is quality assessment sufficient protection against passive sites?
• Grid interactions
  • They have to provide us with clear communication channels
2 - Implementation
Tools status (1/4)
[Diagram: the global web repository (twiki) holding operational procedures, operational contacts, technical information, the change management DB and statistics reports]
• Global information repository
  • CERN twiki: https://twiki.cern.ch/twiki/bin/view/LHCOPN/WebHome
  • Deeply reorganised
  • With a private part
    • TTS access details, statistics reports…
• Change management database will be hosted there
  • https://twiki.cern.ch/twiki/bin/view/LHCOPN/ChangeManagementDatabase
  • Acts as the LHCOPN's technical logbook
Tools status (2/4)
LHCOPN TTS (GGUS)
• LHCOPN trouble ticket system
  • Dedicated GGUS helpdesk
• Access previously opened to the Ops working group
  • First review done and requests sent on 2008-12-15
  • Group certificate?
• Really taking shape
  • Next release = first production-usable release
  • 2009-02-01
Tools status (3/4)
• Around GGUS
  • 15 pending actions
  • Details, but also key items for production use
• E-mail reminders
  • A weekly reminder of open GGUS tickets assigned to a site
  • A weekly reminder of GGUS tickets submitted by a site and still open
• E-mail notifications
  • By default, only to the impacted sites and to the site the ticket is assigned to (if different)
  • Further notification options: no notification, or notify all sites
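The notification and reminder rules can be expressed as two small functions. This is a sketch under stated assumptions: the `Ticket` fields, the `Notify` options and the function names are hypothetical and do not describe the actual GGUS data model or API.

```python
from dataclasses import dataclass
from enum import Enum


class Notify(Enum):
    DEFAULT = "default"  # impacted sites plus the assigned site
    NONE = "none"        # no notification at all
    ALL = "all"          # every LHCOPN site


@dataclass
class Ticket:
    ticket_id: str
    submitter: str      # site that opened the ticket
    assigned_to: str    # the single site responsible for the ticket
    impacted: set       # names of the impacted sites
    is_open: bool = True
    notify: Notify = Notify.DEFAULT


def recipients(ticket, all_sites):
    """Sites that should receive an e-mail notification for `ticket`."""
    if ticket.notify is Notify.NONE:
        return set()
    if ticket.notify is Notify.ALL:
        return set(all_sites)
    # Default: impacted sites, plus the assigned site if it is not among them
    return set(ticket.impacted) | {ticket.assigned_to}


def weekly_reminders(tickets, site):
    """IDs of open tickets assigned to `site`, and open tickets it submitted."""
    assigned = [t.ticket_id for t in tickets if t.is_open and t.assigned_to == site]
    submitted = [t.ticket_id for t in tickets if t.is_open and t.submitter == site]
    return assigned, submitted
```

Using a set union for the default case means the assigned site is notified only once even when it is also one of the impacted sites.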
Tools status (4/4)
• LHCOPN planning/calendar: ongoing
  • Automatic export of GGUS tickets in the open iCalendar standard format (.ics)
  • And a web instance of the calendar
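To illustrate what an iCalendar export of tickets involves, the sketch below turns a list of tickets into a minimal `.ics` document with one `VEVENT` per ticket, following the basic RFC 5545 layout (CRLF line endings, UTC timestamps, escaped text). The ticket dictionary keys, the `PRODID` value and the `example.invalid` UID domain are hypothetical, and long-line folding is omitted for brevity.

```python
from datetime import datetime, timezone


def _fmt(dt):
    # RFC 5545 UTC date-time form, e.g. 20090201T080000Z (dt must be timezone-aware)
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def _escape(text):
    # Escape characters that are special in RFC 5545 TEXT values (backslash first)
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def tickets_to_ics(tickets, stamp):
    """Build a minimal iCalendar document from hypothetical ticket dicts
    with keys 'id', 'start', 'end' and 'summary'."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//lhcopn-tts-export//EN"]
    for t in tickets:
        lines += [
            "BEGIN:VEVENT",
            f"UID:{t['id']}@example.invalid",
            f"DTSTAMP:{_fmt(stamp)}",
            f"DTSTART:{_fmt(t['start'])}",
            f"DTEND:{_fmt(t['end'])}",
            f"SUMMARY:{_escape(t['summary'])}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # RFC 5545 requires CRLF line endings
    return "\r\n".join(lines) + "\r\n"
```

A real export would also fold lines longer than 75 octets and would probably carry more ticket fields, such as the affected link IDs.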
Other
• New link IDs for "hidden" links that can seriously affect the LHCOPN
  • DE-KIT-I-II-LHCOPN-001, CH-CERN-I-II-LHCOPN-001, IT-INFN-CNAF-MIL-BOL-LHCOPN-001, TW-ASGC-AMS-TPE-LHCOPN-001, TW-ASGC-AMS-CHI-LHCOPN-001, TW-ASGC-CHI-TPE-LHCOPN-001
• Key dependency: monitoring
  • Trustworthy soon?
  • ASPDrawer: BGP monitoring
    • Deploy it fully, hosted by CERN, integrated within MDM? (cf. tomorrow's talk)
  • DownCollector's LHCOPN flavour: https://ccenoc.in2p3.fr/DownCollector/
Proposed roadmap for implementation (2009 timeline)
• First public release of the LHCOPN TTS: trial version, model optional
• April's LHCOPN meeting: key improvements and adjustments to the model
• July's LHCOPN meeting: first complete assessment and final adjustments
• LHC startup: production version, model compulsory
• January's LHCOPN meeting: final production version, model compulsory
Next steps
• Gather GGUS access details
  • Table to be filled in on the twiki: https://twiki.cern.ch/twiki/bin/view/LHCOPN/TTSdetails
• "Test" tickets, notifications and twiki access
• Dissemination of the ops model?
  • Presentation and "training"?
  • Target: 12 router operators?
• Define KPIs
Pending
• Ops model: finalise implementation, test, disseminate, assess, improve
• Tools:
  • GGUS production-usable release (2009-02-01), and access
  • Calendar
• Others: monitoring, quality assessment, unified authentication
Conclusion
• Model itself
  • Complex high-level view, but flexible
  • Robustness to be ensured
    • Needs commitment from sites
    • Can drive improvement of the model
• Implementation
  • Tools taking shape
  • Tighten the schedule to match the potential LHC start-up
Questions & discussion