
WLCG Collaboration Issues






Presentation Transcript


  1. WLCG Collaboration Issues. WLCG Collaboration Board, 24th April 2008

  2. Strategic Issues
     • There are a number of aspects of WLCG where we see the need for a more structured dialogue with the Tier 2 federations:
         • Reliabilities
         • Accounting
         • Resource pledges/installed capacity
         • Milestones
     • Other issues that are arising:
         • Engagement in EGI/NGI (etc.) for future infrastructures
         • Resource procurement schedules, delays and process
     • General aspects of Tier 2 coordination and information flow:
         • Information from the Management Board (MB); engagement in the GDB
     • Technical points: how should we discuss them with the Tier 2s?
         • Move to SL5/6; pilot jobs; fabric monitoring/tools; what tools do Tier 2s miss?
     • What is the voice of the Tier 2s?

  3. Recent grid use
     • Across all grid infrastructures
     • Preparation for, and execution of, CCRC’08 phase 1
     • Move of simulations to Tier 2s
     [Chart: CERN 11%, Tier 1 35%, Tier 2 54%]
     Federations not yet reporting: Finland, India (IN-INDIACMS-TIFR), Norway, Sweden, Ukraine

  4. Accounting for Tier-2s (1)
     • Test reporting took place in summer 2007, and formal reporting started in September 2007.
     • Monthly reports are now produced, circulated for comment and published on the LCG Project Planning website.
     • Currently 52 of the 57 federations report accounting data, covering a total of 107 sites:
         • Site-name changes are still being signalled, so the situation is not yet fully stable
         • Some federations provided pledge information from 2008 onwards; they will be included in the reporting from April
         • Follow-up is required with Finland, India, Norway, Sweden and Ukraine to include them in the accounting reporting
     • Slide 5 shows the global picture of reporting by country from September 2007 to February 2008.
     • Slides 6 and 7 compare the MoU pledge with the CPU provided, split according to the size of the pledge.
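The comparison in slides 6 and 7 can be expressed as the CPU delivered as a percentage of the MoU pledge, with federations grouped by pledge size. A minimal Python sketch of that calculation, under stated assumptions: the federation names, the figures, the KSi2K unit and the 400 KSi2K split point are all hypothetical and are not the values behind the slides.

```python
# Hypothetical figures for illustration only; not the data shown in slides 5-7.
PLEDGES_KSI2K = {"Fed-A": 2000.0, "Fed-B": 450.0, "Fed-C": 120.0}
DELIVERED_KSI2K = {"Fed-A": 1500.0, "Fed-B": 500.0, "Fed-C": 30.0}

# Assumed split point between "large" and "small" pledges.
LARGE_PLEDGE_THRESHOLD = 400.0


def percent_of_pledge(delivered: float, pledge: float) -> float:
    """CPU delivered expressed as a percentage of the pledged CPU."""
    if pledge <= 0:
        raise ValueError("pledge must be positive")
    return 100.0 * delivered / pledge


def split_by_pledge_size(
    pledges: dict[str, float], delivered: dict[str, float], threshold: float
) -> dict[str, dict[str, float]]:
    """Group federations by pledge size; a missing delivery counts as zero."""
    groups: dict[str, dict[str, float]] = {"large": {}, "small": {}}
    for fed, pledge in pledges.items():
        group = "large" if pledge >= threshold else "small"
        groups[group][fed] = percent_of_pledge(delivered.get(fed, 0.0), pledge)
    return groups


if __name__ == "__main__":
    result = split_by_pledge_size(PLEDGES_KSI2K, DELIVERED_KSI2K, LARGE_PLEDGE_THRESHOLD)
    for group, feds in result.items():
        for fed, pct in sorted(feds.items()):
            print(f"{group:5s} {fed}: {pct:.0f}% of pledge")
```

Splitting by pledge size keeps small federations visible; in a single chart their percentages would be hard to read next to the largest pledges.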

  5. Accounting for Tier-2s (2)

  6. Accounting for Tier-2s (3)

  7. Accounting for Tier-2s (4)
     • What we don’t see here is the installed capacity

  8. Computing Resource Pledge Responsibilities
     • Following the pledge revision exercise of autumn 2007, a reminder of the process seems necessary.
     • At the autumn C-RRB meeting, each federation is expected to provide:
         • A firm commitment to pledge values for the following year
         • Planned pledge values for the subsequent 4 years
     • At the spring C-RRB meeting, each federation is expected to:
         • Confirm that the pledge values for the current year are installed and running a production service, or explain any problems for the current year or changes for future years
     • Therefore, 2 weeks before the next C-RRB on 11/11/08, the following is required:
         • Confirmed 2009 pledge values (confirmation of the already communicated value, or an upward revision)
         • Planned pledge values for 2010-2013 inclusive (confirmation or revision of the already communicated values, plus the newly added year 2013)
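The timetable above can be checked with a few lines of Python. This sketch assumes that "2 weeks before" means exactly 14 days before the meeting; the helper names are hypothetical and only encode the rule stated on the slide (a firm pledge for the following year, planned values for the subsequent 4 years).

```python
from datetime import date, timedelta

# Date of the next C-RRB as given on the slide (11/11/08).
NEXT_CRRB = date(2008, 11, 11)


def submission_deadline(meeting: date, weeks_before: int = 2) -> date:
    """Latest date for federations to send their pledge information."""
    return meeting - timedelta(weeks=weeks_before)


def pledge_years(meeting_year: int) -> tuple[int, list[int]]:
    """Firm year and planned years requested at an autumn C-RRB.

    The firm pledge covers the following year; planned values cover
    the subsequent 4 years.
    """
    firm = meeting_year + 1
    planned = list(range(firm + 1, firm + 5))
    return firm, planned


deadline = submission_deadline(NEXT_CRRB)
firm, planned = pledge_years(NEXT_CRRB.year)
print(deadline.isoformat())  # 2008-10-28
print(firm, planned)         # 2009 [2010, 2011, 2012, 2013]
```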

  9. Tier 0/Tier 1 site reliability
     • Targets:
         • All sites: 91%, rising to 93% from December
         • 8 best sites: 93%, rising to 95% from December
     • See the QR for full status
     • Follow-up process in the MB over many months with individual sites
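To make the targets concrete, here is a hedged Python sketch that computes a monthly reliability and checks it against the two target levels. It assumes the usual WLCG convention that reliability excludes scheduled downtime from the denominator (availability does not); the helper names and the sample month are hypothetical.

```python
# Targets quoted on the slide, as (before December, from December).
TARGETS = {
    "all_sites": (0.91, 0.93),
    "best_8": (0.93, 0.95),
}


def reliability(up_hours: float, total_hours: float, scheduled_down_hours: float) -> float:
    """Fraction of the non-scheduled-downtime period during which the site was up.

    Assumption: scheduled downtime is excluded from the denominator.
    """
    available = total_hours - scheduled_down_hours
    if available <= 0:
        raise ValueError("no hours outside scheduled downtime")
    return up_hours / available


def meets_target(rel: float, group: str, from_december: bool) -> bool:
    """Compare a reliability value with the slide's target for the group."""
    before, after = TARGETS[group]
    return rel >= (after if from_december else before)


# Hypothetical month: 720 h in total, 24 h scheduled downtime, 650 h up.
rel = reliability(650, 720, 24)
print(round(rel, 3))                                        # 0.934
print(meets_target(rel, "all_sites", from_december=True))  # True
print(meets_target(rel, "best_8", from_december=True))     # False
```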

  10. Tier 2 Reliabilities
     • Reliabilities published regularly since October
     • In February, 47 sites had > 90% reliability
     • How do we address this?
     • For the Tier 2 sites reporting: [chart]
     • Among the Tier 2 sites not reporting, 12 are in the top 20 for CPU delivered in January 2008

  11. How should the federations be reported: weighted?
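One way to frame the weighting question: an unweighted mean treats every site in a federation equally, while a CPU-weighted mean lets the large sites dominate. The Python sketch below contrasts the two for a hypothetical two-site federation; the site names and numbers are invented for illustration, and CPU weighting is only one possible choice (weighting by pledge would be another).

```python
# Hypothetical federation: site -> (reliability, CPU delivered in arbitrary units).
SITES = {
    "site-1": (0.95, 1000.0),
    "site-2": (0.60, 100.0),
}


def unweighted_reliability(sites: dict[str, tuple[float, float]]) -> float:
    """Plain mean: every site counts the same, whatever its size."""
    values = [rel for rel, _ in sites.values()]
    return sum(values) / len(values)


def cpu_weighted_reliability(sites: dict[str, tuple[float, float]]) -> float:
    """Mean weighted by CPU delivered: large sites dominate the result."""
    total_cpu = sum(cpu for _, cpu in sites.values())
    if total_cpu == 0:
        raise ValueError("no CPU delivered; weighting undefined")
    return sum(rel * cpu for rel, cpu in sites.values()) / total_cpu


print(f"unweighted:   {unweighted_reliability(SITES):.3f}")    # 0.775
print(f"CPU-weighted: {cpu_weighted_reliability(SITES):.3f}")  # 0.918
```

The gap between the two numbers shows why the choice matters: a small, unreliable site pulls the unweighted figure down much more than it affects the federation's delivered capacity.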

  12. Reliability reporting
     • Currently (Feb 08) all Tier 1 sites and 100 Tier 2 sites report reliabilities
     • Recent progress: the MB set up a group to
         • Agreement on the equivalence of the NDGF tests with those used at EGEE and all other Tier 1 sites; now in production at NDGF
             • These should also be used for the Nordic Tier 2 sites
         • Similar process with OSG (for US Tier 2 sites): tests only for the CE so far; agreement on equivalence; the tests are in production; publication to SAM in progress
             • Missing: SE/SRM testing
             • Full production expected in May 2008 (new milestone introduced)
     • It is important that all Tier 2s are regularly tested and reporting
     • It is important that we have the correct Tier 2 federation contact to follow up these issues

  13. Reporting
     • It is now urgent that:
         • The remaining Tier 2 federations start reporting on reliabilities and accounting
         • Federations follow up monthly by checking the published data; we have to understand whether there are problems in the process
         • If the site names are wrong, please tell us what they should be (and how they map to the physical site host names)
     • Resource installation:
         • We also need to gather information about installed resources at Tier 2s
     • Follow-up process:
         • For Tier 1s this was done monthly in the MB, site by site; this was manageable but slow. With the Tier 2s (110+ sites) this process is unwieldy
         • We need a contact person for each federation, and it would be far more convenient to have a contact for each country

  14. Updated Resource Status Summary for May CCRC’08
     For 5 May, not all sites will have their full 2008 CPU pledges available. A total of 28648 KSi2K is expected: 9600 KSi2K more than in 1Q2008, but 8000 KSi2K less than the February plans. The largest missing contributions are +2500 KSi2K at NL-T1 (due November 2008), +1700 KSi2K at CNAF (due June), +1300 KSi2K at US-CMS (due end of May) and +3400 KSi2K at US-ATLAS (due early June).
     For disk and tape, many sites will catch up later in the year as the need expands. The 2008 disk requirement is 23 PB, of which 12.4 PB is expected to be available for 5 May (3 PB more than in 1Q2008, but 3.1 PB less than the February plans). The 2008 tape requirement is 24 PB, of which 13.6 PB is expected to be available for 5 May (4.8 PB more than in 1Q2008, but 1.4 PB less than the February plans).
     The disk and tape storage needed for the May full-scale dress-rehearsal run of CCRC’08 is probably better modelled as 55% (accelerator efficiency) times 30/100 (days running) of the increase in resource requirements for 2008/9 over those for 2007/8, i.e. 2.8 PB of disk and 3 PB of tape. Globally this is not a problem, but if this model is correct some sites will not be able to contribute fully to the May CCRC. These requirements will be adjusted to the specific April 2008 experiment requirements given in the next talks.
     WLCG April 2008: Tier 0 and 1 Resources
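The storage model above is a simple product, so its inputs can be back-calculated from the quoted results. This Python sketch uses only the factors and totals stated on the slide; the implied increases it prints are derived here, not quoted, and are approximate because 2.8 PB and 3 PB are themselves rounded. The function names are hypothetical.

```python
# Factors quoted on the slide.
ACCELERATOR_EFFICIENCY = 0.55
RUNNING_FRACTION = 30 / 100  # "30/100 (days running)"
CCRC_FACTOR = ACCELERATOR_EFFICIENCY * RUNNING_FRACTION  # 0.165


def ccrc_storage_need(increase_pb: float) -> float:
    """Storage needed for the May run, given the 2008/9 over 2007/8 increase."""
    return CCRC_FACTOR * increase_pb


def implied_increase(need_pb: float) -> float:
    """Invert the model: the requirement increase that yields a given need."""
    return need_pb / CCRC_FACTOR


# Figures stated on the slide (PB): CCRC need, 2008 requirement, available on 5 May.
STATED = {"disk": (2.8, 23.0, 12.4), "tape": (3.0, 24.0, 13.6)}

for resource, (need, required_2008, available_may) in STATED.items():
    print(
        f"{resource}: implied increase ~{implied_increase(need):.1f} PB, "
        f"{100 * available_may / required_2008:.0f}% of 2008 requirement on 5 May"
    )
# disk: implied increase ~17.0 PB, 54% of 2008 requirement on 5 May
# tape: implied increase ~18.2 PB, 57% of 2008 requirement on 5 May
```

Seen this way, roughly half of the 2008 storage requirement is expected to be in place on 5 May, which is why the model asks only for about 17% (0.55 × 0.3) of the year-on-year increase for the May run.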

  15. Summary of Disk Space Plans
     As usual, disk is the most critical resource:
     • ASGC: last 300 TB delivery at end of June
     • CC-IN2P3: last 880 TB planned for September
     • FZK: last 650 TB planned for October (600 TB ALICE, 50 TB CMS)
     • CNAF: last 730 TB planned for June/July
     • NDGF: grow as needed, reaching the last 700 TB by autumn
     • NL-T1: add 800 TB by end of May and the last 1450 TB in November
     • PIC: last 370 TB planned for early June
     • RAL: last 800 TB in acceptance, ready for end of May
     • TRIUMF: full pledge for May CCRC
     • US-ATLAS: add 1200 TB by end of May and the last 1000 TB in October
     • US-CMS: full pledge for May CCRC

  16. Resource procurement
     • This risks becoming a major problem in the coming years
     • It is important to work around the procurement processes so that we are ready for accelerator running each year
     • This has been a problem for almost all Tier 1s
     • Is this also an issue for Tier 2s?

  17. Milestones
     • So far, formal milestones have mostly been associated with the project as a whole, the Tier 0 and the Tier 1s
     • It is now time to start setting milestones for the Tier 2s on specific issues:
         • e.g. reliability, resource installation, etc.
     • Again, it will be important to have the appropriate technical coordinators to report and follow up on these issues

  18. Communication
     • Apart from the issues raised above:
         • How are the Tier 2s kept informed, and does it work?
         • Flow of information from the Management Board: do Tier 2s read the minutes?
         • Is everyone engaged in the GDB (or even aware that they can be)?
         • How can we structure the communication with the large number of Tier 2 sites so that we have a workable process to communicate problems and follow them up (in both directions)?
         • How can we aggregate Tier 2 status to report to the LHCC/OB/RRB/CB etc.?
     • Today it is extremely difficult to get an overview of Tier 2 status and problems

  19. Miscellaneous technical issues
     • Move to new versions of the OS: SL5/SL6
     • Pilot jobs/glexec: is it OK for sites to deploy this now?
     • Fabric monitoring:
         • Do Tier 2s do this sufficiently?
         • Do they have the tools?
     • Security tools: are sites appropriately protected?
     • What tools do Tier 2s miss?
     • How do Tier 2s keep abreast of these developments?
         • They should participate in the GDB
         • Is more needed?

  20. Comments on EGI design study
     • The goal is to have a fairly complete blueprint in June
     • The main functions were presented to the NGIs at the Rome workshop in March
     • It is essential for WLCG that EGI/NGIs continue to provide support for the production infrastructure after EGEE-III
     • We need to see a clear transition and assurance of appropriate levels of support; the transition will be in 2009-2010
         • Exactly the time when LHC services should not be disrupted
     • Concerns:
         • NGIs agreed that a large European production-quality infrastructure is a goal
         • It is not clear that there is agreement on the scope
         • Reluctance to accept the level of functionality required
         • Tier 1 sites (and existing EGEE expertise) are not well represented by many NGIs
     • WLCG representatives must approach their NGI representatives and ensure that EGI/NGIs provide the support we need
     These comments apply equally to Tier 2s: they really need to engage with the NGIs in their countries.

  21. EGI/NGI (cont.)
     • While WLCG should work hard to make sure that the EGI design study goes in the right direction, strategically the project must be prepared to plan for a fall-back
     • Tier 1s were questioned in the OB: all replied that they had some plan in place if there were no EGI/NGI
         • Albeit with a potential reduction in what they could contribute
     • We need to start thinking about what the Tier 2s can do
         • It will be clear in June whether the EGI_DS blueprint provides what we need
         • Should we put together a group to begin looking at fall-back plans for Tier 2s?

  22. Summary
     • There are a number of aspects of WLCG where we see the need for a more structured dialogue with the Tier 2 federations:
         • General aspects of Tier 2 coordination and information flow:
             • Information from the MB; engagement in the GDB
         • Technical points:
             • Move to SL5/6; pilot jobs; fabric monitoring/tools; what tools do Tier 2s miss?
     • What is the voice of the Tier 2s?
     • Do we need a group to start looking at Tier 2 fall-back plans if EGI_DS does not deliver?
     • And what is the situation in the US with OSG?
