
Grid Deployment & Operations in the UK



Presentation Transcript


  1. Grid Deployment & Operations in the UK Wednesday 3rd May ISGC 2006, Taipei Jeremy Coles GridPP Production Manager UK&I Operations for EGEE J.Coles@rl.ac.uk

  2. Overview 1 Background to e-Science – The UK Grid Projects NGS & GridPP 2 The deployment and operations models and vision 3 GridPP performance measures 4 Progress in GridPP against LCG requirements 5 Future plans 6 Summary

  3. UK e-Science • National initiatives began in 2001 • UK e-Science programme • Application focused/led developments • Varying degree of “infrastructure” … ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ – John Taylor, Director General of Research Councils, Office of Science and Technology http://www.rcuk.ac.uk/escience/

  4. UK e-Infrastructure directions (diagram) • Regional and Campus grids • VRE, VLE, IE • HPCx + HECToR • LHC • ISIS TS2 • Community Grids • Users get common access, tools, information and nationally supported services through NGS • Integrated internationally

  5. UK e-Infrastructure directions

  6. Applications • Thermodynamic integration • Molecular dynamics • Systems biology • Neutron scattering • Econometric analysis • Climate modelling • Nano-particles • Protein folding • Ab-initio protein structure prediction • Radiation transport (radiotherapy) • IXI (medical imaging) • Biological membranes • Micromagnetics • Archaeology • Text mining • Lattice QCD (analysis) • Astronomy (VO services) • Many, but not all, applications cover traditional computational sciences • Both user and pre-installed software • Several data-focused activities • Common features are: • Distributed data and/or collaborators • Not just pre-existing large collaborations • Explicitly encourage new users • Common infrastructure/interfaces

  7. National Grid Service

  8. The UK & Ireland contribution to EGEE SA1 – deployment & operations • Consisted of 3 partners in EGEE-I: • The National Grid Service (NGS) • Grid Ireland • GridPP

  9. The UK & Ireland contribution to EGEE SA1 – deployment & operations • Grid-Ireland focus: • National computational grid for Ireland, built over the Higher Education Authority network • Central operations from Dublin • Has developed an auto-build system for EGEE components

  10. The UK & Ireland contribution to EGEE SA1 – deployment & operations • GridPP focus: • Composed of 4 regional Tier-2s and a Tier-1, as per the LCG Tier model • In EGEE-II: • NGS and Grid-Ireland unchanged • The lead institute in each of the GridPP Tier-2s becomes a partner.

  11. What UK structures are involved?

  12. Focus: GridPP structure (organisation chart) • Boards: Oversight Committee, Collaboration Board, Project Management Board, Deployment Board, Tier-2 Board, User Board, Tier-1 Board • Management: Tier-1 Manager, Production Manager, Tier-1 Technical Coordinator; NorthGrid, SouthGrid, ScotGrid and London Tier-2 Coordinators • Support: Helpdesk support, Catalogue support, VOMS support, Tier-2 support (per Tier-2), Tier-1 support & administrators, Site Administrators • Groups: Storage Group, Networking Group

  13. GridPP structure and work areas (same organisation chart as the previous slide) Example activities from across these areas: • Supporting dCache • Supporting DPM • Developing plug-ins • Constructing data views • Supporting network testing • Running core services • Ticket process management • Pre-production service • UK testzone • Pre-release testing • Deployment of new hardware • Information exchange • Maintaining site services • Maintaining production services • LCG service challenges • GridPP challenges • Monitoring use of resources • Reporting • Running helpdesks • Interoperation – parallel deployment • Updating project plans • Agreeing resource allocations • Checking project direction • Tracking documentation • VO interaction/support • Portal development Recent output from SOME areas follows…

  14. How effectively are resources being used? A Tier-1-developed script uses one simple measure: sum(CPU time) / sum(wall time). The low efficiencies for 2005 were generally a few jobs making the situation look worse than it was; in 2006 the main cause was problems with SEs. http://www.gridpp.rl.ac.uk/stats/
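The aggregate efficiency measure above is simple enough to sketch in a few lines of Python. The job records and figures here are invented for illustration (they are not GridPP accounting data), but they show how a small number of stalled jobs can drag the aggregate down, as noted for the 2005 figures:

```python
# Sketch of the Tier-1 efficiency measure described above:
# efficiency = sum(CPU time) / sum(wall time) over a set of jobs.

def farm_efficiency(jobs):
    """Aggregate CPU efficiency for a list of (cpu_seconds, wall_seconds) jobs."""
    total_cpu = sum(cpu for cpu, _ in jobs)
    total_wall = sum(wall for _, wall in jobs)
    return total_cpu / total_wall if total_wall else 0.0

# Twenty well-behaved jobs, plus two jobs stuck waiting on a storage element:
good_jobs = [(3500, 3600)] * 20   # each ~97% efficient
stalled = [(100, 36000)] * 2      # long wall time, almost no CPU used

print(round(farm_efficiency(good_jobs), 3))            # high
print(round(farm_efficiency(good_jobs + stalled), 3))  # dragged well down
```

Because the measure is a ratio of sums rather than a mean of per-job ratios, long-running idle jobs dominate it, which matches the observation that "a few jobs" made 2005 look bad.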

  15. RTM data views – efficiency What are the underlying reasons for the big differences in overall efficiency? (Data shown for Q4 2005) http://gridportal.hep.ph.ic.ac.uk/rtm/reports.html

  16. RTM data views – usage Does the usage distribution make sense? (Data shown for Q4 2005) http://gridportal.hep.ph.ic.ac.uk/rtm/reports.html

  17. RTM data views – job distribution Operations needs to check mappings and discover why some sites are not used. (Data shown for Q4 2005) http://gridportal.hep.ph.ic.ac.uk/rtm/reports.html

  18. Site performance measures • Storage provided


  24. Site performance measures • Storage provided • Scheduled downtime • Estimated occupancy • SFT failures • Tickets & responsiveness • # VOs supported • + others….. • WHAT MAKES A SITE BETTER (beyond manpower)? • Need more data over longer periods • Ideally need more automated data! • Importance will increase in meeting MoU/SLA targets • How reliable are the metrics?
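As a hedged illustration of how measures like these could be folded into a single comparable site score, the sketch below normalises each metric against the best observed value and takes a weighted sum. The metric names, weights, normalisation, and site data are all assumptions for this example, not GridPP's actual method:

```python
# Illustrative combination of site performance measures into one score.
# Weights and metric keys are invented for this sketch.
WEIGHTS = {
    "storage_tb": 0.2,      # storage provided
    "uptime_frac": 0.3,     # fraction of time not in scheduled downtime
    "occupancy": 0.2,       # estimated occupancy
    "sft_pass_rate": 0.2,   # 1 - SFT failure rate
    "vos_supported": 0.1,   # number of VOs supported
}

def site_score(metrics, maxima):
    """Weighted sum of metrics, each normalised against the best observed value."""
    score = 0.0
    for key, weight in WEIGHTS.items():
        best = maxima.get(key) or 1.0
        score += weight * min(metrics.get(key, 0.0) / best, 1.0)
    return score

# Hypothetical sites, not real GridPP data:
sites = {
    "SiteA": {"storage_tb": 40, "uptime_frac": 0.98, "occupancy": 0.7,
              "sft_pass_rate": 0.95, "vos_supported": 8},
    "SiteB": {"storage_tb": 10, "uptime_frac": 0.90, "occupancy": 0.9,
              "sft_pass_rate": 0.80, "vos_supported": 4},
}
maxima = {k: max(s[k] for s in sites.values()) for k in WEIGHTS}
ranking = sorted(sites, key=lambda s: site_score(sites[s], maxima), reverse=True)
print(ranking)
```

The open questions on the slide (metric reliability, need for longer baselines) apply directly here: any such score is only as trustworthy as the weights and the data feeding it.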

  25. Meeting the LCG challenge – Example: Tier-2 individual transfer tests [Table on slide: matrix of transfer rates (Mb/s) between the RAL Tier-1 and Tier-2 sites. Figures shown include: RAL Tier-1 ~800, 350, 156, 166, 289, 252, 118, 84, 397; Manchester 150; Edinburgh 440; Glasgow 331; Birmingham 461; Oxford 456; Cambridge 74; Durham 193; QMUL 172; RAL-PPD 388] Initial focus was on getting SRMs understood and deployed… • Big variation in what sites could achieve: internal networking configuration issues; site connectivity (and contention); SRM setup and level of optimisation • Rates to RAL were generally better than from RAL: availability and setup of gridFTP servers at Tier-2s; SRM setup and level of optimisation • Scheduling tests was not straightforward: availability of local site staff; status of hardware deployment; availability of the Tier-1; need to avoid first tests during certain periods (local impacts) Example rates from throughput tests: http://wiki.gridpp.ac.uk/wiki/Service_Challenge_Transfer_Tests
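For reference, the arithmetic behind quoted figures like "~800 Mb/s" is just bytes moved over elapsed time, converted to megabits. The file size and duration below are made-up examples, not measurements from these tests:

```python
# Achieved throughput in megabits per second (1 Mb = 10**6 bits).
def rate_mbps(bytes_moved, seconds):
    return bytes_moved * 8 / seconds / 1e6

# e.g. a hypothetical 10 GB transfer completing in 3 minutes:
ten_gb = 10 * 10**9
print(round(rate_mbps(ten_gb, 180)))  # ~444 Mb/s, in the range seen above
```

Note the bit/byte distinction: a disk-to-disk tool reporting 50 MB/s corresponds to 400 Mb/s on the wire, which is easy to misread when comparing against link capacities.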

  26. Meeting the LCG challenge – Example: Tier-1 & Tier-2 combined transfer tests • Early attempts revealed unexplained dropouts • The dropouts were later traced to a firewall • A rate cap at RAL was introduced for later tests • Tests were repeated to check the RAL capping • The rate was stretched further by using an OPN link to Lancaster http://wiki.gridpp.ac.uk/wiki/SC4_Aggregate_Throughput

  27. Meeting the LCG challenge – Tier-1 & Tier-2 combined transfer tests (rerun) http://wiki.gridpp.ac.uk/wiki/SC4_Aggregate_Throughput

  28. GridPP operations: What is next? • SRM deployments are now stable and the focus has shifted to improving site configurations and optimisations • Sites are now more comfortable with the release/reporting process, but concerns remain – gLite 3.0 • We need to continue improving site transfer performance, but also extend the tests to include such things as sustained simultaneous reading and writing • Several sites are receiving new equipment – we need to ensure a smooth deployment; 64-bit machines are being deployed in some cases • GridPP mapped its Tier-2s to experiments for closer working and “proving” of the Tier-2 capabilities; some progress already, but much more needed • Data is becoming available for understanding the performance of sites, but integration and automation are far from ideal • Network monitoring “boxes” are being installed at UK sites • Security – several areas, but extending the ROC security challenge and implementing an approach for joint logging are in progress • More interoperation (and joint operations) with NGS

  29. Summary 1 UK e-science has a broad vision with NGS a central part 2 There will be increasing interoperation between UK activities 3 The UK particle physics grid remains one of the largest projects 4 Operational focus will shift to performance measures 5 Progress being made for LHC pilot service but not always smoothly 6 There are clear areas where further work is required
