
UKI-SouthGrid Report and Final Preparation Steps


Presentation Transcript


  1. UKI-SouthGrid Report and Final Preparation Steps Kashif Mohammad, Deputy SouthGrid Technical Coordinator GridPP 23 - Cambridge, 9th September 2009

  2. SouthGrid Tier 2 • The UK is split into 4 geographically distributed Tier 2 centres • SouthGrid comprises all the southern sites not in London • New sites are likely to join GridPP-23 Cambridge

  3. UK Tier 2 reported CPU – Historical View to present GridPP-23 Cambridge

  4. SouthGrid Sites Accounting as reported by APEL GridPP-23 Cambridge

  5. New Total Q209 SouthGrid GridPP-23 Cambridge

  6. Site Setup Summary GridPP-23 Cambridge

  7. SL5 Migration and Benchmarking • RAL-PPD has already moved its whole cluster to SL5. • Oxford has moved a small part of its cluster to SL5. • Plan to move the rest of the cluster before the end of September. • Bristol has a small dedicated cluster and a shared HPC cluster • Ready to move, but some problems due to shared resources and the GPFS file system • Birmingham also has a dedicated cluster and a shared HPC cluster • Plan to move in October • Cambridge is planning to move in October. • Condor support is an issue • Benchmarking • All sites have benchmarked their systems using HEPSPEC 2006, but are not yet publishing the results in the BDII (a publishing sketch follows below). GridPP-23 Cambridge
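For the BDII publishing step flagged above, here is a minimal Python sketch (not part of the original slides) of turning a measured HEP-SPEC06 per-core score into the SpecInt2000-equivalent figure conventionally published as GlueHostBenchmarkSI00. The 250x scaling and the attribute/DN names follow the usual WLCG/GLUE 1.3 conventions, but treat them as assumptions and verify against current publishing guidance.

```python
# Minimal sketch: convert a measured HEP-SPEC06 per-core score into the
# SpecInt2000-equivalent figure conventionally published in the BDII.
# The 250x scaling and the GlueHostBenchmarkSI00 attribute name follow the
# usual WLCG/GLUE 1.3 convention; check them before publishing for real.

HS06_TO_SI2K = 250  # conventional scaling: 1 HEP-SPEC06 ~= 250 SpecInt2000


def si00_from_hepspec06(hs06_per_core: float) -> int:
    """Return the SI00 value to publish for a measured per-core HS06 score."""
    return int(round(hs06_per_core * HS06_TO_SI2K))


def glue_benchmark_ldif(subcluster_dn: str, hs06_per_core: float) -> str:
    """Build an LDIF fragment setting the benchmark on a GlueSubCluster entry.

    subcluster_dn is a placeholder DN for the site's subcluster object.
    """
    return "\n".join([
        f"dn: {subcluster_dn}",
        "changetype: modify",
        "replace: GlueHostBenchmarkSI00",
        f"GlueHostBenchmarkSI00: {si00_from_hepspec06(hs06_per_core)}",
    ])


if __name__ == "__main__":
    # Example: a worker node measured at 8.0 HS06 per core.
    print(glue_benchmark_ldif(
        "GlueSubClusterUniqueID=example-subcluster,mds-vo-name=resource,o=grid",
        8.0,
    ))
```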

  8. New Staff • May 2009: Chris Curtis, SouthGrid hardware support, based at Birmingham • June 2009: Bob Cregan, HPC support at Bristol GridPP-23 Cambridge

  9. GRIDPPNAGIOS https://gridppnagios.physics.ox.ac.uk/nagios http://www.gridpp.ac.uk/wiki/UKI_Regional_Nagios • Many new features are available • Use of a messaging bus through a message broker • Most of the SAM-equivalent tests are available • But still at the development stage (a status-summary sketch follows below) GridPP-23 Cambridge
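Not from the slides, but as an illustration of how a site could keep a local eye on the SAM-equivalent results, here is a minimal Python sketch that tallies non-OK service checks by parsing the Nagios status file. The file path and the Nagios 3 status.dat layout are assumptions; adapt them to the actual gridppnagios installation.

```python
# Minimal sketch: summarise failing checks from a Nagios status file.
# Assumes the Nagios 3 status.dat layout and a default path; adjust
# STATUS_FILE for the actual gridppnagios installation.

STATUS_FILE = "/var/nagios/status.dat"  # assumed path


def parse_blocks(path):
    """Yield (block_type, dict_of_fields) for each block in status.dat."""
    block_type, fields = None, {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.endswith("{"):
                block_type, fields = line.split()[0], {}
            elif line == "}":
                yield block_type, fields
                block_type = None
            elif block_type and "=" in line:
                key, _, value = line.partition("=")
                fields[key] = value


def failing_services(path=STATUS_FILE):
    """Return (host, service) pairs whose current_state is not OK (0)."""
    return [
        (f.get("host_name", "?"), f.get("service_description", "?"))
        for btype, f in parse_blocks(path)
        if btype == "servicestatus" and f.get("current_state") != "0"
    ]


if __name__ == "__main__":
    for host, service in failing_services():
        print(f"{host}: {service}")
```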

  10. Prepare To Run GridPP-23 Cambridge

  11. LHC VO Usage in last 9 months GridPP-23 Cambridge

  12. CMS and LHCb • CMS jobs are running very efficiently. • Bristol and RALPP are the two Tier 2 CMS sites in SouthGrid. • At Bristol there is a problem using GPFS/StoRM: CMS jobs using the file protocol change the ACLs/permissions. The temporary solution is a cron job run every half hour (sketched below). • Not very efficient. • Oxford is running CMS jobs using the PhEDEx server at RAL-PPD • LHCb jobs are also running very efficiently. • Sometimes sites were banned and the site admins had no idea that they were banned. • There should be a mechanism to notify a site before banning it. • Otherwise no major problems. • But do we need a stress test for LHCb and CMS? • Once data taking commences, will the load at sites increase significantly? GridPP-23 Cambridge
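The exact clean-up script is not in the slides, but as a sketch of the kind of half-hourly cron job described above: walk the CMS namespace on the GPFS-backed SE and restore the group permissions that jobs have tightened. The mount point, group name and permission bits are illustrative assumptions; the job would run as root from cron, e.g. every 30 minutes.

```python
#!/usr/bin/env python
# Minimal sketch of a half-hourly clean-up job: walk part of the CMS
# namespace on the GPFS-backed SE and put group read permission back on
# files that jobs have tightened up. The path, group name and mode bits
# are assumptions for illustration; run as root from cron.

import grp
import os
import stat

CMS_STORE = "/gpfs/storm/cms/store"   # assumed mount point of the CMS namespace
TARGET_GROUP = "cms"                  # assumed pool-account group


def repair_permissions(root=CMS_STORE, group=TARGET_GROUP):
    gid = grp.getgrnam(group).gr_gid
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            # Re-add group read if a job has stripped it.
            if not st.st_mode & stat.S_IRGRP:
                os.chmod(path, st.st_mode | stat.S_IRGRP)
            # Keep files in the expected group so StoRM/GridFTP can serve them.
            if st.st_gid != gid:
                os.chown(path, st.st_uid, gid)


if __name__ == "__main__":
    repair_permissions()
```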

  13. Step09 and HammerCloud Tests • 4 SouthGrid sites participated in Step09. • Very useful in finding bottlenecks and configuration problems. http://gangarobot.cern.ch/st/step09summary.html GridPP-23 Cambridge

  14. Bottlenecks and Solutions • The first series of HC tests used RFIO access. • At RAL-PPD, the network connection between the two machine rooms housing the WNs and the storage was found to be a problem. • Currently it is 2 x 1 Gbps; the plan is to upgrade to 10 Gbps in the very near future. • At Oxford we faced a similar problem, with the network link to the storage pool node becoming saturated. • Currently a 1 Gbps connection. • Wish to upgrade to 10 Gbps (a back-of-the-envelope check follows below). GridPP-23 Cambridge
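A quick back-of-the-envelope check of why the 1 Gbps and 2 x 1 Gbps links saturate: multiply the number of job slots by the per-job read rate and compare against the link capacity. The per-job rate below is an assumed figure for illustration, not a measurement from the Step09/HammerCloud runs.

```python
# Back-of-the-envelope check of when the WN-to-storage link saturates.
# The per-job read rate is an assumed illustrative figure; plug in rates
# observed during the HammerCloud runs.


def link_utilisation(job_slots: int, mb_per_s_per_job: float, link_gbps: float) -> float:
    """Fraction of the link consumed if every slot reads at the given rate."""
    demand_gbps = job_slots * mb_per_s_per_job * 8 / 1000.0
    return demand_gbps / link_gbps


if __name__ == "__main__":
    for link in (2 * 1.0, 10.0):  # 2 x 1 Gbps bonded vs a single 10 Gbps link
        util = link_utilisation(job_slots=200, mb_per_s_per_job=2.5, link_gbps=link)
        print(f"{link:>4.0f} Gbps link: {util:.0%} utilised")
```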

  15. Bottlenecks and Solutions • The second series of HC tests used file staging. • At Oxford, we periodically increased the number of job slots available to ATLAS pilot jobs. • We faced disk contention as the number of jobs on a single WN increased. GridPP-23 Cambridge

  16. Conclusion • Control the number of jobs per WN through MAUI • Currently not available in MAUI in a clean way (a monitoring stopgap is sketched below) • RFIO read-ahead buffer • Experimenting with different read-ahead buffer sizes • Inconclusive • Channel bonding • Helped, but not much. • SSD for the DPM head node database • Ordered one for Oxford; will test it. • 10 Gbps network connection between WNs and disk pool • Would certainly help (options ordered roughly from low cost to high cost) GridPP-23 Cambridge
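Since MAUI offers no clean per-WN job cap, one possible monitoring stopgap (not from the slides) is to count running jobs per node from the batch system and flag nodes above a limit. The sketch below assumes a Torque/MAUI setup and the usual `pbsnodes -a` output format; the cap value is purely illustrative.

```python
# Minimal monitoring stopgap, assuming a Torque/MAUI setup: count running
# jobs per worker node from `pbsnodes -a` output and flag nodes above a cap.
# The output layout parsed here is the usual Torque one, but treat it as an
# assumption and adapt to the local batch system.

import subprocess

MAX_JOBS_PER_WN = 8  # illustrative cap, not a value from the slides


def jobs_per_node(pbsnodes_output: str) -> dict:
    counts, node = {}, None
    for line in pbsnodes_output.splitlines():
        if line and not line.startswith(" "):
            # Unindented lines are node names.
            node = line.strip()
            counts[node] = 0
        elif node and line.strip().startswith("jobs ="):
            # "jobs = 0/123.server, 1/124.server, ..." -> count the entries.
            job_list = line.split("=", 1)[1].strip()
            counts[node] = len([j for j in job_list.split(",") if j.strip()])
    return counts


if __name__ == "__main__":
    output = subprocess.run(["pbsnodes", "-a"], capture_output=True, text=True).stdout
    for node, njobs in sorted(jobs_per_node(output).items()):
        if njobs > MAX_JOBS_PER_WN:
            print(f"WARNING: {node} is running {njobs} jobs (cap {MAX_JOBS_PER_WN})")
```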

  17. Conclusion • File system • Requires manpower and expertise • Lustre: seems to have improved performance at QMUL • GPFS: bad experience at Bristol • Xrootd: no experience yet • SSDs in worker nodes • Too expensive • Maybe next time. (options ordered roughly from low cost to high cost) GridPP-23 Cambridge

  18. Thank You GridPP-23 Cambridge
