
  1. Computing for ILC experiment Computing Research Center, KEK Hiroyuki Matsunaga

  2. Outline • KEKCC status • Grid and distributed computing • Belle II and ILC • Prospects for the future • Summary • This talk focuses on data analysis. Common services (such as web and email) are not covered.

  3. Computing system at KEK (KEKCC) • In operation since April 2012 • The only large system for data analysis at KEK • The previous Belle system was merged into it • KEKCC resources are shared among all of KEK's projects • Belle / Belle II is the main user for the next several years • Grid (and Cloud) services have been integrated into the current KEKCC • No Cloud users so far

  4. KEKCC replacement • The whole system has to be replaced with a new one under a new lease contract • Every ~3 years (next replacement in summer 2015) • This leads to many problems • The system is not available during replacement and commissioning • More than one month for the data storage system • Replicas at other sites would help • The system is likely to be unstable just after the start of operations • Massive data transfer from the previous system takes a long time and much effort • In the worst case, data could be lost…

  5. Batch server • Xeon, ~3000 cores, Scientific Linux 5 • Grid and local jobs are all managed by the LSF batch scheduler (a submission sketch follows) • [Diagram: Grid jobs arrive through two CREAM Computing Elements (CE) plus WMS and LB services; local jobs are submitted directly; all jobs are dispatched to LSF]
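For illustration, here is a minimal sketch of submitting a local job to LSF from Python. bsub and its -q/-J/-o flags are standard LSF; the queue name, job name, and command are placeholders, not KEKCC's actual configuration:

```python
# Sketch: submit a local job to the LSF batch scheduler.
# "short" is a hypothetical queue name; %J expands to the LSF job ID.
import subprocess

cmd = ["bsub",
       "-q", "short",            # target queue (placeholder)
       "-J", "demo_job",         # job name
       "-o", "demo.%J.out",      # stdout/stderr log file
       "/bin/echo", "hello from LSF"]
subprocess.check_call(cmd)
```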

  6. CPU Utilization

  7. Inefficient use of CPU • Only up to 80% CPU utilization, even though job slots are almost full • I/O-bound jobs: some Belle jobs use index files and are inefficient • Misuse by end users • Software bugs (which should be checked before submitting many jobs) • Very short jobs (less than a few minutes), for which the overhead of the batch scheduler (and the Grid) becomes higher • We have to monitor such jobs and guide the offending users on a daily basis (see the sketch below) • Grid users are not many so far, but user training is definitely needed
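A minimal sketch of how such monitoring could be automated. The CSV format, file name, and thresholds are hypothetical, not KEK's actual tooling; real input would be exported from the LSF accounting tools (e.g. bacct):

```python
# Sketch: flag users who submit many very short batch jobs.
# Assumes accounting records exported to a CSV with columns
# user,job_id,runtime_seconds (hypothetical format).
import csv
from collections import Counter

SHORT_JOB_SECONDS = 120   # "very short" cutoff: a couple of minutes
MAX_SHORT_JOBS = 100      # flag users above this count

def flag_short_job_users(accounting_csv):
    short_jobs = Counter()
    with open(accounting_csv) as f:
        for row in csv.DictReader(f):
            if float(row["runtime_seconds"]) < SHORT_JOB_SECONDS:
                short_jobs[row["user"]] += 1
    return [(user, n) for user, n in short_jobs.most_common()
            if n > MAX_SHORT_JOBS]

if __name__ == "__main__":
    for user, n in flag_short_job_users("bacct_export.csv"):
        print("%s: %d jobs shorter than %d s" % (user, n, SHORT_JOB_SECONDS))
```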

  8. CPU utilization for Grid (Belle II MC production)

  9. Storage System (HSM) • [Diagram: Storage Element (SE) with StoRM and DPM frontends/heads and multiple GridFTP servers in front of a 2 PB disk cache (GPFS)] • GHI (GPFS-HPSS Interface) as a backend for the GridFTP servers • 16 PB capacity (HPSS) • For the Grid: 600 TB for Belle, 350 TB for ILC, 80 TB for others (a data-access sketch follows)
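As an illustration of how a Grid user would write data to such an SE, here is a sketch using the gfal2 Python bindings. The SRM endpoint and paths are placeholders, not the actual KEK addresses, and parameter names may vary with the installed gfal2 version:

```python
# Sketch: copy a local file to a Grid Storage Element via gfal2.
# Requires the gfal2 Python bindings and a valid Grid proxy.
import gfal2

ctx = gfal2.creat_context()          # note the spelling: creat_context
params = ctx.transfer_parameters()
params.overwrite = True              # replace an existing replica if present

src = "file:///home/user/sample.dat"
dst = "srm://se.example.org/ilc/user/sample.dat"   # hypothetical endpoint
ctx.filecopy(params, src, dst)
print("copy finished: " + dst)
```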

  10. Grid at KEK • KEK is not involved in WLCG • But we have deployed gLite and NAREGI middleware for several years • ILC and Belle (II) are the main Grid users at KEK • VO Management Service • Belle II: KEK • ILC (ILD + SiD): DESY • DIRAC • Middleware for accessing distributed resources • Grid, Cloud, and local resources • Originally developed by and for LHCb • Now used by Belle and ILC • Needs customization for each computing model (a job-submission sketch follows)
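For illustration, job submission through the DIRAC Python API looks roughly like this. This is a sketch following DIRAC's public client API; it assumes a configured DIRAC client and a valid proxy, and the executable and arguments are placeholders:

```python
# Sketch: submit a trivial job through the DIRAC API.
from DIRAC.Core.Base import Script
Script.parseCommandLine()            # initialize the DIRAC environment

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName("ilc_demo")
job.setExecutable("/bin/echo", arguments="hello from DIRAC")

result = Dirac().submitJob(job)      # returns an S_OK/S_ERROR style dict
print(result)
```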

  11. [Figure slide: T. Hara (KEK)]

  12. [Figure slide: T. Hara (KEK)]

  13. Preparation for the Belle II Grid • KEK cannot afford the huge resources needed in the next few years at the current level of budget • Technology evolution is not fast enough • Perhaps a similar situation at other Grid sites • Data migration to a new system will be more difficult • Human resources are not sufficient on either the computing or the experiment side • The current service quality is not sufficient for the host lab (i.e. Tier 0) • We need to provide more services • Some tasks can be outsourced, but we still need more lab staff • Preparation for computing started late

  14. ILC case • Smaller amount of data compared to Belle II (in the first several years) • Still similar to the current level of an LHC experiment • More collaborators (and sites) worldwide • All collaborators must be able to access the data equally • Data distribution would be more complex • More effort needed for coordination and monitoring of the distributed computing infrastructure • In Belle II, most software and services rely on WLCG; we should consider how to proceed for ILC.

  15. Worldwide LHC Computing Grid (WLCG) • Supports the 4 LHC experiments • Close collaboration with EGI (European Grid Infrastructure) and OSG (its American counterpart) • EGI and OSG support many fields of science (biology, astrophysics, …), but their future funding is not clear • We spoke with Ian Bird (WLCG Project Leader) in October, and he proposed expanding WLCG to include other HEP experiments (Belle II, ILC, etc.) and other fields • Still being discussed within WLCG

  16. Future directions • Sustainability is a big problem • Future funding is not clear in the EU and US • Maintaining Grid middleware by ourselves is becoming a heavy burden (given the funding outlook) • Try to adopt "standard" software/protocols as much as possible • CERN and many other sites are deploying Cloud • Operational cost should be reduced by streamlining services • Resource demands will not be affordable for WLCG (and Belle II) in the near future • We need a better (more efficient) computing model and software, e.g. better use of many cores (a toy example follows) • Exploit new technology • GPGPU, ARM processors • Collaborate with other fields (and the private sector)
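As a toy illustration of "better use of many cores" (not taken from the talk; real HEP frameworks use their own parallel schedulers), event-level parallelism can be as simple as:

```python
# Toy sketch: spread independent "events" across all available cores.
from multiprocessing import Pool

def process_event(event_id):
    # stand-in for real reconstruction/analysis work
    return sum(i * i for i in range(event_id % 1000))

if __name__ == "__main__":
    with Pool() as pool:              # defaults to one worker per core
        results = pool.map(process_event, range(10000))
    print("processed %d events" % len(results))
```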

  17. Summary • Belle II is a big challenge for KEK • Our first full-scale distributed computing • For ILC, Belle II will be a good exercise, and the lessons learned will be beneficial • It would be good for ILC to collaborate with Belle II • Important to train young students/postdocs who will join ILC in the future • Keep up with technology evolution • Better software reduces the processing resources needed • Educating users is also important • Start preparations early • LHC computing had been considered since ~2000 (more than 10 years before the Higgs discovery)
