
  1. Computing for ILC experiment Computing Research Center, KEK Hiroyuki Matsunaga

  2. Outline • KEKCC status • Grid and distributed computing • Belle II and ILC • Prospects for the future • Summary • This talk focuses on data analysis. Common services (such as web and email) are not covered.

  3. Computing system at KEK (KEKCC) • In operation since April 2012 • The only large system for data analysis at KEK • The previous Belle system was merged into it • KEKCC resources are shared among all of KEK's projects • Belle / Belle II is the main user for the next several years • Grid (and Cloud) services have been integrated into the current KEKCC • No Cloud users so far

  4. KEKCC replacement • The whole system has to be replaced with a new one under a new lease contract • Every ~3 years (next replacement in summer 2015) • This leads to many problems • The system is not available during replacement and commissioning • More than one month for the data storage system • Replicas at other sites would help • The system is likely to be unstable just after the start of operations • Massive data transfer from the previous system takes a long time and much effort • In the worst case, data could be lost…

  5. Batch server • Xeon, ~3000 cores, Scientific Linux 5 • Grid and local jobs are all managed by the LSF batch scheduler (a submission sketch follows) • [Diagram: Grid jobs arrive through two CREAM Computing Elements (CE) plus WMS and LB services; local jobs are submitted directly; all jobs are dispatched to LSF]
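For illustration, here is a minimal sketch of submitting a local job to LSF from Python. bsub and its -q/-J/-o flags are standard LSF; the queue name, job name, and command are placeholders, not KEKCC's actual configuration:

```python
# Sketch: submit a local job to the LSF batch scheduler.
# "short" is a hypothetical queue name; %J expands to the LSF job ID.
import subprocess

cmd = ["bsub",
       "-q", "short",            # target queue (placeholder)
       "-J", "demo_job",         # job name
       "-o", "demo.%J.out",      # stdout/stderr log file
       "/bin/echo", "hello from LSF"]
subprocess.check_call(cmd)
```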

  6. CPU Utilization

  7. Inefficient use of CPU • Only up to 80% CPU utilization, even though job slots are almost full • I/O-bound jobs: some Belle jobs use index files and are inefficient • Misuse by end users • Software bugs (which should be checked before submitting many jobs) • Very short jobs (less than a few minutes), for which the overhead of the batch scheduler (and the Grid) becomes higher • We have to monitor such jobs and guide the offending users on a daily basis (see the sketch below) • Grid users are not many so far, but user training is definitely needed
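A minimal sketch of how such monitoring could be automated. The CSV format, file name, and thresholds are hypothetical, not KEK's actual tooling; real input would be exported from the LSF accounting tools (e.g. bacct):

```python
# Sketch: flag users who submit many very short batch jobs.
# Assumes accounting records exported to a CSV with columns
# user,job_id,runtime_seconds (hypothetical format).
import csv
from collections import Counter

SHORT_JOB_SECONDS = 120   # "very short" cutoff: a couple of minutes
MAX_SHORT_JOBS = 100      # flag users above this count

def flag_short_job_users(accounting_csv):
    short_jobs = Counter()
    with open(accounting_csv) as f:
        for row in csv.DictReader(f):
            if float(row["runtime_seconds"]) < SHORT_JOB_SECONDS:
                short_jobs[row["user"]] += 1
    return [(user, n) for user, n in short_jobs.most_common()
            if n > MAX_SHORT_JOBS]

if __name__ == "__main__":
    for user, n in flag_short_job_users("bacct_export.csv"):
        print("%s: %d jobs shorter than %d s" % (user, n, SHORT_JOB_SECONDS))
```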

  8. CPU utilization for Grid (Belle II MC production)

  9. Storage System (HSM) • [Diagram: Storage Element (SE) with StoRM and DPM frontends/heads and multiple GridFTP servers in front of a 2 PB disk cache (GPFS)] • GHI (GPFS-HPSS Interface) as a backend for the GridFTP servers • 16 PB capacity (HPSS) • For the Grid: 600 TB for Belle, 350 TB for ILC, 80 TB for others (a data-access sketch follows)
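As an illustration of how a Grid user would write data to such an SE, here is a sketch using the gfal2 Python bindings. The SRM endpoint and paths are placeholders, not the actual KEK addresses, and parameter names may vary with the installed gfal2 version:

```python
# Sketch: copy a local file to a Grid Storage Element via gfal2.
# Requires the gfal2 Python bindings and a valid Grid proxy.
import gfal2

ctx = gfal2.creat_context()          # note the spelling: creat_context
params = ctx.transfer_parameters()
params.overwrite = True              # replace an existing replica if present

src = "file:///home/user/sample.dat"
dst = "srm://se.example.org/ilc/user/sample.dat"   # hypothetical endpoint
ctx.filecopy(params, src, dst)
print("copy finished: " + dst)
```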

  10. Grid at KEK • KEK is not involved in WLCG • But we have deployed gLite and NAREGI middleware for several years • ILC and Belle (II) are the main Grid users at KEK • VO Management Service • Belle II: KEK • ILC (ILD + SiD): DESY • DIRAC • Middleware for accessing distributed resources • Grid, Cloud, and local resources • Originally developed by and for LHCb • Now used by Belle and ILC • Needs customization for each computing model (a job-submission sketch follows)
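For illustration, job submission through the DIRAC Python API looks roughly like this. This is a sketch following DIRAC's public client API; it assumes a configured DIRAC client and a valid proxy, and the executable and arguments are placeholders:

```python
# Sketch: submit a trivial job through the DIRAC API.
from DIRAC.Core.Base import Script
Script.parseCommandLine()            # initialize the DIRAC environment

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName("ilc_demo")
job.setExecutable("/bin/echo", arguments="hello from DIRAC")

result = Dirac().submitJob(job)      # returns an S_OK/S_ERROR style dict
print(result)
```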

  11. [Figure slide: T. Hara (KEK)]

  12. [Figure slide: T. Hara (KEK)]

  13. Preparation for the Belle II Grid • KEK cannot afford the huge resources needed in the next few years at the current level of budget • Technology evolution is not fast enough • Perhaps a similar situation at other Grid sites • Data migration to a new system will be more difficult • Human resources are not sufficient on either the computing or the experiment side • The current service quality is not sufficient for the host lab (i.e. Tier 0) • We need to provide more services • Some tasks can be outsourced, but we still need more lab staff • Preparation for computing started late

  14. ILC case • Smaller amount of data compared to Belle II (in the first several years) • Still similar to the current level of an LHC experiment • More collaborators (and sites) worldwide • All collaborators must be able to access the data equally • Data distribution would be more complex • More effort needed for coordination and monitoring of the distributed computing infrastructure • In Belle II, most software and services rely on WLCG; we should consider how to proceed for ILC.

  15. Worldwide LHC Computing Grid (WLCG) • Supports the 4 LHC experiments • Close collaboration with EGI (European Grid Infrastructure) and OSG (its American counterpart) • EGI and OSG support many fields of science (biology, astrophysics, …), but their future funding is not clear • We spoke with Ian Bird (WLCG Project Leader) in October, and he proposed expanding WLCG to include other HEP experiments (Belle II, ILC, etc.) and other fields • Still being discussed within WLCG

  16. Future directions • Sustainability is a big problem • Future funding is not clear in the EU and US • Maintaining Grid middleware by ourselves is becoming a heavy burden (given the funding outlook) • Try to adopt "standard" software/protocols as much as possible • CERN and many other sites are deploying Cloud • Operational cost should be reduced by streamlining services • Resource demands will not be affordable for WLCG (and Belle II) in the near future • We need a better (more efficient) computing model and software, e.g. better use of many cores (a toy example follows) • Exploit new technology • GPGPU, ARM processors • Collaborate with other fields (and the private sector)
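As a toy illustration of "better use of many cores" (not taken from the talk; real HEP frameworks use their own parallel schedulers), event-level parallelism can be as simple as:

```python
# Toy sketch: spread independent "events" across all available cores.
from multiprocessing import Pool

def process_event(event_id):
    # stand-in for real reconstruction/analysis work
    return sum(i * i for i in range(event_id % 1000))

if __name__ == "__main__":
    with Pool() as pool:              # defaults to one worker per core
        results = pool.map(process_event, range(10000))
    print("processed %d events" % len(results))
```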

  17. Summary • Belle II is a big challenge for KEK • Our first full-scale distributed computing • For ILC, Belle II will be a good exercise, and the lessons learned will be beneficial • It would be good for ILC to collaborate with Belle II • Important to train young students/postdocs who will join ILC in the future • Keep up with technology evolution • Better software reduces the processing resources needed • Educating users is also important • Start preparations early • LHC computing had been considered since ~2000 (more than 10 years before the Higgs discovery)
