
Grid Computing for High Energy Physics in Japan


Presentation Transcript


  1. Grid Computing for High Energy Physics in Japan
  Hiroyuki Matsunaga
  International Center for Elementary Particle Physics (ICEPP), The University of Tokyo
  International Workshop on e-Science for Physics 2008

  2. Major High Energy Physics Program in Japan
  • KEK-B (Tsukuba)
    • Belle
  • J-PARC (Tokai)
    • Japan Proton Accelerator Research Complex
    • Operation will start within this year
    • T2K (Tokai to Kamioka): long-baseline neutrino experiment
  • Kamioka
    • SuperKamiokande
    • KamLAND
  • International collaboration
    • CERN LHC (ATLAS, ALICE)
    • Fermilab Tevatron (CDF)
    • BNL RHIC (PHENIX)

  3. Grid Related Activities
  • ICEPP, University of Tokyo
    • WLCG Tier-2 site for ATLAS
    • Regional Center for the ATLAS-Japan group
  • Hiroshima University
    • WLCG Tier-2 site for ALICE
  • KEK
    • Two EGEE production sites
    • Belle experiment, J-PARC, ILC…
    • University support
    • NAREGI
  • Grid deployment at universities
    • Nagoya U. (Belle), Tsukuba U. (CDF)…
  • Network

  4. Grid Deployment at the University of Tokyo
  • ICEPP, University of Tokyo
    • Involved in international HEP experiments since 1974
    • Has operated a pilot system since 2002
    • The current computer system started working last year
    • Site name: TOKYO-LCG2; gLite 3 installed
  • CC-IN2P3 (Lyon, France) is the associated Tier-1 site within the ATLAS computing model
    • Detector data from CERN go through CC-IN2P3
    • Exceptionally long distance for a T1-T2 pair: RTT ~280 ms, ~10 hops
    • A challenge for efficient data transfer (see the sketch below)
    • The data catalog for the files in Tokyo is located at Lyon
  • ASGC (Taiwan) could become an additional associated Tier-1
    • Geographically the nearest Tier-1 (RTT ~32 ms)
    • Operations have been supported by ASGC
    • Neighboring time zone
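  To make the long-RTT challenge concrete, a rough bandwidth-delay estimate can be derived from the numbers quoted on these slides. The Python sketch below is illustrative arithmetic only, assuming the 10 Gbps path, ~280 ms RTT and 8 Mbyte TCP window stated in the talk; it is not part of the original material.

```python
# Bandwidth-delay product (BDP) estimate for the Lyon-Tokyo path.
# Figures taken from the slides: 10 Gbps path, ~280 ms RTT, 8 Mbyte TCP window.
# Illustrative arithmetic only, not part of the original talk.

link_capacity_bps = 10e9   # 10 Gbps international path
rtt_s = 0.280              # ~280 ms round-trip time

bdp_bytes = link_capacity_bps / 8 * rtt_s
print(f"BDP of the path: ~{bdp_bytes / 1e6:.0f} Mbytes")              # ~350 Mbytes

window_bytes = 8 * 1024**2                                            # 8 Mbyte TCP window
per_stream_bps = window_bytes * 8 / rtt_s
print(f"Single-stream ceiling: ~{per_stream_bps / 1e6:.0f} Mbit/s")   # ~240 Mbit/s
# A single TCP stream can therefore fill only a small fraction of the path,
# which is why parallel streams and many concurrent file transfers are used.
```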

  5. Hardware resources
  • Tier-2 site plus a (non-grid) regional center facility
    • Supports local user analysis by the ATLAS-Japan group
  • Blade servers: 650 nodes (2,600 cores)
  • Disk arrays: 140 boxes (~6 TB/box), 4 Gb Fibre Channel
  • File servers: each attaches 5 disk arrays, 10 GbE NIC
  • Tape robot (LTO3): 8,000 tapes, 32 drives
  [Photos: blade servers, disk arrays, tape robot]

  6. SINET3
  • SINET3 (Japanese NREN)
    • Third generation of SINET, in service since Apr. 2007
    • Provided by NII (National Institute of Informatics)
  • Backbone: up to 40 Gbps
    • Major universities connect at 1-10 Gbps
    • 10 Gbps to the Tokyo regional center
  • International links
    • 2 x 10 Gbps to the US
    • 2 x 622 Mbps to Asia

  7. International Link
  • 10 Gbps between Tokyo and CC-IN2P3
    • SINET3 + GEANT + RENATER (French NREN)
    • Public network (shared with other traffic)
  • 1 Gbps link to ASGC (to be upgraded to 2.4 Gbps)
  [Map: network path Tokyo - New York - Lyon over SINET3 (10 Gbps), GEANT (10 Gbps) and RENATER (10 Gbps), plus the Tokyo - Taipei link]

  8. Network test with Iperf
  • Memory-to-memory tests performed with the iperf program (see the sketch below)
    • Linux boxes dedicated to the iperf tests at both ends
    • Limited to 1 Gbps by the NIC
    • Linux kernel 2.6.9 (BIC TCP)
    • Window size 8 Mbytes, 8 parallel streams
  • For Lyon-Tokyo: long congestion-window recovery time due to the long RTT
  [Plots: Taipei <-> Tokyo (RTT: 32 ms) and Lyon <-> Tokyo (RTT: 280 ms)]
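  As a concrete illustration, a test along these lines can be driven with standard iperf (v2) client options; the host name below is a placeholder, the duration and reporting interval are arbitrary choices for the sketch, and this is not the exact command line used in the talk.

```python
import subprocess

# Sketch of the memory-to-memory test described on this slide, using
# standard iperf (v2) client options. The far end is assumed to run "iperf -s".
SERVER = "iperf-server.example.org"   # hypothetical far-end host

cmd = [
    "iperf",
    "-c", SERVER,   # client mode, connect to the far end
    "-w", "8M",     # 8 Mbyte TCP window, as quoted on the slide
    "-P", "8",      # 8 parallel streams, as quoted on the slide
    "-t", "60",     # run for 60 seconds (arbitrary choice for this sketch)
    "-i", "10",     # print an intermediate report every 10 seconds
]
print("Running:", " ".join(cmd))
subprocess.run(cmd, check=True)
```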

  9. Data Transfer from the Lyon Tier-1 center
  • Data transferred from Lyon to Tokyo
    • Used the Storage Elements in production
    • ATLAS MC simulation data
  • Storage Elements
    • Lyon: dCache (>30 gridFTP servers, Solaris, ZFS)
    • Tokyo: DPM (6 gridFTP servers, Linux, XFS)
  • FTS (File Transfer Service)
    • Main tool for bulk data transfer
    • Executes multiple file transfers concurrently, using gridFTP (see the sketch below)
    • Sets the number of streams for gridFTP
    • Used in the ATLAS Distributed Data Management system
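  FTS is configured and driven through its own services and CLI, which are not shown here. Purely to illustrate the concurrency model it implements (N files in flight at once, each moved by a gridFTP transfer with several parallel streams), the sketch below drives globus-url-copy from Python; the endpoints and file list are placeholders, and this is not how the production transfers were actually submitted.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Illustration of the FTS-style concurrency model: several files in flight
# at once, each moved by gridFTP with multiple parallel data streams.
# The URLs below are placeholders, not the real Lyon/Tokyo endpoints.
SRC_BASE = "gsiftp://se.example.fr/pnfs/atlas/mc"    # hypothetical source SE
DST_BASE = "gsiftp://dpm.example.jp/dpm/atlas/mc"    # hypothetical destination SE
FILES = [f"file{i:04d}.root" for i in range(100)]    # dummy file list

CONCURRENT_FILES = 20   # "20 files in parallel" (slide 10)
STREAMS_PER_FILE = 10   # "10 streams each" (slide 10)

def copy_one(name: str) -> int:
    """Transfer one file with globus-url-copy using parallel streams."""
    cmd = [
        "globus-url-copy",
        "-p", str(STREAMS_PER_FILE),   # parallel data streams for this file
        f"{SRC_BASE}/{name}",
        f"{DST_BASE}/{name}",
    ]
    return subprocess.run(cmd).returncode

with ThreadPoolExecutor(max_workers=CONCURRENT_FILES) as pool:
    results = list(pool.map(copy_one, FILES))

print(f"{results.count(0)}/{len(FILES)} files transferred successfully")
```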

  10. Performance of data transfer
  • >500 Mbytes/s aggregate throughput observed in May 2008
    • File size: 3.5 Gbytes
    • 20 files in parallel, 10 streams each
    • ~40 Mbytes/s for each file transfer
    • Low activity at CC-IN2P3 (other than ours) during the period
  [Plots: aggregate throughput (peaking around 500 Mbytes/s) and throughput per file transfer (Mbytes/s)]
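  For orientation, the per-file and aggregate figures quoted on this slide are consistent with each other; the small calculation below is illustrative arithmetic only and is not part of the original talk.

```python
# Quick consistency check of the figures quoted on this slide
# (illustrative arithmetic only, not from the original talk).
concurrent_files = 20        # files in flight at once
per_file_rate_mb_s = 40      # ~40 Mbytes/s per file transfer

upper_bound = concurrent_files * per_file_rate_mb_s
print(f"Aggregate upper bound: {upper_bound} Mbytes/s")   # 800 Mbytes/s
# The observed peak of >500 Mbytes/s sits below this bound, as expected,
# since not all 20 transfer slots stay busy at full speed all the time.

file_size_gb = 3.5
seconds_per_file = file_size_gb * 1024 / per_file_rate_mb_s
print(f"~{seconds_per_file:.0f} s per 3.5 Gbyte file at 40 Mbytes/s")  # ~90 s
```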

  11. Data transfer between ASGC and Tokyo
  • Transferred 1,000 files per test (1 Gbyte file size)
  • Tried various numbers of concurrent files / streams per file, from 4/1 to 25/15
  • Saturates the 1 Gbps WAN bandwidth
  [Plots: throughput for Tokyo -> ASGC and ASGC -> Tokyo at various concurrent-files/streams settings (4/1, 4/2, 4/4, 8/1, 8/2, 16/1, 20/10, 25/10, 25/15)]

  12. CPU Usage in the last year (Sep 2007 - Aug 2008)
  • 3,253,321 kSI2k*hours of CPU time in the last year
  • Most jobs are ATLAS MC simulation
    • Job submission is coordinated by CC-IN2P3 (the associated Tier-1)
    • Outputs are uploaded to the data storage at CC-IN2P3
  • Large contribution to ATLAS MC production
  [Charts: TOKYO-LCG2 CPU time per month; CPU time at large Tier-2 sites]
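  As a rough illustration of scale (not stated in the talk), the annual total can be converted into an average delivered capacity:

```python
# Average delivered capacity implied by the annual total on this slide
# (illustrative arithmetic only; the talk quotes only the yearly sum).
total_ksi2k_hours = 3_253_321
hours_per_year = 365 * 24                      # 8760 hours

avg_ksi2k = total_ksi2k_hours / hours_per_year
print(f"Average delivered power: ~{avg_ksi2k:.0f} kSI2k")   # ~371 kSI2k
```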

  13. ALICE Tier-2 center at Hiroshima University
  • WLCG/EGEE site “JP-HIROSHIMA-WLCG”
  • Possible Tier-2 site for ALICE

  14. Status at Hiroshima
  • Just became an EGEE production site (Aug. 2008)
  • The associated Tier-1 site will likely be CC-IN2P3
    • No ALICE Tier-1 in the Asia-Pacific region
  • Resources
    • 568 CPU cores
      • Dual-core Xeon (3 GHz) x 2 CPUs x 38 boxes
      • Quad-core Xeon (2.6 GHz) x 2 CPUs x 32 boxes
      • Quad-core Xeon (3 GHz) x 2 CPUs x 20 blades
    • Storage: ~200 TB next year
    • Network: 1 Gbps, on SINET3

  15. KEK
  • The Belle experiment has been running
    • Needs access to existing petabytes of data
  • Site operations
    • KEK does not support any LHC experiment
    • Aims to gain experience by operating sites, in preparation for a future Tier-1-level Grid center
  • University support
  • NAREGI
  [Photo: KEK Tsukuba campus with Mt. Tsukuba, the Belle experiment, KEKB and the Linac]

  16. Grid Deployment at KEK
  • Two EGEE sites
    • JP-KEK-CRC-1: rather experimental use and R&D
    • JP-KEK-CRC-2: more stable services
  • NAREGI
    • Beta version used for testing and evaluation
  • Supported VOs
    • belle (main target at present), ilc, calice, …
    • LCG VOs are not supported
  • VOMS operation (see the sketch below)
    • belle (registered in CIC)
    • ppj (accelerator science in Japan), naokek
    • g4med, apdg, atlasj, ail
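  For a VO member, day-to-day use of such a VOMS service looks roughly like the sketch below. The commands (voms-proxy-init, voms-proxy-info) are the standard VOMS client tools; the VO name is taken from the slide, but the client-side configuration for the KEK-hosted VOMS server (vomses files, trust anchors) is assumed and not shown.

```python
import subprocess

# Sketch: obtain and inspect a VOMS proxy for the "belle" VO using the
# standard VOMS client tools. Assumes the VOMS clients are installed and
# already configured for the KEK VOMS server; illustration only.

# Create a proxy carrying a "belle" VO attribute certificate.
subprocess.run(["voms-proxy-init", "--voms", "belle"], check=True)

# Show the proxy details, including the VO attributes (FQANs).
subprocess.run(["voms-proxy-info", "--all"], check=True)
```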

  17. Belle VO
  • Federation established
    • 5 countries, 7 institutes, 10 sites
    • Nagoya Univ., Univ. of Melbourne, ASGC, NCU, CYFRONET, Korea Univ., KEK
    • VOMS is provided by KEK
  • Activities
    • Submit MC production jobs
    • Functional and performance tests
    • Interface to the existing petabytes of data

  18. [Slide content not transcribed; credit: Takashi Sasaki (KEK)]

  19. ppj VO
  • Federated among major universities and KEK
    • Tohoku U. (ILC, KamLAND)
    • U. of Tsukuba (CDF)
    • Nagoya U. (Belle, ATLAS)
    • Kobe U. (ILC, ATLAS)
    • Hiroshima IT (ATLAS, computing science)
  • Common VO for accelerator science in Japan
    • Not tied to specific projects; resources are shared
  • KEK acts as the GOC
    • Remote installation
    • Monitoring, based on Nagios and a wiki
    • Software updates

  20. KEK Grid CA
  • In operation since Jan. 2006
  • Accredited as an IGTF (International Grid Trust Federation) compliant CA
  [Chart: number of issued certificates]

  21. NAREGI
  • NAREGI: NAtional REsearch Grid Initiative
    • Host institute: National Institute of Informatics (NII)
  • R&D of Grid middleware for research and industrial applications
    • Main targets are nanotechnology and biotechnology
    • More focused on the computing grid; the data grid part was integrated later
  • Ver. 1.0 of the middleware released in May 2008
    • Software maintenance and user support services will be continued

  22. NAREGI at KEK
  • NAREGI beta versions installed on the testbed
    • 1.0.1: Jun. 2006 - Nov. 2006 (manual installation for all the steps)
    • 1.0.2: Feb. 2007
    • 2.0.0: Oct. 2007 (apt-rpm installation)
    • 2.0.1: Dec. 2007
  • Site federation tests
    • KEK - NAREGI/NII: Oct. 2007
    • KEK - National Astronomical Observatory (NAO): Mar. 2008
  • Evaluation of the NAREGI application environment
    • Job submission/retrieval, remote data stage-in/out

  23. [Slide content not transcribed; credit: Takashi Sasaki (KEK)]

  24. Data Storage: Gfarm
  • Gfarm: a distributed file system
    • The DataGrid part of NAREGI
    • Data are stored on multiple disk servers
  • Tests performed
    • Stage-in and stage-out to the Gfarm storage
    • GridFTP interface: transfers between a gLite site and a NAREGI site
    • File access from applications
      • Accessed via FUSE (Filesystem in Userspace), without the need to change the application program (see the sketch below)
      • I/O speed is several times slower than a local disk
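  The point of the FUSE approach is that an unmodified application simply sees Gfarm as a local path. Assuming the file system has been mounted with the gfarm2fs FUSE client at a placeholder mount point, ordinary POSIX-style file I/O works unchanged, as the sketch below illustrates (the mount point and file path are hypothetical).

```python
# Once Gfarm is mounted through FUSE (e.g. with the gfarm2fs client),
# an unmodified application reads it like any local directory.
# The mount point and file path below are placeholders for this sketch.
import hashlib

GFARM_MOUNT = "/gfarm"                        # hypothetical FUSE mount point
path = f"{GFARM_MOUNT}/belle/mc/sample.dat"   # hypothetical file on Gfarm

# Plain file I/O; no Gfarm-specific API calls are needed, which is exactly
# why no changes to the application program are required.
digest = hashlib.md5()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1024 * 1024), b""):
        digest.update(chunk)
print(path, digest.hexdigest())
```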

  25. Future Plans for NAREGI at KEK
  • Migration to the production version
  • Tests of interoperability with gLite
  • Improving the middleware in the application domain
    • Development of a new API for applications
    • Virtualization of the middleware for script languages (to be used at the web portal as well)
  • Monitoring: jobs, sites, …

  26. Summary
  • WLCG
    • ATLAS Tier-2 at Tokyo: stable operation
    • ALICE Tier-2 at Hiroshima: just started production operation
  • Coordinated effort led by KEK
    • Site operations with the gLite and NAREGI middleware
    • Belle VO: SRB, to be replaced with iRODS
    • ppj VO: deployment at universities, supported and monitored by KEK
  • NAREGI
    • R&D, interoperability
