Site Report from KEK, Japan: JP-KEK-CRC-01 and JP-KEK-CRC-02
Go Iwai, KEK/CRC
Grid Operations Workshop 2007, Kungliga Tekniska högskolan (KTH), Stockholm, Sweden, 13-15 June 2007
JP-KEK-CRC-01 and JP-KEK-CRC-02 Deployment status at KEK Grid Operations Workshop at KTH, Stockholm
Logical Site Overview (scoped only for grids)
[Figure: network diagram split into the KEK external network and the KEK internal network, separated by the KEK firewall. External labels: APAN, SuperSINET, Taiwan, Asia-Pacific region, domestic institutes, U.S.A. Internal labels: Grid LAN, Central Computing System (new KEK-CC, HPSS), JP-KEK-CRC-00, JP-KEK-CRC-01, JP-KEK-CRC-02. Site annotations: "Production System", "Not for WLCG", "Staff's training", "Will shift to PPS".]
Physical Site Overview
[Figure: physical layout of KEK-1 and KEK-2]
Brief Summary of LCG Deployment
JP-KEK-CRC-01
• In operation since Nov. 2005.
• Registered with the GOC and ready for WLCG.
• Operated by KEK staff.
• Site role:
  • Practice for the production system JP-KEK-CRC-02.
  • Test use among university groups in Japan.
• Resources and components:
  • SL 3.0.5 with gLite 3.0 or later
  • CPU: 14, storage: ~1.5 TB
  • FTS, FTA, RB, MON, BDII, LFC, CE, SE
• Supported VOs: belle, apdg, g4med, ppj, dteam, ops, calice, ilc and ail
JP-KEK-CRC-02
• In operation since early 2006.
• Registered with the GOC and ready for WLCG.
• Site role: more stable services, based on the KEK-1 experience.
• Resources and components:
  • SL or SLC with gLite 3.0 or later
  • CPU: 48, storage: ~1 TB (without HPSS)
  • Full set of components
• Supported VOs: belle, apdg, g4med, atlasj, ppj, ilc, calice, dteam, ops and ail
Grid-Related Services
• Our own Grid CA
  • Started in Feb. 2006 and recognized by LCG.
  • Accredited by the APGRID PMA.
  • http://gridca.kek.jp/
• VO Membership Service
  • Supported VOs:
    • apdg: the VO for the Asia-Pacific Data Grid.
    • belle: the VO for the Belle experiment.
    • atlasj: the VO for the ATLAS experiment in Japan.
    • g4med: the VO for Geant4 medical applications.
    • ppj: the VO for particle physics in Japan.
    • ail: the VO for the Associated International Laboratory between Japan and France.
  • http://voms.kek.jp/
• Local Mirror Service
  • Mirrors SL, SLC, LCG and gLite.
  • An apt-get update takes ~30 minutes against the CERN or FNAL repositories.
  • It takes ~3 minutes against the KEK repository.
  • http://hepdg.cc.kek.jp/mirror/
• Semi-automatic Installation Service
  • WNs can be installed semi-automatically using PXE (Preboot eXecution Environment) and a kickstart configuration file.
  • http://hepdg.cc.kek.jp/install/
• Site Portal
  • http://grid.kek.jp/
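The slide does not show how the PXE and kickstart installation of WNs is configured, so the following is only a minimal sketch of one common way to do it with PXELINUX, not KEK's actual setup. It relies on two documented conventions: PXELINUX looks for a per-host file under `pxelinux.cfg/` named `01-` plus the MAC address in lowercase with dashes, and the installer accepts a `ks=` boot parameter pointing at the kickstart file. The function names, the kernel and initrd file names, and the kickstart URL are all assumptions made for illustration.

```python
import re

_OCTET = re.compile(r"[0-9a-f]{2}")


def pxelinux_config_name(mac: str) -> str:
    """Return the per-host file name PXELINUX searches for under pxelinux.cfg/.

    The name is "01-" (the Ethernet hardware type) followed by the MAC
    address in lowercase hex with dashes between the octets.
    """
    octets = mac.strip().lower().replace("-", ":").split(":")
    if len(octets) != 6 or not all(_OCTET.fullmatch(o) for o in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    return "01-" + "-".join(octets)


def pxelinux_config_body(kickstart_url: str,
                         kernel: str = "vmlinuz",
                         initrd: str = "initrd.img") -> str:
    """Build a PXELINUX entry that boots the installer with a kickstart file.

    kernel and initrd are placeholder file names; kickstart_url is whatever
    URL the site serves its kickstart file from (hypothetical here).
    """
    lines = [
        "default install",
        "label install",
        f"  kernel {kernel}",
        f"  append initrd={initrd} ks={kickstart_url}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical worker node MAC address and kickstart location.
    name = pxelinux_config_name("00:1A:2B:3C:4D:5E")
    body = pxelinux_config_body("http://example.org/ks/wn.cfg")
    print(f"pxelinux.cfg/{name}")
    print(body, end="")
```

Generating one such file per worker node is what makes the procedure "semi-automatic": the operator still has to register each MAC address and trigger the network boot by hand.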
People on Grid at KEK/CRC
• 7 people in total
• CA: T. Sasaki and Y. Iida
• VOMS: Y. Watase and G. Iwai
• Site Operation and Security
  • KEK-0: G. Iwai
  • KEK-1: T. Sasaki, Y. Iida, Y. Watase and G. Iwai
  • KEK-2: T. Sasaki, Y. Watase and G. Iwai
• Deployment: Y. Watase, Y. Iida and G. Iwai
• Documentation: Y. Watase
• Networking: S. Suzuki, S. Yashiro and Y. Iida
• Application (SRB, Portal and some Gridified applications): K. Murakami, Y. Iida and G. Iwai
Operation statistics
Submitted GGUS Tickets in JFY2006
• Total number of submitted tickets: 28
  • KEK-1: 11
  • KEK-2: 17
Number of Submitted Jobs in JFY2006
[Figure: charts for JP-KEK-CRC-01 and JP-KEK-CRC-02]
Normalized CPU Time in JFY2006 (kSI2K·hours)
[Figure: charts for JP-KEK-CRC-01 and JP-KEK-CRC-02]
Virtual organizations: Belle Experiment and Accelerator Science
VO for the Belle Experiment
• The Belle VO is federated among 4 countries, 6 institutes and 9 sites.
  • Japan: Nagoya University and KEK
  • Taiwan: ASGC and NCU
  • Australia: University of Melbourne
  • Poland: CYFRONET
  • Korea University will join soon.
• Belle has started using SRB and LCG.
• Data distribution service using SRB-DSI
  • Belle already has a few PB of data in total, including hundreds of TB of DST and MC data.
  • Bulk file registration with Sregister helps us: we do not move any of the files.
  • Physically exporting the existing data to LCG would be too difficult.
  • This benefits both native SRB users and LCG users.
  • SRB-DSI with LCG is in operation now.
[Figure: map of Belle VO sites: KEK (Japan), CYFRONET (Poland), Nagoya Univ. (Japan), ASGC (Taiwan), NCU (Taiwan), Melbourne Univ. (Australia)]
VO for the Accelerator Science
• Domestic support
  • Typical case at a laboratory: a few staff members, ~10 students and no technician.
  • We have started to monitor these sites centrally through the VO.
• The PPJ VO was started for accelerator science in Japan.
  • Federated among a few universities: Tohoku Univ., Tsukuba Univ., Kobe Univ., Hiroshima Univ., Nagoya Univ. and KEK.
  • Usage: to share resources and experience among the major groups (ILC, KamLAND, CDF and ATLAS) independently of the individual experimental projects.
[Figure label: Hiroshima IT]
Conclusion
• Tools used in daily grid operations
  • Semi-automatic installation tools exist only for WNs.
  • Most of the tools are handmade scripts.
  • Monitoring tools such as SAM and GSTAT are very useful.
  • GGUS Search and APWIKI are also useful.
  • We are testing security audits with nCircle, a vulnerability management system.
• Scheduled interventions
  • 11 times in JFY2006
  • Due to:
    • Software/hardware upgrades and site reconfiguration
    • Annual maintenance
    • Replacement of host certificates
• Unscheduled interventions
  • ~10 times/year
  • Examples: a failed site reconfiguration, or a power cut caused by a thunderstorm.
• Domestic support in Japan
  • An important mission for KEK.
• ~90% of problems are detected by the COD, SAM, GSTAT and Nagios.
• Our Grid operation is supported by the great efforts of the APROC members at ASGC, Taiwan.
  • We would like to maintain a close collaboration with ASGC.
Thank you
LCG with SRB at Belle VO
[Figure: network diagram. Labels: pluggable extension SRB-DSI, GridFTP, SRB, SRB/MCAT, HSM, NFS, APAN, SuperSINET, KEK firewall, KEK-DMZ, KEK-CC, Grid LAN, KEK-1, KEK-2, KEK-FB, B-NET, "New built". Subnets shown: 130.87.104.0/22, 130.87.208.0/22, 202.13.197.0/24, 130.87.224.0/21, 172.22.28.0/24.]
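When debugging transfers across these network segments, it can help to check which of the listed subnets a given host address falls in. The sketch below uses Python's standard `ipaddress` module with the five CIDR blocks that appear on the diagram. Because the flattened slide does not show reliably which label belongs to which block, the subnets are identified by their CIDR notation only; the function name and the sample addresses are illustrative assumptions.

```python
import ipaddress

# CIDR blocks as they appear on the diagram. Which segment name goes
# with which block is not recoverable from the slide text.
SUBNETS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "130.87.104.0/22",
        "130.87.208.0/22",
        "202.13.197.0/24",
        "130.87.224.0/21",
        "172.22.28.0/24",
    )
]


def find_subnet(address: str):
    """Return the CIDR block from the diagram that contains address, or None."""
    ip = ipaddress.ip_address(address)
    for net in SUBNETS:
        if ip in net:
            return str(net)
    return None


if __name__ == "__main__":
    for net in SUBNETS:
        # num_addresses is the block size; is_private flags reserved ranges
        # such as RFC 1918 space.
        print(net, net.num_addresses, "private" if net.is_private else "public")

    # Hypothetical host addresses, chosen only to exercise the lookup.
    for host in ("130.87.209.5", "172.22.28.10", "130.87.232.0"):
        print(host, "->", find_subnet(host))
```

Of the five blocks, only 172.22.28.0/24 lies in private (RFC 1918) address space, so hosts on that segment are not directly reachable from outside the site.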
Points to Cover in Each Presentation
• Tools used in daily grid operations
• What features are missing to make your work easier
• Examples of the most frequent scheduled interventions at your site
• Examples of the most frequent unscheduled interventions at your site
• Points to improve in communication with the ROC, other sites, VOs, the rest of the world...
• How do you plan deployment of updates/new versions so continuous production is not interrupted?
• Communication with users: how are you informed about operational problems at your site reported by local/remote users? Mail/GGUS/phone/other?
• Correlation of cross-site issues: is the operations meeting enough for this? How do you do it otherwise?
• What percentage of real site problems are detected and reported by the COD before you know about them?
• Usefulness of the following operations bodies/meetings and suggestions to improve them:
  • COD
  • Your ROC support team
  • Operations meeting