Status Report on Tier-1 in Korea

Status Report on Tier-1 in Korea. Gungwon Kang, Sang-Un Ahn and Hangjin Jang (KISTI GSDC). Korea Institute of Science and Technology Information Global Science experiment Data hub Center. OUTLINE. Computing Resources Operations Network Conclusion. KISTI GSDC Tier-1 Team. ~ 9 people.


Presentation Transcript


  1. Status Report on Tier-1 in Korea. Gungwon Kang, Sang-Un Ahn and Hangjin Jang (KISTI GSDC). Korea Institute of Science and Technology Information, Global Science experiment Data hub Center

  2. OUTLINE • Computing Resources • Operations • Network • Conclusion 15th CERN-Korea Committee

  3. KISTI GSDC Tier-1 Team: ~9 people

  4. Computing Resource Status
     • 2013 pledge (CPU): 25,000 HepSpec06
     • Current HepSpec06: 28,055
     • 2,524 job slots available (4 slots reserved for pilot jobs), with Hyper-Threading (H/T) enabled
     • 2013 pledge (tape storage): 1,500 TB
     • Current tape capacity: 1,000 TB
     • The tape pledge will be met this year
     • 2013 pledge (disk storage): 1,000 TB
     • Current disk capacity: 966 TB (1,000 TB allocated, but usable space is slightly lower)
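
The pledge figures above can be compared with the current capacity using simple ratios. The following is a minimal sketch: the numbers are copied from the slide, while the dictionary keys and the function name `fulfilment` are illustrative and not part of the original presentation.

```python
# Figures copied from the slide; key names and the function are illustrative.
PLEDGED_2013 = {"cpu_hs06": 25_000, "tape_tb": 1_500, "disk_tb": 1_000}
CURRENT = {"cpu_hs06": 28_055, "tape_tb": 1_000, "disk_tb": 966}


def fulfilment(pledged, current):
    """Return current capacity as a percentage of the pledge, per resource."""
    return {
        name: round(100.0 * current[name] / pledged[name], 1)
        for name in pledged
    }


if __name__ == "__main__":
    for name, pct in fulfilment(PLEDGED_2013, CURRENT).items():
        print(f"{name}: {pct}% of pledge")
```

With these inputs, CPU is above the pledge, disk is slightly below it because of the usable-space overhead noted on the slide, and tape is the resource still to be ramped up.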

  5. Operations

  6. Jobs
     • Current capacity: 2,524 job slots, 28.1 kHS06
     • 84 nodes, 32 (logical) cores per node, 11 HS06/core
     • Maintenance issues
       • Worker node migration to 10GbE-equipped nodes
       • Middleware: EMI-3 migration (EMI-2 support ends 30 April)
     • Delivered full pledges for 2013
     [Figure: total wall clock hours for ALICE jobs in the last 6 months. Labels: 3.58% (2013); KISTI, 3.9% (including Tier-2).]
     [Figure: timeline from Oct 2013 to Apr 2014. Level labels: ~800, ~1800, ~2500. Event labels: T1 worker nodes migration to 10GbE equipped ones; ALICE Central Service Maintenance; EMI-3 Migration & Delivery of full pledges.]

  7. Site Reliability

  8. KISTI Analysis Facility (KIAF)
     • Parallel analysis facility based on PROOF
     • In operation since 2011, for ALICE use only
     • 1 master and 8 worker nodes; 12 cores and 22 TB of disk per node
     • Similar in size and utilization to CAF (CERN Analysis Facility)

  9. Plans for On-call Service
     We are planning an on-call service, likely with three components.
     • Alarm system
       • Nagios + e-mail notifications
       • Implementing an SMS plugin + night owl shift by a private company
       • Tape system: hardware/software malfunctions reported to IBM and a third-party company
       • 24/7 support, with intervention to be carried out within one day
       • Ongoing evaluation of monitoring frameworks, e.g. Icinga, Zabbix, etc.
     • On-call scheme
       • One-week shift cycle with 5-6 personnel
       • Expecting 1 or 2 calls per cycle: alarms from the batch scheduler and services, WN servicing
       • From the daily monitoring report: detailed action list on services and hardware incidents
     • Night owl shift
       • Private company contract: on-site support
       • If necessary, SMS and e-mail notification to off-site on-duty experts
       • The supercomputing division at KISTI has been running a similar system for years
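
The one-week shift cycle described above can be sketched as a simple round-robin rota. This is an illustration only: it assumes one on-call person per week, and the roster names, the start date and the function name `weekly_rota` are hypothetical and do not come from the presentation.

```python
from datetime import date, timedelta
from itertools import cycle, islice


def weekly_rota(people, first_monday, weeks):
    """Assign one on-call person per week, cycling through the roster in order.

    Returns a list of (shift_start, shift_end, person) tuples.
    """
    if first_monday.weekday() != 0:
        raise ValueError("first_monday must be a Monday")
    rota = []
    for i, person in enumerate(islice(cycle(people), weeks)):
        start = first_monday + timedelta(weeks=i)
        rota.append((start, start + timedelta(days=6), person))
    return rota


if __name__ == "__main__":
    # Hypothetical roster of five people, matching the "5-6 personnel" on the slide.
    roster = ["expert-1", "expert-2", "expert-3", "expert-4", "expert-5"]
    for start, end, person in weekly_rota(roster, date(2014, 5, 5), 7):
        print(f"{start} to {end}: {person}")
```

With five people, each expert is on call one week in five; a sixth person stretches the interval between shifts to six weeks.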

  10. Network

  11. Internal Network
     • The internal network for Tier-1 is isolated from the computing centre service network
     • Internal network restructuring completed in Oct 2013 (3-week shutdown)
     • Preparation for upgrading the external network bandwidth to 10 Gbps
     • Main switch upgrade: bandwidth up to 2.5 Tbps
     • HA configuration of the private network
     • Remove bottlenecks to storage
     • Full 20 Gbps configuration (incoming/outgoing)
     • Replacing all switches with 10 Gbps models; done for part of the service racks
     • 1 Gbps switches remain in place for servers with 1 Gbps cards
     • Worker nodes to be upgraded with 10 Gb cards
     • Tape service nodes are being connected to the 10 Gbps switches

  12. External Network
     • Current bandwidth to CERN: 2 Gbps
     • Dedicated link via Daejeon-Chicago-Amsterdam-Geneva
     • Roadmap for the 10 Gbps upgrade presented to the WLCG MB and accepted
     • Working on upgrading the bandwidth to 10 Gbps
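
To give a feel for what the upgrade from 2 Gbps to 10 Gbps means in practice, the sketch below estimates the ideal transfer time for a data volume over each link. The 100 TB volume is a hypothetical example, not a figure from the presentation, and the calculation ignores protocol overhead and link sharing unless an `efficiency` factor below 1.0 is supplied.

```python
def transfer_hours(volume_tb, link_gbps, efficiency=1.0):
    """Ideal time in hours to move volume_tb terabytes over a link_gbps link.

    Uses decimal units (1 TB = 1e12 bytes, 1 Gbps = 1e9 bits/s).
    efficiency scales the usable fraction of the nominal bandwidth.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = volume_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    volume_tb = 100  # hypothetical dataset size
    for gbps in (2, 10):
        hours = transfer_hours(volume_tb, gbps)
        print(f"{volume_tb} TB over {gbps} Gbps: {hours:.1f} h")
```

Under these idealized assumptions the transfer time scales inversely with bandwidth, so the 10 Gbps link cuts it to one fifth of the 2 Gbps value.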

  13. LHC OPN
     • KISTI T1 network (134.75.125.0/24) included in LHC OPN
     • BGP peering between the Kreonet router at KISTI and the LCG network at CERN
     • perfSONAR has been deployed to measure bandwidth and latency; a firewall policy issue persists for ports below 1024, e.g. 80 (http), 443 (https), 843 (bwctl)
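
As a small illustration of the prefix and port details above, the sketch below uses Python's standard `ipaddress` module to test whether a host falls inside the announced /24 and to pick out the ports below 1024. The prefix and port numbers are taken from the slide as listed; the function names and the sample addresses are hypothetical.

```python
import ipaddress

# Prefix and ports as listed on the slide.
KISTI_T1_NET = ipaddress.ip_network("134.75.125.0/24")
PERFSONAR_PORTS = {80: "http", 443: "https", 843: "bwctl"}


def in_t1_network(address):
    """Return True if the address belongs to the KISTI T1 prefix."""
    return ipaddress.ip_address(address) in KISTI_T1_NET


def ports_below_1024(ports):
    """Return the sorted ports in the range affected by the firewall policy."""
    return sorted(p for p in ports if p < 1024)


if __name__ == "__main__":
    print(KISTI_T1_NET, "holds", KISTI_T1_NET.num_addresses, "addresses")
    for host in ("134.75.125.10", "134.75.126.1"):  # hypothetical hosts
        print(host, "in T1 network:", in_t1_network(host))
    print("Ports below 1024:", ports_below_1024(PERFSONAR_PORTS))
```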

  14. Conclusion
     • KISTI T1 was approved as a full T1 at the WLCG Overview Board meeting in Nov. 2013
     • Progress in ramping up T1 capability was appreciated by the ALICE community, and the roadmap to a 10G network was accepted
     • In Jan, KISTI T1 joined LHC OPN
     • Over the last 6 months, KISTI T1 has been "shape-shifting" in terms of network
       • Core switches replaced (bandwidth: 0.9 Tbps → 2.5 Tbps)
       • Rack switches replaced (bandwidth: 1 Gbps → 10 Gbps)
       • Servers migrated to 10GbE-equipped ones
