
ASGC Site Report



  1. ASGC Site Report Jason Shih ASGC/OPS HEPiX Fall 2009 Umea, Sweden

  2. Overview • Fire incident • Hardware • Network • Storage • Future remarks

  3. Fire incident – event summary • Damage analysis: fire was limited to the power room • Severe damage to the UPS • Wiring of the power system, AHR • Smoke and dust pervaded and smudged almost everywhere, including the computing & storage systems • History and planning • 16:53 Feb. 25 UPS battery burning • 19:50 Feb. 25 Fire extinguished by the fire department • 10:00 Feb. 26 Fire scene investigation by the fire department • 15:00 Feb. 26 ~ Mar. 23 DC cleaning, re-partitioning, re-wiring, deodorization, and re-installation • From the ceiling to the ground under the raised floor, from the power room to the machine room, and from the power, air-conditioning, and fire-prevention systems to the computing system • All facilities moved outside for cleaning • Mar. 23 Computing system installation • Mar. 23 ~ Apr. 9 Recovery of the monitoring, environment control, and access control systems

  4. Fire incident – recovery plan • The DC consultant will review the re-design on Mar. 11; the schedule will be revised based on the inspection • Tier1/Tier2 services will be collocated at an IDC for 3 months from Mar. 20

  5. Fire incident – review/lessons (I) • DC infrastructure standards to comply with • ANSI TIA/EIA • ASHRAE thermal guidelines for data processing environments • Guidelines for green data centers are available, e.g., LEED • NFPA: fire suppression system • Capacity and type of UPS (minimum scale) • Varies with the response time of the generators (see the sizing sketch below) • Adjust ratings of all breakers (NFB and ACB) • Location of UPS (open space & outside PR) • Regular maintenance of batteries • Internal resistance measurement
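
  A minimal sketch of the UPS-sizing relationship mentioned above: the battery must carry the load until the standby generators are online, so the minimum usable energy scales with the load, the generator response time, and a safety margin. The load, bridge time, and margin below are hypothetical examples, not ASGC figures (Python):

      # Hypothetical UPS sizing sketch: the battery bridges the gap until the
      # standby generators are online and stable. All numbers are examples.
      def min_ups_energy_kwh(load_kw: float, generator_start_min: float,
                             safety_margin: float = 2.0) -> float:
          """Minimum usable battery energy (kWh) to bridge generator start-up."""
          bridge_hours = generator_start_min / 60.0
          return load_kw * bridge_hours * safety_margin

      # Example: 200 kW IT load, generators stable within 2 minutes.
      print(f"{min_ups_energy_kwh(200, 2):.1f} kWh usable battery energy")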

  6. Fire incident – review/lessons (II) • Smoke damage: fire stopping • Improvement of the monitoring system • Re-design the monitoring system • Earlier pre-action: consider VESDA • Emergency response and procedures • Routine fire drills are indispensable • A disaster recovery plan is necessary • Other improvements: • PP and H/C (hot/cold) aisle splitting • Fiber panels: MDF and FOR • Overhead cable tray (existing: power tray in subfloor) + fiber guide • Raised-floor grommets

  7. Move out all facilities for cleaning • Protect racks from dust • Container as storage and humidification • Ceiling removal

  8. Fire incident – Tape system • Snapshots of the decommissioned tape drives after the incident

  9. DC recovered – mid-May • FOR in area #1 • MDF moved to the center of the DC area • H/C aisles fully split • Plan to replace racks to provide 1100 mm depth

  10. IDC Collocation (I) • Site selection and paperwork – one week • Preparation at the IDC – one week • 15 racks + reservation for the tape system (6 racks) • Power (14 kW per rack) • Cooling (perforated raised floor) • 10G protected SDH STM-64 link between the IDC and ASGC

  11. IDC collocation (II) • Relocation of 50+% of computing/storage – one week • 2k job slots (3.2 MSI2k), 26 chassis of blade servers • 2.3 PB storage (1 PB allocated dynamically) • Cabling + setup + reconfiguration – one week

  12. IDC collocation (III) • Facility installation completed on Mar. 27 • Tape system delayed until after Apr. 9 • Realignment • RMA for faulty parts

  13. T1 performance • 7 Gb/s peak reached to Amsterdam • 9 Gb/s peak observed between the IDC and ASGC

  14. Network – before May • Topology diagram: ASGC POPs at Sinica (Taipei), KDDI Otemachi (JP), Mega-iAdvantage (HK), and KIM CHUNG (SG), with M320, M120, and M20 routers • GE peerings including SINet, APAN-JP, KEK, JPIX (GE*2), KREONET2, CERNet, CSTNet, WIDE, HKIX, HARNet, and NUS (100M) • AARNet, Pacnet IP transit, and TWGate IP transit (100M) • Links: 2.5G non-protected wavelength, NCIC 2.5G (STM-16) SDH, and 622M (STM-4) SDH on APCN2

  15. Network – 2009 • Topology diagram: POPs at Sinica (Taipei), KDDI Otemachi (JP), Mega-iAdvantage (HK), and Global Switch (Singapore), with M320, M120, and M20 routers • GE peerings including SINet, APAN-JP, KEK, JPIX (GE*2), KREONET2, CERNet, CSTNet, WIDE, HKIX, HARNet, NUS (100M), and SingAREN • AARNet, Pacnet IP transit, and TWGate IP transit (100M) • Links: STM-16 SDH, 2.5G (STM-16) SDH, and 622M (STM-4) SDH on EAC to Singapore

  16. ASGC Resource Level Targets • 2008 • 0.5 PB expansion of the tape system in Q2 • Meet MOU target by mid-November • 1.3 MSI2k per rack based on the recent E5450 processor • 2009 • 150 QC (quad-core) blade servers • 2 TB per drive for the RAID subsystem • 42 TB net capacity per chassis and 0.75 PB in total
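
  The 42 TB net capacity per chassis is consistent with a 24-bay enclosure of 2 TB drives keeping one hot spare and running RAID-6 across the remaining 23 drives; the RAID layout is an assumption, not stated on the slide. A quick check of that arithmetic (Python):

      # Assumed layout: 24-bay chassis, 2 TB drives, 1 hot spare, RAID-6 (2 parity).
      def chassis_net_tb(bays=24, drive_tb=2, hot_spares=1, parity_drives=2):
          return (bays - hot_spares - parity_drives) * drive_tb

      net = chassis_net_tb()           # 42 TB net per chassis
      print(net, round(750 / net))     # ~18 chassis for the 0.75 PB total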

  17. Hardware Profile and Selection (I) • CPU: • 2K8 expansion: 330 blade servers providing ~3.6 MSI2k • 7U-height chassis • SMP Xeon E5430 processors, 16 GB FB-DIMM • Each blade provides 11 KSI2k • 2 blades/U density, Web/SOL management • Current capacity: 2.4 MSI2k • Year-end total computing power: ~5.6 MSI2k • 22 KSI2k/U (24 chassis in 168U)
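
  The per-unit figures on this slide are internally consistent: 2 blades per U at 11 KSI2k each gives the quoted 22 KSI2k/U, and 330 blades come to roughly 3.6 MSI2k. The sketch below just reproduces that arithmetic; the 14-blades-per-7U-chassis packing is inferred from the 2 blades/U density rather than stated (Python):

      KSI2K_PER_BLADE = 11
      BLADES_PER_U = 2
      CHASSIS_HEIGHT_U = 7            # packing inferred, not stated on the slide

      per_u = KSI2K_PER_BLADE * BLADES_PER_U          # 22 KSI2k per U
      per_chassis = per_u * CHASSIS_HEIGHT_U          # 154 KSI2k per 7U chassis
      expansion_msi2k = 330 * KSI2K_PER_BLADE / 1000  # ~3.6 MSI2k for 330 blades
      print(per_u, per_chassis, round(expansion_msi2k, 2))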

  18. Tape system • Before the incident: • LTO3 * 8 + LTO4 * 4 drives • 720 TB with LTO3 • 530 TB with LTO4 • May 2009: • Two loaned LTO3 drives • MES: 6 LTO4 drives at the end of May • Capacity: 1.3 PB (old) + 0.8 PB (LTO4) • New S54 model introduced • 2K slots with the tiered model • ALMS upgrade • Enhanced gripper
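
  For orientation, the native cartridge capacities of the two generations (LTO-3: 400 GB, LTO-4: 800 GB) roughly account for the quoted pool sizes; the cartridge counts below are back-of-the-envelope estimates, not inventory figures (Python):

      LTO3_TB = 0.4   # 400 GB native per LTO-3 cartridge
      LTO4_TB = 0.8   # 800 GB native per LTO-4 cartridge

      print(round(720 / LTO3_TB))   # ~1800 cartridges behind the 720 TB LTO-3 pool
      print(round(800 / LTO4_TB))   # ~1000 cartridges for the 0.8 PB LTO-4 expansion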

  19. Roadmap – Host I/F • 2009: 3U/16-bay FC-SAS in May, 2U/12-bay and 4U/24-bay in June • Host interfaces: 8G FC (≈ 800 MB/s) • 4G FC (≈ 400 MB/s) • SAS 6G (4-lane ≈ 2400 MB/s) • SAS 3G (4-lane ≈ 1200 MB/s) • iSCSI – 10 Gb • iSCSI – 1 Gb • U320 SCSI (≈ 320 MB/s)
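
  The MB/s figures quoted for the host interfaces follow from line rate, lane count, and the 8b/10b encoding used by FC and 3G/6G SAS; the encoding assumption and the 8.5 Gbaud line rate for 8G FC are not stated on the slide. A minimal conversion sketch (Python):

      # Payload bandwidth = line rate (Gb/s) * lanes * 8/10 encoding / 8 bits per byte.
      def payload_mb_s(line_rate_gbps: float, lanes: int = 1) -> float:
          return line_rate_gbps * lanes * (8 / 10) * 1000 / 8

      print(payload_mb_s(4))       # 4G FC      -> ~400 MB/s
      print(payload_mb_s(8.5))     # 8G FC      -> ~850 MB/s (quoted as ~800)
      print(payload_mb_s(3, 4))    # SAS 3G x4  -> ~1200 MB/s
      print(payload_mb_s(6, 4))    # SAS 6G x4  -> ~2400 MB/s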

  20. Roadmap – Drive I/F • 2009: 4G FC • SAS 6G • SAS 3G • U320 SCSI • 2.5" SSD (B12F series) • SATA-II

  21. Est. Density • 2009 H1 1TB, 1 rack (42U)= 240TB • 2009 H2 2TB, 1 rack (42U)= 480TB • 2010 H1 2TB, 1 rack (42U)= 480TB • 2010 H2 3TB, 1 rack (42U)= 720TB • 2012 5TB…..
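
  The TB-per-rack figures above are consistent with roughly 240 drive bays in a 42U rack, for example ten of the 4U/24-bay enclosures from the host-interface roadmap; the packing is an assumption, the slide only gives the totals (Python):

      BAYS_PER_RACK = 10 * 24   # assumed: ten 4U/24-bay enclosures per 42U rack

      for period, drive_tb in [("2009 H1", 1), ("2009 H2", 2), ("2010 H2", 3)]:
          print(period, BAYS_PER_RACK * drive_tb, "TB per rack")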

  22. Future remarks • DC fully restored by the end of May • Restart round-the-clock operation • Relocated resources fully involved in STEP09 • Facility relocation back from the IDC at the end of June • New resource expansion at the end of July • Improve DC monitoring

  23. Water mist • Fire suppression system • Review the implementation of the gas suppression system • Consider water mist in the power room • Wall cabinet outside the data center area

  24. Water mist – design plan
