
Kolkata Tier II @ ALICE and Status

An overview of the ALICE Tier-2 Grid Computing Facility at VECC in Kolkata, India: its journey since 2002, its contributions to ALICE, and its computing resources, cooling solution, and network connectivity.


Presentation Transcript


  1. Kolkata Tier II @ ALICE and Status
  Site Name: Tier-2 site of the WLCG (Worldwide LHC Computing Grid)
  GOCDB Name: IN-DAE-VECC-02
  VO: ALICE
  Group: EHEPAG
  Unit: VECC
  City: Kolkata
  Country: India
  Team: Subhasis Chattopadhyay, Vikas Singhal, Prasun Singh Roy. T. K. Samanta and S. K. Pal helped in establishing the centre in the initial years.
  Grid Computing Architecture: the WLCG Grid is based on the MONARC Tier model.

  2. Grid Computing Facility at VECC: Journey from 2002 till now
  Why a Tier-2? India contributes to two main detectors of ALICE, the PMD and the Muon Arm, which produce a large volume of raw, calibrated and simulated data; it was therefore decided to build a Tier-2 centre.
  VECC, Kolkata is the only Tier-2 for ALICE.
  Started in 2002 with 2 computers and a 128 Kbps bandwidth link to CERN.

  3. Evolution of Grid Computing Facility and Cooling Solution (2006, 2008, 2010, 2012 - now)
  • Hot and cold air are separated via Cold Aisle Containment.
  • Temperature gradient between the cold and hot aisles is 5 °C.
  • Power usage effectiveness (PUE) = Total Facility Power / IT Equipment Power = 1200 units / 816 units per day = 1.47 (a worked example follows below).
  • The cooling solution reduced cooling power consumption by half.
  • Management and monitoring of the servers and storage is done from outside the Cold Aisle Containment.
  [Figure: Cooling Solution logical diagram]
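
A minimal worked example of the PUE figure above. The daily readings (1200 and 816 units per day) are taken from the slide; the function and variable names are illustrative only, not from any facility monitoring tool.

```python
# Sketch of the PUE calculation quoted on the slide.
def pue(total_facility_energy: float, it_equipment_energy: float) -> float:
    """Power Usage Effectiveness = total facility energy / IT equipment energy."""
    return total_facility_energy / it_equipment_energy

daily_total = 1200.0   # total facility consumption per day (units, from the slide)
daily_it = 816.0       # IT equipment consumption per day (units, from the slide)
print(f"PUE = {pue(daily_total, daily_it):.2f}")   # -> PUE = 1.47
```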

  4. ALICE Jobs Completed @ Kolkata
  • Nearly 4,000,000 jobs successfully completed at the Kolkata CREAM CE.
  • Consistently more than 70 jobs completed successfully every hour over the last 6 years.
  • No AMC (Annual Maintenance Contract) for any server; maintenance is done in-house only.
  • 24x7 operation, 95% availability.
  Vikas Singhal, VECC, INDIA

  5. Kolkata Tier-2 Resources
  Computing: 448 cores total (equivalent to ~5K HEP-SPEC06 of computing resources)
  • DELL quad-core blades: 28 * 2 * 4 * 2 = 448 cores with hyper-threading (see the sketch below).
  • A few older nodes were removed when the new hardware arrived.
  • The new cores will be in production within a month.
  Storage: 174 TB (78 TB SAN based)
  • 1 Xrootd redirector.
  • 1 disk server, SAN based, 78 TB (IBM).
  • No warranty for the HP EVA (72 TB); its data will be moved if it can be recovered.
  • EOS: 96 TB usable, on 3 Dell PowerEdge R730 servers with 48 TB each.
  Network: 1 Gbps WAN speed (to be increased to 10 Gbps); 10 Gbps backbone network.
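
A quick consistency check of the computing totals above, as a Python sketch. The blade breakdown (28 blades, 2 CPUs, 4 cores, 2 hyper-threads) is from the slide; the HS06-per-thread figure at the end is only an average implied by the quoted 5K total, not a measured benchmark.

```python
# Consistency check of the slide's computing totals (all figures from the slide).
blades = 28
cpus_per_blade = 2
cores_per_cpu = 4
ht_factor = 2                      # hyper-threading

ht_cores = blades * cpus_per_blade * cores_per_cpu * ht_factor
print(ht_cores)                    # -> 448

quoted_hepspec06 = 5000            # "Equivalent 5K HEPSpec 2006"
print(round(quoted_hepspec06 / ht_cores, 1))   # -> 11.2 HS06 per HT core (implied average)
```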

  6. More than 2500 HT Cores of Computing Resources Procured
  • Procured 12 DELL PowerEdge FX2 enclosures, each containing 4 DELL PowerEdge FC630 servers.
  • Each server configuration:
    - 2 x Intel Xeon E5-2680 v4, 2.4 GHz, 14 cores each
    - 8 x 16 GB RDIMM, 2400 MT/s
    - 960 GB SSD
    - 2 x 10 Gigabit network cards
  • Total cores: 48 * 2 * 14 * 2 = 2688 with hyper-threading (see the sketch below).
  • Installation is almost complete; Scientific Linux CERN 6.8 is installed and the 10G network is connected.
  • Benchmarking is ongoing.
  • Approximate cost of the equipment: $300,000.
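
A minimal sketch of the core-count arithmetic for the new procurement, using only the quantities quoted above (12 enclosures x 4 servers, 2 sockets x 14 cores, hyper-threading factor 2). The per-server cost at the end is simply the quoted total divided by the server count, shown for illustration only.

```python
# Core-count arithmetic for the new FX2 procurement (figures from the slide).
enclosures = 12
servers_per_enclosure = 4
sockets_per_server = 2
cores_per_socket = 14              # Intel Xeon E5-2680 v4
ht_factor = 2                      # hyper-threading

servers = enclosures * servers_per_enclosure                   # -> 48 servers
ht_cores = servers * sockets_per_server * cores_per_socket * ht_factor
print(ht_cores)                                                # -> 2688 HT cores

total_cost_usd = 300_000                                       # approximate, from the slide
print(round(total_cost_usd / servers))                         # -> ~6250 USD per server (derived)
```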

  7. 51 TFlops of Computing Resources
  Theoretical peak performance: Rpeak = CPU speed (GHz) * total number of cores * operations/cycle.
  Rpeak = 2.4 * 28 * 16 GFlops = 1075.2 GFlops = 1.0752 TFlops (single server).
  Rpeak for all 48 nodes: Rpeak = 1.0752 * 48 TFlops = 51.6096 TFlops (the sketch below reproduces this calculation).
  Preliminary HPL benchmarking result (test completed only yesterday): 4.30471e+04 GFlops, i.e. 43.0471 TFlops.

  [root@wn123 HPL]# tail -20 xhpl_intel64_dynamic_outputs_48nodes_log2.txt
  T/V                N    NB     P     Q               Time                 Gflops
  --------------------------------------------------------------------------------
  WC00C2R2      871012   192    12     8           10233.83            4.30471e+04
  HPL_pdgesv() start time Mon Oct  9 22:28:12 2017
  HPL_pdgesv() end time   Tue Oct 10 01:18:46 2017
  --------------------------------------------------------------------------------
  ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0023805 ...... PASSED
  ================================================================================
  Finished      4 tests with the following results:
                4 tests completed and passed residual checks,
                0 tests completed and failed residual checks,
                0 tests skipped because of illegal input values.
  --------------------------------------------------------------------------------
  End of Tests.

  We still want to know the HEP-SPEC06 number for the Intel Xeon E5-2680 v4 2.4 GHz processor with 35 MB of cache.
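
A small Python sketch reproducing the Rpeak arithmetic above. The factor of 16 double-precision operations per core per cycle follows the slide's "operations/cycle" term; the efficiency figure at the end (measured HPL result divided by Rpeak) is a derived number, not one quoted on the slide.

```python
# Theoretical peak and HPL efficiency for the new worker nodes (inputs from the slide).
clock_ghz = 2.4            # Intel Xeon E5-2680 v4 base clock
cores_per_node = 28        # 2 sockets x 14 cores
flops_per_cycle = 16       # double-precision operations per core per cycle
nodes = 48

rpeak_node_gflops = clock_ghz * cores_per_node * flops_per_cycle   # 1075.2 GFlops
rpeak_total_tflops = rpeak_node_gflops * nodes / 1000.0            # ~51.61 TFlops

hpl_tflops = 43.0471       # measured HPL result quoted on the slide
efficiency = hpl_tflops / rpeak_total_tflops                       # derived, not on the slide

print(f"Rpeak per node  : {rpeak_node_gflops:.1f} GFlops")
print(f"Rpeak (48 nodes): {rpeak_total_tflops:.2f} TFlops")
print(f"HPL efficiency  : {efficiency:.1%}")                       # -> ~83.4%
```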

  8. Physical Network Connectivity up to Kolkata
  • Upgrade to a 10G redundant link up to the Kolkata Tier-2 is about to finish.
  • 10G switches installed at both the VECC and SINP sites.
  [Figure: network diagram showing CERN/LHCONE (10 Gbps), the TEIN-3 network (4 Gbps / 10 Gbps shared), TIFR Mumbai, the Internet, the POPs in Mumbai and Kolkata, the core network, and SINP Kolkata and VECC Kolkata (Kolkata Tier-2), with links ranging from 1 Gbps to 10 Gbps.]

  9. 1G Network Utilization by Kolkata Tier-2 during April 2017

  10. Bandwidth Test for Kolkata Tier-2

  11. Backbone Network inside Kolkata Tier-2: Brocade Switches
  • 10 Gb connection to each server.
  • 40 Gb connectivity between the core switches.
  • Both fiber and copper connectivity; connections via DAC cable.
  • Dual path, full redundancy.
  Vikas Singhal, VECC, INDIA

  12. Restructured Power Distribution
  • Replaced the earlier 3 x 40 KVA UPS due to safety measures.
  • Procured a new 80 KVA UPS and one redundant line via a 160 KVA UPS from the Computer Division.
  • Proper planning done to avoid a single point of failure (SPOF).
  • Two MCB boxes installed, each receiving UPS power from a different source.
  • Every network rack gets power from two MCBs to avoid a SPOF.
  • Cabling to the servers is done such that every server receives power from both MCB A and MCB B.
  Thanks to the Electrical Section, VECC for coordinating and performing the entire work.

  13. Grid-Peer Tier-3 Cluster Status
  • Increased load on the cluster, as CBM users are also utilizing it.
  • 8 HP BL675G7 blade servers, each with 4 x AMD Opteron 6380 processors (16 cores each).
  • 3 Dell M610 blade servers, each with 2 x Intel quad-core Xeon E5530 2.4 GHz CPUs (8 MB cache) and 16 GB RAM.
  • 6 of the 8 HP blade servers are dedicated to non-interactive nodes; the rest are used for CBM work.
  • The 3 Dell blade servers are used as interactive nodes.
  • Extensively used by VECC users and PMD collaborators; more than 35,000 jobs completed successfully in the last 3 months.
  • 75 TB of storage, almost filled up.
  • 75+ active users across India; 45+ active users at VECC.
  • Tape-based backup of the Tier-3 storage is performed twice a month.

  14. Connectivity between Asian Tiers: not bad, and improving day by day. (10/10/2017)

  15. Achievements

  16. Future Road Map and Vision
  • Planning to collaborate with industry; exploring cloud computing via industry partnership (discussions ongoing with Microsoft, initiated by Brij).
  • Parallel computing is the only way to fully utilize the present computing infrastructure.
  • Accelerated or heterogeneous computing is a newly evolving field (participating in CBM @ FAIR in this direction).
  • Focus on High Throughput Computing (HTC) using accelerators such as NVIDIA GPUs, AMD APUs and Intel co-processors.
  • Spreading knowledge on parallel and heterogeneous computing.
  • For the huge data volumes, using the low-cost storage solution based on CERN EOS; procuring low-cost storage boxes.

  17. Thank You
