
INFN: SC3 activities and SC4 planning. Mirco Mazzucato, on behalf of the Tier-1 and INFN SC team. CERN, Nov 15 2005.


Presentation Transcript


  1. INFN: SC3 activities and SC4 planning. Mirco Mazzucato, on behalf of the Tier-1 and INFN SC team. CERN, Nov 15 2005.

  2. The INFN Grid strategy for SC
  • Grid middleware services were conceived from the beginning as being of general use and as having to satisfy the requirements of other sciences
  • Biology, Astrophysics, Earth Observation, ...
  • Tight collaboration with CERN, as natural coordinator of European projects for M/W and EU e-Infrastructure developments, and with other national projects (UK e-Science, ...), leveraging:
  • CERN's Tier-0 role and its role as natural LCG coordinator
  • CERN as an EU lab of excellence in S/W production (WEB, ...)
  • Strong integration between the national and EU e-Infrastructures (EDG, EGEE/LCG, EGEE-II)
  • "Same" M/W, same services, integrated operation and management of common services provided by EU-funded project manpower
  • National development of complementary or missing services, but well integrated in the EGEE/LCG M/W Service Oriented Architecture
  • Strong preference for common services over specific solutions, which are considered temporary when they duplicate functionalities of general services

  3. SC3 Throughput Phase: Results

  4. SC3 configuration and results: CNAF (1/4)
  • Storage
  • Oct 2005: Data disk
  • 50 TB (Castor front-end)
  • WAN-to-disk performance: 125 MB/s (demonstrated, SC3; see the back-of-envelope sketch after this slide)
  • SE Castor: Quattor
  • Oct 2005: Tape
  • 200 TB (4 x 9940B + 6 x LTO2 drives)
  • Drives shared with production
  • WAN-to-tape performance: mean sustained ~50 MB/s (SC3 throughput phase, July 2005)
  • Computing
  • Oct 2005: min 1200 kSI2K, max 1550 kSI2K (as the farm is shared)
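To give a feel for the rates quoted above, a small back-of-envelope calculation (not part of the original slides) of the daily volumes that the demonstrated WAN-to-disk and WAN-to-tape rates correspond to:

```python
# Illustrative back-of-envelope (not from the original slides): convert the
# sustained WAN rates quoted above into daily data volumes.
SECONDS_PER_DAY = 24 * 3600

def daily_volume_tb(rate_mb_per_s: float) -> float:
    """Daily volume in TB for a sustained rate in MB/s (1 TB = 10^6 MB)."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1e6

print(f"WAN-to-disk at 125 MB/s: {daily_volume_tb(125):.1f} TB/day")  # ~10.8 TB/day
print(f"WAN-to-tape at  50 MB/s: {daily_volume_tb(50):.1f} TB/day")   # ~4.3 TB/day
```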

  5. SC3 configuration and results: CNAF (2/4): LAN and WAN connectivity
  [Network diagram: T1 SC layout showing the CNAF production network, 2 x 1 Gbps link aggregation and n x 1 Gbps links towards GARR, the GARR 10 Gbps link (November), general Internet access, and a 1 Gbps backdoor at CNAF (192.135.23.254, 192.135.23/24).]

  6. SC3 configuration and results: CNAF (3/4): Network
  • Oct 2005:
  • 2 x 1 GEthernet links CNAF ↔ GARR, dedicated to SC traffic to/from CERN
  • Full capacity saturation in both directions demonstrated in July 2005 with two memory-to-memory TCP sessions (see the throughput-test sketch after this slide)
  • CERN – CNAF connection affected by sporadic loss, apparently related to concurrence with production traffic (lossless connectivity measured during August)
  • Tuning of Tier-1 – Tier-2 connectivity for all the Tier-2 sites (1 GigaEthernet uplink connections)
  • PROBLEMS and SOLUTIONS:
  • Catania and Bari: throughput penalties caused by border routers requiring hardware upgrades
  • Pisa: MPLS mis-configuration causing asymmetric throughput → configuration fixed
  • Torino: buggy Cisco IOS version causing high CPU utilization and packet loss → IOS upgrade
  • Nov 2005:
  • Ongoing upgrade to 10 GEthernet, CNAF ↔ GARR, dedicated to SC
  • Policy routing at the GARR access points to grant LHC traffic exclusive access to the CERN – CNAF connection
  • Type of connectivity to INFN Tier-2 sites under discussion
  • Realization of a backup Tier-1 ↔ Tier-1 connection (Karlsruhe): ongoing discussion with GARR and DFN
  • Ongoing 10 GigaEthernet LAN tests (Intel Pro/10 Gb): 6.2 Gb/s UDP, 6.0 Gb/s TCP, 1 stream, memory-to-memory (with proper PCI and TCP stack configuration)
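For illustration only: the memory-to-memory tests quoted above were run with dedicated network test tools, but the sketch below shows, in Python, what a single-stream memory-to-memory TCP throughput measurement boils down to. The host name and port are hypothetical placeholders, not real SC3 endpoints.

```python
# Minimal sketch of a single-stream memory-to-memory TCP throughput test, in the
# spirit of the tests quoted above. Host name and port are hypothetical.
import socket
import time

HOST, PORT = "sc3-sink.cnaf.infn.it", 5001   # hypothetical receiver
CHUNK = b"\0" * (1 << 20)                    # 1 MiB buffer, sent straight from memory
DURATION = 30                                # seconds

def run_sender() -> None:
    with socket.create_connection((HOST, PORT)) as s:
        # Large socket buffers matter on high bandwidth-delay-product WAN paths.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 8 * 1024 * 1024)
        sent, start = 0, time.time()
        while time.time() - start < DURATION:
            s.sendall(CHUNK)
            sent += len(CHUNK)
        elapsed = time.time() - start
        print(f"{sent / elapsed / 1e6:.1f} MB/s over {elapsed:.1f} s")

if __name__ == "__main__":
    run_sender()
```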

  7. SC3 configuration and results: CNAF (4/4): Software
  • Oct 2005:
  • SRM/Castor
  • FTS: ongoing installation of a second server (installation and configuration with Quattor); backend: Oracle DB on a separate host
  • LFC: installation on a new host with a more reliable configuration (RAID5, redundant power supply)
  • Farm middleware: LCG 2.6
  • Software in evolution. Good response from developers. Bug fixing ongoing; more effort needed.
  • Debugging of FTS and PhEDEx (CMS) has required a duplication of effort
  • Nov 2005, File Transfer Service channels configured:
  • CNAF ↔ Bari
  • CNAF ↔ Catania
  • CNAF ↔ Legnaro
  • CNAF ↔ Milano
  • CNAF ↔ Pisa
  • CNAF ↔ Torino
  • FZK ↔ CNAF (full testing to be done)
  • 2nd half of July: FTS channel tuning for the T1-T2 channels (number of concurrent gridftp sessions, number of parallel streams); an illustrative sketch follows this slide
  • Starting from Dec 2005:
  • Evaluation of dCache and StoRM (for disk-only SRMs)
  • Possible upgrade to CASTOR v2
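A purely illustrative sketch of the channel list and the two tuning knobs mentioned above (concurrent gridftp sessions and parallel streams per transfer). This is not the actual FTS configuration interface or syntax, and the numeric values are placeholders rather than the settings actually used in SC3.

```python
# Illustrative representation of the CNAF Tier-1 <-> Tier-2 FTS channels and the
# two tuning parameters mentioned above. NOT the real FTS configuration API;
# numeric values are placeholders, not the SC3 settings.
from dataclasses import dataclass

@dataclass
class ChannelTuning:
    concurrent_files: int   # simultaneous gridftp sessions on the channel
    streams_per_file: int   # parallel TCP streams per gridftp transfer

TIER2_SITES = ["Bari", "Catania", "Legnaro", "Milano", "Pisa", "Torino"]

channels = {f"CNAF-{site}": ChannelTuning(concurrent_files=10, streams_per_file=5)
            for site in TIER2_SITES}
channels["FZK-CNAF"] = ChannelTuning(concurrent_files=10, streams_per_file=10)

for name, tuning in sorted(channels.items()):
    print(f"{name:14s} files={tuning.concurrent_files} streams={tuning.streams_per_file}")
```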

  8. Tier-2 sites in SC3 (Nov 2005)
  • Torino (ALICE):
  • FTS, LFC, dCache (LCG 2.6.0)
  • Storage space: 2 TB
  • Milano (ATLAS):
  • FTS, LFC, DPM 1.3.7
  • Storage space: 5.29 TB
  • Pisa (ATLAS/CMS):
  • FTS, PhEDEx, POOL file catalogue, PubDB, LFC, DPM 1.3.5
  • Storage space: 5 TB available, 5 TB expected
  • Legnaro (CMS):
  • FTS, PhEDEx, POOL file catalogue, PubDB, DPM 1.3.7 (1 pool, 80 GB)
  • Storage space: 4 TB
  • Bari (ATLAS/CMS):
  • FTS, PhEDEx, POOL file catalogue, PubDB, LFC, dCache, DPM
  • Storage space: 1.4 TB available, 4 TB expected
  • Catania (ALICE):
  • DPM and Classic SE (storage space: 1.8 TB)
  • LHCb: CNAF

  9. SC3 Service Phase: Report

  10. CMS, SC3 Phase 1 report (1/3)
  • Objective: 10 or 50 TB per Tier-1 and ~5 TB per Tier-2 (source: L. Tuura)

  11. CMS, SC3 Phase 1 report (2/3) (source: L. Tuura)

  12. CMS, SC3 Phase 1 report (3/3) (source: L. Tuura)

  13. LHCb, Phase 1 report (data moving)
  • Less than 1 TB of stripped DSTs replicated. At INFN most of this data already existed, with only a few files missing from the dataset, so only a small fraction of the files had to be replicated from CERN (source: A. C. Smith)
  • Tier1-to-Tier1 FTS channel configured (FZK ↔ CNAF)
  • Configuration of an entire Tier1-to-Tier1 channel matrix under evaluation, for replication of stripped data (see the sketch after this slide)
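For scale, the "entire Tier1-to-Tier1 channel matrix" mentioned above would mean one FTS channel per ordered pair of Tier-1 sites; a quick sketch with an illustrative (not official) site list:

```python
# Sketch of a full Tier1-to-Tier1 channel matrix: one FTS channel per ordered
# pair of Tier-1 sites. The site list below is illustrative only.
from itertools import permutations

TIER1_SITES = ["CNAF", "FZK", "IN2P3", "PIC", "RAL", "SARA"]  # illustrative subset

channel_matrix = [f"{src}-{dst}" for src, dst in permutations(TIER1_SITES, 2)]
print(f"{len(channel_matrix)} directed channels, e.g. {channel_matrix[:3]}")
# n sites -> n*(n-1) directed channels (30 for the 6 illustrative sites above)
```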

  14. ATLAS
  • Production phase started on Nov 2
  • CNAF: some problems experienced in August (power cut, networking and LFC client upgrade)
  • 5932 files copied and registered at CNAF:
  • 89 "Failed replication" events
  • 14 "No replicas found"
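Assuming the error counts above refer to the same sample of 5932 files, they correspond to roughly a 1.5% replication failure rate; a quick check:

```python
# Quick check of the failure rates implied by the numbers above
# (assumes the error counts refer to the same sample of files).
copied, failed_repl, no_replica = 5932, 89, 14
print(f"failed replication: {failed_repl / copied:.1%}")  # ~1.5%
print(f"no replicas found : {no_replica / copied:.1%}")   # ~0.2%
```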

  15. ALICE
  • CNAF and Torino in production
  • Small failure rates after the initial debugging phase
  • Overall job output produced: 5 TB
  • Other sites are still being debugged:
  • CNAF: proxy renewal service did not exist; being debugged, ALICE part deployed and configured
  • Catania: LCG configuration problem being addressed, ALICE part deployed and configured
  • Bari: LCG and ALICE parts deployed and configured, being tested with a single job, to be opened in production
  • CNAF and Catania to be opened to production
  • (Information source: P. Cerello, Oct 26 2005)
  [Plots: CNAF, Torino]

  16. Tape pool utilization at CNAF (1/2)
  [Plots: ALICE and ATLAS tape pools]

  17. Tape pool utilization at CNAF (2/2)
  [Plots: CMS and LHCb tape pools]

  18. SC4 Planning

  19. INFN Tier-1: long-term plans (Oct 2006)
  • SC4: storage and computing resources will be shared with production
  • Storage
  • Data disk: additional 400 TB (approx 300 TB for LHC) → TOTAL: approx 350 TB
  • Tape: up to 450 TB
  • Computing
  • Additional 800 kSI2K → TOTAL: min 2000 kSI2K, max 2300 kSI2K
  • Network
  • 10 GEthernet CNAF ↔ CERN
  • 10 GEthernet CNAF ↔ INFN Tier-2 sites, and a backup connection to Karlsruhe (?)

  20. Tier-2 sites at INFN: SC4
  • Currently 9 candidate Tier-2 sites:
  • in some cases one Tier-2 site hosts two experiments
  • Total: 12 Tier-2s (4 sites for each experiment: ATLAS, ALICE, CMS)
  • LHCb Tier-2: CNAF
  • Ongoing work to understand:
  • Number of Tier-2 sites actually needed
  • Availability of local manpower and adequacy of local infrastructures
  • Capital expenditure:
  • Tier-2 overall infrastructure
  • Computing power and storage
  • Network connectivity: 1 GEthernet for every Tier-2, average guaranteed bandwidth: 80% of link capacity (see the conversion sketch after this slide)
  • Only after this preliminary analysis will INFN be ready for the MoU currently under definition at CERN
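For reference, a quick conversion (illustrative, not from the slides) of what an 80% guarantee on a 1 GEthernet uplink amounts to:

```python
# What the 80% guarantee on a 1 Gigabit Ethernet uplink amounts to (illustrative).
link_gbps = 1.0                        # 1 GEthernet uplink per Tier-2
guaranteed_gbps = 0.80 * link_gbps
mb_per_s = guaranteed_gbps * 1000 / 8  # Gb/s -> MB/s (decimal units)
print(f"{guaranteed_gbps:.1f} Gb/s guaranteed ≈ {mb_per_s:.0f} MB/s per Tier-2")  # ~100 MB/s
```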
