LHC Data Challenges and Physics Analysis

Presentation Transcript


  1. LHC Data Challenges and Physics Analysis Jim Shank Boston University VI DOSAR Workshop 16 Sept., 2005

  2. Overview • I will concentrate on ATLAS, with some CMS • ATLAS Computing timeline • ATLAS Data Challenge 2 • CMS DC04 • The LHC Service Challenges • Distributed Analysis • PanDA • ATLAS Computing System Commissioning J. Shank VI DOSAR Workshop 16 Sept. 2005

  3. ATLAS Computing Timeline (milestones spanning 2003–2007, in order):
  • POOL/SEAL release (done)
  • ATLAS release 7 (with POOL persistency) (done)
  • LCG-1 deployment (done)
  • ATLAS complete Geant4 validation (done)
  • ATLAS release 8 (done)
  • DC2 Phase 1: simulation production (done)
  • DC2 Phase 2: intensive reconstruction (only partially done…)
  • Combined test beams (barrel wedge) (done)
  • Computing Model paper (done)
  • Computing Memoranda of Understanding (ready for signatures)
  • ATLAS Computing TDR and LCG TDR (done)
  • Computing System Commissioning
  • Physics Readiness Report
  • Start cosmic ray run
  • GO!
  J. Shank VI DOSAR Workshop 16 Sept. 2005

  4. ATLAS Production Runs (2004-2005) • Over 400 physicists attended the Rome workshop, and 100 papers were presented based on the data produced during DC2 and the Rome production • Production during DC2 and Rome established a hardened Grid3 infrastructure, benefiting all participants in Grid3 J. Shank VI DOSAR Workshop 16 Sept. 2005

  5. Rome Grid Production: Successful Job Count at 83 ATLAS Sites (chart highlights: Southwest T2, BNL T1, Boston T2, Midwest T2) J. Shank VI DOSAR Workshop 16 Sept. 2005

  6. U.S. Grid Production (Rome/DC2 combined) • 20 different sites used in the U.S. • ATLAS Tier 2's played a dominant role • Largest shares in the site-by-site breakdown: Southwest T2 (3 sites) 24%, BNL T1 22%, Boston T2 20%, Midwest T2 (2 sites) 13%; the remaining sites (FNAL, PDSF, UBuf, UCSD, UM, PSU, and 7 other US sites) each contributed a few percent J. Shank VI DOSAR Workshop 16 Sept. 2005

  7. ATLAS Production on Grid3 (2004-2005) • Average Capone jobs/day = 350 • Maximum jobs/day = 1020 • US ATLAS dominated all other VOs in use of Grid3 J. Shank VI DOSAR Workshop 16 Sept. 2005

  8. CMS Data and Service Challenges • CMS is organizing a series of Data Challenges and participating in the LCG service challenges • Each is a step in the preparation for the start of LHC data taking • Data Challenge 2004 (DC04) was the first opportunity to test all the components of reconstruction, data transfer and registration in a real-time environment • The goal was reconstruction at 25 Hz and transfer of the events to Tier-1 centers for archiving • The first of the LCG Service Challenges started in FY05 • Transfer data from disk to disk from CERN to Tier-1 centers • Second LCG Service Challenge • Transfer data using … • Service Challenge 3 is in progress • Use experiment services to demonstrate the functionality of the Tier-0, Tier-1 and Tier-2 services simultaneously • Transfer, publish and verify consistency of data; run example analysis jobs against the transferred data
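As a rough scale check, the DC04 goal of reconstructing and shipping events at 25 Hz translates directly into a sustained Tier-0 to Tier-1 bandwidth once an event size is assumed. A minimal Python sketch; the 1.5 MB event size is an illustrative assumption, not a number from the slides:

```python
# Bandwidth implied by reconstructing and transferring events at 25 Hz.
# The event size is an ASSUMED illustrative value, not taken from the slides.
EVENT_RATE_HZ = 25       # events per second (from the slide)
EVENT_SIZE_MB = 1.5      # assumed size of one reconstructed event, in MB

rate_mb_per_s = EVENT_RATE_HZ * EVENT_SIZE_MB
volume_tb_per_day = rate_mb_per_s * 86_400 / 1_000_000

print(f"Sustained rate to Tier-1s : {rate_mb_per_s:.1f} MB/s")
print(f"Daily volume to Tier-1s   : {volume_tb_per_day:.2f} TB/day")
```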

  9. [Figure slide; no transcript text] J. Shank VI DOSAR Workshop 16 Sept. 2005

  10. [Figure slide; no transcript text] J. Shank VI DOSAR Workshop 16 Sept. 2005

  11. [Figure slide; no transcript text] J. Shank VI DOSAR Workshop 16 Sept. 2005

  12. Service Challenges – Ramp Up to LHC Start-Up (timeline 2005–2008: SC2, SC3, SC4, then LHC Service Operation; cosmics, first beams, first physics, full physics run)
  • June 05 – Technical Design Report
  • Sep 05 – SC3 Service Phase
  • May 06 – SC4 Service Phase
  • Sep 06 – Initial LHC Service in stable operation
  • Apr 07 – LHC Service commissioned
  SC2 – Reliable data transfer (disk-network-disk): 5 Tier-1s, aggregate 500 MB/sec sustained at CERN
  SC3 – Reliable base service: most Tier-1s, some Tier-2s; basic experiment software chain; grid data throughput 500 MB/sec, including mass storage (~25% of the nominal final throughput for the proton period)
  SC4 – All Tier-1s, major Tier-2s: capable of supporting the full experiment software chain, including analysis; sustain nominal final grid data throughput
  LHC Service in Operation – September 2006: ramp up to full operational capacity by April 2007; capable of handling twice the nominal data throughput
  J. Shank VI DOSAR Workshop 16 Sept. 2005

  13. Service Challenge 1 • SC1 was an extremely valuable system integration exercise • It showed that the currently deployed system has a high degree of usability • SC1 demonstrated a 10x higher throughput (25 TB/day WAN) than prior use, in a fairly realistic deployment [c.f. CDF LAN rate is ~30 TB/day] • Many problems were exposed by the high number of transfers and the high rate; they were fixed, new features were added, or parts of the system were redesigned before proceeding with tests • Rate results: • Using the fully integrated system: 300 MB/sec, CERN to FNAL, disk to disk • Using a dcache-java-class gridftp script: 500 MB/sec, no disk at FNAL • Using a dcache-java-class gridftp script: 400 MB/sec, to dCache disks • 20 parallel streams per transfer, each with 2 MB buffers, was the optimal tune • These transfer rates were only possible using the Starlight research networks.
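The stream/buffer tune quoted above can be sanity-checked against the TCP bandwidth-delay product: a window-limited stream carries at most roughly one buffer's worth of data per round trip. A minimal Python sketch; the ~120 ms CERN-FNAL round-trip time is an assumed illustrative value, not a number from the slides:

```python
# Window-limited TCP throughput: at most one buffer's worth of data per RTT.
# The RTT is an ASSUMED illustrative CERN-FNAL value, not taken from the slides.
BUFFER_MB = 2.0      # per-stream TCP buffer (from the slide)
N_STREAMS = 20       # parallel streams per transfer (from the slide)
RTT_S = 0.12         # assumed CERN-FNAL round-trip time, in seconds

per_stream_mb_s = BUFFER_MB / RTT_S
aggregate_mb_s = per_stream_mb_s * N_STREAMS

print(f"Per-stream ceiling : {per_stream_mb_s:.0f} MB/s")
print(f"Aggregate ceiling  : {aggregate_mb_s:.0f} MB/s")  # ~333 MB/s, in line with the observed ~300 MB/s
```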

  14. Service Challenge 1 • Data written in SC1 by srmcp transfers in November 2004.

  15. Service Challenge II • Continue the goal of making transfers robust and reliable; understand and fix problems rather than just restart • Only use SRM-managed transfers to FNAL dCache pools using 3rd-party gridftp transfers; no special/contrived transfer scripts • Continue to use the deployed CMS dCache infrastructure and the Starlight network links. Real data transfers have to coexist with users, so the service challenge should do so as well; this exposed many bugs in SC1 • Sustain 50 MB/s to tape from CERN to FNAL • Transfer ~10 MB/s of user data from Castor to tape at FNAL (~5 tapes/day) • Transfer 40 MB/s of fake data from the CERN OPLAPRO lab to FNAL tape, but recycle the tapes quickly • Use PhEDEx for both user and fake data transfers; exercising PhEDEx and making it a robust tool is a goal of SC2 • Plan to participate in 500 MB/s SRM-managed transfers to CMS's resilient or volatile pools over the Starlight network links.
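The "~5 tapes/day" figure for the user-data stream follows from the 10 MB/s rate once a cartridge capacity is assumed. A quick Python check; the 200 GB cartridge capacity is an assumed illustrative value, not a number from the slides:

```python
# Convert a sustained tape-writing rate into cartridges consumed per day.
# The cartridge capacity is an ASSUMED illustrative value for tape media of that era.
RATE_MB_S = 10            # user-data rate, Castor at CERN to tape at FNAL (from the slide)
TAPE_CAPACITY_GB = 200    # assumed cartridge capacity

gb_per_day = RATE_MB_S * 86_400 / 1_000
tapes_per_day = gb_per_day / TAPE_CAPACITY_GB

print(f"Data to tape   : {gb_per_day:.0f} GB/day")
print(f"Cartridges/day : {tapes_per_day:.1f}")  # ~4.3, consistent with "~5 tapes/day"
```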

  16. Service Challenge II • US CMS Tier 1 to Tier 2 complement of SC2 • US CMS Tier 2 sites want some of the data we are transferring; plan on delivering these data sets to them as part of this challenge, also using PhEDEx • Initially expect a low rate to each of 3-4 Tier 2 sites • End-to-end functionality test of many components • Rate can grow as the Tier 2s can accept data • UF, UCSD and Caltech Tier 2s already have resilient dCache + SRM deployed; each site is installing PhEDEx daemons • Will use PhEDEx for these transfers as well • Tier 2 sites do not have MSS, only disk cache (relatively cheap disk); the Tier 1 to Tier 2 operations are being investigated.

  17. US ATLAS Tier 1 SC3 Performance • Plots and statistics taken from • BNL: Zhao and Yu • CERN: Casey and Shiers • Primary members of BNL team • Zhao, Liu, Deng, Yu, Popescu

  18. Service Challenge 3 (Throughput Phase) • Primary goals: • Sustained 150 MB/s disk (T0) – disk (T1) transfers • Sustained 60 MB/s disk (T0) – tape (T1) transfers • Secondary goal: • A few named T2 sites: T2 <=> T1 transfers at a few MB/s • Participating US ATLAS Tier 2 sites: • Boston University • University of Chicago • Indiana University • University of Texas Arlington
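For scale, the two primary throughput goals convert directly into daily data volumes; a short Python conversion using only the numbers on the slide:

```python
# Convert the SC3 throughput goals into daily data volumes (pure unit conversion).
goals_mb_s = {"disk (T0) -> disk (T1)": 150, "disk (T0) -> tape (T1)": 60}

for path, rate in goals_mb_s.items():
    tb_per_day = rate * 86_400 / 1_000_000
    print(f"{path}: {rate} MB/s ~ {tb_per_day:.1f} TB/day")  # ~13.0 and ~5.2 TB/day
```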

  19. SC3 Services (and some issues) • Storage Element – US ATLAS production dCache system • dCache/SRM (v1.6.5-2, with SRM 1.1 interface) • Total of 332 nodes with about 170 TB of disk • Multiple GridFTP, SRM, and dCap doors • Network to CERN • Network for dCache at 2 x 1 Gb/sec => OC48 (2.5 Gb/sec) WAN • Shared link to CERN with round-trip time > 140 ms • RTT for European sites to CERN: ~20 ms • Occasional packet losses were observed along the BNL-CERN path, which limited single-stream throughput • 1.5 Gb/s aggregated bandwidth observed by iperf with 160 TCP streams.
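The need for so many parallel streams follows from the standard loss-limited TCP estimate, throughput ≈ MSS / (RTT · √p). A Python sketch of that estimate; the segment size and loss probability are assumed illustrative values, while the 140 ms RTT and the 1.5 Gb/s aggregate come from the slide:

```python
from math import sqrt

# Loss-limited TCP throughput estimate (Mathis et al.): rate ~ MSS / (RTT * sqrt(p)).
# MSS and the loss probability are ASSUMED illustrative values; RTT is from the slide.
MSS_BYTES = 1460       # assumed TCP maximum segment size
RTT_S = 0.140          # BNL-CERN round-trip time (from the slide)
LOSS_PROB = 1e-4       # assumed packet-loss probability on the shared path

per_stream_mb_s = MSS_BYTES / (RTT_S * sqrt(LOSS_PROB)) / 1e6   # ~1 MB/s per stream
target_mb_s = 1.5 * 1000 / 8                                    # 1.5 Gb/s aggregate (from the slide)
streams_needed = target_mb_s / per_stream_mb_s

print(f"Single-stream estimate : {per_stream_mb_s:.1f} MB/s")
print(f"Streams for 1.5 Gb/s   : {streams_needed:.0f}")         # order of 150-200, cf. the 160 streams used
```

With the ~20 ms RTT quoted for European sites, the same estimate gives roughly seven times the per-stream throughput, which is why the long BNL-CERN path needs such aggressive parallelism.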

  20. Daily Disk to Disk Transfer Plot

  21. Disk to Disk Transfer Rates

  22. Disk to Disk Transfer Plot

  23. Disk to Disk Transfer Plots (plot annotations: Castor2 LSF plug-in problem; data routed through GridFTP doors)

  24. Data Transfer Status • Transfer component parameter settings (timeouts, etc.) influence how many transfers complete successfully • The 150 MB/sec rate was achieved for one hour with a large number (> 50) of parallel file transfers; the original CERN FTS limit of 50 files per channel was not enough to fill the CERN-BNL data channel
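The "more than 50 concurrent files" observation is what one expects when each individual file transfer on a long-RTT path sustains only a few MB/s. A small Python sketch; the per-file rate is an assumed illustrative value, not a measurement from the slides:

```python
# Concurrent file transfers needed to fill the channel at a target aggregate rate.
# The per-file rate is an ASSUMED illustrative value for a single transfer on a long-RTT WAN path.
TARGET_MB_S = 150        # SC3 disk-to-disk goal (from the slide)
PER_FILE_MB_S = 2.5      # assumed sustained rate of one file transfer

files_needed = TARGET_MB_S / PER_FILE_MB_S
print(f"Concurrent files needed: {files_needed:.0f}")  # ~60, above the original FTS limit of 50 files/channel
```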

  25. Disk (T0) to Tape (T1) Transfer Plot • Sustained operation at the target 60 MB/sec was achieved, but glitches, primarily with the tape HSM (HPSS), resulted in periods of stopped or reduced transfer

  26. Aggregate Disk to Disk Tier 2 Transfers • While an average aggregate transfer rate of ~16 MB/sec to 4 sites was achieved (~4 MB/sec per site, meeting the goal), Tier 2 availability limited consistency

  27. PanDA • US ATLAS Production and Distributed Analysis • Based on the experience of DC2: a redesign of job submission on the grid • Very actively being worked on now, with an aggressive schedule • Integrates components that were separate in DC2: • Distributed Data Management • Production (job submission/workload management) • Distributed Analysis J. Shank VI DOSAR Workshop 16 Sept. 2005
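To illustrate the kind of unification PanDA is after (production and analysis jobs submitted into a single workload system that is aware of where the managed data sits), here is a deliberately simplified Python sketch. All class and method names are invented for illustration; this is not PanDA's actual data model or API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Deliberately simplified illustration of one workload queue shared by production
# and distributed-analysis jobs.  Names are INVENTED for illustration only;
# this is not PanDA's actual data model or API.

@dataclass
class Job:
    job_id: int
    kind: str                  # "production" or "analysis"
    input_datasets: List[str]  # datasets known to the data-management layer
    state: str = "defined"     # defined -> assigned -> running -> finished/failed

class WorkloadQueue:
    """Single queue serving both production and analysis submitters."""

    def __init__(self) -> None:
        self._jobs: List[Job] = []
        self._next_id = 1

    def submit(self, kind: str, input_datasets: List[str]) -> Job:
        job = Job(self._next_id, kind, input_datasets)
        self._next_id += 1
        self._jobs.append(job)
        return job

    def next_job(self, site_datasets: List[str]) -> Optional[Job]:
        """Hand out a queued job whose input data is already present at the requesting site."""
        for job in self._jobs:
            if job.state == "defined" and all(d in site_datasets for d in job.input_datasets):
                job.state = "assigned"
                return job
        return None

# Example: production and analysis share one queue and one data-locality rule.
queue = WorkloadQueue()
queue.submit("production", ["dc2.simul.datasetA"])
queue.submit("analysis", ["rome.reco.datasetB"])
print(queue.next_job(site_datasets=["rome.reco.datasetB"]))  # the analysis job: its data is local
```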

  28. [Figure slide; no transcript text] J. Shank VI DOSAR Workshop 16 Sept. 2005

  29. [Figure slide; no transcript text] J. Shank VI DOSAR Workshop 16 Sept. 2005

  30. ATLAS Computing System Commissioning • Starts early 2006 • Will test the complete end-to-end ATLAS computing infrastructure • Including some key components missing from DC2: … • Scale: production comparable to DC2 J. Shank VI DOSAR Workshop 16 Sept. 2005
