
DØ Computing Model & Monte Carlo & Data Reprocessing



1. DØ Computing Model & Monte Carlo & Data Reprocessing
Gavin Davies, Imperial College London
DOSAR Workshop, São Paulo, September 2005

2. Outline
• Operational status
  • Globally – continue to do well
  • Shared by recent Run II Computing Review
• DØ computing model
  • Ongoing, long-established plan
• Production computing
  • Monte Carlo
  • Reprocessing of Run II data
  • 10⁹ events reprocessed on the grid – the largest HEP grid effort
• Looking forward
• Conclusions

3. Snapshot of Current Status
• Reconstruction keeping up with data taking
• Data handling is performing well
• Production computing is off-site and grid-based; it continues to grow & work well
• Over 75 million Monte Carlo events produced in the last year
• Run IIa data set being reprocessed on the grid – 10⁹ events
• Analysis CPU power has been expanded
• Globally doing well
  • Shared by recent Run II Computing Review

4. Computing Model
• Started with distributed computing, evolving to automated use of common tools/solutions on the grid (SAM-Grid) for all tasks
  • Scalable
  • Not alone – joint effort with others at FNAL and elsewhere, LHC…
• 1997 – original plan
  • All Monte Carlo to be produced off-site
  • SAM to be used for all data handling, providing a ‘data-grid’
• Now: Monte Carlo and data reprocessing with SAM-Grid
• Next: other production tasks, e.g. fixing, and then user analysis
• Use concept of Regional Centres
  • DOSAR one of the pioneers
  • Builds local expertise

5. Reconstruction Release
• Periodically update the version of the reconstruction code
  • As new / more refined algorithms are developed
  • As understanding of the detector improves
  • Frequency of releases decreases with time
• One major release in the last year – p17
  • Basis for current Monte Carlo (MC) & data reprocessing
• Benefits of p17
  • Reco speed-up
  • Full calorimeter calibration
  • Fuller description of detector material
  • Use of zero-bias overlay for MC
• (More details: http://cdinternal.fnal.gov/RUNIIRev/runIIMP05.asp)

6. Data Handling – SAM
• SAM continues to perform well, providing a data-grid (see the sketch below)
• 50 SAM sites worldwide
• Over 2.5 PB (50B events) consumed in the last year
• Up to 300 TB moved per month
• Larger SAM cache solved tape-access issues
• Continued success of SAM shifters
  • Often remote collaborators
  • Form the first line of defense
• SAMTV monitors SAM & SAM stations
http://d0db-prd.fnal.gov/sm_local/SamAtAGlance/
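To make the data-grid idea concrete, here is a minimal, self-contained sketch of SAM-style metadata-driven data handling. All class and method names (`SamStationSketch`, `define_dataset`, `deliver`) are hypothetical illustrations, not the real SAM API; the sketch shows only the two ideas the slide relies on: datasets defined by metadata queries, and a station-local cache that keeps popular files off tape.

```python
"""Minimal sketch of SAM-style data handling (hypothetical names)."""

class SamStationSketch:
    def __init__(self, catalog, cache_size):
        self.catalog = catalog          # file name -> metadata dict
        self.cache = {}                 # file name -> True once staged
        self.cache_size = cache_size

    def define_dataset(self, **criteria):
        """Resolve a metadata query to a concrete file list."""
        return [f for f, meta in self.catalog.items()
                if all(meta.get(k) == v for k, v in criteria.items())]

    def deliver(self, filename):
        """Serve a file from cache, 'staging from tape' on a miss."""
        if filename not in self.cache:
            if len(self.cache) >= self.cache_size:
                self.cache.pop(next(iter(self.cache)))  # naive eviction
            self.cache[filename] = True                 # stage from tape
        return filename

catalog = {
    "raw_0001.evpack": {"tier": "raw", "run": 180001},
    "tmb_0001.evpack": {"tier": "tmb", "run": 180001},
}
station = SamStationSketch(catalog, cache_size=100)
for f in station.define_dataset(tier="tmb"):
    station.deliver(f)
```

The larger-cache point on the slide maps onto `cache_size` here: a bigger cache means fewer evictions and so fewer tape-staging requests.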

7. SAMGrid
• SAM – data handling
• JIM – job submission & monitoring
• SAM + JIM → SAM-Grid (division of labour sketched below)
• More than 10 DØ execution sites
http://samgrid.fnal.gov:8080/
http://samgrid.fnal.gov:8080/list_of_schedulers.php
http://samgrid.fnal.gov:8080/list_of_resources.php
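A rough sketch of the SAM/JIM division of labour, with invented site names, slot counts, and function signatures; real JIM brokering is more sophisticated than "most free slots", so treat this purely as an illustration of the split.

```python
"""Sketch of the SAM + JIM split (all names and numbers invented)."""

SITES = {"FNAL": 500, "SPRACE": 120, "IN2P3": 200}  # free batch slots (made up)

def pick_site(sites):
    """JIM-like brokering, crudely: choose the site with the most free slots."""
    return max(sites, key=sites.get)

def submit(job_description):
    """JIM's role: route the job; SAM resolves the dataset at the chosen site."""
    site = pick_site(SITES)
    # A monitoring record of the kind JIM exposes on its web pages.
    return {"job": job_description["name"], "site": site, "status": "submitted"}

print(submit({"name": "p17_reprocessing_0042",
              "executable": "d0reco",
              "dataset": "run2a-raw-batch-0042"}))
```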

8. Remote Production Activities – Monte Carlo – I
• Over 75M events produced in the last year, at more than 10 sites
  • More than double last year’s production
  • Vast majority on shared sites
  • DOSAR a major part of this
• SAM-Grid introduced in spring ’04, becoming the default
  • Based on the request system and jobmanager-mc_runjob
  • MC software package retrieved via SAM – same way everywhere, including the central farm
• Average production efficiency ~90% (see the back-of-envelope sketch below)
  • Average inefficiency due to grid infrastructure ~1–5%
  • http://www-d0.fnal.gov/computing/grid/deployment-issues.html
• Continued move to common tools
  • DOSAR sites continue the move from McFarm to SAMGrid
[Plot: production from ’04]
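The quoted efficiency supports a quick back-of-envelope check, sketched below; the helper name and the assumption that losses simply inflate the number of attempts are mine, not DØ's actual accounting.

```python
"""Rough check on the MC production numbers (assumed loss model)."""

def attempts_needed(requested_events, efficiency=0.90):
    """With ~90% production efficiency, a request for N events needs
    roughly N / 0.9 event-attempts submitted."""
    return int(requested_events / efficiency)

# Example: the ~75M events produced last year imply ~83M attempted.
print(attempts_needed(75_000_000))  # -> 83333333
```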

9. Remote Production Activities – Monte Carlo – II
• Beyond just ‘shared’ resources
  • More than 17M events produced ‘directly’ on LCG via submission from Nikhef
  • Good example of a remote site driving the ‘development’
• Similar momentum building on/for OSG
  • Two good site examples within p17 reprocessing

10. Remote Production Activities – Reprocessing – I
• After significant improvements to reconstruction, reprocess old data
• P14: Winter 2003/04
  • 500M events, 100M remotely, from DST
  • Based around mc_runjob
  • Distributed computing rather than grid
• P17: End March → ~Oct
  • ×10 larger, i.e. 10⁹ events, 250 TB
  • Basically all remote
  • From raw, i.e. use of DB proxy servers
  • SAM-Grid as default (using mc_runjob)
  • 3200 1 GHz PIIIs for 6 months (see the arithmetic sketch below)
• Massive activity – the largest grid activity in HEP
http://www-d0.fnal.gov/computing/reprocessing/p17/
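The quoted resources imply a per-event reconstruction cost that is easy to check. The sketch below assumes the 3200 CPUs run flat out for the full six months, which is optimistic, so the true per-event cost is somewhat lower than the figure printed.

```python
"""Implied per-event cost of the p17 reprocessing (rough arithmetic)."""

cpus = 3200                    # 1 GHz PIIIs, from the slide
months = 6
events = 1_000_000_000         # 10^9 events from raw

cpu_seconds = cpus * months * 30 * 24 * 3600   # ~5.0e10 CPU-seconds
print(f"~{cpu_seconds / events:.0f} s/event on a 1 GHz PIII")  # ~50 s/event
```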

11. Reprocessing – II
• A grid job spawns many batch jobs (sketched below)
[Diagram: “Production” and “Merging” stages]
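A minimal sketch of the fan-out/fan-in pattern on this slide: one grid job is split into many production batch jobs, whose outputs are then merged into a few large files for efficient tape storage. File counts and helper names are illustrative only.

```python
"""Sketch of the production/merging fan-out on this slide (invented numbers)."""

def split_into_batch_jobs(files, files_per_job=10):
    """Fan-out: one grid job becomes many production batch jobs."""
    return [files[i:i + files_per_job] for i in range(0, len(files), files_per_job)]

def merge(outputs, outputs_per_merge=50):
    """Fan-in: many small outputs are combined before going to tape."""
    return [outputs[i:i + outputs_per_merge]
            for i in range(0, len(outputs), outputs_per_merge)]

raw_files = [f"raw_{n:04d}.evpack" for n in range(1000)]
batch_jobs = split_into_batch_jobs(raw_files)               # 100 production jobs
tmb_outputs = [f"tmb_{n:04d}.evpack" for n in range(len(batch_jobs))]
merged = merge(tmb_outputs)
print(len(batch_jobs), "production jobs ->", len(merged), "merged files")
```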

12. Reprocessing – III
• SAMGrid provides
  • Common environment & operation scripts at each site
  • Effective book-keeping
  • SAM avoids data duplication + defines recovery jobs (see the sketch below)
  • JIM’s XML-DB used to ease bug tracing
• Deploying a still-evolving product to new sites with limited manpower is tough (we are a running experiment)
  • Very significant improvements in JIM (scalability) during this period
• Certification of sites – need to check
  • SAMGrid vs usual production
  • Remote sites vs central site (e.g. FNAL vs SPRACE)
  • Merged vs unmerged files
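The recovery-job point deserves a concrete illustration: because SAM's book-keeping records which input files each job consumed successfully, a recovery job's input is just a set difference, which avoids both gaps and duplication. The sketch below is conceptual, not SAM's actual implementation.

```python
"""Sketch of SAM-style recovery-job definition (concept only)."""

dataset = {f"raw_{n:04d}.evpack" for n in range(100)}
# Pretend every 7th file's job failed; SAM knows which files were consumed.
processed = {f"raw_{n:04d}.evpack" for n in range(100) if n % 7 != 0}

# Recovery job input = everything in the dataset not yet processed.
recovery_job_input = sorted(dataset - processed)
print(f"{len(recovery_job_input)} files to recover:", recovery_job_input[:3], "...")
```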

13. Reprocessing – IV
• Monitoring (illustration)
  • Overall efficiency, speed, or by site
  • (Speed = production speed in M events/day; efficiency = number of batch jobs completing successfully)
  • http://samgrid.fnal.gov:8080/cgi-bin/plot_efficiency.cgi
• Status – into the “end-game”
  • ~855M events done
  • Data sets all allocated, moving to ‘cleaning up’
  • Must now push on the Monte Carlo

14. SAM-Grid Interoperability
• Need access to greater resources as data sets grow
• Ongoing programme on LCG and OSG interoperability
• Step 1 (co-existence) – use shared resources with a SAM-Grid head-node
  • Widely done for both reprocessing and MC
  • OSG co-existence shown for data reprocessing
• Step 2 – SAMGrid–LCG interface
  • SAM does data handling & JIM job submission
  • Basically a forwarding mechanism (sketched below)
  • Prototype established at IN2P3/Wuppertal
  • Extending to production level
• OSG activity increasing – build on LCG experience
• Team work between core developers / sites
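A schematic of the forwarding idea: data handling stays with SAM regardless of backend, while execution is routed either to a local batch system or forwarded to an LCG/OSG compute element. All names and structures here are hypothetical, sketching the mechanism rather than the real interface.

```python
"""Sketch of the SAMGrid-LCG/OSG forwarding mechanism (hypothetical names)."""

def forward_job(job, backend):
    handlers = {
        "local": lambda j: f"local batch: {j['name']}",
        "LCG":   lambda j: f"forwarded to LCG CE: {j['name']}",
        "OSG":   lambda j: f"forwarded to OSG CE: {j['name']}",
    }
    # Data handling is identical in all cases: inputs resolved via SAM.
    return {"data": f"SAM dataset {job['dataset']}",
            "exec": handlers[backend](job)}

print(forward_job({"name": "p17_mc_0007", "dataset": "zmumu-p17"}, "LCG"))
```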

15. Looking Forward
• Increased data sets require increased resources for MC, reprocessing, etc.
• Route to these is increased use of the grid and common tools
• Have an ongoing joint programme, but work to do…
  • Continue development of SAM-Grid
    • Automated production-job submission by shifters
  • Deployment team
    • Bring in new sites in a manpower-efficient manner
    • ‘Benefit’ of a new site goes well beyond a ‘cpu’ count – we appreciate / value this
  • Full interoperability
    • Ability to access all shared resources efficiently
• Additional resources for the above recommended by the Taskforce

16. Conclusions
• Computing model continues to be successful
  • Based around grid-like computing, using common tools
  • Key part of this is the production computing – MC and reprocessing
• Significant advances this year:
  • Continued migration to common tools
  • Progress on interoperability, both LCG and OSG
    • Two reprocessing sites operating under OSG
• P17 reprocessing – a tremendous success
  • Strongly praised by the Review Committee
  • DOSAR a major part of this
  • More ‘general’ contribution also strongly acknowledged
• Thank you
• Let’s all keep up the good work

17. Back-up

18. Terms
• Tevatron
  • Approx. equivalent challenge to the LHC in “today’s” money
  • Running experiments
• SAM (Sequential Access to Metadata)
  • Well-developed metadata and distributed data-replication system
  • Originally developed by DØ & FNAL-CD
• JIM (Job Information and Monitoring)
  • Handles job submission and monitoring (all but data handling)
  • SAM + JIM → SAM-Grid – computational grid
• Tools
  • Runjob – handles job workflow management
  • dØtools – user interface for job submission
  • dØrte – specification of runtime needs

19. Reminder of Data Flow
• Data acquisition (raw data in evpack format)
  • Currently limited to a 50 Hz Level-3 accept rate
  • Request to increase to 100 Hz, as planned for Run IIb – see later
• Reconstruction (tmb/DST in evpack format)
  • Additional information in tmb → tmb++ (DST format stopped)
  • Sufficient for ‘complex’ corrections, including track fitting
• Fixing (tmb in evpack format)
  • Improvements / corrections coming after the cut of the production release
  • Centrally performed
• Skimming (tmb in evpack format; sketched below)
  • Centralised event streaming based on reconstructed physics objects
  • Selection procedures regularly improved
• Analysis (out: root histograms)
  • Common root-based Analysis Format (CAF) introduced in the last year
  • tmb format remains
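Skimming is the one step in this flow that is easy to mimic in a few lines: events are routed into streams by predicates on reconstructed physics objects, and a single event can enter several streams. The stream names and thresholds below are invented, not DØ's actual skim definitions.

```python
"""Sketch of centralised skimming (illustrative stream definitions)."""

SKIMS = {
    "2MU": lambda ev: len(ev["muons"]) >= 2,            # dimuon stream
    "EM1": lambda ev: any(e > 20.0 for e in ev["em_pt"]),  # high-pT EM stream
    "JET": lambda ev: len(ev["jets"]) >= 3,             # multijet stream
}

def skim(event):
    """Return every stream this event qualifies for."""
    return [name for name, passes in SKIMS.items() if passes(event)]

event = {"muons": ["mu1", "mu2"], "em_pt": [8.5], "jets": ["j1", "j2", "j3"]}
print(skim(event))  # -> ['2MU', 'JET']  (one event, several skims)
```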

20. Remote Production Activities – Monte Carlo

21. The Good and the Bad of the Grid
• Only viable way to go…
  • Increase in resources (CPU and potentially manpower)
  • Work with, not against, the LHC
• Still limited, BUT
  • Need to conform to standards – dependence on others…
  • Long-term solutions must be favoured over short-term idiosyncratic convenience
    • Or we won’t be able to maintain adequate resources
  • Must maintain production-level service (papers) while increasing functionality
  • Must be as transparent as possible to the non-expert
