
Spring 2002 CMS Monte Carlo Production: What ? How ? What Next ?

This seminar discusses the Spring 2002 Monte Carlo production for the CMS physics community and the challenges faced in data simulation and analysis.



Presentation Transcript


  1. Spring 2002 CMS Monte Carlo Production: What ? How ? What Next ? Véronique Lefébure (CERN-HIP), CERN-IT Seminar, 25 September 2002

  2. Content: “Spring 2002 CMS Monte Carlo Production: What ? How ? What Next ?”

  3. Introduction: CMS On-line System • Multi-level trigger • Filter out background • Reduce data volume • Trigger cascade: 40 MHz (1000 TB/sec) → Level 1 Trigger → 75 kHz (50 GB/sec) → Level 2 Trigger → 5 kHz (5 GB/sec) → Level 3 Trigger → 100 Hz (100 MB/sec) → Data Recording & Offline Analysis
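
Read as a data-reduction chain, the slide's numbers are internally consistent. A minimal Python sketch (rates and throughputs taken from the slide; the per-event sizes it derives are approximate):

    # Each trigger level's rate and throughput imply an approximate
    # event size; the overall cascade reduces the rate by 400,000x.
    levels = [
        ("collisions",    40e6,  1000e12),  # 40 MHz, ~1000 TB/s
        ("after Level 1", 75e3,  50e9),     # 75 kHz,   50 GB/s
        ("after Level 2", 5e3,   5e9),      #  5 kHz,    5 GB/s
        ("after Level 3", 100.0, 100e6),    # 100 Hz,  100 MB/s (recorded)
    ]
    for name, rate_hz, bytes_per_s in levels:
        mb_per_event = bytes_per_s / rate_hz / 1e6
        print(f"{name:14s} {rate_hz:12,.0f} Hz   ~{mb_per_event:6.2f} MB/event")
    print(f"overall rate reduction: {40e6 / 100:,.0f}x")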

  4. Data Simulation Needs • Spring 2002 production for the CMS physics community: • need a large amount of simulated data in order to prepare the CMS DAQ TDR (“Data Acquisition Technical Design Report”), due at the end of 2002 • need the most up-to-date physics software • need the data before the June 2002 CMS week

  5. Monte Carlo Production Steps • The full production chain consists of 4 steps: • 3 logical Monte Carlo simulation steps: Generation, Simulation, Digitisation • 1 Reconstruction and Analysis step • Production was performed step by step for many different p-p physics channels • The digitisation output corresponds to RAW data as produced by the real detector, stored in Objectivity/DB

  6. Monte Carlo Production Steps: 1) Generation • Primary interactions in the vacuum of the beam-pipe • Generation of one p-p interaction at a time, for a selected physics channel • In reality: ~4 or 20 interactions per beam-crossing depending on the beam luminosity (2×10^33 or 10^34 cm^-2 s^-1), i.e. interactions are superimposed: “pile-up”

  7. Monte Carlo Production Steps: 2) Simulation • Secondary interactions in detector material and magnetic field • Individual hits: crossing points, energy deposition, time of flight • In reality: one beam-crossing every 25 ns, much shorter than the time of flight and the electric signal development, i.e. superimposition of signals from particles from different beam-crossings: “pile-up”

  8. Monte Carlo Production Steps: 3) Digitisation • Response of sensitive detector elements, taking into account the two sources of pile-up: particle time of flight and electrical signal development • 4 or 20 interactions per beam-crossing • Beam-crossings: [-5, +3] • For 1 signal p-p event of 1 MB we have 70 MB of pile-up events @ 10^34 cm^-2 s^-1
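
A quick reconstruction of the 70 MB figure, as a sketch in Python. The interactions per crossing and the crossing window come from the slide; the average size per pile-up event is an illustrative assumption, not a number from the talk:

    # Where ~70 MB of pile-up per 1 MB signal event comes from at 10^34.
    interactions_per_crossing = 20            # at L = 10^34 cm^-2 s^-1
    n_crossings = len(range(-5, 3 + 1))       # beam-crossings [-5, +3] -> 9
    n_pileup = interactions_per_crossing * n_crossings    # 180 events
    assumed_mb_per_event = 0.4                # illustrative assumption
    print(f"{n_pileup} pile-up events x {assumed_mb_per_event} MB "
          f"~= {n_pileup * assumed_mb_per_event:.0f} MB")  # ~72 MB ~ 70 MB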

  9. Monte Carlo Production Steps: 4) Reconstruction and Analysis • Higher-level physics reconstruction and histogramming • Level-1 trigger filtering • Track, cluster and vertex reconstruction • First-pass physics analysis • Histogramming

  10. Physics Applications: the executable chain • “Generation” → “Simulation” → “ooHit formatting” → “Digitisation” → “Reconstruction and Analysis”

  11. More Production Steps • Filtering (Level-1 trigger, …) • Add digits (e.g. first calorimeter digits, then Tracker after filtering) • Cloning of ooHits and/or Digis (smaller collection of data to handle, less staging at analysis time) • Re-digitisation with different algorithms or parameters

  12. Resource Constraints • Long CMSIM jobs: can take 2 days or more • RAM: > 512 MB for dual processors (ORCA) • Red Hat 6.1(.1) required by the Objectivity/DB license • Data server: ~80 GB of pile-up events (re-used, otherwise 300 TB!), typically 1 server per 12 CPUs • Disk space: size of one typical dataset @ 10^34 is 50K events × (1 MB fz + 1 MB ooHits + 4 MB Digis)/event = 300 GB • Lockserver, AMS server: number of file handles may reach ~3000
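
The 300 GB figure follows directly from the per-event sizes; a short check in Python:

    # Reconstructing the dataset size from the per-event figures above.
    n_events = 50_000
    mb_per_event = 1 + 1 + 4        # 1 MB fz + 1 MB ooHits + 4 MB Digis
    total_gb = n_events * mb_per_event / 1000
    print(f"{n_events:,} events x {mb_per_event} MB/event = {total_gb:,.0f} GB")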

  13. Job Complexity • Generation and Simulation jobs: the easy part • ORCA-COBRA jobs: more tricky: • Closely-coupled jobs • Shared federation/lockserver, output server, AMS • ~5 jobs write in parallel to 1 DB • 1 job may populate many DBs (~10) • One stale lock can bring everything to a halt • Massive I/O system @ 10^34: ~100 jobs in parallel; input = ~70 MB of pile-up events per 1 MB signal event, at 1 event/minute ≈ 1 MB/sec/job (see the sketch below); output = 4 MB/event, i.e. ~4 MB/minute/job • Physics software not yet fully robust: need to recover from crashes and to spot infinite loops
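
The per-job input rate, and the aggregate load it implies for ~100 parallel jobs, can be reconstructed from the slide's figures alone:

    # Aggregate pile-up input bandwidth at 10^34: ~70 MB read per 1 MB
    # signal event, at roughly one event per minute per job.
    mb_read_per_event = 70
    events_per_sec_per_job = 1 / 60
    per_job = mb_read_per_event * events_per_sec_per_job   # ~1.2 MB/s
    print(f"per job:  ~{per_job:.1f} MB/s (the slide rounds this to 1 MB/s)")
    print(f"100 jobs: ~{per_job * 100:.0f} MB/s sustained from the data servers")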

  14. How Much Data ? • Generation/Simulation: 4 months, 6M events = 150 physics channels • ORCA production: 2 months, 19,000 files = 500 collections = 20 TB (NoPU: 2.5M, 2×10^33 PU: 4.4M, 10^34 PU: 3.8M, filter: 2.9M) • 300 TB of pile-up movement on the LAN • 100,000 jobs, 45 years of CPU time (wall-clock) • More than 10 TB traveled on the WAN • Production completed just on time: a successful production at a regular global rate!

  15. CMSIM: 6 million events, 1.2 seconds per event, for 4 months (production-rate plot, Feb. 8th to June 6th)

  16. 2×10^33 PU: 4 million events, 1.2 seconds per event, over 2 months (production-rate plot, April 12th to June 6th)

  17. 10^34 PU: 3.5 million events, 1.4 seconds per event, over 2 months (production-rate plot, April 10th to June 6th)

  18. Physics Results • The data is used for physics studies, not only for computing-performance studies

  19. How ? • Production • Distribution • Coordination • Production Tool Suite • Success and Difficulties

  20. World-wide Distributed Production (map: CMS production Regional Centres)

  21. World-wide Distributed Production • 11 Regional Centres (RC), > 20 sites in USA, Europe and Russia, ~1000 CPUs: Bristol/RAL (UK), Caltech, CERN, Fermilab, Imperial College (UK), IN2P3-Lyon, INFN (Bari, Catania, Bologna, Firenze, Legnaro, Padova, Perugia, Pisa, Roma, Torino), Moscow (ITEP, JINR, SINP MSU, IHEP), UCSD (San Diego), UFL (Florida), Wisconsin; note: still more sites joining (RICE, Korea, Karlsruhe, Pakistan, Spain, Greece, …) • > 30 production operators: Maria Damato, Alessandra Fanfani, Daniele Bonacorsi, Catherine MacKay, Dave Newbold, Suresh Singh, Vladimir Litvine, Salvatore Costa, Julia Andreeva, Tony Wildish, Veronique Lefebure, Greg Graham, Shafqat Aziz, Nicolo Magini, Olga Kodolova, David Colling, Philip Lewis, Claude Charlot, Philippe Mine, Giovanni Organtini, Nicola Amapane, Victor Kolosov, Elena Tikhonenko, Massimo Biasotto, Stefano Lacaprara, Alexander Kryukov, Nikolai Kruglov, Leonello Servoli, Livio Fano, Simone Gennai, Ian Fisk, Dimitri Bourilkov, Jorge Rodriguez, Pamela Chimney, Shridara Dasu, Iyer Radhakrishna, Wesley Smith, plus probably many more persons in ‘the shadow’! • > 20 physicists as production “requestors”

  22. Coordination Issues • Physicists' side • Handle four physics groups • Check uniqueness of requests • Check that the number of requested events is reasonable • Take care of request priorities • Producers' side • Deploy and support production tools • Distribute physics executables • Distribute requests adequately to RCs • Ensure uniqueness of produced data • Track progress of data production and transfer

  23. Coordination Means • Physicists' side: • 1 coordinator per physics group • 1 coordinator for the 4 physics groups • Meetings • Use of a MySQL CMS DB for recording and managing the production requests (“RefDB”) • Producers' side: • 1 production manager • 1 production coordinator in contact with the physics coordinators • 1 or 2 contact persons per Regional Centre • Meetings and mailing list • Use of the MySQL CMS DB for assigning production requests to Regional Centres and progress tracking (“RefDB”) • Pre-allocation of run numbers, random seeds, DBIDs • Automatic file naming provided by “RefDB”

  24. RefDB: Central Reference Database • Production requests: • Submission forms for each production step • List of recorded requests • Modification/correction of submitted requests • Production assignments: • Selection of a set of requests for assignment to an RC • Re-assignment of a request to another RC or production site • List and status of assignments (a query sketch follows below)
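
As an illustration of how a Regional Centre script might poll RefDB (a MySQL database) for its open assignments, a hypothetical sketch follows; the table and column names are invented, since the talk does not show the actual RefDB schema:

    # Hypothetical RefDB query; schema names are invented for illustration.
    import MySQLdb   # MySQL-python client library, assumed installed

    conn = MySQLdb.connect(host="refdb.example.cern.ch", user="reader",
                           passwd="...", db="RefDB")
    cur = conn.cursor()
    cur.execute("SELECT AssignmentID, DatasetName, NEvents, Status "
                "FROM Assignments "
                "WHERE RegionalCentre = %s AND Status = 'assigned'",
                ("INFN",))
    for assignment_id, dataset, n_events, status in cur.fetchall():
        print(assignment_id, dataset, n_events, status)
    conn.close()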

  25. RefDB: Central Reference Database • Metadata catalogue: • Browse datasets according to: physics channel, software version, … • Get production status • Get data location • Get input parameters

  26. How ? • Production • Distribution • Coordination • Production Tool Suite • Success and Difficulties

  27. Production Tools: Spring02 Components (diagram) • “RefDB”: central input-parameters DB and central output-metadata DB, with a web interface for production requests and a web interface for browsing metadata & data location • “IMPALA”: job scripts generator • “BOSS”: local job monitoring DB with monitoring schema & scripts, interfaced to the job scheduler • Plus: “DAR” for executables distribution; “Tony's scripts” for data transfer tools; data storage

  28. DAR: “Distribution After Release” • CMS software distribution tool • Allows one to create and install the binaries • Distribution tar files published at FNAL and at CERN • Local installation: dar -i Distribution_Tar_File Installation_Directory • Used for distribution of ALL physics executables and the geometry file

  29. BOSS: “Batch Object Submission System” • Tool for job monitoring and book-keeping developed by CMS • Not a job scheduler, but can be interfaced with any scheduler: • LSF (CERN, INFN) • PBS (Bristol, Caltech, UFL, Imperial College, INFN) • FBSNG (Fermilab) • Condor (INFN, Wisconsin) • Uses a database (MySQL)

  30. BOSS • User registers a scheduler: • Scripts for job submission, deletion and query (DB blobs) • User registers a job type: • Schema for the information to be monitored (new DB table) • Algorithms to retrieve the information from the job (DB blobs) • User submits jobs of a defined type: • A new entry is created for the job in the BOSS database tables • The running job fetches the user monitoring programs and updates the BOSS database

  31. BOSS jobExecuter data flow (diagram): the user job's STDIN is fed in, its STDOUT and STDERR are teed to the user log files, and copies are piped through a filter and the monitoring algorithm, which updates the BOSS DB
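
The diagram can be read as: jobExecuter runs the real job, tees its output streams to log files, and pipes copies through filters that extract monitored values into the BOSS DB. A conceptual sketch of that flow (not the actual BOSS code; the matched log-line format is invented):

    import re
    import subprocess

    def run_and_monitor(cmd, logfile, update_db):
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True)
        with open(logfile, "w") as log:
            for line in proc.stdout:          # "TEE": keep the full log...
                log.write(line)
                m = re.search(r"processed (\d+) events", line)
                if m:                         # ...and filter monitored values
                    update_db("events_done", int(m.group(1)))
        return proc.wait()

    # update_db would write into the job's row in the BOSS MySQL tables;
    # here it just prints. "./orca_job.sh" is a hypothetical job script.
    run_and_monitor(["./orca_job.sh"], "job.log",
                    lambda key, val: print(f"BOSS update: {key} = {val}"))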

  32. BOSS for Spring02 Production: registered job types and their registration components • Generation: job type table KIN, components cmkin.schema, preprocess, runtimeprocess, postprocess • Simulation: job type table SIM, components cmsim.schema, preprocess, runtimeprocess, postprocess • ooHit formatting: job type table OOHit, components oohit.schema, preprocess, runtimeprocess, postprocess • Digitisation: job type table OODigi, components oodigi.schema, preprocess, runtimeprocess, postprocess

  33. From BOSS to RefDB: “Summary scripts” • Updating RefDB with the current status of assignment progress • Book-keeping of the monitored values • Checking uniqueness of generation and simulation run numbers and random seeds (sketched below) • Warning for duplicate runs • Warning for missing or incomplete runs
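
A minimal sketch of the uniqueness check, assuming the monitored values can be reduced to (run number, random seed) pairs; the sample data is invented:

    # Every (run number, random seed) pair must occur exactly once.
    from collections import Counter

    runs = [(2001, 987654), (2002, 123456), (2002, 123456)]  # (run, seed)
    for pair, count in Counter(runs).items():
        if count > 1:
            print(f"WARNING: duplicate run/seed {pair} seen {count} times")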

  34. Data Validation Scripts • After storage of the data: final validation at the metadata level • Basically, checks that the warnings given by the ‘summary’ scripts have been corrected • Correct number of events • No duplicates • Closure of DB files (in the COBRA sense: no more data will be written to that DB file) • All DB files of a collection are attached to the federation

  35. “IMPALA” (the Spring02 components diagram of slide 27, with the IMPALA job scripts generator highlighted)

  36. IMPALA: “Intelligent Monte Carlo Production Local Actuator” • Automated script-generation tool developed by CMS for MC production • Job splitting: 50,000 events = 100 jobs of 500 events (see the sketch below) • Interfaces defined for: • Parameter handling • Input source discovery and enumeration • Tracking (“declared, created, submitted, running, done, problems, logs”) • Job submission
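
A sketch of the job-splitting step, with run numbers and seeds allocated sequentially for illustration (in Spring02 these were pre-allocated centrally through RefDB, and the starting values below are invented):

    def split_assignment(n_events, events_per_job, first_run, first_seed):
        # One assignment becomes n_events // events_per_job jobs, each
        # carrying its own pre-allocated run number and random seed.
        return [{"run": first_run + i,
                 "seed": first_seed + i,
                 "n_events": events_per_job}
                for i in range(n_events // events_per_job)]

    jobs = split_assignment(50_000, 500, first_run=3000, first_seed=42_000)
    print(len(jobs), "jobs, first one:", jobs[0])   # 100 jobs of 500 events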

  37. Production workflow (diagram) • The physicist sends a request, recorded in RefDB • The request is assigned to a Regional Centre (RefDB assignment) • At the RC: IMPALA “Declare” fetches the parameters from RefDB; IMPALA “Create” generates the tracking/production and tracking/batch files; IMPALA “Submit” sends the jobs to the farm for data production • BOSS pre/runtime/post processes record monitoring data in the BOSS DB • DBs are closed and bad runs invalidated (MDeamon) • IMPALA summary scripts update the RefDB run table and mark the assignment done • Data and metadata are exported

  38. IMPALA: Configuration • Executable location (DAR file) • Output data location (boot file for the Objectivity/DB federation, output disk, …) • BOSS (or scheduler) installation location • Local functions (CopyLogFiles, StageIn, StageOut, …)

  39. Data Transfer (“Tony's scripts”) • Transfer tool developed by CMS: “Tony's scripts” • Used for CERN/Europe • Many US sites use GDMP (Grid) and globus-url-copy • A simple HTTP server publishes the list of files • Files on disk (‘find’) or on tape (flat list) • The client searches the list for new files • Compares it to the list of files already retrieved and selects by pattern-matching (to select datasets) • The client asks the server to push n files • The ‘DBServer’ pushes files in m parallel streams • using a designated copy agent: scp, bbcp, rfcp
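
A sketch of the client-side selection logic described above; the URL, listing format, and file names are hypothetical:

    # Fetch the server's published file list, drop files already held,
    # select a dataset by pattern, and pick the next n files to request.
    import re
    import urllib.request

    def files_to_request(server_url, already_have, pattern, n):
        listing = urllib.request.urlopen(server_url + "/filelist").read().decode()
        wanted = [f for f in listing.splitlines()
                  if re.search(pattern, f) and f not in already_have]
        return wanted[:n]      # ask the server to push these n files

    already = {"jm_qcd_30_50.1023.fz"}               # hypothetical file name
    print(files_to_request("http://dbserver.example.cern.ch",
                           already, r"jm_qcd", n=5))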

  40. Spring02 Transferred Data • To CERN: 3 or 4 exporters in parallel, 7 TB in total • To FNAL: 5 TB • Sustained network-to-disk rate higher than sustained disk-to-tape rate

  41. Data Storage • CASTOR (CERN) • ENSTORE (Fermilab) • Basic tape system (RAL)

  42. Success and Difficulties • Coordination • Farm Setup • Running Jobs • Data Transfer • Data Storage and Publication

  43. Success and Difficulties: Coordination • Use of a central reference DB, “RefDB”: • Uniform format of input parameter files (NEW) • Storage and indexing of parameter files (NEW) • Automatic retrieval of the parameters by IMPALA (NEW) • Tracking of the global CMS production rate (NEW) • Test-assignments for validation of software installation (NEW) • Where GRID tools can help us: • Assignment of requests to RCs is still done by hand: need a CMS-wide resource monitoring system • Update of RefDB has to be done by hand: should be automated and incorporated in the job monitoring system

  44. Success and Difficulties: Farm Setup • We have a production tool suite (NEW) • But a lot to learn the first time: • At the system level (MySQL, disk server configuration for pile-up, AMS & lockserver, …) • At the software level (test-assignments to play with) • Heavy support task: rapidly evolving production software, new releases, bug fixes (but excellent team spirit) • Different farm configurations: not possible to test the tools for all (different job schedulers, MSS or not, distributed or central disks, shared or dedicated CPUs, firewalls or not, data servers on CPU nodes or not, …) • Where GRID tools can help us: • an ‘installation in one command’ toolkit

  45. Success and Difficulties: Running Jobs • ORCA digitisation job resume system (NEW) • Highly helpful (~10% failure rate; jobs can now be easily resumed) • Still need more robustness in the user analysis part of ORCA • Invalidation of bad runs to be automated • Objectivity/DB “readonly” option (NEW) • Many fewer locking problems than before • System problem recovery: • Cleaning of stale Objectivity/DB locks • 2 GB file-size limit to be controlled on Solaris disks (CERN) • Network failures (no more disk failures) • Disk space • Scaling problems in the way we use BOSS • Where GRID tools can help us: • Farm monitoring system, with discovery of the crash reason and action for recovery

  46. Success and Difficulties: Data Transfer • We have transfer tools: “Tony's scripts” and GDMP • Much more data movement than before: over half the data has traveled on the WAN • Still problems to be handled by hand: • Transfers interrupted (time limit) • Data corruption • Disk space limitations • Missing files: datasets spread over up to 500 files for one collection (typically 100 files), but we must have every file before analysis can start safely • Where GRID tools can help us: • a Replica Manager

  47. Success and Difficulties: Storage & Publication • Validation scripts for dataset integrity checks (NEW) • Should be part of the data transfer tool • Tape failures (RAL) • Archive failures in Castor: rare but difficult to spot • Stage-in time to Castor can be very long even for a few files (> 1 hour) • Interaction between Castor and (multiple) analyses not well understood → needs studying

  48. Success and Difficulties: Summary • Major improvements in the physics code and in the production machinery with respect to previous years • ORCA resume system • Use of RefDB and BOSS: made better automation and book-keeping possible • Our CMS production tools can be improved: more automation • GRID tools may help to make it even better: • Tool for installation/configuration of production tools • Resource monitoring system • Replica manager • Anything that can help reduce the manpower needs • Data access for user analysis has to be improved • Problems have been addressed by the production team and the Production Tools Review team

  49. More and Faster (chart: production volume growth, average slope = ×2.5/year, with milestones DAQ TDR, DC04/Physics TDR, DC05/LCG TDR, DC06/Readiness, LHC @ 2×10^33, LHC @ 10^34) • 1999: 1 TB, 1 month, 1 person • 2000-2001: 27 TB, 12 months, 30 persons • 2002: 20 TB, 2 months, 30 persons • 2003: 175 TB, 6 months, < 30 persons

  50. Coming Data Challenge • 2004 Data Challenge, “DC04”: • Analysis of data produced at 25% of LHC startup luminosity (2×10^33 cm^-2 s^-1) at a data-taking rate of 25 Hz during 1 month = ~5×10^7 events • = 5% of LHC final luminosity (10^34 cm^-2 s^-1) • To validate the software baseline: • new LCG persistency framework (POOL, ROOT) • new simulation software (OSCAR/Geant4) • new GRID tools and resources • 2003 pre-challenge: production of the 5×10^7 events @ 2×10^33 cm^-2 s^-1
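
A quick sanity check of the event count: at a 100% duty cycle, 25 Hz for one month gives somewhat more than 5×10^7 events, so the target implies an effective duty factor below one (the percentage computed below is inferred here, not stated in the talk):

    seconds_per_month = 30 * 24 * 3600               # ~2.6e6 s
    max_events = 25 * seconds_per_month              # ~6.5e7
    print(f"25 Hz x 1 month = {max_events:.2e} events at 100% duty cycle")
    print(f"quoted 5e7 events -> duty factor ~{5e7 / max_events:.0%}")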
