
CERN and the LHC Computing Grid


Presentation Transcript


1. CERN and the LHC Computing Grid. Ian Bird, IT Department, CERN, Geneva, Switzerland. HP Puerto Rico, 9 February 2004. Ian.Bird@cern.ch

2. What is CERN? • CERN is 2500 staff scientists (physicists, engineers, …) and some 6500 visiting scientists (half of the world's particle physicists), who come from 500 universities representing 80 nationalities. • CERN is the world's largest particle physics centre, funded by 20 European member states. • Particle physics is about the elementary particles of which all matter in the universe is made, and the fundamental forces which hold matter together. • Particle physics requires special tools to create and study new particles.

3. … is located in Geneva, Switzerland. (Photo labels: Mont Blanc, 4810 m; downtown Geneva.)

4. What is CERN? • The special tools for particle physics are: • ACCELERATORS: huge machines able to speed up particles to very high energies before colliding them into other particles. • DETECTORS: massive instruments which register the particles produced when the accelerated particles collide.

5. What is LHC? LHC is due to switch on in 2007. Four experiments, with detectors 'as big as cathedrals': ALICE, ATLAS, CMS, LHCb. • LHC will collide beams of protons at an energy of 14 TeV. • Using the latest super-conducting technologies, it will operate at about –270 °C, just above absolute zero. • With its 27 km circumference, the accelerator will be the largest superconducting installation in the world.

6. The LHC Data Challenge • A particle collision = an event. • Events are independent of one another, which provides trivial parallelism and hence allows the use of simple PC farms. • The physicist's goal is to count, trace and characterise all the particles produced and fully reconstruct the process. • Among all tracks, the presence of "special shapes" is the sign that an interesting interaction has occurred.
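The "trivial parallelism" claim can be made concrete with a short sketch: since each event is independent, reconstruction can simply be farmed out to worker processes. This is a toy Python illustration with an invented event format and a stand-in reconstruct() function, not CERN's actual reconstruction software.

```python
from multiprocessing import Pool

def reconstruct(event):
    """Toy 'reconstruction': count and characterise the tracks in one
    event. A stand-in for the real physics code, which is far heavier."""
    tracks = event["tracks"]
    return {
        "event_id": event["id"],
        "n_tracks": len(tracks),
        "interesting": any(t["shape"] == "special" for t in tracks),
    }

def process_events(events, n_workers=8):
    # Each event is independent, so a plain process pool is enough:
    # this is the trivial parallelism that lets simple PC farms scale.
    with Pool(n_workers) as pool:
        return pool.map(reconstruct, events)

if __name__ == "__main__":
    # A handful of fake events, just to show the shape of the data.
    events = [
        {"id": i, "tracks": [{"shape": "ordinary"}, {"shape": "special"}]}
        for i in range(100)
    ]
    results = process_events(events)
    print(sum(r["interesting"] for r in results), "candidate events")
```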

7. The LHC Data Challenge • Starting from this event… you are looking for this "signature". • Selectivity: 1 in 10^13, like looking for one person in a thousand world populations, or for a needle in 20 million haystacks!

8. LHC data (simplified)
• 40 million collisions per second.
• After filtering, 100 collisions of interest per second.
• A Megabyte of digitised information for each collision = a recording rate of 0.1 Gigabytes/sec.
• 10^11 collisions recorded each year = 10 Petabytes/year of data.
Scale of the units: 1 Megabyte (1 MB) is a digital photo; 1 Gigabyte (1 GB) = 1000 MB is a DVD movie; 1 Terabyte (1 TB) = 1000 GB is the world annual book production; 1 Petabyte (1 PB) = 1000 TB is 10% of the annual production by the LHC experiments; 1 Exabyte (1 EB) = 1000 PB is the world annual information production.
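The per-second figures can be sanity-checked with a few lines of arithmetic. The ~10^7 effective seconds of data-taking per year is an assumption not stated on the slide, and the slide's 10 PB/year total also folds in several experiments plus reprocessed and simulated data, which this toy calculation does not attempt to reproduce.

```python
# Back-of-the-envelope check of the data rates quoted on the slide.
collisions_per_sec = 40e6     # collision rate before filtering
kept_per_sec = 100            # collisions of interest kept per second
event_size_mb = 1.0           # digitised size of one recorded collision
live_seconds_per_year = 1e7   # ASSUMED effective running time per year

trigger_rejection = collisions_per_sec / kept_per_sec
rate_gb_per_sec = kept_per_sec * event_size_mb / 1000.0
raw_pb_per_year = kept_per_sec * live_seconds_per_year * event_size_mb / 1e9

print(f"online filter keeps 1 collision in {trigger_rejection:,.0f}")   # 1 in 400,000
print(f"recording rate  ~ {rate_gb_per_sec:.1f} GB/s")                  # ~0.1 GB/s
print(f"raw data volume ~ {raw_pb_per_year:.0f} PB/year per experiment")
```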

9. Expected LHC computing needs • Data: ~15 Petabytes a year. • Processing: ~100,000 of today's PCs. • Networking: 10-40 Gb/s to all big centres. (Chart labels: "today"; "Moore's law (based on 2000 data)".)
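The "Moore's law (based on 2000 data)" curve on the slide is just an exponential extrapolation; the sketch below shows the growth factor it implies between 2000 and LHC startup. The 18-month doubling time is the usual rule of thumb, not a number taken from the slide.

```python
from datetime import date

def moores_law_factor(start: date, end: date, doubling_months: float = 18.0) -> float:
    """Growth factor if performance per unit cost doubles every
    `doubling_months` months (the usual Moore's-law rule of thumb)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    return 2.0 ** (months / doubling_months)

# From the 2000 baseline to LHC startup in 2007 the same budget buys
# roughly 25x more computing; the rest of the ~100,000-PC requirement
# has to come from pooling many centres, i.e. from the Grid.
factor = moores_law_factor(date(2000, 1, 1), date(2007, 1, 1))
print(f"Moore's-law growth 2000 -> 2007: ~{factor:.0f}x")
```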

10. Computing at CERN today • High-throughput computing based on reliable "commodity" technology. • More than 1500 dual-processor PCs. • More than 3 Petabytes of data on disk (10%) and tape (90%). Nowhere near enough!

11. Computing at CERN today • The new computer room is being populated with CPU servers, disk servers, and tape silos and servers.

12. Computing at CERN today • …while the existing computer centre (CPU servers and disk servers) is being cleared for renovation… • …and an upgrade of the power supply from 0.5 MW to 2.5 MW is underway.

13. Computing for LHC • Users: in Europe, ~270 institutes and ~4500 users; elsewhere, ~200 institutes and ~1600 users. • Problem: even with the computer centre upgrade, CERN can only provide a fraction of the necessary resources. • Solution: computing centres, which were isolated in the past, will now be connected, uniting the computing resources of the world's particle physicists using GRID technologies!

14. LHC Computing Grid Project • The LCG Project is a collaboration of the LHC experiments, the Regional Computing Centres and physics institutes, working together to prepare and deploy the computing environment that will be used by the experiments to analyse the LHC data. • This includes support for applications: provision of common tools, frameworks, environment and data persistency. • It also includes the development and operation of a computing service, exploiting the resources available to LHC experiments in computing centres, physics institutes and universities around the world, and presenting this as a reliable, coherent environment for the experiments.

15. LCG Project organisation
• Applications Area (Torre Wenaus): development environment, joint projects, data management, distributed analysis.
• Middleware Area (Frédéric Hemmer): provision of a base set of grid middleware (acquisition, development, integration); testing, maintenance, support.
• CERN Fabric Area (Bernd Panzer): large cluster management, data recording, cluster technology, networking, computing service at CERN.
• Grid Deployment Area (Ian Bird): establishing and managing the Grid Service; middleware certification, security, operations, registration, authorisation, accounting.
• Technology Office (David Foster): overall coherence of the project; pro-active technology watch; long-term grid technology strategy; computing models.

16. LCG Project management
• Project Management Board: the project management team, the SC2 and GDB chairs, experiment delegates, external projects (EDG, GridPP, INFN Grid, VDT, Trillium) and other resource suppliers (IN2P3, Germany, CERN-IT).
• The PEB deals directly with the Fabric and Middleware areas.
• Architects' Forum: the Applications Area manager, the experiment architects and the computing coordinators.
• Grid Deployment Board (GDB): experiment delegates and national/regional centre delegates; the GDB negotiates and agrees operational and security policy, resource allocation, etc.

17. LCG-1 components (schematic)
• High-level services (LCG, experiments): user interfaces, applications.
• "Active" services (EU DataGrid): information system, global scheduler, data management.
• "Passive" services (VDT: Globus, GLUE): data transfer, user access, security, information schema.
• System software: operating system (RedHat Linux), local scheduler (PBS, Condor, LSF, …), file system (NFS, …), closed-system(?) mass storage (HPSS, CASTOR…).
• Hardware: computing cluster, network resources, data storage.
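To make the layering concrete, here is a purely illustrative sketch of how a job flows through such a stack: a global scheduler consults the published site information and hands the job to a site's local scheduler. The classes and the matching rule are invented for illustration; they are not the actual EDG, Globus or GLUE interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    """One regional centre, as published to the information system."""
    name: str
    local_scheduler: str                      # e.g. PBS, Condor or LSF
    free_cpus: int
    datasets: set = field(default_factory=set)

@dataclass
class Job:
    owner_vo: str                             # virtual organisation (experiment)
    needs_dataset: str
    cpus: int = 1

def global_schedule(job, information_system):
    """Toy 'active service': match the job to a site using the published
    information, then hand it over to that site's local scheduler."""
    candidates = [s for s in information_system
                  if job.needs_dataset in s.datasets and s.free_cpus >= job.cpus]
    if not candidates:
        raise RuntimeError("no site publishes the required data and free CPUs")
    best = max(candidates, key=lambda s: s.free_cpus)   # crude ranking
    best.free_cpus -= job.cpus
    print(f"job for {job.owner_vo} -> {best.name} (local scheduler: {best.local_scheduler})")
    return best

sites = [
    Site("CERN", "LSF", free_cpus=500, datasets={"cms-dc04-sim"}),
    Site("RAL", "PBS", free_cpus=120, datasets={"cms-dc04-sim", "atlas-dc2"}),
]
global_schedule(Job(owner_vo="CMS", needs_dataset="cms-dc04-sim"), sites)
```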

18. Elements of a Production Grid Service • Middleware: the systems software that interconnects the computing clusters at regional centres to provide the illusion of a single computing facility; information publishing and finding, a distributed data catalogue, data management tools, a work scheduler, performance monitors, etc. • Operations: grid infrastructure services; registration, accounting, security; regional centre and network operations; grid operations centre(s) for trouble and performance monitoring and problem resolution, 24x7 around the world. • Support: middleware and systems support for computing centres; applications integration and production; user support (call centres/helpdesk with global coverage, documentation, training).
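As an illustration of the "distributed data catalogue" element: a logical file name maps to physical replicas held at several sites, and the data-management layer picks one for the user. The catalogue contents, hostnames and helper below are invented for illustration and are not the real EDG replica-catalogue API; only the gsiftp:// URL scheme is borrowed from GridFTP.

```python
# A logical file name (LFN) maps to physical replicas at several sites.
replica_catalogue = {
    "lfn:/grid/cms/dc04/events-000123.root": [
        "gsiftp://castorgrid.cern.ch/castor/cern.ch/cms/dc04/events-000123.root",
        "gsiftp://gridftp.rl.ac.uk/cms/dc04/events-000123.root",
    ],
}

def locate(lfn, preferred_domain):
    """Return one physical replica, preferring a 'nearby' site."""
    replicas = replica_catalogue[lfn]
    for url in replicas:
        if preferred_domain in url:
            return url
    return replicas[0]          # otherwise fall back to any replica

print(locate("lfn:/grid/cms/dc04/events-000123.root", "rl.ac.uk"))
```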

19. LCG Service • Certification and distribution process established. • Middleware package with components from the European DataGrid (EDG) and from the US (Globus, Condor, PPDG, GriPhyN) via the Virtual Data Toolkit. • Agreement reached on principles for registration and security. • Rutherford Lab (UK) to provide the initial Grid Operations Centre. • FZK (Karlsruhe) to operate the Call Centre. • The first "certified" release was made available to 14 centres on 1 September: Academia Sinica Taipei, BNL, CERN, CNAF, Cyfronet Cracow, FNAL, FZK, IN2P3 Lyon, KFKI Budapest, Moscow State Univ., Prague, PIC Barcelona, RAL, Univ. Tokyo.

20. LCG Service – Next Steps • Deployment status: 12 sites active when the service opened on 15 September; ~30 sites now active; Pakistan, China, Korea, HP, … preparing to join. • Preparing now to add new functionality in November to be ready for 2004: VO management system, integration of mass storage systems. • Experiments are now starting their tests on LCG-1. • The CMS target is to have 80% of their production on the grid before the end of the PCP of DC04. • It is essential that experiments use all features (including, and especially, data management) and exercise the grid model even if it is not needed for short-term challenges. • Capacity will follow the readiness of the experiments.

21. LCG Service – Next Steps • Deployment status: 12 sites active when the service opened on 15 September; 28 sites now active; HP, Pakistan, Australia, Korea, China, … preparing to join. • Starting to deploy LCG-2, the upgrade for 2004: VO management system, integration of mass storage systems. • Experiments are now starting their tests on LCG-2 in preparation for the Data Challenges. • The CMS target is to have 80% of their production on the grid before the end of the PCP of DC04. • It is essential that experiments use all features (including, and especially, data management) and exercise the grid model even if it is not needed for short-term challenges. • Capacity will follow the readiness of the experiments.

22. Resources committed for 1Q04 • Resources in the Regional Centres planned for the period of the data challenges in 2004; CERN provides ~12% of the total capacity. • The numbers have to be refined: different standards are used by different countries. • Efficiency of use is a major question mark: reliability, efficient scheduling, and sharing between Virtual Organisations (user groups).

23. LCG Service Time-line • Time-line (2003-2007): agree spec. of the initial service (LCG-1); open LCG-1 (scheduled for 1 July, achieved 1 September); used for simulated event productions; first data. • Level 1 Milestone, opening of the LCG-1 service: a 2-month delay and lower functionality than planned; use by the experiments will only start now (planned for end August). Consequently, the decision on the final set of middleware for the 1H04 data challenges will be taken without experience of production running, there is reduced time for integrating and testing the service with the experiments' systems before the data challenges start next spring, and additional functionality will have to be integrated later.

24. LCG Service Time-line (2003-2007, computing service and physics) • Agree spec. of the initial service (LCG-1); open LCG-1 (achieved 1 Sept). • LCG-2: upgraded middleware, management and operations tools; the principal service for the LHC data challenges; used for simulated event productions. • Computing model TDRs*. • LCG-3: second-generation middleware; validation of the computing models; TDR for the Phase 2 grid. • Phase 2 service acquisition, installation and commissioning; experiment setup and preparation. • Phase 2 service in production; first data. (* TDR: technical design report)

25. LCG and EGEE • An EU project has been approved to provide partial funding for the operation of a general e-Science grid in Europe, including the supply of suitable middleware: Enabling Grids for e-Science in Europe (EGEE). EGEE provides funding for 70 partners, the large majority of which have strong HEP ties. • Similar funding is being sought in the US. • LCG and EGEE work closely together, sharing the management and responsibility for: middleware (sharing out the work to implement the recommendations of HEPCAL II and ARDA); infrastructure operation (LCG will be the core from which the EGEE grid develops, which ensures compatibility and provides useful funding at many Tier 1, Tier 2 and Tier 3 centres); and deployment of HEP applications (a small amount of funding provided for testing and integration with the LHC experiments).

26. Middleware - Next 15 months • Work closely with the experiments on developing experience with early distributed-analysis models using the grid: multi-tier model; data management, localisation and migration; resource matching and scheduling; performance and scalability. • Evolutionary introduction of new software, with rapid testing and integration into mainline services, while maintaining a stable service for the data challenges! • Establish a realistic assessment of the grid functionality that we will be able to depend on at LHC startup: a fundamental input for the Computing Model TDRs due at the end of 2004.

27. Grids - Maturity is some way off • Research still needs to be done in all key areas of importance to LHC, e.g. data management, resource matching/provisioning, security, etc. • Our life would be easier if standards were agreed and solid implementations were available, but they are not. • We are just now entering the second phase of development: everyone agrees on the overall direction, based on Web services, but these are not simple developments, and we are still learning how best to approach many of the problems of a grid. • There will be multiple and competing implementations, some for sound technical reasons. • We must try to follow these developments and influence the standardisation activities of the Global Grid Forum (GGF). • It has become clear that LCG will have to live in a world of multiple grids, but there is no agreement on how grids should inter-operate: common protocols? federations of grids inter-connected by gateways? regional centres connecting to multiple grids? • Running a service in this environment will be a challenge!
