
Presentation Transcript


  1. LCG – LHC Data and Computing Grid Project. Grid@Large workshop, Lisboa, 29 August 2005. Kors Bos, NIKHEF, Amsterdam

  2. Creation of matter? Hubble Telescope: the Eagle Nebula, 7000 ly from Earth in the Serpens constellation. Newborn stars emerging from compact pockets of dense stellar gas.

  3. Remark 1: You need a real problem to be able to build an operational grid. Building a generic grid infrastructure without a killer application for which a working grid is vital is much more difficult.

  4. The Large Hadron Collider • 27 km circumference • 1296 superconducting dipoles • 2500 other magnets • 2 proton beams of 7000 GeV each • 4 beam crossing points → experiments • 2835 proton bunches per beam • 1.1 × 10^11 protons per bunch

  5. The Data. Records data at 200 Hz, event size = 1 MB → 200 MB/sec. 1 operational year = 10^7 seconds → 2 PByte/year raw data. Simulated data + derived data + calibration data → 4 PByte/year. There are 4 experiments → 16 PByte/year. For at least 10 years.
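The arithmetic on this slide is easy to verify; a minimal Python sketch (1 PB taken as 10^9 MB, and the factor of two between raw data and the raw + simulated/derived/calibration total is read off the slide):

```python
# Back-of-the-envelope check of the data-volume figures on slide 5.
event_rate_hz = 200           # events recorded per second
event_size_mb = 1.0           # 1 MB per event
seconds_per_year = 1e7        # one operational year ~ 10^7 s

raw_rate_mb_s = event_rate_hz * event_size_mb              # -> 200 MB/s
raw_pb_per_year = raw_rate_mb_s * seconds_per_year / 1e9   # MB -> PB
per_experiment_pb = 2 * raw_pb_per_year     # + simulated, derived, calibration data
all_experiments_pb = 4 * per_experiment_pb  # four experiments

print(f"raw rate:            {raw_rate_mb_s:.0f} MB/s")
print(f"raw data per year:   {raw_pb_per_year:.0f} PB")
print(f"per experiment/year: {per_experiment_pb:.0f} PB")
print(f"all four/year:       {all_experiments_pb:.0f} PB")   # ~16 PB/year
```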

  6. Challenge I: The Data Volume. Annual data production: 16 PetaBytes/year. [Figure: height comparison – a CD stack holding 1 year of LHC data (~20 km) against a balloon at 30 km, Concorde at 15 km and Mt. Blanc at 4.8 km; 50 CD-ROMs = 35 GB, a 6 cm stack.]
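The CD-stack comparison can be reproduced the same way; the 1.2 mm disc thickness is an assumption (standard CD), the 35 GB per 50 CD-ROMs is from the slide:

```python
# Rough height of a CD stack holding one year of LHC data (slide 6).
annual_data_gb = 16e6        # 16 PB/year expressed in GB
gb_per_cd = 35 / 50          # slide: 50 CD-ROMs = 35 GB -> 0.7 GB per disc
cd_thickness_mm = 1.2        # standard CD thickness (assumption)

n_cds = annual_data_gb / gb_per_cd
stack_km = n_cds * cd_thickness_mm / 1e6
print(f"~{n_cds:.1e} CDs, stack roughly {stack_km:.0f} km high")  # same order as the ~20 km shown
```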

  7. Remark 2: In data-intensive sciences like particle physics, data management is challenge #1, much more than the computing needs. Event reconstruction is an embarrassingly parallel problem. One challenge is to get the data to and from the CPUs; keeping track of all versions of the reconstructed data and their locations is another.

  8. But … Data Analysis is our Challenge II. One event has many tracks. Reconstruction takes ~1 MSI2k per event = 10 minutes on a 3 GHz P4, so ~100,000 CPUs are needed to keep up with the data rate. Data are re-reconstructed many times. About 10% of all data has to be simulated as well, and simulation takes ~100 times longer. Subsequent analysis steps run over all data. BUT … the problem is embarrassingly parallel!
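The 100,000-CPU figure follows directly from the event rate and the per-event reconstruction time; a quick check:

```python
# Why ~100,000 CPUs are needed just to keep up with the data rate (slide 8).
event_rate_hz = 200            # events per second off the event filter
reco_min_per_event = 10        # ~1 MSI2k ~ 10 minutes on a 3 GHz P4

busy_cpus = event_rate_hz * reco_min_per_event * 60   # CPU-seconds needed per second
print(f"~{busy_cpus:,} CPUs busy for first-pass reconstruction alone")   # ~120,000

# Each event is reconstructed independently of all others, which is why the
# workload spreads trivially over a grid of loosely coupled CPUs.
```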

  9. Remark 3: Grids offer a cost-effective solution for codes that can run in parallel. For many problems, codes have not been written with parallel architectures in mind. Many applications that currently run on supercomputers (because they are there) could be executed more cost-effectively on a grid of loosely coupled CPUs.

  10. Challenge III: distributed user community

  11. Remark 4: Grids fit large international scientific collaborations almost perfectly. Conversely, if you don't already have an international collaboration, it is difficult to get grid resources outside your own country, as an economic model is not in place yet.

  12. Data Handling and Computation for Physics Analysis. [Diagram: detector → event filter (selection & reconstruction) → raw data → reconstruction → event summary data → batch physics analysis → analysis objects (extracted by physics topic) → interactive physics analysis; event simulation and event reprocessing feed the same chain, producing the processed data at each stage.] (les.robertson@cern.ch)

  13. LCG Service Hierarchy. Tier-0 – the accelerator centre: • Data acquisition & initial processing • Long-term data curation • Distribution of data → Tier-1 centres. Tier-1 – “online” to the data acquisition process → high availability: • Managed Mass Storage – grid-enabled data service • Data-heavy analysis • National, regional support. Tier-1 centres: Canada – TRIUMF (Vancouver); France – IN2P3 (Lyon); Germany – Forschungszentrum Karlsruhe; Italy – CNAF (Bologna); Netherlands – NIKHEF/SARA (Amsterdam); Nordic countries – distributed Tier-1; Spain – PIC (Barcelona); Taiwan – Academia Sinica (Taipei); UK – CLRC (Oxford); US – FermiLab (Illinois) and Brookhaven (NY). Tier-2 – ~100 centres in ~40 countries: • Simulation • End-user analysis – batch and interactive
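Purely as an illustration, the hierarchy on this slide can be written down as a small data structure (roles paraphrased from the slide; this is not any actual LCG configuration format):

```python
# Illustrative sketch only: the LCG service hierarchy of slide 13 as plain data.
LCG_TIERS = {
    "Tier-0": {
        "sites": ["CERN"],
        "roles": ["data acquisition & initial processing",
                  "long-term data curation",
                  "distribution of data to Tier-1 centres"],
    },
    "Tier-1": {
        "sites": ["TRIUMF (Vancouver)", "IN2P3 (Lyon)", "Forschungszentrum Karlsruhe",
                  "CNAF (Bologna)", "NIKHEF/SARA (Amsterdam)", "Nordic distributed Tier-1",
                  "PIC (Barcelona)", "Academia Sinica (Taipei)", "CLRC (Oxford)",
                  "FermiLab (Illinois)", "Brookhaven (NY)"],
        "roles": ["managed mass storage (grid-enabled data service)",
                  "data-heavy analysis",
                  "national and regional support"],
    },
    "Tier-2": {
        "sites": ["~100 centres in ~40 countries"],
        "roles": ["simulation", "end-user analysis (batch and interactive)"],
    },
}

for tier, info in LCG_TIERS.items():
    print(tier, "-", len(info["sites"]), "site entries,", len(info["roles"]), "roles")
```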

  14. [Diagram: the LHC Grid hierarchy – CERN as Tier-0, linked to the Tier-1 centres (Netherlands, Taipei, UK, Italy, France, Nordic, Spain, Germany, Canada, USA (2)), which in turn serve Tier-2 labs and universities, regional physics groups, and Tier-3 physics-department resources down to the desktop.] (les.robertson@cern.ch)

  15. The OPN Network • Optical Private Network of permanent 10 Gbit/sec light paths between the Tier-0 (CERN) and each of the Tier-1s • Tier-1 to Tier-1 preferably also dedicated light paths (cross-border fibres), but not strictly part of the OPN • Additional (separate) fibres for Tier-0 to Tier-1 backup connections • Security considerations: ACLs suffice because of the limited number of applications and thus ports; primarily a site issue • In Europe GEANT2 will deliver the desired connectivity, in the US a combination of LHCNet and ESNet, and in Asia ASNet • A proposal is to be drafted for a Network Operations Centre

  16. The Network for the LHC

  17. Remark 5: If data flows are known to be frequent (or constant) between the same end-points, it is best to purchase your own wavelengths between those points. A routed network would be more expensive and much harder to maintain. And why have a router when the source and destination addresses never change?

  18. Goal: create a European-wide production-quality multi-science grid infrastructure on top of national & regional grid programmes • Scale: 70 partners in 27 countries; initial funding (€32M) for 2 years • Activities: grid operations and support (joint LCG/EGEE operations team); middleware re-engineering (close attention to LHC data analysis requirements); training and support for applications groups • Builds on the LCG grid deployment: experience gained in HEP; LHC experiments → pilot applications; integration of LCG and EGEE

  19. Inter-operation with the Open Science Grid in the US (30 sites, 3200 CPUs; grown out of Grid3: 25 universities, 4 national labs, 2800 CPUs) and with NorduGrid: → very early days for standards – still getting basic experience → focus on baseline services to meet specific experiment requirements. July 2005: 150 grid sites, 34 countries, 15,000 CPUs, 8 PetaBytes

  20. Remark 6: There is no single grid, so we will have to concentrate on interoperability and insist on standards. We are still in the early days for standards, as only now are working grids emerging so that multiple implementations can be tested against each other.

  21. Operational Infrastructure. A grid of this size (still) needs organisation – for stability, support, information sharing, etc. In Europe we follow the EGEE hierarchy.

  22. Remark 7: Although it contradicts the definition of a pure grid, the infrastructure needs to be well managed and tightly controlled (at least initially).

  23. Site Functional Tests – internal testing of the site • A job with an ensemble of tests is submitted to each site (CE) once a day • This job runs on one CPU for about 10 minutes • Data is collected at the Operations Management Centre (CERN) • The data is made available publicly through web pages and in a database • These tests look at the site from the inside, to check the environment a job finds when it arrives there
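A minimal sketch of what such a daily test job might look like – not the actual SFT suite, just an illustration of a short self-test that runs on the worker node and prints a report for the operations centre to collect (the specific checks and paths are assumptions):

```python
#!/usr/bin/env python
# Illustrative sketch of a site functional test job (not the real SFT code):
# run a few environment checks on the worker node and print a pass/fail report.
import os, shutil, socket, tempfile

def check(name, fn):
    try:
        ok, detail = fn()
    except Exception as exc:                 # a crashing test counts as a failure
        ok, detail = False, str(exc)
    print(f"{name:<18} {'OK  ' if ok else 'FAIL'} {detail}")
    return ok

results = [
    check("hostname", lambda: (True, socket.getfqdn())),
    check("scratch space", lambda: (
        shutil.disk_usage(tempfile.gettempdir()).free > 1e9, "need >1 GB free")),
    check("user proxy", lambda: (
        os.path.exists(os.environ.get("X509_USER_PROXY", "")), "X509_USER_PROXY")),
    check("CA certificates", lambda: (
        os.path.isdir("/etc/grid-security/certificates"), "/etc/grid-security/certificates")),
]
raise SystemExit(0 if all(results) else 1)
```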

  24. Remark 8: You need a good procedure to exclude malfunctioning sites from a grid. One malfunctioning site can destroy a whole well-functioning grid, e.g. through the “black hole effect”.
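The “black hole effect” is the broken site whose queue always looks empty because it fails every job instantly, so the resource broker keeps feeding it the entire job stream. A toy sketch of the kind of exclusion rule a monitoring framework might apply (names and thresholds are invented for illustration):

```python
# Toy illustration of shielding the grid from a 'black hole' site: a CE that
# fails most of its recent jobs is dropped from matchmaking, however empty
# its queue looks.
def usable_ces(ces, recent, max_failure_rate=0.5, min_sample=20):
    """ces: candidate CE names; recent: {ce: (failed, total)} from monitoring."""
    kept = []
    for ce in ces:
        failed, total = recent.get(ce, (0, 0))
        if total >= min_sample and failed / total > max_failure_rate:
            continue                          # suspected black hole: exclude it
        kept.append(ce)
    return kept

recent = {"ce.broken.example.org": (95, 100), "ce.healthy.example.org": (2, 100)}
print(usable_ces(["ce.broken.example.org", "ce.healthy.example.org"], recent))
```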

  25. Grid Status (GStat) Tests – external monitoring of the sites • BDII and site BDII monitoring: BDII – # of sites found; site BDII – # of LDAP objects; old entries indicate the BDII is not being updated; query response time is an indication of network and server loading • GIIS monitoring: collect GlueServiceURI objects from each site BDII – GlueServiceType (the type of service provided), GlueServiceAccessPointURL (how to reach the service), GlueServiceAccessControlRule (who can access this service) • Organised services: “ALL” by service type, “VO_name” by VO • Certificate lifetimes for CEs and SEs • Job submission monitoring
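Underneath, these GStat checks are LDAP queries against each site BDII for Glue objects. A minimal sketch with the ldap3 library; the hostname is a placeholder, and port 2170 with base DN `o=grid` follow the usual BDII conventions rather than anything stated on the slide:

```python
# Illustrative GStat-style query: list the GlueService objects a site BDII
# publishes, over anonymous LDAP (port and base DN per common BDII practice).
from ldap3 import ALL, Connection, Server

server = Server("site-bdii.example.org", port=2170, get_info=ALL)
with Connection(server, auto_bind=True) as conn:          # anonymous bind
    conn.search(
        search_base="o=grid",
        search_filter="(objectClass=GlueService)",
        attributes=["GlueServiceType",
                    "GlueServiceAccessPointURL",
                    "GlueServiceAccessControlRule"],
    )
    for entry in conn.entries:
        print(entry.GlueServiceType, entry.GlueServiceAccessPointURL)
```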

  26. Remark 9: You need to monitor how the grid as a whole is functioning: the number of sites, the number of CPUs, the number of free CPUs. Is one site never getting any jobs, or is one site getting all the jobs?

  27. Grid Operations Monitoring: CIC on Duty. [Diagram: the operations hierarchy – an Operations Management Centre (OMC), several Core Infrastructure Centres (CICs) and Regional Operations Centres (ROCs), and many Resource Centres (RCs).] • Core Infrastructure Centres: CERN, CNAF, IN2P3, RAL, FZK, Univ. Indiana, Taipei • Tools: CoD Dashboard; monitoring tools and in-depth testing; communication tool; problem tracking tool • 24/7 rotating duty

  28. Remark 10: In spite of all the automation, you need a person ‘on duty’ to watch the operations and react if something needs to be done. System administrators are often focused on fabric management and have to be alerted when they need to do some grid management.

  29. CIC-On-Duty (COD) workflow. [Diagram: inputs to the COD – monitoring tools (SFT, GStat, …), static data (GOC DB), the GGUS ticketing system interface, the CIC portal and mails.] One week of duty rotates among the CICs: CH, UK, France, Germany, Italy, Russia, but also Univ. of Indiana (US) and ASCC Taipei. Procedures and tools ensure coordination, availability and reliability. Issues: needs a strong link to security groups; needs a strong link to user support; monitoring of site responsiveness to problems; scheduling of tests and upgrades; alarms and follow-up; when and how to remove a site? when and how to suspend a site? when and how to add sites?

  30. Tools – 1: the CIC Dashboard. [Screenshot: problem categories, sites list (reporting new problems), test summary (SFT, GSTAT), GGUS ticket status.]

  31. Critique 1: Critique on CoD • At any time ~40 sites out of 140 have problems • The CIC-on-duty is sometimes unable to open all new tickets • ~80% of resources are concentrated in the ~20 biggest sites, so failures at big sites have a dramatic impact on the availability of resources • Prioritise sites: big sites before small sites • Prioritise problems: severe problems before warnings • Removing problem sites → loss of resources • Separate compute problems from storage problems • Storage sites cannot be removed!

  32. Remark 11: Job monitoring is different from grid monitoring, and equally important. Grid monitoring gives a systems perspective of the grid; job monitoring is what the user wants to see.

  33. User’s Job Monitoring • The tool is R-GMA; the data is kept in a database • Data can be viewed from anywhere • A number of R-GMA tables contain information about: • Job status (JobStatusRaw table): information from the logging and bookkeeping service; the status of the job (Queued, Running, Done, etc.); Job-Id, State, Owner, BKServer, start time, etc. • Job monitor (JobMonitor table): the metrics of the running job every 5 minutes, from a script forked off on the worker node by the job wrapper; Job-Id, Local-Id, User-dn, Local-user, VO, RB, WCTime, CPUTime, Real Mem, etc. • GridFTP (GridFTP table): information about file transfers; host, user name, src, dest, file_name, throughput, start_time, stop_time, etc. • Job status (user-defined tables): job-step, event-number, etc.
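Since the slide's point is that job monitoring here boils down to querying database tables, a plain SQL-style join over the tables it lists conveys the idea. Table and column names are paraphrased from the slide, and `conn` stands in for whatever database client the deployment provides; this is not the actual R-GMA API:

```python
# Illustration only: the kind of query a user ends up writing against the
# job-monitoring tables listed on slide 33 (names paraphrased; not R-GMA API).
QUERY = """
    SELECT s.JobId, s.State, s.Owner, m.WCTime, m.CPUTime, m.RealMem
    FROM   JobStatusRaw AS s
    JOIN   JobMonitor   AS m ON m.JobId = s.JobId
    WHERE  s.Owner = ?
    ORDER  BY m.WCTime DESC
"""

def my_jobs(conn, owner_dn):
    """Return status plus latest resource metrics for one user's jobs."""
    cur = conn.cursor()                  # any DB-API 2.0 connection would do
    cur.execute(QUERY, (owner_dn,))
    return cur.fetchall()
```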

  34. Critique 2: Critique on User’s Job Monitoring • Querying a database is not particularly user-friendly • Building and populating user-defined tables is not straightforward • We have tried to solve the problem with R-GMA, but there are other solutions

  35. User Support • Covers helpdesk, user information, training, problem tracking, problem documentation, support staff information, metrics, .. • A tool for problem submission and tracking, a tool to access the knowledge base and FAQs, a tool for status information and contacts, .. • Websites for documentation, information, howtos, training, support staff, .. • Typical user questions: Where is my job? Where is the data? Why did my job fail? I don’t understand the error! It says my certificate expired, but I think it has not. It says I’m not in the VO, but I am. How do I get started? Where do I go for help? What are the procedures for certificates, CAs, VOs? Are there examples, templates? etc.

  36. Support Teams in LCG & EGEE. [Diagram: Global Grid User Support (GGUS) – single point of contact, coordination of user support; Operations Centres (CIC / GOC / ROC) – operations problems; Resource Centres (RC) – hardware problems; Deployment Support (DS) – middleware problems; Experiment-Specific User Support (ESUS) – VO-specific (software) problems; user communities (VOs): LHC experiments (ALICE, ATLAS, CMS, LHCb), non-LHC experiments (BaBar, CDF, Compass, D0), other communities, e.g. EGEE.]

  37. User Support Workflow. [Diagram: the user reports to GGUS, which routes application problems to ESUS, middleware problems to Deployment Support (DS), and operational problems to the GOC and Resource Centres.]

  38. The User’s View. At present many channels are used: EIS contact people (support-eis@cern.ch), the LCG Rollout list (LCG-ROLLOUT@LISTSERV.RL.AC.UK), experiment-specific lists (e.g. atlas-lcg@cern.ch), GGUS (http://www.ggus.org), plus portals such as Grid.it and http://www.grid-support.ac.uk/. There is no real agreed procedure. GGUS provides a useful portal and problem tracking tools – however, requests are forwarded, information is spread around, etc.

  39. Critique 3: User Support Critique • GGUS doesn’t work as we expected, but maybe the scope was too ambitious: Global Grid User Support • The setup was too top-down rather than starting from the user. A user wants to talk to her local support people, and we should have started from there. Should a user even know about GGUS? • Anything that can be done locally should be done locally, and what cannot should be taken a level up, e.g. to the ROC • GGUS was too formal and set up in too much isolation (sysadmins got tickets to resolve without ever having heard about GGUS) • The new service takes account of these points and is getting better: more adapted procedures; integration of a simpler ticketing system; improved application-specific (VO) support; better integration with the EGEE hierarchy and the US system

  40. Critique 4: Do not organise globally what can be done locally. In the grid community there is a tendency to do everything globally. However, there are things that are better done locally, like user support, training, installation tools, fabric monitoring, accounting, etc.

  41. Critique 5: Software Releases Critique • A heavy process, not well adapted to the many bugs found in early stages • Becoming more and more dependent on other packages • Hard to coordinate upgrades with ongoing production work • Difficult to make >100 sites upgrade all at once: local restrictions • Difficult to account for local differences: operating systems, patches, security • Conscious decision not to prescribe an installation tool: the default is plain rpms, but also YAIM; others use Quattor and yet other tools

  42. Operational security. We now have (written and agreed) procedures for: • Installation and resolution of Certification Authorities • Release and withdrawal of certificates • Joining and leaving the service • Adding and removing Virtual Organisations • Adding and removing users • Acceptable Use Policies: the Taipei accord • Risk assessment • Vulnerability • Incident response. Legal issues need to be addressed more urgently.

  43. Last Operational Issue: The Challenges • Challenges are needed in all operational areas: • Monte Carlo Challenges (massive CPU usage) • Data Challenges (distributed data analysis) • Data Recording Challenges (writing to tape) • Service Challenges (raw and derived data distribution round the clock) • Security Challenges (not announced beforehand) • Milestones defined by the applications and agreed with the sites in the Grid Deployment Board GDB • Pre-defined metrics determine the level of success/shortfall • Massive amount of work, we can do only a few per year. All above mentioned components are needed • Should gradually evolve to pre-production and production status

  44. LCG Data Recording Challenge – tape server throughput, 1–8 March 2005 • Simulated data acquisition system to tape at CERN • In collaboration with ALICE – as part of their 450 MB/sec data challenge • Target: one week sustained at 450 MB/sec – achieved 8 March • Succeeded!

  45. Service Challenge 2 • Data distribution from CERN to some Tier-1 sites • Original target: sustain a daily average of 500 MByte/sec from CERN to at least 5 Tier-1 sites for one week by the end of April • Target raised to include 7 sites and run for 10 days: BNL, CCIN2P3, CNAF, FNAL, GridKa, RAL, NIKHEF/SARA • Achieved on 2 April – average 600 MB/sec, peak 820 MB/sec • 500 MB/sec is 30% of the data distribution throughput required for LHC
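The “30%” remark fixes the full LHC requirement; a quick check that the numbers hang together (and line up with the >1 GByte/sec SC3 target on the next slide):

```python
# Quick check of the Service Challenge 2 throughput figures (slide 45).
target_mb_s = 500                      # original daily-average target out of CERN
achieved_avg_mb_s = 600                # sustained average achieved on 2 April
required_mb_s = target_mb_s / 0.30     # slide: 500 MB/s is 30% of the LHC need

print(f"implied LHC requirement: ~{required_mb_s:.0f} MB/s out of CERN")    # ~1670 MB/s
print(f"SC2 average reached {achieved_avg_mb_s / required_mb_s:.0%} of that")  # ~36%
```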

  46. Monte Carlo Challenges – recent ATLAS work. ~10,000 concurrent jobs in the system. [Plot: number of ATLAS jobs/day in EGEE/LCG-2 in 2005.] • In the latest period up to 8K jobs/day • Several times the current capacity for ATLAS at CERN alone – shows the reality of the grid solution

  47. Service Challenge 3 (ongoing). Target: >1 GByte/sec of data out of CERN, disk-to-disk, with ~150 MByte/s to individual sites and ~40 MByte/s to tape at some Tier-1 sites, sustained for 2 weeks.

  48. Remark 12: Challenges are paramount on the way to an operational grid. Grid service challenges as well as application challenges are needed for testing as well as for planning purposes.

  49. Ramping up to the LHC Grid Service – 580 days left before LHC turn-on • The grid services for LHC will be ramped up through two Service Challenges: SC3 this year and SC4 next year • These will include the Tier-0, the Tier-1s and the major Tier-2s • Each Service Challenge includes a set-up period (to check out the infrastructure/service and iron out problems before the experiments get fully involved; the schedule allows time to provide permanent fixes for problems encountered) and a throughput test, followed by a long, stable period for the applications to check out their computing model and software chain

  50. Grid Services Achieved • We have an operational grid running at ~10,000 concurrent jobs in the system • We have more sites and processors than we anticipated at this stage: ~140 sites, ~12,000 processors • The number of sites is close to that needed for the full LHC grid, but only at a few % of the full capacity • Grid operations are a shared responsibility between operations centres • We have successfully completed a series of challenges • 34 countries working together in a consensus-based organisation
