Middleware Development and Deployment Status
Tony Doyle
PPE & PPT Lunchtime Talk
Contents
• What is GridPP doing as part of the international effort?
• What was GridPP1?
• Is GridPP a Grid?
• What is planned for GridPP2?
• What lies ahead?
• Summary
Questions addressed along the way:
• Why? What? How? When?
• What are the challenges?
• What is the scale?
• How does the Grid work?
• What is the status of (EGEE) middleware development?
• What is the deployment status?
Science generates data, and might require a Grid? Examples: Earth Observation, Bioinformatics, Astronomy, Digital Curation, Healthcare, Collaborative Engineering.
What are the challenges? We must:
• share data between thousands of scientists with multiple interests
• link major (Tier-0 [Tier-1]) and minor (Tier-1 [Tier-2]) computer centres
• ensure all data is accessible anywhere, anytime
• grow rapidly, yet remain reliable for more than a decade
• cope with the different management policies of different centres
• ensure data security
• be up and running routinely by 2007
What are the challenges?
1. Software process
2. Software efficiency
3. Deployment planning
4. Link centres
5. Share data
6. Manage data
7. Install software
8. Analyse data
9. Accounting
10. Policies
Common thread: Data Management, Security and Sharing.
Tier-1 Scale
• Step 1: financial planning
• Step 2: compare to (e.g. Tier-1) experiment requirements
• Step 3: conclude that more than one centre is needed
• Step 4: a Grid?
Ian Foster / Carl Kesselman: "A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities."
Currently network performance doubles every year (or so) for unit cost.
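That last rule of thumb lends itself to a quick back-of-envelope calculation. A minimal sketch, assuming the slide's claim that network performance per unit cost doubles roughly every year (the function name and the 2004-2007 horizon are illustrative, not from the talk):

```python
def capacity_per_unit_cost(base: float, years: float, doubling_time: float = 1.0) -> float:
    """Capacity obtainable per unit cost after `years`, doubling every `doubling_time` years."""
    return base * 2 ** (years / doubling_time)

# From 2004 to the planned 2007 start-up: three doublings, i.e. ~8x the
# bandwidth for the same money (under the slide's rule of thumb).
print(capacity_per_unit_cost(base=1.0, years=3))  # 8.0
```

Compound growth like this is what makes planning a decade ahead feasible: the network is assumed to keep pace with the data.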
What is the Grid? The hour glass:
I. Experiment Layer (e.g. portals)
II. Application Middleware (e.g. metadata)
III. Grid Middleware (e.g. information services)
IV. Facilities and Fabrics (e.g. storage services)
How do I start? http://www.gridpp.ac.uk/start/
Getting started as a Grid user:
• Quick start guide for LCG2: GridPP's guide to starting as a user of the Large Hadron Collider Computing Grid.
• Getting an e-Science certificate: in order to use the Grid you need a Grid certificate. This page introduces the UK e-Science Certification Authority, which issues certificates to users. You can get a certificate from here.
• Using the LHC Computing Grid (LCG): CERN's guide on the steps you need to take in order to become a user of the LCG. This includes contact details for support.
• LCG user scenario: describes in a practical way the steps a user has to follow to send and run jobs on LCG and to retrieve and process the output successfully.
• Currently being improved..
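Once the certificate is in place, the day-to-day user workflow on an LCG-2 User Interface machine looked roughly like the following. This is an illustrative transcript only: `hello.jdl` and the job identifier are placeholders, exact command names varied between middleware releases, and none of it runs outside a configured UI.

```shell
grid-proxy-init                # create a short-lived proxy from your certificate
edg-job-submit hello.jdl       # send a JDL job description to the Resource Broker
edg-job-status <jobId>         # query the Logging & Bookkeeping service
edg-job-get-output <jobId>     # retrieve the output sandbox once the job is Done
```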
Job Submission (behind the scenes)
The user runs grid-proxy-init on the User Interface (UI), then submits a job described in JDL together with an input "sandbox". After authentication and authorisation, the Resource Broker queries the Information Service and the Replica Catalogue for suitable Storage Elements and Compute Elements, expands the JDL, and passes the job (via Globus RSL) to the Job Submission Service, which runs it on a Compute Element. Job submit and status events are published to the Logging & Bookkeeping service, which the user can query; on completion the output "sandbox" is returned to the UI.
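The JDL mentioned above is a small ClassAd-style description of the job. A minimal, hypothetical example (the executable and sandbox file names are illustrative, not taken from the talk):

```
// hello.jdl -- illustrative sketch of an EDG/LCG-2 job description
Executable    = "/bin/hostname";
StdOutput     = "std.out";
StdError      = "std.err";
OutputSandbox = {"std.out", "std.err"};
```

The Resource Broker matches such a description against the Compute and Storage Element information it gathers, expands it, and forwards the job to the chosen site.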
Enabling Grids for E-sciencE (EGEE)
• Deliver a 24/7 Grid service to European science.
• Build a consistent, robust and secure Grid network that will attract additional computing resources.
• Continuously improve and maintain the middleware in order to deliver a reliable service to users.
• Attract new users from industry as well as science and ensure they receive the high standard of training and support they need.
100 million euros over 4 years, funded by the EU; >400 software engineers plus service support; 70 European partners.
Prototype Middleware Status & Plans (I)
Workload Management:
• AliEn TaskQueue
• EDG WMS (plus new TaskQueue and Information Supermarket)
• EDG L&B
Computing Element:
• Globus Gatekeeper + LCAS/LCMAPS
• Dynamic accounts (from Globus)
• CondorC
• Interfaces to LSF/PBS (blahp)
"Pull components":
• AliEn CE
• gLite CEmon (being configured)
(Blue: deployed on development testbed; red: proposed.)
LHCC Comprehensive Review – November 2004
Prototype Middleware Status & Plans (II)
Storage Element:
• Existing SRM implementations: dCache, Castor, … (FNAL & LCG)
• DPM
• gLite-I/O (re-factored AliEn-I/O)
Catalogs:
• AliEn FileCatalog (global catalog)
• gLite Replica Catalog (local catalog)
• Catalog update (messaging)
• FiReMan interface
• RLS (Globus)
Data Scheduling:
• File Transfer Service (Stork + GridFTP)
• File Placement Service
• Data Scheduler
Metadata Catalog:
• Simple interface defined (AliEn + BioMed)
Information & Monitoring:
• R-GMA web service version; multi-VO support
Prototype Middleware Status & Plans (III)
Security:
• VOMS as Attribute Authority and VO management
• MyProxy as proxy store
• GSI security and VOMS attributes as enforcement
• Fine-grained authorization (e.g. ACLs)
• Globus to provide a set-uid service on the CE
Accounting:
• EDG DGAS (not used yet)
User Interface:
• AliEn shell
• CLIs and APIs
• GAS
Catalogs:
• Integrate remaining services
Package manager:
• Prototype based on the AliEn backend
• Evolve to the final architecture agreed with the ARDA team
GridPP organisation, alongside LCG, ARDA, EGEE and the experiments: the Collaboration Board (CB) and Project Management Board (PMB) oversee a Deployment Board (Tier-1/Tier-2, testbeds, rollout, service specification and provision) and a User Board (requirements, application development, user feedback), with middleware work spanning metadata, storage, workload, network, security, and information and monitoring.
Middleware Development:
• Network Monitoring
• Configuration Management
• Grid Data Management
• Storage Interfaces
• Information Services
• Security
Application Development:
• ATLAS
• LHCb
• CMS
• SAMGrid (Fermilab)
• BaBar (SLAC)
• QCDGrid
• PhenoGrid
GridPP Deployment Status
• GridPP deployment is part of LCG (currently the largest Grid in the world).
• The future Grid in the UK is dependent upon LCG releases.
• Three Grids on a global scale in HEP (similar functionality):

  Grid           | Sites   | CPUs
  LCG (GridPP)   | 90 (15) | 8700 (1500)
  Grid3 [USA]    | 29      | 2800
  NorduGrid      | 30      | 3200
LCG Overview
By 2007:
• 100,000 CPUs
• more than 100 institutes worldwide
• building on complex middleware being developed in advanced Grid technology projects, both in Europe (gLite) and in the USA (VDT)
The prototype went live in September 2003 in 12 countries and was extensively tested by the LHC experiments during this summer.
Deployment Status (26/10/04)
• Incremental releases: significant improvements in reliability, performance and scalability (within the limits of the current architecture); scalability is much better than expected a year ago.
• Many more nodes and processors than anticipated; the installation problems of last year have been overcome, and many small sites have contributed to MC production.
• Full-scale testing as part of this year's data challenges.
• GridPP's "The Grid becomes a reality" was widely reported, including on the British Embassy (USA) and British Embassy (Russia) technology sites.
Data Challenges
• Ongoing Grid and non-Grid production; the Grid contribution is now significant.
• ALICE: 35 CPU years; Phase 1 done, Phase 2 ongoing on LCG.
• CMS: 75 M events and 150 TB; the first of this year's Grid data challenges.
Entering Grid production phase..
ATLAS Data Challenge
• 7.7 M GEANT4 events and 22 TB; UK ~20% of LCG.
• Ongoing Grid production: ~150 CPU years so far.
• Largest total computing requirement, yet still a small fraction of what ATLAS needs..
Entering Grid production phase..
LHCb Data Challenge
• Phase 1 completed: 186 M events produced, 424 CPU years (4,000 kSI2k months).
• Throughput rose from 1.8 x 10^6 events/day with DIRAC alone to 3-5 x 10^6 events/day with LCG in action (LCG was paused and restarted during the challenge).
• UK input significant (>1/4 of the total):
  - LCG(UK) resource: Tier-1 7.7%; Tier-2 sites: London 3.9%, South 2.3%, North 1.4%
  - DIRAC: Imperial 2.0%, Liverpool 3.1%, Oxford 0.1%, ScotGrid 5.1%
Entering Grid production phase..
Paradigm Shift: Transition to Grid…
424 CPU years in total; the monthly DIRAC : LCG share of production:
• May: 89% : 11% (11% of DC'04)
• Jun: 80% : 20% (25% of DC'04)
• Jul: 77% : 23% (22% of DC'04)
• Aug: 27% : 73% (42% of DC'04)
More Applications
• ZEUS uses LCG: it needs the Grid to respond to increasing demand for MC production, and has run 5 million Geant events on the Grid since August 2004.
• QCDGrid, for UKQCD: currently a 4-site data grid managing a few hundred gigabytes of data. Key technologies used: Globus Toolkit 2.4, European DataGrid, and the eXist XML database.
Issues
"LCG-2 Middleware Problems and Requirements for LHC Experiment Data Challenges" (https://edms.cern.ch/file/495809/2.2/LCG2-Limitations_and_Requirements.pdf)
The problems of the first large-scale Grid production are being addressed at all levels.
Is GridPP a Grid?
Against the three-point checklist in Foster's "What is the Grid?" (http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf):
1. Coordinates resources that are not subject to centralized control… YES. This is why development and maintenance of LCG is important.
2. …using standard, open, general-purpose protocols and interfaces… YES. VDT (Globus/Condor-G) + EDG/EGEE (gLite) ~meet this requirement.
3. …to deliver nontrivial qualities of service. YES. The LHC experiments' data challenges over the summer of 2004.
http://agenda.cern.ch/fullAgenda.php?ida=a042133
What was GridPP1?
• A team that built a working prototype grid of significant scale:
  - >1,500 (7,300) CPUs
  - >500 (6,500) TB of storage
  - >1,000 (6,000) simultaneous jobs
• A complex project where 82% of the 190 tasks for the first three years were completed.
A success: "The achievement of something desired, planned, or attempted."
Aims for GridPP2? From Prototype to Production
From 2001 via 2004 to 2007: from prototype Grids with separate experiments, resources and multiple accounts, to 'one' production Grid.
• CERN Computer Centre → CERN Prototype Tier-0 Centre → CERN Tier-0 Centre
• RAL Computer Centre → UK Prototype Tier-1/A Centre → UK Tier-1/A Centre
• 19 UK Institutes → 4 UK Prototype Tier-2 Centres → 4 UK Tier-2 Centres
• Experiment projects (BaBarGrid, SAMGrid, EDG, GANGA, ARDA, …) converging through EGEE and LCG for BaBar, CDF, D0, ATLAS, LHCb, ALICE and CMS
Planning: GridPP2 ProjectMap
Structures agreed and in place (except LCG Phase 2).
What lies ahead? Some mountain climbing..
• Annual data storage: 12-14 PetaBytes per year. A CD stack holding one year of LHC data would stand ~20 km tall; Concorde cruises at 15 km.
• 100 million SPECint2000 needed, i.e. ~100,000 PCs (3 GHz Pentium 4).
• In production terms we have made base camp: we are at ~1 km. Quantitatively, we are ~9% of the way there in terms of CPU (9,000 of 100,000) and disk (3 of 12-14 x 3 years)…
• Importance of step-by-step planning: pre-plan your trip, carry an ice axe and crampons, and arrange for a guide…
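The ~9% figure can be checked directly from the numbers on the slide. A quick sketch, taking the midpoint of 12-14 PB/year and three years of running as the disk target (those choices are assumptions of this sketch, not statements from the talk):

```python
def fraction_complete(have: float, need: float) -> float:
    """What fraction of the required capacity is already in place."""
    return have / need

cpu = fraction_complete(9_000, 100_000)     # CPUs in place vs. ~100,000 needed
disk = fraction_complete(3, 13 * 3)         # ~3 PB vs. ~13 PB/year over 3 years
print(f"CPU: {cpu:.0%}, disk: {disk:.0%}")  # CPU: 9%, disk: 8%
```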
Summary: 1. Why? 2. What? 3. How? 4. When?
From the Particle Physics perspective the Grid is:
1. needed to utilise large-scale computing resources efficiently and securely
2. a) a working prototype running today on large testbed(s)…
   b) about seamless discovery of computing resources
   c) using evolving standards for interoperation
   d) the basis for computing in the 21st century
   e) not (yet) as transparent or robust as end-users need
3. see the GridPP getting-started pages (two-day EGEE training courses available)
4. a) Now, at prototype level, for simple(r) applications (e.g. experiment Monte Carlo production)
   b) September 2007 for more complex applications (e.g. data analysis), ready for LHC