LCG and HEPiX
Ian Bird, LCG Project - CERN
HEPiX - FNAL, 25-Oct-2002
What is LCG? Why is it relevant to HEPiX? Ian.Bird@cern.ch
LCG Project Goals
Goal: prepare and deploy the LHC computing environment
• applications: tools, frameworks, environment, persistency
• computing system → global grid service
• cluster → automated fabric
• collaborating computer centres → grid
• CERN-centric analysis → global analysis environment
• central role of data challenges
This is not another grid technology project; it is a grid deployment project
LCG Level 1 Milestones proposed to LHCC
LCG and its interactions
[Diagram. Labels: GTA, Common Applications, Deployment, Fabric, Experiments, Grid Projects, HEPCAL, PPDG, iVDGL (VDT), GriPhyN, Globus, GLUE, EDG, NorduGrid, GDB, AliEn, Regional Centres, CERN]
Multi-dimensional problem
• Regional Centres:
  • Host one or more experiments
  • Different RCs deploy different grid middleware in existing testbeds
  • Have different operational and security policies
• Experiments:
  • Use middleware from various grid projects
  • Run at many regional centres
  • Provide applications that rely on specific middleware
• Grid projects:
  • Provide middleware that does not often (yet) interoperate
  • Starting to collaborate on common solutions and interoperability
→ The Deployment area of LCG ties all of these together
Grid Deployment: goals of LCG-1
• Production service for Data Challenges in 2H03 & 2004
• Focused on batch production work
• Experience in close collaboration between the Regional Centres
• Should have wide enough participation to understand the issues, but not too many initially
• Learn how to maintain and operate a global grid
• Focus on a production-quality service and all that implies
• Robustness, fault-tolerance, predictability, and supportability take precedence over functionality
• But: minimum functionality to be of value
• This requires:
  • A middleware support group with integration, certification, testing, packaging etc. responsibilities
  • A support structure
• LCG should be integrated into the sites’ physics computing services; it should not be something apart
• This requires coordination between participating sites in:
  • Policies and collaborative agreements
  • Resource planning and scheduling
  • Operations
  • Support
What might LCG-1 look like?
• User’s perspective, requires:
  • Functionality adequate to provide an advantage over not using the distributed model
  • Straightforward to use:
  • Well-defined services
  • Advice on how to use the system
  • Help with problems
  • Failures should be understandable
  • Ability to determine status of jobs and data
• Sites’ perspective:
  • Integrated into computer centre/IT (inc. security) infrastructures
  • Able to support the service
  • Able to allocate and manage resources, with local autonomy where needed
• Overall service perspective:
  • Performance and problem monitoring
  • Accounting
  • Etc.
LCG has to build the “virtual computer centre” (= LHC computing environment)
• With all that is expected from a production service:
  • User support
  • Operations group
  • “Account” management
  • Security
  • Fabric management
  • Etc.
• Except this is now distributed across many countries and continents
• Requires agreements, collaboration, and coordination
  • At all levels: management, system managers, user support, etc.
Grid Operation
[Diagram. Nodes: User, Local user support, Local operation, Local site, Call Centre, Grid Operations Centre, Grid information service, Grid operations, Grid logging & bookkeeping, Virtual Organisation, Network Operations Centre. Flow labels: queries, monitoring & alarms, corrective actions]
Deployment Summary
• Deploy middleware to support essential functionality, but the goal is to evolve and incrementally add functionality
• Added value is to robustify, support and make into a 24x7 production service
• How?
  • Certification & test procedure, with tight feedback to developers
  • Must develop support agreements with grid projects to ensure this
  • Define missing functionality and require it from providers
  • Provide documentation and training
  • Provide missing operational services
  • Provide a 24x7 Operations and Call Centre
  • Guarantee to respond
  • Single point of contact for a user
  • Make software easy to install, to facilitate new centres joining
LCG Strategy
• Develop as little as possible
• Use existing middleware, tools and software
• Pressure developers to provide missing functionality
• Negotiate support agreements
• Leverage existing experience:
  • Various data grid projects and testbeds
  • Teragrid, interoperability demonstrations, GGF production grids area
• Actively encourage collaboration and coordination
Grid Deployment Teams: the plan
[Diagram. Labels: HEPiX interests; suppliers’ integration teams provide tested releases; common applications s/w; Trillium - US grid middleware; DataGrid middleware; certification, build & distribution; LCG infrastructure coordination & operation; user support; grid operation; call centre; LCG; fabric operation at regional centres A, B, … X, Y]
Coordination & Collaboration
• There are many opportunities for common solutions, which should be actively pursued
• HICB – JTB, existing & proposed new collaborative activities:
  • GLUE
  • Schema definitions & interoperability work
  • Validation and Test Suites
  • Distribution and Meta-Packaging
  • Interoperable distribution and configuration utilities were identified as a definite need by all the recent trans-Atlantic demonstration and validation work
  • Support for this group comes from: LCG, EDG, EDT, Trillium, DataTAG
• Security czars
  • Already talking to address grid issues
• GGF
  • Production grids
  • AAA
  • Etc.
• LCG: grid deployment board, etc.
Summary of Issues that might be addressed by HEPiX/LCCWS
• I know many of these are discussed by a plethora of grid projects and offshoots, but remember: more than ever before, we all have to work together coherently to make a grid work
• Grid operations centre: Teragrid, iVDGL
• User support:
  • Distributed helpdesk/call centre: iVDGL, Teragrid, Nordic grid collabs, GGF production grids area
  • Helpdesk tools
• Certification process for operating environments
• Upgrade procedures
• Configuration management
• Joint OS version certification
• Packaging, installation (inc. applications)
• User management
• Security etc.
• Fabric management (see LCCWS)
• Etc.
Proposal
• HEPiX is already (a lot of) the right people
  • Already, or soon to be, deploying LCG and other grids in their computer centres
• Keep LCCWS associated with HEPiX
• Add a Grid Coordination/LCG interest group, like HEPNT or Storage
  • To address themes and issues of common interest
  • Encourage new people to attend
• Line up specific talks by selected people to address issues and to propose activities to follow on
  • We need to solve the problems, not just talk about them
• Needs a coordinator & agenda to make sure this happens
  • Volunteers?