The National Grid Service: Towards the UK's e-infrastructure
http://www.ngs.ac.uk
http://www.grid-support.ac.uk
Neil Geddes, Director, GOSC
Thanks to Stephen Pickles and Andy Richards

The UK's National Grid Service is a project to deploy and operate a grid infrastructure for computing and data access across the UK, and will be a cornerstone of the UK's "e-Infrastructure" over the coming decade. The goals, current status and plans for the National Grid Service and the Grid Operations Support Centre are described below.
Outline
• Overview of GOSC and NGS
• Users
  • registrations, usage, helpdesk queries
• Services & Middleware
  • what we offer today
  • managing change
• The Future
  • expansion and "joining" the NGS
  • roadmap for the future
• Summary
GOSC
The Grid Operations Support Centre is a distributed "virtual centre" providing deployment and operations support for the NGS and the wider UK e-Science programme. Started October 2004.
GOSC Roles
• UK Grid Services
  • National Services: authentication, authorization, certificate management, VO management, security, network monitoring, help desk + support centre (support@grid-support.ac.uk)
  • NGS Services: job submission, simple registry, data transfer, data access and integration, resource brokering, monitoring and accounting, grid management services, workflow, notification, operations centre
  • NGS core-node Services: CPU, (meta-)data storage, key software
• Services coordinated with others (e.g. OMII, NeSC, LCG, EGEE): integration testing, compatibility & validation tests, user management, training
• Administration: security; policies and acceptable use conditions; SLAs and SLDs; coordination of deployment and operations
NGS "Today"
Projects:
• e-Minerals, e-Materials, Orbital Dynamics of Galaxies, Bioinformatics (using BLAST), GEODISE, UKQCD singlet meson project, census data analysis, MIAKT, e-HTPX, RealityGrid (chemistry)
Users (sites):
• Leeds, Oxford, UCL, Cardiff, Southampton, Imperial, Liverpool, Sheffield, Cambridge, Edinburgh, QUB, BBSRC, CCLRC, Nottingham, …
Interfaces:
• OGSI::Lite
• If you need something else, please say!
Gaining Access (http://www.ngs.ac.uk)
NGS core nodes:
• data nodes at RAL + Manchester
• compute nodes at Oxford + Leeds
NGS partner nodes:
• compute nodes at Cardiff + Bristol
Free at point of use:
• need a UK e-Science certificate (1-2 days)
• apply through the NGS web site; accept terms and conditions of use
• light-weight peer review (1-2 weeks)
• to do: project- or VO-based application and registration
All access is through digital X.509 certificates from the UK e-Science CA or a recognized peer.
National HPC services (HPCx, CSAR):
• must apply separately to research councils
• both digital certificate and conventional (username/password) access supported
Users: registrations, helpdesk and usage
Query Tracking and FAQs
• Provide first point of contact support
• Contact point between other helpdesks, or provide helpdesk facilities for other sites
• Users input queries at a range of places
  • behind-the-scenes collaboration: the user gets the answer back from where they asked
• Develop support relationships with technical expertise at sites
Help Desk
http://www.grid-support.ac.uk
support@grid-support.ac.uk
Queries by category:
• Certification: 54
• Savannah: 28
• NGS: 14
• SRB: 6
• General: 4
• Security: 3
• GT2: 2
• Access Grid: 2
• Internal: 1
• Project Registration: 1
• OGSA-DAI: 1
User Survey
• December 2004: queried all users who had held accounts for > 3 months
• 16 responses (out of ~100 users)
• 3 papers: AHM04 (2), Phys. Rev. Lett.
• 6 conference presentations: AHM04, SC2005, Systems Biology, IPEM (2), MC2004
• ~Bi-annual activity hereafter
End-users
• You need a current UK e-Science certificate (a quick sanity check is sketched below)
  • http://ca.grid-support.ac.uk/
  • see your local Registration Authority
• Complete the application form on the NGS web site, and read the conditions of use:
  • http://www.ngs.ac.uk/apply.html
• Wait 1-2 weeks for peer review
• You gain access to all core nodes automatically
• Use the NGS and GOSC web sites and help-desk
• Happy computing!
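Once the certificate is in place (typically under ~/.globus), a quick sanity check before first use might look like this minimal sketch. It assumes the standard GT2 credential tools are on the user's path; nothing here is NGS-specific.

```python
import subprocess

# Show the subject and expiry date of the installed UK e-Science certificate.
subprocess.run(["grid-cert-info", "-subject", "-enddate"], check=True)

# Create a short-lived (12-hour) proxy credential for use with grid services;
# this prompts for the certificate passphrase.
subprocess.run(["grid-proxy-init", "-valid", "12:00"], check=True)
```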
Projects and VOs
• Just need access to compute and data resources for users in your project?
  • short term, we need applications from individuals
  • project-based applications will come; currently in the requirements-gathering phase
  • if in doubt, talk to us!
• Want to host your data on NGS?
  • consider GridFTP, SRB, Oracle, or OGSA-DAI
  • NGS maintains the infrastructure; you populate and manage the data
  • for OGSA-DAI, work with NGS to validate Grid Data Services
Provisioning services
• NGS resources can be used to provision a portal or other service for your community
• Deployment and security scenarios are negotiable
• NGS policies (core nodes):
  • your portal can present its own or a delegated user's credential to NGS, but tasks should be traceable to the initiating end-user
  • you should not run your own services in user space without prior agreement of NGS and the hosting site
  • we need to know that services are secure, will not jeopardise the operation of other NGS services, and will not consume excessive resources on head nodes
• Talk to us!
NGS Core Services: Globus, SRB, Oracle, OGSA-DAI, and others
NGS Core Services: Globus
• Globus Toolkit version 2 (GT 2.4.3 from VDT 1.2)
• Job submission (GRAM)
• File transfer (GridFTP)
• Shell (GSI-SSH)
• Information services (MDS/GIIS/GRIS)
  • information providers from the GLUE schema
  • uses the BDII implementation of MDS2 (as does EGEE)
A minimal client-side usage sketch follows below.
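For flavour, the sketch below drives the standard GT2 command-line clients from Python. The head-node name and file paths are placeholders, not real NGS endpoints, and the GSI-SSH step is left commented out because it is interactive.

```python
import subprocess

HEAD_NODE = "grid-compute.example.ac.uk"  # placeholder, not a real NGS node

def run(cmd):
    """Echo a command, then run it, raising on failure."""
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Create a proxy credential from the user's UK e-Science certificate.
run(["grid-proxy-init", "-valid", "12:00"])

# 2. Submit a simple job through the GRAM gatekeeper on the head node.
run(["globus-job-run", HEAD_NODE, "/bin/hostname"])

# 3. Stage a file to the node with GridFTP.
run(["globus-url-copy",
     "file:///tmp/input.dat",
     f"gsiftp://{HEAD_NODE}/tmp/input.dat"])

# 4. For interactive work, open a GSI-authenticated shell:
# run(["gsissh", HEAD_NODE])
```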
NGS Core Services: SRB
• Storage Resource Broker from SDSC
  • location-transparent access to storage
  • metadata catalog
  • replica management
• Clients on compute nodes, servers on data nodes (a minimal client session is sketched below)
• Issues/to do:
  • licensing
  • MCAT replication and failover
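A minimal SRB session, sketched with the SDSC command-line clients (the "S-commands"); the collection path is illustrative, and connection details are assumed to be configured in ~/.srb/.MdasEnv and ~/.srb/.MdasAuth as usual for SRB clients.

```python
import subprocess

def srb(cmd):
    """Echo and run one S-command, raising on failure."""
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

srb(["Sinit"])                                    # open an SRB session
srb(["Sput", "results.dat", "/ngs/home/alice/"])  # store a file; MCAT records it
srb(["Sls", "/ngs/home/alice"])                   # browse the logical name space
srb(["Sexit"])                                    # close the session
```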
NGS Core Services: Oracle
• Oracle 9i database, on data nodes only
• Populated by users/data providers
• Infrastructure maintained by NGS database administrators
• Accessed directly (e.g. Geodise, as sketched below) or via OGSA-DAI
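As an illustration of direct access, user code might query a data node with an Oracle client driver. This sketch uses the cx_Oracle Python driver; the host, credentials, and table name are all hypothetical.

```python
import cx_Oracle  # Python driver for Oracle databases

# Hypothetical connection details for an NGS data node.
conn = cx_Oracle.connect("alice", "secret", "ngs-data.example.ac.uk/ORCL")
try:
    cur = conn.cursor()
    # 'simulation_runs' is an illustrative, user-populated table.
    cur.execute(
        "SELECT run_id, status FROM simulation_runs WHERE status = :s",
        s="DONE")
    for run_id, status in cur:
        print(run_id, status)
finally:
    conn.close()
```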
NGS Services: OGSA-DAI
• Developed by the UK e-Science projects OGSA-DAI and DAIT
• OGSA-DQP (Distributed Query Processing)
• Experimental service based on OGSI/GT3, on the Manchester data node only
  • containment: 2 cluster nodes reserved for development and production
  • will consider WS-I and WSRF flavours when in final release
• Uses Oracle underneath
• User-provided Grid Data Services are validated on a test system, then transferred to production during scheduled maintenance
• Early users from e-Social Science (ConvertGrid)
• Established liaisons with the OGSA-DAI team
NGS Core Services: other
Operated by GOSC for the NGS and the UK e-Science programme.
In production:
• Certificate Authority
• Information services (MDS/GIIS)
• MyProxy server (usage sketched below)
• Integration tests and database
• Cluster monitoring
• LCG-VO
In testing:
• VOMS
• EDG Resource Broker
• Portal
In development:
• Accounting, using the GGF Usage Record standard for interchange
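The MyProxy server, for example, lets a user (or a portal acting on their behalf) obtain a short-lived proxy without carrying the long-lived certificate around. A minimal sketch with the standard MyProxy clients; the server name and username are placeholders.

```python
import subprocess

MYPROXY_SERVER = "myproxy.example.ac.uk"  # placeholder, not the real NGS host
USERNAME = "alice"

# Delegate a credential to the MyProxy server (run once, on the machine that
# holds the long-lived certificate); prompts for a retrieval passphrase.
subprocess.run(["myproxy-init", "-s", MYPROXY_SERVER, "-l", USERNAME],
               check=True)

# Later, from a portal or another machine, retrieve a short-lived proxy.
subprocess.run(["myproxy-logon", "-s", MYPROXY_SERVER, "-l", USERNAME],
               check=True)
```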
The Future
• NGS expansion
• Managing change
• The vision thing
Expansion
Resource providers join the NGS by:
• defining level-of-service commitments through SLDs
• adopting NGS acceptable use and security policies
• running compatible middleware
  • as defined by the NGS Minimum Software Stack
  • and verified by a compliance test suite
• supporting monitoring and accounting
Two levels of membership:
• Affiliation (a.k.a. "connect to NGS")
• Partnership
Affiliation
Affiliates commit to:
• running NGS-compatible middleware, as defined in the NGS Minimum Software Stack
  • this means users of an affiliate's resources can access them with the same client tools they use to access the NGS
• a well-defined level of service and problem referral mechanisms
  • SLD approved by the NGS Management Board and published on the NGS web site
• providing technical, administrative, and security (CERT) contacts
• providing an account and mapping for daily compliance tests (GITS++; see the sketch below)
• accepting UK e-Science certificates
• maintaining a baseline of logs to assist problem resolution
• resources for whatever users/projects/VOs they choose to support
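The real GITS-style tests exercise the services end-to-end; purely for illustration, a first-cut availability probe in that spirit might simply check that the standard GT2 service ports are listening. The host name is a placeholder.

```python
import socket

# Standard GT2 service ports: GRAM gatekeeper, GridFTP, MDS GRIS.
SERVICES = {"GRAM gatekeeper": 2119, "GridFTP": 2811, "MDS GRIS": 2135}

def probe(host, port, timeout=5.0):
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

host = "grid-compute.example.ac.uk"  # placeholder affiliate head node
for name, port in SERVICES.items():
    state = "up" if probe(host, port) else "DOWN"
    print(f"{host}:{port} {name}: {state}")
```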
Partnership
Partners commit to the same as affiliates, plus:
• making "significant" resources available to NGS users
  • creation of accounts/mappings
  • in future, VO support, pool accounts, etc.
• recognising additional CAs with which the UK e-Science programme has reciprocal agreements
• publishing/providing additional information to support resource discovery and brokering
• the ability to compile code for computational resources
Bristol and Cardiff have been through the certification process:
• supported by "buddies" and the NGS-Rollout list
• useful feedback on the viability of the NGS Minimum Software Stack
• accepted as full partners at a recent GOSC Board meeting
Benefits
Why bother? Total cost of (shared) ownership, and user demand for common interfaces.
Affiliation:
• NGS brand: certified NGS-compatible
• better integrated support for local users who also access NGS facilities
• assistance/advice in maintaining NGS-compatibility over time
Partnership:
• higher brand quality
• membership of the NGS Technical Board, either directly or through regional or functional consortia: a say in technical directions/decisions
The NGS brand must be valuable to make this work.
New Partners
Cardiff:
• 1000 hours per week on four eight-processor SGI Origin 300 servers handling throughput work, with Myrinet™ interconnect. Each Origin server provides:
  • 8 × 64-bit 500MHz MIPS RISC R14000™ processors
  • 8GB of system memory
  • 12GB of local disk space
• 1500GB SAN Fibre Channel storage system
Bristol:
• cycle scavenging on a Beowulf system (see the submission sketch below):
  • 20 × 2.3GHz Athlon processors arranged in 10 dual-processor nodes
  • 240GB of local disk mounted on the system head node
  • installed with a Linux release binary-compatible with Red Hat Enterprise 3
  • uses the Sun Grid Engine workload management system
Next:
• Lancaster, White Rose Grid, Edinburgh/ScotGrid …
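For flavour, a throughput job on an SGE-managed system like Bristol's might be submitted as sketched below; the job name, script contents, and executable are illustrative.

```python
import subprocess

# An illustrative SGE job script; lines starting '#$' are SGE directives.
JOB_SCRIPT = """#!/bin/sh
#$ -N blast_run
#$ -cwd
#$ -j y
./run_analysis input.dat
"""

with open("job.sh", "w") as f:
    f.write(JOB_SCRIPT)

# Submit to Sun Grid Engine; qsub prints the assigned job id on success.
subprocess.run(["qsub", "job.sh"], check=True)
```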
The Vision Thing
• Common tools, procedures and interfaces
  • reduce total cost of ownership for providers
  • lower the threshold for users
• Early adopter system for UK research grids
  • technology evaluation
  • technology choices
  • pool expertise
  • drive interface standards and requirements
• …
UK e-Infrastructure
[Diagram: regional and campus grids, community grids, VRE/VLE/IE, HPCx + HECToR, LHC, ISIS, TS2]
Users get common access, tools, information and nationally supported services through the NGS, integrated internationally.
Maintaining Compatibility
• Operating a production grid means valuing robustness and reliability over fashion
• NGS cares about:
  • alignment/compatibility with leading international Grid efforts
  • special requirements of the UK e-Science community
  • easy migration/upgrade paths
  • proven robustness/reliability
  • basing choices on standards or standards-track specifications
• NGS cannot support everything
• Everyone wants service-oriented grids, but the landscape is still settling out: WS-I, WS-I+, OGSI, WSRF, GT3, GT4, gLite
• Caution over OGSI/WSRF has led to wide convergence on GT2 for production grids, and hence some inter-grid compatibility
  • but there are potentially divergent forces at work
• Significant changes to the NGS Minimum Software Stack will require approval by the NGS Management Board on conservative timescales
Strategic Framework
• GOSC/NGS is a UK e-Science project
  • supports other UK (e-)science projects
• International compatibility:
  • EGEE: European infrastructure (and possible funding)
    • LHC at most UK universities: the only user group who want to build the grid
    • GridPP committed to a common web-services plan in 2005
    • GEANT
  • Others:
    • TeraGrid: US cyberinfrastructure $$$ (unlikely to pay us)
    • Open Science Grid: will develop compatibility with LCG
    • RoW, e.g. China: want to use other software, but must be EGEE-compatible
• Also driven by user requirements
• Sets the framework for the relationship with OMII and others
• Other factors: JISC and Shibboleth
gLite and Web-Services
• EGEE is about production, not R&D: EGEE has to deploy production-quality middleware now
• We believe Web Services will be a key technology for gLite (the EGEE Grid middleware)
  • need to convince users (SOAP performance!)
• Since standards have not solidified yet, EGEE is taking a cautious approach towards WS-*
  • no WSRF, not even WS-Addressing
  • not a problem in LCG-2 (a closed community)
• We are committed to WS-I (Basic Profile) compliance to maximise interoperability
  • benefit to users not apparent now
• More WS-* standards will be used as their maturity is demonstrated
LCG/EGEE Resources: Feb 2005
[Map: countries providing resources and countries anticipating joining]
In LCG-2/EGEE:
• 113 sites, 30 countries
• >10,000 CPUs
• ~5 PB storage
Includes non-EGEE sites:
• 9 countries, 18 sites
Applications
• HEP applications
• Biomed applications
  • imaging, drug discovery
  • MRI simulation
  • protein sequence analysis
• Generic applications
  • Earth observation, seismology, hydrology, climate, geosciences
  • computational chemistry
  • astrophysics
• Applications "behind the corner"
  • R-DIG
  • BioDCV
Earth Science Achievements
ES: Earth observation, seismology, hydrology, climate, geosciences
• 12 institutes, 1 organisation, 2 private companies
• ESR (Earth Sciences Research) VO at SARA: 23 registered users from 8 countries (CH, D, F, I, NL, SK, Ru), plus ~8 requesting certificates
• EGEODE (Expanding GEOsciences On DEmand) VO at IN2P3 (Lyon): 5 registered users
Highlights:
• Retrieval of 1 year of ozone profiles from satellite GOME data with the NNO algorithm, i.e. 6746 orbits in 181 jobs: success rate 100%
• Validation of 7 years of GOME ozone profiles retrieved with 2 versions of the NNO algorithms, and several months of OPERA, i.e. 228000 files
• Determination of earthquake mechanisms for 5 recent events, in one case 24h after its occurrence (challenge fulfilled)
• Successful run of a complex MPI application on 5 sites (CCG, CPPM, LAL, NIKHEF, and SCAI) with 16 CPUs; this application had previously run with >1000 CPUs, so it is a good benchmark
Earth Science Achievements (continued)
• Water management of coastal water in the Mediterranean area: application transferred from GILDA to EGEE; another application under development
• Flood prediction: difficulty transferring the application from CrossGrid to EGEE
• Climate: different technologies for secure (meta-)data access evaluated, and first tests performed using ERA40 data and a climate data operator package
• Geosciences: nearly complete deployment of Geocluster (400 modules), with a home-made solution for license management
Requirements:
• data, metadata and licenses: security and access restriction
• web-service based interface (an example of the difference from CrossGrid)
• accounting
• MPI: homogeneous environments, more CPUs
Process for Moving Forward
• New developments evaluated by the ETF
  • must have some likelihood of longer-term support
• User requests treated on a case-by-case basis
  • NGS Technical Board considers them against needs:
    • user demand
    • new functionality
    • improved functionality
    • improved security/performance/manageability
• Proposal brought to the GOSC Board
  • prepared by the GOSC "executive": N. Geddes, S. Pickles, A. Richards, S. Newhouse
"User requests treated on a case-by-case basis"
• We already see users running web services in user space
  • exactly what we want … but there are potential security risks
• Change the Conditions of Use to reflect user responsibilities
  • require secured web services (X.509)
  • encourage auditing and audit trails
  • with time limits
• Services run "at risk"
• Services leading to significant system load run on the head node (or another specialised node)
• Full support only when "approved"
Expectations
• Little demand for GT4
  • expect usability much better than GT3
  • watching brief; OGSA-DAI or TeraGrid users may drive this
• Indications of gLite improvements in VOMS, RB, shell, file I/O, data catalog
  • unlikely to have a full ETF assessment by end of March
  • unlikely to all be robust before Q4
• OMII job submission
  • expect good usability, limited functionality
  • run as user services
  • problems integrating into Resource Broker and Accounting?
• Net result likely to be a vague web-services plan
  • hopefully able to focus on some key components
Summary
TODAY:
• 4 core nodes operational, 2+2 partners
• 150 users registered (50 since 1 September '04)
• Grid-enabled: Globus v2 (VDT distribution v1.2) at present
• BDII information service (GLUE + MDS schemas)
• Data services: SRB, Oracle, OGSA-DAI
• Growing base of user applications
• MyProxy and CA services
• VO management software: LCG-VO
• User support: helpdesk
Next:
• other middleware (gLite/OMII etc.)
• NGS Portal
• Resource Broker
• SRB production service
• accounting
• continued expansion
• providing computing, data, and facility access for a wide range of users
EDS is Transforming Clients to Agile Enterprise – Virtualised Computing Platform
EDS Services Transition Roadmap:
• Step 1: Migrate & Manage (regional facilities)
• Step 2: Consolidate (server, network, storage, etc.)
• Step 3: Automated Operations & Managed Storage
• Step 4: Virtual Service Suite
• Step 5: Utility Service
• Step 6: Grid
Benefits along the roadmap: reduce TCO; improve utilisation; improve scalability, service quality/levels, productivity and more; reduce risk.
Agility drivers: standards, visibility, quality, security, efficiency.
The GOSC Board
• Director, GOSC (Chair): Neil Geddes
• Technical Director, GOSC: Stephen Pickles
Collaborating institutions:
• CCLRC: Prof. Ken Peach
• Leeds: Prof. Peter Dew
• Oxford: Prof. Paul Jeffreys
• Manchester: Mr. Terry Hewitt
• Edinburgh/NeSC: Prof. Malcolm Atkinson
• UKERNA: Dr. Bob Day
• London college: tbd
• ETF Chair: Dr. Stephen Newhouse
• GridPP Project Leader: Prof. Tony Doyle
• OMII Director: Dr. Alistair Dunlop
• EGEE UK+I Federation Leader: Dr. Robin Middleton
• HEC Liaison: Mr. Hugh Pilcher-Clayton
Also invited:
• e-Science User Board Chair: Prof. Jeremy Frey
• Director, e-Science Core Programme: Dr. Anne Trefethen