

  1. All Hands Meeting, September 10th-13th 2007, Nottingham. Experiences with different middleware solutions on the NW-GRID. Jens Thomas, STFC Daresbury Laboratory, j.m.h.thomas@dl.ac.uk

  2. Overview • General background • The NW-GRID • Middleware solutions and our experiences with them • Globus • Nordugrid • Tools that build on the middleware • Application Hosting Environment (AHE) • GROWL scripts • Integrating the solutions into the CCP1GUI • Conclusions

  3. Where it's at • There is an abundance of heterogeneous computing resources out there • Usage of the larger machines usually requires detailed knowledge of the system and often lots of red tape to gain access • The user must manually log on to run jobs and is responsible for manually handling data transfer/management • As well as being inefficient, this requires the user to be extremely computer literate

  4. Where middleware fits in • The middleware creates an abstraction over the interface to each computer, so there is a uniform way of accessing the different resources • Authentication is mediated by a Grid Certificate, so there is a single sign-on process – no more worries about remembering passwords on dozens of machines • The user can sit at their (generally Unix) desktop and access a range of resources via a single interface
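
  As an illustration of the single sign-on step, a Globus session typically begins by generating a short-lived proxy credential from the user's certificate; after that, resources can be reached (e.g. via gsissh) without any further passwords. A minimal sketch, reusing the NW-GRID host name that appears later in these slides:

      myhost> grid-proxy-init
      Enter GRID pass phrase for this identity: ********
      myhost> gsissh dl1.nw-grid.ac.uk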

  5. NW-GRID Aims and Partners • Aims: • Establish, for the NorthWest region, a world-class activity in the deployment and exploitation of Grid middleware • realise the capabilities of the Grid in leading edge academic, industrial and business computing applications • Leverage 100 posts plus £15M of additional investment • Project Partners: • Daresbury Laboratory: CSED and e-Science Centre • Lancaster University: Management School, Physics, e-science and computer science • University of Liverpool: Physics and Computer Services • University of Manchester: Computing, Computer Science, Chemistry, bio-informatics + systems biology • Proudman Oceanographic Laboratory, Liverpool

  6. Hardware – 2006 procurement • From Sun / Streamline computing • Dual core, dual processor AMD Opteron nodes (with at least 8 GB of memory / node) • 96 nodes – Daresbury • 48 nodes – Lancaster • 44 nodes – Liverpool • 25 nodes – Manchester • 8 TB Panasas file servers at Daresbury, Lancaster and Liverpool • 2.8 TB RAID array at Manchester • Separate data and communications GigE interconnect.

  7. Globus • A powerful middleware solution for linking disparate computing resources together • Open source so freely available to academia and industry • The most popular solution at the moment and widely used in many countries • Provides a range of tools for managing and moving data, submitting jobs, discovering resources, security etc. • High performance and standards based • Under active development • Used on both the NGS and the NW-GRID http://www.globus.org

  8. Globus in practice • Server and client only install on *nix machines, and the installation is extremely awkward, although this has improved • Different versions in use: 2.4, 4.0 (3?) • Need a bunch of ports open on both client and server • No resource discovery in the command-line tools • User needs to manually stage data to and from the machine • Command-line interface is not very practical and requires extensive knowledge of the resource (e.g. absolute paths, environment variables) • Error reporting is pretty poor

      myhost> globus-job-submit dl1.nw-grid.ac.uk/jobmanager-sge -x
          (count="4")(directory="/panfs/dl/home/jmht/")
          (stdout="/panfs/dl/home/jmht/unnamed.out")
          (stdin="/panfs/dl/home/jmht/unnamed.in")
          (jobtype="mpi") /panfs/dl/home/jmht/gamess
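
  As a sketch of what that manual staging looks like, globus-url-copy moves files between the desktop and the resource's GridFTP server. The remote paths reuse the job example above; the local file locations are placeholders:

      myhost> globus-url-copy file:///home/jmht/unnamed.in gsiftp://dl1.nw-grid.ac.uk/panfs/dl/home/jmht/unnamed.in
      myhost> globus-url-copy gsiftp://dl1.nw-grid.ac.uk/panfs/dl/home/jmht/unnamed.out file:///home/jmht/unnamed.out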

  9. The NorduGrid Collaboration • From EDG to ARC • From a testbed to >50 sites • From HEP to HEP + biology, chemistry, ... • From 4 Nordic countries to >13 countries • From 20 CPUs to >5000 CPUs • From 2001 to 2003 • ... from a research project to a research collaboration • ... from a Grid testbed to a major middleware provider • NOT an infrastructure: it does not operate or control resources (NB: slides from the NorduGrid website)

  10. Features • Builds on top of Globus 2.4, but extends its functionality to provide a powerful working solution (but only on *nix) • Job monitoring and management • Seamless input/output data movement • Complete, up-to-date information on the available resources • Serial batch job submission to the best resources available • Matchmaking, brokering • Basic data management • Indexing, movement • Easy to install both server and client

  11. Nordugrid in practice • Firewall friendly: just join a VO that provides the resources you need • Lightweight client easily installed via rpm/dpkg • Integrated resource discovery • All application providers must provide an agreed environment, so you know how your job will run • Data transfer integrated into the job – no manual staging needed • Relatively good error reporting, and you can request that the whole run directory be returned • A command-line client, so more typing at the shell • Extends RSL, so a powerful but complex job file:

      &
      (executable=hellogrid.sh)
      (stdout=hello.out)
      (stderr=hello.err)
      (gmlog=gridlog)
      (cputime=10)
      (memory=200)
      (disk=1)
      (runTimeEnvironment="APPS/CHEM/GAMESS-UK-7.0-1.0")
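
  In practice, a job described by an xRSL file like the one above is driven with the ARC command-line client; a minimal sketch, where <jobid> stands for the job identifier that ngsub prints on submission:

      myhost> ngsub -f hellogrid.xrsl   # broker selects the best matching resource and submits
      myhost> ngstat <jobid>            # monitor the job's progress
      myhost> ngget <jobid>             # retrieve hello.out, hello.err and the gridlog directory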

  12. The problem • Globus and Nordugrid are powerful technologies that, used correctly, can greatly aid working scientists, but: • they are developed by computer scientists who are happy talking to computers and dealing with the problems they throw up (firewalls, dependencies, arcane error messages…) • Intended users are scientists who generally aren't interested in becoming computer scientists or bonding with their machines • work still required to make the tools easily usable for a “normal” working scientist with little interest in what happens under the bonnet

  13. GROWL Scripts • Part of the larger GROWL project: www.growl.org.uk • A set of command-line scripts that wrap the Globus tools and make them more user-friendly (handling ports, job strings, paths, etc.) • Alleviates some of the problems with firewalls, but gsi-ssh access to the resource is needed • Automatically downloads and builds the required libraries on all VDT-supported platforms • A useful tool, but as it builds on Globus, currently only available on *nix

  14. Application Hosting Environment • Thanks to Stefan Zasada for slides/pictures • “Community model”: an expert user installs and configures an application to be shared via the AHE server (which now installs as part of the OMII stack) • The application (e.g. NAMD, LB3D, LAMMPS, DL-POLY) is a web service, so you can submit from a client on one machine and monitor from another (e.g. a PDA) • Provides support for building quite complex workflows across a range of resources/codes • Hosts all knowledge about the supported applications, which are not required to be modified in any way • Supports Globus 2.4 (4.0?), SGE, Condor & Unicore • Builds on WSRF::Lite – applications are exposed as web services, so potentially available to other WSRF clients • WebDAV to stage files

  15. AHE Client • written in Java so easy to install and even runs on Windows • Very firewall-friendly • maintains no information about the job and so is mobile. • doesn't need to know anything about the applications • isolated from changes to underlying grid • a GUI so no command-line typing required (although scripting tools are provided)
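
  For a flavour of those scripting tools, the AHE tutorials describe a prepare/start/monitor/fetch cycle along the following lines; the command names and arguments here are quoted from memory and may differ between AHE versions, so treat this as an assumption rather than a reference:

      myhost> ahe-prepare            # create a new job instance on the AHE server (name assumed)
      myhost> ahe-start myjob.xml    # stage inputs via WebDAV and launch the hosted application (name assumed)
      myhost> ahe-monitor            # poll the web service for the job's current state (name assumed)
      myhost> ahe-getoutput          # retrieve the results once the job completes (name assumed)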

  16. AHE Summary • A very useful addition to the grid toolkit • Will work with a variety of middleware • Users can submit jobs from any machine and then monitor them from a variety of different platforms • Focuses on the application, so more geared towards the scientist • Runs on most platforms and no firewall issues • Handles file staging • Scriptable, so can create workflows • Server is a serious pain to install • No resource discovery

  17. Good, but could do better… • AHE, GROWL scripts & Nordugrid ARC focus on overcoming the deficiencies of the middleware - they try to make the process of running the job simpler • However, they still require the user to become intimately involved in running the job • Experience shows that even this is enough to put many potential users off taking advantage of the computing power that is out there • In many cases where they have been used successfully, the target scientists are closely linked to grid-savvy developers • We thought we would try to integrate remote job submission into our CCP1GUI to insulate the user from the grid/job handling as far as possible

  18. The CCP1GUI • An extensible Graphical User Interface for computational chemistry packages • Aims to provide a uniform environment to enable users to run a variety of different codes on a range of different resources: • GAMESS-UK, Molpro, Dalton, ChemShell • Provides powerful visualisation capabilities for interpreting the outputs of calculations (builds on VTK) • A freely available code hosted on Sourceforge • http://sourceforge.net/projects/ccp1gui • Has the potential to run on all the major operating system platforms • Use of Python and an object-oriented design enables rapid development and allows users to script the code for themselves

  19. Why was it developed? • Many of the codes used within CCP1 did not have a Graphical User Interface • Long-standing need for a graphical interface to GAMESS-UK • Needed something to help students and new users of the codes get up and running more quickly • Requirement for a simplified environment for constructing and viewing molecules • Need to be able to visualise the complex results of quantum mechanical calculations • The program should be free, so there are no barriers to its widespread use • Need a single tool that can be made to run on a variety of hardware/operating system platforms

  20. Complex visualisations • Electric field visualisations: TNT and Water

  21. The CCP1GUI (screenshots) • Calculation interface • Visualisation/builder window • Job submit window • Job Manager window

  22. The Job Editor (on OS X) • The bare minimum a user needs to know: • where to run • what to run • how many processors • For Nordugrid, you don't even need to worry about where it's going • For some machines you need to know paths and the jobmanager, but the GUI will remember the details for each application/machine combination

  23. Features • (Relatively) uniform interface to several different underlying job submission technologies, integrated into a familiar environment • Handles all aspects of data movement • Tries to trap as many as possible of the minefield of potential errors and respond with a helpful message ("I can't find your executable on that machine" vs "JOB DONE") • Remembers submitted jobs, so it can be shut down/restarted • Has been used successfully by acknowledged computer-phobes to run parallel jobs on the NW-GRID • Is largely a proof of concept, but has enabled some real science to be carried out by users who otherwise would be unlikely to access the resources

  24. Summary • Middleware is powerful and evolving technology that presents many opportunities to do good work • Middleware is muddleware as far as most scientists are concerned • Several projects are engaged in un-muddling things using a variety of approaches, but these are all job-centric • We've demonstrated one way to try and abstract things a stage further from the underlying technologies • There's still much to be done but there's lots of potential to help scientists extend what computers can do for them

  25. Acknowledgments • The NW-GRID (Cliff Addison, Tim Franks) • The NGS • The Nordugrid collaboration • Stefan Zasada, Peter Coveney and the AHE team at UCL • John Kewley, Rob Allan and the GROWL team at STFC Daresbury Laboratory • Rik Tyer and the eMinerals team at STFC Daresbury Laboratory • Abbie Trewin and the guinea pigs at Liverpool University
