
Environmental eScience 2 Martin Dove, Martin Keegan, Stuart Ballard and Mark Calleja

Environmental eScience 2 Martin Dove, Martin Keegan, Stuart Ballard and Mark Calleja National Institute for Environmental eScience and Department of Earth Sciences, University of Cambridge. Elements of escience: a recap.


Presentation Transcript


  1. Environmental eScience 2 Martin Dove, Martin Keegan, Stuart Ballard and Mark Calleja National Institute for Environmental eScience and Department of Earth Sciences, University of Cambridge

  2. Elements of eScience: a recap • Access to, and exploitation of, distributed computing resources: “grid computing” • Seamless secure access to resources • Location-independent access to data • Well-described information about the data (metadata) • Secure access • Cross-institute collaborative environment: concept of the “virtual organisation”

  3. Grid computing, distributed data, and the virtual organisation

  4. Flow of talks: where we are heading. The aim is to create a virtual organisation with access to shared computing and data resources. • Security (certificates) • Tools for pooling local shared resources (Condor) • Middleware for grid computing (Globus) • Portals for data and computing • Distributed data (Storage Resource Broker) • XML • Collaborative tools (Access Grid)


  6. Computational grids • High-throughput vs high-performance computing: • Large single calculations, or many repeat calculations? • Parallelise the calculation or the study? • Do you have large memory requirements? • Do you need fast connections between processors? • Do you need all your results? Computational grids provide new opportunities for high-throughput calculations

  7. Development of computational grids • Several original ideas: • Linking supercomputers to share large calculations • Using spare computer cycles to significantly increase the amount of useful computer time • Sharing resources leads to the virtual organisation

  8. Condor technology: accessing idle computer power

  9. Condor (http://www.cs.wisc.edu/condor) • Idea developed from 1988 in Wisconsin, based on the earlier “Remote Unix” project • Condor arose from the transition from “mainframe computing” to “workstation computing” to “desktop computing”

  10. Condor technologies Mature-ish technology to build small or large distributed computing systems from standard desktop computers • “Grabbing extra computer power from idle processors” • Ideal for “high throughput computing” rather than “high performance computing” • Can be used to control dedicated clusters as well as idle machines • Will handle heterogeneous systems

  11. Condor technology: accessing idle computer power • Master node: handles job submission and the return of output • Slave nodes: run the jobs

  12. What does Condor offer? • Usual batch queuing, scheduling, resource management, etc., for both serial and parallel tasks • Matches resources to requirements automatically • Uses checkpointing to handle the transfer of jobs between machines, whether the move is deliberate or accidental • Users do not need individual login identification • Can be used for purpose-built clusters, or for office/lab resources • Recognises distributed ownership constraints
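To make "matches resources to requirements automatically" concrete, here is a toy matchmaker in Python. It is only a sketch of the idea and not Condor's own matchmaking mechanism; the machine names, attributes and the `match` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Machine:
    name: str        # hypothetical machine name
    os: str
    memory_mb: int
    idle: bool       # True when the owner is not using the machine


@dataclass
class Job:
    name: str
    os: str              # operating system the executable needs
    min_memory_mb: int   # minimum memory the job needs


def match(job: Job, machines: List[Machine]) -> Optional[Machine]:
    """Return the first idle machine that satisfies the job's requirements."""
    for m in machines:
        if m.idle and m.os == job.os and m.memory_mb >= job.min_memory_mb:
            return m
    return None


pool = [
    Machine("desk01", "WINDOWS", 256, idle=False),
    Machine("desk02", "LINUX", 512, idle=True),
    Machine("desk03", "WINDOWS", 512, idle=True),
]
job = Job("md_run", "WINDOWS", 384)
chosen = match(job, pool)
print(chosen.name if chosen else "no match")
```

A real scheduler would also rank candidate machines and respect owner policies (the "distributed ownership constraints" on the slide); this sketch only filters.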

  13. Access to data on Condor pool • Request issued, which activates the relevant CGI script on the webserver. • Queries for pool info are handled by the agent on the central manager. • If files being produced by a job are required, then a password-protected page gives access to the agent on the relevant machine. • Output is returned to the client’s web browser, or can be saved straight to disk. [Diagram labels: external user, webserver (rock), central manager; numbered steps 1 to 3]

  14. UCL Windows Condor pool ~900 1 GHz P4 Windows machines in teaching clusters – mostly underused

  15. UCL Windows Condor pool • Runs Windows Terminal Server • All 90%+ underutilised and running 24/7… • We are building this as a large Condor pool with UCL Information Systems group to use as a massive distributed computing system • In 6 months we extracted 73 processor-years of computation • This has attracted interest from other universities: expect to see this model used on many campuses over the next few years

  16. Example of Condor-based study • Calcite undergoes an order–disorder phase transition at high temperature involving rotations of the carbonate molecular ions. • We have studied this in detail as a function of temperature over a range of pressures using molecular dynamics simulations. • We used Condor on the UCL cluster to generate data for many temperatures.
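A temperature sweep like this is a natural high-throughput workload: one independent job per temperature. The Python sketch below writes a Condor submit description that queues one run per temperature. The executable name `md_sim`, its `--temperature` flag, and the temperature values are assumptions made for illustration; they are not taken from the study.

```python
def make_submit(executable: str, temperatures: list) -> str:
    """Build a Condor submit description with one queued job per temperature."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",   # hypothetical simulation program
        "log = sweep.log",
        "",
    ]
    for t in temperatures:
        lines += [
            f"arguments = --temperature {t}",  # hypothetical command-line flag
            f"output = run_{t}K.out",
            f"error = run_{t}K.err",
            "queue",
            "",
        ]
    return "\n".join(lines)


# Illustrative temperatures in kelvin (not the values used in the study)
submit_text = make_submit("md_sim", [300, 600, 900, 1200])
print(submit_text)
```

The resulting text would be saved to a file and handed to `condor_submit`; each `queue` statement creates one job using the settings above it.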

  17. Example of grid-based study Calcite, CaCO3

  18. Example of grid-based study Calcite, CaCO3

  19. Example of grid-based study Calcite, CaCO3

  20. Example of grid-based study

  21. Condor pools • A Condor pool consists of compute resources on a single network using a single manager • It is possible to link Condor pools together: “flocking” • But in some senses Condor technologies only pick up part of the idea of grid computing

  22. The Globus project • A wider grid will include: • Resource sharing between institutes • Issue of security • Resource discovery • Tools for handling data as well as computations • The Globus project grew out of the I-WAY demonstration of linking US supercomputers in 1995 • However, it is not yet as mature as Condor

  23. The Globus toolkit • The Globus toolkit provides a secure access point (using encryption and X.509 certificates) for underlying resources. • For example, in Cambridge, one computer runs our Globus gatekeeper and outsiders who want to use our facilities (e.g. the Condor pool) must submit jobs to it. • Globus authenticates both the user and the machine they’re coming from. • It then passes the request to be handled by the relevant job scheduler. • Very simple to administer, though tricky to install. • Flaky in places, and lacks some useful functionality – it is still a tool in development

  24. A simple Globus job
      • First start a proxy; this will service challenges to my identity:
          tempo 1% grid-proxy-init
          Your identity:/C=UK/O=eScience/OU=Cambridge/L=UCS/CN=martin dove
          Enter GRID pass phrase for this identity:
          Creating proxy ................................... Done
          Your proxy is valid until: Sat Feb 14 03:10:40 2004
      • Now run a command on a remote gatekeeper:
          tempo 2% globus-job-run silica.esc.cam.ac.uk/jobmanager /bin/date
          Fri Feb 13 15:14:16 GMT 2004
      • Note that I didn’t have to specify my identity – that’s the proxy’s job

  25. Limitations and solutions • Globus client commands are clunky for anything but basic requests. • Ideally we’d like to wrap them in a nice web interface (e.g. a portal, more of which later). • Condor comes with a client tool for submitting jobs to remote Globus gatekeepers, called Condor-G, which has many benefits, including job handling (e.g. failure recovery) • One Globus limitation is that when the remote job is meant for a Condor pool, you can’t get all your output back! • There are hacks around these limitations

  26. Workflows with DAGMan • OK, so we can submit jobs anywhere on our grid. Can we make workflows out of them? • An example of a workflow would be: run jobs A and B, and use the results of these jobs in order to run job C • Hence my task is made of many smaller, inter-dependent, jobs. • DAGMan (Directed Acyclic Graph Manager) is a meta-scheduler for Condor. It allows workflows to be submitted using the usual Condor/Condor-G client tools.
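The A, B, C workflow above can be written down as a small dependency graph. The Python sketch below generates a DAGMan input file from it using DAGMan's `JOB` and `PARENT ... CHILD` keywords, and also computes one valid run order. The submit file names (`a.sub` and so on) are hypothetical placeholders.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each job maps to the set of jobs whose results it needs.
deps = {"A": set(), "B": set(), "C": {"A", "B"}}


def dag_file(deps: dict) -> str:
    """Render the dependency graph in DAGMan's input-file syntax."""
    # One JOB line per node; the .sub file names are hypothetical.
    lines = [f"JOB {job} {job.lower()}.sub" for job in sorted(deps)]
    for child in sorted(deps):
        parents = sorted(deps[child])
        if parents:
            lines.append(f"PARENT {' '.join(parents)} CHILD {child}")
    return "\n".join(lines)


dag_text = dag_file(deps)
print(dag_text)

# One order in which the jobs could run; C always comes after A and B.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

The generated file would be passed to `condor_submit_dag`. Jobs A and B have no dependency on each other, so DAGMan is free to run them at the same time.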

  27. eMinerals minigrid [Diagram labels: UK e-Science Grid (UK e-Science GIIS); e-Minerals Mini Grid (e-Minerals GIIS); sites … UCL, Cambridge, CCLRC, each shown with GIIS and GRIS nodes. GIIS = Grid Information Index Service; GRIS = Grid Resource Information Service]

  28. XML for exchange of data • Exchange of data between programs is a major problem • Example: search in program output for values of temperature at each time step in a simulation, where the number of lines between each time step output may not be constant • Parsing is one solution, but is very messy • XML (eXtensible Markup Language) is one strong emerging solution
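The temperature example can be sketched in Python. The program output below is invented for illustration: the number of lines per time step varies, so the code cannot rely on line positions and has to pattern-match on the exact wording of the output.

```python
import re

# Invented program output: step 2 contains an extra warning line,
# and step 3 omits the pressure line.
output = """\
Step 1
 Temperature = 300.2 K
 Pressure = 1.01 GPa
Step 2
 Warning: energy drift detected
 Temperature = 301.7 K
 Pressure = 1.02 GPa
Step 3
 Temperature = 299.8 K
"""

# Match lines of the form "Temperature = <number> K" anywhere in the text.
pattern = re.compile(
    r"^\s*Temperature\s*=\s*([-+]?\d+(?:\.\d+)?)\s*K", re.MULTILINE
)
temperatures = [float(x) for x in pattern.findall(output)]
print(temperatures)  # [300.2, 301.7, 299.8]
```

This works, but it is tied to one program's wording: if a new version prints `Temp:` instead, the pattern silently finds nothing. That fragility is the "messy" part that a tagged format is meant to remove.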

  29. XML for exchange of data The idea of XML is to use tags to describe the data – elements with attributes. Example:
        <lecture_list>
          <Friday>
            <lecturer name="Martin Dove"/>
          </Friday>
        </lecture_list>

  30. <?xml version="1.0" standalone="yes"?>
      <cml>
        <metadataList>
          <metadata name="version" value="SIESTA 1.3 -- [Release] (30 Jul 2003)"/>
          <metadata name="Arch" value="intel-nolibs"/>
          <metadata name="Flags" value="ifc -tpp5 -O2 -w -mp -Vaxlib -O"/>
        </metadataList>
        <step type="CG">
          <lattice dictRef="siesta:lattice" spaceType="real">
            <latticeVector units="bohr" dictRef="cml:latticeVector">75.589 0.000 0.000</latticeVector>
            <latticeVector units="bohr" dictRef="cml:latticeVector">0.000 75.589 0.000</latticeVector>
            <latticeVector units="bohr" dictRef="cml:latticeVector">0.000 0.000 75.589</latticeVector>
          </lattice>
          <molecule>
            <atomArray>
              <atom elementType="C" id="a1" x3="9.14000000" y3="4.13600000" z3="0.00000000"/>
              <atom elementType="C" id="a2" x3="10.41000000" y3="3.40700000" z3="0.00000000"/>
              <atom elementType="C" id="a3" x3="10.41000000" y3="2.07300000" z3="0.00000000"/>
              .....
              <atom elementType="Cl" id="a21" x3="11.90200000" y3="1.20900000" z3="0.00000000"/>
              <atom elementType="Cl" id="a22" x3="11.90200000" y3="4.27000000" z3="0.00100000"/>
            </atomArray>
          </molecule>
          <propertyList>
            <property dictRef="siesta:Eions">
              <scalar units="eV">7375.317349</scalar>
            </property>
            <property dictRef="siesta:Ena">
              <scalar units="eV">1654.382113</scalar>
            </property>
            <property dictRef="siesta:Ekin">
              <scalar units="eV">2401.682384</scalar>
            </property>
            <property dictRef="siesta:Enl">
              <scalar units="eV">10.825807</scalar>
            </property>
            <property dictRef="siesta:DEna">
              <scalar units="eV">0.000009</scalar>
            </property>
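Once output is tagged like this, another program can pull values out by name rather than by position. The Python sketch below uses the standard-library `xml.etree.ElementTree` parser on a shortened, well-formed fragment: two of the properties from the slide, with the enclosing tags closed by hand because the slide's excerpt is cut off.

```python
import xml.etree.ElementTree as ET

# Shortened fragment based on the slide; closing tags added so it parses.
cml = """<cml>
  <propertyList>
    <property dictRef="siesta:Eions">
      <scalar units="eV">7375.317349</scalar>
    </property>
    <property dictRef="siesta:Ekin">
      <scalar units="eV">2401.682384</scalar>
    </property>
  </propertyList>
</cml>"""

root = ET.fromstring(cml)

# Map each property's dictionary reference to (value, units).
energies = {}
for prop in root.iter("property"):
    scalar = prop.find("scalar")
    energies[prop.get("dictRef")] = (float(scalar.text), scalar.get("units"))

for name, (value, units) in energies.items():
    print(f"{name}: {value} {units}")
```

The lookup depends only on element and attribute names, so extra elements or reordered properties in the file would not break it.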

  31. Why XML? • XML is free, open and standard • XML is simple and extensible • XML files can be validated • Check that a file contains the correct information • Specify that parameter values are “sensible”, or provide default values • XML can be transformed into other formats (e.g. HTML) • XML is modular (namespaces) • Integrate XHTML, MathML, anyML, SVG seamlessly, in the same document, without breaking any software.
