
High-Performance Grid Computing and Research Networking




  1. High-Performance Grid Computing and Research Networking Grid Computing Presented by Selim Kalayci Instructor: S. Masoud Sadjadi http://www.cs.fiu.edu/~sadjadi/Teaching/ sadjadi At cs Dot fiu Dot edu

  2. Acknowledgements • The content of many of the slides in these lecture notes has been adapted from online resources previously prepared by the people listed below. Many thanks! • Henri Casanova • Principles of High Performance Computing • http://navet.ics.hawaii.edu/~casanova • henric@hawaii.edu • Ian Foster • Presentations & Tutorials from • www.globus.org

  3. Agenda • Grid Computing • Grid Middleware - Globus • Security in Globus • Data Management • Execution Management • Monitoring • Metaschedulers - Gridway

  4. Multiple Computers • Adding CPUs to a single computer becomes very expensive • How about multiple computers together? • Linux Clusters (60% of Top-500 list) • Blue Gene: 30K computers

  5. Beyond the machine room? [Figure: machine room, campus, nation] • Need more capacity than is available at (most) single sites • Everyone would like a 10K-node 100GHz cluster • Very expensive (cooling, power) • More economical to have multiple sites • Need to locate available resources now • Data/Instruments are inherently distributed

  6. Grid Computing • A dynamic, multi-institutional network of computers that come together to share resources for the purpose of coordinated problem solving. [Figure: resources and applications spanning institutional boundaries] • Achieved through: • Open, general-purpose protocols • Standard interfaces

  7. Layers in Grid

  8. A Grid Checklist • coordinates resources that are not subject to centralized control … • … using standard, open, general-purpose protocols and interfaces … • … to deliver nontrivial qualities of service. • Virtual Organizations • A group of individuals or institutions, defined by a set of sharing rules, that share Grid resources toward a common goal. • Examples: application service providers, storage service providers, databases, crisis management teams, consultants.

  9. How is a grid different? • Grids focus on site autonomy • Grids involve heterogeneity • Grids involve more resources than just computers and networks • Grids focus on the user

  10. Agenda • Grid Computing • Grid Middleware - Globus • Security in Globus • Data Management • Execution Management • Monitoring • Metaschedulers - Gridway

  11. Grid Infrastructure • Distributed management • Of physical resources • Of software services • Of communities and their policies • Unified treatment • Build on Web services framework • Use WS-RF, WS-Notification (or WS-Transfer/Man) to represent/access state • Common management abstractions & interfaces

  12. Globus is Open Source Grid Infrastructure • Implement key Web services standards • State, notification, security, … • Software for Grid infrastructure • Service-enable new & existing resources • E.g., GRAM on computer, GridFTP on storage system, custom application services • Uniform abstractions & mechanisms • Tools to build applications that exploit Grid infrastructure • Registries, security, data management, … • Enabler of a rich tool & service ecosystem

  13. GLOBUS TOOLKIT 4 – GT4 • Open source toolkit developed by The Globus Alliance that allows us to build Grid applications. • Organized as a collection of loosely coupled components. • Consists of services, programming libraries, and development tools.

  14. GT Domain Areas • Core runtime • Infrastructure for building new services • Security • Apply uniform policy across distinct systems • Execution management • Provision, deploy, & manage services • Data management • Discover, transfer, & access large data • Monitoring • Discover & monitor dynamic services

  15. GT4 Components

  16. WSRF & WS-Notification • Naming and bindings (basis for virtualization) • Every resource can be uniquely referenced, and has one or more associated services for interacting with it • Lifecycle (basis for fault resilient state mgmt) • Resources created by services following factory pattern • Resources destroyed immediately or scheduled • Information model (basis for monitoring, discovery) • Resource properties associated with resources • Operations for querying and setting this info • Asynchronous notification of changes to properties • Service groups (basis for registries, collective svcs) • Group membership rules & membership management • Base Fault type
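The WSRF ideas on this slide can be illustrated with a small in-memory Python model:
- uniquely referenced resources;
- factory-created resources;
- immediate or scheduled destruction;
- resource properties with change notification.

This is a conceptual sketch only. It is not the WSRF specification or the GT4 Java Core API, and the names `Resource`, `ResourceFactory` and `sweep_expired` are hypothetical. Real notifications travel asynchronously as SOAP messages, not as local callbacks.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Any, Callable


# Hypothetical model of a WS-Resource: a unique reference, resource
# properties, a termination time, and subscribers for change notifications.
@dataclass
class Resource:
    ref: str
    termination_time: datetime
    properties: dict[str, Any] = field(default_factory=dict)
    subscribers: list[Callable[[str, str, Any], None]] = field(default_factory=list)

    def get_property(self, name: str) -> Any:
        return self.properties[name]

    def set_property(self, name: str, value: Any) -> None:
        self.properties[name] = value
        # WS-Notification delivers these asynchronously; plain callbacks here.
        for notify in self.subscribers:
            notify(self.ref, name, value)

    def subscribe(self, callback: Callable[[str, str, Any], None]) -> None:
        self.subscribers.append(callback)


class ResourceFactory:
    """Factory pattern: a service creates resources and returns references."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.resources: dict[str, Resource] = {}

    def create(self, lifetime: timedelta, **props: Any) -> Resource:
        ref = f"resource-{next(self._ids)}"
        termination = datetime.now(timezone.utc) + lifetime
        resource = Resource(ref, termination, dict(props))
        self.resources[ref] = resource
        return resource

    def destroy(self, ref: str) -> None:
        """Immediate destruction."""
        self.resources.pop(ref, None)

    def sweep_expired(self, now: datetime | None = None) -> list[str]:
        """Scheduled destruction: drop resources past their termination time."""
        now = now or datetime.now(timezone.utc)
        expired = [ref for ref, res in self.resources.items()
                   if res.termination_time <= now]
        for ref in expired:
            del self.resources[ref]
        return expired


if __name__ == "__main__":
    factory = ResourceFactory()
    job = factory.create(timedelta(hours=1), state="Pending")
    job.subscribe(lambda ref, name, value: print(ref, name, value))
    job.set_property("state", "Active")  # prints: resource-1 state Active
```

Setting a property is what triggers notification. A monitoring client therefore learns about state changes without polling, which is the role resource properties play as the "basis for monitoring, discovery".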

  17. Agenda • Grid Computing • Grid Middleware - Globus • Security in Globus • Data Management • Execution Management • Monitoring • Metaschedulers - Gridway

  18. Security Services • Form the underlying communication medium for all the services • Secure authentication and authorization • Single sign-on • Users need not authenticate explicitly every time they request a service • Uniform credentials • Ex: GSI (Grid Security Infrastructure)

  19. Grid Security Infrastructure - GSI • Grid Security Infrastructure (GSI) • Use GSI as a standard mechanism for bridging disparate security mechanisms • Doesn't solve the trust problem, but all parties now speak the same protocol and understand each other's identity credentials • Basic support for delegation, policy distribution • Translate from other mechanisms to/from GSI as needed • Convert from GSI identity to local identity for authorization

  20. Grid Security Infrastructure - GSI • Grid Security Infrastructure (GSI) • Based on standard PKI technologies • CAs allow one-way, light-weight trust relationships (not just site-to-site) • SSL protocol or WS-Security for authentication, message protection • X.509 Certificates for asserting identity • for users, services, hosts, etc. • Proxy Certificates • GSI extension to X.509 certificates for delegation, single sign-on

  21. Gridmap file • A gridmap file at each site maps the grid id of a user to a local id • The grid id of the user is the subject (distinguished name) in his/her grid user certificate • The local id is site-specific • Multiple grid ids can be mapped to a single local id • Usually a local id exists for each VO participating in that grid effort • The local ids are then used to implement site-specific policies • Priorities etc.

  22. Gridmap file entry • The gridmap-file is maintained by the site administrator • Each entry maps a Grid DN (distinguished name of the user; subject name) to a local user name:
      #
      # Distinguished Name                                          Local username
      #
      "/DC=org/DC=doegrids/OU=People/CN=Laukik Chitnis 712960"      ivdgl
      "/DC=org/DC=doegrids/OU=People/CN=Richard Cavanaugh 710220"   grid3
      "/DC=org/DC=doegrids/OU=People/CN=JangUk In 712961"           ivdgl
      "/DC=org/DC=doegrids/OU=People/CN=Jorge Rodriguez 690211"     osg
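A gridmap file like the one above can be read with a few lines of Python. This sketch assumes:
- one entry per line, made of a double-quoted DN followed by a single local account;
- `#` marks comment lines.

`parse_gridmap` and `grid_ids_for` are hypothetical helpers, not part of Globus. Real gridmap files allow more syntax than is handled here.

```python
from __future__ import annotations

import shlex


def parse_gridmap(text: str) -> dict[str, str]:
    """Map each quoted subject DN to its local account name."""
    mapping: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # shlex honors the double quotes, so a DN containing spaces stays one token.
        tokens = shlex.split(line)
        if len(tokens) != 2:
            raise ValueError(f'line {lineno}: expected "DN" user, got {raw!r}')
        dn, local_user = tokens
        mapping[dn] = local_user
    return mapping


def grid_ids_for(mapping: dict[str, str], local_user: str) -> list[str]:
    """All grid identities that share one local account (e.g. a VO account)."""
    return [dn for dn, user in mapping.items() if user == local_user]


if __name__ == "__main__":
    sample = '''#
# Distinguished Name                                          Local username
#
"/DC=org/DC=doegrids/OU=People/CN=Laukik Chitnis 712960"      ivdgl
"/DC=org/DC=doegrids/OU=People/CN=Richard Cavanaugh 710220"   grid3
"/DC=org/DC=doegrids/OU=People/CN=JangUk In 712961"           ivdgl
'''
    gridmap = parse_gridmap(sample)
    # A DN with no entry has no local account, so the site would refuse it.
    print(gridmap.get("/DC=org/DC=doegrids/OU=People/CN=Someone Else"))  # None
    print(len(grid_ids_for(gridmap, "ivdgl")))  # 2
```

The reverse lookup shows the many-to-one mapping from the previous slide. Two grid identities land on the same `ivdgl` account, so site policies attached to that account apply to both users.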

  23. How to create and use an Identity (1) • Run the following command to generate a request for a personal grid identity certificate: grid-cert-request • This creates the following files in $HOME/.globus: • usercert_request.pem (the certificate signing request) • userkey.pem (your private key, encrypted) • usercert.pem (your signed public certificate; empty until the CA returns it)

  24. How to create and use an Identity (2) • After you have created the request, mail it to the local certificate authority: cat $HOME/.globus/usercert_request.pem | mail skala001@cis.fiu.edu (or dvill013@cs.fiu.edu) • The CA will then mail you back a signed certificate, which you should save as $HOME/.globus/usercert.pem (it can take up to a day for the CA to process the request)

  25. Commands to log in / log out • grid-proxy-init • This "logs you into" the Globus system. • grid-proxy-info • Use this to see your status. • grid-proxy-destroy • Use this to log out. • A proxy is like a temporary ticket to use the Grid; by default it is valid for 12 hours. • Once this is done, you should be able to run "grid jobs" • globus-job-run site-name command

  26. Agenda • Grid Computing • Grid Middleware - Globus • Security in Globus • Data Management • Execution Management • Monitoring • Metaschedulers - Gridway

  27. GT4 Data Management • Stage/move large data to/from nodes • GridFTP, Reliable File Transfer (RFT) • Alone, and integrated with GRAM • Locate data of interest • Replica Location Service (RLS) • Replicate data for performance/reliability • Distributed Replication Service (DRS) • Provide access to diverse data sources • File systems, parallel file systems, hierarchical storage: GridFTP • Databases: OGSA-DAI

  28. GridFTP • What is GridFTP? • A secure, robust, fast, efficient, standards-based, widely accepted data transfer protocol • A Protocol • Multiple independent implementations can interoperate • This works: both the Condor Project at UW-Madison and Fermilab have home-grown servers that work with ours. • Lots of people have developed clients independent of the Globus Project. • We also supply a reference implementation: • Server • Client tools (globus-url-copy) • Development Libraries

  29. globus-url-copy • GridFTP-compliant client from the Globus team • Copies files from one URL to another URL • One URL is usually a gsiftp:// URL • The other URL is usually a file:/ URL • To copy a file from a remote GridFTP-enabled server to the local machine:
      % globus-url-copy gsiftp://gcb.fiu.edu/tmp/jt file:/home/skala001/jt
  • To put a file onto the server, reverse the URLs:
      % globus-url-copy file:/home/skala001/jt gsiftp://gcb.fiu.edu/tmp/jt
  • Monitor performance using the -vb flag:
      % globus-url-copy -vb gsiftp://gcb.fiu.edu/tmp/jt file:/home/skala001/jt
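When scripting transfers, building the `globus-url-copy` command line in code makes the URL order explicit: remote-to-local for a get, local-to-remote for a put. The sketch uses only the invocation forms shown on this slide.

`url_copy_argv` is a hypothetical helper. Its scheme check deliberately allows only the two URL types mentioned here, although the tool itself may accept others.

```python
from __future__ import annotations

import shlex

ALLOWED_PREFIXES = ("gsiftp://", "file:/")  # the two URL types on this slide


def url_copy_argv(source: str, destination: str, verbose: bool = False) -> list[str]:
    """Build argv for globus-url-copy: the first URL is read, the second written."""
    for url in (source, destination):
        if not url.startswith(ALLOWED_PREFIXES):
            raise ValueError(f"unexpected URL: {url}")
    argv = ["globus-url-copy"]
    if verbose:
        argv.append("-vb")  # report transfer performance while copying
    argv += [source, destination]
    return argv


if __name__ == "__main__":
    remote = "gsiftp://gcb.fiu.edu/tmp/jt"
    local = "file:/home/skala001/jt"
    print(shlex.join(url_copy_argv(remote, local)))        # get
    print(shlex.join(url_copy_argv(local, remote)))        # put: URLs reversed
    print(shlex.join(url_copy_argv(remote, local, True)))  # get, with -vb
```

On a host with Globus installed and a valid proxy from grid-proxy-init, the returned list could be passed to `subprocess.run(argv, check=True)`.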

  30. Reliable File Transfer - RFT • WSRF-compliant, fault-tolerant, high-performance data transfer service • Soft state • Notifications/Query • Adds reliability on top of the high performance provided by GridFTP • Fire and forget • Integrated automatic failure recovery • Network-level failures • System-level failures, etc. • Essentially a data transfer scheduler with a FIFO queue policy
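RFT's "fire and forget" behavior, a FIFO queue plus automatic retry, can be sketched as a toy in-memory scheduler. Everything here is illustrative:
- `FifoTransferScheduler` is hypothetical.
- `do_transfer` stands in for an actual GridFTP transfer.
- Treating any `OSError` as a retryable failure is an assumption.
- Unlike the real service, this sketch keeps no persistent state.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Transfer:
    source: str
    destination: str
    attempts: int = 0


class FifoTransferScheduler:
    """Toy RFT-like scheduler: FIFO queue with automatic retry (no persistence)."""

    def __init__(self, do_transfer: Callable[[str, str], None], max_attempts: int = 3) -> None:
        self.do_transfer = do_transfer  # stand-in for a GridFTP transfer
        self.max_attempts = max_attempts
        self.queue: deque[Transfer] = deque()
        self.done: list[Transfer] = []
        self.failed: list[Transfer] = []

    def submit(self, source: str, destination: str) -> None:
        # "Fire and forget": the client only enqueues the request.
        self.queue.append(Transfer(source, destination))

    def run(self) -> None:
        while self.queue:
            transfer = self.queue.popleft()  # FIFO: oldest request first
            transfer.attempts += 1
            try:
                self.do_transfer(transfer.source, transfer.destination)
            except OSError:  # assumed transient (network or host failure)
                if transfer.attempts < self.max_attempts:
                    self.queue.append(transfer)  # retry later, behind waiting requests
                else:
                    self.failed.append(transfer)
            else:
                self.done.append(transfer)
```

A failed transfer rejoins the tail of the queue instead of being retried immediately. This keeps the FIFO policy and stops one flaky endpoint from blocking the requests behind it.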

  31. RFT and GridFTP Architecture [Figure: an RFT client exchanges SOAP messages with the RFT service and can optionally receive notifications; the RFT service drives two GridFTP servers, each made up of a Protocol Interpreter with a MasterDSI, linked over IPC (IPCReceiver) to SlaveDSIs that own the data channels between the servers]

  32. Agenda • Grid Computing • Grid Middleware - Globus • Security in Globus • Data Management • Execution Management • Monitoring • Metaschedulers - Gridway

  33. Execution Management • Common WS interface to schedulers • Unix, Condor, LSF, PBS, SGE, … • More generally: interface for process execution management • Lay down execution environment • Stage data • Monitor & manage lifecycle • Kill it, clean up • A basis for application-driven provisioning

  34. Grid Job Management Goals Provide a service to securely: • Create an environment for a job • Stage files to/from environment • Cause execution of job process(es) • Via various local resource managers • Monitor execution • Signal important state changes to client • Enable client access to output files • Streaming access during execution

  35. GRAM • GRAM: Globus Resource Allocation and Management • GRAM is a Globus Toolkit component • For Grid job management • GRAM is a unifying remote interface to Resource Managers • Yet preserves local site security/control • GRAM is for stateful job control • Reliable operation • Asynchronous monitoring and control • Remote credential management • File staging via RFT and GridFTP

  36-39. GT4 WS GRAM Architecture [Figure: the client delegates a credential to the delegation service and submits the job to the GRAM services in a GT4 Java container on the service host; GRAM uses sudo and a scheduler-specific GRAM adapter to control the local scheduler on the compute element, receives job events from the Scheduler Event Generator (SEG), and sends transfer requests to RFT, which uses GridFTP (FTP control and data channels) to move files between the compute element and remote storage elements] • The delegated credential can be: • Made available to the application (the user job) • Used to authenticate with RFT • Used to authenticate with GridFTP

  40. A Simple Example • Command example:
      % globusrun-ws -submit -c /bin/date
      Submitting job...Done.
      Job ID: uuid:002a6ab8-6036-11d9-bae6-0002a5ad41e5
      Termination time: 01/07/2005 22:55 GMT
      Current job state: Active
      Current job state: CleanUp
      Current job state: Done
      Destroying job...Done.
  • A successful submission creates a new ManagedJob resource with its own unique EPR (endpoint reference) for messaging • Use the -o option to save the EPR to a file:
      % globusrun-ws -submit -o job.epr -c /bin/date

  41. A Simple Example (2) • To see the output, use the -s (stream) option:
      % globusrun-ws -submit -s -c /bin/date
      Termination time: 06/14/2007 18:07 GMT
      Current job state: Active
      Current job state: CleanUp-Hold
      Wed Jun 13 14:07:54 EDT 2007
      Current job state: CleanUp
      Current job state: Done
      Destroying job...Done.
      Cleaning up any delegated credentials...Done.
  • To send the output to a file, use the -so option:
      % globusrun-ws -submit -s -so job.out -c /bin/date
      …
      % cat job.out
      Wed Jun 13 14:07:54 EDT 2007

  42. A Simple Example (3): Submitting your job to different schedulers • Fork:
      % globusrun-ws -submit -Ft Fork -s -c /bin/date
    (Fork is actually the default, so you can omit -Ft Fork in this case.)
  • SGE:
      % globusrun-ws -submit -Ft SGE -s -c /bin/hostname

  43. Batch Job Submissions
      % globusrun-ws -submit -batch -o job_epr -c /bin/sleep 50
      Submitting job...Done.
      Job ID: uuid:f9544174-60c5-11d9-97e3-0002a5ad41e5
      Termination time: 01/08/2005 16:05 GMT
      % globusrun-ws -status -j job_epr
      Current job state: Active
      % globusrun-ws -status -j job_epr
      Current job state: Done
      % globusrun-ws -kill -j job_epr
      Requesting original job description...Done.
      Destroying job...Done.
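Batch submission pairs naturally with a polling loop. The sketch below calls the `-status -j` form shown above and parses the `Current job state:` lines. Assumptions:
- `Done` and `Failed` are the terminal states.
- The status text may arrive on either stdout or stderr.
- The limit `max_polls` is arbitrary.
- `parse_state` and `wait_for_job` are hypothetical helpers.

```python
from __future__ import annotations

import re
import subprocess
import time

STATE_RE = re.compile(r"Current job state:\s*(\S+)")
TERMINAL_STATES = {"Done", "Failed"}  # assumption: the two end states


def parse_state(output: str) -> str | None:
    """Return the last 'Current job state:' value in the output, if any."""
    states = STATE_RE.findall(output)
    return states[-1] if states else None


def wait_for_job(epr_file: str, poll_seconds: float = 5.0, max_polls: int = 720) -> str:
    """Poll `globusrun-ws -status -j EPR_FILE` until the job reaches an end state."""
    for _ in range(max_polls):
        result = subprocess.run(
            ["globusrun-ws", "-status", "-j", epr_file],
            capture_output=True, text=True, check=True,
        )
        # Assumption: the state line may be printed on stdout or stderr.
        state = parse_state(result.stdout + result.stderr)
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"job in {epr_file} still not finished after {max_polls} polls")
```

After `globusrun-ws -submit -batch -o job_epr ...`, a call such as `wait_for_job("job_epr")` would block until the job finishes. A script could then collect output or issue `-kill -j job_epr` if the job hangs.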

  44. Complete Factory Contact • Override the default EPR • Select a different host/service • Use the "contact" shorthand for convenience • Relies on proprietary knowledge of the EPR format! • Command example:
      % globusrun-ws -submit -F gcb.fiu.edu -c /bin/date

  45. Read RSL from File • Command:
      % globusrun-ws -submit -f touch.xml
  • Contents of the touch.xml file:
      <job>
          <executable>/bin/touch</executable>
          <argument>touched_it</argument>
      </job>

  46. Resource Specification Language (RSL) • RSL is the language used by the clients to submit a job. • All job submission requests are described in RSL, including the executable file and arguments. • You can specify the type and capabilities of resources to execute your job. • You can also coordinate Stage-in and Stage-out operations through RSL.

  47. Common/useful options • globusrun-ws -J • Perform delegation as necessary for job • globusrun-ws -S • Perform delegation as necessary for job’s file staging • globusrun-ws -s • Stream stdout/err during job execution to the terminal • globusrun-ws -self • Useful for testing, when you have started the service using your credentials instead of host credentials

  48. Staging job
      <job>
          <executable>/bin/echo</executable>
          <directory>/tmp</directory>
          <argument>Hello</argument>
          <stdout>job.out</stdout>
          <stderr>job.err</stderr>
          <fileStageOut>
              <transfer>
                  <sourceUrl>file:///tmp/job.out</sourceUrl>
                  <destinationUrl>gsiftp://host.domain:2811/tmp/stage.out</destinationUrl>
              </transfer>
          </fileStageOut>
      </job>
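Job descriptions like the one above can be generated instead of written by hand. This sketch uses Python's standard `xml.etree.ElementTree`.

It emits the elements in the same order as the slide's example. That order is assumed to matter, since GRAM validates job descriptions against an XML schema. `staging_job` is a hypothetical helper.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def staging_job(executable: str, args: list[str], directory: str,
                stdout: str, stderr: str,
                stage_out: list[tuple[str, str]]) -> str:
    """Return a <job> description shaped like the staging example."""
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    ET.SubElement(job, "directory").text = directory
    for arg in args:
        ET.SubElement(job, "argument").text = arg
    ET.SubElement(job, "stdout").text = stdout
    ET.SubElement(job, "stderr").text = stderr
    if stage_out:
        file_stage_out = ET.SubElement(job, "fileStageOut")
        # One <transfer> per (sourceUrl, destinationUrl) pair.
        for source_url, destination_url in stage_out:
            transfer = ET.SubElement(file_stage_out, "transfer")
            ET.SubElement(transfer, "sourceUrl").text = source_url
            ET.SubElement(transfer, "destinationUrl").text = destination_url
    return ET.tostring(job, encoding="unicode")


if __name__ == "__main__":
    print(staging_job(
        "/bin/echo", ["Hello"], "/tmp", "job.out", "job.err",
        [("file:///tmp/job.out", "gsiftp://host.domain:2811/tmp/stage.out")],
    ))
```

The output could be saved to a file and submitted with `globusrun-ws -submit -S -f <file>`, where `-S` requests the delegation needed for file staging.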

  49. RSL Variable • Enables late binding of values • Values resolved by GRAM service • System-specific variables • ${GLOBUS_USER_HOME} • ${GLOBUS_LOCATION} • ${GLOBUS_SCRATCH_DIR} • Alternative directory that is shared with compute node • Typically providing more space than user’s HOME dir

  50. RSL Variable Example
      <job>
          <executable>/bin/echo</executable>
          <argument>HOME is ${GLOBUS_USER_HOME}</argument>
          <argument>SCRATCH = ${GLOBUS_SCRATCH_DIR}</argument>
          <argument>GL is ${GLOBUS_LOCATION}</argument>
          <stdout>${GLOBUS_USER_HOME}/echo.stdout</stdout>
          <stderr>${GLOBUS_USER_HOME}/echo.stderr</stderr>
      </job>
    (example file: /tmp/rslExample)
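Python's `string.Template` uses the same `${NAME}` syntax, so it can imitate the substitution GRAM performs and show what late binding means. This is an illustration, not GRAM's implementation:
- The binding values are hypothetical per-site values.
- `resolve_rsl_variables` is a hypothetical helper.
- `Template` also treats a bare `$` as special, which a real job description might not expect.

```python
from __future__ import annotations

from string import Template


def resolve_rsl_variables(rsl: str, bindings: dict[str, str]) -> str:
    """Replace ${NAME} references; an unknown name raises KeyError."""
    return Template(rsl).substitute(bindings)


RSL_FRAGMENT = "<stdout>${GLOBUS_USER_HOME}/echo.stdout</stdout>"

# Hypothetical values that two different sites might bind when the job runs.
SITE_A = {"GLOBUS_USER_HOME": "/home/skala001"}
SITE_B = {"GLOBUS_USER_HOME": "/export/home/skala001"}

if __name__ == "__main__":
    # Same description, different result depending on where it is resolved.
    for site in (SITE_A, SITE_B):
        print(resolve_rsl_variables(RSL_FRAGMENT, site))
```

Because the values are filled in on the service side, a single job description stays portable across sites whose home, scratch, or Globus installation paths differ.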
