
Grids and Condor Barcelona, 2006

This presentation is an extended user's tutorial for Condor. It covers advanced uses of Condor (Java programs, DAGMan, Stork, MW, and grid computing), the resources available on grids, and a discussion of application needs.





Presentation Transcript


  1. Grids and Condor Barcelona, 2006

  2. Agenda
     • Extended user’s tutorial
     • Advanced Uses of Condor
         • Java programs
         • DAGMan
         • Stork
         • MW
         • Grid Computing
     • Case studies, and a discussion of your application’s needs

  3. Resources
     • There are many resources (machines) in the world, and many are or can be made available!
     • Groups of machines may be labeled as grids
     • Welcome to the power of the grid!

  4. Condor and Grids
     • Condor has always been a tool to harness grid computing
     • Condor’s mechanisms have evolved as technologies have evolved. Roughly categorized:
         • Flocking
         • Glidein
         • The grid universe

  5. Flocking
     • A way for jobs to run within a different, separate Condor pool
     • Condor runs here, and Condor runs there
     [Figure labels: here, there]

  6. Connect Condor Pools with Flocking
     • Flocking is a Condor-specific technology
     • Flocking is enabled with configuration
     • Jobs flock from here to there when they cannot be run here due to lack of available machines

  7. Configuration
     • Configuration files contain much of the administrative information used by Condor
     • Format is like that in submit description files:
         AttributeName = Value
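
As a small illustration of the AttributeName = Value format, the fragment below defines one value and reuses it through the $(Name) macro syntax that the flocking slides rely on. The host name cm.example.com is a placeholder, not taken from the presentation.

```
# Hypothetical central manager name, for illustration only.
CONDOR_HOST = cm.example.com

# $(CONDOR_HOST) expands to the value defined above.
COLLECTOR_HOST = $(CONDOR_HOST)
```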

  8. Configuration here
     • For jobs to be able to flock from here to there
     • In the configuration file on the pool where jobs flock from:
         FLOCK_TO = <central manager machine name>
         FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)
         FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
         HOSTALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)

  9. Configuration there
     • In the configuration file on the pool where jobs flock to:
         FLOCK_FROM = <submit machine name>, ..., <submit machine name>
     • To make security work:
         HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
         HOSTALLOW_WRITE_STARTD = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
         HOSTALLOW_READ_COLLECTOR = $(HOSTALLOW_READ), $(FLOCK_FROM)
         HOSTALLOW_READ_STARTD = $(HOSTALLOW_READ), $(FLOCK_FROM)
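
Putting the two sides together, a filled-in sketch might look as follows. All host names are hypothetical: submit.here.example.com stands for a submit machine in the local pool, and cm.there.example.com stands for the central manager of the remote pool.

```
# --- here: configuration of the pool that jobs flock from ---
FLOCK_TO = cm.there.example.com
FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)
FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
HOSTALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)

# --- there: configuration of the pool that jobs flock to ---
FLOCK_FROM = submit.here.example.com
HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_WRITE_STARTD = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_READ_COLLECTOR = $(HOSTALLOW_READ), $(FLOCK_FROM)
HOSTALLOW_READ_STARTD = $(HOSTALLOW_READ), $(FLOCK_FROM)
```

The two halves belong in two separate configuration files, one per pool; they appear together here only for comparison.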

  10. Submit Description File
      Enable file transfer:
          universe = vanilla
          executable = myjob.exe
          input = myjob.input
          output = myjob.output
          log = myjob.log
          should_transfer_files = YES
          when_to_transfer_output = ON_EXIT
          queue

  11. The Glidein Concept
      • Assume: We need more machines, and we have permission to use a set of machines
      • Glidein temporarily adds a set of machines to the local pool

  12. Glidein
      • In addition, Glidein solves the problem: “My job needs to run on that particular resource, and my job needs Condor.”
      • For example: a job that must run under the standard universe

  13. Glidein
      • Condor sends and runs its own executables on the resource
      • The needed resource appears to temporarily join the local Condor pool!

  14. Glidein
      • Run condor_glidein to add the remote resource to the local pool
      • The master and startd daemons become grid universe jobs using gt2
      [Figure labels: remote resource, local pool]

  15. Making Glidein Work
      • Change the configuration to give access permission (HOSTALLOW_WRITE) to the remote resource
      • No changes to jobs’ submit description files!
      • But, do enable file transfer in the submit description file:
          universe = vanilla
          executable = myjob.exe
          input = myjob.input
          output = myjob.output
          log = myjob.log
          should_transfer_files = YES
          when_to_transfer_output = ON_EXIT
          queue
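
The permission change in the first bullet could be sketched like this. The entry *.example.com is an assumed placeholder for whatever the local pool already allows, and example.mcs.anl.gov stands for the remote resource; neither value is prescribed by the presentation.

```
# Existing (hypothetical) write list, extended with the glidein resource.
HOSTALLOW_WRITE = *.example.com, example.mcs.anl.gov
```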

  16. Force Job to Glidein Resource
      In the submit description file:
          universe = standard
          executable = ajob.exe
          input = ajob.input
          output = ajob.output
          log = ajob.log
          requirements = \
            ( machine == "example.mcs.anl.gov" ) \
            && Arch != "" && OpSys != ""
          queue

  17. The Grid Universe
      Most useful when
      • We want to send a job off to a far away machine
      • We want to hand a job to another batch processing system on the local machine
      • We want to send a job off to a far away machine, in order to hand that job to another batch processing system on that machine

  18. The Grid Universe
      • All handled in the submit description file
      • Supports several back end types:
          • Globus: GT2, GT3, GT4
          • NorduGrid
          • UNICORE
          • Condor
          • PBS
          • LSF

  19. Condor-G
      • Condor-G describes jobs to be handed off to a machine that uses Globus middleware
          • gt2: Globus Toolkit 1 or 2, or the pre-web services GRAM
          • gt3: Globus Toolkit 3
          • gt4: Globus Toolkit 4, or WS GRAM

  20. Submit Description File
      For gt2:
          universe = grid
          input = job1.input
          output = job1.result
          log = job1.log
          grid_resource = gt2 example.wisc.edu/jobmanager
          queue
      The jobmanager is one of: jobmanager, jobmanager-condor, jobmanager-pbs, jobmanager-lsf, jobmanager-sge

  21. Submit Description File
      For gt3:
          universe = grid
          input = job2.input
          output = job2.result
          log = job2.log
          grid_resource = gt3 http://198.51.254.40:8080/osga/services/base/gram/XXXManagedJobFactoryService
          queue
      198.51.254.40:8080 is IP address:Port number
      XXX is one of: Fork, Condor, PBS, LSF, SGE

  22. Submit Description File
      For gt4:
          universe = grid
          input = job3.input
          output = job3.result
          log = job3.log
          grid_resource = gt4 https://198.51.254.40:8080/wsrf/service/ManagedJobFactoryService XXX
          queue
      198.51.254.40:8080 is IP address:Port number OR Host name:Port number
      XXX is one of: Fork, Condor, PBS, LSF, SGE

  23. NorduGrid and the Submit Description File
          universe = grid
          input = job4.input
          output = job4.result
          log = job4.log
          grid_resource = nordugrid ngexample.com
          queue

  24. Unicore and the Submit Description File
          universe = grid
          input = job5.input
          output = job5.result
          log = job5.log
          grid_resource = unicore usite.example.com vsite
          keystore_file = /frieda/certificates/keystore
          keystore_alias = "frieda"
          keystore_passphrase_file = /frieda/private/passphrase
          queue
      vsite is the name of the Unicore virtual resource

  25. PBS and the Submit Description File
      • Details of the PBS installation in $(GLITE_LOCATION)/etc/batch_gahp.config
          universe = grid
          input = job6.input
          output = job6.result
          log = job6.log
          grid_resource = pbs
          queue

  26. LSF and the Submit Description File
      • Details of the LSF installation in $(GLITE_LOCATION)/etc/batch_gahp.config
          universe = grid
          input = job7.input
          output = job7.result
          log = job7.log
          grid_resource = lsf
          queue

  27. Condor-C
      • Condor is running here, and Condor is running over there
      • For the case where we want to send a job off to a far away machine, in order to hand that job to another batch processing system on that machine

  28. Condor-C and the Submit Description File
          universe = grid
          input = job8.input
          output = job8.result
          log = job8.log
          grid_resource = condor joe@remotemachine.example.com remotecentralmanager.example.com
          +remote_jobuniverse = 5
          +remote_requirements = True
          +remote_ShouldTransferFiles = "YES"
          +remote_WhenToTransferOutput = "ON_EXIT"
          queue
      joe@remotemachine.example.com is the schedd name
      remotecentralmanager.example.com is the collector machine name
      +remote_jobuniverse = 5 is the vanilla universe

  29. Credentials
      • Not just anybody can use any resource at any time...
      • Key concepts:
          • Authentication: verification of an identity
          • Authorization: permission to do something

  30. Authentication
      If Frieda says “I am Frieda,” how do we distinguish this from Frieda saying “I am George Bush”?

  31. Authentication
      • Bush can do whatever he pleases
      • If Frieda claims to be Bush (and this is accepted), then Frieda can do whatever she pleases
      • Authentication attempts to verify the identity of the entity that is communicating

  32. Authorization
      • Who is allowed (permitted) to do what
          • Frieda may run gt4 jobs on the Open Science Grid machines
          • Fred may write to files in /usr/bin
          • The Unix user root may do anything!
      • Can be implemented with a list of those authorized
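
In Condor's host-based configuration, the same HOSTALLOW_ settings that the flocking slides use act as such lists. A minimal sketch, with placeholder host names that are not from the presentation:

```
# Any machine may query the daemons for status.
HOSTALLOW_READ = *
# Only machines in this (hypothetical) domain may submit jobs or join the pool.
HOSTALLOW_WRITE = *.example.com
# Only this (hypothetical) machine may issue administrative commands.
HOSTALLOW_ADMINISTRATOR = cm.example.com
```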

  33. Condor and Authentication
      Authentication within Condor comes in many forms. Here are three.
      • File system: Have the entity write a file. The OS attaches a name to the file owner. Condor checks that the entity’s claim is the same as the file owner.
      • GSI (Grid Security Infrastructure)
      • Kerberos
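
Which of these methods a pool accepts is set in the configuration. The fragment below is a sketch of one possible policy, not a recommendation from the presentation.

```
# Require that communicating entities authenticate.
SEC_DEFAULT_AUTHENTICATION = REQUIRED
# Acceptable methods: file system, GSI, and Kerberos.
SEC_DEFAULT_AUTHENTICATION_METHODS = FS, GSI, KERBEROS
```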

  34. Authentication Idea
      • A centralized certificate authority (CA) does verification of an entity’s identity.
      • When satisfied, the CA issues a signed certificate (also called a credential)
      [Figure labels: CA, “I am Frieda”]

  35. Authentication
      • To authenticate, the entity presents the certificate
      • All is well, if we trust the CA and the remote machine
      [Figure labels: CA, “I am Frieda”]

  36. GSI Authentication
      • GSI uses X.509 certificates
      • Grid universe jobs submitted to back end types that use Globus middleware (gt2, gt3, gt4), as well as to nordugrid and unicore, use X.509 certificates
      • Condor can also use GSI
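
For a grid universe job, the X.509 proxy can be named in the submit description file with the x509userproxy command. The sketch below assumes a gt2 resource; the proxy path and the file names are hypothetical.

```
universe = grid
executable = myjob.exe
output = myjob.output
log = myjob.log
grid_resource = gt2 example.wisc.edu/jobmanager
# Hypothetical location of the user's proxy certificate file.
x509userproxy = /tmp/x509up_u1234
queue
```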

  37. Revocation, Trust, and Proxies
      • The CA may revoke a credential
      • Frieda gives the signed credential to the remote machine. If the remote machine is malicious, it could impersonate Frieda. Therefore, a password protects the credential.
      • A proxy is a credential that includes the password, but is only valid for a specific (short) time period.
      • MyProxy software enables GSI proxy management
