
Distributed computing in High Energy Physics with Grid Technologies (Grid tools at PHENIX)

Presentation Transcript


  1. Distributed computing in High Energy Physics with Grid Technologies (Grid tools at PHENIX) Andrey Shevel

  2. Topics • Grid/Globus • HEP Grid Projects • PHENIX as the example • Concepts and scenario for a widely distributed multi-cluster computing environment • Job submission and job monitoring • Live demonstration Andrey Shevel

  3. What is the Grid • “Dependable, consistent, pervasive access to [high-end] resources”. • Dependable: Can provide performance and functionality guarantees. • Consistent: Uniform interfaces to a wide variety of resources. • Pervasive: Ability to “plug in” from anywhere. Andrey Shevel

  4. Another Grid description • Quote from Information Power Grid (IPG) at NASA • http://www.ipg.nasa.gov/aboutipg/presentations/PDF_presentations/IPG.AvSafety.VG.1.1up.pdf • Grids are tools, middleware, and services for: • providing a uniform look and feel to a wide variety of computing and data resources; • supporting construction, management, and use of widely distributed application systems; • facilitating human collaboration and remote access and operation of scientific and engineering instrumentation systems; • managing and securing the computing and data infrastructure. Andrey Shevel

  5. Basic HEP requirements in distributed computing • Authentication/Authorization/Security • Data Transfer • File/Replica Cataloging • Match Making/Job submission/Job monitoring • System monitoring Andrey Shevel

  6. HEP Grid Projects • European Data Grid • www.edg.org • Grid Physics Network (GriPhyN) • www.griphyn.org • Particle Physics Data Grid • www.ppdg.net • Many others. Andrey Shevel

  7. Possible Task • Here we try to gather computing power across many clusters. The clusters are located at different sites under different administrative authorities. We use all local rules as they are: • local schedulers, policies, priorities; • other local circumstances. • One of many possible scenarios is discussed in this presentation. Andrey Shevel

  8. General scheme: jobs are planned to go where the data are and to less loaded clusters. (Diagram components: User, File Catalog, Main Data Repository at RCF, Partial Data Replica at a remote cluster.) Andrey Shevel

  9. Base subsystems for PHENIX Grid • User jobs • BOSS • BODE • Package gsuny • GridFTP (globus-url-copy) • Globus job-manager/fork • Cataloging engine • GT 2.2.4.latest Andrey Shevel

  10. Concepts • Major Data Sets (physics or simulated data) • Master Job (script) submitted by the user • Satellite Job (script) submitted by the Master Job • Minor Data Sets (parameters, scripts, etc.) • Input/Output Sandbox(es) Andrey Shevel

  11. The job submission scenario at a remote Grid cluster • To determine a qualified computing cluster: available disk space, installed software, etc. • To copy/replicate the major data sets to the remote cluster. • To copy the minor data sets (scripts, parameters, etc.) to the remote cluster. • To start the master job (script), which will submit many jobs through the default batch system. • To watch the jobs with the monitoring system BOSS/BODE. • To copy the resulting data from the remote cluster to the target destination (desktop or RCF). Andrey Shevel

  12. Master job-script • The master script is submitted from your desktop and executed on the Globus gateway (possibly under a group account) using the monitoring tool (BOSS is assumed). • The master script is expected to find the following information in environment variables: • CLUSTER_NAME – name of the cluster; • BATCH_SYSTEM – name of the batch system; • BATCH_SUBMIT – command for job submission through BATCH_SYSTEM. Andrey Shevel
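
A minimal sketch of such a master script, assuming the gateway sets the three variables above; the satellite script path and the number of submissions are illustrative assumptions, not part of the original talk:

  #!/bin/sh
  # Master-script sketch: runs on the Globus gateway and submits satellite
  # jobs through whatever batch system the gateway announces.
  echo "Master job on cluster $CLUSTER_NAME (batch system: $BATCH_SYSTEM)"
  for i in 1 2 3 4; do
      # satellite-job.sh is a hypothetical satellite script shipped earlier as a minor data set
      $BATCH_SUBMIT ~/andrey.shevel/satellite-job.sh
  done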

  13. Remote cluster job submission scenario (diagram): the MASTER job is submitted from the local desktop to the Globus gateway through globus-jobmanager/fork; the MASTER job then runs on the Globus gateway and submits jobs with the command $BATCH_SUBMIT. Andrey Shevel
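
A hedged example of that first hop with the standard GT2 client tools; the gateway host name is the SUNYSB gateway from slide 22, and the remote script path is an assumption (gsub, introduced on slide 16, presumably wraps a similar call):

  # Submit the master script to the gateway's fork jobmanager.
  globus-job-submit rserver1.i2net.sunysb.edu/jobmanager-fork \
      /users/shevel/andrey.shevel/TbossSuny
  # globus-job-submit prints a job contact URL; its state can be checked with:
  # globus-job-status <job-contact-URL>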

  14. Transfer of the major data sets • There are a number of methods to transfer major data sets: • The utility bbftp (without use of GSI) can be used to transfer the data between clusters; • The utility gcopy (with use of GSI) can be used to copy the data from one cluster to another; • Any third-party data transfer facility (e.g. HRM/SRM). Andrey Shevel
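
For illustration, a GSI-authenticated GridFTP transfer with globus-url-copy (the tool listed on slide 9); the host names are the gateways from slide 22, while the file paths are assumptions:

  # Run grid-proxy-init first to obtain a GSI proxy, then move one data set
  # between the two clusters over GridFTP.
  globus-url-copy \
      gsiftp://rserver1.i2net.sunysb.edu/phenix/data/run03/dst.root \
      gsiftp://loslobos.alliance.unm.edu/users/shevel/data/dst.root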

  15. Copy the minor data sets • There are at least two alternative methods to copy the minor data sets (scripts, parameters, constants, etc.): • To copy the data to /afs/rhic.bnl.gov/phenix/users/user_account/… • To copy the data with the utility CopyMinorData (part of the gsuny package). Andrey Shevel

  16. Package gsuny: list of scripts • General commands (ftp://ram3.chem.sunysb.edu/pub/suny-gt-2/gsuny.tar.gz) • GPARAM – configuration description for the set of remote clusters; • GlobusUserAccountCheck – to check the Globus configuration for the local user account; • gping – to test availability of the Globus gateways; • gdemo – to see the load of remote clusters; • gsub – to submit a job to the least loaded cluster; • gsub-data – to submit a job where the data are; • gstat, gget, gjobs – to get the status of a job, its standard output, and detailed info about jobs. Andrey Shevel
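
Put together, the scenario of slide 11 looks roughly like this in terms of the gsuny commands; the CopyMinorData and gsub arguments are taken from the session on slide 21, the remaining invocations are assumptions (job identifiers omitted):

  GlobusUserAccountCheck                   # check the local Globus configuration
  gping                                    # are the Globus gateways reachable?
  gdemo                                    # how loaded are the remote clusters?
  CopyMinorData local:andrey.shevel unm:.  # ship scripts and parameters to UNM
  gsub TbossSuny                           # run the master job on the least loaded cluster
  gjobs                                    # watch the submitted jobs
  gget                                     # fetch standard output when a job is done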

  17. Package gsuny: data transfer • gcopy – to copy the data from one cluster (or the local host) to another; • CopyMinorData – to copy minor data sets from one cluster (or the local host) to another cluster. Andrey Shevel

  18. Job monitoring • After the initial description of the required monitoring tool was drafted (https://www.phenix.bnl.gov/phenix/WWW/p/draft/shevel/TechMeeting4Aug2003/jobsub.pdf), the following packages were found: • Batch Object Submission System (BOSS) by Claudio Grandi, http://www.bo.infn.it/cms/computing/BOSS/ • Web interface: BOSS DATABASE EXPLORER (BODE) by Alexei Filine, http://filine.home.cern.ch/filine/ Andrey Shevel

  19. Basic BOSS components • boss executable: the BOSS interface to the user • MySQL database: where BOSS stores job information • jobExecutor executable: the BOSS wrapper around the user job • dbUpdator executable: the process that writes to the database while the job is running • Interface to the local scheduler Andrey Shevel
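
The boss executable is driven from the command line. A hedged example: the submit line is copied from the session on slide 21, while the query and kill invocations only echo the command names shown on slide 20 (their exact options may differ):

  # Register the master job with BOSS so its state ends up in the MySQL database.
  boss submit -jobtype ram3master -executable ~/andrey.shevel/TestRemoteJobs.pl \
      -stdout ~/andrey.shevel/master.out -stderr ~/andrey.shevel/master.err
  boss query      # list the jobs known to the BOSS database
  boss kill       # abort a job (job selection options as documented by BOSS)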

  20. Basic job flow (diagram of cluster N): gsub master-script is issued from the local desktop; on the Globus gateway (Globus space) BOSS wraps the job (boss submit, boss query, boss kill) and hands it to the local scheduler, which runs it on the execution nodes (node n … node m); job information is stored in the BOSS DB and viewed through BODE (Web interface). Andrey Shevel

  21.
  [shevel@ram3 shevel]$ CopyMinorData local:andrey.shevel unm:.
  +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  YOU are copying THE minor DATA sets
                --FROM--                        --TO--
  Gateway   = 'localhost'                  'loslobos.alliance.unm.edu'
  Directory = '/home/shevel/andrey.shevel' '/users/shevel/.'
  Transfer of the file '/tmp/andrey.shevel.tgz5558' was succeeded

  [shevel@ram3 shevel]$ cat TbossSuny
  . /etc/profile
  . ~/.bashrc
  echo " This is master JOB"
  printenv
  boss submit -jobtype ram3master -executable ~/andrey.shevel/TestRemoteJobs.pl -stdout \
  ~/andrey.shevel/master.out -stderr ~/andrey.shevel/master.err

  gsub TbossSuny    # submit to the less loaded cluster
  Andrey Shevel

  22. Status of the PHENIX Grid • Live info is available on the page http://ram3.chem.sunysb.edu/~shevel/phenix-grid.html • The group account ‘phenix’ is available now at: • SUNYSB (rserver1.i2net.sunysb.edu) • UNM (loslobos.alliance.unm.edu) • IN2P3 (in progress now) Andrey Shevel

  23. Live Demo of BOSS job monitoring • http://ram3.chem.sunysb.edu/~magda/BODE • User: guest • Pass: Guest101 Andrey Shevel

  24. Computing Utility (instead of a conclusion) • A computing utility is a computing cluster built up for the concrete tasks of a collaboration. • The computing utility can be implemented anywhere in the world. • The computing utility can be used from anywhere (France, USA, Russia, etc.). • The most important part of the computing utility is manpower. Andrey Shevel
