CE+WN+siteBDII Installation and configuration

The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How). CE+WN+siteBDII Installation and configuration. Bouchra RAHIM (rahim@cnrst.ma). Africa 6 2010 - Joint EUMEDGRID-Support/EPIKH School for Grid Site Administrators, Rabat, 01.06.2011. www.epikh.eu.

Presentation Transcript


  1. The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) CE+WN+siteBDII Installation and configuration Bouchra RAHIM(rahim@cnrst.ma) Africa 6 2010 - Joint EUMEDGRID-Support/EPIKH School for Grid Site Administrators Rabat, 01.06.2011 www.epikh.eu

  2. Outline • Computing Element overview • Worker Node overview • CE CREAM overview • gLite stack overview • gLite CE siteBDII • gLite CE cream and WN

  3. gLite stack overview

  4. gLite overview worker node

  5. gLite overview
     • User Interface (UI): the point of access for users to gLite grid services
     • WMS: the component that optimizes resource usage
     • CE: the machine that manages the worker nodes
     • WN: the machines that actually execute applications
     • SE: the machines where files are stored
     • LFC: used to locate files on the grid
     • BDII: the service responsible for publishing all information about your site
     • Logging and Bookkeeping: as its name says, it logs job events and alerts the user when a job has finished

  6. Computing Element Overview
     • The Computing Element provides some of the main services of a site.
     • Main functionalities:
       • job management (job submission, job control)
       • job status updates for the WMS
       • communication with the site BDII, which publishes all information regarding the Computing Element
     • It can run several kinds of batch system:
       • Torque + MAUI
       • LSF
       • SGE
       • Condor

  7. Torque + MAUI
     • Torque server service:
       • pbs_server provides basic batch services such as receiving and creating a batch job.
     • Torque client service:
       • pbs_mom places jobs into execution. It is also responsible for returning the job's output to the user.
     • MAUI system service:
       • the job scheduler contains the site's policy to decide which job is going to be executed and when.

  8. Site BDII*
     • It used to be installed on the CE by default, but it is now better to install it on a dedicated server, physical or virtual.
     • It collects all site GRISes** (for example SE, RB, LFC, etc.)
     • The service is named bdii
     • Log file: /opt/bdii/var/bdii.log
     • *BDII = Berkeley Database Information Index
     • **GRIS = Grid Resource Information Service

  9. Worker Node Element Overview
     • Worker nodes are the machines that actually execute your jobs.
     • Users can access their services only through a Computing Element.
     • Their characteristics are collected by the Computing Element, which publishes all information through the BDII service.

  10. CE CREAM overview
     • CREAM: Computing Resource Execution And Management
     • It accepts job submission requests coming from a WMS, as well as other job management requests.
     • It exposes a web service interface.

  11. Requirements
     • Three or more machines:
       • one will be used for the CE installation;
       • one will be used for the site BDII installation;
       • the others will be used for the WN installation.
     • Architecture: 64 bit
     • Operating system: Scientific Linux 5
     • Two machines (CE and BDII) need a public IP address with forward and reverse address resolution in the DNS.
     • The CE machine must be equipped with an X509 certificate.

  12. BDII Installation

  13. Preparing the Linux machine
     • Network Time Protocol settings:
         # yum install ntp
     • Copy the ntp.conf file and the ntp directory from ftp://repo.magrid.ma/pub/CE_WN_BDII/ to /etc/ (for example with WinSCP)
     • Synchronize the date:
         # /etc/init.d/ntpd stop
         # ntpdate ntp.marwan.ma
     • Start the ntpd service and configure it to start on boot:
         # /etc/init.d/ntpd start
         # chkconfig ntpd on

  14. Preparing the Linux machine
     • Disable SELinux: make sure /etc/selinux/config contains the line:
         SELINUX=disabled
     • Check that the machine has a valid hostname:
         # hostname -f
         # cat /etc/hosts
     • Stop iptables:
         # /etc/init.d/iptables stop
         # chkconfig iptables off
     • Reboot
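The two checks on this slide can be scripted before going further. The following is only a minimal sketch: the function names `selinux_disabled` and `is_fqdn` are hypothetical helpers, not part of gLite or YAIM, and the notion of a "valid" hostname is assumed here to mean a dotted, fully qualified name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given SELinux config file contains
# the line SELINUX=disabled. $1 defaults to the path named on the slide.
selinux_disabled() {
    grep -Eq '^[[:space:]]*SELINUX=disabled[[:space:]]*$' "${1:-/etc/selinux/config}"
}

# Hypothetical helper: succeeds if $1 looks fully qualified, i.e. it contains
# at least one dot and has no empty label (assumption about what "valid" means).
is_fqdn() {
    case "$1" in
        .*|*.|*..*) return 1 ;;
        *.*)        return 0 ;;
        *)          return 1 ;;
    esac
}
```

On the machine being prepared, a call such as `selinux_disabled && is_fqdn "$(hostname -f)" && echo "ready"` would print `ready` only when both conditions hold.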

  15. Repository set up-BDII
     • Add the repositories specific to the middleware to the system repositories:
         # cd /etc/yum.repos.d/
         # mv dag.repo dag.repo.stop
         # export MREPO=http://repo.magrid.ma/yumrepo/glite32
         # REPOS="dag lcg-CA glite-BDII_site"
         # for name in $REPOS; do wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo; done
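Before touching /etc/yum.repos.d/, it can help to print the download commands the loop would run. This is a dry-run sketch only: `print_repo_fetch_commands` is a hypothetical name, and the URL and repository names are simply the ones from the slide.

```shell
#!/bin/sh
# Base URL of the repository mirror, as given on the slide.
MREPO=${MREPO:-http://repo.magrid.ma/yumrepo/glite32}

# Hypothetical helper: print (do not run) one wget command per repository name.
# $@: repository names, e.g. dag lcg-CA glite-BDII_site
print_repo_fetch_commands() {
    for name in "$@"; do
        echo "wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo"
    done
}

# Dry run for the site BDII repositories.
print_repo_fetch_commands dag lcg-CA glite-BDII_site
```

The same function can be called with the CE or WN repository names; piping its output to `sh` as root would perform the downloads.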

  16. package installation-BDII
     • Use yum to install the needed packages:
         # yum install lcg-CA ca-policy-egi-core ca-policy-lcg
         # yum install glite-BDII_site

  17. Yaim Configuration
     • All the sample configuration files are located in the /opt/glite/yaim/examples/siteinfo directory.
     • It is better to work on a copy of the original files:
         # mkdir /opt/glite/yaim/etc/siteinfo/
         # mkdir /opt/glite/yaim/etc/siteinfo/services/
         # cp /opt/glite/yaim/examples/siteinfo/site-info.def /opt/glite/yaim/etc/siteinfo/site-info.def
         # cp /opt/glite/yaim/examples/siteinfo/services/glite-bdii_site /opt/glite/yaim/etc/siteinfo/services/glite-bdii_site
         # cp /opt/glite/yaim/examples/users.conf /opt/glite/yaim/etc/siteinfo/users.conf
         # cp /opt/glite/yaim/examples/groups.conf /opt/glite/yaim/etc/siteinfo/groups.conf
         # cp /opt/glite/yaim/examples/siteinfo/edgusers.conf /opt/glite/yaim/etc/siteinfo/edgusers.conf

  18. Yaim Configuration
     • You can find some template files in: ftp://repo.magrid.ma/pub/CE_WN_BDII/
     • Edit the site-info.def file and change the following variables:
         SITE_NAME=MA-ZZ-School             (name of the site)
         CE_HOST=pcXX.magrid.ma             (XX: the machine that will be the CE)
         SITE_BDII_HOST=pcYY.magrid.ma      (the current machine)
     • Edit the services/glite-bdii_site file and change the following variables:
         SITE_NAME=MA-ZZ-School
         SITE_DESC="MA-ZZ-School"

  19. Yaim Configuration-BDII
     • Run the configuration command:
         # /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/siteinfo/site-info.def -n glite-BDII_site
     • If everything is OK, run a basic test:
         # ldapsearch -x -h pcYY.magrid.ma -p 2170 -b "mds-vo-name=local,o=grid"
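A quick way to judge the ldapsearch result is to count the entries it returns. The sketch below assumes the output is standard LDIF, where every entry starts with a `dn:` line; `count_ldif_entries` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count the entries in LDIF text read from standard input.
# Each LDIF entry begins with a line starting with "dn:".
count_ldif_entries() {
    # grep -c prints 0 and returns non-zero when nothing matches,
    # so the exit status is normalised here.
    grep -c '^dn:' || true
}
```

For example, `ldapsearch -x -h pcYY.magrid.ma -p 2170 -b "mds-vo-name=local,o=grid" | count_ldif_entries` prints the number of published entries; a count of 0 suggests the site BDII is not publishing anything yet.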

  20. CE CREAM Installation (on Torque/PBS)

  21. Preparing the Linux machine
     • Network Time Protocol settings:
         # yum install ntp
     • Copy the ntp.conf file and the ntp directory from ftp://repo.magrid.ma/pub/CE_WN_BDII/ to /etc/ (for example with WinSCP)
     • Synchronize the date with an NTP server:
         # /etc/init.d/ntpd stop
         # ntpdate ntp.marwan.ma
     • Start the ntpd service and configure it to start on boot:
         # /etc/init.d/ntpd start
         # chkconfig ntpd on

  22. Preparing the Linux machine
     • Disable SELinux: make sure /etc/selinux/config contains the line:
         SELINUX=disabled
     • Check that the machine has a valid hostname:
         # hostname -f
         # cat /etc/hosts
     • Stop iptables:
         # /etc/init.d/iptables stop
         # chkconfig iptables off
     • Reboot

  23. Repository set up-CE
     • Add the repositories specific to the middleware to the system repositories:
         # cd /etc/yum.repos.d/
         # mv dag.repo dag.repo.stop
         # export MREPO=http://repo.magrid.ma/yumrepo/glite32
         # REPOS="dag lcg-CA glite-CREAM glite-TORQUE_server glite-TORQUE_utils"
         # for name in $REPOS; do wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo; done

  24. package installation-CE
     • Use yum to install the needed packages:
         # yum clean all
         # yum install lcg-CA ca-policy-egi-core ca-policy-lcg
         # yum install glite-CREAM
         # yum install glite-TORQUE_server glite-TORQUE_utils
     • Due to a dependency problem within the Tomcat distribution in SL5, install xml-commons-apis first, before glite-CREAM:
         # yum install xml-commons-apis

  25. Before configuration-Host Certificates
     • Some preliminary steps before configuration:
     • Move the host certificate and key to the default path:
         # cd
         # mv /root/pcXXcert.pem /etc/grid-security/hostcert.pem
         # mv /root/pcXXkey.pem /etc/grid-security/hostkey.pem
         # chmod 400 /etc/grid-security/hostkey.pem
         # chmod 600 /etc/grid-security/hostcert.pem
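The certificate steps can be wrapped in one function so that they are easy to repeat or to try out in a scratch directory first. This is a sketch under assumptions: `install_host_credentials` is a hypothetical name, it copies rather than moves the files, and it applies the same modes as the slide (400 for the key, 600 for the certificate).

```shell
#!/bin/sh
# Hypothetical helper: install a host certificate and key with the modes
# used on the slide.
# $1: certificate file, $2: key file, $3: target directory
#     (defaults to /etc/grid-security, the path named on the slide)
install_host_credentials() (
    cert=$1
    key=$2
    dest=${3:-/etc/grid-security}
    mkdir -p "$dest" || exit 1
    cp "$cert" "$dest/hostcert.pem" || exit 1
    cp "$key" "$dest/hostkey.pem" || exit 1
    chmod 600 "$dest/hostcert.pem" || exit 1
    chmod 400 "$dest/hostkey.pem" || exit 1
)
```

On the CE this would be called as root, for instance `install_host_credentials /root/pcXXcert.pem /root/pcXXkey.pem`. The body runs in a subshell, so its variables do not leak into the calling shell.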

  26. YAIM configuration-CE
     • The main file to edit is site-info.def, where you specify some general settings and the parameters of other components (CE CREAM).
     • Other files to edit are: wn-list.conf, users.conf, groups.conf, services/glite-creamce
     • Replace the example values with the correct ones:
         # vi services/glite-creamce
         CEMON_HOST=pcXX.$MY_DOMAIN
         CREAM_DB_USER=eumed
         CREAM_DB_PASSWORD=grid2011
         BLPARSER_HOST=pcXX.$MY_DOMAIN

  27. YAIM configuration-CE
     • Declare the worker nodes in wn-list.conf:
         # vi wn-list.conf
         pcAA.magrid.ma
         pcBB.magrid.ma

  28. YAIM configuration-CE
         CE_HOST=pcYY.magrid.ma
         CE_CPU_MODEL=XEON               # see: cat /proc/cpuinfo
         CE_CPU_VENDOR=Intel
         CE_CPU_SPEED=2230
         CE_OS=ScientificSL
         CE_OS_RELEASE=5.5               # see: cat /etc/redhat-release
         CE_OS_VERSION="Boron"
         CE_OS_ARCH=x86_64
         CE_MINPHYSMEM=512               # see: cat /proc/meminfo on the WN
         CE_MINVIRTMEM=512
         CE_PHYSCPU=1                    # total CPUs in the site
         CE_LOGCPU=4
         CE_SMPSIZE=4
         CE_OUTBOUNDIP=TRUE
         CE_INBOUNDIP=FALSE
         CE_OTHERDESCR="Cores=4,Benchmark=6.5-HEP-SPEC06"
     • http://gkswiki.fzk.de/index.php5/Configuration_of_the_CREAM_CE
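Some of these values can be read from a worker node instead of being typed by hand. The sketch below uses two hypothetical helpers that parse files in the /proc/cpuinfo and /proc/meminfo formats. It assumes that the memory variables are expressed in MB; how the counts map onto CE_LOGCPU or CE_MINPHYSMEM for a whole site remains the administrator's decision.

```shell
#!/bin/sh
# Hypothetical helper: number of logical CPUs listed in a cpuinfo-format file.
# $1 defaults to /proc/cpuinfo.
count_logical_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Hypothetical helper: total physical memory in MB (rounded down), read from
# the MemTotal line of a meminfo-format file. $1 defaults to /proc/meminfo.
mem_total_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}
```

Running `count_logical_cpus` and `mem_total_mb` without arguments on a worker node prints the figures for that node.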

  29. YAIM configuration-CE
     • How to set CE_SI00, CE_SF00, CE_CAPABILITY and CE_OTHERDESCR?
     • Look up the value for your processor at these links:
       • http://www.italiangrid.org/grid_operations/site_manager/HEP-SPEC06
       • https://hepix.caspur.it/benchmarks/doku.php?id=bench:results_sl5_x86_64_gcc_412
       • https://hepix.caspur.it/processors/dokuwiki/doku.php?id=benchmarks:results
     • For example, for an Intel XEON 5520 at 2.23 GHz without Hyper-Threading, the table at the previous link gives a value of 95; with the conversion factor 1 HS06 = 40 used here, 95 * 40 = 3800, so:
         CE_SI00=3800
         CE_SF00=3800
         CE_CAPABILITY="CPUScalingReferenceSI00=3800"
         CE_OTHERDESCR="Cores=4,Benchmark=23.75-HEP-SPEC06"
     • where (3800/40)/4 = 23.75
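The arithmetic of this example can be reproduced on the command line. The sketch below takes the benchmark value (95), the conversion factor (40) and the core count (4) from the slide as given; the function names are hypothetical, and the factor is a parameter so that a different site convention can be plugged in.

```shell
#!/bin/sh
# Hypothetical helper: SI00 value from a machine benchmark value and a
# conversion factor. $1: benchmark value, $2: conversion factor
si00_from_hs06() {
    awk -v hs="$1" -v f="$2" 'BEGIN { printf "%d\n", hs * f }'
}

# Hypothetical helper: per-core benchmark for CE_OTHERDESCR.
# $1: SI00 value, $2: conversion factor, $3: number of cores
hs06_per_core() {
    awk -v si="$1" -v f="$2" -v c="$3" 'BEGIN { printf "%.2f\n", (si / f) / c }'
}

si00_from_hs06 95 40      # prints 3800
hs06_per_core 3800 40 4   # prints 23.75
```

The two printed numbers are the ones that go into CE_SI00 / CE_SF00 and into the Benchmark field of CE_OTHERDESCR respectively.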

  30. YAIM configuration-CE
         BATCH_SERVER=$CE_HOST
         JOB_MANAGER=lcgpbs
         CE_BATCH_SYS=pbs
         BATCH_LOG_DIR=/var/spool/pbs
         APEL_DB_PASSWORD=grid2011
         DGAS_ACCT_DIR=/var/spool/pbs/server_priv/accounting
         VOS="eumed"
         QUEUES="eumed"
         EUMED_GROUP_ENABLE="eumed"

  31. YAIM configuration-CE
     • After editing, you can launch the configuration commands:
         # /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/siteinfo/site-info.def -n creamCE -n TORQUE_server -n TORQUE_utils
         # /opt/glite/yaim/bin/yaim -r -s /opt/glite/yaim/etc/siteinfo/site-info.def -n creamCE -f config_cream_blparser
     • http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:devel:install-cream32

  32. Check the CE
     • http://grid.pd.infn.it/cream/field.php?n=Main.CheckYourCREAMCEConfiguration
     • Download the script:
         # wget http://grid.pd.infn.it/cream/CheckCreamConf/current/CheckCreamConf.pl
         # chmod +x CheckCreamConf.pl
     • Run it:
         # ./CheckCreamConf.pl
     • Check the output:
         CheckCreamConf.log

  33. WN CREAM Installation (on Torque/PBS)

  34. Preparing the Linux machine
     • Network Time Protocol settings:
         # yum install ntp
     • Copy the ntp.conf file and the ntp directory from ftp://repo.magrid.ma/pub/CE_WN_BDII/ to /etc/ (for example with WinSCP)
     • Synchronize the date:
         # /etc/init.d/ntpd stop
         # ntpdate ntp.marwan.ma
     • Start the ntpd service and configure it to start on boot:
         # /etc/init.d/ntpd start
         # chkconfig ntpd on

  35. Preparing the Linux machine
     • Disable SELinux: make sure /etc/selinux/config contains the line:
         SELINUX=disabled
     • Check that the machine has a valid hostname:
         # hostname -f
         # cat /etc/hosts
     • Stop iptables:
         # /etc/init.d/iptables stop
         # chkconfig iptables off
     • Reboot

  36. Repository set up-WN
     • Add the repositories specific to the middleware to the system repositories:
         # cd /etc/yum.repos.d/
         # mv dag.repo dag.repo.stop
         # export MREPO=http://repo.magrid.ma/yumrepo/glite32
         # REPOS="dag lcg-CA glite-WN glite-TORQUE_client"
         # for name in $REPOS; do wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo; done

  37. package installation-WN
     • Use yum to install the needed packages:
         # yum clean all
         # yum install -y lcg-CA ca-policy-egi-core ca-policy-lcg
         # yum groupinstall glite-WN
         # yum install glite-TORQUE_client

  38. WN - YAIM Configuration
     • You can use the same configuration files edited on the CE:
       • this can be done on all worker nodes of a site;
       • so you don't need to re-edit anything!
     • Copy the configuration files from the CE machine using the scp command:
         # mkdir /opt/glite/yaim/etc/siteinfo/
         # mkdir /opt/glite/yaim/etc/siteinfo/services
       • copy the files site-info.def, users.conf, groups.conf and wn-list.conf from the CE (source: root@pcYY:/opt/glite/yaim/etc/siteinfo/site-info.def, and likewise for the other files)
       • copy the glite-wn file from examples/services
     • Ready to configure now:
         # /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/siteinfo/site-info.def -n glite-WN -n TORQUE_client

  39. WN - YAIM Configuration
     • Ready to configure now:
         # /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/siteinfo/site-info.def -n glite-WN -n TORQUE_client
     • A basic test:
       • Check the status of pbs_mom:
         # pbsnodes -a
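The pbsnodes listing is verbose, so a one-line-per-node summary can make a down or offline worker node easier to spot. The sketch below assumes the usual Torque layout, where each node name is on an unindented line and its attributes follow on indented `name = value` lines; `list_node_states` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read "pbsnodes -a" style output on standard input and
# print "<node> <state>" for every node.
list_node_states() {
    awk '
        /^[^[:space:]]/         { node = $1 }        # unindented line: node name
        /^[[:space:]]+state = / { print node, $3 }   # indented state attribute
    '
}
```

Typical use is `pbsnodes -a | list_node_states`; a healthy, idle worker node would be expected to show up with the state `free`.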

  41. Testing installation

  42. Tests on CE
     • SSH to the CE to test whether the CE can see the WNs and whether all main services are up and running:
         # pbsnodes
         # /etc/init.d/gLite status

  43. Tests on CE
     • SSH to the CE and then become a grid pool user (here eumed001):
         # su - eumed001
     • Create a file with the following content:
         $ vi test.sh
         #!/bin/sh
         sleep 20     # gives you time to see the job status
         hostname
     • Set the right permissions to make it executable:
         $ chmod 700 test.sh

  44. Tests on CE
     • Launch the job locally on the CE:
         $ qsub -q eumed test.sh
     • Then check the list of jobs in execution on the CE:
         $ qstat -a
         ce.localdomain:
                                                                     Req'd  Req'd   Elap
         Job ID           Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
         ---------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
         0.pc22.magrid.ma eumed001 short    test.sh      5839  --  --     -- 00:15 R   --
     • In case you want more information:
         $ qstat -f 3
     • In case you want to abort a job execution:
         $ qdel 3     # 3 is the job ID

  45. Tests on CE
     • If the "qstat -a" command returns no output, no jobs are running on the CE; this means your previous job has terminated and you can now list its output:
         $ ls
         test.sh.e3 test.sh.o3
         $ cat test.sh.e3     # error file
         $ cat test.sh.o3     # output file
         wn.localdomain

  46. JDL example
         $ vim hostname-cream.jdl
         Type = "Job";
         JobType = "Normal";
         Executable = "/bin/hostname";
         StdOutput = "hostname.out";
         StdError = "hostname.err";
         OutputSandbox = {"hostname.err","hostname.out"};
         Arguments = "-f";
         OutputSandboxBaseDestUri = "gsiftp://localhost/tmp";

  47. Working test
     • SSH to the UI to test whether the CE can receive and execute a simple job:
         $ ssh gridXX@ui01.magrid.ma     # password: gridXX
     • Set up the user certificate (as root, shown here for user grid01):
         [root@ui01 ~]# mkdir /home/grid01/.globus
         [root@ui01 ~]# cp /root/user_cert/usercert.pem /home/grid01/.globus/usercert.pem
         [root@ui01 ~]# cp /root/user_cert/userkey.pem /home/grid01/.globus/userkey.pem
         [root@ui01 ~]# chown grid01 /home/grid01/.globus/usercert.pem
         [root@ui01 ~]# chown grid01 /home/grid01/.globus/userkey.pem
         [root@ui01 ~]# chmod 400 /home/grid01/.globus/userkey.pem
         [root@ui01 ~]# su - grid01
     • Create a VOMS proxy:
         [grid01@ui01 ~]$ voms-proxy-init --voms eumed
         Enter GRID pass phrase: [grid2011]
     • Submit the job and check its status:
         $ glite-ce-job-submit -r pc22.magrid.ma:8443/cream-pbs-eumed -o ID hostname-cream.jdl
         $ glite-ce-job-status -i ID

  48. Troubleshooting
     • Which logs should you check if something goes wrong?
       • /var/log/messages, for general errors
       • /opt/glite/var/log (especially glite-ce-cream.log)
       • /var/spool/pbs/server_priv/accounting/<date>, if even local submission to the batch system doesn't work.
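A first pass over these logs can be automated by searching them for the word "error". The sketch below is only a starting point: `scan_logs_for_errors` is a hypothetical helper, and matching the literal string "error" (case-insensitively) is an assumption about how problems show up in these files.

```shell
#!/bin/sh
# Hypothetical helper: print every line containing "error" (any case) from the
# given log files, prefixed with the file name and line number.
# Files that do not exist or are not readable are skipped.
scan_logs_for_errors() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        # The extra /dev/null argument makes grep always print the file name.
        grep -in 'error' "$f" /dev/null
    done
    return 0
}
```

For instance, `scan_logs_for_errors /var/log/messages /opt/glite/var/log/glite-ce-cream.log` would list the matching lines from the first two locations named on the slide.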

  49. References
     • INFNGRID generic installation guide:
       http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:install-3_2
     • YAIM configuration variables:
       https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables
     • CE CREAM installation guide:
       GLITE Cream CE 3.2 SL5 Installation Guide [INFNGRID Release Wiki]
     • YAIM system administrator guide:
       https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide400
     • EUMEDGRID wiki:
       http://wiki.eumedgrid.eu/bin/view
     • EuMedGRID sites installation and setup tips:
       http://wiki.eumedgrid.eu/twiki/bin/view/InfrastructureStatus/EumedSiteInstallation
     • How To Check And Test Your CREAMCE:
       http://grid.pd.infn.it/cream/field.php?n=Main.HowToCheckAndTestYourCREAMCE

  50. Thank you for your kind attention! Any questions?
