
Phoenix Training session


aoife


Presentation Transcript


  1. Phoenix Training session: Introduction for CHIPP sysadmins

  2. Contents
     • Scope of the course
     • Documentation
     • Monitoring
     • Remote access
     • Handling services
     • Shared filesystems
     • Visit the Machine Room

  3. Scope of the course
     • Give you enough information to bring the cluster back to production in an emergency.
     • Emergency = the sysadmins are offsite, or they ask you for help.
     • Not intended to give you a full understanding of everything, nor to enable you to install or configure new services.
     • Covers: ARC, CREAM, Torque, Moab, dCache, BDII, WNs, NFS, Lustre, GPFS
     • Does not cover: Monitoring, APEL, UI, Cfengine, VOBoxes, Argus (too soon)
     • Interactive session, please ask questions! Please give us your feedback!

  4. Documentation
     • Everything should be on the twiki: https://wiki.chipp.ch/twiki/bin/view/LCGTier2
     • It will be our course documentation.
     • But documentation is never enough, and it only matches reality in a dream world.
     • The Users section is not really relevant for us.
     • The Logs section is the place to look for things that happened: meetings, issues and so on.
     • The fun part is in the Technical section.

  5. Monitoring
     • Overview in the twiki: Technical -> Monitoring
     • PhoenixMonOverview is our main source of information. Ganglia complements it.
     • Both lose detail as the data gets older.
     • Our Nagios instances are not usable right now.
     • You can click on some graphs to get extended information.
     • The first section links to the VO tests.

  6. Remote Access
     • You should have your root private key for the cluster on username@pub.lcg.cscs.ch and your grid certificate on ui64.
     • If the agent does not work, try killall ssh-agent and log in again. The agent is forwarded by default.
     • Use ssh tunnels to access private interfaces, like dCacheGui (-L 22223:storage02:22223).
     • For a hardware reboot, use ireset from xen12. Be careful.
     • Network traffic is un-firewalled within the cluster.
     • You can use dsh for mass operations. /etc/dsh/groups lists the available groups:
       xen12# dsh -g WN "service pbs_mom restart"
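
The tunnel and dsh commands on this slide can be previewed before anything is run. The following is a minimal sketch, not the site's own tooling: it assumes the tunnel is opened through username@pub.lcg.cscs.ch (the slide gives the host and the -L argument, but not the full command line), and the helper names tunnel_cmd and dsh_cmd are hypothetical. Both helpers only print a command.

```shell
#!/bin/sh
# Hypothetical helpers: they only PRINT commands, so nothing runs by accident.

# Print the ssh command that forwards a local port to the same port on a
# private cluster host. Assumption: the gateway is the public login host.
tunnel_cmd() {
    port=$1; target=$2; gateway=$3
    printf 'ssh -N -L %s:%s:%s %s\n' "$port" "$target" "$port" "$gateway"
}

# Print the dsh command that runs one command on every host of a group.
dsh_cmd() {
    group=$1; shift
    printf 'dsh -g %s "%s"\n' "$group" "$*"
}

tunnel_cmd 22223 storage02 username@pub.lcg.cscs.ch
dsh_cmd WN service pbs_mom restart
```

Copy a printed line into the shell once it looks right. With the tunnel up, the dCache GUI port would be reachable on local port 22223.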

  7. Handling services
     • Every service is different. Use your logic!
     • One script to rule all grid services: grid-service.
     • Two tools to check general status:
       • chk_CREAM-CEs submits a job to the cscs queue on cream01/02 and polls for the result.
       • chk_SE-lcgtools copies a file into and out of dCache and registers it, using BDII information.
     • Use them from ui64 with your certificate.
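
Running both status checks in one go could look like the sketch below. It rests on two assumptions the slide does not confirm: that the chk_ tools can be called without arguments, and that they exit non-zero on failure. The function name run_checks is hypothetical.

```shell
#!/bin/sh
# Run each command passed as an argument and report OK/FAIL by exit status.
# Assumption: the check tools exit non-zero when a check fails.
run_checks() {
    failed=0
    for check in "$@"; do
        # $check is left unquoted on purpose so "tool arg" also works.
        if $check >/dev/null 2>&1; then
            echo "OK   $check"
        else
            echo "FAIL $check"
            failed=$((failed + 1))
        fi
    done
    # The return value is the number of failed checks.
    return "$failed"
}

# Intended use on ui64, with a valid grid proxy (assumed invocation):
# run_checks chk_CREAM-CEs chk_SE-lcgtools
```

If a check fails, run that tool by hand to see its full output, since the wrapper discards it.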

  8. Shared filesystems
     • nfs01/02 host 9 DRBD LVM volumes with heartbeat:
       • /experiment_software for each VO
       • /shared for gridmapdir, vo_tags, and the Torque/Moab High Availability locks and logs
     • mds1/2 and oss{1-4}{1-2} host /lustre/scratch:
       • /home symlink on all WNs and CEs (/home/egee)
       • /tmpdir_pbs symlink on some WNs
     • gpfs01-03 host /gpfs:
       • /tmpdir_pbs symlink on some other WNs
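
A quick way to see which of these paths are visible on a given node is sketched below. The paths come from the slide; which node is expected to mount which of them is an assumption, and check_paths is a hypothetical helper. It only tests that each directory exists, not that the filesystem behind it is healthy, and a hung NFS or Lustre mount may make the test itself block.

```shell
#!/bin/sh
# Report whether each expected shared-filesystem path exists on this node.
check_paths() {
    missing=0
    for path in "$@"; do
        if [ -d "$path" ]; then
            echo "present $path"
        else
            echo "MISSING $path"
            missing=$((missing + 1))
        fi
    done
    # The return value is the number of missing paths.
    return "$missing"
}

# Paths taken from the slide; adjust the list to the node being checked.
check_paths /experiment_software /shared /lustre/scratch /gpfs \
    || echo "some shared filesystems are not visible on this node"
```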
