
RAL Tier A Status

Tim Adye, Rutherford Appleton Laboratory. BaBar UK Collaboration Meeting, Liverpool, 11th April 2003.

Presentation Transcript


  1. RAL Tier A Status. Tim Adye, Rutherford Appleton Laboratory. BaBar UK Collaboration Meeting, Liverpool, 11th April 2003

  2. BaBar Batch CPU Use at RAL

  3. BaBar Batch Users at RAL (running at least one non-trivial job each week)

  4. Kanga Disk Saga
     • In December we had filled up all ~20 TB at RAL
     • Freed up some space by deleting (most) old Series-8 data and started importing the backlog
     • A minor upgrade of our old data server, csfsun02, on 19 Feb prompted a major loss of data
     • Recovered
         • 1.3 TB scavenged from csfsun02 disks
         • 1.4 TB re-imported from SLAC disk
         • 0.3 TB restored from SLAC HPSS
     • Halfway through the recovery, discovered that csfsun02 was still bad
     • All data migrated to borrowed servers
     • All Kanga data restored and up to date with SLAC production on 28 March

  5. Security Incident
     • SucKIT Linux root exploit has been spreading throughout the HEP community
         • An infected machine records all passwords typed on that machine
         • Includes passwords used to connect to other machines
         • ssh included; fortunately not klog
     • It’s not unlikely that CSF passwords have been compromised via another system
     • To protect CSF from further attack, all passwords that have been used recently were reset on Tuesday
         • Users contacted by phone and post
         • I can give you your new password today

  6. Linux Upgrade
     • Nearly all machines at RAL now run RedHat 7.2
     • Exceptions are
         • babar-old.gridpp.rl.ac.uk front-end (AKA csfc)
             • Will be switched off next week
         • babarbuild batch queue
     • RH72 batch workers can run RH6 jobs, but RH72 machines can’t build code in release analysis-13 and before, so either
         • Upgrade to analysis-13b or later, or
         • Use the babarbuild queue to compile and link; run in the normal queues

  7. CSF Batch System
     • Much work behind the scenes
         • Reliability and optimising queuing algorithms
     • Use bbrbsub to submit, e.g.
           bbrbsub -l cput=01:00:00 BetaApp myAnalysis.tcl
     • bbrbsub is a wrapper for qsub, so you can use qsub options (see “man qsub”)
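The slide says only that bbrbsub wraps qsub and passes qsub options through. As a rough illustration of that idea, and not the actual bbrbsub implementation, the sketch below builds a qsub command line plus a small job script from the example above. The helper name `build_submission` and the generated script layout are assumptions; `PBS_O_WORKDIR` is the standard PBS variable holding the submission directory.

```python
import shlex


def build_submission(qsub_options, command):
    """Return (qsub argv, job script text) for a wrapped submission.

    Hypothetical helper: the real bbrbsub is not described in the talk
    beyond "a wrapper for qsub", so this only shows the general shape.
    """
    # Pass the qsub options through unchanged, as the slide describes.
    argv = ["qsub", *qsub_options]
    # qsub runs a script rather than a bare command, so wrap the command
    # in a minimal script that starts in the submission directory.
    script = "\n".join([
        "#!/bin/sh",
        'cd "$PBS_O_WORKDIR"',
        shlex.join(command),
        "",
    ])
    return argv, script


argv, script = build_submission(
    ["-l", "cput=01:00:00"],          # qsub option from the slide's example
    ["BetaApp", "myAnalysis.tcl"],    # command from the slide's example
)
print(shlex.join(argv))
print(script, end="")
```

A wrapper along these lines could pipe `script` to `qsub` on standard input; nothing is submitted here, so the sketch is safe to run anywhere.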

  8. Recently Planned Improvements – 1 (Since November)
     • Install dedicated import-export machines
         • Fast (Gigabit) network connection
         • Special firewall rules to allow scp, bbftp, bbcp, etc.
         • Two new RH72 Linux machines
             • csfmove01.rl.ac.uk for exports
     • AFS authentication improvements
         • PBS token passing and renewal
         • Integrated login (AFS token on login, like SLAC)
         • Not yet implemented

  9. Recently Planned Improvements – 2 (Since November)
     • Objectivity support
         • Works now for private federations, but no data import
         • First step will be to provide Objy conditions database access
         • Objy conditions snapshot installed by Tim Barrass…
         • Then we lost our Objy server, csfsun02
     • Upgrade Suns to Solaris 8 and integrate into PBS
         • 4 x 4-CPU Solaris 8 systems now available in babarsol queue, e.g.
               bbrbsub -q babarsol job.sh

  10. Recently Planned Improvements – 3 (Since November)
     • Support Grid “generic accounts”, so special RAL user registration is no longer necessary
     • Users without an entry in the grid-mapfile will be assigned to one of the pool accounts babar001, babar002, … babar050
     • The pool account will be permanently bound to that certificate DN, so you will always run under the same babar0NN
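The pool-account behaviour described on this slide can be sketched as a small mapping: a DN listed in the grid-mapfile keeps its own account, and any other DN is given the next free pool account and keeps it from then on. This is an in-memory illustration only; the class and function names and the example DNs are hypothetical, and a real service would have to persist the bindings so that they survive restarts.

```python
def make_pool(prefix="babar", size=50):
    """Pool account names babar001 .. babar050, as listed on the slide."""
    return [f"{prefix}{n:03d}" for n in range(1, size + 1)]


class PoolMapper:
    """Hypothetical sketch of sticky DN-to-pool-account assignment."""

    def __init__(self, grid_mapfile, pool):
        self.static = dict(grid_mapfile)  # DNs with an explicit entry
        self.free = list(pool)            # pool accounts not yet bound
        self.leases = {}                  # DN -> pool account, never released

    def account_for(self, dn):
        # An explicit grid-mapfile entry always wins.
        if dn in self.static:
            return self.static[dn]
        # First contact: bind the next free pool account to this DN.
        if dn not in self.leases:
            if not self.free:
                raise RuntimeError("no free pool accounts left")
            self.leases[dn] = self.free.pop(0)
        # Later contacts: the same account every time.
        return self.leases[dn]


# Example DNs are made up for illustration.
mapper = PoolMapper({"/C=UK/CN=registered user": "reguser"}, make_pool())
first = mapper.account_for("/C=UK/CN=new user one")
second = mapper.account_for("/C=UK/CN=new user two")
again = mapper.account_for("/C=UK/CN=new user one")
print(first, second, again)
```

Because bindings are never released in this sketch, the pool can serve at most 50 distinct unregistered DNs, which matches the range of account names given on the slide.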

  11. Support
     • For help, post to the “RAL Tier A” HyperNews forum, or
     • contact Emmanuel Olaiya (at SLAC) or me (at RAL)
