Modernizing Central Computing at Thomas Jefferson National Accelerator Facility

Jefferson Lab Site Report
Kelvin Edwards
Thomas Jefferson National Accelerator Facility
Newport News, Virginia USA
Kelvin.Edwards@jlab.org
757-269-7770
http://cc.jlab.org
HEPiX - TRIUMF, Oct. 20, 2003

Central Computing
• Sun systems
  • Upgrade to Solaris 8 almost complete
• HP systems
  • All upgraded to HP 11i
  • Moving away from HP for central services
• Linux systems
  • Still at RedHat 7.2
  • Evaluating RedHat 10 (Fedora 1)
• Windows 2000 Domain Upgrade
  • Implemented in May
  • Working on Group Policy issues

Central Computing (cont)
• Network Appliance
  • 2 recently upgraded to the FAS940 (~16k NFS ops/sec)
  • ~4.5TB online disk space (1.5TB home, 2TB group)
• Linux fileserver
  • 3Ware SATA system
  • 2TB scratch area (16 × 160GB Seagate SATA drives)
• Backups
  • QuickRestore
  • Seagate LTOs, Overland Tape Library

Scientific Computing
• JASMine & Auger (http://cc.jlab.org/scicomp)
  • JASMine: Mass Storage Tape + Disk Cache
  • Auger: Batch Farm Management & Monitoring
• Typical Day
  • 2 – 4 TB of INPUT data through the farm
  • Process 2000 – 5000 jobs
• Certificates used for all user authentication
• Tape drives
  • 6 9840s: migrating data to 9940Bs
  • 13 9940As: read only
  • 15 9940Bs: all data written to these tapes
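To put the "Typical Day" figures in perspective, the daily input volume can be converted into an average sustained rate. This is only a rough sketch: it assumes decimal terabytes and a load spread evenly over 24 hours, neither of which is stated on the slide, and the function name is made up for illustration.

```python
# Rough conversion of the slide's "2 - 4 TB of INPUT data" per day
# into an average sustained rate.
# Assumptions (not from the slide): decimal units (1 TB = 10**12 bytes)
# and a load spread evenly across the whole day.

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12
BYTES_PER_MB = 10**6


def average_rate_mb_per_s(tb_per_day: float) -> float:
    """Average rate in MB/s needed to move tb_per_day terabytes in one day."""
    return tb_per_day * BYTES_PER_TB / BYTES_PER_MB / SECONDS_PER_DAY


if __name__ == "__main__":
    for tb in (2, 4):
        print(f"{tb} TB/day ~ {average_rate_mb_per_s(tb):.1f} MB/s sustained")
```

Under those assumptions the farm input averages roughly 23 to 46 MB/s; real peak rates would be higher, since load is unlikely to be uniform over a day.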

Scientific Computing (cont)
• Linux File Servers
  • 16 Data Movers
    • 10 Mylex eXtremeRAID 2000 RAID cards (RAID-5) (SCSI)
    • 6 Adaptec 2200S RAID cards (RAID-50) (U320 SCSI)
  • 32 Cache/Work File Servers
    • Mixture of Mylex and 3Ware cards
• Batch Farming: over 24000 SPECint95, LSF
  • 178 RH 7.2 Linux dual-processors (P2 750 to P4 2.66GHz)

Noteworthy
• Kswapd failures: solved
  • Automount timeouts set to 60 seconds, NOT minutes
• Adaptec 2200S RAID cards
  • Instead of the MegaRAID cards
  • Not quite as fast, but acceptable
  • Timeout problem: fix available
• Adaptec TOE (TCP Offload Engine)
  • Problems with RH 7.2, custom kernel (XFS), and their driver
  • Anyone else using them? Good results?

Projects
• Windows
  • Standard builds (Server, IIS, desktop, laptop)
• Backup Software Upgrade
  • Reliaty (was QuickRestore)
• SSH v2 Internally
• Networks
  • Gigabit connection to our border router
  • VLANs for use on site

Projects (cont)
• JASMine
  • Rewrite disk cache
  • Support farm output caches
  • Policy-based file movement off-site
• Auger
  • Better file scheduling/pinning

Projects (cont)
• PPDG
  • SRM version 2
  • Replication
  • Replica Catalog web service interface
  • Remote job submission
  • User and System JDLs
  • Batch web service integration with Auger
