
Computing & Networking
User Group Meeting

Roy Whitney, Andy Kowalski, Sandy Philpott, Chip Watson
17 June 2008
Users and JLab IT
• Ed Brash is the User Group Board of Directors’ representative on the IT Steering Committee.
• Physics Computing Committee (Sandy Philpott)
• Helpdesk and CCPR requests and activities
• Challenges:
  • Constrained budget
  • Staffing
  • Aging infrastructure
  • Cyber Security
Computing and Networking Infrastructure
Andy Kowalski
CNI Outline
• Helpdesk
• Computing
• Wide Area Network
• Cyber Security
• Networking and Asset Management
Helpdesk
• Hours: 8am–12pm, M–F
• Submit a CCPR via http://cc.jlab.org/
• Dial x7155
• Send email to helpdesk@jlab.org
• Supported desktops: Windows XP, Vista, and RHEL5
  • Migrating older desktops
• Mac support?
Computing
• Email Servers Upgraded
  • Dovecot IMAP Server (indexing)
  • New File Server and IMAP Servers (farm nodes)
• Servers Migrating to Virtual Machines
• Printing
  • Centralized access via jlabprt.jlab.org
  • Accounting coming soon
• Video Conferencing (working on EVO)
Wide Area Network
• Bandwidth
  • 10 Gbps WAN and LAN backbone
• Offsite Data Transfer Servers
  • scigw.jlab.org (bbftp)
  • qcdgw.jlab.org (bbcp)
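As a rough illustration of pushing data offsite through one of these gateways, here is a minimal Python wrapper around bbcp; the username, paths, stream count, and window size are illustrative assumptions, not site-recommended settings.

```python
# Hypothetical helper for staging a file offsite via the bbcp gateway.
# Host, flags, and paths are illustrative only.
import subprocess

def bbcp_put(local_path: str, user: str, remote_path: str,
             gateway: str = "qcdgw.jlab.org", streams: int = 4) -> None:
    """Copy a local file to a remote destination through the bbcp gateway."""
    cmd = [
        "bbcp",
        "-s", str(streams),   # parallel TCP streams to fill the 10 Gbps pipe
        "-w", "2M",           # TCP window size per stream (assumed tuning)
        local_path,
        f"{user}@{gateway}:{remote_path}",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Hypothetical user and file names for demonstration.
    bbcp_put("run1234.evio", "jdoe", "/scratch/jdoe/run1234.evio")
```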
Cyber Security Challenge
• The threat: the sophistication and volume of attacks continue to increase.
• Phishing attacks
  • Spear phishing/whaling are now being observed at JLab.
• Federal (including DOE) requirements to meet the cyber security challenges require additional measures.
• JLab uses a risk-based approach that balances achieving the mission with dealing with the threat.
Cyber Security
• Managed Desktops
• Skype allowed from managed desktops on certain enclaves
• Network Scanning
• Intrusion Detection
• PII/SUI (CUI) Management
Networking and IT Asset Management
• Network Segmentation/Enclaves
• Firewalls
• Computer Registration
  • https://reggie.jlab.org/user/index.php
• Managing IP Addresses
  • DHCP
    • Assigns all IP addresses (most static)
    • Integrated with registration
• Automatic Port Configuration
  • Rolling out now
  • Uses registration database
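In spirit, the registration-driven model above could look like the following sketch, which renders static host entries for an ISC-style DHCP server from a registration table; the database, table, and column names are hypothetical, not the actual reggie backend schema.

```python
# Hypothetical sketch: generate ISC-dhcpd static host blocks from a
# registration database. Schema and paths are invented for illustration.
import sqlite3

def dhcp_host_entries(db_path: str) -> str:
    """Render dhcpd 'host' blocks from registered (hostname, MAC, IP) rows."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT hostname, mac, ip FROM registered_hosts ORDER BY hostname"
    ).fetchall()
    conn.close()
    blocks = [
        f"host {hostname} {{\n"
        f"    hardware ethernet {mac};\n"   # MAC from registration
        f"    fixed-address {ip};\n"        # static IP, as most are at JLab
        f"}}"
        for hostname, mac, ip in rows
    ]
    return "\n".join(blocks)
```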
Scientific Computing
Chip Watson & Sandy Philpott
Farm Evolution Motivation
• Capacity upgrades
• Re-use of HPC clusters
• Movement to Open Source
  • O/S upgrade
  • Change from LSF to PBS
Farm Evolution Timetable
• Nov 07: Auger/PBS available – RHEL3, 35 nodes
• Jan 08: Fedora 8 (F8) available – 50 nodes
• May 08: Friendly-user mode; IFARML4, 5
• Jun 08: Production – F8 only; IFARML3 + 60 nodes from LSF; IFARML alias
• Jul 08: IFARML2 + 60 nodes from LSF
• Aug 08: IFARML1 + 60 nodes from LSF
• Sep 08: RHEL3/LSF → F8/PBS migration complete
  • No renewal of LSF or RHEL for cluster nodes
Farm F8/PBS Differences
• Code must be recompiled
  • 2.6 kernel
  • gcc 4
• Software installed locally via yum
  • cernlib
  • MySQL
• Time limits: 1 day default, 3 days max
• stdout/stderr written to ~/farm_out
• Email notification
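For users adapting their scripts, a bare-bones PBS submission under the new setup might look like the sketch below; the job name, program, and resource syntax are illustrative assumptions (the Auger front end may wrap this differently).

```python
# Minimal sketch of a PBS job submission via qsub, assuming a standard
# PBS setup. Job name and program are hypothetical.
import subprocess

# Request the 3-day maximum walltime; the default is 1 day.
JOB_SCRIPT = """#!/bin/sh
#PBS -N myanalysis
#PBS -l walltime=72:00:00
cd $PBS_O_WORKDIR
# Binary must be rebuilt for the 2.6 kernel / gcc 4 environment on F8.
./myanalysis input.dat
"""

# Per the slide, stdout/stderr land in ~/farm_out, not the submit directory.
result = subprocess.run(["qsub"], input=JOB_SCRIPT, text=True,
                        capture_output=True, check=True)
print("submitted:", result.stdout.strip())
```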
Farm Future Plans
• Additional nodes
  • From HPC clusters
    • CY08: ~120 4g nodes
    • CY09–10: ~60 6n nodes
  • Purchase as budgets allow
• Support for 64-bit systems when feasible & needed
Storage Evolution
• Deployment of Sun x4500 “thumpers”
• Decommissioning of Panasas (old /work server)
• Planned replacement of old cache nodes
Tape Library
• Current STK “Powderhorn” silo is nearing end-of-life
  • Reaching capacity & running out of blank tapes
  • Doesn’t support upgrade to higher-density cartridges
  • Is officially end-of-life December 2010
• Market trends
  • LTO (Linear Tape Open) standard has proliferated since 2000
  • LTO-4 is 4x the density, capacity/$, and bandwidth of 9940B: 800 GB/tape, $100/TB, 120 MB/s
  • LTO-5, out next year, will double capacity and give 1.5x bandwidth: 1600 GB/tape, 180 MB/s
  • LTO-6 will be out prior to the 12 GeV era: 3200 GB/tape, 270 MB/s
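To make the generational trend concrete, the short calculation below derives tapes-per-petabyte and time-to-fill-one-tape from the figures above; the 9940B numbers follow from the slide's 4x comparison (800 GB / 4, 120 MB/s / 4).

```python
# Back-of-the-envelope comparison of the tape generations cited above:
# how many tapes hold a petabyte, and how long one tape takes to stream.
generations = {           # (capacity in GB, bandwidth in MB/s)
    "9940B": (200, 30),   # implied by the slide's 4x LTO-4 comparison
    "LTO-4": (800, 120),
    "LTO-5": (1600, 180),
    "LTO-6": (3200, 270),
}
for name, (cap_gb, bw_mbs) in generations.items():
    tapes_per_pb = 1_000_000 / cap_gb            # 1 PB = 1,000,000 GB
    hours_to_fill = cap_gb * 1000 / bw_mbs / 3600
    print(f"{name}: {tapes_per_pb:.0f} tapes/PB, {hours_to_fill:.1f} h/tape")
```

The run shows the trade-off driving the replacement: LTO-4 needs 1250 tapes/PB versus 5000 for 9940B, while fill time per tape grows only modestly (about 1.9 h to 3.3 h) across generations.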
Tape Library Replacement
• Competitive procurement now in progress
• Replace old system; support 10x growth over 5 years
• Phase 1 in August
  • System integration, software evolution
  • Begin data transfers, re-use 9940B tapes
• Tape swap through January
• 2 PB capacity by November
• DAQ to LTO-4 in January 2009
• Old silo gone in March 2009
End result: break even on cost by the end of 2009!
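A quick sanity check on that growth target, starting from the 2 PB November figure; the smooth exponential trajectory is an assumption for illustration, since real capacity would grow in purchase-sized steps.

```python
# 10x growth over 5 years implies roughly a 1.58x increase per year.
start_pb = 2.0
factor = 10 ** (1 / 5)   # ~1.585 annual growth factor
for year in range(6):
    print(f"year {year}: {start_pb * factor ** year:.1f} PB")
# year 0: 2.0 PB ... year 5: 20.0 PB
```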
Long Term Planning
• Continue to increase compute & storage capacity in the most cost-effective manner
• Improve processes & planning
  • PAC submission process
  • 12 GeV planning…
LQCD Computing
• JLab operates 3 clusters with nearly 1100 nodes, primarily for LQCD plus some accelerator modeling
• National LQCD Computing Project (2006–2009: BNL, FNAL, JLab; USQCD Collaboration)
• LQCD II proposal (2010–2014) would double the hardware budget to enable key calculations
• JLab Experimental Physics & LQCD computing share staff (operations & software development) and the tape silo, providing efficiencies for both