
HEPIX DESY Spring 2007 hepix2007.desy.de/

Summary of the HEPIX DESY Spring 2007 conference, featuring updates on machines, operating systems, storage solutions, networking, monitoring, and other infrastructure developments.


Presentation Transcript


  1. HEPIX DESY Spring 2007 (http://hepix2007.desy.de/) - a summary by Pete Gronbech

  2. LAL • More machines: +25 Dell PE 1950 • Dual-socket, dual-core Woodcrest 2.33 GHz • Memory = 2 GB/core • Storage: 3 Sun X4500 (Thumper) - GOOD • 24 TB (48 disks) in 4U • 4 Opteron cores at 2.2 GHz, 16 GB of memory • 4 x 1GbE + 1 x 10GbE • DPM disk servers and Lustre OSS • Installed under Linux; may look at Solaris in the future if Lustre becomes available • Network: 10 Gb/s crossbar switch for the network core • Extreme Networks BD8810 • Replacement of the Cabletron SSR8000 • 12x10GbE + 48x1GbE

  3. OS and Tool Updates • All grid WNs running SL4.4 64-bit • Very successful • Ability to restrict a VO to SL3 nodes in case of problems • 32-bit and SL3 compatibility installed • Unmodified grid middleware (gLite 3) • Deployment of SL 3.0.8 32-bit in progress (LAL systems) • Upgrade mostly by reinstallation • May look at SL 4.4 when supported by the experiments • All machines updated/reinstalled with Quattor • Grid and non-grid • One presentation during HEPiX on GRIF management • Indico now in production • CDS Agenda has been migrated (including URL rewriting)

  4. LAPP • French IN2P3 Tier-3 • GPFS • Woodcrest blades from HP • VMware ESX

  5. LAPP OS and Tools • Monitoring done with Cacti, Ganglia and Nagios • Covering all the hosts/servers and hardware of the laboratory • Ganglia is also used for the computing cluster • Developing their own accounting tools for parsing Torque/Maui logs (see the sketch below)
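
The slide above mentions home-grown accounting tools that parse Torque/Maui logs. Below is a minimal sketch of how such a parser might look, assuming the usual Torque accounting record layout (date;record-type;jobid;key=value attributes); the log path and the report format are illustrative, not LAPP's actual tool.

```python
#!/usr/bin/env python3
"""Minimal sketch of a Torque accounting-log parser for per-user usage reports.

Assumes the usual Torque record layout: date;record_type;job_id;key=value ...
The log path passed on the command line and the attribute names are site-specific.
"""
import sys
from collections import defaultdict

def hms_to_seconds(hms):
    """Convert an HH:MM:SS string to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

def parse_accounting(path):
    """Sum CPU and wall time per user from 'E' (job end) records."""
    usage = defaultdict(lambda: {"jobs": 0, "cput": 0, "walltime": 0})
    with open(path) as f:
        for line in f:
            parts = line.strip().split(";", 3)
            if len(parts) < 4 or parts[1] != "E":       # only completed jobs
                continue
            attrs = dict(kv.split("=", 1) for kv in parts[3].split() if "=" in kv)
            u = usage[attrs.get("user", "unknown")]
            u["jobs"] += 1
            u["cput"] += hms_to_seconds(attrs.get("resources_used.cput", "0:0:0"))
            u["walltime"] += hms_to_seconds(attrs.get("resources_used.walltime", "0:0:0"))
    return usage

if __name__ == "__main__":
    # e.g. python3 accounting.py /var/spool/torque/server_priv/accounting/20070415
    for user, u in sorted(parse_accounting(sys.argv[1]).items()):
        print(f"{user:<12} jobs={u['jobs']:4d} cput={u['cput']:8d}s wall={u['walltime']:8d}s")
```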

  6. PSI(.ch) • Uses an ssh gateway • Uses NX for X sessions • MS ISA server • External webmail login • OMA, MS ActiveSync, RPC over HTTPS for Outlook clients • The problem with HA is that the software used to provide HA can cause more problems than running without it.

  7. PSI • VMware Player • VMware Server 1.0.2 • VM images can live on AFS with OpenAFS 1.4.4 and a large AFS cache • SL5 Live CD • Uses Lustre (on Cray) and GPFS (Linux clusters)

  8. CASPUR • Mix of systems: IBM P5 / dual-Opteron clusters (InfiniBand / QsNet) / NEC SX-6 / HP EV7 • GPFS / AFS / NFS • We studied three possible candidate solutions: MSVS, VMware ESX and Xen. The first did not support some essential hardware, so we decided not to consider it. We could not discard either VMware or Xen, as both had good features. Xen performs better, supports any hardware we might need and allows savings on the cost of host machines, but it is quite "spartan". A VMware-based installation is definitely more expensive, but it provides excellent management, backup and monitoring tools. We therefore decided to use a balanced combination of Xen and VMware.

  9. CASPUR 2 • Heimdal escape tool: the main CASPUR Kerberos 5 service is based on Heimdal. We however maintain several other K5 realms, and they are all MIT based. On several occasions we have noted that the MIT implementation has fewer integration problems with the Windows world; we also know that the MIT version has more developers behind it, so potential problems are addressed faster. We would therefore like to migrate from Heimdal to MIT. Heimdal provides a conversion tool for the MIT database (i.e., you may easily migrate from MIT to Heimdal), but you cannot do it in the opposite direction. We will be investigating this and hope to be able to produce an appropriate solution.

  10. BNL (Robert Petkus) • Expanding computer room • Large tape facility (StorageTek and HPSS) • AFS/NFS/Panasas (Panasas and Solaris NFS due to retire) • Evaluating BlueArc Titan storage system • >4700 CPUs • Ganglia, Nagios and Cacti • Temperature monitoring creates an RT ticket (see the sketch below) • Asset Tracker RT plug-in • Condor 6.8.4 • Looking at Thumper and SATABeast • dCache read pools • PHENIX: 200 TB on 365 servers/450 pools • ATLAS: >430 TB on 460 servers/pools • dCache write pool nodes • Poor performance/throughput; currently evaluating disk/server systems to satisfy a demand of >600 MB/s • The BNL Tier-1 is the largest ATLAS T1
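
As a hedged illustration of the "temperature monitoring creates an RT ticket" item: RT can open tickets from mail sent to a queue address, so a monitoring script might look roughly like the sketch below. The queue address, SMTP host, threshold and sensor-readout stub are all hypothetical placeholders, not BNL's actual setup.

```python
#!/usr/bin/env python3
"""Hedged sketch: open an RT ticket by mail when a machine-room sensor runs hot.

The RT queue mailbox, SMTP host, threshold and read_sensor() stub are hypothetical.
"""
import smtplib
from email.message import EmailMessage

RT_QUEUE_ADDR = "facilities-rt@example.org"   # hypothetical RT queue mailbox
SMTP_HOST = "localhost"
THRESHOLD_C = 30.0

def read_sensor(sensor_id):
    """Stub: return the temperature in Celsius for one sensor.

    Placeholder value only; in practice this would be wired to IPMI/SNMP readout.
    """
    return 25.0

def open_ticket(sensor_id, temp):
    """Mail the RT queue so a ticket is opened for the hot sensor."""
    msg = EmailMessage()
    msg["From"] = "temp-monitor@example.org"
    msg["To"] = RT_QUEUE_ADDR
    msg["Subject"] = f"Over-temperature: {sensor_id} at {temp:.1f} C"
    msg.set_content(f"Sensor {sensor_id} reported {temp:.1f} C (threshold {THRESHOLD_C} C).")
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    for sensor in ("rack-a1", "rack-a2"):     # hypothetical sensor names
        temp = read_sensor(sensor)
        if temp > THRESHOLD_C:
            open_ticket(sensor, temp)
```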

  11. GSI • Electrical testing classified anything with a power lead as a mobile device, so two days of downtime were needed to test over 1000 servers… • Used IPMI modules • Using the new Supermicro 2-servers-in-1U nodes: 16 cores, 32 GB RAM, 8k euros list price • Network problems caused by incompatible spanning-tree algorithms between Avaya and Foundry switches

  12. GridKa • New CPU procurement specified a total number of SPECints rather than a number of boxes; Intel or AMD both allowed • Price, power consumption, space, network ports etc. taken into account • Bidders have to provide the benchmarks • One benchmark run per core • Result: 168 dual-socket, dual-core Intel 5160 boxes with 6 GB (1.5 GB/core), running SL4.4 i386

  13. GridKa • The CPU performance benchmark from the vendor was 5% better than measured at the site: slightly different hardware and different memory (a Kingston RAM problem). The vendor agreed to supply 5% more boxes to compensate. • New storage: NEC S2900 with 2380 SATA disks in RAID 6; the older system was an IBM FAStT700 with RAID 5

  14. PDSF • HPSS • New computer room: 7 MW • Petascale Data Storage Institute • An interesting graph on disk failures showed that one particular drive type (160 GB) had a significant increase in failures after 3 years compared to various other sizes • Thermal modelling of the computer room seemed to suggest blades are a good option • Power losses can be avoided by not converting from 480 V 3-phase down to 208 V or 110 V; the BladeRack II can take 480 V 3-phase directly (>83% efficient, cf. 60-80%)

  15. Construction Cost Increases • Labor escalation: 1.5x over a 2-year period • Total Material Cost up 23% since 2004 • Structural Steel Shapes – up 25% • Rebar – up 45% • Lumber Spot Prices – up 95% • Gypboard – up 20% • Copper pipe – up 20% • Crude Oil – up 97% • Electrical Copper • Code Escalation

  16. The Plaza - Usable "Green" Roof Space

  17. DPM / dCache Benchmarks • Greig Cowan / Graeme Stewart • rfio test to LAN storage • Write a 1 GB file, then read it back in • A number of clients wait until a sync time and then start (see the client sketch below) • No batch system • Repeat with increasing client count • Already identified DPM issues, which have been addressed • Developer feedback appreciated!
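
A rough sketch of such a synchronized benchmark client is shown below: each client sleeps until an agreed start time, writes a 1 GB file to the storage element and reads it back, timing both phases. The copy command (rfcp, the rfio copy client) and the argument conventions are assumptions to adapt per site; this is not the authors' actual test harness.

```python
#!/usr/bin/env python3
"""Sketch of a synchronized write/read benchmark client.

All clients sleep until an agreed start time, write a 1 GB file to the SE and
read it back, timing each phase.  The copy command and target path are assumptions.
"""
import os
import subprocess
import sys
import time

COPY_CMD = "rfcp"                       # assumed rfio copy client
SIZE_BYTES = 1024 ** 3                  # 1 GiB test file

def wait_until(epoch_seconds):
    """Sleep until the agreed synchronisation time."""
    delay = epoch_seconds - time.time()
    if delay > 0:
        time.sleep(delay)

def make_local_file(path, size=SIZE_BYTES):
    """Create a local file of the required logical size (sparse is enough here)."""
    with open(path, "wb") as f:
        f.truncate(size)

def timed(cmd):
    """Run a command and return its wall-clock duration in seconds."""
    t0 = time.time()
    subprocess.run(cmd, check=True)
    return time.time() - t0

if __name__ == "__main__":
    # usage: client.py <sync_epoch> <remote_dir>   (hypothetical arguments)
    sync_time, remote_dir = float(sys.argv[1]), sys.argv[2]
    local = f"/tmp/bench_{os.getpid()}.dat"
    remote = f"{remote_dir}/bench_{os.getpid()}.dat"
    make_local_file(local)
    wait_until(sync_time)
    t_write = timed([COPY_CMD, local, remote])
    t_read = timed([COPY_CMD, remote, local + ".back"])
    print(f"write {SIZE_BYTES / t_write / 1e6:.1f} MB/s  read {SIZE_BYTES / t_read / 1e6:.1f} MB/s")
```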

  18. DPM benchmark plot • Sample from last week (version 1.6.3)

  19. TRIUMF • Infrastructure, infrastructure, infrastructure • It helps when things fail

  20. Update since HEPiX Fall 2006 - ATLAS Tier-1 cooling (floor plan shows a few Liebert XDH modules; cold-aisle floor ~82 m2) • No raised floor • Separation wall from the 2x larger power-supply area to the right • 30 kW capacity XDH modules • Rack heat output 5-17 kW • Heat exchangers on the roof • ~300 kW of heat removed with 100 kW of cooling power, i.e. a coefficient of performance of roughly 3 • Safety: leak sensing needed in the area • Big plus: flexible hoses connect to the coolant pipe (operating above the dew point) over the racks, with multiple pre-installed interconnects/valves allowing non-disruptive changes

  21. Update since HEPiX Fall 2006 - Main computing room: network equipment (expanding), ATLAS equipment (expanding), public servers (refreshing) • Very satisfied with Dell hardware • Replacing 100 Mbit switches with Nortel 5520 Gigabit and PoE switches • 10G uplinks planned for edge switches

  22. Update since HEPiX Fall 2006 - Dirvish backup system at TRIUMF (www.dirvish.com), supplementing Amanda backup: nightly disk-based snapshots, used successfully for the last 2 years • Free, open-source licence • Uses cheap disks • RAID 5, 8 x 300 GB disks, 3ware controller • Hard links to reference files which are not changing (see the sketch below) • 2 TB ext3 filesystem (max link count 32,000) • A directory for each main public server at TRIUMF • Perl scripts using rsync • Unlike tape, only the changed part of changed files is transferred • Daily increment on all servers (online disk snapshots) • ~Two-week backup cycle (uses about 1.5x the original filesystems' size)
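
The hard-link trick mentioned above is the heart of Dirvish: each nightly snapshot looks like a full copy, but unchanged files are hard links into the previous snapshot. The sketch below illustrates only the underlying rsync --link-dest technique; Dirvish itself is driven by its own vault/branch configuration, and the host, source path and vault location here are hypothetical.

```python
#!/usr/bin/env python3
"""Minimal illustration of the rsync hard-link snapshot technique Dirvish builds on.

Not Dirvish itself; just a sketch showing how --link-dest makes nightly snapshots
that share unchanged files.  Host, source path and vault location are hypothetical.
"""
import datetime
import os
import subprocess

VAULT = "/backup/vaults/webserver"           # hypothetical snapshot directory
SOURCE = "root@webserver.example.org:/etc/"  # hypothetical source tree

def nightly_snapshot():
    today = datetime.date.today().isoformat()
    dest = os.path.join(VAULT, today)
    latest = os.path.join(VAULT, "latest")
    cmd = ["rsync", "-a", "--delete"]
    if os.path.isdir(latest):
        # Unchanged files become hard links to yesterday's copy instead of new data.
        cmd.append("--link-dest=" + os.path.realpath(latest))
    cmd += [SOURCE, dest + "/"]
    subprocess.run(cmd, check=True)
    # Repoint the 'latest' symlink at the new snapshot.
    tmp = latest + ".tmp"
    os.symlink(dest, tmp)
    os.replace(tmp, latest)

if __name__ == "__main__":
    nightly_snapshot()
```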

  23. Update since HEPiX Fall 2006 - New Video Conference Room

  24. Update since HEPiX Fall 2006 - New video conference room: 3-screen panorama (4200h x 1050v) • 3 Dell 5100MP projectors (1400h x 1050v, 3300 lumens, 37 dBA) • Dell Precision Workstation 390 • Nvidia Quadro NVS 440 graphics • Sony HDR-HC3 camera (1440h x 1080v) with 4-megapixel stills and 10x optical zoom (wireless remote)

  25. Update since HEPiX Fall 2006 - New video conference room, user hardware (photo callouts: Polycom SoundPoint, table camera, central VGA, wireless mouse, wireless keyboard, right VGA, centre Dell projector remote, ceiling-camera remote)

  26. Update since HEPiX Fall 2006 - New video conference room (photo callouts: ceiling HDTV camera, middle-screen camera, audience camera, table camera, ~5"-50" projectable surface, ceiling-camera remote)

  27. INFN Future Infrastructure • High-density zones • Self-cooling racks (APC) • Blade servers (HP, 16 BL460 in 10U each) • Dual Xeon 5130 in each blade (CPU upgrade to quad-core possible) • Power lines: 4 MW • CPU/storage (disk and tape): 1.4 MW • Cooling: 1.4 MW • Redundancy for some services: ~1 MW • Plan is to have everything in place by Dec 2007 • GPFS • Local monitoring system "redeye" creates a web page every 5 minutes listing the errors (email would be too much); see the sketch below
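
As an illustration of the redeye idea (a periodically regenerated error page instead of a flood of mails), the sketch below scans some logs for error lines and writes a static HTML page; it would be run from cron every 5 minutes. This is only a hypothetical stand-in, not INFN's redeye: the log locations, error pattern and output path are assumptions.

```python
#!/usr/bin/env python3
"""Sketch of the "regenerate an error web page every few minutes" idea.

Run from cron every 5 minutes.  Log locations, error pattern and output path
are assumptions; this is not INFN's redeye tool.
"""
import glob
import html
import re
import time

LOG_GLOB = "/var/log/cluster/*.log"          # hypothetical log location
ERROR_RE = re.compile(r"\b(ERROR|CRITICAL|FAILED)\b")
OUT_HTML = "/var/www/html/errors.html"       # page served by the local web server

def collect_errors(max_lines=200):
    """Gather matching log lines, keeping only the most recent ones."""
    hits = []
    for path in sorted(glob.glob(LOG_GLOB)):
        with open(path, errors="replace") as f:
            hits += [f"{path}: {line.rstrip()}" for line in f if ERROR_RE.search(line)]
    return hits[-max_lines:]

def write_page(errors):
    """Write a small static HTML report listing the collected errors."""
    rows = "\n".join(f"<li><code>{html.escape(e)}</code></li>" for e in errors) or "<li>No errors</li>"
    with open(OUT_HTML, "w") as f:
        f.write(f"<html><body><h1>Cluster errors</h1>"
                f"<p>Generated {time.ctime()}</p><ul>{rows}</ul></body></html>")

if __name__ == "__main__":
    write_page(collect_errors())
```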

  28. IN2P3 • AFS moving to Solaris only, backup on AIX (TSM) • GPFS

  29. CERN • Looking at MS SharePoint • Disk problems: 300 disks needed a firmware upgrade • Windows print infrastructure • Insecure mail protocols blocked • Computer Management Framework on NICE desktops • Automatic screen lock

  30. CERN 2 • Computer security • 1417 incidents • Caused by using banned software, e.g. Skype • Compromised computers down to 162 (212 in '05, 385 in '04) • Zero-day exploits on the rise • Computer Resource Administration: automatic account blockage if there is no further affiliation with CERN (i.e. when you are no longer in the CERN HR database your account is blocked; this has hit some HEPiX members) • SSO making progress

  31. DESY • Local Indico setup on a failover Linux cluster • 100+ million mails transported in 2006, 700,000 on the peak day

  32. Tuesday • HA talks from DESY • Use Xen and Solaris • CMS server system • 100 websites • Helge Meinhard's talk on procurements at CERN • First CERN procurement based on total SPECint • Larger purchase in Dec 06: 3M EUR • 3 winners identified, to share the risk • See the video…

  33. Wednesday • File systems • HEPiX FSWG talk • AFS and Lustre • Panasas, GPFS • dCache at BNL on Sun Thumpers • ZFS at DESY and IN2P3 (Thumpers again) • DPM, dCache and SRM updates • Peter Kelemen's talk on silent corruptions and checksums (see the sketch below)
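
The checksum point can be illustrated with a small sketch: record a digest for every file, re-read the files later and flag any mismatch as possible silent corruption. This shows the general technique only, not the CERN tool from the talk; the manifest format and command-line interface below are assumptions.

```python
#!/usr/bin/env python3
"""Sketch of checksum-based silent-corruption detection: record a digest per file,
re-read later and flag mismatches.  Manifest format and CLI are assumptions.
"""
import hashlib
import json
import os
import sys

def sha1_of(path, blocksize=1 << 20):
    """Return the SHA-1 hex digest of a file, read in 1 MiB blocks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            h.update(block)
    return h.hexdigest()

def build_manifest(root, manifest):
    """Walk a directory tree and store a digest for every file."""
    sums = {}
    for dirpath, _, names in os.walk(root):
        for name in names:
            p = os.path.join(dirpath, name)
            sums[p] = sha1_of(p)
    with open(manifest, "w") as f:
        json.dump(sums, f, indent=1)

def verify_manifest(manifest):
    """Recompute digests and report any file whose content has changed."""
    with open(manifest) as f:
        sums = json.load(f)
    bad = [p for p, digest in sums.items() if os.path.exists(p) and sha1_of(p) != digest]
    for p in bad:
        print("CORRUPT:", p)
    return not bad

if __name__ == "__main__":
    # usage: silent.py build <root> <manifest>   or   silent.py verify <manifest>
    if sys.argv[1] == "build":
        build_manifest(sys.argv[2], sys.argv[3])
    else:
        sys.exit(0 if verify_manifest(sys.argv[2]) else 1)
```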

  34. Thursday • Systems management • Monitoring • Quattor - M. Jouvin, LAL • Nagios - A. Jager, GridKa • Every subnet has its own Nagios server • Use of event handlers (see the sketch after this list) • CFEngine - A. Elwell, Glasgow • SL - Connie Sieh, FNAL • SL5 just about to be released • Using "mock" and "pungi" to build RPMs and ISOs • Provide limited security support for 3.0.x till 2010 and security errata for SL4.x to at least 2010 • Expect a 3.0.9 • Possible collaboration with CentOS, but they have no betas and add no extra functionality • Reliability through testing - A. Horvath, CERN
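
For the event-handler bullet, a common Nagios pattern is to attach a small script to a service via its event_handler directive, passing the $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ macros, and let it attempt a restart before notifications go out. The sketch below follows that pattern; the restarted service (an httpd init script) is only an example, not GridKa's configuration.

```python
#!/usr/bin/env python3
"""Hedged sketch of a Nagios event handler of the kind mentioned above.

A typical service definition passes $SERVICESTATE$ $SERVICESTATETYPE$
$SERVICEATTEMPT$ to the handler; the restart action below is only an example.
"""
import subprocess
import sys

def handle(state, state_type, attempt):
    """Try to restart the monitored service when it goes CRITICAL."""
    if state != "CRITICAL":
        return
    if state_type == "SOFT" and int(attempt) >= 2:
        # Still a soft state: try a restart before Nagios sends notifications.
        subprocess.run(["/etc/init.d/httpd", "restart"], check=False)
    elif state_type == "HARD":
        # Problem confirmed: restart as a last resort.
        subprocess.run(["/etc/init.d/httpd", "restart"], check=False)

if __name__ == "__main__":
    handle(sys.argv[1], sys.argv[2], sys.argv[3])
```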

  35. Friday • Benchmarking - M. Michelotto, INFN • Does it still make sense to use SPECint 2000? • Intel quad cores seem to offer good performance with one job per core • AMD much better than SPECint would suggest • M. Alef, GridKa • Many details on optimisation switches and their effects on results • Many-core CPUs - parallel computing in HEP • Multi-core performance in HEP applications • Quad cores scale well for simulation and reconstruction compared to dual cores • 64-bit gives ~20% performance gain

  36. SPECint2000: good or bad? • SPECint 2000 is used in all the technical reports and agreements with funding agencies • On the other side: • It is being retired; I couldn't find the Intel Clovertown 5345 score • The footprint is too small (designed for 200 MB per core) • Some processors like the Intel 5160 have "inflated" SI2000 numbers, probably because of the huge L2 caches • Maybe other benchmarks have a better correlation with my results? • If I get a good correlation (+/- 10%) I'd consider myself satisfied (see the sketch below)
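
The correlation check the speaker describes can be expressed in a few lines: collect, per machine, the published benchmark score and the measured HEP application throughput, compute the correlation and the spread of their ratio, and see whether it stays within roughly +/-10%. The sketch below does that; all machine names and numbers in it are hypothetical placeholders, not the speaker's data.

```python
#!/usr/bin/env python3
"""Sketch of checking how well a benchmark tracks measured HEP throughput.

All machine names and numbers below are hypothetical placeholders, not real data.
"""

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# machine -> (benchmark score, measured events/hour) -- hypothetical example values
samples = {
    "box-a": (1400.0, 310.0),
    "box-b": (1900.0, 420.0),
    "box-c": (2300.0, 470.0),
    "box-d": (2600.0, 560.0),
}

if __name__ == "__main__":
    scores, measured = zip(*samples.values())
    r = pearson(scores, measured)
    # Spread of the measured/benchmark ratio gives a feel for the +/-10% criterion.
    ratios = [m / s for s, m in samples.values()]
    spread = (max(ratios) - min(ratios)) / (sum(ratios) / len(ratios))
    print(f"Pearson r = {r:.3f}, ratio spread = {spread:.1%}")
```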

  37. SPEC CPU2006 • Available since August 2006 • Latest evolution of the SPEC suite (SPEC 89, 92, 95, 2000) • Includes more C++ than CPU2000 • Designed to run in about 1 GB per core • I could not run more than 3 copies on some 4-core boxes because of excessive paging • Less sensitive to cache size • Difficult to find published results for processors more than 2 years old • Most published results are on MS Windows or Linux + Intel compiler • More difficult to run than CPU2000, at least with gcc
