RALPP Site Report HEP Sys Man, 11th May 2012 Rob Harper
My talk will be... • Where we’re at now • Our new stuff, including • GridPP purchases • DRI networking kit • Benchmarking and hyperthreads • Virtual machine infrastructure • Managing configuration and stuff: cfEngine vs Puppet • Future stuff
RALPP For Dummies • Part of SouthGrid • Staff • Chris Brew (part) • Rob Harper (part) • One cluster serving Tier 2 (85%) and Tier 3 (15%), managed by Torque/Maui • dCache storage
RALPP CPU • Cluster is currently nominally: • 2,872 Job slots • 26,409 HS06 • Where available, hyperthreads used to get 150% of physical cores
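As a rough illustration of the slot rule on this slide, the sketch below allocates 150% of physical cores as job slots where hyperthreads are available, and works out the average HS06 per slot from the nominal cluster figures. The function name `job_slots` and the rounding-down for odd core counts are assumptions made for this example, not something stated in the talk.

```python
# Nominal cluster figures quoted on the slide.
CLUSTER_SLOTS = 2872
CLUSTER_HS06 = 26409


def job_slots(physical_cores, hyperthreading=True):
    """Job slots for one node: 150% of physical cores when hyperthreads
    are available, otherwise one slot per physical core.

    Rounding down for odd core counts is an assumption of this sketch.
    """
    if hyperthreading:
        return physical_cores * 3 // 2
    return physical_cores


# Average benchmark score per job slot across the whole cluster.
avg_hs06_per_slot = CLUSTER_HS06 / CLUSTER_SLOTS

print(job_slots(12))                        # 12-core node with hyperthreads
print(job_slots(8, hyperthreading=False))   # 8-core node without
print(round(avg_hs06_per_slot, 2))
```

The average works out at a little over 9 HS06 per slot, which gives a baseline for comparing newer kit against the cluster as a whole.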
RALPP Storage • [Chart; axis label: TB]
RALPP Storage • 1,060 TB in production • Soon to be 1,260 TB
New Stuff: GridPP Purchases • CPU: • 9 * Viglen/Supermicro Twin2 • Intel E5645 based • 48 GB / node • Using hyperthreads • => 648 job slots, 6,208 HS06 • Disk: • 5 * Viglen/Supermicro 24 bay storage nodes • => 200 TB of disk pool
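The purchase figures can be cross-checked with a little arithmetic. The sketch below assumes four nodes per Twin2 chassis and two sockets per node (neither is stated on the slide), and uses the six cores of the Intel E5645; under those assumptions the 648 job slots fall out of the 1.5 slots per physical core rule, and the 200 TB of new disk accounts for the step from 1,060 TB to 1,260 TB on the storage slide.

```python
CHASSIS = 9
NODES_PER_CHASSIS = 4   # assumption: four nodes per Twin2 chassis
SOCKETS_PER_NODE = 2    # assumption: dual-socket nodes
CORES_PER_CPU = 6       # the Intel Xeon E5645 is a six-core part

nodes = CHASSIS * NODES_PER_CHASSIS
physical_cores = nodes * SOCKETS_PER_NODE * CORES_PER_CPU
new_slots = physical_cores * 3 // 2      # 1.5 job slots per physical core

slots_per_node = new_slots // nodes
ram_per_slot_gb = 48 / slots_per_node    # 48 GB per node, from the slide
hs06_per_slot = 6208 / new_slots         # 6,208 HS06, from the slide

storage_after_tb = 1060 + 200            # production pool plus the new disk
tb_per_storage_node = 200 / 5            # five 24-bay storage nodes

print(physical_cores, new_slots, slots_per_node)
print(round(ram_per_slot_gb, 2), round(hs06_per_slot, 2))
print(storage_after_tb, tb_per_storage_node)
```

If the node count or socket count differs from these assumptions, only the constants at the top need changing.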
New Stuff: Networking • DRI money bought us: • 5 * Force10 S4810 switches • A heap of 10Gb NICs for older disk pool nodes • A heap of 10Gb cables • Coming soon: a much reconfigured network...
Benchmarking & Hyperthreads • We ran HS06 benchmark on a heap of nodes with varying numbers of concurrent benchmark jobs • Going past # of physical cores did give us some gains
Benchmarking & Hyperthreads • So we committed 1.5 * physical cores as job slots for some nodes and ran real jobs • No significant drop in efficiency • More work done • Many details on SouthGrid blog at http://bit.ly/Iu7BfS
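The benchmarking approach described on these two slides (run a varying number of concurrent copies on a node and see whether throughput keeps rising past the physical core count) can be sketched as a small harness. This is not the HS06 suite or the scripts used at RALPP: `PLACEHOLDER_CMD` is a stand-in CPU-bound command, and `run_concurrent` and `sweep` are hypothetical helper names.

```python
import subprocess
import sys
import time

# Placeholder CPU-bound workload; a real run would launch the benchmark instead.
PLACEHOLDER_CMD = [sys.executable, "-c", "sum(i * i for i in range(200000))"]


def run_concurrent(cmd, copies):
    """Start `copies` instances of cmd together and return the wall-clock
    seconds taken for all of them to finish."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    exit_codes = [proc.wait() for proc in procs]
    if any(exit_codes):
        raise RuntimeError("at least one benchmark copy exited with an error")
    return time.perf_counter() - start


def sweep(cmd, levels):
    """Map each concurrency level to throughput in completed copies per second."""
    results = {}
    for copies in levels:
        elapsed = run_concurrent(cmd, copies)
        results[copies] = copies / elapsed
    return results
```

On a node with, say, 12 physical cores, calling `sweep(PLACEHOLDER_CMD, [12, 18, 24])` and comparing the three throughput values would show whether going past the physical core count still adds work done per second. A workload this short is dominated by process start-up time, so a real comparison needs a much longer-running command.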
Virtual Machines • Current set-up: • Xen VMs spread between a couple of servers • Local storage, nothing clever • Currently in test: • Cluster running HyperV • Yes, we’ll be running Linux VMs on Windows • EqualLogic storage • iSCSI • Mirroring, etc.
Configuration Management • Already much discussed yesterday, but here’s our perspective... • We currently rely on cfEngine v2 • This is not supported natively on SL6 (or at all) • Main options seem to be: • Crowbar in legacy cfEngine • cfEngine v3 – will need configs rewritten • Switch to Puppet – will need configs rewritten
Puppet • Puppet seems to be a strong choice • Particularly as other Tier 2s are coming to the same decision • Not got far yet • We have a working Puppet Master with some basic manifests set up • We have an SL6 client for test purposes • Planning to use Puppet for SL6 hosts as we set them up – leaving SL5 kit on cfEngine
Puppet • Our cfEngine config relies massively on EditFiles functionality • Puppet does not have this • Can run scripts to do edits • Can use modules (e.g. iptables) that do the work for you • We need to learn to think in a different way to take advantage of Puppet
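To illustrate the "run scripts to do edits" option, here is a minimal sketch of the kind of edit script that could stand in for a simple EditFiles rule. The function name `ensure_line` is hypothetical and this is not taken from the RALPP configuration. The point is that the edit is idempotent: a configuration management run repeats regularly, so the script must change the file only when the line is missing.

```python
from pathlib import Path


def ensure_line(path, line):
    """Append `line` to the file at `path` unless it is already present.

    Returns True if the file was changed, False if no edit was needed.
    A missing file is treated as empty and created.
    """
    target = Path(path)
    existing = target.read_text().splitlines() if target.exists() else []
    if line in existing:
        return False  # already in the desired state, nothing to do
    existing.append(line)
    target.write_text("\n".join(existing) + "\n")
    return True
```

Running it twice with the same arguments changes the file on the first call only, which is the behaviour a declarative tool expects. Where a ready-made module exists for the file in question, that is likely the better route than a script of this sort.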
Things to come... • Getting network configuration updated • Start deploying VMs in HyperV • Getting Puppet configuration management running properly • Start using SL6 as a standard install for services where we have no reason not to • Improved monitoring