First Implementation of a Diskless Computer Farm for LHCb Vincenzo Vagnoni Bologna, June 13, 2001
Outline • Hardware overview • Motherboards and rack mount boxes • Disk Storage • Remote Power Control • Network boot • Preboot eXecution Environment (PXE) • Server side daemons • System Configuration • Linux kernel preparation • The operating system • System administration • Conclusions
Schematic Representation (diagram): login nodes and job execution nodes (each with CPUs, chipset, DRAM, AGP graphics controller and PCI network interface; login nodes also with local IDE swap disks) connected through an Ethernet switch to the Network Attached Storage (host adapter with CPU and network interfaces, switched backplane of switched nodes, each with a disk interface to Ultra ATA drives)
Motherboards I • 9 dual-processor motherboards (GigaByte 6VXDC7) • Upgrade to 25 foreseen after summer • 2 Pentium III 866 MHz • 512 MB RAM (non-ECC) • 2*256 MB modules • Only two peripherals: graphics card and network adapter • Completely diskless
Motherboards II • 100 Mb NIC equipped with a boot PROM • 3Com 3C905C-TX with Managed Boot Agent v4.30 (Lanworks) and PXE v2.20 • Onboard hardware health monitoring chips (Inter-Integrated Circuit – I2C – compatible; the Linux "lm_sensors" driver exists) for temperature and fan speed readout • Arranged in a 2U rack-mount box that also hosts a standard power supply and 3 fans • Current absorption: 300 mA (idle), 600 mA (200% load)
How they look
Disk Storage • Network Attached Storage solution • RaidZone OpenNAS RS15-R1200 • 14*80 GB EIDE hard disks (+ 1 automatic hot spare) • Hardware-controlled RAID-5 (total usable capacity 1 TB) • Dual Pentium III 800 MHz • 256 MB ECC RAM • Two network cards configured in port trunking (200 Mb/sec) – upgrade to Gigabit is possible • Dual redundant power supply • Operating system: RedHat Linux patched by RaidZone • Suggested file system: ReiserFS (we use it)
How it looks
Considerations on the NAS I • The system is reliable and pretty stable • With the latest available kernel (a 2.2.17 patched by RaidZone) no problems observed (about one month of continuous intensive usage) • Very flexible • It looks to the administrator like a normal RedHat Linux system • It can host any kind of service (PXE, DHCP, APACHE, …), and it does in our case
Considerations on the NAS II • Good performance • About 50 MB/sec local reading, 35 MB/sec local writing (with RAID-5 and ReiserFS) – very close to the real-life performance of SCSI Ultra 160 RAID-5 arrays • Almost the full network bandwidth is used for (not small) file transfers through NFS (about 160 Mb/sec) • Performance more than adequate for MC production – for analysis jobs more thought is needed • Very compact • 1 TB (could be 2 TB by using recent 144 GB disks) in about 4U • Fairly cheap • $20,000 in the US, somewhat more in Italy
Remote Power Control • Even if a Linux-based system is usually rather stable, it can happen that a system hangs • In large installations this is in general not such a rare event • Possible solution: remote control of the PCs' power input • National Instruments distributed I/O modules, controlled via network • FieldPoint Ethernet controller FP1600 • It controls up to nine FieldPoint FP-RLY-420 modules • Each FP-RLY-420 is equipped with 8 independent relays, i.e. 8 channels • Total: 8*9=72 independent power channels at maximum, handled by one Ethernet controller • A client GUI is provided for Windows (not for Linux so far) • With this last instrument the system can be almost completely controlled from remote sites, except in case of serious failures • It greatly eases system administration
How it looks (in our arrangement)
Hardware Summary
Putting it all together
The Network Boot I • Each booting client must be equipped with network boot code installed either in the system BIOS or in a PROM on the network interface card • In our case we make use of 3Com 3C905C-TX NICs with an on-card boot PROM • Several pre-boot procedures are available on today's NICs • Novell RPL, based on NetWare: requires a Novell server or emulator… forget it • TCP/IP, i.e. DHCP/TFTP based • Intel PXE (our choice), similar to the TCP/IP procedure but more flexible and probably going to become a standard
The Network Boot II • Preboot eXecution Environment (PXE) protocol • Client pre-boot code available on most modern NICs (3Com or Intel suggested) • Intel defined the protocol and provides a set of APIs to write server code • RedHat developed and distributes a package to serve boot images to PXE clients • Three phases • Pre-boot phase: the client configures its network by means of (extended) DHCP requests and downloads the boot image(s) • Kernel-boot phase: the kernel boots and makes a new (standard) DHCP request, then mounts the root file system over NFS • Operating system boot phase: the operating system can now boot more or less as usual
PXE Boot Sequence
The server side daemons • Three different servers are necessary • DHCP • Provides standard DHCP information • PXE (RedHat provides an implementation) • Provides the non-standard DHCP extensions specified by the PXE protocol through a proxy-DHCP service, e.g. information for multiple boot menu options to be chosen interactively at boot time by the user • This can be useful, for example, to boot different kernel images during tests or to boot diagnostic programs (e.g. memtest86 to test memory health) • TFTP or MTFTP (Multicast TFTP, also provided by RedHat) • Downloads to the clients the Network Bootstrap Program (NBP), the Linux kernel image and, optionally, an initial ram disk image • The NBP is a small piece of code that takes control, downloads the Linux kernel and can pass configuration parameters to it • A multicast-based implementation of TFTP can be useful when several machines boot simultaneously, to avoid network overload
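As a sketch, the standard DHCP side of such a setup could look like the following ISC dhcpd.conf fragment; all addresses, hardware addresses and host names here are illustrative assumptions, not the actual farm configuration:

```conf
# Hypothetical dhcpd.conf fragment for the diskless clients
subnet 192.168.1.0 netmask 255.255.255.0 {
    option routers 192.168.1.1;
    next-server 192.168.1.10;           # NAS, also acting as TFTP/PXE server

    host node01 {
        hardware ethernet 00:01:02:03:04:05;
        fixed-address 192.168.1.101;
        option host-name "node01";
    }
}
```

Fixed per-MAC addresses keep each diskless node's identity stable across reboots, which matters because the node's private root tree on the NAS is selected by its identity.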
Linux Kernel Preparation • To perform a remote boot with a diskless configuration, the Linux kernel must be prepared accordingly (>2.2.17 is suggested) • No patches are necessary, only some configuration changes before compile time • What happens after kernel download • When the kernel completes its boot procedure it still doesn't have a mounted file system, and in order to reach the remote NFS file server it needs to configure its network parameters dynamically • It has to make a DHCP request • To do that, the kernel must be compiled with built-in DHCP client support, a built-in network adapter driver and network auto-configuration enabled • After the network adapter has been auto-configured, the kernel can start to mount the root file system over NFS • The NFS server address has already been provided by means of DHCP • To proceed with this operation, NFS client support must be compiled resident into the kernel and the root-over-NFS option must be enabled
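On a 2.2.x kernel, the compile-time options above correspond to entries like the following in the kernel .config (a sketch; exact option names can vary between kernel versions):

```conf
CONFIG_IP_PNP=y           # IP: kernel-level network auto-configuration
CONFIG_IP_PNP_DHCP=y      # obtain the network parameters via DHCP at boot
CONFIG_3C59X=y            # 3Com 3C905C driver built in (not a module)
CONFIG_NFS_FS=y           # NFS client support resident in the kernel
CONFIG_ROOT_NFS=y         # allow mounting the root file system over NFS
```

The NBP can then hand the kernel a command line such as `root=/dev/nfs nfsroot=<server>:<path> ip=dhcp` to select the NFS root and DHCP auto-configuration.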
Operating System Configuration • Once the root file system (which resides on the NAS) has been mounted, the system proceeds more or less as usual • We installed the CERN-certified RedHat 6.1 release, with slight changes to some startup and shutdown scripts (for example to delay network and NFS shutdown until the rest of the shutdown procedure has terminated) • Directory sharing is similar to that of a typical cluster configuration: some directories must be private and resident in the root tree of each node (/var, /tmp, /dev, /etc, /lib, /bin, /sbin), while others are shared among the nodes (/usr, /opt, /home, …)
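On a single node, this private/shared split could be expressed with NFS mounts along the following lines (the server name and export paths are assumptions for illustration; the private root itself is mounted by the kernel via the root-over-NFS mechanism):

```conf
# Hypothetical /etc/fstab of one diskless node
nas:/export/nodes/node01   /      nfs  rw,hard,intr  0 0   # private root tree
nas:/export/shared/usr     /usr   nfs  ro,hard,intr  0 0   # shared, read-only
nas:/export/shared/opt     /opt   nfs  ro,hard,intr  0 0   # shared, read-only
nas:/export/shared/home    /home  nfs  rw,hard,intr  0 0   # shared, writable
```

Mounting the shared system directories read-only on the nodes reduces the risk that a misbehaving job corrupts files used by the whole farm.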
System Administration • System administration becomes much simpler with a centralized filesystem • The file systems of all the nodes are accessible under the NAS filesystem at the same time (e.g. no need to log into different machines to edit, delete or move files) • System backups can be centralized and performed in one single step on the NAS • No risk of damage if a system is hard-rebooted (no fsck, because no local disk is used) • To perform the OS installation of a new machine, a simple script that duplicates some directories and performs a few simple operations on the NAS is sufficient • A new installation is performed in 30 seconds • No need to develop packages to automate installations on each different node • The installations are identical by default (the filesystems are built by simply copying the directories from a central repository on the NAS)
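A minimal sketch of such an installation script, written as a shell function (the repository layout and path names are hypothetical, not the actual script used on the farm):

```shell
#!/bin/sh
# install_node NAME REPO NODES
# Duplicates the private directories of a new diskless node from a central
# repository (REPO) into the per-node area served over NFS (NODES).
install_node() {
    node="$1"; repo="$2"; nodes="$3"
    mkdir -p "$nodes/$node" || return 1
    # Copy the private directories, preserving permissions and device nodes
    for d in var tmp dev etc lib bin sbin; do
        cp -a "$repo/$d" "$nodes/$node/$d" || return 1
    done
    # Node-specific customization: record the hostname
    echo "$node" > "$nodes/$node/etc/HOSTNAME"
}
```

After this, adding the new node's MAC address to the DHCP configuration is enough to make it bootable; no per-machine package installation is involved.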
Diskless configuration drawbacks • There are some possible drawbacks to address • The absence of a disk swap area essentially means that the job's memory demand must fit strictly into the RAM, otherwise the job is abruptly terminated • In any case, one can argue that it makes little sense to do intensive computing on a system that is heavily swapping memory pages • Instead of buying a local disk, buy more RAM! • Conversely, on machines dedicated to interactive sessions a local swap area can be necessary and should be added
Status Summary • The farm prototype in its diskless implementation is installed at CNAF (Bologna) and is fully operational • 9 machines (18 processors) with PIII at 866 MHz available up to now – upgrade to 25 machines foreseen after summer • First tests of intensive Monte Carlo production of minimum bias events (jobs scheduled by PBS) have been in progress for a couple of weeks, and no problems have been observed • First release of monitoring tools available (see Domenico's talk) • System health (temperatures, fan speed) • Disk availability • CPU loads • Network load • Batch queue length
Conclusions • An overview has been given of the main concepts behind the hardware design and the system configuration of the installed farm prototype • Rack mounted • Completely diskless nodes, except those dedicated to interactive sessions • Network boot through Intel PXE • Node file systems, disk data storage and basic services provided by a NAS with a 1 TB disk array in RAID-5 • Remote power management • Linux kernel preparation and operating system configuration • Further details can be found in LHCb Computing note 2001-088 • We are on the way…