
Farm Management








[Poster figures: PerfMC architecture diagram (SNMP Poller, in-core status, RRD archives, XML configuration file: <?xml version="1.0" standalone="no"?> <!DOCTYPE monitor SYSTEM "monitor.dtd"> <monitor> ... </monitor>; XSLT engine, stylesheets, filter (PHP...), HTTPD serving HTML pages and graphs for the monitored hosts Host 1 ... Host n; web interface at https://bbrweb.pd.infn.it:5212/farm/) and farm performance plots vs. time of day (network ~300 Mb/s, disk write ~50 Mb/s, system CPU ~60%, user CPU ~100%).]

D. Andreotti 1), A. Crescente 2), A. Dorigo 2), F. Galeazzi 2), M. Marzolla 3), M. Morandin 2), F. Safai Tehrani 4), R. Stroili 2), G. Tiozzo 2), G. Vedovato 2), and the BaBar Computing Group
1) I.N.F.N. of Ferrara, Italy; 2) Univ. and I.N.F.N. of Padova, Italy; 3) Univ. “Ca’ Foscari”, Venezia, and I.N.F.N. of Padova, Italy; 4) I.N.F.N. of Roma, Italy

A new dedicated facility for (re)processing of BaBar raw data, supported by INFN, was installed in Padova (Italy) in 2002 as part of the distributed Tier A system at the disposal of the experiment. The facility consists of four independent farms, each capable of processing 2 million events (corresponding to 160 pb^-1 of raw data) per day. Reconstructed data are stored in an Objectivity federation, checked, and finally transferred to SLAC. The facility exploits commodity CPUs and disk storage while preserving good reliability, high performance, and well-organized system management. The center, which now comprises approximately 200 dual-CPU PIII machines and 30 TB of disk space, has been in operation since October 2002, and experience so far has been very satisfactory.

First BaBar data processing farm fully based on:
• Linux
• cheap hardware

Farm Performance

The system is continuously stressed!

Existing hardware (all machines: 2 x 1.26 GHz CPUs, 1 GB RAM):
• 140 clients, 40 GB local IDE disk (software RAID)
• 20 servers, same configuration as the clients, Gigabit Ethernet
• 30 storage servers, 1.28 TB IDE disk with a 3ware RAID controller, Gigabit Ethernet
• 5 “PR” servers, up to 0.35 TB of 10k RPM SCSI disk with a ServeRAID SCSI controller, Gigabit Ethernet
• one tape library for 700 LTO tapes (70 TB uncompressed)

New acquisitions:
• new tape library for 700 LTO2 tapes (140 TB uncompressed)
• 103 clients, 2 x Xeon 2.4 GHz, 2 GB RAM
• 14 storage servers, 2 x Xeon 2.4 GHz, 2 GB RAM, 1.4 TB IDE disk
• 10 “PR” servers, 2 x Xeon 2.4 GHz, 2 GB RAM

Extensive work has been done to optimize resources and to reduce bottlenecks (e.g., minimizing the use of NFS).

Farm Monitoring

Machines are organized into:
• 4 identical farms, 60 CPUs each
• 160 pb^-1/day/farm
• ~2,000,000 events/day/farm (output)
• 160 GB/day/farm of input (raw) data
• 330 GB/week/farm of output (Objectivity) data

PerfMC (presented at CHEP03) is a high-performance monitoring program developed for this farm:
• scalable
• efficient
• requires few resources
• easily configurable using XML
• operates in the background (no GUI)

Based on:
• SNMP, to be compatible with the widest variety of hardware, using asynchronous non-blocking SNMPv2 bulk Get requests
• the RRDtool library, for graphs
(An illustrative configuration and graphing sketch is given further below.)

Monitored quantities:
• CPU
• Disk I/O
• Network I/O
• Temperatures

Total disk needed for the whole farm: 5 GB.

Farm Management

Using IBM's xCAT (eXtreme Cluster Administration Toolkit), which allows:
• remote power control (*)
• remote BIOS console (*)
• remote OS console
• remote software reset
• parallel remote shell
• network installation
• ...
(*) on IBM machines only
(Example commands are sketched below.)
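As an illustration only (not taken from the poster), the following are typical xCAT commands for the kinds of operations listed above; the node and group names (node101, clients) are hypothetical, and the exact options depend on the installed xCAT version.

  # power-cycle one client remotely (node name hypothetical)
  rpower node101 reset
  # query the power state of the whole client group
  rpower clients stat
  # attach to the remote OS console of a node
  rcons node101
  # run the same command on every client via the parallel remote shell
  psh clients uptime
  # trigger a network (Kickstart) installation on a node group
  rinstall clients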
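The monitor.dtd schema and the PerfMC internals are not reproduced on the poster, so the XML below is only a guess at what a per-host polling entry might look like; the element and attribute names are invented. The rrdtool command that follows shows the standard way an RRD archive written by such a poller can be turned into a graph (file and data-source names are likewise hypothetical).

  <?xml version="1.0" standalone="no"?>
  <!DOCTYPE monitor SYSTEM "monitor.dtd">
  <monitor>
    <!-- hypothetical entry: poll one host via SNMPv2 bulk Get every 60 s -->
    <host name="node101" community="public" interval="60">
      <metric name="cpu-load" oid="HOST-RESOURCES-MIB::hrProcessorLoad" rrd="node101-cpu.rrd"/>
    </host>
  </monitor>

  # standard RRDtool call to render the last 24 hours of the (hypothetical) archive above
  rrdtool graph node101-cpu.png --start -86400 \
      DEF:load=node101-cpu.rrd:load:AVERAGE \
      LINE2:load#FF0000:"CPU load"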
[Screenshot: parallel installation of more than 100 clients]

MySQL
Widely used for farm monitoring, management, and production: 12 databases, 3.5 GB in total.

First Boot
Machines must support PXE.

SysAlarm
Home-made Perl tool to parse system log files and save the errors in a MySQL database (a minimal sketch in the same spirit is given below).

Software installation
The Kickstart installation method is preferred because it is easier to configure according to machine type. Cloning (hard-disk copy) or imaging (partition copy) methods are also possible. Second-level repositories can be used. (An illustrative Kickstart excerpt is given below.)

Problems:
• vendor driver availability and support for different Linux releases
• had to recompile for large-file support
• NFS not optimal on Linux under heavy load

Network configuration:
• All machines are on a private network.
• A few front-end machines have two interfaces.
• Public machines resolve private names using a NIS server.

Log server
Used to centralize system logs on one machine (see the syslog forwarding line below).
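The poster does not show how the log server is fed; the line below is simply the standard syslogd way of forwarding all messages from a client to a central log host (the name loghost is hypothetical).

  # /etc/syslog.conf on each client: send everything to the central log server
  *.*     @loghost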
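For illustration, a heavily abridged Kickstart file of the kind that could drive the network installations mentioned above; the repository URL, partitioning, and package selection are invented, not taken from the poster.

  install
  # hypothetical second-level repository
  url --url http://installserver/redhat
  lang en_US
  keyboard us
  network --bootproto dhcp
  rootpw --iscrypted $1$placeholderhash
  clearpart --all --initlabel
  part /boot --size 64
  part swap --size 1024
  part / --size 1 --grow
  reboot
  %packages
  @ Base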
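SysAlarm itself is not published on the poster; the Perl fragment below is only a minimal sketch in the same spirit: scan a syslog file for error-like lines and store them in MySQL through DBI. The database, table, and column names are invented.

  #!/usr/bin/perl
  # Minimal SysAlarm-like sketch: record error lines from a syslog file in MySQL.
  use strict;
  use warnings;
  use DBI;

  my $dbh = DBI->connect('DBI:mysql:database=sysalarm;host=dbserver',
                         'sysalarm', 'secret', { RaiseError => 1 });
  my $sth = $dbh->prepare('INSERT INTO alarms (host, logline) VALUES (?, ?)');

  open my $log, '<', '/var/log/messages' or die "cannot open log: $!";
  while (my $line = <$log>) {
      next unless $line =~ /error|fail|panic/i;             # naive error filter
      my ($host) = $line =~ /^\S+\s+\S+\s+\S+\s+(\S+)/;     # syslog host field
      $sth->execute(defined $host ? $host : 'unknown', $line);
  }
  close $log;
  $dbh->disconnect;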
