High-Performance Computing with Commodity Components: Reliability through Redundancy
(Hochleistungsrechnen mit Commodity-Komponenten: Zuverlässigkeit durch Redundanz)
Rainer Mankel, DESY Hamburg
One World Camp, Prora, 28-Aug-2002
R. Mankel, Zuverlaessigkeit durch Redundanz
DESY in General
• National center of basic research in physics
• Member of HGF
• Sites: Hamburg + Zeuthen (near Berlin)
• About 1600 employees, including 400 scientists
• 1200 users in particle physics from 25 countries
• 2200 users in HASYLAB from 33 countries
DESY in a Nutshell
• HERA ep collider with four experiments: H1 (ep), ZEUS (ep), HERMES (eN), HERA-B (pN): reconstruction, analysis, ...
• Accelerators: machine controls
• HASYLAB: synchrotron radiation
• TTF
• …
DESY: Future Projects
• PETRA as a New High Brilliance Synchrotron Radiation Source: DESY plans to convert the PETRA storage ring into a new high-brilliance, third-generation synchrotron radiation source. 1.4 MEUR from the Federal Ministry of Education and Research for the design phase
  • design report end 2003
  • construction start in 2007?
• TESLA:
  • e+e- Superconducting High Luminosity Linear Collider (0.5 ... 0.8 TeV)
  • integrated X-ray laser
  • July 2002: very positive statement from German Science Council (Wissenschaftsrat)
History of Computing Solutions at DESY
• Mainframe era until ~1992: IBM/370, MVS-3, homemade editor etc.
• RISC multi-processor 1992-2002: SGI Challenge, UNIX
• PC farms 1997-today
Technologies: General Transitions
[Diagram labels: Mainframe, SMP; Commodity hardware; DM, Lit, Pta, ...; IRIX, HP-UX, …]
Principal Differences: Hardware
Vendor System (SGI, IBM, ...)
• Hardware made for professional use
• Performance goals far beyond “normal household”
• Components fit together
• Reputation at stake
• Service
• Price
Commodity System (PC)
• Hardware made for home or small business use
• Performance goals set e.g. by video games industry
• Usually no vendor guarantee for whole system
• Needs local support
• Price
Principal Differences: Software
Vendor Software (“The Cathedral”)
• Software made for professional use
• Documentation part of the product
• Support
• Price
• No access to source code
Open Source Software (Linux...) (“The Bazaar”)
• No warranty whatsoever
• Nobody to complain to
• Some features never get implemented
• Amazing development speed
• Huge worldwide resource of idealists
• Price
• Access to source code
Vendor vs. Commodity: Conclusion
• Use of commodity hardware and software gives access to enormous computing power, but
• Much effort is required to build reliable systems
DESY Central Computing (IT Division)
• O(70) people
• Operating ~all imaginable services (mail, web, registry, databases, AFS, HSM, backup (Tivoli), Windows, networks, firewalls, dCache, ...)
• Tape storage: 4 STK Powderhorn tape silos (interconnected)
  • media: 9840 cartridges (old, 20 GB), 9940B (new, 200 GB)
Computing of a HERA Experiment: ZEUS
At the electron-proton collider “HERA” (e: 27 GeV, p: 920 GeV)
• General purpose ep collider experiment
• About 450 physicists
• Expect 20-40 TB/year of RAW data after luminosity upgrade
  • whole of DESY approaches PB regime during HERA-II lifetime
• O(100) modern processors in farms for reconstruction & batch analysis
• MC production distributed world-wide (“funnel”), O(3-5 M events/week) routinely
  • funnel is an early computing grid
A ZEUS Collision Event
General Challenge (ZEUS)
[Diagram labels]
• O(1 M) detector channels
• 50 M, 200 M events/year
• Tape storage increase: 20-40 TB/year
• Disk storage: 3-5 TB/year
• MC production
• Data processing/reprocessing
• Data mining
• Interactive data analysis
• ~450 users
Network Structure
[Diagram labels: HSM (x5), SWITCH, FILE SERVERS, FARM SERVER, PC FARM, 2 x 48; link speeds: 1 Gb/s, 100 Mb/s]
ZEUS Hardware
Batch Farm Nodes are Redundant
• each individual farm node has same functionality
• not critical if ~3 nodes of 100 are down
• can use “cheap” PCs
[Diagram labels: jobs; Scheduler (e.g. LSF); functional nodes; currently broken nodes]
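The redundancy argument above can be illustrated with a minimal sketch. It assumes a simple round-robin dispatcher; this is not how LSF schedules jobs, and the node names, job names and the `dispatch` function are hypothetical. It only shows that with identical nodes, a few broken ones reduce capacity slightly instead of stopping work.

```python
from itertools import cycle


def dispatch(jobs, nodes, broken):
    """Assign each job to a functional node, round-robin, skipping broken nodes."""
    functional = [n for n in nodes if n not in broken]
    if not functional:
        raise RuntimeError("no functional nodes available")
    # zip stops when the job list is exhausted; cycle repeats the node list
    return dict(zip(jobs, cycle(functional)))


# Hypothetical farm: 100 identical nodes, 3 of them currently broken
nodes = [f"node{i:03d}" for i in range(100)]
broken = {"node007", "node042", "node099"}
jobs = [f"job{i}" for i in range(250)]

assignment = dispatch(jobs, nodes, broken)
capacity_left = (len(nodes) - len(broken)) / len(nodes)
print(f"{len(assignment)} jobs placed, {capacity_left:.0%} of capacity available")
```

Because every node has the same functionality, no job depends on a particular machine, which is what makes cheap PCs acceptable here.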
Performance of Reconstruction Farm
[Plot labels: old farm; new farm; new farm + tuning; 2 M Events/day]
RAID Technologies
• RAID = Redundant Array of Inexpensive Disks
• RAID0 = Striping
  • data simultaneously written to several disks
  • fast reading and writing
  • no redundancy
• RAID1 = Mirroring
  • several disks with same information
  • slow writing, normal reading
  • very expensive redundancy
  • bad scalability
Pictures taken from R. Berlich
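As a rough illustration of the two schemes above, the following sketch models disks as in-memory byte strings. The function names and the 4-byte chunk size are assumptions made for the example; a real controller works on fixed-size blocks in hardware or in the kernel.

```python
def stripe(data: bytes, n_disks: int, chunk: int = 4):
    """RAID0-style striping: distribute consecutive chunks round-robin over disks."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % n_disks] += data[i:i + chunk]
    return [bytes(d) for d in disks]


def unstripe(disks, total_len: int, chunk: int = 4) -> bytes:
    """Reassemble striped data; fails if any disk has lost its share."""
    out = bytearray()
    offsets = [0] * len(disks)
    i = 0
    while len(out) < total_len:
        d = i % len(disks)
        piece = disks[d][offsets[d]:offsets[d] + chunk]
        if not piece:
            raise ValueError(f"stripe data missing on disk {d}: no redundancy")
        out += piece
        offsets[d] += chunk
        i += 1
    return bytes(out)


def mirror(data: bytes, n_disks: int):
    """RAID1-style mirroring: every disk holds a full copy."""
    return [bytes(data) for _ in range(n_disks)]


data = b"abcdefghijklmnopqrstuvwxyz"
striped = stripe(data, 3)      # each disk holds about a third of the data
mirrored = mirror(data, 2)     # each disk holds everything
restored = unstripe(striped, len(data))
```

Striping stores each byte once, so losing one disk loses data; mirroring survives a disk loss but needs the full capacity again for every copy.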
RAID Technologies (cont'd)
• RAID3 = Striping with special parity disk
  • failure of one disk can be compensated
  • relatively fast reading and writing
• RAID5 = Striping with dynamically assigned parity disk
  • failure of one disk can be compensated
  • no individual disk can become bottleneck
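How a single parity block compensates for one failed disk can be sketched with XOR parity, the standard technique behind RAID3 and RAID5. The block contents and the helper name `xor_blocks` are made up for the example, and all blocks are assumed to have equal length.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks (hypothetical 4-byte blocks) plus one parity block
data_blocks = [b"ZEUS", b"HERA", b"DESY"]
parity = xor_blocks(data_blocks)

# Disk 1 fails: XOR of the surviving data blocks and the parity restores it
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

RAID3 keeps all parity blocks on one dedicated disk, while RAID5 rotates the parity location from stripe to stripe; the reconstruction arithmetic is the same in both cases.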
Workgroup Servers
• Provide user with local CPU power and disk space (10-100 GB per user)
• Typically used for analysis of n-tuples
• Outage of such a system is much more critical than that of a farm node
• Use very sturdy PC
DELFI1* (19”)
• for high-availability applications (workgroup servers)
Variant A
• 2x 40 GB system (mirrored)
• 2x 80 GB workgroup space
• 3Ware 7850 controller
Variant B
• 2x 40 GB system (mirrored)
• 6x 80 GB workgroup space, stripe or RAID5
• 3Ware 7850 controller
*DESY Linux File Server
Commodity File Servers: DELFI3
• custom built (Invention: F. Collin / CERN)
• 2x 40 GB system (EIDE)
• 20x 120 GB data
• 3 RAID controllers
• Gb ethernet
• 2.4 TB of storage for 13000 EUR
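The capacity and price figures above can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 TB = 1000 GB) and counts raw data-disk capacity, before any RAID parity overhead.

```python
n_data_disks = 20
disk_size_gb = 120
price_eur = 13000

raw_gb = n_data_disks * disk_size_gb   # raw capacity of the data disks in GB
raw_tb = raw_gb / 1000                 # assuming 1 TB = 1000 GB
eur_per_gb = price_eur / raw_gb        # cost per raw GB

print(f"{raw_tb} TB raw, {eur_per_gb:.2f} EUR/GB")
```

This reproduces the 2.4 TB quoted on the slide and gives roughly 5.4 EUR per raw GB.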
Monitoring
• Efficient monitoring is key to the reliable operation of a complex system
• Three independent monitoring systems introduced in ZEUS Computing during the shutdown:
  • LSF-embedded monitoring
    • statistics on the time each job spends in queued/running/system-suspended/user-suspended state
    • quantitative information for queue optimization etc.
  • SNMP
    • I/O traffic and CPU efficiency
    • web interface
    • history
  • NetSaint, now called Nagios
    • availability of various services on various hosts
    • notification
    • automated trouble-shooting
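The kind of per-state statistics mentioned for the LSF-embedded monitoring might be aggregated as in the sketch below. The record format `(job, state, seconds)` and the sample numbers are purely hypothetical; they are not LSF output, and the slides do not say how the statistics were actually collected.

```python
from collections import defaultdict


def state_fractions(records):
    """Return the fraction of total job time spent in each state."""
    totals = defaultdict(float)
    for _job, state, seconds in records:
        totals[state] += seconds
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {state: t / grand_total for state, t in totals.items()}


# Hypothetical accounting records: (job id, state, seconds spent in that state)
records = [
    ("job1", "queued", 600),
    ("job1", "running", 3000),
    ("job2", "queued", 1200),
    ("job2", "running", 2400),
    ("job2", "system-suspended", 300),
]

fractions = state_fractions(records)
for state, frac in sorted(fractions.items()):
    print(f"{state:18s} {frac:.0%}")
```

A large queued fraction would suggest that queue limits or farm size are the bottleneck, which is the sort of quantitative input the slide refers to for queue optimization.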
Example for SNMP-based Monitoring
[Plot labels: 90% CPU efficiency; 1-3 MB/s input rate]
NetSaint Monitoring System
• Hosts, network devices, services (e.g. web server), disk space, …
  • thresholds configurable
• Web interface
• Notification (normally email, if necessary SMS to cellular phone)
• History
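A configurable threshold check with notification can be sketched as follows. This is not NetSaint/Nagios code: the function names, the 80%/95% thresholds and the message format are assumptions for illustration. The numeric status values follow the usual 0/1/2 convention for OK, WARNING and CRITICAL used by Nagios-style check plugins.

```python
OK, WARNING, CRITICAL = 0, 1, 2
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def check_disk_usage(used_fraction, warn=0.80, crit=0.95):
    """Map a disk usage fraction to a status using configurable thresholds."""
    if used_fraction >= crit:
        return CRITICAL
    if used_fraction >= warn:
        return WARNING
    return OK


def notification(host, status):
    """Return a message for non-OK states, or None if nothing needs sending."""
    if status == OK:
        return None
    return f"{STATUS_NAMES[status]}: disk space on {host}"


# Hypothetical hosts with their current disk usage
usage = {"fs01": 0.50, "fs02": 0.85, "fs03": 0.97}
messages = [
    msg
    for host, used in usage.items()
    if (msg := notification(host, check_disk_usage(used))) is not None
]
print(messages)
```

In a real deployment the message would be handed to email or an SMS gateway; the sketch only builds the text.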
Reliability Issues
• Tight monitoring of the system is one key to reliability, but...
• Typical analysis user needs to access huge amounts of data
• In large systems, there will always be a certain fraction of
  • servers which are down or unreachable
  • disks which are broken
  • files which are corrupt
• It is hopeless to operate a large system on the assumption that everything is always working
  • this is even more true for commodity hardware
• Ideally, the user should not even notice that a certain disk has died, etc.
  • jobs should continue
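One way to keep jobs running when a server or disk has died is to try several copies of a file in turn. The sketch below assumes that replicas exist on more than one server; the server names, the `fetch` callback and the exception class are hypothetical and do not describe the actual ZEUS or dCache implementation.

```python
class ReplicaUnavailable(Exception):
    """Raised when no replica of a file could be read."""


def read_with_failover(filename, servers, fetch):
    """Try each server in turn; return the first successful read."""
    errors = []
    for server in servers:
        try:
            return fetch(server, filename)
        except (OSError, ReplicaUnavailable) as exc:
            errors.append((server, str(exc)))  # remember the failure, try the next copy
    raise ReplicaUnavailable(f"all replicas failed for {filename}: {errors}")


def fake_fetch(server, filename):
    """Stand-in for real file access: pretend that server fs01 is down."""
    if server == "fs01":
        raise OSError("server unreachable")
    return f"{filename} from {server}".encode()


data = read_with_failover("run1234.ntuple", ["fs01", "fs02", "fs03"], fake_fetch)
print(data)
```

The user's job sees only the returned data; the failure of `fs01` is absorbed by the access layer, which is the behaviour the last bullet asks for.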
Summary
• Commodity computing has taken over in HEP computing
• Commodity equipment gives unprecedented computing power, but requires a dedicated fabric to work reliably
  • redundant farm setups
  • redundant disk technology
  • efficient disk caching system for tape data
Outlook
• Our present performance benefits come largely from the fact that devices developed for video games can also be used for serious computing
• What will come after the classical PC?
  • network computers?
  • play stations?
  • …
• On the horizon: GRID computing