The SLAC Cluster
Chuck Boeheim, Assistant Director, SLAC Computing Services
Components
• Solaris Farm: 900 single-CPU units
• Linux Farm: 512 dual-CPU units
• AFS: 7 servers, 3 TB
• NFS: 21 servers, 16 TB
• Objectivity: 94 servers, 52 TB
• LSF: master, backup, license servers
• HPSS: master + 10 tape movers
• Interactive: 25 servers + E10000
• Build Farm: 12 servers
• Network: 9 Cisco 6509 switches
Staffing
• Same staff supports most Unix desktops on site
Physical
• Racking, power, cooling, seismic bracing, network
• Remote power management
• Remote console management
• Installation
• Burn-in; DOAs (dead-on-arrival units)
• Maintenance
• Replacement burn-in
• Divergence from original models
• Locating a machine
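With well over a thousand rack-mounted nodes, "locating a machine" implies an inventory mapping hostnames to physical positions. A minimal sketch of such a lookup; the host names and rack layout below are hypothetical examples, not SLAC's actual records:

```python
# Minimal inventory lookup: hostname -> (building, rack, slot).
# All names and positions here are made-up illustrations.
INVENTORY = {
    "farm001": ("B950", "rack-03", 12),
    "farm002": ("B950", "rack-03", 13),
    "afs-srv1": ("B950", "rack-17", 4),
}

def locate(host):
    """Return a human-readable physical location for a host."""
    try:
        building, rack, slot = INVENTORY[host]
    except KeyError:
        return f"{host}: not in inventory"
    return f"{host}: {building} / {rack} / slot {slot}"
```

In practice the mapping would live in a database kept in sync with installs and replacements, since divergence between records and racks is exactly the failure mode the slide warns about.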
Networking
• Gigabit to servers
• 100 Mbit to farm nodes
• Speed-matching problems at switches
• Network glitches and storms
• Network monitoring
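Speed-matching trouble at switch ports often shows up as error counters climbing on the mismatched links. A sketch of the kind of check a monitor might run between two counter polls; the port names and data shape are illustrative assumptions, not a real switch API:

```python
def flag_error_ports(before, after, threshold=0):
    """Compare two polls of per-port error counters and return
    (port, delta) pairs whose errors grew by more than `threshold`,
    worst first. `before` and `after` map port name -> error count."""
    flagged = []
    for port, errs in after.items():
        delta = errs - before.get(port, 0)
        if delta > threshold:
            flagged.append((port, delta))
    return sorted(flagged, key=lambda p: -p[1])
```

A real deployment would fill the dictionaries from SNMP counters and alert only on sustained growth, not single glitches.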
System Admin
• Network install (256 machines in < 1 hr)
• Patch management
• Power up/down
• Nightly maintenance
• System Ranger (monitoring)
• Report summarization
• “A Cluster is a large Error Amplifier”
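If a cluster is a large error amplifier, one fault gets reported by hundreds of nodes at once; report summarization collapses those duplicates into a count per message. A minimal sketch, assuming each report arrives as a "host: message" line:

```python
from collections import defaultdict

def summarize(log_lines):
    """Collapse identical messages reported by many hosts into one
    summary line each, most widespread problem first."""
    by_msg = defaultdict(set)
    for line in log_lines:
        host, _, msg = line.partition(": ")
        by_msg[msg].add(host)
    return [f"{len(hosts)} hosts: {msg}"
            for msg, hosts in sorted(by_msg.items(),
                                     key=lambda kv: -len(kv[1]))]
```

The point is that the operator reads one line ("412 hosts: nfs server not responding") instead of 412 identical alerts.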
User Application Issues
• Workload scheduling
  • Startup effects
  • Distribution vs hot spots
• System and network limits
  • File descriptors
  • Memory
  • Cache contention
  • NIS, DNS, AMD
• Job scheduling
• Test beds
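One common way to soften startup effects, where hundreds of jobs launch together and hammer the same file or license servers, is to add random jitter to job start times. A small sketch of that idea; the window length is an arbitrary example, not a SLAC setting:

```python
import random

def staggered_starts(n_jobs, window_s, seed=None):
    """Assign each job a random start offset within `window_s`
    seconds, so a batch of jobs does not hit shared servers
    (NFS, AFS, NIS, license) at the same instant."""
    rng = random.Random(seed)  # seedable for reproducible schedules
    return [rng.uniform(0.0, window_s) for _ in range(n_jobs)]
```

Each job would sleep for its offset before opening shared files, spreading the startup load across the window instead of spiking it.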