High Performance Computing at Saint Louis University. Keith R. Hacke, Director, ITS Advanced Technology Group
Agenda • HPC Defined • HPC at SLU • Proof of Concept will get you started for no cost • HPC Architecture Overview • HPC Software Survey – what we explored • Intel or AMD. Ethernet or InfiniBand • Xeon 5500 Series (Nehalem) • Monitoring • Additional Performance Enhancements • Lustre File System, InfiniBand, Solid State Disks • Energy Savings • Visualization Feature (Rocks Roll) • Top500 Supercomputer List • HPC Cluster Issues • Resources
HPC Defined • High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computational problems, including those in scientific research. • The term originally applied only to supercomputers used for scientific research; it has since migrated to racks of high-speed, low-cost servers tied together for greater speed and capacity. • A main area of this discipline is developing parallel processing algorithms and software: programs that can be divided into small pieces so that each piece can be executed simultaneously by a separate processor.
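The idea of dividing a program into small pieces can be sketched in Python. This is a minimal illustration under stated assumptions, not SLU's code: `block_ranges` and `sum_of_squares` are hypothetical helpers, and the thread pool merely stands in for separate processors (in CPython, CPU-bound work needs processes or cluster nodes to actually run in parallel).

```python
from concurrent.futures import ThreadPoolExecutor


def block_ranges(n, p):
    """Split n items into p contiguous (start, stop) blocks whose sizes differ by at most 1."""
    base, extra = divmod(n, p)
    ranges = []
    start = 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)  # the first `extra` blocks get one more item
        ranges.append((start, start + size))
        start += size
    return ranges


def sum_of_squares(data, workers=4):
    """Compute each block's partial result independently, then combine them."""
    def piece(bounds):
        start, stop = bounds
        return sum(x * x for x in data[start:stop])

    # Hypothetical stand-in for separate processors: one task per block.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(piece, block_ranges(len(data), workers)))


print(block_ranges(10, 3))               # [(0, 4), (4, 7), (7, 10)]
print(sum_of_squares(list(range(10))))   # 285
```

Contiguous blocks keep each piece's data together, which matters once the pieces live on different nodes.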
Beowulf Defined • Collection of commodity computers • Using commodity network • Running open-source operating system • Programming model is Message Passing
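The message-passing programming model can be illustrated without a cluster. The sketch below is a single-process simulation, not MPI: each hypothetical "rank" is a thread with its own mailbox (a `queue.Queue`), sums its share of the data, and sends the partial result to rank 0, which receives and combines the messages. On a real Beowulf cluster the same pattern would use MPI send/receive (or a reduction) between separate machines.

```python
import queue
import threading


def parallel_sum(data, nranks=4):
    """Simulated message-passing reduction: ranks share no variables, only messages."""
    inbox = [queue.Queue() for _ in range(nranks)]  # one mailbox per rank
    result = {}

    def rank_main(rank):
        partial = sum(data[rank::nranks])  # cyclic decomposition of the data
        if rank == 0:
            total = partial
            for _ in range(nranks - 1):
                _source, value = inbox[0].get()  # blocking receive
                total += value
            result["total"] = total
        else:
            inbox[0].put((rank, partial))  # send (source rank, payload) to rank 0

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(nranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]


print(parallel_sum(list(range(101))))  # 5050
```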
Why all this HPC stuff?? • SLU needed to provide better support to its researchers • Behind in cluster/parallel computing services • Foundation to increase grant awards • Let scientists focus on science, not computer administration • Core set of parallel applications needed (used at most major universities)
SLU’s Goals • Support services for critical core research applications with unified infrastructure hardware (databases, MrBayes, PAUP, Gaussian 09, parallel programs, MATLAB, Mathematica, visualization, SAS, SPSS, etc.) • Improve management and monitoring of infrastructure to proactively detect and solve problems • Increase performance “on the fly”; add capacity for future needs • Unify purchasing: use our purchasing power and timing to drive down costs • Centralize infrastructure support while continuing the distributed application support model, to reduce duplication of staff • Create a more highly skilled talent pool to manage a more complex system, yet meet SLU’s application computing needs with fewer people • Streamline and focus audit controls, change management controls, and security requirements into a unified solution, also reducing support and staffing needs
Scope • Support for research projects in, for example: • mathematics and computer science • chemistry • biology • physics • economics Yes, one HPC can support all these needs!
The Architecture [Diagram: users on the SLU backbone network reach a shared front end node that provides monitoring, compilers, applications, parallel tools, and job submission. Two clusters sit behind it: the existing High Performance Computing (HPC) cluster, whose compute nodes use Gigabit Ethernet switches for maintenance and InfiniBand switches for job submission and job data I/O to the storage array; and the Chemistry/DoD High Throughput Computing (HTC) cluster, which has its own front end node and uses Gigabit Ethernet for maintenance, job submission, and job data I/O, with high-throughput storage dedicated to DoD/Chemistry plus existing storage.]
Getting Started Focus on your work (research), not the cluster! Proof of concept!
Boeing Donation + Student Labor Lead to Winning an Army Contract for CFD Simulations: Up to $250K and 50 Nodes! • Proof of concept • Student labor • 32-bit • 100 Mbps Ethernet • Demonstrated to the Army that we could: • Build a cluster • Run parallel jobs • 10x speedup!
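A figure like "10x speedup" is the ratio of serial run time to parallel run time. The helpers below are a sketch; the timings and node count in the example are hypothetical, since the proof-of-concept measurements are not given here. `amdahl_speedup` applies Amdahl's law, which bounds the achievable speedup by the fraction of the program that must run serially.

```python
def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Speedup per processor; 1.0 would mean perfect linear scaling."""
    return speedup(t_serial, t_parallel) / processors


def amdahl_speedup(serial_fraction, processors):
    """Amdahl's law: best-case speedup when `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


# Hypothetical run: 1000 s on one node, 100 s on 16 nodes.
print(speedup(1000, 100))          # 10.0
print(efficiency(1000, 100, 16))   # 0.625
print(amdahl_speedup(0.05, 16))    # ~9.14: 5% serial work caps 16 nodes near 9x
```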
Internship Program at SLU • Technology solutions using zero to minimal costs to increase learning outcomes for students • Provide students with learning experiences that bridge professional career development and enhance experiential learning. • Effective student involvement/labor to engage students and draw in faculty • Exploring green solutions for computing increases sustainability – real life stuff! AVAILABLE FOR INTERNSHIPS!!!
Top500 Supercomputer Sites: • 83% cluster architecture (16% MPP) • 78% Linux • Interconnect: GigE 52%, InfiniBand 36%
HPC Solutions Explored • Dell/Sun/HP/IBM – each has solutions $$$($$) • BCCD – The Bootable Cluster CD • http://bccd.net/ • A Computational "Wet Lab" for distributed computing! A cluster in your pocket! • Kerrighed • http://www.kerrighed.org/wiki/index.php/Main_Page • Kerrighed is a Single System Image operating system for clusters. Kerrighed offers the view of a single SMP machine on top of a cluster of standard PCs. • ClusterKnoppix • http://clusterknoppix.sw.be/ Dead project?? • OpenMosix • http://openmosix.sourceforge.net/ (Died 3/2008)
openMosix Live CD: Cluster in 20 minutes! Does not touch the hard drive!
Solutions Explored • MOSIX/MOSIX2 • Multicomputer Operating System for UnIX • An online management system targeted at high performance computing on Linux clusters, multi-clusters and Clouds • It supports both interactive processes and batch jobs • A multi-cluster operating system that incorporates automatic resource discovery and dynamic workload distribution, features commonly found on single computers with multiple processors (think SMP over the cluster) • Low cost - $ • MOSIX Reach the Clouds (MRC) is a tool that allows applications to start on your workstation or your cluster and run in remote nodes on other clusters, e.g. on Clouds, without pre-copying files to these remote nodes. • http://www.mosix.org/index.html
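Dynamic workload distribution can be illustrated with a greedy rule: send each new job to whichever node currently carries the least load. This is a simplified sketch, not MOSIX's actual algorithm (MOSIX also migrates processes that are already running); the job costs and node names are hypothetical.

```python
import heapq


def assign_jobs(job_costs, nodes):
    """Place each job, in arrival order, on the node with the smallest accumulated load.

    Returns {node_name: [job indices]}. Ties are broken by node name.
    """
    heap = [(0.0, name) for name in nodes]  # (current load, node name)
    heapq.heapify(heap)
    placement = {name: [] for name in nodes}
    for job, cost in enumerate(job_costs):
        load, name = heapq.heappop(heap)    # least-loaded node right now
        placement[name].append(job)
        heapq.heappush(heap, (load + cost, name))
    return placement


print(assign_jobs([5, 3, 3, 1], ["node1", "node2"]))
# {'node1': [0, 3], 'node2': [1, 2]}  (both nodes end with a load of 6)
```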
Solutions Explored • Rocks Clusters • Rocks is an open-source Linux cluster distribution that enables end users to easily build computational clusters, grid endpoints and visualization tiled-display walls. • 1575 registered clusters, 122,542 CPUs • Roll-based installations • Sun Grid Engine supported (an open-source batch-queuing system developed and supported by Sun Microsystems; N1 Grid Engine is the commercial version) • Ganglia monitoring • BIO roll – many biology applications of interest at SLU • NSF grant funded (2000-2007), with follow-up grants
Intel or AMD • Financial Position (9/2009) • Intel • $11.6B in cash • Long-term debt $1.2B • Net cash position is $10.5B • AMD • Short-term investments rose to $2.5B • Long-term debt $5.2B • Net cash position is -$2.7B • Intel can keep funding aggressive R&D, next-generation fabs and other capital requirements
Sweet Spot = 2.66GHz; SLU chose the X5550 @ 2.66GHz
New Nehalem (Xeon 5500) Plug-compatible with the next-generation Intel processor (Westmere) = 2-4 more years of life = 2X performance likely = same power footprint
New Nehalem (Xeon 5500) • Power Savings under load • 14% • For SLU that is $250-$500 for power and cooling
Ethernet or InfiniBand InfiniBand = $1500/compute node http://www.top500.org/charts
Compute Node: Sun Fire X2270 Server • 1 rack unit • 2 Gigabit Ethernet ports and one network management port • 12 1333MHz registered DDR3 DIMM slots (48GB memory) • Maximum memory bandwidth of 64GB/s • Up to four 3.5-inch SATA drives (4TB), hot-swappable or fixed • Can accommodate four 32GB solid state drives • One 16-lane, half-length, low-profile PCI Express 2.0 slot providing 16GB/s of bandwidth
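The headline bandwidth figures follow from standard interface arithmetic. This sketch assumes two Xeon 5500 sockets with three DDR3 channels each (six channels total), a 64-bit (8-byte) memory bus per channel, and PCI Express 2.0 signaling at 5 GT/s per lane with 8b/10b encoding; the 16GB/s PCIe figure counts both directions.

```python
def ddr3_peak_gb_per_s(mega_transfers, channels, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s (decimal units)."""
    return mega_transfers * bytes_per_transfer * channels / 1000


def pcie2_gb_per_s(lanes, both_directions=True):
    """PCIe 2.0: 5 GT/s per lane, 8b/10b encoding -> 0.5 GB/s per lane per direction."""
    per_direction = lanes * 5.0 * 8 / 10 / 8  # raw Gb/s -> data Gb/s -> GB/s
    return per_direction * (2 if both_directions else 1)


# 2 sockets x 3 channels of DDR3-1333
print(round(ddr3_peak_gb_per_s(1333, channels=2 * 3)))  # 64
print(pcie2_gb_per_s(16))                               # 16.0
```

These are theoretical peaks; measured bandwidth (for example with a STREAM-style benchmark) will be lower.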
Cost • Front End - Dual Nehalem X5570 2.93GHz quad-core CPUs, 32GB DDR3 memory (Sun Fire X4270) • Compute Nodes - Dual Nehalem X5550 2.66GHz quad-core CPUs, 12GB DDR3 memory, 500GB scratch disk space each (Sun Fire X2270) • Storage - 24 Terabyte NAS storage array, 32GB memory (Sun Fire X4540), NFS-attached via Gigabit Ethernet • 74 GFLOPS/CPU x 2 CPUs x 20 dual-processor quad-core compute nodes = 2.96 TFLOPS • For $68K, or $22.97/GFLOP (excludes $25K for the front end and storage)
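The cost arithmetic above can be checked in a few lines. The sketch simply reuses the slide's own figures (74 GFLOPS per CPU, two CPUs per node, 20 nodes, $68K, plus $25K for front end and storage); it does not independently derive the per-CPU peak.

```python
def cluster_peak_gflops(gflops_per_cpu, cpus_per_node, nodes):
    """Aggregate theoretical peak of the compute nodes, in GFLOPS."""
    return gflops_per_cpu * cpus_per_node * nodes


def dollars_per_gflop(cost, peak_gflops):
    return cost / peak_gflops


peak = cluster_peak_gflops(74, 2, 20)
print(peak / 1000)                                          # 2.96 (TFLOPS)
print(round(dollars_per_gflop(68_000, peak), 2))            # 22.97
print(round(dollars_per_gflop(68_000 + 25_000, peak), 2))   # 31.42 with front end and storage
```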
Additional Performance How can we speed things up even more?
Lustre File System • Object-based cluster file system. High performance I/O and metadata throughput. • A Lustre file system has three major functional units: • A single metadata target (MDT) per filesystem that stores metadata, such as filenames, directories, permissions, and file layout, on the metadata server (MDS) • One or more object storage servers (OSSes) that store file data on one or more object storage targets (OSTs). Depending on the server’s hardware, an OSS typically serves between two and eight targets, each target a local disk filesystem up to 8 terabytes (TB) in size. The capacity of a Lustre file system is the sum of the capacities provided by the targets • Client(s) that access and use the data. Lustre presents all clients with standard POSIX semantics and concurrent read and write access to the files in the filesystem. • File I/O % of raw bandwidth: >90% • Achieved single OSS I/O: >2.5 GB/s • Achieved single client I/O: >2.0 GB/s • Achieved aggregate I/O: 240 GB/s • Metadata transaction rate: 15,000 ops/s • Maximum clients supported: 100,000 • Maximum file/file system size: 320 TB/>32 PB http://www.sun.com/software/products/lustre/
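The file layout kept on the MDS records how a file is striped across OSTs. Assuming the usual round-robin (RAID-0 style) striping, the sketch below finds which of a file's OSTs holds a given byte and where that byte sits inside the OST's object. The 1 MiB stripe size, stripe count of 4, and OST sizes are illustrative values, not SLU's configuration; the second helper restates that capacity is the sum of the targets.

```python
MiB = 1024 * 1024


def locate(offset, stripe_size, stripe_count):
    """Return (stripe index within the file's layout, byte offset inside that OST object)."""
    stripe_number = offset // stripe_size          # which stripe-sized chunk of the file
    ost_index = stripe_number % stripe_count       # chunks are dealt round-robin to the OSTs
    object_offset = (stripe_number // stripe_count) * stripe_size + offset % stripe_size
    return ost_index, object_offset


def filesystem_capacity_tb(ost_sizes_tb):
    """Capacity of a Lustre file system is the sum of its OST capacities."""
    return sum(ost_sizes_tb)


print(locate(5 * MiB + 10, MiB, 4))         # (1, 1048586): 6th chunk -> 2nd OST, its 2nd chunk
print(filesystem_capacity_tb([8] * 3 * 4))  # 96: e.g. 3 OSSes x 4 OSTs x 8 TB
```

Striping is why a single large file can be read at more than one OST's bandwidth: consecutive chunks come from different servers in parallel.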
InfiniBand • Industry-standard specification that defines an input/output architecture used to interconnect high performance servers and storage • A switched fabric communications link • Features include QoS and low latency • Low overhead • Up to 120 Gb/s • $1500/node
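InfiniBand speeds are quoted as signaling rates: link width (1x, 4x, or 12x lanes) times the per-lane rate. SDR, DDR, and QDR links use 8b/10b encoding, so only 80% of the signaling rate carries data. The sketch, which covers only those three generations, shows that the 120 Gb/s figure corresponds to a 12x QDR link with 96 Gb/s of usable data bandwidth.

```python
# Per-lane signaling rates in Gb/s; all three generations use 8b/10b encoding.
LANE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}


def infiniband_rates(width, speed):
    """Return (signaling rate, data rate) in Gb/s for a link of `width` lanes."""
    signaling = width * LANE_GBPS[speed]
    data = signaling * 8 / 10  # 8 data bits per 10 transmitted bits
    return signaling, data


print(infiniband_rates(4, "QDR"))   # (40.0, 32.0) for a 4x QDR link
print(infiniband_rates(12, "QDR"))  # (120.0, 96.0)
```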
Solid State Disks (SSD) • A data storage device that uses solid-state memory to store data • Emulates a disk drive • NAND flash components reduce cost (but have limited write cycles!) • Dense and cool (no moving parts); support 10 to 20 operations in parallel • At least 3X orders lower latency than spinning drives • Still at least 2X the cost per GB of spinning drives • Disparity in read and write speeds (35K IOPS read, 3.3K IOPS write) Solid state drive mounted in a 3.5-inch SATA carrier
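The read/write disparity matters for mixed workloads. Under a simple model (an assumption, not a vendor formula) in which each operation occupies the drive for 1/IOPS seconds and reads and writes do not overlap, the effective rate is a weighted harmonic mean. With the slide's 35K read and 3.3K write IOPS, a hypothetical 70% read mix lands near 9K IOPS, much closer to the write figure than the read figure.

```python
def mixed_iops(read_iops, write_iops, read_fraction):
    """Effective IOPS when `read_fraction` of operations are reads (serial time model)."""
    seconds_per_op = read_fraction / read_iops + (1 - read_fraction) / write_iops
    return 1 / seconds_per_op


for reads in (1.0, 0.9, 0.7, 0.5):
    print(f"{reads:.0%} reads: {mixed_iops(35_000, 3_300, reads):,.0f} IOPS")
```

Real drives overlap operations internally, so treat this as a rough bound on how strongly slow writes drag down a mixed workload.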
Energy Savings Watts up meters are "plug load" meters that measure the amount of electricity used by whatever is plugged into them. A USB connection to a PC collects the recorded data; the .net version sends data to the Watts Up web site. $95 to $235 https://www.wattsupmeters.com/secure/products.php?pn=0
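Meter logs like these turn into energy and cost figures with simple arithmetic. The sketch assumes the log has already been reduced to a list of watt readings taken at a fixed interval (not necessarily the meter's export format), and the $0.10/kWh electricity price is a hypothetical placeholder.

```python
def energy_kwh(watt_samples, interval_s):
    """Energy from fixed-interval power readings: sum(W) x seconds = joules -> kWh."""
    joules = sum(watt_samples) * interval_s
    return joules / 3_600_000  # 1 kWh = 3.6 million joules


def annual_cost(average_watts, dollars_per_kwh):
    """Cost of drawing `average_watts` continuously for a 365-day year."""
    kwh_per_year = average_watts * 24 * 365 / 1000
    return kwh_per_year * dollars_per_kwh


# One hour of 1-second readings from a node drawing a steady 300 W.
print(energy_kwh([300] * 3600, 1))       # 0.3
print(round(annual_cost(300, 0.10), 2))  # 262.8 at a hypothetical $0.10/kWh
```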
Energy Stats Saved to Web! https://www.wattsupmeters.com/secure/products.php?pn=0
HPC Energy Usage On a CFD problem, the Sun servers delivered 2X the performance of the Dell 1950 • $500 annual power and AC savings PER node [Chart: performance vs. threads]
HPC Cluster Issues • Clusters are phenomenal price/performance computational engines • Can be hard to manage without expertise • Software skew – software differences between nodes • Adequate job control of parallel processes • High-performance I/O is still difficult ($$$) • The effort to find where something has failed grows at least linearly with cluster size • Programming could be vastly improved • Technology is changing very rapidly
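Software skew is easier to spot when every node's installed packages are compared side by side. The sketch assumes you have already gathered a {package: version} map for each node (for example, by parsing `rpm -qa` output collected from each node); the node names and versions are hypothetical. It reports every package that is missing or at a different version on at least one node.

```python
def find_skew(manifests):
    """manifests: {node: {package: version}}.

    Returns {package: {version: [nodes]}} for every package whose version
    (or presence) is not identical across all nodes.
    """
    packages = set()
    for installed in manifests.values():
        packages.update(installed)
    skew = {}
    for package in sorted(packages):
        by_version = {}
        for node, installed in sorted(manifests.items()):
            version = installed.get(package, "<missing>")
            by_version.setdefault(version, []).append(node)
        if len(by_version) > 1:  # more than one distinct version means skew
            skew[package] = by_version
    return skew


nodes = {
    "compute-0-0": {"gcc": "4.1.2", "openmpi": "1.3.3"},
    "compute-0-1": {"gcc": "4.1.2", "openmpi": "1.3.2"},
    "compute-0-2": {"gcc": "4.1.2"},
}
print(find_skew(nodes))
# {'openmpi': {'1.3.3': ['compute-0-0'], '1.3.2': ['compute-0-1'], '<missing>': ['compute-0-2']}}
```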
Resources • http://www.hpcwire.com • http://www.clustermonkey.net/ • http://www.linuxhpc.org/