This document outlines the operational capabilities of the NSF's Terascale Computing System (TCS) as of March 2003. It addresses key performance metrics, including computational and communication efficiency, while presenting user feedback on system reliability and performance satisfaction. Key insights include a performance peak nearing 74% on LINPACK benchmarks, with users reporting high levels of usage and a need for more computational time. The TCS, designed for advanced scientific and engineering applications, integrates cutting-edge hardware and software to support its operations.
SOS7: “Machines Already Operational” • NSF’s Terascale Computing System • SOS-7, March 4-6, 2003 • Mike Levine, PSC
Outline • Overview of TCS, the US-NSF’s Terascale Computing System. • Answering 3 questions: • Is your machine living up to performance expectations? … • What is the MTBI? … • What is the primary complaint, if any, from users? • [See also PSC web pages & Rolf’s info.]
Q1: Performance • Computational and communications performance is very good! • Alpha processors & ES45 servers: very good • Quadrics bandwidth & latency: very good • ~74% of peak on Linpack; >76% on LSMS • More work needed on disk I/O. • This has been a very easy “port” for most users. • Easier than some Cray-to-Cray upgrades.
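As a rough cross-check, the ~74% Linpack figure can be related to the machine's peak using only the per-node numbers quoted later in this deck (750 ES45 nodes, 4 EV68 CPUs/node, 2 flops/cycle at 1 GHz). The Python sketch below is illustrative arithmetic, not a reproduction of the actual benchmark run.

```python
# Back-of-the-envelope check of "~74% of peak on Linpack",
# using only figures quoted elsewhere in these slides.
nodes, cpus_per_node = 750, 4
gflops_per_cpu = 2.0          # 1 GHz EV68, 2 floating-point ops per cycle

peak_tflops = nodes * cpus_per_node * gflops_per_cpu / 1000.0
linpack_tflops = 0.74 * peak_tflops

print(f"Peak:    {peak_tflops:.1f} Tflop/s")                      # ~6.0 Tflop/s
print(f"Linpack: {linpack_tflops:.1f} Tflop/s at ~74% of peak")   # ~4.4 Tflop/s
```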
Q2: MTBI (Monthly Average) • Compare with theoretical prediction of 12 hrs. • Expect further improvement (fixing systematic problems).
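For orientation, a ~12-hour theoretical MTBI is roughly what a simple independent-failure model predicts at this node count. The per-node MTBF below is a hypothetical value chosen to illustrate the model, not a figure from the slides.

```python
# Simple independent-failure model: system MTBI ~ per-node MTBF / N.
# The per-node MTBF here is an assumed, illustrative value.
nodes = 750 + 13 + 2                # compute nodes + inline spares + login nodes
per_node_mtbf_hours = 9200.0        # hypothetical, roughly a year per node

system_mtbi_hours = per_node_mtbf_hours / nodes
print(f"Predicted system MTBI: {system_mtbi_hours:.1f} hours")  # ~12 hours
```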
Time Lost to Unscheduled Events • Purple: nodes requiring cleanup • Worst case is ~3%
Q3: Complaints • #1: “I need more time” (not a complaint about performance) • Actual usage >80% of wall clock • Some structural improvements still in progress. • Not a whole lot more is possible! • Work needed on • Rogue OS activity. [recall Prof. Kale’s comment] • MPI & global reduction libraries. [ditto] • System debugging and fragility. • IO performance. • We have delayed full disk deployment to avoid data corruption & instabilities. • Node cleanup • We detect & hold out problem nodes until staff clean. • All in all, the users have been VERY pleased. [ditto]
Full Machine Job • This system is capable of doing big science
TCS (Terascale Computing System) & ETF • Sponsored by the U.S. National Science Foundation • Serving the “very high end” for US academic computational science and engineering • Designed to be used, as a whole, on single problems (recall the full-machine job) • Full range of scientific and engineering applications • Compaq AlphaServer SC hardware and software technology • #6 in the Top 500; largest open facility in the world (Nov 2001) • TCS-1: in general production since April 2002 • Integrated into the PACI program (Partnerships for Advanced Computational Infrastructure) • DTF (Distributed Terascale Facility) project to build and integrate multiple systems • NCSA, SDSC, Caltech, Argonne; multi-lambda, transcontinental interconnect • ETF, aka TeraGrid (Extensible Terascale Facility), integrating TCS with DTF to form • A heterogeneous, extensible scientific/engineering cyberinfrastructure Grid
Infrastructure: PSC TCS machine room (@ Westinghouse) (Did not require a new building; just a pipe & wire upgrade; not maxed out) • ~8k ft2 • Using ~2.5k ft2 • Existing room (16 yrs old)
Full System: Physical Structure, Floor Layout [floor-plan diagram: control, disks, servers, switch, and compute-node cabinets] • Geometrical constraints invariant twixt US & Japan
Terascale Computing System: Compute Nodes • 750 ES45 4-CPU servers • +13 inline spares • (+2 login nodes) • 4 EV68s/node • 1 GHz = 2 Gf/CPU [6 Tf] • 4 GB memory [3.0 TB] • 3*18.2 GB disk [41 TB] • System • User temporary • Fast snapshots • [~90 GB/s] • Tru64 Unix
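The bracketed aggregates on this slide follow from the per-node figures. A minimal sketch, assuming the 750 production nodes and 2 flops/cycle per EV68:

```python
# Aggregate figures recomputed from the per-node numbers on this slide.
nodes = 750
cpus_per_node, gf_per_cpu = 4, 2
mem_gb_per_node = 4
disks_per_node, gb_per_disk = 3, 18.2

total_cpus   = nodes * cpus_per_node                         # 3000 EV68s
peak_tf      = total_cpus * gf_per_cpu / 1000                # 6 Tf
total_mem_tb = nodes * mem_gb_per_node / 1000                # ~3.0 TB
node_disk_tb = nodes * disks_per_node * gb_per_disk / 1000   # ~41 TB

print(total_cpus, peak_tf, total_mem_tb, round(node_disk_tb, 1))
```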
ES45 nodes • 5 nodes per cabinet • 3 local disks /node
Terascale Computing System: Quadrics Network • 2 “rails” • Higher bandwidth (~250 MB/s/rail) • Lower latency: ~2.5 µs put latency • 1 NIC/node/rail • Federated switch (/rail) • “Fat-tree” (bisection bandwidth ~0.2 TB/s) • User virtual-memory mapped • Hardware retry • Heterogeneous (Alpha Tru64 & Linux, Intel Linux)
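The quoted ~0.2 TB/s bisection bandwidth is consistent with the per-rail link rate if one assumes a full-bisection fat-tree with one link per node per rail across the bisection (my assumption, not stated on the slide):

```python
# Rough bisection-bandwidth estimate under a full-bisection fat-tree assumption.
nodes, rails = 750, 2
mb_per_s_per_rail = 250

bisection_gb_s = (nodes / 2) * rails * mb_per_s_per_rail / 1000
print(f"Bisection bandwidth: ~{bisection_gb_s:.0f} GB/s")  # ~188 GB/s, i.e. ~0.2 TB/s
```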
Central Switch Assembly • 20 cabinets in center • Minimize max internode distance • 3 out of 4 rows shown • 21st LL switch, outside (not shown)
Terascale Computing System: Management & Control • Quadrics switch control: internal SBC & Ethernet • “Insight Manager” on PCs • Dedicated systems • Cluster/node monitoring & control • RMS database • Ethernet & serial link [diagram: control, LAN, Quadrics, compute nodes]
Terascale Computing System: Interactive Nodes • Dedicated: 2*ES45 • +8 on compute nodes • Shared-function nodes • User access • Gigabit Ethernet to WAN • Quadrics connected • /usr & indexed store (ISMS) [diagram: control, LAN, Quadrics, compute nodes, /usr, WAN/LAN]
Terascale Computing System: File Servers • 64, on compute nodes • 0.47 TB/server [30 TB] • ~500 MB/s [~32 GB/s] • Temporary user storage • Direct I/O • /tmp • [Each server has 24 disks on 8 SCSI chains on 4 controllers to sustain full drive bandwidth.] [diagram: control, LAN, Quadrics, compute nodes, file servers, /tmp, /usr, WAN/LAN]
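The bracketed totals follow directly from the per-server figures on this slide; a quick illustrative check:

```python
# File-server aggregates recomputed from per-server figures.
servers = 64
tb_per_server, mb_s_per_server = 0.47, 500

total_tb   = servers * tb_per_server            # ~30 TB
total_gb_s = servers * mb_s_per_server / 1000   # ~32 GB/s
print(f"{total_tb:.0f} TB total, ~{total_gb_s:.0f} GB/s aggregate")
```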
Terascale Computing System: Summary • 750+ ES45 compute nodes • 3000 EV68 CPUs @ 1 GHz • 6 Tf • 3.0 TB memory • 41 TB node disk, ~90 GB/s • Multi-rail fat-tree network • Redundant monitor/control • WAN/LAN accessible • File servers: 30 TB, ~32 GB/s • Buffer disk store, ~150 TB • Parallel visualization • Mass store, ~1 TB/hr, >1 PB • ETF coupled (heterogeneous) [diagram: full system: control, LAN, Quadrics, compute nodes, file servers, /tmp, /usr, WAN/LAN]
Terascale Computing System: Visualization • Intel/Linux • Newest software • ~16 nodes • Parallel rendering • HW/SW compositing • Quadrics connected • Image output • Web pages + TCS • WAN coupled [diagram: TCS to Quadrics at 340 GB/s (1520Q); application gateways, viz, and buffer disk at 4.5 GB/s (20Q), 3.6 GB/s (16Q), and 3.6 GB/s (16Q)]
Terascale Computing System: Buffer Disk & HSM • Quadrics coupled (~225 MB/s/link) • Intermediate between TCS & HSM • Independently managed • Private transport from TCS • Archive disk • >360 MB/s to tape • HSM - LSCi • WAN/LAN & SDSC [diagram: TCS to Quadrics at 340 GB/s (1520Q); application gateways, viz, and buffer disk at 4.5 GB/s (20Q), 3.6 GB/s (16Q), and 3.6 GB/s (16Q)]
Terascale Computing System: Application Gateways • Quadrics coupled (~225 MB/s/link) • Coupled to ETF backbone by multiple GigE @ 30 Gb/s [diagram: TCS to Quadrics at 340 GB/s (1520Q); application gateways, viz, and buffer disk at 4.5 GB/s (20Q), 3.6 GB/s (16Q), and 3.6 GB/s (16Q)]
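The GB/s labels in these diagrams appear to be Quadrics link counts (“NQ” = N links) multiplied by the ~225 MB/s/link figure quoted above; that reading is my assumption, but it reproduces the numbers:

```python
# Reproduce the diagram bandwidths from link counts, assuming ~225 MB/s per link.
mb_s_per_link = 225
for links in (1520, 20, 16):
    gb_s = links * mb_s_per_link / 1000
    print(f"{links:>4} links -> {gb_s:.1f} GB/s")
# 1520 -> 342.0 GB/s (~340), 20 -> 4.5 GB/s, 16 -> 3.6 GB/s
```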
The Front Row • Yes, those are Pittsburgh sports’ colors.