Supercomputer Platforms and Its Applications Dr. George Chiu IBM T.J. Watson Research Center
Plasma Science – International Challenges
• Microturbulence & Transport: What causes plasma transport?
• Macroscopic Stability: What limits the pressure in plasmas?
• Wave-particle Interactions: How do particles and plasma waves interact?
• Plasma-wall Interactions: How can high-temperature plasma and material surfaces co-exist?
2007-2008 Deep Computing Roadmap Summary (timeline chart: 1H07, 2H07, 1H08, 2H08)
• System P Servers: PHV8, PL4/ML16, PL4/ML32, 11S0, P6H, P5 560Q+, p6 Blade, 11S2, PHV8, p6IH, HV4, p6 Blade
• System P Software:
  • p6 IH/Blades IB Solutions: AIX 6.1, CSM 1.7.0.x, GPFS 3.3, LoadLeveler 3.5, PE 5.1, ESSL 4.4, PESSL 3.3
  • JS21 IB AIX Solution: CSM 1.6/RSCT 2.4.7, GPFS 3.1, LoadLeveler 3.4.1, PESSL 3.3, PE 4.3.1
  • PERCS System Design Analysis
  • p6 IH/Blades IB Solutions: AIX 5.3 and SLES 10
  • Initial AIX 6.1 support for SMPs & Ethernet
  • Initial p6 support for SMPs & Ethernet
  • GPFS 3.2 (filesystem mgt), CSM 1.7
• System X Servers: x3455 DC, x3455 QC (Barcelona), x3550 Harpertown/Greencreek Refresh, x3550 QC, x3850 QC, x3755 QC, iDPX (Thurley Planar), iDPX (Stoakley Planar), LS Blades –> Barcelona QC, HS21, LS21, LS41
• System X Software: GPFS 3.3 and CSM 1.7.0.x support for System x/1350; GPFS 3.2 support for System x/1350; RHEL 5 support; CSM RHEL 5 support; CSM 1.6/RSCT 2.4.7*; CSM 1.7 for System x/1350*
• Workstations: M50 R1, M60 R1, M60, Z40 R1, Z30 R1, Z40 (*APro elim impacts DCV)
• Blue Gene: BG/L (EOL), Blue Gene/L LA, BG/P, 1st Petaflop*
• Blue Gene Software: BlueGene/P support: GPFS 3.2, CSM 1.7, LoadLeveler 3.4.2, ESSL 4.3.1
• Cell BE: QS20, QS21 Prototype, QS21 System Accept, QS22; SDK 2.1, SDK 3.0, SDK 4.0, SDK 5.0
• System Storage: DDN OEM Agreement, DCS9550, DCS9550 + EXP100 Attach, DS4800, DS4800 Follow-on, DS4700 for HPC
Server & systems legend: Specific but not exclusive; Specific & Exclusive; Repurposed (neither specific nor exclusive)
* 1st Petaflop dependent on BG client demand
Source: IBM Deep Computing Strategy – 7.18.07
IBM HPC roadmap
• Blue Gene: Blue Gene/L, Blue Gene/P, Blue Gene/Q
• POWER: Power 5, Power 6, Power 7
• Clusters and Blades
IBM HPC conceptual roadmap: POWER
• The POWER series is IBM’s mainstream computing offering
• Market is about 60% commercial and 40% technical
• Product line value proposition
  • General purpose computing engine
  • Robustness, security & reliability fitting mission-critical requirements
  • Standard programming model and interfaces
  • Performance leadership with competitive performance/price value
  • Robust integration with industry standards (hardware and software)
• Current status
  • POWER 6 announced
  • POWER 7 is underway
#6 in Top500 (Nov. 2007) www.top500.org
Accelerated Strategic Computing Initiative (platform chart, '95 to '05)
• Option Red: 1+ Tflop / 0.5 TB
• Option Blue: 3+ Tflop / 1.5 TB
• Option White: 10+ Tflop / 5 TB
• Turquoise: 30+ Tflop / 10 TB
• Purple: 100+ Tflop / 50 TB
ASC Purple
• 100 TF machine based on Power 5
• ~1500 8-way Power5 nodes
• Federation (HPS), ~12K CPUs (~1500 × 2 multi-plane fat-tree topology, 2x2 GB/s links)
• Communication libraries: < 5 µs latency, 1.8 GB/s unidirectional
• GPFS: 122 GB/s
• Supports NIF
POWER Server Roadmap
• POWER4 (2001): 180 nm, two 1.3 GHz cores
• POWER4+ (2002-3): 130 nm, two 1.7 GHz cores
• POWER5 (2004): 130 nm, two 1.9 GHz cores
• POWER5+ (2005-06): 90 nm, two 2.3 GHz cores
• POWER6 (2007): 65 nm, two 4.7 GHz cores
Chip diagram labels: Shared L2; Distributed Switch; L2 caches; Advanced System Features & Switch
Feature callouts (in chart order):
• Ultra High Frequency; Very Large L2; Robust Error Recovery; High ST and HPC Perf; High throughput Perf; More LPARs (1024)
• Enhanced memory subsystem; Distributed Switch; Simultaneous multi-threading; Sub-processor partitioning; Dynamic firmware updates; Enhanced scalability, parallelism; High throughput performance; Enhanced memory subsystem
• Reduced size; Lower power; Larger L2; More LPARs (32)
• Chip Multi Processing (Distributed Switch, Shared L2); Dynamic LPARs (16)
• Autonomic Computing Enhancements
*Planned to be offered by IBM. All statements about IBM’s future direction and intent are subject to change or withdrawal without notice and represent goals and objectives only.
MareNostrum at a Glance
IBM e1350 capability Linux cluster platform comprising 42 IBM eServer p615 servers, 2560 IBM eServer BladeCenter JS21 servers and IBM TotalStorage hardware
• 94 TF DP (64-bit), 186 TF SP (32-bit), 376 Tops (8-bit)
• 20 TB RAM, 370 TB disk
• Linux 2.6
• ~120 m², ~750 kW
• #1 in Europe, #9 in TOP500
Challenge
• Deliver world-class deep-computing and e-Science services with an attractive cost/performance ratio
• Enable collaboration among leading scientific teams in the areas of biology, chemistry, medicine, earth sciences and physics
Innovation
• Efficient integration of commercially available commodity components
• Modular and scalable open cluster architecture
  • computing, storage, networking, software, management, applications
• Diskless capability
  • improves node reliability, reducing installation and maintenance costs
• Record cluster density and power efficiency
• Leading price/performance and TCO in High Performance Computing
BlueGene/P packaging hierarchy
• Chip: 4 processors; 13.6 GF/s, 8 MB EDRAM
• Compute Card: 1 chip, 20 DRAMs; 13.6 GF/s, 2.0 GB DDR2 (4.0 GB is an option)
• Node Card: 32 chips (4x4x2); 32 compute, 0-1 IO cards; 435 GF/s, 64 GB
• Rack: 32 Node Cards, cabled 8x8x16; 13.9 TF/s, 2 TB
• System: 72 Racks, 72x32x32; 1 PF/s, 144 TB
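The packaging figures on this slide multiply up level by level, and a short check makes the arithmetic explicit. The sketch below is only an illustration: it takes the per-chip peak (13.6 GF/s) and the standard 2.0 GB per compute card from the slide, and the constant and function names are made up for the example.

```python
# Per-chip figures quoted on the slide.
CHIP_GFLOPS = 13.6      # peak GF/s per chip (4 processors)
CARD_MEMORY_GB = 2.0    # DDR2 per compute card (standard option, one chip per card)

# Packaging counts quoted on the slide.
CHIPS_PER_NODE_CARD = 32
NODE_CARDS_PER_RACK = 32
RACKS_PER_SYSTEM = 72


def aggregate(chips):
    """Return (peak GF/s, memory in GB) for a given number of chips."""
    return chips * CHIP_GFLOPS, chips * CARD_MEMORY_GB


node_card_chips = CHIPS_PER_NODE_CARD
rack_chips = node_card_chips * NODE_CARDS_PER_RACK
system_chips = rack_chips * RACKS_PER_SYSTEM

levels = {"node card": node_card_chips, "rack": rack_chips, "system": system_chips}
for name, chips in levels.items():
    gflops, mem_gb = aggregate(chips)
    print(f"{name:9s}: {chips:6d} chips, {gflops / 1000:8.1f} TF/s, {mem_gb / 1024:7.2f} TB")
```

The chip counts also match the torus dimensions quoted for each level: 4x4x2 for a node card, 8x8x16 for a rack, and 72x32x32 for the 72-rack system.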
System Power Efficiency Gflops/Watt
Failures per month at 100 TFlops (20 BG/L racks): unparalleled reliability
Chart values: 800, 394, 127, 1
Results of a survey conducted by Argonne National Lab on 10 clusters ranging from 1.2 to 365 TFlops (peak); excluding storage subsystem, management nodes, SAN network equipment, software outages
Classical MD – ddcMD: 2005 Gordon Bell Prize Winner!!
• Scalable, general purpose code for performing classical molecular dynamics (MD) simulations using highly accurate MGPT potentials
• MGPT semi-empirical potentials, based on a rigorous expansion of many-body terms in the total energy, are needed to quantitatively investigate the dynamic behavior of d-shell and f-shell metals
• 524 million atom simulations on 64K nodes achieved 101.5 TF/s sustained
• Superb strong and weak scaling for full machine (“very impressive machine” says PI)
Visualization of important scientific findings already achieved on BG/L (2,048,000 tantalum atoms): molten Ta at 5000 K demonstrates solidification during isothermal compression to 250 GPa
Qbox: First Principles Molecular Dynamics
Francois Gygi, UCD; Erik Draeger, Martin Schulz, Bronis de Supinski, LLNL; Franz Franchetti, Carnegie Mellon; John Gunnels, Vernon Austel, Jim Sexton, IBM
• Treats electrons quantum mechanically
• Treats nuclei classically
• Developed at LLNL
• BG support provided by IBM
• Simulated 1,000 Mo atoms with 12,000 electrons
• Achieves 207.3 Teraflops sustained (56.5% of peak)
Qbox simulation of the transition from a molecular solid (top) to a quantum liquid (bottom) that is expected to occur in hydrogen under high pressure.
Compute Power of the Gyrokinetic Toroidal Code
Number of particles (in millions) moved 1 step in 1 second
Plot labels: BG/L at Livermore, Cray XT3/XT4, BG/L Optimal, BG/L
Compute Power of the Gyrokinetic Toroidal Code
Number of particles (in millions) moved 1 step in 1 second
• BlueGene can reach 150 billion particles in 2008, >1 trillion in 2011.
• POWER6 can reach 1 billion particles in 2008, >0.3 trillion in 2011.
Plot labels: BG/P at 3.5 PF, P6 at 300 TF, BG/L at Livermore, IBM Power, BG/L Optimal, Cray XT3/XT4, BG/L
Rechenzentrum Garching at BG Watson: GENE
Strong scaling of GENE v11+ for a problem size of 300-500 GB, with measurement points for 1k, 2k, 4k, 8k and 16k processors, normalized to 1k processors. Quasi-linear scaling has been observed, with a parallel efficiency of 95% on 8k processors and 89% on 16k processors.
By Hermann Lederer*, Reinhard Tisma* and Frank Jenko+ (*RZG, +IPP), March 21-22, 2007
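For readers who prefer speedups to efficiencies, the quoted numbers convert directly, since strong-scaling efficiency is the measured speedup divided by the ideal one. The sketch below assumes "1k" means 1024 processors (the ratio is the same if it means 1000) and uses a hypothetical helper name.

```python
BASELINE_PROCS = 1024  # assumed meaning of "1k", the normalization point

# Parallel efficiencies quoted on the slide, keyed by processor count.
efficiency = {8 * 1024: 0.95, 16 * 1024: 0.89}


def speedup(procs, eff, baseline=BASELINE_PROCS):
    """Speedup over the baseline run implied by a strong-scaling efficiency."""
    return eff * procs / baseline


for procs, eff in efficiency.items():
    ideal = procs / BASELINE_PROCS
    print(f"{procs:6d} procs: ideal {ideal:4.0f}x, implied {speedup(procs, eff):5.2f}x")
```

Under that reading, the 16k-processor run is roughly 14 times faster than the 1k-processor run, against an ideal factor of 16.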
Summary
• IBM is closely involved in ITER applications through its collaborations
  • Princeton Plasma Physics Laboratory
  • Max-Planck-Institut für Plasmaphysik/Rechenzentrum Garching
  • Barcelona Supercomputing Center
  • Oak Ridge National Laboratory
• IBM is also involved in laser-plasma fusion through its collaborations
  • Lawrence Livermore National Laboratory
  • Forschungszentrum Jülich
• IBM offers multiple platforms to address ITER needs
  • POWER: high memory capacity/node, moderate interprocessor bandwidth, moderate scalability; capability and capacity machine
  • Blue Gene: low power, low memory capacity/node, high interprocessor bandwidth, highest scalability; capability and capacity applications
  • X Series and white box: moderate memory capacity/node, low interprocessor bandwidth, limited to moderate scalability; mostly capacity machine
What BG brings to Core Turbulence Transport
• Benchmark case CYCLONE
  • GENE: < 1 day on 64 procs; few hours on 1024 procs BG/L
  • GYSELA: ~2.5 days on 64 procs
  • ORB5: < 1 day on 64 procs; few hours on 1024 procs BG/L
• Similar ITER-size benchmark
  • GENE: ~ ½ day on 6K procs BG/L
  • GYSELA: ~ 10 days on 1024 procs
  • ORB5: ~ ½ day on 16K procs BG/L; ~1 week on 256 procs PC cluster
Courtesy José Mª Cela, Director of Applications, BSC
The Gyrokinetic Toroidal Code GTC
• Description:
  • Particle-in-cell code (PIC)
  • Developed by Zhihong Lin (now at UC Irvine)
  • Non-linear gyrokinetic simulation of microturbulence [Lee, 1983]
  • Particle-electric field interaction treated self-consistently
  • Uses magnetic field line following coordinates (ψ, θ, ζ)
  • Guiding center Hamiltonian [White and Chance, 1984]
  • Non-spectral Poisson solver [Lin and Lee, 1995]
  • Low numerical noise algorithm (δf method)
  • Full torus (global) simulation
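To make the particle-in-cell idea concrete, here is a minimal sketch of one PIC time step. It is not GTC's algorithm: it is a generic 1D electrostatic, full-f toy on a periodic grid in normalized units, with a simple real-space integration of Gauss's law standing in for the field solve. The function name `pic_step` and all parameters are illustrative assumptions. It shows only the scatter, field solve, gather, push cycle through which particles and the electric field are advanced self-consistently.

```python
import math


def pic_step(x, v, length, ng, dt, q=-1.0, m=1.0):
    """Advance positions x and velocities v by one PIC step (1D, periodic)."""
    dx = length / ng
    weight = length / len(x)  # equal-weight markers; mean charge density is q

    # 1. Scatter: deposit particle charge on grid nodes with linear weighting.
    rho = [0.0] * ng
    for xp in x:
        s = xp / dx
        cell = int(s)
        frac = s - cell
        i = cell % ng
        rho[i] += q * weight * (1.0 - frac) / dx
        rho[(i + 1) % ng] += q * weight * frac / dx
    # Uniform neutralizing background, so the net charge is zero.
    mean_rho = sum(rho) / ng
    rho = [r - mean_rho for r in rho]

    # 2. Field solve: integrate dE/dx = rho in real space, remove the mean field.
    e = [0.0] * ng
    for i in range(1, ng):
        e[i] = e[i - 1] + 0.5 * (rho[i - 1] + rho[i]) * dx
    mean_e = sum(e) / ng
    e = [ei - mean_e for ei in e]

    # 3. Gather the field at each particle, then 4. push velocity and position.
    new_x, new_v = [], []
    for xp, vp in zip(x, v):
        s = xp / dx
        cell = int(s)
        frac = s - cell
        i = cell % ng
        e_p = e[i] * (1.0 - frac) + e[(i + 1) % ng] * frac
        vp = vp + (q / m) * e_p * dt
        new_v.append(vp)
        new_x.append((xp + vp * dt) % length)
    return new_x, new_v


# Usage: a cold particle distribution with a small sinusoidal displacement.
length, ng, n = 2.0 * math.pi, 32, 256
x = [((j + 0.5) * length / n + 0.01 * math.sin((j + 0.5) * length / n)) % length
     for j in range(n)]
v = [0.0] * n
for _ in range(10):
    x, v = pic_step(x, v, length, ng, dt=0.1)
print(f"max |v| after 10 steps: {max(abs(vi) for vi in v):.4f}")
```

A production gyrokinetic code differs in every component: it pushes guiding centers rather than particles, works in field-line-following toroidal coordinates, and evolves δf weights to reduce noise. The four-phase loop is the part the two share.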
BlueGene Key Applications - Major Scientific Advances
• Qbox (DFT) LLNL: 56.5%, 2006 Gordon-Bell Award, 64 racks
  CPMD IBM: 30%, highest scaling 64 racks
• ddcMD (Classical MD) LLNL: 27.6%, 2005 Gordon-Bell Award, 64 racks
  MDCASK LLNL: highest scaling 64 racks
  SPaSM LANL: highest scaling 64 racks
  LAMMPS SNL: highest scaling 16 racks
  Blue Matter IBM: highest scaling 16 racks
  Rosetta UW: highest scaling 20 racks
  AMBER: 8 racks
• Quantum Chromodynamics IBM: 30%, 2006 GB Special Award, 64 racks
  QCD at KEK: 10 racks
• sPPM (CFD) LLNL: 18%, highest scaling 64 racks
  Miranda LLNL: highest scaling 64 racks
  Raptor LLNL: highest scaling 64 racks
  DNS: highest scaling 16 racks
  PETSc FUN3D ANL: 14.2%
  NEK5 (Thermal Hydraulics) ANL: 22%
• ParaDis (dislocation dynamics) LLNL: highest scaling 64 racks
• GFMC (Nuclear Physics) ANL: 16%
• WRF (Weather) NCAR: 14%, highest scaling 64 racks
  POP (Oceanography): highest scaling 16 racks
• HOMME (Climate) NCAR: 12%, highest scaling 32 racks
• GTC (Plasma Physics) PPPL: highest scaling 16 racks
  ORB5 RZG: highest scaling 8 racks
  GENE RZG: 12.5%, highest scaling 16 racks
• Flash (Supernova Ia): highest scaling 32 racks
  Cactus (General Relativity): highest scaling 16 racks
• AWM (Earthquake): highest scaling 20 racks
Science: Theory, Experiment, Simulation