Evaluation of Modern Parallel Vector Architectures
Leonid Oliker
Future Technologies Group, Computational Research Division, LBNL
www.nersc.gov/~oliker
Previous Research
• Examined complex interactions between high-level algorithms, leading programming paradigms, and modern architectural platforms
• Evaluated three parallelization strategies of a dynamic unstructured mesh adaptation algorithm
• Examined two major classes of adaptive applications (UMA and N-Body) under three parallel programming models
• Investigated effects of algorithmic orderings on sparse matrix computations
• Evaluated performance of shared-virtual-memory systems on PC-SMP clusters using six application kernels (structured and unstructured)
• Architectures examined: T3E, Origin2000, SP, PC cluster, MTA
• Examined scientific kernels on emerging microarchitectures: VIRAM (Berkeley PIM) and Imagine (Stanford streaming architecture)
• Programming paradigms: MPI, OpenMP, hybrid, SHMEM, shared memory, multithreading, vectorization, streaming
New Evaluation Project: Modern Parallel Vector Systems
• Vector architectures: SX6, X1, and ES
• Plan to study key factors of modern parallel vector systems (runtime, scalability, programmability, portability, and memory overhead) while identifying potential bottlenecks
• Examine microbenchmarks, kernels, and application codes
• Key questions: What fraction of scientific codes is suitable for these architectures? What is the best programming paradigm? What algorithmic modifications are required? What are the scalability-limiting factors? What are the migration issues in terms of performance portability?
Microbenchmark and Kernel Codes
• Examine memory bandwidth within a node for simple and complex array addressing (a minimal sketch of such a test follows below)
• Examine low-level message-passing characteristics: point-to-point (intra-node and inter-node), aggregate operations, and one-sided performance, as well as I/O
• Task and thread performance: thread creation, task management, locks, semaphores, and barriers; explicit threads vs. implicit OpenMP
• Evaluate NAS Parallel Benchmarks using MPI, OpenMP, and hybrid programming; new Class D and E problem sizes are being developed by Rob Van der Wijngaart at NASA Ames
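The kind of loop a memory-bandwidth test might time is sketched below in C. This is purely illustrative, not the project's actual microbenchmark: it contrasts a unit-stride triad (simple addressing) with an indirectly indexed gather (complex addressing), two access patterns that typically expose very different fractions of a machine's peak bandwidth.

```c
/* Illustrative sketch only (not the project's actual microbenchmark):
 * times a unit-stride triad against an indirectly addressed gather,
 * i.e. "simple" vs. "complex" array addressing. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void)
{
    double *a  = malloc(N * sizeof *a);
    double *b  = malloc(N * sizeof *b);
    int   *idx = malloc(N * sizeof *idx);
    for (long i = 0; i < N; i++) {
        b[i]   = 1.0 / (i + 1);
        idx[i] = (int)((i * 1021L) % N);   /* scattered index pattern */
    }

    double t0 = now();
    for (long i = 0; i < N; i++)           /* unit stride: vectorizes trivially */
        a[i] = 2.5 * b[i] + 1.0;
    double t1 = now();
    for (long i = 0; i < N; i++)           /* gather: stresses irregular access */
        a[i] = 2.5 * b[idx[i]] + 1.0;
    double t2 = now();

    /* report sustained bandwidth; print a[0] so the loops are not optimized away */
    printf("unit-stride: %.0f MB/s\n", 2.0 * sizeof(double) * N / (t1 - t0) / 1e6);
    printf("gather:      %.0f MB/s\n", (2.0 * sizeof(double) + sizeof(int)) * N / (t2 - t1) / 1e6);
    printf("check: %g\n", a[0]);
    free(a); free(b); free(idx);
    return 0;
}
```

On a vector system both loops can run near memory speed if the hardware supports gather; on a cache-based system the indexed loop typically sustains only a fraction of the unit-stride rate.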
Application Codes
• Astrophysics:
  • MADCAP: Microwave Anisotropy Dataset Computational Analysis Package. Analyzes cosmic microwave background radiation datasets to extract the maximum-likelihood angular power spectrum. (Julian Borrill, LBNL)
  • CACTUS: Direct evolution of Einstein's equations, involving a coupled set of non-linear hyperbolic and elliptic equations with thousands of terms. (John Shalf, LBNL)
• Climate:
  • CCM3: Community Climate Model. (Michael Wehner, LBNL)
• Fluid dynamics:
  • OverflowD: Overset-grid Navier-Stokes solver. Simulates complex rotorcraft vortex dynamics problems. (Mohammad Djomehri, NASA)
Application Codes (cont.)
• Fusion:
  • GTC: Gyrokinetic Toroidal Code. 3D particle-in-cell code to study microturbulence in magnetic confinement fusion (a sketch of the particle-in-cell deposition pattern follows below). (Stephane Ethier, Princeton Plasma Physics Laboratory)
  • TLBE: Thermal lattice Boltzmann equation solver for modeling turbulence and collisions in plasma. (Jonathan Carter, LBNL)
• Material science:
  • PARATEC: PARAllel Total Energy Code. Electronic structure code that performs ab-initio quantum-mechanical total energy calculations. (Andrew Canning, LBNL)
• Molecular dynamics:
  • NAMD: Object-oriented molecular dynamics code designed for simulation of large biomolecular systems. (David Skinner, LBNL)
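For readers unfamiliar with the particle-in-cell method that GTC uses, the following minimal 1D charge-deposition loop is a rough illustration (GTC's actual deposition is gyro-averaged and three-dimensional). It shows the indirect, potentially colliding grid updates such codes generate, one reason they are interesting test cases for vector hardware.

```c
/* Minimal 1D particle-in-cell charge deposition (illustrative only).
 * The indirect, possibly colliding updates to rho[] are what make
 * such loops challenging to vectorize. */
#include <stdio.h>
#include <stdlib.h>

#define NP 1000000   /* particles */
#define NG 512       /* grid cells */

int main(void)
{
    double *x = malloc(NP * sizeof *x);   /* particle positions in [0, NG) */
    double rho[NG] = {0.0};
    for (long p = 0; p < NP; p++)
        x[p] = (double)rand() / ((double)RAND_MAX + 1.0) * NG;

    for (long p = 0; p < NP; p++) {
        int cell = (int)x[p];
        double frac = x[p] - cell;
        /* Linear weighting: charge split between two neighboring cells.
         * Different particles may hit the same cell, creating a recurrence
         * the vectorizer must resolve (e.g. by replicating rho or using
         * hardware scatter-add). */
        rho[cell] += 1.0 - frac;
        rho[(cell + 1) % NG] += frac;
    }

    printf("rho[0] = %g\n", rho[0]);
    free(x);
    return 0;
}
```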
Benchmarking Timeline and Evaluation Goals
• Currently porting codes to a single-node SX6 (USA)
• Will soon have multi-node SX6 access through DKRZ (Germany)
• Early system access to the Cray X1 expected in early February (ORNL)
• Hope to gain Earth Simulator access in summer 2003
• This opportunity will allow us to compare performance and programmability with leading conventional architectures (Power4, Alpha EV67)
• Will also allow comparison with the significantly different X1 system:
  • X1 vector pipes are “distributed” within the X1 multistreaming processor
  • Cache-based architecture and support for globally addressable memory
  • Compiler must identify both streaming (microtasking) and vectorization, while maximizing cache reuse (see the loop-nest sketch below)
  • Is the same programming style effective on both X1 and ES?
• Help guide future system acquisitions and scientific code development
• Potential to run applications at unprecedented scale
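A hedged sketch of the kind of loop nest an X1-class compiler is expected to exploit automatically: the outer loop is a candidate for multistreaming across an MSP's four single-streaming processors, while the inner unit-stride loop is vectorized; blocking in j would additionally help cache reuse. The function name and structure are illustrative assumptions, not drawn from any of the codes above.

```c
/* Illustrative only: a loop nest shaped so that an X1-class compiler can
 * multistream the outer loop across SSPs and vectorize the inner,
 * unit-stride loop. */
void smooth(int n, double (*restrict out)[n], const double (*restrict in)[n])
{
    for (int i = 1; i < n - 1; i++) {        /* outer: multistreamed (microtasked) */
        for (int j = 1; j < n - 1; j++) {    /* inner: vectorized, unit stride */
            out[i][j] = 0.25 * (in[i-1][j] + in[i+1][j]
                              + in[i][j-1] + in[i][j+1]);
        }
    }
}
```

Whether the same source also performs well on the ES, whose processors vectorize without a cache hierarchy, is exactly the programming-style and portability question raised above.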