High Performance Computing: Past Highlights and Future Trends

David W. Walker, Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6367, U.S.A.

Outline of Talk • Trends in hardware performance. • Advances in algorithms. • Obstacles to efficient parallel programming. • Successes and disappointments from the past 2 decades of parallel computing. • Future trends. • Problem-solving environments. • Petascale computing. • Alternative algorithmic approaches. • Concluding remarks.

Moore’s Law: A Dominant Trend [Figure: performance of leading computers plotted against year, 1950 to 2000, on a logarithmic scale from 1 KFlop/s to 1 TFlop/s. Machines shown, in ascending order of performance: EDSAC 1, UNIVAC 1, IBM 7090, CDC 6600, IBM 360/195, CDC 7600, Cray 1, Cray X-MP, Cray 2, TMC CM-2, Cray T3D, TMC CM-5, ASCI Red.]

Era of Modern Supercomputing • In 1976 the introduction of the Cray 1 ushered in the era of modern supercomputing. • ECL chip technology • Shared memory, vector processing • Good software environment • About 100 Mflop/s peak • Cost about $5 million • The Intel iPSC/1 was introduced in 1985. • Distributed memory • More scalable hardware • 8 Mflop/s peak for a 64-processor machine • Explicit message passing

Competing Paradigms • Shared memory vs. distributed memory • Scalar vs. vector processing • Custom vs. commodity processors • Cluster vs. stand-alone system ?

Recent Trends • The Top500 list provides statistics on high performance computers, based on the performance of the LINPACK benchmark. • Before 1995 the Top500 list was dominated by systems at US government research sites. • Since 1995 commercial and industrial sites have figured more prominently in the Top500. Reasons: • In 1994 companies such as SGI and Sun began selling symmetric multiprocessor (SMP) systems. • IBM SP2 systems also popular with industrial sites. • Dedicated database systems important as well as web servers.

Architectures [Figure: Top500 systems by architecture: 91 constellations, 14 clusters, 275 MPP, 120 SMP.]

Top500 CPU Technology http://www.top500.org

Performance in the Top500

Top500 Performance

Top500 Application Areas

Top500 Application Areas Rmax

Top500 Systems Installed by Area

Top500 Data by Continent

Top500 Systems Installed by Continent

Top500 Rmax by Continent

Top500 Systems Installed by Manufacturer

Future Extrapolations from Top500 Data

Some Conclusions from Top500 Data • Rapid turnover in architectures, vendors, and technologies. • But long-term performance trends appear steady - how long will this continue? • Moderately parallel systems now in widespread commercial use. • Highest performance systems still found mostly in government-funded sites doing Grand and National challenges - mostly numerically intensive simulations.

Advances in Algorithms • Advances in algorithms have led to performance improvements of several orders of magnitude in certain areas. • Obvious example is the FFT: O(N^2) → O(N log N) • Other examples include: • fast multipole methods • wavelet-based algorithms • sparse matrix solvers • etc.
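The FFT example can be made concrete with a short sketch. The code below is illustrative only and is not taken from the talk: it compares a direct O(N^2) discrete Fourier transform with a textbook recursive radix-2 Cooley-Tukey FFT, which runs in O(N log N) but assumes the input length is a power of two. The function names dft_naive, fft_radix2, and max_abs_diff are chosen here for illustration.

```python
import cmath


def dft_naive(x):
    """Direct discrete Fourier transform: N output terms, each a sum of N products."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])  # transform of the even-indexed samples
    odd = fft_radix2(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def max_abs_diff(a, b):
    """Largest element-wise difference between two complex sequences."""
    return max(abs(p - q) for p, q in zip(a, b))


if __name__ == "__main__":
    signal = [float(i % 5) for i in range(16)]
    print(max_abs_diff(dft_naive(signal), fft_radix2(signal)))
```

Both routines compute the same transform to within rounding error; only the operation count differs. For N = 2^20 the ratio N^2 / (N log2 N) is roughly 50,000, which is the kind of gain the slide refers to.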

Problems with High Performance Computing • HPC is “difficult.” • Often large disparity between peak performance and actual performance. • Application developers must be aware of the memory hierarchy, and program accordingly. • Lack of standard software environment and tools has been a problem. Not many commercial products. • Platforms quickly become obsolete, so it costs a lot of money to stay at the forefront of HPC technology. A Cray Y-MP C90, purchased in 1993 when the list price was $35M, is being sold on the eBay auction web site. “If there are no takers, we'll have to pay Cray about $30,000 to haul it away for salvage.” (Mike Schneider, Pittsburgh Supercomputing Center)

Successes of Parallel Computing • Portability. In the early days of HPC each machine came with its own application programming interface, and a number of competing research projects offered “compatibility” interfaces. Standardised APIs are now available: • MPI for message passing machines • OpenMP for shared memory machines • Libraries. Some success has been achieved with developing parallel libraries. For example, • ScaLAPACK for dense and banded numerical linear algebra (Dongarra et al.). • SPRNG for parallel random number generation (NCSA). • FFTW package developed at MIT by Matteo Frigo and Steven G. Johnson for parallel fast Fourier transforms.

Application Successes The development of scalable massively parallel computers was motivated largely by a set of Grand Challenge applications: • Climate modelling. The Climate Change Prediction Program seeks to understand the processes that control the Earth's climate and predict future changes in climate due to natural and human influences. • Tokamak design. The main goal of the Numerical Tokamak Turbulence Project is to develop realistic fluid and particle simulations of tokamak plasma turbulence in order to optimize performance of fusion devices. • Rational drug design. The goal is to discover and design new drugs based on computer simulations of macro-molecular structure. • Computational fluid dynamics. This is important in the design of aerospace vehicles and cars. • Quantum chromodynamics. Lattice QCD simulations allow us to make first-principle calculations of hadronic properties.

Difficulties and Disappointments • Automatic parallelizing compilers. • Automatic vectorization was successful on vector machines. • Automatic parallelization worked quite well on shared memory machines. • Automatic parallelization has been less successful on distributed memory machines. The compiler must decide how to partition the data and assign it to processes to maximize the number of local memory accesses and minimize communication. • Software tools for parallel computers. There are few standard software tools for parallel computers. Those that exist are mostly research projects - there are few commercial products. • High Performance Fortran. Compilers and tools for HPF were slow in appearing, and HPF was not well-suited to irregular problems.
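The data-partitioning decision mentioned above can be illustrated with a small sketch. It is not from the talk: it assumes a one-dimensional array updated with a three-point stencil (each element reads its left and right neighbours) and compares two textbook distributions, block and cyclic, by counting the neighbour reads that fall on a different process. The names block_owner, cyclic_owner, and remote_accesses are hypothetical.

```python
def block_owner(i, n, p):
    """Owner of element i when n elements are split into p contiguous blocks."""
    block_len = -(-n // p)  # ceiling division gives the block length
    return i // block_len


def cyclic_owner(i, n, p):
    """Owner of element i when elements are dealt out round-robin to p processes."""
    return i % p


def remote_accesses(n, p, owner):
    """Count neighbour reads (i-1 and i+1) that cross a process boundary."""
    count = 0
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n and owner(j, n, p) != owner(i, n, p):
                count += 1
    return count


if __name__ == "__main__":
    n, p = 16, 4
    print("block :", remote_accesses(n, p, block_owner))
    print("cyclic:", remote_accesses(n, p, cyclic_owner))
```

For this access pattern the block distribution keeps almost every read local, while the cyclic distribution makes every neighbour read remote. A compiler has to infer this kind of trade-off from the source code, which is much harder for irregular access patterns.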

Future Trends in High Performance Computing • Metacomputing using distributed clusters and the Grid • multidisciplinary applications • collaborative applications • advanced visualization environments • Ultra-high performance computing • quantum computing • macro-molecular computing • petascale computing • Problem-solving environments: Grid portals to computational resources • Different algorithmic emphasis • “keep it simple”, cellular automata and particle-based methods • automatic performance tuning and “intelligent algorithms” • interval arithmetic techniques

Metacomputing and the Grid • Metacomputing refers to the use of multiple platforms (or nodes) to seamlessly construct a single virtual computer. • In general the nodes may be arbitrarily distant from one another. • Some of the nodes may be specialised for a particular task. • The nodes themselves may be sequential or parallel computers. • A software component running on a single node may make use of MPI or OpenMP. • Interaction between nodes is mediated by a software layer such as CORBA, Globus, or Legion. • In a common model we view the nodes as offering different sets of computing services with known interfaces.

Metacomputing Limitations • This type of distributed metacomputing is limited by the bandwidth of the communication infrastructure connecting the nodes. • Limited use for compute-intensive applications. • Tasks must be loosely coupled. • May be useful for some multi-disciplinary applications.

Important Metacomputing Issues • Resource discovery - how do nodes publicise their services. • Resource scheduling - how to optimise resource use when there are multiple users. • Resource monitoring - need to be able to monitor the bandwidth between nodes and the load on each. • Code mobility - often in data-intensive applications it makes more sense to send the code to the data, rather than the other way round. • What is the appropriate software infrastructure for interaction between nodes?

Tele-Presence and Metacomputing • More generally, the nodes of the metacomputer do not have to be computers - they may be people, experimental instruments, satellites, etc. • The remote control of instruments, such as astronomical telescopes or electron microscopes, often involves several collaborators who interact with the instrument and with each other through a thin client interface. • In recent work at Cardiff University researchers have developed a WAP interface that allows an MPI application running on a network of workstations to be controlled using a mobile phone.

Collaborative Immersive Visualization • Essential feature - observer appears to be in the same space as the visualized data. • Observer can navigate within the visualization space relative to the data. • Several observers can co-exist in the same visualization space - ideal for remote collaboration.

Hardware Options • CAVE: a fully immersive environment. ORNL system has stereoscopic projections onto 3 walls and the floor. • ImmersaDesk: projects stereoscopic images onto a single flat panel display. • Stereoscopic workstation: a stereoscopic viewing device, such as CrystalEyes, can be used on workstations and PCs. Stereo-ready graphics cards are becoming increasingly available.

CAVE

Immersive Visualization and Terascale Computing • Scientific simulations, experiments, and observations generate vast amounts of data that often overwhelm data management, analysis, and visualization capabilities. • Collaborative IV is becoming important in interpreting and extracting insights from this wealth of data. • Immersive visualization capability is essential in a credible terascale computing facility.

Collaborative Framework for Simulation-enabled Processes [Figure: a web-based collaborative environment with visualization support, combining web-centric data integration and model integration (CORBA & HLA) to link participants, processes, tools, and data. Participants shown: testers, users, producers, system developers, subsystem/technology developers, program managers, cost analysts, logistics/support analysts, training developers.]

Research Issues in Collaborative Immersive Visualization • Collaborative use of immersive visualization across a range of hardware platforms within a distributed computing environment requires “resource-aware” middleware. • Data management, analysis, rendering, and visualization should be tailored to the resources available. • Make the visualization system resource-aware so that tasks of data extraction, processing, rendering, and communication across network can be optimized. • Permit a wide range of platforms, ranging from CAVEs to laptops, to be used for collaborative data exploration and navigation.

Videmus Prototype • Develop a collaborative immersive environment for navigating and analysing very large numerical data sets. • Provide suite of resource-aware visualization tools for 3D scalar and vector fields. • Support steering, and the retrieval and annotation of data. • Permit collaborators to interact in the immersive space by audio and gestures. • Make visualization adapt to network bandwidth - if bandwidth is low data may be compressed or lower resolution used. • Use server-side processing to lessen load on client and network. • Use software agents in implementation.
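The bandwidth-adaptation goal can be sketched as a simple selection rule. This is an illustration under stated assumptions, not the Videmus implementation: it assumes a cubic 3D scalar field, a measured bandwidth in bytes per second, and a target transfer time, and it picks the smallest per-axis subsampling stride whose payload fits the time budget. The function choose_stride and all of its parameters are hypothetical.

```python
def choose_stride(points_per_axis, bytes_per_point, bandwidth_bytes_per_s,
                  budget_s, max_stride=64):
    """Return the smallest per-axis stride whose 3D payload fits the time budget.

    A stride of 1 sends the full-resolution field; larger strides send
    every stride-th point along each axis. Falls back to max_stride if
    no stride in range meets the budget.
    """
    for stride in range(1, max_stride + 1):
        kept = -(-points_per_axis // stride)  # points kept per axis (ceiling)
        payload_bytes = kept ** 3 * bytes_per_point
        if payload_bytes / bandwidth_bytes_per_s <= budget_s:
            return stride
    return max_stride


if __name__ == "__main__":
    # 256^3 field of 4-byte values, 1-second budget, three assumed link speeds.
    for bandwidth in (1e8, 1e7, 1e6):
        print(bandwidth, choose_stride(256, 4, bandwidth, 1.0))
```

A real system would also weigh compression and server-side rendering against subsampling; this sketch covers only the lower-resolution option.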

Videmus Architecture [Figure: components shown are the Data Request Agent, the Data Dispatch Agent, the servers (data server, compute server, rendering server), and the display platforms (CAVE, ImmersaDesk, workstation).]

Other Collaborative Visualization Projects • The Electronic Visualization Laboratory at UIC are world leaders, but not particularly focused on scientific visualization. • NCSA have done a lot of work on middleware for advanced visualization, and human factors related research. • The Virtual Environments Lab at Old Dominion University. Good potential collaborators. • COVISE from the University of Stuttgart. • SNL has projects in VR for mesh generation, and the “Navigating Science” project to develop a method of exploring and analysing scientific literature using virtual reality.

Massive data processing and visualization • Recent acquisition of 9000 CDs of digital data • Vector maps • Scanned maps • Elevation data • Imagery • Visualization: desktop to immersive VR environment • Storage strategies • Data exchange issues • Collaborative environment [Figure labels: Digital Earth Observatory; HPAC Data; Climate and Groundwater Data; Transportation and Energy Data; Probe; ESnet3.]

Motivation for Problem-Solving Environments • The aim is scientific insight. • Better understanding of the fundamental laws of the universe and how these interact to produce complex phenomena. • New technologies for economic competitiveness and a cleaner and safer environment.

Aspects of Scientific Computing We use computers to advance science through: • Prediction : as in traditional computational science. • Abstraction : the recognition of patterns and inter-relationships. • Visualization for steering, navigation, and immersion. • Data mining. • Collaboration : brings a wide range of expertise to a problem.

Innovative Environments We seek to support prediction, abstraction, and collaboration in an integrated computing environment that: • Gives transparent access to heterogeneous distributed resources • Supports all aspects of software creation and use • Seamlessly incorporates new hardware and software Problem-Solving Environment

Problem-Solving Environments • are specific to application domain, e.g., PSE for climate modeling, PSE for material science, etc. • provide easy access to distributed computing resources so end user can focus on the science and not computer issues. • deliver state-of-the-art problem-solving power to the end user. • increase research productivity.

PSEs and Complexity • Modeling complex physical systems requires complex computer hardware (hierarchical memory, parallelism) and complex computer software (numerical methods, message passing, etc.). • PSEs handle this complexity for the end user.

Vision for PSEs • PSEs herald a new era in scientific computing, both in power and in how resources are accessed and used. • PSEs will become the main gateway for scientists to access terascale computing resources. • PSEs will allow users to access these resources from any web connection. They act as web portals to the Grid. • PSEs' support for collaborative computational science will change the pervading research culture, making it more open and accountable.

Synergies [Figure: three linked areas. Research hardware: better = bigger & faster. Research software: distributed, immersive, collaborative. Research culture: better = more open & accountable.]

Software Technologies for PSEs Wherever possible use accepted software standards and component-based software engineering. • XML is used for interface specification and defining the component model. • Java is used for platform-independent programming. • CORBA provides for transparent interaction between distributed resources. • Agents for user support, resource monitoring and discovery.

Main Features of a PSE • Collaborative code development environment. • Intelligent resource management system. • Expert assistance in application design and input data specification. • Electronic notebooks for recording and sharing results of research.

Collaborative Code Development Environment • The collaborative code development environment uses a visual programming tool for seamlessly integrating code from multiple sources. • Applications are created by plugging together software components. • Legacy codes in any major scientific programming language can be handled.

Novel Ideas for PSE Research • Intelligence is important in PSEs for efficient use and management of resources, for ease of use, and for user support. The PSE must be able to learn what works best. • Living documents are a novel way of electronically publishing research results. Readers can replay simulations and experiment with changing input parameters. • Resource-aware visualization refers to the ability of a visualization to adapt to the hardware platform, which may range from a PC to a CAVE.
