Overview of Hitachi’s Super Technical Server SR8000

The Third International Workshop on Next Generation Climate Models
Overview of Hitachi’s Super Technical Server SR8000
March 2001
Yoshiro Aihara
Hitachi, Ltd., Enterprise Server Division

HITACHI Supercomputers

[Figure: peak performance (FLOPS, 0.01G to 10T in decade steps) versus year of announcement ('77 to '01) for three machine types: Vector, RISC Parallel, and Advanced RISC Parallel.]

- M-200H IAP, M-280H IAP, M-680 IAP: Integrated Array Processor system
- S-810 Series: first Japanese vector supercomputer
- S-820 Series: single-CPU peak performance 3 GFlops
- S-3000 Series: single-CPU peak performance 8 GFlops (fastest in the world)
- SR2201 Series: first commercially available distributed-memory parallel processor
- SR8000: new-concept machine for advanced HPC users (combination of parallel and vector)

IAP: Integrated Array Processor

Design Concept of SR8000

SR8000: a new concept combining the advantages of the vector processor and the RISC parallel processor

Design targets and Hitachi’s solutions:
- High single-node performance (the vector processor's strengths: vector processing, element parallel processing, high memory throughput): new SR8000 features PVP and COMPAS
- High scalability: multi-dimensional crossbar network (high-speed inter-node network)
- Short development cycle and easy enhancement: RISC-based processor (developed by HITACHI)

PVP: Pseudo Vector Processing
COMPAS: Co-operative Micro-Processors in single Address Space

Basic Configuration of SR8000

- COMPAS: CO-operative Micro-Processors in single Address Space
- High-performance RISC microprocessor (developed by Hitachi) with Pseudo-Vector Processing
- Multi-dimensional crossbar network: high-speed inter-node network

[Figure: block diagram of nodes attached to the multi-dimensional crossbar network. Labels: high-performance RISC processors, SP, main memory, network control, system control, PCI, I/O adapter, Ether, ATM, HiPPi, I/O device, RAID disk, SVP, MCD.]

SVP: SerVice Processor
MCD: Maintenance Console Device
SP: System Processor

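Multi-dimensional Crossbar Network

[Figure: nodes arranged along x, y, and z axes, with 8 nodes along two of the axes and 2 nodes along the third; the legend marks nodes, the X-axis crossbar, and the Y-axis crossbar.]

Example: 8 x 8 x 2 (128 nodes) configuration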
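The 8 x 8 x 2 example can be explored with a short sketch. It assumes a linear node numbering that runs along x first, then y, then z, and it assumes that a message crosses one crossbar for every axis on which source and destination differ. Both are illustrative assumptions about how a multi-dimensional crossbar is commonly modeled; the slide does not state the numbering or the routing rule.

```python
# Illustrative dimensions taken from the 8 x 8 x 2 example; the axis order is assumed.
DIMS = (8, 8, 2)  # nodes along the x, y and z axes


def node_coords(node_id, dims=DIMS):
    """Map a linear node number to (x, y, z); the numbering order is an assumption."""
    if not 0 <= node_id < dims[0] * dims[1] * dims[2]:
        raise ValueError("node_id out of range")
    x = node_id % dims[0]
    y = (node_id // dims[0]) % dims[1]
    z = node_id // (dims[0] * dims[1])
    return (x, y, z)


def crossbar_hops(src, dst, dims=DIMS):
    """Count crossbar traversals, assuming one per axis on which the nodes differ."""
    a, b = node_coords(src, dims), node_coords(dst, dims)
    return sum(1 for p, q in zip(a, b) if p != q)


total_nodes = DIMS[0] * DIMS[1] * DIMS[2]
# Under the assumed routing rule, no pair of nodes needs more hops than there are axes.
worst_case = max(crossbar_hops(0, n) for n in range(total_nodes))
```

Under these assumptions, nodes that share a row reach each other through a single crossbar, and the worst case is bounded by the number of dimensions rather than by the number of nodes.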

SR8000 Hardware Specification

Pseudo-Vector Processing (PVP)

Problems of conventional RISC processors:
- Performance drops in large-scale simulations because of cache overflow
- Sustained performance: under 10% of peak

Prefetch:
- Reads data from main memory into the cache before calculation
- Accelerates sequential data access

Preload:
- Reads data from main memory into the extended floating-point registers before calculation
- Accelerates stride memory access and indirectly addressed memory access

[Figure: data paths from main memory to the cache memory (prefetch) and to the extended floating-point registers (160) (preload), which feed the FPU.]
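The idea behind preload, issuing a load several iterations before its value is used, can be sketched in ordinary code. The following is a software model only, not the SR8000 instruction sequence: the deque stands in for a window of floating-point registers, and `distance` is a hypothetical tuning parameter chosen for illustration.

```python
from collections import deque


def indirect_sum_plain(a, idx):
    """Reference loop: each element is loaded at the moment it is needed."""
    total = 0.0
    for i in idx:
        total += a[i]
    return total


def indirect_sum_preloaded(a, idx, distance=4):
    """Same sum, but every load is issued `distance` iterations ahead of its use."""
    if distance < 1:
        raise ValueError("distance must be at least 1")
    n = len(idx)
    # Prologue: fill the register window with the first few loads.
    regs = deque(a[idx[k]] for k in range(min(distance, n)))
    total = 0.0
    for i in range(n):
        total += regs.popleft()                # operand was loaded earlier
        if i + distance < n:
            regs.append(a[idx[i + distance]])  # preload for a later iteration
    return total


a = [float(v) for v in range(100)]
idx = [(7 * k) % 100 for k in range(50)]  # indirectly addressed access pattern
```

Both functions add the same operands in the same order, so they return identical results; only the point at which each load is issued differs, which is what lets hardware overlap memory latency with arithmetic.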

COMPAS Feature of SR8000

Program behavior: COMPAS realizes the elementwise parallel processing of DO loops, as employed in vector supercomputers, using multiple processors in a node (automatic elementwise parallelization within a node by the compiler).

[Figure: one IP executes the scalar part while the other IPs wait for startup; a start-parallel instruction hands a loop part to each IP; an end-parallel instruction returns to the scalar part.]

Hardware feature (COMPAS): high-speed processing by multiple processors is realized through a hardware high-speed communication mechanism.

[Figure: the IPs connect through the SC, which contains the high-speed communication mechanism, to the MS.]

IP: Instruction Processor
SC: Storage Controller
MS: Main Storage
COMPAS: CO-operative Micro-Processors in single Address Space
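The scalar part, start parallel, loop parts, end parallel, scalar part sequence is a fork-join pattern over a shared address space. A minimal sketch of that control structure follows. The worker count `num_ips=8` is a hypothetical parameter, since this transcript does not state the number of IPs per node, and Python threads are used only to show the structure, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous chunks of near-equal size."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def parallel_axpy(alpha, x, y, num_ips=8):
    """Compute y[i] += alpha * x[i] with the loop divided among workers."""
    def loop_part(bound):
        lo, hi = bound
        for i in range(lo, hi):  # each worker owns a disjoint index range
            y[i] += alpha * x[i]

    # Start of the parallel region: hand out the loop parts. Leaving the
    # with-block joins all workers, after which one thread continues alone.
    with ThreadPoolExecutor(max_workers=num_ips) as pool:
        list(pool.map(loop_part, split_range(len(x), num_ips)))
    return y


x = [float(i) for i in range(20)]
y = [1.0] * 20
parallel_axpy(2.0, x, y)
```

Because all workers read and write the same `x` and `y` lists, no data is exchanged between them; only the start and the join need synchronization, which is the step the slide attributes to the hardware communication mechanism.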

Programming Models

Physical Data of SR8000

Example: 128-node configuration (G1 model)
- Power consumption: approx. 370 kVA
- Heat dissipation: approx. 330 kW
- Cooling air inlet temperature: 16 to 22 deg C
- Weight: approx. 15,000 kg
- Floor space: approx. 50 sq. meters (incl. service area)
- Footprint (128 nodes): approx. 8.0 m x approx. 3.3 m
- Height: approx. 1.8 m

Overview of Software Products

- OS: HI-UX/MPP (OSF/1 microkernel-based OS); NQS, BGT, DIFF, SFF, PFF
- Language Processor: Optimizing FORTRAN77/90, HPF, Optimizing C, C++, OpenMP (Ver1)
- Program Development
  - Parallel Library: MPI-2, PVM, PARALLELWARE
  - Numerical Calculation: MATRIX/MPP, MATRIX/MPP/SSS, MSL2
  - Development Support: Symbolic Debugger, Optimizing C / FORTRAN90 Performance Monitor (for HP-UX)
- Graphics
  - GUI: X11R6, Motif 1.2
  - Graphic Library: GKS, PEX, PHIGS
- Network: Ethernet / Fast Ethernet, GbE, HiPPi, ATM; TCP/IP, NFS V3, telnet, rlogin

Single UNIX System

- Single UNIX system: single system operation (file system, process control, network)
- Open system (standardized OS, compiler, network)
- Flexible system operation (partitioning operation, automatic operation)
- Scalable system (4 to 512 nodes)

[Figure: SR8000 nodes joined by the 3D-XB and operated as a single UNIX system. Software layers shown: UNIX (OSF/1) server (functional co-operation with other nodes) and micro-kernel (control of all IPs). Within a node, the COMPAS feature joins the IPs and the SP to the main storage. Systems attached over Ethernet and HIPPI: console, graphic, disk, RAID, 3500 Series, H-9000V Series, WS, PC, X terminal, SR2000 Series (3D-XB), other vendors (SGI, etc.).]

COMPAS: CO-operative Micro-Processors in single Address Space
IP: Instruction Processor
3D-XB: 3-dimensional Cross-bar Network

Remote DMA Transfer

Direct memory copy between user programs on different nodes, which minimizes OS overhead (protocol processing, context switches, interrupt handling).

- Normal transfer: the program's data is memory-copied into an OS send buffer, crosses the crossbar network into an OS receive buffer, and is memory-copied again into the receiving program.
- Remote DMA transfer: no buffering in the kernel and no OS system call.
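The difference between the two paths can be made concrete by counting the steps each message takes. The sketch below is a bookkeeping model only: the class and function names are hypothetical, and no real DMA or kernel interface is involved.

```python
class TransferPath:
    """Counts the steps a message takes on its way between two programs."""

    def __init__(self):
        self.memory_copies = 0
        self.network_transfers = 0

    def memcpy(self, data):
        self.memory_copies += 1      # host-side copy between user and OS buffers
        return bytearray(data)

    def network(self, data):
        self.network_transfers += 1  # one trip across the crossbar network
        return bytearray(data)


def normal_transfer(user_data):
    path = TransferPath()
    send_buffer = path.memcpy(user_data)     # program data -> OS send buffer
    recv_buffer = path.network(send_buffer)  # send buffer -> OS receive buffer
    received = path.memcpy(recv_buffer)      # OS receive buffer -> program data
    return bytes(received), path


def remote_dma_transfer(user_data):
    path = TransferPath()
    received = path.network(user_data)       # program data -> program data directly
    return bytes(received), path


message = b"boundary values for the next time step"
normal_data, normal_path = normal_transfer(message)
rdma_data, rdma_path = remote_dma_transfer(message)
```

In this model both paths deliver the same bytes with one network transfer, but the buffered path adds two host memory copies that the remote DMA path avoids.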

Examples of ISV Packages

- Structural Analysis: MSC.Nastran, MSC.Marc, LS-DYNA, PAM_CRASH, ABAQUS/Standard, ABAQUS/Explicit
- Computational Fluid Dynamics: STAR-CD, PHOENICS, SCRYU, STREAM, FLUENT
- Chemical Analysis: GAUSSIAN98, AMBER
- Libraries: NAG, IMSL
- Tools: TotalView, Vampir, AVS/EXPRESS

SR8000 Installation Sites (Example)

- Leibniz Rechenzentrum (Germany)
- High Energy Accelerator Research Organization
- University of Tokyo
- Japan Meteorological Agency
- University of Tokyo / Institute for Solid State Physics
- Tsukuba Advanced Computing Center - TACC / AIST
- Meteorological Research Institute
- Hokkaido University
- Institute of Statistical Mathematics
- HWW / Universität Stuttgart & DLR (Germany)
- ...

TOP500 Supercomputing Sites - November 3rd, 2000

Rmax/Rpeak > 75%: the Hitachi SR8000 works efficiently.

TOP500 Supercomputing Sites - November 3rd, 2000

- Rmax/Rpeak = 85.3% on SR8000/128
- Rmax/Rpeak = 90.0% on SR8000-E1/80

The Hitachi SR8000 works efficiently.

SR8000 F1 & G1 LINPACK Performance

- 313.30 Gflop/s on SR8000F1/32 with Nmax=65000
- 331.50 Gflop/s on SR8000F1/32 with Nmax=84800 (6% speed-up over the previous result)
- 398.50 Gflop/s on SR8000G1/32 with Nmax=84800 (20% speed-up over the previous result)
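The quoted speed-ups follow directly from the three LINPACK figures. A short check, using only the numbers given on the slide:

```python
# LINPACK results as quoted on the slide (Gflop/s).
results = [
    ("SR8000F1/32, Nmax=65000", 313.30),
    ("SR8000F1/32, Nmax=84800", 331.50),
    ("SR8000G1/32, Nmax=84800", 398.50),
]


def speedup_percent(before, after):
    """Relative gain of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


# Compare each result with the one that precedes it.
steps = [
    (prev[0], cur[0], speedup_percent(prev[1], cur[1]))
    for prev, cur in zip(results, results[1:])
]
for src, dst, pct in steps:
    print(f"{src} -> {dst}: {pct:.1f}% faster")
```

The first step (larger problem size on the same F1 machine) comes to about 5.8%, and the second (F1 to G1 at the same problem size) to about 20.2%, matching the rounded 6% and 20% on the slide.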

NAS Parallel Benchmark (FT)

Model G1 is 1.28 to 1.30 times faster than Model F1.

FT: a 3-D fast-Fourier-transform partial differential equation benchmark

NAS Parallel Benchmark (MG)

Model G1 is 1.22 to 1.24 times faster than Model F1.

MG: a simple 3-D multigrid benchmark

MPI Ping-Pong Performance

Remote DMA (Direct Memory Access) is sender-driven and copies data directly from memory to memory. It provides a high-speed inter-processor communication function without redundant copying.
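A ping-pong test bounces a message between two endpoints and derives one-way latency and bandwidth from the round-trip time. The sketch below shows that measurement pattern with two Python threads and queues standing in for the two nodes. It is not an MPI program, the queues pass object references rather than copying bytes, and the numbers it produces say nothing about the SR8000.

```python
import queue
import threading
import time


def ping_pong(message_bytes, repetitions=100):
    """Bounce a message between two threads and time the round trips."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    to_peer, from_peer = queue.Queue(), queue.Queue()

    def peer():
        for _ in range(repetitions):
            from_peer.put(to_peer.get())  # echo every message back

    worker = threading.Thread(target=peer)
    worker.start()
    payload = bytes(message_bytes)
    start = time.perf_counter()
    for _ in range(repetitions):
        to_peer.put(payload)              # ping
        echoed = from_peer.get()          # pong
    elapsed = time.perf_counter() - start
    worker.join()

    one_way = elapsed / (2 * repetitions)  # half of the average round trip
    return {
        "one_way_seconds": one_way,
        "bytes_per_second": message_bytes / one_way if one_way > 0 else float("inf"),
        "echo_ok": echoed == payload,
    }


result = ping_pong(1024, repetitions=10)
```

Repeating the exchange and averaging is the usual way to keep timer resolution from dominating the result; sweeping `message_bytes` from small to large sizes separates latency from bandwidth.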
