
Commodity Computing Clusters - next generation supercomputers?






Presentation Transcript


  1. Commodity Computing Clusters - next generation supercomputers? Paweł Pisarczyk, ATM S. A. pawel.pisarczyk@atm.com.pl

  2. Agenda • Introduction • Supercomputer classification • Architecture and implementations • Commodity clusters • Processors • Operating systems • Summary

  3. Supercomputer • "A supercomputer is a device for turning compute-bound problems into I/O-bound problems" - Seymour Cray • A supercomputer is a computer system that leads the world in terms of processing capacity, particularly speed of calculation, at the time of its introduction. Source: http://en.wikipedia.org

  4. Supercomputer History (1) • 1945-50 - Manchester Mark I • 1950-55 - MIT Whirlwind • 1955-60 - IBM 7090 - 210 KFLOPS • 1960-65 - CDC 6600 - 10.24 MFLOPS • 1965-70 - CDC 7600 - 32.27 MFLOPS • 1970-75 - CDC Cyber 76

  5. Supercomputer History (2) • 1975-80 - Cray-1 - 160 MFLOPS • 1980-85 - Cray X-MP - 500 MFLOPS • 1985-90 - Cray Y-MP - 1.3 GFLOPS • 1990-95 - Fujitsu Numerical Wind Tunnel - 236 GFLOPS • 1995-00 - Intel ASCI Red - 2.150 TFLOPS • 2000-02 - IBM ASCI White, SP Power3 375 MHz - 7.226 TFLOPS • 2002-03 - NEC Earth Simulator - 35 TFLOPS

  6. Supercomputer Classes (1) • General-purpose supercomputers: • vector processing machines - the same operation carried out on a large amount of data simultaneously • tightly coupled cluster computers (NUMA) - communication-oriented architectures engineered from the ground up, based on high-speed interconnects and a large number of processors • commodity clusters - a collection of a large number of commodity PCs (COTS) interconnected by a high-bandwidth, low-latency network

  7. Supercomputer Classes (2) • Special-purpose supercomputers - high performance computing devices with a hardware architecture dedicated to solving a single problem (equipped with custom ASICs or FPGA chips) Examples: • Deep Blue • GRAPE for astrophysics

  8. Flynn taxonomy - 1972 (1) • SISD - Single Instruction Single Data (DEC, Sun Microsystems, PC) • SIMD - Single Instruction Multiple Data • computers with a large number of processing units (e.g. ALUs) - CPP DAP Gamma II, Quadrics Apemille • vector processing machines - NEC SX-6, IA32 MMX • MISD - Multiple Instruction Single Data • theoretical model, no practical implementation
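
To make the SIMD idea concrete, here is a minimal C sketch using the IA32 MMX intrinsics mentioned above. The function name and the assumptions (n divisible by 4, suitably aligned arrays) are illustrative, not taken from the slides:

    #include <mmintrin.h>   /* IA32 MMX intrinsics */

    /* Add two arrays of 16-bit integers, four elements per instruction. */
    void add_pi16(const short *a, const short *b, short *c, int n)
    {
        for (int i = 0; i < n; i += 4) {
            __m64 va = *(const __m64 *)&a[i];
            __m64 vb = *(const __m64 *)&b[i];
            *(__m64 *)&c[i] = _mm_add_pi16(va, vb);   /* PADDW: 4 adds at once */
        }
        _mm_empty();   /* clear the MMX state before any FPU code runs */
    }

One instruction operates on several data elements at a time, which is exactly the SIMD half of Flynn's taxonomy.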

  9. Flynn taxonomy - 1972 (2) • MIMD - Multiple Instruction Multiple Data • SM-MIMD - Shared Memory MIMD • global address space • SMP systems and ccNUMA systems • DM-MIMD - Distributed Memory MIMD • many nodes with local address spaces • high-bandwidth, low-latency communication • common NUMA architectures (Non-Uniform Memory Access) • the operating system has to be communication oriented (e.g. the Mach project)
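
The SM-MIMD/DM-MIMD split maps directly onto the two programming models named later in the talk. In the shared-memory model every thread sees one global address space, which is what OpenMP exposes; a minimal sketch (the loop body is just a placeholder workload):

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        double sum = 0.0;

        /* All threads share the same address space; the reduction
           clause resolves concurrent updates to the shared variable. */
        #pragma omp parallel for reduction(+:sum)
        for (long i = 1; i <= 100000000L; i++)
            sum += 1.0 / (double)i;

        printf("%d threads, sum = %f\n", omp_get_max_threads(), sum);
        return 0;
    }

Compile with an OpenMP-capable compiler (e.g. gcc -fopenmp); the distributed-memory counterpart appears after slide 15.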

  10. SM-MIMD implementations • S-COMA - Simple Cache-Only Memory Architecture • common SMP systems • ccNUMA - Cache Coherent NUMA • SGI Origin 3000 • SGI Altix 3000 • HP SuperDome

  11. S-COMA (SMP) [Diagram: CPU 0 through CPU N, each with a private L2 cache, all connected to a single shared RAM]

  12. ccNUMA [Diagram: nodes 0 through K, each with local RAM and a node-level L3 cache; within a node, CPU pairs (CPU 0/CPU 1, ..., CPU N-1/CPU N) have private L2 caches]

  13. ccNUMA implementation: SGI Altix 3000 • 64 Itanium 2 (IA64) processors • C-brick modules with 2 CPUs and a SHUB ASIC • NUMAflex architecture, NUMAlink interconnects (6.4 GB/s, 2.4 GB/s) • Modified Linux kernel (2.6 NUMA support)
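
On a NUMA-aware Linux kernel such as the Altix one, keeping data close to the CPUs that use it is what makes ccNUMA scale. A minimal sketch using libnuma, which is an assumption here (the slides do not name a placement API):

    #include <numa.h>    /* libnuma; link with -lnuma */
    #include <stdio.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "kernel has no NUMA support\n");
            return 1;
        }

        /* Place the working set on node 0 so CPUs on that node
           hit local rather than remote memory. */
        size_t len = 1 << 20;
        double *buf = numa_alloc_onnode(len, 0);
        if (buf == NULL)
            return 1;

        /* ... run the computation on CPUs belonging to node 0 ... */

        numa_free(buf, len);
        return 0;
    }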

  14. DM-MIMD implementations • Massively parallel systems (NUMA) • communication-oriented architecture • low-latency, high-bandwidth interconnects • topologies: hypercube, torus, tree • Butterfly and Omega networks • communication engineered from the ground up

  15. DM-MIMD implementations • Commodity clusters • a cluster is a collection of connected, independent computers working in unison to solve a problem • COTS technology • nodes are interconnected by Ethernet LAN, Myrinet, QsNet ELAN, etc. • computations can be performed using popular programming toolkits and frameworks: OpenMP, MPI (see the sketch below) • clusters require dedicated management software
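
MPI is the distributed-memory counterpart of the earlier OpenMP sketch: each node runs its own process with a private address space and exchanges data only through explicit messages over the interconnect. A minimal sketch (the per-rank workload is illustrative):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        double local, total;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */

        /* Each node computes its own share in its local address space. */
        local = (double)(rank + 1);

        /* Partial results cross the network as explicit messages. */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d processes: %f\n", size, total);

        MPI_Finalize();
        return 0;
    }

Launched with e.g. mpirun -np 4, the same binary runs on every node; only the rank differs.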

  16. NUMA implementations Cray T3E-1350 • Processor: Alpha 21164 675 MHz • Number of CPUs: 40 - 2176 • 3-D Torus topology • Operating system: UNICOS/mk - microkernel based • Peak performance: 3 TFLOPS

  17. Commodity cluster implementation (1) Linux Networx/Quadrics • Processor: Intel Xeon 2.4 GHz • CPUs: 2304 • Interconnections: QsNet ELAN3 • Operating system: Linux + management tools + Lustre Cluster File System • Peak performance: 7.6 TFLOPS • 3rd computer on TOP500 list • Developed for Lawrence Livermore National Laboratory in 2002

  18. Commodity cluster implementation (2) HP XC6000 Cluster (XC3000 Cluster) • Processor: Intel Itanium 2 6M 1.5 GHz (Intel Xeon 3 GHz) • Node: HP Integrity rx2600 (HP ProLiant DL380) • Number of processors: 34-512 • Interconnections: QsNet ELAN3 (Myricom Myrinet XP) • Operating system: Linux + SSI Middleware + management tools + Lustre Cluster File System • Peak performance: 34 CPUs - 204 GFLOPS, 512 CPUs - 3 TFLOPS

  19. Commodity Clusters - software • Operating system - Linux or SSI Linux (Single System Image) • Platform for specialized applications for science, engineering and business (simulation, modeling, data mining) • Distributed computation environments are used for software development (OpenMP, MPI) • Common supercomputer applications require porting to clusters

  20. Performance Scaling [Diagram: "scale right" - scale-up (SMP, ccNUMA) versus scale-out (cluster)]

  21. Processors (1) • Many types of existing processors are used in supercomputers • Microprocessor development directions: • Increasing clock frequency and the speed of instruction stream processing • Processing a large collection of data in a single processor instruction - SIMD • Control path multiplication - multithreading

  22. Processors (2) • Vector processors • NEC SX-6 • Cray (Cray X1) • RISC processors • MIPS • IBM Power4 • Alpha • CISC processors • IA32 • AMD x86-64 • VLIW processors • IA64

  23. Intel Itanium 2 features • State-of-the-art, unconventional 64-bit architecture • New programming model implementing the VLIW paradigm • EPIC technology - Explicitly Parallel Instruction Computing - the compiler determines instruction dependencies, telling the processor how to process the instruction stream in parallel • Many registers (128 x 64-bit), register stack management • 6 GFLOPS peak performance • The full potential of the processor can only be exploited by a dedicated compiler

  24. Operating systems • Monolithic kernel based OSs - UNIX (modification of existing solutions) • BSD • Solaris • Irix • Linux • Microkernel based OSs • Mach

  25. Microkernel architecture [Diagram: tasks A, B and C running in user space on top of a minimal kernel, which in turn runs directly on the hardware]

  26. Summary • Today there are many supercomputer architectures • Both vector processors and common RISC, CISC and VLIW chips are used in supercomputers • Commodity clusters running the Linux OS are an attractive way of implementing a supercomputer

  27. TOP 500 list (1) 1. Earth Simulator, NEC - 35.86 TFLOPS 2. HP Alphaserver SC, HP - 13.88 TFLOPS 3. Linux Networx / Quadrics IA32 - 7.634 TFLOPS

  28. TOP 500 list (2) Source: http://www.top500.org/list/2003/06/
