
Scalable Systems and Technology


Presentation Transcript


  1. Scalable Systems and Technology Einar Rustad Scali AS einar@scali.com http://www.scali.com

  2. Definition of Cluster • The Widest Definition: • Any number of computers communicating at any distance • The Common Definition: • A relatively small number of computers (<1000) communicating at a relatively small distance (within the same room) and used as a single, shared computing resource

  3. Increasing Performance • Faster Processors • Frequency • Instruction Level Parallelism (ILP) • Better Algorithms • Compilers • Manpower • Parallel Processing • Compilers • Tools (Profilers, Debuggers) • More Manpower

  4. Use of Clusters • Capacity Servers • Databases • Client/Server Computing • Throughput Servers • Numerical Applications • Simulation and Modelling • High Availability Servers • Transaction Processing

  5. Why Clustering • Scaling of Resources • Sharing of Resources • Best Price/Performance Ratio (PPR) • PPR is Constant with Growing System Size • Flexibility • High Availability • Fault Resilience

  6. Clusters vs SMPs (1) • Programming • A Program written for Cluster Parallelism can run on an SMP right away • A Program written for an SMP can NOT run on a Cluster right away • Scalability • Clusters are Scalable • SMPs are NOT Scalable above a Small Number of Processors

  7. Why SMPs don't scale • When CPUs cycle at 1GHz and memory latency is >100ns, a 1% cache miss rate implies <50% CPU efficiency • But, you can make all the memory equally SLOW… (crossbar complexity grows with the number of ports squared) • [Diagram: CPUs sharing memory through an L3 cache and link ("This is an SMP") vs. separate memories joined by an interconnect with I/O ("This is NOT an SMP…")]
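
To make the <50% figure concrete, here is a back-of-the-envelope check (a minimal sketch; the one-instruction-per-cycle baseline is an assumption, not stated on the slide):

    #include <stdio.h>

    int main(void) {
        /* Assumptions (not on the slide): an ideal CPI of 1.0 at 1GHz,
           i.e. a 1ns cycle time, against a 100ns memory latency. */
        double ideal_cpi   = 1.0;
        double miss_rate   = 0.01;   /* 1% of instructions miss the cache       */
        double penalty_cyc = 100.0;  /* 100ns latency / 1ns cycle = 100 cycles  */

        double effective_cpi = ideal_cpi + miss_rate * penalty_cyc;  /* = 2.0 */
        double efficiency    = ideal_cpi / effective_cpi;            /* = 0.5 */

        printf("effective CPI = %.1f, CPU efficiency = %.0f%%\n",
               effective_cpi, efficiency * 100.0);
        return 0;
    }

With a memory latency above 100ns the miss penalty exceeds 100 cycles, so the efficiency drops below 50%, which is the slide's point.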

  8. Clusters vs SMPs (2) • Use of SMPs: • Common Access to Shared Resources (Processors, Memory, Storage Devices) • Running Multiple Applications • Running Multiple Instances of the Same Application • Running Parallel Applications • Use of Clusters: • Common Access to Shared Resources (Processors, Distributed Memory, Storage Devices) • Running Multiple Applications • Running Multiple Instances of the Same Application • Running Parallel Applications

  9. Single System Image • One big advantage of SMPs is the Single System Image • Easier Administration and Support • But, Single Point of Failure • Scali's "Universe" offers Single System Image to the Administrators and Users • As Easy to Use and Support as an SMP • No Single Point of Failure (N copies of the same OS) • Redundancy in "Universe" Architecture

  10. Clustering makes Mo(o)re Sense • Microprocessor Performance Increases 50-60% per Year • 1 year lag: 1.0 WS = 1.6 Proprietary Units • 2 year lag: 1.0 WS = 2.6 Proprietary Units • Volume Disadvantage • When Volume Doubles, Cost is reduced to 90% • 1,000 Proprietary Units vs 1,000,000 SHV Units => Proprietary Unit 3 X more Expensive • 2 year lag and 1:1,000 Volume Disadvantage => 7 X Worse Price/Performance
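
The 3X and 7X figures follow from the slide's two assumptions (performance grows 1.6X per year; unit cost falls to 90% for every doubling of production volume). A small sketch of the arithmetic:

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        /* Slide assumptions: 1.6X performance per year, cost -> 90% per
           doubling of volume, 1,000 proprietary vs 1,000,000 SHV units. */
        double perf_gap_2yr = 1.6 * 1.6;                  /* ~2.6X after a 2-year lag   */
        double volume_ratio = 1000000.0 / 1000.0;         /* SHV vs proprietary volumes */
        double doublings    = log2(volume_ratio);         /* ~10 doublings              */
        double cost_ratio   = 1.0 / pow(0.9, doublings);  /* proprietary ~2.9X pricier  */

        printf("performance gap:       %.1fX\n", perf_gap_2yr);
        printf("cost disadvantage:     %.1fX\n", cost_ratio);
        printf("price/performance gap: %.1fX\n", perf_gap_2yr * cost_ratio);
        return 0;
    }

Compiled with -lm, this reproduces the slide's roughly 3X cost gap and a combined price/performance gap of about 7X.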

  11. Why Do We Need SMPs? • Small SMPs make Great Nodes for building Clusters! • The most Cost-Effective Cluster Node is a Dual Processor SMP

  12. Mission • Scali is dedicated to making state-of-the-art Middleware and System Management Software • The key enabling SW technologies for building Scalable Systems

  13. Application Areas and Basic Technologies • [Layer diagram, top to bottom:] • Application Areas: ASPs, ISPs, Departmental Servers, E-commerce/Databases • Scalable Systems • Scali Software • Basic Technologies: PC Technology, Interconnect, Linux OS

  14. Platform Attraction • [Diagram: application domains drawn to the platform: Seismic, Database, CFD, ASPs, FEM, Web Servers]

  15. Technology • [Software stack diagram: Sys Adm GUI and Application on top; Configuration Server, System Monitor, MPI and ICM in the middle; Operating System and Hardware below] • High Performance implementation of MPI • ICM - InterConnect Manager for SCI • Parallel Systems configuration server • Parallel Systems monitoring • Expert knowledge in: • Computer Architecture • Processor and Communication hardware • Software design and development • Parallelization • System integration and packaging

  16. Key Factors • High Performance Systems Need • High Processor Speed • High Bandwidth Interconnect • Low latency Communication • Balanced Resources • Economy of Scale Components • Establishes a new Standard for Price/Performance

  17. Software Design Strategy • Client - Server Architecture • Implemented as • Application level modules • Libraries • Daemons • Scripts • No OS modifications
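
As an illustration of this user-space, client-server approach, a node-side daemon could be as simple as the sketch below (hypothetical: the port number and the "STATUS" request are invented for the example and are not Scali's actual protocol):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Minimal user-space "node daemon": listens on a TCP port and answers a
       status request from the front-end. It runs as an ordinary application
       process; no kernel or OS modifications are involved. */
    int main(void) {
        int srv = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port        = htons(7010);        /* hypothetical daemon port */

        if (bind(srv, (struct sockaddr *)&addr, sizeof addr) != 0 ||
            listen(srv, 8) != 0) {
            perror("node daemon");
            return 1;
        }
        for (;;) {
            int  cli = accept(srv, NULL, NULL);
            char req[64] = {0};
            read(cli, req, sizeof req - 1);
            if (strncmp(req, "STATUS", 6) == 0) {
                const char reply[] = "node OK\n";  /* placeholder reply */
                write(cli, reply, sizeof reply - 1);
            }
            close(cli);
        }
    }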

  18. Advantages • Industry Standard Programming Model - MPI • MPICH Compatible • Lower Cost • COTS based Hardware = lower system price • Lower Total Cost of Ownership • Better Performance • Always "Latest & Greatest" Processors • Superior Standard Interconnect - SCI • Scalability • Scalable to hundreds of Processors • Redundancy • Single System Image to users and administrators • Choice of OS • Linux • Solaris • Windows NT
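
To illustrate the standard programming model the slide refers to, here is a minimal MPI program (a generic sketch using only standard MPI calls; nothing Scali-specific is assumed):

    #include <stdio.h>
    #include <mpi.h>

    /* Minimal MPI program: every process learns its rank, and rank 1 sends a
       message that rank 0 receives. Any MPI-1.1 implementation (Scali MPI,
       MPICH, ...) can build and run it unchanged. */
    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 1) {
            int payload = 42;
            MPI_Send(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        } else if (rank == 0 && size > 1) {
            int payload;
            MPI_Status status;
            MPI_Recv(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
            printf("rank 0 received %d from rank 1 (of %d processes)\n",
                   payload, size);
        }
        MPI_Finalize();
        return 0;
    }

Because only the standard message-passing interface is used, the same source runs on an SMP node or across the cluster, which is the portability point from slide 6.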

  19. Scali MPI - Unique Features • Fault Tolerant • High Bandwidth • Low Latency • Multi-Thread safe • Simultaneous Inter-/Intra-node operation • UNIX command line replicated • Exact message size option • Manual/debugger mode for selected processes • Explicit host specification • Job queuing: PBS, DQS, LSF, CCS, NQS, Maui • Conformance to MPI-1.1 verified through 1665 MPI tests

  20. Parallel Processing Constraints • Overlaps in Processing • [Timeline diagram: processes P1-P4 each pass through Initialization, Processing and Storing Results, with Communication and Computation overlapping during Processing]
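
One common way to obtain the overlap the slide points at is non-blocking communication. A hedged sketch (standard MPI calls; the ring-style halo exchange is only an illustration, not taken from the slide):

    #include <mpi.h>

    /* Sketch: overlapping communication and computation with non-blocking MPI.
       Each rank sends one boundary value to its right neighbour on a ring and
       receives one from its left neighbour, while it keeps computing on the
       interior data that does not depend on the incoming message. */
    void exchange_and_compute(double *halo_out, double *halo_in,
                              double *interior, int n, MPI_Comm comm) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        int right = (rank + 1) % size;
        int left  = (rank - 1 + size) % size;

        MPI_Request reqs[2];
        MPI_Status  stats[2];
        MPI_Irecv(halo_in,  1, MPI_DOUBLE, left,  0, comm, &reqs[0]);
        MPI_Isend(halo_out, 1, MPI_DOUBLE, right, 0, comm, &reqs[1]);

        for (int i = 0; i < n; i++)   /* computation that overlaps */
            interior[i] *= 0.5;       /* ... with the transfer     */

        MPI_Waitall(2, reqs, stats);
        /* halo_in is now valid and can enter the next computation step. */
    }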

  21. System Interconnect • Main Interconnect: • Torus Topology • SCI - IEEE/ANSI std. 1596 • 667MB/s/segment/ring • Shared Address Space • Maintenance and LAN Interconnect: • 100Mbit/s Ethernet

  22. 2-D Torus Topology • Distributed Switching: • [Node diagram: the PCI bus connects through the PSB and B-Link to two LC3 link controllers, one driving the horizontal SCI ring and one the vertical SCI ring]

  23. Scalability with 33MHz/32bit PCI

  24. Scalability with 66MHz/64bit PCI

  25. Paderborn • PSC2: 12 x 8 Torus, 192 Processors @ 450MHz, 86.4GFlops • PSC1: 8 x 4 Torus, 64 Processors @ 300MHz, 19.2GFlops

  26. MPI_Alltoall()

  27. MPI_Barrier()

  28. Versus Myrinet (1)

  29. Versus Myrinet (2)

  30. Versus Myrinet (3)

  31. Versus Myrinet (4)

  32. Versus Origin 2000 (1)

  33. Versus Origin 2000 (2)

  34. System Architecture • [Diagram: a remote workstation and the control node (front-end) each run a GUI; the GUIs talk to the server daemon on the front-end over TCP/IP sockets; the server daemon talks to the node daemon on each node of the 4x4 2-D Torus SCI cluster, with SCI carrying the cluster traffic]

  35. Fault Tolerance • [Diagram: 4x4 torus, nodes numbered 11-44, node 33 marked as failed] • 2D Torus topology: more routing options • XY routing algorithm • Node 33 fails • Nodes on 33's ringlets become unavailable • Cluster fractured with current routing setting

  36. Fault Tolerance • [Diagram: 4x4 torus with the failed node 33 logically moved to a corner] • Rerouting with XY • Failed node logically remapped to a corner • End-point IDs unchanged • Applications can continue • Problem: too many working nodes unused

  37. Fault Tolerance • [Diagram: 4x4 torus with only node 33 excluded] • Scali advanced routing algorithm: • From the Turn Model family of routing algorithms • All nodes but the failed one can be utilised as one big partition
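
For reference, plain dimension-order (XY) routing on a 2-D torus looks roughly like the sketch below (a generic illustration; Scali's Turn Model based algorithm is not reproduced here):

    #include <stdio.h>

    /* Dimension-order (XY) routing on an NX x NY torus: first step along X
       until the column matches, then along Y. Each hop takes the shorter
       way around the ring. */
    #define NX 4
    #define NY 4

    static int ring_step(int from, int to, int n) {
        int fwd = (to - from + n) % n;       /* hops going "up" the ring */
        if (fwd == 0) return 0;
        return (fwd <= n - fwd) ? 1 : -1;    /* choose the shorter way   */
    }

    void print_route(int sx, int sy, int dx, int dy) {
        int x = sx, y = sy;
        printf("(%d,%d)", x, y);
        while (x != dx) {                    /* X dimension first */
            x = (x + ring_step(x, dx, NX) + NX) % NX;
            printf(" -> (%d,%d)", x, y);
        }
        while (y != dy) {                    /* then Y */
            y = (y + ring_step(y, dy, NY) + NY) % NY;
            printf(" -> (%d,%d)", x, y);
        }
        printf("\n");
    }

    int main(void) {
        print_route(0, 0, 2, 3);   /* route between two example nodes of the 4x4 torus */
        return 0;
    }

When a node fails, it is this routing function that has to change; the Turn Model family permits some turns that plain XY routing forbids, so traffic can steer around the failed node without deadlock and without sacrificing the rest of its ringlets.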

  38. The Scali Universe

  39. System Management

  40. Software Configuration Management • Nodes are categorised once; from then on, new software is installed with one mouse click or a single command.

  41. System Monitoring

  42. Products (1) • Platforms • Intel IA-32/Linux • Intel IA-32/Solaris • Alpha/Linux • SPARC/Solaris • IA-64/Linux • Middleware • MPI 1.1 • MPI 2 • IP • SAN • VIA • Cray shmem

  43. Products (2) • "TeraRack" Pentium • Each Rack: • 36 x 1U Units • Dual PIII 800MHz • 57.6GFlops • 144GBytes SDRAM • 8.1TBytes Disk • Power Switches • Console Routers • 2-D Torus SCI
