CMPE 511 HIGH PERFORMANCE COMPUTING CLUSTERS Dilek Demirel İşçi
Motivation Why is a large amount of computation needed? • Software: Searching large state spaces to evaluate and verify software. • Cryptography: Searching very large state spaces to find a cryptographic key; factoring very large numbers. • Genetic engineering: Searching for matching DNA patterns in large DNA banks. • Cosmology: Simulating very complex systems, such as the formation of a galaxy. • Financial modeling and commerce: Simulating chaotic systems, as in the climate modeling problem. • Climate: Performing very high precision floating-point calculations; simulating chaotic systems.
Ways to improve performance? • Work harder: hardware improvements • Work smarter: better algorithms • Get help: parallelism
Moore’s Law • Moore’s Law (as commonly quoted): processing capacity doubles roughly every 18 months. • How do the performance improvements of high performance computing compare? • They are higher than Moore’s Law predicts. • Due to parallelism?
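The 18-month doubling rule can be turned into a quick projection. The sketch below is only an illustration of the arithmetic, assuming capacity grows as initial × 2^(months / 18); the function name `projected_capacity` and its parameters are made up for this example and do not come from the slides.

```python
def projected_capacity(initial: float, months: float,
                       doubling_period: float = 18.0) -> float:
    """Capacity after `months`, assuming it doubles every `doubling_period` months.

    The 18-month default is the figure quoted on the slide; it is an
    assumption of this sketch, not a measured constant.
    """
    if doubling_period <= 0:
        raise ValueError("doubling_period must be positive")
    return initial * 2 ** (months / doubling_period)


if __name__ == "__main__":
    # Relative capacity after 1.5, 3, 6 and 9 years, starting from 1.0.
    for years in (1.5, 3, 6, 9):
        print(f"{years:>4} years: x{projected_capacity(1.0, years * 12):.0f}")
```

Under this rule a system would be 4 times faster after 3 years, so HPC growth beyond that factor has to come from somewhere else, for example from adding processors.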
Classification of architectures (Flynn’s taxonomy) • Single Instruction Single Data (SISD) • Single Instruction Multiple Data (SIMD) • Multiple Instruction Single Data (MISD) • Multiple Instruction Multiple Data (MIMD)
Single Instruction Single Data (SISD) • No parallelism • Simple single processor
Single Instruction Multiple Data (SIMD): • A single instruction is executed by multiple processors on different data streams. • There is a single instruction memory. • There are multiple data memory units. • Vector architectures are of SIMD type.
Multiple Instruction Single Data (MISD): • Not commercially built. • Refers to a structure where a single data stream is operated on by different functional units.
Multiple Instruction Multiple Data (MIMD): • Each processor has its own instruction and data memory. This is the type we are interested in.
High Performance Computing Techniques • Supercomputers: custom built; shared memory processing (SMP); for non-parallelizable problems; optimized processors • Clusters: use parallelism; consist of more than one computer; distributed memory processing • Grid Systems: “The Internet is the computer”; no geographical limitations
Main idea in cluster architectures • The old idea of parallel computing (physical clustering of general-purpose hardware and the message passing of distributed computing) • combined with new low-cost technology (mass-market COTS PCs and networking products).
Many definitions: • A cluster is two or more independent computers that are connected by a dedicated network to perform a joint task. • A cluster is a group of servers that coordinate their actions to provide scalable, highly available services. • A cluster is a type of parallel and distributed processing system, which consists of a collection of interconnected stand-alone computers working together as a single, integrated computing resource.
Basic cluster • Multiple computing nodes: low cost; each is a fully functioning computer with its own memory, CPU, and possibly storage; each runs its own instance of the operating system • Interconnects that connect the computing nodes: typically low cost, high bandwidth, and low latency • Permanent, high performance data storage • A resource manager to distribute and schedule jobs • The middleware that allows the computers to act as a distributed or parallel system • Parallel applications designed to run on it
A cluster architecture • Parallel and sequential applications • Parallel programming environments and tools: compilers, Parallel Virtual Machine (PVM), Message Passing Interface (MPI) • Cluster middleware: supports a Single System Image (SSI); resource management and scheduling software (initial installation, administration, scheduling, allocation of hardware, allocation of software components) • High-speed interconnect: Myricom, 1.28 Gbps in each direction; IEEE SCI, latency under 2.5 microseconds and 3.2 Gbps in each direction (ring or torus topology); Ethernet, star topology. In most cases the limitation is the server’s internal PCI bus system. • Computing nodes • Master nodes
High performance clusters: • High computing capability • High availability clusters: • Consider the possibility of failure of each hardware or software component • Include redundancy • A subset of this type is the load balancing cluster • Typically used for business applications such as web servers
Homogeneous clusters: • All nodes have similar properties. Each node is much like any other. The amount of memory and the interconnects are similar. • Heterogeneous clusters: • Nodes have different characteristics in terms of memory and interconnect performance.
Single-tier clusters: • No hierarchy of nodes is defined. Any node may be used for any purpose. The main advantage of a single-tier cluster is its simplicity. The main disadvantage is its limited expandability. • Multi-tier clusters: • There is a hierarchy between nodes. There are node sets, where each set has a specialized function.
Multiple Instruction Multiple Data (MIMD) • Distributed Memory Processing (DMP) • Each of the nodes has its own instruction memory and data memory. • Programs cannot directly access the memory of remote systems in the cluster. They have to use some form of message passing between nodes.
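The message-passing idea can be sketched on a single machine. In the example below, threads stand in for cluster nodes and a queue stands in for the interconnect; each simulated node works only on its own slice of the data and reports back by sending a message. This is an illustrative assumption, not how a real cluster is programmed (there the nodes would be separate machines talking through something like MPI or PVM), and the names `node` and `distributed_sum` are hypothetical.

```python
import queue
import threading


def node(rank: int, local_data: list, outbox: queue.Queue) -> None:
    """Simulated compute node: works on its private data, then sends a message."""
    partial = sum(local_data)        # touches only this node's own slice
    outbox.put((rank, partial))      # "send" the partial result to the master


def distributed_sum(data: list, num_nodes: int = 4) -> int:
    """Master: scatter the data, start the nodes, gather their messages."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    inbox: queue.Queue = queue.Queue()
    # Scatter: node i gets every num_nodes-th element starting at index i.
    chunks = [data[i::num_nodes] for i in range(num_nodes)]
    workers = [threading.Thread(target=node, args=(rank, chunk, inbox))
               for rank, chunk in enumerate(chunks)]
    for w in workers:
        w.start()
    # Gather: one message per node; arrival order is not guaranteed.
    total = 0
    for _ in range(num_nodes):
        _rank, partial = inbox.get()
        total += partial
    for w in workers:
        w.join()
    return total


if __name__ == "__main__":
    print(distributed_sum(list(range(101))))
```

The master never reads a node's data directly; it only sees what arrives in its inbox, which is the constraint the slide describes for distributed memory.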
Ease of building: • No expensive and long development projects. • Price performance benefit: • Highly available COTS products are used. • Flexibility of configuration: • Number of nodes, nodes’ performance, and interconnection topology can be upgraded. The system can be modified without loss of prior work. • Scale out: Increasing the number of computing nodes. Requires efficient I/O between nodes and cost-effective management of a large number of nodes. • Scale up: Increasing the throughput of each computing node.
Cluster throughput is a function of the following • CPUs: Total number and speed of CPUs • Parallel algorithms: Efficiency of the parallel algorithms • Inter-Process Communication: Efficiency of the inter-process communication between the computing nodes • Storage I/O: Frequency and size of input data reads and output data writes • Job Scheduling: Efficiency of the scheduling
Top 500 HPCs • Linpack is a benchmark of floating-point performance: it measures how fast a system solves a dense system of linear equations.
BlueGene/L: An Example Cluster-Based Computer • As of January 2005, it was ranked as the world’s most powerful computer in terms of Linpack performance. • Developed by IBM and the US Department of Energy. • A heterogeneous cluster, with nodes dedicated to specific functions. • A cluster of 65536 nodes. • Computing nodes have two processors, resulting in more than 130000 processors in total. • Each processor has a dual floating-point unit. • It includes host nodes for management purposes. • 1024 of the nodes are I/O nodes to the outside world. • I/O nodes run Linux. • The compute nodes run a simple specialized OS. • Uses message passing over an interconnection network with a tree structure. • Each computing node has 2 Gb of RAM, which is shared between the two processors of the node.
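The processor count follows directly from the figures on this slide. The short calculation below only restates that arithmetic; the constant names are chosen for the example, and the last ratio simply divides the two node counts quoted above without claiming anything about how the machine is actually wired.

```python
# Figures as quoted on the slide.
NODES = 65536
PROCESSORS_PER_NODE = 2
IO_NODES = 1024

# Two processors per node gives the "more than 130000 processors" figure.
total_processors = NODES * PROCESSORS_PER_NODE

# Quoted node count divided by quoted I/O node count.
nodes_per_io_node = NODES // IO_NODES

if __name__ == "__main__":
    print("total processors:", total_processors)
    print("nodes per I/O node:", nodes_per_io_node)
```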
Management of large clusters • Request distribution • Optimizing load balance • Health monitoring of the cluster • Connected clusters using Grid technology • Nonlinearity of scaling • In the ideal case, n CPUs should perform n times better than a single CPU. However, the performance gain does not increase linearly. • The main challenge: developing parallel algorithms • To minimize inter-node communication • Program development for parallel architectures is difficult for two reasons: • Describing the application’s concurrency and data dependencies. • Exploiting the processing resources of the architecture to obtain an efficient implementation for specific hardware.
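One common way to see why scaling is not linear is Amdahl's law, which the slides do not name but which fits the point made here: if only a fraction p of a program can run in parallel, the speedup on n CPUs is 1 / ((1 - p) + p / n). The sketch below assumes that simple model; it ignores communication overhead between nodes, which in a real cluster reduces the gain further. The function name `amdahl_speedup` is chosen for this example.

```python
def amdahl_speedup(parallel_fraction: float, n_cpus: int) -> float:
    """Speedup predicted by Amdahl's law for a given parallel fraction.

    Simplified model: the serial part takes the same time regardless of
    n_cpus and the parallel part divides perfectly. Communication cost
    is not modeled.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)


if __name__ == "__main__":
    # A program that is 90% parallelizable, on growing CPU counts.
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} CPUs: speedup {amdahl_speedup(0.9, n):.2f}")
```

With 90% of the work parallelizable, the speedup in this model can never exceed 10 no matter how many CPUs are added, which is why the parallel algorithm matters as much as the node count.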
A Multi-tier Cluster Architecture • Input traffic arrives at one or more front-end load balancers. • Static content is served. • Application servers serve the dynamic content. The application servers are responsible for financial transaction functions such as order entry, catalog search, etc. • Load balancing between tiers. • The third tier may consist of multiple database servers, which are specialized for different data sets.
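The slide does not say which policy the front-end load balancers use. As a minimal sketch, assuming the simplest policy (round robin), incoming requests are handed to the servers of the next tier in turn. The class name `RoundRobinBalancer` and the server names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def route(self, request):
        """Return (chosen server, request); the request itself is not inspected."""
        return next(self._rotation), request


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
    for req in ("order entry", "catalog search", "order entry", "checkout"):
        server, _ = balancer.route(req)
        print(f"{req!r} -> {server}")
```

Round robin ignores how busy each server is; a production balancer would usually also track load and server health, which ties back to the management challenges listed earlier.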
To sum up… • Highly promising for HPC • Cheap • Easy to obtain and develop • Applicable for many diverse applications • Not the answer for all questions • Not applicable for non-parallelizable applications
Thanks for listening… • Questions?