Parallel Architectures: Topologies

Parallel Architectures: Topologies Heiko Schröder, 2003

Types of sequential processors (SISD)
[Figure: processor and memory configurations, including one with a cache between processor and memory.]
Von Neumann bottleneck

SIMD, MIMD, SPMD
[Figure: two machine organisations. SIMD: processing elements (PEs) under a global control unit, joined by an interconnection network. MIMD: PEs that each have their own control unit, joined by an interconnection network.]

Message passing / shared address space
[Figure: two memory organisations. Processors (P) reaching memory modules (M) through an interconnection network; and PEs with local memory and a control unit (PE + M) joined by an interconnection network.]

• Various communication networks
• State of the art technology
• Important aspects of routing schemes
• Known results (theory)
• The internet

Desirable features of a network
• 1. Algorithmic
  • Low diameter (1 for the complete graph)
  • High bisection width (complete graph)
  • Complete graph: n(n-1)/2 edges, degree n-1
• 2. Technical
  • Low degree (pin limitations; constant; modular; mesh)
  • Short wires (mesh)
  • Small area (mesh)
  • Regular structure (mesh)

Connection networks I
1-D mesh (linear array): diameter n-1, bisection width 1

Tree: diameter 2 log n, bisection width 1
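The diameters quoted for the linear array and the tree can be checked by brute force. The following is a minimal sketch, not part of the original slides: it assumes a complete binary tree with heap-style node numbering, and the helper names (bfs_distances, diameter, linear_array, complete_binary_tree) are illustrative only.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def diameter(adj):
    """Largest shortest-path distance over all pairs of nodes."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

def linear_array(n):
    """1-D mesh: node i is linked to i-1 and i+1 where they exist."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def complete_binary_tree(height):
    """Complete binary tree, nodes numbered 1..2^(height+1)-1 (assumed layout)."""
    n = 2 ** (height + 1) - 1
    adj = {i: [] for i in range(1, n + 1)}
    for child in range(2, n + 1):
        parent = child // 2
        adj[child].append(parent)
        adj[parent].append(child)
    return adj

print(diameter(linear_array(8)))          # 7, i.e. n-1
print(diameter(complete_binary_tree(3)))  # 6, i.e. twice the height
```

In both graphs a single edge separates the two halves (the middle link of the array, one of the root's links in the tree), which is why the bisection width is 1.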

H-tree: area O(n), longest wire O(√n). Used for clock distribution.

2-D mesh (√n x √n, n nodes): diameter 2(√n - 1), bisection width √n

Torus
[Figure: an 8-node ring drawn in natural order 1 2 3 4 5 6 7 8 and in folded order 1 8 2 7 3 6 4 5.]
• Reduced diameter
• Increased bisection width
• All nodes equivalent
• Long wires?
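The folded ordering 1 8 2 7 3 6 4 5 answers the "long wires?" question for a ring: interleaving the nodes from both ends keeps every link short, including the wrap-around link. A minimal sketch, assuming nodes are placed at unit spacing on a line; the function names are illustrative, not from the slides.

```python
def folded_order(n):
    """Interleave nodes from both ends of the ring: 1, n, 2, n-1, ..."""
    lo, hi = 1, n
    order = []
    while lo <= hi:
        order.append(lo)
        if lo != hi:
            order.append(hi)
        lo += 1
        hi -= 1
    return order

def max_wire_length(order):
    """Longest ring link (v to v+1, and n back to 1) for a given placement."""
    n = len(order)
    position = {node: i for i, node in enumerate(order)}
    return max(abs(position[v] - position[v % n + 1]) for v in order)

natural = list(range(1, 9))
folded = folded_order(8)
print(folded)                    # [1, 8, 2, 7, 3, 6, 4, 5]
print(max_wire_length(natural))  # 7: the wrap-around link spans the whole row
print(max_wire_length(folded))   # 2
```

Applying the same folding to each dimension of a 2-D torus keeps all wires at constant length.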

3-D mesh (n^(1/3) x n^(1/3) x n^(1/3), n nodes): diameter 3(n^(1/3) - 1), bisection width n^(2/3)

Hypercube: diameter log n, bisection width n/2
[Figure: hypercubes of dimension 0 to 4. 1-D nodes: 0, 1. 2-D nodes: 00, 01, 10, 11. 3-D nodes: 000 to 111.]
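Both hypercube figures follow from the bit-string labelling: neighbours differ in exactly one bit, so distance is Hamming distance, and cutting along one dimension severs one edge per node pair. The sketch below is illustrative only (function names are assumptions); it counts the edges of one particular balanced cut, which matches n/2, but it does not prove that no smaller cut exists.

```python
def neighbours(v, d):
    """Neighbours of node v in a d-dimensional hypercube: flip one bit."""
    return [v ^ (1 << i) for i in range(d)]

def hypercube_diameter(d):
    """Distance equals Hamming distance, so measure the farthest node from 0."""
    return max(bin(v).count("1") for v in range(2 ** d))

def edges_across_top_dimension(d):
    """Edges between the half with top bit 0 and the half with top bit 1."""
    top = 1 << (d - 1)
    return sum(
        1
        for v in range(2 ** d)
        for w in neighbours(v, d)
        if not v & top and w & top
    )

d = 4                                 # n = 16 nodes
print(hypercube_diameter(d))          # 4 = log n
print(edges_across_top_dimension(d))  # 8 = n/2
```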

Cube Connected Cycles
[Slide gives the number of nodes, the diameter and the bisection width; the formulas are not recoverable from this transcript.]

8-node shuffle-exchange graph
Nodes: 000, 001, 010, 011, 100, 101, 110, 111
• Exchange: flip the least significant bit (lsb)
• Shuffle: rotate the bit string left or right
• Degree: 3
• Diameter: 2 log n - 1 (at most log n - 1 shuffles plus log n exchanges)
• Bisection width: Θ(n / log n)

16-node shuffle-exchange graph
Nodes: 0000 to 1111
• Exchange (ex): flip the least significant bit (lsb)
• Shuffle (ls): rotate the bit string left (or right)
Routing from u1u2…uk to v1v2…vk:
u1u2…uk-1uk → (ex) u1u2…uk-1v1 → (ls+ex) u2…uk-1v1v2 → … → (ls+ex) uk-1v1v2…vk-1 → (ls+ex) v1v2…vk
• Degree: 3
• Diameter: 2 log n - 1 (at most log n - 1 shuffles plus log n exchanges)
• Bisection width: Θ(n / log n)
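The routing scheme on this slide can be written out directly: fix the last bit to v1, then repeat "left shuffle, fix the last bit" for v2 to vk. This is a minimal sketch under the assumption that nodes are k-bit integers with u1 as the most significant bit; route and rotate_left are illustrative names. The path it produces respects the 2 log n - 1 bound but is not necessarily the shortest one.

```python
def rotate_left(x, k):
    """Left shuffle: cyclic left rotation of a k-bit label."""
    mask = (1 << k) - 1
    return ((x << 1) | (x >> (k - 1))) & mask

def route(u, v, k):
    """Path from u to v using at most k exchanges and k-1 shuffles."""
    path = [u]
    cur = u
    for i in range(k):
        if i > 0:
            cur = rotate_left(cur, k)          # shuffle edge
            path.append(cur)
        target_bit = (v >> (k - 1 - i)) & 1    # v_(i+1), most significant first
        if (cur & 1) != target_bit:
            cur ^= 1                           # exchange edge
            path.append(cur)
    return path

path = route(0b0000, 0b1111, 4)
print([format(x, "04b") for x in path])
print(len(path) - 1)  # 7 hops = 2 log n - 1 for n = 16
```

Routing from 0000 to 1111 is a worst case: every one of the k exchanges is needed, together with all k-1 shuffles.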

3-dimensional de Bruijn graph
Nodes: 000, 001, 010, 011, 100, 101, 110, 111
Edges labelled 0: u1u2…uk-1uk → u2u3…uk-1uk0
Edges labelled 1: u1u2…uk-1uk → u2u3…uk-1uk1
• In-degree = out-degree = 2
• Diameter: log n
• Bisection width: Θ(n / log n)
• Each Eulerian tour corresponds to a de Bruijn sequence: read cyclically, it contains each possible substring of length 4 exactly once.
De Bruijn sequence: 1111001011010000
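The link between Eulerian tours and de Bruijn sequences can be verified on the sequence quoted on this slide. In the sketch below (helper names are illustrative), each cyclic 3-bit window of the sequence is a node, each step to the next window is an edge, and the tour is Eulerian when all 16 edges are valid and distinct.

```python
def successors(u, k):
    """Out-neighbours of node u: shift left and append 0 or 1."""
    mask = (1 << k) - 1
    return [((u << 1) & mask) | b for b in (0, 1)]

def tour_edges(seq, k):
    """Edges visited when the cyclic bit string seq is read as a walk."""
    n = len(seq)
    nodes = [int("".join(seq[(i + j) % n] for j in range(k)), 2) for i in range(n)]
    return [(nodes[i], nodes[(i + 1) % n]) for i in range(n)]

def is_eulerian_tour(seq, k):
    """True if the walk uses every edge of the k-dimensional graph exactly once."""
    edges = tour_edges(seq, k)
    valid = all(w in successors(u, k) for u, w in edges)
    return valid and len(edges) == 2 ** (k + 1) and len(set(edges)) == len(edges)

def is_de_bruijn_sequence(seq, m):
    """True if every m-bit string occurs exactly once as a cyclic window."""
    n = len(seq)
    windows = {"".join(seq[(i + j) % n] for j in range(m)) for i in range(n)}
    return n == 2 ** m and len(windows) == n

seq = "1111001011010000"
print(is_eulerian_tour(seq, 3))       # True
print(is_de_bruijn_sequence(seq, 4))  # True
```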

Butterfly network
• FFT
• Routing
• Sorting
• Unique path between each input and each output
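The unique-path property can be illustrated with a small model. The sketch assumes one common convention that the slide does not spell out: a d-dimensional butterfly has levels 0 to d and rows 0 to 2^d - 1, and node (level, row) is linked to (level+1, row) and (level+1, row with bit `level` flipped). Function names are illustrative.

```python
def butterfly_path(s, t, d):
    """Path from input row s (level 0) to output row t (level d)."""
    path = [(0, s)]
    row = s
    for level in range(d):
        if ((row ^ t) >> level) & 1:
            row ^= 1 << level          # cross edge: correct bit `level`
        path.append((level + 1, row))  # otherwise take the straight edge
    return path

def count_paths(s, t, d):
    """Number of level-by-level paths from (0, s) to (d, t)."""
    reachable = {s: 1}
    for level in range(d):
        nxt = {}
        for row, ways in reachable.items():
            for row2 in (row, row ^ (1 << level)):
                nxt[row2] = nxt.get(row2, 0) + ways
        reachable = nxt
    return reachable.get(t, 0)

print(butterfly_path(0b000, 0b101, 3))
print(count_paths(0b000, 0b101, 3))  # 1
```

Because level i is the only place where bit i of the row can change, the destination row fully determines the choice at every level, which is what makes the path unique.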

Benes network

Diameter (log n) Bisection width ( ) Mesh of trees

The Power of Hypercubes
[Figure: 4-D hypercube.]
• Hamiltonian cycle
• Gray codes
• k-D meshes (tori), N nodes
• Simulates mesh of trees
• Simulates hypercubic networks
• Contains complete binary tree, almost
• Normal algorithms

Hamiltonian Cycle A hypercube contains a Hamiltonian cycle -- proof by induction. Each Hamiltonian cycle corresponds to a Gray code (only one bit is changed per link).

Gray code (built by reflection)
1 bit: 0 1
2 bits: 00 01 11 10
3 bits: 000 001 011 010 110 111 101 100
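The reflection step can be sketched in a few lines: take the current list prefixed with 0, then the same list in reverse order prefixed with 1. The function names below are illustrative. The check at the end confirms that consecutive codes, including the last and the first, differ in one bit, so the sequence traces a Hamiltonian cycle in the hypercube.

```python
def gray_code(d):
    """Reflected Gray code on d bits, as a list of bit strings."""
    codes = [""]
    for _ in range(d):
        codes = ["0" + c for c in codes] + ["1" + c for c in reversed(codes)]
    return codes

def is_hamiltonian_cycle(codes):
    """True if all codes are distinct and cyclic neighbours differ in one bit."""
    n = len(codes)
    one_bit_steps = all(
        sum(a != b for a, b in zip(codes[i], codes[(i + 1) % n])) == 1
        for i in range(n)
    )
    return one_bit_steps and len(set(codes)) == n

print(gray_code(2))  # ['00', '01', '11', '10']
print(gray_code(3))  # ['000', '001', '011', '010', '110', '111', '101', '100']
print(is_hamiltonian_cycle(gray_code(4)))  # True
```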

Hypercube contains meshes/tori
[Figure: 4 x 4 mesh with wrap-around links; each node is labelled by a pair of Gray-code digits.]
00 01 03 02
10 11 13 12
30 31 33 32
20 21 23 22
Theorem: Any n1 x n2 x … x nk mesh (with or without wrap-arounds) is a sub-graph of an n-D hypercube if n1 · n2 · … · nk = 2^n.
Proof: see Leighton (each sub-cube has a Hamiltonian cycle).
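The embedding behind the theorem can be sketched by Gray-coding each mesh coordinate and concatenating the results. This is an illustrative sketch, not Leighton's proof: it assumes every side length is a power of two and at least 2 (which the product condition forces for sides greater than 1), and the names gray, embed and is_subgraph_embedding are assumptions.

```python
from itertools import product

def gray(i):
    """i-th word of the binary reflected Gray code, as an integer."""
    return i ^ (i >> 1)

def embed(coords, sides):
    """Hypercube label of a mesh node: concatenated Gray codes of its coordinates."""
    label = 0
    for c, side in zip(coords, sides):
        bits = side.bit_length() - 1   # side is assumed to be a power of two
        label = (label << bits) | gray(c)
    return label

def is_subgraph_embedding(sides, wrap=True):
    """True if every mesh (or torus) edge maps onto a hypercube edge."""
    labels = set()
    for coords in product(*(range(s) for s in sides)):
        u = embed(coords, sides)
        labels.add(u)
        for axis, side in enumerate(sides):
            c = coords[axis]
            nxt = c + 1 if c + 1 < side else (0 if wrap else None)
            if nxt is None or nxt == c:
                continue
            w = embed(coords[:axis] + (nxt,) + coords[axis + 1:], sides)
            if bin(u ^ w).count("1") != 1:
                return False
    total = 1
    for s in sides:
        total *= s
    return len(labels) == total        # labels must also be distinct

print([gray(i) for i in range(4)])       # [0, 1, 3, 2]
print(is_subgraph_embedding((4, 4)))     # True
print(is_subgraph_embedding((2, 4, 8)))  # True
```

The digit order 0, 1, 3, 2 along each row and column of the 4 x 4 figure is exactly this Gray code, which is why the wrap-around links also land on hypercube edges.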

Hypercube contains double-rooted trees
[Figure: double-rooted tree embedded in the hypercube; the two roots are joined along a different dimension.]
The hypercube (HC) can implement all tree algorithms and also all mesh-of-trees algorithms (possibly with minor delay).

Normal algorithms
• A hypercube algorithm is said to be normal if
  • only one dimension of hypercube edges is used at any step, and
  • consecutive dimensions are used in consecutive steps.
• Most hypercube algorithms are normal.
• Normal algorithms can be embedded efficiently on hypercubic networks.
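A global sum is a simple example of a normal algorithm: in step i every node exchanges its partial sum with its neighbour across dimension i, so each step uses a single dimension and the dimensions are visited in order. The sketch below simulates this sequentially on a list; it is an illustration chosen here, not an algorithm taken from the slides, and hypercube_all_sum is a hypothetical name.

```python
def hypercube_all_sum(values):
    """Simulate an all-to-all sum on a hypercube with len(values) nodes."""
    n = len(values)
    d = n.bit_length() - 1
    if n != 1 << d:
        raise ValueError("number of nodes must be a power of two")
    vals = list(values)
    for dim in range(d):
        # One step: every node v talks only to v with bit `dim` flipped.
        vals = [vals[v] + vals[v ^ (1 << dim)] for v in range(n)]
    return vals

print(hypercube_all_sum([0, 1, 2, 3, 4, 5, 6, 7]))  # every node holds 28
```

After log n steps each node holds the full sum, and no step ever mixes two dimensions.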

Josephus graph
[Figure: 32 nodes, numbered 0 to 31, arranged on a ring.]
Every even node k is connected to k + 2^i - 3.
Diameter: about (log n) / 2

Star graph
[Figure: star graph for k = 4; its 24 nodes are the permutations of 1234.]
• Set of nodes: the permutations of k elements, i.e. k! nodes of degree k-1.
• Set of edges: exchange of the first element with one other.
• Small degree, diameter ⌊3(k-1)/2⌋.
• Open problems: e.g. are there (k-1)/2 edge-disjoint Hamiltonian cycles?
Number of nodes, star graph (k! nodes, degree k-1) versus hypercube (2^k nodes, degree k):
k:          4    5    6    7     8      9
Star:       24   120  720  5040  40320  362880
Hypercube:  16   32   64   128   256    512
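The star graph definition is compact enough to build and measure directly for small k. The following sketch (illustrative names, brute-force search, practical only for small k) generates neighbours by swapping the first element with one other, then runs a breadth-first search from the identity permutation. Since the graph looks the same from every node, the farthest distance found from one node is the diameter.

```python
from collections import deque

def star_neighbours(p):
    """Permutations reachable by swapping the first element with one other."""
    return [(p[i],) + p[1:i] + (p[0],) + p[i + 1:] for i in range(1, len(p))]

def star_size_and_diameter(k):
    """Number of nodes reached and largest distance from the identity."""
    start = tuple(range(1, k + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for q in star_neighbours(p):
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return len(dist), max(dist.values())

print(len(star_neighbours((1, 2, 3, 4))))  # 3 = k-1
print(star_size_and_diameter(4))           # (24, 4)
```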

Pin limitations
[Figure: 4-D hypercube example; numeric labels 12, 192, 256, 16, 16, 1.]

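Wiring limitations
[Figure: 4-D hypercube example; numeric labels 12, 1.]
2^16 nodes: bisection width 256 (as for a 256 x 256 mesh) versus 32 K (as for a 16-D hypercube); wire lengths 25 cm versus 32 m.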

Improve the topology? The internet

Against parallelism
• cost (large) < cost (2 small)
• all the FORTRAN / C software
• let's stick to pipelining
• let's wait for faster machines
• Amdahl's Law
