
Understanding Parallel and Distributed Computing: Concepts and Applications

Parallel and distributed computing harnesses multiple CPUs to solve a single problem, such as matrix multiplication, faster. In parallel computing the CPUs share memory, while in distributed computing each CPU has its own local memory. Applications range from weather forecasting to drug discovery, with an emphasis on grand challenge problems. As technology evolves, VLSI integration and multicore processors continue to expand computational capability. Frameworks such as MPI for message passing and OpenMP for shared-memory programming make it practical to build efficient parallel applications, paving the way for innovative solutions in science and industry.


Presentation Transcript


  1. What is Parallel and Distributed Computing?
  • Solving a single problem faster using multiple CPUs
  • E.g. matrix multiplication C = A × B (see the sketch below)
  • Parallel = shared memory among all CPUs
  • Distributed = local memory per CPU
  • Common issues: partitioning, synchronization, dependencies, load balancing
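
A minimal sketch of the row-partitioning idea in C (the function name, the dimension N, and the worker split are illustrative, not from the slides): each row of C depends only on one row of A and all of B, so disjoint row ranges can be handed to different CPUs with no write conflicts.

    #include <stddef.h>

    #define N 512  /* illustrative matrix dimension */

    /* Compute rows [row_lo, row_hi) of C = A x B.
     * Give each worker (thread or process) a disjoint row range,
     * e.g. worker w of P computes rows [w*N/P, (w+1)*N/P);
     * no two workers ever write the same element of C. */
    void matmul_rows(const double A[N][N], const double B[N][N],
                     double C[N][N], size_t row_lo, size_t row_hi)
    {
        for (size_t i = row_lo; i < row_hi; i++)
            for (size_t j = 0; j < N; j++) {
                double sum = 0.0;
                for (size_t k = 0; k < N; k++)
                    sum += A[i][k] * B[k][j];
                C[i][j] = sum;
            }
    }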

  2. Why Parallel and Distributed Computing?
  • Grand Challenge Problems
  • Weather forecasting; global warming
  • Materials design – superconducting materials at room temperature; nano-devices; spaceships
  • Organ modeling; drug discovery

  3. Why Parallel and Distributed Computing?
  • Physical limitations of circuits
  • Heat and speed-of-light effects
  • Superconducting materials can counter the heat effect
  • Speed-of-light effect – no solution! (see the worked numbers below)
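
To see why the speed-of-light effect admits no fix: signals propagate at most at c ≈ 3 × 10^8 m/s, so within one cycle of a 3 GHz clock (≈ 0.33 ns) a signal can travel only about 10 cm. Components that must communicate within a cycle cannot sit farther apart than that, which caps how fast a single CPU can be clocked and pushes designs toward many cooperating CPUs.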

  4. [Figure: the microprocessor revolution – speed (log scale) vs. time for micros, minis, mainframes, and supercomputers, illustrating Moore's Law]

  5. Why Parallel and Distributed Computing?
  • VLSI – effect of integration
  • 1M transistors are enough for full functionality – DEC's Alpha ('90s)
  • The rest must go into multiple CPUs per chip
  • Cost – multitudes of average CPUs gave better FLOPS/$ than traditional supercomputers

  6. Modern Parallel Computers
  • Caltech's Cosmic Cube (Seitz and Fox)
  • Commercial copy-cats
  • nCUBE Corporation (512 CPUs)
  • Intel's Supercomputer Systems
  • iPSC1, iPSC2, Intel Paragon (512 CPUs)
  • Lots more
  • Thinking Machines Corporation
  • CM2 (65K 4-bit CPUs) – 12-dimensional hypercube – SIMD
  • CM5 – fat-tree interconnect – MIMD
  • Roadrunner – Los Alamos National Laboratory, 116,640 cores, 12K IBM Cell processors; Japan's K computer; China's Tianhe-1A

  7. Instruction Cycle
  • The instruction cycle is broken into 4 stages:
  • Instruction fetch – fetch & decode the instruction, obtain any operands, update the PC
  • Execute – execute the arithmetic instruction, compute the branch target address, compute the memory address
  • Memory access – access memory for a load or store instruction; fetch the instruction at the target of a branch instruction
  • Store results – write instruction results back to the register file

  8. Pipelining
  • SPARC is a RISC machine – we want to complete one instruction per cycle
  • Overlap stages of different instructions to achieve parallel execution
  • Can obtain a speedup by a factor of 4
  • The hardware does not have to run 4 times faster – break the h/w into 4 parts that run concurrently (see the diagram below)
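
An illustrative pipeline diagram (the stage abbreviations F, E, M, S follow slide 7's fetch, execute, memory access, store results): once the pipeline is full, one instruction completes every cycle.

    Cycle:  1   2   3   4   5   6   7
    I1      F   E   M   S
    I2          F   E   M   S
    I3              F   E   M   S
    I4                  F   E   M   S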

  9. Pipelining
  • Sequential: each h/w stage is idle 75% of the time. time_ex = 4 · i
  • Parallel: each h/w stage keeps working once the pipeline fills. time_ex = 3 + i
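
For example, with i = 1000 instructions, the sequential design takes time_ex = 4 × 1000 = 4000 stage-times, while the pipeline takes time_ex = 3 + 1000 = 1003 – a speedup of about 3.99, approaching the ideal factor of 4 as i grows.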

  10. Why Parallel and Distributed Computing?
  • Everyday reasons
  • Available local networked workstations and Grid resources should be utilized
  • Solve compute-intensive problems faster
  • Make infeasible problems feasible
  • Reduce design time
  • Leverage large combined memory
  • Solve larger problems in the same amount of time
  • Improve the answer's precision
  • Gain competitive advantage
  • Exploit commodity multi-core and GPU chips

  11. Why MPI/PVM?
  • MPI = "Message Passing Interface"
  • PVM = "Parallel Virtual Machine"
  • MPI is a standard specification for message-passing libraries
  • Libraries are available on virtually all parallel computers
  • Free libraries are also available for networks of workstations, commodity clusters, Linux, Unix, and Windows platforms
  • Can program in C, C++, and Fortran (see the minimal sketch below)
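
A minimal MPI sketch in C (the payload value and message tag are illustrative; the calls themselves are standard MPI): rank 0 sends an integer to rank 1.

    #include <stdio.h>
    #include <mpi.h>

    /* Compile with an MPI wrapper (e.g. mpicc hello.c) and run with
     * at least two processes (e.g. mpirun -np 2 ./a.out). */
    int main(int argc, char *argv[])
    {
        int rank, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which process am I? */

        if (rank == 0) {
            value = 42;  /* illustrative payload */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d from rank 0\n", value);
        }

        MPI_Finalize();
        return 0;
    }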

  12. Why Shared-Memory Programming?
  • Easier conceptual environment
  • Programmers are typically familiar with concurrent threads and processes sharing an address space
  • CPUs within multi-core chips share memory
  • OpenMP is an application programming interface (API) for shared-memory systems
  • Supports higher-performance parallel programming of symmetric multiprocessors (see the minimal sketch below)
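
A minimal OpenMP sketch in C (the vector length and contents are illustrative; the pragma and omp_get_max_threads are standard OpenMP): the loop's iterations are divided among threads that all share the arrays in one address space.

    #include <stdio.h>
    #include <omp.h>

    /* Compile with OpenMP enabled (e.g. gcc -fopenmp vecadd.c). */
    int main(void)
    {
        enum { N = 1000000 };        /* illustrative vector length */
        static double a[N], b[N], c[N];

        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

        #pragma omp parallel for     /* fork threads, divide iterations */
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];      /* disjoint writes: no data races */

        printf("c[N-1] = %f (up to %d threads)\n",
               c[N - 1], omp_get_max_threads());
        return 0;
    }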
