Parallel computing Reference https://computing.llnl.gov/tutorials/parallel_comp/ Parallel Programming in C with MPI and OpenMP 平行計算 賴寶蓮
TextBook • Introduction to Parallel Computing • George Karypis, Vipin Kumar • 2003, Pearson • 全華代理
參考教材 • Parallel Programming in C with MPI and OpenMP, McGraw Hill,2004 • Patterns for Parallel Programming, Addison-Wesley,2004 • Parallel Programming with MPI, Peter S. Pacheco, Morgan Kaufmann Publishers, 1997 • OpenMP Specification, www.openmp.org • www.top500.org • Parallel Computing • International Journal of Parallel Programming
von Neumann Architecture Comprised of four main components: Memory Control Unit Arithmetic Logic Unit Input/Output
Serialcomputing • Traditionally, software has been written for serial computation: • To be run on a single computer having a single Central Processing Unit (CPU); • A problem is broken into a discrete series of instructions. • Instructions are executed one after another. • Only one instruction may execute at any moment in time.
Parallel computing • In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem: • To be run using multiple CPUs • A problem is broken into discrete parts that can be solved concurrently • Each part is further broken down to a series of instructions • Instructions from each part execute simultaneously on different CPUs
Parallel computing • The compute resources can include: • A single computer with multiple processors; • An arbitrary number of computers connected by a network; • A combination of both.
Parallel computing • The computational problem usually demonstrates characteristics such as the ability to be: • Broken apart into discrete pieces of work that can be solved simultaneously; • Execute multiple program instructions at any moment in time; • Solved in less time with multiple compute resources than with a single compute resource.
The Universe is Parallel: • Parallel computing is an evolution of serial computing that attempts to emulate what has always been the state of affairs in the natural world: • many complex, interrelated events happening at the same time, yet within a sequence. • For example: • Galaxy銀河formation • Planetary行星的movement • Weather and ocean patterns • Tectonic築造的plate drift • Rush hour traffic • Automobile assembly line • Building a space shuttle • Ordering a hamburger at the drive through.
Uses for Parallel Computing: • Historically, parallel computing has been considered to be "the high end of computing", and has been used to model difficult scientific and engineering problems found in the real world. • Some examples: • Atmosphere, Earth, Environment • Physics - applied, nuclear, particle, condensed matter, high pressure, fusion, photonics • Bioscience, Biotechnology, Genetics • Chemistry, Molecular Sciences • Geology, Seismology • Mechanical Engineering - from prosthetics to spacecraft • Electrical Engineering, Circuit Design, Microelectronics • Computer Science, Mathematics
Uses for Parallel Computing: • Today, commercial applications provide an equal or greater driving force in the development of faster computers. These applications require the processing of large amounts of data in sophisticated ways. • For example: • Databases, data mining • Oil exploration • Web search engines, web based business services • Medical imaging and diagnosis • Pharmaceutical design • Management of national and multi-national corporations • Financial and economic modeling • Advanced graphics and virtual reality, particularly in the entertainment industry • Networked video and multi-media technologies • Collaborative work environments
Why Use Parallel Computing? • Main Reasons: • Save time and/or money: • Solve larger problems: • Provide concurrency: • Use of non-local resources: • Limits to serial computing:
Why Use Parallel Computing? • Save time and/or money: • In theory, throwing more resources at a task will shorten its time to completion, with potential cost savings. • Parallel clusters can be built from cheap, commodity components.
Why Use Parallel Computing? • Solve larger problems: • Many problems are so large and/or complex that it is impractical or impossible to solve them on a single computer, especially given limited computer memory. • For example: • "Grand Challenge" (en.wikipedia.org/wiki/Grand_Challenge) problems requiring PetaFLOPS and PetaBytes of computing resources. • Web search engines/databases processing millions of transactions per second
Why Use Parallel Computing? • Provide concurrency: • A single compute resource can only do one thing at a time. • Multiple computing resources can be doing many things simultaneously. • For example, the Access Grid (www.accessgrid.org) provides a global collaboration network where people from around the world can meet and conduct work "virtually".
Why Use Parallel Computing? • Use of non-local resources: • Using compute resources on a wide area network, or even the Internet when local compute resources are scarce. • For example: • SETI@home (setiathome.berkeley.edu) uses over 330,000 computers for a compute power over 528 TeraFLOPS (as of August 04, 2008) • Folding@home (folding.stanford.edu) uses over 340,000 computers for a compute power of 4.2 PetaFLOPS (as of November 4, 2008) One petaflops is equal to 1,000 teraflops, or 1,000,000,000,000,000 FLOPS.
Why Use Parallel Computing? • Limits to serial computing: • Both physical and practical reasons pose significant constraints to simply building ever faster serial computers: • Transmission speeds • Limits to miniaturization小型化 • Economic limitations
Why Use Parallel Computing? • Current computer architectures are increasingly relying upon hardware level parallelism to improve performance: • Multiple execution units • Pipelined instructions • Multi-core
Who and What ? • Top500.org provides statistics on parallel computing users – • the charts below are just a sample. • Some things to note: Sectors may overlap – • for example, research may be classified research. Respondents have to choose between the two. • "Not Specified" is by far the largest application - probably means multiple applications.
The Future: • During the past 20 years, the trends indicated by ever faster networks, distributed systems, and multi-processor computer architectures (even at the desktop level) clearly show that parallelism is the future of computing.
Modern Parallel Computers • Caltech’s Cosmic Cube (Seitz and Fox) • Commercial copy-cats • nCUBE Corporation • Intel’s Supercomputer Systems Division • Lots more • Thinking Machines Corporation
Modern Parallel Computers • Cray Jaguar (224162 cores, 1.75PFlops) • IBM Roadrunner (122400 cores, 1.04PFlops) • Cray Kraken XT5 (98928 cores, 831TFlops) • IBM JUGENE (294912 cores, 825TFlops) • NUDT Tianhe-1 (71680 cores, 563TFlops) • (2009/11 TOP 5) • IBM 1350 (台灣國網中心) (2048 cores, 23TFlops)
Seeking Concurrency • Data dependence graphs • Data parallelism • Functional parallelism • Pipelining
Data Dependence Graph • Directed graph • Vertices = tasks • Edges = dependences
Data Parallelism • Independent tasks apply same operation to different elements of a data set • Okay to perform operations concurrently for i 0 to 99 do a[i] b[i] + c[i] endfor
Functional Parallelism • Independent tasks apply different operations to different data elements • First and second statements • Third and fourth statements a 2 b 3 m (a + b) / 2 s (a2 + b2) / 2 v s - m2
Pipelining • Divide a process into stages • Produce several items simultaneously
Data Clustering • Data mining • looking for meaningful patterns in large data sets • Data clustering • organizing a data set into clusters of “similar” items • Data clustering can speed retrieval of related items
A graphical representation of Amdahl's law. • The speedup of a program using multiple processors in parallel computing is limited by the sequential fraction of the program. • For example, • if 95% of the program can be parallelized, the theoretical maximum speedup using parallel computing would be 20× as shown in the diagram, no matter how many processors are used.
Programming Parallel Computers • Extend compilers: • translate sequential programs into parallel programs • Extend languages: • add parallel operations • Add parallel language layer on top of sequential language • Define totally new parallel language and compiler system
Strategy 1: Extend Compilers • Parallelizing compiler • Detect parallelism in sequential program • Produce parallel executable program • Focus on making Fortran programs parallel
Extend Compilers (cont.) • Advantages • Can leverage millions of lines of existing serial programs • Saves time and labor • Requires no retraining of programmers • Sequential programming easier than parallel programming
Extend Compilers (cont.) • Disadvantages • Parallelism may be irretrivably lost when programs written in sequential languages • Performance of parallelizing compilers on broad range of applications still up in air
Extend Language • Add functions to a sequential language • Create and terminate processes • Synchronize processes • Allow processes to communicate
Extend Language (cont.) • Advantages • Easiest, quickest, and least expensive • Allows existing compiler technology to be leveraged • New libraries can be ready soon after new parallel computers are available
Extend Language (cont.) • Disadvantages • Lack of compiler support to catch errors • Easy to write programs that are difficult to debug
Current Status • Low-level approach is most popular • Augment existing language with low-level parallel constructs (by function call) • MPI and OpenMP are examples • Advantages of low-level approach • Efficiency • Portability • Disadvantage: More difficult to program and debug
OpenMP • 使用Shared Memory的架構 https://computing.llnl.gov/tutorials/parallel_comp/
MPI • MPI是一種standard • 使用Distributed Memory • 市面上比較流行的軟體有LAM和MPICH。 https://computing.llnl.gov/tutorials/parallel_comp/
Organization and Contents of this Course • Fundamentals: This part of the class covers • basic parallel platforms • principles of algorithm design • group communication primitives, and • analytical modeling techniques.
Organization and Contents of this Course • Parallel Programming: This part of the class deals with programming using • message passing libraries and • threads. • Parallel Algorithms: This part of the class covers basic algorithms for • matrix computations, • graphs, sorting, • discrete optimization, and • dynamic programming.
課程大綱 • Introduction to Parallel Computing • Parallel Programming Platforms • Principles of Parallel Algorithm Design • Basic Communication Operations • Analytical Modeling of Parallel Programs • Programming Using the Message-Passing Paradigm (MPI) • 期中報告(or 期中考試)
課程大綱 • Programming Shared Address Space Platforms (OpenMP) • Dense Matrix Algorithms • Sorting Algorithms • Graph Algorithms • Hadoop 簡介與安裝架設 • Map-Reduce 程式架構簡介 • Map-Reduce 範例程式介紹與實作 • 期末計劃 (or期末報告)
計分 • 平時成績(上課,出席,實作,作業)40% • 期中報告(or 期中考試) 30% • 期末計劃(or期末報告) 30%
P2P system • Peer-to-peer system • Client acts as a server • Share data • BT
Grid Computing • Distributed computing • Ethernet/ Internet • Volunteer computing networks • Software-as-a-service (SaaS) • Software that is owned, delivered and managed remotely by one or more providers.
Cloud Computing • 雲端運算不是技術，它是概念 • Distributed computing • Web services