
Principles of Parallel Architecture



Presentation Transcript


  1. Principles of Parallel Architecture
     Fall 2006
     Keys to a happy life: diversity and variety. Diversity in the people that you meet. Variety in the things that you do.
     eleg652-010-06F

  2. Contact Information
     Instructor
       Name: Joseph B. Manzano
       Email: jmanzano@capsl.udel.edu
       Office: 137 Evans Hall
       Phone: N/A
     Teaching Assistants
       Name: Juergen Ributzka
       Email: ributzka@capsl.udel.edu
       Office: 326 Dupont Hall
       Phone: (302) 831 0327
       Name: Eunjung Park
       Email: epark@capsl.udel.edu
       Office: 326 Dupont Hall
       Phone: (302) 831 0327
     Course Webpage: http://www.capsl.udel.edu/courses/eleg652/2006/

  3. Important Course Information
     • Activities: four homeworks, a comprehensive final examination, and a class project assigned by the instructor with a mentor
     • Final Quiz: Wednesday, December 6th, 2006
     • Final Project due date: Friday, December 8th, 2006
     • Grade Distribution [chart not preserved in the transcript]

  4. Reference Material
     Reference Books
     [1] John Hennessy and David Patterson, Computer Architecture: A Quantitative Approach, Third Edition, Morgan Kaufmann Publishers, Inc., 2003
     [2] D. E. Culler, J. P. Singh, and A. Gupta, Parallel Computer Architecture, Morgan Kaufmann Publishers, Inc., 1999

  5. Supporting Materials
     Selected publications from:
     Journals
     • IEEE Computer
     • IEEE Transactions on Computers
     • IEEE Transactions on Parallel and Distributed Systems
     Conference Proceedings
     • PACT: Parallel Architectures and Compilation Techniques
     • MICRO: ACM/IEEE Symposium on Microarchitecture
     • HPCA: ACM/IEEE Symposium on High Performance Computer Architecture
     • ISCA: International Symposium on Computer Architecture
     • PLDI: Conference on Programming Language Design and Implementation

  6. Course Contents
     • Provides an overview of technologies that are applicable in almost all aspects of computers and will soon be part of consumer electronics in general.
     • Shows the principles on which parallel machines are built and how these concepts have infiltrated other parts of the computer and entertainment industry.
     • Provides an understanding of how these concepts affect both hardware and software on the target machine, and of their different implementations.

  7. Expectations about this Course
     You should learn:
     • A basic idea of the lingo used in today's supercomputer/parallel machine market
     • Vector processing and its place in consumer electronics
     • Different forms of parallelism and their current implementations
     • Shared memory models
     • Parallel programming models and synchronization
     • Multithreaded architectures

  8. Why Study Parallel Architectures?
     • Concepts that should soon become ubiquitous
     • Productively write software that takes advantage of new features of upcoming or existing hardware
     • Understand how current technologies have evolved and how they can be improved

  9. Course Overview
     • Terminology and general knowledge
     • Vector processing and its legacy
     • Instruction-level parallelism: a brief overview
     • Multicore and cellular architectures
     • Parallel (shared) memory models and synchronization primitives
     • Advanced topics such as dataflow and transactional memory

  10. Course Introduction
      The Role of a Computer Architect
      • Maximize productivity and performance
      • Productivity = programmability and a reduction in development time
      • Performance = “reasonable” throughput given technology and cost limitations
      Parallelism
      • Two or more tasks may execute at the same time
      • Alternative to higher frequency clocks
      • Applies to all levels of computer design
      • Importance has been constantly rising since several “walls” were hit
      • In the near future, it will become the paradigm in all aspects of computing

  11. The Transition
      Most consumer electronics will have some form of parallel architecture inside of them by next year (2007)
      Reasons for the Change
      An evolutionary change in computing due to:
      • Technology: decrease in feature size, allowing more components into a chip
      • Architecture: effectively organizing components to maximize use of resources and minimize damaging size effects
      • Applications: more and more performance and power hungry applications
      • Economics: find cost-effective ways to get the desired performance out of the given hardware/software combo

  12. Applications Requirements
      • Demand for more cycles = more sophisticated hardware
      • Wide range of performance demands
        • Audio processing = real-time response with an allowed threshold of error
        • Business loads = a given quantum of time with no error allowed
      • Application and parallel computer: obtain a speedup in application runtime
      • Productive parallel systems
        • Current systems work on parallel concepts and designs (e.g., desktop systems are multithreaded)
        • Parallel computing and computers are becoming ubiquitous as we speak

  13. Technology: An Overview
      • Decrease in feature size (Lambda)
        • Clock rates improve roughly in proportion to the improvement in Lambda
        • Number of transistors improves as Lambda squared or faster
      • Performance: an increase of roughly 1000x in the last decade
        • The fastest supercomputer in June 1996 (Tokyo's SR2201) reached 220 GFLOPS
        • The fastest supercomputer now (IBM's eServer Blue Gene Solution) reaches 280 TFLOPS
      • Clock frequency: an increase of roughly 20x in the same decade
        • Intel Pentium Pro at 150 to 200 MHz in 1996
        • Intel Pentium D at 3.2 GHz in 2006
      • Extra components: parallelism vs. data locality, fighting for real estate
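The growth figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted on the slide; the helper names (`growth_factor`, `annual_rate`) are illustrative and not part of the course material.

```python
# Figures quoted on the slide, in GFLOPS and MHz.
peak_1996_gflops = 220.0          # SR2201, June 1996
peak_2006_gflops = 280_000.0      # Blue Gene, 280 TFLOPS
clock_1996_mhz = (150.0, 200.0)   # Pentium Pro range, 1996
clock_2006_mhz = 3200.0           # Pentium D at 3.2 GHz, 2006


def growth_factor(old, new):
    """Ratio of the new value to the old value."""
    return new / old


def annual_rate(factor, years):
    """Compound annual growth rate that yields `factor` after `years`."""
    return factor ** (1.0 / years) - 1.0


perf_growth = growth_factor(peak_1996_gflops, peak_2006_gflops)
clock_growth_low = growth_factor(clock_1996_mhz[1], clock_2006_mhz)
clock_growth_high = growth_factor(clock_1996_mhz[0], clock_2006_mhz)

print(f"peak performance grew {perf_growth:.0f}x")
print(f"clock frequency grew {clock_growth_low:.0f}x to {clock_growth_high:.0f}x")
print(f"performance compound rate: {annual_rate(perf_growth, 10):.0%} per year")
```

With these inputs, peak performance grows by a little under 1300x over the decade, while the clock ratio lands between 16x and about 21x. Clock frequency alone therefore accounts for only a small share of the performance gain.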

  14. Intel: An Example of Clock Frequency Growth
      Growth has been steady until now!!!!

  15. Pentium M
      Thermal maps of the Pentium M obtained from simulated power density (left) and IREM measurement (right). Heat levels go from black (lowest) through red, orange, and yellow to white (highest).
      Figures courtesy of Dani Genossar and Nachum Shamir, from their paper "Intel® Pentium® M Processor Power Estimation, Budgeting, Optimization, and Validation", published in the Intel Technology Journal, May 21, 2003

  16. Storage and Transistor Count Growth
      Transistor Count
      • Expected to reach one billion during this decade (the 2000s)
      • Grows faster than clock rate: 40% per year
      Storage
      • Gap between storage and speed is more pronounced
      • Larger memories = slower = larger memory hierarchies (e.g., caches, write/read buffers)
      • Parallelism and locality inside memory systems: multi-port memory, parallel caches, RAIDs, parallel disks with caching, etc.

  17. Moore's Law
      "The complexity for minimum component costs has increased at a rate of roughly a factor of two per year ... Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000. I believe that such a large circuit can be built on a single wafer."
      Gordon Moore's original statement, "Cramming more components onto integrated circuits", Electronics Magazine, 19 April 1965
      In layman's terms: the number of components on integrated circuits will roughly double every 18 months. With that, the complexity (effort) and the headcount should increase proportionally.
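The two doubling periods on this slide (one year in Moore's quote, 18 months in the layman's version) lead to very different ten-year projections. The sketch below compares them and works backwards from the quoted 65,000-component figure; the function name `growth_over` is illustrative only.

```python
def growth_over(years, doubling_period_years):
    """Growth factor after `years` when the count doubles every period."""
    return 2.0 ** (years / doubling_period_years)


# Ten years of growth under each doubling period mentioned on the slide.
yearly_doubling = growth_over(10, 1.0)          # factor of two per year
eighteen_month_doubling = growth_over(10, 1.5)  # doubling every 18 months

# Working backwards from the quote: 65,000 components in 1975 after ten
# yearly doublings implies this many components per circuit in 1965.
implied_1965_components = 65_000 / yearly_doubling

print(f"yearly doubling, 10 years:   {yearly_doubling:.0f}x")
print(f"18-month doubling, 10 years: {eighteen_month_doubling:.1f}x")
print(f"implied 1965 component count: {implied_1965_components:.1f}")
```

Yearly doubling gives a factor of 1024 over a decade, which is how a starting point of roughly 64 components reaches about 65,000 by 1975. The 18-month period gives only about a tenth of that growth over the same span.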

  18. Architectural Trends
      • Designed for performance: higher frequency == higher performance?
      • Memory vs. processor
      • Architectural trends: hide latencies at all costs!!!
        • Overlap computation with memory accesses [DMA]
        • Bring more frequently used data closer to the processor [memory hierarchies]
        • Multithreaded execution and sharing of resources [SMT and HT technologies, MTA]
        • Give more chip real estate to speculative execution [branch prediction and prefetching]
      • Power problem? Go multicore!!!!
        • One unit: takes N time to finish a problem of size M using T amount of power
        • Two units: takes N/2 + 2X time to finish a problem of size M using T/2 amount of power per unit
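The multicore trade-off at the end of this slide can be turned into a small model. This is a sketch under stated assumptions: X is read here as a fixed overhead term (the slide does not define it), energy is taken as power multiplied by time, and both units are assumed to draw T/2 for the whole run. The function names are illustrative.

```python
def single_unit(n_time, t_power):
    """Return (time, energy) for one unit running at power T for time N."""
    return n_time, t_power * n_time


def two_units(n_time, t_power, x_overhead):
    """Return (time, energy) for two units at T/2 each.

    Time follows the slide's N/2 + 2X expression. Treating X as an
    overhead term is an assumption made for this sketch.
    """
    time = n_time / 2 + 2 * x_overhead
    energy = 2 * (t_power / 2) * time  # two units, each drawing T/2
    return time, energy


def speedup(n_time, x_overhead):
    """Single-unit time divided by two-unit time."""
    return n_time / (n_time / 2 + 2 * x_overhead)


# Hypothetical numbers: N = 100 time units, T = 80 power units, X = 5.
print(single_unit(100, 80))    # (100, 8000)
print(two_units(100, 80, 5))   # (60.0, 4800.0)
print(speedup(100, 25))        # 1.0, the break-even point X = N/4
```

Under these assumptions the two-unit design is both faster and lower in energy whenever X stays below N/4; at X = N/4 the two designs finish at the same time with the same total energy.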

  19. Technology Progress Overview
      • Processor speeds are much faster (around 1000x)
      • Memory (RAM) speeds are increasing too, but at a slower rate (around 10x)
      • Memory (RAM) capacity, however, has grown even faster than processor speed (around 1,000,000x)
      • Computation is almost free, but bandwidth is very expensive

  20. The Pentium Chip

  21. Intel Pentium 4: Nine Years and Millions of Dollars Later

  22. Next Gen: The Cell Chip Layout
      Many of them, simpler and cheaper!!!

  23. The Dawn of Parallelism
      • Parallel architectures are becoming more attractive
      • Milestones: the introduction of the Pentium D (2005) and Centrino Duo (2006)
      • Future projects: IBM PERCS project, Cray Eldorado, Sun Hero, IBM Cell project, etc.
      • All the factors listed contributed to this “epiphany” in computing technology
      • Parallelism can be exploited at many levels in many ways

  24. The World's Fastest: Japan's Dominance
      • CP-PACS: 368 GFLOPS
      • Numerical Wind Tunnel: 192 GFLOPS

  25. The World's Fastest: USA Takes the Lead
      • ASCI White (SP Power3, 375 MHz): 7.3 TFLOPS
      • ASCI Red: 1.3 TFLOPS

  26. The World's Fastest: Japan's Second Wind
      • Earth Simulator: 35 TFLOPS

  27. The World's Fastest: and again...
      • BlueGene/L eServer Solution: 280 TFLOPS
      • BlueGene/L Beta: 70 TFLOPS

  28. The World's Fastest: BlueGene/L eServer Solution
      • 65536 dual processors arranged in a 32 x 32 x 64 3D torus network
      • Global tree structure for fast reduction and broadcast operations over all nodes
      • An I/O node per 64 nodes
      • Inside a group of 64: tree-structured connections between the I/O node and the computation nodes, with an aggregate bandwidth of 2.1 GB/s
      • Across groups of 64: torus-like connections
      • Total memory: 32 Terabytes
      • Total power consumption: 1.5 Megawatts
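A 3D torus connects each node to six neighbors, with the edges of the grid wrapping around to the opposite side. The sketch below is a generic illustration built from the 32 x 32 x 64 dimensions and the 64-nodes-per-I/O-node ratio on the slide. It is not IBM's addressing or routing scheme, and the function names are illustrative.

```python
DIMS = (32, 32, 64)        # torus dimensions from the slide
NODES_PER_IO_NODE = 64     # one I/O node per 64 nodes, from the slide


def node_count(dims=DIMS):
    """Total number of nodes in the torus."""
    x, y, z = dims
    return x * y * z


def torus_neighbors(coord, dims=DIMS):
    """Six nearest neighbors of `coord`, with wraparound on each axis."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors


def torus_hops(a, b, dims=DIMS):
    """Minimum hop count between two nodes, taking the shorter way
    around the ring on each axis."""
    total = 0
    for ai, bi, d in zip(a, b, dims):
        delta = abs(ai - bi)
        total += min(delta, d - delta)
    return total


io_nodes = node_count() // NODES_PER_IO_NODE

print(node_count())                          # 65536
print(io_nodes)                              # 1024
print(torus_neighbors((0, 0, 0)))
print(torus_hops((0, 0, 0), (16, 16, 32)))   # 64, the network diameter
```

The dimensions multiply out to exactly the 65536 nodes listed on the slide, which gives 1024 I/O nodes. Because of the wraparound links, no two nodes are more than 16 + 16 + 32 = 64 hops apart.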

  29. The World's Fastest: BlueGene/L eServer Solution

  30. The Next Step
      • So what is next?
      • Multicore, system on a chip, PIM, etc.
      • Simpler, colder, cheaper...
      • Intel Pentium D and Centrino Duo
      • AMD Opteron
      • The DARPA HPCS Project: IBM, Cray, and Sun
      • Multicore chips: Cell, Cyclops, BlueGene
      • Alternatives: ClearSpeed [programmable co-processors], etc.
