1 / 14

Lecture 1: Parallel Architecture Intro

Lecture 1: Parallel Architecture Intro. Course organization: ~13 lectures based on textbook ~10 lectures on recent papers ~5 lectures on parallel algorithms and multi-thread programming New topics: interconnection networks, transactional

stephensv
Télécharger la présentation

Lecture 1: Parallel Architecture Intro

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 1: Parallel Architecture Intro • Course organization: • ~13 lectures based on textbook • ~10 lectures on recent papers • ~5 lectures on parallel algorithms and multi-thread programming • New topics: interconnection networks, transactional • memories, power efficiency, CMP cache hierarchies • Text: Parallel Computer Architecture, Culler, Singh, Gupta • Reference texts: Principles and Practices of Interconnection • Networks, Dally & Towles • Introduction to Parallel Algorithms and • Architectures, Leighton

  2. More Logistics • Sign up for the class mailing list (cs7820) • Projects: simulation-based, creative, be prepared to • spend time towards end of semester – more details on • simulators in a couple of weeks • Grading: • 50% project • 10% multi-thread programming assignments • 20% paper critiques • 20% take-home final

  3. Parallel Architecture History • A parallel computer is a collection of processing elements • that communicate and cooperate to solve large problems fast • Up to 80’s: intended for large applications, costing millions • of dollars • 90’s: every organization owns a “cluster” or an “SMP”, • costing an order of magnitude less • 21st century: most desktops/workstations will have • multiple cores and simultaneous threads on a single • die – perhaps, motivated by designers’ inability to scale • single core performance

  4. Parallel Architecture Trends Source: Mark Hill, Ravi Rajwar

  5. CMP/SMT Papers • CMP/SMT/Multiprocessor papers in recent conferences: • 2001 2002 2003 2004 2005 2006 2007 • ISCA: 3 5 8 6 14 17 • HPCA: 4 6 7 3 11 13 14

  6. Bottomline • Expect an increase in multiprocessor papers • Performance stagnates unless we learn to transform • traditional applications into parallel threads • It’s all about the data! • Data management: distribution, coherence, consistency • It’s also about the programming model: onus on • application writer / compiler / hardware

  7. Communication Architecture • At first, the hardware for shared memory and message • passing were different – today, a single parallel machine • can efficiently support different programming models CAD Database Scientific modeling Parallel applications Multiprogramming Shared Message address passing Programming models Compilation or library User/System boundary OS support HW/SW boundary Communication hardware

  8. Shared Memory Architectures • Key differentiating feature: the address space is shared, • i.e., any processor can directly address any memory • location and access them with load/store instructions • Cooperation is similar to a bulletin board – a processor • writes to a location and that location is visible to reads • by other threads

  9. Shared Address Space Process P1 Shared Private Shared Process P2 Shared Pvt P1 Pvt P2 Private Pvt P3 Process P3 Shared Physical address space Private Virtual address space of each process

  10. Symmetric Multiprocessors (SMP) • A collection of processors, a collection of memory: both • are connected through some interconnect (usually, the • fastest possible) • Symmetric because latency for any processor to access • any memory is constant – uniform memory access (UMA) Proc 1 Proc 2 Proc 3 Proc 4 Mem 1 Mem 2 Mem 3 Mem 4

  11. Distributed Memory Multiprocessors • Each processor has local memory that is accessible • through a fast interconnect • The different nodes are connected as I/O devices with • (potentially) slower interconnect • Local memory access is a lot faster than remote memory • – non-uniform memory access (NUMA) • Advantage: can be built with commodity processors and • many applications will perform well thanks to locality Proc 1 Mem 1 Proc 2 Mem 2 Proc 3 Mem 3 Proc 4 Mem 4

  12. Message Passing • Programming model that can apply to clusters of • workstations, SMPs, and even a uniprocessor • Sends and receives are used for effecting the data • transfer – usually, each process ends up making a copy • of data that is relevant to it • Each process can only name local addresses, other • processes, and a tag to help distinguish between • multiple messages

  13. Topics • Programming models • Synchronization • Snooping based coherence protocols • Directory based coherence protocols • Consistency models • Transactional memories • Interconnection networks • Papers (caching, speculation, interconnects, power, etc.) • Parallel algorithms

  14. Title • Bullet

More Related