
Cluster Architectures and the NPACI Berkeley NOW



Presentation Transcript


  1. Cluster Architectures and the NPACI Berkeley NOW David E. Culler Computer Science Division U.C. Berkeley http://now.cs.berkeley.edu

  2. Architectural Drivers • Node architecture dominates performance • processor, cache, bus, and memory • design and engineering $ => performance • Greatest demand for performance is on large systems • must track the leading edge of technology without lag • MPP network technology => mainstream • system area networks • System on every node is a powerful enabler • very high speed I/O, virtual memory, scheduling, … • Incremental scalability (up, down, and across) • Complete software tools • Wide class of applications

  3. Berkeley NOW • 100 Sun UltraSparcs • 200 disks • Myrinet SAN • 160 MB/s • Fast comm. • AM, MPI, ... • Ether/ATM switched external net • Global OS • Self Config

  4. Basic Components • Sun Ultra 170 node: processors (P), caches ($), and memory (M) • Myricom NIC on the I/O bus • Myrinet, 160 MB/s

  5. Massive Cheap Storage Cluster • Basic unit: 2 PCs double-ending four SCSI chains of 8 disks each • Currently serving Fine Art at http://www.thinker.org/imagebase/

  6. Cluster of SMPs (CLUMPS) • Four Sun E5000s • 8 processors • 4 Myricom NICs each • Multiprocessor, Multi-NIC, Multi-Protocol • NPACI => Sun 450s

  7. Millennium PC Clumps • Inexpensive, easy to manage Cluster • Replicated in many departments • Prototype for very large PC cluster

  8. So What’s So Different? • Commodity parts? • Communications Packaging? • Incremental Scalability? • Independent Failure? • Intelligent Network Interfaces? • Complete System on every node • virtual memory • scheduler • files • ...

  9. Communication Performance => Direct Network Access • LogP: Latency, Overhead, and Bandwidth • Active Messages: lean layer supporting programming models (figure: per-message cost split into Latency and 1/BW terms)

  10. MPI Performance

  11. NAS Parallel Benchmarks

  12. World-Record Disk-to-Disk Sort • Sustain 500 MB/s disk bandwidth and 1,000 MB/s network bandwidth

  13. General purpose Parallel System • Many timeshared processes • each with direct, protected access • partition it any way you like • User and system • Client/Server, Parallel clients, parallel servers • they grow, shrink, handle node failures • Multiple packages in a process • each may have own internal communication layer • Use communication as easily as memory

  14. Virtual Networks • Endpoint abstracts the notion of “attached to the network” • Virtual network is a collection of endpoints that can name each other. • Many processes on a node can each have many endpoints, each with own protection domain.

  15. How are they managed? • How do you get direct hardware access for performance with a large space of logical resources? • Just like virtual memory • active portion of large logical space is bound to physical resources (figure: processes 1…n with endpoints in host memory; the active set bound into NIC memory through the network interface)

  16. Network Interface Support • NIC has endpoint frames • Services active endpoints • Signals misses to driver • using a system endpoint (figure: NIC frames 0–7 with transmit/receive queues; an endpoint miss is passed to the driver)

  17. Communication under Load (figure: message-burst work interleaved across client and server nodes)

  18. Beyond the Personal Supercomputer • Able to timeshare parallel programs • with fast, protected communication • Mix with sequential and interactive jobs • Use fast communication in OS subsystems • parallel file system, network virtual memory, … • Nodes have powerful, local OS scheduler • Simple implicit scheduling techniques provide coordinated scheduling => ride workstation/PC nodes and internet server systems technology => focus CS partners on RAS for long running apps
