This document provides a comprehensive overview of IBM's POWER5 systems showcased at Louisiana Tech University. It details the hardware and software specifications, including the architecture of processors, nodes, multichip modules, and interconnects. The document highlights the unique characteristics and advancements of the POWER5 architecture, such as dual-chip modules, cache implementations, and memory management features. Additionally, it discusses the operational environment, covering supported operating systems and compiler options tailored for high-performance computing.
Essential Overview • Louisiana Tech University, Ruston, Louisiana • Charles Grassl, IBM • January 2006
Agenda • Hardware • Software • Documentation
Hardware Overview • Processors: • Nodes: • Clusters:
POWER5 Systems • POWER5 processors • Single and Dual processor chips • Modules • Dual Chip Modules (DCM) • Multi Chip Modules (MCM) • Nodes • Multiple modules • p5-575 • p5-595 • Cluster • Multiple nodes • Connected with High Performance Switch (HPS)
POWER5 Processor Systems • [Figure: Processor Chip, DCM, MCM, p5-575, p5-595, Cluster]
Cluster 1600 • [Figure: Physical View and Logical View; Multi Processor Nodes, Network, Disk System]
Local System Name • IBM p5-575 nodes • 1.9 GHz POWER5 processors • Single processor chips • 8 processors per node • HPS interconnect • “575” distinction: Dual Chip Module (DCM) • 8 DCMs • One or two processors per chip • Single Core (SC) • Dual Core (DC) • “595” distinction: Multi Chip Module (MCM) construction • 8 MCMs
POWER5 Processors • Multi-processor chip • High clock rate: Multiple GHz • Three cache levels • Bandwidth • Latency hiding • Shared Memory • Large memory size
POWER5 Features • Private L1 cache • Shared L2 cache • Shared L3 cache • Interleaved memory • Hardware Prefetch • Multiple Page Size support
Processor Characteristics • High frequency clocks • Deep pipelines • High asymptotic rates • Superscalar • Speculative out-of-order instructions • Up to 8 outstanding cache line misses • Large number of instructions in flight • Branch prediction • Hardware Prefetching
POWER5 Design: Summary • More gates • 170 million → 260 million • Enhancements • Increased cache associativity • Increased number of rename registers • Reduced L3 and cache latency • New features • Simultaneous Multi Threading • Dynamic power management
Processor Systems (Nodes) • Multiple processors • Multiple modules • Various construction formats • Multi Chip Modules • Dual Chip Modules • Shared memory
Multi Chip and Dual Chip Modules • POWER5 Processor Chip • Dual Chip Module (DCM) • p5-570 • p5-575 • Multi Chip Module (MCM) • p5-590 • p5-595
Dual Chip Module • Each Module: • 1 processor chip • 1 L3 cache • 1 Memory card • Each Processor Chip • 2 processors • L1 caches • Registers • Functional units • 1 L2 cache • 1 path to memory • [Figure labels: 36 Mbyte L3, Memory]
Multi Chip Module • Each Module: • 4 processor chips • 4 L3 cache chips • 2 Memory cards • Each Processor Chip • 2 processors • L1 caches • Registers • Functional units • 1 L2 cache • 1 path to memory • [Figure labels: four Memory blocks]
POWER5 Multi Chip Module • Four POWER5 chips • Four L3 cache chips • 95 mm × 95 mm • 4,491 signal I/Os • 89 layers of metal
POWER5 Dual Chip Module • One POWER5 chip • Single or Dual Core • One L3 cache chip
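The module descriptions above can be tallied in a few lines. The following is a minimal sketch, not IBM tooling: the `Module` class and its field names are hypothetical, and the counts are taken from the DCM, MCM and node slides (one chip per DCM, four chips per MCM, eight modules per node).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical description of one POWER5 module, per the slides."""
    name: str
    chips: int            # processor chips on the module
    cores_per_chip: int   # 1 = Single Core (SC), 2 = Dual Core (DC)
    l3_caches: int
    memory_cards: int

    @property
    def processors(self) -> int:
        # Processors on the module = chips x processors per chip.
        return self.chips * self.cores_per_chip


DCM_SINGLE_CORE = Module("DCM (SC)", chips=1, cores_per_chip=1,
                         l3_caches=1, memory_cards=1)
DCM_DUAL_CORE = Module("DCM (DC)", chips=1, cores_per_chip=2,
                       l3_caches=1, memory_cards=1)
MCM = Module("MCM", chips=4, cores_per_chip=2,
             l3_caches=4, memory_cards=2)


def node_processors(module: Module, modules_per_node: int) -> int:
    """Processors in a node built from identical modules."""
    return module.processors * modules_per_node


if __name__ == "__main__":
    for module in (DCM_SINGLE_CORE, DCM_DUAL_CORE, MCM):
        print(module.name, node_processors(module, 8))
```

With eight single-core DCMs this gives the 8 processors per p5-575 node quoted on the local-system slide, and eight MCMs give 64 processors for a p5-595.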
Modifications to POWER4 System Structure • [Figure: block diagram showing processors (P), L2 caches, fabric controllers (Fab Ctl), L3 caches, memory controllers (Mem Ctl) and memory]
Switch Technology • Internal network • In lieu of GigEthernet, Myrinet, Quadrics, etc. • Fourth generation • HPS Switch (POWER2 generation) • SP Switch (POWER2 -> POWER3) • SP Switch 2 (POWER3 -> POWER4) • HPS (POWER4 -> POWER5) • Multiple links per node • Match number of links to number of processors
High Performance Switch (HPS) • Also Known As “Federation” • Follow on to SP Switch2 • Also known as “Colony” • Specifications: • 2 Gbyte/s (bidirectional) • 5 microsecond latency • Configuration: • Up to four adaptors per node • 2 links per adaptor • 16 Gbyte/s per node
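The per-node figure follows from the per-link figure by simple multiplication. A minimal sketch of that arithmetic, using only the numbers on the slide (the constant and function names are illustrative):

```python
# Figures from the HPS slide.
LINK_BANDWIDTH_GBYTE_S = 2.0   # per link, bidirectional
LINKS_PER_ADAPTOR = 2
MAX_ADAPTORS_PER_NODE = 4


def node_bandwidth_gbyte_s(adaptors: int = MAX_ADAPTORS_PER_NODE) -> float:
    """Aggregate bidirectional switch bandwidth for one node."""
    if not 1 <= adaptors <= MAX_ADAPTORS_PER_NODE:
        raise ValueError("a node holds between 1 and 4 adaptors")
    return adaptors * LINKS_PER_ADAPTOR * LINK_BANDWIDTH_GBYTE_S


if __name__ == "__main__":
    for count in range(1, MAX_ADAPTORS_PER_NODE + 1):
        print(count, "adaptor(s):", node_bandwidth_gbyte_s(count), "Gbyte/s")
```

A fully populated node (4 adaptors × 2 links × 2 Gbyte/s) reaches the 16 Gbyte/s quoted above; this is a peak figure, not a measured rate.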
Software Overview • Operating System • AIX • Compilers • C • C++ • Fortran • Batch Queue • LoadLeveler (IBM) • LSF (Platform) • PBS • Gridware
AIX • Current Version: AIX 5.3 • Processors: • POWER3 • POWER4 • POWER5 • Linux Affinity • Logical PARtitions (LPAR) Nodes • Operating system • Memory • Network connections • Kernel Address Size: • 64-bit • 32-bit
Linux on POWER • Native Linux: SuSE7, SuSE8 • RPMs and package managers • Cluster Systems Manager • 64-bit kernel • 32/64-bit application support (SuSE8)
Compilers • C and C++ • Visual Age C and C++ Professional for AIX • Versions 6, 7, 8 • ANSI C • C++ • Compiler names: • xlc • xlC • Fortran • XL Fortran for AIX • Versions 8, 9, 10 • Fortran 77 • Fortran 90 • Compiler names: • xlf77 • xlf90
Compiler Names • AIX uses different compiler names to perform some tasks that are handled by compiler flags on most other systems
User Limits • Set by the system administrator • Ulimit: • C or K shell built-in • Sets or reports resource limits • Limits are defined in /etc/security/limits • Sizes are in 512 byte blocks • Times are in seconds • $ ulimit -a
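Because sizes in /etc/security/limits are given in 512 byte blocks, it helps to convert them before comparing against a program's memory or file needs. A minimal sketch, assuming that -1 denotes "unlimited" in that file; the sample value below is purely illustrative and is not taken from the slides:

```python
from typing import Optional

BLOCK_SIZE_BYTES = 512   # per the slide: sizes are in 512 byte blocks
UNLIMITED = -1           # assumption: -1 means "no limit"


def blocks_to_bytes(blocks: int) -> Optional[int]:
    """Convert a size limit in 512 byte blocks to bytes.

    Returns None when the limit is unlimited.
    """
    if blocks == UNLIMITED:
        return None
    if blocks < 0:
        raise ValueError("limit must be -1 or a non-negative block count")
    return blocks * BLOCK_SIZE_BYTES


def describe(blocks: int) -> str:
    """Human-readable form of a block-count limit."""
    size = blocks_to_bytes(blocks)
    if size is None:
        return "unlimited"
    return f"{size / 2**20:.1f} Mbyte"


if __name__ == "__main__":
    sample_limit = 2097151   # illustrative block count only
    print(describe(sample_limit))
    print(describe(UNLIMITED))
```

The values reported by `ulimit -a` may use different units per resource, so check the unit printed next to each entry before applying this conversion.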
Ulimit Defaults * 64-bit address mode
Other Defaults • Thread control • /etc/environment • AIXTHREAD_SCOPE=S • AIXTHREAD_MNRATIO=1:1 • AIXTHREAD_COND_DEBUG=OFF • AIXTHREAD_GUARDPAGES=4 • AIXTHREAD_MUTEX_DEBUG=OFF • AIXTHREAD_RWLOCK_DEBUG=OFF
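One way to see whether a job's environment still matches these defaults is to parse the settings and compare them with the current environment. A minimal sketch; the helper names are hypothetical, and the settings text is copied from the slide rather than read from a real /etc/environment:

```python
import os

# Thread-control defaults as listed on the slide.
SLIDE_DEFAULTS = """\
AIXTHREAD_SCOPE=S
AIXTHREAD_MNRATIO=1:1
AIXTHREAD_COND_DEBUG=OFF
AIXTHREAD_GUARDPAGES=4
AIXTHREAD_MUTEX_DEBUG=OFF
AIXTHREAD_RWLOCK_DEBUG=OFF
"""


def parse_environment(text: str) -> dict:
    """Parse NAME=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            settings[name.strip()] = value.strip()
    return settings


def overrides(expected: dict, actual: dict) -> dict:
    """Variables that are set in `actual` to a value other than the default."""
    return {name: actual[name]
            for name in expected
            if name in actual and actual[name] != expected[name]}


if __name__ == "__main__":
    defaults = parse_environment(SLIDE_DEFAULTS)
    print(overrides(defaults, dict(os.environ)))
```

On a system where none of the variables has been changed or set, the printed dictionary is empty.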
Batch Queuing • Compile on any AIX node • Use -qarch=pwr5 • Submit job with available batch utility • Use appropriate queue name • Available queuing systems: • LoadLeveler • PBS • Gridware • LSF
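The compile-then-submit workflow can be sketched as follows. This is an illustration under stated assumptions, not a site recipe: the file names, job name and the class name `short` are hypothetical, the job file uses common LoadLeveler `# @` keywords as I understand them, and the actual queue (class) names must come from the local administrator. Only `xlf90` and `-qarch=pwr5` are taken from the slides.

```python
def build_compile_command(compiler: str, source: str, executable: str) -> list:
    """Command line for compiling on an AIX node, targeting POWER5."""
    return [compiler, "-qarch=pwr5", "-o", executable, source]


def build_loadleveler_job(job_name: str, executable: str, job_class: str) -> str:
    """Text of a minimal LoadLeveler job command file (assumed keywords)."""
    lines = [
        "#!/bin/sh",
        f"# @ job_name = {job_name}",
        "# @ output = $(job_name).$(jobid).out",
        "# @ error = $(job_name).$(jobid).err",
        f"# @ class = {job_class}",   # hypothetical queue name
        "# @ queue",
        f"./{executable}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical source and executable names.
    print(" ".join(build_compile_command("xlf90", "solver.f90", "solver")))
    print(build_loadleveler_job("solver_run", "solver", "short"))
```

The generated text would be saved to a file and handed to the batch utility (for LoadLeveler, typically `llsubmit`); PBS, LSF and Gridware use their own directive syntax and submit commands.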
Cluster Layout • [Figure: Compile And Submit Node connected through the Network to Node 0, Node 1, Node 2]
Documentation • Software: • www.software.ibm.com • Products A-Z • X -> xl C, xl C/C++, xl Fortran • www.servers.ibm.com/aix • Compilers • /usr/vac/doc • /usr/vacpp/doc • /usr/lpp/xlf/doc • Redbooks: • www.redbooks.ibm.com/ • IBM eServer p5 590 and 595 System Handbook
Documentation • AIX Commands Reference • AIX command: • /usr/sbin/infocenter • /opt/ibm_help/help_start.sh • http://www.unet.univie.ac.at/aix/aixgen/wbinfnav/aixcmdsrefbooks.htm • Google search: “AIX Commands Reference”
Documentation Library • Google search: “AIX 5L documentation Library” • http://publibn.boulder.ibm.com/cgi-bin/ds_rslt
Summary: Architecture • System architecture • Processors • Nodes • Cluster • Processors • POWER5 • Three levels of cache • Nodes: • Eight processor p5-575 • Cluster: • 14 p5-575 nodes • HPS interconnect
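The summary figures combine into cluster-wide totals. A minimal sketch, using the node count, processors per node and the 1.9 GHz clock from the slides; the value of 4 floating-point operations per cycle per processor is an assumption not stated in the slides (two fused multiply-add units per POWER5 core), so the result is a theoretical peak only.

```python
NODES = 14                 # from the summary slide
PROCESSORS_PER_NODE = 8    # eight-processor p5-575
CLOCK_GHZ = 1.9            # from the local-system slide
FLOPS_PER_CYCLE = 4        # assumption: 2 FMA units x 2 flops each


def total_processors(nodes: int = NODES,
                     per_node: int = PROCESSORS_PER_NODE) -> int:
    """Processor count for the whole cluster."""
    return nodes * per_node


def peak_gflops(processors: int,
                clock_ghz: float = CLOCK_GHZ,
                flops_per_cycle: int = FLOPS_PER_CYCLE) -> float:
    """Theoretical peak rate in Gflop/s for a given processor count."""
    return processors * clock_ghz * flops_per_cycle


if __name__ == "__main__":
    cpus = total_processors()
    print("processors:", cpus)
    print("peak Gflop/s per node: %.1f" % peak_gflops(PROCESSORS_PER_NODE))
    print("peak Gflop/s cluster:  %.1f" % peak_gflops(cpus))
```

Under these assumptions the cluster has 112 processors; sustained application rates will be lower than the computed peak.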